mdadm: Manually fail a drive

Written by Benjamin Cane on 2011-07-06 01:20:30

Just a quick reference on removing a drive for those of you using mdadm.

Check the status of a raid device

[root@bcane ~]# mdadm --detail /dev/md10  
/dev/md10:  
 Version : 1.2  
 Creation Time : Sat Jul 2 13:56:38 2011  
 Raid Level : raid1  
 Array Size : 26212280 (25.00 GiB 26.84 GB)  
 Used Dev Size : 26212280 (25.00 GiB 26.84 GB)  
 Raid Devices : 2  
 Total Devices : 2  
 Persistence : Superblock is persistent  

 Update Time : Sat Jul 2 13:56:47 2011  
 State : clean, resyncing  
Active Devices : 2  
Working Devices : 2  
Failed Devices : 0  
 Spare Devices : 0  

Rebuild Status : 10% complete  

 Name : bcane.virtuals.local:10 (local to host bcane.virtuals.local)  
 UUID : 10a96ed5:92dc48e6:04b2bf43:3539e089  
 Events : 1  

 Number Major Minor RaidDevice State  
 0 8 33 0 active sync /dev/sdc1  
 1 8 49 1 active sync /dev/sdd1
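
If you only care about the array state rather than the full report, the output above can be filtered. This is a minimal sketch, assuming the `State :` line keeps the layout shown above; `array_state` and `is_degraded` are hypothetical helper names, not part of mdadm.

```shell
# array_state: print the value of the array-level "State :" line from
# `mdadm --detail` output supplied on standard input.
array_state() {
    # The device table header ends in "State" but has no colon after it,
    # so only the array-level line matches. The second sed trims trailing
    # whitespace.
    sed -n 's/^[[:space:]]*State[[:space:]]*:[[:space:]]*//p' |
        sed 's/[[:space:]]*$//'
}

# is_degraded: succeed (exit 0) when the state read from standard input
# contains the word "degraded".
is_degraded() {
    case $(array_state) in
        *degraded*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With a real array this would be used as `mdadm --detail /dev/md10 | array_state`, which for the output above would print the text after `State :`.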

To remove a drive, it must first be marked as faulty. A drive is marked faulty either automatically when it fails, or manually with the -f/--fail flag.

[root@bcane ~]# mdadm /dev/md10 -f /dev/sdc1
mdadm: set /dev/sdc1 faulty in /dev/md10  

[root@bcane ~]# mdadm --detail /dev/md10
/dev/md10:  
 Version : 1.2  
 Creation Time : Sat Jul 2 13:56:38 2011  
 Raid Level : raid1  
 Array Size : 26212280 (25.00 GiB 26.84 GB)  
 Used Dev Size : 26212280 (25.00 GiB 26.84 GB)  
 Raid Devices : 2  
 Total Devices : 2  
 Persistence : Superblock is persistent  

 Update Time : Sat Jul 2 14:00:18 2011  
 State : active, degraded  
Active Devices : 1  
Working Devices : 1  
Failed Devices : 1  
 Spare Devices : 0  

 Name : bcane.virtuals.local:10 (local to host bcane.virtuals.local)  
 UUID : 10a96ed5:92dc48e6:04b2bf43:3539e089  
 Events : 19  

 Number Major Minor RaidDevice State  
 0 0 0 0 removed  
 1 8 49 1 active sync /dev/sdd1  

 0 8 33 - faulty spare /dev/sdc1
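
Since failing a drive leaves the array degraded, it can be worth checking the active device count first. The following is a rough sketch of that idea, not a definitive safeguard; `active_devices` and `fail_if_redundant` are hypothetical helpers, and the parsing assumes the `Active Devices :` line looks like the output above.

```shell
# active_devices: print the "Active Devices" count from `mdadm --detail`
# output supplied on standard input.
active_devices() {
    sed -n 's/^[[:space:]]*Active Devices[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# fail_if_redundant ARRAY DEVICE (hypothetical helper, not part of mdadm):
# mark DEVICE faulty only when ARRAY reports more than one active device.
fail_if_redundant() {
    array=$1
    device=$2
    count=$(mdadm --detail "$array" | active_devices)
    if [ "${count:-0}" -gt 1 ]; then
        mdadm "$array" --fail "$device"
    else
        echo "refusing to fail $device: $array has ${count:-0} active device(s)" >&2
        return 1
    fi
}
```

Run as root, `fail_if_redundant /dev/md10 /dev/sdc1` would issue the same `--fail` command as above, but only while both mirrors are still active.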

Now that the drive is marked as faulty, you can remove it using the -r/--remove flag.

[root@bcane ~]# mdadm /dev/md10 -r /dev/sdc1
mdadm: hot removed /dev/sdc1 from /dev/md10  

[root@bcane ~]# mdadm --detail /dev/md10
/dev/md10:  
 Version : 1.2  
 Creation Time : Sat Jul 2 13:56:38 2011  
 Raid Level : raid1  
 Array Size : 26212280 (25.00 GiB 26.84 GB)  
 Used Dev Size : 26212280 (25.00 GiB 26.84 GB)  
 Raid Devices : 2  
 Total Devices : 1  
 Persistence : Superblock is persistent  

 Update Time : Sat Jul 2 14:02:04 2011  
 State : active, degraded  
Active Devices : 1  
Working Devices : 1  
Failed Devices : 0  
 Spare Devices : 0  

 Name : bcane.virtuals.local:10 (local to host bcane.virtuals.local)  
 UUID : 10a96ed5:92dc48e6:04b2bf43:3539e089  
 Events : 20  

 Number Major Minor RaidDevice State  
 0 0 0 0 removed  
 1 8 49 1 active sync /dev/sdd1
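
The fail and remove steps do not have to be separate commands, because mdadm accepts several manage operations in one invocation and applies them in the order given. A small sketch, with `fail_and_remove` being a hypothetical wrapper name of my own:

```shell
# fail_and_remove ARRAY DEVICE (hypothetical wrapper, not part of mdadm):
# mark DEVICE faulty and then hot-remove it in a single manage-mode call.
# The long options are used to keep the intent unambiguous.
fail_and_remove() {
    mdadm "$1" --fail "$2" --remove "$2"
}
```

For the array in this post that would be `fail_and_remove /dev/md10 /dev/sdc1`, run as root.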

If you want to re-add the device, you can do so with the -a/--add flag.

[root@bcane ~]# mdadm /dev/md10 -a /dev/sdc1
mdadm: re-added /dev/sdc1  

[root@bcane ~]# mdadm --detail /dev/md10
/dev/md10:  
 Version : 1.2  
 Creation Time : Sat Jul 2 13:56:38 2011  
 Raid Level : raid1  
 Array Size : 26212280 (25.00 GiB 26.84 GB)  
 Used Dev Size : 26212280 (25.00 GiB 26.84 GB)  
 Raid Devices : 2  
 Total Devices : 2  
 Persistence : Superblock is persistent  

 Update Time : Sat Jul 2 18:02:21 2011  
 State : clean, degraded, recovering  
Active Devices : 1  
Working Devices : 2  
Failed Devices : 0  
 Spare Devices : 1  

Rebuild Status : 4% complete  

 Name : bcane.virtuals.local:10 (local to host bcane.virtuals.local)  
 UUID : 10a96ed5:92dc48e6:04b2bf43:3539e089  
 Events : 23  

 Number Major Minor RaidDevice State  
 0 8 33 0 spare rebuilding /dev/sdc1  
 1 8 49 1 active sync /dev/sdd1
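
After a re-add the array rebuilds in the background, so you may want to follow its progress. The sketch below polls the `Rebuild Status` line shown above; `rebuild_percent` and `wait_for_rebuild` are hypothetical helpers, and the 30 second default interval is an arbitrary choice. If you only need to block until the rebuild is done, `mdadm --wait /dev/md10` does that without any parsing.

```shell
# rebuild_percent: print the percentage from the "Rebuild Status" line of
# `mdadm --detail` output on standard input; prints nothing when no
# rebuild is reported.
rebuild_percent() {
    sed -n 's/^[[:space:]]*Rebuild Status[[:space:]]*:[[:space:]]*\([0-9][0-9]*\)%.*/\1/p'
}

# wait_for_rebuild ARRAY [SECONDS] (hypothetical helper): report progress
# every SECONDS (default 30) until no rebuild status is shown.
wait_for_rebuild() {
    while pct=$(mdadm --detail "$1" | rebuild_percent) && [ -n "$pct" ]; do
        echo "$1: rebuild ${pct}% complete"
        sleep "${2:-30}"
    done
}
```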

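One thing to watch for: you need to specify the RAID device (here /dev/md10) when running these commands. If they are run without the RAID device, the flags can take on a different meaning, because the same short flags are reused in other mdadm modes. In create and assemble modes, for example, -f is --force and -a is --auto.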



