I had some sort of an unrecoverable drive / controller error on my home Linux server which locked up the machine and caused it's software RAID to go inactive after reboot.
--------
box # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : inactive sdb1[1] sda1[0] sdh1[6] sdg1[7] sdd1[2] sdf1[5] sde1[4]
6837319552 blocks
unused devices: <none>
--------
One disk (/dev/sdc) was showing IO errors so I removed and replaced it.
--------
box # mdadm --manage /dev/md0 --remove /dev/sdc1
box # fdisk /dev/sdc
(one partition of type "fd" spanning the disk)
box # mdadm --manage /dev/md0 --add /dev/sdc1
--------
But when I attempted to re-assemble the RAID:
--------
box # mdadm --assemble /dev/md0 /dev/sd[abcdefgh]1
mdadm: cannot open device /dev/sda1: Device or resource busy
mdadm: /dev/sda1 has no superblock - assembly aborted
--------
What? No superblock? I can clearly read it:
--------
box ~ # mdadm --examine /dev/sda1
/dev/sda1:
Magic : a92b4efc
Version : 0.90.00
UUID : 2476ddcb:ac8eb7ae:d7a6d8c7:9aeca122 (local to host box)
Creation Time : Mon Jan 25 21:51:25 2010
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 6837319552 (6520.58 GiB 7001.42 GB)
Raid Devices : 8
Total Devices : 8
Preferred Minor : 0
Update Time : Sun Aug 28 16:18:58 2011
State : active
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Checksum : c1380029 - correct
Events : 8480985
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 1 0 active sync /dev/sda1
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 8 33 2 active sync /dev/sdc1
3 3 8 49 3 active sync /dev/sdd1
4 4 8 65 4 active sync /dev/sde1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 97 7 active sync /dev/sdg1
--------
Upon a more rational dissection, I notice the "Device or resource busy" error on the line before so I suppose that second error is just a misleading remnant.
But then it occurred to me, this superblock lists 8 good drives and no spares. This clearly isn't the case. And why is /dev/md0 even defined? Clearly an earlier mdadm --assemble had already worked somehow but decided /dev/md0 was not fit to start. This must have happened at boot.
I needed to remove /dev/md0 and re-assemble it again. But this time, I use the --force option so the RAID comes up active:
--------
box # mdadm --stop /dev/md0
mdadm: stopped /dev/md0
box # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
unused devices: <none>
box # mdadm --assemble --force /dev/md0 /dev/sd[abcdefgh]1
mdadm: /dev/md0 has been started with 7 drives (out of 8) and 1 spare.
box main # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sda1[0] sde1[8] sdg1[7] sdh1[6] sdf1[5] sdd1[4] sdc1[2] sdb1[1]
6837319552 blocks level 5, 64k chunk, algorithm 2 [8/7] [UUU_UUUU]
[>....................] recovery = 0.0% (71456/976759936) finish=455.5min speed=35728K/sec
unused devices: <none>
--------
If you have an inactive /dev/md0, you have to stop it before retrying the assemble. The "Device or resource busy" and "no superblock" errors are slightly misleading.
All that is left is to watch the rebuild happen.
--------
watch -n .5 cat /proc/mdstat
--------