Fedora 34 - degraded RAID not starting at boot.
by ptf from LinuxQuestions.org on (#5Q9R3)
Hello, everyone.
This one has me scratching my head a bit. I have seen similar questions when searching the forum, but none (as far as I can see) exactly describes the situation I found myself in.
I have a media/mail server running F34, generally reliably.
It hosts various filesystems over 11 mostly 8TB disks, but the main media tree is a 15TB ext4 filesystem consisting of two RAID1 arrays striped together using LVM. Quite why it was set up that way is lost in the mists of time (probably I thought it would make it flexible to add storage, and it would, but it needs pairs of PVs to continue striping, so in practice I'd need to add four disks and there isn't room in the case, but I digress).
I wanted to migrate one of the disks to an SSD - in fact, over time I'd like all of them to migrate to SSD, but that's an expensive undertaking, so I bought a single Samsung 870 QVO 8TB to play with and kick off the process. Yes, I know it's QLC.
Here starts the pain. I assumed I could just power down, remove one of the RAID disks, replace it with the SSD, power back up (the array at that point running on a single device) and add the new disk back to the array after partitioning the drive.
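What I had in mind was roughly the following (device names here are only placeholders - md127 for the mirror and sdY for the new SSD, not my actual devices):

    # give the new SSD a single whole-disk Linux RAID partition
    sgdisk -n 1:0:0 -t 1:fd00 /dev/sdY
    # add it into the (then degraded) mirror and let it resync
    mdadm --manage /dev/md127 --add /dev/sdY1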
But no: the array did not start, so one of the PVs was missing, so the logical volume was in turn missing, and systemd grumbled loudly about that, dropping me to the single-user shell on the console.
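From that shell I poked around with something like the following (again, md127 is a placeholder for the real array):

    cat /proc/mdstat            # check whether the array assembled and started
    mdadm --detail /dev/md127   # see which mirror member is missing
    pvs; lvs                    # confirm the PV on that array, and hence the striped LV, is absent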
No worries, I thought, I'll do it manually - but as soon as I tried to start the array I lost keyboard input (although Ctrl-Alt-Del still worked to reboot the machine).
Perplexed, I powered down and put the old drive back in. The system started fine, and I manually failed and removed the drive I wanted to replace, assuming this would fix md's expectations on reboot.
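The fail/remove itself was just the standard incantation (sdX1 standing in for the member I wanted to retire):

    mdadm --manage /dev/md127 --fail /dev/sdX1
    mdadm --manage /dev/md127 --remove /dev/sdX1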
But, nope, the system would still not start the array.
Finally I managed it by removing the volume and its dependent mounts from fstab, rebooting, and manually starting the array from a normal console session. This time, running multi-user, starting the array did not kill the keyboard input, and I could finally add in the new drive.
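For anyone who ends up in the same place, the sequence that finally worked for me (with the LV commented out of fstab, and the names again being placeholders for my setup) was roughly:

    mdadm --run /dev/md127                # force the partially assembled mirror to start degraded
    vgchange -ay vg_media                 # activate the volume group now its PV is back
    mount /dev/vg_media/media /mnt/media  # mount the filesystem by hand
    # then partition the SSD and mdadm --add it, as sketched further up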
I am, however, puzzled:
- Why would Fedora not start the array? It is a mirrored pair, so there is no reason not to run degraded with one drive - indeed, that's the whole raison d'être of RAID, so not doing so seems a bit of a fail.
- What the heck was the business with "mdadm --run /dev/mdXXX" killing the single-user console keyboard input?
Anyone got any suggestions?