SuSE Linux RAID Faulty disk replacement
by LinuGeek from LinuxQuestions.org on (#5CKPW)
Hello Experts,
We have an important database server running SUSE Linux Enterprise Server 12. The previous admin set it up as follows.
4 internal disks:
1+1 -- RAID-1 software RAID --> root partitions
1+1 -- RAID-1 software RAID --> data partitions with the database
The root RAID devices have LVM on top, sliced into logical volumes for /usr, /boot, etc.
So there are 2 volume groups: 1. System VG and 2. Data VG.
The 4 disks are paired as sda+sdb and sdc+sdd.
Recently we noticed that one of the disks in the system software RAID group has gone bad. The server
continued to work without any problem (thanks to RAID-1 mirroring).
As shown below, 3 software RAID arrays are marked as failed/degraded: md0, md1 and md2,
which are the system partitions. md3 is for the database.
So the failed members are sda1, sda2 and sda3.
Code:
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0](F) <<<<<-------------
1051584 blocks super 1.0 [2/1] [_U]
bitmap: 1/1 pages [4KB], 65536KB chunk
md1 : active raid1 sdb2[1] sda2[0](F) <<<<<-------------
18876288 blocks super 1.0 [2/1] [_U]
bitmap: 1/1 pages [4KB], 65536KB chunk
md2 : active raid1 sdb3[1] sda3[0](F) <<<<<-------------
956832576 blocks super 1.0 [2/1] [_U]
bitmap: 2/8 pages [8KB], 65536KB chunk
md3 : active raid1 sdc1[0] sdd1[1]
976760640 blocks super 1.0 [2/2] [UU]
bitmap: 2/8 pages [8KB], 65536KB chunk
unused devices: <none>
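To pull the failed members out of this output without reading it by eye, something like the following might help. It is only a sketch: `failed_members` is a name I made up, and it assumes the mdstat layout shown above, where a failed member carries an `(F)` suffix.

```shell
# List "array member" pairs for every member flagged (F) in mdstat-style text.
# Usage: failed_members [FILE]   (FILE defaults to /proc/mdstat)
failed_members() {
    awk '
        # Array lines look like: "md0 : active raid1 sdb1[1] sda1[0](F)"
        $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    dev = $i
                    sub(/\[.*/, "", dev)   # strip "[0](F)" to leave "sda1"
                    print $1, dev
                }
            }
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling `failed_members` with no argument reads the live /proc/mdstat; passing a file name lets you try it on a saved copy first.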
We have to replace the faulty disk (sda) so that the original mirrored layout is rebuilt.
I have come up with the following plan. Please suggest modifications.
1. Shut down the server, which will also take down the database.
2. Take out the faulty disk.
3. Replace it with a new one.
4. Restart the server.
5. The automatic rebuild, mirroring the new disk from the existing one, should start.
This sounds like a mostly automated process.
If this does not work, we can do a few more steps manually.
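To decide after the restart whether the rebuild "worked", one option is to check which arrays still report a missing member. This is a rough sketch, not a tested procedure: `degraded_arrays` is a hypothetical helper, and it assumes the status line format shown in the mdstat output above, where `_` inside the `[_U]` field marks a missing member.

```shell
# Print the name of every md array whose status field contains "_".
# Usage: degraded_arrays [FILE]   (FILE defaults to /proc/mdstat)
degraded_arrays() {
    awk '
        # Remember the array named on lines like "md0 : active raid1 ..."
        $2 == ":" && $1 ~ /^md/ { name = $1 }
        # The status line ends with e.g. "[2/1] [_U]"
        name != "" && $NF ~ /^\[[U_]+\]$/ {
            if ($NF ~ /_/) print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

An empty result would mean all arrays show every member as up. Note that an array still shows `_` while a rebuild is in progress, so this only tells you the mirror is not yet complete.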
Quote:
1. Mark the disk as failed if it is not already marked (F) by the system.
Code:
# mdadm --manage /dev/md0 --fail /dev/sda1
# mdadm --manage /dev/md1 --fail /dev/sda2
# mdadm --manage /dev/md2 --fail /dev/sda3
To verify that the disk is failed, check /proc/mdstat.
2. Remove the disk with mdadm.
Code:
# mdadm --manage /dev/md0 --remove /dev/sda1
# mdadm --manage /dev/md1 --remove /dev/sda2
# mdadm --manage /dev/md2 --remove /dev/sda3
3. Replace the disk.
4. Copy the partition table to the new disk.
(Caution: This sfdisk command will replace the entire partition table on the target disk with that of the source disk - use an alternative command if you need to preserve other partition information)
Code:
# sfdisk -d /dev/sdb | sfdisk /dev/sda
5. Create the mirror of the disk:
Code:
# mdadm --manage /dev/md0 --add /dev/sda1
# mdadm --manage /dev/md1 --add /dev/sda2
# mdadm --manage /dev/md2 --add /dev/sda3
6. To test the setup, enter the command below:
Code:
# /sbin/mdadm --detail /dev/md0
7. The following command will show the current progress of the recovery of the mirror disk:
Code:
# cat /proc/mdstat
System backup is in place.
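Since the fail, remove and add steps repeat the same command for three array/partition pairs, a small dry-run helper could reduce the chance of a typo. This is only a sketch: `plan_mdadm` is a hypothetical name, and it prints the commands for review rather than running them.

```shell
# Print (do not run) the mdadm command for one action across ARRAY:PART pairs.
# Usage: plan_mdadm fail|remove|add ARRAY:PART...
plan_mdadm() {
    action=$1
    shift
    case $action in
        fail|remove|add) ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
    for pair in "$@"; do
        md=${pair%%:*}     # text before the colon, e.g. md0
        part=${pair#*:}    # text after the colon, e.g. sda1
        echo "mdadm --manage /dev/$md --$action /dev/$part"
    done
}
```

For example, `plan_mdadm remove md0:sda1 md1:sda2 md2:sda3` prints the three remove commands from step 2, which can be checked against /proc/mdstat before being run as root.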
Please give your valuable inputs. My questions:
1. For the manual mdadm steps (marking the disk as failed and removing it): can we do this at the existing runlevel without any problem?
2. For replacing the disk: how do we identify the faulty disk?
3. Is there any better option?
Thank you in advance.
Regards,
Admin