Revision as of 22:15, 19 April 2015
Mbm329 (talk) 04:33, April 6, 2015 (UTC)
The purpose of this document is to chronicle the setup of a RAID1 boot within CentOS 7. In trying to set this up, I've encountered several pitfalls and complexities. I hope to address those here and provide an end-to-end guide for myself and others wishing to do similar activities.
System Description
- Hardware: Dell Poweredge R515
- UEFI Firmware
- 1 500GB Dell-packaged SATA Drive for OS
- 1 300GB Self-provided SATA Drive for OS
- 4 1TB Self-provided SATA Drives for Data
- Operating System: CentOS 7 (1503) - released 03/31/2015
The Dell server appears to be incapable of using an alternate boot path in "BIOS" boot mode, so I had to go with "UEFI" boot mode. From what I found while researching this setup, UEFI seems to be more "extensible" anyway, so this worked out well. Unfortunately, the CentOS 7 Installation DVD I'm using doesn't handle RAID1 setups for /boot very well. Because of this, I needed to execute some manual steps to end up with a complete, working installation.
In an attempt to cut costs, I ordered the server without a RAID card and with only a single drive, as I have SATA drives here I could use. In retrospect, it probably cost me more in time and effort to get the additional drive carriers and screws that allow the drives to fit into the hot-swap bays. If you go this route, I would advise just spending the extra money on the drives.
But if you're interested in proceeding down this path, here's a google search of the part number for the drive carriers. I bought mine direct from Dell as a spare part. Here's where I bought the screws to mount the carriers to the drives. I had other drives I'm using in the server, so I bought enough carriers to utilize them.
One interesting item to note is that I couldn't order the server without a drive. The drive that came packaged with the R515, and I assume all Dell servers for that matter, had a diagnostic partition loaded on it. Because I'm not 100% sure if I'll ever want or need these diagnostic utilities, I decided to keep them and also mirror them onto the alternate drive of my RAID1 setup. If you're following this guide and you do not have a diagnostic partition packaged with your drive or you wish to remove it, these instructions do not vary much. Just be aware of those steps and skip them accordingly. In my case, the diagnostic partition is partition 1. So skipping should be easy.
With all that said, let's begin the quest!
Installation
BIOS Boot -> UEFI Boot
As mentioned previously, BIOS boot mode does not give an option for an alternate boot path if the primary drive goes south. Before we can boot the DVD and begin setup, we must change the boot mode to "UEFI". This may also come into play if you are booting from drives larger than 2TB.
- Set Boot Mode to UEFI
- Press F2 (System Setup) when visible in the top right corner during POST
- Navigate to "Boot Settings"
- Change Boot Mode from "BIOS" to "UEFI"
- Press Escape twice and select "Save changes and exit"
- Set DVD-ROM as primary boot device
- Press F11 (UEFI Boot Manager) when visible in the top right corner during POST
- Navigate to "UEFI Boot Sequence"
- Press Enter to enter "UEFI Boot Sequence"
- Press Enter again to REALLY enter "UEFI Boot Sequence"
- Use + to elevate the DVD-ROM to the top of the list
- Press Enter to accept the changes
- Press Escape and answer "Y" to save the changes
- Press Escape to exit the UEFI Boot Settings screen
- Select "Continue" and press Enter
Mirror Partition Table to Secondary Drive
Background
Dell loads, on the installed drive, an MBR (DOS) partition table containing a diagnostic partition with a handful of utilities and an empty EFI boot partition. The layout of these partitions will cause an error during installation in BIOS boot mode because the offset of the first partition does not leave enough room for GRUB's core.img to be installed on the disk when using RAID modules for the /boot partition.
Generally, this core.img requires more than 30 sectors of space prior to the first partition. To solve this problem, we could continue using the MBR partition table and move the partitions further out on the disk. Or, because we're already using UEFI (BIOS boot mode does not allow alternate boot paths), we could go with a GPT partition table. With UEFI, we no longer need core.img to load the GRUB boot loader. However, the existing layout is still not optimal because the first partition starts at sector 20, which does not give good partition alignment for performance. As most research into partition alignment will show, starting each partition on a 1MB boundary lets most block sizes perform optimally.
Also of note is the size of the EFI partition (partition 2). Dell sizes this partition at 2GB, which by most accounts is overkill. I decided to lower it to 128MB, which is still more than an EFI partition needs.
To achieve a proper GPT layout while still keeping the Dell diagnostic utilities, I utilize the second drive to build the initial partition table.
Currently, Disk 1 (/dev/sda), looks like this:
Partition | Type | Size | Start Sector | End Sector |
---|---|---|---|---|
1 | Diagnostic | 32 MB | 20 | 65579 |
2 | EFI System Partition | 2048 MB | 67584 | 4261887 |
For Disk 2 (/dev/sdc), I wanted to keep the same diagnostic partition size as it's rather small and the filesystem is already created by Dell. We create the 3rd partition for /boot because, even though we can't RAID it properly in the installer, it can serve as a placeholder on both disks, so that the installer creates the 4th partition for the LVM Physical Volume containing all other filesystems. NOTE: Due to how I have populated the drives in my system, my second disk is /dev/sdc. Yours may be /dev/sdb.
Partition | Type | Size | Start Sector | End Sector |
---|---|---|---|---|
1 | Diagnostic | 32 MB | 2048 | 67607 |
2 | EFI System Partition | 128 MB | 69632 | 331776 |
3 | /boot | 500 MB | 333824 | 1357824 |
The math behind this is fairly simple. The idea is to keep each start sector on a 1MB boundary. The sector size of the disk is 512 bytes, so 1MB is 2048 sectors.
- Calculate Diagnostic Partition Start/End Sectors
- Start Sector = 2048 (1MB)
- End Sector = Start Sector + (sda1EndSector - sda1StartSector)
- Calculate EFI Partition Start/End Sectors
- Start Sector = 2048 * (whole(sdc1EndSector / 2048) + 1)
- End Sector = sdc2StartSector + ((128 * 1024 * 1024) / 512)
- Calculate /boot Filesystem Partition Start/End Sectors
- Start Sector = 2048 * (whole(sdc2EndSector / 2048) + 1)
- End Sector = sdc3StartSector + ((500 * 1024 * 1024) / 512)
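The arithmetic above can be sketched in a few lines of shell. This is only an illustration, assuming 512-byte sectors and the sda1 start/end sectors from the Disk 1 table; the variable and function names are made up for the example.

```shell
#!/bin/sh
# Sketch: reproduce the start/end sector arithmetic for /dev/sdc.
# All names here are illustrative; the inputs are the sda1 values
# from the Disk 1 table and an assumed 512-byte sector size.

SECTOR_BYTES=512
ALIGN=2048            # sectors per 1MB at 512 bytes per sector
SDA1_START=20
SDA1_END=65579

# Print the first 1MB boundary strictly after the given sector.
next_boundary() {
    echo $(( ALIGN * ( $1 / ALIGN + 1 ) ))
}

# Print the number of sectors in the given number of MB.
mb_to_sectors() {
    echo $(( $1 * 1024 * 1024 / SECTOR_BYTES ))
}

SDC1_START=$ALIGN
SDC1_END=$(( SDC1_START + SDA1_END - SDA1_START ))

SDC2_START=$(next_boundary "$SDC1_END")
SDC2_END=$(( SDC2_START + $(mb_to_sectors 128) ))

SDC3_START=$(next_boundary "$SDC2_END")
SDC3_END=$(( SDC3_START + $(mb_to_sectors 500) ))

echo "1 $SDC1_START $SDC1_END"
echo "2 $SDC2_START $SDC2_END"
echo "3 $SDC3_START $SDC3_END"
```

The three lines printed should match the Start Sector and End Sector columns of the Disk 2 table. Since the end sector is inclusive, the formulas as written make partitions 2 and 3 one sector larger than the nominal 128MB and 500MB, which is harmless here.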
Setup Initial Partitions
First things first: we need to get to the command line. There are two methods of doing this, with and without SSH. Running the commands over SSH may make it easier to copy/paste, and the terminal width may be easier on the eyes.
- Boot the system from the Installation DVD.
- To perform the tasks in the following section with SSH
- Highlight "Install CentOS 7"
- Press "e" key to edit boot parameters
- On line starting with "linuxefi", add "sshd" to the end of the line.
- Press CTRL+x to boot
- On the welcome screen, press CTRL+ALT+F2 to get a shell prompt
- To get a list of interfaces to use, run: ip link show
- To add an IP address to connect to, run: ip addr add <IP ADDRESS>/<CIDR PREFIX> dev <INTERFACE>
- Use your favorite SSH client to connect as root to the host on the IP address you assigned. There will be no password.
- To perform the tasks in the following section without SSH
- Highlight "Install CentOS 7"
- Press Enter key
- Press CTRL+ALT+F2 to get a shell prompt
Now that we have a command-line prompt, let's gather some information from the partition tables of the disks before we begin. If you are logged in via SSH, you may want to copy this information in case you would like to reset the partitions back to how they were originally (see the "Reset Back Partitions" section). If you're on the console, maybe snap a picture. Here I've captured the output of both fdisk and parted for good measure. The "Disk identifier" may be worth writing down in either case if you decide to reset your system. Generally in the Linux world, this isn't really used much. But from what I understand, in the Windows world, it can be used by licensing software.
# fdisk -l /dev/sda
Disk /dev/sda: 500.1 GB, 500107862016 bytes, 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0xd4e911ee
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1              20       65579       32780   de  Dell Utility
/dev/sda2   *       67584     4261887     2097152    c  W95 FAT32 (LBA)
# parted /dev/sda unit s print
Model: ATA WDC WD5003ABYX-1 (scsi)
Disk /dev/sda: 976773168s
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:
Number  Start   End       Size      Type     File system  Flags
 1      20s     65579s    65560s    primary  fat16        diag
 2      67584s  4261887s  4194304s  primary  fat32        boot, lba
Create the partitions on Disk 2 (/dev/sdc).
# parted /dev/sdc mklabel gpt \
    mkpart Dell_Utility fat16 2048s 67607s \
    mkpart EFI fat32 69632s 331776s \
    mkpart BOOTFS ext4 333824s 1357824s \
    set 1 diag on set 2 boot on set 3 raid on \
    unit mib print \
    align-check opt 1 align-check opt 2 align-check opt 3
Model: ATA ST320DM000-1BD14 (scsi)
Disk /dev/sdc: 305245MiB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
Number  Start    End      Size     File system  Name          Flags
 1      1.00MiB  33.0MiB  32.0MiB               Dell_Utility  diag
 2      34.0MiB  162MiB   128MiB                EFI           boot
 3      163MiB   663MiB   500MiB                BOOTFS        raid
1 aligned
2 aligned
3 aligned
Information: You may need to update /etc/fstab.
Copy data from Dell Diagnostic Partition of Disk 1 to Disk 2
# dd if=/dev/sda1 of=/dev/sdc1
65560+0 records in
65560+0 records out
33566720 bytes (34 MB) copied, 0.958902 s, 35.0 MB/s
Wipe partition table on Disk 1 (/dev/sda)
# dd if=/dev/zero of=/dev/sda bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000375992 s, 1.4 MB/s
Replicate partition table from Disk 2 to Disk 1
# sgdisk -R /dev/sda /dev/sdc
The operation has completed successfully.
Assign new UUIDs to Disk 1's partitions.
# sgdisk -G /dev/sda
The operation has completed successfully.
Copy data from Dell Diagnostic Partition of Disk 2 to Disk 1
# dd if=/dev/sdc1 of=/dev/sda1
65560+0 records in
65560+0 records out
33566720 bytes (34 MB) copied, 0.988221 s, 34.0 MB/s
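If you want some reassurance that the diagnostic partition survived the round trip, a byte-for-byte comparison is one option. The following is a minimal sketch, assuming cmp is available in the shell you are using; verify_copy is a hypothetical helper name, and the demonstration runs on scratch files rather than real partitions.

```shell
#!/bin/sh
# Sketch: confirm that a dd copy is byte-identical to its source.
# verify_copy is a hypothetical helper; it takes two files or devices.
verify_copy() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Demonstration on scratch files instead of real partitions.
src=$(mktemp)
dst=$(mktemp)
printf 'diagnostic data' > "$src"
dd if="$src" of="$dst" 2>/dev/null
verify_copy "$src" "$dst"
```

On the real system the equivalent call would be verify_copy /dev/sdc1 /dev/sda1, which works here because both partitions have the same size.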
Reboot system for good measure since partition table has been changed
# reboot
Finish Install with DVD
Now that we have identical partitions on both drives that will participate in our RAID1 boot disks, we can install the OS and allow disk setup to use LVM for all other filesystems. Since SSH will not be needed during this step, just boot from the DVD and select "Install CentOS 7". Choose the proper keyboard setup and click "Continue". Since I've noticed that the hostname ends up in the LVM setup (the default Volume Group name is "centos_<hostname>"), I complete the networking setup first, followed by date/time so that NTP can be enabled.
In the "Installation Destination" screen, you will want to select both drive icons you wish to use as the boot drives. Then ensure the "I will configure partitioning" is selected and click "Done". You will be taken to a page where you can complete the disk setup. I used the following layout:
Device / VG | Device Type | Mount Point | Filesystem | Size (MiB) |
---|---|---|---|---|
sda2 | Standard Partition | /boot/efi | EFI System Partition | 128 |
sda3 | Standard Partition | /boot | ext4 | 500 |
vg00 | LVM | / | ext4 | 3072 |
vg00 | LVM | swap | swap | 6144 |
vg00 | LVM | /home | ext4 | 1024 |
vg00 | LVM | /opt | ext4 | 1024 |
vg00 | LVM | /tmp | ext4 | 1024 |
vg00 | LVM | /usr | ext4 | 3072 |
vg00 | LVM | /var | ext4 | 2048 |
Notice the devices "sda2" and "sda3" are listed as Standard Partition. For sda2, this is apparent as we cannot RAID this volume. For sda3, the installer will not work as intended for making this volume a RAID1 device. So we'll make it by hand in the next section below.
For vg00 above, when configuring the "/" partition, I renamed the Volume Group from "centos_<hostname>" to "vg00". Also, because my drives are not exactly identical, I chose Size Policy = Fixed and gave it a size fitting within the smaller drive (295 GiB).
When the configuration is satisfactory, click "Done". A window will appear with a summary of changes showing "Destroy Format" and "Create Device/Format". Click "Accept Changes" when satisfied.
Of course, add the root password and a standard user account for yourself.
When the install is complete, click "Reboot".
After Installation
Update and Install Additional Utilities
TIP: If you had connected to the server via SSH during the install, you should remove the host-key from your ~/.ssh/known_hosts file on the client system. Example:
[client]$ ssh-keygen -R 10.1.1.2
/home/user/.ssh/known_hosts updated.
Original contents retained as /home/user/.ssh/known_hosts.old
Update to the latest patches. This will also install any newer kernels, which will assist with creating initrd images later.
# yum -y update
Install helper apps to assist in completion of the rest of the exercise
# yum -y install patch gdisk
Reboot system to boot into fresh kernel and updated packages
# reboot
Configure Alternate Disk for Failover
Move /boot to RAID1
Create RAID1 device from a "missing" drive and sdc3. This will become /boot later.
# mdadm --create /dev/md/boot --level=1 --raid-devices=2 --metadata=default --bitmap=internal missing /dev/sdc3
mdadm: array /dev/md/boot started.
Create ext4 filesystem on the device just created.
# mkfs.ext4 /dev/md/boot
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=4 blocks, Stripe width=4 blocks
128016 inodes, 511680 blocks
25584 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=34078720
63 block groups
8192 blocks per group, 8192 fragments per group
2032 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409
Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
Add entry into /etc/mdadm.conf for newly created RAID device
# mdadm --examine --scan | grep /dev/md/boot >>/etc/mdadm.conf
Copy over /boot (/dev/sda3) to /tmp/boot (/dev/md/boot), modify /etc/fstab with new device, and remount filesystems
# umount /boot/efi
# mkdir /tmp/boot
# mount /dev/md/boot /tmp/boot
# cd /boot
# cp -a . /tmp/boot
# cd
# umount /boot
# umount /tmp/boot
# sed -i "s%^UUID=$(blkid /dev/sda3 | awk -F\" '{print $2}') %/dev/md/boot %" /etc/fstab
# mount /boot
# mount /boot/efi
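To see what the sed substitution does before pointing it at the real /etc/fstab, it can be tried against a scratch file. This is only a sketch: the UUID and the fstab contents below are made-up sample values, and GNU sed is assumed for the -i option.

```shell
#!/bin/sh
# Sketch: the fstab substitution from the step above, run on a scratch file.
# The UUID and file contents are invented sample values.
old_uuid="11111111-2222-3333-4444-555555555555"
fstab=$(mktemp)
cat > "$fstab" <<EOF
/dev/mapper/vg00-root / ext4 defaults 1 1
UUID=$old_uuid /boot ext4 defaults 1 2
EOF

# On the real system old_uuid would come from:
#   blkid /dev/sda3 | awk -F\" '{print $2}'
# The % delimiter avoids having to escape the slashes in /dev/md/boot.
sed -i "s%^UUID=$old_uuid %/dev/md/boot %" "$fstab"
cat "$fstab"
```

Only the line that starts with the old UUID followed by a space is rewritten; every other entry is left untouched.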
Add /dev/sda3 to the /dev/md/boot RAID1 device
# mdadm /dev/md/boot --add /dev/sda3
mdadm: added /dev/sda3
Create a new initrd in /boot for pre-loading the system from RAID and LVM (we changed /etc/mdadm.conf and /etc/fstab)
# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
# dracut
Add UUID for /dev/md/boot to /etc/default/grub so that the grub config will contain it.
# sed -i "s/\(rd.md.uuid\)/\1=$(mdadm --detail /dev/md/boot | awk '/UUID/ {print $3}') \1/" /etc/default/grub
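The sed expression above is compact, so here is a sketch of its effect on a sample kernel command line. The UUIDs and the rest of the line are invented for illustration, and GNU sed is assumed for the -i option.

```shell
#!/bin/sh
# Sketch: effect of the rd.md.uuid substitution on a sample /etc/default/grub line.
# Both UUIDs and the surrounding options are made-up sample values.
grub_default=$(mktemp)
cat > "$grub_default" <<'EOF'
GRUB_CMDLINE_LINUX="rd.lvm.lv=vg00/root rd.md.uuid=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd rhgb quiet"
EOF

# On the real system this would come from:
#   mdadm --detail /dev/md/boot | awk '/UUID/ {print $3}'
new_uuid="11111111:22222222:33333333:44444444"

# Without the g flag only the first "rd.md.uuid" on the line is replaced,
# by "rd.md.uuid=<new> rd.md.uuid", so the existing "=<old>" value stays attached
# to the second copy.
sed -i "s/\(rd.md.uuid\)/\1=$new_uuid \1/" "$grub_default"
cat "$grub_default"
```

The result is a command line carrying two rd.md.uuid options: the new one for /dev/md/boot, followed by the one the installer wrote for the LVM array.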
Patch GRUB2 due to bug - https://bugs.centos.org/view.php?id=7651
# cp /usr/share/grub/grub-mkconfig_lib /usr/share/grub/grub-mkconfig_lib.orig
# vi /tmp/grub-mkconfig_lib.patch
Place following data into file
--- a/util/grub-mkconfig_lib.in 2014-06-30 16:16:11.000000000 +0000
+++ a/util/grub-mkconfig_lib.in 2014-12-08 23:05:56.936903046 +0000
@@ -263,13 +263,14 @@
 
 version_find_latest ()
 {
-  version_find_latest_a=""
-  for i in "$@" ; do
-    if version_test_gt "$i" "$version_find_latest_a" ; then
-      version_find_latest_a="$i"
-    fi
-  done
-  echo "$version_find_latest_a"
+  {
+    for i in "$@"; do
+      echo $i
+    done | grep -v rescue | sed 's/.x86_64$//g' | sort -V -r | sed 's/$/.x86_64/g'
+    for i in "$@"; do
+      echo $i
+    done | grep rescue | sort -V
+  } | head -n 1
 }
 
 # One layer of quotation is eaten by "" and the second by sed; so this turns
Then patch the /usr/share/grub/grub-mkconfig_lib file with the patch just created.
# patch -b /usr/share/grub/grub-mkconfig_lib /tmp/grub-mkconfig_lib.patch
patching file /usr/share/grub/grub-mkconfig_lib
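To get a feel for what the patched function does, its body can be run on its own. The sketch below copies the logic from the patch into a standalone function (with the echo arguments quoted) and feeds it the kernel image names found on this system; it assumes a sort that supports -V, such as the one in GNU coreutils.

```shell
#!/bin/sh
# Sketch: the patched version_find_latest logic as a standalone function.
# Regular kernels are listed newest first (with the .x86_64 suffix stripped
# for sorting and re-added afterwards), rescue images are listed after them,
# and only the first line of the combined list is printed.
version_find_latest() {
    {
        for i in "$@"; do
            echo "$i"
        done | grep -v rescue | sed 's/.x86_64$//g' | sort -V -r | sed 's/$/.x86_64/g'
        for i in "$@"; do
            echo "$i"
        done | grep rescue | sort -V
    } | head -n 1
}

version_find_latest \
    /boot/vmlinuz-3.10.0-229.el7.x86_64 \
    /boot/vmlinuz-0-rescue-fa93ec7a890f4ba2a29a0896ea7146b9 \
    /boot/vmlinuz-3.10.0-229.1.2.el7.x86_64
```

Because rescue images are only ever appended after the regular kernels, a rescue image can be chosen only when no regular kernel is present.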
Re-build the grub2 configuration now that /dev/md/boot is our new boot device
# cp -a /boot/efi/EFI/centos/grub.cfg /boot/efi/EFI/centos/grub.cfg.orig
# grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-229.1.2.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-229.1.2.el7.x86_64.img
/usr/sbin/grub2-probe: warning: Couldn't find physical volume ‘(null)’. Some modules may be missing from core image..
/usr/sbin/grub2-probe: warning: Couldn't find physical volume ‘(null)’. Some modules may be missing from core image..
/usr/sbin/grub2-probe: warning: Couldn't find physical volume ‘(null)’. Some modules may be missing from core image..
Found linux image: /boot/vmlinuz-3.10.0-229.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-229.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-fa93ec7a890f4ba2a29a0896ea7146b9
Found initrd image: /boot/initramfs-0-rescue-fa93ec7a890f4ba2a29a0896ea7146b9.img
done
Configure Alternate UEFI Firmware Boot Path
Copy contents of /boot/efi partition (/dev/sda2) to second drive
# umount /dev/sda2
# dd if=/dev/sda2 of=/dev/sdc2
262145+0 records in
262145+0 records out
134218240 bytes (134 MB) copied, 2.49582 s, 53.8 MB/s
# mount /boot/efi
Add an EFI boot entry for the second drive so that the firmware can boot from it if the first drive fails
# efibootmgr -v -c -L CentOS-Alt -d /dev/sdc -p 2 -l '\EFI\centos\shim.efi'
BootCurrent: 0006
Timeout: 0 seconds
BootOrder: 0008,0006,0000,0001,0002,0003,0004,0005,0007
Boot0000* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot0001* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot0002* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot0003* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot0004* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot0005* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot0006* CentOS HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)File(\EFI\centos\shim.efi)
Boot0007* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)
Boot0008* CentOS-Alt HD(2,11000,40001,97c43028-85e9-4ecd-81f4-4c7d86a1819c)File(\EFI\centos\shim.efi)
Keep an eye out for when the drives are synced up.
# cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sda3[2] sdc3[1]
      511680 blocks super 1.2 [2/1] [_U]
        resync=DELAYED
      bitmap: 1/1 pages [4KB], 65536KB chunk
md127 : active raid1 sdc4[1] sda4[0]
      311472128 blocks super 1.2 [2/2] [UU]
      [==================>..] resync = 94.8% (295389696/311472128) finish=4.1min speed=64520K/sec
      bitmap: 1/3 pages [4KB], 65536KB chunk
unused devices: <none>
Example of synced RAID volumes
# cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sda3[2] sdc3[1]
      511680 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
md127 : active raid1 sdc4[1] sda4[0]
      311472128 blocks super 1.2 [2/2] [UU]
      bitmap: 0/3 pages [0KB], 65536KB chunk
unused devices: <none>
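Rather than eyeballing the mdstat output, the check can be scripted. The following is a rough sketch, not an mdadm feature: mdstat_synced is a hypothetical helper that treats any "_" in a member status field, or the words resync or recovery, as a sign that it is not yet safe to reboot. The sample files are abbreviated versions of the two outputs shown above.

```shell
#!/bin/sh
# Sketch: decide from mdstat-style text whether all arrays are in sync.
# mdstat_synced FILE succeeds only if FILE shows no missing member
# (a "_" inside a status field such as [_U]) and no resync or recovery.
mdstat_synced() {
    if grep -Eq '\[[U_]*_[U_]*\]|resync|recovery' "$1"; then
        return 1
    fi
    return 0
}

# Sample inputs abbreviated from the two mdstat outputs above.
degraded=$(mktemp)
cat > "$degraded" <<'EOF'
Personalities : [raid1]
md126 : active raid1 sda3[2] sdc3[1]
      511680 blocks super 1.2 [2/1] [_U]
        resync=DELAYED
      bitmap: 1/1 pages [4KB], 65536KB chunk
unused devices: <none>
EOF

synced=$(mktemp)
cat > "$synced" <<'EOF'
Personalities : [raid1]
md126 : active raid1 sda3[2] sdc3[1]
      511680 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
unused devices: <none>
EOF

if mdstat_synced "$synced"; then echo "synced sample: safe to reboot"; fi
if ! mdstat_synced "$degraded"; then echo "degraded sample: keep waiting"; fi
```

On the live system the helper would be pointed at /proc/mdstat, for example in a loop that sleeps until mdstat_synced /proc/mdstat succeeds.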
Collect output from efibootmgr for comparison after reboot (next).
# efibootmgr -v
BootCurrent: 0006
Timeout: 0 seconds
BootOrder: 0008,0006,0000,0001,0002,0003,0004,0005,0007
Boot0000* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot0001* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot0002* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot0003* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot0004* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot0005* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot0006* CentOS HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)File(\EFI\centos\shim.efi)
Boot0007* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)
Boot0008* CentOS-Alt HD(2,11000,40001,97c43028-85e9-4ecd-81f4-4c7d86a1819c)File(\EFI\centos\shim.efi)
Reboot to ensure everything is working as expected.
NOTE: DO NOT REBOOT UNLESS MDSTAT OUTPUT LOOKS SYNC'D AS ABOVE!!!
# reboot
Collect output from efibootmgr for comparison against output taken before reboot.
# efibootmgr -v
BootCurrent: 0008
Timeout: 0 seconds
BootOrder: 0008,0006,0000,0001,0002,0003,0004,0005,0007,0009,000A
Boot0000* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot0001* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot0002* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot0003* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot0004* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot0005* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot0006* CentOS HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)File(\EFI\centos\shim.efi)
Boot0007* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)
Boot0008* CentOS-Alt HD(2,11000,40001,97c43028-85e9-4ecd-81f4-4c7d86a1819c)File(\EFI\centos\shim.efi)
Boot0009* CentOS HD(2,11000,40001,97c43028-85e9-4ecd-81f4-4c7d86a1819c)File(\EFI\centos\shim.efi)
Boot000A* EFI Fixed Disk Boot Device 2 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,97c43028-85e9-4ecd-81f4-4c7d86a1819c)
As we can see, it appears my Dell UEFI firmware is auto-discovering my /dev/sdc drive (97c43028-85e9-4ecd-81f4-4c7d86a1819c) as it placed an additional entry automatically. If this is the case, feel free to remove the boot entry you just created before (CentOS-Alt).
# efibootmgr -v -B -b 8
BootCurrent: 0008
Timeout: 0 seconds
BootOrder: 0006,0000,0001,0002,0003,0004,0005,0007,0009,000A
Boot0000* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot0001* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot0002* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot0003* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot0004* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot0005* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot0006* CentOS HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)File(\EFI\centos\shim.efi)
Boot0007* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)
Boot0009* CentOS HD(2,11000,40001,97c43028-85e9-4ecd-81f4-4c7d86a1819c)File(\EFI\centos\shim.efi)
Boot000A* EFI Fixed Disk Boot Device 2 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,97c43028-85e9-4ecd-81f4-4c7d86a1819c)
Confirm which disk the system booted from.
# efibootmgr -v | grep BootCurrent
BootCurrent: 0008
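BootCurrent only gives an entry number, so mapping it to a disk means looking up that entry's HD(...) field by hand. As a convenience, the lookup can be sketched with awk. booted_esp_guid is a hypothetical helper, and the sample input is a shortened version of the efibootmgr -v output captured before the CentOS-Alt entry was removed.

```shell
#!/bin/sh
# Sketch: print the partition GUID of the entry named by BootCurrent.
# booted_esp_guid reads "efibootmgr -v" style text from a file or stdin.
booted_esp_guid() {
    awk '
        /^BootCurrent:/ { cur = $2 }
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][* ]/ {
            # Key each entry by its four-digit number, e.g. "0008".
            entry[substr($1, 5, 4)] = $0
        }
        END {
            line = entry[cur]
            # The fourth comma-separated field inside HD(...) is the GUID.
            if (match(line, /HD\([^)]*\)/)) {
                split(substr(line, RSTART + 3, RLENGTH - 4), f, ",")
                print f[4]
            }
        }
    ' "$@"
}

# Shortened sample of the output shown earlier.
sample=$(mktemp)
cat > "$sample" <<'EOF'
BootCurrent: 0008
Timeout: 0 seconds
BootOrder: 0008,0006,0000
Boot0000* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot0006* CentOS HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)File(\EFI\centos\shim.efi)
Boot0008* CentOS-Alt HD(2,11000,40001,97c43028-85e9-4ecd-81f4-4c7d86a1819c)File(\EFI\centos\shim.efi)
EOF

booted_esp_guid "$sample"
```

On the live system the call would be efibootmgr -v | booted_esp_guid. The GUID it prints can then be compared, ignoring case, with the "Partition unique GUID" that sgdisk -i 2 reports for each disk.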
Set proper boot order so that boot paths are Disk 1, Disk 2, DVD, and everything else.
# efibootmgr -v -o 0006,0009,0000,0001,0002,0003,0004,0005,0007,000A
BootCurrent: 0008
Timeout: 0 seconds
BootOrder: 0006,0009,0000,0001,0002,0003,0004,0005,0007,000A
Boot0000* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot0001* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot0002* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot0003* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot0004* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot0005* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot0006* CentOS HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)File(\EFI\centos\shim.efi)
Boot0007* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)
Boot0009* CentOS HD(2,11000,40001,97c43028-85e9-4ecd-81f4-4c7d86a1819c)File(\EFI\centos\shim.efi)
Boot000A* EFI Fixed Disk Boot Device 2 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,97c43028-85e9-4ecd-81f4-4c7d86a1819c)
Reboot System off Disk 1
# reboot
Confirm the system booted from Disk 1
# efibootmgr -v | grep BootCurrent
BootCurrent: 0006
Set Disk 2 as next boot path
# efibootmgr -v -n 9
BootNext: 0009
BootCurrent: 0006
Timeout: 0 seconds
BootOrder: 0006,0009,0000,0001,0002,0003,0004,0005,0007,000A
Boot0000* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot0001* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot0002* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot0003* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot0004* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot0005* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot0006* CentOS HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)File(\EFI\centos\shim.efi)
Boot0007* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)
Boot0009* CentOS HD(2,11000,40001,97c43028-85e9-4ecd-81f4-4c7d86a1819c)File(\EFI\centos\shim.efi)
Boot000A* EFI Fixed Disk Boot Device 2 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,97c43028-85e9-4ecd-81f4-4c7d86a1819c)
Reboot System off Disk 2
# reboot
Confirm the system booted from Disk 2
# efibootmgr -v | grep BootCurrent
BootCurrent: 0009
Testing Drive Failures
Of course, who would trust this procedure without testing it? We need to test drive failures to avoid a false sense of security. First we'll test the alternate drive, then the primary. These tests were run on a system with hot-swap drive bays. If your system doesn't have hot-swap drive bays, you can perform the tests by shutting down the system, removing or adding the drive, and then booting back up.
Alternate Drive
Simulate Immediate Failure of Drive 2
Boot the system.
Review drive and partition layout.
# lsblk -i
NAME            MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda               8:0    0 465.8G  0 disk
|-sda1            8:1    0    32M  0 part
|-sda2            8:2    0   128M  0 part  /boot/efi
|-sda3            8:3    0   500M  0 part
| `-md127         9:127  0 499.7M  0 raid1 /boot
`-sda4            8:4    0 295.2G  0 part
  `-md126         9:126  0   295G  0 raid1
    |-vg00-swap 253:0    0     6G  0 lvm   [SWAP]
    |-vg00-usr  253:1    0     3G  0 lvm   /usr
    |-vg00-root 253:2    0     3G  0 lvm   /
    |-vg00-home 253:3    0     1G  0 lvm   /home
    |-vg00-opt  253:4    0     1G  0 lvm   /opt
    |-vg00-tmp  253:5    0     1G  0 lvm   /tmp
    `-vg00-var  253:6    0     2G  0 lvm   /var
sdb               8:16   0 931.5G  0 disk
sdc               8:32   0 298.1G  0 disk
|-sdc1            8:33   0    32M  0 part
|-sdc2            8:34   0   128M  0 part
|-sdc3            8:35   0   500M  0 part
| `-md127         9:127  0 499.7M  0 raid1 /boot
`-sdc4            8:36   0 295.2G  0 part
  `-md126         9:126  0   295G  0 raid1
    |-vg00-swap 253:0    0     6G  0 lvm   [SWAP]
    |-vg00-usr  253:1    0     3G  0 lvm   /usr
    |-vg00-root 253:2    0     3G  0 lvm   /
    |-vg00-home 253:3    0     1G  0 lvm   /home
    |-vg00-opt  253:4    0     1G  0 lvm   /opt
    |-vg00-tmp  253:5    0     1G  0 lvm   /tmp
    `-vg00-var  253:6    0     2G  0 lvm   /var
sdd               8:48   0 931.5G  0 disk
sde               8:64   0 931.5G  0 disk
sdf               8:80   0 931.5G  0 disk
sr0              11:0    1   636M  0 rom
If you are unsure which physical drive is /dev/sdc (Drive 2), read from it with dd and look at the front of the server for a near-steady activity light. Press CTRL+c to end the dd command.
# dd if=/dev/sdc of=/dev/null
^C1128969+0 records in
1128968+0 records out
578031616 bytes (578 MB) copied, 4.72669 s, 122 MB/s
Physically hot-pull /dev/sdc from the server.
Check dmesg for kernel output. Notice how the drive went offline.
# dmesg
...
[ 2318.704296] md: md126 still in use.
[ 2318.704321] md: md127 still in use.
[ 2318.705400] md/raid1:md126: Disk failure on sdc4, disabling device.
md/raid1:md126: Operation continuing on 1 devices.
[ 2318.705401] md/raid1:md127: Disk failure on sdc3, disabling device.
md/raid1:md127: Operation continuing on 1 devices.
[ 2318.740408] RAID1 conf printout:
[ 2318.740411] --- wd:1 rd:2
[ 2318.740413] disk 0, wo:0, o:1, dev:sda3
[ 2318.740415] disk 1, wo:1, o:0, dev:sdc3
[ 2318.744219] RAID1 conf printout:
[ 2318.744221] --- wd:1 rd:2
[ 2318.744222] disk 0, wo:0, o:1, dev:sda4
[ 2318.744224] disk 1, wo:1, o:0, dev:sdc4
[ 2318.791619] sd 0:0:2:0: [sdc] Synchronizing SCSI cache
[ 2318.791693] sd 0:0:2:0: [sdc]
[ 2318.791696] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2318.791923] mpt2sas0: removing handle(0x000d), sas_addr(0x500065b36789abe6)
[ 2318.800733] RAID1 conf printout:
[ 2318.800736] RAID1 conf printout:
[ 2318.800738] --- wd:1 rd:2
[ 2318.800740] disk 0, wo:0, o:1, dev:sda4
[ 2318.800742] --- wd:1 rd:2
[ 2318.800744] disk 0, wo:0, o:1, dev:sda3
[ 2318.815497] md: unbind<sdc3>
[ 2318.819276] md: unbind<sdc4>
[ 2318.825654] md: export_rdev(sdc3)
[ 2318.828659] md: export_rdev(sdc4)
Check RAID status. It should show one drive missing in each RAID1 array associated with this drive.
# cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sda4[0]
      309374976 blocks super 1.2 [2/1] [U_]
      bitmap: 1/3 pages [4KB], 65536KB chunk
md127 : active raid1 sda3[2]
      511680 blocks super 1.2 [2/1] [U_]
      bitmap: 0/1 pages [0KB], 65536KB chunk
unused devices: <none>
Check the mounted filesystems. If you see /dev/sdc2 mounted on /boot/efi, or df reports an Input/output error for /boot/efi as shown here, unmount /boot/efi and then mount it again. This time, /dev/sda2 should show up.
# df
df: ‘/boot/efi’: Input/output error
Filesystem            1K-blocks   Used Available Use% Mounted on
/dev/mapper/vg00-root   3030800  30216   2826916   2% /
devtmpfs                1878604      0   1878604   0% /dev
tmpfs                   1921220      0   1921220   0% /dev/shm
tmpfs                   1921220   8712   1912508   1% /run
tmpfs                   1921220      0   1921220   0% /sys/fs/cgroup
/dev/mapper/vg00-usr    3030800 876624   1980508  31% /usr
/dev/md127               487314 153730    303904  34% /boot
/dev/mapper/vg00-var    1998672  78916   1798516   5% /var
/dev/mapper/vg00-opt     999320   2564    927944   1% /opt
/dev/mapper/vg00-tmp     999320   2604    927904   1% /tmp
/dev/mapper/vg00-home    999320   2580    927928   1% /home
# umount /boot/efi
# mount /boot/efi
# df
Filesystem            1K-blocks   Used Available Use% Mounted on
/dev/mapper/vg00-root   3030800  30216   2826916   2% /
devtmpfs                1878604      0   1878604   0% /dev
tmpfs                   1921220      0   1921220   0% /dev/shm
tmpfs                   1921220   8712   1912508   1% /run
tmpfs                   1921220      0   1921220   0% /sys/fs/cgroup
/dev/mapper/vg00-usr    3030800 876624   1980508  31% /usr
/dev/md127               487314 153730    303904  34% /boot
/dev/mapper/vg00-var    1998672  78916   1798516   5% /var
/dev/mapper/vg00-opt     999320   2564    927944   1% /opt
/dev/mapper/vg00-tmp     999320   2604    927904   1% /tmp
/dev/mapper/vg00-home    999320   2580    927928   1% /home
/dev/sda2                130800   9980    120820   8% /boot/efi
Reboot to test full boot process with only Drive 1.
# reboot
Review drive and partition layout. Drive 1 is still /dev/sda; with Drive 2 removed, the data drives now show up as /dev/sdb through /dev/sde.
# lsblk -i NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 465.8G 0 disk |-sda1 8:1 0 32M 0 part |-sda2 8:2 0 128M 0 part /boot/efi |-sda3 8:3 0 500M 0 part | `-md127 9:127 0 499.7M 0 raid1 /boot `-sda4 8:4 0 295.2G 0 part `-md126 9:126 0 295G 0 raid1 |-vg00-swap 253:0 0 6G 0 lvm [SWAP] |-vg00-usr 253:1 0 3G 0 lvm /usr |-vg00-root 253:2 0 3G 0 lvm / |-vg00-home 253:3 0 1G 0 lvm /home |-vg00-opt 253:4 0 1G 0 lvm /opt |-vg00-tmp 253:5 0 1G 0 lvm /tmp `-vg00-var 253:6 0 2G 0 lvm /var sdb 8:16 0 931.5G 0 disk sdc 8:32 0 931.5G 0 disk sdd 8:48 0 931.5G 0 disk sde 8:64 0 931.5G 0 disk sr0 11:0 1 636M 0 rom
Check to ensure /boot/efi is mounted with /dev/sda2.
# df Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/vg00-root 3030800 30220 2826912 2% / devtmpfs 1878612 0 1878612 0% /dev tmpfs 1921220 0 1921220 0% /dev/shm tmpfs 1921220 8708 1912512 1% /run tmpfs 1921220 0 1921220 0% /sys/fs/cgroup /dev/mapper/vg00-usr 3030800 876620 1980512 31% /usr /dev/md127 487314 149804 307830 33% /boot /dev/mapper/vg00-var 1998672 125628 1751804 7% /var /dev/mapper/vg00-tmp 999320 2604 927904 1% /tmp /dev/mapper/vg00-opt 999320 2564 927944 1% /opt /dev/mapper/vg00-home 999320 2580 927928 1% /home /dev/sda2 130800 9980 120820 8% /boot/efi
Fabricate a "New" Drive
We need to simulate a new drive insertion and rebuild, so we must first clean the partitions of any filesystem and RAID metadata, then wipe the partition table.
Physically hot-plug Disk 2 back into the server.
Check dmesg for kernel output. Notice which drive it shows up as (/dev/sdf in my case).
# dmesg ... [ 6985.006249] scsi 0:0:6:0: Direct-Access ATA ST320DM000-1BD14 KC48 PQ: 0 ANSI: 5 [ 6985.006256] scsi 0:0:6:0: SATA: handle(0x0011), sas_addr(0x500065b36789abe6), phy(6), device_name(0xc5005000d04c64af) [ 6985.006259] scsi 0:0:6:0: SATA: enclosure_logical_id(0x500065b37689abff), slot(2) [ 6985.006330] scsi 0:0:6:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y) [ 6985.006333] scsi 0:0:6:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 6985.011154] sd 0:0:6:0: [sdf] physical block alignment offset: 4096 [ 6985.011160] sd 0:0:6:0: [sdf] 625142448 512-byte logical blocks: (320 GB/298 GiB) [ 6985.011162] sd 0:0:6:0: [sdf] 4096-byte physical blocks [ 6985.040784] sd 0:0:6:0: [sdf] Write Protect is off [ 6985.040788] sd 0:0:6:0: [sdf] Mode Sense: 7f 00 00 08 [ 6985.075954] sd 0:0:6:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 6985.192100] sdf: sdf1 sdf2 sdf3 sdf4 [ 6985.230365] sd 0:0:6:0: [sdf] Attached SCSI disk
Wipe partition 1 (Dell Utilities)
# dd if=/dev/zero of=/dev/sdf1 bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.0184175 s, 27.8 kB/s
Wipe partition 2 (EFI System Partition)
# dd if=/dev/zero of=/dev/sdf2 bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000612183 s, 836 kB/s
Zero RAID superblock on partition 3 (/boot)
# mdadm --zero-superblock /dev/sdf3
Zero RAID superblock on partition 4 (pv00/vg00)
# mdadm --zero-superblock /dev/sdf4
Wipe partition table of Drive 2.
# dd if=/dev/zero of=/dev/sdf bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000822915 s, 622 kB/s
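The dd commands in this section clear only the first 512-byte sector of each target. GPT also keeps its primary header in the sector after that and a backup copy at the end of the disk, so a more thorough scrub is possible. The sketch below is an alternative and not what was run for this guide: it swaps in wipefs -a and sgdisk --zap-all, and the helper name scrub_disk with its print/run switch is invented for illustration. It assumes the four-partition layout used here.

```shell
# Print (default) or run the commands that make a disk look factory-new.
# $1 = whole-disk device, $2 = "run" to execute; anything else only prints.
scrub_disk() {
    disk=$1
    mode=${2:-print}
    for cmd in \
        "mdadm --zero-superblock ${disk}3" \
        "mdadm --zero-superblock ${disk}4" \
        "wipefs -a ${disk}1" \
        "wipefs -a ${disk}2" \
        "sgdisk --zap-all ${disk}"
    do
        if [ "$mode" = run ]; then
            $cmd            # relies on word splitting; fine for these simple commands
        else
            echo "$cmd"
        fi
    done
}
```

Calling scrub_disk /dev/sdf prints the five commands for review; adding run as a second argument executes them. Double-check the device name first, since these commands are destructive.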
Physically hot-pull Drive 2 from system.
Reboot system from only Drive 1 again.
# reboot
Simulate Replacement Drive Insertion and Rebuild
Now that we have a "new" empty drive, we want to test the insertion and rebuild process.
NOTE: This can be referenced for actual replacement drive rebuild.
Physically hot-plug Drive 2 into the server.
Check dmesg for kernel output. You should see the new drive appear (in my case, it was /dev/sdf again).
# dmesg ... [ 102.366543] scsi 0:0:6:0: Direct-Access ATA ST320DM000-1BD14 KC48 PQ: 0 ANSI: 5 [ 102.366550] scsi 0:0:6:0: SATA: handle(0x0011), sas_addr(0x500065b36789abe6), phy(6), device_name(0xc5005000d04c64af) [ 102.366553] scsi 0:0:6:0: SATA: enclosure_logical_id(0x500065b37689abff), slot(2) [ 102.366628] scsi 0:0:6:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y) [ 102.366631] scsi 0:0:6:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 102.371646] sd 0:0:6:0: [sdf] physical block alignment offset: 4096 [ 102.371651] sd 0:0:6:0: [sdf] 625142448 512-byte logical blocks: (320 GB/298 GiB) [ 102.371654] sd 0:0:6:0: [sdf] 4096-byte physical blocks [ 102.400786] sd 0:0:6:0: [sdf] Write Protect is off [ 102.400790] sd 0:0:6:0: [sdf] Mode Sense: 7f 00 00 08 [ 102.430437] sd 0:0:6:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 102.532535] sdf: unknown partition table [ 102.611927] sd 0:0:6:0: [sdf] Attached SCSI disk
As we can see from the above output, there is no partition table. We will replicate one from the partition table of Disk 1 (/dev/sda).
# sgdisk -R /dev/sdf /dev/sda
Caution! Secondary header was placed beyond the disk's limits! Moving the header, but other problems may occur!
The operation has completed successfully.
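Because of the caution above, we should confirm that both drives' partition tables are identical. Everything looks all right here. The caution is very likely the result of the two drives being different sizes: the backup GPT header copied from the larger Drive 1 would land past the end of the smaller Drive 2, so sgdisk moves it. This is also why we set the Volume Group to "Fixed Size" during the install.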
# parted /dev/sda unit s print
Model: ATA WDC WD5003ABYX-1 (scsi)
Disk /dev/sda: 976773168s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start     End         Size        File system  Name                  Flags
 1      2048s     67607s      65560s      fat16        Dell_Utility          diag
 2      69632s    331776s     262145s     fat16        EFI System Partition  boot
 3      333824s   1357824s    1024001s    ext4         BOOTFS                raid
 4      1359872s  620371967s  619012096s                                     raid

# parted /dev/sdf unit s print
Model: ATA ST320DM000-1BD14 (scsi)
Disk /dev/sdf: 625142448s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start     End         Size        File system  Name                  Flags
 1      2048s     67607s      65560s                   Dell_Utility          diag
 2      69632s    331776s     262145s                  EFI System Partition  boot
 3      333824s   1357824s    1024001s                 BOOTFS                raid
 4      1359872s  620371967s  619012096s                                     raid
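Comparing the two listings by eye works for four partitions, but the check can also be scripted. Here is a rough sketch, assuming parted's machine-readable mode (-m) prints one colon-separated line per partition that begins with the partition number; part_geometry and same_geometry are hypothetical helper names.

```shell
# Reduce `parted -m ... unit s print` output (on stdin) to
# number:start:end:size, one line per partition.
part_geometry() {
    grep -E '^[0-9]+:' | cut -d: -f1-4
}

# Return 0 when two disks have the same partition numbers, starts, ends and sizes.
# Filesystem, name and flag columns are deliberately ignored.
same_geometry() {
    [ "$(parted -m "$1" unit s print | part_geometry)" = \
      "$(parted -m "$2" unit s print | part_geometry)" ]
}

# Example (not run here):
#   same_geometry /dev/sda /dev/sdf && echo "partition layout matches"
```

Only the geometry is compared because the File system column legitimately differs until the partitions have been copied or resynced.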
When we replicated the partition table, we also replicated the UUIDs of the partitions. We can rectify this with sgdisk's -G flag. New UUIDs will be generated for the partitions.
# sgdisk -G /dev/sdf
The operation has completed successfully.
Copy partition 1 (Dell Utilities) from Drive 1 to Drive 2.
# dd if=/dev/sda1 of=/dev/sdf1
65560+0 records in
65560+0 records out
33566720 bytes (34 MB) copied, 0.942867 s, 35.6 MB/s
Copy partition 2 (EFI System Partition) from Drive 1 to Drive 2. It's best to do this while the filesystem is quiesced, so we unmount it first.
# umount /boot/efi
# dd if=/dev/sda2 of=/dev/sdf2
262145+0 records in
262145+0 records out
134218240 bytes (134 MB) copied, 2.26022 s, 59.4 MB/s
# mount /boot/efi
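To confirm the block copy, the source and target partitions can be compared byte for byte, ideally before /boot/efi is mounted again. A small sketch using cmp; the wrapper name verify_copy is made up for this example.

```shell
# Compare two block devices (or files) byte for byte.
# Prints a one-line verdict and returns non-zero when they differ.
verify_copy() {
    if cmp -s "$1" "$2"; then
        echo "identical: $1 $2"
    else
        echo "DIFFERENT: $1 $2" >&2
        return 1
    fi
}

# Example (not run here):
#   verify_copy /dev/sda2 /dev/sdf2
```

This works here because both partitions are exactly the same size; cmp would also report a difference if one side were longer than the other.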
Remove old entry for Drive 2 in UEFI Firmware. To ensure we remove the one for Drive 2, and because we changed the UUIDs above, we will search for Drive 1 (/dev/sda2), and remove the other entry.
# efibootmgr -v
BootCurrent: 0006
Timeout: 0 seconds
BootOrder: 0006,0009,0000,0001,0002,0003,0004,0005,0007
Boot0000* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot0001* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot0002* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot0003* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot0004* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot0005* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot0006* CentOS HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)File(\EFI\centos\shim.efi)
Boot0007* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)
Boot0009* CentOS HD(2,11000,40001,97c43028-85e9-4ecd-81f4-4c7d86a1819c)File(\EFI\centos\shim.efi)
# blkid /dev/sda2
/dev/sda2: SEC_TYPE="msdos" UUID="074B-E6E6" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="c0f13fae-e6f4-4a59-8445-f42d2f81bd83"
# efibootmgr -v -B -b 9
BootCurrent: 0006
Timeout: 0 seconds
BootOrder: 0006,0000,0001,0002,0003,0004,0005,0007
Boot0000* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot0001* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot0002* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot0003* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot0004* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot0005* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot0006* CentOS HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)File(\EFI\centos\shim.efi)
Boot0007* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)
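Picking the stale entry out of a long listing is error-prone, so the search can be narrowed with a filter. The sketch below reads efibootmgr -v output on stdin and prints the numbers of CentOS shim entries whose HD(...) GUID is not the one to keep. It assumes that GUID is the same value blkid reports as PARTUUID for the surviving EFI System Partition; stale_shim_entries is a hypothetical helper name.

```shell
# Print boot numbers of \EFI\centos\shim.efi entries that do NOT sit on
# the partition whose GUID is given as $1. Reads `efibootmgr -v` on stdin.
stale_shim_entries() {
    keep=$1
    grep -F 'File(\EFI\centos\shim.efi)' |   # only CentOS loader entries
        grep -v -F "$keep" |                 # drop entries on the partition we keep
        sed -n 's/^Boot\([0-9A-Fa-f]\{4\}\).*/\1/p'
}

# Example (not run here):
#   keep=$(blkid -s PARTUUID -o value /dev/sda2)
#   efibootmgr -v | stale_shim_entries "$keep"
```

Review the printed numbers before passing any of them to efibootmgr -B -b, since a wrong GUID would flag the good entry as well.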
Add a new entry to the UEFI Firmware for Disk 2 since the UUID is different. By default, when a new entry is added, it is set first in the boot order. This is fine for now. It will need to be tested.
# efibootmgr -v -c -L CentOS-Alt -d /dev/sdf -p 2 -l '\EFI\centos\shim.efi' BootCurrent: 0006 Timeout: 0 seconds BootOrder: 0008,0006,0000,0001,0002,0003,0004,0005,0007 Boot0000* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0) Boot0001* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0) Boot0002* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0) Boot0003* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0) Boot0004* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0) Boot0005* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0) Boot0006* CentOS HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)File(\EFI\centos\shim.efi) Boot0007* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc) Boot0008* CentOS-Alt HD(2,11000,40001,4b68bcc1-b78b-4ef8-a799-e61d2797a427)File(\EFI\centos\shim.efi)
Add partition 3 back to /dev/md/boot RAID 1 array.
# mdadm /dev/md/boot --add /dev/sdf3
mdadm: added /dev/sdf3
Add partition 4 back to /dev/md/pv00 RAID 1 array.
# mdadm /dev/md/pv00 --add /dev/sdf4
mdadm: added /dev/sdf4
Watch /proc/mdstat until both arrays have finished syncing.
# cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sdf4[2] sda4[0]
      309374976 blocks super 1.2 [2/1] [U_]
      [====>................]  recovery = 23.9% (73994176/309374976) finish=34.7min speed=112796K/sec
      bitmap: 0/3 pages [0KB], 65536KB chunk

md127 : active raid1 sdf3[3] sda3[2]
      511680 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
Example of sync'd raid volumes
# cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sdf4[2] sda4[0]
      309374976 blocks super 1.2 [2/2] [UU]
      bitmap: 1/3 pages [4KB], 65536KB chunk

md127 : active raid1 sdf3[3] sda3[2]
      511680 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
Reboot the system to ensure booting from Disk 2 works as expected.
NOTE: DO NOT REBOOT UNLESS MDSTAT OUTPUT LOOKS SYNC'D AS ABOVE!!!
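The "do not reboot yet" rule can be turned into a guard that refuses to continue while any array is degraded or rebuilding. A minimal sketch, assuming a member map containing "_" or the words recovery or resync in /proc/mdstat mean the arrays are not ready; md_all_synced is an invented name.

```shell
# Succeed only when no array in the mdstat file is degraded or rebuilding.
# $1 optionally names an mdstat-formatted file (default: /proc/mdstat).
md_all_synced() {
    f=${1:-/proc/mdstat}
    # Any "_" in a member map, or an active recovery/resync line, means "not yet"
    ! grep -Eq '\[[U_]*_[U_]*\]|recovery|resync' "$f"
}

# Example (not run here):
#   md_all_synced && reboot
```

Used as in the comment, the reboot only happens once every array reports all members up.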
# reboot
Have a look at the UEFI Firmware configuration. Two things of interest here:
a) BootCurrent - This tells us we successfully booted from /dev/sdc2.
b) Boot0009,Boot000A - This tells us the Dell UEFI firmware automatically discovered the alternate path and added EFI entries for it.
# efibootmgr -v BootCurrent: 0008 Timeout: 0 seconds BootOrder: 0008,0006,0000,0001,0002,0003,0004,0005,0007,0009,000A Boot0000* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0) Boot0001* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0) Boot0002* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0) Boot0003* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0) Boot0004* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0) Boot0005* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0) Boot0006* CentOS HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc)File(\EFI\centos\shim.efi) Boot0007* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,6b8f1ce8-e0c1-443b-867e-d3455a8157dc) Boot0008* CentOS-Alt HD(2,11000,40001,4b68bcc1-b78b-4ef8-a799-e61d2797a427)File(\EFI\centos\shim.efi) Boot0009* CentOS HD(2,11000,40001,4b68bcc1-b78b-4ef8-a799-e61d2797a427)File(\EFI\centos\shim.efi) Boot000A* EFI Fixed Disk Boot Device 2 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,4b68bcc1-b78b-4ef8-a799-e61d2797a427)
Let's clear up a few items with the UEFI Firmware configuration:
a) Since it is reasonable to expect the Dell UEFI Firmware is automatically adding entries, it's safe to remove the entry we manually added (-B -b). If your UEFI Firmware is not doing this, keep the manual entry.
b) Set a proper boot order of preferred devices (-o).
c) Because of item "a", we should reboot onto the alternate path the server automatically added (-n).
# efibootmgr -v -B -b 0 -o 0006,000A,0001 -n A BootNext: 000A BootCurrent: 0000 Timeout: 0 seconds BootOrder: 0006,000A,0001 Boot0001* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0) Boot0002* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,c0f13fae-e6f4-4a59-8445-f42d2f81bd83) Boot0003 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0) Boot0004 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0) Boot0005 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0) Boot0006* CentOS HD(2,11000,40001,c0f13fae-e6f4-4a59-8445-f42d2f81bd83)File(\EFI\centos\shim.efi) Boot0007* EFI Fixed Disk Boot Device 2 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1) Boot0008 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0) Boot0009 Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0) Boot000A* CentOS HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)File(\EFI\centos\shim.efi)
Confirm all RAID devices are sync'd.
# cat /proc/mdstat Personalities : [raid1] md126 : active raid1 sdc4[2] sda4[0] 311472128 blocks super 1.2 [2/2] [UU] bitmap: 0/3 pages [0KB], 65536KB chunk md127 : active raid1 sdc3[3] sda3[2] 511680 blocks super 1.2 [2/2] [UU] bitmap: 0/1 pages [0KB], 65536KB chunk unused devices: <none>
Reboot the system.
# reboot
Have a look at the UEFI Firmware configuration. Two items to note: BootCurrent points at the entry for Drive 2, and the server's UEFI firmware created no additional entries.
# efibootmgr -v BootCurrent: 000A Timeout: 0 seconds BootOrder: 0006,0001,0003,0004,0005,0008,0009,0002,0007,000A Boot0001* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0) Boot0002* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,c0f13fae-e6f4-4a59-8445-f42d2f81bd83) Boot0003 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0) Boot0004 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0) Boot0005 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0) Boot0006* CentOS HD(2,11000,40001,c0f13fae-e6f4-4a59-8445-f42d2f81bd83)File(\EFI\centos\shim.efi) Boot0007* EFI Fixed Disk Boot Device 2 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1) Boot0008 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0) Boot0009 Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0) Boot000A* CentOS HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)File(\EFI\centos\shim.efi)
Primary Drive
Simulate Immediate Failure of Drive 1
Boot the system.
Review drive and partition layout.
# lsblk -i
NAME            MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda               8:0    0 465.8G  0 disk
|-sda1            8:1    0    32M  0 part
|-sda2            8:2    0   128M  0 part  /boot/efi
|-sda3            8:3    0   500M  0 part
| `-md126         9:126  0 499.7M  0 raid1 /boot
`-sda4            8:4    0 297.2G  0 part
  `-md127         9:127  0   297G  0 raid1
    |-vg00-swap 253:0    0     6G  0 lvm   [SWAP]
    |-vg00-usr  253:1    0     3G  0 lvm   /usr
    |-vg00-root 253:2    0     3G  0 lvm   /
    |-vg00-home 253:3    0     1G  0 lvm   /home
    |-vg00-opt  253:4    0     1G  0 lvm   /opt
    |-vg00-tmp  253:5    0     1G  0 lvm   /tmp
    `-vg00-var  253:6    0     2G  0 lvm   /var
sdb               8:16   0 931.5G  0 disk
sdc               8:32   0 298.1G  0 disk
|-sdc1            8:33   0    32M  0 part
|-sdc2            8:34   0   128M  0 part
|-sdc3            8:35   0   500M  0 part
| `-md126         9:126  0 499.7M  0 raid1 /boot
`-sdc4            8:36   0 297.2G  0 part
  `-md127         9:127  0   297G  0 raid1
    |-vg00-swap 253:0    0     6G  0 lvm   [SWAP]
    |-vg00-usr  253:1    0     3G  0 lvm   /usr
    |-vg00-root 253:2    0     3G  0 lvm   /
    |-vg00-home 253:3    0     1G  0 lvm   /home
    |-vg00-opt  253:4    0     1G  0 lvm   /opt
    |-vg00-tmp  253:5    0     1G  0 lvm   /tmp
    `-vg00-var  253:6    0     2G  0 lvm   /var
sdd               8:48   0 931.5G  0 disk
sde               8:64   0 931.5G  0 disk
sdf               8:80   0 931.5G  0 disk
sr0              11:0    1   636M  0 rom
Identify /dev/sda (Drive 1) if unsure: read from it continuously with dd and look at the front of the server for the drive with a near-steady activity light. Press CTRL+C to end the dd command.
# dd if=/dev/sda of=/dev/null
^C1043209+0 records in
1043208+0 records out
534122496 bytes (534 MB) copied, 3.89623 s, 137 MB/s
Physically hot-pull /dev/sda from the server.
Check dmesg for kernel output. Notice how the drive went offline.
# dmesg ... [ 1266.587093] md: md127 still in use. [ 1266.587109] md: md126 still in use. [ 1266.588274] md/raid1:md127: Disk failure on sda4, disabling device. md/raid1:md127: Operation continuing on 1 devices. [ 1266.588303] md/raid1:md126: Disk failure on sda3, disabling device. md/raid1:md126: Operation continuing on 1 devices. [ 1266.636331] RAID1 conf printout: [ 1266.636336] --- wd:1 rd:2 [ 1266.636338] disk 0, wo:1, o:0, dev:sda3 [ 1266.636340] disk 1, wo:0, o:1, dev:sdc3 [ 1266.649745] RAID1 conf printout: [ 1266.649747] --- wd:1 rd:2 [ 1266.649749] disk 0, wo:1, o:0, dev:sda4 [ 1266.649750] disk 1, wo:0, o:1, dev:sdc4 [ 1266.663228] RAID1 conf printout: [ 1266.663232] --- wd:1 rd:2 [ 1266.663235] disk 1, wo:0, o:1, dev:sdc3 [ 1266.663544] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 1266.663571] sd 0:0:0:0: [sda] [ 1266.663573] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 1266.663762] mpt2sas0: removing handle(0x000b), sas_addr(0x500065b36789abe2) [ 1266.666251] RAID1 conf printout: [ 1266.666254] --- wd:1 rd:2 [ 1266.666256] disk 1, wo:0, o:1, dev:sdc4 [ 1266.704295] md: unbind<sda3> [ 1266.710201] md: export_rdev(sda3) [ 1266.717735] md: unbind<sda4> [ 1266.726188] md: export_rdev(sda4)
Check RAID status. Each RAID1 array that had a member on the pulled drive should now show one device missing ([_U]).
# cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sdc3[3]
      511680 blocks super 1.2 [2/1] [_U]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 sdc4[2]
      311472128 blocks super 1.2 [2/1] [_U]
      bitmap: 1/3 pages [4KB], 65536KB chunk

unused devices: <none>
Check the mounted filesystems. /boot/efi was mounted from the pulled drive (/dev/sda2), so df may still list it or, as shown here, report an Input/output error for it. Either way, unmount /boot/efi, then mount it again. This time, /dev/sdc2 should show up.
# df
df: ‘/boot/efi’: Input/output error
Filesystem            1K-blocks   Used Available Use% Mounted on
/dev/mapper/vg00-root   3030800  30240   2826892   2% /
devtmpfs                1878604      0   1878604   0% /dev
tmpfs                   1921220      0   1921220   0% /dev/shm
tmpfs                   1921220   8712   1912508   1% /run
tmpfs                   1921220      0   1921220   0% /sys/fs/cgroup
/dev/mapper/vg00-usr    3030800 878880   1978252  31% /usr
/dev/md126               487314 153730    303904  34% /boot
/dev/mapper/vg00-tmp     999320   2604    927904   1% /tmp
/dev/mapper/vg00-opt     999320   2564    927944   1% /opt
/dev/mapper/vg00-home    999320   2580    927928   1% /home
/dev/mapper/vg00-var    1998672  81280   1796152   5% /var
# umount /boot/efi
# mount /boot/efi
# df
Filesystem            1K-blocks   Used Available Use% Mounted on
/dev/mapper/vg00-root   3030800  30240   2826892   2% /
devtmpfs                1878604      0   1878604   0% /dev
tmpfs                   1921220      0   1921220   0% /dev/shm
tmpfs                   1921220   8712   1912508   1% /run
tmpfs                   1921220      0   1921220   0% /sys/fs/cgroup
/dev/mapper/vg00-usr    3030800 878880   1978252  31% /usr
/dev/md126               487314 153730    303904  34% /boot
/dev/mapper/vg00-tmp     999320   2604    927904   1% /tmp
/dev/mapper/vg00-opt     999320   2564    927944   1% /opt
/dev/mapper/vg00-home    999320   2580    927928   1% /home
/dev/mapper/vg00-var    1998672  81280   1796152   5% /var
/dev/sdc2                130800   9980    120820   8% /boot/efi
Reboot to test full boot process with only Drive 2.
# reboot
Review drive and partition layout. Interestingly, because Drive 1 is removed, Drive 2 shows as /dev/sda now.
# lsblk -i NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 298.1G 0 disk |-sda1 8:1 0 32M 0 part |-sda2 8:2 0 128M 0 part /boot/efi |-sda3 8:3 0 500M 0 part | `-md127 9:127 0 499.7M 0 raid1 /boot `-sda4 8:4 0 297.2G 0 part `-md126 9:126 0 297G 0 raid1 |-vg00-swap 253:0 0 6G 0 lvm [SWAP] |-vg00-usr 253:1 0 3G 0 lvm /usr |-vg00-root 253:2 0 3G 0 lvm / |-vg00-home 253:3 0 1G 0 lvm /home |-vg00-opt 253:4 0 1G 0 lvm /opt |-vg00-tmp 253:5 0 1G 0 lvm /tmp `-vg00-var 253:6 0 2G 0 lvm /var sdb 8:16 0 931.5G 0 disk sdc 8:32 0 931.5G 0 disk sdd 8:48 0 931.5G 0 disk sde 8:64 0 931.5G 0 disk sr0 11:0 1 636M 0 rom
Check to ensure /boot/efi is mounted with /dev/sda2.
# df Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/vg00-root 3030800 30240 2826892 2% / devtmpfs 1878604 0 1878604 0% /dev tmpfs 1921220 0 1921220 0% /dev/shm tmpfs 1921220 8708 1912512 1% /run tmpfs 1921220 0 1921220 0% /sys/fs/cgroup /dev/mapper/vg00-usr 3030800 878880 1978252 31% /usr /dev/mapper/vg00-tmp 999320 2604 927904 1% /tmp /dev/mapper/vg00-home 999320 2580 927928 1% /home /dev/mapper/vg00-opt 999320 2564 927944 1% /opt /dev/mapper/vg00-var 1998672 81856 1795576 5% /var /dev/md127 487314 153730 303904 34% /boot /dev/sda2 130800 9980 120820 8% /boot/efi
Fabricate a "New" Drive
We need to simulate a new drive insertion and rebuild, so we must first clean the partitions of any filesystem and RAID metadata, then wipe the partition table.
Physically hot-plug Disk 1 back into the server.
Check dmesg for kernel output. Notice which drive it shows up as (/dev/sdf in my case).
# dmesg ... [ 518.440017] scsi 0:0:6:0: Direct-Access ATA WDC WD5003ABYX-1 1S05 PQ: 0 ANSI: 5 [ 518.440023] scsi 0:0:6:0: SATA: handle(0x0011), sas_addr(0x500065b36789abe2), phy(2), device_name(0x4ee050019c145942) [ 518.440025] scsi 0:0:6:0: SATA: enclosure_logical_id(0x500065b37689abff), slot(0) [ 518.440096] scsi 0:0:6:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y) [ 518.440100] scsi 0:0:6:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 518.444628] sd 0:0:6:0: [sdf] 976773168 512-byte logical blocks: (500 GB/465 GiB) [ 518.452048] sd 0:0:6:0: [sdf] Write Protect is off [ 518.452053] sd 0:0:6:0: [sdf] Mode Sense: 7f 00 00 08 [ 518.454463] sd 0:0:6:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 518.517754] sdf: sdf1 sdf2 sdf3 sdf4 [ 518.532152] sd 0:0:6:0: [sdf] Attached SCSI disk
Wipe partition 1 (Dell Utilities)
# dd if=/dev/zero of=/dev/sdf1 bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000468993 s, 1.1 MB/s
Wipe partition 2 (EFI System Partition)
# dd if=/dev/zero of=/dev/sdf2 bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000407036 s, 1.3 MB/s
Zero RAID superblock on partition 3 (/boot)
# mdadm --zero-superblock /dev/sdf3
Zero RAID superblock on partition 4 (pv00/vg00)
# mdadm --zero-superblock /dev/sdf4
Wipe partition table of Drive 1.
# dd if=/dev/zero of=/dev/sdf bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.00049619 s, 1.0 MB/s
Physically hot-pull Drive 1 from system.
Reboot system from only Drive 2 again.
# reboot
Simulate Replacement Drive Insertion and Rebuild
Now that we have a "new" empty drive, we want to test the insertion and rebuild process.
NOTE: This can be referenced for actual replacement drive rebuild.
Physically hot-plug Drive 1 into the server.
Check dmesg for kernel output. You should see the new drive appear (in my case, it was /dev/sdf again).
# dmesg ... [ 80.610518] scsi 0:0:6:0: Direct-Access ATA WDC WD5003ABYX-1 1S05 PQ: 0 ANSI: 5 [ 80.610526] scsi 0:0:6:0: SATA: handle(0x0011), sas_addr(0x500065b36789abe2), phy(2), device_name(0x4ee050019c145942) [ 80.610531] scsi 0:0:6:0: SATA: enclosure_logical_id(0x500065b37689abff), slot(0) [ 80.610605] scsi 0:0:6:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y) [ 80.610609] scsi 0:0:6:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1) [ 80.614699] sd 0:0:6:0: [sdf] 976773168 512-byte logical blocks: (500 GB/465 GiB) [ 80.621857] sd 0:0:6:0: [sdf] Write Protect is off [ 80.621861] sd 0:0:6:0: [sdf] Mode Sense: 7f 00 00 08 [ 80.624116] sd 0:0:6:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 80.660722] sdf: unknown partition table [ 80.698154] sd 0:0:6:0: [sdf] Attached SCSI disk
As we can see from the above output, there is no partition table. We will replicate one from the partition table of Disk 2 (/dev/sda).
# sgdisk -R /dev/sdf /dev/sda
The operation has completed successfully.
Confirm that both drive partition tables are identical.
# parted /dev/sda unit s print
Model: ATA ST320DM000-1BD14 (scsi)
Disk /dev/sda: 625142448s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start     End         Size        File system  Name                  Flags
 1      2048s     67607s      65560s      fat16        Dell_Utility          diag
 2      69632s    331776s     262145s     fat16        EFI System Partition  boot
 3      333824s   1357824s    1024001s                 BOOTFS                raid
 4      1359872s  624566271s  623206400s                                     raid

# parted /dev/sdf unit s print
Model: ATA WDC WD5003ABYX-1 (scsi)
Disk /dev/sdf: 976773168s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start     End         Size        File system  Name                  Flags
 1      2048s     67607s      65560s                   Dell_Utility          diag
 2      69632s    331776s     262145s                  EFI System Partition  boot
 3      333824s   1357824s    1024001s    ext4         BOOTFS                raid
 4      1359872s  624566271s  623206400s                                     raid
When we replicated the partition table, we also replicated the UUIDs of the partitions. We can rectify this with sgdisk's -G flag. New UUIDs will be generated for the partitions.
# sgdisk -G /dev/sdf
The operation has completed successfully.
Copy partition 1 (Dell Utilities) from Drive 2 to Drive 1.
# dd if=/dev/sda1 of=/dev/sdf1
65560+0 records in
65560+0 records out
33566720 bytes (34 MB) copied, 0.971538 s, 34.6 MB/s
Copy partition 2 (EFI System Partition) from Drive 2 to Drive 1. It's best to do this while the filesystem is quiesced, so we unmount it first.
# umount /boot/efi
# dd if=/dev/sda2 of=/dev/sdf2
262145+0 records in
262145+0 records out
134218240 bytes (134 MB) copied, 2.25084 s, 59.6 MB/s
# mount /boot/efi
Remove old entry for Drive 1 in UEFI Firmware. To ensure we remove the one for Drive 1, and because we changed the UUIDs above, we will search for Drive 2 (/dev/sda2), and remove the other entry.
# efibootmgr -v
BootCurrent: 000A
Timeout: 0 seconds
BootOrder: 0006,000A,0001,0000,000B,000C,000D,000E,000F
Boot0000* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)
Boot0001* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot0002* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,c0f13fae-e6f4-4a59-8445-f42d2f81bd83)
Boot0003 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot0004 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot0005 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot0006* CentOS HD(2,11000,40001,c0f13fae-e6f4-4a59-8445-f42d2f81bd83)File(\EFI\centos\shim.efi)
Boot0007* EFI Fixed Disk Boot Device 2 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)
Boot0008 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot0009 Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot000A* CentOS HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)File(\EFI\centos\shim.efi)
Boot000B* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot000C* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot000D* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot000E* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot000F* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
# blkid /dev/sda2
/dev/sda2: SEC_TYPE="msdos" UUID="074B-E6E6" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="a7c56565-2676-441f-aaff-cfd5846bdcb1"
# efibootmgr -B -b 6
BootCurrent: 000A
Timeout: 0 seconds
BootOrder: 0006,000A,0001,0000,000B,000C,000D,000E,000F
Boot0000* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)
Boot0001* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot0002* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,c0f13fae-e6f4-4a59-8445-f42d2f81bd83)
Boot0003 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot0004 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot0005 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot0007* EFI Fixed Disk Boot Device 2 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)
Boot0008 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot0009 Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot000A* CentOS HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)File(\EFI\centos\shim.efi)
Boot000B* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot000C* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot000D* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot000E* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot000F* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Add a new entry to the UEFI Firmware for Disk 1, since its partition UUID is different from the one in the entry just removed. By default, a newly added entry is placed first in the boot order. This is fine for now, because booting from Disk 1 still needs to be tested.
# efibootmgr -v -c -L CentOS-Pri -d /dev/sdf -p 2 -l '\EFI\centos\shim.efi'
BootCurrent: 000A
Timeout: 0 seconds
BootOrder: 0006,000A,0001,0000,000B,000C,000D,000E,000F
Boot0000* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)
Boot0001* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot0002* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,c0f13fae-e6f4-4a59-8445-f42d2f81bd83)
Boot0003 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot0004 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot0005 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot0007* EFI Fixed Disk Boot Device 2 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)
Boot0008 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot0009 Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot000A* CentOS HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)File(\EFI\centos\shim.efi)
Boot000B* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot000C* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot000D* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot000E* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot000F* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot0006* CentOS-Pri HD(2,11000,40001,14fe17c7-c46c-4a7a-b7ae-58e48dc27792)File(\EFI\centos\shim.efi)
Add partition 3 back to the /dev/md/boot RAID 1 array.
# mdadm /dev/md/boot --add /dev/sdf3
mdadm: added /dev/sdf3
Add partition 4 back to the /dev/md/pv00 RAID 1 array.
# mdadm /dev/md/pv00 --add /dev/sdf4
mdadm: added /dev/sdf4
Watch /proc/mdstat until the drives have finished synchronizing. An array that is still rebuilding shows a status such as [2/1] [_U] and a recovery progress line.
# cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sdf4[3] sda4[2]
      311472128 blocks super 1.2 [2/1] [_U]
      [==>..................] recovery = 14.0% (43714304/311472128) finish=40.1min speed=111221K/sec
      bitmap: 0/3 pages [0KB], 65536KB chunk
md127 : active raid1 sdf3[2] sda3[3]
      511680 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
unused devices: <none>
Example of sync'd RAID volumes: every array reports [2/2] [UU].
# cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sdf4[3] sda4[2]
      311472128 blocks super 1.2 [2/2] [UU]
      bitmap: 1/3 pages [4KB], 65536KB chunk
md127 : active raid1 sdf3[2] sda3[3]
      511680 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
unused devices: <none>
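The sync check can also be scripted instead of read by eye. The following is a minimal sketch, assuming the mdstat layout shown in this document; `md_all_synced` is a hypothetical helper name, not part of mdadm. It treats an underscore in the `[UU]` status field, or a recovery/resync line, as "not yet synchronized".

```shell
# Hypothetical helper: succeeds only if no array in an mdstat-style file
# is degraded or still rebuilding.
md_all_synced() {
    # $1: path to an mdstat-style file (defaults to /proc/mdstat)
    file="${1:-/proc/mdstat}"
    # A missing member shows up as "_" in the status field, e.g. [_U];
    # a rebuild in progress shows a "recovery" or "resync" line.
    if grep -Eq '\[[U_]*_[U_]*\]|recovery|resync' "$file"; then
        return 1
    fi
    return 0
}
```

It could then be used as `md_all_synced && echo "safe to reboot"`, or in a loop that sleeps until it succeeds.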
Reboot the system to ensure booting from Disk 1 works as expected.
NOTE: DO NOT REBOOT UNLESS MDSTAT OUTPUT LOOKS SYNC'D AS ABOVE!!!
# reboot
Have a look at the UEFI Firmware configuration. Three things of interest here:
a) BootCurrent - This tells us we successfully booted from Disk 1.
b) Boot0006, Boot0010 - Boot0010 points at the same partition as our manual Boot0006 entry. This tells us the Dell UEFI Firmware automatically discovered and added an EFI entry for the primary path.
c) The Dell UEFI Firmware is crowding the list with extra auto-discovered duplicate entries.
# efibootmgr -v
BootCurrent: 0006
Timeout: 0 seconds
BootOrder: 0006,000A,0001,0000,000B,000C,000D,000E,000F,0010,0011,0012
Boot0000* EFI Fixed Disk Boot Device 2 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)
Boot0001* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot0002* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,c0f13fae-e6f4-4a59-8445-f42d2f81bd83)
Boot0003 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot0004 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot0005 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot0006* CentOS-Pri HD(2,11000,40001,14fe17c7-c46c-4a7a-b7ae-58e48dc27792)File(\EFI\centos\shim.efi)
Boot0007* EFI Fixed Disk Boot Device 2 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)
Boot0008 Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot0009 Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot000A* CentOS HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)File(\EFI\centos\shim.efi)
Boot000B* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot000C* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot000D* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot000E* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot000F* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot0010* CentOS HD(2,11000,40001,14fe17c7-c46c-4a7a-b7ae-58e48dc27792)File(\EFI\centos\shim.efi)
Boot0011* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,14fe17c7-c46c-4a7a-b7ae-58e48dc27792)
Boot0012* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,14fe17c7-c46c-4a7a-b7ae-58e48dc27792)
Let's clear up a few items with the UEFI Firmware configuration:
a) Since the Dell UEFI Firmware is evidently adding entries automatically, it's safe to remove the manual entry we created along with all the duplicate entries it has been adding (-q -B -b). If your UEFI Firmware is not doing this, keep the manual entry.
b) Set a proper boot order of preferred devices (-o).
# efibootmgr -q -B -b 0
# efibootmgr -q -B -b 2
# efibootmgr -q -B -b 3
# efibootmgr -q -B -b 4
# efibootmgr -q -B -b 5
# efibootmgr -q -B -b 6
# efibootmgr -q -B -b 7
# efibootmgr -q -B -b 8
# efibootmgr -q -B -b 9
# efibootmgr -q -B -b B
# efibootmgr -q -B -b C
# efibootmgr -q -B -b D
# efibootmgr -q -B -b E
# efibootmgr -q -B -b F
# efibootmgr -q -B -b 11
# efibootmgr -q -B -b 12
# efibootmgr -v -o 0001,0010,000A
BootCurrent: 0006
Timeout: 0 seconds
BootOrder: 0001,0010,000A
Boot0001* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot000A* CentOS HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)File(\EFI\centos\shim.efi)
Boot0010* CentOS HD(2,11000,40001,14fe17c7-c46c-4a7a-b7ae-58e48dc27792)File(\EFI\centos\shim.efi)
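Typing one delete command per entry gets tedious when the firmware keeps re-adding duplicates. As a rough sketch, the entry numbers can be pulled out of the efibootmgr listing by description. `list_boot_ids` is a hypothetical helper name, and the parsing assumes the `BootXXXX* Description` line format shown in this document.

```shell
# Hypothetical helper: reads efibootmgr output on stdin and prints the
# four-digit hex number of every entry whose description matches $1.
list_boot_ids() {
    # $1: extended regular expression matched against the description
    awk -v pat="$1" '
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            id = substr($0, 5, 4)            # the XXXX in BootXXXX
            desc = substr($0, 9)             # everything after the number
            sub(/^\*?[ \t]+/, "", desc)      # drop the active marker and padding
            if (desc ~ pat) print id
        }'
}
```

For example, `efibootmgr | list_boot_ids 'EFI Fixed Disk|Broadcom' | while read -r id; do efibootmgr -q -B -b "$id"; done` would remove the auto-discovered disk and network entries. Review the list of numbers before piping it into the delete loop.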
Confirm all RAID devices are sync'd.
# cat /proc/mdstat
Personalities : [raid1]
md126 : active raid1 sdc3[3] sda3[2]
      511680 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
md127 : active raid1 sdc4[2] sda4[3]
      311472128 blocks super 1.2 [2/2] [UU]
      bitmap: 0/3 pages [0KB], 65536KB chunk
unused devices: <none>
Reboot the system.
# reboot
During bootup you may see the following "error". This is most likely because the DVD is set as the primary boot device. If you don't like this, move it to third place in the boot order (-o).
error: failure reading sector 0x0 from 'hd0'.
Press any key to continue...
Have a look at the UEFI Firmware configuration. Items to note: BootCurrent is from Drive 1 and Dell UEFI Firmware re-created entries for boot devices.
# efibootmgr -v
BootCurrent: 0010
Timeout: 0 seconds
BootOrder: 0001,0010,000A,0000,0002,0003,0004,0005,0006,0007
Boot0000* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,14fe17c7-c46c-4a7a-b7ae-58e48dc27792)
Boot0001* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot0002* EFI Fixed Disk Boot Device 2 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)
Boot0003* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot0004* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot0005* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot0006* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot0007* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot000A* CentOS HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)File(\EFI\centos\shim.efi)
Boot0010* CentOS HD(2,11000,40001,14fe17c7-c46c-4a7a-b7ae-58e48dc27792)File(\EFI\centos\shim.efi)
Set the system to boot from Disk 2 on the next boot only (-n sets BootNext) to confirm it still works.
# efibootmgr -v -n A
BootNext: 000A
BootCurrent: 0010
Timeout: 0 seconds
BootOrder: 0001,0010,000A,0000,0002,0003,0004,0005,0006,0007
Boot0000* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,14fe17c7-c46c-4a7a-b7ae-58e48dc27792)
Boot0001* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot0002* EFI Fixed Disk Boot Device 2 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)
Boot0003* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot0004* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot0005* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot0006* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot0007* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot000A* CentOS HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)File(\EFI\centos\shim.efi)
Boot0010* CentOS HD(2,11000,40001,14fe17c7-c46c-4a7a-b7ae-58e48dc27792)File(\EFI\centos\shim.efi)
Reboot the system.
# reboot
Have a look at the UEFI Firmware configuration. We see we successfully booted off Disk 2.
# efibootmgr -v
BootCurrent: 000A
Timeout: 0 seconds
BootOrder: 0001,0010,000A,0000,0002,0003,0004,0005,0006,0007
Boot0000* EFI Fixed Disk Boot Device 1 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e2ab8967b3650050000000000000000012000100)HD(2,11000,40001,14fe17c7-c46c-4a7a-b7ae-58e48dc27792)
Boot0001* DVDRAM SP60NB50 ACPI(a0841d0,0)PCI(12,2)USB(2,0)USB(0,0)
Boot0002* EFI Fixed Disk Boot Device 2 ACPI(a0841d0,0)PCI(4,0)PCI(0,0)VenMsg(d487ddb4-008b-11d9-afdc-001083ffca4d,00000000e6ab8967b3650050000000000000000012020100)HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)
Boot0003* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,0)MAC(MAC(001018bfec10,0)
Boot0004* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,1)MAC(MAC(001018bfec11,0)
Boot0005* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,2)MAC(MAC(001018bfec12,0)
Boot0006* Broadcom NetXtreme Gigabit Ethernet (BCM5719) ACPI(a0841d0,0)PCI(3,0)PCI(0,3)MAC(MAC(001018bfec13,0)
Boot0007* Broadcom NetXtreme II Gigabit Ethernet (BCM5716C) ACPI(a0841d0,0)PCI(9,0)PCI(0,0)MAC(MAC(c45444da7f34,0)
Boot000A* CentOS HD(2,11000,40001,a7c56565-2676-441f-aaff-cfd5846bdcb1)File(\EFI\centos\shim.efi)
Boot0010* CentOS HD(2,11000,40001,14fe17c7-c46c-4a7a-b7ae-58e48dc27792)File(\EFI\centos\shim.efi)
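Matching BootCurrent to a physical disk by eye means comparing long GUIDs. Here is a minimal sketch that extracts the partition GUID of the entry named by BootCurrent, so it can be compared against blkid. `boot_current_partuuid` is a hypothetical helper name, and it assumes the `HD(partition,start,size,GUID)` path format printed by the efibootmgr version used in this document; other versions may order the fields differently.

```shell
# Hypothetical helper: reads "efibootmgr -v" output on stdin and prints the
# partition GUID from the HD(...) path of the entry BootCurrent points to.
boot_current_partuuid() {
    awk '
        /^BootCurrent:/ { cur = $2 "" }       # keep the number as a string, e.g. 000A
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            line[substr($0, 5, 4)] = $0       # index entries by their hex number
        }
        END {
            s = line[cur]
            if (match(s, /HD\([^)]*\)/)) {
                hd = substr(s, RSTART + 3, RLENGTH - 4)   # text inside HD( ... )
                split(hd, f, ",")
                print f[4]                    # fourth field is the GUID in this format
            }
        }'
}
```

The result of `efibootmgr -v | boot_current_partuuid` can be compared with `blkid -s PARTUUID -o value /dev/sda2` to confirm which disk the system actually booted from.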
Reset System Back
If you want to run through the procedure again, or you didn't like it and want to try something else, this section describes how to put things back the way they were.
First boot the system from the Installation DVD. This will allow you to re-partition and wipe the drives without them being active.
Access the command line as shown in section 2.2.2.
Assemble all RAID volumes so we can act on them appropriately.
# mdadm --assemble --scan
mdadm: /dev/md/temp3.home.base:boot has been started with 2 drives.
Review all partitions and RAID configurations
# lsblk -i
NAME            MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda               8:0    0 465.8G  0 disk
|-sda1            8:1    0    32M  0 part
|-sda2            8:2    0   128M  0 part
|-sda3            8:3    0   500M  0 part
| `-md126         9:126  0 499.7M  0 raid1
`-sda4            8:4    0 297.2G  0 part
  `-md127         9:127  0   297G  0 raid1
    |-vg00-swap 253:3    0     6G  0 lvm
    |-vg00-home 253:4    0     1G  0 lvm
    |-vg00-opt  253:5    0     1G  0 lvm
    |-vg00-tmp  253:6    0     1G  0 lvm
    |-vg00-usr  253:7    0     3G  0 lvm
    `-vg00-var  253:8    0     2G  0 lvm
sdb               8:16   0 931.5G  0 disk
sdc               8:32   0 298.1G  0 disk
|-sdc1            8:33   0    32M  0 part
|-sdc2            8:34   0   128M  0 part
|-sdc3            8:35   0   500M  0 part
| `-md126         9:126  0 499.7M  0 raid1
`-sdc4            8:36   0 297.2G  0 part
  `-md127         9:127  0   297G  0 raid1
    |-vg00-swap 253:3    0     6G  0 lvm
    |-vg00-home 253:4    0     1G  0 lvm
    |-vg00-opt  253:5    0     1G  0 lvm
    |-vg00-tmp  253:6    0     1G  0 lvm
    |-vg00-usr  253:7    0     3G  0 lvm
    `-vg00-var  253:8    0     2G  0 lvm
sdd               8:48   0 931.5G  0 disk
sde               8:64   0 931.5G  0 disk
sdf               8:80   0 931.5G  0 disk
sr0              11:0    1   636M  0 rom   /run/install/repo
loop0             7:0    0 274.8M  1 loop
loop1             7:1    0     2G  1 loop
|-live-rw       253:0    0     2G  0 dm    /
`-live-base     253:1    0     2G  1 dm
loop2             7:2    0   512M  0 loop
`-live-rw       253:0    0     2G  0 dm    /
Remove RAID Volumes
Remove Volume Group vg00 and all of its logical volumes so we can remove the RAID volumes
# vgremove -f vg00
Logical volume "root" successfully removed
Logical volume "swap" successfully removed
Logical volume "home" successfully removed
Logical volume "opt" successfully removed
Logical volume "tmp" successfully removed
Logical volume "usr" successfully removed
Logical volume "var" successfully removed
Volume group "vg00" successfully removed
Deactivate RAID devices
# mdadm --stop /dev/md127
mdadm: stopped /dev/md127
# mdadm --stop /dev/md126
mdadm: stopped /dev/md126
Remove the RAID superblock from the partitions that previously participated in a RAID volume
# mdadm --zero-superblock /dev/sda3
# mdadm --zero-superblock /dev/sda4
# mdadm --zero-superblock /dev/sdc3
# mdadm --zero-superblock /dev/sdc4
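The member partitions do not have to be copied from the lsblk listing by hand. As a sketch, they can be collected from /proc/mdstat, provided this is done before the arrays are stopped, since stopped arrays no longer appear there. `md_members` is a hypothetical helper name, and the parsing assumes the mdstat layout shown earlier in this document.

```shell
# Hypothetical helper: prints /dev paths of all member partitions listed in
# an mdstat-style file, one per line.
md_members() {
    # $1: path to an mdstat-style file (defaults to /proc/mdstat)
    awk '/^md/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[a-z0-9]+\[[0-9]+\]/) {   # e.g. sda3[2]
                dev = $i
                sub(/\[.*/, "", dev)             # strip the [slot] suffix
                print "/dev/" dev
            }
    }' "${1:-/proc/mdstat}"
}
```

One way to use it: save the list first with `md_members > /tmp/members`, stop the arrays, then run `mdadm --zero-superblock` on each saved path.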
Reset Partitions on Disk 1
Wipe Filesystem from /dev/sda2
# dd if=/dev/zero of=/dev/sda2 bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000296246 s, 1.7 MB/s
Wipe Filesystem from /dev/sda1
# dd if=/dev/zero of=/dev/sda1 bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000393731 s, 1.3 MB/s
Wipe entire partition table on /dev/sda
# dd if=/dev/zero of=/dev/sda bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000378895 s, 1.4 MB/s
Create original partition table on Disk 1
Place original disk identifier back on disk
NOTE: The disk identifier in this example is 0xd4e911ee. Replace it with your own.
# echo -e "x\ni\n0xd4e911ee\nw" | fdisk /dev/sda
Welcome to fdisk (util-linux 2.23.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Device does not contain a recognized partition table
Building a new DOS disklabel with disk identifier 0x472100cc.
Command (m for help): Expert command (m for help): New disk identifier (current 0x472100cc): Disk identifier: 0xd4e911ee
Expert command (m for help): The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
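The original identifier has to be written down before the partition table is wiped. A minimal sketch for capturing it, assuming the `Disk identifier:` line that `fdisk -l` prints in the util-linux version used in this document; `disk_identifier` is a hypothetical helper name.

```shell
# Hypothetical helper: reads "fdisk -l <disk>" output on stdin and prints
# the value of the "Disk identifier:" line.
disk_identifier() {
    awk '/^Disk identifier:/ { print $3 }'
}
```

For example, `fdisk -l /dev/sda | disk_identifier > /root/sda.diskid` keeps a copy that can be substituted into the fdisk command above later.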
Place original partition table back on disk
# echo -e "20 65560 de -\n67584 4194304 c *" | sfdisk -u S /dev/sda
Checking that no-one is using this disk right now ...
OK
Disk /dev/sda: 60801 cylinders, 255 heads, 63 sectors/track
Old situation:
Units: sectors of 512 bytes, counting from 0
   Device Boot    Start       End   #sectors  Id  System
/dev/sda1             0         -          0   0  Empty
/dev/sda2             0         -          0   0  Empty
/dev/sda3             0         -          0   0  Empty
/dev/sda4             0         -          0   0  Empty
New situation:
Units: sectors of 512 bytes, counting from 0
   Device Boot    Start       End   #sectors  Id  System
/dev/sda1            20     65579      65560  de  Dell Utility
/dev/sda2   *     67584   4261887    4194304   c  W95 FAT32 (LBA)
/dev/sda3             0         -          0   0  Empty
/dev/sda4             0         -          0   0  Empty
Warning: partition 1 does not end at a cylinder boundary
Warning: partition 2 does not start at a cylinder boundary
Warning: partition 2 does not end at a cylinder boundary
Successfully wrote the new partition table
Re-reading the partition table ...
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
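Each sfdisk input line above gives a start sector and a size in sectors (because of -u S), while the listings report start and end sectors. The arithmetic relating the two is sketched below as a sanity check; `part_end` and `sectors_to_blocks` are hypothetical helper names, and 512-byte sectors are assumed, as reported for this disk.

```shell
# Hypothetical helpers for checking a partition layout.
part_end() {
    # $1: start sector, $2: size in sectors; the end sector is inclusive
    echo $(( $1 + $2 - 1 ))
}

sectors_to_blocks() {
    # $1: size in 512-byte sectors; prints the size in 1 KiB blocks
    echo $(( $1 * 512 / 1024 ))
}
```

With the values used here, `part_end 20 65560` gives 65579 and `part_end 67584 4194304` gives 4261887, matching the End column in the sfdisk output.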
Copy the Dell diagnostic utilities back to partition 1 from Disk 2
# dd if=/dev/sdc1 of=/dev/sda1
65560+0 records in
65560+0 records out
33566720 bytes (34 MB) copied, 0.98346 s, 34.1 MB/s
Confirm partition 1 mounts appropriately and we see contents
# mkdir /tmpmnt
# mount /dev/sda1 /tmpmnt
# ls -l /tmpmnt
total 114
-rwxr-xr-x. 1 root root 57389 Aug 13 2008 COMMAND.COM
-r-xr-xr-x. 1 root root 23856 Aug 13 2008 DELLBIO.BIN
-r-xr-xr-x. 1 root root 30978 Aug 13 2008 DELLRMK.BIN
# umount /tmpmnt
Put an empty FAT32 filesystem back on partition 2
# mkfs.fat -F 32 /dev/sda2
mkfs.fat 3.0.20 (12 Jun 2013)
Confirm partition table layout looks like it did before we started work
# fdisk -l /dev/sda
Disk /dev/sda: 500.1 GB, 500107862016 bytes, 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0xd4e911ee
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1              20       65579       32780   de  Dell Utility
/dev/sda2   *       67584     4261887     2097152    c  W95 FAT32 (LBA)
# parted /dev/sda unit s print
Model: ATA WDC WD5003ABYX-1 (scsi)
Disk /dev/sda: 976773168s
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:
Number  Start   End       Size      Type     File system  Flags
 1      20s     65579s    65560s    primary  fat16        diag
 2      67584s  4261887s  4194304s  primary  fat32        boot, lba
Reset Partitions on Disk 2
Wipe filesystem from /dev/sdc2
# dd if=/dev/zero of=/dev/sdc2 bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.00052408 s, 977 kB/s
Wipe filesystem from /dev/sdc1
# dd if=/dev/zero of=/dev/sdc1 bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.0312699 s, 16.4 kB/s
Wipe partition table from /dev/sdc
# dd if=/dev/zero of=/dev/sdc bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000776999 s, 659 kB/s
Remove all EFI entries from UEFI Firmware
The EFI entries can be removed from the UEFI firmware by any of the following three methods:
- Physically moving a designated jumper on the system board of your server to clear the NVRAM of your firmware. Follow your vendor-supplied documentation. I believe there may be a bug in efibootmgr or in my Dell UEFI Firmware in which NVRAM space is not released as entries are removed. This leads to a "No space left on device" error when modifying the UEFI Firmware from the OS. A Google search for "efibootmgr enospc" turns up plenty of other systems where this appears. My advice: if you are running through this procedure as often as I am, use this option before each run. Example strace output:
access("/sys/firmware/efi/efivars/BootNext-8be4df61-93ca-11d2-aa0d-00e098032b8c", F_OK) = 0
open("/sys/firmware/efi/efivars/BootNext-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_WRONLY|O_CREAT, 0600) = 3
write(3, "\7\0\0\0\t\0", 6) = -1 ENOSPC (No space left on device)
close(3) = 0
- Accessing the UEFI firmware interface during POST of the system
- Removing the entries within Linux using efibootmgr
# efibootmgr -v -B -b 1
# efibootmgr -v -B -b 2
# efibootmgr -v -B -b 3
# efibootmgr -v -B -b 4
# efibootmgr -v -B -b 5
...etc
# efibootmgr -v
BootCurrent: 0001
Timeout: 0 seconds
#