Saturday, January 26, 2013

How to Configure Oracle Enterprise Linux to be Highly Available Using RAID1


This was published by Oracle, and I'm very sorry for re-publishing it, but the author made some mistakes. I had lots of 'fun' with grub & linux rescue when I followed the original instructions.

I've cleaned it up, made some relevant changes and tested it with OEL 5 for my dear blog readers.


In this Document
  Goal
  Solution
     1. Original System Configuration
     2. Add Second Hard Disk
     3. Partition Second Hard Disk
     4. Modify Secondary Disk Partitions to Type RAID
     5. Create RAID1 Arrays on Second Disk
     6. Taking a Closer Look at RAID Devices
     7. RAID Configuration
     8. Create Filesystems/Swap Devices on RAID devices
     9. Backup Current System Configuration
     10. Mount Filesystems on RAID Devices
     11. Optionally mount/swapon filesystems/swap device on RAID devices as their non-RAID devices
     12. Modify fstab to Use RAID Devices
     13. Add Fallback Title to grub.conf
     14. Remake Initial RAM Disk (One of Two)
     15. Copy Contents of Non-RAID filesystems to RAID filesystems
     16. Install/Reinstall GRUB
     17. Reboot the System (Degraded Array)
     18. Modify Primary Disk Partitions to Type RAID
     19. Add Primary Disk Partitions to RAID Arrays
     20. Modify grub.conf
     21. Remake Initial RAM Disk (Two of Two)
     22. Testing
     22.1 Test - persistent mount on degraded array (/dev/sdb software failed)
     22.2 Test - boot into degraded array (/dev/sdb software removed)
     22.3 Test - boot into degraded array (/dev/sdb software removed)
     22.4 Test - boot into degraded array (/dev/sda physically removed)
 
Applies to:

Linux Kernel - Version: 2.4 to 2.6
Linux Itanium
Linux x86-64

Goal


This article describes how to configure an Oracle Enterprise Linux System to be highly available using RAID1.

The article is intended for Linux System Administrators.

Solution


RAID (Redundant Array of Inexpensive Disks) is the use of multiple hard disks to provide increased disk space, performance and availability. This article focuses solely on implementing RAID1, commonly referred to as disk mirroring, whereby two (or more) disks contain identical content. System availability and data integrity are maintained as long as at least one disk survives a failure.

Although the working examples come from Oracle Enterprise Linux 5 (OEL5), the article applies similarly to other Linux distributions and versions.

Before proceeding, take a complete backup of the system.


1. Original System Configuration


This document assumes that LVM is not used for storage management. When I started to follow this document, I did a default installation of OEL. Unfortunately, by default, it used LVM for volume management, so I couldn't use mdadm for RAID. So I re-installed the box and configured the storage as plain - no LVM, then I could follow these instructions.
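
If you are not sure whether your installation uses LVM, a quick check of the partition types can tell you before you start. The following is only a minimal sketch: the function name has_lvm_partitions is my own, and it assumes the classic fdisk -l column layout shown later in this article, where LVM partitions carry Id 8e.

```shell
# Sketch: succeed if `fdisk -l` output (read from stdin) lists any
# partition of type 8e (Linux LVM). The function name is illustrative.
has_lvm_partitions() {
    # The Id column is field 5, or field 6 when the boot flag (*) is present.
    awk '$1 ~ /^\/dev\// { id = ($2 == "*") ? $6 : $5; if (id == "8e") found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Example (needs root on a real system):
#   fdisk -l /dev/sda | has_lvm_partitions && echo "LVM in use, this guide does not apply as-is"
```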

I assume your storage is configured similarly as shown here:



And that GRUB is installed:



Prior to implementing RAID, the system comprised the following, simple configuration:


[root@oel5-raid1-3 ~]# uname -a
Linux oel5-raid1-3 2.6.32-300.10.1.el5uek #1 SMP Wed Feb 22 17:37:40 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

[root@oel5-raid1-3 ~]# cat /etc/enterprise-release
Enterprise Linux Enterprise Linux Server release 5.8 (Carthage)

[root@oel5-raid1-3 ~]# fdisk -l

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        2479    19808145   83  Linux
/dev/sda3            2480        2610     1052257+  82  Linux swap / Solaris



[root@oel5-raid1-3 ~]# blkid
/dev/sda3: LABEL="SWAP-sda3" TYPE="swap"
/dev/sda2: LABEL="/" UUID="aedde157-1fe3-45e3-b538-8dc1193b0430" TYPE="ext3"
/dev/sda1: LABEL="/boot" UUID="e4860f1d-d717-4efd-b91e-af4d8eb05421" TYPE="ext3"



[root@oel5-raid1-3 ~]# cat /etc/fstab
LABEL=/                 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
LABEL=SWAP-sda3         swap                    swap    defaults        0 0

[root@oel5-raid1-3 ~]# mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)



[root@oel5-raid1-3 ~]# swapon -s
Filename                                Type            Size Used Priority
/dev/sda3                               partition       1052248 0       -1



2. Add Second Hard Disk


A second hard disk is added to the system. Ideally, the second disk should be exactly the same (make and model) as the first. To help avoid a single point of failure, attach additional disks to a different disk controller from the one used by the first disk.

Our second hard disk is /dev/sdb.


[root@oel5-raid1-3 ~]# fdisk -l /dev/sdb

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table


3. Partition Second Hard Disk


The second disk must contain the same configuration (partition layout) as the first disk. Disk partitioning can be performed manually using the fdisk(8) utility, however the sfdisk(8) utility can be used to quickly and easily replicate the partition table from the first disk e.g.:


[root@oel5-raid1-3 ~]# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 2610 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/sdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *        63    208844     208782  83  Linux
/dev/sdb2        208845  39825134   39616290  83  Linux
/dev/sdb3      39825135  41929649    2104515  82  Linux swap / Solaris
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
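
Right after the copy (and before the partition types are changed in the next step) it may be worth confirming that both disks really carry the same layout. This is a rough sketch, assuming the sfdisk -d dump format of this util-linux generation (a "# partition table of ..." header followed by one "/dev/sdXN : start=..., size=..., Id=..." line per partition); normalize_dump is a name I made up.

```shell
# Sketch: normalise `sfdisk -d` output so dumps of two disks can be compared.
# Reads a dump on stdin, drops the header comment and the disk name.
normalize_dump() {
    sed -e '/^# partition table of /d' \
        -e 's|^/dev/[a-z]*\([0-9][0-9]*\) *:|p\1:|'
}

# Example (needs root on a real system):
#   a=$(sfdisk -d /dev/sda | normalize_dump)
#   b=$(sfdisk -d /dev/sdb | normalize_dump)
#   [ "$a" = "$b" ] && echo "layouts match"
```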


4. Modify Secondary Disk Partitions to Type RAID


Use the fdisk(8) utility to modify the second disk partitions from type 83/82 (linux/swap) to fd (raid) e.g.:


[root@oel5-raid1-3 ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   83  Linux
/dev/sdb2              14        2479    19808145   83  Linux
/dev/sdb3            2480        2610     1052257+  82  Linux swap / Solaris

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): fd
Changed system type of partition 3 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   fd  Linux raid autodetect
/dev/sdb2              14        2479    19808145   fd  Linux raid autodetect
/dev/sdb3            2480        2610     1052257+  fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.



Use the partprobe(8) or sfdisk(8) utility to update the kernel with the partition type changes e.g.:

[root@oel5-raid1-3 ~]# partprobe /dev/sdb

Verify creation of the new partitions on the second disk e.g.:


[root@oel5-raid1-3 ~]# cat /proc/partitions
major minor  #blocks  name

   8        0   20971520 sda
   8        1     104391 sda1
   8        2   19808145 sda2
   8        3    1052257 sda3
   8       16   20971520 sdb
   8       17     104391 sdb1
   8       18   19808145 sdb2
   8       19    1052257 sdb3
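
/proc/partitions does not show partition types, so to double-check that every partition on the second disk is now type fd you can filter the fdisk -l listing. A small sketch, assuming the fdisk column layout shown above; the helper name non_raid_partitions is hypothetical.

```shell
# Sketch: print every partition from `fdisk -l` output (stdin) whose Id
# is NOT fd (Linux raid autodetect). No output means all are type fd.
non_raid_partitions() {
    # Id is field 5, or field 6 when the boot flag (*) is present.
    awk '$1 ~ /^\/dev\// { id = ($2 == "*") ? $6 : $5; if (id != "fd") print $1 }'
}

# Example (needs root on a real system):
#   fdisk -l /dev/sdb | non_raid_partitions
```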



5. Create RAID1 Arrays on Second Disk


Use the mdadm(8) utility to create a raid1 array on the second disk only e.g.:


[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities :
unused devices: <none>



[root@oel5-raid1-3 ~]# mdadm --create /dev/md1 --auto=yes --level=raid1 --raid-devices=2 /dev/sdb1 missing
mdadm: array /dev/md1 started.
[root@oel5-raid1-3 ~]# mdadm --create /dev/md2 --auto=yes --level=raid1 --raid-devices=2 /dev/sdb2 missing
mdadm: array /dev/md2 started.
[root@oel5-raid1-3 ~]# mdadm --create /dev/md3 --auto=yes --level=raid1 --raid-devices=2 /dev/sdb3 missing
mdadm: array /dev/md3 started.
[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb3[0]
      1052160 blocks [2/1] [U_]

md2 : active raid1 sdb2[0]
      19808064 blocks [2/1] [U_]

md1 : active raid1 sdb1[0]
      104320 blocks [2/1] [U_]

unused devices: <none>



In the example above, raid devices are created using the same numbering as the device partitions they include e.g. /dev/md1 contains /dev/sdb1 and device /dev/sda1 will be added later. The term missing is used as a stub or placeholder that will eventually be replaced with the partitions on the first disk; /dev/sda1, /dev/sda2, /dev/sda3.
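
Because the md number matches the partition number, the three create commands follow one pattern. The sketch below only prints the commands (a dry run), so you can review them before running anything; print_create_cmds is an illustrative name, and the partition numbers 1 to 3 are simply those used in this article.

```shell
# Sketch: print (do not run) the mdadm commands that create one degraded
# RAID1 array per partition, with "missing" as the placeholder member.
print_create_cmds() {
    disk=$1    # e.g. /dev/sdb
    for n in 1 2 3; do
        echo "mdadm --create /dev/md$n --auto=yes --level=raid1 --raid-devices=2 ${disk}$n missing"
    done
}

# Example:
#   print_create_cmds /dev/sdb
```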

Check /proc/partitions to verify that the kernel now lists the raid devices e.g.:


[root@oel5-raid1-3 ~]# cat /proc/partitions
major minor  #blocks  name

   8        0   20971520 sda
   8        1     104391 sda1
   8        2   19808145 sda2
   8        3    1052257 sda3
   8       16   20971520 sdb
   8       17     104391 sdb1
   8       18   19808145 sdb2
   8       19    1052257 sdb3
   9        1     104320 md1
   9        2   19808064 md2
   9        3    1052160 md3


6. Taking a Closer Look at RAID Devices


Use the mdadm(8) utility to review raid devices in detail e.g.:


[root@oel5-raid1-3 ~]# mdadm --query --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Fri Jan 25 23:27:05 2013
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Jan 25 23:27:05 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 98d72fb6:ff5130f8:647d100f:9acf7e1c
         Events : 0.1

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       0        0        1      removed



Note that raid device /dev/md1 contains only one disk member, /dev/sdb1, at this point. The state of the array is "clean, degraded", denoting that only one (of two) underlying disk members is currently active and working.
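
If you want to check the state from a script rather than by eye, the State line can be pulled out of the detail output. A minimal sketch, assuming the output format shown above; array_state and is_degraded are names I invented for illustration.

```shell
# Sketch: extract the "State :" value from `mdadm --query --detail` output
# read on stdin, e.g. "clean, degraded".
array_state() {
    sed -n 's/^ *State : *//p'
}

# Succeeds when the reported state contains "degraded".
is_degraded() {
    array_state | grep -q degraded
}

# Example (needs root on a real system):
#   mdadm --query --detail /dev/md1 | is_degraded && echo "md1 is degraded"
```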

7. RAID Configuration


I skipped this step, but I have left the instructions here for reference.

Strictly speaking a master RAID configuration is not required. With relevant partitions being marked as type raid (fd), the kernel will auto assemble detected arrays on boot. If desired, one can create RAID configuration file /etc/mdadm.conf or /etc/mdadm/mdadm.conf as a reference to raid device usage e.g.:

# mkdir  /etc/mdadm/
# echo "DEVICE /dev/hd*[0-9] /dev/sd*[0-9]" >> /etc/mdadm/mdadm.conf
# mdadm --detail --scan >> /etc/mdadm/mdadm.conf
# ln -s /etc/mdadm/mdadm.conf /etc/mdadm.conf

# cat /etc/mdadm/mdadm.conf 
DEVICE /dev/hd*[0-9] /dev/sd*[0-9]
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=a4d5007d:6974901a:637e5622:e5b514c9
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=0e8ce9c6:bd42917d:fd3412bf:01f49095
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=7d696890:890b2eb7:c17bf4e4:d542ba99

The DEVICE filter is used to limit candidate RAID devices being created or added as RAID disk members.

8. Create Filesystems/Swap Devices on RAID devices


Once created, RAID devices are usable just like any other device. Use the mkfs.ext3(8) or mke2fs(8) and mkswap(8) commands to create EXT3 filesystems and a swap device on the RAID devices e.g.:


[root@oel5-raid1-3 ~]# mkfs.ext3 -L boot.md1 /dev/md1
mke2fs 1.39 (29-May-2006)
Filesystem label=boot.md1
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
26104 inodes, 104320 blocks
5216 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
13 block groups
8192 blocks per group, 8192 fragments per group
2008 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 25 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

[root@oel5-raid1-3 ~]# mkfs.ext3 -L root.md2 /dev/md2
mke2fs 1.39 (29-May-2006)
Filesystem label=root.md2
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
2480640 inodes, 4952016 blocks
247600 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
152 block groups
32768 blocks per group, 32768 fragments per group
16320 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

[root@oel5-raid1-3 ~]# mkswap -L swap.md3 /dev/md3
Setting up swapspace version 1, size = 1077407 kB
LABEL=swap.md3, no uuid

[root@oel5-raid1-3 ~]# blkid
/dev/sda3: LABEL="SWAP-sda3" TYPE="swap"
/dev/sda2: LABEL="/" UUID="aedde157-1fe3-45e3-b538-8dc1193b0430" TYPE="ext3"
/dev/sda1: LABEL="/boot" UUID="e4860f1d-d717-4efd-b91e-af4d8eb05421" TYPE="ext3"
/dev/sdb1: LABEL="boot.md1" UUID="56413220-d015-4f29-892d-2e4892fa355e" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdb2: LABEL="root.md2" UUID="ee39c18d-3be3-4047-9dfb-55d8b49f70d4" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdb3: TYPE="swap" LABEL="swap.md3"
/dev/md1: LABEL="boot.md1" UUID="56413220-d015-4f29-892d-2e4892fa355e" SEC_TYPE="ext2" TYPE="ext3"
/dev/md2: LABEL="root.md2" UUID="ee39c18d-3be3-4047-9dfb-55d8b49f70d4" SEC_TYPE="ext2" TYPE="ext3"
/dev/md3: TYPE="swap" LABEL="swap.md3"


To avoid confusion later, labels added to filesystems and swap device denote the RAID device on which they are created.


9. Backup Current System Configuration


Beyond this point, significant changes are made to the current system, therefore take a backup of the core system configuration e.g.:


[root@oel5-raid1-3 ~]# cp /etc/fstab /etc/fstab.orig
[root@oel5-raid1-3 ~]# cp /boot/grub/grub.conf /boot/grub/grub.conf.orig
[root@oel5-raid1-3 ~]# mkdir /boot.orig
[root@oel5-raid1-3 ~]# sync
[root@oel5-raid1-3 ~]# cp -dpRxu /boot/* /boot.orig/



10. Mount Filesystems on RAID Devices


Mount the raided filesystems e.g.:


[root@oel5-raid1-3 ~]# mkdir /boot.md1
[root@oel5-raid1-3 ~]# mount -t ext3 /dev/md1 /boot.md1
[root@oel5-raid1-3 ~]# mount | grep boot
/dev/sda1 on /boot type ext3 (rw)
/dev/md1 on /boot.md1 type ext3 (rw)
[root@oel5-raid1-3 ~]# mkdir /root.md2
[root@oel5-raid1-3 ~]# mount -t ext3 /dev/md2 /root.md2



11. Optionally mount/swapon filesystems/swap device on RAID devices as their non-RAID devices


Optionally test mount/swapon of filesystems on raided devices as their currently mounted non-raided counterparts e.g.:


[root@oel5-raid1-3 ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/sda3                               partition       1052248 0       -1
[root@oel5-raid1-3 ~]# swapoff /dev/sda3
[root@oel5-raid1-3 ~]# swapon /dev/md3
[root@oel5-raid1-3 ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md3                                partition       1052152 0       -1



Note: it is not possible to unmount/remount the root filesystem (/dev/sda2) as it's currently in use.

12. Modify fstab to Use RAID Devices


Modify the /etc/fstab file to mount/swapon raided devices on system boot.
Substitute relevant LABEL=  or /dev/sdaN entries with their corresponding /dev/mdN devices e.g.:


[root@oel5-raid1-3 ~]# vi /etc/fstab
[root@oel5-raid1-3 ~]# cat /etc/fstab
/dev/md2                /                       ext3    defaults        1 1
/dev/md1                /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/dev/md3                swap                    swap    defaults        0 0
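
If you prefer not to edit the file by hand, the same substitution can be previewed with sed. This is just a sketch that writes the result to stdout and leaves /etc/fstab untouched; the labels (/, /boot, SWAP-sda3) are the ones from this particular system, and fstab_to_md is a made-up name.

```shell
# Sketch: rewrite the LABEL= entries used on this system to their
# /dev/mdN counterparts. Reads fstab on stdin, writes to stdout.
fstab_to_md() {
    sed -e 's|^LABEL=/boot\([[:space:]]\)|/dev/md1\1|' \
        -e 's|^LABEL=/\([[:space:]]\)|/dev/md2\1|' \
        -e 's|^LABEL=SWAP-sda3\([[:space:]]\)|/dev/md3\1|'
}

# Example (preview only, nothing is modified):
#   fstab_to_md < /etc/fstab
```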


13. Add Fallback Title to grub.conf


A fallback title allows the system to boot using one title and fall back to another should any issues occur when booting with the first. This is particularly helpful because, without a fallback title, the system may fail to boot and a linux rescue may be needed to restore/recover the system.

Original /boot/grub/grub.conf:

[root@oel5-raid1-3 ~]# cat /boot/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda2
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=LABEL=/ rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img
title Oracle Linux Server-base (2.6.18-308.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-308.el5 ro root=LABEL=/ rhgb quiet numa=off
        initrd /initrd-2.6.18-308.el5.img

Modify the original /boot/grub/grub.conf file by adding the fallback parameter and a fallback grub boot title e.g.:


# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda2
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
fallback=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd1,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=/dev/md2 rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=LABEL=/ rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img



In the example above, the system is configured to boot using the first boot title (default=0) i.e. the one with /boot on the first partition of the second grub disk device (hd1,0) and specifying the root filesystem on raid device /dev/md2. Should that fail to boot, the system will fall back (fallback=1) to the second boot title i.e. the one specifying the /boot filesystem on the first partition of the first grub device (hd0,0) and the root filesystem with label /. Note that grub boot title numbering starts from zero (0).
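
Since a wrong fallback index is easy to miss, a quick sanity check of grub.conf may save a trip into linux rescue. The sketch below is my own addition (check_fallback is a hypothetical name): it only confirms that a fallback entry exists, that it points at an existing title, and that it differs from the default.

```shell
# Sketch: read a GRUB legacy grub.conf on stdin and verify that `fallback`
# names an existing title that is not the same as `default`.
check_fallback() {
    awk -F= '
        /^default=/  { d = $2 }
        /^fallback=/ { f = $2; have_f = 1 }
        /^title /    { titles++ }
        END {
            if (!have_f)         { print "no fallback entry"; exit 1 }
            if (f + 0 >= titles) { print "fallback points past last title"; exit 1 }
            if (f == d)          { print "fallback equals default"; exit 1 }
            print "default=" d " fallback=" f " titles=" titles
        }'
}

# Example:
#   check_fallback < /boot/grub/grub.conf
```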

14. Remake Initial RAM Disk (One of Two)


Use the mkinitrd(8) utility to recreate the initial ram disk. The initial ram disk must be rebuilt with raid module support to ensure the system has the required drivers to boot from raided devices. The general form of the commands, run from /boot, is:

# mv initrd-`uname -r`.img initrd-`uname -r`.img.orig
# mkinitrd -v -f initrd-`uname -r`.img `uname -r`

The exact file names depend on your kernel version. On my system the commands were as follows:



[root@oel5-raid1-3 ~]# cd /boot
[root@oel5-raid1-3 boot]# mv initrd-2.6.32-300.10.1.el5uek.img initrd-2.6.32-300.10.1.el5uek.img.orig
[root@oel5-raid1-3 boot]# uname -r
2.6.32-300.10.1.el5uek
[root@oel5-raid1-3 boot]# mkinitrd -v -f initrd-2.6.32-300.10.1.el5uek.img 2.6.32-300.10.1.el5uek
Creating initramfs
Modulefile is /etc/modprobe.conf
Looking for deps of module ehci-hcd
Looking for deps of module ohci-hcd
Looking for deps of module uhci-hcd
Looking for deps of module ext3
Found RAID component md2
Looking for deps of module raid1
Looking for driver for device sdb2
Looking for deps of module scsi:t-0x00
Looking for deps of module pci:v00001000d00000030sv000015ADsd00001976bc01sc00i00: scsi_transport_spi mptbase mptscsih mptspi
Looking for deps of module scsi_transport_spi
Looking for deps of module mptbase
Looking for deps of module mptscsih: mptbase
Looking for deps of module mptspi: scsi_transport_spi mptbase mptscsih
Found RAID component md3
...



[root@oel5-raid1-3 boot]# ll initrd*
-rw------- 1 root root 3558458 Jan 25 15:25 initrd-2.6.18-308.el5.img
-rw------- 1 root root 3067614 Jan 26 00:16 initrd-2.6.32-300.10.1.el5uek.img
-rw------- 1 root root 3144291 Jan 25 15:24 initrd-2.6.32-300.10.1.el5uek.img.orig



Note: another mkinitrd will be required again later after /dev/sdaN partitions are added to the arrays.
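
To confirm that the rebuilt image really contains the raid1 module, you can list its contents. This is a sketch under two assumptions: that the initrd is a gzip-compressed cpio archive (as it is on EL5) and that the module file is named raid1.ko. The helper listing_has_module is an illustrative name; it only inspects a file listing passed on stdin.

```shell
# Sketch: succeed if a file listing (one path per line on stdin) contains
# the kernel module named in $1, e.g. "raid1" matches lib/raid1.ko.
listing_has_module() {
    grep -Eq "(^|/)$1\.ko\$"
}

# Example (assumes a gzip-compressed cpio initrd):
#   zcat /boot/initrd-$(uname -r).img | cpio -it --quiet | listing_has_module raid1 \
#       && echo "raid1 module present"
```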

15. Copy Contents of Non-RAID filesystems to RAID filesystems


If the raided filesystems were unmounted earlier, remount them as described in Step 10.
Copy the contents of non-raided filesystems (/boot on /dev/sda1, / on /dev/sda2) to their corresponding filesystems on raided devices (/boot.md1 on /dev/md1, /root.md2 on /dev/md2) e.g.:


[root@oel5-raid1-3 boot]# sync
[root@oel5-raid1-3 boot]# cp -dpRxu /boot/* /boot.md1
[root@oel5-raid1-3 boot]# sync
[root@oel5-raid1-3 boot]# cp -dpRxu / /root.md2



Note: there is no need to copy the contents of the swap device. The non-raided swap device (/dev/sda3) will be swapped-off on system shutdown and the raided swap device (/dev/md3) swapped-on on reboot.

16. Install/Reinstall GRUB


To cater for the situation where one or other raid disk member is either unavailable, unusable or missing, GRUB [Grand Unified Boot Loader] must be installed to the boot sector (MBR) of every raid disk member participating in an array i.e. /dev/sda, /dev/sdb. Use the grub(8) utility to install grub on the second grub disk (hd1) [/dev/sdb], currently the sole raid disk member e.g.:


[root@oel5-raid1-3 boot]# grub
Probing devices to guess BIOS drives. This may take a long time.


    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]
grub> root (hd1,0)
root (hd1,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd1)
setup (hd1)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd1)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd1) (hd1)1+15 p (hd1,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> root (hd0,0)
root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
grub> setup (hd0)
setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> quit
quit

The instructions above install Grub on both disks - this is important.


The reference to (hd0,0) in /boot/grub/grub.conf is a grub disk reference that refers to disk 1 partition 1, which in this instance is /dev/sda1, the partition that houses the non-raided /boot filesystem. Grub always references disks as (hdN) regardless of whether they are IDE or SCSI. At installation time, grub builds and stores a map of disk devices in file /boot/grub/device.map. As the system was initially installed with only one disk present (/dev/sda), the contents of /boot/grub/device.map appear as follows:

# cat /boot/grub/device.map
# this device map was generated by anaconda
(hd0)     /dev/sda

Had the /boot filesystem been installed on /dev/sda3, say, grub references in /boot/grub/grub.conf would have been (hd0,2). Grub disk and partition numbering starts from zero (0), whereas Linux disk device names start with 'a' e.g. /dev/hda (IDE), /dev/sda (SCSI) and partition numbering starts from 1 e.g. /dev/hda1, /dev/sda1.
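
The off-by-one between the two naming schemes is easy to get wrong, so here is a small sketch of the conversion. It makes a naive assumption: that disk letters map to grub disks in order (a to hd0, b to hd1, and so on). The authoritative mapping is whatever /boot/grub/device.map and the BIOS say, so treat the result as a hint only. linux_to_grub is a made-up helper name.

```shell
# Sketch: convert a Linux partition name such as /dev/sda3 to GRUB legacy
# notation such as (hd0,2), ASSUMING sda/hda -> hd0, sdb/hdb -> hd1, ...
linux_to_grub() {
    dev=${1#/dev/}                                 # e.g. sda3
    case "$dev" in [hs]d[a-z][0-9]*) ;; *) return 1 ;; esac
    letter=$(printf '%s' "$dev" | cut -c3)         # disk letter, e.g. a
    part=$(printf '%s' "$dev" | cut -c4-)          # partition number, e.g. 3
    case "$part" in ''|*[!0-9]*) return 1 ;; esac
    disk=$(( $(printf '%d' "'$letter") - 97 ))     # a -> 0, b -> 1, ...
    echo "(hd${disk},$(( part - 1 )))"
}

# Example:
#   linux_to_grub /dev/sdb1
```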

If there is any confusion regarding grub discovered devices, grub itself may be used to detect or list available devices e.g.:
# grub
Probing devices to guess BIOS drives. This may take a long time.

GNU GRUB version 0.97 (640K lower / 3072K upper memory)

[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename.]

grub> root (hd<tab key>
Possible disks are: hd0 hd1

17. Reboot the System (Degraded Array)


Reboot the system. As a precaution, be sure to have your operating system installation/rescue media on hand. During boot up, review console messages to determine which device is used to boot the system i.e. /dev/md1 {/dev/sdb} or fallback device /dev/sda1. All going well, the system will be using the raid devices, albeit in a degraded state i.e. all arrays still contain only one disk member (/dev/sdbN).


[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[0]
      104320 blocks [2/1] [U_]

md3 : active raid1 sdb3[0]
      1052160 blocks [2/1] [U_]

md2 : active raid1 sdb2[0]
      19808064 blocks [2/1] [U_]

unused devices: <none>
[root@oel5-raid1-3 ~]# mount | grep md
/dev/md2 on / type ext3 (rw)
/dev/md1 on /boot type ext3 (rw)
[root@oel5-raid1-3 ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md3                                partition       1052152 0       -1


Further verify that no /dev/sdaN partitions are used e.g.:


[root@oel5-raid1-3 ~]# mount | grep sda
[root@oel5-raid1-3 ~]#
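
The same checks can be scripted, which is handy if you repeat this procedure on several machines. A minimal sketch, assuming the classic "device on mountpoint type ..." format of mount output shown above; mounted_on is a hypothetical helper name.

```shell
# Sketch: print the device mounted on the mount point given as $1,
# reading `mount` output on stdin.
mounted_on() {
    awk -v mp="$1" '$2 == "on" && $3 == mp { print $1 }'
}

# Example:
#   [ "$(mount | mounted_on /)" = "/dev/md2" ] && echo "root is on the raid device"
```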


If you did not add a fallback title as described in step 13 and experienced booting issues, perform a linux rescue to restore/recover the system.

18. Modify Primary Disk Partitions to Type RAID


In preparation for adding /dev/sdaN partitions to their respective arrays, use the fdisk(8) utility to modify the primary disk partitions from type 83/82 (linux/swap) to fd (raid) e.g.:


[root@oel5-raid1-3 ~]# fdisk /dev/sda

The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        2479    19808145   83  Linux
/dev/sda3            2480        2610     1052257+  82  Linux swap / Solaris

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): fd
Changed system type of partition 3 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14        2479    19808145   fd  Linux raid autodetect
/dev/sda3            2480        2610     1052257+  fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.




[root@oel5-raid1-3 ~]# fdisk -l /dev/sda

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14        2479    19808145   fd  Linux raid autodetect
/dev/sda3            2480        2610     1052257+  fd  Linux raid autodetect


Use the partprobe(8) or sfdisk(8) utility to update the kernel with the partition type changes e.g.:

[root@oel5-raid1-3 ~]# partprobe /dev/sda

19. Add Primary Disk Partitions to RAID Arrays


Once the system has successfully booted using raid (i.e. the secondary disk), use the mdadm(8) utility to add the primary disk partitions to their respective arrays. All data on /dev/sdaN partitions will be destroyed in the process.


[root@oel5-raid1-3 ~]# mdadm --manage --add /dev/md2 /dev/sda2
mdadm: added /dev/sda2
[root@oel5-raid1-3 ~]# mdadm --manage --add /dev/md1 /dev/sda1
mdadm: added /dev/sda1
[root@oel5-raid1-3 ~]# mdadm --manage --add /dev/md3 /dev/sda3
mdadm: added /dev/sda3



[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[2] sdb1[0]
      104320 blocks [2/1] [U_]
        resync=DELAYED

md3 : active raid1 sda3[2] sdb3[0]
      1052160 blocks [2/1] [U_]
        resync=DELAYED

md2 : active raid1 sda2[2] sdb2[0]
      19808064 blocks [2/1] [U_]
      [=======>.............]  recovery = 35.7% (7073472/19808064) finish=1.1min speed=175929K/sec

unused devices: <none>



Depending on the size of partitions/disks used, data synchronisation between raid disk members may take a long time. Use the watch(1) command to monitor disk synchronisation progress e.g.:


[root@oel5-raid1-3 ~]# watch -n 5 cat /proc/mdstat


...
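
If you prefer a compact progress line over the full watch display, the recovery percentage can be pulled out of /proc/mdstat. This is a rough sketch under the assumption that the file looks like the output shown above; mdstat_progress is a hypothetical name, not an mdadm feature.

```shell
# Hypothetical helper: prints "<array> <percent>" for each array that is
# currently rebuilding, based on /proc/mdstat-style text.
# The optional argument points it at a saved copy of the file.
mdstat_progress() {
    awk '
        /^md[0-9]+ :/ { array = $1 }          # remember the current array
        /(recovery|resync) *=/ {
            # arrays waiting their turn (resync=DELAYED) have no percentage
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print array, $i
        }
    ' "${1:-/proc/mdstat}"
}

# Usage:
#   mdstat_progress
```

Arrays that are queued behind another resync are deliberately skipped, since they report no percentage yet.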

Once complete, /proc/mdstat should denote clean and consistent arrays each with two active, working members e.g.:


[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[1] sdb1[0]
      104320 blocks [2/2] [UU]

md3 : active raid1 sda3[1] sdb3[0]
      1052160 blocks [2/2] [UU]

md2 : active raid1 sda2[1] sdb2[0]
      19808064 blocks [2/2] [UU]

unused devices: <none>
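
The same check can be scripted, which is handy before continuing with the grub.conf changes. A minimal sketch, assuming the [UU] / [U_] member map format shown above; mdstat_degraded is an illustrative name only.

```shell
# Hypothetical helper: prints the name of every array whose member map
# (for example [U_]) still shows a missing member. Prints nothing and
# returns 0 when all arrays are complete; returns 1 otherwise.
mdstat_degraded() {
    awk '
        /^md[0-9]+ :/ { array = $1 }
        /blocks/ && /\[[U_]*_[U_]*\]/ { print array; bad = 1 }
        END { exit bad }
    ' "${1:-/proc/mdstat}"
}

# Usage:
#   mdstat_degraded && echo "all arrays complete"
```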


20. Modify grub.conf


Once all /dev/sdaN partitions are added as disk members of their respective arrays, modify the /boot/grub/grub.conf file. Substitute the previous reference to LABEL=/ in the second boot title with the raid device /dev/md2 e.g.:


[root@oel5-raid1-3 ~]# cat /boot/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda2
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
fallback=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd1,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=/dev/md2 rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=/dev/md2 rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img
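
Since every boot title should now pass the raid device as root, a quick scan of grub.conf can catch a kernel line that was missed. This is a sketch only, assuming / lives on /dev/md2 as in this article; grub_non_raid_kernels is a made-up name.

```shell
# Hypothetical helper: prints every "kernel" line in a GRUB legacy config
# that does not pass root=/dev/md2, prefixed with its line number.
# Commented-out lines are skipped because their first field is "#".
# Argument: config file (default /boot/grub/grub.conf).
grub_non_raid_kernels() {
    awk '$1 == "kernel" && $0 !~ /root=\/dev\/md2( |$)/ {
        sub(/^[ \t]+/, "")
        print NR ": " $0
    }' "${1:-/boot/grub/grub.conf}"
}

# Usage (no output means every title boots from /dev/md2):
#   grub_non_raid_kernels
```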



21. Remake Initial RAM Disk (Two of Two)


Use the mkinitrd(8) utility to recreate the initial ram disk (again) e.g.:


[root@oel5-raid1-3 ~]# cd /boot
[root@oel5-raid1-3 boot]# mv initrd-2.6.32-300.10.1.el5uek.img initrd-2.6.32-300.10.1.el5uek.img.orig
[root@oel5-raid1-3 boot]# uname -r
2.6.32-300.10.1.el5uek
[root@oel5-raid1-3 boot]# mkinitrd -v -f initrd-2.6.32-300.10.1.el5uek.img 2.6.32-300.10.1.el5uek
Creating initramfs
Modulefile is /etc/modprobe.conf
Looking for deps of module ehci-hcd
Looking for deps of module ohci-hcd
Looking for deps of module uhci-hcd
Looking for deps of module ext3
Found RAID component md2
Looking for deps of module raid1
Looking for driver for device sdb2
Looking for deps of module scsi:t-0x00
Looking for deps of module pci:v00001000d00000030sv000015ADsd00001976bc01sc00i00: scsi_transport_spi mptbase mptscsih mptspi
Looking for deps of module scsi_transport_spi
Looking for deps of module mptbase
Looking for deps of module mptscsih: mptbase
Looking for deps of module mptspi: scsi_transport_spi mptbase mptscsih
Found RAID component md3
...
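
The verbose output is long, so it may help to filter it down to the raid devices that mkinitrd picked up. A small sketch, assuming the "Found RAID component" wording shown above; initrd_raid_components is a hypothetical name, and whether mkinitrd writes these lines to stdout or stderr is not something I verified, hence the 2>&1 in the usage line.

```shell
# Hypothetical helper: reads `mkinitrd -v` output on stdin and prints the
# md devices it reported as RAID components, one per line, sorted.
initrd_raid_components() {
    awk '/^Found RAID component / { print $4 }' | sort -u
}

# Usage (run from /boot, same arguments as the mkinitrd call above):
#   mkinitrd -v -f initrd-$(uname -r).img $(uname -r) 2>&1 | initrd_raid_components
```

In this setup the list would be expected to include at least md2, the array holding /.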


At this point, the system is up and running using raid1 devices for the / and /boot filesystems and the swap device.

22. Testing


This section I did not test exactly as described - YMMV.

Before relying on the newly configured system, test the system for proper operation and increased availability.

Suggested testing includes:
  boot from alternate boot title (clean array)
  persistent mount on degraded array (/dev/sdb software failed)
  boot into degraded array (/dev/sdb software removed)
  boot into degraded array (/dev/sda physically removed)

Primary diagnostics to monitor during testing include:
  console messages
  dmesg
  /proc/mdstat
  mdadm --query --detail <md dev>


22.1 Test - boot from alternate boot title (clean array)


As part of configuring the system to use raid, you will have already tested booting the system from the second disk i.e. /dev/md1 {/dev/sdb1}. For this test, modify /boot/grub/grub.conf to boot the system using the first disk, /dev/md1 {/dev/sda1} i.e.:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda2
#          initrd /initrd-version.img
#boot=/dev/sda
default=1
fallback=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Enterprise Linux (2.6.18-92.el5)
        root (hd1,0)
        kernel /vmlinuz-2.6.18-92.el5 ro root=/dev/md2 3
        initrd /initrd-2.6.18-92.el5.img
title Enterprise Linux (2.6.18-92.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-92.el5 ro root=/dev/md2 3
        initrd /initrd-2.6.18-92.el5.img

Note the changes to the default and fallback parameter values.
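
To confirm which entries GRUB will try before rebooting, the two values can be read back from the file. A minimal sketch; grub_setting is an illustrative name, and it only understands simple key=value lines of the kind used in grub.conf here.

```shell
# Hypothetical helper: prints the value of a "key=value" setting such as
# default or fallback from a GRUB legacy config, ignoring comment lines.
# Prints nothing if the key is absent (for example, if it is misspelled).
grub_setting() {
    key=$1
    file=${2:-/boot/grub/grub.conf}
    sed -n "s/^[[:space:]]*${key}=\(.*\)\$/\1/p" "$file"
}

# Usage:
#   echo "default=$(grub_setting default) fallback=$(grub_setting fallback)"
```

An empty value for fallback would suggest the line is missing or spelled differently from what GRUB expects.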


22.2 Test - persistent mount on degraded array (/dev/sdb software failed)


Verify that the /, /boot filesystems and swap device remain active, usable and writable after failing the second disk member of each raid array e.g.:
# mdadm --manage --fail /dev/md1 /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md1
# mdadm --manage --fail /dev/md2 /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md2
# mdadm --manage --fail /dev/md3 /dev/sdb3
mdadm: set /dev/sdb3 faulty in /dev/md3

# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[2](F) sda1[1]
      1052160 blocks [2/1] [_U]
     
md3 : active raid1 sdb3[2](F) sda3[1]
      1052160 blocks [2/1] [_U]
     
md2 : active raid1 sdb2[2](F) sda2[1]
      5116608 blocks [2/1] [_U]
     
unused devices: <none>

# mdadm --query --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Tue Dec 30 21:46:44 2008
     Raid Level : raid1
     Array Size : 1052160 (1027.67 MiB 1077.41 MB)
  Used Dev Size : 1052160 (1027.67 MiB 1077.41 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Dec 31 08:57:33 2008
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

           UUID : a4d5007d:6974901a:637e5622:e5b514c9
         Events : 0.70

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8        1        1      active sync   /dev/sda1
       2       8       17        -      faulty spare   /dev/sdb1

# dmesg
...
raid1: Disk failure on sdb1, disabling device.
        Operation continuing on 1 devices
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sdb1
 disk 1, wo:0, o:1, dev:sda1
RAID1 conf printout:
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sda1
raid1: Disk failure on sdb2, disabling device.
        Operation continuing on 1 devices
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sdb2
 disk 1, wo:0, o:1, dev:sda2
RAID1 conf printout:
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sda2
raid1: Disk failure on sdb3, disabling device.
        Operation continuing on 1 devices
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sdb3
 disk 1, wo:0, o:1, dev:sda3
RAID1 conf printout:
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sda3

# mount | grep md
/dev/md2 on / type ext3 (rw)
/dev/md1 on /boot type ext3 (rw)

# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md3                                partition       1052152 0       -1
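
The mount output shows the filesystems as rw, but an actual write is a stronger test. The following sketch creates and removes a scratch file in each filesystem; is_writable and the .raidtest prefix are made-up names, and the sketch assumes mktemp is available.

```shell
# Hypothetical helper: returns 0 if a temporary file can be created,
# written, and removed in the given directory, non-zero otherwise.
is_writable() {
    tmp=$(mktemp "$1/.raidtest.XXXXXX" 2>/dev/null) || return 1
    echo test > "$tmp" || { rm -f "$tmp"; return 1; }
    rm -f "$tmp"
}

# Usage:
#   for dir in / /boot; do
#       is_writable "$dir" && echo "$dir writable" || echo "$dir NOT writable"
#   done
```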


22.3 Test - boot into degraded array (/dev/sdb software removed)


Having marked the second disk's raid members (/dev/sdbN) as failed, remove them from their arrays in software, then test that the system boots successfully e.g.:

# mdadm --manage --remove /dev/md1 /dev/sdb1
mdadm: hot removed /dev/sdb1
# mdadm --manage --remove /dev/md2 /dev/sdb2
mdadm: hot removed /dev/sdb2
# mdadm --manage --remove /dev/md3 /dev/sdb3
mdadm: hot removed /dev/sdb3

# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[1]
      1052160 blocks [2/1] [_U]
     
md3 : active raid1 sda3[1]
      1052160 blocks [2/1] [_U]
     
md2 : active raid1 sda2[1]
      5116608 blocks [2/1] [_U]
     
unused devices: <none>

# mdadm --query --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Tue Dec 30 21:46:44 2008
     Raid Level : raid1
     Array Size : 1052160 (1027.67 MiB 1077.41 MB)
  Used Dev Size : 1052160 (1027.67 MiB 1077.41 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Dec 31 09:06:21 2008
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : a4d5007d:6974901a:637e5622:e5b514c9
         Events : 0.72

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8        1        1      active sync   /dev/sda1

# dmesg
...
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdb2>
md: export_rdev(sdb2)
md: unbind<sdb3>
md: export_rdev(sdb3)

# mount | grep md
/dev/md2 on / type ext3 (rw)
/dev/md1 on /boot type ext3 (rw)

# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md3                                partition       1052152 0       -1

# shutdown -r now
...

After the reboot, add the removed second-disk partitions back to their arrays e.g.:
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[1]
      1052160 blocks [2/1] [_U]
     
md3 : active raid1 sda3[1]
      1052160 blocks [2/1] [_U]
     
md2 : active raid1 sda2[1]
      5116608 blocks [2/1] [_U]
     
unused devices: <none>

# mdadm --manage --add /dev/md1 /dev/sdb1
mdadm: re-added /dev/sdb1
# mdadm --manage --add /dev/md2 /dev/sdb2
mdadm: re-added /dev/sdb2
# mdadm --manage --add /dev/md3 /dev/sdb3
mdadm: re-added /dev/sdb3

# dmesg
...
md: bind<sdb1>
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:1, dev:sdb1
 disk 1, wo:0, o:1, dev:sda1
md: syncing RAID array md1
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 1052160 blocks.
md: bind<sdb2>
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:1, dev:sdb2
 disk 1, wo:0, o:1, dev:sda2
md: delaying resync of md2 until md1 has finished resync (they share one or more physical units)
md: bind<sdb3>
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:1, dev:sdb3
 disk 1, wo:0, o:1, dev:sda3
md: delaying resync of md3 until md1 has finished resync (they share one or more physical units)
md: delaying resync of md2 until md1 has finished resync (they share one or more physical units)

# watch -n 15 cat /proc/mdstat


22.4 Test - boot into degraded array (/dev/sda physically removed)


Similar to tests 22.2 and 22.3, test for ongoing system operation, then system boot, after physical removal of one or the other raid member disk (each in turn) e.g.:
# mdadm --manage --fail /dev/md1 /dev/sda1
mdadm: set /dev/sda1 faulty in /dev/md1
# mdadm --manage --fail /dev/md2 /dev/sda2
mdadm: set /dev/sda2 faulty in /dev/md2
# mdadm --manage --fail /dev/md3 /dev/sda3
mdadm: set /dev/sda3 faulty in /dev/md3

# mdadm --manage --remove /dev/md1 /dev/sda1
mdadm: hot removed /dev/sda1
# mdadm --manage --remove /dev/md2 /dev/sda2
mdadm: hot removed /dev/sda2
# mdadm --manage --remove /dev/md3 /dev/sda3
mdadm: hot removed /dev/sda3


Follow Note.603868.1 to dynamically remove (hot unplug) disk devices from the system. Alternatively, shut down the system and physically remove device /dev/sda from the system before rebooting. On boot, dynamically add /dev/sda back as a raid disk member, then repeat the same test but physically remove the second disk member, /dev/sdb. This test not only validates the failback boot title, but also emulates online replacement of a failed hard disk.
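
Before pulling a disk, it is worth making sure the disk that stays behind is a healthy member of every array. A rough sketch, assuming the /proc/mdstat format shown in the earlier tests; arrays_without_member is a hypothetical name. It only looks at the member list, so it does not notice a member that is still resynchronising: run it once the arrays show [UU].

```shell
# Hypothetical helper: given a disk name (e.g. sdb) and /proc/mdstat-style
# text, prints every array that has no healthy member on that disk.
# A member carrying a flag such as (F) is treated as unhealthy.
arrays_without_member() {
    disk=$1
    awk -v disk="$disk" '
        /^md[0-9]+ :/ {
            ok = 0
            for (i = 3; i <= NF; i++)
                if ($i ~ ("^" disk "[0-9]+\\[[0-9]+\\]$")) ok = 1
            if (!ok) print $1
        }
    ' "${2:-/proc/mdstat}"
}

# Usage (before removing /dev/sda, check the surviving disk sdb;
# no output means every array has a healthy sdb member):
#   arrays_without_member sdb
```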

Once satisfied, deploy the system for production use.

