Friday, July 26, 2013

TCP.VALIDNODE_CHECKING

Well, this is embarrassing :)

I learnt the hard way how these parameters actually work:

In my SQLNET.ORA, I had

TCP.VALIDNODE_CHECKING=YES
TCP.EXCLUDED_NODES=SQLSERVER1

Looks like this file gets read by the LISTENER and NOT the database! Argh!

When I restarted the listener, the customer's SQL Server host was blocked from accessing any Oracle databases on my db host. Had to troubleshoot using listener tracing, and figured out that the listener was the culprit.

So, I simply removed those lines and restarted the listener - problem gone away.

Note that the error message on the client-side is misleading, since it says that the service is not found:

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Of course, the service was registered and available the whole time; the listener was simply rejecting the excluded node.
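
For the record, here's a minimal sketch of how I'd set it up if I actually wanted the listener to reject that host. The host name is the one from the incident; the parenthesised list syntax and the need for a full listener restart (rather than a reload) are my assumptions, so check them against your version:

# sqlnet.ora in the LISTENER's network/admin directory, not (just) the client's
TCP.VALIDNODE_CHECKING = YES
TCP.EXCLUDED_NODES = (SQLSERVER1)

Then bounce the listener (lsnrctl stop / lsnrctl start) for the change to take effect.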

Wednesday, July 24, 2013

SQL*Net more data to client

I was performance tuning a customer's production database, so naturally I ran AWR Reports.

Whenever I ran AWR reports I kept getting the event "SQL*Net more data to client" as the second highest wait event (after CPU time). When I googled it, I came across this post on AskTom:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:951335700013

In it, SDU (Session Data Unit) and MTU (Maximum Transmission Unit) are mentioned. So what's the relationship between SDU and MTU?

If you look through the articles on Google, it would seem that they've all just plagiarised each other, with the wrong information. They all incorrectly say SDU should be a multiple of MTU. This lone post on the same AskTom article has the correct info:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:951335700013#67529392482665

The author says that SDU should be a multiple of MSS (Maximum Segment Size of the network protocol in use), and NOT the MTU.

I had to search for a more authoritative source.

I looked towards Oracle Support as the holy grail of Oracle knowledge. There I found document ID 274483.1, "The relationship between MTU (Maximum Transmission Unit) and SDU (Session Data Unit)", which says:

"The principle is that the SDU value be a multiple of the MTU." 

Now, this may be the source of all the wrong information on the internet.

This document attempts to summarize another doc, "SQL*Net Packet Sizes (SDU & TDU Parameters)", Doc ID 44694.1. The second document says:

"...set the SDU size as a multiple of the MSS."
The reason MSS is used and not MTU is that the MTU figure includes the TCP and IP headers, which reduce the amount of data Oracle NS (Network Substrate) can actually place in each TCP packet. It is the MSS that determines how much data Oracle can transmit via the lower network protocols.

To calculate the MSS:

MSS = MTU - TCP header size - IP header size

For bog-standard TCP over Ethernet:

MTU = 1500 bytes 
TCP = 20 bytes
IP = 20 bytes

Thus, the MSS for TCP/IP over Ethernet is 1460.

I confirmed that the customer's network has an MTU of 1500.

Given that for Oracle 10g the maximum SDU size is 32767, what's the optimal SDU with an MSS of 1460? Simple math: 32767 / 1460 = 22.4, so the largest whole multiple that fits is 22 x 1460 = 32120.

That's what we use for the SDU size for a simple Ethernet network.
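
If you want to script the check, here's a quick shell (bash) sketch of the same arithmetic; nothing Oracle-specific, just integer division:

# optimal SDU = largest multiple of the MSS that fits under the 32767 cap (10g)
MTU=1500; TCP_HDR=20; IP_HDR=20
MSS=$(( MTU - TCP_HDR - IP_HDR ))     # 1460 for plain Ethernet
SDU=$(( 32767 / MSS * MSS ))          # 22 * 1460 = 32120
echo "MSS=$MSS, optimal SDU=$SDU"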

The results are quite impressive:

Before:

Top 5 Timed Events                                            Avg %Total
~~~~~~~~~~~~~~~~~~                                           wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                          1,203               70.5
db file sequential read             140,179         266      2   15.6 User I/O
SQL*Net more data to client       2,669,360         153      0    8.9 Network
control file parallel write           3,755         115     31    6.7 System I/O
db file scattered read               44,471         101      2    5.9 User I/O


After:

Top 5 Timed Events                                            Avg %Total
~~~~~~~~~~~~~~~~~~                                           wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                            636               72.4
db file sequential read              81,221         160      2   18.3 User I/O
control file parallel write           3,794         117     31   13.3 System I/O
db file scattered read               22,473          55      2    6.2 User I/O
log file parallel write               3,648          51     14    5.8 System I/O


The "sql net more data to client" wait is eliminated! :) 

At first I thought there was something wrong with the AWR Report, maybe the wrong snapshot was chosen, so I double-checked, and ran it for different days. But, it's really gone! :)

Tuesday, July 23, 2013

Database Network Performance Tuning


I recently looked at the database performance of a production database. This post will explain the network performance parameters which I implemented as a result of this work.

Here's the best SQLNET.ORA (server) I could come up with for our OLTP application:

SQLNET.AUTHENTICATION_SERVICES=(NTS)
NAMES.DIRECTORY_PATH=(TNSNAMES)
DISABLE_OOB=ON
TCP.NO_DELAY=YES
DEFAULT_SDU_SIZE=32120
USE_DEDICATED_SERVER=ON
SQLNET.EXPIRE_TIME=10

And here's the relevant part of the LISTENER.ORA:

LISTENER =
  (DESCRIPTION =
    (SDU = 32120) (ADDRESS = (PROTOCOL = TCP)(HOST = SERVER)(PORT = 1521)(SEND_BUF_SIZE = 65535)(RECV_BUF_SIZE = 65535))
  )



Now, these settings are specifically chosen for an OLTP application, where response time is more important than throughput. I'll explain the settings and values.

SQLNET.AUTHENTICATION_SERVICES=(NTS)


Not much to say here except on the server I always use NTS because it's required by ASM, and on the Citrix server I set this variable to NONE.

NAMES.DIRECTORY_PATH=(TNSNAMES)


We use TNSNames for service resolution, so that's what we put.

DISABLE_OOB=ON


Disables the Out-of-Band break protocol (the mechanism behind Ctrl-C interrupts). The application doesn't use OOB, so why carry the overhead? Disable it.

Note: do not also include the parameter BREAK_POLL_SKIP. BREAK_POLL_SKIP only controls how often the Oracle client polls for a Ctrl-C, while DISABLE_OOB disables break handling completely. If both are used, I am not sure which takes precedence.

USE_DEDICATED_SERVER=ON


I use Dedicated Server mode for all production databases to ensure maximum performance. We have enough RAM, so why not?

SQLNET.EXPIRE_TIME=10



Sends a probe every 10 minutes to detect dead connections (e.g. the application crashed) so they can be cleaned up.

TCP.NO_DELAY=YES


This is the most important parameter for OLTP apps. It tells the Oracle network software "stop messing about with buffers, just send the data back to the client ASAP!". This gave a substantial performance improvement.

Sybase, in its ASE documentation, recommends the same setting for ASE. The relevant document can be found here:


Sybase puts it eloquently:

The tcp no delay parameter controls TCP (Transmission Control Protocol) packet batching. The default value is 1, which means that TCP packets are not batched. 
TCP normally batches small logical packets into single larger physical packets (by briefly delaying packets) to fill physical network frames with as much data as possible. This is intended to improve network throughput in terminal emulation environments where there are mostly keystrokes being sent across the network. 
However, applications that use small TDS (Tabular Data Stream) packets may benefit from disabling TCP packet batching.

Yes, I realise it's a Sybase doc and not an Oracle one, but the concept is the same, and their explanation is the best I could find.

The antithesis of this parameter is this pair:

RECV_BUF_SIZE
SEND_BUF_SIZE

These two parameters size the data buffers for the packets going back and forth, which is perfect for DSS (reporting) type applications. Put them in your SQLNET.ORA if you want your OLTP app to run slowly!

If both sets of parameters are used, I'm not sure which takes precedence.

SQLNET.ORA: DEFAULT_SDU_SIZE=32120 & SDU=32120 in Listener.ora


This is extremely important. I'll explain how I arrived at this in a separate post. For the new SDU size to take effect, you must update the SQLNET.ORA (client and server) and the LISTENER.ORA (as shown above), and restart the listener.
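
On the client side, the SDU can also be set per connect descriptor in TNSNAMES.ORA. Here's a sketch with made-up alias, host and service names:

ORCL =
  (DESCRIPTION =
    (SDU = 32120)
    (ADDRESS = (PROTOCOL = TCP)(HOST = SERVER)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = ORCL))
  )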



Tuesday, March 12, 2013

Very useful RAR archive script


:: Assumes %date% is in DD/MM/YYYY format and %time% is HH:MM:SS.xx
set year=%date:~6,4%
set yr=%date:~8,2%
set month=%date:~3,2%
set day=%date:~0,2%
set hour=%time:~0,2%
:: pad single-digit hours (which have a leading space) with a zero
set hour=%hour: =0%
set min=%time:~3,2%
set sec=%time:~6,2%

:: add everything under E:\Backups\DEV to a timestamped archive,
:: recursing into subfolders (-r) with maximum compression (-m5)
rar a -r -m5 DEV[%year%%month%%day%_%hour%_%min%].rar E:\Backups\DEV\*.*
:: delete any .rar in E:\Backups older than 14 days
forfiles /p "E:\Backups" /m "*.rar" /d -14 /c "cmd /c del @path"

Monday, February 25, 2013

Rebuild Indexes

Here's a neat script I wrote for re-building indexes. One of the apps I look after requires its indexes to be stored in a tablespace called INDEXES, and this script does that programmatically.

This script can be easily modified to re-build all indexes, or all non-SYS/SYSTEM indexes, or just invalid indexes (see the sketch after the script). It can also be wrapped in a package, and then called by a job.

The cool thing about this script is the way it uses a BULK COLLECT :).


DECLARE
   sql_stmt                      VARCHAR2 ( 2000 ) DEFAULT NULL;

   TYPE sql_stmt_array_type IS TABLE OF sql_stmt%TYPE;

   sql_stmt_array                sql_stmt_array_type DEFAULT NULL;

   CURSOR cursor1
   IS
      SELECT 'ALTER INDEX ' || a.owner || '.' || a.index_name
             || ' REBUILD TABLESPACE INDEXES' sql1
        FROM dba_indexes a, dba_objects b
       WHERE a.owner IN
                ( 'SCHEMA_OWNER1'
                 ,'SCHEMA_OWNER2')
         AND a.index_type = 'NORMAL'
         AND a.TEMPORARY LIKE 'N'
         AND a.owner = b.owner
         AND a.index_name = b.object_name
         AND b.object_type = 'INDEX'
         AND a.tablespace_name != 'INDEXES';
BEGIN
   OPEN cursor1;

   FETCH cursor1
   BULK COLLECT INTO sql_stmt_array;

   FOR i IN 1 .. sql_stmt_array.COUNT
   LOOP
      sql_stmt := sql_stmt_array ( i );

      EXECUTE IMMEDIATE sql_stmt;
   END LOOP;

   CLOSE cursor1;
END;
/
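
As an example of the "just invalid indexes" variant mentioned above, the cursor query could be swapped for something like this (a sketch only; it rebuilds in place rather than relocating to the INDEXES tablespace):

      SELECT 'ALTER INDEX ' || owner || '.' || index_name || ' REBUILD' sql1
        FROM dba_indexes
       WHERE status = 'UNUSABLE';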

Sunday, January 27, 2013

Performance Tuning & Recoverability Notes


  • Set the SQLNET.ORA (as described) on the server and client 
  • Set the LISTENER send/receive buffers to the max size i.e. 65535 
  • alter system set SHARED_SERVERS=0 scope=both; -- (do NOT use shared servers if you want max performance) 
  • alter system reset DISPATCHERS scope=spfile sid='*'; -- (needed to disable use of shared servers; restart db instance for this to take effect) 
  • alter system set CURSOR_SHARING='FORCE' scope=both; -- (force cursor re-use wherever possible) 
  • For production systems, take no chances with block corruptions, enable the following: 
    • alter system set db_block_checksum=FULL scope=both;  
    • alter system set db_block_checking=FULL scope=both;  
  • Check for bad blocks by using RMAN (see the shell sketch after this list): 
    • backup check logical validate database; -- doesn't actually do a backup 
    • select * from v$DATABASE_BLOCK_CORRUPTION; 
  • Store the database files separate from the redo logs. If possible, store the archive logs separate from the redo logs as well. Consider using a SSD volume for redo logs. If an SSD volume isn't forthcoming, use a RAID 10 volume with a thin-stripe of 128k. 
  • Store the database files on an ASM volume. Create the ASM volume from at least two RAID 1 volumes. ASM will create a 1 MB coarse stripe across all volumes, giving excellent storage performance. 
  • For max recoverability: 
    • store the control files, online redo logs, archive logs, incremental backups and a database image copy on an NTFS volume (with a 64k cluster size for max performance). 
    • Follow Oracle Suggested Backup strategy. 
  • Guy Harrison also says that the TEMP tablespace could be located away from the other files since it is used for sorting, etc. which could impact performance. In reality, this is probably not necessary. 
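
Here's a rough shell wrapper for the corruption checks in the list above, assuming a Linux/Unix host and OS authentication (adapt for Windows):

# logical check of every block, without writing an actual backup
rman target / <<'EOF'
backup check logical validate database;
EOF

# anything found is recorded in this view
sqlplus -s / as sysdba <<'EOF'
select * from v$database_block_corruption;
exit
EOF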

RMAN - solid backup


This isn't the Oracle Suggested Backup script; it's my own script for a good, solid RMAN backup. It will need slight tweaking if you decide to use it.




backup check logical database; 
sql 'alter system archive log current'; 
backup archivelog all delete all input; 
 allocate channel for maintenance device type disk; 

RUN {  
crosscheck backup of database; 
crosscheck backup of controlfile; 
crosscheck archivelog all;} 

delete force noprompt expired backup of database; 
delete force noprompt expired backup of controlfile; 
delete force noprompt expired archivelog all; 
delete force noprompt obsolete redundancy 5; 

sql "ALTER DATABASE BACKUP CONTROLFILE TO TRACE"; 
sql "ALTER DATABASE BACKUP CONTROLFILE TO ''D:\FRA\SBY01\SBY01_CONTROLFILE_BACKUP.CTL'' REUSE"; 
sql "CREATE PFILE=''C:\oracle\product\10.2.0\db_1\admin\SBY01\pfile\INITSBY01.INI'' FROM  SPFILE"; 

exit; 
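
To run it, I save the commands to a file and point RMAN at it, along these lines (file names are made up):

rman target / cmdfile=solid_backup.rman log=solid_backup.log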

How to Configure RAID 1 with LVM Volumes

This blog will describe how to enable RAID 1 using mdraid on volumes which include LVM volumes.

When Oracle Enterprise Linux is installed, the default configuration of the disks includes an LVM volume. The boot volume, /boot, remains a normal ext3 file system.

1. Initial Config


During installation, here is the default configuration:




Which gives us this:

[root@oel5-raid1-3 ~]# uname -r
2.6.32-300.10.1.el5uek

[root@oel5-raid1-3 ~]# cat /etc/enterprise-release
Enterprise Linux Enterprise Linux Server release 5.8 (Carthage)


[root@oel5-raid1-3 ~]# cat /etc/fstab
/dev/VolGroup00/LogVol00 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/dev/VolGroup00/LogVol01 swap                    swap    defaults        0 0



[root@oel5-raid1-3 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      7.7G  2.4G  5.0G  32% /
/dev/sda1              99M   24M   71M  25% /boot
tmpfs                 495M     0  495M   0% /dev/shm



[root@oel5-raid1-3 ~]# fdisk -l

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1305    10377990   8e  Linux LVM

Disk /dev/dm-0: 8489 MB, 8489271296 bytes
255 heads, 63 sectors/track, 1032 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/dm-0 doesn't contain a valid partition table

Disk /dev/dm-1: 2113 MB, 2113929216 bytes
255 heads, 63 sectors/track, 257 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/dm-1 doesn't contain a valid partition table



[root@oel5-raid1-3 ~]# mount | grep ext3
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)

[root@oel5-raid1-3 ~]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda2
  VG Name               VolGroup00
  PV Size               9.90 GB / not usable 22.76 MB
  Allocatable           yes (but full)
  PE Size (KByte)       32768
  Total PE              316
  Free PE               0
  Allocated PE          316
  PV UUID               5aMSJb-OALl-wztg-107U-bizd-wLB6-G25RcW

[root@oel5-raid1-3 ~]# vgdisplay
  --- Volume group ---
  VG Name               VolGroup00
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  3
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               9.88 GB
  PE Size               32.00 MB
  Total PE              316
  Alloc PE / Size       316 / 9.88 GB
  Free  PE / Size       0 / 0
  VG UUID               dnYd54-w9ZG-METW-V1lP-WizL-wj7A-ZxJ6SO

[root@oel5-raid1-3 ~]# lvdisplay
  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol00
  VG Name                VolGroup00
  LV UUID                6ric1n-Dtgg-uK4s-09KF-EUJv-zm3B-ok6UHn
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                7.91 GB
  Current LE             253
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0

  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol01
  VG Name                VolGroup00
  LV UUID                wpMR3Z-7CMn-c6Q9-GF6h-xh7g-dULW-AAVP5U
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                1.97 GB
  Current LE             63
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

[root@oel5-raid1-3 ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/mapper/VolGroup00-LogVol01         partition       2064376 0       -1

Basically: /boot is normal ext3, while / is an LVM volume and the swap is also on the LVM.


2. Add Second HDD


Add a second hard disk to the system. 

[root@oel5-raid1-3 ~]# fdisk -l /dev/sdb

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table

3. Partition the second HDD


The second hard disk must have the same partition layout as the first. The easy way to do this is to use the sfdisk utility.

[root@oel5-raid1-3 ~]# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 1305 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/sdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *        63    208844     208782  83  Linux
/dev/sdb2        208845  20964824   20755980  8e  Linux LVM
/dev/sdb3             0         -          0   0  Empty
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)

4. Modify the Secondary Disk partitions to type RAID


Use the fdisk utility to modify the partitions on the second disk to type fd (RAID):


[root@oel5-raid1-3 ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 1305.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   83  Linux
/dev/sdb2              14        1305    10377990   8e  Linux LVM

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   fd  Linux raid autodetect
/dev/sdb2              14        1305    10377990   fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Use the partprobe utility to update the kernel with the partition type changes:


[root@oel5-raid1-3 ~]# partprobe /dev/sdb


Verify creation of the new partitions:

[root@oel5-raid1-3 ~]# cat /proc/partitions
major minor  #blocks  name

   8        0   10485760 sda
   8        1     104391 sda1
   8        2   10377990 sda2
   8       16   10485760 sdb
   8       17     104391 sdb1
   8       18   10377990 sdb2
 253        0    8290304 dm-0
 253        1    2064384 dm-1

5. Create RAID 1 Arrays on the Second Disk


Let's now create the RAID 1 devices on the second disk:

[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities :
unused devices: <none>
[root@oel5-raid1-3 ~]# mdadm --create /dev/md1 --auto=yes --level=raid1 --raid-devices=2 missing /dev/sdb1
mdadm: array /dev/md1 started.
[root@oel5-raid1-3 ~]# mdadm --create /dev/md2 --auto=yes --level=raid1 --raid-devices=2 missing /dev/sdb2
mdadm: array /dev/md2 started.
[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[1]
      10377920 blocks [2/1] [_U]

md1 : active raid1 sdb1[1]
      104320 blocks [2/1] [_U]

unused devices: <none>

Note: missing is a placeholder keyword for the /dev/sda partitions, which we will add later.

Format md1 as ext3:

[root@oel5-raid1-3 ~]# mkfs.ext3 /dev/md1
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
26104 inodes, 104320 blocks
5216 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
13 block groups
8192 blocks per group, 8192 fragments per group
2008 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.


6. Move the data from the LVM


Now, we move the data from /dev/sda2 to /dev/md2

[root@oel5-raid1-3 ~]# pvcreate /dev/md2
  Writing physical volume data to disk "/dev/md2"
  Physical volume "/dev/md2" successfully created
[root@oel5-raid1-3 ~]# vgextend VolGroup00 /dev/md2
  Volume group "VolGroup00" successfully extended

This command starts the volume migration:

[root@oel5-raid1-3 ~]# pvmove -i 2 /dev/sda2 /dev/md2

This can take a while.

Now, we remove /dev/sda2:

[root@oel5-raid1-3 ~]# vgreduce VolGroup00 /dev/sda2
  Removed "/dev/sda2" from volume group "VolGroup00"
[root@oel5-raid1-3 ~]# pvremove /dev/sda2
  Labels on physical volume "/dev/sda2" successfully wiped

Now, convert /dev/sda2 to a RAID partition type:

[root@oel5-raid1-3 ~]# fdisk /dev/sda

The number of cylinders for this disk is set to 1305.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1305    10377990   fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

Now, add it back as a RAID member of md2:

 mdadm --add /dev/md2 /dev/sda2

And monitor its progress using:

watch -n 2 cat /proc/mdstat

Every 2.0s: cat /proc/mdstat                            Sat Jan 26 20:04:01 2013

Personalities : [raid1]
md2 : active raid1 sda2[2] sdb2[1]
      10377920 blocks [2/1] [_U]
      [======>..............]  recovery = 31.7% (3292288/10377920) finish=0.6min
 speed=193664K/sec

md1 : active raid1 sdb1[1]
      104320 blocks [2/1] [_U]

unused devices: <none>


Press Ctrl-C to exit watch once the re-build is done.


[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda2[0] sdb2[1]
      10377920 blocks [2/2] [UU]

md1 : active raid1 sdb1[1]
      104320 blocks [2/1] [_U]

unused devices: <none>



7. Update fstab


The default /etc/fstab is:

/dev/VolGroup00/LogVol00 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/dev/VolGroup00/LogVol01 swap                    swap    defaults        0 0

We need to change it to this:

/dev/VolGroup00/LogVol00 /                       ext3    defaults        1 1
/dev/md1                /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/dev/VolGroup00/LogVol01 swap                    swap    defaults        0 0

The change is this line

/dev/md1                /boot                   ext3    defaults        1 2

Replace "LABEL=/boot" with "/dev/md1"

8. Update grub.conf


The default /boot/grub/grub.conf is:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=/dev/VolGroup00/LogVol00 rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img
title Oracle Linux Server-base (2.6.18-308.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-308.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet numa=off
        initrd /initrd-2.6.18-308.el5.img

And we change it to:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
fallback=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title HDD1 (2.6.32-300.10.1.el5uek)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=/dev/VolGroup00/LogVol00 rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img
title HDD2 (2.6.32-300.10.1.el5uek)
        root (hd1,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=/dev/VolGroup00/LogVol00 rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img

The key changes are:

(i) the addition of the fallback parameter
(ii) the addition of the second title, and its respective attributes.

I also updated the titles to reflect the device from which the system is being booted.

The default parameter is important too. It indicates the title from which the system will boot by default. If Grub cannot find a valid /boot partition (e.g. in case of disk failure), then Grub will attempt to boot from the title indicated by fallback.

9. Re-create Initial RAMDisk:


[root@oel5-raid1-3 ~]# cd /boot
[root@oel5-raid1-3 boot]# ll initrd*
-rw------- 1 root root 4372497 Jan 26 19:21 initrd-2.6.18-308.el5.img
-rw------- 1 root root 3934645 Jan 26 19:21 initrd-2.6.32-300.10.1.el5uek.img
[root@oel5-raid1-3 boot]# uname -r
2.6.32-300.10.1.el5uek
[root@oel5-raid1-3 boot]# mkinitrd -f -v initrd-2.6.32-300.10.1.el5uek.img 2.6.32-300.10.1.el5uek

The mkinitrd command takes the format of:

mkinitrd -v -f initrd-`uname -r`.img `uname -r`

That's why it's important to grab uname -r.

10. Copy /boot

[root@oel5-raid1-3 boot]# mkdir /mnt/boot.md1
[root@oel5-raid1-3 boot]# mount /dev/md1 /mnt/boot.md1
[root@oel5-raid1-3 boot]# cp -dpRxu /boot/* /mnt/boot.md1

This stage has to be done before the next (installing grub on both disks).

11. Install Grub on BOTH disks

It is very important to install Grub on BOTH disks!

[root@oel5-raid1-3 boot]# grub
Probing devices to guess BIOS drives. This may take a long time.


    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]
grub> root (hd0,0)
root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
grub> setup (hd0)
setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> root (hd1,0)
root (hd1,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd1)
setup (hd1)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd1)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd1) (hd1)1+15 p (hd1,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> quit
quit

12. Reboot


[root@oel5-raid1-3 boot]# reboot

13. Add /dev/sda to /dev/md1


[root@oel5-raid1-3 ~]# mount | grep ext3
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
/dev/md1 on /boot type ext3 (rw)

So /dev/sda1 isn't mounted on /boot (/dev/md1 is)...

[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1]
      104320 blocks [2/1] [_U]

md2 : active raid1 sdb2[1] sda2[0]
      10377920 blocks [2/2] [UU]

unused devices: <none>

And not used by /dev/md1...

So...

[root@oel5-raid1-3 ~]# fdisk /dev/sda

The number of cylinders for this disk is set to 1305.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1305    10377990   fd  Linux raid autodetect

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14        1305    10377990   fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

[Note the change of partition type, from 'Linux' to 'Linux raid autodetect']

[root@oel5-raid1-3 ~]# partprobe /dev/sda

[root@oel5-raid1-3 ~]# mdadm --manage --add /dev/md1 /dev/sda1
mdadm: added /dev/sda1
[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]

md2 : active raid1 sdb2[1] sda2[0]
      10377920 blocks [2/2] [UU]

unused devices: <none>

14. Recreate the initial ram disk:


[root@oel5-raid1-3 ~]# cd /boot
[root@oel5-raid1-3 boot]# uname -r
2.6.32-300.10.1.el5uek
[root@oel5-raid1-3 boot]# mkinitrd -v -f initrd-2.6.32-300.10.1.el5uek.img 2.6.32-300.10.1.el5uek
Creating initramfs

15. Testing


To simulate the loss of sdb, we fail its partitions in software:

[root@oel5-raid1-3 ~]# mdadm --manage /dev/md1 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md1
[root@oel5-raid1-3 ~]# mdadm --manage /dev/md2 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md2
[root@oel5-raid1-3 ~]# mdadm --manage /dev/md1 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
[root@oel5-raid1-3 ~]# mdadm --manage /dev/md2 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2

Shut down the server and replace /dev/sdb. Start it up, and check the status of the RAID:

[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]

md2 : active raid1 sda2[0]
      10377920 blocks [2/1] [U_]

unused devices: <none>

And what's the status of the new hard disk?

[root@oel5-raid1-3 ~]# fdisk -l

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14        1305    10377990   fd  Linux raid autodetect

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/md2: 10.6 GB, 10626990080 bytes
2 heads, 4 sectors/track, 2594480 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

...



Copy the partition layout to the new disk:

[root@oel5-raid1-3 ~]# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 1305 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/sdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *        63    208844     208782  fd  Linux raid autodetect
/dev/sdb2        208845  20964824   20755980  fd  Linux raid autodetect
/dev/sdb3             0         -          0   0  Empty
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)


Clear any remnants of a previous RAID device on the new disk:

[root@oel5-raid1-3 ~]# mdadm --zero-superblock /dev/sdb1
mdadm: Unrecognised md component device - /dev/sdb1
[root@oel5-raid1-3 ~]# mdadm --zero-superblock /dev/sdb2
mdadm: Unrecognised md component device - /dev/sdb2

OK, now add the partitions to the respective md devices:

[root@oel5-raid1-3 ~]# mdadm --add /dev/md1 /dev/sdb1
mdadm: added /dev/sdb1
[root@oel5-raid1-3 ~]# mdadm --add /dev/md2 /dev/sdb2
mdadm: added /dev/sdb2

watch -n 2 cat /proc/mdstat

Wait for the re-synchronisation to complete, then press Ctrl-C to exit.

Re-install grub on BOTH hard drives:

[root@oel5-raid1-3 ~]# grub
Probing devices to guess BIOS drives. This may take a long time.


    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]
grub> root (hd0,0)
root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> root (hd1,0)
root (hd1,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd1)
setup (hd1)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd1)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd1) (hd1)1+15 p (hd1,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> quit
quit


And that's how you replace a disk on md raid!


Saturday, January 26, 2013

How to Configure Oracle Enterprise Linux to be Highly Available Using RAID1


This was published by Oracle, and I'm very sorry for re-publishing it, but the author made some mistakes. I had lots of 'fun' with grub & linux rescue when I followed the original instructions.

I've cleaned it up, made some relevant changes and tested it with OEL 5 for my dear blog readers.


In this Document
  Goal
  Solution
     1. Original System Configuration
     2. Add Second Hard Disk
     3. Partition Second Hard Disk
     4. Modify Secondary Disk Partitions to Type RAID
     5. Create RAID1 Arrays on Second Disk
     6. Taking a Closer Look at RAID Devices
     7. RAID Configuration
     8. Create Filesystems/Swap Devices on RAID devices
     9. Backup Current System Configuration
     10. Mount Filesystems on RAID Devices
     11. Optionally mount/swapon filesystems/swap device on RAID devices as their non-RAID devices
     12. Modify fstab to Use RAID Devices
     13. Add Fallback Title to grub.conf
     14. Remake Initial RAM Disk (One of Two)
     15. Copy Contents of Non-RAID filesystems to RAID filesystems
     16. Install/Reintall GRUB
     17. Reboot the System (Degraded Array)
     18. Modify Primary Disk Partitions to Type RAID
     19. Add Primary Disk Partitions to RAID Arrays
     20. Modify grub.conf
     21. Remake Initial RAM Disk (Two of Two)
     22. Testing
     22.1 Test - persistent mount on degraded array (/dev/sdb software failed)
     22.2 Test - boot into degraded array (/dev/sdb software removed)
     22.3 Test - boot into degraded array (/dev/sdb software removed)
     22.4 Test - boot into degraded array (/dev/sda physically removed)
 
Applies to:

Linux Kernel - Version: 2.4 to 2.6
Linux Itanium
Linux x86-64

Goal


This article describes how to configure an Oracle Enterprise Linux System to be highly available using RAID1.

The article is intended for Linux System Administrators.

Solution


RAID (Redundant Array of Inexpensive Disks) defines the use of multiple hard disks by systems to provide increased diskspace, performance and availability. This article solely focuses on implementing RAID1, commonly referred to as mirror disk, whereby two (or more) disks contain identical content. System availability and data integrity is maintained as long as at least one disk survives a failure.

Although using working examples from Oracle Enterprise Linux 5 (OEL5), the article similarly applies to other Linux distributions and versions.

Before proceeding, take a complete backup of the system.


1. Original System Configuration


This document assumes that LVM is not used for storage management. When I started to follow this document, I had done a default installation of OEL which, unfortunately, uses LVM for volume management by default, so I couldn't use mdadm for RAID as described here. So I re-installed the box, configured the storage as plain (no LVM), and then I could follow these instructions.

I assume your storage is configured similarly as shown here:



And that GRUB is installed:



Prior to implementing RAID, the system comprised the following simple configuration:


[root@oel5-raid1-3 ~]# uname -a
Linux oel5-raid1-3 2.6.32-300.10.1.el5uek #1 SMP Wed Feb 22 17:37:40 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

[root@oel5-raid1-3 ~]# cat /etc/enterprise-release
Enterprise Linux Enterprise Linux Server release 5.8 (Carthage)

[root@oel5-raid1-3 ~]# fdisk -l

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        2479    19808145   83  Linux
/dev/sda3            2480        2610     1052257+  82  Linux swap / Solaris



[root@oel5-raid1-3 ~]# blkid
/dev/sda3: LABEL="SWAP-sda3" TYPE="swap"
/dev/sda2: LABEL="/" UUID="aedde157-1fe3-45e3-b538-8dc1193b0430" TYPE="ext3"
/dev/sda1: LABEL="/boot" UUID="e4860f1d-d717-4efd-b91e-af4d8eb05421" TYPE="ext3"



[root@oel5-raid1-3 ~]# cat /etc/fstab
LABEL=/                 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
LABEL=SWAP-sda3         swap                    swap    defaults        0 0

[root@oel5-raid1-3 ~]# mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)



[root@oel5-raid1-3 ~]# swapon -s
Filename                                Type            Size Used Priority
/dev/sda3                               partition       1052248 0       -1



2. Add Second Hard Disk


A second hard disk is added to the system. Ideally, the second disk should be exactly the same (make and model) as the first. To help avoid a single point of failure, attach the additional disk to a different disk controller from the one used by the first disk.

Our second hard disk is /dev/sdb.


[root@oel5-raid1-3 ~]# fdisk -l /dev/sdb

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table


3. Partition Second Hard Disk


The second disk must contain the same configuration (partition layout) as the first disk. Disk partitioning can be performed manually using the fdisk(8) utility, however the sfdisk(8) utility can be used to quickly and easily replicate the partition table from the first disk e.g.:


[root@oel5-raid1-3 ~]# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 2610 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/sdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *        63    208844     208782  83  Linux
/dev/sdb2        208845  39825134   39616290  83  Linux
/dev/sdb3      39825135  41929649    2104515  82  Linux swap / Solaris
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)


4. Modify Secondary Disk Partitions to Type RAID


Use the fdisk(8) utility to modify the second disk partitions from type 83/82 (linux/swap) to fd (raid) e.g.:


[root@oel5-raid1-3 ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   83  Linux
/dev/sdb2              14        2479    19808145   83  Linux
/dev/sdb3            2480        2610     1052257+  82  Linux swap / Solaris

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): fd
Changed system type of partition 3 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   fd  Linux raid autodetect
/dev/sdb2              14        2479    19808145   fd  Linux raid autodetect
/dev/sdb3            2480        2610     1052257+  fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.



Use the partprobe(8) or sfdisk(8) utility to update the kernel with the partition type changes e.g.:

[root@oel5-raid1-3 ~]# partprobe /dev/sdb

Verify creation of the new partitions on the second disk e.g.:


[root@oel5-raid1-3 ~]# cat /proc/partitions
major minor  #blocks  name

   8        0   20971520 sda
   8        1     104391 sda1
   8        2   19808145 sda2
   8        3    1052257 sda3
   8       16   20971520 sdb
   8       17     104391 sdb1
   8       18   19808145 sdb2
   8       19    1052257 sdb3



5. Create RAID1 Arrays on Second Disk


Use the mdadm(8) utility to create a raid1 array on the second disk only e.g.:


[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities :
unused devices: <none>



[root@oel5-raid1-3 ~]# mdadm --create /dev/md1 --auto=yes --level=raid1 --raid-devices=2 /dev/sdb1 missing
mdadm: array /dev/md1 started.
[root@oel5-raid1-3 ~]# mdadm --create /dev/md2 --auto=yes --level=raid1 --raid-devices=2 /dev/sdb2 missing
mdadm: array /dev/md2 started.
[root@oel5-raid1-3 ~]# mdadm --create /dev/md3 --auto=yes --level=raid1 --raid-devices=2 /dev/sdb3 missing
mdadm: array /dev/md3 started.
[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb3[0]
      1052160 blocks [2/1] [U_]

md2 : active raid1 sdb2[0]
      19808064 blocks [2/1] [U_]

md1 : active raid1 sdb1[0]
      104320 blocks [2/1] [U_]

unused devices: <none>



In the example above, raid devices are created using the same numbering as the device partitions they include e.g. /dev/md1 contains /dev/sdb1 and device /dev/sda1 will be added later. The term missing is used as a stub or placeholder that will eventually be replaced with the partitions on the first disk; /dev/sda1, /dev/sda2, /dev/sda3.

Check /proc/partitions to confirm the new raid devices have been registered e.g.:


[root@oel5-raid1-3 ~]# cat /proc/partitions
major minor  #blocks  name

   8        0   20971520 sda
   8        1     104391 sda1
   8        2   19808145 sda2
   8        3    1052257 sda3
   8       16   20971520 sdb
   8       17     104391 sdb1
   8       18   19808145 sdb2
   8       19    1052257 sdb3
    9        1     104320 md1
   9        2   19808064 md2
   9        3    1052160 md3


6. Taking a Closer Look at RAID Devices


Use the mdadm(8) utility to review raid devices in detail e.g.:


[root@oel5-raid1-3 ~]# mdadm --query --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Fri Jan 25 23:27:05 2013
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Jan 25 23:27:05 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 98d72fb6:ff5130f8:647d100f:9acf7e1c
         Events : 0.1

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       0        0        1      removed



Note that raid device /dev/md1 contains only one disk member, /dev/sdb1, at this point. The state of the array is clean, degraded, denoting that only one (of two) underlying disk members is currently active and working.

7. RAID Configuration


I skipped this step, but I'm leaving the instructions here for reference.

Strictly speaking a master RAID configuration is not required. With relevant partitions being marked as type raid (fd), the kernel will auto assemble detected arrays on boot. If desired, one can create RAID configuration file /etc/mdadm.conf or /etc/mdadm/mdadm.conf as a reference to raid device usage e.g.:

# mkdir  /etc/mdadm/
# echo "DEVICE /dev/hd*[0-9] /dev/sd*[0-9]" >> /etc/mdadm/mdadm.conf
# mdadm --detail --scan >> /etc/mdadm/mdadm.conf
# ln -s /etc/mdadm/mdadm.conf /etc/mdadm.conf

# cat /etc/mdadm/mdadm.conf 
DEVICE /dev/hd*[0-9] /dev/sd*[0-9]
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=a4d5007d:6974901a:637e5622:e5b514c9
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=0e8ce9c6:bd42917d:fd3412bf:01f49095
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=7d696890:890b2eb7:c17bf4e4:d542ba99

The DEVICE filter is used to limit candidate RAID devices being created or added as RAID disk members.

8. Create Filesystems/Swap Devices on RAID devices


Once created, RAID devices are usable just like any other block device. Use the mkfs.ext3(8)/mke2fs(8) and mkswap(8) commands to create ext3 filesystems and a swap device on the RAID devices e.g.:


[root@oel5-raid1-3 ~]# mkfs.ext3 -L boot.md1 /dev/md1
mke2fs 1.39 (29-May-2006)
Filesystem label=boot.md1
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
26104 inodes, 104320 blocks
5216 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
13 block groups
8192 blocks per group, 8192 fragments per group
2008 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 25 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

[root@oel5-raid1-3 ~]# mkfs.ext3 -L root.md2 /dev/md2
mke2fs 1.39 (29-May-2006)
Filesystem label=root.md2
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
2480640 inodes, 4952016 blocks
247600 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
152 block groups
32768 blocks per group, 32768 fragments per group
16320 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

[root@oel5-raid1-3 ~]# mkswap -L swap.md3 /dev/md3
Setting up swapspace version 1, size = 1077407 kB
LABEL=swap.md3, no uuid

[root@oel5-raid1-3 ~]# blkid
/dev/sda3: LABEL="SWAP-sda3" TYPE="swap"
/dev/sda2: LABEL="/" UUID="aedde157-1fe3-45e3-b538-8dc1193b0430" TYPE="ext3"
/dev/sda1: LABEL="/boot" UUID="e4860f1d-d717-4efd-b91e-af4d8eb05421" TYPE="ext3"
/dev/sdb1: LABEL="boot.md1" UUID="56413220-d015-4f29-892d-2e4892fa355e" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdb2: LABEL="root.md2" UUID="ee39c18d-3be3-4047-9dfb-55d8b49f70d4" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdb3: TYPE="swap" LABEL="swap.md3"
/dev/md1: LABEL="boot.md1" UUID="56413220-d015-4f29-892d-2e4892fa355e" SEC_TYPE="ext2" TYPE="ext3"
/dev/md2: LABEL="root.md2" UUID="ee39c18d-3be3-4047-9dfb-55d8b49f70d4" SEC_TYPE="ext2" TYPE="ext3"
/dev/md3: TYPE="swap" LABEL="swap.md3"


To avoid confusion later, the labels added to the filesystems and swap device denote the RAID devices on which they are created.


9. Backup Current System Configuration


Beyond this point, significant changes are made to the current system, so take a backup of the core system configuration e.g.:


[root@oel5-raid1-3 ~]# cp /etc/fstab /etc/fstab.orig
[root@oel5-raid1-3 ~]# cp /boot/grub/grub.conf /boot/grub/grub.conf.orig
[root@oel5-raid1-3 ~]# mkdir /boot.orig
[root@oel5-raid1-3 ~]# sync
[root@oel5-raid1-3 ~]# cp -dpRxu /boot/* /boot.orig/



10. Mount Filesystems on RAID Devices


Mount the raided filesystems e.g.


[root@oel5-raid1-3 ~]# mkdir /boot.md1
[root@oel5-raid1-3 ~]# mount -t ext3 /dev/md1 /boot.md1
[root@oel5-raid1-3 ~]# mount | grep boot
/dev/sda1 on /boot type ext3 (rw)
/dev/md1 on /boot.md1 type ext3 (rw)
[root@oel5-raid1-3 ~]# mkdir /root.md2
[root@oel5-raid1-3 ~]# mount -t ext3 /dev/md2 /root.md2



11. Optionally mount/swapon filesystems/swap device on RAID devices as their non-RAID devices


Optionally test mount/swapon of filesystems on raided devices as their currently mounted non-raided counterparts e.g.:


[root@oel5-raid1-3 ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/sda3                               partition       1052248 0       -1
[root@oel5-raid1-3 ~]# swapoff /dev/sda3
[root@oel5-raid1-3 ~]# swapon /dev/md3
[root@oel5-raid1-3 ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md3                                partition       1052152 0       -1



Note: it is not possible to unmount/remount the root filesystem (/dev/sda2) as it's currently in use.

12. Modify fstab to Use RAID Devices


Modify the /etc/fstab file to mount/swapon raided devices on system boot.
Substitute relevant LABEL=  or /dev/sdaN entries with their corresponding /dev/mdN devices e.g.:


[root@oel5-raid1-3 ~]# vi /etc/fstab
[root@oel5-raid1-3 ~]# cat /etc/fstab
/dev/md2                /                       ext3    defaults        1 1
/dev/md1                /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/dev/md3                swap                    swap    defaults        0 0


13. Add Fallback Title to grub.conf


A fallback title allows the system to boot using one title and to fall back to another should any issues occur when booting with the first. This is particularly helpful because, without a fallback title, the system may fail to boot and a linux rescue may be needed to restore/recover the system.

Original /boot/grub/grub.conf:

[root@oel5-raid1-3 ~]# cat /boot/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda2
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=LABEL=/ rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img
title Oracle Linux Server-base (2.6.18-308.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-308.el5 ro root=LABEL=/ rhgb quiet numa=off
        initrd /initrd-2.6.18-308.el5.img

Modify the original /boot/grub/grub.conf file by adding the fallback parameter and a new boot title for the raid devices (the original title becomes the fallback) e.g.:


# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda2
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
fallback=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd1,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=/dev/md2 rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=LABEL=/ rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img



In the example above, the system is configured to boot using the first boot title (default=0) i.e. the one with /boot on the first partition of the second grub disk device (hd1,0) and specifying the root filesystem on raid device /dev/md2. Should that fail to boot, the system will fall back (fallback=1) to the second boot title i.e. the one specifying the /boot filesystem on the first partition of the first grub device (hd0,0) and the root filesystem by label (root=LABEL=/). Note that grub boot title numbering starts from zero (0).

14. Remake Initial RAM Disk (One of Two)


Use the mkinitrd(8) utility to recreate the initial ram disk. The initial ram disk must be rebuilt with raid module support to ensure the system has the required drivers to boot from raided devices e.g.:

# mv initrd-`uname -r`.img initrd-`uname -r`.img.orig
# mkinitrd -v -f initrd-`uname -r`.img `uname -r`

The commands that follow are based on the two generic commands above; the exact commands depend on your kernel version.



[root@oel5-raid1-3 ~]# cd /boot
[root@oel5-raid1-3 boot]# mv initrd-2.6.32-300.10.1.el5uek.img initrd-2.6.32-300.10.1.el5uek.img.orig
[root@oel5-raid1-3 boot]# uname -r
2.6.32-300.10.1.el5uek
[root@oel5-raid1-3 boot]# mkinitrd -v -f initrd-2.6.32-300.10.1.el5uek.img 2.6.32-300.10.1.el5uek
Creating initramfs
Modulefile is /etc/modprobe.conf
Looking for deps of module ehci-hcd
Looking for deps of module ohci-hcd
Looking for deps of module uhci-hcd
Looking for deps of module ext3
Found RAID component md2
Looking for deps of module raid1
Looking for driver for device sdb2
Looking for deps of module scsi:t-0x00
Looking for deps of module pci:v00001000d00000030sv000015ADsd00001976bc01sc00i00: scsi_transport_spi mptbase mptscsih mptspi
Looking for deps of module scsi_transport_spi
Looking for deps of module mptbase
Looking for deps of module mptscsih: mptbase
Looking for deps of module mptspi: scsi_transport_spi mptbase mptscsih
Found RAID component md3
...



[root@oel5-raid1-3 boot]# ll initrd*
-rw------- 1 root root 3558458 Jan 25 15:25 initrd-2.6.18-308.el5.img
-rw------- 1 root root 3067614 Jan 26 00:16 initrd-2.6.32-300.10.1.el5uek.img
-rw------- 1 root root 3144291 Jan 25 15:24 initrd-2.6.32-300.10.1.el5uek.img.orig



Note: another mkinitrd will be required later, after the /dev/sdaN partitions are added to the arrays.

15. Copy Contents of Non-RAID filesystems to RAID filesystems


If the raided filesystems were unmounted earlier, remount them as described in Step 10.
Copy the contents of non-raided filesystems (/boot on /dev/sda1, / on /dev/sda2) to their corresponding filesystems on raided devices (/boot.md1 on /dev/md1, /root.md2 on /dev/md2) e.g.:


[root@oel5-raid1-3 boot]# sync
[root@oel5-raid1-3 boot]# cp -dpRxu /boot/* /boot.md1
[root@oel5-raid1-3 boot]# sync
[root@oel5-raid1-3 boot]# cp -dpRxu / /root.md2



Note: there is no need to copy the contents of the swap device. The non-raided swap device (/dev/sda3) will be swapped-off on system shutdown and the raided swap device (/dev/md3) swapped-on on reboot.

16. Install/Reinstall GRUB


To cater for the situation where one or other raid disk member is either unavailable, unusable or missing, GRUB [Grand Unified Boot Loader] must be installed to the boot sector (MBR) of every raid disk member participating in an array i.e. /dev/sda, /dev/sdb. Use the grub(8) utility to install grub on the second grub disk (hd1) [/dev/sdb], currently the sole raid disk member e.g.:


[root@oel5-raid1-3 boot]# grub
Probing devices to guess BIOS drives. This may take a long time.


    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]
grub> root (hd1,0)
root (hd1,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd1)
setup (hd1)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd1)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd1) (hd1)1+15 p (hd1,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> root (hd0,0)
root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
grub> setup (hd0)
setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> quit
quit

The instructions above install Grub on both disks - this is important.


The reference to (hd0,0) in /boot/grub/grub.conf is a grub disk reference that refers to the first disk, first partition, which in this instance is /dev/sda1, the partition that houses the non-raided /boot filesystem. Grub always references disks as (hdN) regardless of whether they are IDE or SCSI. At installation time, grub builds and stores a map of disk devices in the file /boot/grub/device.map. As the system was initially installed with only one disk present (/dev/sda), the contents of /boot/grub/device.map appear as follows:

# cat /boot/grub/device.map
# this device map was generated by anaconda
(hd0)     /dev/sda

Had the /boot filesystem been installed on, say, /dev/sda3, the grub references in /boot/grub/grub.conf would have been (hd0,2). Grub disk and partition numbering starts from zero (0), whereas disk device naming starts with 'a' e.g. /dev/hda (IDE), /dev/sda (SCSI), and partition numbering starts from 1 e.g. /dev/hda1, /dev/sda1.
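As a quick illustration using this host's two-disk layout:

(hd0)   = /dev/sda      (hd1)   = /dev/sdb
(hd0,0) = /dev/sda1     (hd1,0) = /dev/sdb1
(hd0,1) = /dev/sda2     (hd1,1) = /dev/sdb2
(hd0,2) = /dev/sda3     (hd1,2) = /dev/sdb3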

If there is any confusion regarding grub discovered devices, grub itself may be used to detect or list available devices e.g.:
# grub
Probing devices to guess BIOS drives. This may take a long time.

GNU GRUB version 0.97 (640K lower / 3072K upper memory)

[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename.]

grub> root (hd<tab key>
Possible disks are: hd0 hd1

17. Reboot the System (Degraded Array)


Reboot the system. As a precaution, be sure to have your operating system installation/rescue media on hand. During boot up, review console messages to determine which device is used to boot the system i.e. /dev/md1 {/dev/sdb1} or fallback device /dev/sda1. All going well, the system will be using the raid devices albeit in a degraded state i.e. all arrays still only contain one disk member (/dev/sdbN).
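One quick post-boot check (a sketch; the output below assumes the first, raid-based boot title was used) is to confirm which root device the kernel was actually booted with:


[root@oel5-raid1-3 ~]# cat /proc/cmdline
ro root=/dev/md2 rhgb quiet numa=off


Had the fallback title been used, root=LABEL=/ would appear here instead.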


[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[0]
      104320 blocks [2/1] [U_]

md3 : active raid1 sdb3[0]
      1052160 blocks [2/1] [U_]

md2 : active raid1 sdb2[0]
      19808064 blocks [2/1] [U_]

unused devices: <none>
[root@oel5-raid1-3 ~]# mount | grep md
/dev/md2 on / type ext3 (rw)
/dev/md1 on /boot type ext3 (rw)
[root@oel5-raid1-3 ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md3                                partition       1052152 0       -1


Further verify that no /dev/sdaN partitions are used e.g.:


[root@oel5-raid1-3 ~]# mount | grep sda
[root@oel5-raid1-3 ~]#


If you did not add a fallback title as described in step 13 and experienced booting issues, perform a linux rescue to restore/recover the system.

18. Modify Primary Disk Partitions to Type RAID


In preparation for adding /dev/sdaN partitions to their respective arrays, use the fdisk(8) utility to modify the primary disk partitions from type 83/82 (linux/swap) to fd (raid) e.g.:


[root@oel5-raid1-3 ~]# fdisk /dev/sda

The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        2479    19808145   83  Linux
/dev/sda3            2480        2610     1052257+  82  Linux swap / Solaris

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): fd
Changed system type of partition 3 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14        2479    19808145   fd  Linux raid autodetect
/dev/sda3            2480        2610     1052257+  fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.




[root@oel5-raid1-3 ~]# fdisk -l /dev/sda

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14        2479    19808145   fd  Linux raid autodetect
/dev/sda3            2480        2610     1052257+  fd  Linux raid autodetect


Use the partprobe(8) or sfdisk(8) utility to update the kernel with the partition type changes e.g.:

[root@oel5-raid1-3 ~]# partprobe /dev/sda
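
Alternatively, sfdisk can ask the kernel to re-read the partition table, and /proc/partitions confirms the kernel still lists all three sda partitions. (A sketch; the -R re-read flag is assumed to be present in this release's sfdisk.)

[root@oel5-raid1-3 ~]# sfdisk -R /dev/sda
[root@oel5-raid1-3 ~]# grep sda /proc/partitions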

19. Add Primary Disk Partitions to RAID Arrays


Once the system has successfully booted using raid (i.e. the secondary disk), use the mdadm(8) utility to add the primary disk partitions to their respective arrays. All data on /dev/sdaN partitions will be destroyed in the process.


[root@oel5-raid1-3 ~]# mdadm --manage --add /dev/md2 /dev/sda2
mdadm: added /dev/sda2
[root@oel5-raid1-3 ~]# mdadm --manage --add /dev/md1 /dev/sda1
mdadm: added /dev/sda1
[root@oel5-raid1-3 ~]# mdadm --manage --add /dev/md3 /dev/sda3
mdadm: added /dev/sda3



[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[2] sdb1[0]
      104320 blocks [2/1] [U_]
        resync=DELAYED

md3 : active raid1 sda3[2] sdb3[0]
      1052160 blocks [2/1] [U_]
        resync=DELAYED

md2 : active raid1 sda2[2] sdb2[0]
      19808064 blocks [2/1] [U_]
      [=======>.............]  recovery = 35.7% (7073472/19808064) finish=1.1min speed=175929K/sec

unused devices: <none>



Depending on the size of partitions/disks used, data synchronisation between raid disk members may take a long time. Use the watch(1) command to monitor disk synchronisation progress e.g.:


[root@oel5-raid1-3 ~]# watch -n 5 cat /proc/mdstat


...

Once complete, /proc/mdstat should denote clean and consistent arrays each with two active, working members e.g.:


[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[1] sdb1[0]
      104320 blocks [2/2] [UU]

md3 : active raid1 sda3[1] sdb3[0]
      1052160 blocks [2/2] [UU]

md2 : active raid1 sda2[1] sdb2[0]
      19808064 blocks [2/2] [UU]

unused devices: <none>


20. Modify grub.conf


Once all /dev/sdaN partitions are added as disk members of their respective arrays, modify the /boot/grub/grub.conf file. Substitute the previous reference to LABEL=/ in the second boot title with raid device /dev/md2 e.g.:


[root@oel5-raid1-3 ~]# cat /boot/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda2
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
fallback=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd1,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=/dev/md2 rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=/dev/md2 rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img



21. Remake Initial RAM Disk (Two of Two)


Use the mkinitrd(8) utility to recreate the initial ram disk (again) e.g.:


[root@oel5-raid1-3 ~]# cd /boot
[root@oel5-raid1-3 boot]# mv initrd-2.6.32-300.10.1.el5uek.img initrd-2.6.32-300.10.1.el5uek.img.orig
[root@oel5-raid1-3 boot]# uname -r
2.6.32-300.10.1.el5uek
[root@oel5-raid1-3 boot]# mkinitrd -v -f initrd-2.6.32-300.10.1.el5uek.img 2.6.32-300.10.1.el5uek
Creating initramfs
Modulefile is /etc/modprobe.conf
Looking for deps of module ehci-hcd
Looking for deps of module ohci-hcd
Looking for deps of module uhci-hcd
Looking for deps of module ext3
Found RAID component md2
Looking for deps of module raid1
Looking for driver for device sdb2
Looking for deps of module scsi:t-0x00
Looking for deps of module pci:v00001000d00000030sv000015ADsd00001976bc01sc00i00: scsi_transport_spi mptbase mptscsih mptspi
Looking for deps of module scsi_transport_spi
Looking for deps of module mptbase
Looking for deps of module mptscsih: mptbase
Looking for deps of module mptspi: scsi_transport_spi mptbase mptscsih
Found RAID component md3
...


At this point, the system is up and running using raid1 devices for the / and /boot filesystems and the swap device.
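As a final sanity check (a minimal sketch; output omitted here), the arrays, mounts and swap can be reviewed in one pass:


[root@oel5-raid1-3 boot]# mdadm --detail --scan
[root@oel5-raid1-3 boot]# cat /proc/mdstat
[root@oel5-raid1-3 boot]# mount | grep md
[root@oel5-raid1-3 boot]# swapon -s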

22. Testing


I did not test this section exactly as described - YMMV.

Before relying on the newly configured system, test the system for proper operation and increased availability.

Suggested testing includes:
boot from alternate boot title (clean array)
persistent mount on degraded array (/dev/sdb software failed)
boot into degraded array (/dev/sdb software removed)
boot into degraded array (/dev/sda physically removed)
Primary diagnostics to monitor during testing include (see the sketch after this list):
console messages
dmesg
/proc/mdstat
mdadm --query --detail <md dev>
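

A minimal sketch that gathers most of these diagnostics in one pass (device names as configured above):

[root@oel5-raid1-3 ~]# for md in /dev/md1 /dev/md2 /dev/md3; do mdadm --query --detail $md; done
[root@oel5-raid1-3 ~]# cat /proc/mdstat
[root@oel5-raid1-3 ~]# dmesg | tail -50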


22.1 Test - boot from alternate boot title (clean array)


As part of configuring the system to use raid, you will have already tested booting the system from the second disk i.e. /dev/md1 {/dev/sdb1}. For this test, modify the /boot/grub/grub.conf to boot the system using the first disk, /dev/md1 {/dev/sda1} i.e.:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda2
#          initrd /initrd-version.img
#boot=/dev/sda
default=1
fallback=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Enterprise Linux (2.6.18-92.el5)
        root (hd1,0)
        kernel /vmlinuz-2.6.18-92.el5 ro root=/dev/md2 3
        initrd /initrd-2.6.18-92.el5.img
title Enterprise Linux (2.6.18-92.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-92.el5 ro root=/dev/md2 3
        initrd /initrd-2.6.18-92.el5.img

Note the changes to the default and fallback parameter values.


22.2 Test - persistent mount on degraded array (/dev/sdb software failed)


Verify that the /, /boot filesystems and swap device remain active, usable and writable after failing the second disk member of each raid array e.g.:
# mdadm --manage --fail /dev/md1 /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md1
# mdadm --manage --fail /dev/md2 /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md2
# mdadm --manage --fail /dev/md3 /dev/sdb3
mdadm: set /dev/sdb3 faulty in /dev/md3

# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[2](F) sda1[1]
      1052160 blocks [2/1] [_U]
     
md3 : active raid1 sdb3[2](F) sda3[1]
      1052160 blocks [2/1] [_U]
     
md2 : active raid1 sdb2[2](F) sda2[1]
      5116608 blocks [2/1] [_U]
     
unused devices: <none>

# mdadm --query --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Tue Dec 30 21:46:44 2008
     Raid Level : raid1
     Array Size : 1052160 (1027.67 MiB 1077.41 MB)
  Used Dev Size : 1052160 (1027.67 MiB 1077.41 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Dec 31 08:57:33 2008
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

           UUID : a4d5007d:6974901a:637e5622:e5b514c9
         Events : 0.70

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8        1        1      active sync   /dev/sda1
       2       8       17        -      faulty spare   /dev/sdb1

# dmesg
...
raid1: Disk failure on sdb1, disabling device.
        Operation continuing on 1 devices
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sdb1
 disk 1, wo:0, o:1, dev:sda1
RAID1 conf printout:
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sda1
raid1: Disk failure on sdb2, disabling device.
        Operation continuing on 1 devices
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sdb2
 disk 1, wo:0, o:1, dev:sda2
RAID1 conf printout:
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sda2
raid1: Disk failure on sdb3, disabling device.
        Operation continuing on 1 devices
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sdb3
 disk 1, wo:0, o:1, dev:sda3
RAID1 conf printout:
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sda3

# mount | grep md
/dev/md2 on / type ext3 (rw)
/dev/md1 on /boot type ext3 (rw)

# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md3                                partition       1052152 0       -1


22.3 Test - boot into degraded array (/dev/sdb software removed)


Having software-failed the second raid disk member (/dev/sdb), software-remove its partitions from each array, then test a successful system boot e.g.:

# mdadm --manage --remove /dev/md1 /dev/sdb1
mdadm: hot removed /dev/sdb1
# mdadm --manage --remove /dev/md2 /dev/sdb2
mdadm: hot removed /dev/sdb2
# mdadm --manage --remove /dev/md3 /dev/sdb3
mdadm: hot removed /dev/sdb3

# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[1]
      1052160 blocks [2/1] [_U]
     
md3 : active raid1 sda3[1]
      1052160 blocks [2/1] [_U]
     
md2 : active raid1 sda2[1]
      5116608 blocks [2/1] [_U]
     
unused devices: <none>

# mdadm --query --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Tue Dec 30 21:46:44 2008
     Raid Level : raid1
     Array Size : 1052160 (1027.67 MiB 1077.41 MB)
  Used Dev Size : 1052160 (1027.67 MiB 1077.41 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Dec 31 09:06:21 2008
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : a4d5007d:6974901a:637e5622:e5b514c9
         Events : 0.72

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8        1        1      active sync   /dev/sda1

# dmesg
...
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdb2>
md: export_rdev(sdb2)
md: unbind<sdb3>
md: export_rdev(sdb3)

# mount | grep md
/dev/md2 on / type ext3 (rw)
/dev/md1 on /boot type ext3 (rw)

# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md3                                partition       1052152 0       -1

# shutdown -r now
...

On reboot, add the failed/removed second disk back to the arrays e.g.:
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[1]
      1052160 blocks [2/1] [_U]
     
md3 : active raid1 sda3[1]
      1052160 blocks [2/1] [_U]
     
md2 : active raid1 sda2[1]
      5116608 blocks [2/1] [_U]
     
unused devices: <none>

# mdadm --manage --add /dev/md1 /dev/sdb1
mdadm: re-added /dev/sdb1
# mdadm --manage --add /dev/md2 /dev/sdb2
mdadm: re-added /dev/sdb2
# mdadm --manage --add /dev/md3 /dev/sdb3
mdadm: re-added /dev/sdb3

# dmesg
...
md: bind<sdb1>
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:1, dev:sdb1
 disk 1, wo:0, o:1, dev:sda1
md: syncing RAID array md1
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 1052160 blocks.
md: bind<sdb2>
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:1, dev:sdb2
 disk 1, wo:0, o:1, dev:sda2
md: delaying resync of md2 until md1 has finished resync (they share one or more physical units)
md: bind<sdb3>
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:1, dev:sdb3
 disk 1, wo:0, o:1, dev:sda3
md: delaying resync of md3 until md1 has finished resync (they share one or more physical units)
md: delaying resync of md2 until md1 has finished resync (they share one or more physical units)

# watch -n 15 cat /proc/mdstat


22.4 Test - boot into degraded array (/dev/sda physically removed)


Similar to tests 22.2 and 22.3, test for ongoing system operation and then successful system boot after physically removing one or other raid member disk e.g.:
# mdadm --manage --fail /dev/md1 /dev/sda1
mdadm: set /dev/sda1 faulty in /dev/md1
# mdadm --manage --fail /dev/md2 /dev/sda2
mdadm: set /dev/sda2 faulty in /dev/md2
# mdadm --manage --fail /dev/md3 /dev/sda3
mdadm: set /dev/sda3 faulty in /dev/md3

# mdadm --manage --remove /dev/md1 /dev/sda1
mdadm: hot removed /dev/sda1
# mdadm --manage --remove /dev/md2 /dev/sda2
mdadm: hot removed /dev/sda2
# mdadm --manage --remove /dev/md3 /dev/sda3
mdadm: hot removed /dev/sda3


Follow Note.603868.1 to dynamically remove (hot unplug) disk devices from the system. Alternatively, shut down the system and physically remove device /dev/sda before rebooting. On boot, dynamically add /dev/sda back as a raid disk member, then repeat the same test but physically remove the second disk member /dev/sdb. This test not only validates the fallback boot title, but also emulates online replacement of a failed hard disk.
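
Purely as an illustrative sketch (not a substitute for the note, and the SCSI host number is an assumption that varies per system), hot unplug on a 2.6 kernel typically means software-detaching the disk via sysfs once it has been failed/removed from all arrays, then rescanning the owning SCSI host once the replacement disk is attached:

# echo 1 > /sys/block/sda/device/delete
# echo "- - -" > /sys/class/scsi_host/host0/scan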

Once satisfied, deploy the system for production use.