Friday, July 26, 2013

TCP.VALIDNODE_CHECKING

Well, this is embarrassing :)

I learnt the hard way how these parameters actually work:

In my SQLNET.ORA, I had

TCP.VALIDNODE_CHECKING=YES
TCP.EXCLUDED_NODES=SQLSERVER1

Looks like this file gets read by the LISTENER and NOT the database! Argh!

When I restarted the listener, the customer's SQL Server host was blocked from accessing any Oracle databases on my db host. Had to troubleshoot using listener tracing, and figured out that the listener was the culprit.

So, I simply removed those lines and restarted the listener - problem gone away.

Note that the error message on the client-side is misleading, since it says that the service is not found:

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Of course, the service was registered and available the whole time; the listener was simply rejecting the excluded node.
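
For the record, here's a minimal sketch of how I'd set it up if I actually wanted the listener to reject that host. The host name is the one from the incident; the parenthesised list syntax and the need for a full listener restart (rather than a reload) are my assumptions, so check them against your version:

# sqlnet.ora in the LISTENER's network/admin directory, not (just) the client's
TCP.VALIDNODE_CHECKING = YES
TCP.EXCLUDED_NODES = (SQLSERVER1)

Then bounce the listener (lsnrctl stop / lsnrctl start) for the change to take effect.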

Wednesday, July 24, 2013

SQL*Net more data to client

I was performance tuning a customer's production database, so naturally I ran AWR Reports.

Whenever I ran AWR reports I kept getting the event "SQL*Net more data to client" as the second highest wait event (after CPU time). When I googled it, I came across this post on AskTom:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:951335700013

In it, SDU (Session Data Unit) and MTU (Maximum Transmission Unit) are mentioned. So what's the relationship between SDU and MTU?

If you look through the articles on Google, it would seem that they've all just plagiarised each other, with the wrong information. They all incorrectly say SDU should be a multiple of MTU. This lone post on the same AskTom article has the correct info:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:951335700013#67529392482665

The author says that SDU should be a multiple of MSS (Maximum Segment Size of the network protocol in use), and NOT the MTU.

I had to search for a more authoritative source.

I looked towards Oracle Support as the holy grail of Oracle knowledge. There I found document ID 274483.1, "The relationship between MTU (Maximum Transmission Unit) and SDU (Session Data Unit)", which says:

"The principle is that the SDU value be a multiple of the MTU." 

Now, this may be the source of all the wrong information on the internet.

This document attempts to summarize another doc, "SQL*Net Packet Sizes (SDU & TDU Parameters)", Doc ID 44694.1. The second document says:

"...set the SDU size as a multiple of the MSS."
The reason MSS is used and not MTU is that the MTU figure includes the TCP and IP headers, which reduce the amount of data Oracle NS (Network Substrate) can actually place in each TCP packet. It is the MSS that determines how much data Oracle can transmit via the lower network protocols.

To calculate the MSS:

MSS = MTU - TCP header size - IP header size

For bog-standard TCP over Ethernet:

MTU = 1500 bytes 
TCP = 20 bytes
IP = 20 bytes

Thus, the MSS for TCP/IP over Ethernet is 1460.

I confirmed that the customer's network has an MTU of 1500.

Given that for Oracle 10g the maximum SDU size is 32767, what's the optimal SDU with an MSS of 1460? Simple math: 32767 / 1460 = 22.4, so the largest whole multiple that fits is 22 x 1460 = 32120.

That's what we use for the SDU size for a simple Ethernet network.
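
If you want to script the check, here's a quick shell (bash) sketch of the same arithmetic; nothing Oracle-specific, just integer division:

# optimal SDU = largest multiple of the MSS that fits under the 32767 cap (10g)
MTU=1500; TCP_HDR=20; IP_HDR=20
MSS=$(( MTU - TCP_HDR - IP_HDR ))     # 1460 for plain Ethernet
SDU=$(( 32767 / MSS * MSS ))          # 22 * 1460 = 32120
echo "MSS=$MSS, optimal SDU=$SDU"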

The results are quite impressive:

Before:

Top 5 Timed Events                                            Avg %Total
~~~~~~~~~~~~~~~~~~                                           wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                          1,203               70.5
db file sequential read             140,179         266      2   15.6 User I/O
SQL*Net more data to client       2,669,360         153      0    8.9 Network
control file parallel write           3,755         115     31    6.7 System I/O
db file scattered read               44,471         101      2    5.9 User I/O


After:

Top 5 Timed Events                                            Avg %Total
~~~~~~~~~~~~~~~~~~                                           wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                            636               72.4
db file sequential read              81,221         160      2   18.3 User I/O
control file parallel write           3,794         117     31   13.3 System I/O
db file scattered read               22,473          55      2    6.2 User I/O
log file parallel write               3,648          51     14    5.8 System I/O


The "sql net more data to client" wait is eliminated! :) 

At first I thought there was something wrong with the AWR Report, maybe the wrong snapshot was chosen, so I double-checked, and ran it for different days. But, it's really gone! :)

Tuesday, July 23, 2013

Database Network Performance Tuning


I recently looked at the database performance of a production database. This post will explain the network performance parameters which I implemented as a result of this work.

Here's the best SQLNET.ORA (server) I could come up with for our OLTP application:

SQLNET.AUTHENTICATION_SERVICES=(NTS)
NAMES.DIRECTORY_PATH=(TNSNAMES)
DISABLE_OOB=ON
TCP.NO_DELAY=YES
DEFAULT_SDU_SIZE=32120
USE_DEDICATED_SERVER=ON
SQLNET.EXPIRE_TIME=10

And here's the relevant part of the LISTENER.ORA:

LISTENER =
  (DESCRIPTION =
    (SDU = 32120) (ADDRESS = (PROTOCOL = TCP)(HOST = SERVER)(PORT = 1521)(SEND_BUF_SIZE = 65535)(RECV_BUF_SIZE = 65535))
  )



Now, these settings are specifically chosen for an OLTP application, where response time is more important than throughput. I'll explain the settings and values.

SQLNET.AUTHENTICATION_SERVICES=(NTS)


Not much to say here except on the server I always use NTS because it's required by ASM, and on the Citrix server I set this variable to NONE.

NAMES.DIRECTORY_PATH=(TNSNAMES)


We use TNSNames for service resolution, so that's what we put.

DISABLE_OOB=ON


Disables the Out-of-Band break protocol (the mechanism behind Ctrl-C interrupts). The application doesn't use OOB, so why carry the overhead? Disable it.

Note: do not also include the parameter BREAK_POLL_SKIP. BREAK_POLL_SKIP only controls how often the Oracle client polls for a Ctrl-C, while DISABLE_OOB disables break handling completely. If both are used, I am not sure which takes precedence.

USE_DEDICATED_SERVER=ON


I use Dedicated Server mode for all production databases to ensure maximum performance. We have enough RAM, so why not?

SQLNET.EXPIRE_TIME=10



Sends a probe every 10 minutes to detect dead connections (e.g. the application crashed) so they can be cleaned up.

TCP.NO_DELAY=YES


This is the most important parameter for OLTP apps. It tells the Oracle network software "stop messing about with buffers, just send the data back to the client ASAP!". This gave a substantial performance improvement.

Sybase, in its ASE documentation, recommends the same setting for ASE. The relevant document can be found here:


Sybase puts it eloquently:

The tcp no delay parameter controls TCP (Transmission Control Protocol) packet batching. The default value is 1, which means that TCP packets are not batched. 
TCP normally batches small logical packets into single larger physical packets (by briefly delaying packets) to fill physical network frames with as much data as possible. This is intended to improve network throughput in terminal emulation environments where there are mostly keystrokes being sent across the network. 
However, applications that use small TDS (Tabular Data Stream) packets may benefit from disabling TCP packet batching.

Yes, I realise it's a Sybase doc and not an Oracle one, but the concept is the same, and their explanation is the best I could find.

The antithesis of this parameter is this pair:

RECV_BUF_SIZE
SEND_BUF_SIZE

These two parameters size the data buffers for the packets going back and forth, which is perfect for DSS (reporting) type applications. Put them in your SQLNET.ORA if you want your OLTP app to run slowly!

If both sets of parameters are used, I'm not sure which takes precedence.

SQLNET.ORA: DEFAULT_SDU_SIZE=32120 & SDU=32120 in Listener.ora


This is extremely important. I'll explain how I arrived at this in a separate post. For the new SDU size to take effect, you must update the SQLNET.ORA (client and server) and the LISTENER.ORA (as shown above), and restart the listener.
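
On the client side, the SDU can also be set per connect descriptor in TNSNAMES.ORA. Here's a sketch with made-up alias, host and service names:

ORCL =
  (DESCRIPTION =
    (SDU = 32120)
    (ADDRESS = (PROTOCOL = TCP)(HOST = SERVER)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = ORCL))
  )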



Tuesday, March 12, 2013

Very useful RAR archive script


:: Assumes %date% is in DD/MM/YYYY format and %time% is HH:MM:SS.xx
set year=%date:~6,4%
set yr=%date:~8,2%
set month=%date:~3,2%
set day=%date:~0,2%
set hour=%time:~0,2%
:: pad single-digit hours (which have a leading space) with a zero
set hour=%hour: =0%
set min=%time:~3,2%
set sec=%time:~6,2%

:: add everything under E:\Backups\DEV to a timestamped archive,
:: recursing into subfolders (-r) with maximum compression (-m5)
rar a -r -m5 DEV[%year%%month%%day%_%hour%_%min%].rar E:\Backups\DEV\*.*
:: delete any .rar in E:\Backups older than 14 days
forfiles /p "E:\Backups" /m "*.rar" /d -14 /c "cmd /c del @path"

Monday, February 25, 2013

Rebuild Indexes

Here's a neat script I wrote for re-building indexes. One of the apps I look after requires its indexes to be stored in a tablespace called INDEXES, and this script does that programmatically.

This script can be easily modified to re-build all indexes, or all non-SYS/SYSTEM indexes, or just invalid indexes (see the sketch after the script). It can also be wrapped in a package, and then called by a job.

The cool thing about this script is the way it uses a BULK COLLECT :).


DECLARE
   sql_stmt                      VARCHAR2 ( 2000 ) DEFAULT NULL;

   TYPE sql_stmt_array_type IS TABLE OF sql_stmt%TYPE;

   sql_stmt_array                sql_stmt_array_type DEFAULT NULL;

   CURSOR cursor1
   IS
      SELECT 'ALTER INDEX ' || a.owner || '.' || a.index_name
             || ' REBUILD TABLESPACE INDEXES' sql1
        FROM dba_indexes a, dba_objects b
       WHERE a.owner IN
                ( 'SCHEMA_OWNER1'
                 ,'SCHEMA_OWNER2')
         AND a.index_type = 'NORMAL'
         AND a.TEMPORARY LIKE 'N'
         AND a.owner = b.owner
         AND a.index_name = b.object_name
         AND b.object_type = 'INDEX'
         AND a.tablespace_name != 'INDEXES';
BEGIN
   OPEN cursor1;

   FETCH cursor1
   BULK COLLECT INTO sql_stmt_array;

   FOR i IN 1 .. sql_stmt_array.COUNT
   LOOP
      sql_stmt := sql_stmt_array ( i );

      EXECUTE IMMEDIATE sql_stmt;
   END LOOP;

   CLOSE cursor1;
END;
/
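
As an example of the "just invalid indexes" variant mentioned above, the cursor query could be swapped for something like this (a sketch only; it rebuilds in place rather than relocating to the INDEXES tablespace):

      SELECT 'ALTER INDEX ' || owner || '.' || index_name || ' REBUILD' sql1
        FROM dba_indexes
       WHERE status = 'UNUSABLE';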

Sunday, January 27, 2013

Performance Tuning & Recoverability Notes


  • Set the SQLNET.ORA (as described) on the server and client 
  • Set the LISTENER send/receive buffers to the max size i.e. 65535 
  • alter system set SHARED_SERVERS=0 scope=both; -- (do NOT use shared servers if you want max performance) 
  • alter system reset DISPATCHERS scope=spfile sid='*'; -- (needed to disable use of shared servers; restart db instance for this to take effect) 
  • alter system set CURSOR_SHARING='FORCE' scope=both; -- (force cursor re-use wherever possible) 
  • For production systems, take no chances with block corruptions, enable the following: 
    • alter system set db_block_checksum=FULL scope=both;  
    • alter system set db_block_checking=FULL scope=both;  
  • Check for bad blocks by using RMAN (see the shell sketch after this list): 
    • backup check logical validate database; -- doesn't actually do a backup 
    • select * from v$DATABASE_BLOCK_CORRUPTION; 
  • Store the database files separate from the redo logs. If possible, store the archive logs separate from the redo logs as well. Consider using a SSD volume for redo logs. If an SSD volume isn't forthcoming, use a RAID 10 volume with a thin-stripe of 128k. 
  • Store the database files on an ASM volume. Create the ASM volume from at least two RAID 1 volumes. ASM will create a 1 MB coarse stripe across all volumes, giving excellent storage performance. 
  • For max recoverability: 
    • store the control files, online redo logs, archive logs, incremental backups and a database image copy on an NTFS volume (with a 64k cluster size for max performance). 
    • Follow Oracle Suggested Backup strategy. 
  • Guy Harrison also says that the TEMP tablespace could be located away from the other files since it is used for sorting, etc. which could impact performance. In reality, this is probably not necessary. 
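
Here's a rough shell wrapper for the corruption checks in the list above, assuming a Linux/Unix host and OS authentication (adapt for Windows):

# logical check of every block, without writing an actual backup
rman target / <<'EOF'
backup check logical validate database;
EOF

# anything found is recorded in this view
sqlplus -s / as sysdba <<'EOF'
select * from v$database_block_corruption;
exit
EOF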

RMAN - solid backup


This isn't the Oracle Suggested Backup script; it's my own script for a good, solid RMAN backup. It will need slight tweaking if you decide to use it.




backup check logical database; 
sql 'alter system archive log current'; 
backup archivelog all delete all input; 
 allocate channel for maintenance device type disk; 

RUN {  
crosscheck backup of database; 
crosscheck backup of controlfile; 
crosscheck archivelog all;} 

delete force noprompt expired backup of database; 
delete force noprompt expired backup of controlfile; 
delete force noprompt expired archivelog all; 
delete force noprompt obsolete redundancy 5; 

sql "ALTER DATABASE BACKUP CONTROLFILE TO TRACE"; 
sql "ALTER DATABASE BACKUP CONTROLFILE TO ''D:\FRA\SBY01\SBY01_CONTROLFILE_BACKUP.CTL'' REUSE"; 
sql "CREATE PFILE=''C:\oracle\product\10.2.0\db_1\admin\SBY01\pfile\INITSBY01.INI'' FROM  SPFILE"; 

exit; 
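
To run it, I save the commands to a file and point RMAN at it, along these lines (file names are made up):

rman target / cmdfile=solid_backup.rman log=solid_backup.log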

How to Configure RAID 1 with LVM Volumes

This blog will describe how to enable RAID 1 using mdraid on volumes which include LVM volumes.

When Oracle Enterprise Linux is installed, the default configuration of the disks includes an LVM volume. The boot volume, /boot, remains a normal ext3 file system.

1. Initial Config


During installation, here is the default configuration:




Which gives us this:

[root@oel5-raid1-3 ~]# uname -r
2.6.32-300.10.1.el5uek

[root@oel5-raid1-3 ~]# cat /etc/enterprise-release
Enterprise Linux Enterprise Linux Server release 5.8 (Carthage)


[root@oel5-raid1-3 ~]# cat /etc/fstab
/dev/VolGroup00/LogVol00 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/dev/VolGroup00/LogVol01 swap                    swap    defaults        0 0



[root@oel5-raid1-3 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      7.7G  2.4G  5.0G  32% /
/dev/sda1              99M   24M   71M  25% /boot
tmpfs                 495M     0  495M   0% /dev/shm



[root@oel5-raid1-3 ~]# fdisk -l

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1305    10377990   8e  Linux LVM

Disk /dev/dm-0: 8489 MB, 8489271296 bytes
255 heads, 63 sectors/track, 1032 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/dm-0 doesn't contain a valid partition table

Disk /dev/dm-1: 2113 MB, 2113929216 bytes
255 heads, 63 sectors/track, 257 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/dm-1 doesn't contain a valid partition table



[root@oel5-raid1-3 ~]# mount | grep ext3
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)

[root@oel5-raid1-3 ~]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sda2
  VG Name               VolGroup00
  PV Size               9.90 GB / not usable 22.76 MB
  Allocatable           yes (but full)
  PE Size (KByte)       32768
  Total PE              316
  Free PE               0
  Allocated PE          316
  PV UUID               5aMSJb-OALl-wztg-107U-bizd-wLB6-G25RcW

[root@oel5-raid1-3 ~]# vgdisplay
  --- Volume group ---
  VG Name               VolGroup00
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  3
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               9.88 GB
  PE Size               32.00 MB
  Total PE              316
  Alloc PE / Size       316 / 9.88 GB
  Free  PE / Size       0 / 0
  VG UUID               dnYd54-w9ZG-METW-V1lP-WizL-wj7A-ZxJ6SO

[root@oel5-raid1-3 ~]# lvdisplay
  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol00
  VG Name                VolGroup00
  LV UUID                6ric1n-Dtgg-uK4s-09KF-EUJv-zm3B-ok6UHn
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                7.91 GB
  Current LE             253
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0

  --- Logical volume ---
  LV Name                /dev/VolGroup00/LogVol01
  VG Name                VolGroup00
  LV UUID                wpMR3Z-7CMn-c6Q9-GF6h-xh7g-dULW-AAVP5U
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                1.97 GB
  Current LE             63
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

[root@oel5-raid1-3 ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/mapper/VolGroup00-LogVol01         partition       2064376 0       -1

Basically: /boot is normal ext3, while / is an LVM volume and the swap is also on the LVM.


2. Add Second HDD


Add a second hard disk to the system. 

[root@oel5-raid1-3 ~]# fdisk -l /dev/sdb

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table

3. Partition the second HDD


The second hard disk must have the same partition layout as the first. The easy way to do this is to use the sfdisk utility.

[root@oel5-raid1-3 ~]# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 1305 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/sdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *        63    208844     208782  83  Linux
/dev/sdb2        208845  20964824   20755980  8e  Linux LVM
/dev/sdb3             0         -          0   0  Empty
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)

4. Modify the Secondary Disk partitions to type RAID


Use the fdisk utility to modify the partitions on the second disk to type fd (RAID):


[root@oel5-raid1-3 ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 1305.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   83  Linux
/dev/sdb2              14        1305    10377990   8e  Linux LVM

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   fd  Linux raid autodetect
/dev/sdb2              14        1305    10377990   fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Use the partprobe utility to update the kernel with the partition type changes:


[root@oel5-raid1-3 ~]# partprobe /dev/sdb


Verify creation of the new partitions:

[root@oel5-raid1-3 ~]# cat /proc/partitions
major minor  #blocks  name

   8        0   10485760 sda
   8        1     104391 sda1
   8        2   10377990 sda2
   8       16   10485760 sdb
   8       17     104391 sdb1
   8       18   10377990 sdb2
 253        0    8290304 dm-0
 253        1    2064384 dm-1

5. Create RAID 1 Arrays on the Second Disk


Let's now create the RAID 1 devices on the second disk:

[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities :
unused devices: <none>
[root@oel5-raid1-3 ~]# mdadm --create /dev/md1 --auto=yes --level=raid1 --raid-devices=2 missing /dev/sdb1
mdadm: array /dev/md1 started.
[root@oel5-raid1-3 ~]# mdadm --create /dev/md2 --auto=yes --level=raid1 --raid-devices=2 missing /dev/sdb2
mdadm: array /dev/md2 started.
[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdb2[1]
      10377920 blocks [2/1] [_U]

md1 : active raid1 sdb1[1]
      104320 blocks [2/1] [_U]

unused devices: <none>

Note: missing is a placeholder keyword for the /dev/sda partitions, which we will add later.

Format md1 as ext3:

[root@oel5-raid1-3 ~]# mkfs.ext3 /dev/md1
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
26104 inodes, 104320 blocks
5216 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
13 block groups
8192 blocks per group, 8192 fragments per group
2008 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.


6. Move the data from the LVM


Now, we move the data from /dev/sda2 to /dev/md2

[root@oel5-raid1-3 ~]# pvcreate /dev/md2
  Writing physical volume data to disk "/dev/md2"
  Physical volume "/dev/md2" successfully created
[root@oel5-raid1-3 ~]# vgextend VolGroup00 /dev/md2
  Volume group "VolGroup00" successfully extended

This command starts the volume migration:

[root@oel5-raid1-3 ~]# pvmove -i 2 /dev/sda2 /dev/md2

This can take a while.

Now, we remove /dev/sda2:

[root@oel5-raid1-3 ~]# vgreduce VolGroup00 /dev/sda2
  Removed "/dev/sda2" from volume group "VolGroup00"
[root@oel5-raid1-3 ~]# pvremove /dev/sda2
  Labels on physical volume "/dev/sda2" successfully wiped

Now, convert /dev/sda2 to a RAID partition type:

[root@oel5-raid1-3 ~]# fdisk /dev/sda

The number of cylinders for this disk is set to 1305.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1305    10377990   fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

Now, add it back as a RAID member of md2:

 mdadm --add /dev/md2 /dev/sda2

And monitor its progress using:

watch -n 2 cat /proc/mdstat

Every 2.0s: cat /proc/mdstat                            Sat Jan 26 20:04:01 2013

Personalities : [raid1]
md2 : active raid1 sda2[2] sdb2[1]
      10377920 blocks [2/1] [_U]
      [======>..............]  recovery = 31.7% (3292288/10377920) finish=0.6min
 speed=193664K/sec

md1 : active raid1 sdb1[1]
      104320 blocks [2/1] [_U]

unused devices: <none>


Press Ctrl-C to exit watch once the re-build is done.


[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda2[0] sdb2[1]
      10377920 blocks [2/2] [UU]

md1 : active raid1 sdb1[1]
      104320 blocks [2/1] [_U]

unused devices: <none>



7. Update fstab


The default /etc/fstab is:

/dev/VolGroup00/LogVol00 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/dev/VolGroup00/LogVol01 swap                    swap    defaults        0 0

We need to change it to this:

/dev/VolGroup00/LogVol00 /                       ext3    defaults        1 1
/dev/md1                /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/dev/VolGroup00/LogVol01 swap                    swap    defaults        0 0

The change is this line

/dev/md1                /boot                   ext3    defaults        1 2

Replace "LABEL=/boot" with "/dev/md1"

8. Update grub.conf


The default /boot/grub/grub.conf is:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=/dev/VolGroup00/LogVol00 rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img
title Oracle Linux Server-base (2.6.18-308.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-308.el5 ro root=/dev/VolGroup00/LogVol00 rhgb quiet numa=off
        initrd /initrd-2.6.18-308.el5.img

And we change it to:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
fallback=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title HDD1 (2.6.32-300.10.1.el5uek)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=/dev/VolGroup00/LogVol00 rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img
title HDD2 (2.6.32-300.10.1.el5uek)
        root (hd1,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=/dev/VolGroup00/LogVol00 rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img

The key changes are:

(i) the addition of the fallback parameter
(ii) the addition of the second title, and its respective attributes.

I also updated the titles to reflect the device from which the system is being booted.

The default parameter is important too. It indicates the title from which the system will boot by default. If Grub cannot find a valid /boot partition (e.g. in case of disk failure), then Grub will attempt to boot from the title indicated by fallback.

9. Re-create Initial RAMDisk:


[root@oel5-raid1-3 ~]# cd /boot
[root@oel5-raid1-3 boot]# ll initrd*
-rw------- 1 root root 4372497 Jan 26 19:21 initrd-2.6.18-308.el5.img
-rw------- 1 root root 3934645 Jan 26 19:21 initrd-2.6.32-300.10.1.el5uek.img
[root@oel5-raid1-3 boot]# uname -r
2.6.32-300.10.1.el5uek
[root@oel5-raid1-3 boot]# mkinitrd -f -v initrd-2.6.32-300.10.1.el5uek.img 2.6.32-300.10.1.el5uek

The mkinitrd command takes the format of:

mkinitrd -v -f initrd-`uname -r`.img `uname -r`

That's why it's important to grab uname -r.

10. Copy /boot

[root@oel5-raid1-3 boot]# mkdir /mnt/boot.md1
[root@oel5-raid1-3 boot]# mount /dev/md1 /mnt/boot.md1
[root@oel5-raid1-3 boot]# cp -dpRxu /boot/* /mnt/boot.md1

This stage has to be done before the next (installing grub on both disks).

11. Install Grub on BOTH disks

It is very important to install Grub on BOTH disks!

[root@oel5-raid1-3 boot]# grub
Probing devices to guess BIOS drives. This may take a long time.


    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]
grub> root (hd0,0)
root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
grub> setup (hd0)
setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> root (hd1,0)
root (hd1,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd1)
setup (hd1)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd1)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd1) (hd1)1+15 p (hd1,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> quit
quit

12. Reboot


[root@oel5-raid1-3 boot]# reboot

13. Add /dev/sda to /dev/md1


[root@oel5-raid1-3 ~]# mount | grep ext3
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
/dev/md1 on /boot type ext3 (rw)

So /dev/sda1 isn't mounted on /boot (/dev/md1 is)...

[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[1]
      104320 blocks [2/1] [_U]

md2 : active raid1 sdb2[1] sda2[0]
      10377920 blocks [2/2] [UU]

unused devices: <none>

And not used by /dev/md1...

So...

[root@oel5-raid1-3 ~]# fdisk /dev/sda

The number of cylinders for this disk is set to 1305.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1305    10377990   fd  Linux raid autodetect

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14        1305    10377990   fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

[Note the change of partition type, from 'Linux' to 'Linux raid autodetect']

[root@oel5-raid1-3 ~]# partprobe /dev/sda

[root@oel5-raid1-3 ~]# mdadm --manage --add /dev/md1 /dev/sda1
mdadm: added /dev/sda1
[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]

md2 : active raid1 sdb2[1] sda2[0]
      10377920 blocks [2/2] [UU]

unused devices: <none>

14. Recreate the initial ram disk:


[root@oel5-raid1-3 ~]# cd /boot
[root@oel5-raid1-3 boot]# uname -r
2.6.32-300.10.1.el5uek
[root@oel5-raid1-3 boot]# mkinitrd -v -f initrd-2.6.32-300.10.1.el5uek.img 2.6.32-300.10.1.el5uek
Creating initramfs

15. Testing


To simulate the loss of sdb, we fail its partitions in software:

[root@oel5-raid1-3 ~]# mdadm --manage /dev/md1 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md1
[root@oel5-raid1-3 ~]# mdadm --manage /dev/md2 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md2
[root@oel5-raid1-3 ~]# mdadm --manage /dev/md1 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
[root@oel5-raid1-3 ~]# mdadm --manage /dev/md2 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2

Shut down the server and replace /dev/sdb. Start it up, and check the status of the RAID:

[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]

md2 : active raid1 sda2[0]
      10377920 blocks [2/1] [U_]

unused devices: <none>

And what's the status of the new hard disk?

[root@oel5-raid1-3 ~]# fdisk -l

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14        1305    10377990   fd  Linux raid autodetect

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/md2: 10.6 GB, 10626990080 bytes
2 heads, 4 sectors/track, 2594480 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

...



Copy the partition layout to the new disk:

[root@oel5-raid1-3 ~]# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 1305 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/sdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *        63    208844     208782  fd  Linux raid autodetect
/dev/sdb2        208845  20964824   20755980  fd  Linux raid autodetect
/dev/sdb3             0         -          0   0  Empty
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)


Clear any remnants of a previous RAID device on the new disk:

[root@oel5-raid1-3 ~]# mdadm --zero-superblock /dev/sdb1
mdadm: Unrecognised md component device - /dev/sdb1
[root@oel5-raid1-3 ~]# mdadm --zero-superblock /dev/sdb2
mdadm: Unrecognised md component device - /dev/sdb2

OK, now add the partitions to the respective md devices:

[root@oel5-raid1-3 ~]# mdadm --add /dev/md1 /dev/sdb1
mdadm: added /dev/sdb1
[root@oel5-raid1-3 ~]# mdadm --add /dev/md2 /dev/sdb2
mdadm: added /dev/sdb2

watch -n 2 cat /proc/mdstat

Wait for the re-synchronisation to complete, then press Ctrl-C to exit.

Re-install grub on BOTH hard drives:

[root@oel5-raid1-3 ~]# grub
Probing devices to guess BIOS drives. This may take a long time.


    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]
grub> root (hd0,0)
root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> root (hd1,0)
root (hd1,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd1)
setup (hd1)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd1)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd1) (hd1)1+15 p (hd1,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> quit
quit


And that's how you replace a disk on md raid!


Saturday, January 26, 2013

How to Configure Oracle Enterprise Linux to be Highly Available Using RAID1


This was published by Oracle, and I'm very sorry for re-publishing it, but the author made some mistakes. I had lots of 'fun' with grub & linux rescue when I followed the original instructions.

I've cleaned it up, made some relevant changes and tested it with OEL 5 for my dear blog readers.


In this Document
  Goal
  Solution
     1. Original System Configuration
     2. Add Second Hard Disk
     3. Partition Second Hard Disk
     4. Modify Secondary Disk Partitions to Type RAID
     5. Create RAID1 Arrays on Second Disk
     6. Taking a Closer Look at RAID Devices
     7. RAID Configuration
     8. Create Filesystems/Swap Devices on RAID devices
     9. Backup Current System Configuration
     10. Mount Filesystems on RAID Devices
     11. Optionally mount/swapon filesystems/swap device on RAID devices as their non-RAID devices
     12. Modify fstab to Use RAID Devices
     13. Add Fallback Title to grub.conf
     14. Remake Initial RAM Disk (One of Two)
     15. Copy Contents of Non-RAID filesystems to RAID filesystems
     16. Install/Reintall GRUB
     17. Reboot the System (Degraded Array)
     18. Modify Primary Disk Partitions to Type RAID
     19. Add Primary Disk Partitions to RAID Arrays
     20. Modify grub.conf
     21. Remake Initial RAM Disk (Two of Two)
     22. Testing
     22.1 Test - persistent mount on degraded array (/dev/sdb software failed)
     22.2 Test - boot into degraded array (/dev/sdb software removed)
     22.3 Test - boot into degraded array (/dev/sdb software removed)
     22.4 Test - boot into degraded array (/dev/sda physically removed)
 
Applies to:

Linux Kernel - Version: 2.4 to 2.6
Linux Itanium
Linux x86-64

Goal


This article describes how to configure an Oracle Enterprise Linux System to be highly available using RAID1.

The article is intended for Linux System Administrators.

Solution


RAID (Redundant Array of Inexpensive Disks) defines the use of multiple hard disks by systems to provide increased diskspace, performance and availability. This article solely focuses on implementing RAID1, commonly referred to as mirror disk, whereby two (or more) disks contain identical content. System availability and data integrity is maintained as long as at least one disk survives a failure.

Although using working examples from Oracle Enterprise Linux 5 (OEL5), the article similarly applies to other Linux distributions and versions.

Before proceeding, take a complete backup of the system.


1. Original System Configuration


This document assumes that LVM is not used for storage management. When I started to follow this document, I had done a default installation of OEL which, unfortunately, uses LVM for volume management by default, so I couldn't use mdadm for RAID as described here. So I re-installed the box, configured the storage as plain (no LVM), and then I could follow these instructions.

I assume your storage is configured similarly as shown here:



And that GRUB is installed:



Prior to implementing RAID, the system comprised the following simple configuration:


[root@oel5-raid1-3 ~]# uname -a
Linux oel5-raid1-3 2.6.32-300.10.1.el5uek #1 SMP Wed Feb 22 17:37:40 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

[root@oel5-raid1-3 ~]# cat /etc/enterprise-release
Enterprise Linux Enterprise Linux Server release 5.8 (Carthage)

[root@oel5-raid1-3 ~]# fdisk -l

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        2479    19808145   83  Linux
/dev/sda3            2480        2610     1052257+  82  Linux swap / Solaris



[root@oel5-raid1-3 ~]# blkid
/dev/sda3: LABEL="SWAP-sda3" TYPE="swap"
/dev/sda2: LABEL="/" UUID="aedde157-1fe3-45e3-b538-8dc1193b0430" TYPE="ext3"
/dev/sda1: LABEL="/boot" UUID="e4860f1d-d717-4efd-b91e-af4d8eb05421" TYPE="ext3"



[root@oel5-raid1-3 ~]# cat /etc/fstab
LABEL=/                 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
LABEL=SWAP-sda3         swap                    swap    defaults        0 0

[root@oel5-raid1-3 ~]# mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)



[root@oel5-raid1-3 ~]# swapon -s
Filename                                Type            Size Used Priority
/dev/sda3                               partition       1052248 0       -1



2. Add Second Hard Disk


A second hard disk is added to the system. Ideally, the second disk should be exactly the same (make and model) as the first. To help avoid a single point of failure, attach the additional disk to a different disk controller from the one used by the first disk.

Our second hard disk is /dev/sdb.


[root@oel5-raid1-3 ~]# fdisk -l /dev/sdb

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table


3. Partition Second Hard Disk


The second disk must contain the same configuration (partition layout) as the first disk. Disk partitioning can be performed manually using the fdisk(8) utility, however the sfdisk(8) utility can be used to quickly and easily replicate the partition table from the first disk e.g.:


[root@oel5-raid1-3 ~]# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 2610 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/sdb: unrecognized partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *        63    208844     208782  83  Linux
/dev/sdb2        208845  39825134   39616290  83  Linux
/dev/sdb3      39825135  41929649    2104515  82  Linux swap / Solaris
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)


4. Modify Secondary Disk Partitions to Type RAID


Use the fdisk(8) utility to modify the second disk partitions from type 83/82 (linux/swap) to fd (raid) e.g.:


[root@oel5-raid1-3 ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   83  Linux
/dev/sdb2              14        2479    19808145   83  Linux
/dev/sdb3            2480        2610     1052257+  82  Linux swap / Solaris

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): fd
Changed system type of partition 3 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          13      104391   fd  Linux raid autodetect
/dev/sdb2              14        2479    19808145   fd  Linux raid autodetect
/dev/sdb3            2480        2610     1052257+  fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.



Use the partprobe(8) or sfdisk(8) utility to update the kernel with the partition type changes e.g.:

[root@oel5-raid1-3 ~]# partprobe /dev/sdb

Verify creation of the new partitions on the second disk e.g.:


[root@oel5-raid1-3 ~]# cat /proc/partitions
major minor  #blocks  name

   8        0   20971520 sda
   8        1     104391 sda1
   8        2   19808145 sda2
   8        3    1052257 sda3
   8       16   20971520 sdb
   8       17     104391 sdb1
   8       18   19808145 sdb2
   8       19    1052257 sdb3



5. Create RAID1 Arrays on Second Disk


Use the mdadm(8) utility to create a raid1 array on the second disk only e.g.:


[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities :
unused devices: <none>



[root@oel5-raid1-3 ~]# mdadm --create /dev/md1 --auto=yes --level=raid1 --raid-devices=2 /dev/sdb1 missing
mdadm: array /dev/md1 started.
[root@oel5-raid1-3 ~]# mdadm --create /dev/md2 --auto=yes --level=raid1 --raid-devices=2 /dev/sdb2 missing
mdadm: array /dev/md2 started.
[root@oel5-raid1-3 ~]# mdadm --create /dev/md3 --auto=yes --level=raid1 --raid-devices=2 /dev/sdb3 missing
mdadm: array /dev/md3 started.
[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb3[0]
      1052160 blocks [2/1] [U_]

md2 : active raid1 sdb2[0]
      19808064 blocks [2/1] [U_]

md1 : active raid1 sdb1[0]
      104320 blocks [2/1] [U_]

unused devices: <none>



In the example above, raid devices are created using the same numbering as the device partitions they include e.g. /dev/md1 contains /dev/sdb1 and device /dev/sda1 will be added later. The term missing is used as a stub or placeholder that will eventually be replaced with the partitions on the first disk; /dev/sda1, /dev/sda2, /dev/sda3.

Check /proc/partitions to confirm the new raid devices have been registered e.g.:


[root@oel5-raid1-3 ~]# cat /proc/partitions
major minor  #blocks  name

   8        0   20971520 sda
   8        1     104391 sda1
   8        2   19808145 sda2
   8        3    1052257 sda3
   8       16   20971520 sdb
   8       17     104391 sdb1
   8       18   19808145 sdb2
   8       19    1052257 sdb3
    9        1     104320 md1
   9        2   19808064 md2
   9        3    1052160 md3


6. Taking a Closer Look at RAID Devices


Use the mdadm(8) utility to review raid devices in detail e.g.:


[root@oel5-raid1-3 ~]# mdadm --query --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Fri Jan 25 23:27:05 2013
     Raid Level : raid1
     Array Size : 104320 (101.89 MiB 106.82 MB)
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Jan 25 23:27:05 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 98d72fb6:ff5130f8:647d100f:9acf7e1c
         Events : 0.1

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       0        0        1      removed



Note that raid device /dev/md1 contains only one disk member, /dev/sdb1, at this point. The state of the array is clean, degraded, denoting that only one (of two) underlying disk members is currently active and working.

7. RAID Configuration


I skipped this step, but I'm leaving the instructions here for reference.

Strictly speaking a master RAID configuration is not required. With relevant partitions being marked as type raid (fd), the kernel will auto assemble detected arrays on boot. If desired, one can create RAID configuration file /etc/mdadm.conf or /etc/mdadm/mdadm.conf as a reference to raid device usage e.g.:

# mkdir  /etc/mdadm/
# echo "DEVICE /dev/hd*[0-9] /dev/sd*[0-9]" >> /etc/mdadm/mdadm.conf
# mdadm --detail --scan >> /etc/mdadm/mdadm.conf
# ln -s /etc/mdadm/mdadm.conf /etc/mdadm.conf

# cat /etc/mdadm/mdadm.conf 
DEVICE /dev/hd*[0-9] /dev/sd*[0-9]
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=a4d5007d:6974901a:637e5622:e5b514c9
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=0e8ce9c6:bd42917d:fd3412bf:01f49095
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=7d696890:890b2eb7:c17bf4e4:d542ba99

The DEVICE filter is used to limit candidate RAID devices being created or added as RAID disk members.

8. Create Filesystems/Swap Devices on RAID devices


Once created, RAID devices are usable just like any other block device. Use the mkfs.ext3(8)/mke2fs(8) and mkswap(8) commands to create ext3 filesystems and a swap device on the RAID devices e.g.:


[root@oel5-raid1-3 ~]# mkfs.ext3 -L boot.md1 /dev/md1
mke2fs 1.39 (29-May-2006)
Filesystem label=boot.md1
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
26104 inodes, 104320 blocks
5216 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
13 block groups
8192 blocks per group, 8192 fragments per group
2008 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 25 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

[root@oel5-raid1-3 ~]# mkfs.ext3 -L root.md2 /dev/md2
mke2fs 1.39 (29-May-2006)
Filesystem label=root.md2
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
2480640 inodes, 4952016 blocks
247600 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
152 block groups
32768 blocks per group, 32768 fragments per group
16320 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

[root@oel5-raid1-3 ~]# mkswap -L swap.md3 /dev/md3
Setting up swapspace version 1, size = 1077407 kB
LABEL=swap.md3, no uuid

[root@oel5-raid1-3 ~]# blkid
/dev/sda3: LABEL="SWAP-sda3" TYPE="swap"
/dev/sda2: LABEL="/" UUID="aedde157-1fe3-45e3-b538-8dc1193b0430" TYPE="ext3"
/dev/sda1: LABEL="/boot" UUID="e4860f1d-d717-4efd-b91e-af4d8eb05421" TYPE="ext3"
/dev/sdb1: LABEL="boot.md1" UUID="56413220-d015-4f29-892d-2e4892fa355e" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdb2: LABEL="root.md2" UUID="ee39c18d-3be3-4047-9dfb-55d8b49f70d4" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdb3: TYPE="swap" LABEL="swap.md3"
/dev/md1: LABEL="boot.md1" UUID="56413220-d015-4f29-892d-2e4892fa355e" SEC_TYPE="ext2" TYPE="ext3"
/dev/md2: LABEL="root.md2" UUID="ee39c18d-3be3-4047-9dfb-55d8b49f70d4" SEC_TYPE="ext2" TYPE="ext3"
/dev/md3: TYPE="swap" LABEL="swap.md3"


To avoid confusion later, the labels added to the filesystems and swap device denote the RAID devices on which they are created.


9. Backup Current System Configuration


Beyond this point, significant changes are made to the current system, so take a backup of the core system configuration e.g.:


[root@oel5-raid1-3 ~]# cp /etc/fstab /etc/fstab.orig
[root@oel5-raid1-3 ~]# cp /boot/grub/grub.conf /boot/grub/grub.conf.orig
[root@oel5-raid1-3 ~]# mkdir /boot.orig
[root@oel5-raid1-3 ~]# sync
[root@oel5-raid1-3 ~]# cp -dpRxu /boot/* /boot.orig/



10. Mount Filesystems on RAID Devices


Mount the raided filesystems e.g.


[root@oel5-raid1-3 ~]# mkdir /boot.md1
[root@oel5-raid1-3 ~]# mount -t ext3 /dev/md1 /boot.md1
[root@oel5-raid1-3 ~]# mount | grep boot
/dev/sda1 on /boot type ext3 (rw)
/dev/md1 on /boot.md1 type ext3 (rw)
[root@oel5-raid1-3 ~]# mkdir /root.md2
[root@oel5-raid1-3 ~]# mount -t ext3 /dev/md2 /root.md2



11. Optionally mount/swapon filesystems/swap device on RAID devices as their non-RAID devices


Optionally test mount/swapon of filesystems on raided devices as their currently mounted non-raided counterparts e.g.:


[root@oel5-raid1-3 ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/sda3                               partition       1052248 0       -1
[root@oel5-raid1-3 ~]# swapoff /dev/sda3
[root@oel5-raid1-3 ~]# swapon /dev/md3
[root@oel5-raid1-3 ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md3                                partition       1052152 0       -1



Note: it is not possible to unmount/remount the root filesystem (/dev/sda2) as it's currently in use.

12. Modify fstab to Use RAID Devices


Modify the /etc/fstab file to mount/swapon raided devices on system boot.
Substitute relevant LABEL=  or /dev/sdaN entries with their corresponding /dev/mdN devices e.g.:


[root@oel5-raid1-3 ~]# vi /etc/fstab
[root@oel5-raid1-3 ~]# cat /etc/fstab
/dev/md2                /                       ext3    defaults        1 1
/dev/md1                /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/dev/md3                swap                    swap    defaults        0 0


13. Add Fallback Title to grub.conf


A fallback title allows the system to boot using one title and to fall back to another should any issues occur when booting with the first. This is particularly helpful because, without a fallback title, the system may fail to boot and a linux rescue may be needed to restore/recover the system.

Original /boot/grub/grub.conf:

[root@oel5-raid1-3 ~]# cat /boot/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda2
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=LABEL=/ rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img
title Oracle Linux Server-base (2.6.18-308.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-308.el5 ro root=LABEL=/ rhgb quiet numa=off
        initrd /initrd-2.6.18-308.el5.img

Modify the original /boot/grub/grub.conf file by adding the fallback parameter and a new boot title for the raid devices (the original title becomes the fallback) e.g.:


# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda2
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
fallback=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd1,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=/dev/md2 rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=LABEL=/ rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img



In the example above, the system is configured to boot using the first boot title (default=0) i.e. the one with /boot on the first partition of the second grub disk device (hd1,0) and specifying the root filesystem on raid device /dev/md2. Should that fail to boot, the system will fall back (fallback=1) to the second boot title i.e. the one specifying the /boot filesystem on the first partition of the first grub device (hd0,0) and the root filesystem by label (root=LABEL=/). Note that grub boot title numbering starts from zero (0).

14. Remake Initial RAM Disk (One of Two)


Use the mkinitrd(8) utility to recreate the initial ram disk. The initial ram disk must be rebuilt with raid module support to ensure the system has the required drivers to boot from raided devices e.g.:

# mv initrd-`uname -r`.img initrd-`uname -r`.img.orig
# mkinitrd -v -f initrd-`uname -r`.img `uname -r`

The commands that follow are based on the two generic commands above; the exact commands depend on your kernel version.



[root@oel5-raid1-3 ~]# cd /boot
[root@oel5-raid1-3 boot]# mv initrd-2.6.32-300.10.1.el5uek.img initrd-2.6.32-300.10.1.el5uek.img.orig
[root@oel5-raid1-3 boot]# uname -r
2.6.32-300.10.1.el5uek
[root@oel5-raid1-3 boot]# mkinitrd -v -f initrd-2.6.32-300.10.1.el5uek.img 2.6.32-300.10.1.el5uek
Creating initramfs
Modulefile is /etc/modprobe.conf
Looking for deps of module ehci-hcd
Looking for deps of module ohci-hcd
Looking for deps of module uhci-hcd
Looking for deps of module ext3
Found RAID component md2
Looking for deps of module raid1
Looking for driver for device sdb2
Looking for deps of module scsi:t-0x00
Looking for deps of module pci:v00001000d00000030sv000015ADsd00001976bc01sc00i00: scsi_transport_spi mptbase mptscsih mptspi
Looking for deps of module scsi_transport_spi
Looking for deps of module mptbase
Looking for deps of module mptscsih: mptbase
Looking for deps of module mptspi: scsi_transport_spi mptbase mptscsih
Found RAID component md3
...



[root@oel5-raid1-3 boot]# ll initrd*
-rw------- 1 root root 3558458 Jan 25 15:25 initrd-2.6.18-308.el5.img
-rw------- 1 root root 3067614 Jan 26 00:16 initrd-2.6.32-300.10.1.el5uek.img
-rw------- 1 root root 3144291 Jan 25 15:24 initrd-2.6.32-300.10.1.el5uek.img.orig



Note: another mkinitrd will be required later, after the /dev/sdaN partitions are added to the arrays.

15. Copy Contents of Non-RAID filesystems to RAID filesystems


If the raided filesystems were unmounted earlier, remount them as described in Step 10.
Copy the contents of non-raided filesystems (/boot on /dev/sda1, / on /dev/sda2) to their corresponding filesystems on raided devices (/boot.md1 on /dev/md1, /root.md2 on /dev/md2) e.g.:


[root@oel5-raid1-3 boot]# sync
[root@oel5-raid1-3 boot]# cp -dpRxu /boot/* /boot.md1
[root@oel5-raid1-3 boot]# sync
[root@oel5-raid1-3 boot]# cp -dpRxu / /root.md2



Note: there is no need to copy the contents of the swap device. The non-raided swap device (/dev/sda3) will be swapped-off on system shutdown and the raided swap device (/dev/md3) swapped-on on reboot.

16. Install/Reinstall GRUB


To cater for the situation where one or other raid disk member is either unavailable, unusable or missing, GRUB [Grand Unified Boot Loader] must be installed to the boot sector (MBR) of every raid disk member participating in an array i.e. /dev/sda, /dev/sdb. Use the grub(8) utility to install grub on the second grub disk (hd1) [/dev/sdb], currently the sole raid disk member e.g.:


[root@oel5-raid1-3 boot]# grub
Probing devices to guess BIOS drives. This may take a long time.


    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]
grub> root (hd1,0)
root (hd1,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd1)
setup (hd1)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd1)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd1) (hd1)1+15 p (hd1,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> root (hd0,0)
root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83
grub> setup (hd0)
setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> quit
quit

The instructions above install Grub on both disks - this is important.


The reference to (hd0,0) in /boot/grub/grub.conf is a grub disk reference that refers to the first disk, first partition, which in this instance is /dev/sda1, the partition that houses the non-raided /boot filesystem. Grub always references disks as (hdN) regardless of whether they are IDE or SCSI. At installation time, grub builds and stores a map of disk devices in the file /boot/grub/device.map. As the system was initially installed with only one disk present (/dev/sda), the contents of /boot/grub/device.map appear as follows:

# cat /boot/grub/device.map
# this device map was generated by anaconda
(hd0)     /dev/sda

Had the /boot filesystem been installed on, say, /dev/sda3, the grub references in /boot/grub/grub.conf would have been (hd0,2). Grub disk and partition numbering starts from zero (0), whereas disk device naming starts with 'a' e.g. /dev/hda (IDE), /dev/sda (SCSI), and partition numbering starts from 1 e.g. /dev/hda1, /dev/sda1.
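As a quick illustration using this host's two-disk layout:

(hd0)   = /dev/sda      (hd1)   = /dev/sdb
(hd0,0) = /dev/sda1     (hd1,0) = /dev/sdb1
(hd0,1) = /dev/sda2     (hd1,1) = /dev/sdb2
(hd0,2) = /dev/sda3     (hd1,2) = /dev/sdb3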

If there is any confusion regarding grub discovered devices, grub itself may be used to detect or list available devices e.g.:
# grub
Probing devices to guess BIOS drives. This may take a long time.

GNU GRUB version 0.97 (640K lower / 3072K upper memory)

[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename.]

grub> root (hd<tab key>
Possible disks are: hd0 hd1

17. Reboot the System (Degraded Array)


Reboot the system. As a precaution, be sure to have your operating system installation/rescue media on hand. During boot up, review console messages to determine which device is used to boot the system i.e. /dev/md1 {/dev/sdb1} or fallback device /dev/sda1. All going well, the system will be using the raid devices albeit in a degraded state i.e. all arrays still only contain one disk member (/dev/sdbN).
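One quick post-boot check (a sketch; the output below assumes the first, raid-based boot title was used) is to confirm which root device the kernel was actually booted with:


[root@oel5-raid1-3 ~]# cat /proc/cmdline
ro root=/dev/md2 rhgb quiet numa=off


Had the fallback title been used, root=LABEL=/ would appear here instead.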


[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[0]
      104320 blocks [2/1] [U_]

md3 : active raid1 sdb3[0]
      1052160 blocks [2/1] [U_]

md2 : active raid1 sdb2[0]
      19808064 blocks [2/1] [U_]

unused devices: <none>
[root@oel5-raid1-3 ~]# mount | grep md
/dev/md2 on / type ext3 (rw)
/dev/md1 on /boot type ext3 (rw)
[root@oel5-raid1-3 ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md3                                partition       1052152 0       -1


Further verify that no /dev/sdaN partitions are used e.g.:


[root@oel5-raid1-3 ~]# mount | grep sda
[root@oel5-raid1-3 ~]#


If you did not add a fallback title as described in step 13 and experienced booting issues, perform a linux rescue to restore/recover the system.

18. Modify Primary Disk Partitions to Type RAID


In preparation for adding /dev/sdaN partitions to their respective arrays, use the fdisk(8) utility to modify the primary disk partitions from type 83/82 (linux/swap) to fd (raid) e.g.:


[root@oel5-raid1-3 ~]# fdisk /dev/sda

The number of cylinders for this disk is set to 2610.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        2479    19808145   83  Linux
/dev/sda3            2480        2610     1052257+  82  Linux swap / Solaris

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): fd
Changed system type of partition 3 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14        2479    19808145   fd  Linux raid autodetect
/dev/sda3            2480        2610     1052257+  fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.




[root@oel5-raid1-3 ~]# fdisk -l /dev/sda

Disk /dev/sda: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   fd  Linux raid autodetect
/dev/sda2              14        2479    19808145   fd  Linux raid autodetect
/dev/sda3            2480        2610     1052257+  fd  Linux raid autodetect


Use the partprobe(8) or sfdisk(8) utility to update the kernel with the partition type changes e.g.:

[root@oel5-raid1-3 ~]# partprobe /dev/sda
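
Alternatively, sfdisk can ask the kernel to re-read the partition table, and /proc/partitions confirms the kernel still lists all three sda partitions. (A sketch; the -R re-read flag is assumed to be present in this release's sfdisk.)

[root@oel5-raid1-3 ~]# sfdisk -R /dev/sda
[root@oel5-raid1-3 ~]# grep sda /proc/partitions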

19. Add Primary Disk Partitions to RAID Arrays


Once the system has successfully booted using raid (i.e. the secondary disk), use the mdadm(8) utility to add the primary disk partitions to their respective arrays. All data on /dev/sdaN partitions will be destroyed in the process.


[root@oel5-raid1-3 ~]# mdadm --manage --add /dev/md2 /dev/sda2
mdadm: added /dev/sda2
[root@oel5-raid1-3 ~]# mdadm --manage --add /dev/md1 /dev/sda1
mdadm: added /dev/sda1
[root@oel5-raid1-3 ~]# mdadm --manage --add /dev/md3 /dev/sda3
mdadm: added /dev/sda3



[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[2] sdb1[0]
      104320 blocks [2/1] [U_]
        resync=DELAYED

md3 : active raid1 sda3[2] sdb3[0]
      1052160 blocks [2/1] [U_]
        resync=DELAYED

md2 : active raid1 sda2[2] sdb2[0]
      19808064 blocks [2/1] [U_]
      [=======>.............]  recovery = 35.7% (7073472/19808064) finish=1.1min speed=175929K/sec

unused devices: <none>



Depending on the size of partitions/disks used, data synchronisation between raid disk members may take a long time. Use the watch(1) command to monitor disk synchronisation progress e.g.:


[root@oel5-raid1-3 ~]# watch -n 5 cat /proc/mdstat


...

Once complete, /proc/mdstat should denote clean and consistent arrays each with two active, working members e.g.:


[root@oel5-raid1-3 ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[1] sdb1[0]
      104320 blocks [2/2] [UU]

md3 : active raid1 sda3[1] sdb3[0]
      1052160 blocks [2/2] [UU]

md2 : active raid1 sda2[1] sdb2[0]
      19808064 blocks [2/2] [UU]

unused devices: <none>


20. Modify grub.conf


Once all /dev/sdaN partitions are added as disk members of their respective arrays, modify the /boot/grub/grub.conf file. Substitute the previous reference to LABEL=/ in the second boot title with raid device /dev/md2 e.g.:


[root@oel5-raid1-3 ~]# cat /boot/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda2
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
fallback=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd1,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=/dev/md2 rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img
title Oracle Linux Server (2.6.32-300.10.1.el5uek)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-300.10.1.el5uek ro root=/dev/md2 rhgb quiet numa=off
        initrd /initrd-2.6.32-300.10.1.el5uek.img



21. Remake Initial RAM Disk (Two of Two)


Use the mkinitrd(8) utility to recreate the initial ram disk (again) e.g.:


[root@oel5-raid1-3 ~]# cd /boot
[root@oel5-raid1-3 boot]# mv initrd-2.6.32-300.10.1.el5uek.img initrd-2.6.32-300.10.1.el5uek.img.orig
[root@oel5-raid1-3 boot]# uname -r
2.6.32-300.10.1.el5uek
[root@oel5-raid1-3 boot]# mkinitrd -v -f initrd-2.6.32-300.10.1.el5uek.img 2.6.32-300.10.1.el5uek
Creating initramfs
Modulefile is /etc/modprobe.conf
Looking for deps of module ehci-hcd
Looking for deps of module ohci-hcd
Looking for deps of module uhci-hcd
Looking for deps of module ext3
Found RAID component md2
Looking for deps of module raid1
Looking for driver for device sdb2
Looking for deps of module scsi:t-0x00
Looking for deps of module pci:v00001000d00000030sv000015ADsd00001976bc01sc00i00: scsi_transport_spi mptbase mptscsih mptspi
Looking for deps of module scsi_transport_spi
Looking for deps of module mptbase
Looking for deps of module mptscsih: mptbase
Looking for deps of module mptspi: scsi_transport_spi mptbase mptscsih
Found RAID component md3
...


At this point, the system is up and running using raid1 devices for the / and /boot filesystems and the swap device.
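As a final sanity check (a minimal sketch; output omitted here), the arrays, mounts and swap can be reviewed in one pass:


[root@oel5-raid1-3 boot]# mdadm --detail --scan
[root@oel5-raid1-3 boot]# cat /proc/mdstat
[root@oel5-raid1-3 boot]# mount | grep md
[root@oel5-raid1-3 boot]# swapon -s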

22. Testing


I did not test this section exactly as described - YMMV.

Before relying on the newly configured system, test the system for proper operation and increased availability.

Suggested testing includes:
boot from alternate boot title (clean array)
persistent mount on degraded array (/dev/sdb software failed)
boot into degraded array (/dev/sdb software removed)
boot into degraded array (/dev/sda physically removed)
Primary diagnostics to monitor during testing include (see the sketch after this list):
console messages
dmesg
/proc/mdstat
mdadm --query --detail <md dev>
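

A minimal sketch that gathers most of these diagnostics in one pass (device names as configured above):

[root@oel5-raid1-3 ~]# for md in /dev/md1 /dev/md2 /dev/md3; do mdadm --query --detail $md; done
[root@oel5-raid1-3 ~]# cat /proc/mdstat
[root@oel5-raid1-3 ~]# dmesg | tail -50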


22.1 Test - boot from alternate boot title (clean array)


As part of configuring the system to use raid, you will have already tested booting the system from the second disk i.e. /dev/md1 {/dev/sdb1}. For this test, modify the /boot/grub/grub.conf to boot the system using the first disk, /dev/md1 {/dev/sda1} i.e.:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda2
#          initrd /initrd-version.img
#boot=/dev/sda
default=1
fallback=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Enterprise Linux (2.6.18-92.el5)
        root (hd1,0)
        kernel /vmlinuz-2.6.18-92.el5 ro root=/dev/md2 3
        initrd /initrd-2.6.18-92.el5.img
title Enterprise Linux (2.6.18-92.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-92.el5 ro root=/dev/md2 3
        initrd /initrd-2.6.18-92.el5.img

Note the changes to the default and fallback parameter values.


22.2 Test - persistent mount on degraded array (/dev/sdb software failed)


Verify that the /, /boot filesystems and swap device remain active, usable and writable after failing the second disk member of each raid array e.g.:
# mdadm --manage --fail /dev/md1 /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md1
# mdadm --manage --fail /dev/md2 /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md2
# mdadm --manage --fail /dev/md3 /dev/sdb3
mdadm: set /dev/sdb3 faulty in /dev/md3

# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb1[2](F) sda1[1]
      1052160 blocks [2/1] [_U]
     
md3 : active raid1 sdb3[2](F) sda3[1]
      1052160 blocks [2/1] [_U]
     
md2 : active raid1 sdb2[2](F) sda2[1]
      5116608 blocks [2/1] [_U]
     
unused devices: <none>

# mdadm --query --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Tue Dec 30 21:46:44 2008
     Raid Level : raid1
     Array Size : 1052160 (1027.67 MiB 1077.41 MB)
  Used Dev Size : 1052160 (1027.67 MiB 1077.41 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Dec 31 08:57:33 2008
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0

           UUID : a4d5007d:6974901a:637e5622:e5b514c9
         Events : 0.70

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8        1        1      active sync   /dev/sda1
       2       8       17        -      faulty spare   /dev/sdb1

# dmesg
...
raid1: Disk failure on sdb1, disabling device.
        Operation continuing on 1 devices
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sdb1
 disk 1, wo:0, o:1, dev:sda1
RAID1 conf printout:
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sda1
raid1: Disk failure on sdb2, disabling device.
        Operation continuing on 1 devices
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sdb2
 disk 1, wo:0, o:1, dev:sda2
RAID1 conf printout:
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sda2
raid1: Disk failure on sdb3, disabling device.
        Operation continuing on 1 devices
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:0, dev:sdb3
 disk 1, wo:0, o:1, dev:sda3
RAID1 conf printout:
 --- wd:1 rd:2
 disk 1, wo:0, o:1, dev:sda3

# mount | grep md
/dev/md2 on / type ext3 (rw)
/dev/md1 on /boot type ext3 (rw)

# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md3                                partition       1052152 0       -1


22.3 Test - boot into degraded array (/dev/sdb software removed)


Having software-failed the second raid disk member (/dev/sdb), software-remove its partitions from each array, then test a successful system boot e.g.:

# mdadm --manage --remove /dev/md1 /dev/sdb1
mdadm: hot removed /dev/sdb1
# mdadm --manage --remove /dev/md2 /dev/sdb2
mdadm: hot removed /dev/sdb2
# mdadm --manage --remove /dev/md3 /dev/sdb3
mdadm: hot removed /dev/sdb3

# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[1]
      1052160 blocks [2/1] [_U]
     
md3 : active raid1 sda3[1]
      1052160 blocks [2/1] [_U]
     
md2 : active raid1 sda2[1]
      5116608 blocks [2/1] [_U]
     
unused devices: <none>

# mdadm --query --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Tue Dec 30 21:46:44 2008
     Raid Level : raid1
     Array Size : 1052160 (1027.67 MiB 1077.41 MB)
  Used Dev Size : 1052160 (1027.67 MiB 1077.41 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Wed Dec 31 09:06:21 2008
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : a4d5007d:6974901a:637e5622:e5b514c9
         Events : 0.72

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8        1        1      active sync   /dev/sda1

# dmesg
...
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdb2>
md: export_rdev(sdb2)
md: unbind<sdb3>
md: export_rdev(sdb3)

# mount | grep md
/dev/md2 on / type ext3 (rw)
/dev/md1 on /boot type ext3 (rw)

# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md3                                partition       1052152 0       -1

# shutdown -r now
...

On reboot, add the failed/removed second disk back to the arrays e.g.:
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1[1]
      1052160 blocks [2/1] [_U]
     
md3 : active raid1 sda3[1]
      1052160 blocks [2/1] [_U]
     
md2 : active raid1 sda2[1]
      5116608 blocks [2/1] [_U]
     
unused devices: <none>

# mdadm --manage --add /dev/md1 /dev/sdb1
mdadm: re-added /dev/sdb1
# mdadm --manage --add /dev/md2 /dev/sdb2
mdadm: re-added /dev/sdb2
# mdadm --manage --add /dev/md3 /dev/sdb3
mdadm: re-added /dev/sdb3

# dmesg
...
md: bind<sdb1>
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:1, dev:sdb1
 disk 1, wo:0, o:1, dev:sda1
md: syncing RAID array md1
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 1052160 blocks.
md: bind<sdb2>
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:1, dev:sdb2
 disk 1, wo:0, o:1, dev:sda2
md: delaying resync of md2 until md1 has finished resync (they share one or more physical units)
md: bind<sdb3>
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:1, o:1, dev:sdb3
 disk 1, wo:0, o:1, dev:sda3
md: delaying resync of md3 until md1 has finished resync (they share one or more physical units)
md: delaying resync of md2 until md1 has finished resync (they share one or more physical units)

# watch -n 15 cat /proc/mdstat


22.4 Test - boot into degraded array (/dev/sda physically removed)


Similar to tests 22.2 and 22.3, test for ongoing system operation and then successful system boot after physically removing one or other raid member disk e.g.:
# mdadm --manage --fail /dev/md1 /dev/sda1
mdadm: set /dev/sda1 faulty in /dev/md1
# mdadm --manage --fail /dev/md2 /dev/sda2
mdadm: set /dev/sda2 faulty in /dev/md2
# mdadm --manage --fail /dev/md3 /dev/sda3
mdadm: set /dev/sda3 faulty in /dev/md3

# mdadm --manage --remove /dev/md1 /dev/sda1
mdadm: hot removed /dev/sda1
# mdadm --manage --remove /dev/md2 /dev/sda2
mdadm: hot removed /dev/sda2
# mdadm --manage --remove /dev/md3 /dev/sda3
mdadm: hot removed /dev/sda3


Follow Note.603868.1 to dynamically remove (hot unplug) disk devices from the system. Alternatively, shut down the system and physically remove device /dev/sda before rebooting. On boot, dynamically add /dev/sda back as a raid disk member, then repeat the same test but physically remove the second disk member /dev/sdb. This test not only validates the fallback boot title, but also emulates online replacement of a failed hard disk.
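
Purely as an illustrative sketch (not a substitute for the note, and the SCSI host number is an assumption that varies per system), hot unplug on a 2.6 kernel typically means software-detaching the disk via sysfs once it has been failed/removed from all arrays, then rescanning the owning SCSI host once the replacement disk is attached:

# echo 1 > /sys/block/sda/device/delete
# echo "- - -" > /sys/class/scsi_host/host0/scan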

Once satisfied, deploy the system for production use.