
Mirroring the operating system

In the steps below, I'm using DiskSuite to mirror the active root disk (c0t0d0) to a mirror (c0t1d0). I'm assuming that slices five and six of each disk have a couple of cylinders free for DiskSuite's state database replicas.

Introduction

First, we start with a filesystem layout that looks as follows:
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0t0d0s0    6607349  826881 5714395    13%    /
/proc                      0       0       0     0%    /proc
fd                         0       0       0     0%    /dev/fd
mnttab                     0       0       0     0%    /etc/mnttab
/dev/dsk/c0t0d0s4    1016863    8106  947746     1%    /var
swap                 1443064       8 1443056     1%    /var/run
swap                 1443080      24 1443056     1%    /tmp
We're going to be mirroring from c0t0d0 to c0t1d0. When the operating system was installed, we created unassigned slices five, six, and seven of roughly 10 MB each. We will use slices five and six for the DiskSuite state database replicas. The output from the "format" command is as follows:
# format 
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SEAGATE-ST19171W-0024 cyl 5266 alt 2 hd 20 sec 168>
          /pci@1f,4000/scsi@3/sd@0,0
       1. c0t1d0 <SEAGATE-ST19171W-0024 cyl 5266 alt 2 hd 20 sec 168>
          /pci@1f,4000/scsi@3/sd@1,0
Specify disk (enter its number): 0

selecting c0t0d0
[disk formatted]
...
partition> p
Current partition table (original):
Total disk cylinders available: 5266 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       0 - 3994        6.40GB    (3995/0/0) 13423200
  1       swap    wu    3995 - 4619        1.00GB    (625/0/0)   2100000
  2     backup    wm       0 - 5265        8.44GB    (5266/0/0) 17693760
  3 unassigned    wu       0               0         (0/0/0)           0
  4        var    wm    4620 - 5244        1.00GB    (625/0/0)   2100000
  5 unassigned    wm    5245 - 5251       11.48MB    (7/0/0)       23520
  6 unassigned    wm    5252 - 5258       11.48MB    (7/0/0)       23520
  7 unassigned    wm    5259 - 5265       11.48MB    (7/0/0)       23520

DiskSuite Mirroring

Note that much of the process of mirroring the root disk has been automated with the sdsinstall script. With the exception of the creation of device aliases, all of the work done in the following steps can be achieved via the following:

# ./sdsinstall -p c0t0d0 -s c0t1d0 -m s5 -m s6

  1. Copy the partition table from the primary disk to the mirror so that both are identical:

    # prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2
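
    To confirm afterwards that the two slice tables really do agree, note that prtvtoc embeds the device path in its comment header, so a byte-for-byte diff of the raw outputs will always differ. A sketch like the following (the vtocs_match helper is hypothetical, not part of DiskSuite) strips the comment lines first:

    ```shell
    # vtocs_match DISK1 DISK2 -- succeed if both disks carry the same
    # slice table.  prtvtoc prefixes comment lines (including the
    # device path) with "*", so those lines are stripped before comparing.
    vtocs_match() {
        a=`prtvtoc "/dev/rdsk/${1}s2" | grep -v '^\*'`
        b=`prtvtoc "/dev/rdsk/${2}s2" | grep -v '^\*'`
        [ "$a" = "$b" ]
    }

    # Example: vtocs_match c0t0d0 c0t1d0 || echo "VTOCs differ"
    ```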

  2. Add the state database replicas. For redundancy, each disk has two state database replicas.

    # metadb -a -f c0t0d0s5
    # metadb -a c0t0d0s6
    # metadb -a c0t1d0s5
    # metadb -a c0t1d0s6

    Note that there appears to be a lot of confusion regarding the recommended number and location of state database replicas. According to the DiskSuite reference manual:

    State database replicas contain configuration and status information of all metadevices and hot spares. Multiple copies (replicas) are maintained to provide redundancy. Multiple copies also prevent the database from being corrupted during a system crash (at most, one copy of the database will be corrupted).

    State database replicas are also used for mirror resync regions. Too few state database replicas relative to the number of mirrors may cause replica I/O to impact mirror performance.

    At least three replicas are recommended. DiskSuite allows a maximum of 50 replicas. The following guidelines are recommended:

    • For a system with only a single drive: put all 3 replicas in one slice.

    • For a system with two to four drives: put two replicas on each drive.

    • For a system with five or more drives: put one replica on each drive.

    In general, it is best to distribute state database replicas across slices, drives, and controllers, to avoid single points-of-failure.

    Each state database replica occupies 517 KB (1034 disk sectors) of disk storage by default. Replicas can be stored on a dedicated disk partition, a partition which will be part of a metadevice, or a partition which will be part of a logging device.

    Note - Replicas cannot be stored on the root (/), swap, or /usr slices, or on slices containing existing file systems or data.

    Starting with DiskSuite 4.2.1, an optional /etc/system parameter allows DiskSuite to boot with only 50% of the state database replicas online. For example, if one of the two boot disks were to fail, only two of the four state database replicas would be available. Without this parameter (or with older versions of DiskSuite), the system would complain of "insufficient state database replicas", and manual intervention would be required at boot. To enable the "50% boot" behaviour with DiskSuite 4.2.1, execute the following command:

    # echo "set md:mirrored_root_flag=1" >> /etc/system
  3. Define the metadevices for c0t0d0s0 (/):

    # metainit -f d10 1 1 c0t0d0s0
    # metainit -f d20 1 1 c0t1d0s0
    # metainit d0 -m d10

    The metaroot command edits the /etc/vfstab and /etc/system files:

    # metaroot d0

    Define the metadevices for c0t0d0s1 (swap):

    # metainit -f d11 1 1 c0t0d0s1
    # metainit -f d21 1 1 c0t1d0s1
    # metainit d1 -m d11

    Define the metadevices for c0t0d0s4 (/var):

    # metainit -f d14 1 1 c0t0d0s4
    # metainit -f d24 1 1 c0t1d0s4
    # metainit d4 -m d14
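
    The three pairs of definitions above all follow the same pattern: d1X is the submirror on the primary disk, d2X the submirror on the mirror disk, and dX the mirror itself, where X is the slice number. A short loop makes the convention explicit (a sketch; the gen_metainit helper is illustrative and only prints the commands rather than running them):

    ```shell
    # Print the metainit commands for each mirrored slice of a disk pair.
    # Naming convention: d1X = primary submirror, d2X = secondary
    # submirror, dX = the mirror itself, where X is the slice number.
    gen_metainit() {
        primary=$1; mirror=$2; shift 2
        for s in "$@"; do
            echo "metainit -f d1$s 1 1 ${primary}s$s"
            echo "metainit -f d2$s 1 1 ${mirror}s$s"
            echo "metainit d$s -m d1$s"
        done
    }

    gen_metainit c0t0d0 c0t1d0 0 1 4
    ```

    Reviewing the generated commands before feeding them to a shell is safer than piping them straight into sh.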

  4. Edit /etc/vfstab so that it references the DiskSuite metadevices instead of simple slices:

    #device           device          mount   FS      fsck    mount   mount
    #to mount         to fsck         point   type    pass    at boot options
    #
    fd               -                /dev/fd fd      -       no      -
    /proc            -                /proc   proc    -       no      -
    /dev/md/dsk/d1   -                -       swap    -       no      -
    /dev/md/dsk/d0   /dev/md/rdsk/d0  /       ufs     1       no      logging
    /dev/md/dsk/d4   /dev/md/rdsk/d4  /var    ufs     1       no      logging
    swap             -                /tmp    tmpfs   -       yes     -
    
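    After editing, it is worth confirming that no entry still references a raw slice of the primary disk. A sketch (check_vfstab is a hypothetical helper, not a DiskSuite command):

    ```shell
    # check_vfstab FILE DISK -- fail if FILE still mounts raw slices of DISK.
    check_vfstab() {
        if grep "/dev/dsk/${2}s" "$1" >/dev/null; then
            echo "WARNING: $1 still references raw slices of $2" >&2
            return 1
        fi
        return 0
    }

    # Example: check_vfstab /etc/vfstab c0t0d0
    ```
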
  5. Reboot the system:

    # lockfs -fa

    # sync;sync;sync;init 6

  6. After the system reboots from the metadevices for /, /var, and swap, set up mirrors:

    # metattach d0 d20
    # metattach d1 d21
    # metattach d4 d24

    The process of synchronizing the data to the mirror disk will take a while. You can monitor its progress via the command:

    # metastat|grep -i progress
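
    To block until the resync finishes, for example from a script, the same check can be wrapped in a loop (a sketch; wait_for_sync is not a DiskSuite command, and the 60-second polling interval is arbitrary):

    ```shell
    # wait_for_sync -- return once metastat no longer reports a
    # resync in progress, polling once a minute.
    wait_for_sync() {
        while metastat | grep -i progress >/dev/null; do
            sleep 60
        done
    }
    ```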
  7. Capture the DiskSuite configuration in the text file md.tab. With Solaris 2.6 and Solaris 7, this text file resides in the directory /etc/opt/SUNWmd; however, more recent versions of Solaris place the file in the /etc/lvm directory. We'll assume that we're running Solaris 8 here:

    # metastat -p | tee /etc/lvm/md.tab

  8. In order for the system to be able to dump core in the event of a panic, the dump device needs to reference the DiskSuite metadevice:

    # dumpadm -d /dev/md/dsk/d1
  9. If the primary boot disk should fail, make it easy to boot from the mirror. Some sites choose to alter the OBP "boot-device" variable; in this case, we choose to simply define the device aliases "sds-root" and "sds-mirror". In the event that the primary boot device ("disk" or "sds-root") should fail, the administrator simply needs to type "boot sds-mirror" at the OBP prompt.

    Determine the device path to the boot devices for both the primary and mirror:

    # ls -l /dev/dsk/c0t0d0s0 /dev/dsk/c0t1d0s0
    lrwxrwxrwx   1 root     root          41 Oct 17 11:48 /dev/dsk/c0t0d0s0 -> ../..
    /devices/pci@1f,4000/scsi@3/sd@0,0:a
    lrwxrwxrwx   1 root     root          41 Oct 17 11:48 /dev/dsk/c0t1d0s0 -> ../..
    /devices/pci@1f,4000/scsi@3/sd@1,0:a
    

    Use the device paths to define the sds-root and sds-mirror device aliases (note that we use the label "disk" instead of "sd" in the device alias path):

    # eeprom "nvramrc=devalias sds-root /pci@1f,4000/scsi@3/disk@0,0
    devalias sds-mirror /pci@1f,4000/scsi@3/disk@1,0"
    # eeprom "use-nvramrc?=true"
    

    Test the process of booting from either sds-root or sds-mirror.

Once the above sequence of steps has been completed, the system will look as follows:

# metadb
        flags           first blk       block count
     a m  p  luo        16              1034            /dev/dsk/c0t0d0s5
     a    p  luo        16              1034            /dev/dsk/c0t0d0s6
     a    p  luo        16              1034            /dev/dsk/c0t1d0s5
     a    p  luo        16              1034            /dev/dsk/c0t1d0s6

# df -k
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/md/dsk/d0       6607349  845208 5696068    13%    /
/proc                      0       0       0     0%    /proc
fd                         0       0       0     0%    /dev/fd
mnttab                     0       0       0     0%    /etc/mnttab
/dev/md/dsk/d4       1016863    8414  947438     1%    /var
swap                 1443840       8 1443832     1%    /var/run
swap                 1443848      16 1443832     1%    /tmp

Trans metadevices for logging

UFS filesystem logging was first supported with Solaris 7. Prior to that release, one could create trans metadevices with DiskSuite to achieve the same effect. For Solaris 7 and up, it's much easier to simply enable ufs logging by adding the word "logging" to the last field of the /etc/vfstab file. The following section is included for those increasingly rare Solaris 2.6 installations.

The following two steps assume that you have an available slice 3 (<= 64 MB) on each disk for logging.

  1. Define the trans metadevice mirror (c0t0d0s3):

    # metainit d13 1 1 c0t0d0s3
    # metainit d23 1 1 c0t1d0s3
    # metainit d3 -m d13
    # metattach d3 d23

  2. Make /var use the trans metadevice for logging:

    # metainit -f d64 -t d4 d3

    Edit vfstab as follows:

    /dev/md/dsk/d64 /dev/md/rdsk/d64 /var ufs 1 no -

    Ensure that no volumes are syncing before running the following:

    # sync;sync;sync;init 6