Table of Contents
1.2.1 Backup Procedure
1.2.2 Restore Procedure
1.3 Common System and Data Disks
1.4 Separate System and Data Disks
1.5 Configure and Install RAID
1.7 Verify Email Notifications
1.8 Controlling RAID
1.9 Automatically Restore Failed RAID Disk
1.10 Manually Restore Failed RAID Disk
1.10.1 Create RAID Partition on Replacement Disk
1.10.2 Add Replacement Disk to RAID
RAID (Redundant Array of Independent Disks) is a data storage virtualization technology that combines multiple physical disk partitions into a single logical unit for the purposes of data redundancy (reliability), performance improvements, or both.
This script creates a RAID 1 array from two equal-size disk partitions and installs / configures email notifications that report RAID status whenever anything changes (such as disk errors / failure). Combined with regular backups, RAID greatly reduces the risk of data loss: a failed disk can be replaced without losing data.
An overview of RAID is available at Wikipedia.
RAID is desirable for mission critical applications: a single disk failure does not cause loss of data / services, and it leaves time to replace the failed disk with no data loss.
RAID disks are intended for extra bulk data storage such as websites, shared files, media (such as ZoneMinder security camera footage), virtual machines and docker containers. SecureOffice boot and rootfs partitions are not part of any RAID array since they are easily replaced from backups in the event of a system disk failure.
(Optional) If either of the RAID target partitions / disks contains important data, it needs to be backed up to another location and restored to the RAID array once RAID installation is complete. If multiple partitions need backup / restore, repeat this procedure for each partition.
In default configuration, SecureOffice boot and system partitions are on the same disk as bulk data storage (for virtual machines, websites, file server data, etc) using partition 4 of the system disk. Partition 4 of the system disk will be configured as a RAID disk component.
Requirements:
Skip to Section 1.5.
In this configuration, SecureOffice boot and system partitions are on a separate device such as mSATA. Bulk data storage (for virtual machines, websites, file servers, data, etc) uses two additional SATA disks configured for RAID. Each additional disk will have two partitions: swap (partition 3) and RAID (partition 4). The size of the RAID partition (using both disks) will be the size of the smaller disk less the size of the swap partition (default: 8GB).
Requirements:
In subsequent instructions, DRIVE1 and DRIVE2 will be the disks containing the RAID disk components / partitions. They need to be identified for use by the RAID installation script.
If using common system and data disks (you added one disk for RAID), <DRIVE1> will be the existing system disk and <DRIVE2> will be the new disk. Assuming your boot / system disk is "/dev/sda" and your new disk is "/dev/sdb", <DRIVE1> will be "/dev/sda" and <DRIVE2> will be "/dev/sdb".
If two new SATA disks were added for RAID, <DRIVE1> will be the lower (b less than c) new disk. <DRIVE2> will be the higher (c greater than b) new disk. Assuming the two new disks were detected as "/dev/sdb" and "/dev/sdc":
Using the data partition (4) of the boot / system disk as a RAID member provides the option (RAID script will prompt) of backing up the system partitions (1 to 3) to the additional disk. This provides a failsafe boot option (select as primary boot disk in BIOS setup) should the system disk fail. To use this option, <DRIVE1> value (below) must be the boot / system disk. This will result in partitions 1 to 3 (boot, rootfs, swap) of both disks being identical, with partition 4 of each disk being RAID members. Note that the system partitions of the additional disk will not be updated (configuration, packages) as the system partitions of the primary disk change. This is the recommended configuration.
Choosing to not backup partitions 1 to 3 of the boot / system disk to the secondary disk will result in following partition layout:
The disks (DRIVE1, DRIVE2) corresponding to the two disks to be used for RAID need to be identified, by following these identify disk instructions.
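If the linked instructions are unavailable, the detected disks can also be listed directly. A minimal sketch, assuming BusyBox fdisk is present (as on a default SecureOffice / OpenWrt install); run as root:

```shell
# List every detected disk with its size; output lines look like
# "Disk /dev/sda: 238.49 GiB, ...". Match disks against their known
# sizes / models to identify DRIVE1 and DRIVE2.
list_disks() {
    fdisk -l 2>/dev/null | grep '^Disk /dev/sd'
}
list_disks || true   # prints nothing when run without root privileges
```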
SecureOffice must be connected to the internet from your registered domain. To test this, enter "ping <www.your_domain>". The result should be the WAN IP address of SecureOffice (WAN Topology) or of the main router (LAN Topology). If not, DDNS is not working and needs to be configured or checked.
Enter the following (one line) at a SecureOffice command prompt to download the RAID installation script:
"cd /tmp; sget Files/do_raid.sh; chmod +x do_raid.sh" ("sget" is not a typo, it is a script that wraps wget by providing path prefix and download credentials)
The following values at script beginning must be changed to adapt to your installation (nano /tmp/do_raid.sh).
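A hypothetical example of the kind of values to set; the authoritative variable names and defaults are in the script header itself, so verify against your copy of do_raid.sh:

```shell
# Example values for a common system / data disk layout; adjust for your
# disks. Email addresses are placeholders.
DRIVE1="/dev/sda"           # boot / system disk (first RAID member)
DRIVE2="/dev/sdb"           # added disk (second RAID member)
RAID_MOUNT="/home/data"     # mount point for /dev/md0
EMAIL_FROM="raid@your_domain"   # sender address for RAID status emails
EMAIL_TO="you@your_domain"      # recipient for RAID status emails
```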
Enter the following command to install and configure RAID:
"./do_raid.sh"
Answer prompts / questions as follows:
Common system disk and first RAID component / partition:
Separate system disk and RAID components / partitions:
Follow the prompts and fix any errors (such as "RAID device already exists") until installation succeeds. At the end of RAID installation, the result is:
Any errors can be diagnosed from the script output and log (/var/log/do_raid.log).
Enter the following commands:
"mount" You should see "/dev/md0 on"the value specified for "RAID_MOUNT" above.
If "/dev/md0" is not mounted, enter "logread | grep md" for diagnostic information and fix any problems.
After your next reboot, repeat the "mount" command above to confirm that the RAID disk auto mounts at startup and the swap partition from DRIVE2 is active (Enter "swapon -s"). If not, repeat the "logread" command for diagnostic information and fix any problems.
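The post-reboot checks above can be run as one short sequence ("|| true" simply keeps the sequence scriptable when a check finds nothing):

```shell
# Confirm the RAID array is mounted and both swap partitions are active.
mount | grep md0 || true    # expect "/dev/md0 on <RAID_MOUNT> ..."
swapon -s || true           # swap partitions from both disks should be listed
logread | grep md || true   # kernel / mdadm messages if something is wrong
```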
After RAID installation, you should have received two status emails (they may be delayed by mail servers). If not, your outgoing email server may need additional configuration.
Send a test email with debug enabled by entering the following command (omitting outer quotes):
'echo -e "From: <EMAIL_FROM>\nSubject: Raid Status\n\n Raid Installed!" | msmtp --debug --timeout=10 -t <EMAIL_TO>'
Replace <EMAIL_FROM> and <EMAIL_TO> with the RAID script configuration values entered above. Debug output from the above command should provide clues regarding any issues.
If the email is not received, do an internet search "msmtp your_email_provider" to determine any additional configuration required.
Once required configuration additions / changes are identified, edit ("nano /etc/msmtprc") to enter any changes required.
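As a reference point, a minimal msmtp configuration for a generic provider looks like the following; every host, port, and address below is a placeholder to be replaced with your provider's actual values:

```
# Hypothetical /etc/msmtprc; all values are placeholders.
defaults
auth           on
tls            on
tls_trust_file /etc/ssl/certs/ca-certificates.crt
logfile        /var/log/msmtp.log

account        provider
host           smtp.example.com
port           587
from           raid@example.com
user           raid@example.com
password       YOUR_PASSWORD

account default : provider
```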
Further information regarding msmtp email configuration / debugging is available at the Arch Wiki.
SecureOffice / OpenWrt RAID installation uses a standard initscript ("/etc/init.d/mdadm") for controlling the RAID array. The RAID UUID and device ("/dev/md0") is specified in "/etc/config/mdadm". The RAID mount point (default: "/home/data") for the RAID disk (/dev/md0) is specified in "/etc/config/fstab".
After RAID is installed, the RAID disk is active and mounted at (default: /home/data, or script value RAID_MOUNT) and configured to auto mount at boot.
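The running array and its persisted configuration can be inspected at any time; a quick set of read-only checks (the UCI grep assumes the default OpenWrt layout of "/etc/config"):

```shell
# Inspect RAID health and where its configuration is stored.
cat /proc/mdstat 2>/dev/null || true          # quick health summary
mdadm --detail /dev/md0 2>/dev/null || true   # devices, UUID, sync state
grep -r md0 /etc/config/ 2>/dev/null || true  # mdadm and fstab UCI entries
```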
During system backup (GUI: "Backup / Flash Firmware -> Backup"), all RAID configuration is saved. During system restore (GUI: "Backup / Flash Firmware -> Restore"), all packages required for RAID are re-installed and configuration restored. Script "/etc/backup/raid" was installed by the RAID script and is used by system backup / restore.
What is not restored is the RAID disk contents (bulk data) which must be backed up / restored by an alternate method. The backup / restore method used by docker-nextcloud can serve as an example for backing up / restoring bulk data such as virtual machines.
Every time a significant RAID event such as SecureOffice boot (RAID mounted) or RAID disk failure occurs, an email will be received containing details.
When a RAID disk member fails, the RAID array will continue to run using the redundant (good) RAID disk member. This allows time to replace the faulty disk without losing data (the whole point of RAID).
To identify a failed RAID disk:
Enter "cat /proc/mdstat". If both RAID members (disks) are active, you will see output of the form below. The "[UU]" result indicates both disks are good.
root@SecureOffice-x86_64:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10]
md0 : active raid1 sdb4[1] sda4[0]
201745984 blocks super 1.2 [2/2] [UU]
If one of the RAID members has failed, you will see output of the form below: "(F)" will appear next to the failed disk and "[_U]" or "[U_]" will be reported. In this example, "/dev/sdb4" has failed and needs to be replaced.
root@SecureOffice-x86_64:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10]
md0 : active raid1 sdb4[1](F) sda4[0]
201745984 blocks super 1.2 [2/1] [U_]
The failed disk must be removed from the RAID array. Enter "mdadm --manage /dev/md0 --remove /dev/sdX4" where "X" is the identity (a, b, etc) of the failed disk reported above.
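Depending on the failure, mdadm may refuse to remove a member the kernel has not yet marked faulty; marking it failed first is harmless if it is already flagged. A sketch, with "sdb4" as an assumed example of the failed member:

```shell
# FAILED: the failed member reported by /proc/mdstat (example value).
FAILED=sdb4

remove_failed_member() {
    # Mark the member faulty (a no-op if the kernel already did),
    # then remove it from the array.
    mdadm --manage /dev/md0 --fail   "/dev/$FAILED"
    mdadm --manage /dev/md0 --remove "/dev/$FAILED"
    cat /proc/mdstat    # confirm the array now runs degraded
}
# Run only after double-checking FAILED:
# remove_failed_member
```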
Power down SecureOffice (Enter "shutdown") and replace the failed disk with one of equal or greater size.
If the failed disk is the boot / system disk (Common system and data disks), SecureOffice must be re-installed on the replacement disk: Replace System Disk, including restore from backup.
The easiest way to restore a failed RAID disk is to re-install RAID by re-running the RAID installation script. To do so:
A harder way to restore a failed RAID disk is:
If the failed disk was the system disk, partition 4 (RAID) on the replacement system disk must be created with the same size as partition 4 of the good RAID disk.
If the failed disk was not the system disk (common system and data disk configuration) and it is not required to back up the system / swap partitions to the replacement disk, nothing needs to be copied: proceed directly to creating the RAID partition (Section 1.10.1).
Else, if it is desired to backup the system partitions to the replacement disk, copy partitions 1 and 2 (boot, rootfs) from the system disk to the replacement disk:
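One common way to do the copy is dd, partition by partition. A sketch, assuming the good system disk is "/dev/sdX", the replacement is "/dev/sdY", and partitions 1 and 2 of at least equal size already exist on "/dev/sdY"; dd overwrites without asking, so verify both device names first:

```shell
# Clone boot (1) and rootfs (2) partitions from the good system disk
# (/dev/sdX) to the replacement disk (/dev/sdY). VERIFY X and Y first!
clone_system_partitions() {
    dd if=/dev/sdX1 of=/dev/sdY1 bs=4M conv=fsync   # partition 1: boot
    dd if=/dev/sdX2 of=/dev/sdY2 bs=4M conv=fsync   # partition 2: rootfs
    sync
}
# Run only after substituting and verifying the real device names:
# clone_system_partitions
```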
RAID requires the partition sizes of member disks to be identical. When replacing a RAID disk, the required partition size can be determined from the good RAID partition. Assuming the good disk is "/dev/sdX", enter "fdisk -l /dev/sdX" (where "X" is "a", "b", "c", etc.) to determine the partition layout. You will see output as below:
root@SecureOffice-x86_64:~# fdisk -l /dev/sda
Disk /dev/sda: 238.49 GiB, 256060514304 bytes, 500118192 sectors
Disk model: MT-256
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xf5c46f89
Device Boot Start End Sectors Size Id Type
/dev/sda1 * 512 107007 106496 52M 83 Linux
/dev/sda2 107520 15774719 15667200 7.5G 83 Linux
/dev/sda3 15775744 32552959 16777216 8G 82 Linux swap / Solaris
/dev/sda4 32552960 436309170 403756211 192.5G 83 Linux
The entry in the "/dev/sdX4" row and "Sectors" column is the required size of the RAID partition (<RAID SECTORS>).
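The Sectors value can also be pulled out programmatically; a sketch assuming the good disk is "/dev/sda" (adjust the device name):

```shell
# Print the Sectors field of the /dev/sda4 row. Non-boot partition rows
# carry no Boot flag, so Sectors is the 4th whitespace-separated field.
fdisk -l /dev/sda 2>/dev/null | awk '$1 == "/dev/sda4" { print $4 }'
```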
Enter "fdisk -l /dev/sdY" where "Y" is the replacement disk. If the replacement disk is the system disk or system partitions were backed up to the replacement disk, you should see entries for partitions 1, 2, 3. If the replacement disk is not the system disk and system partitions were not backed up, you should see a single swap partition (created previously).
Enter 'echo -e "n\np\n4\n\n+<RAID SECTORS>\nw" | fdisk /dev/sdY' where "<RAID SECTORS>" was determined above and "Y" is the replacement disk.
As a final check, enter "fdisk -l /dev/sdX" and "fdisk -l /dev/sdY". The sizes of the swap (3) and data (4) partitions should be identical on both disks.
Enter "mdadm --manage /dev/md0 --add /dev/sdY4" where "Y" is the replacement disk ID.
Enter "cat /proc/mdstat". You should see output as below, indicating the RAID array is rebuilding:
root@SecureOffice-x86_64:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10]
md0 : active raid1 sdb4[1] sda4[0]
201745984 blocks super 1.2 [2/1] [U_]
[=>...................]  recovery =  9.9% (19968640/201745984) finish=23.8min speed=127535K/sec
After recovery is complete, you will see output as below:
root@SecureOffice-x86_64:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10]
md0 : active raid1 sdb4[1] sda4[0]
201745984 blocks super 1.2 [2/2] [UU]
If data was backed up from either disk (such as /home/data) prior to installing RAID (Section 1.2), or was backed up prior to replacing a failed disk, restore it to the RAID disk if required.
Reboot to verify that the RAID array starts correctly and all configuration / files have been restored.
Additional instructions are available to replace the failed disk and restore it to the RAID array. Should the previous link change, google "Linux replace failed RAID disk".