How to install and configure SnapRAID on Ubuntu Server
NOTE: This guide was written for Ubuntu 16.04 LTS (Xenial Xerus) but has also been tested on 14.04 LTS (Trusty Tahr), 12.04 (Precise Pangolin) and 10.04 (Lucid Lynx). It may work equally well on earlier versions of Ubuntu too.
You may have read already that I currently have six 2TB drives in my server which hold all my CD, DVD & Blu-ray rips. Statistically speaking, the more drives you have the more likely you are to experience a drive failure at some point. As I obviously still own all the physical media I could re-rip them all in the event of such an occurrence. However, this would prove quite time-consuming and so I figured it's about time I implemented some sort of RAID system to protect the data on the drives. RAID is basically a mechanism which allows you to recreate the data on a failed drive using the data from the other drive(s) in the RAID Array. It should be stressed that RAID is not a substitute for proper backups and so you should ensure important data is backed up properly. If you're not familiar with RAID then you can read all about it here.
Traditional RAID systems operate at the system level and so you need to format your drives to implement your chosen RAID system. This can be irritating especially if you've already filled up your drives with data. Furthermore, if you experience multiple concurrent drive failures over and above the RAID tolerance level you've implemented then you lose ALL the data in the RAID Array, not just the data on the failed drives.
In my search for a suitable RAID system to implement on my server I came across SnapRAID. SnapRAID is a RAID system that sits on top of your existing file system. This means you can implement SnapRAID without having to reformat your drives. Furthermore, the drives you want to protect with SnapRAID do not all have to have the same storage capacity as each other.
SnapRAID is what is called a snapshot RAID system. Many, if not most, RAID systems operate in real-time, whereby the parity information used to re-construct your data is updated as soon as you make any changes to that data. This means that the Array is always up to date and all data can be re-constructed from parity if required. A snapshot RAID system, on the other hand, operates very differently. Any changes you make to your data are not reflected in parity until a re-sync of the Array is run, either manually or on a schedule. It follows that until the Array has been re-synced any changed data is at risk since it cannot be re-constructed from parity. Once the Array has been re-synced all data in the Array is once again protected.
Whilst at first glance it may seem that a real-time RAID system is the most sensible option, that's not necessarily the case. Consider the type of data on a typical media server; it's nearly all ripped data and so rarely changes. Maintaining real-time RAID parity can be quite resource intensive and, as mentioned above, it requires that the drives are empty before you start. Generally it also means that all drives have to remain spinning all the time, plus they must have the same storage capacity as each other. Also, if you accidentally delete anything when using a traditional RAID setup it is deleted immediately from the RAID Array too. SnapRAID, on the other hand, operates more like a backup since the Array is only brought up to date when you choose to do so. So you should be able to recover your deleted content from parity. I say should because it depends on what other changes have been made since you last updated the parity information.
In my setup I've scheduled SnapRAID to automatically re-sync the Array regularly so I don't have to remember to do it manually. Plus I'm using a script that emails me the results when it's done, assuming of course that it actually had something to do (most times it does not). You can read more about SnapRAID here. With SnapRAID you can add individual folders and/or files to the RAID Array; you don't have to add whole drives. This means, for example, you can RAID infrequently changing data on your OS drive whilst excluding the data which changes frequently. Or whatever variation you choose. The great thing about SnapRAID is that it is flexible.
TIP: If you're going to be following this guide step by step and are using Putty to administer your server then you can save yourself some typing by simply highlighting each command below, right-clicking on it and selecting Copy. Then toggle over to your Putty Session and right-click once more. The command you've just copied from here will be automatically pasted into your Putty Session.
How to install SnapRAID 10.x on Ubuntu Server
So, how do you install it and configure it? Well, there's no GUI so you have to do a bit of typing I'm afraid!
NO GUI?? WHAT?? Honestly, don't be put off by this, there really is nothing to it. There is a GUI available if you really want one, although I've never tried it. You still have to install SnapRAID using a few commands, the GUI only comes into play once you've got it up and running.
So, before we do anything else we should refresh Ubuntu's package lists so that we install the latest available versions. From a Putty Session or, if you've got a screen and keyboard attached to your server, from the command line itself, type the following:
sudo apt-get update
You'll be prompted for a password. This is the password you created when you installed Ubuntu. Ubuntu prompts for it the first time you issue a "sudo" command and then remembers it for a short while (15 minutes by default), so you won't necessarily be asked every time.
To install SnapRAID you need to make sure you have a suitable C Compiler installed. Issue the following command to install one. Note, if gcc is already installed then Ubuntu will tell you so and do nothing more:
sudo apt-get install gcc
Download and install SnapRAID
The next thing we need to do is create a folder to hold the downloaded installation files:
sudo mkdir /var/lib/snapraid
Now change the permissions on this folder and switch into it:
sudo chmod a+w /var/lib/snapraid
cd /var/lib/snapraid
Now let's download the installation files. At the time of writing the latest version of SnapRAID is 10.0. Check this page for the latest version and alter the following commands accordingly.
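As a rough guide, the usual download-and-build sequence for a source release can be sketched as below. The script only prints the commands so you can review them before typing them in. The download URL pattern is an assumption based on how SnapRAID releases are normally published, so confirm it against the download page, and change VERSION if a newer release is out.

```shell
#!/bin/sh
# Sketch only. The URL pattern below is an assumption; confirm it on the
# SnapRAID download page before using it.
VERSION="10.0"
TARBALL="snapraid-${VERSION}.tar.gz"
URL="https://github.com/amadvance/snapraid/releases/download/v${VERSION}/${TARBALL}"

# Print the download-and-build steps so they can be reviewed first.
print_build_steps() {
    echo "cd /var/lib/snapraid"
    echo "wget ${URL}"
    echo "tar xzvf ${TARBALL}"
    echo "cd snapraid-${VERSION}"
    echo "./configure"
    echo "make"
}

print_build_steps
```

Printing rather than running the steps is deliberate: it keeps the sketch harmless if the version or URL turns out to be wrong for your setup.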
If the make command is not found then install it by issuing sudo apt-get install make and then issue the make command once more.
Next we should check that everything has been built correctly by issuing the make check command.
It should run through a whole bunch of tests and, hopefully, come back with "Success!" We're nearly there, only one more step:
sudo make install
As a bit of clean-up you can use rm to delete the downloaded archive from /var/lib/snapraid since it is no longer needed.
Now we have installed SnapRAID we need to configure it. Before we do that let me explain a couple of things:
Drives and partitions
It's important when you're implementing a RAID solution that you understand the difference between drives and partitions. A drive is the physical disc unit itself. On that drive you can have one or more partitions. That is, you can split the drive up into separate chunks. These chunks will look like separate "drives" as far as your operating system is concerned. They will also be classed as separate "drives" as far as SnapRAID is concerned. Therefore, if you have a single physical drive with 2 partitions on it and that drive goes bang, you have lost 2 "drives". With a single parity file that single drive failure exceeds the permitted tolerance level, so you will lose all data on that physical drive because it cannot be recovered from parity. All my drives, apart from the OS drive, have a single partition on them. My OS drive is split into two: one partition holds the OS and is not part of the RAID Array, the other partition holds data and is part of the Array.
The Parity file
The Parity file is often a source of confusion for people new to RAID. To keep things simple, when I refer to a "drive" I also mean partition and in the example that follows I'm assuming you want to protect all the data on the drives using RAID. Remember, the Parity file, along with the data on the remaining drives, is used to recover the data on a failed drive.
The Parity file will be around the same size as the data on whichever data drive holds the most data, so you need to specify a location for the Parity file with sufficient spare capacity to hold it. Consider the following drives:
1TB data drive 50% full (ie. it contains 500GB of data)
750GB data drive 100% full (ie. it contains 750GB of data)
2TB data drive 75% full (ie. it contains 1.5TB of data)
2TB data drive 90% full (ie. it contains 1.8TB of data)
In the above example you would need to store the parity file on a 2TB drive because that is the biggest data drive in the Array. Based on the current usage the parity file will be approximately 1.8TB in size (ie. the size of the data on the 4th disc). Although the 750GB drive is completely full it is still less than the 1.8TB of data on the 4th drive.
This brings me onto another important point. When I say that the parity file will be approximately the same size as the data on the fullest drive, it is actually slightly larger than that. So, if you completely fill up the 2TB drive then the parity file will need to be 2TB plus a little bit more, ie. it will not fit on a 2TB drive. For this reason you should never completely fill your drives with data, especially the drives which have the same capacity as your parity drive.
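The sizing rule can be sketched as a small calculation. The figures are the used-space values from the four example drives, expressed in GB, and the function name is just an illustrative choice.

```shell
#!/bin/sh
# Used space on each data drive in GB, taken from the four example drives.
USED_GB="500 750 1500 1800"

# The parity file tracks the drive holding the most data, so find the maximum.
largest_used() {
    max=0
    for used in "$@"; do
        if [ "$used" -gt "$max" ]; then
            max=$used
        fi
    done
    echo "$max"
}

# Word splitting of USED_GB is intentional here.
PARITY_GB=$(largest_used $USED_GB)
echo "Parity file will need roughly ${PARITY_GB}GB, plus a small overhead"
```

Note that the full 750GB drive does not drive the result; only the drive holding the most data matters.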
Obviously I cannot show you exactly how you should configure yours, but you will nevertheless need a SnapRAID configuration file. There is a sample file, snapraid.conf.example, in the SnapRAID folder. So, copy that file to the /etc folder as snapraid.conf and edit it to suit your requirements. If you're taking my advice about storing config files outside the OS then issue the following two commands.
From the above you can see the parity information will be stored in the /media/ST4000VN_2/SNAPRaid_Parity folder.
I've told SnapRAID to create a "content" file on a few of my disks. The recommendation is you must have at least one copy for each parity file plus one more. Although the SnapRAID manual states that you can store the content file on the same disk as the parity file itself it is not a good idea to do so since it can lead to problems when your drives are nearly full as explained above. For reference, my "content" files are around 1GB in size.
For information, the "content" file is created automatically by SnapRAID so you must ensure that the user running SnapRAID (in my case me) has permission to create the file in that location. The same goes for the Parity file.
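A quick way to confirm those permissions before the first sync is sketched below. The helper name can_write_to is hypothetical, and CHECK_DIR defaults to /tmp purely so the sketch runs anywhere; point it at your own parity and content folders.

```shell
#!/bin/sh
# Report whether the current user can create files in a directory.
can_write_to() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo "OK: $1 is writable by $(id -un)"
        return 0
    fi
    echo "PROBLEM: $1 is missing or not writable by $(id -un)"
    return 1
}

# Placeholder location; substitute your own parity and content folders.
CHECK_DIR="${CHECK_DIR:-/tmp}"
can_write_to "$CHECK_DIR" || true
```

Run it as the same user that will run SnapRAID, otherwise the result tells you nothing useful.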
All my data drives have just a single partition and I have two root folders on each of them: Unprotected and RAIDMain. Everything I want RAID protected I put in RAIDMain and everything else I put in the Unprotected folder. For example, backups of other computers in my house go in the Unprotected folder. On the OS Drive I have a RAIDMain folder plus all the other system folders. None of my system folders are backed up. As you can see from the above include statement only the RAIDMain folders are included. SnapRAID offers great flexibility but I recommend you start off simple and then progress from there.
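Pulling those points together, a configuration along these lines might look like the following sketch. It is an illustration rather than the actual file used on this server: only the parity folder and the WD40EFRX mount point appear in this guide, while the file names, the disk names d1 and d2 and the /media/DATA2 path are made-up placeholders. The script writes the sample to a temporary file so nothing in /etc is touched.

```shell
#!/bin/sh
# Write an illustrative configuration to a temporary file for review.
# File names, disk names and /media/DATA2 are hypothetical placeholders.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
# Where the parity file lives
parity /media/ST4000VN_2/SNAPRaid_Parity/snapraid.parity

# Content files: at least one per parity file plus one more
content /media/WD40EFRX/snapraid.content
content /media/DATA2/snapraid.content

# Data disks: a name followed by the mount point
data d1 /media/WD40EFRX/
data d2 /media/DATA2/

# Protect only the RAIDMain folder on each disk
include /RAIDMain/
EOF

cat "$CONF"
```

Starting with a single include rule like this keeps things simple; further include or exclude lines can be added once the basic setup is working.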
To keep the SnapRAID parity up to date you have to "sync" it. You can either do this manually when you know you have changed files protected by SnapRAID (added, deleted or updated them) or you can use a cron job to do it. I plumped for the latter option to save me having to remember to do it manually. For information, you issue the command snapraid sync in a Putty session or at the command prompt to bring the SnapRAID parity information up to date.
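If you are syncing by hand, it can be worth looking at what has changed before committing it to parity. The sketch below assumes a small hypothetical wrapper, run_step, which by default only prints each command; set DRY_RUN=0 to actually run them. snapraid diff lists the changes since the last sync and snapraid sync then updates parity.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# Either print the command or execute it, depending on DRY_RUN.
run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Review what has changed since the last sync, then update parity.
run_step snapraid diff
run_step snapraid sync
```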
So, the script I use can be found here: SnapRAID Script and I downloaded it from here so all credit to the author. I did however change it so the scrub job ran weekly and not each day.
Instead of downloading the script you can create it via Putty:
Using Putty navigate into the folder where you're going to store the script. For example type cd /media/WD40EFRX/RAIDMain/MyScripts.
Next type vim SnapRAIDSync.sh (or your preferred script name) and press Enter. This will open the file for editing.
Highlight the whole script from here, right click and select Copy.
Then in your Putty Session press the [Insert] key once, insert a few blank lines by pressing the [Enter] key and then right click and the whole script will be pasted into the screen. Double-check you've pasted the whole script and not missed the first or last few lines.
Edit the script as required and then press the [Esc] key once and type :wq to save and quit out of the script. If you make a mistake then issue :q! instead of :wq to abort your changes.
Next, you need to make the script executable by issuing the following command:
chmod a+x SnapRAIDSync.sh
As a last step you might need to install an extra package if you've not got it already:
sudo apt-get install mailutils
NOTE: If you see a Postfix Configuration menu then select "No Configuration" and press Enter.
This script is really rather neat in many ways, one being that it will do nothing if you have deleted more than x files (x being 20 by default but you can change it to whatever number you like). This will ensure that if you accidentally delete a load of files you can still recover them from parity. If this fail-safe option wasn't there then those deleted files would be lost once you'd brought the parity information up to date. Another cool feature of this script is it will send you an email of exactly what it did. You obviously have to put your own email address in the script for it to email you.
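The idea behind that fail-safe can be sketched in a few lines. This is not the code from the script itself; the names DEL_THRESHOLD and safe_to_sync are made up for illustration, and the real script works out the number of deleted files for itself before making the decision.

```shell
#!/bin/sh
# Hypothetical names; the real script's variables may differ.
DEL_THRESHOLD=20

# Succeed (exit status 0) only when the number of deleted files
# does not exceed the threshold.
safe_to_sync() {
    deleted=$1
    threshold=${2:-$DEL_THRESHOLD}
    [ "$deleted" -le "$threshold" ]
}

for count in 3 20 45; do
    if safe_to_sync "$count"; then
        echo "$count deleted: sync would go ahead"
    else
        echo "$count deleted: sync skipped, check the deletions first"
    fi
done
```

Because the sync is skipped rather than forced, the old parity stays intact and the deleted files can still be recovered from it.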
Run the script as a cron job using Webmin
To set up the above script as a cron job within Webmin launch Webmin then click on System and then Scheduled Cron Jobs. Then click Create a new scheduled cron job at the top of the screen that opens.
Click the button next to the Execute cron job as option and choose your username.
Type the full path of the script into the Command box. So you'd type /media/WD40EFRX/RAIDMain/MyScripts/SnapRAIDSync.sh >/dev/null 2>&1 replacing the path and script name as required.
In the When to Execute section choose whatever time period you like. It is obviously not wise to schedule a sync when you are potentially in the midst of changing any of the data so choose a time period when you know you won't be. For example, to run the sync at 3am each day choose the Times and dates selected below .. option and in the Minutes section choose Selected .. and highlight 0. In the Hours section choose Selected .. and highlight 3. Leave the Days, Months and Weekdays options as All.
Don't forget to click the Create button when you've set up the schedule to create the cron job.
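If you are not using Webmin, the same 3am schedule can be expressed as an ordinary crontab line. The sketch below just builds and prints that line; the helper name cron_line is made up, and the script path is the example path used in this guide.

```shell
#!/bin/sh
# Build a crontab line from minute, hour and command.
# Day of month, month and weekday are left as "*", i.e. All.
cron_line() {
    printf '%s %s * * * %s\n' "$1" "$2" "$3"
}

SCRIPT="/media/WD40EFRX/RAIDMain/MyScripts/SnapRAIDSync.sh"
ENTRY=$(cron_line 0 3 "$SCRIPT >/dev/null 2>&1")
echo "$ENTRY"
```

The printed line can be added to your own crontab by running crontab -e as the user who should run the script.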
One of the newer features of SnapRAID is the pooling facility. This facility creates a unified view of all the individual files and folders which are spread across the various drives in your Array and displays them as though they were stored on one single huge drive. So, instead of needing to know which drive a particular file was stored on you can instead access it directly via the pool.
For example, on each of the drives in my Array I have a folder called "Blu-Rays" and inside there I have a separate folder for each blu-ray movie I own. When I access the Blu-Rays folder via the pool I see all the individual movie folders inside it even though in reality they are spread across 6 different disks. This makes browsing your movie collection a much more family-friendly affair.
You create or update the pool by issuing the snapraid pool command after having updated the array.
Of course you can append this command to the end of the above script if you like.
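SnapRAID builds its pool out of symbolic links in the directory named by the pool option in the configuration file. To show the general idea without touching a real array, the sketch below fakes two disks inside a temporary folder and links their movie folders into one pooled folder. The folder and movie names are invented, and this is an illustration of the concept rather than what SnapRAID does internally.

```shell
#!/bin/sh
# Illustration only: two pretend disks in a temporary folder, pooled with symlinks.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/disk1/Blu-Rays/MovieA" \
         "$ROOT/disk2/Blu-Rays/MovieB" \
         "$ROOT/pool/Blu-Rays"

# Link every movie folder from every pretend disk into the single pooled folder.
for movie in "$ROOT"/disk*/Blu-Rays/*; do
    ln -s "$movie" "$ROOT/pool/Blu-Rays/$(basename "$movie")"
done

# Both movies now appear side by side even though they live on different disks.
ls "$ROOT/pool/Blu-Rays"
```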
One of the more unpleasant things that can happen to your data is it goes "bad". By that I mean the area on a drive where the data has been written to previously suddenly starts throwing up read errors so can no longer be accessed. SnapRAID has an option that allows you to check for such a situation and so you can work round it. It effectively allows you to recover that chunk of data from parity or if the error is on the parity drive itself it recomputes parity for that data and writes the recovered data to a different part of the affected disk.
You can initiate a scrub by issuing the snapraid scrub command.
I suggest you consult the online documentation for how to scrub your data and deal with any issues which may arise.
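Since a scrub reads a lot of data, you may prefer to run it once a week rather than with every sync, which is the change I made to the sync script. One way that decision could be made is sketched below; SCRUB_DAY and is_scrub_day are hypothetical names, and the sketch only prints what it would do.

```shell
#!/bin/sh
# Scrub only on one chosen day of the week (1 = Monday ... 7 = Sunday).
SCRUB_DAY=7   # hypothetical choice: Sunday

# Succeed only when the given day number matches the chosen scrub day.
is_scrub_day() {
    [ "$1" -eq "$SCRUB_DAY" ]
}

TODAY=$(date +%u)
if is_scrub_day "$TODAY"; then
    echo "Today is scrub day: would run 'snapraid scrub'"
else
    echo "Not scrub day: skipping scrub"
fi
```

After a real scrub, snapraid status reports whether any errors were found, and snapraid -e fix attempts to repair only the blocks that were flagged.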
More help required
If you need to consult the manual you can issue the man snapraid command in a Putty session.