How to install and configure SnapRAID on Ubuntu Server
NOTE: This guide has been written for Ubuntu Precise Pangolin 12.04 & Lucid Lynx 10.04 although it may work equally well on earlier or later versions of Ubuntu.
You may have read already that I currently have six 2TB drives in my server which hold all my CD, DVD & Blu-ray rips. Statistically speaking, the more drives you have the more likely you are to experience a drive failure at some point. As I obviously still own all the physical media I could re-rip them all in the event of such an occurrence. However, this would prove quite time-consuming and so I figured it's about time I implemented some sort of RAID system to protect the data on the drives. RAID is basically a mechanism which allows you to recreate the data on a failed drive using the data from the other drive(s) in the RAID Array. It should be stressed that RAID is not a substitute for proper backups and so you should ensure important data is backed up properly. If you're not familiar with RAID then you can read all about it here.
Traditional RAID systems operate at the system level and so you need to format your drives to implement your chosen RAID system. This can be irritating especially if you've already filled up your drives with data. Furthermore, if you experience multiple concurrent drive failures over and above the RAID tolerance level you've implemented then you lose ALL the data in the RAID Array, not just the data on the failed drives.
In my search for a suitable RAID system to implement on my server I came across SnapRAID. SnapRAID is a RAID system that sits on top of your existing file system. This means you can implement SnapRAID without having to reformat your drives. Furthermore, the drives you want to protect with SnapRAID do not have to all be the same physical size as each other.
SnapRAID is what is called a snapshot RAID system. Many, if not most, RAID systems operate in real-time whereby the parity information used to re-construct your data is updated as soon as you make any changes to that data. This means that the Array is always up to date and all data can be re-constructed from parity if required. A snapshot RAID system on the other hand operates very differently. Any changes you make to your data are not reflected in parity until you manually force a re-sync of the Array. It follows that until the Array has been re-synced any changed data is at risk since it cannot be re-constructed from parity. Once the Array has been re-synced then all data in the Array is once again protected.
Whilst at first glance it may seem that a real-time RAID system is the most sensible option, that's not necessarily the case. Consider the type of data on a typical media server; it's nearly all ripped data and so rarely changes. Maintaining real-time RAID parity can be quite resource intensive, plus it requires that the drives are empty before you start. Generally it also means that all drives have to remain spinning all the time and must have the same storage capacity as each other. In my setup I've scheduled SnapRAID to automatically re-sync the Array regularly so I don't have to remember to do it manually. Plus I'm using a script that emails me the results when it's done, assuming of course that it actually had something to do (most times it does not). You can read more about SnapRAID here.
With SnapRAID you can add individual folders and/or files to the RAID Array; you don't have to add whole drives. This means, for example, you can RAID infrequently changing data on your OS drive whilst excluding the data which changes frequently. Or whatever variation you choose. The great thing about SnapRAID is its flexibility.
TIP: If you're going to be following this guide step by step and are using Putty to administer your server then you can save yourself some typing by simply highlighting each command below, right-clicking on it and selecting Copy. Then toggle over to your Putty Session and right-click once more. The command you've just copied from here will be automatically pasted into your Putty Session.
How to install SnapRAID 3.x on Ubuntu Server
So, how do you install it and configure it? Well, there's no GUI so you have to do a bit of typing I'm afraid!
NO GUI?? WHAT?? Honestly, don't be put off by this, there really is nothing to it. There is a GUI available if you really want one, although I've never tried it. You still have to install SnapRAID using a few commands, the GUI only comes into play once you've got it up and running.
Before we do anything else we should bring Ubuntu's package lists up to date. From a Putty Session or, if you've got a screen and keyboard attached to your server, from the command line itself, type the following:
sudo apt-get update
You'll be prompted for a password. This is the password you created when you installed Ubuntu. Ubuntu tends to prompt for a password each time you issue a "sudo" command.
To install SnapRAID you need to make sure you have a suitable C Compiler installed. Issue the following command to install one. Note, if gcc is already installed then Ubuntu will tell you so and do nothing more:
sudo apt-get install gcc
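If you're not sure whether the build tools are already on your server, a quick check like the one below can tell you before you start. This is only a sketch: the function name missing_tools is my own invention, and it simply looks for each named command on your PATH.

```shell
# Report which of the named tools are not on the PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# gcc and make are the two build tools this guide installs with apt-get.
missing=$(missing_tools gcc make)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Still to install:" $missing
else
    echo "gcc and make are both available"
fi
```

Anything it lists can be installed with sudo apt-get install followed by the name.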
Download and install SnapRAID
The next thing we need to do is create a folder to hold the downloaded installation files:
sudo mkdir /var/lib/snapraid
Now change the permissions on this folder and switch into it:
sudo chmod a+w /var/lib/snapraid
cd /var/lib/snapraid
Now let's download the installation files. At the time of writing the latest version of SnapRAID is 3.0. Check this page for the latest version and alter the following commands accordingly.
The files come down compressed so we need to uncompress them:
tar -xzf snapraid-3.0.tar.gz
Now switch into the folder that has just been created:
cd snapraid-3.0
Now we want to check everything is in place and is ready to go:
./configure
This should run through without issue and the last few lines should read....
configure: creating ./config.status
config.status: creating Makefile
config.status: creating config.h
Now make the executables:
make
If you need to install make then issue the following command: sudo apt-get install make
Next we should check that everything has been built correctly:
make check
It should run through a bunch of tests and, hopefully, come back with "Success!" We're nearly there, only one more step:
sudo make install
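As a bit of clean-up we can remove the downloaded file since we no longer need it (this assumes you downloaded it into /var/lib/snapraid as above):
rm /var/lib/snapraid/snapraid-3.0.tar.gz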
Now we have installed SnapRAID we need to configure it. Before we do that let me explain a couple of things:
Drives and partitions
It's important when you're implementing a RAID solution that you understand the difference between drives and partitions. A drive is the physical disc unit itself. On that drive you can have one or more partitions. That is, you can split the drive up into separate chunks and those chunks will look like separate "drives" as far as your operating system is concerned. They will also be classed as separate "drives" as far as SnapRAID is concerned too. Therefore, if you have a single physical drive with 2 partitions on it and that drive goes bang, you have lost 2 "drives". With a single parity file that failure will be over and above the permitted tolerance level and so you will lose all data on that physical drive, it cannot be recovered from parity. All my drives, apart from the OS drive have a single partition on them. My OS drive is split into two: one partition holds the OS and is not part of the RAID Array, the other partition holds data and is part of the Array.
The Parity file
The Parity file is often a source of confusion for people new to RAID. To keep things simple when I refer to "drive" I also mean partition and in the example that follows I'm assuming you want to protect all the data on the drives using RAID. Remember, the Parity file, along with the data on the remaining drives, is used to recover the data on a failed drive.
The Parity file will be around the same size as the largest fullest data drive and so you need to specify a location for the Parity file with sufficient spare capacity to hold it. Consider the following drives:
1TB data drive 50% full (ie. it contains 500GB of data)
750GB data drive 100% full (ie. it contains 750GB of data)
2TB data drive 75% full (ie. it contains 1.5TB of data)
2TB data drive 90% full (ie. it contains 1.8TB of data)
In the above example you would need to store the parity file on a 2TB drive because that is the biggest data drive in the Array. Based on the current usage the parity file will be approximately 1.8TB in size (ie. the size of the data on the 4th disc). Although the 750GB drive is completely full it is still less than the 1.8TB of data on the 4th drive.
This brings me onto another important point. When I say the parity file will be approximately the same size as the largest, fullest drive, it is actually slightly larger than that. So, if you completely fill up the 2TB drive then the parity file will need to be 2TB plus a little bit more, ie. it will not fit on a 2TB drive. For this reason you should never completely fill your drives with data, especially the drives which have the same capacity as your parity drive.
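The sizing rule above boils down to "take the largest amount of used space on any one data drive". Here's a small sketch of that arithmetic using the four example drives, with sizes in GB. The function name largest_used_gb is made up for illustration and the figures are just the ones from the example, not measurements from a real system.

```shell
# Print the largest value among the arguments (used space per data drive, in GB).
largest_used_gb() {
    max=0
    for used in "$@"; do
        if [ "$used" -gt "$max" ]; then
            max=$used
        fi
    done
    echo "$max"
}

# Used space on the four example drives: 500GB, 750GB, 1.5TB and 1.8TB.
echo "Parity file needs roughly $(largest_used_gb 500 750 1500 1800)GB, plus a little more"
```

On a real server you'd feed in the used space reported by df for each data drive instead.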
Obviously I cannot show you exactly how you should configure yours but you will need a SnapRAID configuration file. There is a sample file in the SnapRAID folder. So, copy the file to the /etc folder and edit it to suit your requirements:
sudo cp /var/lib/snapraid/snapraid-3.0/snapraid.conf.example /etc/snapraid.conf
and then issue the following command to edit it
sudo vim /etc/snapraid.conf
Please see the SnapRAID website for configuration notes and examples.
For reference, here's an extract from my /etc/snapraid.conf file
disk d1 /media/HD203WI/
disk d2 /media/HD204UI_1/
disk d3 /media/WD2000FYPS/
disk d4 /media/WD20EADS_1/
disk d5 /media/WD20EADS_2/
From the above you can see the parity information will be stored in the /media/HD204UI_2/SNAPRaid_Parity folder.
I've told SnapRAID to create a "content" file on a few of my disks. The recommendation is you must have at least one copy for each parity file plus one more. Although the manual states that you can store the content file on the same disk as the parity file it is not a good idea since it can lead to problems when your drives are nearly full as explained above. For reference, my "content" files are around 1GB in size.
For information, the "content" file is created automatically by SnapRAID so you must ensure that the user running SnapRAID (in my case me) has permission to create the file in that location. The same goes for the Parity file.
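A quick way to confirm the permissions before the first sync is to test each folder for write access as the user who will run SnapRAID. The sketch below does that. The function name can_write_all is my own, and the demonstration uses a scratch folder rather than real content or parity locations.

```shell
# Succeed only if every directory given exists and is writable by the current user.
can_write_all() {
    for dir in "$@"; do
        if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
            echo "Cannot write to: $dir"
            return 1
        fi
    done
    return 0
}

# Demonstration against a scratch directory; substitute your own content and parity folders.
scratch=$(mktemp -d)
can_write_all "$scratch" && echo "OK: $scratch is writable"
rmdir "$scratch"
```

Run it as the same user that will run the sync, otherwise the result doesn't tell you much.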
All my data drives have just a single partition and I have two root folders in each of them: Unprotected and RAIDMain. Everything I want RAID protected I put in RAIDMain and everything else I put in the Unprotected folder. Backups of other computers in my house go in the Unprotected folder. On the OS Drive I have a RAIDMain folder plus all the other system folders. None of my system folders are backed up. As you can see from the above include statement only the RAIDMain folders are included. SnapRAID offers great flexibility but I recommend you start off simple and then progress from there.
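To show how the pieces described above fit together, here's a sketch that writes a minimal sample configuration to a scratch file so you can look it over. Every path in it is hypothetical and is not taken from my own setup. It assumes one parity file, two content files, two data disks and a single include for the RAIDMain folder. Check the option names against the SnapRAID manual for your version before copying anything into /etc/snapraid.conf.

```shell
# Write a sample configuration to a scratch file for review; nothing under /etc is touched.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Hypothetical paths throughout: replace them with your own mount points.
parity /media/parity_disk/SnapRAID_Parity/parity
content /media/data_disk_1/content
content /media/data_disk_2/content
disk d1 /media/data_disk_1/
disk d2 /media/data_disk_2/
include /RAIDMain/
EOF
cat "$sample"
```

Keeping the parity file on a drive that holds no protected data avoids the "nearly full" problem described earlier.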
To keep the SnapRAID parity up to date you have to "sync" it. You can either do this manually when you know you have changed files protected by SnapRAID (added, deleted or updated them) or you can use a cron job to do it. I plumped for the latter option to save me having to remember to do it manually. For information, you issue the command snapraid sync in a Putty session or at the command prompt to bring the SnapRAID parity information up to date.
So, the script I use can be found here: SnapRAID Script and I downloaded it from here so all credit to the author.
Instead of downloading the script you can create it via Putty:
Highlight the whole script, right click and select Copy.
Using Putty navigate into the folder where you're going to store the script.
For example type cd /home/xxx/MyScripts where xxx is your username.
Next type vim SnapRAIDSync.sh (or your preferred script name) and press Enter. This will open the file for editing.
Then press the [Insert] key once, insert a few blank lines and then right click and the whole script will be pasted into the screen.
Edit the script as required and then press the [Esc] key once and type :wq to save and quit out of the script. If you make a mistake then issue :q! instead of :wq to abort your changes.
Don't forget to make the script executable: chmod a+x SnapRAIDSync.sh
As a last step you might need to install an extra package if you've not got it already:
sudo apt-get install mailutils
This script is really rather neat in many ways, one being that it will do nothing if you have deleted more than x files (x being 20 by default but you can change it to whatever number you like). This will ensure that if you accidentally delete a load of files you can still recover them from parity. If this fail-safe option wasn't there then those deleted files would be lost once you'd brought the parity information up to date. Another cool feature of this script is it will send you an email of exactly what it did. You obviously have to put your own email address in the script for it to email you.
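The fail-safe comes down to a single comparison, which can be sketched as below. This is not the author's script: the function name sync_allowed is hypothetical, and how the real script works out the number of deleted files is deliberately left out.

```shell
# Decide whether a sync should go ahead.
# $1 = number of files deleted since the last sync, $2 = maximum deletions allowed.
sync_allowed() {
    deleted=$1
    threshold=$2
    [ "$deleted" -le "$threshold" ]
}

# Example: 3 deletions is within the default threshold of 20, 50 is not.
if sync_allowed 3 20; then echo "3 deleted: sync would run"; fi
if ! sync_allowed 50 20; then echo "50 deleted: sync skipped, check what was deleted first"; fi
```

The point of skipping is that parity still describes the deleted files, so they remain recoverable until the next sync.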
Run the script as a cron job using Webmin
To set up the above script as a cron job within Webmin launch Webmin then click on System and then Scheduled Cron Jobs. Then click Create a new scheduled cron job at the top of the screen that opens.
Click the button next to the Execute cron job as option and choose your username.
Type the full path of the script into the Command box. So you'd type /home/htkh/MyScripts/SnapRAIDSync.sh replacing htkh with your own username, MyScripts with the name of the scripts folder you created and SnapRAIDSync.sh with the script name.
In the When to Execute section choose whatever time period you like. It is obviously not wise to schedule a sync when you are potentially in the midst of changing any of the data so choose a time period when you know you won't be. For example, to run the sync at 3am each day choose the Times and dates selected below .. option and in the Minutes section choose Selected .. and highlight 0. In the Hours section choose Selected .. and highlight 3. Leave the Days, Months and Weekdays options as All.
Don't forget to click the Create button when you've set up the schedule.
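If you don't use Webmin, the same 3am schedule can be written as an ordinary crontab entry. The sketch below just prints that line. The function name daily_cron_line is my own and the path is the example one from above, so swap in your own username, folder and script name.

```shell
# Build a crontab entry that runs a command daily at the given minute and hour.
# $1 = minute, $2 = hour, $3 = command.
daily_cron_line() {
    printf '%s %s * * * %s\n' "$1" "$2" "$3"
}

# 3am each day, using the example script path from this guide.
daily_cron_line 0 3 /home/htkh/MyScripts/SnapRAIDSync.sh
```

The printed line can be pasted into your own crontab, which you open for editing with crontab -e.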
More help required
If you need to consult the manual you can issue the following command in a Putty session:
man snapraid
Or visit the SnapRAID website.
Still stuck? Not what you were looking for? Then head over to the Discussion Forum!