How to install and configure SnapRAID on Ubuntu Server

NOTE: This guide has been written for Focal Fossa 20.04 LTS but has also been tested on Xenial Xerus 16.04 LTS, Trusty Tahr 14.04 LTS, Precise Pangolin 12.04 & Lucid Lynx 10.04. It may work equally well on earlier versions of Ubuntu too.

Overview

You may have read already that I currently have six 4TB drives in my server which hold all my CD, DVD & Blu-ray rips. Statistically speaking, the more drives you have the more likely you are to experience a drive failure at some point. As I obviously still own all the physical media I could re-rip them all in the event of such an occurrence. However, this would prove quite time-consuming and so I figured it’s about time I implemented some sort of RAID system to protect the data on the drives. RAID is basically a mechanism which allows you to recreate the data on a failed drive using the data from the other drive(s) in the RAID Array. It should be stressed that RAID is not a substitute for proper backups and so you should ensure important data is backed up properly. If you’re not familiar with RAID then you can read all about it here.

Traditional RAID systems operate at the system level and so you need to format your drives to implement your chosen RAID system. This can be irritating especially if you’ve already filled up your drives with data. Furthermore, if you experience multiple concurrent drive failures over and above the RAID tolerance level that you’ve implemented then you lose ALL the data in the RAID Array, not just the data on the failed drives.

In my search for a suitable RAID system to implement on my server I came across SnapRAID. SnapRAID is a RAID system that sits on top of your existing file system. This means you can implement SnapRAID without having to reformat your drives. Furthermore, the drives you want to protect with SnapRAID do not all have to have the same storage capacity as each other.

SnapRAID is what is called a snapshot RAID system. Many, if not most, RAID systems operate in real-time whereby the parity information used to re-construct your data is updated as soon as you make any changes to that data. This means that the Array is always up to date and all data can be re-constructed from parity if required.

A snapshot RAID system on the other hand operates very differently. Any changes you make to your data are not reflected in parity until you manually force a re-sync of the Array. It follows that until the Array has been re-synced any changed data is at risk since it cannot be re-constructed from parity. Once the Array has been re-synced then all data in the Array is once again protected.

Whilst at first glance it may seem that a real-time RAID system is the most sensible option it’s not necessarily the case. Consider the type of data on a typical media server; it’s nearly all ripped data and so rarely changes. Maintaining a real-time RAID parity can be quite resource intensive and as mentioned above it requires that the drives are empty before you start. Generally it also means that all drives have to remain spinning all the time plus they must have the same storage capacity as each other. Also, if you accidentally delete anything when using a traditional RAID setup it is also deleted immediately from the RAID Array. SnapRAID on the other hand operates more as a backup since the array is only brought up to date when you choose to do so. So you should be able to recover your deleted content from parity. I say should because it depends on what other changes have been made since you updated the parity information.

In my setup I’ve scheduled SnapRAID to automatically re-sync the Array regularly so I don’t have to remember to do it manually. Plus I’m using a script that emails me the results when it’s done, assuming of course that it actually had something to do (most times it does not). You can read more about SnapRAID here. With SnapRAID you can add individual folders and/or files to the RAID Array; you don’t have to add whole drives. This means, for example, you can RAID infrequently changing data on your OS drive whilst excluding the data which changes frequently. Or whatever variation you choose. The great thing about SnapRAID is that it is flexible.

TIP: If you’re going to be following this guide step by step and are using Putty to administer your server then you can save yourself some typing by simply highlighting each command below, right-clicking on it and selecting Copy. Then toggle over to your Putty Session and right-click once more. The command you’ve just copied from here will be automatically pasted into your Putty Session.

How to install SnapRAID 11.x on Ubuntu Server

So, how do you install it and configure it? Well, there’s no GUI so you have to do a bit of typing I’m afraid!

NO GUI?? WHAT?? Are you SERIOUS?? Honestly, don’t be put off by this, there really is nothing to it. There is a GUI available if you really want one, although I’ve never tried it. You still have to install SnapRAID using a few commands, the GUI only comes into play once you’ve got it up and running.

Before we do anything else we should bring Ubuntu’s package lists up to date. From a Putty Session or, if you’ve got a screen and keyboard attached to your server, from the command line itself, type the following:

sudo apt-get update

You’ll be prompted for a password. This is the password you created when you installed Ubuntu. Ubuntu prompts for it the first time you issue a “sudo” command and again if you leave the session idle for a while.

To install SnapRAID you need to make sure you have a suitable C Compiler installed. Issue the following command to install one. Note, if gcc is already installed then Ubuntu will tell you so and do nothing more:

sudo apt-get install gcc
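
The build steps later in this guide also need make, so it can save a bit of back and forth to check for both tools up front. The following is only a rough sketch: missing_tools is a helper name I've made up for illustration, and it assumes the Ubuntu package names match the tool names (which is the case for gcc and make).

```shell
# Print the name of each listed tool that is not on the PATH.
# missing_tools is a hypothetical helper, not part of Ubuntu or SnapRAID.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

missing=$(missing_tools gcc make)
if [ -n "$missing" ]; then
    # Word splitting is intentional: one package name per missing tool.
    echo "Missing tools, install them with: sudo apt-get install" $missing
else
    echo "gcc and make are both installed"
fi
```

If it reports anything missing, run the suggested apt-get command before carrying on.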

Download and install SnapRAID

The next thing we need to do is create a folder to hold the downloaded installation files:

sudo mkdir /var/lib/snapraid

Now change the permissions on this folder and switch into it:

sudo chmod a+w /var/lib/snapraid

cd /var/lib/snapraid

Now let’s download the installation files. At the time of writing the latest version of SnapRAID is 11.5. Check this page for the latest version and alter the following commands accordingly.

wget https://github.com/amadvance/snapraid/releases/download/v11.5/snapraid-11.5.tar.gz

The files come down compressed so we need to uncompress them:

tar -xzf snapraid-11.5.tar.gz

Now switch into the folder that has just been created:

cd snapraid-11.5
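
If you're installing a newer release, the version number appears in all three of the commands above. One way to avoid editing each of them by hand is to hold the version in a variable, as sketched below. The variable names are my own, and the sketch assumes newer releases follow the same download address layout as 11.5, so do check the address against the releases page first.

```shell
# Version to download; 11.5 is the release used in this guide.
SNAPRAID_VERSION="11.5"

# Derive the archive name, download address and source folder from the version.
# Assumption: other releases use the same naming pattern as 11.5.
archive="snapraid-${SNAPRAID_VERSION}.tar.gz"
url="https://github.com/amadvance/snapraid/releases/download/v${SNAPRAID_VERSION}/${archive}"
srcdir="snapraid-${SNAPRAID_VERSION}"

# Print the commands rather than running them, so they can be checked first.
echo "wget $url"
echo "tar -xzf $archive"
echo "cd $srcdir"
```

Once the printed commands look right, run them one at a time as before.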

Now we want to check everything is in place and is ready to go:

./configure

This should run through without issue and the last few lines should read:

configure: creating ./config.status
config.status: creating Makefile
config.status: creating config.h

Now make the executables:

make

If the make command is not found then install it with sudo apt-get install make and issue the make command once more.

Next we must check that everything has been built correctly:

make check

This will run through a whole bunch of tests and, hopefully, come back with the following:

Everything OK
===== Regression test completed with SUCCESS!
===== Please ignore any error message printed above, they are expected!
===== Everything OK
make[1]: Leaving directory '/var/lib/snapraid/snapraid-11.5'
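
The tests print a lot of output, so the success line is easy to miss. A minimal sketch of one way to check for it is to save the output to a file and search that. The log file name check.log and the helper check_passed are my own choices for illustration; the text being searched for is taken from the output shown above.

```shell
# Hypothetical log file name; any writable path will do.
LOG="${LOG:-check.log}"

# Succeed only if the given log contains SnapRAID's success line.
check_passed() {
    grep -q "Regression test completed with SUCCESS" "$1"
}

# Typical use from inside the snapraid-11.5 folder:
#   make check 2>&1 | tee "$LOG"
#   if check_passed "$LOG"; then echo "Tests passed"; else echo "Tests FAILED, see $LOG"; fi
```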

We’re nearly there, only one more step:

sudo make install
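
To confirm the install worked, you can ask the shell where it finds snapraid and have the program report its version. This is just a quick sanity check; report_snapraid is a name I've made up for the sketch.

```shell
# Report where snapraid was installed and which version it is.
# report_snapraid is a hypothetical helper name.
report_snapraid() {
    if command -v snapraid >/dev/null 2>&1; then
        echo "snapraid found at: $(command -v snapraid)"
        snapraid --version
    else
        echo "snapraid is not on the PATH; check the output of 'sudo make install'"
    fi
}

report_snapraid
```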

As a bit of clean-up we can remove the downloaded file since we no longer need it:

cd .. && rm /var/lib/snapraid/snapraid-11.5.tar.gz

Now we have successfully installed SnapRAID we need to configure it. Before we do that let me explain a couple of things:

Drives and partitions

It’s important when you’re implementing a RAID solution that you understand the difference between drives and partitions. A drive is the physical disc unit itself. On that drive you can have one or more partitions. That is, you can split the drive up into separate chunks. These chunks will look like separate “drives” as far as your operating system is concerned. They will also be classed as separate “drives” as far as SnapRAID is concerned. Therefore, if you have a single physical drive with 2 partitions on it and that drive goes bang, you have lost 2 “drives” as far as SnapRAID is concerned. With a single parity file that single drive failure will be over and above the permitted tolerance level and so you will lose all data on that physical drive because it cannot be recovered from parity. All my drives, apart from the OS drive, have a single partition on them. My OS drive is split into two: one partition holds the OS and is not part of the RAID Array, the other partition holds data and IS part of the Array.
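
If you're not sure how your own drives are split up, lsblk can list each partition alongside the physical drive it lives on. The sketch below counts partitions per drive and prints any drive that holds more than one, since those are the ones where a single failure costs you several “drives”. multi_partition_disks is a made-up helper name, and the sketch assumes your version of lsblk supports the PKNAME column.

```shell
# Read "NAME TYPE PKNAME" rows and print each disk that holds more than
# one partition, followed by the partition count.
# multi_partition_disks is a hypothetical helper name.
multi_partition_disks() {
    awk '$2 == "part" { count[$3]++ }
         END { for (disk in count) if (count[disk] > 1) print disk, count[disk] }'
}

# On a real server, feed it from lsblk (-r raw, -n no header, -o columns).
if command -v lsblk >/dev/null 2>&1; then
    lsblk -rno NAME,TYPE,PKNAME | multi_partition_disks
fi
```

No output means every drive it could see has at most one partition.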

The Parity file

The Parity file is often a source of confusion for people new to RAID. To keep things simple, when I refer to a “drive” I also mean partition and in the example that follows I’m assuming you want to protect all the data on the drives using RAID. Remember, the Parity file, along with the data on the remaining drives, is used to recover the data on a failed drive.

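The Parity file will be around the same size as the data held on your fullest data drive (that is, the drive holding the most data, not necessarily the drive with the largest capacity) and so you need to specify a location for the Parity file with sufficient spare capacity to hold it. Consider the following drives: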

  • 1TB data drive 50% full (ie. it contains 500GB of data)
  • 750GB data drive 100% full (ie. it contains 750GB of data)
  • 2TB data drive 75% full (ie. it contains 1.5TB of data)
  • 2TB data drive 90% full (ie. it contains 1.8TB of data)

In the above example you would need to store the parity file on a 2TB drive because that is the biggest data drive in the Array. Based on the current usage the parity file will be approximately 1.8TB in size (ie. the size of the data on the 4th disc). Although the 750GB drive is completely full it is still less than the 1.8TB of data on the 4th drive.
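
The arithmetic in the example above boils down to taking the largest of the four "data held" figures. As a small worked sketch, with the figures written in GB (taking 1TB as 1000GB) and largest_used being a name I've made up:

```shell
# Print the largest of the given numbers (data held per drive, in GB).
# largest_used is a hypothetical helper name.
largest_used() {
    max=0
    for used in "$@"; do
        if [ "$used" -gt "$max" ]; then
            max=$used
        fi
    done
    echo "$max"
}

# The four drives from the example: 500GB, 750GB, 1.5TB and 1.8TB of data.
largest_used 500 750 1500 1800   # prints 1800
```

So the parity file comes out at roughly 1800GB, which is why it needs a 2TB drive.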

This brings me onto another important point. When I say that the parity file will be approximately the same size as the data on the fullest drive, it is actually slightly larger than that. So, if you completely fill up the 2TB drive then the parity file will need to be 2TB plus a little bit more, ie. it will not fit on a 2TB drive. For this reason you should never completely fill your drives with data, especially the drives which have the same capacity as your parity drive.
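
To get a rough idea of how close you are to that limit on a real server, you can compare the data held on each data drive with the size of the parity drive using df. This is only a sketch: the mount points are placeholders (both default to / so the script runs as-is), the helper names are my own, and it ignores the "little bit more" that the parity file needs, so treat a near miss as a warning too.

```shell
# Hypothetical mount points: replace with your own data and parity drives.
# DATA_MOUNTS is a space-separated list, so paths containing spaces won't work.
DATA_MOUNTS="${DATA_MOUNTS:-/}"
PARITY_MOUNT="${PARITY_MOUNT:-/}"

# Used space on the file system holding $1, in 1K blocks.
used_kb() { df -Pk "$1" | awk 'NR == 2 { print $3 }'; }

# Total size of the file system holding $1, in 1K blocks.
size_kb() { df -Pk "$1" | awk 'NR == 2 { print $2 }'; }

# Find the data drive holding the most data.
largest=0
for mount in $DATA_MOUNTS; do
    used=$(used_kb "$mount")
    if [ "$used" -gt "$largest" ]; then
        largest=$used
    fi
done

parity_size=$(size_kb "$PARITY_MOUNT")
echo "Fullest data drive holds ${largest}K; parity drive size is ${parity_size}K"
if [ "$largest" -ge "$parity_size" ]; then
    echo "WARNING: the parity file may not fit on $PARITY_MOUNT"
fi
```

For example, you could run it with DATA_MOUNTS set to your data drive mount points and PARITY_MOUNT set to wherever the parity drive is mounted.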

Configure SnapRAID

Now we have SnapRAID installed we need to configure it. See the Configure SnapRAID guide.