Snapshot-style backups on Linux similar to Time Machine

Rule number 1. ALWAYS. KEEP. BACKUPS. I repeat. Always Keep Backups. Backing up your files can seem like a useless task, until that one time you need them. Backups require a configured process for daily or weekly copies, extra storage space, and the persistence to keep them current. It’s easy to get distracted and forget about backups, especially when you never use them, or when you have a large file server that requires an additional storage medium to back up your files. I can’t stress enough how important it is to keep up with backups regularly. I’ve had computers and drives crash more times than I can count. Power outages, hardware failures, anything can cause you to lose files. And if you’re like me, and maintain your life, your photos, your work files, your finances, virtually everything in the digital space, one simple failure can be devastating.


I’ve used many different backup tools, from tape drives to CDs to removable drives. I remember buying my first write-once CD for $23.00. Yes, that was a single disc. With hard drives now in the hundreds of gigabytes and prices that have become affordable, a CD can’t compare to a hard drive or thumb drive. And we no longer need tape backups, which, if you remember them, could be quite costly.

Let’s think about what the optimal solution for backups looks like. To be truly safe from data loss, we should consider the following scenarios:

  • Hard drive failure
  • Data corruption from power loss, malicious intent, or hardware failure
  • Destruction of your location / home / office
  • Simple human error


Perhaps your hard drive fails. Or perhaps your computer is destroyed by some natural disaster. Or perhaps you simply make a mistake and save over a copy of an important file. How can we protect against all of these scenarios? Back up often and regularly, and keep an offsite copy, whether in a safe, your car, another building, or some other location. This will ensure that no matter what happens, you can still retrieve your sensitive data.

I work from home most of the time, so my process is to rotate a daily backup, keeping at least one copy in my vehicle at all times. Like American Express, I never leave home without it. This ensures that no matter what happens, I have a backup of all my important files somewhere accessible, even if my home is destroyed by flood or fire. I also want the process to be automated, as I tend to forget, or get lazy about creating backups. The one day I forget is, inevitably, the one day my computer decides to fail. Here, then, are my prerequisites for a solid backup plan.

  • The process should be automated as much as possible
  • At a minimum, I should have a daily backup
  • There should always be a copy of my files offsite
  • I would like at least a month’s worth of backups, since I often don’t realize right away that a file has become corrupted or damaged


As far as backup scripts or tools go, I like Apple’s Time Machine best. Files are easily retrieved from any point in history: you get a full backup automatically and can recover a file from yesterday or from a month ago if needed. It’s automatic, and it uses an external drive to keep as much history as the drive space allows.

I wanted the same type of backup system on my Linux file server. There’s a standard tool that lets me do much the same thing as Apple’s Time Machine: rsync. Rsync works by creating an exact copy of a directory and all of its contents.
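
In its simplest form, rsync mirrors one directory into another. Here is a minimal sketch, with placeholder paths:

rsync -a /home/user/documents /mnt/backup/

The -a (archive) option copies recursively while preserving permissions, ownership, timestamps, and symlinks, so the result is an exact mirror of the source.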

The Backup Script

Here is an example of a simple backup script that backs up a couple of directories.


#!/bin/bash

BACKUP_DIR=/mybackup
BACKUP_HISTORY=$BACKUP_DIR/backup_history
MAX_BACKUPS=10
DIRS_TO_BACKUP="/var/www /home"

date=$(date "+%Y-%m-%d-%H%M%S")

# Record this backup's name so the rotation step can find the oldest later
echo "back-$date" >> "$BACKUP_HISTORY"

for dir in $DIRS_TO_BACKUP
do
   # --link-dest hard-links unchanged files against the previous backup.
   # On the very first run "latest" won't exist yet; rsync warns but proceeds.
   rsync -aXP --link-dest="$BACKUP_DIR/latest" "$dir" "$BACKUP_DIR/back-$date"
done

# Point the "latest" symlink at the backup we just made
rm -f "$BACKUP_DIR/latest"
ln -s "$BACKUP_DIR/back-$date" "$BACKUP_DIR/latest"

# Rotate backups: once we have more than MAX_BACKUPS, delete the oldest
backup_count=$(wc -l < "$BACKUP_HISTORY")
first_line=$(head -n 1 "$BACKUP_HISTORY")

if [ "$backup_count" -gt "$MAX_BACKUPS" ]; then
  # Drop the first line of the history file, then remove that backup
  awk 'NR != 1' "$BACKUP_HISTORY" > "$BACKUP_HISTORY.1"
  mv "$BACKUP_HISTORY.1" "$BACKUP_HISTORY"
  rm -rf "$BACKUP_DIR/$first_line"
fi

In the above script, I first get the current date and use it in the name of the folder created for this backup.
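
After a few runs, the backup directory might look something like this (the directory names and dates are just illustrative):

$ ls /mybackup
back-2011-12-08-030000  back-2011-12-09-030000  back-2011-12-10-030000
backup_history  latest

Each back-* directory appears to hold a complete copy of your files, and latest is a symlink to the most recent backup.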

  • The -a option tells rsync to make an archive copy (recursive, preserving permissions, ownership, timestamps, and symlinks).
  • The -X option preserves extended attributes, such as those used by OS X files.
  • The -P option tells rsync to display progress as it runs.
  • The best part of rsync is the --link-dest option. It tells rsync to compare the new backup with an existing copy of your backup. With it, rsync only copies the files which have changed; for files which have not changed, the new backup contains hard links to the existing copies. The end result is that the “latest” directory appears to hold a complete copy of all your files. The thing to note here is how hard links work. A hard link is a directory entry pointing to file contents stored somewhere on disk, and you can create any number of hard links to the same file. So it may look like you have 5 copies of the same file, but in storage terms you are only storing the contents once. This lets rsync save an enormous amount of space by only storing files which have been modified or changed. An important thing to note is that when you delete a hard link, you are not deleting the file it points to, as long as at least one other hard link to that file still exists. You must delete ALL hard links to remove the file from your hard drive. A quick demonstration follows this list.
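
Here is a quick way to see hard links in action, a throwaway sketch you can run anywhere:

$ echo "hello" > original.txt
$ ln original.txt copy.txt     # create a second hard link to the same contents
$ stat -c %h original.txt      # the link count is now 2
2
$ rm original.txt
$ cat copy.txt                 # the contents survive until the last link is gone
hello

This is exactly what --link-dest does on a large scale: unchanged files in a new backup are just extra hard links to data already on disk, so they consume almost no additional space.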

This script then takes a simple approach to rotating backups. You can keep as many as you want, based on the amount of storage space you have available; the script deletes the oldest backup on each run once you have more than MAX_BACKUPS of them.

Automate Your Backup

Use cron to automate your backup. Running crontab -e opens an editor for your cron jobs. A typical cron job for your backup might look like the following, which runs at minute 0 of every hour.

#minute   hour   day-of-month   month   day-of-week
0 * * * *         /root/mybackup.bash >/dev/null 2>&1

If you want to run your backup once per day at 3 am, your crontab should look like this:

#minute   hour   day-of-month   month   day-of-week
0 3 * * *         /root/mybackup.bash >/dev/null 2>&1

Restoring Your Backup

You can restore your backup in a number of ways. You can simply cp a single file, or, to restore an entire backup, remove the original directory and use cp -a to copy a backup to its original location. Or simply use rsync again, but without the --link-dest option, to restore a specific backup.

rsync -aXP --delete /mybackup/latest/www /var/
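
To restore a single file, a plain cp from a snapshot is enough (the path below is a made-up example):

cp /mybackup/back-2011-12-10-010000/home/user/important.txt /home/user/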

Rsync has a lot of options, from simple backups to copying from one machine to another, even across the internet. Feel free to run man rsync on your own server and explore the various options.
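
For example, rsync can push a backup to a remote machine over SSH; the hostname and destination path here are hypothetical:

rsync -aXP -e ssh /var/www user@backuphost:/mybackup/offsite/

The same technique could keep your offsite copy up to date automatically.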

One thing I like to do is mark certain backups whenever I make significant changes to the OS or the hardware. I simply create a soft link: ln -s /mybackup/back-2011-12-10-010000 /mybackup/before_install. This lets me easily keep track of the snapshot taken before any change that might cause data loss. If you want to keep a specific snapshot permanently, simply rename its directory and the rotation logic won’t delete it.
