Setting up a proper backup system is often ignored until it's too late. Manage a computer or a server for long enough, and you'll inevitably run into missing data, or worse yet, corrupted data. For small servers running on a VPS, a complete off-site backup solution might be cost prohibitive or even unavailable. Many backup systems use complicated or proprietary storage mechanisms, making recovery difficult when restoring from "bare metal". Using a combination of rsync, ssh, sudo, and a touch of bash, it's possible to back up your servers quickly and easily.
Tools Needed
Plan of Attack
Using a few standard *nix tools, we're going to set up incremental off-site backups. rsync will be the core of our backup strategy. rsync is an efficient and flexible program for copying data between different systems. In essence, rsync mirrors the contents of a source directory to a destination directory. What makes rsync awesome is that it can operate very quickly even over slow network links. By default, rsync only transfers files that have changed, skipping any file whose size and modification time match the copy already in the destination. This means that once the initial copy is complete, subsequent rsync commands will be much faster.
An important component of a backup strategy is not just to have the current state of a file system, but also to have some ability to pull files from previous backups. Without previous backup storage, it's possible for data corruption to cause data loss. rsync enables incremental backups with the --link-dest parameter. This flag tells rsync to use hard links to link to previous unchanged copies of a file. Without this option, each backup would contain a complete copy of a file, significantly impacting disk space needed for the backup.
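To make the hard link behaviour concrete, here is a minimal local sketch, assuming rsync is installed. The temporary directory and the names day1 and day2 are made up for illustration and stand in for two dated backups.

```shell
# Illustration only: two local "backups" of the same source directory.
work=$(mktemp -d)
mkdir -p "$work/src"
echo "unchanged" > "$work/src/a.txt"
echo "version 1" > "$work/src/b.txt"

# First backup: a full copy.
rsync -a "$work/src/" "$work/day1/"

# Change one file (and its size), then take a second backup that
# links against the first one.
echo "version 2 with more data" > "$work/src/b.txt"
rsync -a --link-dest="$work/day1" "$work/src/" "$work/day2/"

# Print inode numbers: a.txt should share one inode across both backups,
# while b.txt should have a different inode in each.
ls -i "$work/day1/a.txt" "$work/day2/a.txt"
ls -i "$work/day1/b.txt" "$work/day2/b.txt"
```

Both directories look like complete copies, but the unchanged file is stored on disk only once.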
rsync is flexible in that it enables the use of many different protocols to transfer files between servers. One of those protocols is SSH, which is installed on just about every *nix server by default. Using ssh allows us to use standard access controls for user accounts as well as key authentication for security. Over slow network links, SSH is one of the best options for incremental backups. When using ssh, rsync spawns a copy of the rsync process on the server, allowing changes to be determined locally on the server instead of transferring the files to the backup system. This can be significantly faster than using NFS or Samba to access source files from the backup server.
sudo is used to help protect the backup client. In order to back up a machine, the backup server must have permission to read most files on the backup client. A naïve approach would be to connect with ssh to a root account. However, that means that if the backup server is compromised, it could be used to compromise the backup client. With sudo, we can allow a restricted account to run one very specific command and ensure that the backup server cannot change any files on the backup client.
Finally, we use cron to schedule our backups. Why? Because backups are SERIOUS BUSINESS.
All of the following steps are based on using Ubuntu 10.04. Modify the commands as needed for your distro or operating system. "Backup Server" means the server where backups are stored. "Backup Client" means the server that is being backed up.
Step 1: Add an rsync user account on the backup client
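$ sudo useradd -m rsync
Note that by default, this account will not be able to log in with a password. This helps improve security on the server by requiring SSH keys to log in remotely. The -m flag creates the account's home directory, which will hold its ~/.ssh/authorized_keys file in Step 3.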
Step 2: Enable passwordless sudo for the rsync command
$ sudo visudo 
Add the following line to the end of the file:
rsync ALL=(ALL) NOPASSWD: /usr/bin/rsync --server --sender -logDtprze.iLsf --numeric-ids . / 
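For context, the sudoers entry has to match the command that the backup server asks the client to run. A rough sketch of the pulling side, run on the backup server, might look like the following. The function name pull_backup and the host and path in the usage comment are hypothetical, and the exact option string that rsync sends to the client depends on the rsync version and flags used, so treat this as a starting point rather than a guaranteed match for the line above.

```shell
# Hypothetical helper: pull the root filesystem of a client over ssh,
# running the remote rsync through sudo as the unprivileged rsync user.
# $1 = client hostname, $2 = local destination directory
pull_backup() {
    rsync --archive --compress --numeric-ids \
        --rsh=ssh \
        --rsync-path="sudo rsync" \
        "rsync@$1:/" "$2/"
}

# Example call (placeholder names):
#   pull_backup client.example.com /backup/client/20120101
```

The --rsync-path option is what makes the client run rsync through sudo instead of directly as the rsync user.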
Step 3: Generate an SSH key on the backup server to authenticate with the backup client
$ sudo -i
# mkdir .ssh
# chmod 0700 .ssh
# cd .ssh
# ssh-keygen -C rsync-backup
When creating the SSH key, don't enter a passphrase. Otherwise, the backup script will not be able to connect automatically. After the keypair is generated, you will need to copy id_rsa.pub to ~/.ssh/authorized_keys of the rsync user on the backup client. It's usually easiest to just copy and paste the public key from your terminal.
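If you would rather script the key installation than paste it by hand, something like the following could be run as root on the backup client. The function name install_backup_key and its arguments are my own invention for this sketch. It assumes the public key has already been copied to a file on the client.

```shell
# Hypothetical helper, run as root on the backup client.
# $1 = home directory of the rsync user
# $2 = file containing the public key from the backup server
# $3 = account that should own the files (normally rsync)
install_backup_key() {
    mkdir -p "$1/.ssh"
    cat "$2" >> "$1/.ssh/authorized_keys"
    # With StrictModes (the OpenSSH default), sshd refuses keys when the
    # directory or file has the wrong owner or is writable by others.
    chmod 700 "$1/.ssh"
    chmod 600 "$1/.ssh/authorized_keys"
    chown -R "$3" "$1/.ssh"
}

# Example call (placeholder paths):
#   install_backup_key /home/rsync /tmp/id_rsa.pub rsync
```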
Step 4: Set up the backup script on the backup server
I've uploaded backup-servername.sh to github as a starting point. It can be placed in /etc/cron.daily to be run once every 24 hours. A default "excludes" file is provided as well to prevent backing up /dev, /proc, and other system directories. Make sure to make it executable and to replace all instances of "servername" with the name of the server you are backing up. As well, I usually keep the backup destination on a separate LVM volume that I only mount when needed. Simpler configurations can remove the calls to mount and unmount.
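To give a feel for what such a script does, here is a minimal sketch of the core logic. This is not the script from GitHub. The function names, the directory layout (one YYYYMMDD directory per day under a per-client directory), and the location of the excludes file are all assumptions made for illustration, and the mount and unmount steps are left out.

```shell
# Hypothetical layout: <backup root>/<client>/<YYYYMMDD>/, one directory per day.

# Print the newest dated directory under $1, skipping the path given in $2.
# Prints an empty line if there is no earlier backup.
latest_backup() (
    latest=
    # Globs expand in sorted order, so the last match is the most recent date.
    for dir in "$1"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        if [ -d "$dir" ] && [ "$dir" != "$2" ]; then
            latest=$dir
        fi
    done
    printf '%s\n' "$latest"
)

# Back up client $1 into $2/<client>/<today>, hard linking against the
# previous backup when one exists.
run_backup() (
    client=$1
    root=$2/$client
    today=$root/$(date +%Y%m%d)
    mkdir -p "$root"
    previous=$(latest_backup "$root" "$today")

    # Build the rsync argument list; the excludes path is an assumption.
    set -- --archive --numeric-ids --rsh=ssh --rsync-path="sudo rsync" \
        --exclude-from="$root/excludes"
    if [ -n "$previous" ]; then
        set -- "$@" --link-dest="$previous"
    fi
    rsync "$@" "rsync@$client:/" "$today/"
)

# Example call (placeholder names):
#   run_backup servername /backup
```

The first run has nothing to link against and produces a full copy. Every later run passes the newest earlier directory to --link-dest.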
Step 4a: MySQL
To ensure consistent backups, I back up MySQL directly using mysqldump. Database backups are not stored incrementally, but for most servers the disk space used will be minimal. To back up all tables, either create a MySQL user with SELECT granted for all databases, or use the root MySQL account. Make sure that the permissions on the backup script are 0700 so that only root can read the saved password and the script stays executable for cron.
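A dump step along these lines is one way to do it. Note that this sketch is a variation on the approach described above: it reads the MySQL user and password from a separate option file instead of storing them in the script. The function name dump_mysql, the option file, and the output file name are all hypothetical.

```shell
# Hypothetical helper: dump every database into one compressed file.
# $1 = MySQL option file with a [client] section holding user and password
# $2 = directory to write the dump into
dump_mysql() {
    # --defaults-extra-file must be the first option on the command line.
    # --single-transaction gives a consistent snapshot for transactional
    # tables such as InnoDB without locking them for the whole dump.
    mysqldump --defaults-extra-file="$1" --all-databases --single-transaction \
        | gzip > "$2/mysql-all.sql.gz"
}

# Example call (placeholder paths):
#   dump_mysql /root/backup-my.cnf /backup/servername/mysql
```

If you go this route, the option file needs the same tight permissions as a script containing the password would.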
Step 5: Testing!
Since the backup program is a simple bash script, it can be executed directly to manually run a backup.
$ sudo /etc/cron.daily/backup-servername.sh 
If rsync isn't behaving as expected, or you are customizing the parameters, temporarily change the sudoers file on the backup client with visudo to allow all rsync options:
rsync ALL=(ALL) NOPASSWD: /usr/bin/rsync 
Then, use ps auxww | grep rsync to pull out the exact command line used.
After running the script over a few days, there will be dated directories containing each backup. You can verify that hard linking is working properly with du and stat. du will show us that the subsequent backups are taking up minimal disk space, while stat will confirm the number of Links to unchanging files.
# du -sh 20120101 20120102
15G 20120101
302M 20120102
# stat 20120101/bin/bash
  File: `20120101/bin/bash'
  Size: 818232 Blocks: 1600 IO Block: 4096 regular file
Device: fc03h/64515d Inode: 979 Links: 47
Access: (0755/-rwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2011-11-12 02:12:21.000000000 -0500
Modify: 2010-04-18 21:51:35.000000000 -0400
Change: 2012-04-14 10:10:04.715677821 -0400
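The same checks can be scripted. The helpers below are a small sketch with names of my own choosing. They compare inode numbers, so they only make sense when both backups live on the same filesystem.

```shell
# Print the inode number of a path.
inode_of() {
    ls -di "$1" | awk '{print $1}'
}

# Succeed when the same relative path in two backups is one file on disk.
# $1 = older backup, $2 = newer backup, $3 = path relative to both
is_hard_linked() {
    [ "$(inode_of "$1/$3")" = "$(inode_of "$2/$3")" ]
}

# Count regular files in a backup that have more than one link.
count_linked_files() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}

# Example calls, using the directory names from the output above:
#   is_hard_linked 20120101 20120102 bin/bash && echo "linked"
#   count_linked_files 20120102
```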
Restoring from a backup is simple. Boot your destination server off of a rescue image and enable SSH. Use rsync to copy everything back over to the server. Create any directories that were excluded. Though I like to back up complete servers, most of the time when restoring I simply reinstall all packages with apt-get, and then rsync /home, /root, /usr/local, and /etc.
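A full restore could be sketched roughly as follows, run from the backup server. Everything specific here is an assumption: the function name, root SSH access to the rescue system, and the idea that the rescue image has the target filesystem mounted at a path you pass in. Only /dev and /proc are recreated because those are the excluded directories named above. Add whatever else your excludes file skips.

```shell
# Hypothetical restore helper, run on the backup server.
# $1 = dated backup directory to restore from
# $2 = hostname of the machine booted into the rescue image
# $3 = mount point of the target filesystem on that machine
restore_backup() {
    # Copy the backup over, keeping numeric user and group IDs as stored,
    # then recreate mount points that were excluded from the backup.
    rsync --archive --numeric-ids --rsh=ssh "$1/" "root@$2:$3/" &&
        ssh "root@$2" "mkdir -p '$3/dev' '$3/proc'"
}

# Example call (placeholder names):
#   restore_backup /backup/servername/20120102 rescue.example.com /mnt/target
```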
Advantages of rsync / hard link backups
- Possible to chroot into a backup if your server is the same OS and processor architecture as the backup client.
- Low disk usage compared to complete copies.
- Simple to understand.
- rsync runs on just about any OS available.
- Filesystem independent.
- No complexity of block-level incremental backups.
- Can delete any incremental backup in any order without affecting other backups.
Disadvantages of rsync / hard link backups
- User and group IDs may not match on the destination server.
- Editing files in the backup is possible.
- Poor performance and disk use for large files that change slightly, such as virtual machine disk images.
Next Steps
As is, this backup script doesn't remove any old backups. I like to do it manually every few months as it forces me to check to make sure that backups are still running successfully. For deployments beyond a single server, a combination of date and rm -rf should make it possible to easily remove old backups beyond a given age.
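As a rough sketch of that idea, the function below deletes dated backup directories older than a cutoff. The function name and the YYYYMMDD directory naming are assumptions based on the examples earlier in this post. The date -d syntax in the usage comment is GNU date, as found on Ubuntu.

```shell
# Delete dated backup directories (named YYYYMMDD) older than a cutoff.
# $1 = directory holding the dated backups, $2 = cutoff date as YYYYMMDD
prune_backups() (
    for dir in "$1"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        # Skip the unexpanded pattern when nothing matches, and any non-directory.
        [ -d "$dir" ] || continue
        name=${dir##*/}
        # Directory names are plain dates, so a numeric comparison is enough.
        if [ "$name" -lt "$2" ]; then
            rm -rf -- "$dir"
        fi
    done
)

# Example call (placeholder path), removing backups older than 90 days:
#   prune_backups /backup/servername "$(date -d '90 days ago' +%Y%m%d)"
```

Because each backup is a self-contained tree of hard links, removing old directories this way leaves the remaining backups intact.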