by Andrew Berry on May 2, 2012

Simple off-site backups with rsync, ssh, and sudo

Setting up a proper backup system is often ignored until it's too late. Manage a computer or a server for long enough, and you'll inevitably run into missing data, or worse yet, corrupted data. For small servers running on a VPS, a complete off-site backup solution might be cost prohibitive or even unavailable. Many backup systems use complicated or proprietary storage mechanisms, making recovery difficult when restoring from "bare metal". Using a combination of rsync, ssh, sudo, and a touch of bash, it's possible to back up your servers quickly and easily.

Tools Needed

Plan of Attack

Using a few standard *nix tools, we're going to set up incremental off-site backups. rsync will be the core of our backup strategy. rsync is an efficient and flexible program for copying data between different systems. In essence, rsync mirrors the contents of a source directory to a destination directory. What makes rsync awesome is that it can operate very quickly even over slow network links. By default, rsync will only transfer changed files between systems, skipping files that already exist unchanged in the destination. This means that once the initial copy is complete, subsequent rsync commands will be much faster.

An important component of a backup strategy is not just to have the current state of a file system, but also to have some ability to pull files from previous backups. Without previous backup storage, it's possible for data corruption to cause data loss. rsync enables incremental backups with the --link-dest parameter. This flag tells rsync to use hard links to link to previous unchanged copies of a file. Without this option, each backup would contain a complete copy of a file, significantly impacting disk space needed for the backup.
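To make the hard-link behaviour concrete, here is a minimal local sketch. The scratch directories and the dated names are made up for the demonstration, and it only runs the rsync steps if rsync is installed. It takes two "backups" of the same source and then checks the link count of an unchanged file.

```shell
#!/bin/bash
set -eu

# Scratch directory standing in for the backup volume (hypothetical layout).
work=$(mktemp -d)
mkdir -p "$work/src" "$work/backups"
echo "hello" > "$work/src/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # First run: a full copy into a dated directory.
    rsync -a "$work/src/" "$work/backups/20120101/"
    # Second run: files unchanged since 20120101 are hard-linked, not copied.
    rsync -a --link-dest="$work/backups/20120101" \
        "$work/src/" "$work/backups/20120102/"
    # Both backups now point at one inode, so the link count should be 2.
    links=$(stat -c %h "$work/backups/20120102/file.txt")
    echo "link count: $links"
else
    echo "rsync is not installed; skipping the demonstration"
fi
```

Note that --link-dest is given as an absolute path here. A relative path would be interpreted relative to the destination directory, which is an easy mistake to make.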

rsync is flexible in that it enables the use of many different protocols to transfer files between servers. One of those protocols is SSH, which is installed on just about every *nix server by default. Using SSH allows us to use standard access controls for user accounts as well as key authentication for security. Over slow network links, SSH is one of the best options for incremental backups. When using SSH, rsync spawns a copy of the rsync process on the server being backed up, allowing changes to be determined locally on that server instead of transferring the files to the backup system. This can be significantly faster than using NFS or Samba to access the source files from the backup server.

sudo is used to help protect the backup client. In order to back up a machine, the backup server must have permission to read most files on the backup client. A naïve approach would be to connect with ssh to a root account. However, that means that if the backup server is compromised, it could be used to compromise the backup client. With sudo, we can allow a restricted account to run one very specific command and ensure that the backup server cannot change any files on the backup client.
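As a rough sketch of how these pieces fit together, the backup server could pull from the client with a command along these lines. The host name, key path and destination are hypothetical, and using --rsync-path to start the remote side under sudo is an assumption about how the script works, not a copy of it. The sketch only prints the command, so it is safe to run anywhere.

```shell
#!/bin/bash
# Hypothetical values; substitute your own host, key and destination.
client="servername.example.com"
key="/root/.ssh/id_rsa"
dest="/backups/servername/$(date +%Y%m%d)"

# Print the command instead of running it (quoting is lost in the printout).
run() { echo "+ $*"; }

# -a             archive mode (permissions, times, owners, devices, symlinks)
# -z             compress data in transit
# --numeric-ids  keep raw UID/GID numbers instead of mapping them by name
# --rsync-path   start the remote side as "sudo rsync" under the rsync account
run rsync -az --numeric-ids \
    -e "ssh -i $key" \
    --rsync-path="sudo rsync" \
    "rsync@$client:/" "$dest/"
```

With this shape, the only thing the rsync account ever needs to run as root is the server-side rsync process, which is what makes a narrow sudo rule possible.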

Finally, we use cron to schedule our backups. Why? Because backups are SERIOUS BUSINESS.

All of the following steps are based on using Ubuntu 10.04. Modify the commands as needed for your distro or operating system. "Backup Server" means the server where backups are stored. "Backup Client" means the server that is being backed up.

Step 1: Add an rsync user account on the backup client

$ sudo useradd rsync
Note that by default, this account will not be able to log in with a password. This helps improve security on the server by requiring SSH keys to log in remotely.

Step 2: Enable passwordless sudo for the rsync command

$ sudo visudo

Add the following line to the end of the file:

rsync   ALL=(ALL) NOPASSWD: /usr/bin/rsync --server --sender -logDtprze.iLsf --numeric-ids . /
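Before pasting a rule like this into the real sudoers file, it can be syntax-checked in a scratch file. This is a small sketch that assumes visudo is on the PATH and skips the check otherwise; the temporary file is only for illustration. Keep in mind that the option string after --sender depends on the rsync version and the flags the backup server passes, so the rule may need adjusting on other systems.

```shell
#!/bin/bash
# Write the rule to a scratch file rather than touching /etc/sudoers.
fragment=$(mktemp)
cat > "$fragment" <<'EOF'
rsync   ALL=(ALL) NOPASSWD: /usr/bin/rsync --server --sender -logDtprze.iLsf --numeric-ids . /
EOF

if command -v visudo >/dev/null 2>&1; then
    # -c checks syntax only, -f selects the file to check.
    visudo -c -f "$fragment" || echo "syntax check failed"
else
    echo "visudo is not installed; skipping the syntax check"
fi
```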

Step 3: Generate an SSH key on the backup server to authenticate with the backup client

$ sudo -i
# mkdir .ssh
# chmod 0700 .ssh
# cd .ssh
# ssh-keygen -C rsync-backup

When creating the SSH key, don't enter a passphrase. Otherwise, the backup script will not be able to connect automatically. After the keypair is generated, you will need to copy id_rsa.pub to ~/.ssh/authorized_keys of the rsync user on the backup client. It's usually easiest to just copy and paste the public key from your terminal.
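The key setup can also be scripted. The sketch below is illustrative only: it uses scratch directories in place of root's ~/.ssh on the backup server and the rsync user's home on the backup client, and it falls back to a placeholder key if ssh-keygen is not installed. On real machines the last four commands would run on the backup client as the rsync user.

```shell
#!/bin/bash
set -eu

# Scratch directories standing in for root's ~/.ssh on the backup server
# and the rsync user's home on the backup client (both hypothetical).
server_ssh=$(mktemp -d)
client_home=$(mktemp -d)

if command -v ssh-keygen >/dev/null 2>&1; then
    # -N "" sets an empty passphrase, -f names the key file, -q keeps it quiet.
    ssh-keygen -q -t rsa -N "" -C rsync-backup -f "$server_ssh/id_rsa"
else
    # Placeholder so the rest of the sketch still runs without OpenSSH.
    echo "ssh-rsa PLACEHOLDER rsync-backup" > "$server_ssh/id_rsa.pub"
fi

# On the client: with default settings sshd refuses keys whose files or
# directories are writable by other users, so lock the permissions down.
mkdir -p "$client_home/.ssh"
chmod 0700 "$client_home/.ssh"
cat "$server_ssh/id_rsa.pub" >> "$client_home/.ssh/authorized_keys"
chmod 0600 "$client_home/.ssh/authorized_keys"
```

If the rsync account was created without a home directory, that directory has to be created and owned by the rsync user before the key can be installed.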

Step 4: Set up the backup script on the backup server

I've uploaded backup-servername.sh to GitHub as a starting point. It can be placed in /etc/cron.daily to be run once every 24 hours. A default "excludes" file is provided as well to prevent backing up /dev, /proc, and other system directories. Make sure to make the script executable and to replace all instances of "servername" with the name of the server you are backing up. As well, I usually keep the backup destination on a separate LVM volume that I only mount when needed. Simpler configurations can remove the calls to mount and unmount.
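For readers who want to see what an excludes file looks like, here is a hypothetical one. It is not the file that ships with the script, and the host name and destination are placeholders. Patterns that start with a slash are anchored at the root of the transfer, and the command is only printed, not run.

```shell
#!/bin/bash
# Hypothetical excludes file; the one provided with the script may differ.
excludes=$(mktemp)
cat > "$excludes" <<'EOF'
/dev/*
/proc/*
/sys/*
/tmp/*
/mnt/*
/media/*
/lost+found
EOF

# Print the command instead of running it.
run() { echo "+ $*"; }

# --exclude-from reads one pattern per line from the given file.
run rsync -az --numeric-ids --exclude-from="$excludes" \
    "rsync@servername.example.com:/" "/backups/servername/"
```

Patterns of the form /dev/* skip the contents but still create the empty directory in the backup, which saves a step when restoring.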

Step 4a: MySQL

To ensure consistent backups, I back up MySQL directly using mysqldump. Database backups are not stored incrementally, but for most servers the disk space used will be minimal. To back up all tables, either create a MySQL user with SELECT granted for all databases, or use the root MySQL account. Make sure that the permissions on the backup script are 0700 so that only root can read the saved password while the script remains executable.
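A dump step might look roughly like the following. This is a sketch with hypothetical paths, and it swaps in a different technique from the one described above: instead of saving the password in the script, it reads credentials from a root-only option file. The command is printed rather than executed.

```shell
#!/bin/bash
# Hypothetical paths. The credentials file would hold a [client] section
# with user= and password= lines and be readable by root only.
credentials="/root/.backup-my.cnf"
dump="/backups/servername/mysql/all-databases.sql.gz"

# Print the command instead of running it.
run() { echo "+ $*"; }

# --defaults-extra-file must be the first option; it keeps the password
#   off the command line, where other users could see it with ps.
# --single-transaction takes a consistent snapshot of InnoDB tables
#   without locking them for the duration of the dump.
run "mysqldump --defaults-extra-file=$credentials --all-databases --single-transaction | gzip > $dump"
```

Tables using other storage engines such as MyISAM are not covered by --single-transaction, so servers that still use them may need table locking instead.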

Step 5: Testing!

Since the backup program is a simple bash script, it can be executed directly to manually run a backup.

$ sudo /etc/cron.daily/backup-servername.sh

If rsync isn't behaving as expected, or you are customizing the parameters, temporarily change the sudoers file on the backup client with visudo to allow all rsync options:

rsync ALL=(ALL) NOPASSWD: /usr/bin/rsync

Then, while a backup is running, use ps auxww | grep rsync on the backup client to pull out the exact command line used.

After running the script over a few days, there will be dated directories containing each backup. You can verify that hard linking is working properly with du and stat. du will show us that the subsequent backups are taking up minimal disk space, while stat will confirm the number of Links to unchanging files.

# du -sh 20120101 20120102
15G 20120101
302M 20120102
# stat 20120101/bin/bash
  File: `20120101/bin/bash'
  Size: 818232    Blocks: 1600       IO Block: 4096   regular file
Device: fc03h/64515d Inode: 979         Links: 47
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2011-11-12 02:12:21.000000000 -0500
Modify: 2010-04-18 21:51:35.000000000 -0400
Change: 2012-04-14 10:10:04.715677821 -0400
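The same check can be scripted by comparing device and inode numbers between two dated directories. The sketch below builds a tiny stand-in for a backup tree in a scratch directory; the helper name same_file and the layout are made up for the example, and stat -c assumes GNU coreutils.

```shell
#!/bin/bash
set -eu

# Succeeds when both paths are the same file on disk (same device and inode).
same_file() {
    [ "$(stat -c '%d:%i' "$1")" = "$(stat -c '%d:%i' "$2")" ]
}

# Demonstration in a scratch directory (hypothetical layout).
work=$(mktemp -d)
mkdir -p "$work/20120101" "$work/20120102"
echo "unchanged" > "$work/20120101/file.txt"
# A hard link, which is what --link-dest creates for unchanged files.
ln "$work/20120101/file.txt" "$work/20120102/file.txt"
echo "changed" > "$work/20120102/new.txt"

if same_file "$work/20120101/file.txt" "$work/20120102/file.txt"; then
    echo "file.txt is shared between both backups"
fi
```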

Restoring from a backup is simple. Boot your destination server off of a rescue image and enable SSH. Use rsync to copy everything back over to the server. Create any directories that were excluded. Though I like to back up complete servers, most of the time when restoring I simply reinstall all packages with apt-get, and then rsync /home, /root, /usr/local, and /etc.
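A partial restore of those four directories could be sketched like this. The backup path and the target host are hypothetical, and the commands are only printed so nothing is overwritten by accident.

```shell
#!/bin/bash
# Hypothetical backup path and rescue-mode host.
backup="/backups/servername/20120102"
target="root@servername.example.com"

# Print the commands instead of running them.
run() { echo "+ $*"; }

for dir in home root usr/local etc; do
    # Trailing slashes copy the directory contents rather than nesting
    # the directory inside its counterpart on the target.
    run rsync -az --numeric-ids "$backup/$dir/" "$target:/$dir/"
done
```

Adding rsync's --dry-run option to the real commands first is a cheap way to see what would change before anything is written.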

Advantages of rsync / hard link backups

  • Possible to chroot into a backup if your server is the same OS and processor architecture as the backup client.
  • Low disk usage compared to complete copies.
  • Simple to understand.
  • rsync runs on just about any OS available.
  • Filesystem independent.
  • No complexity of block-level incremental backups.
  • Can delete any incremental backup in any order without affecting other backups.

Disadvantages of rsync / hard link backups

  • User and group IDs may not match on the destination server.
  • Editing files in the backup is possible.
  • Poor performance and disk use for large files that change slightly, such as virtual machine disk images.

Next Steps

As is, this backup script doesn't remove any old backups. I like to do it manually every few months as it forces me to check to make sure that backups are still running successfully. For deployments beyond a single server, a combination of date and rm -rf should make it possible to easily remove old backups beyond a given age.
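One way to combine date and rm -rf is sketched below. It assumes backups live in directories named YYYYMMDD, as in the du example earlier, and that GNU date is available. The function name prune_backups and the 90-day retention are made up for the illustration, and the demonstration runs against a scratch directory.

```shell
#!/bin/bash
set -eu

# Remove dated (YYYYMMDD) backup directories older than a number of days.
prune_backups() {
    prune_root=$1
    keep_days=$2
    cutoff=$(date -d "$keep_days days ago" +%Y%m%d)   # GNU date syntax
    for dir in "$prune_root"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        [ -d "$dir" ] || continue
        # YYYYMMDD names compare correctly as plain integers.
        if [ "$(basename "$dir")" -lt "$cutoff" ]; then
            echo "removing $dir"
            rm -rf -- "$dir"
        fi
    done
}

# Demonstration in a scratch directory (hypothetical layout).
demo_root=$(mktemp -d)
mkdir "$demo_root/20120101" "$demo_root/$(date +%Y%m%d)"
prune_backups "$demo_root" 90
```

Because every backup directory holds its own hard links, removing an old one does not affect files that newer backups still reference.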

Andrew Berry

Senior Drupal Architect

Comments

greg.1.anderson

Bash script to remove old backups

I have been using a similar technique to back up stuff with rsync and hard links for some time now. I manage pruning older copies from the backup set by grouping them into hourly, daily, weekly and monthly folders. The "hourly" folder is the first destination for a historical backup folder. If the time delta of the items in the hourly folder gets to be > 1 day, then one of the older items is migrated to the "daily" folder, and the others are deleted. This process is repeated for each of the time periods. This method of distribution over time ensures that I have a good mix of recent and older copies of each item set being backed up.

The migration script is here:

https://github.com/greg-1-anderson/utiliscripts/blob/master/migratehisto...

bunam

Nice job !

I'm using rdiff-backup.
My final goal is rdiff-backup + a dedupe file system, but I'm not there yet.

Chris

In your shell script the double-quotes around the mysql password are dubious since they are inside a double-quoted statement. They will not be included in the final command. Perhaps they should be omitted or a $ should be added if a variable is intended to be used there.

drupalfever

Password in Quotes by design

I believe that Andrew Berry put the phrase "my-super-secure-password" between quotes by design.

It is a reproachful practice to leave the database root user without a password but it is also very common to find it on production servers out there!

Knowing that a significant percentage of databases out there have the root user with no password may have been the reasoning behind Andrew Berry's decision to write the script the way it is. The script, set the way it is, would work perfectly on servers where the database's root user has no password set!

hendrohwibowo

rsnapshot

rsnapshot is the most powerful of all: you get daily, weekly, monthly (and hourly if needed) backups that still don't take too much space, because it uses hard links to unchanged files.

I never looked back at those complicated formats, backup strategies, and very expensive commercial enterprise backup tools once I got to know rsnapshot.

deviantintegral

It will work anywhere you have SSH and rsync access. You'd probably want to modify the script to only back up your home directory instead of the whole server.

DrupalFever

visudo settings problem

I wanted to start by saying that I successfully implemented the backup scheme suggested in this article on my company's dedicated server. Woohoo!!

It took me a while, but I was finally able to make it work thanks to your thorough and clear explanations.

I only have a small problem. When I set the permissions for the "rsync" user on our Dedicated Server according to what you suggested, I get an error message.

rsync   ALL=(ALL) NOPASSWD: /usr/bin/rsync --server --sender -logDtprze.iLsf --numeric-ids . /

If I set it with the simpler permission that you suggested on the 5th step (Testing), it works just fine. If, however, I make any other change on the permission string it stops working.

I followed your suggestion and used the following to figure out the right permission string but got nowhere.

ps auxww | grep rsync

My question is:
Is there any security risk to let the "rsync" user have the following security settings on the sudo file?

rsync   ALL=(ALL) NOPASSWD: /usr/bin/rsync

By the way, thank you so much for sharing this with the rest of us!

deviantintegral

The biggest risk is that if the rsync account is compromised, it could be used to write files to the server instead of simply reading them. Restricting what rsync can do offers a decent amount of protection. As long as you trust anyone on the backup server who has access to the rsync key, there isn't a huge security risk in using the wider sudo setting.

DrupalFever

Thanks

Hi, Andrew! Thanks for the quick response.

The backup server that we have here is not exposed to the internet. I am the only one here in the office who has access to it. Access to the rsync user through the Backup Server should not be a security risk.

In any case, I will make a conscious effort to understand visudo and user permissions on CentOS a little better.

Thanks again.
