Midgard backups with rdiff-backup (and Amazon S3)
Posted on 2007-07-31 10:10:26 EEST.
I have experimented with backing up my own Midgard blobs and database with rdiff-backup and offsiting them to Amazon S3 (it will take a few more days to upload the rest of the 11GB of blobs [photos mostly], so I will have more info about that part later). The examples assume a Debian Etch system with a standard Midgard installation.
Inspiration for my experiments comes from the following articles:
- How To: Bulletproof Server Backups with Amazon S3
- How I automated my backups to Amazon S3 using s3sync
Let's get started:
- Install rdiff-backup
    apt-get install rdiff-backup
- Create directories to store the rdiff data in
    mkdir -p /var/backups/rdiff/midgard/blobs
    mkdir -p /var/backups/rdiff/midgard/sql
- Create a script to back up the database and blobs
    nano /var/backups/rdiff/midgard/backup.sh
  with the following contents (adjust the mysqldump arguments for username and password):
    #!/bin/bash
    # Update local rdiff directories
    mysqldump --opt midgard >/var/lib/midgard/backups/midgard_backup.sql
    rdiff-backup /var/lib/midgard/backups /var/backups/rdiff/midgard/sql/
    rdiff-backup /var/lib/midgard/blobs /var/backups/rdiff/midgard/blobs/
- Make sure /var/lib/midgard/backups/ doesn't have any old SQL files you don't need to back up
- Run the script and wait; the first run will take some time if you have lots of blobs
- Make a small change and run the script again, then observe the changes in the rdiff-backup-data trees under /var/backups/rdiff/midgard/sql/ and /var/backups/rdiff/midgard/blobs/
- Add the backup script to cron so that it runs nightly
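One way to do the cron step on Debian is a file under /etc/cron.d. The sketch below is only an illustration: the 03:30 schedule, the file name midgard-backup and the helper name write_backup_cron are assumptions, not something the steps above prescribe.

```shell
#!/bin/bash
# Write a cron.d entry that runs the backup script once a night.
# $1: cron file to write (default is a hypothetical name under /etc/cron.d)
# $2: backup script to run (default is the script created above)
write_backup_cron() {
    cron_file="${1:-/etc/cron.d/midgard-backup}"
    script="${2:-/var/backups/rdiff/midgard/backup.sh}"
    {
        echo "# m h dom mon dow user command"
        echo "30 3 * * * root $script"
    } > "$cron_file"
}
```

Run as root, `chmod +x /var/backups/rdiff/midgard/backup.sh` followed by `write_backup_cron` would leave a single entry that starts the backup at 03:30 every night. Files in /etc/cron.d need the extra user field (root here), unlike a personal crontab.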
You now have local backups running nightly, storing the latest version as plain directory tree and earlier versions as reverse diffs.
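Since the earlier versions are stored as reverse diffs, getting one back goes through rdiff-backup itself. A minimal sketch, assuming the SQL repository from the steps above already holds at least three days of history; the three-day age and the /tmp target path are arbitrary choices for illustration.

```shell
#!/bin/bash
SQL_REPO=/var/backups/rdiff/midgard/sql

# Only touch the repository if rdiff-backup and the repository both exist
if command -v rdiff-backup >/dev/null 2>&1 && [ -d "$SQL_REPO/rdiff-backup-data" ]; then
    # Show which points in time can be restored
    rdiff-backup --list-increments "$SQL_REPO"
    # Restore the SQL dump as it was three days ago (target must not exist yet)
    rdiff-backup -r 3D "$SQL_REPO/midgard_backup.sql" /tmp/midgard_backup_3D.sql
else
    echo "no rdiff-backup repository at $SQL_REPO" >&2
fi
```

The latest version needs no special tooling at all; it can simply be copied out of the plain directory tree.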
Then on to the S3 stuff:
- Get yourself an S3 account from Amazon
- Install ruby and ruby-openssl
    apt-get install ruby libopenssl-ruby
- Get and install s3sync
    mkdir -p /usr/src/s3sync
    cd /usr/src/s3sync
    wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
    tar -xvzf s3sync.tar.gz
    mv s3sync /usr/lib/ruby/1.8/
- Check the README and experiment a bit to make sure s3sync works for you
- Create shortcut scripts in /usr/local/bin: s3cmd
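    #!/bin/bash
    export AWS_ACCESS_KEY_ID=your_key
    export AWS_SECRET_ACCESS_KEY=your_secret_key
    export SSL_CERT_DIR=/etc/ssl/certs
    # Remember the caller's directory (not in PWD, which the shell updates on every cd)
    OLDDIR=`pwd`
    cd /usr/lib/ruby/1.8/s3sync
    # Quote "$@" so that arguments containing spaces survive
    ./s3cmd.rb --ssl "$@"
    cd "$OLDDIR"
  and s3sync
    #!/bin/bash
    export AWS_ACCESS_KEY_ID=your_key
    export AWS_SECRET_ACCESS_KEY=your_secret_key
    export SSL_CERT_DIR=/etc/ssl/certs
    # Remember the caller's directory (not in PWD, which the shell updates on every cd)
    OLDDIR=`pwd`
    cd /usr/lib/ruby/1.8/s3sync
    # Quote "$@" so that arguments containing spaces survive
    ./s3sync.rb --ssl "$@"
    cd "$OLDDIR"
- Make the scripts executable (chmod +x) and check that they work by calling them with the --help argument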
- Create a bucket for your backups. Bucket names must be globally unique, so I suggest using something like com.yourdomain.rdiffs, but in theory any unique name is valid.
    s3cmd createbucket com.yourdomain.rdiffs
  If this complains something to the tune of "certificate verify failed", run apt-get install ca-certificates, and if that doesn't help, see the s3sync README for the certificate you need.
- Make the initial sync to S3 (see the s3sync documentation on the bucket:prefix format)
    s3sync -v -r /var/backups/rdiff/ com.yourdomain.rdiffs:machinename
- Make a cronjob to periodically sync your rdiff directories to S3
    s3sync --delete -r /var/backups/rdiff/ com.yourdomain.rdiffs:machinename
  Incidentally, this will sync all directories under /var/backups/rdiff/, so if you add more rdiff-backup directories there is no need to change this. Do note that the --delete option causes anything that is not present locally to be deleted from S3 (s3sync works very much like rsync).
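To make the effect of --delete concrete, here is a small local simulation. It does not call s3sync at all; it only compares two plain-text listings and prints what a mirror-with-delete sync would remove on the remote side. The helper name keys_to_delete and the one-path-per-line listing format are assumptions for the sake of the example.

```shell
#!/bin/bash
# Print the entries that exist only in the remote listing, i.e. the
# objects a sync with --delete semantics would remove from the remote.
# $1: file with local relative paths, one per line
# $2: file with remote keys, one per line
keys_to_delete() {
    tmp_local=$(mktemp) && tmp_remote=$(mktemp) || return 1
    sort "$1" > "$tmp_local"
    sort "$2" > "$tmp_remote"
    # comm -13 hides lines unique to the first file and lines common to both
    comm -13 "$tmp_local" "$tmp_remote"
    rm -f "$tmp_local" "$tmp_remote"
}
```

In other words, a file removed locally (on purpose or by accident) disappears from S3 on the next sync, so the S3 copy protects against losing the machine, while protection against accidental deletions comes from the history kept by rdiff-backup.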
Whew, done...
Update 2007.08.01
The initial upload finished yesterday. I didn't have the forethought to time it, but in any case it was basically bound by my outgoing bandwidth (with wondershaper on the FW making sure that normal browsing etc. is not unduly affected). Now running s3sync with no changes takes a little over 8 minutes, two of which are actual CPU time (the dataset is 11GB in 26.5k objects). iptraf shows only moderate burst traffic, so I'd say most of the time is spent in local file I/O (to calculate hashes) and waiting for S3 to reply.
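For anyone who does want to time their own runs, a tiny wrapper is enough. This is a sketch, and the helper name timed is made up for illustration; it reports wall-clock seconds only, so for the CPU time split the shell's own time keyword is the better tool.

```shell
#!/bin/bash
# Run a command, then report its wall-clock duration and exit status on stderr.
timed() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "'$*' took $((end - start))s (exit status $status)" >&2
    return $status
}
```

Used as `timed s3sync --delete -r /var/backups/rdiff/ com.yourdomain.rdiffs:machinename`, it passes the exit status of the wrapped command through, so it can sit in front of the sync in a cron script without hiding failures.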
And here's a small script one could use to sync the rdiff directory as often as seen fit. It avoids parallel runs (S3 does not have any kind of locking, so having parallel processes writing to the same objects is a bad idea):
#!/bin/bash
CMD="s3sync --delete -r /var/backups/rdiff/ com.yourdomain.rdiffs:machinename"
# pgrep -f matches against full command lines and never reports itself,
# unlike "ps ax | grep", where grep can match its own command line
while pgrep -f -- "$CMD" >/dev/null; do
    date
    echo "Another instance of \"$CMD\" is running, waiting for it to finish"
    sleep 45
done
$CMD
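An alternative to looking at the process list is a lock directory, since mkdir either creates the directory or fails, atomically. The following is a sketch under assumptions: the lock path under /var/lock and the helper name with_lock are hypothetical, and the 45 second wait is just carried over from the script above.

```shell
#!/bin/bash
# Run a command while holding a lock directory, waiting if it is taken.
# LOCKDIR and LOCK_WAIT can be overridden from the environment.
with_lock() {
    lockdir="${LOCKDIR:-/var/lock/s3sync-rdiff.lock}"
    # mkdir is atomic: only one process can succeed in creating the directory
    until mkdir "$lockdir" 2>/dev/null; do
        echo "Another sync holds $lockdir, waiting for it to finish" >&2
        sleep "${LOCK_WAIT:-45}"
    done
    "$@"
    status=$?
    # Release the lock and hand back the command's exit status
    rmdir "$lockdir"
    return $status
}

# Example call (left commented out so the file can be sourced safely):
# with_lock s3sync --delete -r /var/backups/rdiff/ com.yourdomain.rdiffs:machinename
```

The trade-off is that a run killed before it reaches rmdir leaves a stale lock behind, which then has to be removed by hand.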
Update 2007.09.21
The backups have been running smoothly for a while now. With this amount of data my S3 fees are about 2.5 USD per month, not bad at all...