Midgard backups with rdiff-backup (and Amazon S3)

Posted on 2007-07-31 10:10:26 EEST.

I have been experimenting with backing up my own Midgard blobs and database with rdiff-backup and sending the backups offsite to Amazon S3. The initial upload of the rest of the 11GB of blobs (mostly photos) will take a few more days, so I will have more to say about that part later. The examples assume a Debian Etch system with a standard Midgard installation.

Inspiration for my experiments comes from the following articles:

Let's get started:

  1. Install rdiff-backup
    apt-get install rdiff-backup
                
  2. Create directories to store the rdiff data in
    mkdir -p /var/backups/rdiff/midgard/blobs
    mkdir -p /var/backups/rdiff/midgard/sql
                
  3. Create a script to back up the database and blobs
    nano /var/backups/rdiff/midgard/backup.sh
                
    with the following contents (adjust mysqldump arguments for username and password)
    #!/bin/bash
    # Stop at the first failing command so a broken dump is not backed up
    set -e
    # Dump the database, then update the local rdiff directories
    mysqldump --opt midgard >/var/lib/midgard/backups/midgard_backup.sql
    rdiff-backup /var/lib/midgard/backups /var/backups/rdiff/midgard/sql/
    rdiff-backup /var/lib/midgard/blobs /var/backups/rdiff/midgard/blobs/
                
  4. Make sure /var/lib/midgard/backups/ doesn't contain any old SQL files you don't need to back up
  5. Run the script and wait; the first run will take some time if you have lots of blobs
  6. Make a small change and run the script again, then observe the changes in the rdiff-backup-data directories under /var/backups/rdiff/midgard/sql/ and /var/backups/rdiff/midgard/blobs/.
  7. Add the backup script to cron
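For the cron step, one possible setup is a file under /etc/cron.d. This is only a sketch: the file name midgard-backup, the function name write_backup_cron and the 02:30 run time are my own assumptions, pick whatever suits your system.

```shell
# CRON_FILE is a hypothetical name; override it to try the function safely.
CRON_FILE="${CRON_FILE:-/etc/cron.d/midgard-backup}"

write_backup_cron() {
    # Files in /etc/cron.d take a user field (root here) before the command.
    # 02:30 is an arbitrary example time.
    cat > "$CRON_FILE" <<'EOF'
30 2 * * * root /var/backups/rdiff/midgard/backup.sh
EOF
}

# Usage, as root (not run automatically):
#   chmod +x /var/backups/rdiff/midgard/backup.sh
#   write_backup_cron
```

The script has to be executable for cron to run it this way, hence the chmod in the usage comment.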

You now have local backups running nightly, storing the latest version as a plain directory tree and earlier versions as reverse diffs.
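Since the older versions are stored as reverse diffs, getting one back goes through rdiff-backup itself. A minimal sketch, assuming the directory layout from the steps above; the helper names list_sql_increments and restore_sql_dump are made up for illustration, and the target path is up to you.

```shell
# Repository path as created in step 2 above.
SQL_REPO=/var/backups/rdiff/midgard/sql

list_sql_increments() {
    # --list-increments shows which points in time can be restored.
    rdiff-backup --list-increments "$SQL_REPO"
}

restore_sql_dump() {
    # $1: time spec understood by rdiff-backup, e.g. 3D for three days ago
    # $2: file to write the restored dump to
    rdiff-backup -r "$1" "$SQL_REPO/midgard_backup.sql" "$2"
}

# Usage (not run automatically):
#   list_sql_increments
#   restore_sql_dump 3D /tmp/midgard_backup_3_days_ago.sql
```

The latest version needs no tool at all, it can be copied straight out of the directory tree.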

Then to the S3 stuff

  1. Get yourself an S3 account from Amazon
  2. Install ruby and ruby-openssl
    apt-get install ruby libopenssl-ruby
                
  3. Get and install s3sync
    mkdir -p /usr/src/s3sync
    cd /usr/src/s3sync
    wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
    tar -xvzf s3sync.tar.gz
    mv s3sync /usr/lib/ruby/1.8/
                
  4. Check the README and experiment a bit to make sure s3sync works for you
  5. Create wrapper scripts in /usr/local/bin, first s3cmd
    #!/bin/bash
    export AWS_ACCESS_KEY_ID=your_key
    export AWS_SECRET_ACCESS_KEY=your_secret_key
    export SSL_CERT_DIR=/etc/ssl/certs
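    # s3cmd.rb is started from its own directory; a script's working
    # directory is private to it, so there is nothing to restore afterwards
    cd /usr/lib/ruby/1.8/s3sync || exit 1
    ./s3cmd.rb --ssl "$@"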
                
    and s3sync
    #!/bin/bash
    export AWS_ACCESS_KEY_ID=your_key
    export AWS_SECRET_ACCESS_KEY=your_secret_key
    export SSL_CERT_DIR=/etc/ssl/certs
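    # s3sync.rb is started from its own directory; a script's working
    # directory is private to it, so there is nothing to restore afterwards
    cd /usr/lib/ruby/1.8/s3sync || exit 1
    ./s3sync.rb --ssl "$@"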
                
  6. Make the scripts executable (chmod +x /usr/local/bin/s3cmd /usr/local/bin/s3sync) and check that they work by calling them with the --help argument
  7. Create a bucket for your backups. Bucket names must be globally unique, so I suggest using something like com.yourdomain.rdiffs, but in theory any unique name is valid.
    s3cmd createbucket com.yourdomain.rdiffs
                
    If this complains about something like "certificate verify failed", run apt-get install ca-certificates; if that doesn't help, see the s3sync README for the certificate you need.
  8. Make the initial sync to S3 (see the s3sync documentation on the bucket:prefix format)
    s3sync -v -r /var/backups/rdiff/ com.yourdomain.rdiffs:machinename
                
  9. Make a cronjob to periodically sync your rdiff directories to S3
    s3sync --delete -r /var/backups/rdiff/ com.yourdomain.rdiffs:machinename
                
    Incidentally, this will sync all directories under /var/backups/rdiff/, so if you add more rdiff-backup directories there is no need to change this. Do note that the --delete option causes anything that is not present locally to be deleted from S3 (s3sync works very much like rsync).
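Because --delete mirrors local deletions to S3, a missing or empty source directory (say, a disk that failed to mount) could wipe the offsite copy too. A small guard along these lines might help; guarded_sync and the SRC/DEST variables are hypothetical names of mine, not part of s3sync.

```shell
# Hypothetical wrapper: refuse to run a --delete sync from an empty source.
SRC="${SRC:-/var/backups/rdiff/}"
DEST="${DEST:-com.yourdomain.rdiffs:machinename}"

guarded_sync() {
    # A missing or empty source would make --delete remove the remote
    # copies as well, so bail out before calling s3sync at all.
    if [ ! -d "$SRC" ] || [ -z "$(ls -A "$SRC" 2>/dev/null)" ]; then
        echo "Refusing to sync: $SRC is missing or empty" >&2
        return 1
    fi
    s3sync --delete -r "$SRC" "$DEST"
}

# Usage (not run automatically):
#   guarded_sync
```

This only catches the crudest failure; it does not protect against a partially lost directory tree.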

Whew, done...

Update 2007.08.01

The initial upload finished yesterday. I didn't have the forethought to time it, but in any case it was basically bound by my outgoing bandwidth (with wondershaper on the firewall making sure that normal browsing etc. is not unduly affected). Running s3sync with no changes now takes a little over 8 minutes, two of which are actual CPU time (the dataset is 11GB in 26.5k objects). iptraf shows only moderate bursts of traffic, so I'd say most of the time is spent on local file I/O (to calculate hashes) and waiting for S3 to reply.
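To avoid forgetting to time a long run next time, the command can be wrapped in a tiny helper. This is just a sketch; the function name timed is my own, and bash's built-in time keyword would do much the same job interactively.

```shell
timed() {
    # Runs the given command and reports elapsed wall-clock seconds on
    # stderr, passing the command's exit status through unchanged.
    local start end status=0
    start=$(date +%s)
    "$@" || status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start))s (exit status $status)" >&2
    return $status
}

# Usage (not run automatically):
#   timed s3sync -v -r /var/backups/rdiff/ com.yourdomain.rdiffs:machinename
```

From cron, the stderr line ends up in the mail cron sends, which gives a simple record of how long each sync took.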

And here's a small script you can use to sync the rdiff directory as often as you see fit. It avoids parallel runs: S3 does not have any kind of locking, so having parallel processes write to the same objects is a bad idea.

#!/bin/bash
CMD="s3sync --delete -r /var/backups/rdiff/ com.yourdomain.rdiffs:machinename"
# pgrep -f matches against full command lines and never reports itself,
# whereas "ps ax | grep" can also match the grep process and wait forever
while pgrep -f "$CMD" >/dev/null; do
    date
    echo "Another instance of \"$CMD\" is running, waiting for it to finish"
    sleep 45
done
$CMD
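
An alternative to scanning the process list is a lock directory, since mkdir either succeeds or fails atomically. The sketch below is a different technique from the script above, not a drop-in copy of it: it skips the run instead of waiting, and run_locked and the LOCKDIR path are names I made up.

```shell
# LOCKDIR is a hypothetical path; any directory on a local filesystem works.
LOCKDIR="${LOCKDIR:-/var/lock/s3sync-rdiff.lock}"

run_locked() {
    # mkdir creates the directory or fails in one step, so only one
    # caller at a time can get past this point.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "Another sync holds $LOCKDIR, skipping this run" >&2
        return 1
    fi
    local status=0
    "$@" || status=$?
    # Release the lock and hand back the command's own exit status.
    rmdir "$LOCKDIR"
    return $status
}

# Usage (not run automatically):
#   run_locked s3sync --delete -r /var/backups/rdiff/ com.yourdomain.rdiffs:machinename
```

One caveat: if the machine goes down mid-sync the directory stays behind and has to be removed by hand before the next run will start.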
    

Update 2007.09.21

The backups have been running smoothly for a while now. With this amount of data my S3 fees are about 2.5 USD per month, not bad at all...
