Backing up MySQL databases to Amazon S3

Sep 23, 2012

Regardless of what database system you use, you want to back up your data regularly in case the worst happens to your database server. This article will walk you through how to completely automate your MySQL database backups on Linux using Amazon’s Simple Storage Service, otherwise known as S3.

Backing up the database with mysqldump

In order to be certain you have consistent database backups, you need to actually dump the database from MySQL using the mysqldump command. If you were to just copy the file system directories where the MySQL databases are stored, you might not end up with a copy of the database in a consistent state because of operations in progress.

The mysqldump command allows us to dump either a specific database or all databases. For our case today we are going to assume you want to back up all databases, including the ones MySQL uses internally to store user privileges.

Set up a dedicated backup user that can access all databases from the local machine only and has just the privileges needed for backups. This will save you from having to put your database root user credentials into a script file, which is not a good idea.

GRANT LOCK TABLES, SELECT, SHOW VIEW, RELOAD ON *.* TO backup@localhost IDENTIFIED BY 'password';
FLUSH PRIVILEGES;

Then actually dump the database.

mysqldump --tz-utc --all-databases --flush-privileges --log-error=/var/log/mysqldump_errors.log  --result-file=/var/local/mysqldump_all.sql --user=backup --password=password

The options

  • --tz-utc dumps TIMESTAMP values in UTC, so restoring on a server in a different time zone will not shift the dates.

  • --flush-privileges is needed when dumping the user information in the special mysql database. This is also the reason our backup user needed the RELOAD privilege.

  • --log-error just makes it easier to troubleshoot if something goes wrong.

  • --user=backup and --password specify our special backup user and its password.

There is no reason to pay the transmission and storage costs of raw, uncompressed SQL, so you should also compress the dump file. It’s also possible to pipe the output from mysqldump directly into bzip2 or gzip (an example of that is shown below).

bzip2 /var/local/mysqldump_all.sql

You should now have a file named /var/local/mysqldump_all.sql.bz2.
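If you would rather skip the intermediate uncompressed file, the piped approach mentioned above looks something like this (it assumes the same backup user and destination path as the commands above):

mysqldump --tz-utc --all-databases --flush-privileges --log-error=/var/log/mysqldump_errors.log --user=backup --password=password | bzip2 > /var/local/mysqldump_all.sql.bz2

Note that with a plain pipe the shell reports bzip2’s exit status rather than mysqldump’s, so the dump-then-compress approach is a little easier to error-check in a script.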

Storing the backup on S3

We are going to use a library called Boto, which provides a Python interface to the AWS service APIs, but first we need to set up the S3 bucket and get our AWS credentials.

If you do not yet have an AWS account, you can sign up for one at aws.amazon.com. Be sure to check out the pricing so you know what you are signing yourself up for before you start using this for backups.

You will need a bucket on S3 to be the destination for your backups. In S3 parlance a bucket is basically just a top-level folder. You can create one by going to the AWS Management Console, selecting the S3 service, and then clicking “Create Bucket”.

For the rest of this article, we will assume your bucket is named ‘example_backup’.

You will need your AWS access key and your AWS secret access key in order to access the S3 API using Boto. To get these credentials, visit aws.amazon.com, click ‘My Account/Console’, and select ‘Security Credentials’ from the dropdown. Then find the section entitled ‘Access Credentials’ on that page. Your access key will be shown, but you will have to click the ‘Show’ link to temporarily display your secret access key, so nosy onlookers cannot read it over your shoulder.

Be careful with your secret access key because this is what someone needs to access or change your data or settings in S3 or other AWS services.
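As a side note, if you would rather not hard-code the keys into the upload script shown below, Boto can also read them from a configuration file such as ~/.boto (you can then create the S3Connection with no arguments). A minimal example of that file:

[Credentials]
aws_access_key_id = <youraccesskey>
aws_secret_access_key = <yoursecretkey>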

Python source for uploading with Boto

This is a simple Python script which uses Boto to upload one or more files to a specified directory in a given S3 bucket and account. Be sure to replace the placeholder values with your account and bucket information.

#!/usr/bin/python
import sys
import os.path
from boto.s3.connection import S3Connection

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print >> sys.stderr, "Usage: uploadToS3.py <basedir> <file1> <file2> ... <fileN>"
        sys.exit(2)

    accessKey = "<youraccesskey>" # change to your access key
    secretKey = "<yoursecretkey>" # change to your secret access key
    bucketName = "example_backup" # change to your bucket name

    print "Connecting to S3"
    s3conn = S3Connection(accessKey, secretKey)
    bucket = s3conn.get_bucket(bucketName)

    # First argument is the directory (key prefix) inside the bucket; pass "" to upload to the bucket root.
    prefix = sys.argv[1] + "/" if sys.argv[1] else ""
    for arg in sys.argv[2:]:
        fname = os.path.basename(arg)
        print "Uploading %s to %s in bucket %s" % (arg, prefix, bucketName)
        key = bucket.new_key(prefix + fname)
        key.set_contents_from_filename(arg)
    print "Done."

Restoring

If you do need to restore, you can download a backup file from your S3 account using the AWS Management Console, or you could also automate that using Boto. I will leave the latter as an exercise for the reader.
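If you want a starting point for the automated route, a minimal download sketch with Boto might look like the following (it assumes the same credentials and bucket as the upload script, and the key name and destination path are just example values that match the cron script later in the article):

#!/usr/bin/python
# Minimal sketch: download one backup object from S3 with Boto.
from boto.s3.connection import S3Connection

accessKey = "<youraccesskey>" # change to your access key
secretKey = "<yoursecretkey>" # change to your secret access key

s3conn = S3Connection(accessKey, secretKey)
bucket = s3conn.get_bucket("example_backup")

# "database" is the prefix the cron script uploads into; adjust the key name to the backup you want.
key = bucket.get_key("database/mysqldump_all.sql.bz2")
key.get_contents_to_filename("/var/local/mysqldump_all.sql.bz2")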

After you have downloaded your backup file from S3, you can restore it like this, assuming you compressed it with bzip2 as in the example above.

bunzip2 your_backup_filename.sql.bz2
mysql -u root -p[root_password] < your_backup_filename.sql

Putting it all together

In order for this to really be automated, it needs to run all on its own. For that reason we are going to create a simple shell script to run all the needed steps. You can place this script into your /etc/cron.daily directory if your system is set up like mine, or you can install it somewhere else and run it from your crontab (an example crontab entry follows the script).

#!/bin/sh
# Backup all mysql databases from this host to S3.

# remove last backup files first if needed
echo "Cleaning up any old backup files"
rm -f /var/local/mysqldump_all.sql /var/local/mysqldump_all.sql.bz2

# start by dumping out all the databases. Remember to use the actual password for the backup user.
echo "Dumping databases"
mysqldump --tz-utc --all-databases --flush-privileges --log-error=/var/log/mysqldump_errors.log  --result-file=/var/local/mysqldump_all.sql --user=backup --password=password || exit

# Compress the database file. You could use gzip if you want it to run faster but compress a bit less.
echo "Compressing"
bzip2 /var/local/mysqldump_all.sql || exit

# Now do the actual upload to S3.
/usr/local/bin/uploadToS3.py database /var/local/mysqldump_all.sql.bz2

# All done.
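If you install the script somewhere other than /etc/cron.daily, a crontab entry along these lines will run it nightly (the 2:30 AM schedule and the path /usr/local/bin/backup_mysql_to_s3.sh are just example choices):

30 2 * * * /usr/local/bin/backup_mysql_to_s3.sh >> /var/log/backup_mysql_to_s3.log 2>&1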

Extras

One thing I did not cover here that is also useful is encryption of your data backups, which is definitely important to some users. S3 has a built-in encryption feature these days to help you with that if needed, or you can encrypt the data yourself before uploading it. However, as far as I know Boto does not yet support S3’s encryption features. You could easily use openssl on the command line to perform AES encryption before uploading, though.
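A sketch of that do-it-yourself approach, assuming you keep a passphrase in a root-only file (the path /etc/backup_passphrase is just a placeholder):

openssl enc -aes-256-cbc -salt -in /var/local/mysqldump_all.sql.bz2 -out /var/local/mysqldump_all.sql.bz2.enc -pass file:/etc/backup_passphrase

You would then upload the .enc file instead, and decrypt after downloading with the same command plus the -d flag.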

Another thing you may want to address is building the date into your backup file name so that you can keep multiple backups side by side, or adding a rotation scheme so that you only keep a certain number of days of history.
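The former is a small tweak to the cron script above; a sketch (the DATE and DUMPFILE variable names are just illustrative):

DATE=$(date -u +%Y-%m-%d)
DUMPFILE=/var/local/mysqldump_all_$DATE.sql

mysqldump --tz-utc --all-databases --flush-privileges --log-error=/var/log/mysqldump_errors.log --result-file=$DUMPFILE --user=backup --password=password || exit
bzip2 $DUMPFILE || exit
/usr/local/bin/uploadToS3.py database $DUMPFILE.bz2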
