How-To: Back up your blog
Bad things happen. If you've ever worried that the over-caffeinated tech might spill his latte down your web server, then today's How-To will help you out. Forgetting to back up your blog (or your website) isn't a big deal until you need that backup, like backing up anything, really. But your blog's files and databases aren't as easily accessible as the files on your PC, so today we're showing you how to automatically back up your blog (or website) with some freely available tools that use a minimum of your precious bandwidth.
For today's How-To you'll need:
A webhost with SSH (secure shell) access
A machine to back up to (we prefer Linux or FreeBSD, but you can use Windows with Cygwin if you must)
Internet/network access between the two machines
ssh, rsync and cron installed on the backup machine
For our example backup host, we're using a normal PC with Ubuntu Linux installed.
It's generally preferable to have the server push the backup data, but that requires the destination machine to be reachable via SSH. If you want to back up your data to a machine at home and can't get a static IP address, this isn't always an option. We also don't like pushing the data from the server, because we'd have to leave an SSH key on a shared server. (Danger Will Robinson! Danger!)
If you can't get or allow SSH access to your backup machine, you can pull the data to your backup machine. This makes our robot companions happier, since we keep our SSH keys to ourselves.
On your backup machine, open up a terminal, type df and hit enter. Our (old) machine has 32GB of space available. Plenty for most websites, unless you're running Engadget or something.
Whip out SSH, log in to your web server and change to the directory containing what you want to back up (home directory, www, etc). We want to back up our Wordsmith installation, so we run 'du -hs wordsmith' to get the total size of the directory in human-readable form. This is a fresh install, so it's only 2.9 megabytes.
Running 'which crontab' and 'which rsync' is an easy way to check for the required software.
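The same check can be done for all three tools at once. This is only a sketch: check_tool is a helper name invented here, and it uses the shell's built-in 'command -v' rather than 'which'.

```shell
# Report whether a command is available on this machine's PATH.
# check_tool is a hypothetical helper name, not a standard utility.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
    fi
}

# The three tools this How-To relies on.
for tool in ssh rsync crontab; do
    check_tool "$tool"
done
```

Anything reported as MISSING needs to be installed before going further.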
Everything we need is probably installed, but it's better to check now. From the Ubuntu desktop, click System, then Administration and choose the Synaptic Package Manager.
Click the search button and type in ssh. We only need the client, but the server comes in handy for us.
If it's not installed, select the openssh client by clicking the check box next to it.
Next search for rsync. Select the checkbox if it's not installed.
Lastly, search for cron. Check it if it's not installed. If anything needed to be selected, click the Apply button and Synaptic will install everything you need along with any dependencies.
To write the backup script, we need a few file locations. SSH to your server, and change to the directory that you want to back up. Run the 'pwd' command to get the full directory location. Since we control the backup machine, we created a directory called /backup on it to hold our data.
Pick a good place for a backup script (like ~user/bin) and use your preferred editor to write it. We like vi but you're allowed to use emacs, pico, etc. if you must. The first line tells the machine that this is a 'sh' shell script. The next line runs rsync and tells it to use ssh. The '\' is a continuation, so these three lines are actually one command. The next line tells rsync that the files are on the remote host 'examplehost.com' and are under the directory '/server/willo/www/wordsmith'. The last line tells rsync to copy the files to '/backup/wordsmith' on the local machine. Once it's written, run 'chmod 755 backup.sh' to make the script executable.
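A script matching that description could look roughly like the sketch below, which writes backup.sh into the current directory and makes it executable. The host and paths are the ones named above. The -a (archive) and -z (compress) flags are our assumption, as is the trailing slash on the source path, which makes rsync copy the directory's contents straight into /backup/wordsmith instead of creating a nested wordsmith directory inside it.

```shell
# Create backup.sh; the quoted 'EOF' keeps the backslashes literal in the file.
cat > backup.sh <<'EOF'
#!/bin/sh
# Pull the site from the web host over SSH; only changed files are transferred.
rsync -az -e ssh \
    examplehost.com:/server/willo/www/wordsmith/ \
    /backup/wordsmith
EOF

# Make the script executable.
chmod 755 backup.sh
```

If your login name on the web host differs from the one on the backup machine, put it in front of the host name (user@examplehost.com).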
Run the backup script by hand to make sure it works. If everything is good to go, it will complete without errors. The first time it's run, it will take a while because it is copying every file from the web host to the local backup machine. After that, only files that have changed since the last backup will be copied over, which saves bandwidth. To keep from entering passwords, you'll need to set up SSH keys if you haven't already. Linux Server Hacks has a quick how-to on that.
In order to run the backup script regularly, we place an entry into our personal 'crontab'. Run 'crontab -e' to edit it. A schedule of 0,30 * * * * tells cron to run the backup.sh script every 30 minutes, on the hour and at half past (a bare 30 * * * * would run it only once an hour, at 30 minutes past). Write the file when you exit to enable it. As it's written, you'll get an email every time it's run. Adding a ' > /dev/null' to the end of the line will prevent that for normal output; error messages will still be mailed, which is usually what you want. To make sure your new crontab was written properly, run 'crontab -l' and the contents of the crontab will be echoed to the shell.
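A minimal sketch of a crontab entry that runs the script every 30 minutes is shown below. The minute field 0,30 fires on the hour and at half past; a bare 30 in that field would fire only once an hour. The script path is an assumption based on the ~user/bin suggestion earlier, so adjust it to wherever you saved backup.sh.

```
# minute  hour  day-of-month  month  day-of-week  command
0,30      *     *             *      *            $HOME/bin/backup.sh > /dev/null
```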
If you're using software like Wordsmith that uses MySQL, you'll probably want to write another script that dumps the contents of the database to a file. For security, the dump should be stored in a directory outside of your website directories. Because you'll be keeping a password in the script, you should create a special backup user that only has read access to the database you want to back up. Running 'chmod 500 mysql.sh' on the script will make it executable, but not readable by anyone else. (Add -pmypasswd to the line when you put it in its own backup file.)
[Update: Micheal Ossareh suggests the following options to speed up and reduce the cost of a MySQL dump.]
--add-locks \
--extended-insert \
--quick \
--single-transaction
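Put together, a dump script along these lines might look like the sketch below. The user name backupuser, the password mypasswd, the database name wordsmith_db and the output path are all placeholders invented for illustration; substitute your own. Note that --single-transaction only gives a consistent snapshot for transactional tables such as InnoDB.

```shell
# Create mysql.sh; the quoted 'EOF' keeps $HOME and the backslashes literal.
cat > mysql.sh <<'EOF'
#!/bin/sh
# Dump one database to a file outside the web root.
# backupuser, mypasswd, wordsmith_db and the output path are placeholders.
mysqldump -u backupuser -pmypasswd \
    --add-locks \
    --extended-insert \
    --quick \
    --single-transaction \
    wordsmith_db > "$HOME/db-backup/wordsmith.sql"
EOF

# Executable by the owner only; nobody else can read the stored password.
chmod 500 mysql.sh
```

The output directory has to exist before the first run, and it should sit outside anything your web server publishes.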
To finish the job, add a line to the crontab on your server that runs the database dump at midnight every day. Then add a line to your backup host's crontab to rsync that file to your backup machine at 30 minutes after midnight (the line would start with 30 0 * * *, since the minute field comes first and the hour second). That should allow plenty of time for the dump to complete. One warning: you should only do this once a day, or your hosting provider may get angry with you and perhaps deface your blog, leaving you ashamed and hostless. Database dumps are fairly expensive (CPU-wise) and shouldn't be run too often, if you can help it.
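As a rough sketch, the two daily entries could look like this. Cron fields run minute first, then hour, so midnight is 0 0 and half past midnight is 30 0. Both script paths are assumptions, and backup-db.sh is a hypothetical name for a second rsync script that fetches only the dump file.

```
# On the web server: dump the database at midnight every day
0 0 * * * $HOME/bin/mysql.sh

# On the backup machine: fetch the dump at 00:30 every day
30 0 * * * $HOME/bin/backup-db.sh > /dev/null
```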
If you've gone through all this and used our example crontab entries, your backup machine is grabbing any changes to your blog directory on your webhost every half an hour. The next time floods, fire or excited techs with lattes strike, you'll know you're covered and you didn't even need your boomstick.