Hey there, guys! Aaron Wilson here, the ever-present but ever-invisible Olark Ruby Ninja Warrior. I'm coming out of the shadows to tell you a little bit about our fun journey with database backups.
A good backup never lets you know it's there...
The core of any startup is data; how it's stored, how it's processed, and how it's interpreted are the very essentials of computing.
Olark is no exception, and between document stores, keystores, relational databases, message queues, and so on, we've got a lot of information to manage. One of our most important datastores (although becoming less so, which could probably fill its own blog post) is a collection of MySQL databases that store, among other things, user information, user relationships and site configuration.
A lot of these collections of bits are key pieces of product data that keep everything else running. If we lost this datastore, it would be a huge setback for us as a company.
And so, as we've focused over the last nine months or so on eliminating single points of failure from our system, making the databases redundant and backing up this data were two items on the list of problems to solve--not only does having backup data create peace of mind, it also gives us an easy source of staging data to test with (and destroy) before deployment.
At the time, it was about 4.5GB of data to manage (now, it's more). Most of this data was in a single database, the datastore for our Rails website, clocking in at around 95% of the data. The backups had these criteria to meet:
- Compact: Keeping successive backups that are each ~4.5GB in size quickly adds up, even with storage space as cheap as it is these days. Compressing these with gzip is pretty effective, bringing it down to roughly a 30% of that, but it still adds up.
- Quick to restore: If this database goes down, important parts of our system become completely unusable. Even worse, with certain failure modes, future use can become unstable and need corrective maintenance. Minimizing these effects means minimizing the time it takes to get the restored data in place.
- Current: If our latest backup is from a week ago, that's a week of interactions to recreate. Even a single missed day of data can have a huge impact, so our backups need to be current.
- Non-blocking: Obviously, if your backup process interrupts availability, you're asking for problems to solve later. While the time the backups are taken is best done far away from peak load, availability is always important, especially when your customers are global.
- Tested: If you've never restored from your backup, you don't have a backup!
Backup Dojo: Forge me into a sword, that I might slay my demons
MySQL has built-in capabilities that solve part of these problems. Timely backups can be kept with binary logs, which have the ability to replay all the SQL actions taken in a given time. Binary logs churn quickly, though, and should be kept locally to keep MySQL write actions from piling up and potentially hogging resources from other things.
Since a lot of activity occurs (and actions are sometimes redundant), the size of these logs becomes unwieldy very quickly -- we found that keeping more than a couple of days' worth wasn't feasible. We settled, then, on snapshots, which would store state from given points in time that could be synched to present day with whatever binlogs we had on hand.
By default, MySQL doesn't make this easy on any database larger than a few hundred megs (at least, not without paying for a license). The go-to backup tool for MySQL, mysqldump, is fine for small databases, but for us was taking close to an hour to take a full snapshot of the main database (and similar time to load). That's bad, without any other qualification.
All sorts of awful things happen in much less time, and having such a huge window for the snapshot to be interrupted is asking for trouble. Luckily, the community has stepped up to fill in this gap in (the free version of) MySQL's functionality: Percona XtraBackup, a free, non- blocking, and blazing fast backup tool for InnoDB and XtraDB databases (we, incidentally, store all of this in InnoDB--anything MyISAM might serve us better for, we don't store in MySQL). Percona works by making use of InnoDB crash recovery; it essentially simulates a crash, copies the raw datafiles manually to the backup location, and uses crash recovery to validate backup integrity and play up to the binlogs that happened during the file copy. The install and usage of the product isn't completely trivial, but it's not bad, and a wonderful article by Sean Hull covers the essentials, so I won't.
Using XtraBackup cut the backup time from an hour to an astonishing five minutes, during which time the database was completely available (although not without caveats, which I'll talk about in a bit). The restore process takes the same amount of time, and some steps of the restore process can be "pre-loaded" to make restoring from a particular backup take about half as long (more precisely, it can allow about half of the restore process to run in the background while the restored database is available for writes; details below).
All of these steps I encapsulated in two Rake tasks--one for backup, and one for restore--which was then managed by a Python script that would bundle these backups with the others. The high-level of the backup Rake task looks like this:
- Load the Rails environment and grab the database credentials
- Define some things, and look for/create a lock file--this is a low-cost, easy-to-implement way to make sure you're not blindly re-running the backup process after it's failed, or running the backup more than once concurrently.
- Run innobackupex through Rake's sh command, using the --slave-data option, which saves a bunch of useful auxiliary data that makes spinning up a replicated DB easier.
- Make sure the backup actually exists, and run innobackupex with the --apply-log option to run crash InnoDB's crash recovery process
- Create a "prepared copy" of the backup. Since the restore of the database involves turning off the DB, replacing the data directory with the backup, and turning the database back on, we want to cut down on the amount of time it takes to replace those files, which means using "mv" rather than "cp" (shifting disk references to 4GB on the machine in question takes mere seconds--actually copying the files took, at the time, up to five minutes). If you mv the backed up directory, though, then you only get to restore your backup once, after which it no longer exists. That would be pretty silly. To solve this, we make a redundant copy of the backup directory and designate it the "prepared" one--whenever we run a restore, we'll mv this directory in, and then after we turn the DB back on with the backup, we start a cp in the background to create a new prepared directory.
- Delete all but some number of past backups. We actually only keep a day's worth of backups in an uncompressed state--the above-mentioned Python script takes care of compressing older backups and moving them around to keep our disk from filling up. This task, though, only manages them before they're compressed.
- Assuming everything completed successfully, remove the lock file to signal that we're open for business. Pretty straightforward. The restore task is similar: Find the prepared copy (create one if it doesn't exist), turn off MySQL, move the files in, turn MySQL back on. The only weirdness, here, is that a restore might be happening from a different environment than the backup originated--in particular, we completely restore the staging database every day from the last day's production data, and those databases have different credentials, and, crucially, different database names. Since the backup data is binary, there's no simple way to change the name of the database on the backups themselves, meaning the renames have to be performed in MySQL. The commands look like this:
STOP SLAVE; #if we took this backup from a slave in a replication setup, we don't want it to continue trying to run as a slave SET SESSION group_concat_max_len=4096; #We need to generate a very long query, so we want to make sure it doesn't get truncated by the default max SELECT @stmt := CONCAT('RENAME TABLE ',GROUP_CONCAT(table_schema,'.',table_name,' TO ','<current_env_db_name>.',table_name),';') FROM information_schema.TABLES WHERE table_schema LIKE '<previous_env_db_name>' GROUP BY table_schema; PREPARE rename_schema FROM @stmt; EXECUTE rename_schema;
...then appropriate revokes and grants are executed to make sure the Rails environment can properly execute, and that there aren't any old environment users hanging around to make things confusing. Piece of cake, right?
Pain is my greatest teacher, my scars my greatest strength
There are caveats to this process.
One fun thing that I discovered while creating the above task is that XtraBackup (understandably) is murder on disk i/o. The database is still available for reads, but writes will hang. This won't crash MySQL unless you're at high load, but if your subsidiary services that cause writes don't handle timeouts gracefully, they may crash/hang/explode.
The best solution to this is to run a replicated setup, and have XtraBackup run on one of your read-only slaves. For redundancy, we actually have multiple slaves, including one that doesn't process any reads or writes in production; it simply keeps itself dutifully rolled up to the master. Setting up the replication process to run from that server suited us just fine.
And so, those Rake tasks are run by the Python script I've already mentioned, which is scheduled by cron. The Python script also runs mysqldump backups of the smaller MySQL databases, manages compressing/deleting old copies of the database, performs a staging data restore (which has the added benefit of being a daily test of our restore process, since our staging environment is intentionally as close to identical to our production environment as possible), and finally, sends copies of our backups completely offsite, in case of a meteor attack on Rackspace (we're never safe until the dinosaurs can fight back). If any step of the process fails, monitoring systems send us an email.
And that's that! A few incantations, and a whole lot of peace of mind.