Almost everyone agrees that backups should be done. But, nevertheless, this problem comes up again and again. A recent survey showed two interesting things: first, half of us do not make backups at all, and second, the author did not even think to include an "once a day" option in the survey. So what is wrong with the seemingly simple task of packing up your files and putting the archive away in a warm, dry place?
The main problem is that backups are not one of those things you can do without thinking! If you naively try to pack up the entire contents of your hard drive, then, first, you will have nowhere to store those archives (daily ones! :)), and second, your machine will spend nearly 24 hours a day archiving itself instead of doing your actual work. And once you do start to think (which is already hard), it turns out that the data on the drive is very heterogeneous, and different kinds of data are best backed up differently (which only complicates the situation further). As a result, either the decision is made not to make backups at all (disguised as "put it off for later"), or the first utility that comes to hand is installed and hastily configured in the hope that it will be enough.
Therefore there is not, and cannot be, a backup solution that suits everyone. One person needs to back up their documents under Windows to a network drive, another needs to back up an entire server to tape, a third uses Venti instead of traditional backups, and yet another just needs to make regular backups of their web project and mail them to GMail. And this, in turn, greatly complicates the choice of software that is right for you.
Therefore, IMHO, the best way to help someone with backups is to describe what, how, and why you back up yourself. I have run into two different scenarios, and I want to talk about my solutions to them. It should be noted that in both cases the most critical factor for me was reliability: what good is a backup from which my data cannot be correctly restored?
Backup of a large web project
It is not at all easy to back up a large project reliably, that is, so that after recovery from the backup it is guaranteed to keep working without problems (apart from the absence of new data added after the last backup, of course). The trouble is that neither the files on disk nor the database are always in a consistent state. Transactions are running in the database (and they are not always explicitly wrapped in SQL transactions, by the way), temporary files are being created, logs are being written... And a large project usually has many other "entry points" besides HTTP: incoming mail processing, RPC requests, cron tasks, various TCP/UDP services. Keeping this whole colossus in a consistent state, and keeping it there for the entire time the backup is running, is simply an impossible task for external software!
So our solution is that the project itself must support the creation of complete backups. The approach is much the same as with testing: for a project's code to be testable (and conveniently so), it has to be written with testing in mind from the start.
This is implemented in the most straightforward way: all applications of the project share a single locking system, and for the duration of the backup the project is locked.
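Just to illustrate the idea (the real locking lives inside our framework and looks different), here is a minimal sketch of the backup script's side of such a lock, using a hypothetical lock file that every entry point of the project (HTTP front controller, cron tasks, mail and RPC handlers) is assumed to check before starting any write operation:

    # Hypothetical lock file; all paths here are invented for the example.
    LOCK=/var/www/project/run/backup.lock

    touch "$LOCK"    # ask the application to stop writing
    sleep 5          # crude: give in-flight requests a moment to finish

    # ... the fast part of the backup goes here (see the next sketch) ...

    rm -f "$LOCK"    # the project is live again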
But this created a new problem: a backup is not a quick affair, especially with a large database, and if the project is locked for a long time once a day, our users will not thank us for it. We solved this by reorganizing the database so that all tables fall into two types: static and dynamic. Only SELECT and INSERT queries are ever run against "static" tables, while "dynamic" tables may change in any way. Naturally, all the bulky data was moved into static tables. This made it possible to reduce the time the project is locked for a backup to a few seconds: just long enough to archive the project files (with tar, uncompressed), dump the dynamic tables, and record the id of the last row in each static table.
After that the lock is released and, without any hurry, the static tables are dumped (up to the remembered id), the tar archive is compressed and everything is sent to storage. Naturally, the static tables and the project files can (and should) be backed up incrementally.
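A minimal sketch of the whole two-phase dump, under the same assumptions as above (the hypothetical lock file from the previous sketch, a database called project, static tables with an auto-increment id column); all table names and paths are invented, and incremental archiving is left out for brevity:

    # mysql/mysqldump credentials are assumed to come from ~/.my.cnf.
    LOCK=/var/www/project/run/backup.lock
    DB=project
    DYNAMIC="sessions queue counters"
    STATIC="articles comments attachments"
    OUT=/var/backups/project
    mkdir -p "$OUT"

    touch "$LOCK"                                  # phase 1: the project is locked
    tar -cf "$OUT/files.tar" /var/www/project      # uncompressed, so it is fast
    mysqldump "$DB" $DYNAMIC > "$OUT/dynamic.sql"  # dynamic tables are small
    for t in $STATIC; do                           # remember the boundary ids
        mysql -N -e "SELECT MAX(id) FROM $t" "$DB" > "$OUT/$t.maxid"
    done
    rm -f "$LOCK"                                  # phase 2: the project is live again

    for t in $STATIC; do                           # dump only the rows we "froze"
        mysqldump --where="id <= $(cat "$OUT/$t.maxid")" "$DB" "$t" > "$OUT/$t.sql"
    done
    gzip -f "$OUT/files.tar"                       # now we can afford to compress
    # ... encrypt and ship $OUT to storage ...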
Despite the universality of the approach (we use it in all our projects), our backup script will be of little use to others, since it is tailored to our framework. If you are curious, you can take a look at it: asdfBackup.
Backup of an ordinary dedicated server
Having thought through my requirements for server backup software, I came up with the following list:
- The backup software should be simple and reliable. Everyone knows how the number of bugs grows with the size of the software, and bugs in my backups are the last thing I need for complete happiness!
- When an entire server is backed up, it is impossible to account for all the nuances of the projects installed on it and guarantee their integrity after recovery from the backup. So the approach described below is used not instead of, but alongside, the project-specific approach described above. Nevertheless, there are things that a server backup must ensure, for example database integrity (the data may be incomplete from the point of view of a particular project, but it must be consistent from the point of view of the MySQL server).
- The backup should run quickly, so that it does not keep the server from doing its main job. (On a server belonging to a friend of mine, an admin, I once discovered that his cron task archiving the disk once a day into .tar.bz2 ran for two hours, and during that time it was nearly impossible to do anything else on the server!) In particular, combined with the previous point, this means that the database should not be locked for the entire duration of the server backup, but only for the time it takes to back up the database files themselves.
- File format: I really did not want to make something as critical as backups depend on non-standard file formats. Standard utilities are more than enough for archiving and encrypting backups, for example tar + gzip + gpg (a sketch of such a pipeline follows this list).
- The system should allow sufficiently flexible processing of the backups: encrypting them, uploading them to other servers, and so on.
- The interface should be informative enough for debugging and setting up backups, but it should not spam me once a day with a letter from every server saying "everything is in order, the backup has been made"; I want to get a letter only if an error has occurred.
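To make the "standard formats" point concrete, this is roughly all it takes with stock utilities: GNU tar keeps its incremental state in a snapshot (.snar) file, gzip compresses, gpg encrypts. The paths and the recipient key here are, of course, made up:

    SNAR=/var/backups/etc.snar
    OUT=/var/backups/etc-$(date +%Y%m%d).tar

    tar --listed-incremental="$SNAR" -cf "$OUT" /etc
    gzip -f "$OUT"
    gpg --batch --yes -r backup@example.com -e "$OUT.gz"   # produces $OUT.gz.gpg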
powerbackup
Maybe I was not looking in the right places, but I could not find software that meets these requirements. So I wrote it myself: powerbackup. :) It turned out to be a small and simple sh script (70 lines, 2 KB) that fully met the requirements described above and solved the server backup problem for me.
The necessary flexibility, without giving up simplicity, was achieved with two architectural decisions:
- The overall task of "backing up the server" is split into several subtasks, for example: backing up the server configuration, the server's databases, and the users' home directories.
- Creating a backup is split into two stages: preparing a file (incrementally, via tar) and archiving it (gzip), and for each task you can specify individual hooks for these stages. For example, the database backup task intercepts the first stage in order to lock MySQL while the files in /var/lib/mysql/ are being archived, and the home directory backup task can intercept the second stage in order to replace gzip compression with gpg encryption and uploading of the file via scp (a sketch follows below).
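powerbackup's actual hook syntax is shown in the examples on its site; purely to illustrate the idea, here is roughly what two such hooks could look like as plain shell functions. All names, paths and the gpg key are invented, and the first function assumes the mysql client's system command (available on Unix), which runs tar from inside the same session so that the read lock is held for exactly the duration of the copy:

    # Prepare-stage hook for the databases task: hold a global read lock
    # in a single mysql session while the data directory is archived.
    db_prepare() {
        printf '%s\n' \
            'FLUSH TABLES WITH READ LOCK;' \
            'system tar -cf /var/backups/mysql-files.tar /var/lib/mysql' \
            'UNLOCK TABLES;' | mysql -u root
    }

    # Archive-stage hook for the home directories task: instead of gzip,
    # encrypt the file prepared in stage one and ship it to another host.
    home_archive() {
        f=$1                                  # tar file from the prepare stage
        gpg --batch --yes -r backup@example.com -o "$f.gpg" -e "$f"
        scp "$f.gpg" backup@storage.example.com:/srv/backups/
        rm -f "$f" "$f.gpg"
    }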
More detailed configuration examples are on the site.
The license, as usual, is public domain.