
This article is a continuation of the article. We are
developing a new file format for site backup , which considered a promising format for site backup.
In this article we will look at the second part of the format - the possibility of a block incremental backup for MySQL.
SXB format is intended for users of primary and secondary levels. For those who are not familiar (or cannot use them on a specific site) with hot backup tools (binary logs, file system snapshots, Xtrabackup, etc.). Roughly speaking, for those who use MySQL to mysqldump backup and similar programs that create SQL-dump database.
')
Let me remind you that the format is still at the technology development stage, and at the end of the article a PHP test script is waiting for you.
Why not mysqldump?
So, one of the most popular options for backing up MySQL databases is the bundled mysqldump program. On the Internet, you can find many, as simple scripts (including, there are several articles on Habré), and some backup programs that use mysqldump.
So why not use it? The main disadvantages are associated with the notorious “unix way” (when each program does the minimum task and is unaware of other programs).
In this case, the following problems arise:
Using a temporary file
The standard scheme of work is to create a file with a dump, and then add it to a tar or zip archive. Therefore, you need more space for backup, as well as extra time to copy.
Bad deduplication support
Due to the fact that mysqldump inserts comments and various metadata (such as the value of auto_increment), as well as due to the dependence of some tables on others.
Lack of dump navigation
To get data from only one table, you need to parse a dump until you find the right place. Because most often restore the entire dump (possibly in a temporary database).
No postprocessing
Since the dump is simply a set of SQL queries, and the recovery is performed by the standard program for executing any queries. That recovery takes longer (due to the more complex parser), and it is not possible to modify the queries (for example, use REPLACE instead of INSERT).
Some of these issues can be tried using various mysqldump options, but as a result, the recovery of the dump is complicated. Yes, and in practice, such solutions are not met. I tried to solve some of these problems in Sypex Dumper 2. Special metadata was added to the SQL dump. But in the new version I decided to go further.
New MySQL backup features in SXB format
Due to the fact that we use to restore a special program that can perform post-processing, and not stupidly split the file into requests and send them to the MySQL server - interesting new features appear.

In the pro version of the dumper, SELECT OUTFILE has already been used to speed up the backup process. Since it is quite clear that it will be much faster if the MySQL server itself saves data to a file, rather than transferring the data divided into separate fields, each of which still needs to be escaped, add quotes, brackets, etc. But at the same time, the dump was still brought to the usual SQL view.
In the new version, I wondered why to spend time on dumping during backup, if these decorations take 30-50% of time. Given that backup is usually done much more often than recovery. It is naturally better to transfer the additional load on the recovery process.
Working on the SXB format and block deduplication for files, an idea emerged that it would be nice to try deduplication for the database itself. After all, the database does not change much data between backups.
At the first stage I decided to implement deduplication at the block level, as well as for files. This option, of course, has drawbacks compared to row level deduplication. But it will be much faster and less demanding on resources.
The data itself is stored for each table separately, in the form of lines with tabs as separators. Table blocks do not depend on other tables. This achieves the compactness of the backup file, plus advanced post-processing capabilities during recovery. The structure of each table is also stored separately. In this case, the AUTO_INCREMENT value is cut from the structure of the tables in order not to constantly back up the structure of the table in which only AUTO_INCREMENT changes. And the AUTO_INCREMENT value itself is stored in the table header metadata.
For each block, an identifier (CRC32 + MD5 + Block Size) is considered, and by this identifier the uniqueness of the block is determined. Two hashing algorithms are used to avoid collisions. And the algorithms themselves are selected as the fastest. A common MD5 hash for the entire table is also considered.
Thus, if we encounter a block that is already in our backup (current or previous), then we use the link to this block, and do not add the repeating block itself to the current backup. As a result, we save space for backup, processor time (no need to compress the same data), and also because of the smaller size, backup loading into cloud storages is significantly accelerated.
To test MySQL incremental backup in practice, you can use a simplified script , it is desirable to share data.
You must have FILE access rights for your MySQL user and the MySQL server must be located on localhost. If there are many who want to add a version with simple selects.
As a result of the work, get such a table (cut the date to fit on the page).
+--------+------+---------+--------+--------+----------+----------+----------+--------+ | D & t | Tabs | Rows | Blocks | Dubs | Size | Dub.size | SXB.size | Time | +--------+------+---------+--------+--------+----------+----------+----------+--------+ | 07:38 | 26 | 557180 | 4779 | 0 | 37.03 MB | - | 11.43 MB | 3.8561 | | 08:06 | 26 | 557187 | 4779 | 4761 | 37.03 MB | 36.92 MB | 39.8 KB | 2.8214 | | 08:22 | 26 | 557187 | 4779 | 4775 | 37.03 MB | 37 MB | 12.22 KB | 2.5557 | | 08:37 | 26 | 557187 | 4779 | 4778 | 37.03 MB | 37.03 MB | 3.75 KB | 2.5052 | | 08:52 | 26 | 557193 | 4779 | 4768 | 37.03 MB | 36.96 MB | 24.57 KB | 2.6060 | | 09:10 | 26 | 557196 | 4779 | 4775 | 37.03 MB | 37.01 MB | 7.79 KB | 2.8218 | +--------+------+---------+--------+--------+----------+----------+----------+--------+
The first time is a full backup, the subsequent incremental backups. “SXB.size” shows the size of the file with incremental backup (i.e. modified blocks).