
MySQL Database Sync Overview


When developing modern web applications, it is difficult to overestimate the benefits of version control systems. For the files of the product under development, we can trace any stage of production at any time, starting from the very first revision. The tools that help us with this are widespread today, are considered good practice in development, and in many cases successful production is simply impossible without them. But what options do we have when we need to trace changes not in files, but in the project's databases? Below I will share information about the existing tools I have had a chance to work with.



1. PHP SQLDIFF, aka SQLDiff

( http://phpsqldiff.sourceforge.net/ )
A PHP script that lets you see the full differences (both in structure and in data) between any two database tables. The tool has no means of automatically synchronizing structure or data - it only provides visual information. Another significant drawback is that it can only connect to databases that are reachable directly (not through an SSH tunnel). It is also slow on large amounts of data (it works through a PEAR module that is neither new nor fast). I consider this script very useful for a developer when you need to understand and visualize the differences between tables: it has a convenient interface and is quick to set up. I would characterize it as a handy pocket utility for quickly getting an idea of how far out of sync supposedly identical tables have drifted, rather than as a serious tool for automating synchronization processes.

2. LIQUIBASE

( http://www.liquibase.org/ )

A convenient, multi-functional, and easy-to-use database structure migration tool written in Java. I see an extra plus in using it together with Jenkins.
An example (host1 is the server from which the database structure is copied; host2 is the server to which that structure is transferred):

java -jar liquibase.jar --driver=com.mysql.jdbc.Driver \
  --classpath=mysql-connector-java-5.1.xx-bin.jar \
  --logFile=db.ExampleChangelog.xml \
  --url="jdbc:mysql://host2" \
  --defaultSchemaName=db_name \
  --username=username \
  --password="password" \
  --referenceUrl=jdbc:mysql://host1 \
  --referenceUsername=username \
  --referencePassword="password" \
  diffChangeLog > ChangeSet.xml


This generates a changeset in XML format; migrating it brings the database structure on host2 to a state identical to host1.

Start Migration:

java -jar liquibase.jar --driver=com.mysql.jdbc.Driver \
  --classpath=/path/to/classes \
  --changeLogFile=ChangeSet.xml \
  --url="jdbc:mysql://host2" \
  --username=user \
  --password="password" \
  migrate


After the migration, it makes sense to generate the changeset again and check that it is empty.
Not only tables and fields are synchronized, but also indexes and keys. I am not sure about stored procedures - I did not check.
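
For such a check, a sketch along the lines of the first example can be used (the connection parameters mirror the diffChangeLog call above): the diff command prints the remaining differences to stdout, so an empty report means the structures now match.

java -jar liquibase.jar --driver=com.mysql.jdbc.Driver \
  --classpath=mysql-connector-java-5.1.xx-bin.jar \
  --url="jdbc:mysql://host2" \
  --username=username \
  --password="password" \
  --referenceUrl=jdbc:mysql://host1 \
  --referenceUsername=username \
  --referencePassword="password" \
  diff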

The changeset can also be generated in other formats - SQL or JSON, not only XML. Generating the changeset as SQL is useful when other utilities are used to apply the migration.
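
As a sketch of the SQL variant (assuming a Liquibase version that picks the output format from the changelog file's extension, which is how recent releases behave), it should be enough to point --changeLogFile at a *.mysql.sql file:

java -jar liquibase.jar --driver=com.mysql.jdbc.Driver \
  --classpath=mysql-connector-java-5.1.xx-bin.jar \
  --changeLogFile=ChangeSet.mysql.sql \
  --url="jdbc:mysql://host2" \
  --username=username \
  --password="password" \
  --referenceUrl=jdbc:mysql://host1 \
  --referenceUsername=username \
  --referencePassword="password" \
  diffChangeLog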

3. schemasync

( http://schemasync.org/ )

A tool for synchronizing database structure. It requires Python and the corresponding MySQL interface module.
The significant differences from liquibase:
- schemasync creates not only a changeset, but also a file that lets you roll the changes back (the most important and valuable advantage, although, in my opinion, it does not remove the need to make a backup before synchronization);
- liquibase can not only produce a changeset but also start the migration immediately with the utility itself. Maybe not a killer feature, but convenient and useful nonetheless.

schemasync works only with SQL - no intermediate XML or the like - and I see both advantages and disadvantages in that.
Very concise syntax, minimal settings. It lets you skip synchronizing comments and auto-increment values (configurable) - an unconditional plus.

Usage example:

 schemasync mysql://user:pass@dev-host:3306/dev_db mysql://user:pass@prod-host:3306/production_db 
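
A run like the one above produces plain SQL files: a patch that brings production_db in line with dev_db and a companion revert file for rolling the change back (the exact file names include the database name and a version/date stamp, so the names below are illustrative). Applying or reverting is then an ordinary mysql call:

# apply the generated patch (file name is illustrative)
mysql -h prod-host -u user -p production_db < production_db.patch.sql
# roll back if something went wrong
mysql -h prod-host -u user -p production_db < production_db.revert.sql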


4. MAATKIT data sync

( http://www.maatkit.org/doc/ )
mk-table-sync is a utility for synchronizing table data. I will mention right away that Maatkit is a whole suite of tools for working with MySQL that provides capabilities not built into stock MySQL. This is not surprising, considering that the product was created by Percona - we are already used to seeing products from them in the spirit of "take MySQL and add what should have been in it long ago".
It works effectively only with tables that have a primary key or a unique index (which, in general, is justified).
It has an impressive number of options and settings and allows you to synchronize master-slave and master-master configurations. It can run synchronization automatically, but what interests us here is not that, but the ability to create a change log. A separate article could be written about this utility, so I will not go deep into the settings - I will only stress that there really are many options, and they give the developer very flexible control over synchronization, allowing many nuances to be taken into account. It makes sense to read the original documentation (http://www.maatkit.org/doc/mk-table-sync.html).

Usage example:

mk-table-sync --verbose --print --charset=DB_CHARSET \
  h=host1,P=port1,u=user1,p=password1,D=db1,t=table1 \
  h=host2,P=port2,u=user2,p=password2,D=db2 > /__path__/ChangeLogQueries.sql
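
With --print the utility only outputs the queries that would eliminate the differences, so the resulting file can be reviewed and applied by hand (or, once the process is trusted, the same call can be run with --execute instead of --print so that mk-table-sync applies the changes itself). A sketch of the manual route:

# review the generated statements first
less /__path__/ChangeLogQueries.sql
# then apply them to the target database
mysql -h host2 -P port2 -u user2 -p db2 < /__path__/ChangeLogQueries.sql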



Addition 1:

The tools mentioned above have a significant drawback that matters less when deploying to a production server but constantly makes itself felt during development: execution speed. The original task was to build a mechanism that would automate synchronization between sites in the chain "developers - test servers - stage - production". From the test servers onward in that chain, everything is quite simple: the only resource required is time, and the load is minimal. If the process runs automatically on a schedule at night, it does not matter how long it takes, as long as it does not create load and the task is fully finished by morning. If synchronization is regular, the lists of changes will never be excessively large and the load will be negligible. It is another matter when it comes to synchronization between a developer's site and the test server. There, a synchronization that takes an hour will always be noticeable and will create inconvenience. These circumstances prompted synchronization tool number 5:

5. "Semi-automatic synchronization".


In the utilities described above, a large (slightly less than all) part of the time is spent building the list of differences. Applying the script that eliminates the differences is a fairly fast operation (again, not an axiom, but true in most development cases). This suggests that the list of differences could be built by hand, and only its application automated. There are two obvious ways:
a) any product has methods for working with the database - it does not matter whether you use a popular framework or your product has a home-grown core. It is enough to record all the necessary queries (only DDL if you care about structure; also UPDATE and DELETE, for example, if you care about data) in your own log;
b) if your framework (or, worse, CMS) does not allow extending the methods of the class that works with the database and does not even provide an event handler in the method that executes queries, you can parse the log of all queries (I hope all target software can enable such a log) and pick out only the necessary actions from it - see the sketch after this list.
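
A minimal sketch of option b), assuming the MySQL general query log is available and only structure changes are of interest (the log and output paths are illustrative):

# enable the general query log on the developer database
mysql -u root -p -e "SET GLOBAL log_output = 'FILE'; SET GLOBAL general_log_file = '/var/log/mysql/general.log'; SET GLOBAL general_log = 'ON';"

# ...work on the developer site...

# pick out only DDL statements; the general log prefixes each entry with a
# timestamp, thread id and command type, so the lines still need cleanup
grep -iE 'ALTER TABLE|CREATE TABLE|DROP TABLE|CREATE INDEX|DROP INDEX' \
  /var/log/mysql/general.log > structure_changes.sql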

This "semi-automatic synchronization" with change logs of your own has a right to exist, but it should encourage the developer to use the utilities mentioned above to verify after synchronization that the tables really do match (let it run at least at night on a schedule, to make sure the human factor has not introduced its own corrections into the data synchronization).

Addition 2:
(updated)
Unfortunately, almost all of the tools mentioned have a significant drawback (and I will be glad to be mistaken, but I have not found anything refuting this): they all require remote access to the database over TCP/IP. Given that normal practice is to allow database access only from localhost and to forbid remote access, synchronization cannot be achieved with the means of these utilities alone.
Many thanks to user DarkByte for the suggested solution with port forwarding - it solves the problem.
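
A sketch of that workaround: forward a free local port to MySQL on the remote host over SSH, and then point any of the utilities above at 127.0.0.1 instead of the remote server (port 3307 here is an arbitrary choice):

# forward local port 3307 to MySQL on the remote host, in the background
ssh -f -N -L 3307:127.0.0.1:3306 user@db-host
# the remote database is now reachable as if it were local, e.g.:
mysql -h 127.0.0.1 -P 3307 -u user -p db_name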

Good luck with these tools, and do not forget to make backups!

Source: https://habr.com/ru/post/178875/

