
Any sufficiently large project eventually runs into a familiar set of problems. One of them is the speed at which the system receives updates. Getting small updates quickly is relatively easy. Getting large updates occasionally is also simple. But what if you need to update a large data array quickly?
For Target Mail.Ru, as for any advertising system, prompt accounting of changes matters for the following reasons:
• the ability to quickly switch off a campaign when the advertiser stops it in the interface or when it runs out of money, so that we do not serve it for free;
• convenience for the advertiser: they can change a banner's price in the interface, and within a few seconds their banners start being served at the new price;
• quick reaction to a changing situation: shifts in CTR, the arrival of new data for training mathematical models. All of this lets us adjust the ad-serving strategy, responding sensitively to external factors.
In this article I will talk about updating data held in large tables in a MySQL database, with an emphasis on speed and consistency: I would not want to receive a new banner without also receiving the ad campaign it belongs to.
How could we do this with standard MySQL tools? We could periodically re-read all the tables in full. This is the least optimal option: along with the new data, a lot of old, already-known data gets pumped across, placing a huge load on the network and on the MySQL server. Another option is to prepare the storage schema accordingly: add a last-update timestamp to every table and select over the required time intervals. But if there are many tables, and many machines that need to hold a copy of the data, there will be a great many selects, and the load on the MySQL server is, again, high. On top of that, we would have to take care of the consistency of the received updates ourselves.
Libslave offers us a convenient way to solve the problem:
• the data comes to us from the database "by itself": there is no need to poll the database when there are no updates at the moment;
• updates arrive in the order they were performed on the master, and we can see the entire history of changes;
• there is no need to specially prepare the tables or add timestamps that are useless from the business-logic point of view;
• transaction boundaries are visible, i.e. the points at which the data is consistent;
• the load on the master is low: it executes no queries and burns no CPU; it simply streams files.
This article is about how we use libslave for these purposes. I will briefly describe how MySQL replication works (I expect everyone knows this already, but still), how libslave itself is built and how to use it, and I will give benchmark results and a short comparison of the existing implementations.
How replication works
The master database writes every query that changes data or the data schema into a special file, the so-called binary log. When a binary log reaches a certain size, writing moves on to the next file. There is an index of these binary logs, as well as a set of commands for managing them (for example, for removing old binlogs).
A slave receives these files over the network in raw form (which simplifies the master's side of the implementation) and applies them to its data. The slave remembers the position up to which it has read the binlogs, so after a restart it asks the master to continue sending logs starting from the desired position.
There are two replication modes: STATEMENT and ROW. In the first, the master writes into the binlog the original queries it executed to modify the data (UPDATE / INSERT / DELETE / ALTER TABLE / ...). On the slave, all these queries are executed exactly as they were executed on the master.
In ROW mode, available since MySQL 5.1, it is not the queries that are written to the binlog but the rows already modified by them (queries that modify the schema (DDL), however, are still written as-is). An event in ROW mode is:
• a single data row for the INSERT and DELETE commands: for INSERT the inserted row is written, for DELETE the deleted one;
• two rows, BEFORE and AFTER, for UPDATE.
For some bulk data changes, such binlogs are much larger than if we had recorded the query itself, but that is a special case. For our daemon, ROW replication is very convenient: we do not need to be able to execute queries, and we immediately get exactly what we need, the modified rows.
Incidentally, you do not have to enable ROW replication on your most important master server. The master can feed several slaves via ordinary STATEMENT replication (reserves, backups), some of those slaves can write ROW logs, and it is those that the daemons connect to.
How it works for us
Data acquisition from the database occurs in two stages:
1. initial data loading at the start of the daemon
2. receiving updates
The initial load can, in principle, be eased by having dumps: the daemon can flush its state to disk and write the binlog position into the same dump. Then, at startup, it can load the data into memory from the dump and continue reading the binlog from the saved position. But when there is no dump, something else is needed. And that something is, of course, SELECT.
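What such a dump might contain can be sketched in a few lines (these names are hypothetical, not part of libslave):

#include <string>

// Hypothetical dump layout: the position from which to resume reading
// the binlog, followed by the serialized in-memory state of the daemon.
struct DumpHeader
{
    std::string binlog_name;   // binlog file the daemon had read up to
    unsigned long binlog_pos;  // offset within that file
};
// ... the serialized objects follow on disk ...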
We must select the data so that it ends up consistent in memory. For that, we could select from all tables within a single transaction. But afterwards we need to start reading updates via libslave, and we must decide from which position to read. We cannot take the current position after the selects: while the selects were running, new data could have been written to the database that did not make it into the selects, while the binlog position moved on. Taking the position before the start of the selects does not work either, since between the moment we take the current position and the start of the selects, new data may arrive. Starting a transaction with BEGIN and then reading the current binlog position does not work out as well: there is no synchronization between the two, and while one client issues BEGIN, other clients may be writing data, shifting the binlog position.
All these reflections lead us to the idea that ensuring read consistency is partly the daemon's job. The data in memory is arranged so that if an object arrives inconsistent (in the sense that we can detect the inconsistency, for example it lacks required references), it is simply discarded; if it later arrives consistent, it is inserted into memory. If it arrives consistent twice, its latest state stays in memory. And if a consistent object is already in memory and an inconsistent state arrives, we do not apply the inconsistent state: the object in the daemon's memory stays unchanged.
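In code, the rule boils down to something like the following sketch (the domain types here are hypothetical and heavily simplified):

#include <cstdint>
#include <map>

// Hypothetical in-memory model: banners reference their campaign by id.
struct Campaign { uint32_t id; };
struct Banner   { uint32_t id; uint32_t campaign_id; int state; };

std::map<uint32_t, Campaign> gCampaigns;
std::map<uint32_t, Banner>   gBanners;

void applyBanner(const Banner& b)
{
    if (gCampaigns.count(b.campaign_id))
        gBanners[b.id] = b; // consistent: insert, or overwrite with the latest state
    // inconsistent (no such campaign in memory): the update is discarded,
    // and any consistent state already in memory stays unchanged
}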
For all these reasons, the correct, from our point of view, bootstrap model looks like this:
1. We take the current binlog position p1 on the master; this can be done at an arbitrary moment.
2. We perform all the selects.
3. We take the new current binlog position p2 on the master.
4. Using libslave, we read all events between p1 and p2. Here the daemon may receive both new objects that appeared during the selects and modified versions of old objects already in memory.
5. After that the daemon holds a consistent copy of the data and is ready to work: we can serve requests and keep receiving updates via libslave, starting from position p2.
I emphasize that, since the daemon maintains data consistency itself, the selects in step 2 do not need to run in a single transaction. Consider, for example, the following tricky sequence of events:
1. We take the current binlog position p1.
2. We start reading the campaigns table, which contains campaign campaign1.
3. A new banner, banner1, is created in state 1 in the existing campaign campaign1.
4. A new campaign, campaign2, is created; it does not make it into the select's results.
5. A new banner, banner2, is created, tied to campaign2.
6. banner1 transitions to state 2.
7. We finish reading the campaigns table and proceed to the banners table; banner1 in state 2 and banner2 will make it into this read.
8. We read the banners table.
9. A new banner, banner3, is created in campaign2.
10. We take the new current binlog position p2.
And now let's see what happens in the daemon:
1. The daemon selects all campaigns and memorizes them (perhaps checking some criteria: maybe we do not need every campaign in memory).
2. The daemon moves on to selecting banners. It reads banner1 already in state 2 and memorizes it, tying it to the previously read campaign1.
3. It reads banner2 and discards it, because campaign2 is not in its memory: it did not make it into the select's results.
4. That is the end of the selects. We move on to reading the changes from position p1 to position p2.
5. We encounter the creation of banner1 in state 1. Recall that in the daemon's memory it already exists in its later state 2; nevertheless we apply this update and move the banner back to state 1. This is not a problem, because the data will only be used for work after we finish reading up to position p2, and before that position we will receive changes to this banner again.
6. We read the creation of the new campaign campaign2 and memorize it.
7. We read the creation of banner2 tied to it; this time we memorize it, since a matching campaign now exists and the data is consistent.
8. We read banner1's transition to state 2 and apply it; now it, too, is consistent.
9. We read the creation of banner3 in campaign campaign2 and insert it into memory.
10. We reach position p2 and stop: all data has been loaded consistently, we can serve it to users and read further updates in normal mode.
Note that inconsistency at the initial data-sampling stage does not mean the database contains errors; it is simply a feature of the loading process, and the daemon corrects it as it reads on. If inconsistencies remain after that, then yes, the problem is in the database itself.
The code looks like this:
slave::MasterInfo sMasterInfo;      // master connection parameters
slave::Slave sSlave(sMasterInfo);

// position p1: taken before the selects
const slave::Slave::binlog_pos_t sInitialBinlogPos = sSlave.getLastBinlog();

select();   // load the initial data with ordinary selects

// position p2: taken after the selects
const slave::Slave::binlog_pos_t sCurBinlogPos = sSlave.getLastBinlog();

init_slave();   // set up the callbacks, call Slave::init and Slave::createDatabaseStructure

// start reading from p1...
sMasterInfo.master_log_name = sInitialBinlogPos.first;
sMasterInfo.master_log_pos = sInitialBinlogPos.second;
sSlave.setMasterInfo(sMasterInfo);

// ...and stop upon reaching p2
sSlave.get_remote_binlog(CheckBinlogPos(sSlave, sCurBinlogPos));
The CheckBinlogPos functor terminates reading from the binlog once position sCurBinlogPos is reached. After that, the data is prepared for use, and reading from the slave resumes from the last position, this time without any functor.
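The functor itself is trivial. A minimal sketch of what it might look like (how the current read position is obtained from the Slave object is an assumption here; check the API of your libslave version):

// Sketch of a stop-functor: get_remote_binlog() calls it before reading
// each packet and stops once it returns true.
struct CheckBinlogPos
{
    const slave::Slave& mSlave;
    slave::Slave::binlog_pos_t mStopPos; // (binlog file name, offset) to stop at

    CheckBinlogPos(const slave::Slave& aSlave,
                   const slave::Slave::binlog_pos_t& aStopPos)
        : mSlave(aSlave), mStopPos(aStopPos) {}

    bool operator()() const
    {
        // assumed accessor for the last position the slave has processed
        const slave::MasterInfo& sInfo = mSlave.masterInfo();
        // stop once we have reached or passed the target position
        return sInfo.master_log_name > mStopPos.first
            || (sInfo.master_log_name == mStopPos.first
                && sInfo.master_log_pos >= mStopPos.second);
    }
};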
How libslave is built
Let us take a closer look at what libslave is. My description is based on the most popular implementation. Below I will compare several forks and one completely different implementation.
Libslave is a C++ library that you can use in your application to receive updates from MySQL. It shares no code with the MySQL server; it is built and linked only against the client library, libmysqlclient. The library has been verified to work with masters from version 5.1.23 through 5.5.34 (not all of them, only those that came to hand).
To work, we need a MySQL server with binlog writing enabled in ROW mode. Its config file should contain the following lines:
[mysqld]
log-bin = mysqld-bin
server-id = 1
binlog-format = ROW
The user libslave connects as needs the REPLICATION SLAVE and REPLICATION CLIENT privileges, as well as SELECT on the tables it will process (SELECT is not needed on tables that appear in the binlogs but will be skipped). The SELECT privilege is used to read the table schema.
The tiny class nanomysql::Connection is built into libslave for executing ordinary SQL queries against the server. In practice we use it not only as part of libslave but also as our MySQL client in general (we did not want to pull in mysqlpp, mysql-connector, and the like).
The main class is called Slave. Before starting, we define custom callbacks for events from the tables we want to follow. The callback receives information about the event in a RecordSet structure: its type (Write / Update / Delete) and its data (the inserted / updated / deleted record and, in the case of Update, its previous state).
When the library is initialized with the master's parameters, the following checks and steps take place:
1. Check the ability to connect to the server.
2. Run SELECT VERSION() to make sure the version is at least 5.1.23.
3. Run SHOW GLOBAL VARIABLES LIKE 'binlog_format' to make sure the binlog format is ROW.
4. Read the saved position of the last processed event via a user-supplied function. If it returns nothing (an empty binlog name and zero position), read the master's current binlog position via SHOW MASTER STATUS.
5. Read the database structure for the tables we will follow.
6. Generate a slave_id that does not collide with any slave returned by SHOW SLAVE HOSTS.
7. Register the slave on the master by running simple_command(COM_REGISTER_SLAVE).
8. Request the binlog dump via simple_command(COM_BINLOG_DUMP).
9. Start the loop that processes incoming packets: parsing them, invoking the necessary callbacks, handling errors.
Item 5, reading the database structure, deserves special mention. A real MySQL slave always knows the correct table layout, because it started from some SQL dump and kept reading the tables from the corresponding binlog position, applying every DDL statement along the way. Libslave, in general, starts from a binlog position supplied by the user (for example, the one we saved last time, or the master's current position). It has no prior knowledge of the database structure, so it obtains the table schema by parsing the output of SHOW FULL COLUMNS FROM. That is where it learns which fields, of which types, and in which order to parse out of the binlog. This can cause a problem: the table descriptions we obtain are current, while the binlog may start being read from an earlier point, when the table still looked different. In that case libslave will most likely fail with an error saying the data is inconsistent, and you have to start reading from the master's current position.
Changing table definitions while libslave is running is not a problem. It recognizes ALTER TABLE and CREATE TABLE queries and re-reads the table structures immediately after receiving them. Of course, problems are still possible here. Suppose we quickly changed a table's structure twice and wrote some data in between. If libslave receives the record about the first ALTER only after the second has already completed, then SHOW FULL COLUMNS FROM will immediately return the second state of the schema, and a table-update event matching the first description has a chance of stopping replication. In practice, though, this happens very rarely (it has never happened to us), and if it does, the cure is to restart the daemon from scratch.
Libslave also lets you track transaction boundaries. Although none of a transaction's records reaches the binlog until the transaction completes, distinguishing transactions can still be important: if two related changes occur in different tables, you may not want to use one of them until the other has arrived too. There is no BEGIN event in the binlog: a transaction's changed rows appear right away and end with a COMMIT. That is, transactions are delimited not by BEGIN / COMMIT pairs but by consecutive COMMITs.
If the master disappeared
The main loop of libslave, run by the get_remote_binlog function, takes a user-supplied functor as a parameter and checks it before reading each new packet. If the functor returns true, the loop ends.
If errors occur, they are written to the log and the loop continues. In particular, if the server is restarted, libslave keeps trying to reconnect until it succeeds, after which it continues reading data. Some errors can persist forever: for example, if the logs libslave is reading are removed out from under the master, libslave will complain in a loop that the required logs are missing.
For the case of a broken network connection, the connection is configured with a 60-second timeout. If no new packet arrives within 60 seconds (which may simply mean there are no updates), libslave reconnects to the database and continues reading the message stream from the last read position.
The loop can, in principle, be terminated earlier, without waiting for the timeout to expire. Suppose you want the application to shut down quickly but correctly on Ctrl+C, without waiting out the possible 60 seconds when the network is down or there are simply no updates. Then it is enough to set a flag in the signal handler that makes the next call to the functor return true, and to call the Slave::close function, which forcibly closes the MySQL socket. The pending packet read then fails with an error, and when the user functor is consulted, the loop exits.
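A minimal sketch of this shutdown scheme (assuming get_remote_binlog accepts any callable, as the functor parameter suggests; the way the Slave object is shared with the handler is also an assumption):

#include <csignal>

static volatile sig_atomic_t gStop = 0;
static slave::Slave* gSlave = 0;   // set once at startup, before signals arrive

void onSigint(int)
{
    gStop = 1;           // the stop functor below will return true next time
    if (gSlave)
        gSlave->close(); // force-close the MySQL socket to abort a blocked read
}

bool stopRequested() { return gStop != 0; }

int main()
{
    slave::MasterInfo masterinfo;
    // ... fill in host, port, user, password ...
    slave::Slave slave(masterinfo);
    gSlave = &slave;
    std::signal(SIGINT, onSigint);

    // ... setCallback / init / createDatabaseStructure as in the example
    //     at the end of the article ...

    // the functor is checked before each packet read; true ends the loop
    slave.get_remote_binlog(stopRequested);
    return 0;
}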
Statistics
The library has an abstract class, ExtStateIface, through which libslave reports various information: the number of reconnects, the time of the last event, the state of the database connection. The same class is responsible for saving and loading the current binlog position to and from any persistent storage. There is a default implementation, DefaultExtState, which works through a mutex (since statistics can be written by the slave in one thread and read by someone else in another). The sad news is that a correct implementation of this class is required for libslave to operate correctly; that is, it is not merely a statistics object but an object that can steer the library's work.
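For example, a daemon that keeps the binlog position in a local file might subclass the default implementation along these lines (the names and signatures of the overridden virtuals and getters are assumptions about the library's API; check the headers of your version, and how the object is handed to Slave depends on the version as well):

#include <cstdio>
#include <fstream>
#include <string>

// Sketch: persist the binlog position to a file so the daemon can resume
// after a restart. The overridden methods below are assumed names.
class FileExtState : public slave::DefaultExtState
{
    std::string mPath;
public:
    explicit FileExtState(const std::string& aPath) : mPath(aPath) {}

    virtual void saveMasterInfo()
    {
        std::ofstream f((mPath + ".tmp").c_str(), std::ios::trunc);
        f << getMasterLogName() << '\n' << getMasterLogPos() << '\n';
        f.close();
        std::rename((mPath + ".tmp").c_str(), mPath.c_str()); // atomic replace
    }

    virtual bool loadMasterInfo(std::string& logname, unsigned long& pos)
    {
        std::ifstream f(mPath.c_str());
        if (f >> logname >> pos)
            return true;
        logname.clear(); // empty name and zero position make libslave fall
        pos = 0;         // back to SHOW MASTER STATUS (see item 4 above)
        return false;
    }
};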
Benchmarks
Benchmarks were carried out on two sets of machines.
The first set was a single machine that hosted the database and ran the test. Its configuration:
• CPU: Intel® Core (TM) i7-4770K CPU @ 3.50GHz
• mem: 32 GB 1666 MHz
• MB: Asus Z87M-PLUS
• HDD: SSD OCZ-VERTEX3
• OS Gentoo Linux, kernel 3.12.13-gentoo x86_64
The database settings were left at their defaults. Honestly, I do not think they matter much for the master, which essentially just streams a file over the network.
The second set consisted of two machines. The first one hosted the database:
• CPU: 2 x Intel® Xeon® CPU E5620 @ 2.40GHz
• mem: 7 x 8192 MB TS1GKR72V3N 800 MHz (1.2ns), 1 x 8192 MB Kingston 9965434-063.A00LF 800 MHz (1.2ns), 4 x Empty
• MB: ETegro Technologies ETRS370G3
• HDD: 14 x 300 GB HUS156030VLS600, 2 x 250 GB WDC WD2500BEVT-0
• PCI: LSI Logic / Symbios Logic SAS2116 PCI-Express Fusion-MPT SAS-2, Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
• OS CentOS release 6.5 (Final) kernel 2.6.32-220.13.1.el6.x86_64
The test machine:
• CPU: 2 x Intel® Xeon® CPU E5-2620 0 @ 2.00GHz
• mem: 15 x 8192 MB Micron 36KSF1G72PZ-1G4M1 1333 MHz (0.8ns), 1 x 8192 MB Micron 36KSF1G72PZ-1G6M1 1333 MHz (0.8ns)
• MB: ETegro Technologies ETRS125G4
• HDD: 2 x 2000 GB Hitachi HUA72302, 2 x 250 GB ST250DM000-1BD14
• PCI: Intel Corporation C602 chipset 4-Port SATA Storage Control Unit, Intel Corporation C600 / X79 series chipset 6-Port SATA AHCI Controller
• OS CentOS release 6.5 (Final) kernel 2.6.32-358.23.2.el6.x86_64
The network between the machines: 1 Gb/s.
Note that in both tests the database did not touch the disk: the binlogs were cached in memory, and the test was not limited by data transfer. CPU load was 100% in all tests. This means we were limited by the libslave library itself, i.e. it is its performance that we measured.
For the test, two tables were created - small and large:
CREATE TABLE short_table (
    id int NOT NULL auto_increment,
    field1 int NOT NULL,
    field2 int NOT NULL,
    PRIMARY KEY (id)
);

CREATE TABLE long_table (
    id int NOT NULL auto_increment,
    field1 timestamp NOT NULL DEFAULT 0,
    field2 timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    field3 enum('a','b','c') NOT NULL DEFAULT 'a',
    field4 enum('a','b','c') NOT NULL DEFAULT 'a',
    field5 varchar(255) NOT NULL,
    field6 int NOT NULL,
    field7 int NOT NULL,
    field8 text NOT NULL,
    field9 set('a','b','c') NOT NULL DEFAULT 'a',
    field10 int unsigned NOT NULL DEFAULT 2,
    field11 double NOT NULL DEFAULT 1.0,
    field12 double NOT NULL DEFAULT 0.0,
    field13 int NOT NULL,
    field14 int NOT NULL,
    field15 int NOT NULL,
    field16 text NOT NULL,
    field17 varchar(255) NOT NULL,
    field18 varchar(255) NOT NULL,
    field19 enum('a','b','c') NOT NULL DEFAULT 'a',
    field20 int NOT NULL DEFAULT 10,
    field21 double NOT NULL,
    field22 double NOT NULL DEFAULT 1.0,
    field23 double NOT NULL DEFAULT 1.0,
    field24 double NOT NULL DEFAULT 1.0,
    field25 text NOT NULL DEFAULT "",
    PRIMARY KEY (id)
);
Each table contained one million records. When inserting data, one text field was filled with a short string; all the other string fields were left empty, and the remaining fields took their default values. This insertion method produced a full-sized binlog while keeping most string fields empty:
INSERT INTO short_table (field1, field2) values (1, 2);
INSERT INTO long_table (field5, field25) values ("short_string", "another_short_string");
Each of these tables was first populated with inserts and then completely updated with the queries:
UPDATE short_table SET field1 = 12;
UPDATE long_table SET field6 = 12;
This gave us a set of INSERT binlogs and a set of UPDATE binlogs (the latter roughly twice as large, since each record contains the previous state of the row in addition to the modified one). The binlog position was noted before each operation, yielding four binlog intervals:
inserts of the short table (5429180 - 18977180 => 13548000)
updates of the short table (18977180 - 45766831 => 26789651)
inserts of a long table (45768421 - 183563421 => 137795000)
updates of the long table (183563421 - 461563664 => 278000243).
The test was compiled with gcc-4.8.2 using the flags -O2 -fomit-frame-pointer -funroll-loops. Each test was run three times, and the result of the third run was taken.
And now some tables and graphs. But first, the notation:
• "Without callbacks" means we asked libslave to read a given set of binlogs without attaching any callbacks to the tables, so no RecordSet records were created.
• "With benchmark callbacks" means callbacks measuring per-second throughput were attached (needed for plotting the graphs), written so as to affect the total run time as little as possible. They did nothing else: all of libslave's work consisted of parsing the record, creating the RecordSet object(s), and passing them to the user function by reference.
• "With lockfree-malloc" means this allocator was used in the test.
"Time 1" and "Time 2" are the test run times on machine sets 1 and 2, respectively.
Test | Time 1, sec. | Time 2, sec.
---|---|---
Inserts in the small table without callbacks | 0.0299 | 0.1595
Inserts in the small table with benchmark callbacks | 2.4092 | 3.8958
Updates to the small table without callbacks | 0.0500 | 0.2336
Updates to the small table with benchmark callbacks | 4.8499 | 7.4892
Inserts in the large table without callbacks | 0.2627 | 1.1842
Inserts in the large table with benchmark callbacks | 20.2901 | 33.9604
Inserts in the large table with benchmark callbacks and lockfree-malloc | 19.0906 | 34.5743
Updates to the large table without callbacks | 0.6225 | 2.3860
Updates to the large table with benchmark callbacks | 40.4330 | 70.7851
Updates to the large table with benchmark callbacks and lockfree-malloc | 37.9637 | 68.3616
Inserts and updates in both tables without callbacks | 0.9499 | 3.9179
Inserts and updates in both tables with benchmark callbacks on the short one | 8.0445 | 14.8126
Inserts and updates in both tables with benchmark callbacks on the long one | 62.8213 | 100.9520
Inserts and updates in both tables with benchmark callbacks on both | 67.8092 | 118.3860
Inserts and updates in both tables with benchmark callbacks on both and lockfree-malloc | 64.5951 | 113.3920
Below is a graph of the read speed for the last benchmark on both machine sets. The graph steps gradually downward: small inserts are read fastest, small updates follow, then large inserts, and large updates are processed slowest. It gives a rough idea of the processing speed for each kind of binlog record.
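For a rough sense of scale: the large-table updates span 278,000,243 bytes of binlog, so the 40.4-second run on machine set 1 corresponds to roughly 7 MB/s of parsed binlog with callbacks attached.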

I did not measure DELETE processing speed in this benchmark, but I suspect it is identical to INSERT speed, since a DELETE produces a binlog message of the same length as the corresponding insert.
Different implementations
At the moment I know of only two implementations of libslave. One is the already mentioned one: the company Begun open-sourced it in its time, and much has been written about it (for example, on OpenNet). It is this implementation that is used in the FreeBSD ports.
Mail.Ru Group uses a fork of Begun's version, which I patch up from time to time. Some of its changes have also been contributed back to the Begun fork. Among those not yet contributed back: removal of unused code, fewer header inclusions, more tests (tests for BIGINT and long SETs), checking the format version of each binlog, mysql-5.5 support, the decimal type (returned as double: naturally, this is not used in billing, only where an approximate idea of balances is enough), and bit-field support (borrowed from another fork, which is now almost in the same state as mine).
The second implementation known to me is from the aforementioned Pianist and Dimarik. I have yet to figure out what it looks like architecturally and in terms of performance.
Code examples
There are code examples in the library itself, but I will add a few comments.
The file types.h shows, via typedefs, the mapping between MySQL types and C++ types. Note that all string types, including BLOBs, are simply std::string, and all integer types are unsigned: even if a column is declared as plain int (not unsigned), the library returns uint32_t.
The same file contains two convenient functions for converting the DATE and DATETIME values produced by libslave into an ordinary time_t. These functions are external (libslave does not call them internally) for historical reasons: libslave originally returned strangely encoded numbers for these dates, and I did not change that.
The file recordset.h defines the RecordSet structure, which represents a single binlog record. It contains the message type, its time, the names of the database and table it belongs to, and two rows: the new one and the previous one (for updates).
A row is an associative array from column name to an object of type boost::any holding one of the types described in types.h for the corresponding field.
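Extracting typed values in a callback then looks roughly like this (treating a row as a map from column name to boost::any; the real container and member names may differ, see recordset.h, and the column names here are hypothetical):

#include <boost/any.hpp>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Illustrative row type: column name -> value held in boost::any.
typedef std::map<std::string, boost::any> Row;

void printRow(const Row& row)
{
    Row::const_iterator it = row.find("id");
    if (it != row.end())
        // per types.h, an SQL int arrives as uint32_t even without `unsigned`
        std::cout << "id = " << boost::any_cast<uint32_t>(it->second) << "\n";

    it = row.find("name"); // hypothetical varchar column
    if (it != row.end())
        // all string types, including BLOBs, arrive as std::string
        std::cout << "name = " << boost::any_cast<std::string>(it->second) << "\n";
}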
The main Slave object is described in the Slave.h file.
The simplest code to start read replication is:
void callback(const slave::RecordSet& event)
{
    switch (event.type_event)
    {
    case slave::RecordSet::Update: std::cout << "UPDATE"; break;
    case slave::RecordSet::Delete: std::cout << "DELETE"; break;
    case slave::RecordSet::Write:  std::cout << "INSERT"; break;
    default: break;
    }
}

slave::MasterInfo masterinfo;
masterinfo.host = host;
masterinfo.port = port;
masterinfo.user = user;
masterinfo.password = password;

slave::Slave slave(masterinfo);
slave.setCallback("database", "table", callback);
slave.init();
slave.createDatabaseStructure();
slave.get_remote_binlog();
Sample session with the test_client test client
For clarity, let us walk through a small example session: creating a user, a database, and a table, filling it with data, and the corresponding output of test_client, a ready-made replication client included in the libslave sources.
The call to test_client looks like this:
Usage: ./test_client -h <host> -u <user> -p <password> -d <database>

The binlog event format is described at dev.mysql.com/doc/internals/en/event-data-for-specific-event-types.html