
Badoo is the world's largest social network for meeting new people, with 190 million users.
All of our data is stored in two data centers, one in Europe and one in the US. Some time ago we investigated the quality of the Internet connection for our users in Asia and found that for 7 million of them the site would load twice as fast if we moved them from the European data center to the American one. For the first time we faced the task of large-scale migration of user data between data centers, and we handled it successfully: we learned to move 1.5 million users in one working day! We were able to move entire countries! In the first part we will describe in detail the task we were given and the result we achieved.
Badoo architecture
The architecture of Badoo has been discussed many times at various conferences and on Habr itself, but we will still repeat the main points that matter for understanding our task. Badoo uses a standard technology stack for serving web pages: Linux, nginx, PHP-FPM (developed at Badoo), memcached, MySQL.
Almost all user data lives on a couple of hundred MySQL servers and is sharded across them with the help of an in-house service called authorizer. It stores a large mapping table from user id to the id of the server where that user's data is located. We do not shard by the remainder of division (user_id % N) or similar schemes, so we can move users between database servers without any particular problems: this was built into the Badoo architecture from the start.
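To make the contrast with modulo-based sharding concrete, here is a minimal sketch of the lookup-table idea behind such a service. The class, table, and column names are our own illustration, not Badoo's actual code:

```php
<?php
// Illustrative sketch only: user placement is a lookup table, not a function
// of the user id, so any user can be repointed to any server.
class Authorizer
{
    private $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    // Returns the id of the MySQL server that holds this user's data.
    public function getServerId(int $userId): ?int
    {
        $stmt = $this->db->prepare(
            'SELECT server_id FROM user_to_server WHERE user_id = ?' // hypothetical table
        );
        $stmt->execute([$userId]);
        $serverId = $stmt->fetchColumn();
        return $serverId === false ? null : (int)$serverId;
    }

    // Moving a user to another server is a single UPDATE of the mapping,
    // which is what makes per-user migration possible in the first place.
    public function moveUser(int $userId, int $newServerId): void
    {
        $this->db->prepare(
            'UPDATE user_to_server SET server_id = ? WHERE user_id = ?'
        )->execute([$newServerId, $userId]);
    }
}
```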

In addition to MySQL, we have many C/C++ services that store various data: sessions, dates of the last visit to the site, users' photos for voting, the basic data of all users for search.
We also have a service that powers the "Dating" section. It differs from the others in that it exists in a single instance per country: it consumes so much RAM that the data for all countries simply cannot fit on one physical server, even one with 384 GB of memory. And each instance "lives" only in "its" data center.
In addition, we have a number of "central" MySQL databases that store general information about all users, as well as separate billing databases that also hold information about all Badoo users at once. There is also separate storage for uploaded photos and videos, which will be covered in a separate article. This data had to be moved too.
Formulation of the problem
The task is formulated very simply: transfer the user data of an entire country in one working day, and do it so that users can keep using the site during the migration. The largest country we migrated was Thailand, with about 1.5 million registered users. Dividing this number by a working day of 8 hours (plus lunch), we get the required migration rate: about 170 thousand users per hour.
The requirement to migrate a country within one working day is dictated by the fact that anything can happen during that time. For example, some servers or services may go down or start to slow down, and then we need to edit the code "live" to reduce the load on them. There may also be bugs in the migration code that cause problems for users, and then we must be able to notice them quickly and pause (or even roll back) the process. In short, such a large-scale user transfer requires an "operator" who monitors what is happening and makes the necessary adjustments along the way.
Technically, for each user we need to select data from all the tables of all the MySQL instances that may contain data about this user, and also transfer the data stored in the C/C++ services. For one of the services we need to move the daemon itself rather than transfer data between running daemon instances in the two data centers.
The latency between the data centers is about 100 ms, so operations have to be optimized so that data is sent as a stream rather than as a large number of small queries. During migration, the time the site is unavailable for each user should be minimal, so the process has to be carried out per user rather than in one large batch. The target we set was no more than 5 minutes of site unavailability for a specific user (ideally 1-2 minutes).

Work plan
Based on the fact that we needed to migrate 170,000 users per hour, and that the migration of each user should take about 2-3 minutes, we calculated the number of concurrent threads required to meet these conditions. Each thread can transfer on average 25 users per hour, so the total would be 6,800 threads (i.e. 170,000 / 25). In reality we managed to get by with "only" 2,000 threads, since most of the time a user is simply "waiting" for various events to happen (for example, MySQL replication between data centers). So each thread took on three users at a time and switched between them whenever one of them went into a waiting state.
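As a rough illustration of this scheduling idea (our own sketch, not the real migration code; all helper functions are hypothetical), a worker keeps several users in flight and skips over any user that is currently waiting on an external event:

```php
<?php
// Illustrative sketch: one worker multiplexes three users so that waiting
// on replication or other events for one user does not idle the thread.
function runWorker(array $userIds, int $concurrentUsers = 3): void
{
    $inFlight = [];                      // userId => migration state (hypothetical objects)
    while ($userIds || $inFlight) {
        // Top up the set of users being processed by this worker.
        while (count($inFlight) < $concurrentUsers && $userIds) {
            $userId = array_shift($userIds);
            $inFlight[$userId] = createMigrationState($userId); // hypothetical
        }
        foreach ($inFlight as $userId => $state) {
            if (isWaiting($state)) {     // e.g. waiting for MySQL replication
                continue;                // switch to the next user instead of sleeping
            }
            advanceOneStep($state);      // run the next migration step for this user
            if (isFinished($state)) {
                unset($inFlight[$userId]);
            }
        }
        usleep(100000);                  // avoid a busy loop between passes
    }
}
```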
The migration of each user consisted of many consecutive steps. Each step started strictly after the previous one had finished, and only if it had completed successfully.
Each step also had to be repeatable, or, to put it more formally, idempotent. That is, a step can be interrupted at any moment for various reasons, and it must be able to determine which operations it still has to complete and perform them correctly.
This is required so that we do not lose user data in case of an emergency stop of the migration or a temporary failure of internal services or database servers.
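For example, a step that copies a user's billing row might look roughly like the following sketch (our own illustration with a hypothetical billing_accounts table): it first checks whether a previous, interrupted attempt already did the work, and only then acts.

```php
<?php
// Illustrative sketch of an idempotent step: re-running it after an
// interruption neither duplicates data nor loses it.
function copyBillingRow(PDO $src, PDO $dst, int $userId): void
{
    // If a previous attempt already copied the row, do not copy it again.
    $check = $dst->prepare('SELECT 1 FROM billing_accounts WHERE user_id = ?');
    $check->execute([$userId]);
    if ($check->fetchColumn() === false) {
        $row = $src->prepare('SELECT * FROM billing_accounts WHERE user_id = ?');
        $row->execute([$userId]);
        if ($data = $row->fetch(PDO::FETCH_ASSOC)) {
            $cols = implode(',', array_keys($data));
            $ph   = implode(',', array_fill(0, count($data), '?'));
            $dst->prepare("INSERT INTO billing_accounts ($cols) VALUES ($ph)")
                ->execute(array_values($data));
        }
    }
    // Deleting the source row is naturally idempotent: a second run is a no-op.
    $src->prepare('DELETE FROM billing_accounts WHERE user_id = ?')
        ->execute([$userId]);
}
```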
Structure and sequence of our actions
- Preparatory steps. We mark the user as "being migrated" and wait for any background scripts that are currently processing this user to finish. This step took about a minute, during which the same thread could migrate another user.
- Billing data migration. During the migration of a country we completely disabled billing in it, so no special actions or locks were required: the data was simply transferred from one central MySQL database to another. Each thread opened its own connection to this database, so the total number of connections to the billing database exceeded 2,000.
- Photo migration. Here we were a little sly: we had found a way to transfer photos relatively easily in advance and separately. So for most users this step simply checked that no new photos had appeared since that transfer.
- Copying the user's master data. In this step we generated an SQL dump of the user's data and applied it on the remote side. The old data was not deleted at this point.
- Updating data in the authorizer service. The authorizer service stores the mapping between user id and server id, and until we update this service, scripts will keep looking for the user's data in the old place.
- Deleting user data from the old place. Using DELETE FROM queries, we remove the user's data from the source MySQL server.
- Transferring data from the central databases. One of the central databases, eloquently named Misc (short for "miscellaneous"), contains many different tables, and for each of them we issued one SELECT and one DELETE per user. We squeezed 40,000 SQL queries per second out of the poor database and kept more than 2,000 connections open to it.
- Transferring data from the services. As a rule, all the data is also stored in the database, and the services only provide fast access to it. So for most services we simply deleted the data in one place and refilled it from the database that was already in the new location. One service, however, we moved entirely rather than per user, because its data exists in a single copy.
- Waiting for replication. Our databases are replicated between data centers, and until replication catches up, the user's data is in an inconsistent state across them. We therefore had to wait for replication to finish for each user so that everything worked correctly and the data stayed consistent. To avoid losing this time (from 20 seconds to a minute), the thread used it to migrate other users.
- Final steps. We mark the user's migration as finished and allow them to log in to the site, now in the new data center. A rough sketch of this step sequence follows the list.
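Put together, the per-user pipeline can be pictured roughly like this (a schematic sketch under our own naming, not production code): steps run strictly one after another, each only after the previous one succeeded, and progress is recorded so an interrupted migration can resume from the last finished step.

```php
<?php
// Illustrative step list mirroring the sequence described above.
$steps = [
    'prepare',          // mark user as "being migrated", wait for background jobs
    'billing',          // move billing data between central databases
    'photos',           // verify photos were already pre-migrated
    'master_data',      // dump and apply the user's MySQL data
    'authorizer',       // repoint user_id -> server_id in the authorizer
    'cleanup_source',   // DELETE the user's rows on the source server
    'central_dbs',      // move rows from Misc and the other central databases
    'services',         // refill C/C++ services from the new database
    'replication',      // wait until cross-DC replication catches up
    'finalize',         // unmark the user and let them log in again
];

function migrateUser(int $userId, array $steps): void
{
    foreach ($steps as $step) {
        if (isStepDone($userId, $step)) {   // hypothetical progress record
            continue;                       // already finished in a previous run
        }
        runStep($userId, $step);            // hypothetical; throws on failure
        markStepDone($userId, $step);
    }
}
```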
MySQL data transfer

As mentioned earlier, we store user data on MySQL servers, of which there are about a hundred and fifty in each data center. Each server holds several databases, each containing thousands of tables (on average we try to keep one table per 1,000 users). The data is organized so that either auto-increment fields are not used at all, or at least other tables do not reference them. Instead, the combination of user_id and sequence_id is used as the primary key, where user_id is the user's identifier and sequence_id is a counter that increases automatically and is unique within a single server. Thus, the records of any user can be freely moved to another server without losing referential integrity and without having to build a mapping between old and new auto-increment values.
For most MySQL servers the data transfer is done in the same way (note that if any error occurs, the entire step fails and is restarted after a short pause):
- Go to the receiving side and check whether the user's data is already there. If it is, it means that a previous attempt filled the data successfully but the step did not finish properly.
- If there is no data on the remote side, we do a SELECT from all the necessary tables, filtering by user, and form an SQL dump that starts with BEGIN and ends with COMMIT.
- Upload the dump via SSH to a "proxy" host on the remote side and apply it with the mysql console utility. It may happen that the COMMIT succeeds but we do not receive the response, for example because of network problems. That is why we first check whether the dump was applied during a previous attempt. Moreover, in some databases the absence of data for a user is a normal situation, so in order to be able to verify that the data had been transferred, in such cases we added an INSERT into a special marker table, which we consulted when needed.
- Delete the original data using DELETE FROM with the same WHERE clause that was used in the SELECT queries. As a rule, the WHERE conditions contained user_id, and in some tables this field was not part of the primary key for various reasons. Where possible, an index was added. Where that turned out to be difficult or impractical, when deleting data we first selected the rows by user_id and then deleted them by primary key, thereby avoiding read locks and significantly speeding up the process.
If we know for certain that we have never reached the corresponding step before, we skip the check for existing data on the remote side. This saves about a second for each server we transfer data to (a consequence of the 100 ms delay on every round trip).
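A condensed sketch of the dump-and-apply part of this flow might look as follows (host names, database name, and helpers are our own assumptions): the dump is wrapped in BEGIN/COMMIT and piped over SSH into the mysql console client on the receiving side.

```php
<?php
// Illustrative sketch of dumping one user's rows and applying them remotely.
function transferUserTables(PDO $src, string $remoteProxy, array $tables, int $userId): void
{
    $dump = "BEGIN;\n";
    foreach ($tables as $table) {                      // trusted internal table list
        $stmt = $src->prepare("SELECT * FROM `$table` WHERE user_id = ?");
        $stmt->execute([$userId]);
        while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
            $values = implode(',', array_map(
                fn($v) => $v === null ? 'NULL' : $src->quote((string)$v),
                array_values($row)
            ));
            $dump .= "INSERT INTO `$table` (" . implode(',', array_keys($row)) . ")"
                   . " VALUES ($values);\n";
        }
    }
    $dump .= "COMMIT;\n";

    // Apply the dump on the remote proxy with the mysql console client
    // ("target_db" is a placeholder for the real database name).
    $cmd  = 'ssh ' . escapeshellarg($remoteProxy) . ' mysql target_db';
    $proc = proc_open($cmd, [0 => ['pipe', 'r']], $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException('failed to start remote mysql');
    }
    fwrite($pipes[0], $dump);
    fclose($pipes[0]);
    if (proc_close($proc) !== 0) {
        throw new RuntimeException("applying dump for user $userId failed");
    }
}
```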
During the migration, we encountered a number of problems that we want to talk about.
Auto-increment fields (auto_increment)
The billing databases make heavy use of auto-increment fields, so for them we had to write fairly complex logic for mapping old ids to new ones.
The difficulty was that data from tables where only the above-described sequence_id is part of the primary key cannot simply be transferred, since sequence_id is unique only within a single server. Replacing sequence_id with NULL, and thereby generating a new auto-increment value, is also impossible, because, first, sequence_id is generated by inserting a row into one table while the resulting value is used in another, and second, other tables reference that table by sequence_id. In other words, we need to obtain the required number of auto-increment values on the server the data is being moved to, replace the old sequence_ids with the new ones in the user's data, and write the finished INSERTs to a file that the mysql console utility will later apply.
To do this, we opened a transaction on the receiving server, performed the required number of inserts, called mysql_insert_id(), which for a multi-row insert returns the auto-increment value of the first inserted row, and then rolled back the transaction. After the rollback the auto-increment counter remains advanced by the number of rows we inserted (unless the database server is restarted). Having obtained the auto-increment values we needed, we generated the corresponding INSERT statements, including for the table responsible for auto-increment generation, but with the auto-increment values specified explicitly in order to fill the holes left by the rolled-back transaction.
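In code, the reservation trick could look roughly like this (a sketch under our own assumptions: the table has a single auto_increment column id and the other columns have defaults; PDO::lastInsertId() plays the role of mysql_insert_id() here):

```php
<?php
// Illustrative sketch: reserve a consecutive block of auto-increment ids
// by inserting placeholder rows in a transaction and rolling it back.
function reserveAutoIncrementIds(PDO $dst, string $table, int $count): array
{
    $dst->beginTransaction();
    $values = implode(',', array_fill(0, $count, '(NULL)'));
    // Inserting NULL into the auto_increment column generates new ids.
    $dst->exec("INSERT INTO `$table` (id) VALUES $values");
    // For a multi-row insert MySQL returns the id of the FIRST inserted row.
    $firstId = (int)$dst->lastInsertId();
    // Rolling back removes the placeholder rows, but the auto_increment
    // counter stays advanced (unless the server is restarted), so the range
    // [firstId, firstId + count - 1] can be filled explicitly by our INSERTs.
    $dst->rollBack();
    return range($firstId, $firstId + $count - 1);
}

// Usage idea: map old sequence_ids to the reserved ids before writing the dump.
// $newIds = reserveAutoIncrementIds($dst, 'billing_sequence', count($oldIds));
// $map    = array_combine($oldIds, $newIds);
```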
Max_connections and MySQL load
Each thread opened one MySQL connection to every server it had to work with. As a result, we kept 2,000 connections open on all the central MySQL servers. With more connections, MySQL (we use versions 5.1 and 5.5) started behaving erratically, up to the point of crashing.
One day during the migration, one of the central MySQL servers went down (one of those under a very heavy load). The migration was immediately paused, and we started investigating the cause. It turned out that its RAID controller had simply failed, and although the administrators said it had nothing to do with the load we were putting on the server, we were left with a nagging doubt.
InnoDB, MVCC and DELETE FROM: pitfalls
Since we store all the data in InnoDB, and all the transferred data was deleted immediately, every script that processes the queues kept in tables on some servers started to slow down. We were surprised to see a SELECT from an empty table take minutes. The MySQL purge thread could not keep up with clearing the deleted records, so even though the queue tables were empty, they still contained a lot of delete-marked records that had not yet been physically removed and were merely skipped by MySQL during the scan. A quantitative measure of the backlog of records waiting to be purged can be obtained by running SHOW ENGINE INNODB STATUS and looking at the History list length line. The largest value we saw was several million entries. Be careful when deleting a large number of rows from InnoDB tables with DELETE FROM: it is much better to avoid it and use, for example, TRUNCATE TABLE where possible. TRUNCATE TABLE clears the table completely, and it is a DDL operation, so the deleted records do not go into the undo/redo log (InnoDB does not support transactions for DDL operations).
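For monitoring purposes, the purge backlog can also be read from a script; here is a small sketch of our own that parses the History list length line out of the standard SHOW ENGINE INNODB STATUS output:

```php
<?php
// Illustrative sketch: read the InnoDB purge backlog from a script.
function innodbHistoryListLength(PDO $db): ?int
{
    $row    = $db->query('SHOW ENGINE INNODB STATUS')->fetch(PDO::FETCH_ASSOC);
    $status = is_array($row) ? $row['Status'] : '';
    return preg_match('/History list length\s+(\d+)/', $status, $m)
        ? (int)$m[1]
        : null;
}

// Usage idea: pause the deleting threads while the backlog is too large.
// if (innodbHistoryListLength($db) > 1000000) { sleep(30); }
```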
If, after deleting all the data with DELETE FROM, you still need to select from the table, try to use a BETWEEN condition on the primary key. For example, if you use auto_increment, select MIN(id) and MAX(id) from the table and then select all the records between them: this is significantly faster than selecting records with a LIMIT or with only one of the conditions id > N or id < N. The queries computing MIN(id) and MAX(id) will take a very long time, because InnoDB has to skip the deleted records, but the key-range queries themselves run at the usual speed: deleted records do not get into such scans.
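A sketch of such a range-based scan (table and column names are illustrative): MIN()/MAX() are computed once, and then the table is walked in fixed-size BETWEEN chunks over the primary key.

```php
<?php
// Illustrative sketch: scan a queue table after a mass DELETE by walking
// the primary key in ranges instead of crawling past delete-marked rows.
function scanQueueTable(PDO $db, string $table, int $chunk = 10000): iterable
{
    // MIN()/MAX() can themselves be slow right after a mass delete,
    // but they are executed only once.
    [$min, $max] = $db->query("SELECT MIN(id), MAX(id) FROM `$table`")
                      ->fetch(PDO::FETCH_NUM);
    if ($min === null) {
        return;                                   // table is really empty
    }
    for ($from = (int)$min; $from <= (int)$max; $from += $chunk) {
        $stmt = $db->prepare("SELECT * FROM `$table` WHERE id BETWEEN ? AND ?");
        $stmt->execute([$from, $from + $chunk - 1]);
        yield from $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}
```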
We were also surprised to see many "hanging" queries of the form DELETE FROM ... WHERE user_id = ..., all identical, on a table with no index on user_id. As it turned out, MySQL 5.1 (and, to a lesser extent, 5.5) scales such queries very poorly when deleting rows requires a full table scan under the REPEATABLE READ isolation level (the default): lock contention on the same records becomes very high, which leads to an avalanche-like growth of query execution time.
One possible solution is to set the READ COMMITTED isolation level for the transaction that deletes the data; then InnoDB will not hold locks on rows that do not match the WHERE condition. To illustrate how serious the problem was, here is a screenshot taken during the migration: the tmp.tiw_fix table in it contains only about 60 rows (!) and has no index on user_id.
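A minimal sketch of that workaround (the table name is illustrative; note that on MySQL 5.1 with binary logging enabled, READ COMMITTED requires row-based or mixed binlog format):

```php
<?php
// Illustrative sketch: delete a user's rows under READ COMMITTED so that
// InnoDB does not keep locks on rows rejected by the WHERE condition
// while doing a full scan of a table without an index on user_id.
function deleteUserRows(PDO $db, string $table, int $userId): void
{
    // Affects transactions started after this statement in this session.
    $db->exec('SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED');
    $db->beginTransaction();
    $db->prepare("DELETE FROM `$table` WHERE user_id = ?")->execute([$userId]);
    $db->commit();
}
```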

Distribution of users by threads
Initially, we distributed users across threads evenly, regardless of which server a particular user lived on. Each thread also kept open connections to every MySQL server it had encountered while migrating the users assigned to it. As a result, we ran into two more problems:
- When a MySQL server started to slow down, the migration of users living on that server slowed down too. Since all the other threads kept running, they gradually also reached users living on the problem server. More and more threads piled up on it, and it slowed down even further. To keep the server from going down completely, we inserted temporary patches into the code on the fly, various "crutches" limiting the load on it.
- Since every thread kept open MySQL connections to all the servers it needed, each thread eventually held a large number of open connections to all the MySQL servers, and we started hitting max_connections.
In the end, we changed the distribution algorithm and assigned to each thread only users living on a single MySQL server. This solved both the problem of avalanche-like load growth when a server started to "lag" and the problem of too many simultaneous connections to some of the weaker hosts.
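Schematically, the revised distribution can be expressed like this (our own sketch reusing the hypothetical Authorizer from the earlier example; enqueueWorker is likewise illustrative):

```php
<?php
// Illustrative sketch: group users by their source MySQL server so that
// each worker deals with exactly one server (plus the central databases),
// and one slow server cannot drag every thread down.
function groupUsersByServer(array $userIds, Authorizer $authorizer): array
{
    $byServer = [];
    foreach ($userIds as $userId) {
        $byServer[$authorizer->getServerId($userId)][] = $userId;
    }
    return $byServer;               // serverId => [userId, ...]
}

// Each worker then receives a slice of one server's user list.
// foreach (groupUsersByServer($users, $authorizer) as $serverId => $ids) {
//     foreach (array_chunk($ids, 1000) as $batch) {
//         enqueueWorker($serverId, $batch);   // hypothetical
//     }
// }
```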
To be continued…
In the following parts, we will describe how we pre-migrated user photos and what data structures we used to limit the load on the photo servers. After that, we will explain in more detail how we coordinated the work of 2,000 simultaneous migration threads on 16 servers in two data centers, and what technical solutions made all of this work.
Yury youROCK Nasretdinov, PHP developer
Anton antonstepanenko Stepanenko, Team Lead, PHP developer