📜 ⬆️ ⬇️

Migrating a database from InnoDB to MyRocks


Zoji La, India


Hi, Habr! My name is Oleg Efimov, I work at Badoo in the Platform team, I’m working on storing photos, service interfaces and much more.


I often hear that in terms of server technology, Badoo is a rather conservative company. This is partly true, but in fact we use many young programming languages, new tools and technologies. One of them is RocksDB, on the basis of which Facebook created the MySQL storage engine - MyRocks. A post about how Facebook migrated one of its databases from InnoDB to MyRocks and I wanted to translate for you.


Last year, we introduced MyRocks, our new database engine MySQL, which was designed to make recording and disk usage more efficient than compressed InnoDB. We decided to migrate one of our main databases (UDB) with compressed InnoDB to MyRocks and halve the storage size and number of servers used. The migration of the UDB database, which manages Facebook social graph data, has been carefully planned and implemented. We completed the work last month and successfully “halved” the repository. In this post we will talk about the preparation and implementation of migration, as well as lessons learned.


UDB was limited to storage capacity


A few years ago, we replaced the UDB hardware with Flashcache with pure Flash to solve some performance problems. The MySQL bundle with InnoDB worked quickly and reliably, but we wanted to use fewer servers to handle the same workload and data volume. When using pure flash-storage UDB was limited by its volume. And although we used compression in InnoDB, while having an excess of processor resources and throughput of random I / O operations, we could hardly force InnoDB to use less disk space.



InnoDB was limited by the amount of free disk space, and CPU / IO resources were idle. This is a common situation when using flash-storage.


This was one of the reasons for creating MyRocks - the RocksDB storage engine for MySQL. We implemented several features in InnoDB, where the size of the database was much larger than the RAM:



Our early experiments showed that MyRocks uses half the amount of compressed InnoDB, and without significantly increasing the consumption of processor resources and the number of I / O operations. Therefore, we decided to completely migrate UDB from InnoDB to MyRocks.



Load on MyRocks with twice the density of placements. There is still a reserve for processor resources and I / O operations.


Simplify migration from InnoDB to MyRocks


For services where users access databases directly, such a migration can be a difficult task. In general, for the current OLTP database this can be done without stopping services, without increasing delay and degradation of bandwidth, without returning erroneous data and without affecting the workload.


MyRocks has a unique advantage over other databases: this is the MySQL storage engine, which greatly facilitates migration from InnoDB. In particular:



Data consistency check


MyRocks / RocksDB is a younger technology, so we had to resort to a comprehensive consistency check to prevent the emergence of new bugs in our database. Verification is ongoing, both during the migration and after launch in production.


Check the consistency of the primary and secondary key. Each table compares the number of rows and checksums of the columns of primary and secondary keys. For example, in table T1 there is a column (id, value1, value2) and a secondary index (value1); then, for value1, primary and secondary keys are scanned, then the numbers of rows and checksums are compared. If one of the keys is damaged, the checksums will not converge.


Check the consistency of the two instances. In the same replica set (master – slave pair), two different instances (MyRocks and InnoDB) are compared by the number of rows and the checksum of the primary key. Thanks to GTID, you can start transactions with consistent snapshots with the same GTID for both instances. Subsequent SELECT statements (returning the number of rows and checksums) for both instances must be consistent.


Shadow validation of requests for two copies. MySQL has a feature called Audit Plugin that allows you to register running queries. We created a tool for shadow replay of these queries in several instances and comparison of results. This allowed us to make sure that the queries in InnoDB and MyRocks give the same results.


Stages of migration


The migration was carried out in the manner described below, and we were greatly facilitated by the ability to configure replication between InnoDB and MyRocks.


1) Deployed the first slave MyRocks. We had one master and four slaves for each set of replicas in regions all over the world. In each replica set, we created the first instance of MyRocks using a dump (mysqldump) from InnoDB in the same region. During upload and download, we stopped replication in the original InnoDB slaves to speed up and simplify migration. Also, during upload and download, all read requests were redirected to other available InnoDB slave instances.


Since MyRocks was optimized for loading data and creating a secondary index, uploading and downloading did not take much time (the engine can copy hundreds of gigabytes per hour from InnoDB). After the replication caught up and the consistency check was performed, we deleted InnoDB in the same region.



2) Deployed the second slave MyRocks . We wanted to avoid manipulations with a single copy of MyRocks, since its loss would lead to the need to reload and unload. When working with two instances after the failure of one of them, it would be possible to recover using the MyRocks binary online copy (myrocks_hotbackup), which would be much easier compared to uploading and downloading. Before deploying MyRocks, we had five instances of InnoDB for each set of replicas.


In each region, after removing two instances of InnoDB, we added two instances of MyRocks. The result is three InnoDB instances and two MyRocks for each replica set. Since MyRocks loaded data faster, uploading to it was much easier than copying to InnoDB. This allowed us to migrate without increasing the number of physical servers even during the migration itself.



3) Made slave MyRocks master. It was much more difficult to deploy master than slave, because the master instance can handle both write and read requests. For example, master simultaneously performs write operations on the same lines, so you need to implement proper handling of row locks. Poor consistency implementation can lead to a lot of errors.



4) In all regions, MyRocks copied and deleted all InnoDB instances . The final step was copying MyRocks and removing InnoDB. Now the replica sets have completely migrated from InnoDB to MyRocks.



In production, we first performed the first two steps for all UDB replica sets. Then for the majority of sets the third stage was completed, and finally the last. Since the most difficult was the deployment of master'ov, we spent a lot of tests before moving on. With the help of our query duplication tool, we checked the master traffic, so that we could fix any consistency bugs before deploying the master in production.


Other technical issues


In InnoDB, we used direct I / O, but in MyRocks / RocksDB its support is limited, so we used buffered I / O operations. Older Linux kernels have a known problem when, with the active use of buffered operations, paging begins from the disk and there are difficulties in allocating virtual memory. Our team responsible for the Linux kernel fixed a problem with allocating virtual memory in Linux 4.6, and before the migration we updated the kernel to this version.


We also ran multiple instances of MySQL on each machine. The size of each instance was much smaller than the 5TB flash storage capacity, which helped improve the performance of database operations: backup, recovery, replica creation, and the prevention of replication lags. During the migration, we gradually created instances of MyRocks following the removal of InnoDB instances. In flash storage, active file deletion results in long TRIM delays lasting up to tens of seconds. We knew about this limitation and created a simple tool for slowly deleting files in small chunks. Instead of deleting 100GB InnoDB files with the "rm" command, he divided large files into pieces of ~ 64 MB each, and then each of them deleted within a short waiting period. We slowed down the removal speed to about 128 MB per second.


Lessons learned


Efficiency is important for our operations, and we were happy when we managed to quickly migrate without creating problems in production. But in the process, we learned several important lessons.



  1. sending in the production-environment of shadow traffic for reading / writing and monitoring regressions (especially the number of errors and freezes and the level of delay);
  2. checking the consistency of data in InnoDB and MyRocks;
  3. the deliberate collapse of slave and master instances of MyRocks and testing their ability to recover.


Future plans


Migrating our core database from InnoDB to MyRocks has halved the storage space used. We continue to improve MyRocks and RocksDB, including more aggressively consolidating instances to free servers for other tasks. We are also working on supporting different engines. MyRocks is optimized to save storage space and slow down the growth of data being written, while InnoDB is optimized for reading and has more specialized features like gap-blocking, foreign keys, full-text and spatial indices.


Our current development of MyRocks does not imply the introduction of such things, but we plan to provide reliable support for several engines. Using InnoDB and MyRocks in one copy at the same time will allow InnoDB to be used for small, actively read pages, and MyRocks for everything else (we often hear requests for the realization of this feature). Finally, we will work on MyRocks support for the upcoming MySQL 8.0.


')

Source: https://habr.com/ru/post/342336/


All Articles