
Working with MySQL: how to scale the data warehouse 20 times in three weeks



Earlier in this blog on Habr we talked about the development of our product Hydra, a billing system for telecom operators, and also covered infrastructure topics and the use of new technologies. For example, we looked at the advantages of Clojure, situations where MongoDB is and is not worth using, and limitations in PostgreSQL.

Today we will talk about scaling. The developers of the open-source email application Nylas published on their blog the story of how they managed to scale their system 20 times in three weeks using the ProxySQL tool. To do this, they had to move from Amazon RDS to MySQL on EC2. We present the highlights of this interesting article.

One beautiful day ...


The story began in the summer of 2015. The sun was breaking through the fog hovering over San Francisco, and the radio played the hit "Bad Blood" by Taylor Swift and Kendrick Lamar. Nylas had only two sync machines and one MySQL database managed through Amazon RDS. It was the simplest solution the company could find at the time.

The company's engineers had achieved good performance with this configuration, so they worked unhurriedly on improving the synchronization logic and telemetry. The team understood the dangers of premature optimization. Everyone believed that the best approach to horizontal scaling (sharding) was to postpone it as long as possible. So, to buy time, the engineers worked on optimizing disk usage, upgrading instances, and compressing data.

And then users appeared


Everything changed on October 5, 2015. On that day, Nylas N1 was released: an open-source email application with a beautiful user interface and a modern plugin-based architecture. A link to the product was posted on Hacker News. According to the Nylas team, this brought tens of thousands of new users within minutes. And along with the users came large volumes of new data.

Within an hour, the MySQL database could no longer cope with such volumes, and API latency broke all records. Sign-ups had to be temporarily disabled. The company had not yet realized the full depth of its problems, but one thing was obvious to everyone: it was necessary to scale as soon as possible.

On the barricades


Within a few days after the launch of Nylas N1, several bottlenecks were discovered in the code at once. The team ran up against the write-throughput limits of Redis and MySQL. The engineers learned a lot about AWS.

But the most serious problem was the database. Day and night, the Nylas team fixed errors such as MySQL connection leaks and poorly optimized queries that hurt performance. Besides patching holes, they had to think about the future. The only option was horizontal scaling of the data in order to properly support new users. In other words, the time had come to shard the system.

Why MySQL


In 2013, when Nylas was founded, its engineers chose MySQL not only for its specific characteristics, but also because this tool had proven itself in high-load projects. Cassandra, MongoDB, and Postgres looked impressive and were progressing quickly, but the engineers were confident they could build a petabyte-scale system on MySQL. The experience of companies like Facebook, Dropbox, Pinterest, and Uber confirmed this. Using MySQL allowed them to draw on the experience of these technology giants.

This database is easy to use, and Nylas's requirements were initially not particularly large-scale. Based on this, it was decided to host on Amazon RDS. RDS does not give root access to the MySQL host, but many processes are automated, for example backups and minor version upgrades.

Time to scale


By the time the performance problems hit, the engineers had already compressed long text columns in the database, which saved 40% of disk space, and had switched to 6-terabyte disks. But InnoDB, MySQL's storage engine, has a limit of 2 TB per table, which was becoming a problem.

The engineers considered Amazon Aurora, a MySQL-compatible storage engine that autoscales up to 64 TB. However, even in this case, if the 64-terabyte limit were exceeded, the data would still have to be split across different tables to fit within InnoDB's restrictions. In addition, the team did not want to depend so heavily on Amazon technology in the long term.

Instead, Nylas engineers scaled out using simple data partitioning, creating additional MySQL clusters running stock (Oracle-distributed) MySQL on regular EC2 instances. The main goal was simplicity of configuration, which made it possible to complete the work in three weeks.

Goodbye RDS


During the scaling effort, it was decided to abandon RDS. The reasons were the lack of control and a poor track record implementing optimizations. It seemed that RDS was simply not suited to building reliable systems with zero downtime.

In addition, replication options are extremely limited in RDS. Yes, it has a multi-AZ option for synchronous data replication, but its performance left much to be desired. On one occasion its use led to a 3-hour outage of the Nylas application. The team had to contact Amazon engineers and wait for them to investigate the problem.

Nylas also tried creating asynchronous replicas of a given master database, but it turned out that such a replica came up with a lag of several minutes. And RDS does not support chained replication, which the team planned to use to redistribute shards across database instances.

Instead, it turned out that many of these tasks could be solved successfully on EC2. Yes, individual instances can have problems, but at least it was possible to build a database architecture resilient to the kinds of failures described above. The company decided to drop RDS and manage the databases itself.

Running MySQL on EC2


Moving MySQL to EC2 is not an easy process. It was necessary to write custom scripts to manage tasks that RDS had automated: provisioning new servers, creating and promoting slave instances in the event of a master database failure, and creating and restoring backups. Nylas wrote such scripts from scratch, but there are a number of open-source tools that solve the same problems.

Each MySQL shard cluster consists of a master node and a slave node. The slave receives updates via the MySQL replication log and is normally used for backups. While the system is being reconfigured and nodes are being moved to different instance types, a cluster may have more than one slave. Slaves that are not active are marked with a "debug" EC2 tag, which excludes them from monitoring, backups, and other processes applied to active nodes.
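The tag-based exclusion described above can be sketched in a few lines of Python. This is an illustrative sketch, not Nylas's actual tooling: the function name and the node dictionaries are made up, though the `{"Key": ..., "Value": ...}` tag shape mirrors what EC2 APIs return.

```python
def is_active_db_node(tags):
    """Return True unless the instance carries the 'debug' EC2 tag,
    which excludes it from monitoring, backups, and failover."""
    return not any(t.get("Key") == "debug" for t in tags)

# Hypothetical inventory of one shard cluster's instances.
nodes = [
    {"name": "shard1-master", "tags": [{"Key": "role", "Value": "master"}]},
    {"name": "shard1-extra",  "tags": [{"Key": "debug", "Value": "true"}]},
]

active = [n["name"] for n in nodes if is_active_db_node(n["tags"])]
print(active)  # prints ['shard1-master']
```

Monitoring and backup jobs would then iterate only over `active`, so a half-provisioned slave never triggers false alerts.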

Handling failover


For each database cluster, it was necessary to plan for the situation where the master node stops functioning correctly. This can happen when database processes crash or hang, the host becomes unavailable, or something else goes wrong. In that case, there must be a recovery path, manual or automatic, in which a slave node eventually takes the master's place and starts serving both read and write requests.

To solve this problem, a special script was written that lets engineers promote a slave node manually in this way. Once the team was convinced that the operation could be carried out without fear of data loss, the script was integrated with the standard mysqlfailover tool, with some additions to track which cluster each node belongs to. If the master node fails, mysqlfailover learns of this through its health-check mechanism and promotes a slave to be the new master. Among other things, the mysqlfailover hooks update the EC2 tags to reflect the new configuration.

Recovery works automatically, but engineers receive alerts. Afterwards, one of them can manually inspect the failed node and convert it into a new slave for the cluster. If that is not possible, for example due to data corruption, the old master is decommissioned and a new slave is built from the most recent backup. The system became resilient to failures and easier to manage.
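The core promotion decision in a failover tool like mysqlfailover boils down to picking the replica that has read furthest in the master's binlog. A minimal sketch, with invented host data and without any of the real consistency checks such a tool performs:

```python
def pick_new_master(slaves):
    """Choose the replica furthest along in the master's binlog.
    Binlog file names are zero-padded ('mysql-bin.000103'), so the
    (file, position) pair can be compared lexicographically."""
    return max(slaves, key=lambda s: (s["binlog_file"], s["binlog_pos"]))

# Hypothetical replica states at the moment the master is lost.
slaves = [
    {"host": "db-a", "binlog_file": "mysql-bin.000102", "binlog_pos": 4521},
    {"host": "db-b", "binlog_file": "mysql-bin.000103", "binlog_pos": 120},
]

print(pick_new_master(slaves)["host"])  # prints db-b
```

Promoting any other replica would discard the transactions it has not yet received, which is exactly the data-loss risk the Nylas team verified against before automating the procedure.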

Application sharding

When moving from one database to several databases on different machines, a new mechanism is needed for mapping object IDs to specific servers. This is especially true if the application references database rows by primary key from secondary storage, as is the case with Redis.

Nylas engineers chose a simple solution to this problem: the table's primary key is a 64-bit integer whose upper bits contain the shard ID and whose lower bits hold a local auto-increment counter.

 | <- 16 bits -> | <------- 48 bits -------> |
 +---------------+---------------------------+
 |   shard id    |       autoincrement       |
 +---------------+---------------------------+

This makes it easy to compute the shard ID from the primary key. A request for any object can be routed to the correct shard simply by its primary key. The simpler the system, the more reliable it is to operate in the future.
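The packing and unpacking described above is plain bit arithmetic. A minimal Python sketch (function names are illustrative, not from the Nylas codebase):

```python
SHARD_BITS = 48  # low 48 bits: local counter; high 16 bits: shard ID

def make_pk(shard_id, local_id):
    """Pack a shard ID and a local auto-increment value into one
    64-bit primary key."""
    assert 0 <= shard_id < (1 << 16)
    assert 0 < local_id < (1 << SHARD_BITS)
    return (shard_id << SHARD_BITS) | local_id

def shard_of(pk):
    """Recover the shard ID from any primary key with a single shift."""
    return pk >> SHARD_BITS

pk = make_pk(3, 42)
print(shard_of(pk))  # prints 3
```

Routing a query therefore needs no lookup table: `shard_of(pk)` alone identifies the schema, and from there the EC2 instance hosting it.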

For example, the application's authentication layer can convert hashed access tokens into an account ID used to access the API. The project team managed to shard the service without changing its key components. Since the Nylas engine, like the application, is open source, anyone can study the details of this implementation on GitHub.

Creating new shards


Each logical shard is a separately numbered MySQL schema. All shards have the same table structure, and several shards are usually located on the same instance.

When creating a new shard, the starting value with the high-order bits set is computed simply as:

N = (shard_id << 48) + 1

The auto-increment counter for each table is then set as follows:

ALTER TABLE mailsync_<shard_id>.<tablename> AUTO_INCREMENT=N

New rows in that table will accordingly get primary keys N, N + 1, and so on (in an unsharded database, auto-increment keys would simply start at 1). Nylas engineers advise paying attention to MySQL bug #199 when using this method. The bug is already 13 years old and still not fixed: to avoid problems with it, application code should be protected against invalid auto-increment values.

Minimizing failover latency


After sharding, another problem surfaced. The mysqlfailover configuration used an Amazon VPC secondary IP address as a virtual IP to route application traffic to the master node. But detecting a node failure and switching traffic via this virtual IP took about 10 minutes, and all requests during that period were, of course, rejected.

Therefore, a smoother failover had to be implemented: something like the way HAProxy can hold requests while an application service restarts, only between the application and MySQL.

One could use HAProxy combined with a key-value store, or MySQL clustering tools such as Orchestrator or Galera. In this case, the company chose the simplest available solution, ideally one that adds no extra services to maintain.

ProxySQL


That solution was ProxySQL, a new open-source tool. Roughly speaking, it can be thought of as HAProxy tailored for SQL traffic. After Nylas switched to ProxySQL, the wait during node failover dropped from 10 minutes to 10 seconds.

Failover without ProxySQL (downtime: 10 minutes)

Failover with ProxySQL (downtime: 10 seconds)

Other advantages of working with ProxySQL also came to light in the process.


Nylas engineers run ProxySQL locally alongside each service that connects to the database, including internal management tools and scripts. Using the proxy in local mode rather than as a centralized service provides built-in redundancy and minimizes the likelihood of proxy-induced latency. As a result, shards can be moved dynamically between database instances, with cluster downtime of no more than 12 seconds.

Migrating data to the new platform


In March 2016, the company migrated shard 0 from RDS to self-managed MySQL on EC2 using replication. This made it possible to widen the primary keys of the largest table from 32 to 64 bits, which would have been unthinkable on RDS. The 32-bit key space (about 4 billion rows) would have been exhausted within a couple of months.

Plans


Nylas's cloud infrastructure continues to evolve. With the database layer updated, it is time to update the application layer itself. The current plan includes splitting the synchronization logic into several microservices and reworking the processing topology around a primary data source such as Kafka. There is also an option to denormalize the schemas of the API storage layer to make further development of the application more flexible.



Source: https://habr.com/ru/post/282798/

