This post is not about indexes, query plans, triggers for building aggregates, or other general techniques for optimizing queries and database structure. It is also not about the optimal values of the innodb_ settings, although reading it may help you understand what some of them actually mean. This post focuses on InnoDB and how it works.
What problems can this post help solve?
- What to do when the process list is full of SELECTs that nothing seems to be blocking?
- What to do when everything is well tuned, queries fly by like rockets and the process list is almost always empty, yet the server shows a high load average and queries get a bit slower, say 500 ms instead of 100 ms?
- How to scale the system quickly when there is no way to redesign everything?
- What if it is a commercial project in a competitive environment and the problem must be solved immediately?
- Why does the same query sometimes run fast and sometimes slowly?
- How to build a fast cache and keep it up to date?
How we usually work with the database
The typical flow looks roughly like this:
- A query arrives
- A query plan is built
- Index lookup
- Fetching data from the tables
- Sending the data to the client
Even if you never issue START TRANSACTION, each of your individual queries is effectively a transaction consisting of a single statement. As you know, transactions have an isolation level, and in MySQL the default is REPEATABLE READ. What does that mean for us? It means that as soon as you "touch" any table inside a transaction, its version is fixed at that moment and you stop seeing changes made by other transactions. The longer your query or transaction runs, the more "old" data MySQL has to keep around, and there is good reason to believe this happens with active use of the main memory pool. So each of your innocuous SELECTs joining ten tables on primary keys starts to have a rather serious side effect once the database is under active load. In PostgreSQL, as in Oracle, the default isolation level is READ COMMITTED, which behaves much more simply than REPEATABLE READ. Keep in mind that with READ COMMITTED you will have to use row-based replication. You can easily observe the isolation levels yourself by connecting two clients to the database and running SELECT, DELETE and UPDATE on the same table. This is also the answer to the question about hanging SELECTs: try changing the isolation level of the database, it may help.
How InnoDB works with data
InnoDB stores data on the hard disk in pages. When a page is needed, it is loaded into RAM, and all further work happens there, whether it is reading a record or anything else. This memory is the InnoDB buffer pool, whose size you set with innodb_buffer_pool_size. The scheme is quite classic and there is nothing unusual about it. You can inspect InnoDB activity like this:
SHOW VARIABLES like 'Innodb%';
SHOW GLOBAL STATUS like 'Innodb%';
SHOW ENGINE INNODB STATUS;
In total, the time spent on reading from or writing to the database breaks down into:
- Time to load data into memory from hard disk
- Time to process data in memory
- Time to write data back to disk if required (note that not all data is flushed to disk immediately; the important thing is that the change has been recorded in the log)
It should be intuitive that if the data is already in the InnoDB buffer pool at the moment of the request, it is not loaded from disk, which greatly reduces the time of any database operation.
The fastest setup is one where all data and indexes fit comfortably in the pool, so the database effectively always works from memory. Since version 5.6.5 it is even possible to save the contents of the pool across a restart, avoiding a cold start.
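A minimal sketch of the settings involved, assuming MySQL 5.6 or newer (the pool size is purely illustrative):
# my.cnf
innodb_buffer_pool_size = 8G
innodb_buffer_pool_dump_at_shutdown = ON  # remember which pages were in the pool
innodb_buffer_pool_load_at_startup = ON   # warm the pool up again after restart
The same can be triggered by hand at runtime:
SET GLOBAL innodb_buffer_pool_dump_now = ON;
SET GLOBAL innodb_buffer_pool_load_now = ON;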
Now let's consider a slightly different scenario, where the amount of data on disk exceeds the size of the memory pool. Say the pool holds 4 pages, [0,0,0,0], and there are 16 data pages, 1..16. Users most often request pages 15 and 16, since they contain the latest data, so those pages are always in memory. Obviously everything works as fast as in the case described above.
And now the unfortunate case: 2 pages are actively requested by users, while 8 more are constantly used by internal scripts and various daemons. The four buffer slots become the object of a constant struggle, which turns into endless reads from disk and a slowdown for users, since the daemons usually behave much more aggressively.
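To check whether you are in this situation, compare how often InnoDB satisfies a read from the pool with how often it has to go to disk (these are standard status counters; the exact "bad" ratio depends on your workload):
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests'; -- logical reads, served from memory
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';         -- reads that missed the pool and hit the disk
-- If the second counter keeps growing fast relative to the first,
-- the working set does not fit and pages are being evicted all the time.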
In this situation, setting up replication to one more MySQL server can help: it can take over some of the queries and reduce the struggle for the memory pool. But as you know, MySQL replication has a significant drawback: changes are applied in a single thread. Under certain conditions the slave will either start to lag or give you a rather insignificant performance gain. Here the ability to create slaves with a limited set of tables can help; it lets you win both on applying changes and on the use of the memory pool. In many cases, when you know which data users request most often, you can build a cache for it that stores data only from the required tables; in a good case you get a cache that keeps itself up to date automatically. For those interested in how to spin up another slave quickly, I suggest looking in the direction of:
STOP SLAVE;
SHOW SLAVE STATUS; -- note Master_Log_File and Exec_Master_Log_Pos
START SLAVE;
After you load the data into the new slave server, all that remains is:
CHANGE MASTER TO ... ;
START SLAVE;
Of course, everything depends on the amount of data, but as a rule such a slave can be brought up fairly quickly.
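For illustration only, the final step might look like this (the host, user, file name and position are hypothetical; substitute the Master_Log_File and Exec_Master_Log_Pos values you recorded earlier):
CHANGE MASTER TO
  MASTER_HOST = 'db-master.example.com', -- hypothetical master address
  MASTER_USER = 'repl',                  -- hypothetical replication user
  MASTER_PASSWORD = '...',
  MASTER_LOG_FILE = 'mysql-bin.000042',  -- from Master_Log_File
  MASTER_LOG_POS = 120;                  -- from Exec_Master_Log_Pos
START SLAVE;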
Decomposition of the system into modules using partial replication
So, we can create partial replicas of the main database, which lets us control how memory is distributed between particular groups of data. What opportunities does this open up?
As you can find out empirically, nothing prevents you from creating your own tables on the slave server, and even adding foreign keys from them to the replicated data. In other words, you can have not only a complete main database, but also slaves with an extended set of tables. For example, your master database contains a users table and various auxiliary tables such as payments. You also have a blog service that lets users publish posts. You replicate users to another database, in which you create a posts table. If the database containing posts comes under a high read load, you create replicas containing the users and posts tables. You can keep decomposing this way until the size of the required set of tables exceeds reasonable limits; at that point it is time to look at sharding the huge tables, for example by a hash of the user ID, and routing requests to the right workers via MQ.
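A minimal sketch of what such a partial replica's configuration might look like, assuming the master schema is called main (replicate-do-table and replicate-wild-do-table are standard slave options; the database and table names here are just examples):
# my.cnf on the blog replica
server-id = 12
replicate-do-table = main.users          # only the users table arrives via replication
# replicate-wild-do-table = main.user_%  # or a pattern, if there are several such tables
# the posts table is created locally on this slave and is never touched by replication
The locally created posts table can then reference the replicated users table with a foreign key, exactly as described above.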
Results
MySQL provides a simple replication mechanism that can be restricted to writing data only to the specified tables. This opens up ample opportunities for deploying additional services that carry integral parts of the database.
UPD. If the database handles a large number of small transactions, you can try changing the value of innodb_flush_log_at_trx_commit; this will reduce the load on the hard drive, but be careful: in theory it can lead to data loss. The effect of this variable is easy to observe, since you can set it via SET GLOBAL.
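A minimal example (with a value of 2 the log is written on every commit but fsynced only about once a second, so an OS crash can cost you up to a second of transactions):
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit'; -- check the current value
SET GLOBAL innodb_flush_log_at_trx_commit = 2;        -- relax durability, reduce fsync load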
UPD2. Successful decomposition of the system lets you vary the settings of individual servers depending on the tasks they perform. For example, on the payments server you can have SERIALIZABLE and innodb_flush_log_at_trx_commit = 1, while on the posts server you use READ COMMITTED and innodb_flush_log_at_trx_commit = 2. It all depends on the tasks performed, how critical the data is, and the probability of failure.