
Clustered index in InnoDB and query optimization

Clustered indexes in InnoDB and MySQL tables are written about quite often these days, yet this knowledge is rarely applied in practice.
In this article we will show two real-world examples of how we optimized fairly complex Badoo systems based on an understanding of how the clustered index works.

A clustered index is the way a table is organized in its data file. In InnoDB, data is stored in a tree, the same kind of B-TREE in which ordinary keys live; in fact, an InnoDB table is itself one big B-TREE, and the clustered index values serve as its keys. According to the documentation, the PRIMARY KEY is chosen as the clustered index. If there is no PRIMARY KEY, the first UNIQUE key is chosen. If there is no suitable UNIQUE key either, an internal 6-byte identifier is used.
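As a minimal illustration (the table and column names below are made up), here are the three cases:

  -- A PRIMARY KEY is defined: it becomes the clustered index.
  CREATE TABLE t1 (
      id INT NOT NULL AUTO_INCREMENT,
      payload VARCHAR(255),
      PRIMARY KEY (id)
  ) ENGINE = InnoDB;

  -- No PRIMARY KEY: the first UNIQUE key on NOT NULL columns is used as the clustered index.
  CREATE TABLE t2 (
      code INT NOT NULL,
      payload VARCHAR(255),
      UNIQUE KEY (code)
  ) ENGINE = InnoDB;

  -- Neither: InnoDB generates a hidden 6-byte row id and clusters the data by it.
  CREATE TABLE t3 (
      code INT,
      payload VARCHAR(255)
  ) ENGINE = InnoDB;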

What follows from this organization of data on disk?
  1. Inserting into the middle of the table can be slow because the key has to be rebuilt.
  2. Updating the clustered index value of a row forces the data to be physically moved on disk or leads to its fragmentation.
  3. For fast inserts, the clustered index value should be ever-increasing; an auto-increment field is optimal.
  4. Every row has a unique identifier: its clustered index value.
  5. Secondary keys simply refer to those unique identifiers.
  6. In effect, a secondary key of the form KEY `key` (a, b, c) has the structure KEY `key` (a, b, c, clustered_index).
  7. The data on disk is ordered by the clustered index (we leave SSDs aside for now).
Read more about this in the MySQL manual.
We will describe two types of optimization that helped significantly speed up our scripts.

Test environment


To reduce the impact of caching on the results, we add SQL_NO_CACHE to the queries and drop the file system cache before each query. And since we are interested in the worst case, when the data actually has to be pulled from disk, we restart MySQL before each query as well.
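For example, the test queries were run roughly like this (just a sketch; the actual benchmark scripts are linked below):

  SELECT SQL_NO_CACHE * FROM messages WHERE user1 = 1 AND user2 = 2 ORDER BY ts LIMIT 20 OFFSET 10000;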

Equipment:

The scripts we used can be found on GitHub.

Optimization of deep offset


For example, take the abstract table of messages, which contains user correspondence.

  CREATE TABLE messages ( 
             message_id int not null auto_increment, 
             user1 int not null, 
             user2 int not null, 
             ts timestamp not null default current_timestamp, 
             body longtext not null, 
             PRIMARY KEY (message_id), 
             KEY (user1, user2, ts) 
         ) ENGINE = InnoDB 

Consider this table in light of the listed features of InnoDB.
The clustered index here coincides with the PRIMARY KEY and is an auto-increment field. Every row has a 4-byte identifier. Inserting new rows into the table is optimal. The secondary key is effectively KEY (user1, user2, ts, message_id), and that is what we will use.

Let's add 100 million messages to the table; that is quite enough to expose the InnoDB behavior we are interested in. There are only 10 users in our test system, so each pair of interlocutors ends up with about a million messages on average.
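A minimal sketch of one way to generate that volume of data (this is not the script from our repository, just an illustration: after seeding a few rows, each run of this statement doubles the table):

  INSERT INTO messages (user1, user2, ts, body)
  SELECT
      FLOOR(1 + RAND() * 10),                    -- random user ids in 1..10
      FLOOR(1 + RAND() * 10),
      NOW() - INTERVAL FLOOR(RAND() * 365) DAY,  -- a random moment within the last year
      REPEAT('x', 100)                           -- dummy message body
  FROM messages;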

Suppose these 10 test users have exchanged a lot of messages and often re-read old correspondence: the interface lets them jump to a page with very old messages. Behind that interface is a simple query:

SELECT * FROM messages WHERE user1=1 and user2=2 order by ts limit 20 offset PageNumber*20

A perfectly ordinary query. Let's look at its execution time depending on the offset depth:
offset    execution time (ms)
100       311
1000      907
5000      3372
10000     6176
20000     11901
30000     17057
40000     21997
50000     28268
60000     32805


Surely many expected to see linear growth. But 33 seconds for 60 thousand records is too much! It is quite easy to explain what takes so long; it is enough to mention one peculiarity of the MySQL implementation: for this query MySQL reads offset + limit rows from disk and returns only limit of them. Now the picture is clear: all this time MySQL has been reading 60 thousand rows from disk that we do not need. What can be done in such a situation? Many different solutions suggest themselves; here, by the way, is an interesting article about the options.

We found a very simple solution: the first query selects only the clustered index values, and a second query then fetches the rows by those values. We know that the clustered index value sits at the end of the secondary key, so if we replace * with message_id in the query, we get a query that works off the key alone, and such a query is fast.

It was:
 mysql> explain select * from messages where user1 = 1 and user2 = 2 order by ts limit 20 offset 20000;
 +----+-------------+----------+------+---------------+-------+---------+-------------+--------+-------------+
 | id | select_type | table    | type | possible_keys | key   | key_len | ref         | rows   | Extra       |
 +----+-------------+----------+------+---------------+-------+---------+-------------+--------+-------------+
 |  1 | SIMPLE      | messages | ref  | user1         | user1 | 8       | const,const | 210122 | Using where |
 +----+-------------+----------+------+---------------+-------+---------+-------------+--------+-------------+
 1 row in set (0.00 sec)


It became:
 mysql> explain select message_id from messages where user1 = 1 and user2 = 2 order by ts limit 20 offset 20000;
 +----+-------------+----------+------+---------------+-------+---------+-------------+--------+--------------------------+
 | id | select_type | table    | type | possible_keys | key   | key_len | ref         | rows   | Extra                    |
 +----+-------------+----------+------+---------------+-------+---------+-------------+--------+--------------------------+
 |  1 | SIMPLE      | messages | ref  | user1         | user1 | 8       | const,const | 210122 | Using where; Using index |
 +----+-------------+----------+------+---------------+-------+---------+-------------+--------+--------------------------+
 1 row in set (0.00 sec)

Using index here means that MySQL can get all the required data from the secondary key alone and will not touch the clustered index at all. You can read more about this here.

Now all that remains is to fetch the row values themselves:
SELECT * FROM messages WHERE message_id IN (....)
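For reference, the same two-step idea can also be expressed as a single statement with a derived table (just a sketch; the measurements below were taken with the two separate queries described above):

  SELECT m.*
  FROM messages m
  JOIN (
      SELECT message_id
      FROM messages
      WHERE user1 = 1 AND user2 = 2
      ORDER BY ts
      LIMIT 20 OFFSET 20000
  ) AS ids USING (message_id)
  ORDER BY m.ts;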

Let's see how productive this solution is:
offset    execution time (ms)
100       243
1000      164
5000      213
10000     337
20000     618
30000     756
40000     971
50000     1225
60000     1477


The result suited everyone, so we decided not to look any further. Besides, it is not clear whether this data can be accessed in a fundamentally faster way without changing the way message history is handled. Note once again that the task was to optimize a specific query, not the data structure itself.

Optimization of large table update procedure


The second need for optimization arose when we had to collect up-to-date data about our users once a day into one large table. By then we had 130 million users. The script that walks over all our databases and gathers the new data finishes in half an hour and selects 30 million changed rows. Its output is tens of thousands of text files with serialized new values on disk, each containing information about hundreds of users.

Next we transfer the information from these text files into the database. We read the files sequentially, group the rows into batches of several thousand and run updates. The import script takes anywhere from 3 to 20 hours to run. Such behavior is, of course, unacceptable, and the process obviously needs to be optimized.
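Schematically, one batch looked something like this (a rough sketch with invented table and column names; the real script is more involved):

  BEGIN;
  UPDATE user_info SET data = '<serialized values>' WHERE user_id = 123456;
  UPDATE user_info SET data = '<serialized values>' WHERE user_id = 789012;
  -- ...a few thousand more rows from the same batch...
  COMMIT;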

The first suspect was "parasitic" load on the database server's disks, but numerous observations found no evidence for that hypothesis. We concluded that the problem was hidden in the depths of the database and we had to think about how to fix it. How does the data lie on disk? What do the OS, MySQL and the hardware have to do to update that data? While we were answering these questions, we noticed that the data is updated in the same order in which it was collected. That means each query updates a completely random place in this large table, which entails time lost on disk head positioning, file system cache misses and database cache misses.

Note that updating each row in MySQL consists of three stages: reading the values, comparing the old and new values, and writing the values. You can even see this from the fact that for every query MySQL reports how many rows matched and how many were actually changed.
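This is easy to observe in the mysql client: if the new value is equal to the old one, the row is matched but not changed (an illustrative example, not taken from our data):

  mysql> UPDATE messages SET body = body WHERE message_id = 42;
  Query OK, 0 rows affected (0.00 sec)
  Rows matched: 1  Changed: 0  Warnings: 0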

Then we looked at how many rows actually change in the table. Of the 30 million rows, only 3 million had changed (which is logical, since the table holds rather limited information about a user). This means that 90% of the time MySQL spends reading rather than updating. The solution suggested itself: we should check how much random access to the clustered index loses to sequential access. The result can then be generalized to updating the table (before a row is updated it is read and compared anyway).

The technique is extremely simple: measure the difference in speed of the query
SELECT * FROM messages where message_id in ($values)
passing an array of 10K elements as the values. To test random access, the element values are made completely random; to test sequential access, we take 10K consecutive values starting from some random offset.

  function getValuesForRandomAccess() {
      // 10K completely random ids from the 100M-row messages table
      $arr = array();
      foreach (range(1, 10000) as $i) {
          $arr[] = rand(1, 100000000);
      }
      return $arr;
  }

  function getValuesForSequentialAccess() {
      // 10K consecutive ids starting at a random offset
      $r = rand(1, 100000000 - 10000);
      return range($r, $r + 10000);
  }

Query execution time with random access and sequential:
N     random (ms)    sequential (ms)
1     38494          171
2     40409          141
3     40868          147
4     37161          138
5     38189          137
6     36930          134
7     37398          176
8     38035          144
9     39722          140
10    40720          146

As you can see, the difference in runtime is about 200x, so it is well worth fighting for. To optimize execution, we need to sort the source data by primary key. Can we quickly sort 30 million values spread over files? We certainly can!

After this optimization, the script's running time dropped to 2.5 hours. Pre-sorting the 30 million rows takes 30 minutes (and most of that time goes to gzip).

Same optimization, but on SSD


After this article was written we got hold of a spare SSD, on which we ran the same tests.

Query with a deep offset:
offset    execution time (ms)
100       117
1000      406
5000      1681
10000     3322
20000     6561
30000     9754
40000     13039
50000     16293
60000     19472

Optimized query with a deep offset:
offset    execution time (ms)
100       101
1000      21
5000      24
10000     32
20000     47
30000     94
40000     84
50000     95
60000     120

Comparison of random and sequential access:
Nrandomsequential
one5321118
25583118
35881117
four6167117
five6349120
66402126
76516125
eight6342124
96092118
ten5986120

These figures show that SSD, of course, has an advantage over a regular drive, but its use does not eliminate the need for optimization.

In conclusion: we managed to make data selection 20 times faster and the actual table update up to 10 times faster (the synthetic test sped up 200 times). Recall that the experiments were run on a system with caching disabled; the gain on the production system turned out to be less impressive (the cache smooths things over quite well).

The conclusion is plain: it is not enough to know the strengths and weaknesses of the software you work with; you have to be able to apply that knowledge in practice. Knowing the internals of MySQL sometimes lets you speed up queries tenfold.

Alexey alexxz Eremihin, Badoo developer

Source: https://habr.com/ru/post/135966/

