Introduction
MySQL is a rather controversial product. On the one hand, it is unmatched in speed on the simplest operations and queries. On the other hand, its optimizer is so immature (not to say underdeveloped) that it loses outright on complex queries.
First of all, let us narrow the scope to "wide" and large tables: say, up to 10M records and up to 20 GB in size, with a large number of varied queries against them. If your table holds many millions of 100-byte records and only five simple query shapes ever run against it, this article is not for you.
NB: we consider the MySQL InnoDB / Percona engine, hereinafter simply MySQL. Most queries are not very complex, so it is very important to know how to build an index for a given query and/or modify the query so that it uses the indexes that already exist. We will look at how the optimizer picks an index for ordinary queries (select_type = simple), without joins, subqueries, and unions.
We set aside the trivial case of very small tables, for which the optimizer often uses type = all (a full scan) regardless of available indexes - for example, a classifier with 40 records. MySQL does have an algorithm for combining several indexes (index merge), but it fires rather rarely, and only without an order by. Practically the only reliable way to get index merge is a selection with conditions on different columns combined with OR, as sketched below.
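A hedged sketch of the shape that can produce it; the table and the two single-column keys k_a and k_b are hypothetical:
select * from tableName where A = 1 or B = 2;
-- explain may then show type = index_merge and Extra: Using union(k_a,k_b)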
Another aside: the reader is assumed to be already familiar with explain. The optimizer often rewrites the query slightly, so to understand why an index was or was not used, run
explain extended select xxx;
and then
show warnings;
which will show the query as modified by the optimizer.
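As a hedged toy example (table and column names hypothetical), a constant condition gets folded away by the rewrite:
explain extended select A from tableName where A = 1 and 1 = 1;
show warnings;
-- the Message column holds the rewritten query, roughly:
-- select A from tableName where (A = 1)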
Covering index - from fat tables to indexes
So, the task: suppose we have a fairly simple query that runs quite often but, for such a frequent call, is relatively slow. Let us consider the strategy of driving our query down to using index, the fastest access type.
Why using index? Yes, MySQL has only B-tree indexes, but it tries hard to keep indexes in memory (and even builds adaptive hash indexes on top of them) - this is what gives MySQL its fabulous performance edge over other databases. Moreover, the optimizer will often prefer an index that is already loaded into memory, even if it is not the best one, over a better index that sits on disk (for type = index / range). From here, several conclusions:
- Overly heavy indexes are evil. Either they will not be used because they are not yet in memory, or loading them will push other indexes out.
- If the size of an index is comparable to the size of the table, or the combined set of indexes used by the various frequent queries substantially exceeds the server's memory, no significant optimization will be achieved (a quick way to measure this is sketched after this list).
- A nuance: indexing or sorting by a TEXT column dooms you to permanent using filesort.
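One hedged way to compare data and index sizes is via information_schema; the schema and table names here are placeholders:
select data_length, index_length
from information_schema.tables
where table_schema = 'mydb' and table_name = 'tableName';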
One subtle point that is easy to forget: MySQL builds only clustered indexes. A secondary index does not store the absolute position of the record in the table; it (conditionally) stores the record's primary key, through which the record itself is then fetched. And MySQL, without further ado, avoids a second level of indirection by simply widening every key to include the primary key. Thus, if your table has primary key (ID), key (A, B, C), then in reality your second key is not (A, B, C) but (A, B, C, ID). Hence the moral: a fat primary key is evil.
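A minimal sketch of what this means in practice; the table and column names are hypothetical:
create table t (
    ID bigint not null,
    A int not null,
    B int not null,
    C int not null,
    primary key (ID),
    key k_abc (A, B, C)    -- physically stored as (A, B, C, ID)
) engine = InnoDB;
-- a query touching only A, B, C, ID can be served entirely from k_abc (using index)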
It is worth pointing out how query caching differs between databases. Whereas PostgreSQL / Oracle cache query plans (like prepare, for some timeout), MySQL simply caches the query string (parameter values included) and stores the query result. That is, if you run
select AAA from BBB where CCC=DDD
several times in a row, then, as long as DDD contains no nondeterministic functions and table BBB has not changed (in the sense of the isolation level in use), the result will be taken straight from the cache. A pretty controversial improvement.
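A hedged illustration of this string-level caching, reusing the schematic names above (the literal values are arbitrary):
select AAA from BBB where CCC = 5;                -- computed, result cached together with the query string
select AAA from BBB where CCC = 5;                -- byte-identical repeat: served from the cache
select AAA from BBB where CCC = 6;                -- different literal: a cache miss
select sql_no_cache AAA from BBB where CCC = 5;   -- explicitly bypasses the query cache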
So let us assume we are not simply issuing the same query over and over: query parameters change, and the tables change too. The best option then is a covering index. Which index will be covering?
- First, look at the order by clause. The index used must begin with the same columns as those mentioned in order by, in the same or the completely reversed sort order. If the order is neither direct nor fully reversed, the index cannot be used. One more thing: MySQL still does not support indexes with mixed sort directions; an index is always asc. So if you have order by A asc, B desc, say goodbye to using index.
- The columns being retrieved must be present in the covering index. Very often this condition is impossible to meet, because the index would grow without bound, and fat indexes, as we know, are evil. There is a workaround: a self join, that is, splitting the query into row selection and data retrieval. First we select only the primary key column under the given condition (the primary key, as shown above, is always present in a secondary index), and then join that result to a selection of all the required columns by this primary key. The first select then runs as a clean using index, and the second as eq_ref (essentially a series of const lookups). We get something like:
select a.AAA, a.BBB, a.CCC, a.DDD from tableName as a join tableName as b using (PK) where <conditions on table b>
- Next, the where clause. In the worst case we can scan the entire index (type = index), but whenever possible we should aim for conditions that stay within type = range (>, >=, <, <=, like 'xxx%', and so on). The index used must include all fields from where, otherwise using index is lost. As noted above, you can try index_merge, but with complex conditions it is often simply not possible. A sketch combining these points follows below.
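Putting the three points together, here is a hedged sketch of the row-selection / data-retrieval split with an order by and a range condition; all table, column, and key names are hypothetical, assuming primary key (id) and key (created_at). It is the derived-table variant of the self join shown above:
select a.id, a.title, a.body
from items as a
join (
    select id                            -- using index: reads only the key (created_at, id)
    from items
    where created_at >= '2023-01-01'     -- type = range
    order by created_at                  -- matches the index order, no filesort
    limit 100
) as b using (id);                       -- outer lookup is eq_ref by primary key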
Actually, that is all that can be done when we have only one query shape. Unfortunately, even when a covering index exists, the MySQL optimizer will not always choose it to execute the query. In that case the optimizer has to be helped along with the standard use index / force index hints.
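When it comes to that, a minimal hedged example of pinning an index (the key name k_cover is hypothetical):
select A, B from tableName use index (k_cover) where A = 1;      -- a suggestion to the optimizer
select A, B from tableName force index (k_cover) where A = 1;    -- stronger: a table scan is assumed very expensive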
Isolating fat fields from the covering index - from fat to thin indexes
But what to do when we have queries of several shapes, or different sort orders are required, and fat fields (varchar) are involved? Just estimate the size of an index over a varchar(100) field across a million records. And what if that field is used in queries of different shapes, each of which would want its own covering index? Can we keep in memory just ONE index over this fat field, while preserving the same (or almost the same) performance across the different queries? Hence the last point.
- Fat and thin fields. Obviously, having several DIFFERENT keys that include fat fields is an unaffordable luxury. So, where possible, we should aim to have only one key that starts with the fat field. Here it is appropriate to substitute the condition artificially: replace the condition on the fat field with a join against the result of that condition. For example, for
select A from tableName where A=1 or fatB='test'
instead of creating the key key (fatB, A), we create a thin key key (A) and a fat key key (fatB), and rewrite the query as follows:
create temporary table tmp as select PK from tableName where fatB='test';
select A from tableName left join tmp using (PK) where A=1 or tmp.PK is not null;
As a result, we can have many thin keys for different queries and only one fat key on the fatB field: a real memory saving at almost full performance.
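For concreteness, a hedged end-to-end sketch of this layout; the key names are hypothetical:
alter table tableName
    add key k_fat (fatB),   -- the only index touching the fat varchar
    add key k_a (A);        -- a thin key, reusable by other query shapes

create temporary table tmp as
    select PK from tableName where fatB = 'test';   -- resolved entirely from k_fat (using index)

select A
from tableName
left join tmp using (PK)
where A = 1 or tmp.PK is not null;                  -- only thin indexes remain in play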
Task for self-study
The task: create the minimum number of keys (minimum in terms of memory) and optimize queries of the form:
select A,B,C,D from tableName where A=1 and B=2 or C=3 and D like 'test%';
select A,C,D from tableName where B=3 or C=3 and D='test' order by B;
Assume the queries are not reducible to type = range.
Bibliography
- High Performance MySQL, 2nd Edition: Optimization, Backups, Replication, and More. Baron Schwartz, Peter Zaitsev, Vadim Tkachenko, Jeremy D. Zawodny, Arjen Lentz, Derek J. Balling. O'Reilly Media, June 2008. 712 pages.
- www.mysqlperformanceblog.com