
How to find out whether a MySQL query needs optimizing?

I am often asked the same question: how do I find queries that need to be optimized? Looking at a pt-query-digest report, it is easy to spot the slow queries, or the queries that put the most load on the system - but how do we know whether a given query can actually be made faster? A complete answer requires deeper analysis, since there are many ways to optimize a query. However, there is one very useful metric you can apply: the ratio between the number of rows the query returns and the number of rows it examines.
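
(A side note, not part of the original workflow: on MySQL 5.6 and later you can get the same ratio without parsing the slow query log at all, from the statement digest table in performance_schema, assuming statement digests are enabled. A minimal sketch:)

-- requires MySQL 5.6+ with performance_schema statement digests enabled
select digest_text,
       count_star,
       sum_rows_sent,
       sum_rows_examined,
       sum_rows_examined / greatest(sum_rows_sent, 1) as examined_per_sent
from performance_schema.events_statements_summary_by_digest
order by examined_per_sent desc
limit 10;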



Let's take an example:



# Time: 120911 17:09:44
# User@Host: root[root] @ localhost []
# Thread_id: 64914  Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 9.031233  Lock_time: 0.000086  Rows_sent: 0  Rows_examined: 10000000  Rows_affected: 0  Rows_read: 0
# Bytes_sent: 213  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 12F03
use sbtest;
SET timestamp=1347397784;
select * from sbtest where pad='abc';


The query in this case returned 0 rows (there were no matches), but to find that out it had to examine 10 million rows. What would the preferable scenario look like? The query would examine only as many rows as it eventually returns. In this case, if I add an appropriate index to the table, I get the following entry in the slow query log (all queries are being captured in the log here):



# Time: 120911 17:18:05
# User@Host: root[root] @ localhost []
# Thread_id: 65005  Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 0.000323  Lock_time: 0.000095  Rows_sent: 0  Rows_examined: 0  Rows_affected: 0  Rows_read: 0
# Bytes_sent: 213  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 12F14
SET timestamp=1347398285;
select * from sbtest where pad='abc';
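
The article does not show the index that was added; a plausible version, given the WHERE clause on pad (the index name here is my own), is simply:

-- assumption: a plain secondary index on pad is what turns the full scan above into an index lookup
alter table sbtest add index idx_pad (pad);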


Rows_examined = 0 here, which matches Rows_sent, meaning the query is about as well optimized as it can be. Note that this does not mean no data access happened at all: the index lookup is still performed, but since only rows that are found and passed up to the MySQL layer for processing are counted, Rows_examined stays at zero.

It may seem that everything is very simple, but that would be a hasty conclusion. This arithmetic only works for queries without aggregate functions or GROUP BY, and only for queries that touch exactly one table. What about queries that involve more than one table?


# Time: 120911 17:25:22
# User@Host: root[root] @ localhost []
# Thread_id: 65098  Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 0.000234  Lock_time: 0.000063  Rows_sent: 1  Rows_examined: 1  Rows_affected: 0  Rows_read: 1
# Bytes_sent: 719  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 12F1D
SET timestamp=1347398722;
select * from sbtest a,sbtest b where a.id=5 and b.id=a.k;

mysql> explain select * from sbtest a,sbtest b where a.id=5 and b.id=a.k;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | a     | const | PRIMARY,k     | PRIMARY | 4       | const |    1 |       |
|  1 | SIMPLE      | b     | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
2 rows in set (0.00 sec)


In this case we are actually joining two tables, but because the access type is "const", MySQL does not count this as accessing two tables. With a "real" join access path, the output looks like this:



# Time: 120911 17:28:12
# User@Host: root[root] @ localhost []
# Thread_id: 65099  Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 0.000273  Lock_time: 0.000052  Rows_sent: 1  Rows_examined: 2  Rows_affected: 0  Rows_read: 1
# Bytes_sent: 719  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 12F23
SET timestamp=1347398892;
select * from sbtest a,sbtest b where a.k=2 and b.id=a.id;

+----+-------------+-------+--------+---------------+---------+---------+-------------+------+-------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref         | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-------------+------+-------+
|  1 | SIMPLE      | a     | ref    | PRIMARY,k     | k       | 4       | const       |    1 |       |
|  1 | SIMPLE      | b     | eq_ref | PRIMARY       | PRIMARY | 4       | sbtest.a.id |    1 |       |
+----+-------------+-------+--------+---------------+---------+---------+-------------+------+-------+
2 rows in set (0.00 sec)


In this case we have 2 rows examined per row sent, which is what you would expect, since the query uses 2 (logical) tables. The rule also breaks down for queries with GROUP BY:



# Time: 120911 17:31:48
# User@Host: root[root] @ localhost []
# Thread_id: 65144  Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 5.391612  Lock_time: 0.000121  Rows_sent: 2  Rows_examined: 10000000  Rows_affected: 0  Rows_read: 2
# Bytes_sent: 75  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 12F24
SET timestamp=1347399108;
select count(*) from sbtest group by k;


This query returns only 2 rows, even though it examines 10 million, and there is no simple way to optimize it, because scanning all of those rows really is necessary to compute the grouped result.

For queries like this, mentally strip away the GROUP BY and the aggregate functions. The query then becomes "select * from sbtest", which would return all 10 million rows - so, by this measure, there is no easy optimization to be had.
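
To make the mental transformation concrete, here are the two statements side by side (both are taken from the example above):

-- the aggregate query from the slow log entry above
select count(*) from sbtest group by k;

-- the same query with GROUP BY and the aggregate stripped, used to estimate A
select * from sbtest;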

This method is not meant to give you a clear "yes or no" answer, but it does give you an idea of how much optimization headroom a query has. Suppose a query examines 1000 rows via an index and returns 10: I may still be able to reduce the number of rows it examines by a factor of 100 - for example, by adding a combined (composite) index.
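
As an illustration only - the table and column names below are hypothetical, not taken from the examples in this article - this is what "adding a combined index" looks like when a query filters on two columns but only one of them is indexed:

-- hypothetical schema: customer_id is indexed, status is not,
-- so MySQL examines ~1000 rows per customer to return ~10
select * from orders where customer_id = 42 and status = 'shipped';

-- a composite index covering both predicates narrows the scan to roughly the rows actually returned
alter table orders add index idx_customer_status (customer_id, status);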



So, briefly - how can you quickly decide whether a query is worth optimizing?

- see how many rows the query returns once GROUP BY, DISTINCT and aggregate functions are stripped away (A)

- take the number of rows examined divided by the number of tables in the join (B)

- if B is less than or equal to A, the query is "perfect"

- if B / A is 10 or more, the query is one of the prime candidates for optimization.
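
Applied to the numbers above: a single-table query that examines 1000 rows and returns 10 gives A = 10 and B = 1000 / 1 = 1000, so B / A = 100 - well past the threshold of 10 and clearly worth investigating. The GROUP BY example, by contrast, gives A = 10,000,000 (once the aggregation is stripped) and B = 10,000,000 / 1 = 10,000,000, so B / A = 1 and the query is already doing close to the minimum amount of work.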



This is a simple method, and it can safely be used together with pt-query-digest, since the latter reports not only averages but also extreme values.




Source: https://habr.com/ru/post/151560/


