
MySQL Performance: Real-Life Tips and Tricks. Part 3.

I decided to continue this series of notes. In this article I want to give a special place to MySQL query profiling: describe the tools MySQL provides for profiling and what needs to be done to identify a query's bottlenecks.

Also, after the publication of the first two articles, I received a number of reviews and questions related to database design, indexing, and querying. I tried to answer most of them, and I will share some of those answers in this article.


Query profiling


Let's start with the standard query-profiling tools that MySQL provides:
  1. EXPLAIN
  2. SHOW STATUS
  3. PROFILING

EXPLAIN

Using the EXPLAIN command, MySQL shows an approximate execution plan of a query (a description of all the fields EXPLAIN returns can be found at dev.mysql.com/doc/refman/5.0/en/using-explain.html ). In this definition the key word is approximate, since EXPLAIN can be wrong :-) The reason for these errors is that generating the execution plan is itself part of query execution, and sometimes the plan is built dynamically, depending on the data. With EXPLAIN, MySQL tries to simulate the execution of the query but does not "touch" the data, and so has no access to that dynamic component. MySQL makes this estimate relying mainly on index statistics, which in turn should always be kept up to date. Depending on how intensively the database receives insert/update/delete queries, you should periodically run ANALYZE TABLE from cron (say, every night): it refreshes the index statistics and keeps the index tree from degenerating into a linked list, which can happen if we insert, say, already-ordered data; a record lookup would then cost O(n) rather than O(log n). Note, though, that you should not run ANALYZE TABLE carelessly on a production server, because for its duration MySQL locks the table: a read lock for MyISAM, a write lock for InnoDB. The same caution applies to OPTIMIZE TABLE, which defragments the data file and is useful after deleting a large number of records from a table.
EXPLAIN also gives far from all the information needed to assess the speed of a query. For example, if temporary tables are used during the query, MySQL will not show whether they are in-memory or on-disk temporary tables. Nor will EXPLAIN show the cost of write operations or of functions used in the query.
The main thing I want to note in this section is that query performance depends on the number of records MySQL examined, not on the number of records the query returned.
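As a sketch, the nightly maintenance mentioned above could look like this (the table name is a placeholder; remember the table locks these commands take):

```sql
-- Refresh index statistics so the optimizer's (and EXPLAIN's) estimates stay accurate.
-- Locks the table: read lock for MyISAM, write lock for InnoDB.
ANALYZE TABLE `tx_articles`;

-- Defragment the data file; useful after deleting a large number of records.
OPTIMIZE TABLE `tx_articles`;
```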

SHOW STATUS

MySQL has long provided monitoring of system variables via the SHOW STATUS command. Before version 4.1 this command showed the state of global variables only; since 4.1 it can also show the state of session variables: SHOW SESSION STATUS. (The manual for this command is at dev.mysql.com/doc/refman/5.0/en/show-status.html .)
Unlike EXPLAIN, which is based on heuristic estimates, this command shows directly WHAT happened after the query was executed, i.e. the number of records MySQL physically accessed (and through it you can find out how many of those came from memory and how many required disk access). This number is not an estimate but an actual count: MySQL increments a counter every time it touches the next row. SHOW STATUS returns a huge amount of statistics, and naturally I will not describe all of it; I will show the main points to look at when analyzing it.

In order to get statistics on the query of interest to us, you first need to run the command:

FLUSH STATUS: this command resets all session status variables; global variables are not affected.

After that, execute the query we are interested in:

SELECT bla-bla-bla FROM test_table as tt WHERE tt.bla = 'test' LIMIT 10

mysql> SHOW STATUS LIKE 'Key_read%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Key_read_requests | 96882 |
| Key_reads         | 36200 |
+-------------------+-------+

What can this information give us, and what decisions can we make based on it? Let's try to analyze the obtained values and decide on an optimization based on these statistics.

In this case we see that more than a third of the index blocks are read from disk rather than from the cache; an adequate optimization is therefore to increase the key_buffer_size parameter.
But keep the following in mind as well: key_buffer_size is one of the most important configuration parameters, especially if you use MyISAM tables; in that case its value should be roughly 25-30% of the free RAM. Also take care not to make this parameter needlessly large (for example, if the total size of your .MYI index files is 1GB and key_buffer_size = 4GB, you are simply wasting memory). Note too that on 32-bit platforms the maximum value of this parameter is 4GB; as of MySQL 5.0.52, larger values are allowed on 64-bit platforms (except 64-bit Windows, where large values are truncated with a warning; see dev.mysql.com/doc/refman/5.0/en/server-system-variables.html ). Even if you do not use MyISAM tables, key_buffer_size should still be set to 32-64MB, since it will be used by the indexes of internal temporary tables (which are MyISAM).
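As a rough sketch, the miss rate discussed above can be computed straight from the two counters (the figures are the ones from the output above; the 128MB value is only an illustration):

```sql
-- Key_reads / Key_read_requests = fraction of index-block reads that went to disk.
-- With the numbers above: 36200 / 96882, i.e. roughly 37% of reads missed the cache.
SELECT 36200 / 96882 AS key_cache_miss_ratio;

-- key_buffer_size is a dynamic variable and can be raised at runtime (in bytes):
SET GLOBAL key_buffer_size = 128 * 1024 * 1024;
```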

So, what other useful statistics can be obtained from more complex queries? Take this one, for example:

FLUSH STATUS;

SELECT SQL_NO_CACHE
    fe.username, COUNT(*) AS `count`
FROM
    `tx_images` AS ti
INNER JOIN
    `fe_users` AS fe ON ti.cruser_id = fe.uid
GROUP BY
    fe.uid
ORDER BY
    `count` DESC;


mysql> SHOW SESSION STATUS LIKE 'Select%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| Select_full_join       | 0     |
| Select_full_range_join | 0     |
| Select_range           | 0     |
| Select_range_check     | 0     |
| Select_scan            | 2     |
+------------------------+-------+

The last field shows that MySQL did a FULL TABLE SCAN; this is confirmed by EXPLAIN, whose Extra field contains Using temporary; Using filesort.
If our query joins several tables, other values in this list may be non-zero as well.

mysql> SHOW SESSION STATUS LIKE 'Sort%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Sort_merge_passes | 0     |
| Sort_range        | 0     |
| Sort_rows         | 598   |
| Sort_scan         | 1     |
+-------------------+-------+
4 rows in set (0.00 sec)

Again, I refer you to the manual for a description of these fields.


It is also useful to know whether the temporary tables used during query execution were created on disk or in memory.

mysql> SHOW SESSION STATUS LIKE 'Created%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 0     |
| Created_tmp_files       | 0     |
| Created_tmp_tables      | 3     |
+-------------------------+-------+
3 rows in set (0.00 sec)

Here MySQL shows that in-memory temporary tables were used, only for some reason it reports that 3 were created :-) In fact, only one is created for this query.
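If Created_tmp_disk_tables had been non-zero, the first thing worth checking (a sketch; both variables exist per the MySQL manual) is the pair of limits that bound an in-memory temporary table:

```sql
-- An in-memory temporary table may grow up to the smaller of these two values;
-- exceeding it silently converts the table to an on-disk MyISAM table.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';
```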

Here I will make a small lyrical digression.
After the publication of the first two articles, I received a couple of letters with questions and suggestions. Judging by those, a rather common problem is flaws in the architectural design that drag SELECT performance down badly.
For example, suppose you have a table containing articles, news, etc., with approximately the following structure:

CREATE TABLE `tx_articles` (
  `uid` int(11) NOT NULL auto_increment,
  `pid` int(11) NOT NULL default '0',
  `tstamp` int(11) NOT NULL default '0',
  `crdate` int(11) NOT NULL default '0',
  `cruser_id` int(11) NOT NULL default '0',
  `deleted` tinyint(4) NOT NULL default '0',
  `hidden` tinyint(4) NOT NULL default '0',
  `headline` varchar(70) NOT NULL default '',
  `bodytext` text NOT NULL,
  `type` int(11) NOT NULL default '0',
  `citycat_id` int(11) NOT NULL default '0',
  `editable` int(11) NOT NULL default '0',
  `confirm` int(11) NOT NULL default '0',
  `confirm_code` varchar(64) NOT NULL default '',
  `editorspick` int(11) NOT NULL default '0',
  `newspaper_id` int(11) NOT NULL default '0',
  PRIMARY KEY (`uid`),
  KEY `parent` (`pid`),
  KEY `citycat_id` (`citycat_id`, `tstamp`),
  KEY `newspaper_id` (`newspaper_id`),
  KEY `tstamp` (`tstamp`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;


It is also assumed that the lion's share of SELECT queries will be addressed to this table. Moreover, as a rule, such a table may contain a fairly large number of columns, some of which, as in the case above, have the TEXT or BLOB data type (`bodytext` text NOT NULL), as well as variable-length fields (VARCHAR). All this complicates MySQL's work, especially in operations that sort or group records. Imagine that each record is on average 1KB long: when MySQL sorts or groups such records, the server will simply "choke", shuffling such large volumes of data around each time.
And this is not the only drawback of the approach. Pay special attention to the following: for temporary tables MySQL uses the HEAP engine, which works only with fixed-length fields, i.e. it turns every VARCHAR into a CHAR of the maximum length specified in the VARCHAR declaration. This is done, as you understand, for fast record lookup. But HEAP cannot work with TEXT and BLOB fields at all, so MySQL will convert the temporary table to MyISAM and the data will end up on disk, which is very slow.
Therefore, such an architecture should be avoided in large databases with many hits that require fast response times.

What is proposed instead? In some cases it may be useful to use covering indexes and a self-join; this example was described in my first article ( http://habrahabr.ru/blogs/mysql/38907/ ), so I will not repeat it :-)

Architecturally, the more correct approach in this case is to move the variable-length fields that, as a rule, do not appear in WHERE conditions (TEXT, BLOB, and, depending on the task, possibly VARCHAR) into a separate table. We then apply the main selection conditions, which as a rule involve fixed-length indexed fields, to one table, apply LIMIT on it as well, and only after that join in the additional fields (in my example, `bodytext`) from the other table. Such a join is performed by PRIMARY KEY.
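As a sketch (the `tx_articles_text` table and the reduced column set are hypothetical illustrations, not the author's exact schema), the split described above might look like this:

```sql
-- Narrow "hot" table: fixed-length, indexed fields plus a short headline.
CREATE TABLE `tx_articles` (
  `uid` int(11) NOT NULL auto_increment,
  `tstamp` int(11) NOT NULL default '0',
  `citycat_id` int(11) NOT NULL default '0',
  `headline` varchar(70) NOT NULL default '',
  PRIMARY KEY (`uid`),
  KEY `citycat_id` (`citycat_id`, `tstamp`)
) ENGINE=MyISAM;

-- Wide table holding the variable-length payload, 1:1 by PRIMARY KEY.
CREATE TABLE `tx_articles_text` (
  `uid` int(11) NOT NULL,
  `bodytext` text NOT NULL,
  PRIMARY KEY (`uid`)
) ENGINE=MyISAM;

-- Filter, sort, and LIMIT on the narrow table first, then join the payload in.
SELECT t.uid, t.headline, txt.bodytext
FROM (SELECT uid, headline
      FROM tx_articles
      WHERE citycat_id = 5
      ORDER BY tstamp DESC
      LIMIT 20) AS t
JOIN tx_articles_text AS txt ON txt.uid = t.uid;
```

This way the sort/group/LIMIT work is done over short fixed-length rows, and the heavy TEXT data is fetched only for the few rows that survive the LIMIT.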

PROFILING

Since version 5.0.37, MySQL includes the ability to profile queries. This facility writes query execution statistics to the service database information_schema.

In order for MySQL to start recording statistics for a query, set the profiling variable to 1:

mysql> set profiling = 1;
Query OK, 0 rows affected (0.00 sec)



After that, execute the SQL query (or queries) we are interested in, and then run the following command:

mysql> show profiles;
+----------+------------+------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                  |
+----------+------------+------------------------------------------------------------------------+
|        0 | 0.00005100 | set profiling = 1                                                      |
|        1 | 0.80246730 | SELECT SQL_NO_CACHE fe.username, COUNT(*) as `count` FROM `tx_images` as ti INNER JOIN `fe_users` as fe ON ti.cruser_id = fe.uid GROUP BY fe.uid ORDER BY `count` desc |
+----------+------------+------------------------------------------------------------------------+
2 rows in set (0.00 sec)


As I said earlier, these statistics are recorded in the information_schema database, in the profiling table, so we can get execution statistics for a query by querying that table:

mysql> select sum(duration) from information_schema.profiling where query_id = 1;
+---------------+
| sum(duration) |
+---------------+
|    0.80246730 |
+---------------+
1 row in set (0.00 sec)


You can then see statistics for every stage of the query's execution; this is done with the following command:

mysql> show profile for query 1;
+--------------------+------------+
| Status             | Duration   |
+--------------------+------------+
| (initialization)   | 0.00007300 |
| Opening tables     | 0.00005100 |
| System lock        | 0.00000600 |
| Table lock         | 0.00002000 |
| init               | 0.00002200 |
| optimizing         | 0.00003400 |
| statistics         | 0.00010600 |
| preparing          | 0.00014800 |
| executing          | 0.50000700 |
| Sending data       | 0.30226800 |
| end                | 0.00000700 |
| query end          | 0.00000500 |
| freeing items      | 0.00001300 |
| closing tables     | 0.00000700 |
| logging slow query | 0.00000400 |
+--------------------+------------+
15 rows in set (0.00 sec)

Profiling is not limited to SELECT queries: you can view statistics even for queries that change the table structure (ALTER TABLE) and that modify or delete data. The execution stages of such queries will, accordingly, be different.

This kind of profiling also lets you monitor CPU usage at each stage of query execution, swap activity, etc.

Slow query log


A very useful tool for finding bottlenecks and slow queries is the slow query log, enabled with the --log-slow-queries option when starting MySQL. Before MySQL 5.1.21, the minimum value of the long_query_time parameter is 1, the default is 10, and the value must be an integer. If a query's execution time exceeds this value (in seconds), it is written to the log. Note that a query landing in the slow query log does not necessarily mean it is a bad query; it may simply have waited a long time for a lock to be released. Another useful option is log-queries-not-using-indexes, which logs queries that do not use indexes.
This is very useful information, but if we have a large project and a loaded database, we need tools to monitor and visualize such statistics. I personally use MONyog.
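A my.cnf sketch for the 5.0-era options discussed above (the log path and threshold are placeholders, not recommendations):

```
[mysqld]
log-slow-queries = /var/log/mysql/mysql-slow.log
long_query_time  = 2
log-queries-not-using-indexes
```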

Editorial responses :-)


Now I will answer a couple of questions that I did not have time to address earlier.
During the discussion of the second article, I argued in the comments ( http://habrahabr.ru/blogs/mysql/39260/#comment_941339 ) with habrauser juks about the relevance of using FORCE INDEX in queries.

Consider, for example, the following table structure:

CREATE TABLE `some_table` (
  `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `field1` INT(10) NOT NULL,
  `field2` VARCHAR(50) NOT NULL default '',
  `field3` TIMESTAMP,
  `field4` VARCHAR(255) default '',
  PRIMARY KEY (`id`),
  KEY fields1_2 (field1, field2(8)),
  KEY fields2_3 (field1, field3)
) ENGINE=InnoDB DEFAULT CHARSET="UTF8";


After that, fill it with data; we can use, for example, a PHP script like this:

<?php
set_time_limit(0);

mysql_connect("127.0.0.1", "root", "root_password");
mysql_select_db("database_name");

// Generate a random alphanumeric string of 10-50 characters.
function random_string() {
    $string = "ABCDEFGHIKJLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    $len = strlen($string);
    $return = '';
    $length = rand(10, 50);
    for ($i = 0; $i < $length; $i++) {
        $return .= $string[rand(0, $len - 1)];
    }
    return $return;
}

$names = array("maghamed", "maghamed1", "maghamed2", "maghamed3", "maghamed4");

mysql_query("ALTER TABLE `some_table` DISABLE KEYS"); // stop updating non-unique indexes

for ($i = 0; $i < 10000000; $i++) {
    $a = $i % 1000;
    $b = $names[rand(0, 4)];
    $c = random_string();
    $sql = "INSERT INTO `some_table` SET field1 = {$a}, field2 = '{$b}', field4 = '{$c}'";
    mysql_query($sql) or die("Invalid query: " . mysql_error());
}

mysql_query("ALTER TABLE `some_table` ENABLE KEYS");
?>


The code in this script is intuitive. I will only draw attention to the "ALTER TABLE `some_table` DISABLE KEYS" before inserting a large number of records and the "ALTER TABLE `some_table` ENABLE KEYS" after the insertion. These directives significantly speed up the script and are generally useful in similar situations. "ALTER TABLE ... DISABLE KEYS" stops updating non-unique indexes (in our case, fields1_2 and fields2_3) while new records are inserted. MySQL then rebuilds the indexes with a special algorithm that is much faster than updating them after each inserted record, so disabling indexes before a large data insertion should give a significant speedup.

Now consider this query against the table:

mysql> EXPLAIN
    -> SELECT *
    -> FROM `some_table`
    -> WHERE field1 = 50 AND field2 = 'maghamed'
    -> ORDER BY field3 DESC
    -> LIMIT 100\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: some_table
         type: ref
possible_keys: fields1_2,fields2_3
          key: fields1_2
      key_len: 30
          ref: const,const
         rows: 3042
        Extra: Using where; Using filesort
1 row in set (0.05 sec)


As you can see, in this case MySQL prefers the fields1_2 index, which gives a fast lookup, but the sorting is performed without an index, as the Extra field (Using where; Using filesort) tells us.

In this case, we can force MySQL to choose the index with which the query executes faster:

mysql> explain
    -> SELECT *
    -> FROM `some_table`
    -> FORCE INDEX (`fields2_3`)
    -> WHERE field1 = 50 AND field2 = 'maghamed'
    -> ORDER BY field3 DESC
    -> LIMIT 100;
+----+-------------+------------+------+---------------+-----------+---------+-------+------+-------------+
| id | select_type | table      | type | possible_keys | key       | key_len | ref   | rows | Extra       |
+----+-------------+------------+------+---------------+-----------+---------+-------+------+-------------+
|  1 | SIMPLE      | some_table | ref  | fields2_3     | fields2_3 | 4       | const | 1928 | Using where |
+----+-------------+------------+------+---------------+-----------+---------+-------+------+-------------+
1 row in set (0.03 sec)


* Let me note that MySQL may choose the `fields2_3` index even without FORCE INDEX; it depends on the MySQL version and on how current the index statistics are. This is why it is important to keep them current (ANALYZE TABLE `some_table`). In this case, however, you should not force MySQL to choose between a fast lookup and an index-based sort at all, since both variants will execute more slowly than a variant with a composite index over all 3 fields.

First, remove the old index:

DROP INDEX `fields1_2` ON `some_table`;


And add a new one.

CREATE INDEX `fields1_2_3` ON `some_table` (field1, field2(8), field3);
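After rebuilding the index, it is worth re-running EXPLAIN to confirm the filesort is gone (a sketch; the expected outcome below is the author's claim, not a guaranteed plan):

```sql
ANALYZE TABLE `some_table`;  -- refresh index statistics after changing indexes

EXPLAIN
SELECT *
FROM `some_table`
WHERE field1 = 50 AND field2 = 'maghamed'
ORDER BY field3 DESC
LIMIT 100\G
-- The expectation is that key shows fields1_2_3 and Extra no longer mentions
-- "Using filesort"; if it still does, the field2(8) prefix may be preventing
-- the index from fully satisfying the ORDER BY.
```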





Also, after the first article, where I proposed using the construction GROUP BY BINARY crc32(tag_text) for fast grouping (GROUP BY) of results, I received quite a few questions and doubts about this variant: extra time is spent computing crc32, so many believe it will be inefficient, and some suggested using MD5 instead of CRC32 to reduce the number of possible collisions between the function's results for different values.

I will answer all such doubts with this small benchmark.

SET @input := 'hello world';
SELECT BENCHMARK(1000000, CRC32(@input));

1 row in set (0.22 sec)

SET @input := 'hello world';
SELECT BENCHMARK(1000000, MD5(@input));

1 row in set (6.02 sec)

As you can see, the CRC32 function runs very quickly.
* BENCHMARK executes the specified function (CRC32 or MD5 in this case) the specified number of times (1,000,000).
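For reference, the construction in question looks like this (a sketch; `tags` and `tag_text` are hypothetical names standing in for the schema from the first article):

```sql
-- Grouping by the 4-byte CRC32 value is cheaper than grouping by a long
-- VARCHAR; BINARY forces a byte-wise (case-sensitive) comparison.
SELECT tag_text, COUNT(*) AS cnt
FROM tags
GROUP BY BINARY crc32(tag_text);
```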

With that I will finish the third article; it already came out longer than planned. I will try to answer all the questions.
If the community is not yet tired of my articles, then in the next one on this topic I want to talk about the problem of scaling large tables and other MySQL delicacies :-)

Source: https://habr.com/ru/post/39818/

