MySQL performance

This article was inspired by this trilogy: one, two, three. I wanted to add my $0.02 on the use of tricks and features.

To keep my thoughts from wandering, in this first post I will take example queries from a single open-source product. The "tricks" used here also work on PostgreSQL, Oracle, SQLite and DB2 and are not MySQL-specific, although the optimization is aimed primarily at MySQL InnoDB:

SELECT h.hostid, hg.groupid, h.name
FROM hosts h, hosts_groups hg
WHERE (hg.groupid IN ('4'))
  AND hg.hostid = h.hostid
  AND hg.groupid BETWEEN 000000000000000 AND 099999999999999
  AND h.status IN (0, 1)
  AND EXISTS (
    SELECT hh.hostid
    FROM hosts hh, hosts_groups hgg, rights r, users_groups ug
    WHERE hh.hostid = h.hostid
      AND hh.hostid = hgg.hostid
      AND r.id = hgg.groupid
      AND r.groupid = ug.usrgrpid
      AND ug.userid = 3
      AND r.permission >= 3
      AND NOT EXISTS (
        SELECT hggg.groupid
        FROM hosts_groups hggg, rights rr, users_groups gg
        WHERE hggg.hostid = hgg.hostid
          AND rr.id = hggg.groupid
          AND rr.groupid = gg.usrgrpid
          AND gg.userid = 3
          AND rr.permission < 3
      )
  )
ORDER BY h.name ASC
LIMIT 1001


The situation: there is a logged-in user who belongs to a user group, and there are hosts that belong to host groups. Access rights are granted to user groups on host groups.
The goal: fetch the hosts the user has access to.
The first thing I want to draw attention to is the EXISTS condition, a construct that is rarely seen in practice. EXISTS checks whether its subquery returns at least one row. This design lets you influence which indexes are used in the subquery independently of the main query, and on success the subquery stops at the first matching row it finds. Such control over indexes is often needed when the query also sorts, because MySQL cannot use one index on a table for searching and a different one for sorting.
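The behavior of EXISTS can be sketched with Python's built-in sqlite3 module (the article notes these tricks also work on SQLite). The four tables below are a hypothetical, heavily simplified miniature of the schema in the first query; host names, group ids and permission values are made up for illustration. A plain join returns a host once per matching group, while EXISTS only asks whether a matching row exists, and NOT EXISTS drops hosts that also sit in a group with a lower permission.

```python
import sqlite3

# Hypothetical miniature of the permission schema used in the first query
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hosts (hostid INTEGER PRIMARY KEY, name TEXT, status INTEGER);
CREATE TABLE hosts_groups (hostid INTEGER, groupid INTEGER);
CREATE TABLE rights (groupid INTEGER, id INTEGER, permission INTEGER);
CREATE TABLE users_groups (usrgrpid INTEGER, userid INTEGER);

INSERT INTO hosts VALUES (1, 'db01', 0), (2, 'web01', 0), (3, 'mail01', 0);
-- db01 is in host groups 10 and 11, mail01 in 10 and 13, web01 in 12
INSERT INTO hosts_groups VALUES (1, 10), (1, 11), (2, 12), (3, 10), (3, 13);
-- user group 100: permission 3 on host groups 10 and 11, only 2 on group 13
INSERT INTO rights VALUES (100, 10, 3), (100, 11, 3), (100, 13, 2);
INSERT INTO users_groups VALUES (100, 3);
""")

# Plain join: one output row per (host, granting group) pair
joined = [name for (name,) in conn.execute("""
    SELECT h.name
    FROM hosts h
    JOIN hosts_groups hg ON hg.hostid = h.hostid
    JOIN rights r ON r.id = hg.groupid
    JOIN users_groups ug ON ug.usrgrpid = r.groupid
    WHERE ug.userid = 3 AND r.permission >= 3
    ORDER BY h.name
""")]

# EXISTS / NOT EXISTS: each host at most once; the subqueries only test for a row
allowed = [name for (name,) in conn.execute("""
    SELECT h.name
    FROM hosts h
    WHERE EXISTS (
        SELECT 1 FROM hosts_groups hg
        JOIN rights r ON r.id = hg.groupid
        JOIN users_groups ug ON ug.usrgrpid = r.groupid
        WHERE hg.hostid = h.hostid AND ug.userid = 3 AND r.permission >= 3)
    AND NOT EXISTS (
        SELECT 1 FROM hosts_groups hg
        JOIN rights r ON r.id = hg.groupid
        JOIN users_groups ug ON ug.usrgrpid = r.groupid
        WHERE hg.hostid = h.hostid AND ug.userid = 3 AND r.permission < 3)
    ORDER BY h.name
""")]

print(joined)   # ['db01', 'db01', 'mail01']
print(allowed)  # ['db01']
```

The join needs a DISTINCT to remove the duplicate db01 row, and still returns mail01; the EXISTS form needs neither extra step.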

When the query runs against a single table, the following trick sometimes helps:

CREATE TABLE events (
  eventid bigint unsigned NOT NULL,
  source integer DEFAULT '0' NOT NULL,
  object integer DEFAULT '0' NOT NULL,
  objectid bigint unsigned DEFAULT '0' NOT NULL,
  clock integer DEFAULT '0' NOT NULL,
  value integer DEFAULT '0' NOT NULL,
  acknowledged integer DEFAULT '0' NOT NULL,
  ns integer DEFAULT '0' NOT NULL,
  value_changed integer DEFAULT '0' NOT NULL,
  PRIMARY KEY (eventid)
) ENGINE=InnoDB;
CREATE INDEX events_1 ON events (object, objectid, eventid);
CREATE INDEX events_2 ON events (clock, eventid);


EXPLAIN
SELECT eventid, clock, value
FROM events
WHERE objectid = 17131
  AND object = 0
  AND clock >= 1325635327
ORDER BY clock DESC, eventid DESC;


+----+-------------+--------+------+-------------------+----------+---------+-------------+--------+-----------------------------+
| id | select_type | table  | type | possible_keys     | key      | key_len | ref         | rows   | Extra                       |
+----+-------------+--------+------+-------------------+----------+---------+-------------+--------+-----------------------------+
|  1 | SIMPLE      | events | ref  | events_1,events_2 | events_1 | 12      | const,const | 113056 | Using where; Using filesort |
+----+-------------+--------+------+-------------------+----------+---------+-------------+--------+-----------------------------+


Here MySQL decided that the events_1 index is the better choice for the WHERE clause. That is understandable, since WHERE uses two of its key columns, but the optimizer did not take into account that the result contains more than 100k rows, all of which then have to be sorted (Using filesort).
In such a case we can change the condition on the first column of the index MySQL selected:

EXPLAIN
SELECT eventid, clock, value
FROM events
WHERE objectid = 17131
  AND object + 0 = 0
  AND clock >= 1325635327
ORDER BY clock DESC, eventid DESC;


+----+-------------+--------+-------+---------------+----------+---------+------+--------+-------------+
| id | select_type | table  | type  | possible_keys | key      | key_len | ref  | rows   | Extra       |
+----+-------------+--------+-------+---------------+----------+---------+------+--------+-------------+
|  1 | SIMPLE      | events | range | events_2      | events_2 | 4       | NULL | 113056 | Using where |
+----+-------------+--------+-------+---------------+----------+---------+------+--------+-------------+


MySQL cannot use a regular index on a column that is wrapped in an expression. By applying a no-op arithmetic operation (object + 0), we hide events_1 from the optimizer and force it to use events_2, which is suitable for both selecting and sorting the rows; as EXPLAIN shows, Using filesort is gone.
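The same trick can be tried on SQLite, which the article lists among the databases where it works, via Python's sqlite3 module. The schema is a trimmed, hypothetical version of events; the table is empty and not analyzed, so the planner relies on its default estimates and its choices may differ from MySQL's. On the plain query SQLite is likely to pick events_1 and add a sort step; with object + 0 it can no longer use events_1 for that term and falls back to events_2, read in reverse, so no separate sort (USE TEMP B-TREE FOR ORDER BY) is needed. The plan() helper is an illustrative name.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    eventid INTEGER PRIMARY KEY,
    object INTEGER NOT NULL DEFAULT 0,
    objectid INTEGER NOT NULL DEFAULT 0,
    clock INTEGER NOT NULL DEFAULT 0,
    value INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX events_1 ON events (object, objectid, eventid);
CREATE INDEX events_2 ON events (clock, eventid);
""")

QUERY = """SELECT eventid, clock, value FROM events
WHERE objectid = 17131 AND {object_cond} AND clock >= 1325635327
ORDER BY clock DESC, eventid DESC"""

# Plain equality on object: events_1 is usable for the WHERE clause
plain = plan(conn, QUERY.format(object_cond="object = 0"))
# object + 0 is an expression, so events_1 cannot be used for that term
hinted = plan(conn, QUERY.format(object_cond="object + 0 = 0"))

print("plain: ", plain)
print("hinted:", hinted)
```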

Note that MySQL cannot sort using the index if the ORDER BY columns are sorted in different directions:

EXPLAIN
SELECT eventid, clock, value
FROM events
WHERE objectid = 17131
  AND object + 0 = 0
  AND clock >= 1325635327
ORDER BY clock ASC, eventid DESC;


+----+-------------+--------+-------+---------------+----------+---------+------+--------+-----------------------------+
| id | select_type | table  | type  | possible_keys | key      | key_len | ref  | rows   | Extra                       |
+----+-------------+--------+-------+---------------+----------+---------+------+--------+-----------------------------+
|  1 | SIMPLE      | events | range | events_2      | events_2 | 4       | NULL | 113056 | Using where; Using filesort |
+----+-------------+--------+-------+---------------+----------+---------+------+--------+-----------------------------+
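This can be reproduced on SQLite through Python's sqlite3 module. The table is a hypothetical two-column version of events, and the query is trimmed to columns the index can cover, so the planner's choice depends only on the sort order. With both index columns ascending, the mixed ORDER BY still needs a sort step (USE TEMP B-TREE); an index declared as (clock ASC, eventid DESC) matches the requested order. MySQL 8.0 added descending indexes for the same purpose; earlier versions parse DESC in an index definition but ignore it. The index name events_2_mixed is made up.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (eventid INTEGER NOT NULL, clock INTEGER NOT NULL)")

QUERY = ("SELECT eventid, clock FROM events WHERE clock >= 1325635327 "
         "ORDER BY clock ASC, eventid DESC")

# Both index columns ascending: the mixed ORDER BY still needs a sort step
conn.execute("CREATE INDEX events_2 ON events (clock, eventid)")
same_direction = plan(conn, QUERY)

# Index whose column directions match the ORDER BY: no separate sort
conn.execute("DROP INDEX events_2")
conn.execute("CREATE INDEX events_2_mixed ON events (clock ASC, eventid DESC)")
matching_direction = plan(conn, QUERY)

print(same_direction)
print(matching_direction)
```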


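Likewise, the index is not used for sorting if the ORDER BY columns appear in a different order than in the index. MySQL stores indexes as B-trees ordered by the leftmost column first, so an index can only be searched or sorted by a leftmost prefix of its columns; conditions or sorting on columns from the middle or end of the index fail, unless the preceding columns are fixed by equality conditions in WHERE.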
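The leftmost-prefix rule can be illustrated with a toy model in Python: a composite index on (object, objectid, eventid) represented as a sorted list of tuples and searched with the standard bisect module. This is a sketch of the ordering idea only, not of how InnoDB lays out its pages, and the entries are made up. Entries sharing a leading prefix form one contiguous run that binary search can find, while a condition on objectid alone matches entries scattered across the list.

```python
import bisect

# Hypothetical composite index on (object, objectid, eventid),
# modeled as a sorted list of tuples, ordered the way a B-tree orders its keys
index = sorted([
    (0, 17131, 5), (0, 17131, 9), (0, 20000, 3),
    (1, 17131, 7), (1, 500, 2),
])

def prefix_lookup(index, prefix):
    """Binary-search the contiguous run of entries whose leading columns equal prefix."""
    lo = bisect.bisect_left(index, prefix)
    # A tuple sorts before any longer tuple it is a prefix of, and
    # prefix + (inf,) sorts after every entry that starts with prefix
    hi = bisect.bisect_right(index, prefix + (float("inf"),))
    return index[lo:hi]

def scan_by_objectid(index, objectid):
    """objectid is not a leading column, so every entry must be examined."""
    return [entry for entry in index if entry[1] == objectid]

print(prefix_lookup(index, (0, 17131)))  # [(0, 17131, 5), (0, 17131, 9)]
print(scan_by_objectid(index, 17131))    # [(0, 17131, 5), (0, 17131, 9), (1, 17131, 7)]
```

Within the run for (0, 17131) the entries are already ordered by eventid, which is why equality conditions on the leading columns also let the next column be read in sorted order.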

When no single index can serve both filtering and sorting, I have seen MySQL misjudge which index to choose. This happens once the table grows past a certain number of rows: at that point it becomes cheaper, in the optimizer's view, to use an index for filtering rather than for sorting, or vice versa. Such switch points are hard to predict; they are found by running the same queries against tables with different numbers of rows.

Let's return to the first query. In my opinion, it is better to sort in the application script, provided the result is within a few thousand rows:
Firstly, without ORDER BY, MySQL does not have to read the whole result; it stops as soon as LIMIT is reached;
Secondly, you rarely need to display more rows than that at once, reports aside;
Thirdly, pages with that much data are slow; even a drop-down list with 1000+ options is no longer usable;
Fourthly, MySQL has no built-in natural sort order (one that puts host2 before host10).
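Natural sorting in the application script can be sketched in Python as follows. The key function is a common idiom, not something taken from the article, and the host names are made up; plain sorted() here stands in for a binary (byte-wise) ORDER BY, while MySQL's actual order depends on the column's collation.

```python
import re

def natural_key(s):
    """Split a string into text and digit runs so digit runs compare as numbers."""
    parts = re.split(r"(\d+)", s)
    # With a capturing group, odd positions hold the digit runs and even positions
    # the text between them, so text and numbers always line up by position
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

names = ["host10", "host2", "host1", "host20"]
print(sorted(names))                   # ['host1', 'host10', 'host2', 'host20']
print(sorted(names, key=natural_key))  # ['host1', 'host2', 'host10', 'host20']
```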

In practice, though, this is rarely possible: users need to be able to sort, and without ORDER BY, MySQL does not guarantee that the same rows come back in the same order. The alternative is to put the column most often used for sorting first in the index, so that MySQL can use a single index for both searching and sorting.
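The effect of putting the sort column first can be checked on SQLite with Python's sqlite3 module. The hosts table and the index name hosts_name are hypothetical; the index also includes status so that it covers the query (SQLite stores the rowid, here hostid, in every index). Without the index, SQLite has to collect the matching rows and sort them (USE TEMP B-TREE FOR ORDER BY); with it, SQLite can walk the index in name order and stop at the LIMIT.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (hostid INTEGER PRIMARY KEY, name TEXT, status INTEGER)")

QUERY = ("SELECT hostid, name FROM hosts WHERE status IN (0, 1) "
         "ORDER BY name LIMIT 1001")

# Without a suitable index the rows are collected and then sorted
before = plan(conn, QUERY)

# Hypothetical index that starts with the column used for sorting
conn.execute("CREATE INDEX hosts_name ON hosts (name, status)")
after = plan(conn, QUERY)

print(before)
print(after)
```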

Now about COUNT. Many developers issue a second query with COUNT for paginated output. Yet even Google, while claiming millions of matches, actually lets you browse only about the first thousand results, and on the last page reports that it found fewer than claimed. So by selecting 1001 rows we simply tell the user that 1000+ matches were found, with no need to fetch more at this stage. When the user asks for more, we fetch further results page by page, each time requesting one row more than needed to learn whether another page exists.
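A minimal sketch of the "one row more than needed" check, using Python's sqlite3 and a hypothetical hosts table of 2500 rows; fetch_page and PAGE_SIZE are illustrative names. Paging here uses LIMIT/OFFSET as one simple way to fetch page by page; the extra row is discarded and only tells us whether to show "1000+" or a "next page" link, so no COUNT query is needed.

```python
import sqlite3

PAGE_SIZE = 1000  # rows shown per page, matching the article's 1000+ cut-off

def fetch_page(conn, page_no, page_size=PAGE_SIZE):
    """Fetch one page plus one extra row; the extra row only signals that more exist."""
    rows = conn.execute(
        "SELECT hostid, name FROM hosts ORDER BY name, hostid LIMIT ? OFFSET ?",
        (page_size + 1, page_no * page_size),
    ).fetchall()
    return rows[:page_size], len(rows) > page_size

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (hostid INTEGER PRIMARY KEY, name TEXT)")
# Hypothetical data: 2500 hosts
conn.executemany("INSERT INTO hosts (name) VALUES (?)",
                 [(f"host{i:05d}",) for i in range(2500)])

rows, has_more = fetch_page(conn, 0)
print(f"{len(rows)}+" if has_more else len(rows))  # 1000+

rows, has_more = fetch_page(conn, 2)
print(f"{len(rows)}+" if has_more else len(rows))  # 500
```

ORDER BY name, hostid adds the primary key as a tiebreaker so that pages stay stable when names repeat.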

Indexes. As you query tables, their indexes are cached in memory and stay there until memory runs out, at which point they are evicted. If you have gigabytes of data, the indexes may take up roughly 40% of that space, depending on how many indexes there are. Say you have a modest server with 16 GB of RAM allocated to MySQL. A query against a table with a 10 GB+ index fills the memory allocated to MySQL with that index, and the previously cached indexes are evicted. So a single heavy query can wreck the performance of the whole server. What can you do? There are many options, none of them simple, and they boil down to moving large tables (10 million rows and up) to separate storage, for example BigTable, NoSQL, or even NoSQL for MySQL, etc.
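The eviction effect described above can be illustrated with a toy page cache that uses plain least-recently-used (LRU) replacement. This is a simplification: the page names and the capacity of four pages are made up, and InnoDB's buffer pool actually uses a midpoint-insertion variant of LRU (tunable through innodb_old_blocks_pct and innodb_old_blocks_time) that is meant to limit exactly this kind of damage from large scans.

```python
from collections import OrderedDict

class LRUCache:
    """Toy page cache with plain LRU replacement (not InnoDB's midpoint variant)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # least recently used first, most recent last

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)        # hit: mark as most recently used
        else:
            self.pages[page] = True             # miss: load the page
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict the least recently used page

cache = LRUCache(capacity=4)
for page in ["hot1", "hot2", "hot3"]:  # pages that everyday queries keep using
    cache.access(page)
for i in range(10):                    # one heavy query scanning a large index
    cache.access(f"big{i}")

print(list(cache.pages))  # ['big6', 'big7', 'big8', 'big9']
```

After the scan none of the frequently used pages are left, so the next ordinary queries have to read them from disk again.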

That's all for now. I will be glad to hear your solutions and advice on the above.

Source: https://habr.com/ru/post/135775/

