At the request of workers decided to write another article on query optimization in MySQL.
The last article on
habrahabr.ru/blogs/mysql/38907 addressed optimization issues of LIMIT, GROUP BY, COUNT.
In this article, I will return to the above and describe a couple of examples that I encountered recently on the project, after that I will give a few more small examples about what is good and what is bad in MySQL.
Returning to the topic of large LIMIT requests. For example, we have such a request, which pulls out of the database the pictures and user names that these pictures are laid out, and we pull this information for pictures with ID> 2500 and we are interested in the results from 5000 to 5100. (In this example, you should pay attention to request structure).
So, what should our query do in this case:
1. Of the many pages of `tx_localrep_images` as tli
2. From them we select those that satisfy the condition tli.uid> 2500
3. Further, because we have a LEFT JOIN, then the number of results (records) in the resulting sample does not depend on the gluing, therefore MySQL could well do the LIMIT on the sample that we have at this point (obtained after step 2) and only after that start gluing with the `fe_users` table. BUT MySQL does NOT do this!
And he does the following: he does the gluing first, and only then cuts off the first “unnecessary” 5000 records !!!
')
SELECT
SQL_NO_CACHE tli.`.uid`, tli.`caption`, fe`.username`
FROM
`tx_localrep_images` as tli
LEFT JOIN
`fe_users` as fe ON` tli`. cruser_id = fe.uid
WHERE
tli.uid> 2500
LIMIT
5000, 100;
* This source code was highlighted with Source Code Highlighter .
This is not the behavior that we expected, but it does not matter! There is a very simple solution to this problem with DERIVED TABLE. Here such a request is performed much faster (the difference in execution speed is directly proportional to the number of entries in LIMIT)
SELECT
SQL_NO_CACHE tli.`.uid`, tli.`caption`, fe`.username`
FROM
(
SELECT
tli.` uid`, tli.`caption`, tli.`cruser_id`
FROM
`tx_localrep_images` as tli
WHERE
tli.uid> 2500
LIMIT
5000, 100
) as tli
LEFT JOIN
`fe_users` as fe ON` tli`. cruser_id = fe.uid;
* This source code was highlighted with Source Code Highlighter .
After that, I decided to experiment a little how MySQL optimizer works with queries of the form SELECT (*)
Accordingly, there is such a standard query.
SELECT
SQL_NO_CACHE count (*)
FROM
`tx_localrep_images` as tli
WHERE
tli.uid> 500;
* This source code was highlighted with Source Code Highlighter .
There is such a less standard. I ask the distinguished community not to write that the LEFT JOIN in this query does not make sense, because does not affect the number of records in the resulting sample. This is clear to everyone ... everyone except MySQL optimizer :-), which will properly perform the gluing, and then it will calculate the number of records in it.
SELECT
SQL_NO_CACHE count (*)
FROM
`tx_localrep_images` as tli
LEFT JOIN
`fe_users` as fe ON` tli`. cruser_id = fe.uid
WHERE
tli.uid> 500;
* This source code was highlighted with Source Code Highlighter .
Therefore, this request for a relatively small data set (50,000 rows) is executed 20 times slower.
COUNT (*) vs COUNT (column_name)
Noticed more than once that many believe that COUNT (*) is a COUNT alias (column_name). This is absolutely not true.
First, these queries can return different results. This can occur when the column column_name may contain NULL values. Those. The COUNT construction (column_name) returns the number of records with column_name IS NOT NULL.
Secondly, these requests are executed at different speeds. For example, such a request
SELECT
SQL_NO_CACHE count (tli.` type`)
FROM
`tx_localrep_images` as tli
WHERE
tli.uid> 500;
* This source code was highlighted with Source Code Highlighter .
can perform much longer
SELECT
SQL_NO_CACHE count (*)
FROM
`tx_localrep_images` as tli
WHERE
tli.uid> 500;
* This source code was highlighted with Source Code Highlighter .
because a query using COUNT (*), if there is an index across the tli.uid field, will use the covering index and, accordingly, it will be executed very quickly. The first query will look for the COUNT method ROW SCAN, which is what “Extra: USING WHERE” says in the EXPLAIN of this query.
In fact, it is possible to make the first query use a covering index. To do this, add the following index to this table.
alter table `tx_localrep_images` add key cover_key (uid, type);
* This source code was highlighted with Source Code Highlighter .
But IMHO, rather than introducing additional indexes, for a single query that uses them, and which take up too much disk space and what is most noticeable in the cache, and also slows down our data changes (UPDATE, INSERT, DELETE), it’s better to use COUNT (*)
Fields declared as DEFAULT NULL
Also if I made a reservation and cited as an example saying that a column can be declared as DEFAULT NULL.
I’ll say right away that it’s better not to declare such columns at all if you don’t initially plan to store NULL values. There are many tasks where there is no need to store NULL values, however, many create tables containing NULL values. The storage of NULL values ​​complicates the work of the internal MySQL mechanism, namely, working with indexes on these fields and maintaining statistics on indexes, as well as comparing values ​​on such fields.
It is a number of reasons why there is a fixed-size index. It can be a null. Consider using zero, a special value, or an empty string instead. It’s not a problem. However, if you’re planning to index the columns, avoid them if possible. © - High Performance MySQL, Second Edition
The length of the field INT (length)
A little distracted, and I will say a few words about data storage, in general in this article I was not going to talk about it. but now there was a conversation with one grief optimizer ... Accordingly, I will add a couple of lines and about this, maybe among the community there are those who repeat his mistakes ...
As you know, it is better to select data types for columns that are minimal enough to store the entire range of values ​​in it. Those. sometimes people use int where they can calmly use tiny int, etc.
In MySQL, when creating tables, you can specify the length of the fields, something like `column` int (10) UNSIGNED NOT NULL
So, I found the creation of tables in which the field length values ​​were different from 1 to 20. As it turned out, some people think that MySQL will allocate such an amount of memory to store the field in order to accommodate its maximum value. Maybe badly explained. I will show an example. Take this ad:
`column` int (10) UNSIGNED NOT NULL
well, some think that mysql will do the following
1. create a number of 10 9-k (maximum number), i.e. 999999999
2. find the degree of 2-ki minimum sufficient (ie, the minimum of the large) to store this number, that is, in our case
2 ^ 30 = 1073741824
3. This value is rounded up to bytes, i.e. up to 32 = 4 bytes
In general, the whole algorithm described above has nothing to do with real life!
In fact, for MySQL, under the INT (1) field, the same amount of memory will be allocated for storage as under INT (20).
Because This value is used to display values, i.e. size of reserved space. For example, when we execute requests through the command-line interface, etc.
Storing IP addresses in the database.
By the way, I have already moved away from the topic of the article, so I’ll bring the last trick, which many people actually use, except for beginners, but for them I will write it.
Very often I see that people store an IP address in the database using the VARCHAR data type (15), this is very uneconomic, moreover, it works rather slowly in searches on the range
To store IP addresses in MySQL, there are 2-fii.
The first is INET_ATON
Allows you to convert a string consisting of 4 numbers separated by dots into an INT UNSIGNED value.
This is done according to this algorithm.
SELECT INET_ATON ('XYZJ');
X * 256 ^ 3 + Y * 256 ^ 2 + Z * 256 ^ 1 + J * 256 ^ 0
The INET_NTOA function performs the inverse conversion (from a number, to the usual IP address type separated by dots).
Accordingly, all we need is to create a UNSIGNED INT field in the table that will hold the number converted by the INET_ATON function.
Accordingly, if we want to, say, perform a search and retrieve all IP addresses from a subnet, say 255.255.0.0
Then you can perform such a request
SELECT
ip
FROM
`ips`
WHERE
`ips`.ip> INET_ATON ( '255.255.0.0' )
* This source code was highlighted with Source Code Highlighter .
That if there is an index on the ip field, it will be rather fast.
And the last thing is always think what the optimal type of data you can choose to store the field you need!
For example, to store a
subnet mask, some will choose VARCHAR (15), some will choose INT and use the INET_ATON, INET_NTOA functions.
But the correct options in this case is to select the TINY INT field to store the number of units.
On this probably finish. I'm tired of writing.
Shl. Leave an opinion in the comments whether you are interested or not. If not, write so: “Spare us, golden antelope! Enough! ”
ZZY. Thanks to the habr-community for advice on the layout of articles and the presentation of the material. In particular,% hlomzik% and% maxshopen%