
Web project optimization strategy using MySQL

Introduction


In the life of any large web project (especially one in PHP, though this really applies to any server-side language used for web development) there usually comes a moment when you realize that "you can't go on living like this": the time has come to optimize the site so that it stops being slow (at least in production).

Interestingly, even heavy frameworks (like Symfony or RoR) in "slow" languages usually perform acceptably in production; the main "brakes" come from SQL queries and sloppy caching. For example, initializing the fairly complex and large configuration of a Symfony project takes about 80 ms, while page execution time sometimes reaches a second or more.

If you have determined that this is your case and your project runs on MySQL, this article can help you take concrete measures, lock in the results, and prevent obvious problems with the DBMS from reappearing later.

Identify bottlenecks


To begin with, the real benefit of optimizing your MySQL usage will not always be noticeable, so before anything else it is worth making sure that the problems are really caused by MySQL. It is hard to give general advice here, but in any case you should first measure, in your DBMS driver, the total query execution time as a percentage of the page execution time (in the production environment). If this percentage is around 50 or higher, then optimizing your work with MySQL is more than justified.
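The measurement described above can be sketched as a thin wrapper around the DBMS driver that accumulates time spent in queries. A minimal sketch in Python (the article's stack is PHP, but the idea is language-agnostic); `run_query` and the class name are illustrative, not a real driver API:

```python
import time

class TimedDb:
    """Wraps a driver's query function and accumulates total query time.

    `run_query` stands in for whatever call your real driver exposes;
    the names here are illustrative assumptions, not a real API.
    """
    def __init__(self, run_query):
        self._run_query = run_query
        self.total_query_time = 0.0

    def query(self, sql, *params):
        start = time.perf_counter()
        try:
            return self._run_query(sql, *params)
        finally:
            self.total_query_time += time.perf_counter() - start

def query_time_share(db, page_time):
    """Fraction of the page execution time spent inside the DBMS."""
    return db.total_query_time / page_time
```

At the end of the page you compare `query_time_share(db, page_time)` against the 0.5 threshold mentioned above.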

What and how to measure


As a rule, even the "worst" (performance-wise) frameworks and/or CMSs have some tools for tracking DBMS performance problems; usually, at the very least, the number of queries per page is counted. Just in case, I will say it: more than 100 queries per page is bad :), and that is the place to start.

Let us assume that there are not that many queries, yet the page is still monstrously slow, and it is not at all clear what exactly has the greatest impact.

Measuring query execution times


Of course, the simplest and moderately effective way to track down "slow" queries is to measure the execution time of each query and print those times on the page (and from them conclude which queries are "braking"). The problem is that this approach only works reasonably well in the production environment, where the DBMS is properly configured and the MySQL server has enough memory to keep I/O to a minimum; that is, where the query execution time really depends only on the query's complexity and the CPU time it needs, with a minimum of external factors affecting performance. And if the query cache is enabled, simply repeating the same query will give completely different execution times (unless the query uses functions from the stop list).

If you use this approach in the dev environment, queries often "slow down" not because they are complex, but simply because of random external load or plain I/O. You can, of course, refresh the page right away and run the same queries again, but even that does not guarantee the absence of side effects (for example, some people enable the query cache on the MySQL dev server as well; if so, it should of course be disabled immediately).

Since query execution times on a dev database often "jump" wildly, the slow-query threshold is either set to very large values (in Propel under Symfony it is 100 ms by default), or slow queries are simply ignored, with everything written off to I/O.

Despite all its shortcomings, execution time is of course the basic criterion for evaluating performance; you just need to be able to interpret those times correctly.

Automatic EXPLAIN for queries


You may be tempted to issue one extra query with an "EXPLAIN" prefix for every SELECT and read the "rows" field of the EXPLAIN output to estimate each query's complexity. The idea is good, but often hard to implement, for example because of difficulties with prepared statements. But that is not the biggest problem (the prepared-statement issue can, in principle, be solved one way or another). The biggest problem is that EXPLAIN in MySQL often brazenly lies.
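The "prepend EXPLAIN" idea can be sketched in a few lines. A minimal illustration in Python (not the article's PHP, and the helper names are my own): rewrite SELECTs into EXPLAIN queries and sum the optimizer's `rows` estimates over the plan:

```python
def to_explain(sql):
    """Rewrite a SELECT into its EXPLAIN counterpart; skip everything else.

    A naive sketch: real code would also have to substitute concrete
    values for prepared-statement placeholders, which is exactly the
    difficulty mentioned in the text.
    """
    stripped = sql.lstrip()
    if stripped.upper().startswith("SELECT"):
        return "EXPLAIN " + stripped
    return None

def estimated_rows(explain_rows):
    """Sum the `rows` field over the rows of the EXPLAIN output
    (one per table in the plan). This is only the optimizer's estimate,
    not the number of rows actually read."""
    return sum(r["rows"] for r in explain_rows)
```

As the examples below show, treat `estimated_rows` as a rough hint at best.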

Here is a simple example (suppose the table contains 100,000 rows with random values of some_field):

The table:
CREATE TABLE some_table(id INT PRIMARY KEY AUTO_INCREMENT, some_field INT) 

We execute a simple query that scans 100,000 rows and finds nothing:
 SELECT * FROM some_table WHERE some_field = 100500 ORDER BY id LIMIT 100 

We look at the EXPLAIN output on MySQL 5.1.49:
 EXPLAIN SELECT * FROM some_table WHERE some_field = 100500 ORDER BY id LIMIT 100 

EXPLAIN says: boss, I only need to look at 100 rows, no problem, I'll use the primary key index, everything will be fine, I swear!

We execute a simple query that scans 100 rows (or slightly more) and immediately returns them:
 SELECT * FROM some_table WHERE some_field <> 0 ORDER BY id LIMIT 100 

We look at the EXPLAIN output on MySQL 5.0.51:
 EXPLAIN SELECT * FROM some_table WHERE some_field <> 0 ORDER BY id LIMIT 100 

EXPLAIN says: boss, I need to look at 100,000 rows, no problem, I'll use the primary key index, everything will be fine! (yes, that is exactly what it says, I give you my word :))

These were very simple examples, just so you can easily see that the row count in EXPLAIN should be treated with caution: after all, EXPLAIN does not execute the query, so it cannot know how many rows it will actually need to read; it only makes an estimate (and is often off by a factor of 10 :)).

Count the number of rows read


If you thought that the most advanced way to measure query cost in MySQL is EXPLAIN, you are mistaken (not so long ago I thought so too :)).

In fact, the following sequence of queries:
 SET query_cache_type = OFF; -- disable the query cache for this session
 FLUSH STATUS; -- reset the session statistics :)
 SELECT * FROM some_table WHERE some_field = 100500 ORDER BY id LIMIT 100; -- the query under test
 SHOW SESSION STATUS LIKE 'handler_%'; -- profit ;)!

will give output like the screenshot in the original article: a table of Handler_* session counters and their values, showing what the query actually did.
To better understand what a Handler is in MySQL, you can read, for example, the description of the HANDLER statements in MySQL. In short, a Handler is a common interface in MySQL for reading/writing rows (and more) across different table types. The name HandlerSocket, for the corresponding interface in MySQL, should also hint at what kind of beast it is.

Thus, by executing the first two queries at the very beginning of the page (disabling the query cache and resetting the statistics) and the last query at the very end of the page, you can get the total number of rows read/written by MySQL. On a project where I did this kind of optimization, I added an alert() to the page in the dev version whenever more than 10,000 rows were read/written while executing the page.
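The counting and alerting step can be sketched as follows. A minimal illustration in Python (the article's project is PHP; the function names and the choice of which Handler_* counters to sum are my assumptions, made under the description above): build a dict from the rows of `SHOW SESSION STATUS LIKE 'handler_%'` and compare the total against the threshold:

```python
def rows_touched(handler_status):
    """Total rows read/written according to Handler_* session counters.

    `handler_status` maps counter names to their (string) values, as
    returned by SHOW SESSION STATUS LIKE 'handler_%'. Which counters
    count as "rows touched" is a judgment call; this sketch sums the
    read/write/update/delete handlers and ignores commit/rollback.
    """
    interesting = ("Handler_read_", "Handler_write",
                   "Handler_update", "Handler_delete")
    return sum(int(v) for k, v in handler_status.items()
               if k.startswith(interesting))

def check_page(handler_status, limit=10_000):
    """Mimics the dev-mode alert: flag pages touching too many rows."""
    touched = rows_touched(handler_status)
    return touched, touched > limit
```

In the dev version, a `True` second value would trigger the alert() on the page.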

Of course, this is not a panacea, but it can seriously help you find and "neutralize" slow pages on the site and take appropriate measures. This method does not care in the slightest what the query is (it works even for INSERT INTO ... SELECT ...) and always gives accurate statistics on what the query actually did, even when EXPLAIN cannot say anything worthwhile.

The amount of information sent by the server


SESSION STATUS actually contains much more information, such as the traffic between the client and the SQL server (the Bytes_% counters). If you have "wide" tables, this can also be relevant (especially when working with ORMs, which like to select all columns from a table even when they are not needed).
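Measuring traffic works the same way as counting rows: take a Bytes_% snapshot before and after the page and look at the delta. A small sketch in Python (the function name is mine, not a real API):

```python
def bytes_traffic(status_before, status_after):
    """Delta of Bytes_sent / Bytes_received between two SHOW SESSION
    STATUS snapshots, i.e. how much data crossed the wire between the
    client and the SQL server while the page was executing."""
    return {key: int(status_after[key]) - int(status_before[key])
            for key in ("Bytes_sent", "Bytes_received")}
```

A suspiciously large Bytes_sent per page usually points at an ORM selecting whole "wide" rows where a couple of columns would do.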

Query optimization itself


Once the bottlenecks have been found, you need to get down to optimizing those queries and/or reconsider what a given block on the site should actually be doing.

I hope the information in this post will be useful to you :). Using this technique, I was able to identify the bottlenecks and reduce the execution time of most pages of a Symfony site from ~1000 ms to about 200-300 ms, and to add a tool to the dev version that automatically prevents similar problems in the future. All this took me about 3 days, for a site with ~20 MB of active "own" PHP code (about 100 MB in total) :). A lot or a little, judge for yourself :)

Source: https://habr.com/ru/post/130905/

