⬆️ ⬇️

MySQL Query Killer - DBMS Overload Fuse

It describes a procedure designed to protect a highly loaded system database from overloading.



After your queries are optimized, in theory you should not have situations where

1. One request blocks others

2. Some requests block each other.

We strive to ensure that such situations do not arise.



Therefore, a good "guardian of performance" will be a smart "Query killer",

which will track suspicious situations and release the database.

')

This killer allows a situation where the database performs a couple of heavy queries.

But when he sees that many long requests begin to appear, he begins to take action.



Purpose



Query killer is intended for:

1. Tracking situations when:

1.1. Some requests make locks that block other requests.

1.2. A large number of queries running in a short amount of time block each other

2. Logging of such situations

3. Resolving such situations by:

3.1. For situations where one query blocks others, kill the source of the locks: kill the longest query

3.2. For situations where requests block each other - try to free the database: kill a number of requests that are longer than some lower threshold.



So, even if your optimized cache generation procedures or work requests once again start to fail, Qeury-Killer will not allow the database to be bent due to cache regeneration, but will simply kill the regeneration and cause an error log.

(Regeneration of the cache should provide for the possibility of its death, and be generated in a temporary table, which is later renamed to the target table.)



Logic - pseudocode



Below is the pseudocode for Query-killer



-- CONSTANTS: SET @alert_query_duration:=10; SET @alert_queries_max_count:=6; SET @dangerous_query_duration:=15; SET @dangerous_queries_max_count:=30; SET @dangerous_queries_affect_duration_bottom:=5; -- Check long queries: SELECT @alert_queries_count:=COUNT(0) FROM information_schema.processlist WHERE COMMAND != 'Sleep' AND TIME>@alert_query_duration ORDER BY TIME DESC; IF (@alert_queries_count >= @alert_queries_max_count) THEN 1. Report to error_log: module = Query killer error = Maximum amount of long queries reached context = List of all long queries (which longer than @alert_query_duration) 2. Take actions: -- Check if there is a lot of dangerous queries already -- (which may cause each other locks): SELECT @dangerous_queries_count:=COUNT(0) FROM information_schema.processlist WHERE COMMAND != 'Sleep' AND TIME>@dangerous_query_duration ORDER BY TIME DESC; IF (@dangerous_queries_count >= @dangerous_queries_max_count) THEN -- KILL dangerous queries and queries, which probably might be already affected by them -- (so kill all queryes, where TIME>@dangerous_queries_affect_duration_bottom) ... ELSE -- KILL one query (the longes one) which probably locks all others ... END END 




Examples of situations



The following are examples of hazardous situations:



1. Requests make lock-and, and thus block other requests

For example, a request to generate a denomalized cache makes a long lock on a table, which must be constantly updated — a similar picture arises:

  97669965    root    localhost     gtf    289  Query CREATE TABLE gtf.cache_vw_... 99057101    root localhost:33092     gtf    284  Query UPDATE media INNER JOIN image_file USING(media_id) INNER JOIN i 99057467    root localhost:51863     gtf    276  Query UPDATE media INNER JOIN image_file USING(media_id) INNER JOIN i 99057499    root localhost:51868     gtf    275  Query UPDATE media INNER JOIN image_file USING(media_id) INNER JOIN i 99057840    root localhost:33164     gtf    267  Query UPDATE media INNER JOIN image_file USING(media_id) INNER JOIN i 99057907    root localhost:54313     gtf    266  Query UPDATE media INNER JOIN image_file USING(media_id) INNER JOIN i 99057987    root localhost:59942     gtf    264  Query UPDATE media INNER JOIN image_file USING(media_id) INNER JOIN i 99059528    root localhost:52062     gtf    229  Query UPDATE media INNER JOIN image_file USING(media_id) INNER JOIN i 


Here, the first request to create a cache table updates the media table update.



2. A large number of requests launched in a short period of time and block each other



These can be queries using Using temporary; Using filesort

 +------+-------------+-------+--------+----------------------------------+----------------------------------+---------+-----------------+------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+--------+----------------------------------+----------------------------------+---------+-----------------+------+----------------------------------------------+ | 1 | SIMPLE | vw | ref | fk_cache_vw_video_website1 | fk_cache_vw_video_website1 | 4 | const | 8643 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | mtc | eq_ref | PRIMARY,content_id | PRIMARY | 4 | gtf.vw.media_id | 1 | Using where | +------+-------------+-------+--------+----------------------------------+----------------------------------+---------+-----------------+------+----------------------------------------------+ 




Which, nevertheless, quite reasonably performed in normal mode, but bend the database at peak load.



The script is useful in systems where the development methodology is applied without aiming at the "premature" optimization of all possible parts of the system,

and where the primary optimization is performed in obvious bottlenecks, and the subsequent optimization is revealed in the system that is in operation.

Source: https://habr.com/ru/post/151418/



All Articles