📜 ⬆️ ⬇️

Counting the total number of rows in a sample in mySQL using LIMIT

One good person wants to get on Habr. To confirm his good intentions, he wrote an article that I bring to you.

Surely many people know about the existence of the function FOUND_ROWS () in mySQL. It is most often used to count the total number of rows in a table that satisfy specified conditions. It is usually used in conjunction with the SQL_CALC_FOUND_ROWS directive as follows:

Mysql> SELECT SQL_CALC_FOUND_ROWS * FROM table WHERE column > 1 LIMIT 0, 50;
Mysql> SELECT FOUND_ROWS();

The result of the second query will be the total number of rows in the "table" table that satisfy the condition "column> 1" (and not just those that were returned by the first query).
Some features of this function should be taken into account when using it without the SQL_CALC_FOUND_ROWS directive, which the mySQL documentation conscientiously warns: in this case, it will return the total number of processed rows (not returned!). For example:
')
Mysql> SELECT * FROM table LIMIT 0, 50;
Mysql> SELECT FOUND_ROWS();

The result, as expected, will be "50". But the following requests will return not too expected result:

Mysql> SELECT * FROM table WHERE column > 1 LIMIT 50, 50;
Mysql> SELECT FOUND_ROWS();

Despite the fact that the first query returns 50 lines, the result will be “100”, since mySQL had to view exactly 100 lines.

In addition, when using UNION with SQL_CALC_FOUND_ROWS, some specific requirements also appear:

The mySQL documentation directly recommends using this function to count the total number of rows in a table that satisfy the parameters of the query (for example, when splitting a list into pages), arguing that it is faster than two classically used queries like:

Mysql> SELECT * FROM table WHERE column > 1 LIMIT 50, 50;
Mysql> SELECT COUNT(*) FROM table WHERE column > 1;

Question: if the variant using the FOUND_ROWS () function is indeed faster, then why is it not used everywhere? I decided to compare both approaches. For this table was created:
CREATE TABLE `table_1` (
`id` int (10) unsigned NOT NULL auto_increment,
`column_1` varchar (32) default NULL ,
`column_2` varchar (32) default NULL ,
PRIMARY KEY (`id`),
KEY `column_2` (`column_2`)
) ENGINE=MyISAM AUTO_INCREMENT=1;

* This source code was highlighted with Source Code Highlighter .


The table contains three columns: a numeric id field with an auto_increment and a PRIMARY key; the text column column_1 without an index and the text column with the index column_2. The table was filled with the following PHP script:

<?php
for ($i = 0; $i < 457128; $i ++) {
$db->insert ( 'table_1' , array ( 'column_1' , 'column_2' ), array ( md5(uniqid(rand(), true )), md5(uniqid(rand(), true )) ) );
}
?>

* This source code was highlighted with Source Code Highlighter .


$ db is a wrapper for PHP functions for working with mySQL. In particular, it measures the execution time of queries to the database.
Further, the following types of samples were made:


Each query was executed 10 times with different parameters, while the parameters of the Standard and SQL_CALC_FOUND_ROWS queries of the same type on the same iteration are the same. Those.:

for ($i = 0; $i < 10; $i ++) {
$id_limit = rand(15000, 20000);
$id_limit_2 = rand(15000, 20000);

$column_1_limit = rand(1, 9);
$column_2_limit = rand(1, 9);

$column_1_limit_2 = rand(10, 20);
$column_2_limit_2 = rand(10, 20);

$column_1_limit_3 = rand(20, 30);
$column_2_limit_3 = rand(20, 30);
// –
}

* This source code was highlighted with Source Code Highlighter .


Test results can be seen in the chart:

image

It turns out that everything is not so simple, as they say in the documentation of mySQL. When using a sample of columns with indices, the “classical” scheme is definitely faster. When using columns without indices, as well as mixed queries, the FOUND_ROWS () function becomes faster, but its gain is very insignificant.
Thus, the choice of approach is a matter of each specific case. When performance is not important and the convenience and speed of writing code are important, then FOUND_ROWS () + SQL_CALC_FOUND_ROWS can be used. Otherwise, apparently, it is preferable to use the classical scheme of the two queries.
Testing was conducted on two machines:

The chart shows the results in Windows XP. The results (not the sampling time) are the same for both systems.

Source: https://habr.com/ru/post/64655/


All Articles