Counting the total number of rows in a sample in mySQL using LIMIT

One good person wants to get on Habr. To confirm his good intentions, he wrote an article that I bring to you.

Surely many people know about the existence of the function FOUND_ROWS () in mySQL. It is most often used to count the total number of rows in a table that satisfy specified conditions. It is usually used in conjunction with the SQL_CALC_FOUND_ROWS directive as follows:

Mysql> SELECT SQL_CALC_FOUND_ROWS * FROM table WHERE column > 1 LIMIT 0, 50;
Mysql> SELECT FOUND_ROWS();

The result of the second query will be the total number of rows in the "table" table that satisfy the condition "column> 1" (and not just those that were returned by the first query).
Some features of this function should be taken into account when using it without the SQL_CALC_FOUND_ROWS directive, which the mySQL documentation conscientiously warns: in this case, it will return the total number of processed rows (not returned!). For example:
')
Mysql> SELECT * FROM table LIMIT 0, 50;
Mysql> SELECT FOUND_ROWS();

The result, as expected, will be "50". But the following requests will return not too expected result:

Mysql> SELECT * FROM table WHERE column > 1 LIMIT 50, 50;
Mysql> SELECT FOUND_ROWS();

Despite the fact that the first query returns 50 lines, the result will be “100”, since mySQL had to view exactly 100 lines.

In addition, when using UNION with SQL_CALC_FOUND_ROWS, some specific requirements also appear:

the SQL_CALC_FOUND_ROWS directive should appear only once and only in the first query;
accurate results are returned only when using UNION ALL, because when using UNION without ALL, some lines will be deleted, which is not taken into account by the FOUND_ROWS () function;
if global LIMIT is not used, then the SQL_CALC_FOUND_ROWS directive is not ignored.

The mySQL documentation directly recommends using this function to count the total number of rows in a table that satisfy the parameters of the query (for example, when splitting a list into pages), arguing that it is faster than two classically used queries like:

Mysql> SELECT * FROM table WHERE column > 1 LIMIT 50, 50;
Mysql> SELECT COUNT(*) FROM table WHERE column > 1;

Question: if the variant using the FOUND_ROWS () function is indeed faster, then why is it not used everywhere? I decided to compare both approaches. For this table was created:

CREATE TABLE `table_1` ( `id` int (10) unsigned NOT NULL auto_increment, `column_1` varchar (32) default NULL , `column_2` varchar (32) default NULL , PRIMARY KEY (`id`), KEY `column_2` (`column_2`) ) ENGINE=MyISAM AUTO_INCREMENT=1; * This source code was highlighted with Source Code Highlighter .

The table contains three columns: a numeric id field with an auto_increment and a PRIMARY key; the text column column_1 without an index and the text column with the index column_2. The table was filled with the following PHP script:

<?php for ($i = 0; $i < 457128; $i ++) { $db->insert ( 'table_1' , array ( 'column_1' , 'column_2' ), array ( md5(uniqid(rand(), true )), md5(uniqid(rand(), true )) ) ); } ?> * This source code was highlighted with Source Code Highlighter .

$ db is a wrapper for PHP functions for working with mySQL. In particular, it measures the execution time of queries to the database.
Further, the following types of samples were made:

“Standard” query with selection by PRIMARY key
SELECT * FROM `table_1` WHERE `id` < $id_limit LIMIT 1000
SELECT COUNT(*) FROM `table_1` WHERE `id` < $id_limit
SQL_CALC_FOUND_ROWS with selection by PRIMARY key:
SELECT SQL_CALC_FOUND_ROWS * FROM `table_1` WHERE `id` < $id_limit LIMIT 1000
SELECT FOUND_ROWS() AS count
“Standard” query with a selection of a column WITHOUT A KEY
SELECT * FROM `table_1` WHERE `column_1` LIKE '$column_1_limit%' LIMIT 1000
SELECT COUNT(*) FROM `table_1` WHERE `column_1` LIKE '$column_1_limit%'
SQL_CALC_FOUND_ROWS with selection by column WITHOUT KEY
SELECT SQL_CALC_FOUND_ROWS * FROM `table_1` WHERE `column_1` LIKE '$column_1_limit%' LIMIT 1000
SELECT FOUND_ROWS() AS count
“Standard” query with selection by VARCHAR column with KEY
SELECT * FROM `table_1` WHERE `column_2` LIKE '$column_2_limit%' LIMIT 1000
SELECT COUNT(*) FROM `table_1` WHERE `column_2` LIKE '$column_2_limit%'
SQL_CALC_FOUND_ROWS with selection by column WITHOUT KEY
SELECT SQL_CALC_FOUND_ROWS * FROM `table_1` WHERE `column_2` LIKE '$column_2_limit%' LIMIT 1000
SELECT FOUND_ROWS() AS count
"Standard" query with a sample of both columns
SELECT * FROM `table_1` WHERE `column_1` LIKE '$column_1_limit_2%' AND `column_2` LIKE '$column_2_limit_2%' LIMIT 1000
SELECT COUNT(*) FROM `table_1` WHERE `column_1` LIKE '$column_1_limit_2%' AND `column_2` LIKE '$column_2_limit_2%'
SQL_CALC_FOUND_ROWS with a sample of both columns
SELECT SQL_CALC_FOUND_ROWS * FROM `table_1` WHERE `column_1` LIKE '$column_1_limit_2%' AND `column_2` LIKE '$column_2_limit_2%' LIMIT 1000
SELECT FOUND_ROWS() AS count
"Standard" query with a sample of both columns + PRIMARY
SELECT * FROM `table_1` WHERE `column_1` LIKE '$column_1_limit_3%' AND `column_2` LIKE '$column_2_limit_3%' AND `id` < $id_limit_2 LIMIT 1000
SELECT COUNT(*) FROM `table_1` WHERE `column_1` LIKE '$column_1_limit_3%' AND `column_2` LIKE '$column_2_limit_3%' AND `id` < $id_limit_2
SQL_CALC_FOUND_ROWS with sample by both columns + PRIMARY
SELECT SQL_CALC_FOUND_ROWS * FROM `table_1` WHERE `column_1` LIKE '$column_1_limit_3%' AND `column_2` LIKE '$column_2_limit_3%' AND `id` < $id_limit_2 LIMIT 1000
SELECT FOUND_ROWS() AS count

Each query was executed 10 times with different parameters, while the parameters of the Standard and SQL_CALC_FOUND_ROWS queries of the same type on the same iteration are the same. Those.:

for ($i = 0; $i < 10; $i ++) { $id_limit = rand(15000, 20000); $id_limit_2 = rand(15000, 20000); $column_1_limit = rand(1, 9); $column_2_limit = rand(1, 9); $column_1_limit_2 = rand(10, 20); $column_2_limit_2 = rand(10, 20); $column_1_limit_3 = rand(20, 30); $column_2_limit_3 = rand(20, 30); // – } * This source code was highlighted with Source Code Highlighter .

Test results can be seen in the chart:

It turns out that everything is not so simple, as they say in the documentation of mySQL. When using a sample of columns with indices, the “classical” scheme is definitely faster. When using columns without indices, as well as mixed queries, the FOUND_ROWS () function becomes faster, but its gain is very insignificant.
Thus, the choice of approach is a matter of each specific case. When performance is not important and the convenience and speed of writing code are important, then FOUND_ROWS () + SQL_CALC_FOUND_ROWS can be used. Otherwise, apparently, it is preferable to use the classical scheme of the two queries.
Testing was conducted on two machines:

Windows XP SP3, Intel Pentium Core 2 Duo E8300 @ 2.83 GHz, 2 GB, mySQL 5.0.51a
Ubuntu 8.04, AMD Opteron 2344 HE Quad-Core, 4 GB, 5.0.51a-3ubuntu5.4

The chart shows the results in Windows XP. The results (not the sampling time) are the same for both systems.

Source: https://habr.com/ru/post/64655/

All Articles

Counting the total number of rows in a sample in mySQL using LIMIT

More articles: