📜 ⬆️ ⬇️

Selection of arbitrary records in MySQL

There seem to be ordinary tasks that can be solved immediately and without thinking, but with intensive use of such solutions, problems arise, and not small ones. I want to tell you about one of these tasks.


Problem

They took the outsourcer here to write a small and simple code in PHP and MySQL. There was one of the tasks - to select several arbitrary records from a table in the MySQL database. And what did this lazy and stupid outsourcer do? Of course I wrote a nonsense like this:
SELECT * FROM tTable ORDER BY RAND ( ) LIMIT 10 ;
At first glance, everything is logical and works correctly. 10 random entries are selected. But if you look at the execution plan for this query, it will become clear why I folded a dozen of raw curses against a stupid outsourcer.
During the execution of this query, MySQL writes to the temporary table all (!!!) rows of the original table, with one new field in which the results of the RAND () function are written - i.e. set of arbitrary values. Then this temporary table is sorted by filesort according to the added field with arbitrary values ​​and then the first 10 records are selected. Full gut. And now what will happen if there are 10,000 entries in the source table. And what if 1,000,000? And what if this sample needs to be done ten times per second. Yes, here any super-duper server will go into meditation for a long time.

But if you show a little sharpness (and the outsourcers do not want to think, they give up the work and go to drink money), then you can come up with an elegant and fast option, the speed of which does not depend on the number of rows in the table.
')

Idea

So let's start on the sly. First, simplify the task, suppose that we need to choose not 10, but only one record.
Everything is pretty simple here. We need to operate only with the number of records in the table, since the key can be any (composite, not numeric), as well as it can be “discharged” as a result of deleting records. First of all, let's find out the total number of records in the table:
SELECT COUNT ( * ) FROM tTable ;
Then simply calculate an arbitrary number from 0 to the number of records in this table.
rand_row = round(rand() * row_count);
Now, without problems, you can make a sample of an arbitrary record:
SELECT * FROM tTable LIMIT rand_row, 1;

PHP solution

So, they coped with the simplified task. Now you need to overcome the originally set, i.e. select 10 entries. The logic here is simple: you need to count 10 arbitrary numbers from 0 to the number of records in the table, and then make 10 queries like the previous one and combine them with UNION.
There are two options for how to do this: you can arrange it in the form of a piece of PHP code, or you can in the form of a MySQL stored procedure.
Everything is very simple in PHP:
$row_count = mysql_result ( mysql_query ( 'SELECT COUNT(*) FROM tTable;' ) , 0 ) ;
$query = array ( ) ;
while ( count ( $query ) < 10 ) {
$query [ ] = '(SELECT * FROM tTable LIMIT ' . rand ( 0 , $row_count ) . ', 1)' ;
}
$query = implode ( ' UNION ' , $query ) ;
$res = mysql_query ( $query ) ;

Everything is simple and fast. On the source table with ten thousand records, the performance increase is more than 12 times over the original “lazy” variant.
If there are not so many records in the source table and the appearance of duplicate rows in the sample is unacceptable - then you can preliminarily form a list of non-repeating arbitrary values, and then make a query on them.

MySQL solution

Alternatively, you can still do this in the form of a stored procedure:
CREATE PROCEDURE `spRandomSelect` ( IN aSchema VARCHAR ( 50 ) , IN aTable VARCHAR ( 50 ) , IN aNumRows INTEGER ( 11 ) )
NOT DETERMINISTIC
READS SQL D
BEGIN
DECLARE iQuery VARCHAR ( 10000 ) ;
DECLARE iNumRows INTEGER ( 11 ) ;

SET iNumRows = ( SELECT `TABLE _ ROWS` FROM `information _ schema` . `TABLES` t
WHERE t. `TABLE _ SCHEMA` = aSchema AND t. `TABLE _ NAME` = aTable ) ;
SET iQuery = '' ;
loop1: LOOP
SET iQuery = CONCAT ( iQuery , '(SELECT * FROM `' , aSchema , '`.`' , aTable ,
'` LIMIT ' , ROUND ( RAND ( UNIX_TIMESTAMP ( ) + aNumRows ) * iNumRows ) , ', 1)' ) ;
IF aNumRows > 1 THEN
SET iQuery = CONCAT ( iQuery , ' UNION ' ) ;
END IF ;
SET aNumRows = aNumRows - 1 ;
IF aNumRows > 0 THEN
ITERATE loop1 ;
END IF ;
LEAVE loop1 ;
END LOOP loop1 ;
SET @iQuery = iQuery ;
PREPARE iExecStmt FROM @iQuery ;
EXECUTE iExecStmt ;
DRP PREPARE iExecStmt ;
END ;

The performance of this solution is smaller than when preparing a compound query in PHP, but the point is to show the possibility of implementation on "pure" SQL.

Source: https://habr.com/ru/post/54176/


All Articles