The task of choosing random lines from a table quite often arises before developers.
If MySQL is used, it is usually solved in the following way:
SELECT *
FROM users
WHERE role_id=5
ORDER BY rand()
LIMIT 10
Such code works extremely slowly for large tables.
If the query does not need to use WHERE or the table is small, there are effective solutions, for example
habrahabr.ru/post/54176 or
habrahabr.ru/post/55864 .
But I did not find any ready-made solutions for a large table and the need to filter by condition, getting new values for each request, so my method description is under the cut.
As it turned out, MySQL does not know how to efficiently select random rows using ORDER BY rand () LIMIT N, where you need to filter rows by condition (although the same MSSQL copes with the selection of random rows from a table with a large number of records).
')
So, solving the problem "in the forehead," the query (in the table of 5 million records):
SELECT *
FROM users
WHERE role_id=5
ORDER BY rand()
LIMIT 10
The request took 41.3544 seconds, which is unacceptable for a long time. It is impossible to find the maximum and minimum id, and then select random id from the gap in this case: because of the WHERE condition, id is no longer in order and discharged.
My solution is as follows: the random_seed table is added, containing the id and random_seed fields, filled with random numbers, an index is added to this column, and the index is also added to the column that will be sampled.
Now, in order to select random lines by condition, the query needs to be changed as follows (in table 5 million records):
SELECT
u1.*
FROM
users u1,
random_seed rs
WHERE
u1.role_id=5 AND u1.id=(rs.id+random_from_php)
ORDER BY
rs.random_seed
LIMIT 10
The request took 0.0460 seconds, which is already more than an acceptable result. The variable random_from_php is generated by the code calling the request, which provides a random set of values for each request, this number will provide a sample of new random numbers. In the table, random_seed should be as many values as in the table, from which you need to take random rows + N records, where N is the maximum possible value of random_from_php.
A real example from my work is the selection of random categories of products from different combinations (total 4,000,000 entries):
“Normal”: request | "Accelerated" request: |
SELECT oc1. * FROM object_category oc1 WHERE oc1.region_id = 6 ORDER BY RAND () LIMIT 10
| SELECT oc1. * FROM object_category oc1, random_seed rs WHERE oc1.id = (rs.id + 564756) AND oc1.region_id = 6 ORDER BY rs.random_seed LIMIT 10
|
Lead time: | |
1.726 | 0.007s |
1.851s | 0.010s |
1.803s | 0.006s |
1.784 | 0.008s |
The advantages of the method described above:
+ The fastest possible way to select random rows from a table by condition
+ No need to re-generate random numbers for each of the rows in the table.
+ The request of all necessary values does not occur iteratively, in one request
Minuses:
- The need to enter an additional table
- The need to change the usual requests