There is a weak laptop, a table of several million lines and you need to choose a different number of random lines in one query. Further samples do not interest us.
The table (test) has the following structure:
- pk_id (primary key)
- id (field filled with different numbers)
- value (field filled with rand ())
The primary key has no holes and starts with 1 .
Ways to get
ORDER BY rand + LIMIT
Getting one line:
SELECT pk_id FROMtestORDERBYrand() LIMIT1
The average execution time in MySQL is 6.150 seconds. ')
Let's try to take 100 entries.
SELECT pk_id FROMtestORDERBYrand() LIMIT100
Average Execution Time 6.170-6.180 seconds That is, the time difference between getting 1 and 100 random lines is not significant.
COUNT * rand ()
Getting one line:
SELECT t.pk_id FROMtestas t, (SELECTROUND(COUNT(pk_id)*rand()) as rnd FROMtestLIMIT1) t WHERE t.pk_id = rnd
Via
ROUND(COUNT(pk_id)*rand())
we get a random number from 0 to the number of rows in the table. Next, we assign an alias “rnd” to our random number and use it in WHERE for an equivalent comparison with pk_id. Average execution time - 1.04 seconds Next, you need to slightly modify this query, so that you can pull out a few lines. Add a few more fields to our subquery and change the WHERE check from "=" to IN
SELECT t.pk_id FROMtestas t, (SELECTROUND(COUNT(pk_id)*rand()) as rnd, ROUND(COUNT(pk_id)*rand()) as rnd2, ROUND(COUNT(pk_id)*rand()) as rnd3 FROMtestLIMIT1) t WHERE t.pk_id IN (rnd,rnd2,rnd3)
The average execution time is 1.163 seconds. As the number of rows received increases, the query execution time increases noticeably. It’s even scary to think about 100 lines :)
INFORMATION_SCHEMA + LIMIT
Getting one line:
SELECT t.pk_id FROMtestas t, (SELECTROUND((SELECT table_rows as tr FROM information_schema.tables WHERE table_name = 'test') *rand()) as rnd FROMtestLIMIT1) tmp WHERE t.pk_id = rnd
Using the subquery, we get the number of rows in the 'test' table, without using the aggregate function COUNT and further comparison occurs as in method 2. Average execution time - 0.042 seconds The minimum observed execution time is 0.003 seconds. Let's try to get 100 lines:
SELECT t.pk_id FROMtestas t, (SELECTROUND((SELECT table_rows as tr FROM information_schema.tables WHERE table_name = 'test') *rand()) as rnd FROMtestLIMIT100) tmp WHERE t.pk_id in (rnd) ORDERBY pk_id
Change in WHERE "=" to IN and change the limit of returned rows by the subquery to 100. Average execution time - 0.047 seconds Time to receive 1000 entries - 0.053 seconds Time to receive 10,000 records ~ 0.21 seconds And finally, we take 100,000 records in 1.9 seconds. The disadvantage of this approach is that the resulting number of rows from INFORMATION_SCHEMA is slightly larger than COUNT (*) and therefore, 7-8 rows are lost when returning 100,000 rows. At 1-100, there is almost no such thing (the larger the table, the less chance). But you can always take 1-2 lines more for reinsurance :)
MAX * rand ()
Getting one line:
SELECT t.pk_id FROMtestas t, (SELECTROUND((SELECTMAX(pk_id) FROMtest) *rand()) as rnd FROMtestLIMIT1) tmp WHERE t.pk_id = rnd
Average execution time - 0.001 seconds Getting 100 lines:
SELECT t.pk_id FROMtestas t, (SELECTROUND((SELECTMAX(pk_id) FROMtest) *rand()) as rnd FROMtestLIMIT100) tmp WHERE t.pk_id in (rnd) ORDERBY pk_id
Average execution time - 0.003 seconds
The fastest way, relative to the previous ones.
findings
The first way is good because in any case it will return you a random string, regardless of the holes in the fields and their initial value, but the slowest
The second method is much better suited to tables where there are no holes. Works 6 times faster than the first method (on returning one line).
The third method can be used at your own risk (which is very minor), because you can lose a line (s) when the rand () value is as close as possible to 1. The return rate per line differs 150 times from the first method. If we’re not returning 100 lines, but 99, then you can once again send a request to the server.
The fourth method is the fastest and 6000 times faster ORDER BY rand ()
UPD: In the case of holes in the table, when returning one line in the second and third way, you can do a non-equivalent check a> = and add LIMIT 1. Then the value will be returned, even if it falls into the “hole” Thanks xel for this remark. UPD2: Added 4 way to get. Thank you smagen for the idea.