📜 ⬆️ ⬇️

MySQL Select random strings in one query

What do we have?


There is a weak laptop, a table of several million lines and you need to choose a different number of random lines in one query. Further samples do not interest us.

The table (test) has the following structure:

The primary key has no holes and starts with 1 .

Ways to get


  1. ORDER BY rand + LIMIT

    image

    Getting one line:
    SELECT pk_id FROM test ORDER BY rand() LIMIT 1 

    The average execution time in MySQL is 6.150 seconds.
    ')
    Let's try to take 100 entries.
     SELECT pk_id FROM test ORDER BY rand() LIMIT 100 

    Average Execution Time 6.170-6.180 seconds
    That is, the time difference between getting 1 and 100 random lines is not significant.

  2. COUNT * rand ()

    image

    Getting one line:
     SELECT t.pk_id FROM test as t, (SELECT ROUND(COUNT(pk_id)*rand()) as rnd FROM test LIMIT 1) t WHERE t.pk_id = rnd 

    Via
     ROUND(COUNT(pk_id)*rand()) 
    we get a random number from 0 to the number of rows in the table.
    Next, we assign an alias “rnd” to our random number and use it in WHERE for an equivalent comparison with pk_id.
    Average execution time - 1.04 seconds
    Next, you need to slightly modify this query, so that you can pull out a few lines.
    Add a few more fields to our subquery and change the WHERE check from "=" to IN
     SELECT t.pk_id FROM test as t, (SELECT ROUND(COUNT(pk_id)*rand()) as rnd, ROUND(COUNT(pk_id)*rand()) as rnd2, ROUND(COUNT(pk_id)*rand()) as rnd3 FROM test LIMIT 1) t WHERE t.pk_id IN (rnd,rnd2,rnd3) 

    The average execution time is 1.163 seconds.
    As the number of rows received increases, the query execution time increases noticeably.
    It’s even scary to think about 100 lines :)

  3. INFORMATION_SCHEMA + LIMIT

    image

    Getting one line:
     SELECT t.pk_id FROM test as t, (SELECT ROUND((SELECT table_rows as tr FROM information_schema.tables WHERE table_name = 'test') *rand()) as rnd FROM test LIMIT 1) tmp WHERE t.pk_id = rnd 

    Using the subquery, we get the number of rows in the 'test' table, without using the aggregate function COUNT and further comparison occurs as in method 2.
    Average execution time - 0.042 seconds
    The minimum observed execution time is 0.003 seconds.
    Let's try to get 100 lines:
     SELECT t.pk_id FROM test as t, (SELECT ROUND((SELECT table_rows as tr FROM information_schema.tables WHERE table_name = 'test') *rand()) as rnd FROM test LIMIT 100) tmp WHERE t.pk_id in (rnd) ORDER BY pk_id 

    Change in WHERE "=" to IN and change the limit of returned rows by the subquery to 100.
    Average execution time - 0.047 seconds
    Time to receive 1000 entries - 0.053 seconds
    Time to receive 10,000 records ~ 0.21 seconds
    And finally, we take 100,000 records in 1.9 seconds.
    The disadvantage of this approach is that the resulting number of rows from INFORMATION_SCHEMA is slightly larger than COUNT (*) and therefore, 7-8 rows are lost when returning 100,000 rows. At 1-100, there is almost no such thing (the larger the table, the less chance). But you can always take 1-2 lines more for reinsurance :)

  4. MAX * rand ()

    Getting one line:
     SELECT t.pk_id FROM test as t, (SELECT ROUND((SELECT MAX(pk_id) FROM test) *rand()) as rnd FROM test LIMIT 1) tmp WHERE t.pk_id = rnd 

    Average execution time - 0.001 seconds
    Getting 100 lines:
     SELECT t.pk_id FROM test as t, (SELECT ROUND((SELECT MAX(pk_id) FROM test) *rand()) as rnd FROM test LIMIT 100) tmp WHERE t.pk_id in (rnd) ORDER BY pk_id 

    Average execution time - 0.003 seconds

    The fastest way, relative to the previous ones.


findings




UPD: In the case of holes in the table, when returning one line in the second and third way, you can do a non-equivalent check a> = and add LIMIT 1. Then the value will be returned, even if it falls into the “hole”
Thanks xel for this remark.
UPD2: Added 4 way to get. Thank you smagen for the idea.

Source: https://habr.com/ru/post/154905/


All Articles