In everyday work you run into much the same mistakes when writing SQL queries.
In this article I would like to give examples of how NOT to write queries.
Selecting all fields: SELECT * FROM table
When writing queries, do not select all fields with "*". List only the fields you really need: this reduces the amount of data selected and transferred, and it lets covering indexes work. Even if you actually need every field in the table, it is still better to list them. First, it improves readability: with an asterisk it is impossible to tell which fields are involved without looking at the table definition. Second, the number of columns in your table can change over time; if today it holds five INT columns, in a month TEXT and BLOB fields may be added, and the selection will slow down.
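The covering-index point can be illustrated with a runnable sketch. The table and index names here are made up, and Python's sqlite3 standard module stands in for the article's PHP/MySQL (MySQL's EXPLAIN reports the same effect as "Using index"):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_ab ON t (a, b)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [(i, i * 2, "x") for i in range(100)])

# Listing only indexed columns lets the database answer from the index alone.
covered = conn.execute("EXPLAIN QUERY PLAN SELECT a, b FROM t WHERE a = 5").fetchone()[3]
# SELECT * must also fetch column c, so the table itself has to be read.
star = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE a = 5").fetchone()[3]

print(covered)  # e.g. SEARCH t USING COVERING INDEX idx_ab (a=?)
print(star)     # e.g. SEARCH t USING INDEX idx_ab (a=?)
```

The exact plan wording varies between SQLite versions, but only the explicit column list is reported as a covering-index search.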
Queries in a loop. You need to be clear that SQL is a language that operates on sets. Programmers used to thinking in terms of procedural languages sometimes find it hard to switch to thinking in sets. This can be done quite simply by adopting one rule: never execute queries in a loop. Examples of how this works:
1. Selects: $news_ids = get_list('SELECT news_id FROM today_news'); while ($news_id = get_next($news_ids)) $news[] = get_row('SELECT title, body FROM news WHERE news_id = ' . $news_id);
The rule is very simple: the fewer queries, the better (although, as with any rule, there are exceptions). Do not forget about the IN() construct. The code above can be rewritten as a single query: SELECT title, body FROM today_news INNER JOIN news USING (news_id)
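A minimal runnable sketch of this point, using Python's sqlite3 standard module in place of the article's PHP helpers (the table contents are invented for the demonstration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE news (news_id INTEGER PRIMARY KEY, title TEXT, body TEXT);
    CREATE TABLE today_news (news_id INTEGER);
    INSERT INTO news VALUES (1, 't1', 'b1'), (2, 't2', 'b2'), (3, 't3', 'b3');
    INSERT INTO today_news VALUES (1), (3);
""")

# Anti-pattern: N+1 queries, one round trip per id.
ids = [r[0] for r in conn.execute("SELECT news_id FROM today_news")]
in_loop = [conn.execute("SELECT title, body FROM news WHERE news_id = ?", (i,)).fetchone()
           for i in ids]

# Better: one query, here the JOIN from the article (IN(...) works the same way).
joined = conn.execute(
    "SELECT title, body FROM today_news INNER JOIN news USING (news_id)").fetchall()
```

Both variants return the same rows; the second does it in a single round trip.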
2. Inserts: $log = parse_log(); while ($record = next($log)) query('INSERT INTO logs SET value = ' . $record['value']);
It is much more efficient to build and execute a single query: INSERT INTO logs (value) VALUES (...), (...)
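A sketch of building that multi-row INSERT, again in Python's sqlite3 rather than PHP; the log records are a stand-in for the output of the article's parse_log():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (value TEXT)")

records = ["a", "b", "c"]  # stands in for the parsed log entries

# Build one multi-row INSERT ... VALUES (?), (?), (?) instead of a query per record.
sql = "INSERT INTO logs (value) VALUES " + ", ".join("(?)" for _ in records)
conn.execute(sql, records)

count = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
```

Using placeholders rather than string concatenation also avoids the SQL-injection risk the original PHP snippet carries.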
3. Updates. Sometimes you need to update several rows in a single table. If the new value is the same for every row, everything is simple: UPDATE news SET title = 'test' WHERE id IN (1, 2, 3).
If the value differs from record to record, it can still be done in one query: UPDATE news SET title = CASE WHEN news_id = 1 THEN 'aa' WHEN news_id = 2 THEN 'bb' END WHERE news_id IN (1, 2)
Our tests show that such a query runs two to three times faster than several separate queries.
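The CASE-based update above can be checked end to end; a minimal sketch with Python's sqlite3 (table contents invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (news_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO news VALUES (1, 'old'), (2, 'old'), (3, 'old')")

# One UPDATE with a CASE expression instead of one UPDATE per row.
conn.execute(
    "UPDATE news SET title = CASE WHEN news_id = 1 THEN 'aa' "
    "WHEN news_id = 2 THEN 'bb' END "
    "WHERE news_id IN (1, 2)"
)
titles = dict(conn.execute("SELECT news_id, title FROM news"))
```

The WHERE clause matters: without it, rows not named in the CASE would have their title set to NULL.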
Performing operations on indexed fields: SELECT user_id FROM users WHERE blogs_count * 2 = $value
In such a query the index will not be used, even if the blogs_count column is indexed. For the index to be used, no transformation may be performed on the indexed field in the query. Move the transformation to the other side of the comparison instead: SELECT user_id FROM users WHERE blogs_count = $value / 2;
A similar example: SELECT user_id FROM users WHERE TO_DAYS(CURRENT_DATE) - TO_DAYS(registered) <= 10;
will not use an index on the registered column, whereas SELECT user_id FROM users WHERE registered >= DATE_SUB(CURRENT_DATE, INTERVAL 10 DAY); will.
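The effect is easy to observe in the query plan. A sketch with Python's sqlite3 (made-up data; SQLite's EXPLAIN QUERY PLAN plays the role MySQL's EXPLAIN plays in practice):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, blogs_count INTEGER)")
conn.execute("CREATE INDEX idx_bc ON users (blogs_count)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(i, i) for i in range(1, 101)])

def plan(sql):
    # First plan row: SEARCH means an index lookup, SCAN means a full pass.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# The expression on the column hides it from the index.
slow = plan("SELECT user_id FROM users WHERE blogs_count * 2 = 10")
# Arithmetic moved to the constant side: the index can be searched.
fast = plan("SELECT user_id FROM users WHERE blogs_count = 10 / 2")
```

The constant side is evaluated once; the column side must stay bare for the index to apply.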
Fetching rows only to count them: $result = mysql_query("SELECT * FROM table", $link); $num_rows = mysql_num_rows($result); If you need the number of rows satisfying a condition, use SELECT COUNT(*) FROM table instead of selecting all the rows just to count them.
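The difference in sketch form, using Python's sqlite3 on an invented table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

# Anti-pattern: transfer every row just to count them client-side.
n_slow = len(conn.execute("SELECT * FROM t").fetchall())

# Better: the database counts, and a single number crosses the connection.
n_fast = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```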
Fetching extra rows: $result = mysql_query("SELECT * FROM table1", $link); while (($row = mysql_fetch_assoc($result)) && $i++ < 20) { ... } If you only need the first n rows, use LIMIT instead of discarding the extra rows in the application.
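A sketch of the LIMIT version with Python's sqlite3 (table contents invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(i,) for i in range(1, 101)])

# LIMIT makes the database stop after 20 rows, instead of the client
# fetching all 100 and discarding 80 of them.
rows = conn.execute("SELECT id FROM table1 ORDER BY id LIMIT 20").fetchall()
```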
Using ORDER BY RAND(): SELECT * FROM table ORDER BY RAND() LIMIT 1;
If the table has more than 4-5 thousand rows, ORDER BY RAND() will be very slow. It is much more efficient to run two queries:
If the table has an auto_increment primary key with no gaps: $rnd = rand(1, query('SELECT MAX(id) FROM table')); $row = query('SELECT * FROM table WHERE id = ' . $rnd);
or: $cnt = query('SELECT COUNT(*) FROM table'); $rnd = rand(0, $cnt - 1); $row = query('SELECT * FROM table LIMIT ' . $rnd . ', 1'); which, however, can also be slow when the table has a very large number of rows.
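Both variants, sketched with Python's sqlite3 on an invented ten-row table:

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, "v%d" % i) for i in range(1, 11)])

# Variant 1: gap-free auto-increment key -> random id, then a primary-key lookup.
max_id = conn.execute("SELECT MAX(id) FROM t").fetchone()[0]
row1 = conn.execute("SELECT * FROM t WHERE id = ?",
                    (random.randint(1, max_id),)).fetchone()

# Variant 2: no assumption about gaps -> random offset into COUNT(*) rows.
cnt = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
row2 = conn.execute("SELECT * FROM t LIMIT ?, 1",
                    (random.randint(0, cnt - 1),)).fetchone()
```

Variant 1 is a constant-time lookup; variant 2 still has to skip up to cnt rows, which is why it degrades on very large tables.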
Using a large number of JOINs: SELECT v.video_id, a.name, g.genre FROM videos AS v LEFT JOIN link_actors_videos AS la ON la.video_id = v.video_id LEFT JOIN actors AS a ON a.actor_id = la.actor_id LEFT JOIN link_genre_video AS lg ON lg.video_id = v.video_id LEFT JOIN genres AS g ON g.genre_id = lg.genre_id
Remember that when one-to-many tables are joined, the number of rows in the result set multiplies with each successive JOIN. In such cases it is faster to split the query into several simple ones.
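The multiplication is easy to demonstrate. A sketch with Python's sqlite3, reusing the link-table names from the query above with invented contents (one video, three actors, two genres):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE videos (video_id INTEGER PRIMARY KEY);
    CREATE TABLE link_actors_videos (video_id INTEGER, actor_id INTEGER);
    CREATE TABLE link_genre_video (video_id INTEGER, genre_id INTEGER);
    INSERT INTO videos VALUES (1);
    INSERT INTO link_actors_videos VALUES (1, 10), (1, 11), (1, 12);
    INSERT INTO link_genre_video VALUES (1, 20), (1, 21);
""")

# Two one-to-many joins at once: 3 actors x 2 genres = 6 rows for one video.
joined = conn.execute("""
    SELECT la.actor_id, lg.genre_id
    FROM videos AS v
    LEFT JOIN link_actors_videos AS la ON la.video_id = v.video_id
    LEFT JOIN link_genre_video AS lg ON lg.video_id = v.video_id
""").fetchall()

# Two simple queries return 3 + 2 = 5 rows with no duplication.
actors = conn.execute("SELECT actor_id FROM link_actors_videos WHERE video_id = 1").fetchall()
genres = conn.execute("SELECT genre_id FROM link_genre_video WHERE video_id = 1").fetchall()
```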
Using LIMIT: SELECT ... FROM table LIMIT $start, $per_page
Many people assume that such a query returns $per_page records (usually 10-20) and will therefore be fast. It is fast for the first few pages. But when the table is large and you run SELECT ... FROM table LIMIT 1000000, 20, MySQL must first read 1000020 rows, discard the first million, and return 20. This can be anything but fast. There are no trivial solutions to the problem. Many sites simply limit the number of available pages to a reasonable number. You can also speed up such queries with covering indexes or external search solutions (for example, Sphinx).
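One further workaround, not named above but widely used when pages are traversed sequentially, is keyset (seek) pagination: remember the last id of the previous page and filter on it instead of using an offset. A sketch with Python's sqlite3, assuming an indexed id column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, "v%d" % i) for i in range(1, 101)])

def page_after(last_id, per_page=20):
    # Seek by key: the index jumps straight past last_id, no rows to skip.
    return conn.execute("SELECT id, v FROM t WHERE id > ? ORDER BY id LIMIT ?",
                        (last_id, per_page)).fetchall()

first = page_after(0)               # ids 1..20
second = page_after(first[-1][0])   # ids 21..40
```

The cost stays constant regardless of how deep the page is, but it cannot jump to an arbitrary page number.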
Not using ON DUPLICATE KEY UPDATE: $row = query('SELECT * FROM table WHERE id = 1');
if ($row) query('UPDATE table SET column = column + 1 WHERE id = 1'); else query('INSERT INTO table SET column = 1, id = 1');
This construction can be replaced by a single query, provided there is a primary or unique key on the id field: INSERT INTO table SET column = 1, id = 1 ON DUPLICATE KEY UPDATE column = column + 1
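The same upsert pattern, sketched with Python's sqlite3. Note the dialect difference: SQLite (3.24+) spells MySQL's ON DUPLICATE KEY UPDATE as ON CONFLICT ... DO UPDATE, and the counter column is named cnt here because "column" is a keyword:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, cnt INTEGER)")

# One statement handles both the insert and the increment cases atomically.
upsert = ("INSERT INTO t (id, cnt) VALUES (1, 1) "
          "ON CONFLICT(id) DO UPDATE SET cnt = cnt + 1")
conn.execute(upsert)  # no row with id = 1 yet: inserts cnt = 1
conn.execute(upsert)  # row exists: increments cnt to 2
value = conn.execute("SELECT cnt FROM t WHERE id = 1").fetchone()[0]
```

Besides saving a round trip, the single statement avoids the race condition the SELECT-then-UPDATE version has between two concurrent clients.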