When you execute a query, the MySQL query optimizer tries to come up with an optimal execution plan for this query. You can look at this very plan using the query with the keyword EXPLAIN. EXPLAIN is one of the most powerful tools at your disposal for understanding MySQL queries and optimizing them, but the sad fact is that many developers rarely use it. In this article, you will learn about what data EXPLAIN offers at the output and see an example of how to use it to optimize queries.
What does EXPLAIN offer?
Using the operator EXPLAIN is simple. It must be added to queries before the SELECT statement. Let's analyze the output to see the information returned by the command.
EXPLAIN SELECT * FROM categories
')
********************** 1. row ********************** id: 1 select_type: SIMPLE table: categories type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 4 Extra: 1 row in set (0.00 sec)
The conclusion may not look exactly the same, however, it will contain the same 10 columns. What is this returned column?
- id - the sequence number for each SELECT within the query (when there are several subqueries)
- select_type is a SELECT query type.
- SIMPLE - A simple SELECT query without subqueries or UNIONS
- PRIMARY - this SELECT is the outermost query in JOIN
- DERIVED - this SELECT is part of a subquery inside a FROM
- SUBQUERY - the first SELECT in a subquery
- DEPENDENT SUBQUERY - a subquery that depends on an external query
- UNCACHABLE SUBQUERY - not cached subquery (there are certain conditions for the query to be cached)
- UNION - the second or subsequent SELECT in UNION
- DEPENDENT UNION - second or subsequent SELECT in UNION, dependent on external query
- UNION RESULT - the result of UNION
- Table - the table to which the displayed row belongs
- Type - indicates how MySQL binds the tables used. This is one of the most useful fields in the output because it can report missing indexes or why a written query should be revised and rewritten.
Possible values:
- System - the table has only one row.
- Const - the table has only one matching row that is indexed. This is the fastest type of connection because the table is read only once and the value of the string can be perceived as a constant for further connections.
- Eq_ref - all parts of the index are used for binding. Indices used: PRIMARY KEY or UNIQUE NOT NULL. This is another best possible type of binding.
- Ref - all corresponding rows of the index column are read for each combination of rows from the previous table. This type of join for indexed columns looks like using the operators = or <=>
- Fulltext - the join uses the full-text index of the table
- Ref_or_null is the same as ref, but also contains rows with null for a column.
- Index_merge - the join uses a list of indexes to get the result set. The key column of the EXPLAIN command output will contain a list of used indexes.
- Unique_subquery - the IN subquery returns only one result from the table and uses the primary key.
- Index_subquery is the same as the previous one, but returns more than one result.
- Range - the index used to find the corresponding row in a certain range, usually when the key column is compared with a constant, using operators like: BETWEEN, IN,>,> =, etc.
- Index - scans the entire index tree to find the corresponding rows.
- All - Scan the entire table to find the corresponding rows. This is the worst type of compound and usually indicates a lack of suitable indices in the table.
- Possible_keys - shows the indexes that can be used to find rows in a table. In practice, they may or may not be used. In fact, this column can do a good job in query optimization, since the value NULL indicates that no suitable index has been found.
- Key - indicates the index used. This column may contain an index that is not listed in the possible_keys column. In the process of joining tables, the optimizer looks for the best options and can find keys that are not displayed in possible_keys, but are more optimal for use.
- Key_len is the length of the index that the MySQL optimizer chose to use. For example, a key_len value of 4 means that memory is required to store 4 characters. On this topic here is the link
- Ref - indicates columns or constants that are compared with the index specified in the key field. MySQL will choose either the constant value to compare, or the field itself, based on the query execution plan.
- Rows — Displays the number of records processed to produce the output. This is another very important field that gives reason to optimize queries, especially those that use JOINs and subqueries.
- Extra - contains additional information related to the query execution plan. Values ​​such as “Using temporary”, “Using filesort”, etc. can be an indicator of a problem query. You can find a full list of possible values here.
After EXPLAIN in the query, you can use the EXTENDED keyword and MySQL will show you additional information about how the query is executed. To see this information, you need to immediately after the request with EXTENDED execute the query SHOW WARNINGS. It is most useful to look at this information about the query that was executed after any changes made by the query optimizer.
EXPLAIN EXTENDED SELECT City.Name FROM City JOIN Country ON (City.CountryCode = Country.Code) WHERE City.CountryCode = 'IND' AND Country.Continent = 'Asia'
********************** 1. row ********************** id: 1 select_type: SIMPLE table: Country type: const possible_keys: PRIMARY key: PRIMARY key_len: 3 ref: const rows: 1 filtered: 100.00 Extra: ********************** 2. row ********************** id: 1 select_type: SIMPLE table: City type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 4079 filtered: 100.00 Extra: Using where 2 rows in set, 1 warning (0.00 sec)
SHOW WARNINGS
********************** 1. row ********************** Level: Note Code: 1003 Message: select `World`.`City`.`Name` AS `Name` from `World`.`City` join `World`.`Country` where ((`World`.`City`.`CountryCode` = 'IND')) 1 row in set (0.00 sec)
Find and fix performance problems with EXPLAIN.
Now let's look at how we can optimize a not-so-smart query by analyzing the output of the EXPLAIN command. There is no doubt that in existing working applications there are a number of tables with many links between them, but sometimes it is difficult to foresee the most optimal way to write a query.
I created a test database for an e-commerce application that does not have any indexes or primary keys, and demonstrate the impact of this not-so-good way to create tables using scary queries. You can download this dump table here -
github.com/phpmasterdotcom/UsingExplainToWriteBetterMySQLQueries EXPLAIN SELECT * FROM orderdetails d INNER JOIN orders o ON d.orderNumber = o.orderNumber INNER JOIN products p ON p.productCode = d.productCode INNER JOIN productlines l ON p.productLine = l.productLine INNER JOIN customers c on c.customerNumber = o.customerNumber WHERE o.orderNumber = 10101
If you look at the result (you only have to look at it in the example below, the link above contains a dump with the keys already added), you will see all the symptoms of a bad query.
UPDATE.
Here lies the corrected dump without indexes. For some reason, the indexes were originally added to the original author's dump.
But even if I write a better query, the result will be the same until I add indexes. The specified connection type is ALL (worst), which means that MySQL could not identify any keys that could be used during the connection. It also implies that possible_keys and key are NULL. Most importantly, the rows field shows that MySQL scans all the records for each table for a query. This means that it will scan 7 Ă— 110 Ă— 122 Ă— 326 Ă— 2996 = 91,750,822,240 records to find the appropriate four (remove EXPLAIN from the query, check it yourself). This is very bad and the number of these records will increase exponentially as the database grows.
Now let's add the obvious indexes, such as the primary key for each table, and run the query again. Taking this as a basic rule, you can use the columns used in JOINs as candidates for adding keys, since MySQL always scans them to find the relevant entries.
ALTER TABLE customers ADD PRIMARY KEY (customerNumber); ALTER TABLE employees ADD PRIMARY KEY (employeeNumber); ALTER TABLE offices ADD PRIMARY KEY (officeCode); ALTER TABLE orderdetails ADD PRIMARY KEY (orderNumber, productCode); ALTER TABLE orders ADD PRIMARY KEY (orderNumber), ADD KEY (customerNumber); ALTER TABLE payments ADD PRIMARY KEY (customerNumber, checkNumber); ALTER TABLE productlines ADD PRIMARY KEY (productLine); ALTER TABLE products ADD PRIMARY KEY (productCode), ADD KEY (buyPrice), ADD KEY (productLine); ALTER TABLE productvariants ADD PRIMARY KEY (variantId), ADD KEY (buyPrice), ADD KEY (productCode);
Let's execute our previous query after adding indexes. You will see this:
********************** 1. row ********************** id: 1 select_type: SIMPLE table: o type: const possible_keys: PRIMARY,customerNumber key: PRIMARY key_len: 4 ref: const rows: 1 Extra: ********************** 2. row ********************** id: 1 select_type: SIMPLE table: c type: const possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 1 Extra: ********************** 3. row ********************** id: 1 select_type: SIMPLE table: d type: ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 4 Extra: ********************** 4. row ********************** id: 1 select_type: SIMPLE table: p type: eq_ref possible_keys: PRIMARY,productLine key: PRIMARY key_len: 17 ref: classicmodels.d.productCode rows: 1 Extra: ********************** 5. row ********************** id: 1 select_type: SIMPLE table: l type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 52 ref: classicmodels.p.productLine rows: 1 Extra: 5 rows in set (0.00 sec)
After adding indexes, the number of read records dropped to 1 Ă— 1 Ă— 4 Ă— 1 Ă— 1 = 4 For each record, order_number = 10101 in the orderdetails table means that MySQL was able to find the corresponding entries in all other tables using indexes and did not resort to to full table scan.
In the first output, you can use the connection type “const”, which is the fastest connection type for tables with more than one record. MySQL was able to use PRIMARY KEY as an index. The “ref” field displays “const”, which is nothing but the value 10101 specified in the query after the keyword WHERE.
We look at another request. In it, we choose the union of two tables, products and productvariants, each combined with productline. productvariants, which consists of different variants of products with a field ProductCode - a link to their prices.
EXPLAIN SELECT * FROM ( SELECT p.productName, p.productCode, p.buyPrice, l.productLine, p.status, l.status AS lineStatus FROM products p INNER JOIN productlines l ON p.productLine = l.productLine UNION SELECT v.variantName AS productName, v.productCode, p.buyPrice, l.productLine, p.status, l.status AS lineStatus FROM productvariants v INNER JOIN products p ON p.productCode = v.productCode INNER JOIN productlines l ON p.productLine = l.productLine ) products WHERE status = 'Active' AND lineStatus = 'Active' AND buyPrice BETWEEN 30 AND 50
********************** 1. row ********************** id: 1 select_type: PRIMARY table: <derived2> type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 219 Extra: Using where ********************** 2. row ********************** id: 2 select_type: DERIVED table: p type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 110 Extra: ********************** 3. row ********************** id: 2 select_type: DERIVED table: l type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 52 ref: classicmodels.p.productLine rows: 1 Extra: ********************** 4. row ********************** id: 3 select_type: UNION table: v type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 109 Extra: ********************** 5. row ********************** id: 3 select_type: UNION table: p type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 17 ref: classicmodels.v.productCode rows: 1 Extra: ********************** 6. row ********************** id: 3 select_type: UNION table: l type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 52 ref: classicmodels.p.productLine rows: 1 Extra: ********************** 7. row ********************** id: NULL select_type: UNION RESULT table: <union2,3> type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: 7 rows in set (0.01 sec)
You may notice a number of problems in this query. It scans all entries in products and productvarians. Since In these tables there are no indices for the columns ProductLine and buyPrice, the possible_keys and key fields display NULL values. The status of the products and productlines tables is checked after UNION, so moving them inside UNION will reduce the number of entries. Add indexes.
CREATE INDEX idx_buyPrice ON products(buyPrice); CREATE INDEX idx_buyPrice ON productvariants(buyPrice); CREATE INDEX idx_productCode ON productvariants(productCode); CREATE INDEX idx_productLine ON products(productLine);
EXPLAIN SELECT * FROM ( SELECT p.productName, p.productCode, p.buyPrice, l.productLine, p.status, l.status as lineStatus FROM products p INNER JOIN productlines AS l ON (p.productLine = l.productLine AND p.status = 'Active' AND l.status = 'Active') WHERE buyPrice BETWEEN 30 AND 50 UNION SELECT v.variantName AS productName, v.productCode, p.buyPrice, l.productLine, p.status, l.status FROM productvariants v INNER JOIN products p ON (p.productCode = v.productCode AND p.status = 'Active') INNER JOIN productlines l ON (p.productLine = l.productLine AND l.status = 'Active') WHERE v.buyPrice BETWEEN 30 AND 50 ) product
********************** 1. row ********************** id: 1 select_type: PRIMARY table: <derived2> type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 12 Extra: ********************** 2. row ********************** id: 2 select_type: DERIVED table: p type: range possible_keys: idx_buyPrice,idx_productLine key: idx_buyPrice key_len: 8 ref: NULL rows: 23 Extra: Using where ********************** 3. row ********************** id: 2 select_type: DERIVED table: l type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 52 ref: classicmodels.p.productLine rows: 1 Extra: Using where ********************** 4. row ********************** id: 3 select_type: UNION table: v type: range possible_keys: idx_buyPrice,idx_productCode key: idx_buyPrice key_len: 9 ref: NULL rows: 1 Extra: Using where ********************** 5. row ********************** id: 3 select_type: UNION table: p type: eq_ref possible_keys: PRIMARY,idx_productLine key: PRIMARY key_len: 17 ref: classicmodels.v.productCode rows: 1 Extra: Using where ********************** 6. row ********************** id: 3 select_type: UNION table: l type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 52 ref: classicmodels.p.productLine rows: 1 Extra: Using where ********************** 7. row ********************** id: NULL select_type: UNION RESULT table: <union2,3> type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: 7 rows in set (0.01 sec)
As you can see, as a result, the number of scanned lines decreased from 2,625,810 (219 Ă— 110 Ă— 109) to 276 (12 Ă— 23), which is an excellent gain in performance. If you execute the same query without previous permutations in the query immediately after adding the indices, you will not see such a decrease in the scanned rows. MySQL is not able to use indices when using WHERE in a derived result. After placing these conditions inside UNION it becomes possible to use indexes. This means that adding indexes is not always enough. MySQL will not be able to use them until you write suitable queries. (http://www.php.su/mysql/manual/?page=MySQL_indexes - additional information).
Total
The article discusses the keyword EXPLAIN, information on the output, and examples of how you can use the output of the command to improve queries. In the real world, this command may be more useful than in the scenarios discussed. Almost always you will join a series of tables together using complex constructions with WHERE. At the same time, simply adding indexes to tables does not always lead to the desired result. In this case, you need to review your requests.