📜 ⬆️ ⬇️

Tips & tricks for MySQL Developers. Work with SQL


This article was conceived by me as a collection of some interesting points on the use and optimization of SQL queries in the MySQL database, in my opinion, poorly lit on the Internet. So, from the article you will learn about the with rollup construction, and how to rewrite the in and not in subqueries on joines, as well as update and delete data in several tables - with one query, and much more. Let's start in order.


We rewrite subqueries with in and not in to join


One of the most common subqueries are queries with in and not in. Moreover, we know that in MySQL join queries most often work faster due to the effective construction of a query execution plan by the optimizer (under certain conditions, as well as for other databases, this statement can be diametrically opposite), so we will try to rewrite the classic join subqueries. How are we going to do this? To begin with, let's clarify that in inquiries output all values ​​that are present in both tables, which means that the inner inner join will definitely correspond to such a query. Queries from not in on the contrary output all values ​​that are not present in the subquery, which means that they will already correspond to an outer outer join. So, a start, try to do something about it.

For example, I will use the world test database, which can be taken from the official mysql website here.
')
In the world database there are tables Country (countries) and CountryLanguage (official languages). We set ourselves the task of finding all the countries in which they speak at least one language. In other words, we are looking for territories with a population that has official languages. We write the usual in subquery:

SELECT * FROM Country WHERE Code IN (SELECT CountryCode FROM CountryLanguage) 


Note that this query can be rewritten as follows:

 SELECT * FROM Country c WHERE EXISTS ( SELECT 1 FROM CountryLanguage cl WHERE c.Code = cl.CountryCode ) 


Now, based on the assumption above, we rewrite the subquery to an inner join:

 SELECT c.Name FROM Country c INNER JOIN CountryLanguage cl ON c.Code = cl.CountryCode +-------------+ | Name | +-------------+ | Aruba | | Aruba | | Aruba | | Aruba | | Afghanistan | +-------------+ 5 rows in set (0.00 sec) 


Almost succeeded, but we had duplication of data, which we remove through the distinct option. The final version for all fields of the table would be:

 SELECT DISTINCT c.* FROM Country c INNER JOIN CountryLanguage cl ON c.Code = cl.CountryCode 


Fine! Sub query in successfully rewritten on join.

Now it's a little more difficult - rewrite not in to the outer join. The purpose of the request is all territories where people do not live and there are no official languages. Again, first I quote the standard not in subquery:

 SELECT Name FROM Country WHERE Code NOT IN (SELECT CountryCode FROM CountryLanguage) 


And I show it for not exists:

 SELECT Name FROM Country c WHERE NOT EXISTS ( SELECT 1 FROM CountryLanguage cl WHERE c.Code = cl.CountryCode ) 


As in the first case, we rewrite it to the left join:

 SELECT c.* FROM Country c LEFT JOIN CountryLanguage cl ON c.Code = cl.CountryCode 


As a result, we obtain, as in the first case, duplication of data, and, of course, the rows that did not find a pair value in the second table. It is these lines that give the solution to the problem, so we simply remove all the paired lines:

 SELECT c.* FROM Country c LEFT JOIN CountryLanguage cl ON c.Code = cl.CountryCode WHERE cl.CountryCode IS NULL 


With such simple conversions, we were able to help the query optimizer a little.

Comparing rows in subqueries


There are rare cases when we need to write a subquery, in which the comparison takes place not in one but several columns, but it would be clearly WRONG to write this:

 SELECT c.Name FROM City c WHERE c.ID = (SELECT Capital FROM Country WHERE Name='Finland') AND c.CountryCode = (SELECT Code FROM Country WHERE Name='Finland') 


For these purposes, there is a SQL query:

 SELECT c.Name FROM City c WHERE (c.ID, c.CountryCode) = (SELECT Capital, Code FROM Country WHERE Name='Finland') 


Such a query is called a string constructor and can be underlined by the ROW () function. In this case, we would write:

 SELECT c.Name FROM City c WHERE ROW(c.ID, c.CountryCode) = (SELECT Capital, Code FROM Country WHERE Name='Finland') 


Despite its appeal, string constructor has several limitations:
1. The subquery should return one row, not several
2. You cannot use the comparison operators <,> or <>, although this restriction can be circumvented with the special words all, any, in or exists

It should be noted that such a construction can be used not only for subqueries, but also in scalar expressions:

 SELECT Name, Population FROM Country WHERE (Continent, Region) = ('Europe', 'Western Europe') 


However, in practice, the query designer is not very effective for scalar expressions, so we rewrite the query to the normal form:

 SELECT Name, Population FROM Country WHERE Continent = 'Europe' AND Region = 'Western Europe' 


Update and delete data from multiple tables simultaneously.


Perhaps someone will surprise such a headline, but why not? Let's start by updating the data. Official documentation talks about the following syntax:

 UPDATE [LOW_PRIORITY] [IGNORE] table_references SET col_name1={expr1|DEFAULT} [, col_name2={expr2|DEFAULT}] ... [WHERE where_condition] 


Most likely, you will make a query of the form:

 UPDATE t1, t2 SET t1.name1 = t2.name2 WHERE t1.id = t2.id; 


On the other hand, no one bothers to make a query that will update the data in two, three or more tables at once:

 UPDATE t1, t2 SET t1.name1 = 'name1', t2.name2 = 'name2' WHERE t1.id = t2.id; 


True, it is unlikely to make sense, but, nevertheless, this is possible.

With the delete operation, things are even more interesting. Official documentation declares the following syntax:

 DELETE [LOW_PRIORITY] [QUICK] [IGNORE] tbl_name[.*] [, tbl_name[.*]] ... FROM table_references [WHERE where_condition] 


Or

 DELETE [LOW_PRIORITY] [QUICK] [IGNORE] FROM tbl_name[.*] [, tbl_name[.*]] ... USING table_references [WHERE where_condition] 


What corresponds to requests of the form:

 DELETE t1 FROM t1, t2 WHERE t1.id = t2.id; DELETE FROM t1 USING t1, t2 WHERE t1.id = t2.id; 


In these two queries, the deletion occurs from the t1 table, and t2 is used to create the data selection condition.

And you guessed it, to delete data from two tables at the same time, do this:

 DELETE t1, t2 FROM t1, t2 WHERE t1.id = t2.id; DELETE FROM t1, t2 USING t1, t2 WHERE t1.id = t2.id; 


A little about OLAP. WITH ROLLUP modifier


Perhaps those who have come across such industrial databases as Oracle or SQL Server will scream when reading the header: “Well, I must!”, But, alas, I will immediately cool their fiery exclamations. Since MySQL version 4.1.1, when the with rollup modifier appeared, this topic has not advanced a millimeter, so you cannot build any data cubes using the built-in tools of this database.

For those who do not know what the with rollup modifier means, I’ll briefly explain that it is used to create reports containing subtotals and a final final value. In the examples, I will use the world database again.

Suppose that we need to get the total and average number of people living in all geographic areas (regions), as well as on continents and around the world. If you decide to head on, we get the following queries:

The total and average number of people living in all geographic areas (regions):

 SELECT Continent, Region, SUM(Population) AS pop, AVG(Population) AS avg_pop FROM Country GROUP BY Continent, Region 


Total and average number of people living on all continents:

 SELECT Continent, SUM(Population) AS pop, AVG(Population) AS avg_pop FROM Country GROUP BY Continent 


The total and average number of people living around the world:

 SELECT SUM(Population) AS pop, AVG(Population) AS avg_pop FROM Country 


Instead of executing these queries and the subsequent complex merging of results, you can do just one:

 SELECT Continent, Region, SUM(Population) AS pop, AVG(Population) AS avg_pop FROM Country GROUP BY Continent, Region WITH ROLLUP; +---------------+---------------------------+------------+----------------+ | Continent | Region | pop | avg_pop | +---------------+---------------------------+------------+----------------+ | Asia | Eastern Asia | 1507328000 | 188416000.0000 | | Asia | Middle East | 188380700 | 10465594.4444 | | Asia | Southeast Asia | 518541000 | 47140090.9091 | | Asia | Southern and Central Asia | 1490776000 | 106484000.0000 | | Asia | NULL | 3705025700 | 72647562.7451 | | Europe | Baltic Countries | 7561900 | 2520633.3333 | | Europe | British Islands | 63398500 | 31699250.0000 | ... | Europe | Eastern Europe | 307026000 | 30702600.0000 | | Europe | Nordic Countries | 24166400 | 3452342.8571 | | Europe | Southern Europe | 144674200 | 9644946.6667 | ... | Europe | Western Europe | 183247600 | 20360844.4444 | | Europe | NULL | 730074600 | 15871186.9565 | | North America | Caribbean | 38140000 | 1589166.6667 | | North America | Central America | 135221000 | 16902625.0000 | | North America | North America | 309632000 | 61926400.0000 | | North America | NULL | 482993000 | 13053864.8649 | | Africa | Central Africa | 95652000 | 10628000.0000 | ... | Africa | Eastern Africa | 246999000 | 12349950.0000 | | Africa | Northern Africa | 173266000 | 24752285.7143 | | Africa | Southern Africa | 46886000 | 9377200.0000 | ... | Africa | Western Africa | 221672000 | 13039529.4118 | | Africa | NULL | 784475000 | 13525431.0345 | | Oceania | Australia and New Zealand | 22753100 | 4550620.0000 | ... | Oceania | Melanesia | 6472000 | 1294400.0000 | | Oceania | Micronesia | 543000 | 77571.4286 | | Oceania | Micronesia/Caribbean | 0 | 0.0000 | ... | Oceania | Polynesia | 633050 | 63305.0000 | | Oceania | NULL | 30401150 | 1085755.3571 | | Antarctica | Antarctica | 0 | 0.0000 | | Antarctica | NULL | 0 | 0.0000 | | South America | South America | 345780000 | 24698571.4286 | | South America | NULL | 345780000 | 24698571.4286 | | NULL | NULL | 6078749450 | 25434098.1172 | +---------------+---------------------------+------------+----------------+ 


Note that in some rows in non-aggregating columns, it is NULL, which indicates that the string is a subtotal. For example, the string

 | South America | NULL | 345780000 | 24698571.4286 | 


you need to read how in South America the total population is 345780000 people, and the average value is 24698571.4286

And here is the string

 | NULL | NULL | 6078749450 | 25434098.1172 | 


It is the final outcome in relation to the population of the entire globe.

The positive effect of the modifier with rollup is that the passage through the records occurs once! By the way, this functionality is very handy when displaying any statistics on the site (program). If you are interested in this functionality or have any questions, then I ask the official documentation for more details.

Startup option --i-am-a-dummy


"- i-am-a-dummy" is colloquially translated as - "I'm a dummy." This option is a synonym for the --safe-updates option, which is created for beginners and imposes a number of restrictions so that your favorite database does not become ill from your actions:
1. Update and delete requests are executed only if the condition where or limit is present
2. select a single table (without joines and subqueries) returns only the first 1000 rows, unless limit is explicitly specified
3. select queries with join or subquery process only the first 1,000,000 rows

To override these restrictions, run the following query with your own parameters:

 SET sql_safe_updates=1, sql_select_limit=1000, sql_max_join_size=1000000; 


To view the current settings:

 SELECT @@sql_safe_updates, @@sql_select_limit, @@sql_max_join_size 


It is also possible to parameterize settings when launching a client in a shell.

 mysql -u user -p -h host --i-am-a-dummy --select_limit=500 --max_join_size=10000 


Aesthetic comment


And for a snack. After a database other than MySQL, I was always surprised why MySQL in a single-line comment that looks like a double dash requires a space, tab or other control character after itself, although there should not be a mandatory control character. Agree, when you write a request and you need to quickly comment out a part of the code, it’s a very long time to put so many characters.

What I mean. In MySQL, we write this:

 -- SELECT 1 


(with a space before SELECT), and in other databases:

 --SELECT 1 


(without control character).

The solution was very simple. The fact is that if you write such a sloppy request

 UPDATE account SET credit=credit--1 


then the first dash will be interpreted as a minus, and the second -1, but not the comment. If after two dashes we place the control character, then only in this case we will receive a comment. Read more about this comment here.

As a result, most of the time in my work for single-line comments I use the hash mark (#), rather than a double dash with a control symbol :-)

UDP:
In the comments there are doubts about the expediency of rewriting requests from in to join. Below, my little benchmark.

Initial data:

 mysql> SELECT @@version; +-------------------------+ | @@version | +-------------------------+ | 5.5.32-0ubuntu0.12.04.1 | +-------------------------+ 1 row in set (0.00 sec) mysql> SHOW CREATE TABLE t11\G *************************** 1. row *************************** Table: t11 Create Table: CREATE TABLE `t11` ( `id` int(10) unsigned NOT NULL DEFAULT '0', `t11` varchar(45) NOT NULL, PRIMARY KEY (`id`), KEY `t11` (`t11`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec) mysql> SELECT COUNT(*) FROM t11; +----------+ | COUNT(*) | +----------+ | 977360 | +----------+ 1 row in set (0.03 sec) mysql> SHOW CREATE TABLE t22\G *************************** 1. row *************************** Table: t22 Create Table: CREATE TABLE `t22` ( `id` int(10) unsigned NOT NULL DEFAULT '0', `t22` varchar(45) NOT NULL, PRIMARY KEY (`id`), KEY `t22` (`t22`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec) mysql> SELECT COUNT(*) FROM t22; +----------+ | COUNT(*) | +----------+ | 962078 | +----------+ 1 row in set (0.04 sec) 


JOIN vs IN vs EXISTS

When working with the JOIN index (7.84 seconds), it loses much in comparison with IN (1.74 seconds) and EXISTS (2.44 seconds).
Below is an example, when the columns t11 and t22 without an index:
 mysql> SELECT SQL_NO_CACHE * -> FROM t11 -> WHERE t11 IN (SELECT t22 FROM t22) -> LIMIT 1000; +------+-------+ | id | t11 | +------+-------+ | 2 | 5718 | | ........ | | 1001 | 54192 | +------+-------+ 1000 rows in set (13.59 sec) mysql> SELECT SQL_NO_CACHE * -> FROM t11 c -> WHERE EXISTS ( -> SELECT 1 FROM t22 cl WHERE c.t11 = cl.t22 -> ) -> LIMIT 1000; +------+-------+ | id | t11 | +------+-------+ | 2 | 5718 | | ........ | | 1001 | 54192 | +------+-------+ 1000 rows in set (13.94 sec) mysql> SELECT DISTINCT SQL_NO_CACHE c.* -> FROM t11 c -> INNER JOIN t22 cl ON c.t11 = cl.t22 -> LIMIT 1000; +-------+-------+ | id | t11 | +-------+-------+ | 25 | 74376 | | ......... | | 34359 | 62666 | +-------+-------+ 1000 rows in set (3.36 sec) 

Put the limit, so long not to wait for an answer. It does not affect the result.

OUTER JOIN vs NOT IN vs NOT EXISTS

Without an index, these queries in MySQL 5.5 are processed in approximately the same time.
Below are examples using an index:
 mysql> SELECT SQL_NO_CACHE * -> FROM t11 -> WHERE t11 NOT IN (SELECT t22 FROM t22) -> LIMIT 1000; +--------+---------------------+ | id | t11 | +--------+---------------------+ | 1 | 0.23670763544431667 | | ........................ | | 891610 | 97811 | +--------+---------------------+ 91 rows in set (1.36 sec) mysql> SELECT SQL_NO_CACHE * -> FROM t11 c -> WHERE NOT EXISTS ( -> SELECT 1 FROM t22 cl WHERE c.t11 = cl.t22 -> ) -> LIMIT 1000; +--------+---------------------+ | id | t11 | +--------+---------------------+ | 1 | 0.23670763544431667 | | ........................ | | 891610 | 97811 | +--------+---------------------+ 91 rows in set (1.87 sec) mysql> SELECT SQL_NO_CACHE c.* -> FROM t11 c -> LEFT JOIN t22 cl ON c.t11 = cl.t22 -> WHERE cl.t22 IS NULL -> LIMIT 1000; +--------+---------------------+ | id | t11 | +--------+---------------------+ | 1 | 0.23670763544431667 | | ........................ | | 891610 | 97811 | +--------+---------------------+ 91 rows in set (1.27 sec) 


As a result, the result depends on the database version and the source data!

Source: https://habr.com/ru/post/196692/


All Articles