📜 ⬆️ ⬇️

Some interesting MySQL features

In the not too distant past, I had to dig a little bit into the MySQL source code, and figure out some aspects of its work. In the course of working with a spatula, and experiments, I came across some very interesting features, some of which are just fun, and in the case of some, it is very interesting to understand what the programmer was guided by, who made the decision to do this.

Let's start with such an interesting type as ENUM.

mysql> CREATE TABLE enums(a ENUM('c', 'a', 'b'), b INT, KEY(a)); Query OK, 0 rows affected (0.36 sec) mysql> INSERT INTO enums VALUES('a', 1), ('b', 1), ('c', 1); Query OK, 3 rows affected (0.05 sec) Records: 3 Duplicates: 0 Warnings: 0 

')
So, we have a table, it has two columns. In the first, a , type ENUM, in the second, b , INT. In the table there are three lines, for all three, the value of b is 1. It is interesting, what are the minimum and maximum elements in column a ?

 mysql> SELECT MIN(a), MAX(a) FROM enums; +--------+--------+ | MIN(a) | MAX(a) | +--------+--------+ | c | b | +--------+--------+ 1 row in set (0.00 sec) 


It seems strange, it would be reasonable if the smallest were 'a', and the biggest - 'c'.
And what if you select the minimum and maximum only among those lines where b = 1? That is, among all the lines?

 mysql> SELECT MIN(a), MAX(a) FROM enums WHERE b = 1; +--------+--------+ | MIN(a) | MAX(a) | +--------+--------+ | a | c | +--------+--------+ 1 row in set (0.00 sec) 


This is how we forced MySQL to change its opinion on how to compare fields in ENUM, simply by adding a predicate.
The answer to this behavior is that in the first case MySQL uses an index, and in the second it does not. This, of course, does not explain why MySQL compares ENUMs differently for sorting in the index, and in the usual comparison.

The second example is simpler and more concise:

 mysql> (SELECT * FROM moo LIMIT 1) LIMIT 2; +------+ | a | +------+ | 1 | | 2 | +------+ 2 rows in set (0.00 sec) 


When I showed this query to my colleague who is developing the SQL parser, his question was not “why this query returns two lines”, but “how to write the SQL parser so that such a query is valid, without writing a rule specifically permitting such a request. "

Interestingly, not every SELECT in brackets will work, in particular, UNION in brackets is a syntax error:

 mysql> (SELECT * FROM moo UNION ALL SELECT * FROM hru) LIMIT 2; ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'UNION ALL SELECT * FROM hru) LIMIT 2' at line 1 


Some more interesting examples under the cut

In general, it is not necessary to go far with an example of strange behavior with UNION and LIMIT:

 mysql> -> SELECT 1 FROM moo LIMIT 1 -> UNION ALL -> SELECT 1 FROM hru LIMIT 1; +---+ | 1 | +---+ | 1 | +---+ 1 row in set (0.00 sec) 


Suddenly, only one row returned, although both tables are not empty. Because the second LIMIT belongs to the entire request, not just the right side of the UNION.

Here we need to talk about such things as the shift-reduce conflict. In modern open source databases, the parser is often written in bison. Such a parser is a so-called L1 parser, which means that the parser must understand the purpose of the next token, looking no further than one token ahead. For example, in the query above, looking at the word LIMIT, the parser cannot understand whether this LIMIT belongs to the second request, or to the whole UNION. When the rules are written in such a way that situations are possible in which it is impossible to understand the purpose of the token by looking only at the next token, this is called a shift-reduce conflict. In this case, the parser will choose a solution based on a specific set of rules. This is very bad, because it leads to the fact that quite normal queries lead to errors. What if in the previous query I want to make LIMIT to both the second SELECT and UNION?

 mysql> SELECT 1 FROM moo -> UNION ALL -> SELECT 1 FROM hru LIMIT 1 -> LIMIT 2; ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'LIMIT 2' at line 4 


This cannot be done because of the shift-reduce conflict. Looking at the first LIMIT parser does not yet know that there will be a second ahead, and mistakenly believes that the first limit applies to the entire request.
In PostgreSQL, there is no shift-reduce conflicts at all. Specifically, this situation is resolved there due to the fact that LIMIT can only be in UNION, but not in the SELECTs that it unites.
In MySQL, there are more than 160 such conflicts. This is amazing, because it means that there are 160 places where the parser may not correctly understand what is wanted of it.

A good example of such a conflict is connections. As you know, MySQL supports CROSS JOINs, which have no predicate, and INNER JOINs, which have a predicate. Generally speaking, CROSS JOIN and INNER JOIN are two different things, but in MySQL they are synonyms. That is, INNER JOIN may not have a predicate, while CROSS JOIN may have it. In particular, this leads to an interesting error:

 mysql> SELECT * FROM -> moo -> INNER JOIN -> hru -> INNER JOIN -> baa -> ON hru.a = baa.a -> ON moo.a = hru.a -> ; ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'ON moo.a = hru.a' at line 8 


At the moment when the parser sees the first ON, it still does not know that the second is waiting for it in front of it, and faces a choice: either it is ON for hru and baa, or hru and baa are connected without a predicate, and the current ON is ON for moo and the result of the connection of hru and baa. The parser erroneously chooses the second, which leads to an absolutely unnecessary error in this situation. If INNER JOIN is replaced with LEFT JOIN, for which the variant without a predicate does not exist, the query will be executed:

 mysql> SELECT * FROM -> moo -> LEFT JOIN -> hru -> LEFT JOIN -> baa -> ON hru.a = baa.a -> ON moo.a = hru.a -> ; +------+------+------+------+ | a | a | b | a | +------+------+------+------+ | 1 | 1 | 1 | 1 | | 2 | 2 | 2 | 2 | +------+------+------+------+ 2 rows in set (0.00 sec) 


The most interesting thing here is that in Bison you need to specify the number of shift-reduce conflicts directly in the code, otherwise the code will not compile. That is, at some point in time, one of the MySQL programmers made CROSS JOIN and INNER JOIN synonymous, which in itself makes no sense, after which he tried to compile the code, and he did not compile with a compilation error warning that the parser was no longer will be able to parse certain requests. To which the programmer, instead of doing everything right, found a constant indicating the number of errors in the parser, and incremented it.

Although, if we talk about what interesting decisions sometimes programmers in MySQL make, it is best to recall this story:
http://bugs.mysql.com/bug.php?id=27877
In it, one of the programmers deliberately made in the default collation for utf8 the letter 's' equal to the symbol 'Ăź'. This is very ironic, because the only language in which it would even remotely make sense is German, but it is this change that makes this collation completely inapplicable to the German language, because now the lines that are completely different from each other become are equal.
This change was not only useless, it also made the transition process from 5.0 to 5.1 for databases with utf8 strings in German very painful, because the unique indexes suddenly began to contain duplicate elements.

Speaking of collations, I still like this example very much:

Suppose we have a table with three rows with different collations:

 CREATE TABLE strings( swedish VARCHAR(100) COLLATE utf8_swedish_ci, spanish VARCHAR(100) COLLATE utf8_spanish_ci, bin VARCHAR(100) COLLATE utf8_bin ); 


Let's execute such request:

 mysql> SELECT * FROM strings WHERE swedish > bin AND swedish < spanish; ERROR 1267 (HY000): Illegal mix of collations (utf8_swedish_ci,IMPLICIT) and (utf8_spanish_ci,IMPLICIT) for operation '<' 


MySQL reasonably complains that it is impossible to compare swedish and spanish, because it is not clear how to compare them.
Let's write a completely identical query:

 mysql> SELECT * FROM strings WHERE swedish BETWEEN bin AND spanish; Empty set (0.00 sec) 


Suddenly, the request became valid, although it still has to compare the swedish and spanish string. And if I want the opposite?

 mysql> SELECT * FROM strings WHERE swedish BETWEEN spanish AND bin; ERROR 1270 (HY000): Illegal mix of collations (utf8_swedish_ci,IMPLICIT), (utf8_spanish_ci,IMPLICIT), (utf8_bin,IMPLICIT) for operation 'between' 


And vice versa is impossible.
If you delve into the code, you can understand that in MySQL BETWEEN is implemented in a completely strange way: if the first or second paramert have binary collation, then all strings will be compared as binary, and collation will be ignored. But if the third argument has a binary collation, then the same logic does not apply.

Speaking of how strangely the functions work in MySQL, let's finish this article with the most beautiful example.

 mysql> SELECT LEAST(9, 11); +--------------+ | LEAST(9, 11) | +--------------+ | 9 | +--------------+ 1 row in set (0.00 sec) 


There are no surprises

 mysql> SELECT LEAST("9", "11"); +------------------+ | LEAST("9", "11") | +------------------+ | 11 | +------------------+ 1 row in set (0.00 sec) 


This is also reasonable, line 11 is less than 9. What if 11 are added to 11?

 mysql> SELECT LEAST("9", "11") + LEAST("9", "11"); +-------------------------------------+ | LEAST("9", "11") + LEAST("9", "11") | +-------------------------------------+ | 18 | +-------------------------------------+ 1 row in set (0.00 sec) 


Of course, 18. It turns out, the function returns a different value depending on the context! Is it possible to force the same LEAST to return three different values ​​depending on the context? Turns out yes

 mysql> SELECT LEAST("9e1", "110"); +---------------------+ | LEAST("9e1", "110") | +---------------------+ | 110 | +---------------------+ 1 row in set (0.00 sec) mysql> SELECT LEAST("9e1", "110") + 0; +-------------------------+ | LEAST("9e1", "110") + 0 | +-------------------------+ | 90 | +-------------------------+ 1 row in set (0.00 sec) mysql> SELECT LEAST("9e1", "110") & -1; +--------------------------+ | LEAST("9e1", "110") & -1 | +--------------------------+ | 9 | +--------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> SHOW WARNINGS; +---------+------+------------------------------------------+ | Level | Code | Message | +---------+------+------------------------------------------+ | Warning | 1292 | Truncated incorrect INTEGER value: '9e1' | +---------+------+------------------------------------------+ 1 row in set (0.00 sec) 


Although here it must be said that in one case we met a warning. But we still managed to force the same operator with the same arguments to return three different values.

To make an even more amazing discovery, you need to get acquainted with the function NULLIF. This function takes two arguments, and returns NULL if they are equal, or the value of the first argument if they are not equal. Putting aside the question of why such a function exists at all, let's look at the result of the following two queries:

 mysql> SELECT NULLIF(LEAST("9", "11"), "11") + 0; +------------------------------------+ | NULLIF(LEAST("9", "11"), "11") + 0 | +------------------------------------+ | NULL | +------------------------------------+ 1 row in set (0.00 sec) mysql> SELECT NULLIF(LEAST("9", "11"), "12") + 0; +------------------------------------+ | NULLIF(LEAST("9", "11"), "12") + 0 | +------------------------------------+ | 9 | +------------------------------------+ 1 row in set (0.00 sec) 


In the first case, we got NULL, which means that LEAST is really equal to the string "11". In the second case, in the same query, with the same types of arguments, but with a different constant in NULLIF, we got the value 9! That is, with exactly the same types of parameters, in the first case LEAST returned “11”, and in the second - 9.
But you can do even better:

 mysql> SELECT NULLIF(LEAST("9", "11"), "9") + 0; +-----------------------------------+ | NULLIF(LEAST("9", "11"), "9") + 0 | +-----------------------------------+ | 9 | +-----------------------------------+ 1 row in set (0.00 sec) 


In this query, LEAST returned something different from the string "9" (otherwise NULLIF would return NULL), but it returned the string "9" at the same time!
If you look at the code, then this is really what happens. LEAST is performed twice, the first time comparing the parameters as strings, and the second time as integers.

Source: https://habr.com/ru/post/246339/


All Articles