📜 ⬆️ ⬇️

When is MIN (DATE)! = MIN (DATE)?

My friend Greg Youngblood inspired me to write this post, who showed me an interesting mystery in MySQL last week.

He ran Percona Server 5.5.21 with approximately the following table structure:

CREATE TABLE foo ( id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, uid INT UNSIGNED NOT NULL, update_time DATETIME NOT NULL, .... INDEX `uid` (uid, update_time), INDEX `bar` (some_other_columns) .... ) ENGINE=InnoDB; 

When he executed the following query:

 SELECT MIN(update_time) FROM foo WHERE update_time IS NOT NULL AND update_time <> '0000-00-00 00:00:00'; 

The result is returned 2012-06-22 10:28:16. However, when he executed a slightly different query:
')
 SELECT MIN(t.update_time) FROM (SELECT uid, MIN(update_time) AS "update_time" FROM foo WHERE update_time IS NOT NULL AND update_time <> '0000-00-00 00:00:00' GROUP BY 1) t; 

The answer that came back was completely different: 2011-08-22 11:27:27. This is the correct answer, the date is not from 2012. We came up with a few ideas on how to calculate the cause, and one of the suggestions was to force MySQL to use a different index. Imagine our surprise when we tried FORCE INDEX on bar or IGNORE INDEX (kid) and received a completely different answer: 2012-06-21 20:36:35.
So, when we tried the "obvious" option:

 SELECT update_time FROM foo ORDER BY update_time ORDER BY update_time LIMIT 3; 

And the result?

 0024-06-22 01:34:25 2011-08-22 11:27:27 2011-08-23 11:31:40 

Aha Curves data! We assumed that the MIN () function did some kind of conversion to unix_timestamp / integer or ctime; and indeed, the execution of SELECT MIN (UNIX_TIMESTAMP (update_time)) .... returns zero, while using CAST () or CONVERT () on the update_time field first (so that it will be explicitly treated as a string) returns the result 0024-06-22.

MySQL 5.5 documentation states that the supported range of dates in the DATETIME type is from 1000-01-01 00:00:00 to 9999-12-31 23:59:59. The sequel says that “For DATE and DATETIME,“ supported ”means that the values ​​can work less, but there is no guarantee.” It turned out that the application brought a strange datetime to the database, and due to the fact that the value passed the format check and fell into the database. Oops ...

What do we understand from this?

Unfortunately, the sql_mode task will not help you here. Although 0024-06-21 is technically outside the supported range for DATE / DATETIME, neither TRADITIONAL, STRICT_ALL_TABLES, or STRICT_TRANS_TABLES even throws out warning. Is it a bug or a feature? The choice is yours.
* When the documentation says “there is no guarantee that it will work,” it is better not to risk it.
* Even if a piece of data fits the expected format, it does not mean that it is the correct value. So perhaps this is the most important lesson of all this: Always, always check the range of entered data! One format check is not enough.
And the final moment - I managed to repeat the same situation with Percona Server 5.5.25a, ​​but in my case, this behavior was even weirder. I created a table with a similar structure:

 CREATE TABLE `foo` ( `i` int(11) NOT NULL AUTO_INCREMENT, `update_date` datetime NOT NULL, PRIMARY KEY (`i`), KEY `i` (`i`,`update_date`) ) ENGINE=InnoDB 

and then inserted several dummy lines with the date 0024-06-21. Then I dumped several thousand random datetime into this table using a simple Perl script. At first glance, everything looked as if I could not repeat my situation - executing the “SELECT MIN (update_time) FROM foo” for my table gave the correct answer (1058-11-06 00:00:00), which was the minimum date with a larger value than the minimum supported value 1000-01-01 00:00:00. But when I decided to move the dummy strings by changing their PK, this happened:

  : (root@localhost) [test]> select * from foo order by update_date limit 5; +-------+---------------------+ | i | update_date | +-------+---------------------+ | 1 | 0024-06-21 10:35:55 | | 2 | 0024-06-21 10:35:55 | | 3 | 0024-06-21 10:35:55 | | 4 | 0024-06-21 10:35:55 | | 1159 | 1058-11-06 00:00:00 | +-------+---------------------+ (root@localhost) [test]> select min(update_date) from foo; +---------------------+ | min(update_date) | +---------------------+ | 1058-11-06 00:00:00 | +---------------------+ (root@localhost) [test]> update foo SET i=i+100000 where i<5; Query OK, 4 rows affected (0.00 sec) Rows matched: 4 Changed: 4 Warnings: 0 

  : (root@localhost) [test]> select min(update_date) from foo; +---------------------+ | min(update_date) | +---------------------+ | 2024-06-21 10:35:55 | +---------------------+ (root@localhost) [test]> select update_date FROM foo order by update_date LIMIT 5; +---------------------+ | update_date | +---------------------+ | 0024-06-21 10:35:55 | | 0024-06-21 10:35:55 | | 0024-06-21 10:35:55 | | 0024-06-21 10:35:55 | | 1058-11-06 00:00:00 | +---------------------+ 

Very strange. On the one hand, it looks like MySQL uses date filtering for two-digit dates (this explains 2024-06-21), on the other hand, it had no effect on changing DATETIME data, and now definitely returns the wrong answer.

Conclusion: always check the input data!

Original article: Ernie Souhrada. When is MIN (DATE)! = MIN (DATE)?

Source: https://habr.com/ru/post/163845/


All Articles