Starting with version 5.1, MySQL gained a genuinely useful feature: partitions. Naturally, most database developers did not hesitate to use it. After a couple of years of work, I have finally reaped the full fruits of the flawed implementation of this technology by the MySQL AB experts...
A small lyrical digression.
On October 5, the organizers of the Highload++ conference invited Peter Zaitsev to conduct an intensive one-day training for MySQL developers. The man has a solid reputation, and the description clearly promised a course of advanced difficulty:
...
The proposed course is an intensive one-day training. Attendees will be provided with personal computers running Linux. Time to complete the assignments will be limited.
Oh wow, I thought, something new: real assignments, a review of the most efficient architectures... I have to go! Finally, I thought, I will learn from a master. Imagine my surprise when MySQL only came up in the second half of the day, there were no assignments, and the promised Linux machines were never handed out either. Fine, I am not too proud to just sit and listen. But when, after lunch, we reached the revelation that MySQL optimization can be achieved by studying query execution plans, and a discussion broke out in the hall about how to read those plans correctly, I could not stand it any longer. At the next coffee break I approached Peter and asked whether the remaining two hours would contain any genuinely advanced material for database developers, and received a wonderful answer, something like:
Advanced MySQL optimization courses simply do not exist in nature. The reason for this misunderstanding is that the bulk of such courses are designed for the States, where development is done by "Indians" who have no need of in-depth knowledge for that kind of work.
So that's how it is, then.
I also very much liked Peter's point that MySQL is one big pile of crap code, suboptimal for the simple reason that it was written extremely crookedly, perhaps with the best of intentions, but it turned out the way it turned out. No matter how carefully you tune your database server, sooner or later you will hit performance degradation caused precisely by the bugs and architectural miscalculations scattered throughout the source code. The only practical way to escape these problems is to use large amounts of RAM (512 GB+) and SSDs (as was shown at the conference, 2 TB drives rated for 10,000 rewrite cycles suit this best; at realistic write speeds it will take you at least 3 years to wear them out).
Of course, I am no proponent of solving problems by throwing iron at them instead of using a more efficient architecture, but my problems with partitions have shown that his words were right.
The beginning
So, you have decided to use partitioning in your project. First, look at the code that opens partitions and ask yourself: do you really need this?
is_part = strstr(norm_name, "#P#");
retry:
	ib_table = dict_table_get(norm_name, TRUE);

	if (NULL == ib_table) {
		if (is_part && retries < 10) {
			++retries;
			os_thread_sleep(100000);
			goto retry;
		}
BLOODY HELL!!!
Well of course, surely it will open on the tenth attempt!
The main use cases you will most likely want to cover are:
- archiving (for example, in 99% of cases you only need the last month of data, so it makes sense to use monthly partitioning and push old data off to an archive every month; see the DDL sketch after this list)
- data warehousing (you have decided to build an OLAP system on top of MySQL, and in place of the missing OLAP cubes you use a pile of pre-aggregated tables with hourly, daily, monthly, and annual partitions)
- other cases (I will not consider them, as they are more exotic, and by the most important classification criterion - the one that makes you forget about partitions in MySQL before version 5.6 - they fall into the first two categories anyway)
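For the first case, here is a minimal sketch of what monthly partitioning with archiving might look like; the sales table and partition names are mine, purely for illustration:

create table sales (
    id        bigint not null auto_increment,
    sale_date date   not null,
    amount    decimal(10, 2),
    -- MySQL requires every unique key, including the primary key,
    -- to contain all columns of the partitioning expression
    primary key (id, sale_date)
) engine = innodb
partition by range (to_days(sale_date)) (
    partition p201209 values less than (to_days('2012-10-01')),
    partition p201210 values less than (to_days('2012-11-01')),
    partition p_max   values less than maxvalue
);

-- the monthly "archiving": dropping the oldest partition is a cheap
-- metadata operation compared to a DELETE over millions of rows
alter table sales drop partition p201209;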
I split the cases by just one criterion: the number of partitions. In the first case there are rarely more than 100 of them; in the second, 10,000+.
As you know, MySQL can work with a VERY large number of tables; 100,000 is roughly the threshold at which you should start thinking about changing the architecture. By that logic, tens of thousands of partitions should not be a problem for it. Unfortunately, this is not the case.
Earlier I worked with a DBMS that keeps all of its metadata in RAM, so dictionary queries cost it nothing. For that reason, when implementing the partition maintenance utilities, I had no qualms about querying INFORMATION_SCHEMA.PARTITIONS. It was a fatal mistake... How can we find out exactly which partitions a table has?
- maintain the list yourself
- run a couple of queries against the data dictionary (for example, for daily partitions):
select count(1)
  into v_av_partitions_count
  from information_schema.partitions
 where table_name = i_table_name
   and substring(partition_name, length(i_table_name) + 2) = cast(date_format(i_data_date, '%Y%m%d') as signed)
   and table_schema = database();

select min(partition_name), min(partition_description)
  into v_future_partition, v_future_partition_desc
  from information_schema.partitions
 where table_name = i_table_name
   and substring(partition_name, length(i_table_name) + 2) > cast(date_format(i_data_date, '%Y%m%d') as signed)
   and table_schema = database();
and use the results to decide whether to create a new partition or reorganize an existing one.
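For completeness, the first option could look something like this; a sketch with a hypothetical registry table, filled in by the same routine that runs ALTER TABLE ... ADD PARTITION, so the list can be read without ever touching INFORMATION_SCHEMA:

create table partition_registry (
    table_name     varchar(64) not null,
    partition_name varchar(64) not null,
    high_bound     date        not null,   -- upper boundary of the partition
    primary key (table_name, partition_name)
) engine = innodb;

insert into partition_registry values ('sales', 'p201211', '2012-12-01');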
I do not think the downsides of the first option need explaining; let me instead tell you why you must never use the second.
Houston, we have a problem
My problems started when I loaded the database structure (about 7,000 empty partitions) and scheduled the partition maintenance on the event scheduler to run every half hour.
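The scheduling itself looked roughly like this (a sketch; maintain_partitions is a hypothetical stored procedure built around the two dictionary queries shown above):

-- requires: set global event_scheduler = on;
create event ev_partition_maintenance
    on schedule every 30 minute
    do call maintain_partitions();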
As soon as zero hour arrived and the two queries against INFORMATION_SCHEMA.PARTITIONS fired, my laptop took off like a jet, producing the maximum possible disk load for 3-5 minutes. Strange, I thought: the database is empty. After some digging I realized I needed to understand exactly how queries against this unfortunate table are executed. PERFORMANCE_SCHEMA assured me that I was reading 7,000 files, and since my configuration is fairly stripped down, those reads were going straight to disk. Opening the source, I understood the actual problem: the data dictionary is not kept in RAM. When you query the table describing partitions, the following operations take place:
- no matter what exactly you requested, the descriptions of all tables together with all their partitions are read from disk
- if these are InnoDB tables and there are no statistics on them yet (and to render this view we need approximate statistics on the number of records in each partition, index cardinality, and the like), we read not only the partition descriptions but also perform dynamic index sampling
- and that is still not all, because you most likely have the adaptive hash index enabled, which keeps some indexes in the form of a hash map for faster access
Thus, with a single query against the data dictionary, we:
- flush the table cache, which is shared by all threads
- flush the buffer pool, which is used by all sessions
- flush the adaptive hash index, whose reads and writes are guarded by a single lock to keep it consistent
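You can watch this happen on your own machine. Here is the sort of query I mean (a sketch, assuming performance_schema is enabled with file instrumentation on; partition data files carry the #P# marker in their names):

select file_name, count_read, sum_number_of_bytes_read
  from performance_schema.file_summary_by_instance
 where file_name like '%#P#%'
 order by count_read desc
 limit 10;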
Here, for example, is a gdb trace captured during a request involving 'TABLE_ONE':
Thread 6 (Thread 0x7fccff1be700 (LWP 2561)):
Having seen such a pitiful picture I was, of course, extremely upset, because after digging through the source code I realized that fixing this cheaply was simply not possible. But that was not all.
The farther into the forest, the more firewood
All the manuals teach us that MySQL has two schemes for optimizing work with partitioned tables:
- partition pruning: static binding of partitions at query parse time. Only predicates on the partitioning expression are analyzed; it works for BETWEEN, greater-than, less-than, and equality
- partition selection: dynamic binding at execution time; only equality predicates are used
I will not dwell on this in detail; it is a topic for a separate article. In short, the first case is when the partitions can be determined from the query text itself, and the second is when a partition can only be computed during execution, based on data from another table. As you understand, the second is better avoided: if the plan changes, you will sweep through all the partitions. For this reason, wherever possible I used the first method of selecting partitions as the fastest and most reliable, naively believing that only the partitions named by the query's conditions would be read...
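By the way, you can check what the optimizer binds statically with EXPLAIN PARTITIONS (available since 5.1); a sketch against the hypothetical sales table from earlier:

explain partitions
select *
  from sales
 where sale_date >= '2012-10-01'
   and sale_date <  '2012-11-01';
-- the "partitions" column of the output shows which partitions
-- the optimizer intends to touch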
So where is the ambush here? The ambush is that partition pruning is performed after the tables are opened - all the tables participating in the query. It does not matter what you run: SELECT, INSERT, UPDATE. MySQL will perform the following manipulations:
- open all partitions of all tables participating in the query
- place a read lock on each partition according to the rules of the engine used
- if it is InnoDB, calculate statistics on the table and its indexes, not forgetting to take an X lock - how could it do without one
- and only after all this will partition pruning be performed
Don't believe me? Insert this code
char* is_part = NULL;

is_part = strstr(table->name, "#P#");
if (is_part) {
	ut_print_timestamp(stderr);
	fprintf(stderr, " TRACE dict_update_statistics %s %s\n",
		table->name,
		only_calc_if_missing_stats ? "true" : "false");
}
at the beginning of the dict_update_statistics procedure, which calculates the statistics, and the results will surprise you. On top of that, with default settings the recalculated statistics are not cached by the DBMS: the work is redone every time you try to fetch metadata.
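In other words, with innodb_stats_on_metadata at its default of ON, any of these innocent-looking statements re-samples the indexes of every partition (again using the hypothetical sales table):

show table status like 'sales';
select * from information_schema.tables
 where table_name = 'sales';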
In light of all the above, the closing of bug #19588 is welcome news.
A spoonful of honey in a barrel of tar
MySQL 5.6 is due out soon. It promises a number of useful things, such as:
- a static dictionary for INFORMATION_SCHEMA (I do not know whether the partitions view will be covered, but I sincerely hope so)
- the select * from table partition (p1) construction, with which partition pruning will happen before tables are opened and locks are taken
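If the announced syntax ships as promised, explicit partition selection should look something like this (a sketch; the table is again hypothetical):

select id, amount
  from sales partition (p201210)
 where amount > 100;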
Conclusions, or how to live with it
- if your architecture calls for many tables with 1,000+ partitions, better think again and revise the architecture; the partitioning engine in MySQL has still not left the sandbox
- set innodb_stats_on_metadata = OFF; this keeps partition statistics from being re-read on every metadata request (it will still happen, just not as often; see the sketch after this list)
- raise ulimit -n to the maximum possible number of partitions in your system, multiplied by 2
- raise the table_open_cache parameter to the maximum possible number of partitions in your system, multiplied by 2; this keeps the table cache from being flushed
- if you work with InnoDB tables, consider enabling the myisam_use_mmap parameter: MyISAM caches only indexes and leaves the caching of data files to the operating system, so under increased load your dictionary tables will be pushed out to disk, which is of course undesirable
- do not query the INFORMATION_SCHEMA.PARTITIONS table at all - although, as you now understand, even that will not save you from the slowdowns and locks that accompany every DML operation across all partitions of the tables involved.
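The tunables from this list can be set roughly as follows (a sketch; the values are illustrative, pick your own from the partition counts in your system):

-- all three are dynamic global variables
set global innodb_stats_on_metadata = off;
set global table_open_cache         = 20000;  -- ~2x the maximum partition count
set global myisam_use_mmap          = on;
-- ulimit -n (max open file descriptors) is raised at the OS level,
-- likewise to ~2x the maximum partition count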