Starting with version 5.1, MySQL gained a genuinely useful feature: partitions. Naturally, most database developers did not hesitate to use it. After a couple of years of work, I have finally reaped the full fruits of the flawed implementation of this technology by the MySQL AB experts...
A small lyrical digression.
On October 5, the organizers of the Highload++ conference invited Peter Zaitsev to conduct an intensive one-day training for MySQL developers. The man has a solid reputation, and the description clearly promised a course of advanced difficulty:
...
The proposed course is an intensive one-day training. Attendees will be provided with personal computers running Linux. Time to complete the assignments will be limited.
Oh wow, I thought, something new: real assignments, a review of the most efficient architectures... I have to go! Finally, I thought, I will learn from a master. Imagine my surprise when MySQL only came up in the second half of the day, there were no assignments, and the promised Linux machines were never handed out either. Fine, I am not too proud to just sit and listen. But when, after lunch, we reached the revelation that MySQL optimization can be achieved by studying query execution plans, and a discussion broke out in the hall about how to read those plans correctly, I could not stand it any longer. At the next coffee break I approached Peter and asked whether the remaining two hours would contain any genuinely advanced material for database developers, and received a wonderful answer, something like:
Advanced MySQL optimization courses simply do not exist in nature. The reason for this misunderstanding is that the bulk of such courses are designed for the States, where development is done by "Indians" who have no need of in-depth knowledge for that kind of work.
So that's how it is, then.
I also very much liked Peter's point that MySQL is one big pile of crap code, suboptimal for the simple reason that it was written extremely crookedly, perhaps with the best of intentions, but it turned out the way it turned out. No matter how carefully you tune your database server, sooner or later you will hit performance degradation caused precisely by the bugs and architectural miscalculations scattered throughout the source code. The only practical way to escape these problems is to use large amounts of RAM (512 GB+) and SSDs (as was shown at the conference, 2 TB drives rated for 10,000 rewrite cycles suit this best; at realistic write speeds it will take you at least 3 years to wear them out).
Of course, I am no proponent of solving problems by throwing iron at them instead of using a more efficient architecture, but my problems with partitions have shown that his words were right.
The beginning
So, you have decided to use partitioning in your project. First, look at the code that opens partitions and ask yourself: do you really need this?
is_part = strstr(norm_name, "#P#");
retry:
	ib_table = dict_table_get(norm_name, TRUE);

	if (NULL == ib_table) {
		if (is_part && retries < 10) {
			++retries;
			os_thread_sleep(100000);
			goto retry;
		}
BLOODY HELL!!!
Well of course, surely it will open on the tenth attempt!
The main use cases you will most likely want to cover are:
- archiving (for example, in 99% of cases you only need the last month of data, so it makes sense to use monthly partitioning and push old data off to an archive every month; see the DDL sketch after this list)
- data warehousing (you have decided to build an OLAP system on top of MySQL, and in place of the missing OLAP cubes you use a pile of pre-aggregated tables with hourly, daily, monthly, and annual partitions)
- other cases (I will not consider them, as they are more exotic, and by the most important classification criterion - the one that makes you forget about partitions in MySQL before version 5.6 - they fall into the first two categories anyway)
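For the first case, here is a minimal sketch of what monthly partitioning with archiving might look like; the sales table and partition names are mine, purely for illustration:

create table sales (
    id        bigint not null auto_increment,
    sale_date date   not null,
    amount    decimal(10, 2),
    -- MySQL requires every unique key, including the primary key,
    -- to contain all columns of the partitioning expression
    primary key (id, sale_date)
) engine = innodb
partition by range (to_days(sale_date)) (
    partition p201209 values less than (to_days('2012-10-01')),
    partition p201210 values less than (to_days('2012-11-01')),
    partition p_max   values less than maxvalue
);

-- the monthly "archiving": dropping the oldest partition is a cheap
-- metadata operation compared to a DELETE over millions of rows
alter table sales drop partition p201209;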
I split the cases by just one criterion: the number of partitions. In the first case there are rarely more than 100 of them; in the second, 10,000+.
As you know, MySQL can work with a VERY large number of tables; 100,000 is roughly the threshold at which you should start thinking about changing the architecture. By that logic, tens of thousands of partitions should not be a problem for it. Unfortunately, this is not the case.
Earlier I worked with a DBMS that keeps all of its metadata in RAM, so dictionary queries cost it nothing. For that reason, when implementing the partition maintenance utilities, I had no qualms about querying INFORMATION_SCHEMA.PARTITIONS. It was a fatal mistake... How can we find out exactly which partitions a table has?
- maintain the list yourself
- run a couple of queries against the data dictionary (for example, for daily partitions):
select count(1)
  into v_av_partitions_count
  from information_schema.partitions
 where table_name = i_table_name
   and substring(partition_name, length(i_table_name) + 2) = cast(date_format(i_data_date, '%Y%m%d') as signed)
   and table_schema = database();

select min(partition_name), min(partition_description)
  into v_future_partition, v_future_partition_desc
  from information_schema.partitions
 where table_name = i_table_name
   and substring(partition_name, length(i_table_name) + 2) > cast(date_format(i_data_date, '%Y%m%d') as signed)
   and table_schema = database();
and use the results to decide whether to create a new partition or reorganize an existing one.
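For completeness, the first option could look something like this; a sketch with a hypothetical registry table, filled in by the same routine that runs ALTER TABLE ... ADD PARTITION, so the list can be read without ever touching INFORMATION_SCHEMA:

create table partition_registry (
    table_name     varchar(64) not null,
    partition_name varchar(64) not null,
    high_bound     date        not null,   -- upper boundary of the partition
    primary key (table_name, partition_name)
) engine = innodb;

insert into partition_registry values ('sales', 'p201211', '2012-12-01');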
I do not think the downsides of the first option need explaining; let me instead tell you why you must never use the second.
Houston, we have a problem
My problems started when I loaded the database structure (about 7,000 empty partitions) and scheduled the partition maintenance on the event scheduler to run every half hour.
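The scheduling itself looked roughly like this (a sketch; maintain_partitions is a hypothetical stored procedure built around the two dictionary queries shown above):

-- requires: set global event_scheduler = on;
create event ev_partition_maintenance
    on schedule every 30 minute
    do call maintain_partitions();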
As soon as zero hour arrived and the two queries against INFORMATION_SCHEMA.PARTITIONS fired, my laptop took off like a jet, producing the maximum possible disk load for 3-5 minutes. Strange, I thought: the database is empty. After some digging I realized I needed to understand exactly how queries against this unfortunate table are executed. PERFORMANCE_SCHEMA assured me that I was reading 7,000 files, and since my configuration is fairly stripped down, those reads were going straight to disk. Opening the source, I understood the actual problem: the data dictionary is not kept in RAM. When you query the table describing partitions, the following operations take place:
- no matter what exactly you requested, the descriptions of all tables together with all their partitions are read from disk
- if these are InnoDB tables and there are no statistics on them yet (and to render this view we need approximate statistics on the number of records in each partition, index cardinality, and the like), we read not only the partition descriptions but also perform dynamic index sampling
- and that is still not all, because you most likely have the adaptive hash index enabled, which keeps some indexes in the form of a hash map for faster access
Thus, with a single query against the data dictionary, we:
- flush the table cache, which is shared by all threads
- flush the buffer pool, which is used by all sessions
- flush the adaptive hash index, whose reads and writes are guarded by a single lock to keep it consistent
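You can watch this happen on your own machine. Here is the sort of query I mean (a sketch, assuming performance_schema is enabled with file instrumentation on; partition data files carry the #P# marker in their names):

select file_name, count_read, sum_number_of_bytes_read
  from performance_schema.file_summary_by_instance
 where file_name like '%#P#%'
 order by count_read desc
 limit 10;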
Here, for example, is a gdb trace captured during a request involving 'TABLE_ONE':
Thread 6 (Thread 0x7fccff1be700 (LWP 2561)):
Having seen such a pitiful picture I was, of course, extremely upset, because after digging through the source code I realized that fixing this cheaply was simply not possible. But that was not all.
The farther into the forest, the more firewood
All the manuals teach us that MySQL has two schemes for optimizing work with partitioned tables:
- partition pruning: static binding of partitions at query parse time. Only predicates on the partitioning expression are analyzed; it works for BETWEEN, greater-than, less-than, and equality
- partition selection: dynamic binding at execution time; only equality predicates are used
I will not dwell on this in detail; it is a topic for a separate article. In short, the first case is when the partitions can be determined from the query text itself, and the second is when a partition can only be computed during execution, based on data from another table. As you understand, the second is better avoided: if the plan changes, you will sweep through all the partitions. For this reason, wherever possible I used the first method of selecting partitions as the fastest and most reliable, naively believing that only the partitions named by the query's conditions would be read...
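By the way, you can check what the optimizer binds statically with EXPLAIN PARTITIONS (available since 5.1); a sketch against the hypothetical sales table from earlier:

explain partitions
select *
  from sales
 where sale_date >= '2012-10-01'
   and sale_date <  '2012-11-01';
-- the "partitions" column of the output shows which partitions
-- the optimizer intends to touch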
So where is the ambush here? The ambush is that partition pruning is performed after the tables are opened - all the tables participating in the query. It does not matter what you run: SELECT, INSERT, UPDATE. MySQL will perform the following manipulations:
- open all partitions of all tables participating in the query
- place a read lock on each partition according to the rules of the engine used
- if it is InnoDB, calculate statistics on the table and its indexes, not forgetting to take an X lock - how could it do without one
- and only after all this will partition pruning be performed
Don't believe me? Insert this code
char* is_part = NULL;

is_part = strstr(table->name, "#P#");
if (is_part) {
	ut_print_timestamp(stderr);
	fprintf(stderr, " TRACE dict_update_statistics %s %s\n",
		table->name,
		only_calc_if_missing_stats ? "true" : "false");
}
at the beginning of the dict_update_statistics procedure, which calculates the statistics, and the results will surprise you. On top of that, with default settings the recalculated statistics are not cached by the DBMS: the work is redone every time you try to fetch metadata.
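In other words, with innodb_stats_on_metadata at its default of ON, any of these innocent-looking statements re-samples the indexes of every partition (again using the hypothetical sales table):

show table status like 'sales';
select * from information_schema.tables
 where table_name = 'sales';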
In light of all the above, the closing of bug #19588 is welcome news.
A spoonful of honey in a barrel of tar
MySQL 5.6 is due out soon. It promises a number of useful things, such as:
- a static dictionary for INFORMATION_SCHEMA (I do not know whether the partitions view will be covered, but I sincerely hope so)
- the select * from table partition (p1) construction, with which partition pruning will happen before tables are opened and locks are taken
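If the announced syntax ships as promised, explicit partition selection should look something like this (a sketch; the table is again hypothetical):

select id, amount
  from sales partition (p201210)
 where amount > 100;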
Conclusions, or how to live with it
- if your architecture calls for many tables with 1,000+ partitions, better think again and revise the architecture; the partitioning engine in MySQL has still not left the sandbox
- set innodb_stats_on_metadata = OFF; this keeps partition statistics from being re-read on every metadata request (it will still happen, just not as often; see the sketch after this list)
- raise ulimit -n to the maximum possible number of partitions in your system, multiplied by 2
- raise the table_open_cache parameter to the maximum possible number of partitions in your system, multiplied by 2; this keeps the table cache from being flushed
- if you work with InnoDB tables, consider enabling the myisam_use_mmap parameter: MyISAM caches only indexes and leaves the caching of data files to the operating system, so under increased load your dictionary tables will be pushed out to disk, which is of course undesirable
- do not query the INFORMATION_SCHEMA.PARTITIONS table at all - although, as you now understand, even that will not save you from the slowdowns and locks that accompany every DML operation across all partitions of the tables involved.
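The tunables from this list can be set roughly as follows (a sketch; the values are illustrative, pick your own from the partition counts in your system):

-- all three are dynamic global variables
set global innodb_stats_on_metadata = off;
set global table_open_cache         = 20000;  -- ~2x the maximum partition count
set global myisam_use_mmap          = on;
-- ulimit -n (max open file descriptors) is raised at the OS level,
-- likewise to ~2x the maximum partition count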