
MySQL and partitioning

After reading the article, I remembered one very specific customer and his system for collecting event statistics. This is the 21st century and I know that ClickHouse exists, but the customer does not want to change the database (the reason is unknown to me; probably his religion does not allow it). So be it. I warned him several times about the consequences; once things get really slow, he will notice the problem.

The essence of the problem


But that is not the point. After reading the article, I remembered this project and decided to try introducing partitioning into a table with 7,000,000 rows. The production environment already holds far more rows than that.

The project also used sharding, which is, by and large, unnecessary here. In a system of this kind it makes no sense to shard, least of all by time (a separate table for each month).

There were actually several options for how to divide the data, and the most obvious one was chosen: add a dYm (date, year and month) column to the table; it appears as `dYd` in the listings below. Since the time is already written to the table, this was not difficult. There was one caveat: because the server does not have enough memory, I had to create a new table with the required field and import the data into it.

Creating a table with partitions (some fields removed):

CREATE TABLE `event_list_test` (
  `dYd` int(6) unsigned NOT NULL COMMENT '  ',
  `hash` varchar(13) NOT NULL COMMENT 'hash',
  `time` int(10) unsigned NOT NULL COMMENT 'timestamp',
  PRIMARY KEY (`time`,`dYd`,`hash`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY LIST (dYd) (
  PARTITION p201703 VALUES IN (201703) ENGINE = InnoDB,
  PARTITION p201704 VALUES IN (201704) ENGINE = InnoDB,
  PARTITION p201705 VALUES IN (201705) ENGINE = InnoDB,
  PARTITION p201706 VALUES IN (201706) ENGINE = InnoDB,
  PARTITION p201707 VALUES IN (201707) ENGINE = InnoDB,
  PARTITION p201708 VALUES IN (201708) ENGINE = InnoDB,
  PARTITION p201709 VALUES IN (201709) ENGINE = InnoDB,
  PARTITION p201710 VALUES IN (201710) ENGINE = InnoDB,
  PARTITION p201711 VALUES IN (201711) ENGINE = InnoDB,
  PARTITION p201712 VALUES IN (201712) ENGINE = InnoDB
);
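
The import into the new table mentioned earlier could look roughly like the sketch below. It assumes the old, unpartitioned table is called `event_list` (a hypothetical name; the real one is not given here) and that it has the same `hash` and `time` columns. `FROM_UNIXTIME` with the `%Y%m` format derives the year-month value from the timestamp in the session time zone.

```sql
-- Hypothetical source table name: event_list.
-- Copy rows into the partitioned table, deriving dYd (e.g. 201709) from `time`.
INSERT INTO `event_list_test` (`dYd`, `hash`, `time`)
SELECT CAST(FROM_UNIXTIME(`time`, '%Y%m') AS UNSIGNED), `hash`, `time`
FROM `event_list`
-- Only months that have a partition can be stored: a row whose dYd is not
-- listed in any partition is rejected, so the range is limited explicitly.
WHERE `time` >= UNIX_TIMESTAMP('2017-03-01 00:00:00')
  AND `time` <  UNIX_TIMESTAMP('2018-01-01 00:00:00');
```

On a server with little memory it may be safer to narrow the `WHERE` range and run the statement one month at a time.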

As described in the article I cited at the beginning, the advantages of such a division are obvious:

  1. Simplicity of administration: on a server with 1 GB of memory, you cannot add a column to a table with 7,000,000 rows, let alone an index.
  2. Initially, tables of this kind were sharded, but the obvious disadvantage of that is writing the SQL queries. Queries often had to span several months, and if aggregation was needed, it was real trouble.
  3. It is easier to add a partition to a table (especially if you put a task in cron that sends a notification to Telegram or by email, whichever is more convenient).
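
As an illustration of the third point, the monthly maintenance could be sketched as follows. The partition names follow the `pYYYYMM` pattern used in the table definition; the specific months are only examples, and the scheduling itself (cron, notifications) is left out.

```sql
-- Add the partition for an upcoming month (could be run from cron ahead of time).
ALTER TABLE `event_list_test`
  ADD PARTITION (PARTITION p201801 VALUES IN (201801));

-- Check which partitions exist; TABLE_ROWS is only an estimate for InnoDB.
SELECT PARTITION_NAME, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'event_list_test';

-- Drop a month that is no longer needed; this deletes its rows as well.
ALTER TABLE `event_list_test` DROP PARTITION p201703;
```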

Next, the queries need to be optimized, because with a poorly written query MySQL will scan all partitions, which adds unnecessary overhead.

After reading the article, the optimization suggests itself: the query should use a BETWEEN range search on the columns of the unique (primary) key, including the partitioning column. As a result, if the application replaces all its queries with ones like this:

 SELECT `time` FROM `event_list_test` WHERE (`time` BETWEEN 1505385901 AND 1506934784) AND (`dYd` BETWEEN 201709 AND 201710) LIMIT 10 

then we get a very good EXPLAIN:

select_type: SIMPLE
table: event_list_test
partitions: p201709,p201710
type: range
possible_keys: PRIMARY,time
key: PRIMARY
key_len: 8
ref: NULL
rows: 145875
filtered: 11.11
Extra: Using where
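
To see the effect of the `dYd` condition, the two variants can be compared with EXPLAIN. This is only a sketch: on MySQL 5.7 and later the partitions column is part of the regular EXPLAIN output, while older versions need `EXPLAIN PARTITIONS` instead.

```sql
-- No condition on dYd: MySQL cannot prune partitions, so the partitions
-- column is expected to list every partition of the table.
EXPLAIN SELECT `time` FROM `event_list_test`
WHERE `time` BETWEEN 1505385901 AND 1506934784
LIMIT 10;

-- Condition on dYd added: only the matching months should be listed.
EXPLAIN SELECT `time` FROM `event_list_test`
WHERE (`time` BETWEEN 1505385901 AND 1506934784)
  AND (`dYd` BETWEEN 201709 AND 201710)
LIMIT 10;
```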

What have we achieved?


And we have achieved the following:

  1. The unnecessary data sharding is gone.
  2. It is very easy to build queries over the data (sharding caused a lot of problems here).
  3. It is very easy to administer the table (add partitions, drop partitions, insert data, select data, alter the table, work with indexes).
  4. And as a result, the application became several times simpler.
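
To illustrate the second point, here is a sketch of an aggregation over several months. The sharded table names `event_list_201709` and so on are hypothetical, since the original shard naming is not shown here.

```sql
-- Partitioned table: events per month over three months in one statement.
SELECT `dYd`, COUNT(*) AS events
FROM `event_list_test`
WHERE `dYd` BETWEEN 201709 AND 201711
GROUP BY `dYd`;

-- Sharded layout (hypothetical table names): one SELECT per monthly table,
-- glued together by hand and extended every time the range grows.
SELECT 201709 AS dYd, COUNT(*) AS events FROM `event_list_201709`
UNION ALL
SELECT 201710, COUNT(*) FROM `event_list_201710`
UNION ALL
SELECT 201711, COUNT(*) FROM `event_list_201711`;
```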

Source: https://habr.com/ru/post/343824/

