
Recently we migrated Zabbix to PostgreSQL. Together with the move to a server with SSDs, this gave a significant speed-up. It also solved a problem with duplicate hosts in the database (a known bug report). The article could have ended here, but we noticed that Zabbix was using a lot of disk space, so below I will tell you how that happened and how we dealt with it.
Our Zabbix has a relatively large database: it monitors almost 1,500 hosts and collects about 180 thousand metrics. We use partitioning in the database, which makes it easy to purge historical data. In addition, each partition has its own indexes. You can probably see where this is going.
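A minimal sketch of why partitioning makes purging cheap (the `partitions.history_pYYYY_MM_DD` naming follows the script shown later; the retention window of 10 days is from the article, the rest is an illustrative assumption): expiring old history is a fast DROP TABLE of the whole partition instead of a slow DELETE.

```shell
# First day that falls outside our 10-day history retention window
dd=$(date +%Y_%m_%d -d "11 days ago")

# Dropping the partition removes its data and its indexes in one cheap operation
DROP_CMD="DROP TABLE IF EXISTS partitions.history_p$dd;"

# To actually run it (commented out here):
# psql -U postgres -d zabbix -c "$DROP_CMD"
echo "$DROP_CMD"
```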
Yes, now we will talk about indexes. We found that they bloat to almost twice their compact size; with our volumes they occupy 5-7 GB per partition. Given that we keep historical data for 10 days and trends for 3 months, that adds up to about 70 GB of overhead. With a total database size of about 220 GB, and on SSDs, this is very noticeable.
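To see the bloat for yourself, you can compare table size against index size per partition. This is a hedged sketch using standard PostgreSQL size functions; the `history_p%` name pattern and the `psql` connection parameters match the script later in the article.

```shell
# Report heap size vs. total index size for each history partition,
# so the index overhead is visible before and after rebuilding.
SIZE_QUERY="SELECT relname,
       pg_size_pretty(pg_relation_size(oid)) AS table_size,
       pg_size_pretty(pg_indexes_size(oid))  AS index_size
FROM pg_class
WHERE relkind = 'r'
  AND relname LIKE 'history_p%'
ORDER BY relname;"

# To actually run it (commented out here):
# psql -U postgres -d zabbix -c "$SIZE_QUERY"
echo "$SIZE_QUERY"
```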
Approach number one. We attacked the problem head-on and ran a full reindex. It worked well and freed almost 70 GB, as expected. The drawback is the run time, or rather the fact that the table is locked for that whole time, and the reindex takes about 3 hours.
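The head-on approach boils down to a single statement per table (the partition name here is an illustrative example, not from the article). The catch is that REINDEX takes an exclusive lock, so Zabbix cannot write for the duration.

```shell
# Full rebuild of all indexes on one table; blocks writers until it finishes.
REINDEX_CMD='REINDEX TABLE partitions.history_p2019_01_01;'

# To actually run it (commented out here):
# psql -U postgres -d zabbix -c "$REINDEX_CMD"
echo "$REINDEX_CMD"
```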
Approach number two. We looked at pg_repack. This utility can do the equivalent of VACUUM FULL and REINDEX without locking the table. We installed it, configured it, launched it and prepared to celebrate. Imagine our disappointment when we saw that less than a gigabyte had been freed. We opened the documentation and read it very carefully; there is a requirement, quoted literally:
"Target table must have a PRIMARY KEY, or at least a UNIQUE total index on a NOT NULL column." We then opened the database and saw that the tables we needed had neither.
You could, of course, add them, but that would increase the size of the database, and the developers clearly had a reason not to add a PRIMARY KEY. Digging around, we found that, for example, in the table that stores log monitoring data, rows can be complete duplicates. As a result, we had to abandon pg_repack.
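You can check in advance which tables pg_repack will refuse to process. This is a hedged sketch against the standard PostgreSQL system catalogs: it lists ordinary tables that have neither a primary key nor any unique index.

```shell
# Tables with no PRIMARY KEY and no unique index, i.e. ineligible for pg_repack.
NO_PK_QUERY="SELECT c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND NOT EXISTS (SELECT 1 FROM pg_index i
                  WHERE i.indrelid = c.oid
                    AND (i.indisprimary OR i.indisunique));"

# To actually run it (commented out here):
# psql -U postgres -d zabbix -c "$NO_PK_QUERY"
echo "$NO_PK_QUERY"
```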
Approach number three, the winning one. If you take a partition for a previous day, nobody writes to it any more, so we should be able to reindex it without Zabbix server downtime. Invented, then checked. No such luck: it turns out the lock is taken on the whole table, not on the single partition. Well then, if we cannot reindex, why not build a new index instead? And here we finally found the solution. The algorithm is simple:
- We create a new index with the "CREATE INDEX CONCURRENTLY" construct, which avoids taking a long lock on the table; and since nobody writes to this partition any more, the index is built successfully.
- Delete the old index.
- Rename the new index to the old one.
- Happily beat our chests.
Now a script runs every night that walks through the partition tables and performs all these operations.
Script

for i in `seq 1 10`; do
    dd=`date +%Y_%m_%d -d "$i day ago"`
    index_name="history_p${dd}_1"
    index_name_new="history_p${dd}_1_new"
    echo "$index_name"
    psql -U postgres -d zabbix -c "CREATE INDEX CONCURRENTLY $index_name_new ON partitions.history_p$dd USING btree (itemid, clock);"
    echo "Done CREATE"
    psql -U postgres -d zabbix -c "DROP INDEX partitions.$index_name"
    echo "Done DROP"
    psql -U postgres -d zabbix -c "ALTER INDEX IF EXISTS partitions.$index_name_new RENAME TO $index_name"
    echo "Done ALTER"
done

echo "Reindex history_uint_p Start"
for i in `seq 1 10`; do
    dd=`date +%Y_%m_%d -d "$i day ago"`
    index_name="history_uint_p${dd}_1"
    index_name_new="history_uint_p${dd}_1_new"
    echo "$index_name"
    psql -U postgres -d zabbix -c "CREATE INDEX CONCURRENTLY $index_name_new ON partitions.history_uint_p$dd USING btree (itemid, clock);"
    echo "Done CREATE"
    psql -U postgres -d zabbix -c "DROP INDEX partitions.$index_name"
    echo "Done DROP"
    psql -U postgres -d zabbix -c "ALTER INDEX IF EXISTS partitions.$index_name_new RENAME TO $index_name"
    echo "Done ALTER"
done

echo "Reindex trends_p Start"
for i in `seq 1 2`; do
    dd=`date +%Y_%m -d "$i month ago"`
    index_name="trends_p${dd}_1"
    index_name_new="trends_p${dd}_1_new"
    echo "$index_name"
    psql -U postgres -d zabbix -c "CREATE INDEX CONCURRENTLY $index_name_new ON partitions.trends_p$dd USING btree (itemid, clock);"
    echo "Done CREATE"
    psql -U postgres -d zabbix -c "DROP INDEX partitions.$index_name"
    echo "Done DROP"
    psql -U postgres -d zabbix -c "ALTER INDEX IF EXISTS partitions.$index_name_new RENAME TO $index_name"
    echo "Done ALTER"
done

echo "Reindex trends_uint_p Start"
for i in `seq 1 2`; do
    dd=`date +%Y_%m -d "$i month ago"`
    index_name="trends_uint_p${dd}_1"
    index_name_new="trends_uint_p${dd}_1_new"
    echo "$index_name"
    psql -U postgres -d zabbix -c "CREATE INDEX CONCURRENTLY $index_name_new ON partitions.trends_uint_p$dd USING btree (itemid, clock);"
    echo "Done CREATE"
    psql -U postgres -d zabbix -c "DROP INDEX partitions.$index_name"
    echo "Done DROP"
    psql -U postgres -d zabbix -c "ALTER INDEX IF EXISTS partitions.$index_name_new RENAME TO $index_name"
    echo "Done ALTER"
done
Please don't judge the script too harshly: it is a rough draft for the purposes of this article.
Thanks for your attention!