In the first part of the article, “Accelerating the restoration of backups in Postgres,” I described the steps we took to reduce restore time in the local environment. We started with the simplest approach: a plain pg_dump compressed with gzip, which we then unpacked and piped into psql < file.sql. That restore took about 30 minutes. In the end we settled on Postgres's custom format together with the -j argument, bringing the time down to 16 minutes.
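For reference, the two approaches from the first part look roughly like this (dbname and the file names are placeholders, not the exact commands from the original article):

```shell
# Original approach: plain-text dump, gzip-compressed, restored via psql.
pg_dump dbname | gzip > dump.sql.gz
gunzip -c dump.sql.gz | psql dbname      # ~30 minutes

# Later approach: custom-format dump restored with 8 parallel jobs.
pg_dump dbname -Fc > dumpfc.gz
pg_restore -d dbname -j 8 dumpfc.gz      # ~16 minutes
```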
In this article, I describe how we reduced the size of the backup file, which sped up the restore process even further.
When I started writing about our attempts to speed up the restore process, the compressed backup file was about 2 GB (30 GB uncompressed). Since then, our database has almost doubled (3.7 GB compressed, 68 GB uncompressed). This not only significantly increased the restore time, but also made copying and transferring the backup file slower.
Seeing that the backup file had doubled, I set out to find out why this had happened and which data was responsible.
First, I checked the size of the database:
# SELECT pg_size_pretty(pg_database_size('dbname'));
 pg_size_pretty
----------------
 68 GB
(1 row)
Then I looked at the table sizes to see whether any obvious culprits behind the database's growth stood out among them.
# SELECT table_name,
         pg_relation_size(table_name),
         pg_size_pretty(pg_relation_size(table_name))
  FROM information_schema.tables
  WHERE table_schema = 'public'
  ORDER BY 2 DESC;
    table_name   | pg_relation_size | pg_size_pretty
-----------------+------------------+----------------
 logging         |      19740196864 | 18 GB
 reversions      |      15719358464 | 15 GB
 webhook_logging |       8780668928 | 8374 MB
 ...             |       1994645504 | 1902 MB
 ...             |        371900416 | 355 MB
 ...             |        304226304 | 290 MB
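Note that pg_relation_size counts only the main table data. To include indexes and TOAST data in the picture, pg_total_relation_size can be used instead; a sketch via psql (dbname is a placeholder):

```shell
# Same ranking, but counting indexes and TOAST along with table data.
psql -d dbname -c "
  SELECT table_name,
         pg_size_pretty(pg_total_relation_size(quote_ident(table_name))) AS total
  FROM information_schema.tables
  WHERE table_schema = 'public'
  ORDER BY pg_total_relation_size(quote_ident(table_name)) DESC
  LIMIT 10;"
```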
The listing clearly shows that the top three tables, which together occupy more than 60% of the entire database, hold history and logging data that is not needed in the dev environment. Together with their indexes (17 GB, 1 GB, and 1.5 GB, respectively), these tables account for 89% of the database. At that point I stopped investigating table sizes (an 89% reduction was good enough) and looked into whether these tables could be excluded from the backup.
When tackling any problem, I try to start with the documentation. The PostgreSQL project does a wonderful job in this regard: after a few minutes of reading the pg_dump section, I found exactly what I needed.
pg_dump dbname -Fc \
  --exclude-table-data 'logging*' \
  --exclude-table-data 'reversions*' \
  --exclude-table-data 'webhook_logging*' > postgres.dev.sql
The asterisk (*) serves as a wildcard here and also helps exclude the indexes of these tables.
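To verify that the data sections were actually excluded, the archive's table of contents can be inspected with pg_restore -l (the file name follows the example above):

```shell
# List the archive TOC; the excluded tables should not appear
# among the "TABLE DATA" entries.
pg_restore -l postgres.dev.sql | grep "TABLE DATA"
```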
Excluding these tables' data with the --exclude-table-data parameter reduced the backup file size from 3.7 GB (68 GB uncompressed) to 0.7 GB (5.4 GB uncompressed).
Comparing the listings shows just how good the results are.
Before:
$ time pg_restore -d db -j 8 dumpfc.gz

real    16m49.539s
user    1m1.344s
sys     0m39.522s
After:
$ time pg_restore -d db -j 8 devfc.gz

real    5m38.156s
user    0m24.574s
sys     0m13.325s
Excluding three tables reduced the size of the database by 89% and sped up the restore by 66%! If you recall, in the first part we started at 32.5 minutes, so in total we have cut the restore time by 26.9 minutes, or 87%.
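The 66% figure follows directly from the two time listings above; a quick sanity check:

```shell
# Sanity-check the quoted speedup from the `time` output above.
before=1009.539   # 16m49.539s in seconds
after=338.156     # 5m38.156s in seconds
awk -v b="$before" -v a="$after" \
  'BEGIN { printf "speedup: %d%%\n", int((b - a) / b * 100) }'
# prints "speedup: 66%"
```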
As a result of the changes described in this article, the restore time went from 16 minutes down to 5. That saves us about 57 hours of waiting per year (6 developers × 52 restores a year × 11 minutes). In total, we have reduced the waiting time for restores by about 130 hours per year.
Returning to the PostgreSQL documentation, I note that there are several other ways to speed up the restore process. For example, take a look at the -j parameter of pg_dump, which can shorten the time it takes to create a backup (available starting from PostgreSQL 9.3). Disabling autocommit, significantly increasing maintenance_work_mem, and setting a higher value for max_wal_size may also help.
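For illustration, those options might look roughly like this; the memory and WAL values below are arbitrary examples, not recommendations from the original article:

```shell
# Parallel dump (-j) requires the directory format (-Fd).
pg_dump dbname -Fd -j 4 -f /backups/dump.dir

# Illustrative server settings for faster restores (tune for your hardware):
psql -d dbname -c "ALTER SYSTEM SET maintenance_work_mem = '2GB';"  # faster index builds
psql -d dbname -c "ALTER SYSTEM SET max_wal_size = '16GB';"         # fewer checkpoints
psql -d dbname -c "SELECT pg_reload_conf();"

# "Disabling autocommit" corresponds to running the whole restore in a
# single transaction (note: this option is incompatible with -j):
pg_restore -d dbname --single-transaction dumpfc.gz
```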
For now, the time it takes to restore our database backup in the local dev environment suits me perfectly.
Source: https://habr.com/ru/post/328058/