
Speeding up the restoration of backups in Postgres. Part Two (because cutting the time once was not enough)



In the first part of the article, "Speeding up the restoration of backups in Postgres," I described the steps we took to reduce recovery time in the local environment. We started with the simplest approach: pg_dump'ing the database (is that a word?), compressing it with gzip, then unpacking it and piping the output to psql < file.sql. Restoring this way took about 30 minutes. In the end we settled on Postgres's custom format and the -j argument, cutting the time to 16 minutes.
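As a sketch, the original pipeline from part one looked roughly like this (file names are illustrative, and the commands assume a reachable database):

```
pg_dump dbname | gzip > backup.sql.gz    # plain-format dump, compressed
gunzip backup.sql.gz                     # unpack before restoring
psql dbname < backup.sql                 # replay the SQL; about 30 minutes
```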


In this article, I describe how we managed to reduce the size of the backup file itself, which sped up the recovery process even further.


Investigating the size of the backup


When I started writing about our attempts to speed up the recovery process, the packed backup file was about 2 GB (unpacked, 30 GB). Since then, our database has almost doubled (packed, 3.7 GB; unpacked, 68 GB). This not only significantly increased recovery time, it also made copying and transferring the backup file slower.


Realizing that the backup file had doubled in size, I set out to find out why this had happened and which data was to blame.


First, I checked the size of the database:


 # SELECT pg_size_pretty(pg_database_size('dbname'));
  pg_size_pretty
 ----------------
  68 GB
 (1 row)

Then I looked at the sizes of the individual tables to see whether there were any obvious culprits behind the database's growth.


 # SELECT table_name,
          pg_relation_size(table_name),
          pg_size_pretty(pg_relation_size(table_name))
   FROM information_schema.tables
   WHERE table_schema = 'public'
   ORDER BY 2 DESC;

     table_name    | pg_relation_size | pg_size_pretty
 ------------------+------------------+----------------
  logging          |      19740196864 | 18 GB
  reversions       |      15719358464 | 15 GB
  webhook_logging  |       8780668928 | 8374 MB
  ...              |       1994645504 | 1902 MB
  ...              |        371900416 | 355 MB
  ...              |        304226304 | 290 MB

The listing clearly shows that the top three tables, which hold history and logging data not needed in the dev environment, account for more than 60% of the entire database. Together with their indexes (17 GB, 1 GB, and 1.5 GB, respectively), these tables occupy 89% of the database. At that point I decided to stop investigating table sizes (an 89% reduction was acceptable) and see whether I could exclude those tables from the backup.
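To account for the indexes directly, pg_total_relation_size can be used in place of pg_relation_size; it includes a table's indexes and TOAST data. A sketch, under the same public-schema assumption as the query above (the cast to regclass is needed on newer PostgreSQL versions):

```sql
-- Table sizes including indexes and TOAST data
SELECT table_name,
       pg_size_pretty(pg_total_relation_size(table_name::regclass)) AS total_size
FROM information_schema.tables
WHERE table_schema = 'public'
ORDER BY pg_total_relation_size(table_name::regclass) DESC;
```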


Reducing the backup size


When tackling any problem, I try to start by getting acquainted with the documentation. The PostgreSQL project does a wonderful job in this regard: after a few minutes of reading the pg_dump section, I found exactly what was needed.


 pg_dump dbname -Fc \
   --exclude-table-data 'logging*' \
   --exclude-table-data 'reversions*' \
   --exclude-table-data 'webhook_logging*' \
   > postgres.dev.sql

Here * (asterisk) is used as a wildcard and helps to also exclude the indexes of these tables.


Excluding these tables with the --exclude-table-data parameter allowed us to reduce the backup file from 3.7 GB (68 GB uncompressed) to 0.7 GB (5.4 GB uncompressed).
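To double-check that the data for those tables really was left out, the archive's table of contents can be listed with pg_restore -l; with --exclude-table-data, the excluded tables should have no TABLE DATA entries. A sketch, using the file name from the command above:

```
pg_restore -l postgres.dev.sql | grep 'TABLE DATA'
```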


The listings below show just how good the results are.


Before:


 $ time pg_restore -d db -j 8 dumpfc.gz

 real 16m49.539s
 user 1m1.344s
 sys  0m39.522s

After:


 $ time pg_restore -d db -j 8 devfc.gz

 real 5m38.156s
 user 0m24.574s
 sys  0m13.325s

Excluding three tables reduced the size of the database by 89% and sped up recovery by 66%! If you recall, in the first part we started at 32.5 minutes, so we have cut the recovery time by 26.9 minutes, or about 83%.
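A quick sanity check of these figures (times converted to seconds; a throwaway shell snippet, not part of the restore procedure):

```shell
before=1009   # 16m49s: restore time before excluding tables
after=338     # 5m38s: restore time after

# Percentage speedup of the restore itself
echo "speedup: $(( (before - after) * 100 / before ))%"   # prints: speedup: 66%

# Annual savings: 6 developers x 52 restores x 11 minutes saved, in hours
echo "hours saved per year: $(( 6 * 52 * 11 / 60 ))"      # prints: hours saved per year: 57
```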


By the end of this article, we had accelerated recovery from 16 minutes to about 5. That saves us 57 hours of recovery time per year (6 developers, 52 times a year, 11 minutes each). In total, we have reduced the waiting time for recovery by 130 hours per year.


Conclusion


Returning to the PostgreSQL documentation, I note that there are several other ways to speed up the recovery process. For example, take a look at the -j parameter of pg_dump, which can shorten the time it takes to create a backup (available since PostgreSQL 9.3). Disabling autocommit, significantly increasing maintenance_work_mem, and setting a higher value for max_wal_size may also help.
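As an illustration of those last suggestions, the relevant postgresql.conf settings might look like this; the values here are placeholders to tune for your hardware, not recommendations from the documentation:

```
# postgresql.conf -- illustrative values only
maintenance_work_mem = 1GB    # faster index rebuilds during restore
max_wal_size = 4GB            # fewer checkpoints while bulk loading
```

For plain SQL dumps restored via psql, the -1 (--single-transaction) option is one way to avoid per-statement commits.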


For now, the time it takes to restore a backup of our database in the local dev environment suits me perfectly.


References:


  1. Original: Speeding up Postgres Restores Part 2 (Because it wasn't just enough).
  2. Speeding up the restoration of backups in Postgres. Part One.


Source: https://habr.com/ru/post/328058/

