
Speeding up backup restores in PostgreSQL


[Image: my feelings during this work]


Recently I decided to speed up the restoration of our database in the dev environment. As in many other projects, the database was small at first but grew significantly over time. When we started, it was only a few megabytes; now the compressed dump takes up almost 2 GB (30 GB uncompressed). We restore the dev environment about once a week on average. The old way of doing this no longer suited us, and the familiar "is the DB restore done yet?" picture prompted me to act.


The following describes how I accelerated the database restore operation.


The simple approach


Here is the first version of our backup and restore procedure. We started by running pg_dump and piping its output to gzip. To restore the database in the dev environment, we copied the archive with scp, unpacked it, and loaded it with psql.


    $ time pg_dump db | gzip > dump.gz

    real    7m9.882s
    user    5m7.383s
    sys     2m56.495s

    $ time gunzip dump.gz

    real    2m27.700s
    user    1m28.146s
    sys     0m41.451s

    $ time psql db < dump

    real    30m4.237s
    user    0m21.545s
    sys     0m44.331s

Total time for the simple approach: 39 minutes 41 seconds (32.5 minutes of that to restore in the dev environment).


This approach was easy to understand and trivial to set up, and it worked fine while the database was no bigger than a few hundred megabytes. But 32.5 minutes to restore the database in the dev environment was completely unacceptable.
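For reference, the whole round trip looked roughly like this (a sketch; the host name and paths are placeholders, not our real ones):

    # A sketch of the original procedure; "dev-host" and the paths are made up.
    $ pg_dump db | gzip > dump.gz          # on the database host
    $ scp dump.gz dev-host:/tmp/dump.gz    # copy the archive to the dev machine
    $ gunzip /tmp/dump.gz                  # unpack it there (produces /tmp/dump)
    $ psql db < /tmp/dump                  # load the plain-text SQL dump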


Unpacking and restoring in one command


The first thing that came to mind was simply to feed the compressed file directly to psql using zcat, which can be thought of as cat for compressed files: it decompresses the file and writes the result to stdout, which in turn can be piped into psql.


    $ time pg_dump db | gzip > dump.gz

    real    7m9.882s
    user    5m7.383s
    sys     2m56.495s

    $ time zcat dump.gz | psql db

    real    26m22.356s
    user    1m28.850s
    sys     1m47.443s

Total time: 33 minutes 31 seconds (26.3 minutes to restore in the dev environment, which is 20% faster).


Great: we sped up the whole process by 16% and the restore itself by 20%. Since I/O was the main limiting factor, not writing the decompressed file to disk saved us more than 6 minutes. But that still did not seem like enough: losing 26 minutes on a database restore is still bad. I had to come up with something else.


The custom format


Digging deeper into the pg_dump documentation, I found that pg_dump produces a plain-text SQL file, which we then compress with gzip to make it smaller. Postgres also has a custom format, which uses zlib for compression by default. I figured we could speed up backup creation by having Postgres compress the data as it dumps, instead of piping a plain-text file through gzip.


Since psql does not understand the custom format, I had to switch to pg_restore.
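Incidentally, pg_restore can also print an archive's table of contents, which is a quick way to check that a file really is in the custom format (a sketch, not part of the measurements below):

    # List the first entries of the archive's table of contents.
    $ pg_restore -l dumpfc.gz | head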


    $ time pg_dump -Fc db > dumpfc.gz

    real    6m28.497s
    user    5m2.275s
    sys     1m16.637s

    $ time pg_restore -d db dumpfc.gz

    real    26m26.511s
    user    0m56.824s
    sys     0m15.037s

Total time: 32 minutes 54 seconds (26.4 minutes to restore in the dev environment).


I was right that creating the backup would be faster without piping the output through gzip. Unfortunately, restoring from the custom format on the local machine did not speed up the process. I had to invent something else.
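As an aside, the custom format's compression level can be traded off against dump speed with pg_dump's -Z option (0-9); the value below is just an illustration, not something measured in this article:

    # Level 1 compresses fastest but produces a larger archive; 9 is the opposite.
    $ pg_dump -Fc -Z 1 db > dumpfc.gz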


Parallelizing


When I start working on a problem, I first read the documentation and the source code. Postgres has excellent documentation, including clear and detailed descriptions of the command-line options. One of pg_restore's options, -j (--jobs), sets the number of concurrent jobs used for the most time-consuming tasks: loading data and creating indexes and constraints.


The pg_restore documentation suggests starting with the number of jobs equal to the number of cores. My virtual machine has 4 cores, but I wanted to experiment with different values of this option.
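All of the runs below amount to repeating the same restore with a different -j value, roughly like this (a sketch, assuming the dev database can be dropped and recreated between runs):

    # Time the restore with increasing job counts, starting clean each time.
    for j in 2 4 8 12; do
        dropdb db && createdb db
        time pg_restore -d db -j "$j" dumpfc.gz
    done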


    $ time pg_dump -Fc db > dumpfc.gz

    real    6m28.497s
    user    5m2.275s
    sys     1m16.637s

    $ time pg_restore -d db -j 2 dumpfc.gz

    real    25m39.796s
    user    1m30.366s
    sys     1m7.032s

Total time: 32 minutes 7 seconds (25.6 minutes to restore in the dev environment, which is 3% faster than a single-threaded pg_restore).


Well, a small win. Can we go faster?


    $ time pg_dump -Fc db > dumpfc.gz

    real    6m28.497s
    user    5m2.275s
    sys     1m16.637s

    $ time pg_restore -d db -j 4 dumpfc.gz

    real    22m6.124s
    user    0m58.852s
    sys     0m34.682s

Total time: 28 minutes 34 seconds (22.1 minutes to restore in the dev environment, which is 14% faster than with two jobs).


Excellent! Four jobs are 14% faster than two. At this point we had gone from 32.5 to 22.1 minutes in the dev environment: a 32% improvement!


I decided to find out what a further increase in the number of jobs would give.


    $ time pg_dump -Fc db > dumpfc.gz

    real    6m28.497s
    user    5m2.275s
    sys     1m16.637s

    $ time pg_restore -d db -j 8 dumpfc.gz

    real    16m49.539s
    user    1m1.344s
    sys     0m39.522s

Total time: 23 minutes 17 seconds (16.8 minutes to restore in the dev environment, which is 24% faster than with four jobs).


So, by raising the number of jobs to twice the number of cores, we cut the time from 22.1 to 16.8 minutes. We were now 49% faster, which is just wonderful.


Can we squeeze out even more?


    $ time pg_dump -Fc db > dumpfc.gz

    real    6m28.497s
    user    5m2.275s
    sys     1m16.637s

    $ time pg_restore -d db -j 12 dumpfc.gz

    real    16m7.071s
    user    0m55.323s
    sys     0m36.502s

Total time: 22 minutes 35 seconds (16.1 minutes to restore in the dev environment, which is 4% faster than with 8 jobs).


With 12 jobs we gained a little more, but the virtual machine's CPU was so loaded during the restore that nothing else could run on it. I decided to settle on 8 jobs (number of cores × 2).
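If the core count varies between machines, the job count can be derived at run time; a minimal sketch of this heuristic (nproc is GNU coreutils):

    # Run the restore with twice as many jobs as there are CPU cores.
    $ pg_restore -d db -j "$(( $(nproc) * 2 ))" dumpfc.gz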


Conclusion


In the end we managed to cut the time almost in half: from 30 to 16 minutes. That saves us about 72 hours of recovery time per year (6 developers × 52 runs of the restore procedure × 14 minutes). I am very pleased with these results. In the future I plan to restore only the data, not the entire database; let's see how much faster that will be.
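A likely starting point for that experiment is pg_restore's --data-only flag (a sketch, not something measured in this article):

    # Restore table data only, assuming the schema already exists in the dev database.
    $ pg_restore --data-only -j 8 -d db dumpfc.gz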


References:


  1. Original: Speeding up Postgres Restores.
  2. Part two: Speeding up backup restores in Postgres.


Source: https://habr.com/ru/post/328056/
