
Setting PostgreSQL Options for Performance Optimization

Out of the box, PostgreSQL's configuration is not tuned for any particular workload. The default values are chosen so that PostgreSQL can run everywhere, on the smallest amount of resources, and every parameter has such a conservative default. It is the main responsibility of the database administrator or developer to tune PostgreSQL for their system's load. In this post, we outline the main recommendations for configuring PostgreSQL database parameters to improve performance according to the workload.

Keep in mind that while tuning the PostgreSQL server configuration improves performance, the database developer must also write queries carefully. If queries perform a full table scan where an index could be used, or run heavy joins or expensive aggregations, the system may still perform poorly even when the database parameters are configured correctly. Query performance deserves the same attention as configuration.

Still, the database parameters are very important, so let's take a look at the eight that have the greatest potential to improve performance.

PostgreSQL's tunable parameters

shared_buffers


PostgreSQL uses its own buffer pool and also relies on the kernel's buffered I/O. In other words, data is stored in memory twice: first in the PostgreSQL buffer, and then in the kernel buffer. Unlike some other databases, PostgreSQL does not use direct I/O, which is why this is called double buffering. The PostgreSQL buffer pool is controlled by shared_buffers, which on most operating systems is the most effective tuning parameter. It sets how much memory PostgreSQL dedicates to caching data.
The default for shared_buffers is very low, and you will not get much benefit from it: it is kept small because some machines and operating systems do not support higher values. On most modern machines, however, you need to increase this value for optimal performance.

The recommended value is 25% of the machine's total RAM. It is worth trying values somewhat below and above that, because in some cases settings above 25% also perform well; the right configuration depends on your machine and your working dataset. If the working dataset fits easily in RAM, you can increase shared_buffers so that it holds the entire database and the whole working set stays in cache. That said, you obviously do not want to reserve all of the RAM for PostgreSQL.
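As an illustration of the 25% guideline, a machine with 32 GB of RAM (a hypothetical figure, purely for the example) could be configured in postgresql.conf as follows; note that changing shared_buffers takes effect only after a server restart:

 # postgresql.conf (requires restart)
 shared_buffers = 8GB    # ~25% of 32 GB total RAM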

In production environments, a large shared_buffers value has been observed to deliver good performance, although you should always benchmark to find the right balance.

Checking the value of shared_buffers:
 testdb=# SHOW shared_buffers;
  shared_buffers
 ----------------
  128MB
 (1 row)

Note: be careful, as some kernels do not support higher values, especially on Windows.

wal_buffers


PostgreSQL first writes WAL (write-ahead log) records into buffers, and those buffers are then flushed to disk. The default buffer size, defined by wal_buffers, is 16 MB, but if you have many concurrent connections, a higher value may improve performance.
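As a sketch, a write-heavy server with many concurrent connections might raise this buffer in postgresql.conf (the 64 MB figure here is illustrative, not a recommendation; changing wal_buffers also requires a restart):

 # postgresql.conf (requires restart)
 wal_buffers = 64MB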

effective_cache_size


effective_cache_size provides an estimate of the memory available for disk caching. It is a guideline only, not an exact memory or cache allocation: it does not allocate any memory, but tells the query planner how much cache is available in the kernel. If it is set too low, the planner may decide not to use some indexes even when they would be useful. A generous value therefore makes sense.
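Since effective_cache_size is only a planner hint, it can be changed without a restart, which makes it easy to experiment with. A sketch, assuming a hypothetical machine with 32 GB of RAM where roughly half is expected to be available as OS cache:

 # postgresql.conf (no restart needed; can also be SET per session)
 effective_cache_size = 16GB   # planner hint only; no memory is allocated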

work_mem


This setting is used for complex sorts. If you need to perform complex sorting, increase work_mem to get good results: sorting in memory is much faster than sorting data on disk. Setting a very high value, however, can cause a memory bottleneck in your environment, because this parameter applies per sort operation per user. If many users run sorts concurrently, the system will allocate up to:

 work_mem * total sort operations 

for all users. Setting this parameter globally can therefore result in very high memory usage, so it is strongly recommended to change it at the session level instead.

work_mem = 2MB
 testdb=# SET work_mem TO '2MB';
 testdb=# EXPLAIN SELECT * FROM bar ORDER BY bar.b;
                                      QUERY PLAN
 -----------------------------------------------------------------------------------
  Gather Merge  (cost=509181.84..1706542.14 rows=10000116 width=24)
    Workers Planned: 4
    ->  Sort  (cost=508181.79..514431.86 rows=2500029 width=24)
          Sort Key: b
          ->  Parallel Seq Scan on bar  (cost=0.00..88695.29 rows=2500029 width=24)
 (5 rows)

The sort node of the initial query is estimated at a cost of 514431.86 (cost is an arbitrary planner unit). For the query above, work_mem is only 2 MB. For testing purposes, let's increase it to 256 MB and see whether this affects the cost.

work_mem = 256MB
 testdb=# SET work_mem TO '256MB';
 testdb=# EXPLAIN SELECT * FROM bar ORDER BY bar.b;
                                      QUERY PLAN
 -----------------------------------------------------------------------------------
  Gather Merge  (cost=355367.34..1552727.64 rows=10000116 width=24)
    Workers Planned: 4
    ->  Sort  (cost=354367.29..360617.36 rows=2500029 width=24)
          Sort Key: b
          ->  Parallel Seq Scan on bar  (cost=0.00..88695.29 rows=2500029 width=24)
 (5 rows)

The estimated cost of the sort dropped from 514431.86 to 360617.36, a decrease of about 30%.

maintenance_work_mem


maintenance_work_mem sets the amount of memory used for maintenance tasks. The default value is 64 MB. A larger value helps with operations such as VACUUM, RESTORE, CREATE INDEX, ADD FOREIGN KEY and ALTER TABLE.

maintenance_work_mem = 10MB
 postgres=# CHECKPOINT;
 postgres=# SET maintenance_work_mem TO '10MB';
 postgres=# CREATE INDEX foo_idx ON foo (c);
 CREATE INDEX
 Time: 170091.371 ms (02:50.091)


maintenance_work_mem = 256MB
 postgres=# CHECKPOINT;
 postgres=# SET maintenance_work_mem TO '256MB';
 postgres=# CREATE INDEX foo_idx ON foo (c);
 CREATE INDEX
 Time: 111274.903 ms (01:51.275)

Creating the index takes 170091.371 ms when maintenance_work_mem is set to only 10 MB, but drops to 111274.903 ms when we increase the parameter to 256 MB.

synchronous_commit


This parameter enforces that a transaction commit waits for the WAL to be written to disk before returning a success status to the client. It is a trade-off between performance and reliability. If your application is designed so that performance matters more than durability, turn synchronous_commit off: the transaction then commits very quickly, because it does not wait for the WAL to be flushed, but durability is weakened. If the server crashes, recent transactions may be lost even though the client was told they committed successfully.
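Because synchronous_commit can be changed at the session level, one option is to disable it only for work that tolerates losing its most recent commits, for example a bulk load, while keeping the durable default everywhere else:

 testdb=# SET synchronous_commit TO off;
 SET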

checkpoint_timeout, checkpoint_completion_target


PostgreSQL writes changes to the WAL; the checkpoint process then flushes the data into the data files. This happens whenever a checkpoint (CHECKPOINT) occurs. A checkpoint is an expensive operation and can generate a huge amount of I/O, involving costly disk reads and writes. You can run CHECKPOINT manually whenever necessary, or let PostgreSQL schedule it automatically via the checkpoint_timeout and checkpoint_completion_target parameters.

The checkpoint_timeout parameter sets the time between WAL checkpoints. Setting it too low shortens crash recovery time, since more data is already on disk, but it also hurts performance, because every checkpoint consumes valuable system resources.

checkpoint_completion_target is the fraction of the time between checkpoints over which checkpoint I/O is spread. High-frequency checkpoints can impact performance, and for a checkpoint to complete smoothly this target should not be too low; otherwise the OS accumulates all the dirty pages until the ratio is met and then produces one large flush.
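As an illustrative postgresql.conf sketch (the exact values depend on your workload and are not a recommendation), checkpoints can be made less frequent and more spread out like this:

 # postgresql.conf (illustrative values)
 checkpoint_timeout = 15min          # time between automatic checkpoints
 checkpoint_completion_target = 0.9  # spread checkpoint I/O over 90% of the interval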

Conclusion


There are more parameters that can be tuned for better performance, but their impact is smaller than that of the ones highlighted here. In the end, always remember that not every parameter is relevant for every type of application: some applications benefit from tuning a given parameter and some do not. PostgreSQL settings should be tuned to the specific needs of the application and of the operating system it runs on.

Source: https://habr.com/ru/post/458952/

