
Tuning Linux kernel parameters to optimize PostgreSQL

PostgreSQL's optimal performance depends on well-tuned operating system parameters: a poorly configured OS kernel can degrade database server performance. It is therefore imperative that these parameters be configured according to the database server and its workload. In this post, we will discuss some important Linux kernel parameters that can affect database server performance and how to tune them.

SHMMAX / SHMALL


SHMMAX is a kernel parameter that defines the maximum size of a single shared memory segment that a Linux process can allocate. Prior to version 9.2, PostgreSQL used System V (SysV) shared memory, which requires configuring SHMMAX. After 9.2, PostgreSQL switched to POSIX shared memory, so only a small amount of System V shared memory is required nowadays.

Prior to version 9.3, SHMMAX was the most important parameter of the kernel. The SHMMAX value is specified in bytes.

Similarly, SHMALL is another kernel parameter that determines the system-wide total amount of shared memory, in pages. To view the current SHMMAX, SHMALL, or SHMMIN values, use the ipcs command.
SHM* details (Linux)

$ ipcs -lm

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 1073741824
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1

SHM* details (MacOS X)

$ ipcs -M
IPC status from <running system> as of Thu Aug 16 22:20:35 PKT 2018
shminfo:
        shmmax: 16777216   (max shared memory segment size)
        shmmin:        1   (min shared memory segment size)
        shmmni:       32   (max number of shared memory identifiers)
        shmseg:        8   (max shared memory segments per process)
        shmall:     1024   (max amount of shared memory in pages)

PostgreSQL uses System V IPC to allocate shared memory, which makes this one of the most important kernel parameters. Whenever you receive the error messages below, it means you are running an older version of PostgreSQL and your SHMMAX value is too low. You are expected to adjust it, increasing the value according to the shared memory you are going to use.

Possible misconfiguration errors


If SHMMAX is configured incorrectly, you may get an error when trying to initialize a PostgreSQL cluster using the initdb command.

initdb failure
DETAIL: Failed system call was shmget(key=1, size=2072576, 03600).

HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.
You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 2072576 bytes),
reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.

If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter,
in which case raising the request size or reconfiguring SHMMIN is called for.

The PostgreSQL documentation contains more information about shared memory configuration.

child process exited with exit code 1


Similarly, you may get an error when starting the PostgreSQL server using the pg_ctl command.

pg_ctl failure
DETAIL: Failed system call was shmget(key=5432001, size=14385152, 03600).

HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter.

You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 14385152 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.

If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter,
in which case raising the request size or reconfiguring SHMMIN is called for.

The PostgreSQL documentation contains more information about shared memory configuration.


Understanding the Differences in Definitions


Defining the SHMMAX / SHMALL parameters differs slightly between Linux and MacOS X:

Linux: kernel.shmmax, kernel.shmall
MacOS X: kern.sysv.shmmax, kern.sysv.shmall


The sysctl command can be used to change a value temporarily. To set values permanently, add an entry to /etc/sysctl.conf. Details are given below.

Changing kernel options on MacOS X

# Get the value of SHMMAX
sudo sysctl kern.sysv.shmmax
kern.sysv.shmmax: 4096

# Get the value of SHMALL
sudo sysctl kern.sysv.shmall
kern.sysv.shmall: 4096

# Set the value of SHMMAX
sudo sysctl -w kern.sysv.shmmax=16777216
kern.sysv.shmmax: 4096 -> 16777216

# Set the value of SHMALL
sudo sysctl -w kern.sysv.shmall=16777216
kern.sysv.shmall: 4096 -> 16777216

Changing kernel options on Linux

# Get the value of SHMMAX
sudo sysctl kernel.shmmax
kernel.shmmax: 4096

# Get the value of SHMALL
sudo sysctl kernel.shmall
kernel.shmall: 4096

# Set the value of SHMMAX
sudo sysctl -w kernel.shmmax=16777216
kernel.shmmax: 4096 -> 16777216

# Set the value of SHMALL
sudo sysctl -w kernel.shmall=16777216
kernel.shmall: 4096 -> 16777216

Don't forget: to make the changes permanent, add these values to /etc/sysctl.conf.
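For instance, the persistent configuration might look like this on Linux, loaded with sysctl -p (the values are the illustrative ones used above):

# /etc/sysctl.conf
kernel.shmmax=16777216
kernel.shmall=16777216

# Reload the settings from /etc/sysctl.conf
$ sudo sysctl -p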

Huge Pages


Linux uses 4 kB memory pages by default, BSD has Super Pages, and Windows has Large Pages. A page is a chunk of RAM allocated to a process; a process may own many pages, depending on its memory requirements. The more memory a process needs, the more pages are allocated to it, and the OS maintains a page allocation table for processes. The smaller the page size, the larger the table and the longer it takes to look up a page in it. Huge pages therefore make it possible to use a large amount of memory with reduced overhead: fewer page table lookups, fewer page faults, and faster read/write operations through larger buffers. The result is improved performance.

PostgreSQL supports huge pages on Linux only. By default Linux uses 4 kB memory pages, so for workloads with many memory operations it is worth configuring huge pages. A performance boost is seen with huge page sizes of 2 MB and up to 1 GB. The huge page size can be set at boot time. You can easily check the huge page parameters and their current usage on your Linux machine with the command cat /proc/meminfo | grep -i huge.

Getting huge pages information (Linux only)

Note: this applies to Linux only; on other operating systems this step is skipped.

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

In this example, although the huge page size is set to 2048 kB (2 MB), the total number of huge pages is 0, meaning that huge pages are disabled.

A script to determine the number of huge pages


This simple script calculates the required number of huge pages. Run the script on your Linux server while PostgreSQL is running, and make sure the $PGDATA environment variable points to the PostgreSQL data directory.

Calculating the required number of huge pages

#!/bin/bash
# Read the postmaster PID from the first line of postmaster.pid
pid=`head -1 $PGDATA/postmaster.pid`
echo "Pid:            $pid"
# Peak virtual memory used by the postmaster, in kB
peak=`grep ^VmPeak /proc/$pid/status | awk '{ print $2 }'`
echo "VmPeak:         $peak kB"
# Huge page size configured on this machine, in kB
hps=`grep ^Hugepagesize /proc/meminfo | awk '{ print $2 }'`
echo "Hugepagesize:   $hps kB"
# Number of huge pages needed to cover the peak usage
hp=$((peak/hps))
echo "Set Huge Pages: $hp"

The output of the script is as follows:

Script output

Pid:            12737
VmPeak:         180932 kB
Hugepagesize:   2048 kB
Set Huge Pages: 88

The recommended number of huge pages is 88 (VmPeak / Hugepagesize = 180932 / 2048 ≈ 88), so that is the value to set.

Setting the number of huge pages

 sysctl -w vm.nr_hugepages=88 
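A value set with sysctl -w lasts only until the next reboot; to keep it across reboots, add the same setting to /etc/sysctl.conf, for example:

# Persist the huge pages count and reload the configuration
echo "vm.nr_hugepages=88" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p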

Check the huge pages now: you will see that none of them are in use yet (HugePages_Free = HugePages_Total).

Huge pages information again (Linux only)

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:      88
HugePages_Free:       88
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Now set the huge_pages parameter to "on" in $PGDATA/postgresql.conf and restart the server.
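For reference, the change amounts to a single line in postgresql.conf followed by a restart; pg_ctl is shown here, substitute your service manager if you use one:

# $PGDATA/postgresql.conf
huge_pages = on        # valid values: try (default), on, off

$ pg_ctl -D $PGDATA restart

With huge_pages = on, the server refuses to start if it cannot allocate huge pages; the default "try" falls back to regular pages instead.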

Huge pages information after the restart (Linux only)

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:      88
HugePages_Free:       81
HugePages_Rsvd:       64
HugePages_Surp:        0
Hugepagesize:       2048 kB

Now you can see that a few huge pages are in use. Let's try to add some data to the database.

Some database operations to utilize huge pages

postgres=# CREATE TABLE foo(a INTEGER);
CREATE TABLE
postgres=# INSERT INTO foo VALUES(generate_series(1,10000000));
INSERT 0 10000000

Let's see whether we are using more huge pages now than before.

Huge pages information once more (Linux only)

$ cat /proc/meminfo | grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:      88
HugePages_Free:       18
HugePages_Rsvd:        1
HugePages_Surp:        0
Hugepagesize:       2048 kB

Now you can see that most of the huge pages are in use.

Note: the number of HugePages used here is very small, which is not typical for a machine in a production environment. Please estimate the number of pages required for your system and set it accordingly based on the workload and resources.

vm.swappiness


vm.swappiness is another kernel parameter that can affect database performance. It controls swappiness (the swapping of pages into and out of memory) in Linux. The value ranges from 0 to 100 and determines how aggressively memory pages are swapped out: 0 effectively disables swapping, while 100 means swapping aggressively.

You can get good performance by setting lower values.

In newer kernels, setting the value to 0 may lead to the OOM killer (the Linux out-of-memory process killer) killing processes; it is therefore safer to set the value to 1 if you want to minimize swapping. The default value on Linux is 60. A higher value makes the kernel push more pages out to swap space, while a lower value keeps more data and code in RAM.

A lower value is a good bet for improving PostgreSQL performance.
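As with the shared memory settings above, vm.swappiness can be changed with sysctl and persisted in /etc/sysctl.conf; a minimal example:

# Check the current value
sudo sysctl vm.swappiness
vm.swappiness = 60

# Minimize swapping without disabling it entirely
sudo sysctl -w vm.swappiness=1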

vm.overcommit_memory / vm.overcommit_ratio


Applications allocate memory and free it when it is no longer needed. But in some cases an application allocates too much memory and does not release it, which can invoke the OOM killer. Here are the possible values of the vm.overcommit_memory parameter, with a description for each:

  0. Heuristic overcommit (the default): the kernel uses its own heuristics.
  1. Always overcommit, never refuse an allocation.
  2. Don't overcommit: never exceed the overcommit ratio.

Link: https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

vm.overcommit_ratio is the percentage of RAM that is available for overcommit when vm.overcommit_memory = 2. The commit limit is computed as swap + RAM × vm.overcommit_ratio / 100: for example, a system with 2 GB of RAM, 2 GB of swap, and a ratio of 50% can commit up to 3 GB.
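The resulting limit, and how much is currently committed, can be read from /proc/meminfo (the numbers below are illustrative, matching the 3 GB example above):

$ grep -E 'CommitLimit|Committed_AS' /proc/meminfo
CommitLimit:     3145728 kB
Committed_AS:    1572864 kB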

A value of 2 for vm.overcommit_memory works best for PostgreSQL. It maximizes the server process's use of RAM without any significant risk of it being killed by the OOM killer: an application can still allocate memory, but only within the commit limit, so allocations beyond that limit fail up front instead of being granted and reclaimed later. Therefore, a value of 2 not only gives better performance than the default of 0, it also improves reliability by eliminating the risk of the process being killed by the OOM killer.

Keep in mind that on systems without swap space, vm.overcommit_memory = 2 can be problematic; see:

https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
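A minimal sketch of applying these settings with sysctl; the ratio of 50% is illustrative, pick one that fits your RAM and swap:

# Enable strict overcommit accounting
sudo sysctl -w vm.overcommit_memory=2
# Permit commitments up to swap + 50% of RAM
sudo sysctl -w vm.overcommit_ratio=50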

vm.dirty_background_ratio / vm.dirty_background_bytes


vm.dirty_background_ratio is the percentage of memory filled with dirty pages at which the kernel starts writing them to disk. The flush to disk is performed in the background. The value ranges from 0 to 100; however, values below 5 may be ineffective, and some kernels do not support them. The default on most Linux systems is 10. For write-intensive workloads you can often improve performance with a lower value, which makes Linux start flushing dirty pages in the background earlier.

vm.dirty_background_bytes expresses the same threshold as an absolute number of bytes rather than a percentage (setting one of the two resets the other to zero); choose its value based on the speed of your disks.

There are no universally "good" values for these two parameters, since both depend on the hardware. However, in most cases, setting vm.dirty_background_ratio to 5, or vm.dirty_background_bytes to about 25% of your disk's write speed, improves performance by up to ~25%.
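For example (the byte value is illustrative, standing in for roughly 25% of a disk's per-second write throughput):

# Start background writeback once dirty pages reach 5% of memory...
sudo sysctl -w vm.dirty_background_ratio=5
# ...or use an absolute threshold instead; note that setting the bytes
# variant resets vm.dirty_background_ratio to 0, they are alternatives
sudo sysctl -w vm.dirty_background_bytes=67108864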

vm.dirty_ratio / dirty_bytes


These are the same as vm.dirty_background_ratio / dirty_background_bytes, except that the flush is performed synchronously in the context of the writing process, blocking the application. Therefore vm.dirty_ratio should be higher than vm.dirty_background_ratio: this ensures background writeback starts earlier, avoiding blocking the application as much as possible. You can tune the gap between the two ratios depending on your disk I/O load.
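A sketch of a configuration that keeps a comfortable gap between the two thresholds (values are illustrative):

# Background writeback starts at 5% of memory dirty...
sudo sysctl -w vm.dirty_background_ratio=5
# ...while writers are blocked and forced to flush only at 30%
sudo sysctl -w vm.dirty_ratio=30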

Summary


You can adjust other parameters to increase performance, but the improvements will be minimal and you will not get much benefit. We must remember that not all parameters apply to all application types: some applications work better with certain settings tuned, and some do not. You must find the right balance between these parameter settings for the expected workload and application type, as well as the operating system configuration. Tuning kernel parameters is not as easy as tuning database parameters: it is harder to make general recommendations here.

Source: https://habr.com/ru/post/458860/

