tmpfs. The picture did not change; the disk had nothing to do with it.

Information about locks can be found in the pg_catalog.pg_locks and pg_stat_activity views. In the latter, version 9.6 added information about what a process is waiting for (Amit Kapila, Ildus Kurbangaliev): the wait_event_type and wait_event fields. The possible values of these fields are described here.

postgres=# SELECT COUNT(*) FROM pg_locks;
 count
-------
 88453
(1 row)

postgres=# SELECT COUNT(*) FROM pg_stat_activity;
 count
-------
  1826
(1 row)

postgres=# SELECT COUNT(*) FROM pg_stat_activity WHERE state = 'active';
 count
-------
  1005
(1 row)
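To see not only how many locks there are, but also which sessions hold or are waiting for them, pg_locks can be joined to pg_stat_activity by pid. A minimal sketch (the filter on ungranted locks and the 60-character query preview are illustrative choices, not queries from this investigation):

SELECT a.pid, a.state, a.wait_event_type, a.wait_event,
       l.locktype, l.mode, l.granted,
       left(a.query, 60) AS query_head
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE NOT l.granted
ORDER BY a.pid;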
SELECT COUNT(mode), mode FROM pg_locks WHERE pid = 580707 GROUP BY mode;

 count |      mode
-------+-----------------
    93 | AccessShareLock
     1 | ExclusiveLock

When reading, an access share lock is taken; when writing, an exclusive one. That is, write locks accounted for less than 1% of all requests.

In pg_locks, the lock types do not always appear the way they are described in the user documentation:

AccessShareLock = LockTupleKeyShare
RowShareLock = LockTupleShare
ExclusiveLock = LockTupleNoKeyExclusive
AccessExclusiveLock = LockTupleExclusive
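The same breakdown can also be taken over all backends rather than a single pid, to check whether the sub-1% share of write locks holds across the whole cluster. A sketch:

SELECT mode, COUNT(*) AS cnt
FROM pg_locks
GROUP BY mode
ORDER BY cnt DESC;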
One suspicion fell on the buffer content lock. A possible solution would be to "teach" INSERTs from different sessions to overlap less across buffers.
The perf utility collects a lot of diagnostic information. In record mode it writes system event statistics to a file (by default ./perf.data), and in report mode it analyzes the collected data; you can, for example, filter only the events related to postgres or to a particular pid:

$ perf record -u postgres
$ perf record -p 76876
$ perf report > ./my_results
Using perf to diagnose PostgreSQL is described, for example, here, as well as in the pg wiki. There is also perf top, which works, naturally, in the spirit of the operating system's top. With perf top we saw that the processor spends most of its time in the kernel, as well as in the PinBuffer() and LWLockAttemptLock() functions.
PinBuffer() is a function that increments the reference count of a buffer (the mapping of a data page into RAM); thanks to it, postgres processes know which buffers may be evicted and which may not. LWLockAttemptLock() is the function that acquires an LWLock. An LWLock is a kind of lock with two levels, shared and exclusive, without deadlock detection; the locks are pre-allocated in shared memory, and waiting processes wait in a queue.
The example is taken from github, not from our case (neither we nor the client are ready to disclose the details yet). A flame graph is built from the same data that perf collects, but visualizes it lucidly, building trees from the collected call stacks. Details about profiling with flame graphs can be found, for example, here, and everything you need can be downloaded here.

In our case the flame graphs showed a huge amount of nestloop. Apparently, the JOINs of a large number of tables in the numerous parallel read queries caused a large number of access share locks.
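Since access share locks in pg_locks carry the oid of the locked relation, one can check which tables account for the bulk of them. A sketch (it only resolves names for relations of the current database, because pg_class is per-database):

SELECT c.relname, COUNT(*) AS share_locks
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE l.mode = 'AccessShareLock'
GROUP BY c.relname
ORDER BY share_locks DESC
LIMIT 20;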
The statistics collected by perf show where the processor cycles go. And although we could see that most of the CPU time was spent on locks, this did not show what exactly led to such long lock waits, because we could not see where exactly the waiting took place: waiting spends no processor time. The waits themselves can be seen in pg_stat_activity:

SELECT wait_event_type, wait_event, COUNT(*) FROM pg_stat_activity GROUP BY wait_event_type, wait_event;

 LWLockTranche | buffer_content | UPDATE *************
 LWLockTranche | buffer_content | INSERT INTO ********
 LWLockTranche | buffer_content | \r
               |                | insert into B4_MUTEX
               |                | values (nextval('hib
               |                | returning ID
 Lock          | relation       | INSERT INTO B4_*****
 LWLockTranche | buffer_content | UPDATE *************
 Lock          | relation       | INSERT INTO ********
 LWLockTranche | buffer_mapping | INSERT INTO ********
 LWLockTranche | buffer_content | \r
Here the most frequent wait_event values are buffer_content (a lock on the contents of a buffer) and buffer_mapping (a lock on a piece of the shared_buffers hash table).
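To look at the statements sitting behind a particular wait, the same view can be filtered by wait_event. A sketch (the wait_event name is taken from the table above; newer releases may spell these wait events differently):

SELECT pid, state, left(query, 60) AS query_head
FROM pg_stat_activity
WHERE wait_event = 'buffer_content';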
Next, the GDB debugger came into play. With GDB we can get the call stack of specific processes. By sampling, that is, by collecting a certain number of random call stacks, one can get an idea of which stacks account for the longest waits.
First, gdb must be attached to a PostgreSQL process. To do this, find the pid of a server process, say from:

$ ps aux | grep postgres

postgres 2025 0.0 0.1 172428 1240 pts/17 S 23 0:00 /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data

Then substitute this pid into the debugger:

igor_le:~$ gdb -p 2025
Inside the debugger we type bt [i.e. backtrace] or where, and get plenty of output like this:

(gdb) bt
#0  0x00007fbb65d01cd0 in __write_nocancel () from /lib64/libc.so.6
#1  0x00000000007c92f4 in write_pipe_chunks (
    data=0x110e6e8 "2018-06-01 15:35:38 MSK [524647]: [392-1] db=bp,user=bp,app=[unknown],client=192.168.70.163 LOG: relation 23554 new block 493: 248.389503\n2018-06-01 15:35:38 MSK [524647]: [393-1] db=bp,user=bp,app=["..., len=409, dest=dest@entry=1) at elog.c:3123
#2  0x00000000007cc07b in send_message_to_server_log (edata=0xc6ee60 <errordata>) at elog.c:3024
#3  EmitErrorReport () at elog.c:1479
Most often the stacks showed waiting on the buffer partition lock inside the relation extension lock, that is, on a lock over a piece of the buffer manager's hash table, which was needed to evict an old buffer and later replace it with a new one corresponding to the extended part of the table. A noticeable number of buffer content lock waits was also present, corresponding to waiting on locks of B-tree index pages needed to perform the insert.

At first there was a suspicion of a bug, that the LWLock had simply got stuck. But that is unlikely, because nothing complicated happens inside the buffer partition lock. More likely, something was wrong with the LWLock itself: even though nobody held the lock for very long, waiting for it lasted unreasonably long.
We decided to patch postgres to get monitoring of the time spent in relation extension, that is, in what happens inside the RelationAddExtraBlocks() function. That way we would know how much time goes into RelationAddExtraBlocks().

Another patch reports in pg_stat_activity what we are currently doing in relation extension. It was done this way: while a relation is being extended, application_name becomes RelationAddExtraBlocks. This process can now be conveniently analyzed in maximum detail with gdb bt and perf.
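Given that the patched server publishes the string RelationAddExtraBlocks in application_name while a relation is being extended, the number of backends currently busy with extension can be sampled with a query like this (only a sketch, not the monitoring query actually used):

SELECT COUNT(*) AS extending_now
FROM pg_stat_activity
WHERE application_name = 'RelationAddExtraBlocks';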
Another change concerned the B-tree: earlier, on an insert request, the leaf page was locked as share and only after that received exclusive; now it takes exclusive right away. This patch has already been committed for PostgreSQL 12. Fortunately, this year Alexander Korotkov received committer status, becoming the second PostgreSQL committer in Russia and the second in the company.
NUM_BUFFER_PARTITIONS was also increased from 128 to 512 to reduce the load on the mapping locks: the buffer manager's hash table was divided into smaller pieces, in the hope that the load on each individual piece would decrease.

After increasing NUM_BUFFER_PARTITIONS, the buffer_mapping waits remained, that is, to remind you, locks on pieces of the buffer manager's hash table:

locks_count | active_session | buffer_content | buffer_mapping
------------+----------------+----------------+---------------
      12549 |           1218 |              0 |             15
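A rough sketch of how counters like these could be gathered from the standard views (the actual monitoring query is not shown here, so the mapping of the columns to these subqueries is an assumption):

SELECT
  (SELECT COUNT(*) FROM pg_locks)                                AS locks_count,
  (SELECT COUNT(*) FROM pg_stat_activity WHERE state = 'active') AS active_session,
  (SELECT COUNT(*) FROM pg_stat_activity
     WHERE wait_event = 'buffer_content')                        AS buffer_content,
  (SELECT COUNT(*) FROM pg_stat_activity
     WHERE wait_event = 'buffer_mapping')                        AS buffer_mapping;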
Locks on heap expansion now came to the fore. The hypothesis: the eviction of a buffer gets stuck on the buffer partition lock. Perhaps behind that same buffer partition lock sits some highly demanded page, for example the root of some B-tree, at which there is a non-stop stream of shared lock requests from read queries.
The queue in an LWLock is "not fair". Since any number of shared locks can be held at once, if a shared lock is already taken, subsequent shared locks pass without queuing. Thus, if the stream of shared locks is intense enough that there are no "windows" between them, the wait for an exclusive lock stretches out almost to infinity.

The patch gives shared lockers a conscience, and they honestly queue up when an exclusive lock is already there (interestingly, heavy locks, hwlock, have no problems with conscience: they always queue honestly):

locks_count | active_session | buffer_content | buffer_mapping | reladdextra | inserts>30sec
------------+----------------+----------------+----------------+-------------+--------------
     173985 |           1802 |              0 |            569 |           0 |             0
The long inserts were gone, although the locks on the pieces of the hash table remained; but what can you do, these are the properties of the bus of our little supercomputer.

The patch that gives a conscience to share locks, letting exclusive locks get through, solved the problem of hour-long delays on the multi-node system. The buffer manager's hash table could not cope because of the too large stream of share locks, which left no chance for the locks needed to evict old buffers and load new ones. The problems with extending the buffers for database tables were only a consequence of that. Before that, we had managed to clear the bottleneck with access to the B-tree root.

Source: https://habr.com/ru/post/423685/