
... tmpfs. The picture did not change: the disk had nothing to do with it.

Next we turned to the system views pg_catalog.pg_locks and pg_stat_activity. In the latter, version 9.6 added information about what a process is waiting for (Amit Kapila, Ildus Kurbangaliev): the wait_event_type field. The possible values of this field are described here.

postgres=# SELECT COUNT(*) FROM pg_locks;
 count
-------
 88453
(1 row)

postgres=# SELECT COUNT(*) FROM pg_stat_activity;
 count
-------
  1826
(1 row)

postgres=# SELECT COUNT(*) FROM pg_stat_activity WHERE state ='active';
 count
-------
  1005

SELECT COUNT(mode), mode FROM pg_locks WHERE pid = 580707 GROUP BY mode;
 count |      mode
-------+-----------------
    93 | AccessShareLock
     1 | ExclusiveLock

Reading takes a share lock, writing an exclusive one. That is, write locks accounted for less than 1% of all requests.

In pg_locks the lock types do not always appear under the names used in the user documentation:

AccessShareLock = LockTupleKeyShare
RowShareLock = LockTupleShare
ExclusiveLock = LockTupleNoKeyExclusive
AccessExclusiveLock = LockTupleExclusive

This pointed at the buffer content lock. A possible solution is to "teach" INSERTs from different sessions to overlap less across buffers.

The perf utility collects a lot of diagnostic information. In record mode it writes system event statistics to files (by default in ./perf.data), and in report mode it analyzes the collected data. You can, for example, filter the events related only to postgres or to a specific pid:

$ perf record -u postgres
$ perf record -p 76876

and then

$ perf report > ./my_results
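For the analysis that follows we mostly needed call stacks rather than flat counters. One way to get them is sketched below; this is not the exact command set from our case: the 30-second window, the -g call-graph option and the location of Brendan Gregg's FlameGraph scripts are assumptions.

# record call graphs for one backend for about 30 seconds
$ perf record -g -p 76876 -- sleep 30

# flat report, sorted by overhead
$ perf report --stdio > ./my_results

# or fold the same samples into a flame graph
# (stackcollapse-perf.pl and flamegraph.pl come from the FlameGraph repository)
$ perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > postgres_cpu.svg

Note that perf report and perf script read ./perf.data by default, so they have to be run in the same directory as perf record.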
Using perf to diagnose PostgreSQL is described, for example, here, as well as in the pg wiki. There is also perf top, which works, naturally, in the spirit of the operating system's top. Using perf top we saw that the processor spends most of its time in the kernel, as well as in the functions PinBuffer() and LWLockAttemptLock().

PinBuffer() is the function that increments the reference count of a buffer (the in-memory copy of a data page); it is how postgres processes know which buffers can be evicted and which cannot.

LWLockAttemptLock() is the function that acquires an LWLock. An LWLock is a kind of lock with two levels, shared and exclusive, and without deadlock detection; the locks are preallocated in shared memory, and waiting processes wait in a queue.

The flame graph example is taken from github, not from our case (neither we nor the client are ready to disclose the details yet). A flame graph is built from the data collected by perf, but it visualizes that data lucidly, building trees out of the collected call stacks. Details about profiling with flame graphs can be found, for example, here, and everything you need can be downloaded here.

In our profiles there was a lot of nestloop. Apparently, JOINs of a large number of tables in numerous parallel read queries produced a large number of access share locks.

The statistics collected by perf show where the processor cycles go. And although we saw that most of the CPU time was spent on locks, we did not see what exactly leads to such long lock waits: we do not see where the locks are being waited on, because no processor time is spent while waiting.

To see the waits themselves we queried pg_stat_activity:

SELECT wait_event_type, wait_event, COUNT(*) FROM pg_stat_activity GROUP BY wait_event_type, wait_event;

A sample of the waiting sessions looked like this (the third column is the query text, masked here):

 LWLockTranche | buffer_content | UPDATE *************
 LWLockTranche | buffer_content | INSERT INTO ********
 LWLockTranche | buffer_content | \r
               |                | insert into B4_MUTEX
               |                | values (nextval('hib
               |                | returning ID
 Lock          | relation       | INSERT INTO B4_*****
 LWLockTranche | buffer_content | UPDATE *************
 Lock          | relation       | INSERT INTO ********
 LWLockTranche | buffer_mapping | INSERT INTO ********
 LWLockTranche | buffer_content | \r

The most frequent wait_event values are buffer_content (a lock on the contents of a buffer) and buffer_mapping (locks on pieces of the shared_buffers hash table).

For further help we turned to the GDB debugger. With GDB we can get the call stacks of specific processes. By sampling, i.e. collecting a certain number of random call stacks, you can get an idea of which stacks account for the longest waits (a sketch of how such sampling can be scripted is given after the example backtrace below).

gdb must be attached to a PostgreSQL process. To do this, find the pid of a server process, say from:

$ ps aux | grep postgres

postgres 2025 0.0 0.1 172428 1240 pts/17 S 23 0:00 /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data

and feed this pid to the debugger:

igor_le:~$ gdb -p 2025

Once inside the debugger, we type bt [i.e. backtrace] or where.
And we get a lot of information like this:

(gdb) bt
#0  0x00007fbb65d01cd0 in __write_nocancel () from /lib64/libc.so.6
#1  0x00000000007c92f4 in write_pipe_chunks (
    data=0x110e6e8 "2018-06-01 15:35:38 MSK [524647]: [392-1] db=bp,user=bp,app=[unknown],client=192.168.70.163 LOG: relation 23554 new block 493: 248.389503\n2018-06-01 15:35:38 MSK [524647]: [393-1] db=bp,user=bp,app=["..., len=409, dest=dest@entry=1) at elog.c:3123
#2  0x00000000007cc07b in send_message_to_server_log (edata=0xc6ee60 <errordata>) at elog.c:3024
#3  EmitErrorReport () at elog.c:1479

The collected stacks showed waits on a buffer partition lock inside the relation extension lock, that is, on the lock covering a piece of the buffer manager's hash table that was needed to evict an old buffer in order to later replace it with a new one corresponding to the extended part of the table. A noticeable number of buffer content lock waits was also visible, which corresponded to waiting for locks on B-tree index pages in order to perform the insert.
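Backtraces like the one above were gathered repeatedly; the sampling itself is easy to script. A rough sketch is shown below; the pid, the number of samples and the one-second pause are arbitrary, and keep in mind that every gdb attach briefly stops the backend, so this is not free on a loaded system.

#!/bin/sh
# collect 20 backtraces of one PostgreSQL backend, one per second
PID=2025
for i in $(seq 1 20); do
    gdb -p "$PID" -batch -ex bt 2>/dev/null
    sleep 1
done > ./stacks.txt
# the most frequently repeated stacks in stacks.txt are where the waiting happens

Counting the identical top frames in stacks.txt gives roughly the same picture as the interpretation above, just with numbers attached.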
One explanation: someone took the LWLock and got stuck holding it. But this is unlikely, because nothing complicated happens inside a buffer partition lock.

Another explanation is the queue on the LWLock: that is, even though no one held the lock for too long, waiting for it lasted unjustifiably long.

So we added detailed logging of the relation extension, of what happens inside the RelationAddExtraBlocks() function, so that we would know what the time inside RelationAddExtraBlocks() is spent on.

In addition, we added reporting to pg_stat_activity about what is currently being done in the relation extension. It was done this way: when a relation is being extended, application_name becomes RelationAddExtraBlocks. This process is now convenient to analyze in full detail with gdb bt and perf.

A patch was made for the B-tree: earlier, on an insert request, the leaf page was locked as share and only after that it got exclusive. Now it immediately gets exclusive. This patch has already been committed for PostgreSQL 12. Fortunately, this year Alexander Korotkov received committer status, becoming the second PostgreSQL committer in Russia and the second one in the company.

NUM_BUFFER_PARTITIONS was also increased from 128 to 512 to reduce the load on the mapping locks: the buffer manager's hash table was divided into smaller pieces, in the hope that the load on each individual piece would decrease.

After increasing NUM_BUFFER_PARTITIONS, buffer_mapping waits remained, that is, we remind you, locks on pieces of the buffer manager's hash table:

 locks_count | active_session | buffer_content | buffer_mapping
-------------+----------------+----------------+----------------
       12549 |           1218 |              0 |             15

Now the waits related to heap expansion came to the fore.

The hypothesis: everything rests on the buffer partition lock taken when a buffer is evicted. Perhaps some very heavily demanded page, for example the root of some B-tree, sits on that same buffer partition lock. A non-stop stream of shared lock requests from reading queries goes to that place.

The queue on an LWLock is "not fair." Since any number of shared locks can be taken at once, if a shared lock is already held, subsequent shared locks pass without queuing. Thus, if the stream of shared locks is intense enough that there are no "windows" between them, waiting for an exclusive lock goes almost to infinity.

The fix gave shared lockers a conscience, and they now honestly queue up when an exclusive lock is already waiting (interestingly, heavyweight locks, hwlocks, have no problems with conscience: they always queue honestly).

 locks_count | active_session | buffer_content | buffer_mapping | reladdextra | inserts>30sec
-------------+----------------+----------------+----------------+-------------+---------------
      173985 |           1802 |              0 |            569 |           0 |             0

There were no more hanging inserts, although the locks on pieces of the hash table remained. But what can you do: these are the properties of the bus of our little supercomputer. (A sketch of how counters like the ones in these tables can be polled from the shell is given at the end of the article.)

So: share locks that yield to exclusive locks solved the problem of hour-long delays in the multi-node system. The buffer manager's hash table could not cope with the too large flow of share locks, which left no chance for the locks needed to evict old buffers and load new ones. The problems with extending the buffers for the database tables were only a consequence of that. Before this, we had managed to remove the bottleneck of access to the B-tree root.
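For reference, counters like those in the tables above are easy to keep an eye on continuously while experimenting. Below is a minimal sketch of such a watcher; it assumes a local superuser connection and the patched server described above that puts RelationAddExtraBlocks into application_name, and the two-second interval and the exact column set are arbitrary.

#!/bin/sh
# print lock and wait-event counters every 2 seconds
while true; do
    psql -X -At -c "
      SELECT now(),
             (SELECT count(*) FROM pg_locks)                                AS locks_count,
             (SELECT count(*) FROM pg_stat_activity WHERE state = 'active') AS active_session,
             (SELECT count(*) FROM pg_stat_activity
               WHERE wait_event = 'buffer_content')                         AS buffer_content,
             (SELECT count(*) FROM pg_stat_activity
               WHERE wait_event = 'buffer_mapping')                         AS buffer_mapping,
             (SELECT count(*) FROM pg_stat_activity
               WHERE application_name = 'RelationAddExtraBlocks')           AS reladdextra;"
    sleep 2
done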
Source: https://habr.com/ru/post/423685/