PostgreSQL: analytics for DBA

Many PostgreSQL DBMS users know that the server collects various statistics during its work, but not everyone knows that it is useful to analyze it and how to extract it for this. This small toolkit contains several useful queries that give some insight into how to use this “hidden knowledge” that is constantly accumulating. These queries can be used to monitor the status of PostgreSQL (either manually or using plugins for monitoring systems like Nagios, Cacti or Zabbix), to look for bottlenecks in the server and many other similar tasks. Remember that this is only the tip of the iceberg; In the documentation, you can find descriptions of several dozens of system views, which may also be useful to the PostgreSQL administrator.

For the toolkit to work correctly, you must enable the stats_block_level and stats_row_level options in postgresql.conf, and also configure the stats_reset_on_server_start parameter at your discretion. If each time you restart the PostgreSQL server, you change some essential parameters of its configuration, it makes sense to reset the statistics in order to track the effect of the changes made. If you are interested in a long-term perspective and the restart is not performed due to a PostgreSQL configuration change, set the stats_reset_on_server_start parameter to off.

Hit / read ratio

')
When executing a request, PostgreSQL first looks to see if there is any data in the shared memory in the request (shared buffers). If they are found, hit is counted; if not, a relatively slow fread system call is made to pick up data from the disk or from the operating system disk cache and read is counted. On average, the rule is true: the greater the hit / read ratio, the better PostgreSQL is configured, since it reads very little from the disk, mainly extracting data from shared memory. For most not very large bases, this ratio should lie between 5,000 and 10,000. Do not strive, however, to artificially overstate the setting of shared_buffers, which directly defines hit / read: too large shared memory sizes lead to a loss of performance in databases with intensive writing. It is also worth remembering that fread can be quite fast if the data is in the OS disk cache.

  SELECT 
       datname, 
       CASE 
         WHEN blks_read = 0 THEN 0 
         ELSE blks_hit / blks_read 
       END AS ratio 
     FROM 
       pg_stat_database;

The number of modifications that occurred in the table

List by tables: how many entries have been added, changed and deleted since the last reset of statistics. The DBA should represent which tables are the most loaded in the current database, as well as the ratio between the different types of modifying queries to them.

     SELECT 
       relname, 
       n_tup_ins, 
       n_tup_upd, 
       n_tup_del 
     FROM 
       pg_stat_user_tables 
     ORDER BY 
       n_tup_upd DESC;

Seq scan / index scan statistics

List by tables: how many requests to them were made by sequential viewing; how many queries were performed using indexes; and the ratio of these two numbers. Allows you to evaluate whether all the necessary indexes are created in this table. If your tables contain more than several thousand rows, sequential browsing will be slower than the index scan, so ideally seqscans should not be in such tables. If you still have them, analyze the queries for such tables and create the corresponding indexes. At the same time, it is important not to overdo it: the more indices in the columns of the table, the more expensive the data update operations become.

Also do not forget that after creating the index, the table needs to be done ANALYZE, otherwise the query planner will not notice changes in the table structure.

  SELECT 
       relname, 
       seq_scan, 
       idx_scan 
       CASE 
         WHEN idx_scan = 0 THEN 100 
         ELSE seq_scan / idx_scan 
       END AS ratio 
     FROM 
       pg_stat_user_tables 
     ORDER BY 
       ratio DESC;

Index Statistics

List by index: how many records from the index were returned in queries on this index; how many rows at the same time still had to be viewed in the parent table; the ratio of these two numbers. The essence of this statistic is simple: if you have a lot of fetch, it means the index is outdated and when executing a query, PostgreSQL is forced to look directly at the table as the source of the most relevant data, which slows down its work. In this case, it is necessary to rebuild this index to match the real data in the table.

  SELECT 
       indexrelname, 
       idx_tup_read, 
       idx_tup_fetch, 
       CASE 
         WHEN idx_tup_fetch = 0 THEN 100 
         ELSE idx_tup_read / idx_tup_fetch 
       END AS ratio 
     FROM 
       pg_stat_user_indexes 
     ORDER BY 
       ratio DESC;

Running queries with their duration

A simple list of currently executed server requests. It is useful when you do not know the system well enough or simply did not have time to configure it - with its help you can find and interrupt a “bad” request that interferes with the server’s work (the procpid column contains the PID of the process, which you can do kill if necessary). Remember, however, that a simple periodic review of running queries will in no way replace the excellent pgFouine log analyzer. Also, do not forget that the process in which you execute this query also falls into the resulting list.

  SELECT 
       datname, 
       NOW () - query_start AS duration, 
       procpid 
       current_query	
     FROM
       pg_stat_activity 
     ORDER BY duration duration DESC;

List of current locks

A list of current locks with an indication of the type of lock, the table and database on which it is set, and the number of the transaction that set the lock. Do not be afraid if the request gives a long list of lock-s - not all of them are critical and block the table from all possible changes and even readings. To analyze the list of locks you should definitely get acquainted with the documentation about their types in PostgreSQL and about when and what requests they are exposed to. One common case where a list of locks can be useful: by executing the command ps aux | grep ^ postgres you see that waiting is written as one of the postgres processes, which means that the process is waiting for the blocking it needs to be released. Which one - find out by running this query.

  SELECT 
       l.mode, 
       d.datname, 
       c.relname, 
       l.granted 
       l.transactionid 
     FROM 
       pg_locks AS l 
     LEFT JOIN pg_database AS d ON l.database = d.oid 
     LEFT JOIN pg_class AS c ON l.relation = c.oid;

Source: https://habr.com/ru/post/13992/

All Articles