
POWA-like PostgreSQL Monitoring with Prometheus

Background


For a long time we used the excellent POWA utility to collect and conveniently view data on how PostgreSQL was performing (overall server performance, slowest queries, most frequent queries). However, this solution was far from ideal, and we managed to find a better option that is, moreover, fully integrated with our main monitoring system.


POWA covered most of our basic needs:



However, it had some unpleasant drawbacks:



A couple of POWA screenshots as an example:




So we decided to try collecting metrics from PostgreSQL using postgres_exporter for Prometheus.


Installation


TL;DR: here is an Ansible role; it was written for CentOS 7, but after minimal edits (replacing firewalld with iptables) it should work on any system with systemd: https://github.com/UnitedTraders/ansible-postgresql-exporter


Solution Architecture


I will not describe Prometheus itself in detail, since there is plenty of material on it, but I will walk through the general architecture and the exporter.


Prometheus uses a pull model for metrics collection: it keeps a list of exporters and polls them over HTTP, collecting the list of metrics from each one and saving them to its storage.


An exporter is an agent that collects metrics directly from the entity to be monitored (the server as a whole, or a specific application). Prometheus has rich instrumentation capabilities, so exporters exist for most popular applications, and it is easy to write your own if necessary.


postgres_exporter works as follows: it connects to PostgreSQL, runs queries against the service tables, and exposes the results in a special format via a built-in HTTP server so that Prometheus can scrape them. An important point: besides the large set of default queries, you can define your own and collect any data that can be obtained with SQL, including business metrics.
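On the Prometheus side this only requires adding the exporter as a scrape target. A minimal sketch of the prometheus.yml fragment, assuming the exporter listens on port 9187 (the port configured in the unit file later in this article) and a placeholder hostname:

    scrape_configs:
      - job_name: 'postgresql'
        static_configs:
          - targets: ['db1.example.com:9187']  # host:port of postgres_exporter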


Thus, setting up postgres_exporter boils down to three actions: install and run the exporter, describe the custom metrics you want in a queries file, and point Prometheus at the exporter's endpoint.



Exporter Installation


The exporter is written in Go, so everything is trivial:
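A sketch of the manual installation, assuming the paths used in the unit file below; the release URL follows the project's v0.4.1 asset naming, and the pg_exporter user is the one referenced in the unit:

    # create an unprivileged user for the exporter
    sudo useradd --system --shell /sbin/nologin pg_exporter
    sudo mkdir -p /opt/postgres_exporter && cd /opt/postgres_exporter
    # download and unpack the prebuilt binary (URL assumed from the release naming)
    sudo wget https://github.com/wrouesnel/postgres_exporter/releases/download/v0.4.1/postgres_exporter_v0.4.1_linux-amd64.tar.gz
    sudo tar xzf postgres_exporter_v0.4.1_linux-amd64.tar.gz

The systemd unit then looks like this: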



    [Unit]
    Description=Prometheus exporter for Postgresql (https://github.com/wrouesnel/postgres_exporter)

    [Service]
    WorkingDirectory=/opt/postgres_exporter
    EnvironmentFile=/opt/postgres_exporter/env
    ExecStart=/opt/postgres_exporter/postgres_exporter_v0.4.1_linux-amd64/postgres_exporter --extend.query-path=/opt/postgres_exporter/metrics.yaml --web.listen-address=:9187
    User=pg_exporter

    [Install]
    WantedBy=multi-user.target
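The EnvironmentFile referenced above holds the connection string. DATA_SOURCE_NAME is the variable postgres_exporter reads; the credentials and database here are placeholders:

    # /opt/postgres_exporter/env
    DATA_SOURCE_NAME=postgresql://pg_exporter:password@localhost:5432/postgres?sslmode=disable

Save the unit as /etc/systemd/system/postgres_exporter.service, then enable and start it:

    sudo systemctl daemon-reload
    sudo systemctl enable postgres_exporter
    sudo systemctl start postgres_exporter
    curl -s http://localhost:9187/metrics | head   # quick sanity check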


Customize your metrics


By default, postgres_exporter cannot collect per-query data. But PostgreSQL has a very useful extension, pg_stat_statements, which does exactly that. Installing pg_stat_statements comes down to three simple steps:
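A sketch of those steps, assuming you can edit postgresql.conf and restart the server:

    # 1. Load the extension at server start (in postgresql.conf):
    #    shared_preload_libraries = 'pg_stat_statements'
    # 2. Restart PostgreSQL:
    sudo systemctl restart postgresql
    # 3. Create the extension in the monitored database:
    psql -d postgres -c "CREATE EXTENSION pg_stat_statements;"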



Since query execution time was what mattered to us most, the metrics file ended up looking like this:


    pg_database:
      query: "SELECT pg_database.datname, pg_database_size(pg_database.datname) as size FROM pg_database"
      metrics:
        - datname:
            usage: "LABEL"
            description: "Name of the database"
        - size:
            usage: "GAUGE"
            description: "Disk space used by the database"

    pg_stat_statements:
      query: "SELECT queryid, datname, left(query, 100) as short_query, sum(calls) as calls, sum(total_time) as total_time, min(min_time) as min_time, max(max_time) as max_time, sum(mean_time*calls)/sum(calls) as mean_time FROM pg_stat_statements JOIN pg_database ON pg_stat_statements.dbid = pg_database.oid GROUP BY queryid, short_query, datname"
      metrics:
        - queryid:
            usage: "LABEL"
            description: "Query ID"
        - datname:
            usage: "LABEL"
            description: "Database name"
        - short_query:
            usage: "LABEL"
            description: "Query limited to 100 symbols"
        - calls:
            usage: "COUNTER"
            description: "Number of times executed"
        - total_time:
            usage: "COUNTER"
            description: "Total time spent in the statement, in milliseconds"
        - min_time:
            usage: "GAUGE"
            description: "Minimum time spent in the statement, in milliseconds"
        - max_time:
            usage: "GAUGE"
            description: "Maximum time spent in the statement, in milliseconds"
        - mean_time:
            usage: "GAUGE"
            description: "Mean time spent in the statement, in milliseconds"

There is one pitfall here: records from pg_stat_statements need to be aggregated, because the table can contain several records with the same combination of query id, database id, and query text. Such duplicates crash postgres_exporter, since it builds metric names from these fields and they must be unique (hence the GROUP BY in the query above).


For simplicity, I did not add the read/write metrics (shared_blks_written and so on); they are added by analogy, as sketched below. And, to repeat, similar queries can be written against any table, not just pg_stat_statements.
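As a sketch of that analogy, collecting block I/O would mean extending both the SELECT list (e.g. sum(shared_blks_written) as shared_blks_written) and the metrics list; the column names are real pg_stat_statements columns, their placement here is illustrative:

        - shared_blks_read:
            usage: "COUNTER"
            description: "Total number of shared blocks read by the statement"
        - shared_blks_written:
            usage: "COUNTER"
            description: "Total number of shared blocks written by the statement"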


Prometheus Query Examples


With the config above, the exporter generates a significant number of metrics: five for each distinct query. We will aggregate them on the Prometheus side.


Example queries (with Grafana variables for templating):
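A few illustrative PromQL queries of this kind, assuming the metric names generated by the config above ($datname is a Grafana template variable):

    # top 10 queries by time spent per second
    topk(10, sum by (short_query) (irate(pg_stat_statements_total_time{datname=~"$datname"}[5m])))

    # call rate per query
    sum by (short_query) (irate(pg_stat_statements_calls{datname=~"$datname"}[5m]))

    # mean execution time (a gauge, so no rate() needed)
    avg by (short_query) (pg_stat_statements_mean_time{datname=~"$datname"})

    # database size in bytes
    pg_database_size{datname=~"$datname"}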



In Grafana, it looks like this (link to JSON):





Source: https://habr.com/ru/post/345370/

