
How Prometheus Operator works in Kubernetes

This article is based on our internal documentation for DevOps engineers and explains how Prometheus works under the control of Prometheus Operator in the Kubernetes clusters we deploy and maintain.


At first glance, Prometheus may seem like a rather complicated product, but, like any well-designed system, it consists of clearly separated functional components and essentially does three things: a) collects metrics, b) evaluates rules, c) saves the results to a time series database. This article is devoted not so much to Prometheus itself as to its integration with Kubernetes, for which we actively use an auxiliary tool called Prometheus Operator. But we still have to start with Prometheus itself...

Prometheus: what does it do?


Let's look at the first two of these functions in more detail; they work as follows:


Prometheus: how is it configured?


The Prometheus server has a config file and rule files.

The config has the following sections:
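As an illustration, a minimal config touching the main sections might look like this (the section names are real Prometheus config keys; the concrete values here are just an assumption for the example):

```yaml
global:                    # defaults applied to all scrape jobs
  scrape_interval: 30s
rule_files:                # paths to files with recording and alerting rules
- /etc/prometheus/rules/*.rules
scrape_configs:            # list of scrape jobs (where and how to collect metrics)
- job_name: example
  static_configs:
  - targets: ['localhost:9090']  # e.g. Prometheus scraping itself
```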


Prometheus: where does the list of targets come from?


The general algorithm of Prometheus operation is as follows:



  1. Prometheus reads the scrape_configs section, according to which it configures its internal Service Discovery mechanism.
  2. The Service Discovery mechanism interacts with the Kubernetes API (mainly to obtain endpoints ).
  3. Based on data from Kubernetes, the Service Discovery engine updates Targets (list of targets).
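For a Service's endpoints to be picked up by the scrape job shown in the config below, the Service needs the matching label and a correctly named port. A sketch of such a Service (the names my-app and foo here are assumptions for illustration):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                        # hypothetical service name
  namespace: foo                      # one of the namespaces listed in the scrape config
  labels:
    prometheus_custom_target: my-app  # the label the relabel rules match on
spec:
  selector:
    app: my-app
  ports:
  - name: http-metrics                # the port name the relabel rules keep
    port: 8080
```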

scrape_configs contains a list of scrape jobs (an internal Prometheus concept), each of which is defined as follows:

```yaml
scrape_configs:
  # Name of the scrape job; it is shown on the Service Discovery status page
- job_name: kube-prometheus/custom/0
  scrape_interval: 30s     # how often to collect metrics
  scrape_timeout: 10s      # timeout for a scrape request
  metrics_path: /metrics   # path to request metrics from
  scheme: http             # http or https
  # Service Discovery settings: targets are taken from Kubernetes
  kubernetes_sd_configs:
  - api_server: null       # use the in-cluster Kubernetes API address
                           # and credentials (of the current service account)
    role: endpoints        # targets are built from endpoints
    namespaces:
      names:               # look for endpoints only in these namespaces
      - foo
      - baz
  # Filter and relabel the discovered targets: keep only the endpoints of
  # Services that carry the prometheus_custom_target label
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_prometheus_custom_target]
    regex: .+              # any non-empty label value
    action: keep           # keep such targets, drop the rest
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    regex: http-metrics    # keep only ports named http-metrics
    action: keep
  # Set the job label from the prometheus_custom_target label of the
  # Service, prefixed with "custom-".
  # The job label is a fundamental Prometheus concept: within one job all
  # targets are assumed to be of the same kind, and it is the job label
  # that rules and dashboards usually operate on.
  - source_labels: [__meta_kubernetes_service_label_prometheus_custom_target]
    regex: (.*)
    target_label: job
    replacement: custom-$1
    action: replace
  # Set the namespace label
  - source_labels: [__meta_kubernetes_namespace]
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  # Set the service label
  - source_labels: [__meta_kubernetes_service_name]
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  # Set the instance label (to the pod name)
  - source_labels: [__meta_kubernetes_pod_name]
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
```

Thus, Prometheus itself tracks:


Changing the config is required in the following cases:


Having dealt with the basics of Prometheus, let's turn to its "operator", a special auxiliary component for Kubernetes that simplifies deploying and operating Prometheus in the realities of a cluster.

Prometheus Operator: what does it do?


To achieve this simplification, firstly, Prometheus Operator defines three resources via CRD ( Custom Resource Definitions ):

  1. prometheus - defines a Prometheus installation (cluster);
  2. servicemonitor - defines how to monitor a set of services (i.e., collect their metrics);
  3. alertmanager - defines a cluster of Alertmanagers (we do not use them, because we send metrics directly to our notification system, which receives, aggregates, and ranks data from a variety of sources, and integrates with Slack and Telegram, among others).
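A servicemonitor resource might look like this (the label values and names are assumptions for the example):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    prometheus: main        # label by which a prometheus resource selects it
spec:
  selector:
    matchLabels:
      app: my-app           # watch Services with this label
  namespaceSelector:
    any: true               # look for Services in all namespaces
  endpoints:
  - port: http-metrics      # scrape the port with this name
    interval: 30s
```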

Secondly, the operator monitors the prometheus resources and generates for each of them:

  1. StatefulSet (with Prometheus itself);
  2. Secret with prometheus.yaml (Prometheus config) and configmaps.json (config for prometheus-config-reloader ).

Finally, the operator also monitors the servicemonitor resources and the ConfigMaps with rules, and on their basis updates the prometheus.yaml and configmaps.json configs (both stored in the Secret).
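A prometheus resource tying this together might be sketched as follows (the selector labels are assumptions for the example):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: main
  namespace: monitoring
spec:
  replicas: 2
  serviceMonitorSelector:
    matchLabels:
      prometheus: main      # pick up ServiceMonitors with this label
  ruleSelector:
    matchLabels:
      prometheus: main
      role: alert-rules     # pick up ConfigMaps with rules by these labels
```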

What does the Prometheus Pod consist of?


The Pod consists of two containers:

  1. prometheus - Prometheus itself;
  2. prometheus-config-reloader - a helper that watches for changes to prometheus.yaml and, when necessary, asks Prometheus to reload its configuration (with a special HTTP request, see details below); it also watches the ConfigMaps with rules (they are listed in configmaps.json, see details below), downloads them as needed, and triggers a Prometheus reload.



The Pod uses three volumes:

  1. config - the mounted Secret (two files: prometheus.yaml and configmaps.json ); mounted into both containers;
  2. rules - an emptyDir that prometheus-config-reloader fills and prometheus reads; mounted into both containers, but read-only in prometheus ;
  3. data - Prometheus data; mounted only into prometheus .
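In terms of the generated StatefulSet, this layout can be sketched as follows (heavily simplified; the mount paths and the Secret name are approximations, not the operator's exact output):

```yaml
spec:
  containers:
  - name: prometheus
    volumeMounts:
    - name: config
      mountPath: /etc/prometheus/config
    - name: rules
      mountPath: /etc/prometheus/rules
      readOnly: true                    # prometheus only reads the rules
    - name: data
      mountPath: /prometheus            # TSDB storage
  - name: prometheus-config-reloader
    volumeMounts:
    - name: config
      mountPath: /etc/prometheus/config
    - name: rules
      mountPath: /etc/prometheus/rules  # reloader writes downloaded rules here
  volumes:
  - name: config
    secret:
      secretName: prometheus-main       # the Secret generated by the operator
  - name: rules
    emptyDir: {}
  - name: data
    emptyDir: {}
```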

How are Service Monitors handled?




  1. Prometheus Operator reads Service Monitors and watches for their addition, deletion, and modification. Which Service Monitors to pick up is specified in the prometheus resource (see the documentation for details).
  2. For each Service Monitor that does not specify an explicit list of namespaces (i.e., it has any: true ), Prometheus Operator computes (via the Kubernetes API) the list of namespaces containing Services matching the labels specified in the Service Monitor .
  3. Based on the read servicemonitor resources (see the documentation ) and the computed namespaces, Prometheus Operator generates part of the config (the scrape_configs section) and saves the config into the corresponding Secret.
  4. By standard Kubernetes means, the data from the Secret reaches the Pod (the prometheus.yaml file is updated).
  5. The change to the file is noticed by prometheus-config-reloader , which sends Prometheus an HTTP request to reload.
  6. Prometheus re-reads the config, sees the changes in scrape_configs , and processes them according to its usual logic (see details above).

How are ConfigMaps handled with the rules?




  1. Prometheus Operator watches ConfigMaps matching the ruleSelector specified in the prometheus resource.
  2. When a new (or an existing) ConfigMap matches, Prometheus Operator updates prometheus.yaml , after which exactly the same logic as for Service Monitors (see above) is triggered.
  3. Both when ConfigMaps are added or removed and when their contents change, Prometheus Operator updates the configmaps.json file (it contains the list of ConfigMaps and their checksums).
  4. By standard Kubernetes means, the data from the Secret reaches the Pod (the configmaps.json file is updated).
  5. The change to the file is noticed by prometheus-config-reloader , which downloads the changed ConfigMaps into the rules directory (the emptyDir ).
  6. The same prometheus-config-reloader sends Prometheus an HTTP request to reload.
  7. Prometheus re-reads the config and sees the changed rules.
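A ConfigMap with rules matching such a ruleSelector might look like this (the labels must match the prometheus resource's ruleSelector; the rule itself is a hypothetical example):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-rules
  namespace: monitoring
  labels:
    role: alert-rules       # labels matched by the ruleSelector
    prometheus: main
data:
  my.rules: |
    groups:
    - name: example
      rules:
      - alert: InstanceDown # fire when a target has been unreachable for 5m
        expr: up == 0
        for: 5m
```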

That's all!


I plan to talk in more detail about how we use Prometheus (and more) for monitoring in Kubernetes at the RootConf 2018 conference, which will be held on May 28 and 29 in Moscow. Come to listen and talk.


Source: https://habr.com/ru/post/353410/

