
Monitoring Docker Swarm with cAdvisor, InfluxDB and Grafana


To know the status of running applications, you have to monitor them continuously. And if the applications run in a highly scalable environment such as Docker Swarm, you will need an equally scalable monitoring tool. This article walks through setting up exactly such a tool.


In the process, we will install a cAdvisor agent on each node to collect host and container metrics. The metrics will be stored in InfluxDB, and Grafana will plot graphs based on them. All of these tools are open source and can be deployed as containers.


To build the cluster, we will use Docker Swarm Mode and deploy the required services as a stack. This gives us a dynamic monitoring system that automatically starts monitoring new nodes as they join the swarm. The project files can be found here.


Tool overview


The choice of monitoring systems is quite large. For our stack we will use open source services that run well in containers. Below is the composition of the stack.


cAdvisor


cAdvisor will collect host and container metrics. It is deployed as a Docker image with the Docker socket and the host's root file system mounted into the container. cAdvisor can write the collected metrics to several kinds of time series databases, including InfluxDB and Prometheus. It even has a web interface that plots graphs from the collected data.


InfluxDB


Scalable storage for metrics, events, and real-time analytics.

InfluxDB is an open source time series database that lets you store numeric metrics and attach tags to them. The system implements an SQL-like query language for working with the stored data. We will use tags to filter metrics by host or even by individual container.


Grafana


Grafana is a feature-rich open source visualization tool that lets you build dashboards and graphs from metrics stored in Graphite, Elasticsearch, OpenTSDB, Prometheus and, of course, InfluxDB. Starting with version 4, it can also trigger alerts based on query results. We will build a dashboard that can display data for a specific host and service.


Docker swarm mode


Swarm Mode has been part of Docker since version 1.12.0. It makes it easy to form a swarm out of a set of hosts and to manage it. Swarm mode includes a key-value store that backs the built-in service discovery and orchestration mechanisms. A host can play the role of a manager or a worker node: in general, managers handle orchestration, and containers run on the workers. Since this is a demo installation, we will place InfluxDB and Grafana on the manager.


Swarm Mode has an interesting feature called the routing mesh, which acts as a virtual load balancer. Suppose we have 10 containers listening on port 80, spread across 5 nodes. When you access port 80 on any of these nodes, the request may be routed to any of the containers, including ones running on a different host. Thus, by publishing the IP address of any single node, you automatically get load balancing across all ten containers.


If you plan to follow along and run the commands from this demonstration on your own system, you will need the following programs:

  1. Docker
  2. Docker Machine (docker-machine)
  3. VirtualBox



The swarm will consist of three local virtual machines, which we will create with docker-machine using the VirtualBox driver, so VirtualBox must be installed. With other drivers you can create the virtual machines in cloud services instead; the steps after the machines are created are the same for every driver. More information about docker-machine can be found here.


When creating the virtual machines, we will keep the default options (more detailed information about the available options is here). The host acting as the swarm manager is called manager, and the worker nodes are agent1 and agent2. You can create as many nodes as you like; just repeat the commands below with a different host name. To create the virtual machines, run the following commands:


 docker-machine create manager
 docker-machine create agent1
 docker-machine create agent2

It may take some time to execute these commands. After the machines are created, the docker-machine ls output should look something like this:


 NAME      ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER        ERRORS
 agent1    -        virtualbox   Running   tcp://192.168.99.101:2376           v17.03.1-ce
 agent2    -        virtualbox   Running   tcp://192.168.99.102:2376           v17.03.1-ce
 manager   -        virtualbox   Running   tcp://192.168.99.100:2376           v17.03.1-ce

To use the Docker engine running on manager, we need to switch contexts. From now on we will execute commands against the Docker engine installed on the manager host, NOT on the local system. To do this, run the command:


 eval `docker-machine env manager` 

Now that we have switched to the Docker engine on manager, we can initialize this host as the swarm manager. We will need its IP address, which will be advertised to the other nodes as they join; docker-machine ip manager returns it. So, to create the swarm, run the following command:


 docker swarm init --advertise-addr `docker-machine ip manager` 

Now we need two worker nodes. To join them, we must pass the join token and the IP address advertised when the swarm was created. docker swarm join-token -q worker prints the token, and docker-machine ip manager, as before, gives us the manager's IP; the default port is 2377. We could add the new machines to the swarm by switching to each worker's context, but it is much easier to run the commands over SSH. To attach the worker nodes to the swarm, run the following commands:


 docker-machine ssh agent1 docker swarm join --token `docker swarm join-token -q worker` `docker-machine ip manager`:2377
 docker-machine ssh agent2 docker swarm join --token `docker swarm join-token -q worker` `docker-machine ip manager`:2377

The nodes in the swarm can be listed with docker node ls. After adding the worker nodes, the output should look like this:


 ID                            HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS
 3j231njh03spl0j8h67z069cy *   manager    Ready    Active         Leader
 muxpteij6aldkixnl31f0asar     agent1     Ready    Active
 y2gstaqpqix1exz09nyjn8z41     agent2     Ready    Active

Docker stack


With version 3 of the docker-compose file format, you can define an entire service stack in a single file, including its deployment strategy, and deploy it with one command, docker stack deploy. The main difference between version 3 and version 2 of the docker-compose file is the deploy parameter in each service's description, which defines how that service's containers are deployed. The docker-compose file for our test monitoring system is shown below:


 version: '3'
 services:
   influx:
     image: influxdb
     volumes:
       - influx:/var/lib/influxdb
     deploy:
       replicas: 1
       placement:
         constraints:
           - node.role == manager
   grafana:
     image: grafana/grafana
     ports:
       - 0.0.0.0:80:3000
     volumes:
       - grafana:/var/lib/grafana
     depends_on:
       - influx
     deploy:
       replicas: 1
       placement:
         constraints:
           - node.role == manager
   cadvisor:
     image: google/cadvisor
     hostname: '{{.Node.ID}}'
     command: -logtostderr -docker_only -storage_driver=influxdb -storage_driver_db=cadvisor -storage_driver_host=influx:8086
     volumes:
       - /:/rootfs:ro
       - /var/run:/var/run:rw
       - /sys:/sys:ro
       - /var/lib/docker/:/var/lib/docker:ro
     depends_on:
       - influx
     deploy:
       mode: global
 volumes:
   influx:
     driver: local
   grafana:
     driver: local

Our stack has 3 services, which are described below.


influx


Here we use the influxdb image. For persistent storage we create the influx volume, which is mounted at /var/lib/influxdb inside the container. We need only one instance of InfluxDB, and it will be placed on the manager host. The Docker engine we are talking to runs on the same host, so we can execute commands in this container directly. Since both of the remaining services need InfluxDB, we add a depends_on key with the value influx to their descriptions.


grafana


We use the grafana/grafana image and publish port 3000 of the container on port 80 of the host. The routing mesh lets you reach Grafana on port 80 of any host in the swarm. For persistent storage we create another volume called grafana, mounted at /var/lib/grafana inside the container. Grafana is also deployed on the manager host.


cadvisor


Configuring cAdvisor takes a bit more work than the previous services; more information is available at this link. Choosing the hostname value in this case is not trivial. We are going to run an agent on every node, and each of these containers will collect the metrics of its node and of the containers running on it. When cAdvisor sends metrics to InfluxDB, it sets a machine tag containing the name of the cAdvisor container, and we want that value to match the ID of the node it runs on. Docker stacks allow templates in names (more information can be found here), so by giving the containers names that contain the node ID, via the expression '{{.Node.ID}}', we can tell which node each metric came from.


We also pass several command line options to cAdvisor. The logtostderr option redirects the logs cAdvisor generates to stderr, which makes debugging easier. The docker_only flag says that we are only interested in Docker containers. The next three parameters tell cAdvisor where to put the collected metrics: we ask it to store them in the cadvisor database on the InfluxDB server listening at influx:8086. This lets us send metrics to the influx service of our stack. Within a stack, all ports are exposed to the other services, so they do not need to be published separately.


The volumes specified for the service are needed by cAdvisor to collect metrics from the host and from Docker. We deploy cadvisor in global mode, which ensures that exactly one instance of the service runs on every node of the swarm.


At the end of the file is the volumes key, where the influx and grafana volumes are declared. Since both volumes will live on the manager host, we assign them the local driver.


To deploy the stack, save the above file as docker-stack.yml and run the following command:


 docker stack deploy -c docker-stack.yml monitor 

This starts the services of the monitor stack. The first run of the command may take some time, since the nodes have to pull the container images. We also need to create the database named cadvisor in InfluxDB, where the metrics will be stored:


 docker exec `docker ps | grep -i influx | awk '{print $1}'` influx -execute 'CREATE DATABASE cadvisor' 

The command may fail with a message that the influx container does not exist; the reason is simply that the container is not ready yet. Wait a bit and run the command again. We can execute commands in the influx service because it runs on the manager host and we are using the Docker engine there. The pipeline docker ps | grep -i influx | awk '{print $1}' finds the ID of the InfluxDB container, and influx -execute 'CREATE DATABASE cadvisor' creates the database named cadvisor inside it.
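That wait-and-retry step can be scripted. Below is a minimal sketch; the retry helper is hypothetical (not part of the article's stack), and the docker command in the usage comment assumes the monitor stack deployed above:

```shell
# Hypothetical helper: run a command until it succeeds,
# up to a fixed number of attempts, sleeping between tries.
retry() {
  attempts=$1; shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    echo "attempt $i failed, retrying in 2s..." >&2
    sleep 2
    i=$((i + 1))
  done
  return 1
}

# Usage (assumes the stack from this article is running):
# retry 10 docker exec "$(docker ps -q -f name=influx)" \
#   influx -execute 'CREATE DATABASE cadvisor'
```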


To list the stack's services, run docker stack services monitor. The output of the command will look something like this:


 ID             NAME               MODE         REPLICAS   IMAGE
 0fru8w12pqdx   monitor_influx     replicated   1/1        influxdb:latest
 m4r34h5ho984   monitor_grafana    replicated   1/1        grafana/grafana:latest
 s1yeap330m7e   monitor_cadvisor   global       3/3        google/cadvisor:latest
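If you want to script a readiness check on top of this output, the replica counts in the fourth column can be parsed with awk. A small sketch; the check_replicas helper is hypothetical, and in real use you would pipe `docker stack services monitor` into it instead of the sample text:

```shell
# Hypothetical helper: exit non-zero if any service has fewer running
# replicas than desired (the REPLICAS column looks like "1/1" or "2/3").
check_replicas() {
  awk 'NR > 1 {
         split($4, r, "/")
         if (r[1] != r[2]) { print $2 " not ready (" $4 ")"; bad = 1 }
       }
       END { exit bad }'
}

# Sample input mirroring the listing above:
check_replicas <<'EOF'
ID             NAME               MODE         REPLICAS   IMAGE
0fru8w12pqdx   monitor_influx     replicated   1/1        influxdb:latest
m4r34h5ho984   monitor_grafana    replicated   1/1        grafana/grafana:latest
s1yeap330m7e   monitor_cadvisor   global       3/3        google/cadvisor:latest
EOF
```

In real use: `docker stack services monitor | check_replicas`.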

The list of running containers can be obtained with docker stack ps monitor; its output will be something like this:


 ID             NAME                                          IMAGE                    NODE      DESIRED STATE   CURRENT STATE                ERROR   PORTS
 n7kobaozqzj6   monitor_cadvisor.y78ac29r904m8uy6hxffb7uvn    google/cadvisor:latest   agent2    Running         Running about a minute ago
 1nsispop3hsu   monitor_cadvisor.z52c9vloiutl5dbuj5lnykzvl    google/cadvisor:latest   agent1    Running         Running about a minute ago
 9n6djc80mamd   monitor_cadvisor.qn82bfj5cpin2cpmx9qv1j56s    google/cadvisor:latest   manager   Running         Running about a minute ago
 hyr8piriwa0x   monitor_grafana.1                             grafana/grafana:latest   manager   Running         Running about a minute ago
 zk7u8g73ko5w   monitor_influx.1                              influxdb:latest          manager   Running         Running about a minute ago

Grafana Setup


Once all the services are deployed, we can open Grafana. The IP of any node in the swarm will do; we will use the manager's IP by running the following command:


 open http://`docker-machine ip manager` 

The default Grafana login is username admin with password admin. Now InfluxDB has to be added to Grafana as a data source. The home page should show a Create your first data source link; click it. If the link is not there, choose Add data source from the Data Sources menu, which opens the form for adding a new data source.



Adding Data Source to Grafana


The data source can be given any name. Check the default checkbox so that you do not have to select it in other forms later. Next, set Type to InfluxDB, URL to http://influx:8086 and Access to proxy; this points Grafana at our InfluxDB container. In the Database field enter cadvisor and click Save and Test, after which the message Data source is working should appear.


The project's GitHub repository contains a dashboard.json file created for importing into Grafana. It describes a dashboard for monitoring the systems and containers that run in the swarm. For now we will just import this dashboard; we will discuss it in the next section. Hover over the Dashboards menu item and choose the Import option. Click the Upload .json file button and select dashboard.json. Next, select the data source and click the Import button.


Grafana dashboard





The dashboard imported into Grafana is designed to monitor the hosts and containers of the swarm. You can drill down to the level of a host and the containers running on it. For this we need two variables, which rely on Grafana's templating functionality (more information about templating in conjunction with InfluxDB is on this page). The two variables are host, to select a node, and container, to select a container. To see these variables, open the dashboard's Settings and click Templating.


The first variable, host, selects a node and its metrics. When cAdvisor sends metrics to InfluxDB, it attaches several tags that can be used for filtering. The tag called machine contains the hostname of the cAdvisor instance, which in our case matches the node ID in the swarm. To get the tag's values, the variable uses the query SHOW TAG VALUES WITH KEY = "machine".


The second variable, container, lets you drill down to the container level. The tag called container_name quite predictably contains the container's name; we also need to filter the metrics by the value of the host variable. The query looks like this: SHOW TAG VALUES WITH KEY = "container_name" WHERE machine =~ /^$host$/. It returns the list of containers on the node selected in the host variable.


The container name will look something like this:


monitor_cadvisor.y78ac29r904m8uy6hxffb7uvn.3j231njh03spl0j8h67z069cy. However, we are only interested in its monitor_cadvisor part, everything up to the first dot; if several instances of the same service are running, their data should still be shown as separate lines. To capture the substring up to the first dot, we apply the regular expression /([^.]+)/ to the variable.
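The effect of that regular expression can be reproduced in the shell. A small illustration using the example name above; plain parameter expansion stands in here for the regex, which Grafana applies itself:

```shell
# Example task name as reported by cAdvisor (taken from the article):
name='monitor_cadvisor.y78ac29r904m8uy6hxffb7uvn.3j231njh03spl0j8h67z069cy'

# Keep only the part before the first dot, mirroring /([^.]+)/ :
service="${name%%.*}"
echo "$service"   # monitor_cadvisor
```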


With the variables set up, we can now use them in the graphs. We will walk through the Memory graph; the rest are configured on the same principle. Memory data lives in the memory_usage series in InfluxDB, so the query starts with SELECT "value" FROM "memory_usage".


Now we add filters to the WHERE clause. The first condition is that machine equals the value of the host variable: "machine" =~ /^$host$/. In the next condition, container_name must begin with the value of the container variable; we use a "starts with" comparison because the container variable was trimmed to the part before the first dot: "container_name" =~ /^$container$*/. The last condition limits events to the time interval $timeFilter selected in the Grafana dashboard. The query now looks like this:


 SELECT "value" FROM "memory_usage" WHERE "container_name" =~ /^$container$*/ AND "machine" =~ /^$host$/ AND $timeFilter 
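A quick way to convince yourself of what the two regex conditions match is to test them with grep against sample tag values. The values below are examples from this article, and the $host / $container substitution is done by hand here, since in the real query Grafana performs it:

```shell
host='3j231njh03spl0j8h67z069cy'
container='monitor_cadvisor'

# "machine" =~ /^$host$/ : an exact match on the node ID
echo '3j231njh03spl0j8h67z069cy' | grep -Eq "^${host}\$" && echo 'host matches'

# "container_name" =~ /^$container$*/ : effectively "starts with",
# since the container variable holds only the part before the first dot
echo 'monitor_cadvisor.y78ac29r904m8uy6hxffb7uvn.3j231njh03spl0j8h67z069cy' \
  | grep -Eq "^${container}" && echo 'container matches'
```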

Since we need separate lines for different hosts and containers, we group the data by the values of the machine and container_name tags:


 SELECT "value" FROM "memory_usage" WHERE "container_name" =~ /^$container$*/ AND "machine" =~ /^$host$/ AND $timeFilter GROUP BY "machine", "container_name" 

We also set an alias for this query: Memory {host: $tag_machine, container: $tag_container_name}. Here $tag_machine is replaced with the value of the machine tag, and $tag_container_name with the value of the container_name tag. The remaining graphs are configured the same way, only the series names change. You can also create alerts on these metrics in Grafana; for more information about the alerting system (Alerting), see here.


Conclusion


In this article, we created a scalable monitoring system for Docker Swarm that automatically collects metrics from all the hosts and containers in the swarm. Along the way we got acquainted with popular open source tools: Grafana, InfluxDB and cAdvisor.


After completing the demonstration, the stack can be removed with the command:


 docker stack rm monitor 

The virtual machines that are no longer needed can be stopped and deleted with the commands:


 docker-machine stop manager agent1 agent2
 docker-machine rm -f manager agent1 agent2

References:


  1. Original: Monitoring Docker Swarm with cAdvisor, InfluxDB and Grafana.


Source: https://habr.com/ru/post/327670/

