
Introducing the new plugin for Grafana - Statusmap panel

Grafana can show status, and Grafana can display data over time. Paradoxically, though, Grafana has so far lacked a convenient way to show status over time!

We present our plugin, Statusmap panel. It lets you visually display the status of a set of objects over a selected period of time. To demonstrate how the plugin works, imagine a variety of locations where coffee is being made for someone:


You can see how Nikki saves electricity, Gerry quickly replenishes water supplies, Valera's coffee machine often malfunctions, and Wi-Fi on Bifrost is clearly better than on the lunar station, where water seems to be in very short supply.

Looks interesting? Let's start with how we got here.

What is it for?


To visualize data better, we set ourselves a simple task: display the status of a set of timeseries over a period of time. A set of objects here means different timeseries, which can differ in their names and in their sets of labels. The timeseries values should be displayed as text and color conveniently, i.e. without workarounds.

Uses of this kind of visualization that matter to our business include the health of Kubernetes servers or platforms and the results of HTTP service checks. That is how a Grafana plugin called Statusmap was born at Flant. Considering how many other tasks it could serve, we quickly decided to share the code with the community. But has nobody really solved this problem before us?

Why not a ready-made solution?


The task is indeed a popular one, so we were not pioneers here. It all started with several dashboards that used the excellent Status Panel and Status Dot plugins. These plugins display the current state of a set of objects, for example hosts or pods ... or coffee machines in different parts of the world.





Everything went well until we wanted to see the statuses of these objects over time. The first and simplest solution was to add a regular Graph panel with the Stack option enabled.



The idea was that Status Panel + a stacked Graph would show both the current state of the objects and how the situation developed over time. However, a stacked Graph is not very readable:


We tried to adapt the standard Heatmap, but it did not work out: that plugin treats the Y axis purely as a value axis and cannot display labels there. Then we tried the following plugins for Grafana:


Based on all this research, we formulated the following requirements for the plugin:


Let me now make a small digression about Heatmap, Prometheus and discrete statuses ...

Some theory


A classic heatmap is a 3-D graph:


The standard Heatmap plugin displays the Z axis as color, for example from white to red or along a green-yellow-red gradient. This works very well for continuous values: response time, queue length, number of requests to the server ... For discrete statuses of a set of objects we need something different: the Y axis shows the names of the objects we monitor, and the Z axis shows the statuses observed for each object at the current moment ... But wait! What does it mean for an object to have many statuses at one moment in time? Let me try to explain.

Those who use Prometheus with Grafana know about the step, or interval, setting on the Query tab. If you set it to 1m while collecting data every 5s, then a simple coffee_maker_status query makes Prometheus return only every 12th value, and the other 11 values never appear on the graph. How can we improve the situation?
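The loss described above can be illustrated with a small Python model. This is only a sketch: the sample data is invented, and picking "the latest sample at or before each step boundary" is a simplification of how a query is evaluated at each step.

```python
# Hypothetical data: one status sample every 5 s for one minute (12 samples).
SCRAPE_INTERVAL = 5   # seconds between collected samples
STEP = 60             # seconds, the step set on the Query tab

# The machine is ok (0) except for a 30 s outage (status 1) in the middle.
samples = [(t, 1 if 15 <= t < 45 else 0) for t in range(0, 60, SCRAPE_INTERVAL)]


def sample_at_steps(samples, step, end):
    """For each step boundary, keep only the latest sample at or before it."""
    result = []
    for ts in range(step, end + 1, step):
        latest = max((s for s in samples if s[0] <= ts), key=lambda s: s[0])
        result.append((ts, latest[1]))
    return result


visible = sample_at_steps(samples, STEP, 60)
hidden = len(samples) - len(visible)
print(visible)  # the single point that reaches the graph
print(hidden)   # how many collected samples are never shown
```

In this model the only visible point carries status 0, so the 30-second outage disappears from the graph entirely.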

The first thing that comes to mind is to use aggregation functions, for example *_over_time(coffee_maker_status[1m]). But which function exactly? It is time to look at how status is represented in Prometheus metrics. In most cases a status is encoded as one of a fixed set of values. For example, coffee_maker_status could take the following status values:


After that it would seem simple: count the zeros, ones, twos, etc. within one minute ... and we have excellent data to display on the chart! But Prometheus has its own view on this: coffee_maker_status[1m] is a range vector, so expressions like max_over_time(coffee_maker_status[1m]==2) or count_values_over_time(coffee_maker_status[1m], 3), which would suit us very well, are impossible.

Everything works fine if the metric takes only two values, 0 (status was not observed) and 1 (status was observed), and the status itself is stored in a label. Then you can write queries like this: (max_over_time(coffee_maker_status{status="3"}[1m]) == 1) * 3
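To make the logic of that query concrete, here is a rough Python model of it. It is a sketch only: the function names are illustrative, a window is represented as a plain list of 0/1 samples, and None stands in for "the query returns no sample".

```python
def max_over_time(window):
    """Maximum of the samples in one range window, or None if it is empty."""
    return max(window) if window else None


def observed_status(window, status_code):
    """Model of (max_over_time(m{status="N"}[1m]) == 1) * N.

    The comparison keeps the series only when the status was seen at least
    once in the window; the multiplication restores the numeric status code.
    """
    if max_over_time(window) == 1:
        return 1 * status_code
    return None  # filtered out: the status was not observed in this window


# Hypothetical minute of 5 s samples for the series with status="3".
one_minute = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(observed_status(one_minute, 3))
print(observed_status([0] * 12, 3))
```

A short occurrence of the status anywhere in the window is enough for the code to be reported, which is exactly what plain step sampling fails to do.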

What about a metric that takes several values? The note “Composing range vector functions in PromQL” suggested turning a metric with discrete values into a set of metrics with labels. This can be done with the following recording rule:

 - record: coffee_maker_status:discrete
   expr: |
     count_values("status", coffee_maker_status)

This rule transforms the coffee_maker_status metric as follows: if the value is 3, Prometheus creates the metric coffee_maker_status:discrete{status="3"} with value 1. The same happens for each observed value.
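The effect of count_values can be approximated in Python as follows. This is a simplified model, not Prometheus code: an instant vector is represented as a list of (labels, value) pairs, the "place" label is hypothetical, and, as with an aggregation that has no by clause, the original labels are not kept.

```python
from collections import Counter


def count_values(label, instant_vector):
    """One output series per distinct input value.

    The input value becomes the value of `label`; the output value is the
    number of input series that currently hold that value.
    """
    counts = Counter(str(value) for _labels, value in instant_vector)
    return [({label: v}, n) for v, n in sorted(counts.items())]


# A single coffee machine reporting status 3.
single = count_values("status", [({"place": "lunar_station"}, 3)])
print(single)

# Two machines: each distinct status becomes its own labelled series.
pair = count_values("status", [({"place": "lunar_station"}, 3),
                               ({"place": "bifrost"}, 0)])
print(pair)
```

With one input series the resulting value is always 1, which matches the 0/1 form that the *_over_time queries expect.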

Statuses are usually defined in advance, so you can create a set of queries, one per status, so that no value is skipped. The legends of all queries must match so that the values can be grouped:



Now, if within one minute the coffee machine was off for 30 seconds (status off, 1) and working the rest of the time (status ok, 0), we keep the information about the shutdown, because the plugin receives two values with the same legend for the same moment: 0 from query A and 1 from query B.
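One way to picture this grouping is the Python sketch below. It is not the plugin's actual code: the data layout, the function name and the "lunar_station" legend are assumptions made for illustration.

```python
from collections import defaultdict


def group_into_buckets(query_results):
    """Collect values from all queries under a shared (legend, timestamp) key.

    query_results maps a query name to a list of (legend, timestamp, value).
    """
    buckets = defaultdict(list)
    for points in query_results.values():
        for legend, ts, value in points:
            buckets[(legend, ts)].append(value)
    return {key: sorted(values) for key, values in buckets.items()}


# Query A reports status 0 (ok), query B reports status 1 (off),
# both with the same legend and for the same minute.
results = {
    "A": [("lunar_station", 60, 0)],
    "B": [("lunar_station", 60, 1)],
}
buckets = group_into_buckets(results)
print(buckets)
```

Because the legends match, both statuses end up in one bucket, so the panel can show that the machine was both ok and off during that minute.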

Good: we have figured out how to aggregate data on discrete statuses without losing information. It remains to figure out how to combine the data by legend and draw it on the panel.

Statusmap plugin


Of course, we did not arrive at all of the above immediately, but once it came together it became clear that the only thing missing was a rendering mechanism. Now there is one: the Statusmap panel plugin, which can do the following:


The result is a very convenient view of the status of several objects. You can see both the current status (the rightmost buckets) and the status of each object over time.

Where to get it?


The source code of the Grafana Statusmap plugin is distributed under the free MIT license (like other plugins for Grafana). At the moment it is available on our GitHub. We sincerely hope that it will get into the Grafana plugin repository in the near future. UPDATED (03/10/18): The plugin has been accepted into the official Grafana directory.

And finally, an illustration of how Statusmap helps to visualize status data from a production Kubernetes cluster:




Source: https://habr.com/ru/post/423851/

