Grafana can show status, and Grafana can display data over time. Paradoxically, though, Grafana has so far had no convenient way to show status over time!
We present our plugin: the Statusmap panel. It lets you visually display the status of a set of objects over a selected period of time. To demonstrate how the plugin works, imagine a variety of locations where coffee is being made for someone:
You can see how Nikki saves electricity, how Gerry quickly replenishes the water supply, how Valera's coffee machine often breaks down, and that Wi-Fi on Bifrost is clearly better than on the lunar station, where water seems to be in very short supply.
Looks interesting? But first, let's look at how we got here.
Why do we need it?
To visualize our data better, we set ourselves a simple task: display the status of a set of objects over a period of time. A set of objects here means different timeseries, which can differ in name and in their set of labels. The timeseries values should be displayed conveniently, i.e. without workarounds, as text and color.
Examples of such visualization that matter to our business are the health of Kubernetes servers or platforms and the results of HTTP service checks. That is how a Grafana plugin called Statusmap was born at Flant. Considering the wide variety of other tasks it could be used for, we quickly decided to share the code with the community. But has nobody really solved this problem before us?
Why not a ready-made solution?
The task really is a common one, so we were not pioneers here. It all started when we had several dashboards built with the great Status Panel and Status Dot plugins. These plugins let you display the current state of a set of objects, for example, hosts or sweets ... or coffee machines in different parts of the world.


Everything went well until we wanted to see the statuses of these objects over time. The first and simplest solution was to add a regular Graph panel with the stacked option enabled.

The idea was that the Status Panel plus a stacked Graph would show both the state of the objects right now and how the situation developed over time. However, the stacked Graph is not very readable:
- color denotes the different timeseries rather than the values, which is what color denotes in Status Dot or Status Panel. The colors of the two panels therefore do not match, and this is confusing;
- if null appears among the values, the graph breaks.
We tried to adapt the standard Heatmap, but it did not work out: the plugin treats the Y axis purely as a scale of values and cannot display labels there. Then we tried the following Grafana plugins:
- Carpet plot - groups values by day and by the selected fragment of the day;
- Discrete Panel is a good plugin, but we need to discretely show statuses over time;
- Status By Group Panel is a good improvement to the Status panel, which allows you to display many statuses, but still without the features we need.
Based on all this research, we formulated the following requirements for the plugin:
- a dedicated, clearly distinguishable row on the graph for each object;
- the name of the object is displayed along the Y axis and is set in the legend field;
- one object can have several statuses at once; in that case the most significant one determines the color, and the rest are shown in the tooltip;
- buckets are drawn no narrower than a specified width (5 px), because it is hard to hover the mouse over single-pixel buckets;
- manual color management - the ability to set the color of each numeric value from a discrete set.
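The minimum bucket width requirement puts a lower bound on how much time a single bucket must cover. Below is a minimal Python sketch of that arithmetic. The helper name `min_bucket_interval` and its parameters are hypothetical and chosen for illustration only; this is not the plugin's actual code.

```python
def min_bucket_interval(range_seconds, panel_width_px, min_bucket_px=5):
    """Shortest bucket duration (in seconds) that keeps every bucket
    at least min_bucket_px wide. Illustrative helper, not plugin code."""
    # How many buckets of the minimum width fit into the panel (at least one).
    max_buckets = max(1, panel_width_px // min_bucket_px)
    return range_seconds / max_buckets


# A 6-hour time range on a 900 px wide panel with 5 px buckets:
# 180 buckets fit, so each bucket covers 120 seconds.
print(min_bucket_interval(6 * 3600, 900))
```

With such a bound, any interval shorter than the returned value would produce buckets narrower than the configured minimum.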
Let me now make a small digression about Heatmap, Prometheus and discrete statuses ...
Some theory
A classic heatmap is a 3-D graph:
- X axis: time;
- Y axis: the possible values of some quantity;
- Z axis: the number of values observed at a given time.
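The three axes can be illustrated with a small Python sketch that counts observations per cell. The function `heatmap_counts`, the sample data, and the bucket sizes are assumptions made up for this example (response times in milliseconds); they do not come from Grafana's Heatmap implementation.

```python
from collections import Counter


def heatmap_counts(samples, time_bucket, value_bucket):
    """Count observations per (time bucket, value bucket) cell.

    samples: iterable of (timestamp_seconds, value) pairs.
    The returned counts play the role of the Z axis.
    """
    counts = Counter()
    for ts, value in samples:
        x = ts // time_bucket      # X axis: which time bucket
        y = value // value_bucket  # Y axis: which value bucket
        counts[(x, y)] += 1        # Z axis: number of observations
    return counts


# Hypothetical response times in ms, observed at the given seconds.
samples = [(0, 120), (10, 180), (20, 310), (65, 150), (70, 950)]
cells = heatmap_counts(samples, time_bucket=60, value_bucket=100)
for cell, count in sorted(cells.items()):
    print(cell, count)
```

Each cell's count is what a heatmap would then map to a color.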
The standard Heatmap plugin renders the Z axis as color, for example from white to red or along a green-yellow-red gradient. This works very well for continuous values: response time, queue length, number of requests to a server ... For the discrete statuses of a set of objects we need something different: the Y axis should list the names of the objects we monitor, and the Z axis should show the statuses observed for each object at that moment ... But wait! What does it mean for an object to have many statuses at one moment in time? Let me try to explain.
Those who use Prometheus with Grafana know about the step or interval setting on the Query tab. If you set it to 1m while collecting data every 5s, then for a simple query such as coffee_maker_status Prometheus returns only every 12th value, and the other 11 values never appear on the graph. How can we improve the situation?
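The effect is easy to reproduce outside of Prometheus. The Python sketch below uses a deliberately simplified model of a range query (one value per step, taken from the sample at that exact timestamp) and made-up sample data, so it only illustrates the loss of information and is not how Prometheus evaluates queries internally.

```python
SCRAPE_INTERVAL = 5  # seconds between scrapes
STEP = 60            # query step, i.e. "1m"

# Hypothetical coffee_maker_status samples for two minutes:
# the machine is off (1) from t=20 s to t=45 s, otherwise ok (0).
samples = {t: (1 if 20 <= t < 50 else 0) for t in range(0, 125, SCRAPE_INTERVAL)}


def values_at_steps(samples, step):
    """Simplified model: keep only samples that fall on step boundaries."""
    return {t: v for t, v in samples.items() if t % step == 0}


visible = values_at_steps(samples, STEP)
print(visible)                      # every value on the graph is 0
print(len(samples) - len(visible))  # number of samples that are never shown
```

In this model the whole "off" period falls between two steps, so the graph shows a machine that was ok all the time.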
The first thing that comes to mind is to use aggregation functions, for example *_over_time(coffee_maker_status[1m]). But which function exactly? It is time to look at how a status is represented in Prometheus metrics. In most cases, a status is encoded as one of a fixed set of values. For example, coffee_maker_status could take the following values:
- 0 - ok,
- 1 - off,
- 2 - no beans,
- 3 - no water,
- 4 - fail.
From here it seems simple: count the zeros, ones, twos, etc. within one minute ... and we have excellent data to put on the chart! But Prometheus has its own view on this: coffee_maker_status[1m] is a range vector, so expressions like max_over_time(coffee_maker_status[1m]==2) or count_values_over_time(coffee_maker_status[1m], 3), which would suit us very well, are impossible (comparison operators do not accept range vectors, and PromQL has no count_values_over_time function).
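For comparison, the per-minute counting we would like Prometheus to do is trivial to express in ordinary code. The Python sketch below is only an illustration of the desired result; the function name `count_statuses` and the sample window are hypothetical.

```python
from collections import Counter


def count_statuses(window):
    """Count how many times each status value occurs in one window."""
    return Counter(window)


# One minute of samples at a 5 s scrape interval (12 values):
# ok (0) for 20 s, off (1) for 30 s, then ok (0) again for 10 s.
window = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
counts = count_statuses(window)
print(counts[0], counts[1])  # occurrences of "ok" and "off" in this minute
```

The rest of this section is about getting an equivalent result with what PromQL does offer.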
Everything works fine if the metric takes only two values, 0 (status was not observed) and 1 (status was observed), and the status itself is stored in a label. Then you can write queries like this:
(max_over_time(coffee_maker_status{status="3"}[1m]) == 1) * 3
What do we do with a metric that takes several values? The note “Composing range vector functions in PromQL” gave us the idea of turning a metric with discrete values into a set of metrics with labels. This can be done with the following recording rule:
- record: coffee_maker_status:discrete
  expr: |
    count_values("status", coffee_maker_status)
This rule transforms the coffee_maker_status metric as follows: if the value is 3, Prometheus creates the metric coffee_maker_status:discrete{status="3"} with value 1. And so on for each observed value.
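To make the transformation concrete, here is a rough Python model of what count_values does to an instant vector when no grouping clause is given. It is a sketch for illustration only: the function below is a simplified stand-in written for this example, not Prometheus code, and the input vectors are made up.

```python
from collections import Counter


def count_values(label, instant_vector):
    """Simplified model of PromQL count_values(label, v) without grouping.

    instant_vector: list of (labels_dict, value) pairs.
    Returns one output series per distinct value; the value itself moves
    into the given label and the sample becomes the number of occurrences.
    """
    counts = Counter(value for _labels, value in instant_vector)
    return [
        ({label: format(value, "g")}, count)
        for value, count in sorted(counts.items())
    ]


# A single coffee machine currently reporting status 3 ("no water").
vector = [({"__name__": "coffee_maker_status"}, 3)]
print(count_values("status", vector))
```

Note that in this model the resulting value is 1 only because a single series reports the value 3. With several machines and no grouping, the result would be the number of machines in that status; in PromQL a `by (...)` clause on the aggregation keeps such series apart.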
Statuses are usually defined in advance, so you can create a set of queries, one per status, so that no needed value is skipped. The legends of all the queries must match so that the values can be grouped:

Now, if during one minute the coffee machine was off for 30 seconds (status off, 1) and working the rest of the time (status ok, 0), the information about the shutdown is not lost, because the plugin receives two values with the same legend for the same time: 0 from query A and 1 from query B.
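The same scenario can be modeled in a few lines of Python. This is a sketch under stated assumptions: each query is modeled after the pattern (max_over_time(...{status="N"}[1m]) == 1) * N shown earlier, and the names `query_for_status`, `bucket_values`, and `KNOWN_STATUSES` are hypothetical helpers invented for the example.

```python
KNOWN_STATUSES = [0, 1, 2, 3, 4]  # ok, off, no beans, no water, fail


def query_for_status(window, status):
    """Model of one per-status query: returns the status value if it was
    observed at least once in the window, otherwise None (no data point)."""
    return status if any(value == status for value in window) else None


def bucket_values(window):
    """Collect the results of all per-status queries sharing one legend."""
    results = (query_for_status(window, status) for status in KNOWN_STATUSES)
    return [result for result in results if result is not None]


# One minute at a 5 s scrape interval: 30 s ok (0), then 30 s off (1).
window = [0] * 6 + [1] * 6
print(bucket_values(window))  # both observed statuses survive
```

Each query contributes at most one value per step, and together they describe every status seen during that minute.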
Good: we figured out how to aggregate data on discrete statuses and not lose information. It remains to figure out how to combine the data based on the legend and draw it on the panel.
Statusmap plugin
Of course, we did not arrive at all of the above right away, but once everything came together it became clear that the only thing missing was a rendering mechanism. Now there is one: the Statusmap panel plugin, which works as follows:
- the values at each point in time are grouped into buckets by matching the legend text specified in the Query;
- each legend text gets its own row on the graph and is shown as a label on the Y axis, while empty values are displayed as a blank or as 0:

- for any value you can set the exact color of the bucket:

- if a bucket contains several values, its color is taken from the value that is listed higher on the Colors tab, and hovering over the bucket shows all the values that fell into it:

- the plugin can compute the interval for the Prometheus query so that the buckets do not turn into single-pixel lines.
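The grouping and color rules from the list above can be sketched in Python as follows. This is an illustration under assumptions, not the plugin's source: the color table, the function names, and the data points are all invented for the example, and the table order stands in for the order of entries on the Colors tab.

```python
from collections import defaultdict

# Hypothetical color table; entries listed first have higher priority.
COLORS = [(4, "red"), (3, "blue"), (2, "brown"), (1, "gray"), (0, "green")]


def group_into_buckets(points, interval):
    """Group (legend, timestamp_seconds, value) points by legend and time bucket."""
    buckets = defaultdict(list)
    for legend, ts, value in points:
        buckets[(legend, ts // interval)].append(value)
    return buckets


def bucket_color(values, colors=COLORS, empty="transparent"):
    """Pick the color of the highest-priority value present in the bucket."""
    for value, color in colors:
        if value in values:
            return color
    return empty


points = [("Nikki", 0, 0), ("Nikki", 30, 1), ("Gerry", 10, 0), ("Gerry", 70, 3)]
buckets = group_into_buckets(points, interval=60)
for key, values in sorted(buckets.items()):
    # The color comes from one value; the tooltip would list all of them.
    print(key, values, bucket_color(values))
```

Keeping the full list of values per bucket is what allows the tooltip to show every status while the color reflects only the most significant one.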
The result is a very convenient view of the status of several objects. You can see both the current status (the rightmost buckets) and how the status of each object changed over time.
Where to get it?
The source code of the Grafana Statusmap plugin is distributed under the free MIT license (like other Grafana plugins). At the moment it is available on our GitHub.
We sincerely hope that it will soon make it into the Grafana plugin repository. UPDATED (03/10/18): The plugin has been accepted into the official Grafana directory.
And finally, an illustration of how Statusmap helps visualize status data from a production Kubernetes cluster:
