Charts in Nagios - why and how

Introduction

By choosing Nagios as a monitoring system, we get a tool that tracks the state of the environment and keeps a history of state changes. The text of the probe message can still be seen for the current moment and for past state changes, but data for the periods between changes is missing entirely. With any reasonably active use of monitoring, you soon want to see the history of quantitative characteristics as well, which stock Nagios cannot provide. You can, of course, duplicate all the necessary probes in something like Cacti, but that means at least extra overhead, both in configuration and on the machine running the poller. Fortunately, Nagios can shift this work onto the user's shoulders by providing a mechanism for so-called "performance data". This article discusses one of the solutions for collecting and visualizing this data: Pnp4Nagios.

Performance data

The probe status, supplemented with performance data, looks like this:
perfdata example

The delivery format and the data requirements are described in detail in the documentation. What matters here is that Nagios can both take this data from the plug-in's output and hand it over to external utilities, because an extra line in the web interface adds some variety but does little good :).
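To make the format more concrete, here is a minimal Python sketch of splitting one line of plug-in output into the status text and its performance data items. It assumes the usual label=value[UOM];warn;crit;min;max layout after the "|" separator; the names Metric and parse_plugin_output are made up for this illustration and are not part of Nagios or Pnp4Nagios, and the sample line is only an example.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Metric(NamedTuple):
    label: str
    value: float
    unit: str
    warn: Optional[str]
    crit: Optional[str]
    minimum: Optional[str]
    maximum: Optional[str]


# One item: label=value[UOM] followed by up to four ';'-separated fields.
# Labels that contain spaces are wrapped in single quotes.
_PERF_RE = re.compile(
    r"(?:'([^']+)'|([^\s=']+))=(-?[0-9.]+)([^;\s]*)((?:;[^;\s]*){0,4})"
)


def parse_plugin_output(line: str) -> Tuple[str, List[Metric]]:
    """Split plug-in output into (status text, list of Metric)."""
    text, _, perf = line.partition("|")
    metrics = []
    for m in _PERF_RE.finditer(perf):
        label = m.group(1) or m.group(2)
        # Pad the optional fields so that all four are always present.
        fields = (m.group(5).split(";")[1:] + [""] * 4)[:4]
        # Thresholds stay strings because they may be ranges, not numbers.
        warn, crit, lo, hi = (f or None for f in fields)
        metrics.append(
            Metric(label, float(m.group(3)), m.group(4), warn, crit, lo, hi)
        )
    return text.strip(), metrics


if __name__ == "__main__":
    sample = "OK - load average: 0.10, 0.20, 0.30|load1=0.100;5.000;10.000;0; load5=0.200;4.000;6.000;0;"
    status, items = parse_plugin_output(sample)
    print(status)
    for item in items:
        print(item)
```

Everything after the "|" is what Nagios hands over to the external utility; the part before it is the ordinary status text shown in the web interface.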

External utilities

About 20 utilities for processing performance data [and turning it into graphs] are listed on exchange.nagios.org/directory/Addons/Graphing-and-Trending alone. Although they all seem to do the same thing, they do differ. When choosing a tool for myself, I was guided by the following list of characteristics:

Interface. Joking aside, look at what is on the list of Graphing-and-Trending add-ons: almost all of them look pale, if not pathetic. After Cacti, I wanted at least the ability to zoom into a graph. And templates for displaying graphs.
Storage of statistics. This could have come first if it were not for the almost universal fondness for RRD. www.opmon.org/documentation stands out here: it stores data in a MySQL database, so I did not look at it any further; Zabbix storing its data in a database was enough for me.
Configuration management. After Cacti's XML, I wanted something human-friendly.

Pnp4nagios

Consider what Pnp4nagios offers us.

Interface. A simple comparison of Pnp4Nagios screenshots with NagiosGrapher's is clearly not in favor of the latter.

pnp4nagios

NagiosGraph

In particular, Pnp4Nagios lets you zoom in; view graphs for specified periods, including picking a date range in a calendar; export a chart as a PDF file, which can then be sent as is without bothering to insert pictures into an email; add a chart to the "basket" for quick access later; and jump to the list of Nagios alerts for the selected time period. In addition, there is a way to group charts from different hosts, the so-called "pages". Localization is supported, which does not hurt either :)
Storage of statistics.
There is nothing special to note here, except that Pnp4Nagios supports RRDCached, which is useful in large installations. The benefits of RRDCached, and I/O hell in general, are described in detail here.
Configuration.
Configuring Pnp4Nagios is, of course, not as flexible as configuring Zabbix, but at least the OOM killer does not come visiting ;)
The most visible part, the appearance of the graphs, is determined by templates. A number of standard templates are included. For your own checks you can write a separate template if the standard one does not suit you. Templates are PHP scripts executed through include; in effect, they must build a command line for rrdtool. While a template is processed, internal data exported by Nagios is available, for example the time when the host was last up ($LASTHOSTUP$), which lets you make the graphs about as informative as you like. The template is chosen by command name, and selecting the significant part of the name is supported: if your checks go through check_nrpe, you can configure template selection so that check_nrpe is dropped.
The parameters used when creating rrd files can also be changed in templates. Supported options include the data source type (GAUGE, COUNTER, DERIVE) and limits on the minimum and/or maximum value, which are useful for eliminating spikes in COUNTER/DERIVE counters when a server reboots or a daemon restarts.
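Why a maximum helps against spikes can be shown with a small Python sketch. It is a deliberately simplified model of how a COUNTER data source turns two readings into a rate, not rrdtool's actual code: a decrease is read as a 32-bit overflow, and a rate above the configured maximum is stored as unknown. The function name counter_rate and all the numbers are assumptions made for this illustration.

```python
from typing import Optional

WRAP_32 = 2 ** 32


def counter_rate(prev: int, cur: int, seconds: float,
                 max_rate: Optional[float] = None) -> Optional[float]:
    """Per-second rate between two counter readings, or None for 'unknown'."""
    delta = cur - prev
    if delta < 0:
        # A decrease is interpreted as a 32-bit overflow, not as a reset.
        delta += WRAP_32
    rate = delta / seconds
    if max_rate is not None and rate > max_rate:
        # Outside the allowed range: recorded as unknown instead of a spike.
        return None
    return rate


if __name__ == "__main__":
    # Normal growth between two polls 300 seconds apart.
    print(counter_rate(1000, 4000, 300))
    # The counter restarted from zero after a reboot: without a limit this
    # becomes a rate of roughly 14.3 million per second.
    print(counter_rate(1_000_000, 500, 300))
    # With a limit (a hypothetical 12.5 million per second) it is dropped.
    print(counter_rate(1_000_000, 500, 300, max_rate=12_500_000))
```

Without the limit, a single reset produces one huge value that flattens the rest of the graph; with it, the graph simply has a short gap.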

Data processing

Three modes of performance data processing are supported:

Synchronous mode. The data processing command process_perfdata.pl is called for each check. It is the easiest to configure (edit 4 lines :) but also the "worst": Nagios is blocked until the script finishes, which, with a large number of checks, may become noticeable because of the disk subsystem. For example, on an unloaded host you can observe:

2009-12-23 20:41:54 [28100] [2] RRDs::update /var/lib/pnp4nagios/nginx.local/load_average.rrd 1261590114:0.00:0.00:0.00
2009-12-23 20:41:54 [28100] [2] /var/lib/pnp4nagios/nginx.local/load_average.rrd updated
2009-12-23 20:41:54 [28100] [1] PNP exiting (runtime 0.003274s) ...

At first glance, 3 milliseconds is very little, but keep in mind that the file I/O goes through the VM cache, which is large enough to hold the rrd files because there are only 28 of them.
Bulk mode. Nagios writes the data to a file and periodically calls the process_perfdata.pl script, which reads and processes the whole file. This is much faster than synchronous mode overall, but Nagios can still be blocked, and for a slightly longer period each time. Example:

2009-12-23 20:28:11 [7299] [1] 83 Lines processed
2009-12-23 20:28:11 [7299] [1] /var/spool/pnp4nagios/service-perfdata-PID-7299 deleted
2009-12-23 20:28:11 [7299] [1] PNP exiting (runtime 0.118031s) ...
A tenth of a second is more noticeable but still fast, since in this case, too, the VM cache is sufficient (491 rrd files).
Bulk mode with a separate daemon. This is almost the same as the second method, but instead of calling our handler, Nagios just moves the file and is done with it; a separately running daemon checks the target directory at a certain interval and starts process_perfdata.pl if it finds files. Since moving a file within the same file system is almost instantaneous, this method does not block Nagios and can be recommended for loaded environments.
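The division of labor in the third mode can be sketched in Python as follows. This is only an illustration of the idea, not the code of the real daemon: hand_over stands for what the monitoring side does (a rename within one file system), and drain_spool for one pass of the daemon over the spool directory. Both function names, the file naming scheme and the handler callback are assumptions made for the example.

```python
import os
from pathlib import Path
from typing import Callable, List


def hand_over(perfdata_file: Path, spool_dir: Path, stamp: int) -> Path:
    """Monitoring side: move the collected file into the spool directory."""
    target = spool_dir / f"{perfdata_file.name}.{stamp}"
    # A rename within one file system is atomic and practically instant,
    # so the caller is not held up by any processing.
    os.replace(perfdata_file, target)
    return target


def drain_spool(spool_dir: Path,
                handle_file: Callable[[Path], None]) -> List[str]:
    """Daemon side: process every file now in the spool, oldest first."""
    processed = []
    candidates = [p for p in spool_dir.iterdir() if p.is_file()]
    for path in sorted(candidates, key=lambda p: p.stat().st_mtime):
        # Stand-in for starting process_perfdata.pl on this file.
        handle_file(path)
        path.unlink()
        processed.append(path.name)
    return processed
```

A daemon built this way would simply call drain_spool in a loop with a sleep between passes. The slow part (updating the rrd files) then happens in the daemon's process, and the monitoring core only ever pays for the rename.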

Integration with Nagios web interface

The Nagios web interface can be integrated using action_url for hosts and services, which allows you a) to go to the graphs in one click and b) to see graph previews simply by hovering the pointer.

Minuses

The first drawback is standard for all RRD-based systems: there is no flexible management of the set of graphs and of the templates that display them. Editing templates in PHP is not difficult, but there are more useful ways to spend your time.
Nagios plug-ins must support performance data output, and far from all of them do. For plug-ins that check "counters" (for example, traffic on an interface), it is highly desirable that they output the minimum/maximum allowable value in the performance data line; otherwise the rrd file will be created without limits and there will be spikes when the counter is reset.
Not a direct drawback, but one that partly follows from the first: Nagios / Pnp4Nagios is not a replacement for a full system statistics collection suite. What can claim that role, I will write about separately.
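For the second drawback, here is a minimal Python sketch of what a plug-in author could do to include the minimum and maximum in the performance data line. It assumes the label=value[UOM];warn;crit;min;max layout; the helper name perfdata_item, the label and all the numbers are placeholders for this example, and which limits make sense depends on how the data source is defined.

```python
from typing import Optional, Union

Number = Union[int, float]


def perfdata_item(label: str, value: Number, unit: str = "",
                  warn: Optional[Number] = None,
                  crit: Optional[Number] = None,
                  minimum: Optional[Number] = None,
                  maximum: Optional[Number] = None) -> str:
    """Render one 'label=value[UOM];warn;crit;min;max' item."""
    def fmt(x: Optional[Number]) -> str:
        # Fields that are not set are left empty, but the separators stay.
        return "" if x is None else str(x)

    if " " in label:
        # Labels with spaces have to be wrapped in single quotes.
        label = f"'{label}'"
    return (f"{label}={value}{unit};"
            f"{fmt(warn)};{fmt(crit)};{fmt(minimum)};{fmt(maximum)}")


if __name__ == "__main__":
    item = perfdata_item("if_in_octets", 123456789, "c",
                         minimum=0, maximum=12500000)
    # Status text first, performance data after the '|' separator.
    print(f"OK - interface counters read | {item}")
```

A plug-in that prints its result this way gives the graphing side enough information to create the rrd file with limits from the start.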

References:

Project site - contains fairly clear documentation and installation and configuration instructions. When installing from a package, it’s enough to follow the instructions from /usr/share/doc/pnp4nagios/README.Debian.gz.
Access to source code
The Debian pnp4nagios package is available in my repository at http://repo.coolcold.org .
IRC: #pnp4nagios @ freenode

Source: https://habr.com/ru/post/79354/
