Ganglia and Nagios. Complementary Remote Monitoring

Sooner or later, every system administrator has to monitor production servers, and there is a whole zoo of tools for the job. Nagios is very popular because of its powerful alerting mechanism. Alongside it, people often use systems that focus on collecting the values of various parameters and tracking how they change over time to gather statistics, such as Cacti, Zabbix and Ganglia. Ganglia gets unfairly little attention from the Habr community. In this post I will try to correct that and show how flexible and useful this tool is.

Ganglia is an open-source monitoring system designed to scale to thousands of nodes. It was originally developed at the University of California, Berkeley. Ganglia is easy to install and use, and it stands out for its flexibility and scalability. Installing and configuring Ganglia is beyond the scope of this article, but you can read about it here. I will only add that, unlike Cacti, Ganglia keeps collecting data about the system even when the node is cut off from the network. When the server comes back online, it transfers all the accumulated data, so the metric graphs have no gaps.
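Ganglia's daemons publish everything they collect as a single XML document. In a default setup, gmond answers on TCP port 8649 and gmetad answers on port 8651: each sends the full dump and then closes the connection. The Python sketch below reads one metric of one host from such a dump. It is not code from Ganglia or from any plugin. `fetch_xml`, `get_metric` and the trimmed `SAMPLE_XML` are hypothetical, and you should check the default port against your own gmetad.conf.

```python
import socket
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed sample of a Ganglia XML dump.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmetad">
  <GRID NAME="unspecified">
    <CLUSTER NAME="lan">
      <HOST NAME="web1" IP="192.168.0.10">
        <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=" "/>
      </HOST>
    </CLUSTER>
  </GRID>
</GANGLIA_XML>"""


def fetch_xml(host="localhost", port=8651):
    """Read the whole XML dump that gmetad sends on connect (port is an assumption)."""
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # the daemon closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")


def get_metric(xml_text, host_name, metric_name):
    """Return (value, units) for one metric of one host, or None if absent."""
    root = ET.fromstring(xml_text)
    # iter() searches the whole tree, so GRID/CLUSTER nesting does not matter.
    for host in root.iter("HOST"):
        if host.get("NAME") != host_name:
            continue
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                return metric.get("VAL"), metric.get("UNITS", "").strip()
    return None
```

Against a live server you would call `get_metric(fetch_xml("your_gmetad_host"), "web1", "load_one")`. With the sample, the result is `("0.42", "")`.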
You can read about installing and configuring Nagios, and about integrating it with Ganglia, here.
With these materials you can already set up Ganglia and teach Nagios to monitor it. Real life, however, is often more complicated: you may need to monitor a server on an internal LAN, send metrics over a secure channel, and so on. For such cases there is NRPE (you can read more about it here).
This is where the actual subject of this article begins. The situation: a remote server on a local network with Ganglia installed, and a server on the work network with Nagios installed. The task: monitor the remote system.
First, we install everything we need on the remote server.
Start by installing the check_ganglia_metric plugin for Ganglia. Follow its instructions and check that the plugin works.
Then install the nagios-nrpe-server:

sudo aptitude install nagios-nrpe-server

Then open its config:

 sudo nano /etc/nagios/nrpe.cfg

fix the lines:

  : allowed_hosts = <your nagios adress>     : dont_blame_nrpe = 1      : command[some_name] = path args     check_ganglia_metric ommand[check_ganglia_metric] = check_ganglia_metric.py --gmetad_host=your_host --metric_host=metric_host_you_neded --metric_name=$ARG1$ --warning=$ARG2$ --critical=$ARG3$
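With dont_blame_nrpe=1, NRPE fills the $ARGn$ macros in a command line with the arguments that check_nrpe sends. The Python sketch below is a simplified model of that substitution, for illustration only. `expand_nrpe_command` is a hypothetical helper. Real NRPE also rejects arguments that contain shell metacharacters, which this sketch does not model.

```python
import re


def expand_nrpe_command(template, args):
    """Replace $ARG1$..$ARGn$ in an nrpe.cfg command line with the given values.

    In this sketch, a macro with no matching value becomes an empty string.
    """
    def substitute(match):
        index = int(match.group(1)) - 1  # $ARG1$ -> args[0]
        return args[index] if 0 <= index < len(args) else ""

    return re.sub(r"\$ARG(\d+)\$", substitute, template)
```

For example, `expand_nrpe_command("check_ganglia_metric.py --metric_name=$ARG1$ --warning=$ARG2$ --critical=$ARG3$", ["load_one", "4", "8"])` produces the full command line with those three values filled in.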

Save the file and restart the NRPE server:

 sudo /etc/init.d/nagios-nrpe-server restart

Now let's configure Nagios on our monitoring server (you have already set it up following the links above).
Add a service:

 define service{
     use                  generic-service
     host_name            your_remote_host
     service_description  remote_ganglia_checking
     check_command        check_nrpe!check_ganglia_metric!$ARG1$ $ARG2$ $ARG3$
 }
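Nagios splits a check_command value on `!`. The first field names the command, and the following fields become $ARG1$, $ARG2$ and so on inside that command's definition. The sketch below models only that split. It does not handle an escaped `\!`, and `split_check_command` is a hypothetical helper. Suppose you put concrete values such as `load_one 4 8` in place of the placeholders in the definition above. Then check_ganglia_metric becomes $ARG1$ of check_nrpe, and everything after the second `!`, spaces included, lands in a single $ARG2$. How that value is forwarded to the remote host depends on how the check_nrpe command itself is defined in your Nagios configuration.

```python
def split_check_command(check_command):
    """Split a Nagios check_command value into the command name and the
    values that become $ARG1$, $ARG2$, ... (escaped '\\!' is not handled)."""
    name, *values = check_command.split("!")
    args = {"ARG%d" % (i + 1): value for i, value in enumerate(values)}
    return name, args
```

`split_check_command("check_nrpe!check_ganglia_metric!load_one 4 8")` yields the command `check_nrpe` with two arguments, not four.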

We restart Nagios and see that it has put our metric into the Warning state and reports that it cannot recognize the answer. Well then, time to roll up our sleeves. ;)
We need a wrapper script that runs check_ganglia_metric.py.
How to write Nagios plugins is described here: "Write your own plug-in for nagios" by Joka.
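The Nagios plugin convention is short. A plugin prints one status line to stdout, optionally followed by `|` and performance data, and reports its state through the exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. Below is a minimal Python sketch of a plugin core that follows this convention. The metric name and thresholds are made-up examples, and `evaluate` and `report` are hypothetical helpers.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warning, critical):
    """Map a numeric value to a state; in this sketch, higher is worse."""
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


def report(name, value, warning, critical):
    """Return (state, status_line); the part after '|' is perfdata: label=value;warn;crit."""
    state = evaluate(value, warning, critical)
    line = "%s - %s is %s | %s=%s;%s;%s" % (
        STATE_NAMES[state], name, value, name, value, warning, critical)
    return state, line
```

A real plugin would print `line` and then call `sys.exit(state)`, because Nagios takes the state from the exit code.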
Here is the code itself, written in Python. It is saved as /usr/lib/nagios/plugins/ganglia_support.py and must be executable:

Plugin source code

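 #!/usr/bin/python2.6
 # -*- coding: utf-8 -*-
 # NRPE wrapper: runs check_ganglia_metric.py through sudo and hands its
 # output and exit status back to Nagios.
 import sys
 import subprocess
 CHECK_SCRIPT = '/usr/local/bin/check_ganglia_metric.py'  # same path as in sudoers
 # Five arguments are expected; sys.argv[0] is the script name itself.
 if len(sys.argv) < 6:
     print("wrong config data")
     sys.exit(3)  # 3 = UNKNOWN
 gmetad_host, metric_host, metric_name, warning, critical = sys.argv[1:6]
 # Build the argument list directly, so no string joining or shlex is needed.
 args = ['sudo', CHECK_SCRIPT,
         '--gmetad_host=' + gmetad_host,
         '--metric_host=' + metric_host,
         '--metric_name=' + metric_name,
         '--warning=' + warning,
         '--critical=' + critical]
 # The child's stdout goes straight to NRPE; pass its exit code on as well,
 # otherwise Nagios always sees state 0 (OK).
 sys.exit(subprocess.call(args))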
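One detail matters for any wrapper like this. subprocess.call returns the child's exit status, and Nagios reads the plugin state from that status, not from the text. A wrapper that only stores the value (for example, ending with `p = subprocess.call(args)`) and never passes it on always exits with 0, which Nagios treats as OK. Here is a quick demonstration with a stand-in child process. The fake plugin is purely illustrative.

```python
import subprocess
import sys

# Stand-in "plugin": prints a status line and exits with 2 (CRITICAL).
fake_plugin = [sys.executable, "-c",
               "import sys; print('CRITICAL - test'); sys.exit(2)"]

# The child's output goes straight to our stdout; call() returns its exit code.
status = subprocess.call(fake_plugin)
print("child exited with", status)
```

Calling `sys.exit(status)` at the end of the wrapper hands that code on to NRPE, so Nagios sees the child's real state.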

At runtime, check_ganglia_metric.py creates a cache file, check_ganglia_metric.cache. When it runs as the nagios user, it tries to create this file in a directory owned by root and fails.
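To confirm that permissions are the cause, you can check whether the current user may create files in a given directory. This is a sketch. `can_create_files_in` is a hypothetical helper, and the directory to test is whichever one check_ganglia_metric.py tries to write its cache into on your system.

```python
import os


def can_create_files_in(directory):
    """True if the current user may create files in `directory`.

    Creating a file needs both write and search (execute) permission
    on the directory itself.
    """
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)
```

Run it as the nagios user (for example with `sudo -u nagios python ...`). As root, os.access reports True for almost any existing directory.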
The problem is that the nagios user needs root privileges to run check_ganglia_metric.py, which is not good at all. However, we can allow it to run only this one script as root:

 sudo visudo   # opens /etc/sudoers and checks its syntax before saving

  nagios ALL=(ALL) NOPASSWD: /usr/local/bin/check_ganglia_metric.py

Edit the NRPE config again
and add our plugin to the list of commands it is allowed to run:

 command[check_ganglia]=/usr/lib/nagios/plugins/ganglia_support.py $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$

Save the file and restart nagios-nrpe-server.
Edit the Nagios configuration again:

 define service{
     use                  generic-service
     host_name            your_remote_host
     service_description  remote_ganglia_checking
     check_command        check_nrpe!check_ganglia!$ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$
 }

Save and restart Nagios. Everything now works.
All the paths and commands above are for Debian.
I hope this will be useful and will save you time and coffee when setting up monitoring on production servers.

Screenshots

Ganglia:

Nagios:

Source: https://habr.com/ru/post/166171/
