
Visual monitoring of the server infrastructure based on Nagios + Grafana


At Atlas, we like to have everything under control. That includes our entire server infrastructure, which over the years has grown into a living organism of numerous virtual machines and services. We needed to keep an eye on the vital parts of our IT operations: monitor the production server, track changes in system resources on the virtual database machines, follow the progress of business processes, and so on. The question was how to achieve this and, most importantly, with which tools. We started by looking for ready-made solutions and tried a bunch of paid and free services that promised to give us the "most valuable" information about the state of our system. In the end, it all came down to incomprehensible charts, diagrams, and numbers that had no real value for us.


So we realized that we had to build something ourselves. As a basis we chose Nagios, the most flexible and advanced system available, which can be configured to monitor anything in any way you like. We installed it, configured it, and it worked. Cool! The only pity is that the interface of this miracle is stuck somewhere in the mid-90s, and we wanted the visual side to be on par.


A short search showed that Grafana is the leader among tools for building beautiful dashboards. So we decided to show all of our Nagios monitoring on wall monitors as beautiful Grafana graphs. Only one question remained: how do we get the two to work together?


Common goal


Monitor the entire infrastructure with Nagios, set up alerts about problems in the system via Slack, and feed system performance data into Grafana's graphical front end for real-time monitoring.



Technology stack



Short description


Nagios collects statistics from the various virtual machines in the system. We need to save this data to a database in a specific format and at a specific interval so that Grafana can display it. Grafana works with several data sources, but the most convenient for us is Graphite. Graphite is essentially another graphing front end, but its interface was apparently made by the same people who made the Nagios interface. Under the hood it has Whisper, a database that stores the statistics, and Carbon, a layer that receives and processes the data. Nagios cannot talk to Graphite directly, so smart people wrote an add-on plugin called Graphios that takes the current readings from Nagios and sends them to Carbon. Our task, then, is to tie together 6 different technologies. Go!
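To make the Carbon step less abstract, here is a minimal Python 3 sketch of Carbon's plaintext protocol: one line per data point in the form "path value timestamp", sent over TCP. It is not part of the setup described in this article. The port 2003 is Carbon's usual plaintext listener and is an assumption about your installation, and the metric name in the usage comment is made up for illustration.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Carbon's plaintext protocol: "<path> <value> <timestamp>\\n"."""
    if timestamp is None:
        timestamp = int(time.time())
    return "{} {} {}\n".format(path, value, int(timestamp))


def send_metric(line, host="127.0.0.1", port=2003):
    """Send one plaintext line to a Carbon listener (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Usage (needs a running carbon-cache, so it is left commented out):
# send_metric(format_metric("test.manual.value", 42))
```

Sending a hand-made value like this is a quick way to check that Carbon accepts data before Graphios enters the picture.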


A small disclaimer up front:


  1. The current configuration was built for Debian, but the overall build logic is the same for the whole Debian family.
  2. In this article I will not cover installing and configuring Nagios itself, since the web is full of manuals (I will write separately about how to configure Nagios properly). This is about wiring a series of technologies together: no lyrics, pure hardcore.

Carbon


Install and configure Carbon:


 apt-get install graphite-carbon
 sudo nano /etc/default/graphite-carbon

Set the parameter value to true:


 CARBON_CACHE_ENABLED=true 

Save, exit.


Editing schema file

 sudo nano /etc/carbon/storage-schemas.conf 

This file contains directives that define how the statistics are stored: how often a data point is recorded and how long it is kept. We use roughly the following directive:


 [atlas]
 pattern = .*
 retentions = 60s:1y

This means that one data point is stored per minute and kept for a year. Adjust the values to fit your needs.
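When adjusting the retention, it helps to estimate what it costs on disk. The Python 3 sketch below is a rough calculation, not a tool from the Graphite distribution. It assumes Whisper's usual on-disk layout (a 16-byte file header, 12 bytes of metadata per archive, 12 bytes per data point) and that "y" means 365 days; the function names are made up for illustration.

```python
# Seconds per unit suffix as used in retention definitions (assumption: 1y = 365d).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def parse_retention(definition):
    """Turn a definition such as "60s:1y" into (seconds_per_point, number_of_points)."""
    precision, duration = definition.split(":")
    step = int(precision[:-1]) * UNIT_SECONDS[precision[-1]]
    span = int(duration[:-1]) * UNIT_SECONDS[duration[-1]]
    return step, span // step


def whisper_file_size(definitions):
    """Estimate the size in bytes of one Whisper file holding the given archives."""
    points = [parse_retention(d)[1] for d in definitions]
    return 16 + 12 * len(points) + 12 * sum(points)


step, count = parse_retention("60s:1y")
print(step, count)                    # 60 525600
print(whisper_file_size(["60s:1y"]))  # 6307228
```

Under these assumptions each metric takes about 6 MB with the directive above, so the total is roughly 6 MB multiplied by the number of metrics Nagios reports.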


It is also important that data is not stored more often than Nagios itself produces it; otherwise we will add duplicate values to the database. Out of the box, Nagios polls all services and hosts once every 10 minutes, so if you want to get as close to real time as possible, you also need to change the check intervals on the Nagios side.


Put the last config file in place and start Carbon:


 sudo cp /usr/share/doc/graphite-carbon/examples/storage-aggregation.conf.example /etc/carbon/storage-aggregation.conf
 sudo service carbon-cache start

Database


Next we prepare a database for the components that follow. We prefer PostgreSQL, but Graphite supports other databases as well.


 apt-get install postgresql libpq-dev python-psycopg2
 sudo -u postgres psql

Create a new user and database:


 CREATE USER graphite WITH PASSWORD 'password';
 CREATE DATABASE graphite WITH OWNER graphite;
 \q

Remember the database password; we will need it later.


Graphios


Install Python, Django, and then Graphios itself:


 apt-get install -y python2.6 python-pip python-cairo python-django python-django-tagging
 pip install graphios

Edit the /etc/graphios/graphios.cfg file:


 debug = False
 enable_carbon = True

Create a directory for the performance data spool:


 mkdir /var/spool/nagios/graphios/
 chown -R nagios:nagios /var/spool/nagios

Testing:

Add a test parameter (_graphiteprefix) to a Nagios service definition:


 define service {
     use                 generic-service
     host_name           DB
     service_description PING
     check_command       check_ping!100.0,20%!500.0,60%
     _graphiteprefix     monitoring.nagios01.pingto
 }

Call Graphios in test mode:


 /usr/local/bin/graphios.py --spool-directory /var/spool/nagios/graphios --log-file /tmp/graphios.log --backend carbon --server 127.0.0.1:2004 --test 

The output should contain records like:


 monitoring.nagios01.pingto.DB.rta 0.248000 1461427743
 monitoring.nagios01.pingto.DB.pl 0 1461427743
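The shape of these records can be illustrated with a small Python 3 sketch: the _graphiteprefix value, the host name, and each performance data label are joined into a metric path, followed by the value and a timestamp. This is only an approximation of the mapping, not Graphios's actual code. The function name and the sample performance data string are assumptions, and quoted labels containing spaces are not handled.

```python
import re

# A leading number, optionally negative; units such as "ms" or "%" are dropped.
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def perfdata_to_metrics(prefix, host, perfdata, timestamp):
    """Convert a Nagios perfdata string into Graphite-style "path value timestamp" lines."""
    lines = []
    for item in perfdata.split():
        label, _, rest = item.partition("=")
        match = NUMBER_RE.match(rest.split(";")[0])
        if not label or match is None:
            continue  # skip anything that is not "label=value[unit];..."
        lines.append("{}.{}.{} {} {}".format(prefix, host, label, match.group(0), timestamp))
    return lines


# Hypothetical check_ping performance data, for illustration only.
sample = "rta=0.248000ms;100.000000;500.000000;0.000000 pl=0%;20;60;0"
for line in perfdata_to_metrics("monitoring.nagios01.pingto", "DB", sample, 1461427743):
    print(line)
```

With the sample string, the sketch prints the same two records as the test run above.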

If everything is OK, run the graphios daemon:


 service graphios start 

Graphite


Graphite must be installed strictly after Carbon; otherwise Nagios / Graphios will not be able to send data correctly.


Install the main dependencies

 apt-get install -y libapache2-mod-wsgi python-twisted python-memcache python-pysqlite2 python-simplejson
 pip install whisper
 pip install carbon
 pip install graphite-web
 pip install pytz
 pip install pyparsing
 wget https://raw.github.com/tmm1/graphite/master/examples/example-graphite-vhost.conf -O /etc/apache2/sites-available/graphite

Next, you need to slightly fix the new Apache2 config:


 nano /etc/apache2/sites-available/graphite 

Change "WSGISocketPrefix /etc/httpd/wsgi/" to:


 WSGISocketPrefix /var/run/apache2/wsgi 

Add another alias after the line "Alias /content/ /opt/graphite/webapp/content/":


 Alias /static/ "/opt/graphite/static/" 

Save, exit.


Configuring local_settings.py

 cd /opt/graphite/webapp/graphite
 cp local_settings.py.example local_settings.py
 nano local_settings.py

In the file that opens, enable the following settings and fill in their values:


Make up your own SECRET_KEY, and take the values for the DATABASES setting from the database created earlier.
The WHISPER_DIR value can be found with the "locate whisper" command.


The values of the CARBONLINK_HOSTS setting must match the output of the "lsof -i -P | grep carbon" command.


 SECRET_KEY = 'some_secret_key'
 TIME_ZONE = 'Europe/Moscow'
 WHISPER_DIR = '/var/lib/graphite/whisper'
 USE_REMOTE_USER_AUTHENTICATION = True
 DATABASES = {
     'default': {
         'NAME': 'graphite',
         'ENGINE': 'django.db.backends.postgresql_psycopg2',
         'USER': 'graphite',
         'PASSWORD': 'password',
         'HOST': '127.0.0.1',
         'PORT': ''
     }
 }
 CARBONLINK_HOSTS = ["127.0.0.1:2003", "127.0.0.1:2004", "127.0.0.1:7002"]
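If you would rather not invent SECRET_KEY by hand, a random one can be generated. The sketch below uses the secrets module from the Python 3 standard library (3.6 or newer), so run it on any machine that has Python 3, not necessarily under the Python 2 used by this setup. The function name is made up for illustration.

```python
import secrets


def make_secret_key(nbytes=50):
    """Return a random URL-safe string suitable for pasting into SECRET_KEY."""
    return secrets.token_urlsafe(nbytes)


print(make_secret_key())
```

Copy the printed string into local_settings.py in place of 'some_secret_key'.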

Customize Graphite

During setup, the system will ask you to create a superuser. Enter new credentials and remember them.


 cd /opt/graphite/conf/
 cp graphite.wsgi.example graphite.wsgi
 cd /opt/graphite/webapp/graphite
 python manage.py syncdb
 chown -R www-data:www-data /opt/graphite/storage/
 a2enmod wsgi
 a2ensite graphite
 python manage.py collectstatic --pythonpath=/opt/graphite/webapp
 chown -R www-data:www-data /opt/graphite/static
 /etc/init.d/apache2 restart
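Once Apache is back up, one way to check that data from Graphios actually reaches Graphite is to ask graphite-web's render endpoint for a metric in JSON form. The Python 3 sketch below only builds such a URL. The base address is an assumption about where your Graphite virtual host listens, and the function name is made up for illustration.

```python
from urllib.parse import urlencode


def render_url(base_url, target, period="-1h"):
    """Build a graphite-web /render URL that returns the last `period` of `target` as JSON."""
    query = urlencode({"target": target, "from": period, "format": "json"})
    return "{}/render?{}".format(base_url.rstrip("/"), query)


print(render_url("http://127.0.0.1", "monitoring.nagios01.pingto.DB.rta"))
```

Open the printed URL in a browser or fetch it with curl. An empty list in the response usually means the metric has not been written to Whisper yet.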

Grafana


This is the simplest part: if Graphite / Carbon is configured correctly, it is enough to add a new data source of type Graphite and set up a dashboard to display the data. Grafana will do the rest!


 wget https://grafanarel.s3.amazonaws.com/builds/grafana_3.0.0-beta51460725904_amd64.deb
 sudo apt-get install -y adduser libfontconfig
 sudo dpkg -i grafana_3.0.0-beta51460725904_amd64.deb
 sudo service grafana-server start
 sudo update-rc.d grafana-server defaults 95 10

The interface will be available on port 3000. The default login and password are both admin.
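The data source can be added through the web interface, but Grafana also exposes an HTTP API for it. The Python 3 sketch below prepares, without sending, a POST request to /api/datasources with basic authentication. The data source name, both URLs, and the function name are assumptions for illustration, and the exact fields accepted may differ between Grafana versions, so treat it as a starting point only.

```python
import base64
import json
from urllib.request import Request


def build_datasource_request(grafana_url, graphite_url, user="admin", password="admin"):
    """Prepare (but do not send) a request that registers Graphite as a Grafana data source."""
    payload = {
        "name": "graphite",    # hypothetical display name
        "type": "graphite",
        "url": graphite_url,   # where graphite-web is reachable from Grafana
        "access": "proxy",     # let the Grafana server query Graphite itself
    }
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8")).decode("ascii")
    return Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Basic " + token},
        method="POST",
    )


request = build_datasource_request("http://127.0.0.1:3000", "http://127.0.0.1")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Change the default admin password before exposing port 3000 to anyone else.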


Bonus: Slack Nagios App


As a complement to direct visualization and passive emails, let's also send Nagios alerts to Slack.


1) Create a new channel in Slack, for example #alerts


2) Go to the Slack apps page


3) Find the Nagios app


4) Follow the instructions and download the script.


 wget https://raw.github.com/tinyspeck/services-examples/master/nagios.pl
 cp nagios.pl /usr/local/bin/slack_nagios.pl
 chmod 755 /usr/local/bin/slack_nagios.pl

5) Copy the token and the Slack domain and paste them into the new script /usr/local/bin/slack_nagios.pl


6) Copy the Nagios directives and paste them into the appropriate places (the commands and the new contact)


 define contactgroup {
     contactgroup_name admins
     alias             Nagios Administrators
     members           root,slack
 }

 define contact {
     contact_name                  slack
     alias                         Slack
     service_notification_period   24x7
     host_notification_period      24x7
     service_notification_options  w,u,c,r
     host_notification_options     d,r
     service_notification_commands notify-service-by-slack
     host_notification_commands    notify-host-by-slack
 }

 define command {
     command_name notify-service-by-slack
     command_line /usr/local/bin/slack_nagios.pl -field slack_channel=#alerts -field HOSTALIAS="$HOSTADDRESS$" -field SERVICEDESC="$SERVICEDESC$" -field SERVICESTATE="$SERVICESTATE$" -field SERVICEOUTPUT="$SERVICEOUTPUT$ ($LONGDATETIME$)" -field NOTIFICATIONTYPE="$NOTIFICATIONTYPE$"
 }

 define command {
     command_name notify-host-by-slack
     command_line /usr/local/bin/slack_nagios.pl -field slack_channel=#alerts -field HOSTALIAS="$HOSTADDRESS$" -field HOSTSTATE="$HOSTSTATE$" -field HOSTOUTPUT="$HOSTOUTPUT$ ($LONGDATETIME$)" -field NOTIFICATIONTYPE="$NOTIFICATIONTYPE$"
 }

7) Save, reload Nagios, check.


Useful materials:


» How To Configure StatsD to Collect Arbitrary Stats for Graphite on Ubuntu 14.04
» How To Install and Use Graphite on an Ubuntu 14.04 Server
» https://github.com/shawn-sterling/graphios
» http://grafana.org/features/#graphite



Source: https://habr.com/ru/post/308750/

