Collectd: monitoring the system at minimal cost. Setting up and using notifications

What is it?

Collectd is a small daemon that collects statistics on system resource usage every 10 seconds. It can collect statistics from several hosts and send them to a server that takes care of drawing nice graphs.

The main difference of this collector is that it works on the push principle, not poll/pull. That is, it sits and listens, and the servers send it their statistics themselves.

What are we going to do?

What I want to describe in this post:

Installation
General setup.
Configuring individual plugins.
Configuring the client servers that will send their statistics to the main server.
Setting up email notifications.

Installation

We install it as usual with our favorite package manager: emerge / yum / apt-get or whatever else exists.
For Debian: there is no collectd in the standard repositories, so we need to enable backports.
This is quite simple:
Add the line

deb http://backports.debian.org/debian-backports squeeze-backports main

to your sources.list (or create a new file containing this line in /etc/apt/sources.list.d/)
Then run apt-get update

Next, to install a package from backports, run

 apt-get -t squeeze-backports install "package"

or, with aptitude,

 aptitude -t squeeze-backports install "package"

In our case, it looks like this:

 apt-get -t squeeze-backports install "collectd"

There are a couple of nuances on Gentoo. First, the package is keyworded ~x86 (testing only), and second, only a few plugins are built by default. To choose which plugins to build, list them either in package.use (as flags of the form collectd_plugins_memory) or in make.conf in the COLLECTD_PLUGINS="" variable.
I installed these:

 COLLECTD_PLUGINS="apache cpu df disk interface load memory network ntpd processes notify_email ping logfile syslog rrdtool swap hddtemp exec filecount java sensors target_notification target_set target_replace"

Be careful: depending on the plugins, it can pull in a lot of dependencies ;), so choose only what you need.
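If you prefer package.use over make.conf, the plugin list above can be converted mechanically. Below is a minimal Python sketch. It assumes Gentoo's USE_EXPAND naming, where each COLLECTD_PLUGINS entry becomes a flag collectd_plugins_<name>. The package atom app-metrics/collectd is also an assumption, so check it against your Portage tree.

```python
# Plugin list copied from the make.conf example above.
COLLECTD_PLUGINS = (
    "apache cpu df disk interface load memory network ntpd processes "
    "notify_email ping logfile syslog rrdtool swap hddtemp exec filecount "
    "java sensors target_notification target_set target_replace"
)


def package_use_line(plugins: str, atom: str = "app-metrics/collectd") -> str:
    """Build one package.use line enabling every plugin in `plugins`.

    Assumes USE_EXPAND naming (collectd_plugins_<name>); the default
    package atom is an assumption, not taken from the article.
    """
    flags = ["collectd_plugins_" + name for name in plugins.split()]
    return " ".join([atom] + flags)


print(package_use_line(COLLECTD_PLUGINS))
```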
Installed versions: on Gentoo, 5.1.1; on Debian, after some fiddling, 4.1.1 (but it will have to be upgraded to 5.x manually; see below for why); on CentOS 6, 5.1.0.

JFYI Why you need to upgrade: the data written to RRD differs between these versions. You either have to write a hack to convert it, or maintain two scripts for generating graphs on the front end. Also, because the graphs changed, you would have to take the client version on each host into account and write separate notification rules for it.

On Debian and CentOS all plugins are installed, simply because a prebuilt package is used :)

Customization

Let's move on. I didn't like the config format at all: it took a long time to find anything in it. So I split it into the parts I needed, since the config can include other config files, inline, as they say :)
Again, on Gentoo the entire config is in a single file, /etc/collectd.conf. On Debian it lives at the nicer path /etc/collectd/collectd.conf, and some parts of the filter and threshold configuration are moved into separate files, which is good news. I set up roughly the same layout on my Gentoo box, with a few changes. In particular, loading of the plugins I needed was moved to a separate directory, with each plugin (or rather, its configuration) in its own file. Here is what it looks like:

 # Config file for collectd(1).
 #
 # Some plugins need additional configuration and are disabled by default.
 # Please read collectd.conf(5) for details.
 #
 # You should also read /usr/share/doc/collectd-core/README.Debian.plugins
 # before enabling any more plugins.
 Hostname "gen-collectd-master.local"
 FQDNLookup true
 BaseDir "/data/collectd"
 #PluginDir "/usr/lib/collectd"
 #TypesDB "/usr/share/collectd/types.db" "/etc/collectd/my_types.db"
 #Interval 10
 #Timeout 2
 #ReadThreads 5
 LoadPlugin logfile
 LoadPlugin syslog
 <Plugin logfile>
     LogLevel "info"
     File "/data/collectd/collectd.log"
     Timestamp true
     PrintSeverity true
 </Plugin>
 <Plugin syslog>
     LogLevel info
 </Plugin>
 LoadPlugin network
 <Plugin network>
     Listen "192.168.56.130" "8085"
 </Plugin>
 Include "/etc/collectd/inst/*.active"
 Include "/etc/collectd/conf/*.conf"
 Include "/etc/collectd/filters.conf"
 Include "/etc/collectd/thresholds.conf"

This is the main configuration file. If you compare it with the default one, you will notice that it does not load all the plugins, only the ones I consider part of the core configuration. The rest are included from the inst and conf directories.
JFYI Also note the FQDNLookup true parameter: whatever you put in Hostname must resolve! Otherwise collectd will fail with an error. The alternative is to set this parameter to false.

The inst directory contains plugin configuration files:

 gen-collectd-master collectd # ls -la /etc/collectd/inst/
 total 32
 drwxr-xr-x 2 root root 4096 Nov 26 20:57 .
 drwxr-xr-x 4 root root 4096 Nov 26 21:00 ..
 -rw-r--r-- 1 root root   15 Nov 26 13:54 cpu.active
 -rw-r--r-- 1 root root  125 Nov 26 13:54 if.active
 -rw-r--r-- 1 root root   16 Nov 26 13:54 load.active
 -rw-r--r-- 1 root root   18 Nov 26 13:54 memory.active
 -rw-r--r-- 1 root root  122 Nov 26 18:25 mounts.active
 -rw-r--r-- 1 root root  133 Nov 26 20:57 ping-hosts.active

As you can see from the config, I only include files with the "extension" .active.
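The "extension" trick also gives a cheap on/off switch. Renaming a file so that it no longer ends in .active keeps it out of the Include. Here is a small Python sketch of which files such a glob selects, assuming shell-style wildcard matching. The file names come from the listing above; ping-hosts.disabled is a hypothetical renamed file.

```python
from fnmatch import fnmatch
from typing import List

# Names from the inst/ listing; "ping-hosts.disabled" is a hypothetical
# file that was switched off by renaming it.
files = ["cpu.active", "if.active", "load.active", "memory.active",
         "mounts.active", "ping-hosts.disabled"]


def included(names: List[str], pattern: str = "*.active") -> List[str]:
    """Return the names matched by a shell-style pattern such as *.active."""
    return [name for name in names if fnmatch(name, pattern)]


print(included(files))
```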

JFYI All plugin parameters are described on the collectd.conf documentation page.

Next, the conf directory contains 2 files: one configures the notify_email plugin, the other configures rrdtool.

 gen-collectd-master collectd # ls -la /etc/collectd/conf/
 total 16
 drwxr-xr-x 2 root root 4096 Nov 26 20:30 .
 drwxr-xr-x 4 root root 4096 Nov 26 21:00 ..
 -rw-r--r-- 1 root root  425 Nov 26 20:30 mail.conf
 -rw-r--r-- 1 root root   83 Nov 26 13:54 rrdtool.conf

Frankly, they could just as well go back into collectd.conf, but for some reason at the time I wanted it this way :)

Contents of the conf/rrdtool.conf file:

 LoadPlugin rrdtool
 <Plugin rrdtool>
     DataDir "/data/collectd/rrd"
 </Plugin>

As you can see, here I load the plugin and set its parameters.

The conf/mail.conf file:

 LoadPlugin notify_email
 <Plugin notify_email>
     SMTPServer "smtp.mail.ru"
     SMTPPort 25
     SMTPUser "collectd@mail.ru"
     SMTPPassword "my-super-password-for-mail"
     From "collectd@mail.ru"
     #
     # <WARNING/FAILURE/OK> on <hostname>.
     #
     # Beware! Do not use more than two placeholders (%)!
     Subject "[collectd] %s on %s!"
     Recipient "recipient@mail.ru"
 </Plugin>

We will need this plugin when we set up notifications.

JFYI You can write your own notification handler. To do this, enable the exec plugin and specify a script to be launched whenever a notification is generated:

 LoadPlugin exec
 <Plugin exec>
     NotificationExec thunder "/home/thunder/ttest.sh" "test1"
 </Plugin>

The general syntax of this directive is:

 NotificationExec <user> "<path-to-script>" ["arg1"] ["arg2"] ...

My script contains the following:

 #!/bin/bash
 cat >> /home/thunder/ttest.log

When notifications are generated, the log will contain something like this:

 Severity: WARNING
 Time: 1354181979.770
 Host: jen-master-local
 Plugin: cpu
 PluginInstance: 0
 Type: cpu
 TypeInstance: user
 DataSource: value
 CurrentValue: 9.989738e+01
 WarningMin: nan
 WarningMax: 8.500000e+01
 FailureMin: nan
 FailureMax: nan

 Host jen-master-local, plugin cpu (instance 0) type cpu (instance user): Data source "value" is currently 99.897375. That is above the warning threshold of 85.000000.

As you can see, all the data is there, so it is easy to parse, and writing your own notifier is not difficult either.
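The notification arrives on the script's standard input in this "Key: Value" form, so a handler in another language is straightforward. Below is a minimal Python sketch. It assumes the layout shown above: header lines, one empty line, then the message text. parse_notification is a hypothetical helper name. The script only prints a one-line summary at the point where you would send mail, a chat message and so on.

```python
#!/usr/bin/env python3
"""Sketch of a custom NotificationExec handler (assumed input layout:
"Key: Value" header lines, an empty line, then the message text)."""
import sys
from typing import Dict, Tuple


def parse_notification(text: str) -> Tuple[Dict[str, str], str]:
    """Split a notification into a dict of headers and the message."""
    header_part, _, message = text.partition("\n\n")
    headers: Dict[str, str] = {}
    for line in header_part.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip anything that is not a "Key: Value" line
            headers[key] = value
    return headers, message.strip()


def main() -> None:
    headers, message = parse_notification(sys.stdin.read())
    # Placeholder action: replace with mail, chat, etc.
    print("[%s] %s/%s: %s" % (headers.get("Severity", "?"),
                              headers.get("Host", "?"),
                              headers.get("Plugin", "?"), message))


if __name__ == "__main__":
    main()
```

To try it, point NotificationExec at this script instead of ttest.sh, and make sure the script is executable.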

Let's go back to the main collectd.conf file.
I won't explain syslog/logfile, since everything is clear there, and the same goes for Hostname.
Network plugin: you can read more about it in the plugin documentation; in particular, authentication can be configured there. I won't cover that here, since everyone will decide for themselves how to set it up :)
This plugin is used for communication between collectd servers.
To make the current server the statistics collection server, set the parameter Listen "192.168.56.130" "8085". Here 192.168.56.130 is the IP address on which the daemon listens for incoming data from other servers, and 8085 is the port it listens on.
To configure a client, specify Server "192.168.56.130" "8085" instead of Listen. In this case 192.168.56.130 is the IP address to send the data to, and 8085 is the port to send it to.

JFYI The port can be omitted, in which case the default port 25826 is used. Just remember that the protocol is UDP, so keep this in mind if there is a firewall anywhere in between.

Plugins are configured the same way on both sides.

Everything you have configured for monitoring on the "client" will be sent to the "server".
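The only difference between the two roles is the directive inside <Plugin network>. Here is a Python sketch that renders either block, using the address and port from this post. The function and its "server"/"client" role names are hypothetical. Leaving the port out relies on collectd's default UDP port 25826.

```python
from typing import Optional


def network_block(role: str, address: str, port: Optional[int] = None) -> str:
    """Render a <Plugin network> block for a collecting server or a client."""
    # The server listens on `address`; a client sends its data to `address`.
    directive = {"server": "Listen", "client": "Server"}[role]
    args = '"%s"' % address
    if port is not None:  # omitted port: collectd falls back to UDP 25826
        args += ' "%d"' % port
    return "\n".join([
        "LoadPlugin network",
        "<Plugin network>",
        "    %s %s" % (directive, args),
        "</Plugin>",
    ])


print(network_block("server", "192.168.56.130", 8085))
print(network_block("client", "192.168.56.130", 8085))
```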

Mail notifications

Now for the tastiest part. The only examples of notification settings for plugins are in the thresholds.conf config.
Loading the plugin, with an example:

 LoadPlugin "threshold"
 <Plugin "threshold">
     <Type "foo">
         WarningMin 0.00
         WarningMax 1000.00
         FailureMin 0.00
         FailureMax 1200.00
         Invert false
         Instance "bar"
     </Type>
 </Plugin>

A brief explanation of how this works. Threshold is a regular plugin, so it is loaded like any other plugin. All parameters are set inside the <Plugin "threshold"> container. Inside it, containers can be nested in this order: "Host", "Plugin", "Type". That is, a Host container can contain a Plugin container, which in turn can contain a Type container. The Host block is optional; it lets you bind notifications to a specific host. All threshold values must be set inside the Type block; the only option that can be set outside it is Instance.
If several blocks apply to the same value, the most specific one is used. So you can define a default block for a plugin and then override it with other parameters, for example for a specific host. Now let's move on to configuring the notifications themselves.
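The "most specific block wins" rule can be illustrated with a simplified lookup. This Python sketch is modelled on the description above and is not taken from collectd's source. Keys are (host, plugin, type, type instance), and None means "not specified in the config". A host match is preferred, then a plugin match, then a type-instance match, and plugin instances are ignored for brevity. The host name db1.local and the value 95 are hypothetical.

```python
from itertools import product
from typing import Dict, Optional, Tuple

# (host, plugin, type, type_instance); None means "not specified".
Key = Tuple[Optional[str], Optional[str], str, Optional[str]]


def find_threshold(rules: Dict[Key, dict], host: str, plugin: str,
                   type_: str, type_instance: Optional[str]) -> Optional[dict]:
    """Return the most specific matching rule, or None if nothing matches."""
    # product() yields (host, plugin, inst), (host, plugin, None),
    # (host, None, inst), ... down to (None, None, None).
    for h, p, ti in product((host, None), (plugin, None), (type_instance, None)):
        rule = rules.get((h, p, type_, ti))
        if rule is not None:
            return rule
    return None


rules: Dict[Key, dict] = {
    # Default for every host: warn when CPU "user" is above 85.
    (None, "cpu", "cpu", "user"): {"WarningMax": 85},
    # Hypothetical override for one busy host.
    ("db1.local", "cpu", "cpu", "user"): {"WarningMax": 95},
}

print(find_threshold(rules, "db1.local", "cpu", "cpu", "user"))   # {'WarningMax': 95}
print(find_threshold(rules, "web1.local", "cpu", "cpu", "user"))  # {'WarningMax': 85}
```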

Cpu plugin

 <Type "cpu">
     Instance "user"
     WarningMax 85
     Hits 1
 </Type>

Here the Type block does not need to be wrapped in a Plugin block. We specify that the user value (user processes) should be monitored, and a warning sent if it rises above 85. Hits is the number of consecutive readings out of range, one reading per Interval (see the main config), before a notification is generated. In our case it is 1, i.e. a single 10-second reading above 85 is enough to generate a notification. You can set a larger value here, for example 6: if the value stays that high for a whole minute, then there is something to worry about.
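Here is a simplified model of how Hits delays a warning. It assumes one reading per 10-second Interval and that only consecutive out-of-range readings count. collectd itself also tracks state changes and options such as Persist, which are left out here. The readings are made-up numbers.

```python
from typing import Iterable, Optional


def first_warning(readings: Iterable[float], warning_max: float,
                  hits: int = 1) -> Optional[int]:
    """Index of the reading at which a warning would be sent, or None."""
    streak = 0
    for i, value in enumerate(readings):
        # A reading counts only if it is above WarningMax; a reading back
        # in range resets the streak.
        streak = streak + 1 if value > warning_max else 0
        if streak >= hits:
            return i
    return None


# Made-up CPU "user" readings, one per 10-second Interval.
readings = [90, 70, 90, 92, 95, 97, 99, 99]
print(first_warning(readings, 85, hits=1))  # 0: the very first reading
print(first_warning(readings, 85, hits=6))  # 7: six high readings in a row
```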

Ping plugin

 <Plugin "ping">
     <Type "ping_droprate">
         FailureMax 0.9
     </Type>
 </Plugin>

As you can see, here we set the ping_droprate type for the ping plugin. Its value lies between 0 and 1 (the fraction of lost packets), and it is 1 when the host does not answer at all. Accordingly, we generate a Failure notification if the value exceeds 0.9. If you specify 1, it will never fire :)
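This shows why FailureMax 1 never fires. The drop rate is a fraction of lost packets, so at most it equals 1, and a value only counts as a failure when it is above FailureMax. Here is a tiny Python model of that check with made-up packet counts. The strict comparison is an assumption about collectd's behaviour, consistent with the observation above.

```python
def droprate(sent: int, received: int) -> float:
    """Fraction of ping packets lost: 0.0 (none) to 1.0 (all)."""
    return (sent - received) / sent


def is_failure(rate: float, failure_max: float) -> bool:
    # Assumed strict comparison: the value must be above FailureMax.
    return rate > failure_max


# Made-up interval: 10 pings sent, none answered (host is down).
rate = droprate(10, 0)
print(rate, is_failure(rate, 0.9), is_failure(rate, 1.0))  # 1.0 True False
```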

Memory plugin

 <Plugin "memory">
     <Type "memory">
         Instance "free"
         WarningMin 25000000
     </Type>
 </Plugin>

We choose the free instance because we are monitoring free memory. The less free memory there is, the worse, so we set WarningMin (the value is in bytes). If free memory drops below this value, a notification is generated.
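The memory values are raw bytes, so 25000000 is roughly 23.8 MiB. Here is a Python sketch for converting between the two when choosing a WarningMin. It also includes the below-threshold check described above, which is assumed to be a strict "less than".

```python
MIB = 1024 * 1024


def bytes_to_mib(n: int) -> float:
    return n / MIB


def mib_to_bytes(mib: float) -> int:
    return int(mib * MIB)


def low_memory(free_bytes: int, warning_min: int) -> bool:
    # Assumed strict comparison: warn only when free memory is below WarningMin.
    return free_bytes < warning_min


print(round(bytes_to_mib(25_000_000), 1))  # 23.8
print(mib_to_bytes(24))                    # 25165824
```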

Now for the most interesting part: this is not in the documentation and an example was hard to find, so I had to experiment.
Setting up disk space notifications

Df plugin

 <Plugin "df">
     Instance "root"
     <Type "df_complex-used">
         # DataSource "value"
         WarningMax 4025360000
         FailureMax 6025360000
         Percentage false
     </Type>
 </Plugin>

In version 5.x the way the df plugin's data is stored changed, so it has to be referenced differently.
Instance - specifies which partition the rule applies to (here, root).
Type - df_complex-used: the df_complex part is always required; the part after the dash selects the data, in our case used space.
DataSource can now be omitted, since the data set has only one value field.
WarningMax / FailureMax - unfortunately, for some unknown reason, percentages cannot be used with this plugin, so for each host you will have to fill in specific values (in bytes). That is also why we explicitly state below that percentages are not used (Percentage false). The question was raised in 2011, back in version 4.9.1, but there is still no answer to it.
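Since percentages are not usable here, the absolute numbers have to be worked out per host from each partition's size. Here is a Python sketch that does the arithmetic and renders a Host block to paste inside <Plugin "threshold">. It follows the Host > Plugin > Type nesting described earlier. The host name, partition size and percentages are hypothetical.

```python
def df_threshold_block(host: str, instance: str, size_bytes: int,
                       warn_pct: int, fail_pct: int) -> str:
    """Render a per-host df threshold using absolute byte values."""
    # Integer arithmetic keeps the byte counts exact.
    warn = size_bytes * warn_pct // 100
    fail = size_bytes * fail_pct // 100
    return "\n".join([
        '<Host "%s">' % host,
        '    <Plugin "df">',
        '        Instance "%s"' % instance,
        '        <Type "df_complex-used">',
        "            WarningMax %d" % warn,
        "            FailureMax %d" % fail,
        "            Percentage false",
        "        </Type>",
        "    </Plugin>",
        "</Host>",
    ])


# Hypothetical host with a 10 GB root partition: warn at 80%, fail at 95%.
print(df_threshold_block("web1.local", "root", 10_000_000_000, 80, 95))
```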

That's it: the main plugins are configured, and so are their notifications.

Suggestions, comments and questions are welcome. I will answer when I can.

Source: https://habr.com/ru/post/162087/
