What is it?
collectd is a small daemon that collects statistics on system resource usage every 10 seconds. It can also gather statistics from several hosts and send them to a server that takes care of drawing nice graphs.
The main difference between this collector and others is that it works on the push principle rather than poll/pull. That is, the collector just sits and listens, and the servers send it their statistics themselves.
What are we going to do?
What I want to describe in this post:
- Installation
- General configuration
- Configuration of individual plugins
- Configuring the slave servers that will send their statistics to the main server
- Setting up email notifications
Installation
Install it as usual with your favorite package manager: emerge / yum / apt-get, or whatever else is out there.
For Debian: collectd is not in the standard repositories, so we need to enable backports.
This is quite simple. Add the line
    deb http://backports.debian.org/debian-backports squeeze-backports main
to your sources.list (or create a new file with this line in /etc/apt/sources.list.d/).
Then run
    apt-get update
Next, to install a package from backports, use the command
    apt-get -t squeeze-backports install "package"
or, with aptitude,
    aptitude -t squeeze-backports install "package"
In our case it looks like this:
    apt-get -t squeeze-backports install "collectd"
Gentoo has a couple of nuances. First, the package is masked with the ~x86 keyword; second, only a few plugins are built by default. To choose which plugins to build, list them either in package.use (as flags of the form collectd_plugins_memory) or in make.conf in the variable
    COLLECTD_PLUGINS=""
I have installed these:
    COLLECTD_PLUGINS="apache cpu df disk interface load memory network ntpd processes notify_email ping logfile syslog rrdtool swap hddtemp exec filecount java sensors target_notification target_set target_replace"
Be careful: depending on the plugins, it can pull in a lot of dependencies ;), so choose only what you need.
Installed versions: 5.1.1 on Gentoo; 4.1.1 on Debian, after some fiddling (but it will have to be updated manually to 5.x; read on for why); 5.1.0 on CentOS 6.
JFYI, why you need to upgrade: the data written to RRD differs between these versions, so you would have to either write a kludge for conversion or write two scripts to generate graphs on the front end. And because of the changes in the graphs, you would have to take the client version on each host into account and write separate notification rules for it.
On Debian and CentOS I have all the plugins installed, simply because a prebuilt package is used :)
Configuration
Moving on. I didn't like the config format at all: it takes a long time to find anything in it. So I split it into the parts I needed, since one config can include other configs :)
Again, on Gentoo the entire config is in a single file, /etc/collectd.conf. On Debian it lives in the nicer path /etc/collectd/collectd.conf, and some parts, namely the filters and thresholds configurations, are moved into separate files, which is good news. I made roughly the same layout on my Gentoo box, changing it slightly. In particular, loading of the plugins I need was moved to a separate directory, with each plugin (or rather its configuration) in its own file. Here is what it looks like:
    # Config file for collectd(1).
    #
    # Some plugins need additional configuration and are disabled by default.
    # Please read collectd.conf(5) for details.
    #
    # You should also read /usr/share/doc/collectd-core/README.Debian.plugins
    # before enabling any more plugins.

    Hostname "gen-collectd-master.local"
    FQDNLookup true
    BaseDir "/data/collectd"
    #PluginDir "/usr/lib/collectd"
    #TypesDB "/usr/share/collectd/types.db" "/etc/collectd/my_types.db"
    #Interval 10
    #Timeout 2
    #ReadThreads 5

    LoadPlugin logfile
    LoadPlugin syslog

    <Plugin logfile>
        LogLevel "info"
        File "/data/collectd/collectd.log"
        Timestamp true
        PrintSeverity true
    </Plugin>

    <Plugin syslog>
        LogLevel info
    </Plugin>

    LoadPlugin network
    <Plugin network>
        Listen "192.168.56.130" "8085"
    </Plugin>

    Include "/etc/collectd/inst/*.active"
    Include "/etc/collectd/conf/*.conf"
    Include "/etc/collectd/filters.conf"
    Include "/etc/collectd/thresholds.conf"
This is the main configuration file. If you compare it with the default one, you will notice that mine does not contain all the plugins, only those I consider part of the main configuration. The remaining files are included from the inst and conf directories.
JFYI: also pay attention to the FQDNLookup true parameter. If you set anything in Hostname, it must resolve! Otherwise collectd will exit with an error. The other solution is to set this parameter to false.
The inst directory contains the plugin configuration files. As you can see from the config, I only include files with the "extension" active.
JFYI: all plugin parameters can be found on the collectd.conf documentation page.
Further, the conf directory contains two files: one configures the notify_email plugin, the other configures rrdtool.
They could easily be moved back into collectd.conf, but for some reason at the time I wanted to do it this way :)
The contents of the conf/rrdtool.conf file:
    LoadPlugin rrdtool
    <Plugin rrdtool>
        DataDir "/data/collectd/rrd"
    </Plugin>
As you can see, here I am loading the plugin and setting the parameters for it.
The conf/mail.conf file:
    LoadPlugin notify_email
    <Plugin notify_email>
        SMTPServer "smtp.mail.ru"
        SMTPPort 25
        SMTPUser "collectd@mail.ru"
        SMTPPassword "my-super-password-for-mail"
        From "collectd@mail.ru"
        #
        # <WARNING/FAILURE/OK> on <hostname>.
        #
        # Beware! Do not use more than two placeholders (%)!
        Subject "[collectd] %s on %s!"
        Recipient "recipient@mail.ru"
    </Plugin>
We need this plugin when we set up notifications.
JFYI: you can write your own notification handler. To do this, enable the exec plugin and point it at a script that will be run whenever a notification is generated. This is done like so:
    LoadPlugin exec
    <Plugin exec>
        NotificationExec thunder "/home/thunder/ttest.sh" "test1"
    </Plugin>
The general form of this option is:
    NotificationExec <user> "<path-to-script>" ["argument1"] ["argument2"] ..
I have written the following in the script
When notifications are generated, something like this ends up in the log:
    Severity: WARNING
    Time: 1354181979.770
    Host: jen-master-local
    Plugin: cpu
    PluginInstance: 0
    Type: cpu
    TypeInstance: user
    DataSource: value
    CurrentValue: 9.989738e+01
    WarningMin: nan
    WarningMax: 8.500000e+01
    FailureMin: nan
    FailureMax: nan

    Host jen-master-local, plugin cpu (instance 0) type cpu (instance user): Data source "value" is currently 99.897375. That is above the warning threshold of 85.000000.
As you can see, all the data we need is here; it is not hard to parse, and it is not hard to write your own notifier either.
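To show how little parsing is involved, here is a minimal sketch of such a handler in Python. It is not the script from this post. It assumes the layout described in the collectd-exec manual: "Key: Value" header lines, then an empty line, then the message. The names parse_notification and format_alert and the summary layout are made up for illustration, and SAMPLE is simply the log entry from above.

```python
SAMPLE = """\
Severity: WARNING
Time: 1354181979.770
Host: jen-master-local
Plugin: cpu
PluginInstance: 0
Type: cpu
TypeInstance: user
DataSource: value
CurrentValue: 9.989738e+01
WarningMin: nan
WarningMax: 8.500000e+01
FailureMin: nan
FailureMax: nan

Host jen-master-local, plugin cpu (instance 0) type cpu (instance user): Data source "value" is currently 99.897375. That is above the warning threshold of 85.000000.
"""


def parse_notification(text):
    """Split a notification into (headers, message).

    Header lines look like "Key: Value"; the first empty line ends the
    header block and everything after it is the free-form message.
    """
    headers = {}
    lines = text.splitlines()
    body_start = len(lines)
    for i, line in enumerate(lines):
        if not line.strip():
            body_start = i + 1
            break
        key, _, value = line.partition(":")
        headers[key.strip()] = value.strip()
    message = "\n".join(lines[body_start:]).strip()
    return headers, message


def format_alert(headers, message):
    """Build a one-line summary (the layout is an arbitrary choice)."""
    return "[{}] {}/{}: {}".format(
        headers.get("Severity", "UNKNOWN"),
        headers.get("Host", "?"),
        headers.get("Plugin", "?"),
        message,
    )


headers, message = parse_notification(SAMPLE)
print(format_alert(headers, message))
```

In a real handler started through NotificationExec you would pass sys.stdin.read() instead of SAMPLE, since collectd feeds the notification to the script on standard input.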
Let's go back to the main collectd.conf file.
I will not explain syslog / logfile, since everything is clear there; the same goes for Hostname.
Network plugin: you can read about the plugin in more detail here; in particular, authentication can be configured there. I will not cover it for now; everyone can decide for themselves how to set it up :)
This plugin is used for communication between collectd servers.
To make the current server the one that collects statistics, set the parameter
Listen "192.168.56.130" "8085", where
192.168.56.130 is the IP address on which the daemon will listen for incoming data from other servers;
8085 is the port on which it will listen.
To configure a client, specify, instead of Listen,
Server "192.168.56.130" "8085", where
192.168.56.130 is the IP address to send the data to;
8085 is the port to send the data to.
JFYI: the port can be omitted, in which case the default port 25826 is used. Just remember that this works over UDP, so keep that in mind if you have a firewall somewhere.
Plugin configuration is no different on either side.
Everything you have configured for monitoring on the "client" will be sent to the "server".
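If the Listen / Server split is still fuzzy, the following toy sketch in Python shows the same push-over-UDP idea on the loopback interface. It is only an illustration: real collectd speaks its own binary protocol, the payload string here is made up, and open_listener and push are hypothetical helper names.

```python
import socket


def open_listener(host="127.0.0.1", port=0):
    """Bind a UDP socket: the role that Listen gives the collecting server.

    Port 0 asks the OS for any free port; collectd's default is 25826.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    sock.settimeout(2.0)  # do not wait forever if nothing arrives
    return sock


def push(payload, host, port):
    """Send a single datagram: the role that Server gives a client."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


listener = open_listener()
host, port = listener.getsockname()

# The client pushes on its own; the listener never asks for data.
push(b"cpu-0/cpu-user 99.9", host, port)

data, sender = listener.recvfrom(4096)
listener.close()
print(data.decode())
```

Because UDP is connectionless, the sender gets no error when a firewall silently drops the datagram, which is why the firewall remark above matters.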
Mail notifications
Now for the tastiest part. The only examples of configuring notifications for plugins are in the thresholds.conf config.
Loading the plugin, plus an example:
    LoadPlugin "threshold"
    <Plugin "threshold">
        <Type "foo">
            WarningMin 0.00
            WarningMax 1000.00
            FailureMin 0.00
            FailureMax 1200.00
            Invert false
            Instance "bar"
        </Type>
    </Plugin>
A brief explanation of how this works.
threshold is an ordinary plugin, so it is loaded like any other plugin. All parameters are set inside the <Plugin "threshold"> container. Inside it, containers can be nested in the following order: "Host", "Plugin", "Type". That is, inside a Host container there can be a Plugin container, inside of which there can be a Type container. The Host block is optional; with it you can bind notifications to a specific host. All values must be set inside the Type block; the only value that can be set outside the Type block is Instance.
If several blocks apply to one value, the most specific block is used. So you can define a default block for a plugin and then, for example, override it with other parameters for a specific host. Now let's proceed to configuring the notifications themselves.
Cpu plugin
    <Type "cpu">
        Instance "user"
        WarningMax 85
        Hits 1
    </Type>
Here you do not have to wrap the Type block in a Plugin block. We say that the user value (user processes) should be monitored and that a warning should be sent if it rises above 85.
Hits is the number of consecutive Interval periods (see the main config) for which the value must stay over the threshold. In our case it is 1, i.e. if the value is above 85 within one 10-second interval, a notification is generated. You can set a higher value here, for example 6: if the value stays that high for a whole minute, then there really is something to worry about.
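To get a feel for what Hits changes, here is a toy model in Python. It is a simplification of the behaviour described above (N consecutive readings over the threshold, one warning per excursion), not a port of collectd's actual code; warning_indexes and the readings are invented for the example, with one reading standing for one 10-second Interval.

```python
def warning_indexes(values, warning_max, hits):
    """Return the indexes of the readings at which a warning would be raised."""
    fired = []
    streak = 0       # consecutive readings above the threshold
    alerted = False  # True once the current excursion has been reported
    for i, value in enumerate(values):
        if value > warning_max:
            streak += 1
            if streak >= hits and not alerted:
                fired.append(i)
                alerted = True
        else:
            # Back inside the threshold: start counting from scratch.
            streak = 0
            alerted = False
    return fired


# One short spike, then six readings in a row (one minute) over 85.
readings = [90, 40, 91, 92, 93, 94, 95, 96, 50]

print(warning_indexes(readings, warning_max=85, hits=1))  # [0, 2]
print(warning_indexes(readings, warning_max=85, hits=6))  # [7]
```

With Hits 1 the single spike at the start already triggers a warning; with Hits 6 only the sustained load does.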
Ping plugin
    <Plugin "ping">
        <Type "ping_droprate">
            FailureMax 0.9
        </Type>
    </Plugin>
As you can see, for the ping plugin we use the ping_droprate type. This table contains a value of either 0 or 1. Accordingly, we generate a Failure notification if the value exceeds 0.9. If you specify 1, it will not work :)
Memory plugin
    <Plugin "memory">
        <Type "memory">
            Instance "free"
            WarningMin 25000000
        </Type>
    </Plugin>
We choose the instance free because we are monitoring free memory: the less free memory, the worse, so we set WarningMin. If the value falls below the specified value, a notification is generated.
Now for the most interesting part: this is not in the documentation, and an example turned out to be hard to find, so I had to experiment.
Notifications for disk space
Df plugin
    <Plugin "df">
        Instance "root"
        <Type "df_complex-used">
            # DataSource "value"
            WarningMax 4025360000
            FailureMax 6025360000
            Percentage false
        </Type>
    </Plugin>
So, in version 5.x the logic for creating tables for the df plugin changed, so the way the tables are referenced changed as well.
Instance: the partition whose graph the threshold applies to.
Type: df_complex-used. The df_complex part is always required; after the dash, in our case, we select the data on used space.
DataSource can now be omitted, since the table has only one field, value.
WarningMax / FailureMax: unfortunately, for some unknown reason percentages cannot be used with this plugin, so for each host you will have to fill in specific values. Below them we also state explicitly that we do not use percentages.
The question about this was raised back in 2011, for version 4.9.1, and there is still no answer to it.
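Since the thresholds have to be absolute byte counts, a small helper can work them out per host from the percentages you actually have in mind. This is a rough sketch in Python: threshold_bytes and df_thresholds are hypothetical names, the 80 / 90 percent defaults are just an example, and shutil.disk_usage may count the partition size slightly differently from collectd's df plugin (reserved blocks, for instance), so treat the numbers as approximate.

```python
import shutil


def threshold_bytes(total_bytes, percent):
    """Number of used bytes that corresponds to `percent` of the partition."""
    return total_bytes * percent // 100


def df_thresholds(path="/", warning_pct=80, failure_pct=90):
    """Return (WarningMax, FailureMax) in bytes for the partition at `path`."""
    total = shutil.disk_usage(path).total
    return threshold_bytes(total, warning_pct), threshold_bytes(total, failure_pct)


warning, failure = df_thresholds("/")
print("WarningMax", warning)
print("FailureMax", failure)
```

The two printed lines can be pasted into the Type block for the host the script was run on.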
That's all: the main plugins are configured, and so are the notifications for them.
Suggestions, comments and questions are welcome. I will answer when I can.