
collectd - collect system and user statistics

Question 0: why?



In a post about pnp4nagios, I wrote that “Nagios / Pnp4Nagios is not a replacement for a full system statistics collection setup”. Why do I think so? Because 1) system statistics are extensive and include many metrics, and 2) it does not always make sense to monitor them, or rather, to generate alerts on them. For example, it is good to know how many I/O operations a disk performs or how many context switches occur, but this is almost never critical. Besides, Nagios is simply not designed for this. In this article I will not give a complete description of the system; I will confine myself to the points that are especially interesting from my point of view.

Question 1: why collectd?


The main reasons I chose collectd over Munin, Cacti and others:
  1. Scalability
  2. Lightness
  3. Concept: everything is a plugin
  4. Data collection and recording are separated
  5. Number of metrics collected
  6. Extensibility




The general architecture of collectd:
[Image: diagram of the general collectd architecture]

Scalability

To deliver data to the central node(s), collectd uses push (as opposed to the poll/pull model of Cacti and Munin). More than one node can store data; moreover, data can be split so that different nodes store different parts of it. Data transfer is handled by a separate plugin, network.
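
To make "push" concrete, here is a minimal Python sketch of a client that encodes one GAUGE value in the binary format spoken by the network plugin and sends it over UDP. The part type codes, the little-endian encoding of GAUGE values and the default port 25826 follow my reading of the binary protocol description on the collectd wiki; the host, plugin and type names are placeholders. In practice you would simply enable the network plugin rather than build packets by hand.

```python
import socket
import struct
import time

# Part type codes from the collectd binary protocol (only those used here).
TYPE_HOST = 0x0000
TYPE_TIME = 0x0001       # seconds since the epoch
TYPE_PLUGIN = 0x0002
TYPE_TYPE = 0x0004
TYPE_VALUES = 0x0006
TYPE_INTERVAL = 0x0007   # seconds

DS_TYPE_GAUGE = 1        # data source type code for GAUGE values
DEFAULT_PORT = 25826     # assumed default port of the network plugin


def string_part(part_type, text):
    # 2-byte type, 2-byte total length (header included), NUL-terminated string.
    payload = text.encode("ascii") + b"\x00"
    return struct.pack("!HH", part_type, 4 + len(payload)) + payload


def number_part(part_type, number):
    # 2-byte type, 2-byte length (always 12), 8-byte big-endian unsigned integer.
    return struct.pack("!HHQ", part_type, 12, number)


def gauge_values_part(values):
    # Header, value count, one data-type byte per value, then the values.
    # GAUGE values are little-endian doubles, unlike the big-endian header fields.
    count = len(values)
    header = struct.pack("!HHH", TYPE_VALUES, 6 + 9 * count, count)
    type_bytes = bytes([DS_TYPE_GAUGE] * count)
    data = b"".join(struct.pack("<d", v) for v in values)
    return header + type_bytes + data


def encode_gauge(host, plugin, type_name, values, interval=10, when=None):
    """Build one packet carrying a single value list of GAUGE values."""
    when = int(time.time()) if when is None else int(when)
    return b"".join([
        string_part(TYPE_HOST, host),
        number_part(TYPE_TIME, when),
        number_part(TYPE_INTERVAL, interval),
        string_part(TYPE_PLUGIN, plugin),
        string_part(TYPE_TYPE, type_name),
        gauge_values_part(values),
    ])


def push(packet, server="127.0.0.1", port=DEFAULT_PORT):
    # Fire-and-forget UDP: the sender pushes, nobody polls it.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (server, port))
```

A call such as `push(encode_gauge("web01", "example", "gauge", [0.42]))` would hand one value to whatever collectd server listens on that address; the sender never waits for a request, which is what lets the central node scale to many clients.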

Lightness

The main daemon and the plugins are written in C and easily sustain a 10-second collection interval without putting a noticeable load on the system.

Everything is a plugin

The CPU load collector is a plugin. Process information is collected by a plugin. Creating and writing RRD/CSV files is done by a plugin.

Data collection and recording are separated

Data can be both read and written. collectd divides plugins into "readers" and "writers". Plugins that collect information are readers. Once data has been read, it is passed to the registered writers, which in general can be any writer plugins. The most "popular" writers are the network plugin, which sends data to the central node, and RRDtool, since statistics usually means RRD files. Thus a node can both keep statistics in RRD locally and send data on for further processing.
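
The reader/writer split can be illustrated with a toy model in Python. This is not collectd's code: `ToyDaemon`, `ValueList`, `add_reader` and `add_writer` are hypothetical names, and the two list-based writers merely stand in for the rrdtool and network plugins. The point is the fan-out: a reader does not know or care where its values end up.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ValueList:
    # Loosely modelled on collectd's identifier: host/plugin/type plus values.
    host: str
    plugin: str
    type_name: str
    values: List[float]
    timestamp: float = field(default_factory=time.time)


class ToyDaemon:
    """Hypothetical model: every value read is handed to every writer."""

    def __init__(self):
        self.readers: List[Callable[[], List[ValueList]]] = []
        self.writers: List[Callable[[ValueList], None]] = []

    def add_reader(self, reader):
        self.readers.append(reader)

    def add_writer(self, writer):
        self.writers.append(writer)

    def dispatch(self, vl):
        # Fan-out: collection is decoupled from storage and transport.
        for writer in self.writers:
            writer(vl)

    def run_once(self):
        # One collection interval: call each reader, dispatch what it returns.
        for reader in self.readers:
            for vl in reader():
                self.dispatch(vl)


# Usage: one reader, two writers standing in for "rrdtool" and "network".
local_rrd, network_out = [], []
daemon = ToyDaemon()
daemon.add_reader(lambda: [ValueList("web01", "load", "load", [0.42, 0.37, 0.30])])
daemon.add_writer(local_rrd.append)
daemon.add_writer(network_out.append)
daemon.run_once()
```

Both writers receive the very same value list, which is how one node can keep local RRD files and forward data at the same time.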

Number of metrics collected

Currently there are more than 90 basic plugins for collecting information about the system and applications.

Extensibility

To add your own data sources, there are:
  1. The exec plugin, the standard extension mechanism: a program is started and the data it prints to stdout is parsed. collectd has an advantage here: the program does not have to exit after printing its values. In fact, it is recommended to start once and print data in a loop, which saves the cost of repeated startup; this is especially relevant for scripts.
  2. Python / Perl / Java bindings, which can act as both readers and writers; see the more detailed description below.
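
As a sketch of the "start once, print in a loop" style recommended for the exec plugin, here is a small Python script. It relies on the `COLLECTD_HOSTNAME` and `COLLECTD_INTERVAL` environment variables that the exec plugin sets for its children and prints `PUTVAL` lines in collectd's plain-text protocol. The metric itself (files in a hypothetical `/var/spool/myapp` directory) and the plugin name `myapp` are made up for illustration. It would be enabled with an `Exec` line in the exec plugin's configuration, naming an unprivileged user and the script path.

```python
import os
import time

# Set by collectd's exec plugin; the fallbacks only matter when run by hand.
HOSTNAME = os.environ.get("COLLECTD_HOSTNAME", "localhost")
INTERVAL = float(os.environ.get("COLLECTD_INTERVAL", "10"))

SPOOL_DIR = "/var/spool/myapp"  # hypothetical directory being measured


def identifier(host, plugin, type_name, plugin_instance="", type_instance=""):
    # collectd identifiers look like host/plugin[-instance]/type[-instance].
    plugin_part = f"{plugin}-{plugin_instance}" if plugin_instance else plugin
    type_part = f"{type_name}-{type_instance}" if type_instance else type_name
    return f"{host}/{plugin_part}/{type_part}"


def putval_line(ident, interval, value):
    # "N" tells collectd to use the current time for the value.
    return f'PUTVAL "{ident}" interval={interval:g} N:{value}'


def read_value():
    # Hypothetical metric: number of files waiting in the spool directory.
    return len(os.listdir(SPOOL_DIR)) if os.path.isdir(SPOOL_DIR) else 0


def main():
    ident = identifier(HOSTNAME, "myapp", "gauge", type_instance="spool_files")
    while True:  # stay resident instead of being restarted every interval
        # flush=True matters: stdout is a pipe to collectd, hence block-buffered.
        print(putval_line(ident, INTERVAL, read_value()), flush=True)
        time.sleep(INTERVAL)


# Only loop when actually launched by the exec plugin (it sets COLLECTD_INTERVAL).
if __name__ == "__main__" and "COLLECTD_INTERVAL" in os.environ:
    main()
```

The interpreter is started once, so the per-interval cost is a directory listing and one line of output rather than a whole Python startup.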


Extending via bindings


Bindings are essentially plugins that give other languages access to collectd's internal mechanisms, so that plugins can be written in those languages. Java, Perl and Python are currently supported. For Python, for example, the interpreter is started together with collectd and stays in memory, which saves the cost of starting it every few seconds and gives scripts access to the API.

A script can register as a data provider (reader) and/or as a writer. A read callback is called at the interval specified in the configuration, while a write callback is invoked for every set of values that is dispatched. Readers are straightforward, but writers deserve separate attention: your script can easily hook into the processing of all data passing through collectd, so you can, for example, build your own database of stored values. A simple example of such a Python plugin is in the project documentation.
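
A minimal sketch of such a plugin for the Python binding, registering one reader and one writer. It uses `register_read`, `register_write`, `Values` and `dispatch` from the collectd-python API as I understand it from collectd-python(5); the plugin name, the constant "temperature" reading and the output path are hypothetical. The `collectd` module exists only inside collectd's embedded interpreter, so the import is guarded and the pure helper can be exercised on its own.

```python
import json

try:
    import collectd  # only available inside collectd's embedded interpreter
except ImportError:
    collectd = None

PLUGIN_NAME = "example_py"                      # hypothetical plugin name
OUTPUT_PATH = "/var/tmp/collectd-values.jsonl"  # hypothetical output file


def read_temperature():
    # Hypothetical data source; replace with a real measurement.
    return 42.0


def read_callback():
    # Reader: called once per configured interval; dispatches one value list.
    vl = collectd.Values(plugin=PLUGIN_NAME, type="gauge",
                         type_instance="temperature")
    vl.dispatch(values=[read_temperature()])


def value_list_to_record(vl):
    # Flatten a value list into a JSON-friendly dict.
    return {
        "host": vl.host,
        "plugin": vl.plugin,
        "plugin_instance": vl.plugin_instance,
        "type": vl.type,
        "type_instance": vl.type_instance,
        "time": vl.time,
        "values": list(vl.values),
    }


def write_callback(vl, data=None):
    # Writer: receives every value list dispatched by any reader,
    # here appended as one JSON line (a stand-in for "your own database").
    with open(OUTPUT_PATH, "a") as out:
        out.write(json.dumps(value_list_to_record(vl)) + "\n")


if collectd is not None:
    collectd.register_read(read_callback)
    collectd.register_write(write_callback)
```

Opening the file on every call keeps the sketch short; a real writer would keep a connection or file handle open and flush it periodically.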

Interesting and useful features of plugins




Other features


Filters and chains

Starting with version 4.6, collectd has a filter and chain mechanism similar to chains in iptables. With it you can filter data, for example to drop values whose timestamp is more than N seconds ahead of or behind the current time. This is useful when a server's clock goes wrong: otherwise RRD receives timestamps from the future and the readings are distorted.
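
The timestamp check described above boils down to a simple predicate. Here is a Python sketch of that logic with a minimal chain that drops a value list as soon as one rule rejects it. The function names, the dict-based value lists and the 300 s / 3600 s limits are illustrative assumptions; in collectd itself this is configured rather than coded, as far as I know through the `timediff` match combined with the `stop` target.

```python
import time


def is_time_plausible(value_time, now=None, future=300.0, past=3600.0):
    """True if value_time lies within [now - past, now + future] seconds."""
    now = time.time() if now is None else now
    return now - past <= value_time <= now + future


def run_chain(value_lists, rules):
    # A value list continues to the writers only if every rule accepts it;
    # the first rejecting rule stops processing, like a "stop" target.
    return [vl for vl in value_lists if all(rule(vl) for rule in rules)]


# Usage: value lists as plain dicts with a "time" key (hypothetical format).
now = 1_700_000_000.0
incoming = [
    {"id": "web01/load/load", "time": now - 5},      # normal
    {"id": "db01/load/load", "time": now + 86_400},  # clock a day ahead
]
accepted = run_chain(incoming, [lambda vl: is_time_plausible(vl["time"], now=now)])
```

Only the value from web01 survives; the one stamped a day in the future never reaches RRD.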

Notifications and thresholds

A basic system of notifications and thresholds has been available since version 4.3. Similarly to readers and writers, there are "producers" and "consumers": the former generate notifications, the latter process them. In particular, the Exec plugin can both react to notifications, for example by running a script, and pass on notifications emitted by scripts.
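
Both directions of the Exec plugin can be sketched in Python. A notification handler receives the notification on stdin as "Key: Value" header lines, an empty line and the message text; a script that wants to emit a notification prints a `PUTNOTIF` line. This follows my reading of collectd-exec(5) (header names such as `Severity` and `Host`, quoting of the message); check the manual page for the exact rules. The function names are hypothetical.

```python
def parse_notification(text):
    """Split an exec-plugin notification into header fields and message body."""
    head, _, message = text.partition("\n\n")
    fields = {}
    for line in head.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore anything that is not a "Key: Value" line
            fields[key.strip()] = value.strip()
    return fields, message.strip()


def putnotif_line(severity, message, host, when):
    """Build a PUTNOTIF line a script can print to stdout to emit a notification."""
    # Quote the message (it may contain spaces) and escape quotes/backslashes.
    escaped = message.replace("\\", "\\\\").replace('"', '\\"')
    return (f"PUTNOTIF severity={severity} time={int(when)} "
            f'host={host} message="{escaped}"')
```

A notification handler would call `parse_notification(sys.stdin.read())` and then, for example, page someone when the `Severity` field is `FAILURE`.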

By configuring a set of thresholds, you can get alerts on deviations from the norm. However, keep in mind that these basic capabilities do not replace a full monitoring system such as Nagios. For proper integration with Nagios, you can use the bundled collectd-nagios program, which queries the socket created by the UnixSock plugin and returns the result in the standard Nagios plugin format.
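
To show what "the standard Nagios plugin format" amounts to, here is a Python sketch that classifies a value against warning and failure limits and maps the result to the Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The keyword names echo collectd's `WarningMax`/`FailureMax` threshold options, but `check_thresholds` and `nagios_result` are hypothetical helpers, not the logic of collectd-nagios, which obtains the value through the UnixSock plugin.

```python
import math

# Exit codes defined by the Nagios plugin guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_thresholds(value, warning_min=None, warning_max=None,
                     failure_min=None, failure_max=None):
    """Classify value against optional min/max limits; failure wins over warning."""
    if value is None or math.isnan(value):
        return UNKNOWN  # no data is not the same as "OK"

    def outside(low, high):
        return (low is not None and value < low) or (high is not None and value > high)

    if outside(failure_min, failure_max):
        return CRITICAL
    if outside(warning_min, warning_max):
        return WARNING
    return OK


def nagios_result(label, value, status):
    """Return (output line, exit code); the text after "|" is performance data."""
    return f"{STATUS_NAMES[status]}: {label} is {value} | {label}={value}", status
```

A check wrapper would print the first element of the returned tuple and pass the second to `sys.exit()`, which is all Nagios looks at.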

Disadvantages


The only disadvantage I can name is the graph display system. Considering that a single host can generate around 200 counters, visualization matters a great deal. The standard collection3 interface is decent but far from perfect. Several independent graphing front ends are currently being developed, but I cannot recommend any of them yet.

Other


One of the developers, Sebastian Harl (tokkee), maintains the Debian package, so the latest version is almost always available in backports.

Source: https://habr.com/ru/post/93205/

