
Collecting our own counters via the collectd protocol

Greetings

Wondering how to collect metrics from your own services?
Tired of parsing logs?
Do you keep forgetting to set up counter collection for a new or relocated service?


Then welcome!

In any large project, all sorts of highly specialized services accumulate over time, and they need to be monitored, at the very least to know when to order more hardware.
To do this, people usually define “vital signs” for each service and want to see them on nice graphs, for example to judge how close or far the performance “ceiling” is.

Of course, developers do not skimp on logs, and in the past I had to parse gigabytes of them, slowly and painfully, just to build one graph.

You can live with that, but life can be made better, especially when you have access to the service code. Then you can teach the service to send data about itself to a data collection service: collectd.

Habr already has an overview of this service's architecture, so we will not dwell on it here.

The essence of collectd:

All that remains is to teach your services to build and send such UDP packets, and the graphs are almost ready.

Configuring collectd

First you need to teach collectd to understand your own counters.

Open /etc/collectd.conf and add a file with the descriptions of our types:

TypesDB "/usr/lib/collectd/types.db" "/etc/collectd/customtypes.db"

Example 1 (contents of customtypes.db). The data will be saved in separate rrd files.

http_server_avg_item_process_time value:GAUGE:0:U
http_server_items_processed value:GAUGE:0:U
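How a service might accumulate these two gauges between sends can be sketched roughly as follows. The class `ItemStats` and its methods are hypothetical names introduced for illustration; the sketch assumes the service measures the processing time of each item itself.

```python
import threading


class ItemStats(object):
    """Accumulates per-interval figures for the two gauges (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0
        self._total_time = 0.0

    def record(self, seconds):
        # Call once per processed item with its processing time in seconds.
        with self._lock:
            self._count += 1
            self._total_time += seconds

    def snapshot(self):
        # Return (avg_item_process_time, items_processed) and start a new interval.
        with self._lock:
            count, total = self._count, self._total_time
            self._count, self._total_time = 0, 0.0
        avg = total / count if count else 0.0
        return avg, count
```

Calling `snapshot()` once per send interval yields one value for each of the two types.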


Example 2. The data will be saved in a single rrd file.

process_memory working_set:GAUGE:0:U, peak_working_set:GAUGE:0:U

The difference is that, with the code below, the first example takes two UDP packets (one per type), while the second fits both values into a single packet.
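To make the type definitions concrete, here is a minimal sketch that splits one line of the types file into its name and data sources. The function name `parse_types_line` is hypothetical, and the sketch assumes the `name ds-name:ds-type:min:max, ...` layout used in the examples above. The number of data sources is the number of values a single packet of that type carries.

```python
def parse_types_line(line):
    """Split one types.db line into (type_name, [(ds_name, ds_type, min, max), ...])."""
    type_name, spec = line.split(None, 1)
    sources = []
    for chunk in spec.split(","):
        ds_name, ds_type, ds_min, ds_max = chunk.strip().split(":")
        sources.append((
            ds_name,
            ds_type,
            None if ds_min == "U" else float(ds_min),  # "U" = no bound is checked
            None if ds_max == "U" else float(ds_max),
        ))
    return type_name, sources
```

For the `process_memory` line this gives two data sources, `working_set` and `peak_working_set`, in that order.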

Restart collectd, and it is ready to receive your data.

The binary protocol in Python

In fact, the binary protocol is quite simple to implement in any language; since this is a Python blog, we will do it in Python :)

NB: I use the stock collectd daemon from the Debian 5 package, version 4.4.2. It is quite old, but the binary protocol does not seem to have changed since, so the version should not matter much.

The default implementation basically works, but it cannot send multiple values in one packet.

If you strip almost everything unnecessary from the default implementation and add support for sending multiple values, you get, for example, the following code:

import socket
import struct
import time

SEND_INTERVAL = 10      # seconds
MAX_PACKET_SIZE = 1024  # bytes

TYPE_NAME = "gauge"

# Part type codes of the collectd binary protocol
TYPE_HOST = 0x0000
TYPE_TIME = 0x0001
TYPE_PLUGIN = 0x0002
TYPE_PLUGIN_INSTANCE = 0x0003
TYPE_TYPE = 0x0004
TYPE_TYPE_INSTANCE = 0x0005
TYPE_VALUES = 0x0006
TYPE_INTERVAL = 0x0007

LONG_INT_CODES = [TYPE_TIME, TYPE_INTERVAL]
STRING_CODES = [TYPE_HOST, TYPE_PLUGIN, TYPE_PLUGIN_INSTANCE, TYPE_TYPE, TYPE_TYPE_INSTANCE]

VALUE_COUNTER = 0
VALUE_GAUGE = 1
VALUE_DERIVE = 2
VALUE_ABSOLUTE = 3

VALUE_CODES = {
    VALUE_COUNTER: "!Q",
    VALUE_GAUGE: "<d",
    VALUE_DERIVE: "!q",
    VALUE_ABSOLUTE: "!Q"
}


def pack_numeric(type_code, number):
    # 2-byte type, 2-byte length (always 12), 8-byte signed integer
    return struct.pack("!HHq", type_code, 12, int(number))


def pack_string(type_code, string):
    # 2-byte type, 2-byte length, then the null-terminated string
    data = string.encode("utf-8")
    return struct.pack("!HH", type_code, 5 + len(data)) + data + b"\0"


def pack(type_id, value):
    if type_id in LONG_INT_CODES:
        return pack_numeric(type_id, value)
    elif type_id in STRING_CODES:
        return pack_string(type_id, value)
    else:
        raise AssertionError("invalid type code " + str(type_id))


def pack_counters(counters):
    # 4-byte part header + 2-byte value count + (1-byte type + 8-byte value) per counter
    length = 6 + len(counters) * 9
    result = [struct.pack("!HHH", TYPE_VALUES, length, len(counters))]
    for value in counters:
        result.append(struct.pack("<B", VALUE_GAUGE))  # this code can send only gauge values
    for value in counters:
        result.append(struct.pack("<d", value))  # gauges are little-endian doubles
    return result


def message_start(when=None, host=socket.gethostname(), plugin_inst="", plugin_name="any", value_type=TYPE_NAME):
    return b"".join([
        pack(TYPE_HOST, host),
        pack(TYPE_TIME, when or time.time()),
        pack(TYPE_PLUGIN, plugin_name),
        pack(TYPE_PLUGIN_INSTANCE, plugin_inst),
        pack(TYPE_TYPE, value_type),
        pack(TYPE_INTERVAL, SEND_INTERVAL)
    ])


def create_message(counters, when=None, host=socket.gethostname(), plugin_inst="", plugin_name="any", type_name=TYPE_NAME):
    message = [message_start(when, host, plugin_inst, plugin_name, type_name)]
    message.extend(pack_counters(counters))
    return b"".join(message)

An example of building a UDP packet with two counters:

 create_message([working_set_value, peak_working_set_value], plugin_name='service_name', type_name='process_memory') 

On receiving such a packet, collectd creates the file at the following path, or appends your data to it if the file already exists:
/var/lib/collectd/rrd/hostname/service_name/process_memory.rrd
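The article builds the packet but does not show the send itself, so here is a minimal sketch of that step. It assumes collectd's network plugin is enabled and listening on its default UDP port 25826; the function name `send_packet` and the `127.0.0.1` address are placeholders for illustration.

```python
import socket

COLLECTD_PORT = 25826  # default port of collectd's network plugin


def send_packet(payload, host="127.0.0.1", port=COLLECTD_PORT):
    """Send one already-built binary packet to collectd over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        return sock.sendto(payload, (host, port))  # number of bytes sent
    finally:
        sock.close()
```

The `payload` argument would be the result of `create_message(...)` from the listing above. UDP gives no delivery confirmation, so a wrong address or a stopped daemon fails silently.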

Wireshark helps a lot with debugging: it understands the collectd protocol and can tell you what is wrong with a packet when the collectd log itself says nothing intelligible.
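When Wireshark is not at hand, a small decoder can serve as a quick sanity check before sending. The sketch below is not part of collectd; `decode_parts` is a hypothetical helper that walks the type/length headers of a packet and understands only the part types used in this article.

```python
import struct

STRING_TYPES = {0x0000: "host", 0x0002: "plugin", 0x0003: "plugin_instance",
                0x0004: "type", 0x0005: "type_instance"}
NUMERIC_TYPES = {0x0001: "time", 0x0007: "interval"}
TYPE_VALUES = 0x0006


def decode_parts(packet):
    """Return a list of (part_name, decoded_value) for a binary collectd packet."""
    parts = []
    offset = 0
    while offset + 4 <= len(packet):
        part_type, part_len = struct.unpack_from("!HH", packet, offset)
        if part_len < 4 or offset + part_len > len(packet):
            raise ValueError("bad part length %d at offset %d" % (part_len, offset))
        body = packet[offset + 4:offset + part_len]
        if part_type in STRING_TYPES:
            text = body.rstrip(b"\0").decode("utf-8")
            parts.append((STRING_TYPES[part_type], text))
        elif part_type in NUMERIC_TYPES:
            parts.append((NUMERIC_TYPES[part_type], struct.unpack("!q", body)[0]))
        elif part_type == TYPE_VALUES:
            (count,) = struct.unpack_from("!H", body, 0)
            codes = struct.unpack_from("%dB" % count, body, 2)
            values = []
            for i, code in enumerate(codes):
                # 1 = gauge (little-endian double), 2 = derive, 0 and 3 = unsigned
                fmt = "<d" if code == 1 else "!q" if code == 2 else "!Q"
                values.append(struct.unpack_from(fmt, body, 2 + count + 8 * i)[0])
            parts.append(("values", values))
        else:
            parts.append((hex(part_type), body))  # unknown part, kept raw
        offset += part_len
    return parts
```

Running it on the output of `create_message(...)` should list the host, time, plugin, type and interval parts followed by the values in the order they were passed.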

Building graphs


Collectd is a data collector and nothing more. To draw graphs, it needs a web front end.
The best partner for collectd that I managed to find is drraw.
It is a web front end for rrdtool and nothing more.

The main feature I personally like, and which the other web front ends lack, is flexible selection of graph data by regular expressions. Drraw will automatically find all the hosts, services, etc. and combine them on a single graph.

Screenshot of the settings (partial)



Sample graphs



UPD: a small bugfix in the code. The values must be sent in the same order in which they are defined in customtypes.db.
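One way to avoid mixing up the order is to keep the data source names in a single tuple and build the value list from it. This is only a sketch: `PROCESS_MEMORY_ORDER` and `ordered_values` are hypothetical names, and the tuple has to be kept in sync with customtypes.db by hand.

```python
# Must mirror the order of data sources in the process_memory type definition.
PROCESS_MEMORY_ORDER = ("working_set", "peak_working_set")


def ordered_values(ds_names, readings):
    """Return readings as a list in data-source order; KeyError if one is missing."""
    return [float(readings[name]) for name in ds_names]
```

The resulting list can be passed directly as the first argument of `create_message`.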

Source: https://habr.com/ru/post/139053/

