Hello, dear readers. In this article I will tell you about my little "bicycle" (a classic case of reinventing the wheel) that lets me monitor all sorts of things without leaving the console.
At some point I found myself with a lot of different projects and servers on my hands, and I never got around to setting up proper monitoring for them.
Besides, in the modern world "proper" monitoring implies deploying a whole pile of software and then configuring all of it. You know how it goes: Docker, the Elastic stack, and away we go. For me that was serious overkill. I wanted something that goes into production in a couple of steps.
I looked at Simple Monitor, written in Python; it was the closest to me in spirit, but it was missing quite a few features. At the same time I wanted to learn Go... well, you know yourself how these things usually start. So I took Go and hacked this little tool together.
Cli Monitoring is written in Go and is a collection of binaries, each of which reads data from stdin, performs one specific task and writes the result to stdout.
There are four types of binaries in total: metrics, processors, filters, and outputs.
Metrics, as the name implies, collect some kind of data and usually come first in the chain.
Processors sit in the middle and transform the data in some way or perform other utility functions.
Filters are almost like processors, except that they either pass the data through or not, depending on a condition.
Outputs sit at the end of the chain and are used to send notifications to various services.
The whole chain of commands usually looks like this:
some_metric | processor_1 | processor_2 ... | cm_p_message | output_1 | output_2 ...
Any piece of this chain can be replaced with any Linux command, as long as it reads data from stdin and writes to stdout without buffering. There is one small "but" related to how lines are passed along, but more on that later.
Binary names follow the pattern cm_{type}_{name}, where type is one of m, p, f, or o, and name is the name of the command.
For example, cm_m_cpu is a metric that prints CPU statistics to stdout in JSON format.
And cm_p_debounce is a processor that lets only one message through per given interval.
There is one special processor, cm_p_message, which should come right before the first output. It builds a message in the expected format for the outputs to process afterwards.
To handle JSON in the console and to check various conditions, I use the jq utility. It is something like sed, only for JSON.
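If you have never used jq before, here is a tiny taste of it using the exact filter from the first example below (plain jq, nothing from my toolkit):
echo '{"LoadAvg1": 2.0332031, "LoadAvg2": 1.9018555, "LoadAvg3": 1.8623047}' | jq 'if .LoadAvg1 > 1 then .LoadAvg1 else false end'
# prints 2.0332031, because LoadAvg1 is greater than 1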
Here, for example, is what monitoring the CPU load looks like:
cm_m_cpu | cm_p_eot2nl | jq -cM --unbuffered 'if .LoadAvg1 > 1 then .LoadAvg1 else false end' | cm_p_nl2eot | cm_f_regex -e '\d+' | cm_p_debounce -i 60 | cm_p_message -m 'Load average is {stdin}' | cm_o_telegram
And this is how you can monitor the number of messages in a RabbitMQ queue:
while true; do rabbitmqctl list_queues -p queue_name | grep -Po --line-buffered '\d+'; sleep 60; done | jq -cM '. > 10000' --unbuffered | cm_p_nl2eot | cm_f_true | cm_p_message -m 'There are more than 10000 tasks in rabbit queue' | cm_o_opsgenie
And this is how you can get an alert when nothing has been written to a file for 10 seconds:
tail -f out.log | cm_p_nl2eot | cm_p_watchdog -i 10 | cm_p_debounce -i 3600 | cm_p_message -m 'No write to out.log for 10 seconds' -s 'alert' | cm_o_telegram
Don't rush to close the tab; let's break down what happens in the first example.
1) The cm_m_cpu metric prints a JSON string once per second (the interval is set with the -i parameter, one second is the default). For example: {"LoadAvg1": 2.0332031, "LoadAvg2": 1.9018555, "LoadAvg3": 1.8623047}
2) cm_p_eot2nl is one of the utility commands; it converts the EOT character into LF (cm_p_nl2eot does the opposite). The thing is, to avoid any problems with line breaks I decided that all my binaries read input up to the ASCII EOT (End of Transmission) character. This makes it safe to pass multi-line data between commands.
That is why any third-party command in the chain should be wrapped like this:
cm_p_eot2nl | any other command | cm_p_nl2eot
3) Then comes the call to the jq utility, which checks the LoadAvg1 field: if it is greater than 1, it passes the value on; otherwise it outputs false.
4) Next we need to throw any false message out of the chain entirely. For that we use the cm_f_regex filter, which takes a string as input, matches it against a regular expression and, if it matches, passes it on. Otherwise the string is simply discarded.
Plain grep would work too, but firstly it buffers its output, so the full invocation gets a bit longer (grep --line-buffered), and secondly cm_f_regex makes it very easy to output capture groups. For example:
cm_f_regex -e '(\d+)-(\d+)' -o '{1}/{2}'
converts the string 123-345 into the string 123/345.
5) The cm_p_debounce processor then takes our LoadAvg1 value and passes it further down the chain at most once every 60 seconds, so that you don't spam yourself. You can set any other interval.
6) Almost everything is ready. All that is left is to build a message and send it to Telegram. The message is built by the special cm_p_message command. It simply takes a string as input, creates JSON with the fields Severity, Message and a few others, and passes it on for the outputs to process (there is a small sketch of what that looks like right after this walkthrough). If we had not passed the -m parameter, the message would have been the stdin itself, i.e. just the bare number, our LoadAvg1, which is not very informative.
7) The cm_o_telegram command simply sends the incoming message to Telegram. The Telegram settings are stored in an ini file.
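To make step 6 a bit more concrete, here is a rough sketch of what cm_p_message hands over to the outputs. Only the Severity and Message fields are described above, so the exact shape and any extra fields here are my assumption:
printf '2.03\004' | cm_p_message -m 'Load average is {stdin}' -s 'alert'
# produces something like {"Severity":"alert","Message":"Load average is 2.03", ...} terminated by EOT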
Configuration
All the parameters that the binaries accept can also be specified in an ini file. Parameters passed as command-line arguments take precedence over the ini file.
The ini file format is:
[global]
host_name=override host name for this machine
[telegram]
cid=....
token=....
[opsgenie]
apiToken=...
apiEndpoint=...
......
[debounce]
i=3600
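Since command-line arguments win over the ini file, an explicit flag in a chain overrides the section above. For example:
cm_p_debounce -i 60    # -i 60 takes precedence over i=3600 from the [debounce] section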
The ini file itself is selected in the following order:
1) The cm.config.ini file in the current working directory
2) The /etc/cm/config.ini file, if the file from item 1 is not found
Production
On a real server I create a file, say cpu.sh, which contains the whole chain of commands. Then I add something like this to cron:
*/5 * * * * flock -n /etc/cm/cpu.lock /etc/cm/cpu.sh > /dev/null
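For completeness, cpu.sh here is nothing more than the first pipeline from above saved as a script; the shebang and the /etc/cm path are simply how I happen to lay things out:
#!/bin/sh
# /etc/cm/cpu.sh -- the CPU load chain from the first example
cm_m_cpu | cm_p_eot2nl | jq -cM --unbuffered 'if .LoadAvg1 > 1 then .LoadAvg1 else false end' | cm_p_nl2eot | cm_f_regex -e '\d+' | cm_p_debounce -i 60 | cm_p_message -m 'Load average is {stdin}' | cm_o_telegram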
If anything dies, cron will simply start it again on the next run, while flock -n makes sure only one instance is running at a time. And that's it! Exactly the simplicity I was missing.
So that is the tool I ended up with; maybe someone else will find it handy. For me the convenience is that I don't have to do a pile of unnecessary things just to monitor the things I actually need. And it is all quite painless to set up: clone the repository, add the path to the binaries to $PATH, and that's it.
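In my case that boils down to a couple of lines like these (the clone location and directory layout are just an example; point PATH at wherever the binaries actually end up):
git clone <repository url> ~/cli-monitoring
export PATH="$PATH:$HOME/cli-monitoring"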
Please don't judge too harshly: the tool was written for myself and the set of commands is still small. But I will be glad to hear any feedback and feature requests. Thank you all for your attention.