
How I wrote my monitoring

I decided to share my story. It may even be useful to someone as a budget solution to a well-known problem.

When I was young and hot-headed and did not know where to put my energy, I decided to take on some freelance work. I quickly built up a rating and found a couple of regular customers who asked me to maintain their servers on an ongoing basis.

The first thing I thought about was the need for monitoring. I decided to do as smart people do: not reinvent the wheel, but look at ready-made options such as Munin or Zabbix. It immediately turned out, however, that their web interfaces require a good Internet connection, especially when opened for the first time from a phone. If you are relaxing in nature, away from the city, a stable connection is hard to come by. So I chose console monitoring.

For console monitoring, atop served me well, together with atopsar, the program for reading atop logs. Both have already been mentioned on Habr, and atop has even been covered in detail, but almost nothing has been said about atopsar.

Installation


Installation is very simple, only three commands.

#Centos

yum install atop 

# Debian / Ubuntu

 apt-get install atop 


Then you can tune the monitoring to your needs or keep the default settings.

# Debian / Ubuntu / Centos

 /etc/default/atop 

Standard file:

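 # cat /etc/default/atop
 INTERVAL=60                    # snapshot interval in seconds; the default is 10 minutes
 LOGPATH="/var/log/atop"        # directory where the logs are stored
 OUTFILE="$LOGPATH/daily.log"   # file the daemon writes to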

Add to autostart
# Debian / Ubuntu / Centos

 systemctl enable atop 

Run atop as a daemon
# Debian / Ubuntu / Centos

 systemctl start atop 

For the lazy, everything in one command
#Centos

 yum install atop && systemctl enable atop && systemctl start atop 

# Debian / Ubuntu

 apt-get install atop && systemctl enable atop && systemctl start atop 
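
After installation it may be worth confirming that the daemon is actually running and writing logs. The following is a minimal sketch, assuming systemd and the LOGPATH from the default config (/var/log/atop). The function name atop_status is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the atop service is active and
# list the contents of the log directory, newest first.
# Assumption: logs live in /var/log/atop, as in the default config.
atop_status() {
    local log_dir="${1:-/var/log/atop}"
    if command -v systemctl >/dev/null 2>&1; then
        # is-active prints "active", "inactive", "failed" and so on
        echo "service: $(systemctl is-active atop 2>/dev/null)"
    else
        echo "service: unknown (systemctl not found)"
    fi
    if [ -d "$log_dir" ]; then
        echo "logs:"
        ls -lt "$log_dir"
    else
        echo "logs: $log_dir does not exist"
    fi
}

atop_status
```

If the service line does not say "active", or the log directory stays empty after a few intervals, the daemon is not collecting data yet.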

Atopsar


atopsar is installed along with atop. It is a convenient console analyzer of the binary logs written by the atop daemon. You can of course read the logs with atop itself, but that is less convenient when you want to cover a long time interval.

A short primer on working with atopsar.

When you run atopsar without flags, it opens today's log and shows the load on each core separately, plus a combined line for all cores, including the idle percentage.

The flags that I use are:

-A = print all information from the log
-c = CPU load, the default flag
-m = RAM and swap load
-d = disk activity
-O = top 3 processes by CPU load
-G = top 3 processes by RAM load
-D = top 3 processes by disk load
-N = top 3 processes by network load
-r = path to the log you want to read, if you need to look at the load on previous days
-b = time from which to start the output
-e = time at which to end the output
-M = adds a column at the end that marks how critical the line is (+ means almost critical load, * means critical load)

Thanks to this monitoring, we can find the cause of a server's misbehavior at any point in time.
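
The flags listed above can be combined in one call. The sketch below builds a command that prints the CPU and memory reports with criticality markers for a two-hour window. It assumes the log path from OUTFILE in the config shown earlier; the wrapper name atopsar_window_cmd is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around atopsar: CPU and memory reports with
# criticality markers for a time window of one log file.
# Assumption: the log path matches OUTFILE from /etc/default/atop.
atopsar_window_cmd() {
    local log_file="$1" begin="$2" end="$3"
    # -r log file, -b/-e time window, -c CPU, -m memory, -M markers
    printf 'atopsar -r %s -b %s -e %s -c -m -M\n' "$log_file" "$begin" "$end"
}

cmd="$(atopsar_window_cmd /var/log/atop/daily.log 10:00 12:00)"
echo "$cmd"

# Execute only where atopsar is installed and the log is readable.
if command -v atopsar >/dev/null 2>&1 && [ -r /var/log/atop/daily.log ]; then
    $cmd
fi
```

Narrowing the window with -b and -e keeps the output short enough to read on a phone screen.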

Notifications


So, load monitoring is in place, but on its own it does not let me find and fix problems quickly. I need notifications about problems.

I am the only one watching the servers, so notifications must arrive somewhere I will always see them and can react at least somehow.

In the beginning there was SMS: fast, reliable, free. But then the mobile operators shut down free SMS sending through their gateways.
Email: slow, and there may be delivery problems.
Messengers: you have to install one on the phone and create a bot.

After some searching, I chose the Telegram messenger for its simplicity and its convenient phone and desktop apps.

I created my bot using BotFather.
Then I put several scripts on the server that track the server load (idle, smartctl, and so on), errors such as "oom killer", failures during backup creation, and other operations that need to be watched.
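
All of these scripts share one step: sending text to the chat. Below is a minimal sketch of such a helper, assuming curl is installed. The names BOT_TOKEN, CHAT_ID, telegram_url and send_telegram are illustrative, not taken from the author's scripts. The sendMessage method and the /bot<token>/ path are part of the Telegram Bot API.

```shell
#!/usr/bin/env bash
# Hypothetical notification helper shared by the check scripts.
# Assumptions: curl is installed; BOT_TOKEN has the form <bot_id>:<bot_key>
# as issued by BotFather; CHAT_ID is the chat to notify.
BOT_TOKEN="${BOT_TOKEN:-123456:replace-me}"
CHAT_ID="${CHAT_ID:-0}"

telegram_url() {
    # Bot API methods are called as /bot<token>/<method>
    printf 'https://api.telegram.org/bot%s/sendMessage' "$BOT_TOKEN"
}

send_telegram() {
    local text="$1"
    # --data-urlencode escapes spaces, newlines, % and & in the text;
    # --max-time keeps a check script from hanging on a bad connection.
    curl --silent --show-error --max-time 20 --output /dev/null \
        --data-urlencode "chat_id=$CHAT_ID" \
        --data-urlencode "text=$text" \
        "$(telegram_url)"
}

# Usage (not executed here, since it needs a real token):
# send_telegram "$(hostname): backup failed"
```

URL-encoding matters because command output such as a line from top contains percent signs and spaces that would otherwise corrupt the request.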

The scripts are fairly simple and written in bash. For example, this one checks LA and notifies me when the Load Average exceeds the number of cores on the server.

 if [ "${LA[0]}" -gt 2000 ] || [ "${LA[1]}" -gt 3000 ] || [ "${LA[2]}" -gt 4000 ]
 then
     # Bot API path: /bot<token>/sendMessage, where <token> is <bot_id>:<bot_key>
     api="https://api.telegram.org/bot$bot_id:$bot_key/sendMessage"
     wget -O /dev/null "$api?chat_id=$chat_id&text=Server $ip LA $LAd"
     wget -O /dev/null "$api?chat_id=$chat_id&text=$(top -b -n 1 | grep Cpu)"
     wget -O /dev/null "$api?chat_id=$chat_id&text=Top 5 processes: $(top -b -n 1 | grep -A 5 'PID USER' | tail -5)"
 fi
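
The fragment above does not show where LA and LAd come from. The thresholds suggest that LA holds the load averages multiplied by 1000, since the -gt test compares integers only. That is an assumption, and the sketch below shows one way the values could be prepared from /proc/loadavg. The function names scale_la and load_la are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical preparation step for the load average check.
# Assumption: LA holds the 1-, 5- and 15-minute load averages multiplied
# by 1000, because [ -gt ] compares integers only.
scale_la() {
    # "1.25" -> 1250; awk does the floating-point arithmetic
    awk -v v="$1" 'BEGIN { printf "%d", v * 1000 + 0.5 }'
}

load_la() {
    local src="${1:-/proc/loadavg}" la1 la5 la15 rest
    # The file starts with the three load averages, e.g. "0.42 1.25 2.00 ..."
    read -r la1 la5 la15 rest < "$src"
    LA=("$(scale_la "$la1")" "$(scale_la "$la5")" "$(scale_la "$la15")")
    LAd="$la1 $la5 $la15"   # human-readable form for the message text
}

if [ -r /proc/loadavg ]; then
    load_la
    echo "scaled: ${LA[*]} (raw: $LAd)"
fi
```

With this scaling, a threshold of 2000 corresponds to a load average of 2.0, which matches a two-core server.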

The simple syntax allows for many use cases, and anyone who knows a little programming can write or extend such scripts.
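
As one more use case in the same spirit, here is a sketch of an "oom killer" check. It assumes the kernel reports such events with the phrases "Out of memory" or "invoked oom-killer". The function name find_oom_lines and the sample lines are made up for illustration and are not real log output.

```shell
#!/usr/bin/env bash
# Hypothetical check: extract OOM-killer events from kernel log text.
# Assumption: the kernel reports them with "Out of memory" or
# "invoked oom-killer".
find_oom_lines() {
    # Reads log text on stdin, prints matching lines; no match is not an error.
    grep -E -i 'out of memory|invoked oom-killer' || true
}

# Typical sources of kernel messages (either may need root):
#   dmesg | find_oom_lines
#   journalctl -k --since "10 minutes ago" | find_oom_lines

# Made-up sample lines, used here only to demonstrate the filter.
sample='[1.0] eth0: link up
[2.0] mysqld invoked oom-killer: gfp_mask=0x100cca, order=0
[2.1] Out of memory: Killed process 4242 (mysqld)'

events="$(printf '%s\n' "$sample" | find_oom_lines)"
if [ -n "$events" ]; then
    # In a real script this text would go into the Telegram notification.
    echo "OOM events found:"
    echo "$events"
fi
```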

The only caveat is that if the server is located in Russia (and has no IPv6), you need to use a proxy. To do this, set the proxy connection string at the beginning of the script:

 export https_proxy=http://LOGIN:PASSWORD@IP.ADDRESS:PORT

This is not the end


You are hiking peacefully in the mountains with a backpack on your back, taking a break from civilization, and then the phone, having briefly caught a signal, delivers a notification about a problem on your server. What to do? The serene mood is gone with the wind. Call your wife and dictate commands? Haha.

I urgently needed a way to fix problems quickly and without a good Internet connection. Here the messenger saved me again (#telegrammzhivi). I taught my bot to talk only to me and ignore everyone else. Now, along with the problem notification, I receive some extra data that tells me what the source of the problem is, so I can try to solve it remotely. It is enough to write a message to the bot and hold the phone up higher so that the message goes out, and voila: the bot is off doing my work. This way I can, for example, kill a misbehaving process, restart a daemon, block an IP, and so on.
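
The "talk only to me" rule and the remote actions can be sketched as a small dispatcher. This is an illustration under assumptions, not the author's bot. OWNER_CHAT_ID, the command names and the actions are invented, and the actions are only printed rather than executed. Incoming messages could be fetched with the Bot API getUpdates method, which is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical command dispatcher for the bot.
# Assumptions: OWNER_CHAT_ID is your own chat id; the command names and the
# actions behind them are examples, not the author's actual set.
OWNER_CHAT_ID="${OWNER_CHAT_ID:-111111}"

handle_message() {
    local chat_id="$1" text="$2" pid
    # Ignore everyone except the owner.
    if [ "$chat_id" != "$OWNER_CHAT_ID" ]; then
        echo "ignored"
        return 0
    fi
    # Only whitelisted commands are accepted; raw input is never executed.
    case "$text" in
        "/uptime")        uptime ;;
        "/restart nginx") echo "would run: systemctl restart nginx" ;;
        "/kill "*)
            pid="${text#/kill }"
            case "$pid" in
                ''|*[!0-9]*) echo "bad pid" ;;
                *)           echo "would run: kill $pid" ;;
            esac ;;
        *) echo "unknown command" ;;
    esac
}

handle_message 222222 "/kill 1"                    # prints: ignored
handle_message "$OWNER_CHAT_ID" "/kill 4242"       # prints: would run: kill 4242
handle_message "$OWNER_CHAT_ID" "/kill 1; reboot"  # prints: bad pid
```

Matching against a fixed list, instead of passing the message text to a shell, means that even a message that slips past the chat id check cannot run arbitrary commands.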

I also moved commonly needed client requests here, for example urgent password resets for users (for the "Aaaa, we can't get to the server, we are losing millions!" cases), finding out which user has access to a given folder, turning a site on and off, and so on. Of course, I keep refining the bot, since customers' imagination sometimes throws up requests I had not anticipated. But the main ones are covered.

There is also a version for VK, but it somehow did not catch on.

Now I travel and explore the world calmly, without fear that something will break and I will be unable to find out about it or fix it.

Source: https://habr.com/ru/post/453430/

