monit - an observer of system processes

Theory

Monit is a standalone daemon that runs as root. It runs on Linux, FreeBSD, NetBSD, OpenBSD, Sun Solaris and some other UNIX systems. It is an open source project with a "big brother", the commercial MMonit project. The latter offers broader functionality for mass monitoring, interconnection and reporting. The authors' idea is simple: use Monit for a single server and MMonit for a large server farm.

Depending on the settings, the daemon can check:

The existence of a process (by PID)
Whether a specific port is open (TCP / UDP)
The response to a specific protocol on a specific port (SMTP, SSH, HTTP ...)
Process resource usage (CPU time / RAM)
The MD5 checksum of a file
The size and free space of a file system
The number of used (and total) inodes
File or directory permissions

The various checks can be combined freely. Tests for a single object depend on each other: test1 is carried out first; if it passes without errors, test2 follows, then test3, and so on.

In case a test fails, monit can:

Stop, start or restart the daemon
Wait a certain time
Notify admin (by mail)
Mount, unmount or remount file system
Run a separate script (written beforehand by the administrator) and pass certain parameters to it (process name, error text, etc.)
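
The last action in the list can be sketched as follows. This is a minimal example, assuming a hypothetical script at /usr/local/bin/notify.sh; it reuses the sshd check shown later in this article, with exec in place of restart. As far as I know, monit hands the event details to the script through environment variables such as MONIT_SERVICE and MONIT_DESCRIPTION rather than through command-line arguments, so check the variable names against the documentation for your version.

```
# hypothetical example: run a custom script when the SSH check fails
check process sshd with pidfile /var/run/sshd.pid
        # /usr/local/bin/notify.sh is an assumed path, not something monit ships
        if failed port 22 protocol ssh then exec "/usr/local/bin/notify.sh"
```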

Actions can be combined as well, for example:
If httpd uses more than 200 megabytes of memory, wait a minute; if nothing has changed, restart the service; if that did not help either, wait five minutes. If that still did not help, stop the service and notify the admin by email.
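
The scenario above might be approximated in a control file roughly like this. It is a sketch under assumptions: the cycle length is taken to be 60 seconds, so "wait a minute" becomes "for 2 cycles", and the final step uses the timeout action, which stops monitoring the service and sends an alert rather than literally stopping the daemon. The pidfile and init script paths are the ones used for Apache later in this article.

```
# assumes a 60-second cycle, so one cycle is roughly one minute
set daemon 60

check process apache with pidfile /var/run/apache2.pid
        start program = "/etc/init.d/apache2 start"
        stop program = "/etc/init.d/apache2 stop"
        # over 200 MB on two consecutive checks (about a minute apart): restart
        if totalmem > 200.0 MB for 2 cycles then restart
        # restarts did not help: give up on the service and notify the administrator
        if 3 restarts within 5 cycles then timeout
```

The thresholds in both rules are illustrative; a longer wait is expressed by raising the number of cycles.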

One more thing: monit has its own built-in HTTP server. It should not be overused, since it runs with root privileges, but access to the web console can be extremely useful. The web server is covered separately later in this article.

Installation and Setup

Monit is packaged in almost all widespread distributions. In Debian, CentOS and SUSE the package is simply called monit. In FreeBSD it lives in the ports tree under sysutils/monit. It is installed in the standard way for each operating system, so I will not dwell on this.
Installation gives you the daemon itself (monit) and a configuration file that lives here:

 # Linux, Solaris:
  /etc/monit/monitrc
 # FreeBSD / OpenBSD / NetBSD
  /usr/local/etc/monitrc

The default config is documented in great detail and is worth reading: it contains detailed examples and a lot of interesting material in general. In principle, most of the default settings can be left untouched, with only the necessary changes made:

 # monit runs as a daemon with a check cycle of 120 seconds
 # the cycle length can be changed; it is monit's basic unit of time
 # once per cycle the checks are run and commands sent by the admin via the web interface are processed
 set daemon 120
 # mail servers used for notifications; several can be listed, and they are tried in the order given
 set mailserver mail.zooclub.ru port 10025,
     localhost
 # who will be notified
 set alert sysadmin@zooclub.ru

The checks that monit should perform can be stored in one or more separate files, which are pulled into the main config with the include statement:

 # one file
 include /etc/devel/monitcheck.monitconf
 # all files from a directory
 include /etc/stable/monit/*

I find it more convenient to keep the checks for each service in a separate file: this makes debugging easier and simplifies administration.

Monitor the status of the server as a whole:

   check system ws1.zooclub.ru
     if loadavg (1min) > 4 then alert
     if loadavg (5min) > 2 then alert
     if memory usage > 75% then alert
     if cpu usage (user) > 90% then alert
     if cpu usage (system) > 40% then alert
     if cpu usage (wait) > 20% then alert

File systems:

 # /etc/stable/monit/filesystem.conf

 # we check the device by its mount point.
 # You can check disks directly (/dev/hda), but with LVM and other logical "disks" that trick does not work:
 # they can be checked only by the mount point and nothing else.
 check device homefs with path /home
         start program = "/bin/mount /home"
         stop program = "/bin/umount /home"
         if failed permission 755 then alert
         if failed uid root then alert
 # if less than 20% of the space is free in at least five of the last 15 checks, raise an alert and do nothing else.
 # monit notifies the administrator by email about any action it takes.
         if space usage > 80% for 5 times within 15 cycles then alert
 # out of space: unmount the file system
         if space usage > 99% then stop
 # the same for inodes
         if inode usage > 80% then alert
         if inode usage > 99% then stop
         group server

 check device rootfs with path /
         start program = "/bin/mount /"
 # losing / while the server is running is a bleak prospect, so if things go bad, just remount it read-only
         stop program = "/bin/mount -o remount,ro /"
         if failed permission 755 then unmonitor
         if failed uid root then unmonitor
         if space usage > 80% for 5 times within 15 cycles then alert
         if space usage > 99% then stop
         if inode usage > 80% then alert
         if inode usage > 99% then stop
         group server

 check device bootfs with path /boot
         start program = "/bin/mount /boot"
         stop program = "/bin/mount -o remount,ro /boot"
 # this construct turns off monitoring of the file system if the directory permissions are not 755
         if failed permission 755 then unmonitor
         if failed uid root then unmonitor
         if space usage > 80% for 5 times within 15 cycles then alert
         if space usage > 99% then stop
         if inode usage > 80% then alert
         if inode usage > 99% then stop
         group server

Now let's check the Apache web server:

 # /etc/stable/monit/apache.conf
 # file check (checksum, access rights, etc.):
 check file apache_bin with path /usr/local/apache/bin/httpd
 # the sum is a standard md5 hash; you can get it by running md5sum on the file
         if failed checksum and
            expect the sum 8f7f419955cefa0b33a2ba316cba3659 then unmonitor
         if failed permission 755 then unmonitor
         if failed uid root then unmonitor
         if failed gid root then unmonitor
 # a separate message, to a separate address, with separate content
         alert security@zooclub.ru on {
                 checksum, permission, uid, gid, unmonitor
         } with the mail-format { subject: Alarm! }
         group server

 # a process check is performed via the pid file; the path to the pid file is always absolute
 check process apache with pidfile /var/run/apache2.pid
         start program = "/etc/init.d/apache2 start"
         stop program = "/etc/init.d/apache2 stop"
         if cpu > 60% for 2 cycles then alert
 # if the web server has used more than 80% of CPU time for five check cycles in a row, restart it
         if cpu > 80% for 5 cycles then restart
 # the same for the total memory it has consumed
         if totalmem > 500.0 MB for 5 cycles then restart
         if children > 250 then restart
 # if the server's 5-minute load average stays above 10 for 8 cycles in a row, shut the service down
         if loadavg (5min) greater than 10 for 8 cycles then stop
 # here is the most interesting part, a multi-stage check:
 # first step: connect to port 80 using the http protocol
         if failed host 127.0.0.1 port 80 protocol http
 # if that works, request the file /index.html
            and request "/index.html"
            with timeout 15 seconds
 # and if any link in the chain failed, restart the daemon
            then restart
 # HTTPS check. Monit treats SSL and the protected protocol as separate things.
 # To perform such checks, monit must be built with SSL support.
 # FreeBSD users, be careful when building!
 # By default the port is built with SSL support, but if you disable it, this check produces an error
         if failed port 443 type tcpssl protocol http
            and request "/test.html"
            with timeout 15 seconds
            then restart
 # if there were three or more restarts within the last five check cycles, stop monitoring the service (timeout)
         if 3 restarts within 5 cycles then timeout
 # these checks only make sense if the first check (access rights and so on) has passed;
 # otherwise all the tests are meaningless
         depends on apache_bin
         group server

OpenSSHD:

 check process sshd with pidfile /var/run/sshd.pid
         start program "/etc/init.d/ssh start"
         stop program "/etc/init.d/ssh stop"
         if failed port 22 protocol ssh then restart
         if 5 restarts within 5 cycles then timeout
         group server

OpenVPN. We check only the presence of the process:

 check process openvpn with pidfile /var/run/openvpn.link1.pid
    group system
    start program = "/etc/init.d/openvpn start"
    stop program = "/etc/init.d/openvpn stop"
    if 5 restarts within 5 cycles then timeout

PostgreSQL. We check availability through both the TCP port and the Unix socket:

 check process postgres with pidfile /var/run/postgresql/main.pid
         group database
         start program = "/etc/init.d/postgresql start"
         stop program = "/etc/init.d/postgresql stop"
         if failed unixsocket /var/run/postgresql/.s.PGSQL.5432 protocol pgsql then restart
         if failed host 127.0.0.1 port 5432 protocol pgsql then restart
         if 5 restarts within 5 cycles then timeout

An exhaustive list of protocols and check options can be found in the documentation. Admittedly, it is available only in English.
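
As an illustration of one more protocol from that list, a remote mail server could be watched with a check along these lines. This is only a sketch: it assumes the SMTP service on mail.zooclub.ru answers on the standard port 25 (the article mentions that host only as a relay on port 10025), and the service name mailhost is arbitrary.

```
# hypothetical check of a remote host over SMTP
check host mailhost with address mail.zooclub.ru
        # port 25 is an assumption; adjust it to wherever the server really listens
        if failed port 25 protocol smtp with timeout 15 seconds then alert
```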

Web interface

As I wrote in the introduction, monit has a small but quite useful web interface.
Example setup:

 # enable the web interface on a specific port
 set httpd port 10001 and
 # enable SSL
         ssl enable
 # where to find the pem file; needed for SSL, details below
         pemfile /etc/monit/monit.pem
 # which address (interface) to listen on;
 # if no address is given, monit listens on all of them
         use address 10.10.10.21
 # allow access only from certain addresses
 # strongly recommended!
         allow 10.10.10.22/32
         allow 10.10.12.0/24
 # allow access only to those who know the password;
 # the password, unfortunately, is stored in clear text
         allow senegami:aoLouch0aingahce
         allow logan:Jefae2Othaitae1S

Now about the pem file. The monit web server is quite primitive and needs the SSL certificate, its private key and the DH parameters in a single file. That file is what is called the pem file. It is prepared as follows. First, create a template for the certificate:

  ----- BEGIN: monit.cnf -----
 # create RSA certs - Server

 RANDFILE = ./openssl.rnd

 [req]
 default_bits = 1024
 encrypt_key = yes
 distinguished_name = req_dn
 x509_extensions = cert_type

 [req_dn]
 countryName = Country Name (2 letter code)
 countryName_default = RU

 stateOrProvinceName = State or Province Name (full name)
 stateOrProvinceName_default = NorthWest

 localityName = Locality Name (eg, city)
 localityName_default = Saint Petersburg

 organizationName = Organization Name (eg, company)
 organizationName_default = AnyOne LLC

 organizationalUnitName = Organizational Unit Name (eg, section)
 organizationalUnitName_default = Net

 commonName = Common Name (FQDN of your server)
 commonName_default = ws1.zooclub.ru

 emailAddress = Email Address
 emailAddress_default = security@zooclub.ru

 [cert_type]
 nsCertType = server
 ----- END: monit.cnf -----

Of course, replace the values with the ones you actually need.

Then generate the certificate from the template:

 # the key and the certificate are written to the same file
 openssl req -new -x509 -days 720 -nodes \
     -config ./monit.cnf -out /etc/monit/monit.pem \
     -keyout /etc/monit/monit.pem
 # generate the Diffie-Hellman parameters and append them to the same file
 openssl gendh 512 >> /etc/monit/monit.pem
 # check that the certificate is readable
 openssl x509 -subject -dates -fingerprint -noout -in /etc/monit/monit.pem
 # since the certificate's private key is in the file, tighten its permissions
 chmod 400 /etc/monit/monit.pem

Then restart monit and admire the result :)

Source: https://habr.com/ru/post/73506/
