
Rodent Hunt for Linux

There are plenty of monitoring tools for the operating system, but what really matters is catching a problem at the moment it happens and pinning down the cause of the high load or the source of the performance problem. I call this hunting resource "rodents".

For this I wrote a simple script, ratcatcher.sh, which you can adapt to your own systems and tasks.

The principle is simple: the script runs at a set interval and checks the load average (you can use other control parameters). If the configured threshold is exceeded, it runs a set of diagnostic commands and builds a report that is sent to your email address.
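The threshold check can be sketched as a small function. This is a minimal sketch, not the exact code of the script: the function name load_exceeds and its optional file argument are hypothetical, added only so the check can be tried against a sample file. It compares the integer part of the 5-minute load average (the second field of /proc/loadavg) with a limit.

```shell
# Hypothetical helper: succeeds (exit status 0) when the 5-minute load
# average exceeds the given limit.
#   $1 - limit (integer)
#   $2 - file in /proc/loadavg format (assumption: defaults to /proc/loadavg)
load_exceeds() {
    local limit="$1"
    local file="${2:-/proc/loadavg}"
    local la5
    # Fields are the 1-, 5- and 15-minute averages; keep the integer
    # part of the 5-minute value
    la5="$(cut -d' ' -f2 "$file" | cut -d. -f1)"
    [ "$la5" -gt "$limit" ]
}
```

It could be used as `load_exceeds 80 && echo "limit exceeded"`. Running the check at a set interval is typically left to a scheduler such as cron; the schedule and the install path of the script are up to you.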

Sample script for the OpenVZ server


#!/bin/bash
# Use the C locale so that command output has a predictable format
export LC_ALL=C

# Load average threshold that triggers a report.
# Typical values: 75-200 for an OpenVZ node, 15-45 for KVM
LALIMIT="80"

# Address that receives the report
EMAIL="alerts@.tld"

# Mail subject
SUBJECT="WARNING-High load notification"

# Integer part of the 5-minute load average.
# This is the second field of /proc/loadavg (the first field is the 1-minute value).
F5M="$(cut -d' ' -f2 /proc/loadavg | cut -d. -f1)"

# 1 if the threshold is exceeded, 0 otherwise
RESULT="$(echo "$F5M > $LALIMIT" | bc)"

# Send only one report per overload episode. When the threshold is
# exceeded, create /tmp/ratkill.flag; while the flag exists, exit
# without reporting. Once the load drops back below the threshold,
# remove /tmp/ratkill.flag so the next episode is reported again.
if (( RESULT == 1 )); then
    if [ -f /tmp/ratkill.flag ]; then
        exit 0
    fi
    touch /tmp/ratkill.flag
else
    if [ -f /tmp/ratkill.flag ]; then
        rm -f /tmp/ratkill.flag
    fi
    exit 0
fi

# Temporary file that collects the report
TEMPFILE="$(mktemp)"

# Report header
echo "Load average Crossed allowed limit $LALIMIT." >> "$TEMPFILE"
echo "Hostname: $(hostname)" >> "$TEMPFILE"
echo "Local Date & Time : $(date)" >> "$TEMPFILE"

# Memory usage
echo "Memory-----------------------------------" >> "$TEMPFILE"
free -m >> "$TEMPFILE"
echo "-------------------------------------------" >> "$TEMPFILE"
vmstat -s -Sm >> "$TEMPFILE"
echo "-------------------------------------------" >> "$TEMPFILE"

# Context switch rate
echo "context switches:" >> "$TEMPFILE"
sar -w 1 5 >> "$TEMPFILE"
echo "-------------------------------------------" >> "$TEMPFILE"

# Most "loaded" containers
echo "Top loaded containers:" >> "$TEMPFILE"
echo "-------------------------------------------" >> "$TEMPFILE"
/usr/sbin/vzlist \
    -o veid,ip,hostname,numproc,numfile,numflock,numtcpsock,physpages,laverage \
    -s laverage | tail -20 >> "$TEMPFILE"
echo "-------------------------------------------" >> "$TEMPFILE"

# Containers with the most network connections
echo "Top containers by net. connections count:" >> "$TEMPFILE"
echo "-------------------------------------------" >> "$TEMPFILE"
/usr/sbin/vzlist \
    -o veid,ip,hostname,numproc,numtcpsock -s numtcpsock | tail -20 >> "$TEMPFILE"
echo "-------------------------------------------" >> "$TEMPFILE"

# Number of tracked connections
echo "conntrack count" >> "$TEMPFILE"
wc -l /proc/net/nf_conntrack >> "$TEMPFILE"
echo "-------------------------------------------" >> "$TEMPFILE"

# Disk I/O statistics
echo "I/O statistic:" >> "$TEMPFILE"
echo "-------------------------------------------" >> "$TEMPFILE"
iostat -x 2 5 >> "$TEMPFILE"
echo "-------------------------------------------" >> "$TEMPFILE"

# Snapshot from top (-n 1 limits batch mode to a single iteration)
echo "System snapshot from top:" >> "$TEMPFILE"
echo "-------------------------------------------" >> "$TEMPFILE"
top -b -n 1 | head -30 >> "$TEMPFILE"
echo "-------------------------------------------" >> "$TEMPFILE"

# Top I/O and CPU consumers
echo "Report from dstat:" >> "$TEMPFILE"
echo "-------------------------------------------" >> "$TEMPFILE"
dstat --net --disk --disk-util --sys --load --proc --top-io-adv \
    --top-cpu-adv --nocolor 5 5 >> "$TEMPFILE"
echo "-------------------------------------------" >> "$TEMPFILE"

# RAID array status (pick the tool that matches your controller)
echo "RAID Logical device information" >> "$TEMPFILE"
#/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aAll >> "$TEMPFILE"
/usr/local/sbin/arcconf GETCONFIG 1 ld >> "$TEMPFILE"
echo "-------------------------------------------" >> "$TEMPFILE"

# Send the report by email.
# Note: -a attaches a file in heirloom mailx; other mail
# implementations may use a different option.
cat "$TEMPFILE" > /tmp/load.txt
echo "${SUBJECT}-${F5M}" | mail -a /tmp/load.txt -s "$(hostname -s)-${SUBJECT}-${F5M}" "$EMAIL"
rm -f "$TEMPFILE"

To tie the load to a specific guest, you can also map process PIDs to containers with vzpid, and much more, but you can add this yourself if necessary.
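As one possible starting point for that, the sketch below takes a list of PIDs and asks vzpid which container each one belongs to. It assumes that vzpid accepts several PIDs as arguments and prints the container for each; check this on your own node. The function name pids_to_containers and the VZPID override variable are hypothetical, introduced only for illustration.

```shell
# Assumption: vzpid is in PATH and accepts one or more PIDs as arguments.
# VZPID can be overridden to point at a different binary.
VZPID="${VZPID:-vzpid}"

# Hypothetical helper: read PIDs from stdin (one per line), drop anything
# that is not a number, and pass the rest to vzpid in a single call.
pids_to_containers() {
    grep -E '^[[:space:]]*[0-9]+[[:space:]]*$' | xargs -r "$VZPID"
}
```

For example, `ps -eo pid= --sort=-%cpu | head -n 10 | pids_to_containers >> "$TEMPFILE"` would append the containers of the ten most CPU-hungry processes to the report.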
For the script to work, you need to install the sysstat package (it provides sar and iostat) and dstat. Use the latest dstat version available for your distribution, otherwise you will not get the output you need.
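Before scheduling the script, it may help to check that the commands it calls are actually installed. The following is a minimal sketch; the function name missing_tools is hypothetical, and the list of commands to check is something you pass in yourself.

```shell
# Hypothetical helper: print the name of every given command that
# cannot be found in PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For the script above, a call such as `missing_tools sar iostat dstat bc mail` prints nothing when everything is in place.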

You should get something like this:

[image: example of the resulting report]


Source: https://habr.com/ru/post/274633/

