Nagios: a monitoring system and some homemade plugins

When I was choosing a monitoring system, I compared Cacti, Nagios and Zabbix, and chose Nagios. My monitoring setup is now about a decade old and has accumulated a number of self-written solutions; some of them can be found on the Internet, some cannot. So I decided to collect everything in one place and let it sit here. If you have been using Nagios for many years yourself, you are unlikely to find any revelations, but some of this may still be worth adding to your collection. Beginners may find it useful as well.

So let's go. Two caveats first. First, I describe monitoring from the Nagios side, so what to configure on the monitored servers, and how, is mentioned only in general terms. Second, I do not like MIB names and try to use numeric OIDs instead. If you search for them in Google, you will find both their names and the neighboring MIBs. With SNMP monitoring, knowing the right OID is two thirds of the job.

As the programming language here I prefer Perl: it is easier to debug and to move between platforms.

I will certainly give examples of plugins (otherwise why start this article), possibly including standard ones, because I no longer remember what came in the stock set and what was gathered from all over and then tweaked. I only remember what I wrote from scratch myself: those carry my copyright.

To explore the OID tree yourself, I recommend the standard snmpwalk utility. My Nagios is (a) version 3.x and (b) installed on FreeBSD, so the paths will often be typical for FreeBSD and atypical for Linux.

Monitoring Windows servers

We use the plain built-in SNMP service (available in Windows servers since Windows 2000). It is not installed by default: you need to add it, then configure the community name (the SNMP password) and the IP addresses allowed to query the service (by default the community is public and only the local IP is allowed). Descriptions of the Windows MIBs are easy to find on the Internet.

The standard check_disk_snmp.pl plugin lets you monitor disks by name. This matters because the order of disks in the SNMP tree can change after a reboot: a server that is rebooted once or twice a year can acquire a layer of "external" (Fibre Channel or iSCSI) disks in the meantime. Their letters survive the reboot, but their order in the SNMP tree is not guaranteed to. The plugin can also monitor memory: free, used, swap.
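
The idea of addressing a disk by name rather than by its position in the tree can be sketched in shell. This is only an illustration, not how check_disk_snmp.pl is implemented: find_storage_index is a hypothetical helper, and it assumes snmpwalk prints hrStorageDescr entries in the usual "OID.index = STRING: description" form.

```shell
#!/bin/sh
# Hypothetical helper: read `snmpwalk ... hrStorageDescr` output on stdin and
# print the index of the entry whose description is the given name, or starts
# with it followed by a space or a backslash (as in "D:\ Label:...").
find_storage_index() {
    # $1 = disk name, e.g. "D:" or "/var"
    awk -v name="$1" '
        {
            # The index is the number after the last dot of the OID.
            n = split($1, parts, ".")
            idx = parts[n]
            # The description is everything after "STRING: ".
            pos = index($0, "STRING: ")
            if (pos == 0) next
            descr = substr($0, pos + 8)
            rest = substr(descr, length(name) + 1, 1)
            if (index(descr, name) == 1 && (rest == "" || rest == " " || rest == "\\")) {
                print idx
                exit
            }
        }'
}
```

It could be fed from something like snmpwalk -v 2c -c public 10.0.9.5 HOST-RESOURCES-MIB::hrStorageDescr (the host and community here are placeholders); the index it prints is then used to query hrStorageSize and hrStorageUsed for that same entry, wherever it happens to sit after the next reboot.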

The standard check_snmp_load.pl plugin monitors CPU load on the server, while the standard check_tcp and check_udp plugins show the availability of network ports. What else is a server for, if not serving network requests!

A description of the standard OIDs that Windows responds to is available online. It covers CPU, RAM and storage devices (including their type: CDROM, floppy, HDD), as well as running processes and installed programs.

Monitoring Unix servers

Here, too, everything is simple. Install the net-snmp package on the server and configure snmpd.conf. Recent versions offer hell and chaos there; being a conservative, I prefer the good old SNMP v2 scheme

rocommunity public 127.0.0.1
rocommunity vasik 10.0.9.1

without any newfangled horrors, but that is a matter of taste. Restart snmpd and monitor to your heart's content.

The check_disk_snmp.pl plugin mentioned above can monitor Unix servers too. There is also an alternative, the check_snmp_storage.pl plugin. For historical reasons my Windows servers are monitored via check_disk_snmp.pl and my Unix servers via check_snmp_storage.pl. It uses the same OID branch 25 (.1.3.6.1.2.1.25, host resources) and also lets you monitor disk partitions by name (mount point). Nobody except the server's own admin cares what exactly is attached to /data, /var, /opt or /mnt/disk0101019084. What matters is how much space there is, how much is used and how much is free.
The check_snmp_load.pl plugin mentioned above can monitor CPU on Unix servers, and check_tcp and check_udp the availability of network ports.
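
For the disk checks, the arithmetic behind "how much is used" is simple enough to sketch. The function below is a hypothetical helper, not code from either plugin; it assumes you already have the hrStorageUsed and hrStorageSize values for one entry. Both are counted in allocation units, so the unit size cancels out of the percentage.

```shell
#!/bin/sh
# Hypothetical helper: percentage of a storage entry in use.
# $1 = hrStorageUsed, $2 = hrStorageSize (both in allocation units).
# Prints "U" and returns 1 when the size is zero or negative.
storage_percent() {
    awk -v used="$1" -v size="$2" 'BEGIN {
        if (size <= 0) { print "U"; exit 1 }
        printf "%.2f\n", used * 100 / size
    }'
}
```

For example, storage_percent 50 200 prints 25.00, which is the number a plugin would then compare with its warning and critical thresholds.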

In addition, Nagios has the useful check_by_ssh plugin. It opens an ssh connection to the host and runs the specified program there. The program must answer in the Nagios format (exit code 0 for OK, 1 for warning, 2 for critical, 3 for unknown) and can perform whatever checks are acceptable to you (and to the admin of that server).
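
A program run through check_by_ssh only has to print one status line and exit with the right code. Below is a minimal sketch of that convention; nagios_threshold is a hypothetical name, and the label and thresholds are whatever your own check needs.

```shell
#!/bin/sh
# Hypothetical helper: compare a measured value with thresholds, print a
# status line and return the exit code Nagios expects
# (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
nagios_threshold() {
    # $1 = label, $2 = value, $3 = warning threshold, $4 = critical threshold
    case "$2" in
        ''|*[!0-9]*) echo "UNKNOWN: $1 returned '$2'"; return 3 ;;
    esac
    if [ "$2" -ge "$4" ]; then echo "CRITICAL: $1 is $2"; return 2; fi
    if [ "$2" -ge "$3" ]; then echo "WARNING: $1 is $2"; return 1; fi
    echo "OK: $1 is $2"
    return 0
}
```

A remote check script would end with a call such as nagios_threshold users "$(who | wc -l | tr -d ' ')" 20 50 followed by exit $?, so that the exit code reaches check_by_ssh unchanged.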

On mail servers it is useful to monitor the state of the mail queue. You could use check_by_ssh for this, but for historical reasons I use an SNMP extension (as a reminder, my monitoring system is old and overgrown with barnacles, but that is a good thing: I can show different ways to get the same result with live examples). The advantage of the "no ssh" approach is obvious: the monitoring server has no way to log in to the monitored server over ssh, so it does not open a security hole.

So, let's extend SNMP. On the monitored server, add a line of the form extend mailq /root/getmailq.sh to snmpd.conf, where extend is the directive, mailq is the name of the branch and /root/getmailq.sh is the command to execute. The script itself is a single line:

 ls -al /var/spool/mqueue | wc -l

(These particular servers run FreeBSD; on Linux or other Unix systems the queue location may differ.)

On the monitoring server, write a script (bash, for a change):

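    #!/usr/bin/bash
    # (C) by Smithson Inc, 2006

    SNMP=/usr/bin/snmpget
    HOST=$1
    PASS=$2
    WARN=$3
    CRIT=$4

    if [ -z "$HOST" ]; then
        echo "Usage: $0 HOST [SNMPCOMMUNITY [WARN [CRIT]]]"
        exit 3
    fi
    # Defaults: community "public", warning at 1000, critical at 5000
    if [ -z "$PASS" ]; then PASS=public; fi
    if [ -z "$WARN" ]; then WARN=1000; fi
    if [ -z "$CRIT" ]; then CRIT=5000; fi

    # Read the first output line of the "mailq" extend branch
    Q=`$SNMP -v 2c -c "$PASS" "$HOST" 'NET-SNMP-EXTEND-MIB::nsExtendOutLine."mailq".1' | awk '{ print $4 }'`

    # No numeric answer (timeout, wrong community): report UNKNOWN
    case "$Q" in
        ''|*[!0-9]*) echo "UNKNOWN: no mailq value from $HOST"; exit 3 ;;
    esac

    if [ "$Q" -ge "$CRIT" ]; then
        echo "CRITICAL: mailq is $Q"
        exit 2
    fi
    if [ "$Q" -ge "$WARN" ]; then
        echo "WARNING: mailq is $Q"
        exit 1
    fi
    echo "OK: mailq is $Q"
    exit 0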

The magic is in the request NET-SNMP-EXTEND-MIB::nsExtendOutLine."mailq".1: this is how we read what the script on the monitored server gives us.
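
Picking the value out of the reply by field position (awk's $4) works as long as the reply has exactly the expected shape. As an alternative sketch, one can key on the "= STRING:" marker instead; snmp_string_value is a hypothetical helper, and it assumes the usual snmpget output form "OID = STRING: value".

```shell
#!/bin/sh
# Hypothetical helper: print the value part of snmpget reply lines read
# from stdin, e.g.
#   NET-SNMP-EXTEND-MIB::nsExtendOutLine."mailq".1 = STRING:       42
# Lines without a "= STRING:" part (errors, other types) produce no output.
snmp_string_value() {
    sed -n 's/^.* = STRING: *//p'
}
```

An empty result then means "no usable answer", which the calling script can map to the UNKNOWN exit code.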

And the standard command definition:

    define command{
        command_name    check-snmp-mailq
        command_line    /path-to-scripts/snmp_extend_mailq.sh $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$
    }

This is a good alternative to check_by_ssh, with one caveat: snmpd on the server runs as root by default, whereas check_by_ssh can run as another, less privileged user. You decide.

Now about DNS. Very often DNS monitoring boils down to a plain check_udp!53. That is all very well, but uninformative. The server may be up but not resolving names. The server may be up and authoritative for your names while your domain registration has lapsed. A port availability check shows none of this. Hence a couple of scripts that check DNS.

The first script watches your domain(s) so that you do not miss the renewal date. Even if your provider renews the domain automatically, it never hurts to verify.

check-domaintime.pl

    #!/usr/bin/perl
    #
    # (C) Smithson Inc, 2013
    #
    use HTTP::Date;

    my $domain = $ARGV[0];
    if (!(defined($domain))) {
        print "Usage: $0 domain.name\n\n";
        exit(3);    # UNKNOWN
    }

    # "paid-till" is the expiry field in .ru whois answers
    $DD = `whois $domain | grep paid-till`;
    $DAY = 86400;
    $WARNING  = $ARGV[1] ? $ARGV[1] : 36;
    $CRITICAL = $ARGV[2] ? $ARGV[2] : 10;

    if ($DD =~ /(\d\d\d\d\.\d\d\.\d\d)/) {
        my $dx = str2timestamp($1);
        my $dz = time();
        print "Whois $domain end date: $1\n";
        if ($dz > ($dx - ($CRITICAL * $DAY))) { exit(2); }
        if ($dz > ($dx - ($WARNING * $DAY)))  { exit(1); }
        exit(0);
    } else {
        print "Error whois answer: $DD\n";
        exit(3);    # UNKNOWN
    }

    # Convert "YYYY.MM.DD" to a Unix timestamp
    sub str2timestamp {
        my $time = shift;
        $time =~ s/(\d+)\.(\d+)\.(\d+)/$1-$2-$3/;
        my $timenix = str2time($time);
        return $timenix;
    }

Usage is trivial:

 ./check-domaintime.pl smithson.ru [warning [critical]]

By default, 36 days before the end of the registration period is a warning and 10 days is critical.

The second script tests the DNS server itself:
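
The threshold logic of this check, taken apart from whois, can be sketched as follows. domain_days_status is a hypothetical helper that takes both timestamps as arguments (so it can be tried without a network); the defaults of 36 and 10 days are the ones used by the script above.

```shell
#!/bin/sh
# Hypothetical helper: classify a domain by the whole days left until expiry.
# $1 = expiry time, $2 = current time (both Unix epoch seconds)
# $3 = warning days (default 36), $4 = critical days (default 10)
# Prints the number of days left; returns 0 (OK), 1 (WARNING) or 2 (CRITICAL).
domain_days_status() {
    _left=$(( ($1 - $2) / 86400 ))
    echo "$_left"
    if [ "$_left" -lt "${4:-10}" ]; then return 2; fi
    if [ "$_left" -lt "${3:-36}" ]; then return 1; fi
    return 0
}
```

In real use the second argument would be $(date +%s), and the first would come from the paid-till date converted to epoch seconds.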

check_dns.pl

    #!/usr/local/bin/perl
    #
    # (C) Smithson Inc, 2015
    #
    #use strict;
    use lib "/usr/local/libexec/nagios";
    use utils qw($TIMEOUT %ERRORS &print_revision &support);
    use vars qw($PROGNAME);
    use Getopt::Long;
    use Time::gmtime;
    use vars qw($opt_V $opt_h $verbose $opt_w $opt_c $opt_H $volname $opt_mode $mode);

    $PROGNAME = `basename $0`;
    chomp $PROGNAME;

    Getopt::Long::Configure('bundling');
    GetOptions(
        "V"   => \$opt_V,    "version"    => \$opt_V,
        "h"   => \$opt_h,    "help"       => \$opt_h,
        "a=s" => \$opt_addr, "addr=s"     => \$opt_addr,
        "H=s" => \$opt_H,    "hostname=s" => \$opt_H);

    if ($opt_V) {
        print_revision($PROGNAME, '');
        exit $ERRORS{'OK'};
    }
    if ($opt_h) {
        print_help();
        exit $ERRORS{'OK'};
    }

    # The DNS server to query: an IP address or a host name
    $opt_H = shift unless ($opt_H);
    my $host = $1 if ($opt_H =~ m/^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+|[a-zA-Z][-a-zA-Z0-9]+(\.[a-zA-Z][-a-zA-Z0-9]+)*)$/);
    if (!(defined($host))) {
        print_usage();
        # utils.pm has no 'ERROR' key in %ERRORS, so use UNKNOWN
        exit $ERRORS{'UNKNOWN'};
    }

    # The name to resolve
    my $addr = 0;
    ($opt_addr) || ($opt_addr = shift) || ($opt_addr = 'www.slavneft.ru');
    $addr = $opt_addr;

    $code = $ERRORS{'OK'};
    my $look = '/usr/local/bin/nslookup';
    my $n = getdns($host, $addr);
    print "$n\n";
    exit($code);

    # ================================================================
    # Run nslookup and set $code according to its answer
    sub getdns {
        my $ip   = shift;    # DNS server to query
        my $addr = shift;    # name to resolve
        my $ret  = '';
        my $out  = `$look $addr $ip`;
        foreach my $i (split(/\n/, $out)) {
            if ($i =~ /server can.t find/)           { $code = $ERRORS{'WARNING'}; }
            if ($i =~ /no servers could be reached/) { $code = $ERRORS{'CRITICAL'}; }
            $ret = $ret . ' ' . $i;
        }
        return $ret;
    }

    # ================================================================
    sub print_usage () {
        print "Usage: $PROGNAME -H <host> [-a] address\n";
    }

    # ================================================================
    sub print_help () {
        print_revision($PROGNAME, '');
        print "Copyright (c) Smithson Inc, 2015 \n";
        print "\n";
        print_usage();
        print "\n";
        print "<address> Address for conversion to ip \n\n";
        support();
    }
    # ================================================================

This script can be used in two ways. The first is to check, through your own server, that the addresses you need resolve and that DNS works in general. For example:

    check-dns.pl <ip-of-your-server> google.com

If your server resolves google.com, DNS works on it. If there is no Internet access, resolving will fail, but you will see that from other checks (ping to the provider's gateway, ping to the same 8.8.8.8).

You can also check that the internal names you need resolve (for example, AD gets unwell when its own DNS does not know the names).

The second way is to check, through a DNS server that is guaranteed to work, those of your names that must be reachable from the Internet.

 check-dns.pl 8.8.8.8 smithson.ru

If there is an answer, your Internet names are reachable (provided your monitoring system itself has Internet access at that moment).
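
How the script turns nslookup output into a status can be shown in isolation. The sketch below is a hypothetical helper, not part of check_dns.pl; it looks for the same two phrases in the output, and mapping "no servers could be reached" to CRITICAL is an assumption made here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read nslookup output on stdin and return a Nagios code:
# 2 if no server answered, 1 if the name was not found, 0 otherwise.
classify_nslookup() {
    _out=$(cat)
    case "$_out" in
        *"no servers could be reached"*) return 2 ;;
        *"server can't find"*)           return 1 ;;
    esac
    return 0
}
```

It would be used as nslookup smithson.ru 8.8.8.8 2>&1 | classify_nslookup, with the exit status passed on to Nagios.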

Netware monitoring

Yes, yes, I know, necrophilia, yuck; but even today, among the roughly 200 servers in my system, there are 2 (two!) running Netware. Both sit at remote locations, were set up ages ago and have been working, working and working ever since. One of them currently has an uptime of 834 days. But I digress. So: monitoring.

The latest Netware, 6.5.8, has SNMP. Honestly, I don't know, I haven't tried it. For Netware from version 4.11 onward there is the mrtgext.nlm program, which lets you monitor a whole range of server parameters. It is usually used to draw server statistics via mrtg or rrdtool, but it is quite suitable for monitoring via Nagios too. Besides, one of my two Netware servers runs version 5.1 (hussars, keep quiet!).

mrtgext listens on TCP port 9999, so do not forget to monitor that port as well. Since Netware is primarily a file server, what interests us most is what happens to the volumes. For that there is a script:

check_nwvolsize

    #! /usr/local/bin/perl
    #
    # (C) Smithson Inc
    #
    # use strict;
    use lib "/usr/local/libexec/nagios";
    use utils qw($TIMEOUT %ERRORS &print_revision &support);
    use vars qw($PROGNAME);
    use Getopt::Long;
    use vars qw($opt_V $opt_h $verbose $opt_w $opt_c $opt_H $volname $opt_prefix $prefix);

    $PROGNAME = `basename $0`;
    chomp $PROGNAME;

    Getopt::Long::Configure('bundling');
    GetOptions(
        "V"   => \$opt_V,      "version"    => \$opt_V,
        "h"   => \$opt_h,      "help"       => \$opt_h,
        "v=s" => \$volname,    "volname=s"  => \$volname,
        "w=s" => \$opt_w,      "warning=s"  => \$opt_w,
        "c=s" => \$opt_c,      "critical=s" => \$opt_c,
        "p=s" => \$opt_prefix, "prefix=s"   => \$opt_prefix,
        "H=s" => \$opt_H,      "hostname=s" => \$opt_H);

    if ($opt_V) {
        print_revision($PROGNAME, '');
        exit $ERRORS{'OK'};
    }
    if ($opt_h) {
        print_help();
        exit $ERRORS{'OK'};
    }

    $opt_H = shift unless ($opt_H);
    my $host = $1 if ($opt_H =~ m/^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+|[a-zA-Z][-a-zA-Z0-9]+(\.[a-zA-Z][-a-zA-Z0-9]+)*)$/);
    if (!(defined($host))) {
        print_usage();
        # utils.pm has no 'ERROR' key in %ERRORS, so use UNKNOWN
        exit $ERRORS{'UNKNOWN'};
    }

    ($opt_c) || ($opt_c = shift) || ($opt_c = 92);
    my $critical = $1 if ($opt_c =~ /([0-9]+)/);
    ($opt_w) || ($opt_w = shift) || ($opt_w = 80);
    my $warning = $1 if ($opt_w =~ /([0-9]+)/);
    ($volname) || ($volname = shift) || ($volname = 'SYS');
    my $vname = $volname;
    my $ppp = $opt_prefix;

    # Ask nwstat.pl for "used" and "size", e.g. VKUSYS VKSSYS with prefix K
    my $q = "/usr/local/sbin/nwstat.pl $host V".$ppp."U$vname V".$ppp."S$vname";
    #print "$host, $vname, $critical, $warning ($ppp = $opt_prefix) \n";
    #print "$q \n";
    my $res = `$q`;
    my @aar = split(/\n/, $res);
    my $size = $aar[1];
    my $used = $aar[0];
    if (($size < 1) || ($used < 0)) { exit $ERRORS{'UNKNOWN'}; }
    #print "Size: $size, used: $used\n";

    my $percent = ($used / $size) * 100;
    if ($percent > 100) { exit $ERRORS{'UNKNOWN'}; }
    printf "Used: %.2f%%\n", $percent;
    if ($percent > $critical) { exit $ERRORS{'CRITICAL'}; }    # Critical!
    if ($percent > $warning)  { exit $ERRORS{'WARNING'}; }     # Warning!
    exit $ERRORS{'OK'};                                        # Okay

    sub print_usage () {
        print "Usage: $PROGNAME -H <host> [-v <volumename>] [-w <warn>] [-c <crit>] [-p <prefix>]\n";
    }

    sub print_help () {
        print_revision($PROGNAME, '');
        print "Copyright (c) Smithson Inc, 2011\n";
        print "\n";
        print_usage();
        print "\n";
        print "<warn> = Percent used at which a warning message will be generated.\n";
        print "<crit> = Percent used at which a critical message will be generated.\n";
        print "<prefix> = Unit prefix for nwstat: size, free and used are reported in KBytes (value K) or in bytes (no value).\n\n";
        support();
    }

It in turn uses the nwstat.pl script, which used to be bundled with mrtgext. I do not know whether it still ships with it, so I'll post it here.

nwstat.pl

    #!/usr/local/bin/perl
    #########################################################
    # Netware Server Stat Extension to MRTG
    # Client Access
    #
    # This is the "client" portion of the Netware Server
    # stats extension for MRTG. This will open up a
    # connection to the specified server and get the
    # information that you specify.
    #########################################################
    # Written by James Drews (drews@engr.wisc.edu) on
    # Version 1.46
    # URL:
    # http://forge.novell.com/modules/xfmod/project/?mrtgext
    #########################################################
    # Feel free to contact me with any questions/comments/
    # and such.
    #########################################################
    # This program is freeware. *NO* warranty is expressed,
    # implied, nor granted to you in any way. Use of this
    # program is at your own risk. Your mileage may vary.
    # This program was packed by weight, not by volume.
    # Some settling may have occurred during shipment.
    #########################################################
    # Command Line Usage
    #   nwstat.pl host option1 option2
    # where host is the DNS name of the server to query
    # and option1 and option2 are any combination of the
    # following (case is not important):
    #   UTIL1     : 1 minute average CPU utilization
    #   UTIL5     : 5 minute average CPU utilization
    #   UTIL15    : 15 minute average CPU Utilization
    #   LICENSE   : Connection License Count
    #   CONNECT   : number currently licensed connections
    #   CONNMAX   : Max number licensed connections used
    #   CONNPEAK  : Peak Connections
    #             : (netware 3 = error (-1) )
    #             : (netware 4 = number connections
    #             :  allocated)
    #   NAME      : Server's name
    #   UPTIME    : Time that the server is operational
    #   VS<vol>   : size of the volume <vol> in bytes
    #   VF<vol>   : bytes free on <vol>
    #   VU<vol>   : bytes used on <vol>
    #   VKS<vol>  : size of the volume <vol> in kbytes
    #   VKF<vol>  : kbytes free on <vol>
    #   VKU<vol>  : kbytes used on <vol>
    #   VP<vol>   : bytes of purgable files on <vol>
    #   VKP<vol>  : kbytes of purgable files on <vol>
    #   VNP<vol>  : bytes of not-yet-purgable files
    #   VKNP<vol> : kbytes of not-yet-purgable files
    #   ZERO      : Returns the value 0
    #   VOLUMES   : Returns the list of mounted volumes
    #             : each volume name is on a separate
    #             : line. Used by the nlmcfg.pl script
    #   S1        : Long Term Cache Hit Percentage
    #   S2        : Current number cache buffers
    #   S3        : Number of dirty cache buffers
    #   S4        : Cache LRU in seconds
    #   S5        : Cache LRU in minutes
    #   S6        : Dirty cache buffers as percent of tot
    #   S7        : Total cache buffers as percent of
    #               original
    #   S8        : Original number of cache buffers
    #   S9        : SAP object Count
    #   S9.x      : SAP Object count for service x
    #   S10       : CPU Count
    #   S11       : IS DS Database Open? 1=yes 0=no
    #   S12       : Logins enabled? 1=yes 0=no
    #   S13       : DS.NLM Version string
    #   S14       : MRTGEXT.NLM Version string
    #   S15       : Packet receive buffer count
    #   S16       : Get Maximum packet receive buffer cnt
    #   S17       : Abended thread count (5.x only)
    #   S18       : Open file count
    #   S19       : OS Version String
    #   S20       : Max service processes
    #   S21       : Current service processes (5.x only)
    #   S22       : Time In Sync To the Network (0=No,
    #               1 = yes)
    #   S23:<nlm> : Is <nlm> loaded? (0=no,1=yes)
    #   S24:<nlm> : Get <nlm>'s version
    #   S25       : Minimum Directory Cache Buffers
    #   S26       : Maximum Directory Cache Buffers
    #   S27       : Current Directory Cache Buffers
    #
    # Example: To get the server utilization for 5 and 15
    # minutes on the myserv.mydomain.com.
    #
    #   nwstat.pl myserv.mydomain.com UTIL5 UTIL15
    #
    # Example: To graph the disk space usage on the SYS
    # volume on myserv.mydomain.com.
    #
    #   nwstat.pl myserv.mydomain.com VFsys VUsys
    #########################################################
    # Other notes:
    # The server side NLM can take ALL the options on the
    # command line at once. However, MRTG is written to
    # only graph two variables at a time. Should some
    # ambitious person modify the program to graph more
    # than two items at once, this program can easily be
    # expanded to output more items.
    #
    # The server will stop accepting input at 1023 chars
    # or when it gets the first \n character
    #
    # Thanks to Kevin Keyser <kevin-keyser@uiowa.edu>
    # for fixing the problem of loosing the 'W' char from
    # the server name
    #########################################################

    # Required for perl5.
    use Socket;

    ($_, $opt1, $opt2) = @ARGV;

    if (!$_) {
        print "Usage: $0 HOST OPTION1 OPTION2 \n";
        print " where host is the DNS name of the server to query\n";
        print " and option1 and option2 are any combination of the\n";
        print " following (case is not important):\n";
        print " UTIL1    : 1 minute average CPU utilization\n";
        print " UTIL5    : 5 minute average CPU utilization\n";
        print " UTIL15   : 15 minute average CPU Utilization\n";
        print " LICENSE  : Connection License count\n";
        print " CONNECT  : number currently licensed connections\n";
        print " CONNMAX  : max licensed connections used\n";
        print " CONNPEAK : Peak Connections\n";
        print "          : (netware 3 = error (-1) )\n";
        print "          : (netware 4 = number connections\n";
        print "          : allocated)\n";
        print " VF<vol>  : bytes free on <vol>\n";
        print " VS<vol>  : size in bytes of <vol>\n";
        print " VU<vol>  : bytes used on <vol>\n";
        print " VKF<vol> : kbytes free on <vol>\n";
        print " VKS<vol> : size in kbytes of <vol>\n";
        print " VKU<vol> : kbytes used on <vol>\n";
        print " ZERO     : returns a zero (0)\n";
        print " S1       : Long Term Cache Hit Percentage\n";
        print " S2       : Number of Cache Buffers\n";
        print " S3       : Number of Dirty Cache Buffers\n";
        print " S4       : Cache LRU in seconds\n";
        print " S5       : Cache LRU in minutes\n";
        print "\n Example: To graph the disk space usage on the SYS\n";
        print " volume on myserv.mydomain.com.\n\n";
        die " $0 myserv.mydomain.com VFsys VUsys\n";
    }
    if (!$opt2) {
        printf "No second option specified. MRTG expects two values\n";
        die "Use \"ZERO\" for the second option if you only want one value.\n";
    }

    $hostname = $_;
    # if you load the NLM with a different port
    # from the default, here is where you change it
    $port = 9999;

    # Open a socket and get the data
    ($sockaddr, $there, $response, $tries) = ("Snc4x8");

    # On Win95, passing a numeric IP address to inet_aton() is slow, so
    # detect this case and use a simple conversion.
    chomp($hostname);
    if ($hostname =~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)(.*)/) {
        # $remote_addr = pack('C4',"$1.$2.$3.$4");
        $remote_addr = "$1.$2.$3.$4";
    } else {
        # inet_aton() below resolves host names as well
        $remote_addr = $hostname;
    }
    # my $addr_in = 'S n a4 x8';
    # $there = pack($addr_in, AF_INET, $port, $remote_addr);

    # Convert the address to its packed binary form.
    my $iaddr = inet_aton($remote_addr) || die "$0: host not found: $hostname\n";
    # Build the address structure for connect.
    $there = sockaddr_in($port, $iaddr);

    $proto = (getprotobyname('tcp'))[2];
    if (!socket(S, AF_INET, SOCK_STREAM, $proto)) {
        printf "-1\n-1\n\n\n";
        die "$0: Fatal Error. $!\n";
    }
    if (!connect(S, $there)) {
        printf "-2\n-2\n\n\n";
        die "$0: Fatal Error. $!\n";
    }

    select(S); $| = 1; select(STDOUT);
    print S "$opt1 $opt2 uptime name\r\n";
    $in = int(<S>);
    print "$in\n";
    while ($line = <S>) {
        print "$line";
    }
    close(S);

Using it is simple:

    define command{
        command_name    check_nwvolsize
        command_line    $USER1$/check_nwvolsize -H $HOSTADDRESS$ -w $ARG2$ -c $ARG3$ -v $ARG1$ -p K
    }

To draw Netware statistics via mrtg, here is a template:

    HtmlDir: /data/www/admin/mrtg
    ImageDir: /data/www/admin/mrtg/images
    LogDir: /data/www/admin/mrtg/logs
    IconDir: /mrtg/icons
    Language: Russian
    Options[_]: gauge, noarrow, nopercent, unknaszero
    kilo[_]: 1000
    PageTop[^]: <b><<<<a href=index_nw6.html>To Index Page</b></a><br><br> <table width=100% cellspacing=0 cellpadding=5 border=1 bgcolor=#dedede align=center> <tr><td><center><H4>
    PageTop[$]: </td></tr></table>

    # ------------------- Common Parameters ------------------
    Target[nw6_cpu]: `/usr/local/sbin/nwstat.pl 192.168.2.4 util5 util15`
    ShortLegend[nw6_cpu]: %
    YLegend[nw6_cpu]: CPU Util (%)
    LegendI[nw6_cpu]: 5 minute average CPU utilization
    LegendO[nw6_cpu]: 15 minute average CPU utilization
    MaxBytes[nw6_cpu]: 100
    Title[nw6_cpu]: CPU Utilization Analysis for Server NW6
    PageTop[nw6_cpu]: CPU Utilization Analysis for Server NW6 </H4></center>

    Target[nw6_conn]: `/usr/local/sbin/nwstat.pl 192.168.2.4 connect s18`
    MaxBytes[nw6_conn]: 140000
    ShortLegend[nw6_conn]:
    YLegend[nw6_conn]: Connections & Open files
    LegendI[nw6_conn]: Number currently licensed connections
    LegendO[nw6_conn]: Open files
    Title[nw6_conn]: Connection & Open Files for Server NW6
    PageTop[nw6_conn]: Connection & Open Files for Server NW6 </H4></center>

    # s8 = Original number of cache buffers
    # s2 = Current number cache buffers
    Target[nw6_s2]: `/usr/local/sbin/nwstat.pl 192.168.2.4 s2 s8`
    MaxBytes[nw6_s2]: 2000000000
    ShortLegend[nw6_s2]:
    YLegend[nw6_s2]: Cache Buffers
    LegendI[nw6_s2]: Number of Free Cache Buffers
    LegendO[nw6_s2]: Number of Total Cache Buffers
    Title[nw6_s2]: Number of Cache Buffers for Server NW6
    PageTop[nw6_s2]: Number of Cache Buffers for Server NW6 </H4></center>

    # s20 = Max service processes
    # s21 = Current service processes
    Target[nw6_proc]: `/usr/local/sbin/nwstat.pl 192.168.2.4 s21 s20`
    MaxBytes[nw6_proc]: 2000000
    ShortLegend[nw6_proc]:
    YLegend[nw6_proc]: Processes
    LegendI[nw6_proc]: Current service processes
    LegendO[nw6_proc]: Max service processes
    Title[nw6_proc]: Service processes analysis for Server NW6
    PageTop[nw6_proc]: Service processes analysis for Server NW6 </H4></center>

    # s15 = Packet receive buffer count
    # s16 = Maximum packet receive buffer count
    Target[nw6_packet]: `/usr/local/sbin/nwstat.pl 192.168.2.4 s15 s16`
    MaxBytes[nw6_packet]: 2000000
    ShortLegend[nw6_packet]:
    YLegend[nw6_packet]: Buffers
    LegendI[nw6_packet]: Current packet buffers
    LegendO[nw6_packet]: Max packet buffers
    Title[nw6_packet]: Packet receive buffers count for Server NW6
    PageTop[nw6_packet]: Packet receive buffers count for Server NW6 </H4></center>

    #---------------------------------------------------------------------------------
    # DISKS
    #---------------------------------------------------------------------------------
    Target[nw6_vol-data]: `/usr/local/sbin/nwstat.pl 192.168.2.4 VKUDATA VKSDATA`
    MaxBytes[nw6_vol-data]: 500000000000
    kilo[nw6_vol-data]: 1024
    kMG[nw6_vol-data]: k,M,G,T,P
    ShortLegend[nw6_vol-data]: b
    YLegend[nw6_vol-data]: Disk space
    LegendI[nw6_vol-data]: Kbytes used on volume DATA
    LegendO[nw6_vol-data]: Size of volume DATA in Kbytes
    Title[nw6_vol-data]: Volume DATA statistics for server NW6
    PageTop[nw6_vol-data]: Volume DATA statistics for Server NW6 </H4></center>

    Target[nw6_pur_vol-data]: `/usr/local/sbin/nwstat.pl 192.168.2.4 VKPDATA VKNPDATA`
    MaxBytes[nw6_pur_vol-data]: 1000000000000
    kilo[nw6_pur_vol-data]: 1024
    kMG[nw6_pur_vol-data]: k,M,G,T,P
    ShortLegend[nw6_pur_vol-data]: b
    YLegend[nw6_pur_vol-data]: Purgable files
    LegendI[nw6_pur_vol-data]: Kbytes of purgable files on volume DATA
    LegendO[nw6_pur_vol-data]: Kbytes of not-yet-purgable files on volume DATA
    Title[nw6_pur_vol-data]: Volume DATA purgable files statistics for server NW6
    PageTop[nw6_pur_vol-data]: Volume DATA purgable files statistics for Server NW6 </H4></center>

    Target[nw6_vol-sys]: `/usr/local/sbin/nwstat.pl 192.168.2.4 VKUSYS VKSSYS`
    MaxBytes[nw6_vol-sys]: 500000000000
    kilo[nw6_vol-sys]: 1024
    kMG[nw6_vol-sys]: k,M,G,T,P
    ShortLegend[nw6_vol-sys]: b
    YLegend[nw6_vol-sys]: Disk space
    LegendI[nw6_vol-sys]: Kbytes used on volume SYS
    LegendO[nw6_vol-sys]: Size of volume SYS in Kbytes
    Title[nw6_vol-sys]: Volume SYS statistics for server NW6
    PageTop[nw6_vol-sys]: Volume SYS statistics for Server NW6 </H4></center>

It shows how you can monitor the processor, RAM and buffers on a Netware server via mrtgext, if anyone still needs this.

Monitoring Novell (no longer Novell, but that is beside the point) OES

OES is Open Enterprise Server. At its core OES is SUSE Linux, so basic monitoring is the same as for Linux; here I'll describe how to monitor the additional services.

Again, volume monitoring comes first:

get-voldata.pl

    #!/usr/local/bin/perl
    #
    # (C) Smithson Inc
    #
    #use strict;
    use lib "/usr/local/libexec/nagios";
    use utils qw($TIMEOUT %ERRORS &print_revision &support);
    use vars qw($PROGNAME);
    use Getopt::Long;
    use Time::gmtime;
    use vars qw($opt_V $opt_h $verbose $opt_w $opt_c $opt_H $volname $opt_mode $mode);

    $PROGNAME = `basename $0`;
    chomp $PROGNAME;

    Getopt::Long::Configure('bundling');
    GetOptions(
        "V"   => \$opt_V,    "version"    => \$opt_V,
        "h"   => \$opt_h,    "help"       => \$opt_h,
        "v=s" => \$volname,  "volname=s"  => \$volname,
        "w=s" => \$opt_w,    "warning=s"  => \$opt_w,
        "c=s" => \$opt_c,    "critical=s" => \$opt_c,
        "m=s" => \$opt_mode, "mode=s"     => \$opt_mode,
        "H=s" => \$opt_H,    "hostname=s" => \$opt_H);

    if ($opt_V) {
        print_revision($PROGNAME, '');
        exit $ERRORS{'OK'};
    }
    if ($opt_h) {
        print_help();
        exit $ERRORS{'OK'};
    }

    $opt_H = shift unless ($opt_H);
    my $host = $1 if ($opt_H =~ m/^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+|[a-zA-Z][-a-zA-Z0-9]+(\.[a-zA-Z][-a-zA-Z0-9]+)*)$/);
    if (!(defined($host))) {
        print_usage();
        # utils.pm has no 'ERROR' key in %ERRORS, so use UNKNOWN
        exit $ERRORS{'UNKNOWN'};
    }

    ($opt_c) || ($opt_c = shift) || ($opt_c = 95);
    my $critical = $1 if ($opt_c =~ /([0-9]+)/);
    ($opt_w) || ($opt_w = shift) || ($opt_w = 90);
    my $warning = $1 if ($opt_w =~ /([0-9]+)/);
    ($volname) || ($volname = shift) || ($volname = 'SYS');
    my $vname = $volname;

    # Output mode: 0 = nagios (default), 1 = mrtg
    my $mode = 0;
    ($opt_mode) || ($opt_mode = shift) || ($opt_mode = 'nagios');
    if ($opt_mode =~ /mrtg/i) { $mode = 1; }

    my $servname = $host;
    my $uptime = '';
    my $community = 'ISOlanf';
    my $SNMP = "/usr/local/bin/snmpget -v 2c -c $community";
    my $snmpwalk = "/usr/local/bin/snmpwalk -v 2c -c $community";
    my $OID_NAME = 'SNMPv2-MIB::sysName.0';
    my $OID_UPTIME = '.1.3.6.1.2.1.1.3.0';
    my $OID_SIZE = 'HOST-RESOURCES-MIB::hrStorageSize';
    my $OID_USED = 'HOST-RESOURCES-MIB::hrStorageUsed';
    my $OID_DESC = 'HOST-RESOURCES-MIB::hrStorageDescr';
    my $pool = '.pools/';

    my $n = getvol($host, $vname);
    if ($n == 0) { exit $ERRORS{'UNKNOWN'}; }
    my $percent = getInfo($host, $n);

    if ($mode == 0) {    # nagios
        if ($percent > 100) { exit $ERRORS{'UNKNOWN'}; }
        printf "Used: %.2f%%\n", $percent;
        if (($percent > $critical) || ($percent == 0)) { exit $ERRORS{'CRITICAL'}; }    # Critical!
        if ($percent > $warning) { exit $ERRORS{'WARNING'}; }                           # Warning!
        exit $ERRORS{'OK'};                                                             # Okay
    }
    if ($mode == 1) {    # mrtg
        print $percent;
        exit(0);
    }

    # ================================================================
    # Find the hrStorage index of the volume by its description
    sub getvol {
        my $ip = shift;
        my $v = shift;
        my $ret = 0;
        my $seek = $pool.$v;
        my @n = getDesc($ip);
        foreach $l (@n) {
            if ($l =~ /$OID_DESC\.(\d+).+$seek/i) { return $1; }
        }
        return $ret;
    }

    # ================================================================
    # Percent used (nagios mode) or "used, size, uptime, name" lines (mrtg mode)
    sub getInfo {
        my $ip = shift;
        my $n = shift;
        my $ret = '';
        $servname = getSNMPdata($ip, $OID_NAME);
        my $size = getSNMPdata($ip, $OID_SIZE.'.'.$n);
        my $used = getSNMPdata($ip, $OID_USED.'.'.$n);
        if ($size < 1) { return ''; }
        $ret = ($used / $size) * 100;
        if ($mode == 1) {
            $used = $used * 4;
            $size = $size * 4;
            $ret = "$used\n$size\n$uptime\n$servname\n";
        }
        return $ret;
    }

    # ================================================================
    sub getSNMPdata {
        my $ip = shift;
        my $snmpquery = shift;
        my ($q, $dat);
        $q = "$SNMP $ip $snmpquery | awk '{print \$4}'";
        $dat = `$q`;
        chomp $dat;
        if (length($dat) < 1) { return 'U'; }
        return $dat;
    }

    # ================================================================
    sub getSNMPstring {
        my $ip = shift;
        my $snmpquery = shift;
        my ($q, $dat);
        $q = "$SNMP $ip $snmpquery";
        $dat = `$q`;
        chomp $dat;
        if (length($dat) < 1) { return ''; }
        if ($dat =~ /= STRING:\ (.+)/) { $dat = $1 }
        return $dat;
    }

    # ================================================================
    sub getDesc {
        my $ip = shift;
        my @ret = '';
        my $q = "$snmpwalk $ip $OID_DESC";
        @ret = `$q`;
        return @ret;
    }

    # ================================================================
    sub print_usage () {
        print "Usage: $PROGNAME -H <host> [-v <volumename>] [-w <warn>] [-c <crit>] [-m <mode>]\n";
    }

    # ================================================================
    sub print_help () {
        print_revision($PROGNAME, '');
        print "Copyright (c) Smithson Inc, 2013 \n";
        print "\n";
        print_usage();
        print "\n";
        print "<warn> = Percent used at which a warning message will be generated.\n";
        print "<crit> = Percent used at which a critical message will be generated.\n";
        print "<mode> = Used mode - nagios -> return percents of volume used, mrtg -> return used and max size of volume for mrtg\nBy default use 'nagios'\n\n";
        support();
    }
    # ================================================================

This script is universal for Nagios and mrtg: with -m mrtg it outputs the volume figures (used and total) the way mrtg expects them, and without it, or with -m nagios, it answers like a typical Nagios plugin.

As parameters it takes the server name or IP and the volume name. Volumes are looked up in the list of mounted file systems by their .pools/NAME mount point (an OES technicality). A usage of 0 is reported as CRITICAL (relevant for "external" fc or iscsi disks).

The default volume is SYS (it exists on Netware and on OES servers alike).

The command definition for nagios:

    define command{
        command_name    check_oesvolsize
        command_line    $USER1$/get-voldata.pl -H $HOSTADDRESS$ -v $ARG1$ -w $ARG2$ -c $ARG3$
    }

The target for mrtg:

    `/data/rrdtool/oes/get-voldata.pl -H <ip-address> -v <volume> -m mrtg`

Next, the printing service, iPrint. The script:

check_iprinters.pl

    #!/usr/bin/perl -w
    #
    # @File check_iprinters.pl
    # @Author dbenjamin
    # @Created Aug 4, 2015 2:59:02 PM
    # Licence : GPL - http://www.gnu.org/licenses/lgpl-3.0.html
    #
    use strict;
    use LWP::Simple;
    use HTML::TreeBuilder;
    use Getopt::Long;

    my $nagios_plugins_utils =
    #   "/usr/lib/nagios/plugins/utils.pm";     #used to test for the library
        "/usr/local/libexec/nagios/utils.pm";   #used to test for the library
    die "\nLooking for nagios utils.pm at $nagios_plugins_utils"
        unless ( -e $nagios_plugins_utils );
    use lib "/usr/local/libexec/nagios";        #use just the path here
    use utils qw(%ERRORS $TIMEOUT);

    my ( $opt_h, $opt_I, $opt_Q, $opt_v, $opt_P, $opt_V );
    my ( $printer_state, $accepting_jobs, $jobs_scheduled );
    my $i = 0;    #iteration holder
    $opt_P = '631';
    alarm($TIMEOUT);

    sub print_version {
        print "File: check_iprinters.pl\n";
        print "Author: dbenjamin\n";
        print "Created: Aug 4, 2015\n";
        print "Release: 0.0.1\n";
        print "Tested against Novell iPrint Server 6.7.0.20150629-0.6.6, ";
        print "running on SLES 11, SP3 with OES 11, SP2.\n";
        exit $ERRORS{'UNKNOWN'};
    }

    sub print_exit {
        print "Usage: $0 -I <host address> -Q <queue name> [-P <port> default=631] [-v enable verbose] [--version]\n\n";
        exit $ERRORS{'UNKNOWN'};
    }

    sub print_verbose {
        print "Printer State: $printer_state\n";
        print "Printer is Accepting Jobs: $accepting_jobs\n";
        print "Jobs Scheduled: $jobs_scheduled\n";
    }

    GetOptions(
        "version" => \$opt_V,
        "h"       => \$opt_h,
        "help"    => \$opt_h,
        "I=s"     => \$opt_I,
        "Q=s"     => \$opt_Q,
        "P:s"     => \$opt_P,
        "v"       => \$opt_v,
    ) or print_exit;

    if ($opt_V) { print_version; }
    if ($opt_h) { print_exit; }
    if ( !$opt_I ) {
        print "No Host address specified\n";
        print_exit;
    }
    if ( !$opt_Q ) {
        print "No Queue name specified\n";
        print_exit;
    }
    if ( ( $opt_I eq '' ) or ( $opt_Q eq '' ) ) { print_exit; }

    # Fetch the queue status page over IPP/HTTP and parse its table rows
    my $tree = HTML::TreeBuilder->new;
    my $url = "http://$opt_I:$opt_P/ipp/$opt_Q";
    my $result = get($url);
    die "\nCouldn't get $url" unless defined $result;
    $tree->parse($result);
    my @tbrows = $tree->look_down( '_tag', 'TR' );
    die "No response, check the URL for errors: $url\n\n" unless @tbrows;

    foreach $i ( 2 .. 4 ) {
        my @td = $tbrows[$i]->look_down( '_tag', 'TD' );
        if ( $i == 2 ) { $printer_state  = $td[1]->as_text; }
        if ( $i == 3 ) { $accepting_jobs = $td[1]->as_text; }
        if ( $i == 4 ) { $jobs_scheduled = $td[1]->as_text; }
    }

    # "error" together with "empty" is only a warning; any other error is critical
    if ( ( $printer_state =~ /error/i ) && ( $printer_state =~ /empty/i ) ) {
        if ($opt_v) { print_verbose } else { print "$printer_state\n\n"; }
        exit $ERRORS{'WARNING'};
    } else {
        if ( $printer_state =~ /error/i ) {
            if ($opt_v) { print_verbose } else { print "$printer_state\n\n"; }
            exit $ERRORS{'CRITICAL'};
        }
    }
    if ($opt_v) { print "jobs=$jobs_scheduled\n"; }
    exit $ERRORS{'OK'};

The configuration for nagios:

    define command{
        command_name    check_iprinter_q
        command_line    $USER1$/check_iprinters.pl -I $HOSTADDRESS$ -Q $ARG1$ $ARG2$
    }

    define service{
        use                     generic-service
        name                    iprint_q
        host_name               iprint
        contacts                printer-admins
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   10
        retry_check_interval    5
        notifications_enabled   0
        notification_options    c,r
        notification_interval   60
        notification_period     workhours
        icon_image              printer.gif
        flap_detection_enabled  0
        register                0
    }

    define service{
        use                     iprint_q
        service_description     AHO-2430
        check_command           check_iprinter_q!AHO-2430!-v
    }

( ), , , , 40+ ( 40 — ) . .

, . . , — , Synology, vmware , , iPrint.

Source: https://habr.com/ru/post/307832/
