Perl MySQL process monitoring script

Hello.

For more than five years I have been working as a system administrator in a hosting company, serving over a hundred servers with freebsd and centos. During this time, a lot of samopisny scripts have accumulated that facilitate my life. I want to share these scripts with the community, and it never hurts to listen to healthy criticism.

Prehistory

Many years ago, a friend of mine, having surplus money in his wallet, bought server hardware in the form of a single-unit Intel server, placed it in the cheapest data center (as it is now fashionable to call it - on a kolokheyshene), and started posting his site sites there first, then began distributing hosting to friends, friends began to post their pages, then pulled up the sites of their friends and employers. Since a comrade poorly understood the installation and configuration of LAMP, without further ado, he set up Cpanel WHM, I also drew as a free labor force adminit all this economy, which is called “for food”, I also needed hosting.
')
Over the course of several years, many servers have taken root on the server, and the workload with which I struggled with varying success grew. Periodically, problems began to arise with mysql. Some users created slow queries that blocked subsequent ones, others resulted in multi-storey requests with a bunch of JOINs that, by virtue of a bug that was still not fixed, hung in statistics, while reaping processor resources. In the end, mysql overeat processes and stopped responding. There was a need for some kind of monitoring script that would look at the list of mysql processes, and sounded the alarm in case of an emergency situation.

At first I wrote a script in bash. Then, when by the nature of the main work I had to get acquainted with the pearl - I rewrote it in Perl.

As practice shows, during normal operation, mysql servers simultaneously work out less than five “slow” requests, even for heavily loaded servers, and if there are any, it is a reason to study these requests. Of course, you need to analyze the log of slow queries periodically, but this is a topic for the next article, about a script that analyzes and even automatically builds simple indexes.

The logic of the monitoring script is simple. We assume that the mysql server is under threat if at the time of the scan more than ten (for example) “slow” requests are executed simultaneously - with a duration of more than one second. Let's call this state "critical." If the condition is critical, then it is necessary to sound the alarm.

Further practice has shown that it would be nice to sound the alarm if mysql is in a precritical state for some time. That is, 10 processes at the same time are not yet, but with each minute the number of long-running queries is growing. For the precritical state, we take the figure of, say, 5.

Run the script once a minute. We look at the list of processes, we consider everything that is not in the Sleep status and takes longer than one second. If the number is greater than 10, then send a letter to the administrator along with a list of processes. Save the resulting number to a file. We read the values from this file the last 5 values in the last 5 minutes, and if there were 5 precritical states during this time interval, then we send a letter to the admin.

Later, a block was inserted into the script, nailing the multi-storey queries that hung in the statistics state.

Actually, the script.

#!/usr/bin/perl #use strict; use DBI; use DBD::mysql; use POSIX; ($sysname, $hostname, $release, $version, $machine) = POSIX::uname(); my $slowtime=1; #       my $warnlevel=5; #       my $warncounter=5; #  $warncounter     my $alarmcounter=10; #     >= $alarmcounter     my $socket='/tmp/mysql56.sock'; #    my $email="admin\@myemail.net"; #   my $wrkdir='/tmp/'; my $procfile=$wrkdir.'alarm.proclist'; #      my $datfile=$wrkdir.'alarm.dat'; #        my $pidfile=$wrkdir.'alarm.pid'; # pid .  mysql         if (-e "$pidfile") { printf("pid file found. Exit.\n"); exit(255); } open (PIDFILE,">$pidfile") || die "cant create $pidfile\n"; print PIDFILE "$$\n"; close PIDFILE; open (PROCFILE,">$procfile") || die "cant create $procfile\n"; my ($proc, $dbh, $sth, $totalcounter, $slowcounter, $sleepcounter, $user, $time, $state, $command, $info, $i); until ($dbh = DBI->connect("DBI:mysql:mysql_socket=$socket", "user", "password")){ unlink($pidfile); die("Can't connect: $DBI::errstr\n"); } $sth = $dbh->prepare("SHOW FULL PROCESSLIST"); $sth->execute; my @proclist=(); $totalcounter=$slowcounter=$sleepcounter=0; while (my $row = $sth->fetchrow_hashref()) { $user=$row->{'User'}; $time=$row->{'Time'}; $state=$row->{'State'}; $command=$row->{'Command'}; $info=$row->{'Info'}; $totalcounter++; next if ($user =~ m/root/); if ($command =~ m/(Sleep|Delayed|Binlog)/){ $sleepcounter++; next; }; ###      statistics if ($state =~ m/statistics/ && $time > 5){ $statinfo="$user: killed $mid: $dbuser | $db | $time | $state | $command | $info\n\n"; $sth2 = $dbh->prepare("kill $mid"); $sth2->execute; $sth2->finish; open (MAIL,"|/usr/sbin/sendmail -F$hostname $email"); print MAIL "To:$email\nSubject:".$subj."Hanged query in the statistics state: $hostname, user $user \n\n"; print MAIL $statinfo; close (MAIL); }; ### if ($time>$slowtime) { $slowcounter++; } $info =~ s/[\r\n\t]+/ /g; push (@proclist,sprintf("%-24s | %4d | %s | %s | %s \n", $user, $time, $state, $command, $info)); printf PROCFILE ("%-24s | %4d | %s | %s | %s \n", $user, $time, $state, $command, $info); } $sth->finish; close PROCFILE; #print "--- $slowcounter slow queries from total $totalcounter ($sleepcounter are sleep) ---- \n"; my @data=(); ### read slowcounter timings from dat file open (DATFILE,"<$datfile"); while(<DATFILE>){ my($line) = $_; chomp($line); push (@data,$line); } close(DATFILE); ### if dat file is smaller than warnlevel then fill timings by zeros if (scalar(@data)<$warnlevel) { for $i ( 0 .. $warnlevel-scalar(@data) ) { push (@data,0); } } ### shift timings with last slowcounter push (@data,$slowcounter); shift(@data); ### dumping slowcounter timings to dat file open (DATFILE,"+>$datfile") || die "cant create $datfile\n"; foreach (@data) { print DATFILE "$_\n"; } close(DATFILE); ### get number of bad states for last minutes my $cnt=0; foreach (@data) { if($_ >= $warnlevel) { $cnt++; } } my $subj=" "; if ($slowcounter>=$alarmcounter) { # very critical state $subj=" VERY "; } my $warnmessage="Critical state of $hostname! There was a $warncounter checks with at least $warnlevel long queries!\n"; if ($slowcounter>=$alarmcounter) { # very critical state $warnmessage=$warnmessage."--- !!! Last check shows $slowcounter long queries!\n"; } if (($cnt >= $warncounter) || $slowcounter>=$alarmcounter){ open (MAIL,"|/usr/sbin/sendmail -F$hostname $email"); print MAIL "To:$email\nSubject:".$subj."Critical state of $hostname\n\n"; print MAIL $warnmessage; print MAIL "---------------------------------------------------------------------------------------------------\n"; print MAIL "--- $slowcounter slow queries from total $totalcounter ($sleepcounter are sleep) \n"; print MAIL "---------------------------------------------------------------------------------------------------\n"; foreach (@proclist) { print MAIL "$_"; } close (MAIL); } unlink($pidfile);

I apologize for the comments in Nizhny Novgorod English.

Later, the script was introduced to the combat mysql servers in the hosting company where I work, and helped to prevent the Mysql server from being serviced many times, and simply reported that a user was wasting resources.

The script works:

- when spam bots attack forgotten god forums. Under load, performance drops, forum tables begin to crash, requests are accumulating in the queue in the status "Locked". From the script comes a very distinctive and illustrative list of processes;

- when the user sites are attacked by benchmark type in blind SQL injections;

- when mysql stupidly hangs, and with it under load it happens sporadically (one process runs indefinitely, all the others just hang without any status, and accumulate until the connection limit is selected) - the script works faster than the monitoring system polling the mysql port ;

- when users have impressive tables in terms of data volume and queries that are not optimized so much that one query is executed for a few seconds, or even minutes. The remaining requests to the table are accumulating and waiting for the queue in the Locked status. The letter immediately shows a non-optimized query, you can quickly see the explain query and build an index if necessary. If innodb is used, then on intensive slow queries the monitoring script also works, because their bundle is hanging in the status of “Sending data” or “Copying to tmp table”. Such requests in large quantities are very dangerous, since they greatly reduce the performance of the server as a whole;

- when multi-story JOIN requests are hanging. The script nails them automatically, but sometimes they are not killed - the reason to restart mysql;

- several times the script caught mysql freezes on specific requests, further study of which eventually resulted in an update, after detecting a bug with similar requests on the bug tracker.

Occasionally, false positives come, of course, if someone repairs, optimizes, or dumps a large table.

I would be glad if this script is useful to someone.

Source: https://habr.com/ru/post/145573/

All Articles

Perl MySQL process monitoring script

More articles: