
Surviving moments of critical load

In the life of every popular resource there come moments when the hardware cannot cope with the current load. The causes vary widely, and they cannot always be eradicated in a reasonable time. In such cases the developer's task is to shed load with minimal inconvenience to visitors.

I do not claim my solution is ingenious, but I hope it helps someone.


In my case, the problem is that from time to time the load on the database spikes sharply. As usually happens, an avalanche follows: queries slow down, users get nervous and hit reload, cron scripts pile up, and eventually the server "hangs". Search robots add to the trouble: they tend to pull deep pages of the site that normally do not live in the cache. Pages get generated just for them and the cache fills up with data nobody needs. The second part of the problem is easy enough to handle: simply do not cache requests for old pages, or requests from robots. But one can go further and shed the robot load entirely, at least while the server is overloaded.
I faced two tasks:

1. How to detect the moment when the load spikes
2. What to do at such moments to make life easier

To solve the first task I used the wonderful Zabbix: it has long been in active and successful use here, can monitor several servers, and can execute commands on them when the specified conditions are met. As the metric I chose the load average on the database server, while the reaction has to happen on another server, the one where nginx lives.

I created a trigger with the condition system.cpu.load[,avg1].last(0)>3.5 and attached a remote command to it:

HOST: /path/lowerage.sh {STATUS} {ITEM.LASTVALUE} {TIME}

And on that server I put a simple script:

#!/bin/sh
# {STATUS} arrives as the first argument: "ON" while the trigger is active
if [ "${1}" != "ON" ] ; then
    /bin/unlink /tmp/cpu_load_high
else
    /usr/bin/touch /tmp/cpu_load_high
fi
# mail myself a note that the flag changed state
echo "lowerage ${1} ${2} ${3}" | /usr/bin/mail -s lowerage tmp@tmpmail.ru


As a result, a file appears whenever the load climbs, and everything that follows can take its cue from that file.
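The flag file can be consulted from any script before it starts heavy work. Below is a minimal sketch of such a check; the helper function name and the "skip heavy work" messages are my own illustration, only the flag path comes from the article:

```shell
#!/bin/sh
# Hypothetical helper: check the overload flag created by lowerage.sh
# before doing anything expensive.  The flag path /tmp/cpu_load_high
# is the one used in the article; the function name is made up.

is_overloaded() {
    # success (exit 0) when the flag file exists and is readable;
    # an optional argument overrides the default flag path
    [ -r "${1:-/tmp/cpu_load_high}" ]
}

if is_overloaded; then
    echo "server overloaded, skipping heavy work"
else
    echo "load is normal, proceeding"
fi
```

Any cron job or maintenance script can call such a check at its top and bail out early while the flag exists.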

Now let us move on to the actual countermeasures.

First of all, I stopped cron scripts from running during critical periods. The solution is trivial: a condition is simply added to the cron entry. Example:

/bin/test ! -r /tmp/cpu_load_high && /usr/bin/fetch -o - site/cron.php

Then I began to gently squeeze the search robots, deciding to cut them off at the nginx level during critical periods. This took some ingenuity: nginx does not support nested ifs, and I needed at least two conditions. A small trick saved the day. The piece of the configuration file that implements the above is shown below:

if (-f /tmp/cpu_load_high) {
    set $troubleflag T;
}
if ($http_user_agent ~ (?:Yandex|Google|Yahoo|Rambler|msnbot) ) {
    set $oblomflag Y$troubleflag;
}
if ($oblomflag = YT) {
    return 444;
}


By the way, I am not sure that the return 444 option is optimal: in this case nginx simply drops the connection. I would like to think the robots will not take offence at such behaviour.
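A softer variant, in line with the 503 recommendation mentioned in the update at the end of the post, could look like this. It is a sketch reusing the flags already set above; only the return code changes:

```nginx
# Same flag combination as above, but answer "service temporarily
# unavailable" instead of silently dropping the connection.
# Well-behaved crawlers treat 503 as "come back later".
if ($oblomflag = YT) {
    return 503;
}
```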

That is actually the whole short story. In the future it would also be possible to skip heavy requests at critical times and send the user to an apology page instead.
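An apology page could be wired up in nginx roughly as follows. This is a hypothetical sketch: the file name, the document root, the /search location, and the backend upstream name are all made up for illustration, not taken from the original configuration:

```nginx
# Serve a static apology page for 503 responses.
error_page 503 /overloaded.html;

location = /overloaded.html {
    root /var/www/static;   # hypothetical location of the static page
}

# A heavy URL (illustrative) that is refused while the flag exists.
location /search {
    if (-f /tmp/cpu_load_high) {
        return 503;
    }
    proxy_pass http://backend;   # hypothetical upstream
}
```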

Thanks for your attention.

UPD: a link was suggested in the comments, googlewebmastercentral.blogspot.com/2006/08/all-about-googlebot.html, where Google recommends responding with code 503.

Source: https://habr.com/ru/post/54856/

