What problem did we encounter on shared hosting?

I want to share my experience of diagnosing and solving a problem that appeared quite suddenly on PHP shared (virtual) hosting. In principle, it is unlikely to occur on other kinds of hosting.

It all started one day when one of our sites stopped opening. It turned out that, because of server load, the hoster had decided to move the account to another server. There was no notification either before the transfer or when it began. This was, of course, extremely bad behavior on the hoster's part, but that's not the point. After the transfer, strange things began to happen, and for several days we saw the same picture. Any page on the site would either open instantly (as usual) or not open at all, with roughly a 50% chance of success. I wrote to the hoster about this and checked the CPU usage, which turned out to be phenomenal. The daily average was about 500% of the allocated quota, and in some hours the load exceeded 1000%. Before the transfer, the average daily load had been around 50-60%.

Since the load had changed right after an unplanned transfer, I expected one of two answers from the hoster. Either something was wrong with the configuration of the new server, which judging by its number really was new and might not yet have been configured correctly. Or they had mixed up the tariff plans and taken a smaller quota as 100%.

But the answer was completely different: "we are disabling your account because of the unreasonable load it places on the server, and we can offer you a move to a dedicated server." Just like that, flatly. They made no attempt to analyze the cause of the load and ignored its obvious connection with the transfer. Fortunately, they had sent a screenshot beforehand. It clearly showed our account's processes sitting at the top of the process list and, I must say, running for a very long time.

[there was a picture]

Based on this screenshot alone, I had to explain to support over the phone that if scripts run this long, moving to a more expensive plan clearly won't solve the problem. We were given some time to find and fix the cause.

Next, we hastily wrote a logger for all requests. For each request it recorded the REQUEST_URI, the entry point and the start time in the database. If the script completed successfully, it also recorded the run time. Analyzing the logs finally showed that the execution time did not depend on anything in particular. It behaved like a random variable, falling within 0.1-1.0 s or 40-120 s, and rarely within 1.0-40 s.
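A logger of this kind could look roughly like the following Python sketch. The original was part of a PHP site, and its code is not shown here. The table name `request_log`, its columns, the SQLite backend and all helper names are hypothetical. The key idea is to insert a row as soon as the script starts and fill in the run time only when it finishes. That way, requests that never complete remain visible as rows without a run time.

```python
import sqlite3
import time

# Hypothetical schema: one row per request. run_time stays NULL if the
# script never reached its end (killed, timed out, etc.).
SCHEMA = """
CREATE TABLE IF NOT EXISTS request_log (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    request_uri TEXT NOT NULL,
    entry_point TEXT NOT NULL,
    started_at  REAL NOT NULL,
    run_time    REAL
)
"""


def log_start(db, request_uri, entry_point):
    """Record the request as soon as the script starts; return the row id."""
    cur = db.execute(
        "INSERT INTO request_log (request_uri, entry_point, started_at) "
        "VALUES (?, ?, ?)",
        (request_uri, entry_point, time.time()),
    )
    db.commit()
    return cur.lastrowid


def log_finish(db, row_id, run_time):
    """Store the run time; called only if the script completed successfully."""
    db.execute("UPDATE request_log SET run_time = ? WHERE id = ?", (run_time, row_id))
    db.commit()


def bucket(run_time):
    """Group a run time into the ranges seen in the logs."""
    if run_time is None:
        return "unfinished"
    if run_time <= 1.0:
        return "up to 1 s"
    if run_time < 40.0:
        return "1-40 s"
    return "40 s or more"


def histogram(db):
    """Count logged requests per run-time bucket."""
    counts = {}
    for (run_time,) in db.execute("SELECT run_time FROM request_log"):
        key = bucket(run_time)
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    row = log_start(db, "/catalog/?page=2", "index.php")  # hypothetical URI
    t0 = time.perf_counter()
    # ... the real page handling would run here ...
    log_finish(db, row, time.perf_counter() - t0)
    print(histogram(db))
```

Grouping the logged durations this way makes a bimodal pattern like the one above (either fast or very slow, rarely in between) easy to spot.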

Eventually, trial and error narrowed it down to a single line: the call to session_start(). Commenting it out eliminated the problem completely. It only remained to figure out why. Our first suspicion was that the CMS used session_set_save_handler() and overrode the session handlers incorrectly, but we found no trace of that function. Then we examined the session configuration settings. The directory for storing sessions (session.save_path) was /tmp. In principle, at this point it was already clear that this was most likely the cause: the scripts stall when PHP tries to purge expired sessions from that directory. What was not clear was why the cleanup ran so often when session.gc_divisor was 100. The answer turned up very quickly. gc_divisor works in tandem with gc_probability, which defaults to 1, but the CMS config set it to 40. That meant cleanup ran on 40 out of every 100 (4 out of 10) session starts, which explained the roughly 50% chance of a normal page load. Linking it all to the move to another server was then simple: on the new server, the /tmp directory was heavily cluttered, with far more files.
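The effect of these two settings can be checked with a quick Python sketch. PHP documents the chance that a session start also runs garbage collection as gc_probability / gc_divisor. The helper names below are hypothetical. The simulation only illustrates the ratio and does not reproduce PHP's internal random number generator.

```python
import random
from fractions import Fraction


def gc_chance(probability: int, divisor: int) -> Fraction:
    """Chance that one session start also triggers session GC
    (session.gc_probability / session.gc_divisor)."""
    return Fraction(probability, divisor)


def simulate(requests: int, chance: Fraction, seed: int = 1) -> int:
    """Count how many of `requests` session starts would run GC."""
    rng = random.Random(seed)
    p = float(chance)
    return sum(1 for _ in range(requests) if rng.random() < p)


default = gc_chance(1, 100)  # gc_probability = 1, gc_divisor = 100
cms = gc_chance(40, 100)     # gc_probability = 40 as set in the CMS config
print(default, cms)          # 1/100 2/5
print(simulate(10_000, cms)) # close to 4000 session starts run GC
```

With the CMS value, 2 out of every 5 session starts pay for a full cleanup pass. With the default, only 1 in 100 does.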

All the problems were solved by pointing session.save_path at a dedicated temporary directory and restoring the session garbage-collection settings (gc_probability and gc_divisor) to their defaults. Developers, always check these values in your projects.
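This sketch suggests why a dedicated save_path helps. As far as I understand PHP's default "files" session handler, its cleanup walks every entry in the save directory, picks out names starting with `sess_`, and deletes those whose modification time is older than session.gc_maxlifetime. The Python function below is a rough, hypothetical analogue of that loop, not PHP's actual code. It shows that each pass costs time in proportion to the total number of entries in the directory, including files that have nothing to do with sessions.

```python
import os
import time
from typing import Optional, Tuple

SESSION_PREFIX = "sess_"  # file-name prefix used by PHP's "files" session handler


def cleanup_session_dir(save_path: str, maxlifetime: int,
                        now: Optional[float] = None) -> Tuple[int, int]:
    """Rough analogue of a files-handler GC pass.

    Returns (entries_scanned, files_deleted). Every directory entry is
    visited, so a shared, cluttered /tmp makes each pass slow even if
    only a few of its files are sessions.
    """
    if now is None:
        now = time.time()
    scanned = deleted = 0
    with os.scandir(save_path) as entries:
        for entry in entries:
            scanned += 1
            if not entry.name.startswith(SESSION_PREFIX):
                continue  # not a session file, but reading it still cost time
            try:
                st = entry.stat()
            except FileNotFoundError:
                continue  # removed by another process in the meantime
            if entry.is_file() and st.st_mtime < now - maxlifetime:
                os.unlink(entry.path)
                deleted += 1
    return scanned, deleted
```

With a directory that holds only this site's sessions, `entries_scanned` stays close to the number of live sessions instead of the size of the whole shared /tmp.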

But one more thing is interesting. Suppose 50% of scripts use sessions, and 50% of those run with the default settings (a 1/100 chance of cleanup per session start). Then, on average, one request in 1 / (0.5 × 0.5 × 0.01) = 400 triggers the default session cleanup. At about 0.5 s per ordinary request, those 400 requests take 0.5 × 400 = 200 seconds. After that, one script launches the default cleaner of old sessions, which can run for 120 seconds if the folder is cluttered enough. So the server spends a substantial part of its time on useless work. Admins, be attentive too.
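The back-of-the-envelope estimate above can be written out with exact fractions. All inputs are the paragraph's assumptions: 50% of scripts use sessions, half of those keep the 1/100 default, an ordinary request takes about 0.5 s, and a cleanup pass takes 120 s. The model also naively treats requests as running one after another.

```python
from fractions import Fraction

# Inputs are the assumptions from the paragraph above, not measurements.
share_with_sessions = Fraction(1, 2)  # scripts that call session_start()
share_default_gc = Fraction(1, 2)     # of those, running with default GC settings
default_gc_chance = Fraction(1, 100)  # gc_probability / gc_divisor = 1/100
normal_run_time = Fraction(1, 2)      # seconds per ordinary request
gc_run_time = 120                     # seconds per cleanup on a cluttered folder

# Chance that an arbitrary request triggers the default session cleanup.
gc_per_request = share_with_sessions * share_default_gc * default_gc_chance
requests_per_gc = 1 / gc_per_request             # requests between cleanups
normal_time = requests_per_gc * normal_run_time  # seconds of useful work
busy_share = gc_run_time / (normal_time + gc_run_time)

print(requests_per_gc, normal_time, busy_share)  # 400 200 3/8
```

Under these assumptions, roughly 3/8 of the time in each cycle goes to session cleanup rather than to serving pages.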

P.S. We told the hoster right away that the problem had been fixed. A day later, when the load had already leveled off, we received a wonderful letter in which the hoster asked us to deal with the exorbitant load and threatened to disable the account. It's fine to have several managers and administrators in an office. But when none of them knows what the others are doing, that's bad.

P.P.S. I deliberately do not name the hoster.

Source: https://habr.com/ru/post/51995/
