
Site outages: how to organize effective support of web resources using third-party services

How can you tell from the outside whether your sites and servers are working? Can the check itself be wrong? Who should learn about a problem, and when, so that action is taken in time? I will try to answer these questions by looking in detail at the instant outage alerts of the HostTracker site monitoring service, along with possible scenarios for escalating alerts and assigning roles.



So, due to certain circumstances (usually unpleasant ones, sadly), you have decided that it would be good to have someone besides you and your team watching the site. But questions arise. Some you will have to answer for yourself: are we ready to get up at night for this site, how eagerly will colleagues respond to night-time SMS messages, how long can the site be allowed to stay down if something happens, and, of course, who is to blame. With the other questions we will try to help.

Is it reliable?


With a third-party monitoring service it is almost impossible to miss a problem. The exception is a site that is cached somewhere along the way, but in that case its visitors will see the cached copy too, right? Still, if you dig a little into the additional settings, you can find ways to make the check reliable and unambiguous.
An important parameter here is the monitoring interval. If the site is checked every half hour, be prepared to learn about a problem as much as half an hour after it starts.
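To put numbers on this, here is a minimal sketch of how the check interval translates into detection delay. It assumes checks run at a fixed interval and the outage begins at a random moment between two checks; the function name is made up for illustration and is not part of HostTracker.

```python
def detection_delay_minutes(interval_min: float) -> tuple[float, float]:
    """Return (average, worst-case) minutes between outage start and detection.

    Assumption: checks run exactly every interval_min minutes and the outage
    starts at a uniformly random moment between two consecutive checks.
    """
    average = interval_min / 2   # on average the outage starts mid-interval
    worst = interval_min         # outage starts right after a successful check
    return average, worst


for interval in (1, 5, 30):
    average, worst = detection_delay_minutes(interval)
    print(f"every {interval} min: average {average} min, worst case {worst} min")
```

Any delay configured before a notification is sent adds to these figures.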
And what about the opposite case: there is no problem, but I get woken up anyway? Or your beloved boss has his sleep disturbed?

I don't want to worry for nothing


A reasonable requirement. First, the verification algorithm cross-checks the result from multiple servers. Second, if short-term failures do occur that are too insignificant to count as real failures, the notification can be delayed until the situation becomes clear:



This means that the site will be checked again after 3 minutes, and the alarm is raised only if the problem has not resolved itself by then. Why can this happen? Network lag, a reboot of network or server hardware, maintenance on the server, peak load, or simply a ping time that suddenly grew a little. You never know. No hosting provider guarantees a 100% SLA yet. This way, short-term failures are filtered out.
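The two filters described here, cross-checking from several servers and re-checking after a delay, can be sketched roughly as follows. This is only an illustration: the majority rule, the function names, and the injectable `sleep` parameter are assumptions, not a description of how HostTracker is implemented.

```python
import time
from typing import Callable, Sequence

# A check returns True when the site answered correctly from one location.
Check = Callable[[], bool]


def is_down(checks: Sequence[Check]) -> bool:
    """Assumed rule: the site counts as down if most locations fail."""
    failures = sum(1 for check in checks if not check())
    return failures * 2 > len(checks)


def confirmed_outage(
    checks: Sequence[Check],
    recheck_delay_s: float = 180,          # the 3-minute delay from the text
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Report an outage only if it is still there after the delay."""
    if not is_down(checks):
        return False
    sleep(recheck_delay_s)                 # wait before raising the alarm
    return is_down(checks)                 # short glitches disappear here
```

Passing `sleep` as a parameter makes it possible to try the logic without actually waiting three minutes.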

What is also important and interesting: this delay can be set individually for each contact. For example, here is a perfectly workable scheme:


That is, everything can be tuned so that the motivating kicks and Valuable Guidance start arriving exactly at the moment when you really cannot manage without them.
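A per-contact delay of this kind amounts to a simple escalation table. The sketch below is hypothetical: the contact names and delays are invented for illustration and do not reproduce any actual HostTracker configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    name: str
    delay_min: int  # minutes of continuous downtime before this contact is alerted


def contacts_to_notify(contacts: list[Contact], downtime_min: int) -> list[str]:
    """Names of everyone whose personal delay has already elapsed."""
    return [c.name for c in contacts if downtime_min >= c.delay_min]


# Hypothetical escalation chain: the further up, the longer the delay.
ESCALATION = [
    Contact("on-call admin", 0),
    Contact("team lead", 15),
    Contact("CTO", 60),
]

print(contacts_to_notify(ESCALATION, 20))
```

With this table the admin hears about the problem immediately, and management is involved only if the outage drags on.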

By sleeping soundly in bed, you help society


Yes, there are companies and people who value employees' personal time, and this is very commendable. For such cases, a work schedule can be configured:



This is very convenient if there is a “night admin” position (not necessarily an administrator: a non-IT specialist can reboot the server too) or if, for example, there are offices in different time zones and areas of responsibility can be divided by time.
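Dividing responsibility by time comes down to checking which shift covers the moment of the alert. Here is a minimal sketch, assuming each contact has a daily shift in a fixed UTC offset; the shift names, hours, and offsets are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone


@dataclass(frozen=True)
class Shift:
    name: str
    start: time            # local start of the shift
    end: time              # local end of the shift (exclusive)
    utc_offset_hours: int  # assumption: fixed offset, no daylight saving

    def covers(self, moment_utc: datetime) -> bool:
        """True if the timezone-aware moment falls inside this shift."""
        tz = timezone(timedelta(hours=self.utc_offset_hours))
        local = moment_utc.astimezone(tz).time()
        if self.start <= self.end:
            return self.start <= local < self.end
        # The shift crosses midnight, e.g. 21:00 to 09:00.
        return local >= self.start or local < self.end


def on_duty(shifts: list[Shift], moment_utc: datetime) -> list[str]:
    return [s.name for s in shifts if s.covers(moment_utc)]


# Hypothetical schedule: a day admin and a night admin in UTC+2.
SHIFTS = [
    Shift("day admin", time(9), time(21), 2),
    Shift("night admin", time(21), time(9), 2),
]

print(on_duty(SHIFTS, datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc)))
```

Offices in different time zones fit the same model: each gets its own offset and daytime hours.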

Wake up at any cost


For especially critical systems there is a repeat-alert function. The alert is repeated for as long as the site, server, or service is down, or until someone logs in to the account and changes the settings. Repeated voice calls are also possible. That is, not an SMS that beeps just once, but persistent redialing until someone picks up the phone.
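The repeat logic can be pictured as a loop that keeps alerting while the outage lasts and nobody has reacted. This is a rough sketch only: the callbacks, the repeat interval, and the `max_repeats` safety cap are assumptions made for the example, not HostTracker parameters.

```python
import time
from typing import Callable


def repeat_alert(
    is_down: Callable[[], bool],
    is_acknowledged: Callable[[], bool],   # stand-in for "someone reacted"
    send_alert: Callable[[], None],        # e.g. place a voice call
    repeat_every_s: float = 300,
    max_repeats: int = 100,                # safety cap, an assumption
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Keep alerting until recovery or acknowledgement; return alerts sent."""
    sent = 0
    while sent < max_repeats and is_down() and not is_acknowledged():
        send_alert()
        sent += 1
        sleep(repeat_every_s)
    return sent
```

A single SMS corresponds to `max_repeats=1`; persistent redialing corresponds to a large cap and a short interval.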



What if you still miss something?


You can always choose several notification methods and set things up so that every sneeze goes to email, while the really important events are reported through faster channels.
In addition, everything is available in the logs:



Similar scenarios are widely used by our clients and are refined according to their wishes. So, as always, we welcome all comments and suggestions.

Source: https://habr.com/ru/post/344172/

