I was woken by an SMS at three in the morning.
My site went down for three minutes and came back up.
And I could not get back to sleep.
A true story
As many of you know, HostTracker is a website health monitoring system. One of its main functions is to promptly inform the user about problems. What matters is both the promptness of notifications and an acceptable level of “detail.” If alerts are sent for every "sneeze", the important information gets lost in the stream.
We provide several mechanisms that help get the right alerts to the right people:
- Alerts are split into several groups by degree of criticality;
- No notifications are sent for short-term interruptions;
- The administrator is notified of a problem promptly;
- Management is notified in the event of a long-term failure;
- Free channels (email, gtalk) are used first, followed by paid ones (SMS or a phone call);
- Each contact can have working hours set, during which it accepts alerts.
Types of alerts
There are three types of notifications:
- The site went "down";
- The site is "still down";
- The site is back "up."
The "down" and "up" notifications are self-explanatory. The "site is still down" notification is sent after every unsuccessful check, but only for confirmed failures. We described the failure confirmation algorithm in the article “Elimination of false positives”.
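The three alert types can be pictured as transitions in a small state machine. The following is only a minimal sketch, not HostTracker's actual implementation: the `Alert` enum and the `alerts_for_checks` function are hypothetical names, and every failed check is assumed to be already confirmed by the algorithm from the article mentioned above.

```python
from enum import Enum


class Alert(Enum):
    DOWN = "down"
    STILL_DOWN = "still down"
    UP = "up"


def alerts_for_checks(results):
    """Map a sequence of check results to alerts.

    `results` is an iterable of booleans (True = check passed).
    Assumption: each failure here is already a confirmed failure.
    """
    alerts = []
    was_down = False
    for ok in results:
        if not ok:
            # First failure opens the incident, later ones repeat it.
            alerts.append(Alert.STILL_DOWN if was_down else Alert.DOWN)
            was_down = True
        elif was_down:
            # First successful check after a failure closes the incident.
            alerts.append(Alert.UP)
            was_down = False
    return alerts
```

For the check sequence passed, failed, failed, passed this yields one "down", one "still down" and one "up" alert.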
For each site-contact pair, you can enable or disable each type of alert. This can be configured both in the properties of the contact and in the general “matrix” on the “Subscribe to Alerts” tab.
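Conceptually, such a matrix is a mapping from a site-contact pair to the set of enabled alert types. Here is a rough sketch of that idea; the site name, the contact names and the `is_subscribed` helper are invented for illustration and do not reflect how HostTracker stores this internally.

```python
# Hypothetical subscription matrix: (site, contact) -> enabled alert types.
SUBSCRIPTIONS = {
    ("example.com", "admin-email"): {"down", "still_down", "up"},
    ("example.com", "admin-sms"): {"down", "up"},
    ("example.com", "manager-sms"): {"down"},
}


def is_subscribed(site, contact, alert_type, matrix=SUBSCRIPTIONS):
    """Return True if this contact wants this alert type for this site."""
    # Pairs missing from the matrix receive nothing.
    return alert_type in matrix.get((site, contact), set())
```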

Escalation and alert level
Suppose two people are responsible for the site: an administrator and a manager.
Let's try to implement the following scenario:
- When the site goes down, we immediately send an email to the administrator;
- If the site does not come back up within 15 minutes, we send an SMS to the administrator;
- If the site is down for more than an hour, we send an SMS to the manager.
We add contacts for these users. When adding a contact, pay attention to the "Alert Delay" field:

We now have three contacts with the following delays:
- Administrator (email) - no delay;
- Administrator (SMS) - 15-minute delay;
- Manager (SMS) - 1-hour delay.
With this configuration, the administrator will receive all failure notifications by email, but SMS messages will arrive only if the site has been down for more than 15 minutes. The manager will receive an SMS only for major failures lasting more than an hour.
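The escalation logic boils down to comparing the current downtime with each contact's delay. The sketch below illustrates this under stated assumptions: the `Contact` class, the `CONTACTS` list and `contacts_to_notify` are hypothetical names rather than part of HostTracker, and a contact is treated as eligible as soon as the downtime reaches its delay.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Contact:
    name: str
    channel: str
    delay: timedelta  # the "Alert Delay" value for this contact


# The three contacts from the example above.
CONTACTS = [
    Contact("Administrator", "email", timedelta(0)),
    Contact("Administrator", "sms", timedelta(minutes=15)),
    Contact("Manager", "sms", timedelta(hours=1)),
]


def contacts_to_notify(downtime, contacts=CONTACTS):
    """Return the contacts whose delay has already elapsed."""
    return [c for c in contacts if downtime >= c.delay]
```

After 5 minutes of downtime only the administrator's email qualifies, after 20 minutes the administrator's SMS is added, and after 90 minutes all three contacts are notified.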
Setting a work schedule for a contact
Suppose that a single administrator can no longer cope, and we have hired a second one. The first administrator works the first half of the week, and the second works the second half. Accordingly, notifications need to go to whichever admin is “on shift”.
To set up this scenario, use the “Set contact hours” field in the contact parameters:

In this case, the first admin will receive SMS alerts from Monday to Thursday inclusive.
In addition, you can split notifications between employees by time of day, for example, to have a night admin and a day admin.
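A contact's working hours can be thought of as a set of weekdays plus a time window. The following sketch shows one possible way to model this; the `Schedule` class and its fields are assumptions made for illustration, not the actual HostTracker settings format. A window whose start is later than its end is treated as wrapping past midnight, and the weekday is taken from the calendar day of the alert.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Schedule:
    weekdays: frozenset      # 0 = Monday ... 6 = Sunday
    start: time = time.min   # beginning of the accepted window
    end: time = time.max     # end of the accepted window

    def accepts(self, moment: datetime) -> bool:
        """Return True if an alert at `moment` should reach this contact."""
        if moment.weekday() not in self.weekdays:
            return False
        now = moment.time()
        if self.start <= self.end:
            return self.start <= now <= self.end
        # Window wraps past midnight, e.g. 22:00 to 08:00.
        return now >= self.start or now <= self.end


# First admin: Monday to Thursday; second admin: Friday to Sunday.
first_admin = Schedule(frozenset({0, 1, 2, 3}))
second_admin = Schedule(frozenset({4, 5, 6}))
# A night admin working every day from 22:00 to 08:00.
night_admin = Schedule(frozenset(range(7)), time(22, 0), time(8, 0))
```

With these schedules, an alert at 03:00 on a Monday reaches the first admin and the night admin, while the same alert on a Friday goes to the second admin instead of the first.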
Conclusions
Using fairly simple mechanisms, we can cover most user scenarios by fine-tuning notifications.
If you have questions, remarks, or ideas, please share them in the comments.
Happy New Year! Good uptime to you and your sites!