
A few tips to automate data center monitoring. Part 1



Monitoring data center infrastructure is not an easy task, and automation is often used to simplify it. It is convenient, after all, to have every notification from the monitoring system show up on your screen. We have written before that automating everything you can is no bad thing. It is a rather complicated problem, but a solvable one. Why solve it at all, by automating the discovery of new devices, connections and software, and by creating scenarios for how the system responds to triggers as they fire?

Because people are lazy, and in many cases automation simply does the job better. But there are pitfalls. What could go wrong once such a system is in place? It seems that the main problems are solved and no failure will go unnoticed. In fact, some important issues often remain unresolved, and they are very common. Two of them are discussed below.

Problem 1: error notification is not everything


Strategy is finding broccoli in the store and buying it. Tactics, in this case, is the ability to persuade your children to eat it once it is cooked.
Thomas LaRock, SolarWinds
Before delving into automation, whether that means automatic problem detection, report delivery or an action scenario for an unforeseen situation, you need to take care of one critical thing: the so-called DPR cycle, which stands for Detection, Prevention, Response. In other words, this is the procedure for detecting a problem, preventing its recurrence, and having data center staff respond when a problem does occur.
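
As a rough illustration, the DPR cycle can be modeled as a record that is only considered closed once all three stages are filled in. This is a minimal sketch, not a description of any particular monitoring product: the `Incident` class, its field names and the sample incident text are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Incident:
    """One alert tracked through Detection, Prevention, Response.

    All names here are illustrative assumptions, not a real product's API.
    """
    description: str                   # Detection: what the monitoring saw
    response: Optional[str] = None     # Response: what was done right away
    root_cause: Optional[str] = None   # result of the follow-up analysis
    prevention: Optional[str] = None   # Prevention: what stops a repeat

    def is_closed(self) -> bool:
        # Closed only when the alert was handled, the cause is known,
        # and a preventive measure has been recorded.
        return all([self.response, self.root_cause, self.prevention])


def open_incidents(incidents: List[Incident]) -> List[Incident]:
    """Return the incidents that still need work at some DPR stage."""
    return [i for i in incidents if not i.is_closed()]


if __name__ == "__main__":
    incident = Incident("disk usage above 90% on a storage node")
    incident.response = "old logs rotated automatically"
    # Still open: nobody has recorded a root cause or a preventive measure.
    print(incident.is_closed())  # False
```

The point of the design is that an automatic response alone never closes the record; someone still has to fill in the cause and the preventive measure.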

Let us dwell on errors and the messages that report them. Say the support team has received such a message from the automatic warning system. Great. Now we need to understand why the error occurred and find a way to prevent it from happening again.

When building an automatic error notification service, keep in mind that the notification is only the beginning. A lot of hard work remains to analyze the situation and find the cause of the failure. After that, you need to create additional test modules that detect the situation that has already occurred. We may be sure it will never happen again, but anything can happen.

An automatic reaction to a warning system notification lets you relax a little, because automation takes care of the immediate response. But engineers still have to understand why the problem arose, and automation is often incapable of that.
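
The idea of adding a test for a situation that has already happened can be sketched as a small registry of checks, where each analyzed incident contributes one more check. This is only a sketch under stated assumptions: the `CheckRegistry` class, the metric names and the thresholds are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, List

# A check receives a snapshot of metrics and returns True when the
# problem it looks for is present. Metric names are hypothetical.
Check = Callable[[Dict[str, float]], bool]


class CheckRegistry:
    """Holds named checks and runs all of them against a metrics snapshot."""

    def __init__(self) -> None:
        self._checks: Dict[str, Check] = {}

    def add(self, name: str, check: Check) -> None:
        self._checks[name] = check

    def run(self, metrics: Dict[str, float]) -> List[str]:
        """Return the names of all checks that fire, in registration order."""
        return [name for name, check in self._checks.items() if check(metrics)]


registry = CheckRegistry()

# The check that existed before the incident (threshold is an assumption).
registry.add("disk_full", lambda m: m.get("disk_used_pct", 0.0) >= 95.0)

# Added after analysis of a past incident, assuming the root cause turned
# out to be runaway log growth: it fires before the disk actually fills up.
registry.add("log_growth", lambda m: m.get("log_growth_mb_per_hour", 0.0) > 500.0)
```

With these two checks, a snapshot such as `{"disk_used_pct": 50.0, "log_growth_mb_per_hour": 800.0}` fires only `log_growth`, so the known situation is caught before it turns into a full disk.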



Problem 2: deploying a monitoring automation system


Before introducing an automation system, you must have a plan for what the system should be able to do. The plan has to be thought through carefully so that problems do not surface later. It should provide for the following:


If you follow these tips, you can spot the shortcomings of the automation system and work out how to correct them before a major failure occurs. To make sure the tools you use in the automation system are truly relevant, keep discussing the next stages of the work with the team. What do the engineers complain about most? That is what needs to be addressed first.

If everything works out, you can spare yourself and the team the recurring situations that eat up time, which is always in short supply. What is described here is only a small part of the work of implementing an automated data center monitoring system. Part of it has already been covered; we plan to publish the rest in the near future.

Source: https://habr.com/ru/post/317632/

