
How to monitor a business process without getting distracted by noise




By my estimate, there are hundreds of monitoring systems in the world, if not more (if you have a more accurate, well-founded figure, please share it in the comments). They include cloud and on-premise, commercial and free systems for networks, infrastructure, and so on, on every front. Some of them support building service-resource models. These are tree-like structures whose nodes are the elements of the business system: web servers, databases, application servers, switches, and many other scary words. Each child element affects its parent. For example, if the memory usage threshold is exceeded on some server, an event is generated (say, with Critical severity, shown in red), and it affects the objects higher up in the service structure.
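A minimal sketch of such a model, assuming the common "worst child wins" propagation rule (real products also offer other rules, such as thresholds on the share of failed children). The severity scale and node names below are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    """Assumed severity scale, ordered so that max() picks the worst one."""
    OK = 0
    WARNING = 1
    MAJOR = 2
    CRITICAL = 3


@dataclass
class Node:
    """A node of a service-resource model: a service, system, or infrastructure element."""
    name: str
    children: list[Node] = field(default_factory=list)
    own_severity: Severity = Severity.OK  # severity of events raised on this node itself

    def status(self) -> Severity:
        # Worst-of-children propagation: a node is as bad as its worst descendant.
        return max([self.own_severity, *(c.status() for c in self.children)])


# Hypothetical service tree.
db = Node("db-server")
app = Node("app-server")
sms_gw = Node("sms-gateway")
service = Node("content service", children=[Node("backend", children=[app, db]), sms_gw])

db.own_severity = Severity.CRITICAL  # memory threshold exceeded on one server
print(service.status().name)  # CRITICAL
```

A single leaf event is enough to turn the whole service red, which is exactly the behavior discussed next.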


Many organizations tie the availability of a business system to the availability of a business process. If we calculated the SLA that way, it would turn out that the availability of the IT system dropped to zero (while the memory threshold was exceeded on that server) and the business process stopped. But that is not the case! At the bottom of the tree there may be a cluster, or memory filled to the brim may simply be normal operation for that system. In short, the task is this: how do you calculate the availability of a business process correctly without reacting to non-critical events from the components of business systems? Let's take a closer look.
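SLA availability is usually computed as 1 - (downtime / reporting period). A minimal sketch, assuming downtime is recorded as (start, end) intervals. The arithmetic is the same whatever the intervals come from (red status at the root of the service tree, or incidents opened by the business). Choosing the right source is the real question here:

```python
from datetime import datetime, timedelta


def availability(outages: list[tuple[datetime, datetime]],
                 period_start: datetime, period_end: datetime) -> float:
    """Share of the period not covered by any (start, end) outage interval.

    Overlapping outages are merged so the same downtime is not counted twice.
    """
    period = period_end - period_start
    # Clip every outage to the reporting period and drop those outside it.
    clipped = sorted(
        (max(s, period_start), min(e, period_end))
        for s, e in outages
        if e > period_start and s < period_end
    )
    downtime = timedelta()
    cur_start, cur_end = None, None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # A new, non-overlapping outage starts: close the previous one.
            if cur_end is not None:
                downtime += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlaps the current outage: extend it.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        downtime += cur_end - cur_start
    return 1 - downtime / period


# Hypothetical: a 30-day period with two overlapping "red" intervals (3 hours total).
start = datetime(2018, 3, 1)
end = start + timedelta(days=30)
outages = [
    (datetime(2018, 3, 5, 10), datetime(2018, 3, 5, 12)),
    (datetime(2018, 3, 5, 11), datetime(2018, 3, 5, 13)),
]
print(round(availability(outages, start, end), 4))  # 0.9958
```

If those 3 hours were only a memory-threshold event, the naive SLA has penalized the process for time it actually kept working.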


For business and IT to communicate, the availability of services and business processes has to be assessed somehow. There is a very simple way: when an incident comes in from the business, start counting downtime; when the work is finished, stop the counter. This is a correct way to calculate it. But I want more. Let me fantasize a little:


Imagine a mobile operator and a content-selling platform. The user, an alpha male, comes across an SMS from the operator with his balance. At the end of the message he sees an offer to use the "Dating" service for only 99.99 rubles a day. In a fleeting rush of testosterone, he dials a short number and activates the service. The money, of course, is debited from his account immediately. Half an hour passes, then an hour, and still no offers to meet arrive. The rush fades, the losses are small, and the user gives up on it. But now he has learned that such services mean losing money. The operator loses revenue.


The story is fictional, but the situation itself is quite realistic. To reduce such losses and give IT the much-touted proactivity, it is desirable to see the availability and performance of business processes not only from the business side but from another angle as well.


A user trying a service for the first time is not always interested in telling the provider that it does not work.

The first idea for monitoring a business process is to monitor the related business systems and assess the impact of their events on it. Indeed, in general each business process depends on several business systems. If the process is long, one set of systems may affect one part of it and another set another part. The range of possible process states thus widens to include situations where the input works and the output is dead. For example, a bank may accept loan applications but be unable to issue loans. How do you determine the status of the process in such a case? Is it working or not?
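To make the problem concrete, here is a minimal sketch of this first idea, assuming a hypothetical loan process whose stages each depend on their own set of systems. Even this toy model produces a third state, "partially available", which a single up/down flag cannot express:

```python
# Hypothetical loan process: each stage depends on its own set of systems.
STAGES = {
    "accept application": {"web-portal", "crm"},
    "scoring": {"crm", "scoring-engine"},
    "issue loan": {"core-banking", "payment-gateway"},
}


def process_state(failed_systems: set[str]) -> tuple[str, dict[str, bool]]:
    """Derive per-stage availability from system failures (the "first idea")."""
    # A stage is up only if none of the systems it depends on has failed.
    stages = {stage: not (deps & failed_systems) for stage, deps in STAGES.items()}
    if all(stages.values()):
        overall = "available"
    elif not any(stages.values()):
        overall = "unavailable"
    else:
        overall = "partially available"  # e.g. the input works, the output is dead
    return overall, stages


overall, stages = process_state({"payment-gateway"})
print(overall)  # partially available
print(stages)   # {'accept application': True, 'scoring': True, 'issue loan': False}
```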


The second idea is more complicated. It took us a while to figure out how to separate two entities from each other: the process and the system. We tried adding impact coefficients, tuning the weights of the links between the process and the systems, and a few other tricks. In the end, we became convinced that assessing the state of a business process does not require taking the load of some processor into account at all; a completely different set of metrics is needed.


Only metrics that characterize the success and availability of the stages of a business process give a real picture of it. The result is two isolated systems with their own events and availability, but shown in a single interface. This is the main insight. If we see that one of the stages of the business process is not working as it should, that is a reason to look into the associated IT systems. We consider the influence of a system on the process unreliable, but we kept the option for the duty shift or the owner of the process or system to view this link for diagnostics. The whole point of this "separate the flies from the cutlets" approach (a Russian idiom for not mixing unrelated things) is that the business does not get nervous over infrastructure events. Dashboards turn red only in truly critical situations, and if something does happen, the technical staff knows where to dig. The wolves are fed and the sheep are safe.
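A minimal sketch of the idea, assuming each stage's health is measured by the share of successful transactions in a time window. The threshold, stage, and system names are hypothetical. The stage status comes from business metrics alone, and events from the linked IT systems are surfaced only as diagnostics once the stage is already failing:

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One step of a business process, judged only by its own business metrics."""
    name: str
    min_success_rate: float  # hypothetical threshold, e.g. 0.95 of transactions succeed
    linked_systems: list[str] = field(default_factory=list)  # for diagnostics only


def stage_status(stage: Stage, attempts: int, successes: int) -> str:
    """Status from the success rate of the stage's transactions in the window."""
    if attempts == 0:
        return "no data"
    return "ok" if successes / attempts >= stage.min_success_rate else "failed"


def where_to_dig(stage: Stage, status: str, system_events: dict[str, str]) -> dict[str, str]:
    """Show related system events only when the stage itself is failing.

    Infrastructure events never change the stage status; they only help
    the duty shift decide where to look.
    """
    if status != "failed":
        return {}
    return {s: system_events[s] for s in stage.linked_systems if s in system_events}


activation = Stage("service activation", 0.95, ["sms-gateway", "billing", "content-db"])
events = {"content-db": "CRITICAL: memory usage 97%", "billing": "OK"}

status = stage_status(activation, attempts=1000, successes=990)
print(status, where_to_dig(activation, status, events))  # ok {}

status = stage_status(activation, attempts=1000, successes=600)
print(status, where_to_dig(activation, status, events))
# failed {'billing': 'OK', 'content-db': 'CRITICAL: memory usage 97%'}
```

The critical memory event on `content-db` does not turn the stage red while transactions succeed. It only shows up as a lead once the stage itself fails.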


Build two independent monitoring loops: one for business processes and one for business systems. However, those responsible for a business process should still be able to look at the related IT systems.

Now, here is what is needed in the general case to implement this approach:


● determine the company's set of processes (what exactly we want to monitor);
● determine how key IT metrics affect these processes (for example, the availability of a channel without which 50% of the business stops working and the retail director starts calling);
● break that impact down by individual systems, and those in turn by infrastructure, and so on;
● implement the two-loop model described above: monitoring of the business process and key transaction events, plus diagnostic information from IT systems and infrastructure.
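One possible reading of steps 2 and 4, sketched in Python. This is not the author's implementation. It assumes that only IT metrics explicitly marked as key in step 2 (with a hypothetical share of the business depending on them) may reach the business loop, and that everything else stays in the IT loop as diagnostics. The metric names and the 0.1 threshold are made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str    # metric or system that raised the event
    severity: str  # e.g. "CRITICAL", "WARNING"
    message: str


# Step 2: key IT metrics and the (hypothetical) share of the business that depends on them.
KEY_METRICS = {
    "channel-availability": 0.5,  # without this channel 50% of the business stops
}


def route(event: Event, business_threshold: float = 0.1) -> str:
    """Decide which monitoring loop an event belongs to."""
    impact = KEY_METRICS.get(event.source, 0.0)
    if event.severity == "CRITICAL" and impact >= business_threshold:
        return "business"
    # Diagnostics: visible to IT and process owners, not on business dashboards.
    return "it"


print(route(Event("channel-availability", "CRITICAL", "link to retail stores down")))  # business
print(route(Event("srv-42.memory", "CRITICAL", "memory usage 98%")))  # it
```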


If your IT department manages to set up a similar scheme, consider that you have taken the first step toward fine-tuning your relationship with the business. If not, keep in mind that most of the effort in implementing this approach goes into analyzing business processes. We will talk about our experience with that part next time.


Author: Anton Kasimov, monitoring systems architect at Technoserv.



Source: https://habr.com/ru/post/350954/

