
IBM Tivoli Netcool: How do you draw and animate a model of IT services? And what is visual RCA?

It is now hard to say who first came up with the idea of displaying service models in the interfaces of IT management systems. A list of current alarm messages and a network map were plainly inadequate for showing how the parts of a heterogeneous infrastructure affect one another, so vendors started looking for other ways to present the picture. Since the proper operation of each object affects the quality of the service to some degree, it is logical to picture the model as a hierarchical inverted tree: at the top is a symbol representing the service itself, and below it, in increasing detail, are the groups and components that deliver it. The objects at the bottom can be very diverse: network and server hardware, operating systems, databases, application programs; in short, anything that can emit alarm messages. When a server malfunctions and a critical failure occurs, the corresponding object in the service model turns red. How strongly the status of a given element affects the service as a whole is configured by the creator of the model, according to their understanding of that influence. The idea appealed to customers, above all to the heads of operations departments, whose responsibility typically spans a wider range of heterogeneous technologies than the concerns of individual specialists.


Figure 1. TBSM interface in Tivoli Integrated Portal
Over the years since the first implementations, the functions of service management systems have expanded significantly. From auxiliary pictures that light up when incidents occur, they have turned into real service management platforms, in some cases able to build service models automatically and to monitor SLA compliance in real time. Naturally, vendors have achieved different degrees of success and market recognition.
A good example, even a textbook case, of such rapid functional development is Tivoli Business Service Manager (TBSM) (see Figure 1).

Without going into a technical description of TBSM, we list its main features:
- Because it builds on Netcool OMNIbus, a comprehensive alarm collection and processing system (Fault Management System), TBSM can connect to objects from every technology domain, hardware and software alike, and display their behavior in its interface. Accounting for states this way in the service model answers the question of what exactly is going on inside the working components; it gives the operator on duty precise data about the malfunction, which is the key to fixing it quickly. This is technical-level information.
- Virtually any data that is stored in a DBMS and changes over time can take part in determining the state of a service model element. For example, TBSM can look into a database table in which specialized business-level software constantly updates the number of transactions in the last minute. It compares this value with the configured thresholds and changes the status of the service object in the model. As a rule, information at this level is business-oriented.
- For any object of the service model, you can enable SLA compliance monitoring, and this is real-time monitoring rather than after-the-fact reporting. There are three types of SLA: by the duration of a single outage, by the number of failures (or duration-SLA violations) within a time window, and by the total time of all service outages during the reporting period. All three types can be used together, in any combination. The service symbols carry visual indicators not only for the current state of the service but also, separately, for compliance with each type of SLA. In addition, you can set the price of SLA non-compliance directly in rubles. Operators and managers see in real time in the TBSM interface how many minutes are left before an SLA violation, what level of performance would result if the problem were fixed right now, and how penalties accrue after a violation. This is convenient for prioritizing the work of eliminating IT incidents. Naturally, in addition to the indicators, there is historical reporting that allows a detailed “debriefing” after an SLA violation. This capability was reflected in the product's very first name: SLAM, or SLA Manager.
- Later the product was called RAD, or Realtime Active Dashboards; the name emphasized the ability to build personal dashboards that show the situation in real time. These views can include attractive service models, alarm windows in the context of the selected object, summaries, a convenient service navigation tree with dynamic status indication and numerical values, a tool that plots and compares service status changes over time (Timewindow Analyzer) and, finally, a library of historical reports. On individual canvases (Custom canvas), dashboards can use “gauges” such as speedometers and thermometer bars (Figure 2). Service elements can also be shown as blocks with the numerical values of the parameters of interest. This is the presentation aspect of using TBSM.


Figure 2. Auxiliary TBSM toolbar indicators

- Building the service model can be automated, and its updates can be tied to changes in external data. The OMNIbus core, called the ObjectServer, is essentially a database in which alarm messages are stored and processed in real time. TBSM works with the ObjectServer and with records in external databases in similar ways. If an alarm message, or a row in an external database table, contains all the information needed to create an object and place it correctly in the service model, you can configure autopopulation of the service model. Imagine that an alarm message arrives from an object unknown to TBSM: based on the message fields, TBSM can decide which template (object type) the object belongs to, compose the name of the object being created, and determine its parent object in the model. Similarly, a new object can appear when a record about it is added to an external database. The model can in effect be synchronized with external databases. This feature is often used to link a service model to an inventory database or CCMDB. Without it, it is impossible to keep models of changing IT systems with a large number of elements up to date.
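
The autopopulation logic from the last item in the list can be illustrated with a small sketch. This is not TBSM's rule language or API; it is a plain Python approximation. The alarm field names (`Class`, `Node`, `Location`), the `RULES` table, and the naming patterns are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ServiceNode:
    name: str
    template: str                 # object type in the service model
    parent: Optional[str] = None  # name of the parent object, if any


# Hypothetical rules: value of the alarm's "Class" field ->
# (template to assign, pattern for the parent object's name).
RULES = {
    "router": ("NetworkDevice", "Network-{Location}"),
    "oracle": ("Database", "Databases-{Location}"),
}

REQUIRED_FIELDS = ("Class", "Node", "Location")


def autopopulate(model: Dict[str, ServiceNode],
                 alarm: Dict[str, str]) -> Optional[ServiceNode]:
    """Create a model object from an alarm if the alarm carries enough data."""
    if not all(key in alarm for key in REQUIRED_FIELDS):
        return None               # message lacks the fields needed for placement
    rule = RULES.get(alarm["Class"])
    if rule is None:
        return None               # no template matches this kind of object
    template, parent_pattern = rule

    name = f"{alarm['Node']}@{alarm['Location']}"   # composed object name
    if name in model:
        return model[name]        # object already known, nothing to create

    parent_name = parent_pattern.format(**alarm)
    # Create the parent group on demand so the new object has a place in the tree.
    model.setdefault(parent_name, ServiceNode(parent_name, "Group"))
    node = ServiceNode(name, template, parent=parent_name)
    model[name] = node
    return node
```

Calling `autopopulate(model, {"Class": "router", "Node": "r1", "Location": "MSK"})` on an empty `model` would add both the object `r1@MSK` and its parent group `Network-MSK`. The same function could be fed rows from an inventory table instead of alarms, which mirrors the synchronization with external databases described above.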
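
The three SLA types mentioned in the list above (single-outage duration, number of failures in a time window, and cumulative downtime over the reporting period) can be sketched as simple arithmetic. The sketch below is an assumption-laden illustration, not TBSM's actual calculation: the `SlaPolicy` fields, the linear per-minute penalty, and the use of minutes as the time unit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Outage = Tuple[float, float]  # (start_minute, end_minute) of a finished outage


@dataclass
class SlaPolicy:
    max_outage_min: float          # type 1: longest allowed single outage
    max_failures: int              # type 2: allowed outages inside the window
    window_min: float              # length of the sliding window for type 2
    max_total_downtime_min: float  # type 3: downtime budget for the period
    penalty_per_min: float         # assumed price of each minute over budget


def evaluate(policy: SlaPolicy, outages: List[Outage], now: float) -> Dict[str, object]:
    """Check all three SLA types and compute remaining budget and penalty."""
    durations = [end - start for start, end in outages]
    total = sum(durations)
    # Outages that started inside the sliding window ending at `now`.
    recent = [o for o in outages if o[0] >= now - policy.window_min]
    over_budget = max(0.0, total - policy.max_total_downtime_min)
    return {
        "duration_ok": all(d <= policy.max_outage_min for d in durations),
        "count_ok": len(recent) <= policy.max_failures,
        "cumulative_ok": total <= policy.max_total_downtime_min,
        "minutes_left": max(0.0, policy.max_total_downtime_min - total),
        "penalty": over_budget * policy.penalty_per_min,
    }
```

With a policy of `SlaPolicy(30, 2, 1440, 60, 100.0)` and two outages of 20 and 45 minutes, the duration and cumulative checks fail while the count check still passes, and the assumed penalty covers the 5 minutes over budget. A real-time indicator would additionally have to account for an outage that is still in progress, which this sketch leaves out.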

The need for a wide range of “sensors” that assess the facilities delivering the services was discussed above. No less important and useful are tools for objectively monitoring service quality from the user's point of view: generators of synthetic requests to the service that assess its quality. In Tivoli, this is TCAM (Tivoli Composite Application Manager). The logic of using the two types of “sensors” together is very simple. For example, TCAM records an unsatisfactory response time of a service, or even its outright failure, and reports this to OMNIbus as a critical alarm message in which the object is not a device or server but the service itself.
In TBSM, these alarm messages are tied directly to the services at the top of the models. At the same time, the “field sensors” detect malfunctions in the infrastructure and send their own messages. TBSM ties these messages to the underlying elements of the model, then calculates and displays how the impact propagates upward through the model topology. The service model clearly shows the problem situation and, most importantly, its source. Descending the tree from the “reddened” service and following the color indication, the specialist quickly arrives at the most probable cause. Notably, TBSM itself marks with asterisks the objects that are the logical causes of the problem. The result is a kind of visual root cause analysis (RCA).
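
The upward propagation of status and the descent to the probable cause can be sketched as follows. This is a simplified illustration, not TBSM's algorithm: it assumes a "worst child wins" rule, whereas in TBSM the degree of influence is configured by the model's author. The `Node` class and the `root_causes` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List

GREEN, YELLOW, RED = 0, 1, 2


@dataclass
class Node:
    name: str
    own_status: int = GREEN   # status from alarms bound directly to this node
    children: List["Node"] = field(default_factory=list)

    def status(self) -> int:
        """Displayed status: the worst of the node's own and its children's."""
        return max([self.own_status] + [c.status() for c in self.children])


def root_causes(node: Node) -> List[str]:
    """Follow the worst-coloured branches down and return the deepest culprits."""
    worst = node.status()
    if worst == GREEN:
        return []
    bad_children = [c for c in node.children if c.status() == worst]
    if not bad_children:
        return [node.name]    # nothing below explains the status: this is the cause
    causes: List[str] = []
    for child in bad_children:
        causes.extend(root_causes(child))
    return causes
```

For a service marked red by a synthetic check, with a red database and a yellow web server underneath a "servers" group, `root_causes` would return only the database, which corresponds to the objects that the interface marks with asterisks. If no child is as bad as the service itself, the function returns the service, meaning the infrastructure alarms do not explain the problem.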

In conclusion, since TBSM belongs to the Netcool family, it meets the requirements for carrier-class software and can be used in business-critical OSS systems. It supports fault-tolerant configurations, load balancing and scaling, external authorization and single sign-on, and it works within a single portal alongside the OMNIbus Web GUI and Tivoli Network Manager interfaces. Tivoli Integrated Portal maintains contextual links between these products, which lets you build convenient tool environments with quick transitions between monitoring contexts: network and telecommunications equipment, servers and their operating systems, storage and databases, web servers and application servers and, finally, services as objects of monitoring.

Source: https://habr.com/ru/post/77607/

