
How we built the “Fault Tree”

Hello. I have worked for a fairly long time at a company that builds corporate information systems. In this article I want to share a partly negative experience; perhaps someone will find it useful to learn from the “rakes” other people have stepped on.
One of the more interesting tasks for our team of design engineers was to build a single “fault tree” for a large corporate information system that monitors equipment.

Problem statement


The information system into which we embedded this classifier centralizes information about accidents on a wide variety of equipment, and also collects data on problem situations from completely dissimilar systems, databases and devices. Naturally, the primary alarm messages in such an architecture are highly heterogeneous. For example, three different external systems sent us a fault called “break”, but in one case it meant a break in the carrying cable of the contact network suspension, in another a break in the power supply, and in the third a loss of communication between subscribers. This did not suit us, because we needed a clear classification for later use in reporting and analytical tasks.
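The “break” example can be sketched as a lookup keyed by the pair (source system, raw message) rather than by the message text alone. This is only an illustration, not the mechanism our system used: the source names and the unified class labels below are invented for the example.

```python
from typing import Optional

# Hypothetical source names and unified class labels, for illustration only.
UNIFIED_CLASS = {
    ("contact_network_db", "break"): "carrying cable break",
    ("power_monitoring", "break"): "power supply break",
    ("telecom_system", "break"): "loss of communication between subscribers",
}


def classify(source: str, raw_text: str) -> Optional[str]:
    """Return the unified fault class, or None if the pair is not mapped yet."""
    # Normalize the raw text so " Break " and "break" hit the same key.
    key = (source, raw_text.strip().lower())
    return UNIFIED_CLASS.get(key)


if __name__ == "__main__":
    for src in ("contact_network_db", "power_monitoring", "telecom_system"):
        print(src, "->", classify(src, "break"))
```

The same raw text maps to three different classes because the source is part of the key; an unmapped pair returns None and can be queued for manual review.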

Whenever our accident handler encountered a new type of incoming event, it simply added it to the directory. By the time we began systematizing the message types alone, more than 1000 had accumulated.

We set ourselves the following goals:

We sought to ensure that the characteristics of our classification are:

Progress and our mistakes


Shortly after the work began, it became clear that developing the classification principle was the key task of the whole effort. We could not take any of the classifications coming from external systems as a basis, for the following reasons:
- each had a narrow focus, dictated by the specific problems its system was built to solve,
- many had no hierarchy of problems (flat fault lists not organized into trees),
- the wording was tangled, with causes and consequences mixed in some entries.
We built the first version by grouping faults according to the infrastructure objects on which they occurred. This was the easiest way, since it only required merging the separate “foreign” fault lists on top of a single (our own) infrastructure model.

In general, it turned out like this:



…. Around 1600 lines, of which about 600 could not be tied to specific objects. Not every problem had a clear object binding, and not every object mentioned had been entered into our resource base. This approach untangled the situation a little, but it did not let us introduce a common hierarchy, identify synonyms, or reduce the total number of entries, which was one of our goals.

Later, the “applicability” of faults to objects stayed in the system, but as a separate directory rather than as part of the general fault hierarchy.
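The separation described above can be sketched as two independent structures: one holding the hierarchy, the other holding applicability. This is a minimal sketch under assumptions; the node names, the intermediate “break” group, and the object types are invented for illustration and do not reproduce our actual directories.

```python
# Fault hierarchy as child -> parent links (None marks the root).
# All names here are hypothetical.
PARENT = {
    "faults": None,
    "break": "faults",
    "carrying cable break": "break",
    "power supply break": "break",
}

# Separate applicability directory: fault -> object types it can occur on.
APPLICABLE_TO = {
    "carrying cable break": {"contact network suspension"},
    "power supply break": {"power supply line"},
}


def path_to_root(fault):
    """Return the chain of nodes from the root down to the given fault."""
    path = []
    node = fault
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return list(reversed(path))


def faults_for_object(object_type):
    """Return the faults applicable to an object type, in sorted order."""
    return sorted(f for f, objs in APPLICABLE_TO.items() if object_type in objs)


if __name__ == "__main__":
    print(path_to_root("power supply break"))
    print(faults_for_object("power supply line"))
```

Because the two dictionaries are independent, a fault can be moved to another branch of the tree without touching the list of objects it applies to.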

Result


So, at some point it became clear that we could not build a single structure either from the previously deployed databases and information systems or from the regulatory documents adopted by the organization.

As a result, we have developed the following principles of work:


Acting this way, we got about the following set of branches for the first level of the tree:



What is the result?


Unfortunately, this work was not completed, and the result we stopped at was extremely “raw.”
I believe that the reasons for this failure are as follows:
- this work should have been organized and continued by the owner of the infrastructure, but there were simply no experts ready to take it on;
- the experts “on the ground” were quite comfortable with the names and classifications familiar to them, and our attempts to generalize and to single out subgroups met with their resistance;
- the global analytical reporting for which this work was carried out was never launched.
In general, the customer was not ready for such changes, and we did not have sufficient administrative resources to influence its employees.

Of course, one can say that the time was not wasted. We gained considerable experience in this kind of work, which is partly captured in the principles described above. I personally concluded that it is important to divide such projects into small stages, to show the customer intermediate results constantly, and to secure the customer's active support for the changes.

So why did it turn out this way? Why was the intermediate result, which even in our own opinion was far from perfect, still acceptable to users?
As it turned out during implementation, users are basically ready to accept (and forgive us for) any classification, on one simple condition: add a text search to the form!
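To illustrate the kind of search users asked for, here is a minimal sketch of a word index over fault names with prefix matching. It is an assumption about how such a search could look, not the implementation used in our form; the function names and sample fault names are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(fault_names):
    """Map each word to the set of positions of fault names containing it."""
    index = defaultdict(set)
    for i, name in enumerate(fault_names):
        for tok in tokenize(name):
            index[tok].add(i)
    return index


def search(query, fault_names, index):
    """Return names in which every query token is a prefix of some word."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = None
    for tok in tokens:
        matched = set()
        for word, ids in index.items():
            if word.startswith(tok):  # prefix match: "commun" finds "communication"
                matched |= ids
        hits = matched if hits is None else hits & matched
    return [fault_names[i] for i in sorted(hits)]


if __name__ == "__main__":
    names = [
        "carrying cable break",
        "power supply break",
        "loss of communication between subscribers",
    ]
    idx = build_index(names)
    print(search("power br", names, idx))
```

The index is rebuilt from whatever names are in the directory at the moment, so it needs no manual upkeep when new fault types arrive.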

Classification is a product of systematized experience. Naturally, each person, guided by unique personal experience, sees it in their own way. For example, in an email client some people (myself included) set up a complex system for sorting incoming mail, while others do not sort mail at all, keep everything in one folder, and still find their way around it perfectly. They are also quick to find the message I need. Maybe these people have Yandex in their heads?

In addition, any predefined classification can be 100% correct only after it has been updated to reflect the latest data received by the system. That is, a classification requires constant care, whereas the user wants to use the system, not to work on it. Search relies on indexing, and it always works effectively on the current data. So is classification necessary at all?

Source: https://habr.com/ru/post/246765/

