Practice is when everything works, but no one understands how. Theory is when nothing works, but everyone knows exactly why. We have combined theory and practice: nothing works, and no one understands why.

In the life of any growing business, in IT and elsewhere, there comes a moment when it is no longer possible to ignore the problems that were pushed into a far corner and have since acquired a noble patina. Their consequences surface in unexpected situations. There are more than a dozen methods for dealing with such problems and getting the business working again, but you always have to start with the same thing: analyzing the root causes of these problems. Today the Robots would like to talk about exactly that. We have translated an article on root cause analysis by Henrik Kniberg, an IT business coach and Agile, Scrum and Kanban specialist, and we also describe how the Robots fixed some of their own breakdowns. The article is published in abridged form; the full version is available in our blog on Habr.

Henrik Kniberg. Purpose of the article
Cause-effect diagrams are a simple and convenient way to perform root cause analysis. I have been using these diagrams for many years to help businesses recognize and solve a wide range of problems, both technical and organizational. The purpose of this article is to show how cause-effect diagrams work and to teach you to build them for your own needs.
Troubleshoot problems, not symptoms
The key to effective problem solving is, first and foremost, to make sure that you understand the problem you are trying to solve: why it needs solving, how you will know when it has been solved, and what its root cause is.
Often the “symptoms” show up in one place while the true cause of the problem lies somewhere else. If you only eliminate symptoms without looking deeper into the situation, the problem will most likely come back later, in a different form.
Problem: Smoke in the bedroom.
Bad solution: Open the window and go back to bed.
Good solution: Find the source of the smoke and deal with it. Oops, there is a fire in the basement! Next steps: put out the fire, understand why it started, and install a fire alarm so that next time you learn about the problem sooner.
Problem: Hot forehead, fatigue.
Bad solution: Put ice on your forehead to cool it. Eat sugar for energy. Keep working.
Good solution: Take your temperature. Yes, it is a fever! Next step: go home and rest.
Problem: Server memory leak.
Bad solution: Buy more memory.
Good solution: Find and fix the source of the memory leak. Add tests to catch future leaks.
... and so on.
Most problems in organizations are systemic. The system (the business) has a fault, and the fault must be fixed. Until the root cause of that fault is clear, most attempts to deal with the problem will be ineffective or even counterproductive.
A3 thinking and a lean approach to problem solving
One of the fundamental principles of lean thinking is Kaizen, the continuous improvement of processes. Toyota, one of the most successful companies in the world, attributes much of its success to its highly disciplined approach to problem solving. This approach is sometimes called “A3 thinking” (the knowledge gained in each problem-solving session is recorded on a sheet of A3 paper).
Here is an example and template:
www.crisp.se/gratis-material-och-guider/a3-template
With the A3 approach, a significant part of the time (the left side of the sheet) is devoted to analyzing and visualizing the root cause of the problem, and this precedes the development of any solutions. Cause-effect diagrams are not the only method of root cause analysis. There are others, for example value stream mapping and the Ishikawa diagram, also known as the “fishbone” diagram. The sample A3 above contains a value stream map (top left) and a cause-effect diagram (bottom left).
Cause-effect diagrams are good because they are intuitive and need no additional explanation (especially compared with fishbone diagrams). Another advantage is that they can show self-reinforcing vicious cycles, which is extremely useful from a systems thinking point of view. The rest of the article focuses on how to create and use such diagrams effectively.
How to use cause-effect diagrams
The basic process is as follows:
- Choose a problem (anything that bothers you) and write it down.
- Trace it “upward” to assess the consequences for the business, the “obvious damage” that your problem causes.
- Trace it “downward” to identify the root cause (or root causes).
- Identify and highlight vicious cycles.
- Repeat the steps above several times to refine the diagram.
- Decide which root causes you will address and how (that is, which countermeasures you will take).
The next stage is follow-up. If the countermeasures worked, congratulations! If not, do not despair. Analyze why they did not work, update the diagram with what you have learned, and try other countermeasures.
In fact, countermeasures are not a solution but an experiment. Your hypothesis is that the countermeasures will solve (or reduce) the problem, but you can never be completely sure. You are effectively poking your system with a sharp stick to see how it reacts. That is why follow-up is important.
A failure simply means that your system is sending you signals that you should listen to. The only real failure is the failure to learn from failure!
Example 1: long release cycle
Suppose we have a problem: we always miss deadlines. More precisely, our releases always come out later than scheduled.

A problem is only a problem if it keeps you from reaching your goal. So the first step is to define the goal and think about the consequences of the problem in the context of that goal. The question “So what?” helps here: keep asking it until you have identified the obvious damage.
Suppose that the goal is to make customers happy and get maximum revenue. The dialogue may look something like this:
Q: “What is the negative effect of delayed releases? What are the consequences?”
A: “Delays make our release cycles long.”
Q: “So what?”
A: “It delays revenue and slows down the company's cash flow. We also lose customers because they get impatient.”
As the dialogue goes on, we add cells and cause-effect arrows to the diagram. Usually I first move “upward” from the originally stated problem, mapping its consequences. But this is not a strict rule.

So it turns out that delayed releases are not actually the problem. The real problems are delayed revenue and lost customers. At this stage there are three points to consider:
- Are there other factors leading to lost customers and delayed revenue? If so, can we assume that delayed releases are the main cause, or should we turn our attention to something else?
- Can the problem be quantified? How much money have we lost? How many customers have left? These figures will help us estimate how much effort is worth spending on the problem.
- How will we know that the problem has been solved? Suppose a happy consultant bursts into the office and proudly declares: “I solved the problem!” How do we determine that this is not a bluff?
Once the consequences of the problem have been analyzed, it is time to dig down to the root cause.
This is where the “Why?” questions come in. Yes, this is the “five whys” technique, which you may have heard of if you have studied lean thinking.
Q: “Why are releases being delayed?”
A: “Because the amount of work is growing.”
Q: “Why?”
A: “Because customers invent more and more new features and insist that they should be added to the current release, refusing to exclude low priority features from it.”
Q: “Why? Why not postpone the new features until the next release?”
A: “Because the release cycle is so long that new requirements arise before the next release.”
That is only three “whys”, but you get the principle. The dialogue produces the following picture:

The vicious cycle is marked with red arrows. Recurring problems almost always involve such loops, but it takes some time to spot them. Once you have found such a loop, your chances of solving the problem for good increase many times over!
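If the diagram is kept in electronic form, the loop hunt can also be done mechanically. The Python sketch below is only an illustration: it treats the diagram as a list of (cause, effect) arrows and reports each loop once. The cell names and the arrow list are an assumed reading of the dialogue above, not a copy of the original picture, and `find_cycles` is a hypothetical helper rather than part of any library.

```python
def find_cycles(edges):
    """Return every loop once, rotated to start at its alphabetically first cell."""
    graph = {}
    for cause, effect in edges:
        graph.setdefault(cause, []).append(effect)

    cycles = set()

    def walk(node, path):
        for effect in graph.get(node, []):
            if effect == path[0]:
                # The arrow leads back to the starting cell: a loop is closed.
                first = path.index(min(path))
                cycles.add(tuple(path[first:] + path[:first]))
            elif effect not in path:
                walk(effect, path + [effect])

    for start in graph:
        walk(start, [start])
    return sorted(list(cycle) for cycle in cycles)


# Assumed arrows for Example 1 (cause, effect), reconstructed from the dialogue.
EDGES = [
    ("new features forced into current release", "growing scope"),
    ("growing scope", "delayed releases"),
    ("delayed releases", "long release cycle"),
    ("long release cycle", "late revenue"),
    ("long release cycle", "lost customers"),
    ("long release cycle", "new features forced into current release"),
]

cycles = find_cycles(EDGES)
for cycle in cycles:
    # Repeat the first cell at the end so the printed loop visibly closes.
    print(" -> ".join(cycle + cycle[:1]))
```

With these arrows the sketch reports a single loop: delayed releases, long release cycle, new features forced into the current release, growing scope, and back to delayed releases. The exhaustive walk is fine for a hand-drawn diagram with a few dozen cells; it is not meant for large graphs.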
Our goal is to identify the root cause of the problem so that we can achieve the maximum effect with minimum effort. On the first pass it is easy to overlook important causes, so let's go back and ask a few more questions.
Q: “Why is the release cycle long? Are delayed releases the only reason?”
A: “Well, actually, even without delays our planned release cycles are quite long.”
Q: “How long is your planned release cycle?”
A: “One release per quarter.”
Q: “But why so long?”
A: “Because releases are expensive and complicated.”
Q: “Why?”
A: “Because every release contains a lot of changes, and because releasing is manual work.”

On the left we see another vicious cycle (red arrows)! The long interval between releases means that each new version includes a large number of updates, which makes product release a complex and expensive process. Because of this, we don’t want to make releases often.
As you may have noticed, I singled out two root causes here. And now, the countermeasures:
Root cause: Lack of automation in the release preparation process.
Countermeasure: Automate the release preparation process.
Root cause: Low-priority features are not excluded from releases.
Countermeasure: Agree with the client that new features can be added to a release only if the same number of low-priority features is removed from it.
There is no strict rule for deciding which cause is the root cause, but there are some signs:
- The cell has only outgoing arrows and no incoming ones.
- You feel that digging deeper (asking further “Why?” questions) no longer makes sense.
- The cell “has a solution”, and that solution can have a significant positive effect on the problem.
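The first sign is the only one that can be checked mechanically, and a short Python sketch shows how. It assumes the diagram is stored as a list of (cause, effect) arrows; the cell names are an assumed reading of Example 1, and `root_cause_candidates` and `visible_damage` are hypothetical helper names. The other two signs remain a matter of judgment, so the result is a list of candidates, not a verdict.

```python
def root_cause_candidates(edges):
    """Cells with outgoing arrows only: they cause something, nothing causes them."""
    causes = {cause for cause, _ in edges}
    effects = {effect for _, effect in edges}
    return sorted(causes - effects)


def visible_damage(edges):
    """Cells with incoming arrows only: the top of the diagram."""
    causes = {cause for cause, _ in edges}
    effects = {effect for _, effect in edges}
    return sorted(effects - causes)


# Assumed arrows for the full Example 1 diagram (cause, effect).
EDGES = [
    ("no release automation", "expensive releases"),
    ("many changes per release", "expensive releases"),
    ("expensive releases", "long release cycle"),
    ("long release cycle", "many changes per release"),
    ("delayed releases", "long release cycle"),
    ("growing scope", "delayed releases"),
    ("new features forced into current release", "growing scope"),
    ("low-priority features never dropped", "growing scope"),
    ("long release cycle", "new features forced into current release"),
    ("long release cycle", "late revenue"),
    ("long release cycle", "lost customers"),
]

print(root_cause_candidates(EDGES))
print(visible_damage(EDGES))
```

For this arrow list the candidates are the two cells that nothing points to (no release automation, low-priority features never dropped), which matches the two root causes chosen above. Cells inside a vicious cycle never show up here, because every one of them has an incoming arrow.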
The “five whys” technique gets its name from the fact that about five questions usually separate us from the root cause. There is a tendency to stop too early. Do not do this: keep digging!
Note that the problem we started with, delayed releases, turned out to be neither the real problem nor the root cause. It was just a symptom. We used it as a starting point to trace the cause-effect chain upward to the real problem, and then downward to the root cause. This lets you develop effective countermeasures based on a full understanding of the situation.
Without this kind of analysis there is a risk of jumping to conclusions and making ineffective or counterproductive changes. For example, hiring more people although the problem has nothing to do with headcount. Or changing the reward system (rewarding people for meeting deadlines and penalizing them for being late) although the existing reward system had nothing to do with the problem.
Example 2: defects in production
Imagine that defective code keeps getting into production.
Q: So what?
A: Defects anger our customers.
Q: Why do defects reach production?
A: Because the code was not properly tested before release.
Q: Why was it not tested?
Etc.
And here is what we get:

Two vicious cycles! Look at the red arrows.
Loop 1 (inner loop): Defects in production force urgent fixes, which pull the team away from its planned work. Since nobody relieves the team of that planned work, people are under stress and do not have time to test new releases properly. That, of course, leads to even more defects in production.
Loop 2 (outer loop): Since people are under stress, they also have no time to write automated test scripts. The result is a general lack of test automation, which makes regression testing of new releases harder and harder. That, of course, leads to defects in production and the need for urgent fixes. And, as a result, to even more stress.
But that's not all!
Teams hate being interrupted. Interruptions disrupt the workflow and ultimately kill motivation. This may explain high staff turnover! So by solving the underlying problem (defects in production) we get a bonus in the form of lower turnover.

This is another advantage of cause-effect analysis. A root cause usually lies behind more than one problem (which is why it is called the “root”).
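To see how far a single cell reaches, you can follow the arrows forward from it. The Python sketch below does that with a breadth-first walk. The arrow list is an assumed reading of Example 2 (the original picture is not reproduced here), and `downstream` is a hypothetical helper name.

```python
from collections import deque


def downstream(edges, start):
    """Return every cell reachable from `start` by following cause -> effect arrows."""
    graph = {}
    for cause, effect in edges:
        graph.setdefault(cause, []).append(effect)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for effect in graph.get(node, []):
            if effect not in seen:
                seen.add(effect)
                queue.append(effect)
    return seen


# Assumed arrows for Example 2 (cause, effect).
EDGES = [
    ("defects in production", "angry customers"),
    ("defects in production", "urgent fixes"),
    ("urgent fixes", "team is interrupted"),
    ("team is interrupted", "stress, no time"),
    ("stress, no time", "insufficient testing"),
    ("insufficient testing", "defects in production"),
    ("stress, no time", "no automated tests"),
    ("no automated tests", "hard regression testing"),
    ("hard regression testing", "defects in production"),
    ("team is interrupted", "low motivation"),
    ("low motivation", "high staff turnover"),
]

affected = downstream(EDGES, "defects in production")
print(sorted(affected))
```

The result contains both “angry customers” and “high staff turnover”, so the same cell feeds two business problems. Because “defects in production” sits inside a loop, it also appears in its own result, which is a quick way to notice that a cell is part of a vicious cycle.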
Causal analysis: the experience of the Robots
Boris Ryabchikov, Project Manager:
My task was to track down the problems that exist in the company. We had identified the main production problem in advance. The hypothesis was this: we do not deliver projects on time and, as a result, the company's cash flow is slow. We also ruled out a number of causes at the start: we assumed that planning was done correctly and that the problem lay elsewhere. All other problems were viewed through this lens, and anything not relevant to the topic was simply discarded. We wanted to deal with the main problem and built the map around it.
We mapped the production process graphically, marking where we are and where we lose time and money.
First we had to choose the sources of information. In practice, the most effective way turned out to be collecting problems from each employee individually. Retrospectives and standard reports usually contain only what employees are already willing to say in front of the whole team, at general meetings or to their managers. If a company has several divisions, they often compete and “throw” problems at each other. When all employees get together, politeness usually wins and nobody blames anyone to their face. As a result, problems get hushed up. In individual interviews they come to the surface.
I had to talk to people in informal settings: in the smoking room, on the way to the subway. Some problems were formulated in exactly such conversations. As it turned out, the “five whys” method does not always work well in practice. People are often irritated by the repeated “why”, and some immediately get defensive. So each case requires an individual approach.
The data was gradually collected and entered into a diagram, which turned out to be rather large. Based on the analysis, I prepared a brief report that identified four root causes and one main problem. All of them were grouped by competence center within the company.
We started working on several root causes at the same time. With this approach there is a risk of never learning which root cause was the main one. But in this case the risk was justified.
Countermeasures were developed to eliminate the root causes, and success criteria were defined for them. The next stage is to analyze the intermediate results and prepare a new, adjusted report. Solving problems for good often turns out to be a rather lengthy process.
Practical questions - how to create and maintain charts
Working alone
When I build diagrams on my own, the most convenient tools are Visio or PowerPoint. They let you move elements around quickly, resize cells and quickly make a backup copy while working on the picture.
Working in small groups (2 - 8 people)
Gather around a whiteboard or flipchart. Use sticky notes instead of cells and connect them with hand-drawn arrows. A whiteboard is preferable, since you can erase and redraw the arrows as you move the notes. Let all members of the group take part, not just one person. Do not forget to take a clear photo of the result and send it to all participants after the meeting.
Working in larger groups (9 - 30 people)
Let the group split into small teams, each focusing on a specific problem. Having several teams work on the same problem is also useful: they may come to the same or different conclusions, and both outcomes are interesting. Each team works with its own flipchart or whiteboard and sticky notes. From time to time the teams should get together for short joint discussions and share what they have found.
Maintaining a diagram over time
Keep the master copy of the diagram in the program you used, Visio or PowerPoint. If you decide to bring it back to a workshop, decide on the purpose: to present the diagram or to update it. If the goal is an update, recreate it on a whiteboard or flipchart with sticky notes and arrows so that the workshop participants can work on it together effectively. After the meeting, “synchronize” the results with the electronic version.
This kind of synchronization takes some time, but it is often worth it. For collaboration, nothing beats physical tools: a whiteboard and sticky notes.
Pitfalls
Too many arrows and cells
Sometimes a diagram becomes so chaotic that it is impossible to read. Then it should be simplified. Here are some techniques:
- Get rid of irrelevant cells (cells that add no meaning to the diagram).
- Go “depth-first” rather than “breadth-first”. Do not try to record every cause of every problem; write down only the one or two most important ones and dig deeper.
- Accept imperfection. A diagram like this will never be perfect. “All models are wrong, but some of them are useful,” said George Box.
- Maybe your problem area is too wide: try to limit yourself to a more narrowly defined problem.
- Divide the diagram into parts, as I demonstrated in Example 3 above.
Simplicity
This is a simplified type of cause-effect diagram, and it is simple by design. It does not replace face-to-face communication. If you need more advanced or formalized techniques, turn to the literature on systems thinking, for example “The Fifth Discipline” by Peter Senge. But keep in mind: even a “perfect” diagram has little value if only a PhD can understand it.
Getting personal
Avoid personal accusations like this one:

Problems are solved most effectively when you assume that they are all systemic. Of course, clumsy people exist. But even if their clumsiness causes us serious trouble, the problem is still systemic: there is a system in which clumsy people are not recognized as such, or a system that lets terribly clumsy people in but does not help them become less clumsy, and so on. To stress the point: treat all problems as systemic.