Technical support in the era of DevOps

DevOps is coming to large IT organizations whether they are ready for it or not. This raises many problems, and I would like to talk about one of them. It may be a bold statement, but I believe that the current organizational structure of most IT support services is fundamentally wrong.

In such a situation, the successful implementation of DevOps practices may be almost impossible.

As an alternative, I would like to propose a newer methodology called Swarming, which is already mature enough for use in large businesses and is well suited to technical support in the DevOps era.

Current situation: the conservatism of the three-tier technical support system

Let us start with a brief review of the existing structure, which underlies the vast majority of technical support services in medium and large organizations.

Classical technical support, built according to the principles of IT service management, has a three-level hierarchy.

Level 1. The Service Desk, which is in direct contact with users, usually by telephone. It provides general technical support on non-specialized issues. Its main task is to maintain a stable quality of service while resolving the majority of incoming requests right here, at the first level.
Level 2. Usually closely tied to the first level, but requires deeper general or specialized knowledge and skills. Second-level specialists may, for example, receive additional training to support common operating systems (such as Microsoft Windows) or hardware, gaining the skills needed to solve more complex problems.
Level 3. Teams that specialize in specific technologies or applications. Companies that develop their own software often organize separate support teams for specific applications and services.

To understand the three-tier structure better, we need to look at the business reasons that created it and keep it in place. The model is used almost everywhere, and several advantages encourage its use.

Customers get a single channel of communication with technical support, regardless of the nature of the problem.
It is easy to find people on the labor market with the technical skills needed for the first two levels of support. This also makes it easier to outsource these levels, which is done quite often.
Specialists in particular technologies and areas are shielded from direct contact with customers and only handle tasks within their areas of expertise.

The journey of a support request can end at the first level, almost before it has begun. In fact, in many organizations some requests are handled by fully automated services, often called “level zero”.

However, many problems cannot be solved at the first level. The process of transferring them to levels 2 and 3 is called escalation.

Second-level specialists usually handle fewer tickets than their first-level colleagues, but these are more complex tasks that, on average, take longer to solve.

Tickets that reach the third level (escalated from the second or sent directly from the first) usually make up a small share of all incoming requests. But these are the most complex tasks, and solving them requires highly skilled specialists and a considerable amount of time.

Many attempts have been made to calculate the relative cost of solving a problem at each level of support. One 2014 study, for example, estimated the average cost of closing a ticket at $22 at the first level, $62 at the second, and $85 at the third (other studies put the last figure several times higher).
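
To make the arithmetic concrete, here is a minimal sketch in Python. The per-level costs are the 2014 figures quoted above; the share of tickets closed at each level is a purely hypothetical assumption for illustration, and the calculation treats each figure as the full cost of a ticket closed at that level, which the study may or may not intend.

```python
# Average cost of closing a ticket at each support level (USD),
# taken from the 2014 estimate cited in the text.
COST_PER_LEVEL = {1: 22, 2: 62, 3: 85}


def blended_cost(share_by_level):
    """Return the weighted average cost per ticket for a resolution mix.

    share_by_level maps a support level to the fraction of all tickets
    that are closed at that level; the fractions must sum to 1.
    """
    if abs(sum(share_by_level.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(COST_PER_LEVEL[level] * share
               for level, share in share_by_level.items())


# Hypothetical mix (an assumption, not data from the study):
# 70% closed at level 1, 20% at level 2, 10% at level 3.
assumed_mix = {1: 0.70, 2: 0.20, 3: 0.10}
print(f"Average cost per ticket: ${blended_cost(assumed_mix):.2f}")
```

With these assumed shares the average comes to a little over $36 per ticket, so the 30% of tickets that leave the first level account for more than half of the total cost.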

The problems of the three-tier model

Criticizing such a widely accepted structure is not easy. However, the Swarming movement sets out to do precisely that, starting from the significant but correctable shortcomings of the multi-level model. Let's look at some of the issues that affect DevOps.

Multi-level support involves multiple queues. Since the first level tries to solve problems as quickly as possible, everything that cannot be fixed right away is put in a queue. The status of the problem changes from active to deferred. In essence, levels 2 and 3 are warehouses of work in progress, which is a problem from the point of view of the Lean philosophy that underlies DevOps. Successful adoption of DevOps as part of Lean requires decisive steps to reduce work in progress. This problem alone is a significant obstacle to bringing DevOps into technical support.
Multi-level support can block the path to the right specialist. DevOps advocates greater employee autonomy and involvement, and encourages developers to support their own code. Companies that practice DevOps manage to handle user requests much faster (24 times faster, according to the 2016 State of DevOps report). But this is of no benefit if a ticket has to fight its way through several queues before reaching the right specialist. Once, during a discussion about introducing Swarming in customer support, one of the BMC support managers asked a fair question: “Why do we put our best people at the end of the chain?”

Multi-level support leads to tickets “bouncing” back and forth. In multi-level technical support, a task is assigned to someone unilaterally, either by the previous level or by another team of specialists. The assignee sees the ticket for the first time when it appears in their queue. Unfortunately, tickets are often sent back, either because the specialist needs additional information or because the assignment was wrong. DevOps is based on collaboration between professionals: developers, testers, operations specialists, and so on. The vertical and horizontal barriers between departments that are inherent in multi-level support, together with the passive handover of tasks (without the involvement of the receiving party), go against the spirit of cross-disciplinary cooperation.

Multi-level support does not solve the problem of overloaded Subject Matter Experts. Although it shields experts from questions that are too easy for them, it does not protect them from being overwhelmed by complex tasks. “Heroes” are a real scourge of IT Service Management (ITSM). These are smart people who, at first glance, bring tangible benefits to the organization and regularly work wonders by solving complex problems. In reality, such heroes are an overloaded single point of failure. Willingly or not, they are the custodians of knowledge that is vital to the organization, and they tend to burn out. Multi-level support, being a linear and isolating structure, does nothing to prevent the cult of the Hero, and perhaps even reinforces it. As traditional businesses shift towards DevOps, we see the same pattern preserved, with key members of the DevOps team placed at the end of the escalation chain. The damage in DevOps scenarios is perhaps even greater: key developers are pulled away from innovation and forced to deal with escalated issues from users who have already lost time.

Swarming as an alternative

“Collaborative communities can overcome professional and organizational barriers by encouraging cooperation, learning, and progress.”
(Don Tapscott and Anthony D. Williams, in Wikinomics)

The concept of Swarming was proposed towards the end of the 2000s as a new framework for organizing technical support. It explicitly rejects the conservative multi-level structure in favor of a networked model of interaction.

(Figure: the networked interaction model. Source: Consortium for Service Innovation)

The first major company to introduce this system was Cisco. In 2008, in a document called Digital Swarming, it presented a “Distributed Cooperation and Decision Making Model”. The concept was subsequently adopted by the Consortium for Service Innovation and evolved into Intelligent Swarming. Some of its principles are:

Support groups should not be divided into levels.
There should be no escalation from one group to another.
A task should go directly to the employee who is most likely to be able to solve it.
The person who accepts a ticket owns it until it is resolved.
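
As a rough illustration of the last two principles, the sketch below routes a ticket straight to the person whose skills best match it and then keeps that person as the owner. Everything in it is hypothetical: the Engineer and Ticket types, the skill tags, and the tie-breaking by current workload are assumptions for the sake of the example, not part of the Intelligent Swarming definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Engineer:
    name: str
    skills: frozenset      # hypothetical skill tags, e.g. {"database"}
    open_tickets: int = 0  # current workload, used only to break ties


@dataclass
class Ticket:
    summary: str
    tags: frozenset              # hypothetical tags describing the problem
    owner: Optional[str] = None  # set once and never reassigned


def assign(ticket, engineers):
    """Route the ticket directly to the best-matching engineer.

    Returns the owner's name, or None if nobody has a matching skill.
    """
    if ticket.owner is not None:
        # The ticket already has an owner, who keeps it to the end.
        return ticket.owner
    candidates = [e for e in engineers if e.skills & ticket.tags]
    if not candidates:
        return None  # no match: a person has to decide where it goes
    # Most overlapping skills wins; fewer open tickets breaks ties.
    best = max(candidates,
               key=lambda e: (len(e.skills & ticket.tags), -e.open_tickets))
    best.open_tickets += 1
    ticket.owner = best.name
    return best.name


team = [
    Engineer("ana", frozenset({"database", "linux"})),
    Engineer("bo", frozenset({"network", "linux"}), open_tickets=2),
]
ticket = Ticket("Replication lag on reporting DB",
                frozenset({"database", "linux"}))
print(assign(ticket, team))
```

Note that there are no levels or escalation paths here: the only question the routing asks is who is most likely to solve the problem.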

Swarming in practice: sample structure for DevOps

Swarming does not have a single well-defined structure. This is partly because it is new and, accordingly, not yet widespread. However, the following example (based on the swarming methods used in customer support at BMC) is typical. It significantly improved the support service (as described at the Service Desk and IT Support Show in the UK in 2015).

Swarming begins when a problem arrives that cannot be solved immediately, at the moment the user's request is received. A quick initial triage ends with the task being sent to one of two groups (Swarms).

(Figure: primary sorting in the Swarm structure)

Each group (Swarm) is a small team that processes incoming tickets in near real time:

“Severity 1” Swarm (team for the most severe incidents)

Three employees working on a scheduled weekly rotation.
Main focus: respond immediately and solve the problem as quickly as possible.

This group deals with the most serious problems. Its members coordinate the response to difficult situations, bring in the right people, and try to get critical problems resolved as fast as possible. This process is not much different from the major incident procedure used in the traditional multi-level model. However, another group operates in parallel and handles a much larger number of tickets:

Dispatch Swarm (dispatch group)

Meets every 60–90 minutes.
Organized by region and product line.
Main focus: “cherry picking” (first of all, take on whatever can be fixed very quickly).
Secondary task: checking tickets before they are sent on to the product support teams.

This type of group emerged as an answer to the main problem of multi-level support: many tickets that could be resolved very quickly once they reach the right specialist get lost in backlogs of unfinished work. As a result, a five-minute question can take days to answer.

Members of this group are explicitly encouraged to “cherry pick” and to ignore problems that cannot be fixed on the spot. This can greatly reduce the time spent on a significant share of tickets.
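
The triage and cherry-picking steps can be sketched roughly as follows. This is only an illustration under stated assumptions: the ticket fields, the 15-minute threshold, and the idea that someone can estimate effort up front are all hypothetical and are not taken from the BMC process.

```python
# Assumed cut-off for a "cherry": anything estimated at or below this
# many minutes is fixed during the Dispatch Swarm session itself.
QUICK_FIX_MINUTES = 15


def triage(ticket):
    """Send a ticket to the Severity 1 Swarm or to the Dispatch Swarm."""
    return "severity1" if ticket["severity"] == 1 else "dispatch"


def dispatch_session(queue, limit=QUICK_FIX_MINUTES):
    """Split the dispatch queue into cherries and forwarded tickets.

    Cherries are solved on the spot; everything else goes on to the
    product support teams after a quick check.
    """
    cherries = [t for t in queue if t["estimate_minutes"] <= limit]
    forwarded = [t for t in queue if t["estimate_minutes"] > limit]
    return cherries, forwarded


# Hypothetical tickets, invented purely for illustration.
tickets = [
    {"id": 1, "severity": 1, "estimate_minutes": 240},
    {"id": 2, "severity": 3, "estimate_minutes": 5},
    {"id": 3, "severity": 2, "estimate_minutes": 120},
    {"id": 4, "severity": 3, "estimate_minutes": 10},
]

dispatch_queue = [t for t in tickets if triage(t) == "dispatch"]
cherries, forwarded = dispatch_session(dispatch_queue)
print("solved in session:", [t["id"] for t in cherries])
print("sent to product teams:", [t["id"] for t in forwarded])
```

The point of the design is that quick tickets never enter a backlog at all; only the work that genuinely needs a product team is queued.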

There is an additional benefit. Including less experienced employees in the group lets them acquire knowledge that, in a multi-level model, they would only gain after moving to a highly specialized team. At the same time, the highly qualified specialists who would otherwise sit at the third level of support are brought closer to the customer.

Dispatch Swarming quickly resolves a significant share of tickets (about 30% at BMC), and the remaining tickets go to the queues of more conventional support teams responsible for individual product lines. Many of these tasks will be familiar to regular team members, so solving them should not be difficult. At the same time, another share of the tickets (perhaps about 30%) may deserve the attention of the best support specialists, regardless of where they sit in the organization.

This is where the third type of group comes in: the Backlog Swarm.

Backlog Swarm (group for working through the accumulated backlog)

Holds meetings regularly, usually daily.
Focus: solving complex tasks received from product line support teams.
Secondary task: replacing reliance on individual experts.

To solve the most difficult problems, a Backlog Swarm brings together groups of experienced and qualified engineers, regardless of geographic or organizational boundaries. They receive tasks from front-line specialists, who are no longer allowed to approach individual experts directly. Instead, they must submit tasks to the appropriate Backlog Swarm.

Swarming example

When classical technical support works alongside DevOps, the problems of the multi-level model only get worse. Backlogs of unsolved tasks (work in progress) pile up, which in turn limits autonomy and flexibility. Such a system is isolating by nature. These problems run contrary to the DevOps philosophy and are the main obstacle to adopting DevOps practices in organizations with a traditional business model.

We can already highlight the following negative points.

DevOps encourages software developers to support what they build (sometimes summed up as “you wrote it, you support it”). However, in the mature support services typical of large organizations, the multi-level structure is the main channel through which user problems reach engineers. As we have seen, barriers between the first level of support and the DevOps team can delay solutions and lead to poor initial handling of requests.
The “throw it over the fence” style of integration between ITSM ticketing systems and the software lifecycle tools used by DevOps teams leaves employees without situational awareness.
Imposed vertical and horizontal barriers interfere with collaboration between specialists from different departments, which is a key element of successful DevOps.

By contrast, the concept of Swarming is built largely on the same principles that lie at the heart of successful DevOps.

Dynamic cross-functional collaboration, which makes it possible to assemble a team of specialists with different skills and areas of expertise.
Flexible teams as opposed to rigid hierarchical structures.
Autonomy as opposed to dogmatic processes (a key example is the freedom to “cherry pick” when working in a Dispatch Swarm).
A reduction in the amount of work in progress.
Exchange of skills and experience between specialists in different fields.

Conclusion

“Big businesses are not slow to change because their people are stupid or hate technology. They simply have users.”
(Luke Kanies, Founder and CEO, Puppet Labs. Configuration Management Camp, Belgium, 2015)

DevOps calls into question the very essence of established, conservative ways of working. It combines the previously isolated roles of developers and operations specialists and tries to get rid of ingrained, ineffective practices. This philosophy took shape largely (if not entirely) in new-generation organizations, which often had no need to maintain outdated but still working systems and their users.

It is important to note that it has been very successful there.

(Figure source: 2016 State of DevOps Report)

Now DevOps has reached traditional organizations, where adoption is often painful and where it will face new challenges. But for such companies this is no longer a matter of improvement; it is a necessary step in the struggle for survival. Change in the form of “creative destruction” is a constant and real threat to the existence of large companies. Only 12% of the companies on the 1955 Fortune 500 list were still on it in 2014.

IT companies should try to use fresh ideas wherever possible and constantly challenge conservative practices.

The Swarming movement has launched an attack on the multi-tier support model, but progress in the IT service management of traditional companies is slow, because so far it is limited to a few far-sighted organizations. However, the kinship between the core elements of Swarming and DevOps is hard to deny. They therefore face similar adoption problems, and solving those problems makes both easier to use.

Thus, the multi-level support model needs to be rethought. The new methodology should take advantage of DevOps while preserving performance and efficiency at the scale of large companies. I think Swarming may well fit this role.

Source: https://habr.com/ru/post/318826/
