Don Jones "Creating a unified IT monitoring system in your environment" Chapter 3. Connecting everything into a single IT management cycle

We continue the translation. This chapter discusses how to organize the service desk, how to build interaction with users, and how to collect feedback from them. We will see why 'good' from IT's point of view does not always mean 'good' from the user's point of view, and how the two assessments can be balanced.

Contents

Chapter 1. Managing your IT environment: four things you do wrong
Chapter 2. Eliminating silos in IT management
Chapter 3. Connecting everything into a single IT management cycle
Chapter 4. Monitoring: a look beyond the data center
Chapter 5. Turning Problems into Solutions
Chapter 6. Unified Case Management

Chapter 3. Connecting everything into a single IT management cycle

For a long time, IT has relied on discrete processes disconnected from real life, which often left key participants wondering what was happening inside IT. Bringing everyone (users, managers, IT professionals, and other employees) into a single cycle can deliver significant benefits and reduce the temptation to slide back into siloed IT management. This is where true integration between the service desk and the monitoring system takes place, and these two concepts underpin the most central and important topics in this book. It all comes down to communication: finding ways to interact better and creating opportunities for continuous improvement. Users sometimes see their IT department as unapproachable, arrogant, odd people with poor communication skills. Whether that is true depends on the IT specialists themselves, but the prejudice, fair or not, often exists. It exists because IT is too often the last to learn about events that users perceive as a problem. "The server may be running within its thresholds, but the order entry application is very slow." "IT says mail is working fine, but I have been waiting an hour for an incoming purchase order, so there is no way the mail system is working correctly!"

IT also has its own difficulties with being left out of the loop. For example, finding windows in the schedule for infrastructure changes can be an extremely non-trivial task. It can be hard to coordinate changes that are proposed, approved, in development, ready for deployment, and so on. Many organizations have adopted a change management procedure, such as the one described in ITIL, with a dedicated process for reviewing and approving changes. From the outside, however, the actual coordination of the work looks more like an attempt to herd cats. It is even worse when IT is managed in separate technology silos: the DBMS team needs to make a change scheduled for midnight, but that change conflicts with work on the power supply being carried out by the data center infrastructure team. So everything we have said still holds: we need to keep everyone on the same page.

Starting the cycle: combining monitoring and the service desk

Today, most organizations use ticket-based systems to coordinate IT work. They usually also have monitoring systems that watch IT systems and make it possible to respond to problems. In very few companies, however, do these two systems work together. Ideally, you want a single integrated IT management system that detects problems and then opens tickets assigned to specific employees. If the mail server is down, the responsible administrator should receive the ticket. These tickets must of course come with notifications, such as text messages, e-mail, or any other channel, so that the assignee knows an alert has been raised. Automatic ticket assignment, sometimes called automatic routing, needs a fair amount of built-in intelligence.

Different systems, installed in different places at different times, can all affect how a ticket is created and who gets involved in working on the problem.

Tickets must be as complete as possible: the fields of an alert-generated ticket should be filled in automatically wherever possible. Do not rely on the help desk or anyone else to enter the details by hand.

The details may include, for example, which servers are affected by the problem. Figure 3.1 shows roughly what an automatically generated ticket might look like, with the main fields pre-filled by the system.

Figure 3.1: Tickets automatically generated by the system in response to received alarms.
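
As a rough illustration of the idea, the sketch below turns a monitoring alert into a pre-filled, routed ticket. The routing table, group names, and field names are hypothetical and chosen only for this example; a real service desk product will have its own data model and routing rules.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical routing table: alert category -> (assignee group, notification channel).
ROUTING = {
    "mail": ("mail-admins", "sms"),
    "database": ("dba-team", "email"),
}
DEFAULT_ROUTE = ("service-desk", "email")


@dataclass
class Alert:
    host: str
    category: str
    message: str


@dataclass
class Ticket:
    title: str
    affected_hosts: list
    assigned_to: str
    notify_via: str
    opened_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def ticket_from_alert(alert: Alert) -> Ticket:
    """Pre-fill a ticket from the alert so nobody has to type the details in."""
    group, channel = ROUTING.get(alert.category, DEFAULT_ROUTE)
    return Ticket(
        title=f"[{alert.category}] {alert.message}",
        affected_hosts=[alert.host],
        assigned_to=group,
        notify_via=channel,
    )
```

An alert in an unknown category falls back to the general service desk queue here, so no alert is left without an owner.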

The idea is to have a service desk solution, that is, software that helps coordinate and manage IT activities (often through tickets), working together with a monitoring system to form a truly integrated solution for handling problems that arise in IT.

This integration brings several benefits. The first and most important is faster incident resolution. Instead of waiting for users to report a problem, you start working on it right away, and with pre-filled tickets IT specialists work faster because they have more information at their fingertips.

You can take this process further if you have the right service desk software. Frameworks such as ITIL strongly encourage root cause analysis, meaning that your team should not only deal with immediate issues but also make the overall environment more stable and resilient. To support this, the service desk solution must distinguish two kinds of records: an overarching problem and a specific incident.

A specific incident can be an everyday complaint such as "E-mail is slow" or "The order entry program is slow." Both incidents may be related to the overarching problem "Unexplained network slowdowns," which needs to be investigated and fixed; the cause might be a congested router dropping packets more often than usual.
Sometimes specific incidents cannot be fully resolved until the overarching problem is resolved. By tracking individual incidents together with the problem, you can keep users and their managers better informed. For example, once the overheated router has been found and replaced, every employee affected by the problem can be sent a message: "We have most likely found the cause of the slowdowns, so there should be no further issues." Figure 3.2 shows how a single problem can be linked to many incidents.

Figure 3.2: Relationship between multiple incidents and one problem.

I have used a couple of key terms in this discussion and would like to define them for the purposes of this book:

An incident is something that happens in the environment, such as a failed server or a slow-running application.
To deal with an incident, IT staff create a problem record. A problem may be associated with many incidents, as in the case of the overheated router that caused repeated and seemingly unrelated failures across the whole environment.
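
The relationship between the two terms can be sketched as a small data structure. This is only an illustration under assumed names (`Incident`, `Problem`, `link`, `resolve`); it shows one problem collecting many incidents and producing a single notification per affected user when it is resolved.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    reporter: str
    summary: str


@dataclass
class Problem:
    summary: str
    incidents: list = field(default_factory=list)
    resolved: bool = False

    def link(self, incident: Incident) -> None:
        """Attach an incident to this problem."""
        self.incidents.append(incident)

    def resolve(self, note: str) -> list:
        """Mark the problem resolved and return one (reporter, note) pair per affected user."""
        self.resolved = True
        reporters = sorted({i.reporter for i in self.incidents})
        return [(r, note) for r in reporters]
```

A user who reported several incidents tied to the same problem receives only one message, which is the kind of consolidation the text describes.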

From this point on, I will use these two terms in exactly this sense. I hope some of the advantages of combining monitoring with problem resolution are becoming clear. For example, simpler help desk solutions let you open multiple tickets for events that are actually the same problem. This can lead to significant duplication of effort, with many specialists each trying to solve the same problem on their own. It can also create a lot of administrative overhead, because on top of finding the root cause, time has to be spent processing and closing every ticket. A more advanced system can consolidate all incoming events into a single, manageable process. It also offers additional benefits, such as finding a suitable solution or workaround, which we will discuss in later chapters.

However, problems and incidents are not the only reason users interact with IT, and I hope they are not the main reason! Besides reporting incidents, users also request routine services: they ask for advice and submit requests for hardware upgrades, changes, access, and so on. All of these requests should go through a formal workflow: the user submits a request, after approval it reaches the appropriate technical specialist, and its status can be tracked along the way.
For example:

A user visits a website and chooses from a "catalog" of requests that he can submit: access to systems, replacement of equipment, and so on.
The user selects an item from the catalog and adds any clarifying information needed to complete the request.
A ticket containing the user's request is created in the service desk. Depending on the request, the ticket may be sent to the user's manager for approval.
After approval, the ticket can be automatically assigned to the appropriate IT technician.
As work progresses, the user can receive status updates, for example by e-mail, including a "completed" message once the ticket is closed.
When the same ticketing system is used both for solving problems and for handling routine requests, IT professionals can manage their workload through a single interface. Figure 3.3 shows what a ticket for a routine request might look like.

Figure 3.3: Routine requests can also be issued in the form of tickets.
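
The request workflow described above can be modeled as a small state machine. The sketch below is a simplified assumption of how such a workflow might look; the state names and the `Request` class are invented for the example and do not come from any particular product.

```python
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    AWAITING_APPROVAL = "awaiting approval"
    ASSIGNED = "assigned"
    COMPLETED = "completed"
    REJECTED = "rejected"


# Allowed transitions between workflow states (hypothetical workflow).
TRANSITIONS = {
    State.SUBMITTED: {State.AWAITING_APPROVAL, State.ASSIGNED},
    State.AWAITING_APPROVAL: {State.ASSIGNED, State.REJECTED},
    State.ASSIGNED: {State.COMPLETED},
    State.COMPLETED: set(),
    State.REJECTED: set(),
}


class Request:
    def __init__(self, user, catalog_item, needs_approval):
        self.user = user
        self.catalog_item = catalog_item
        self.needs_approval = needs_approval
        self.state = State.SUBMITTED
        self.notifications = []  # status messages that would be e-mailed to the user

    def move_to(self, new_state):
        """Change state if the workflow allows it and record a status message."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        self.notifications.append(f"{self.catalog_item}: {new_state.value}")

    def route(self):
        """Send the request to the manager for approval, or straight to a technician."""
        target = State.AWAITING_APPROVAL if self.needs_approval else State.ASSIGNED
        self.move_to(target)
```

Every state change appends a status message, so the user can be kept informed without anyone in IT writing updates by hand.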

It is much better when IT management can rely on all work being fully documented and tracked in a single system: this keeps management aware of problems and gives it the full set of reports, dashboards, and other tools it needs. Figure 3.4 shows an example of such a report:

Figure 3.4: Management reports become more effective when they include all of the IT service work.

The main idea is to keep everyone (users, IT, and management) in a single cycle so that everyone is informed about the state of affairs. The main burden of notification falls on the software, which can send status updates by e-mail or other means.

Change management: how to find the right window

Large IT departments with many specializations have problems of their own. In the previous chapter I talked about the problems of managing by technology silo, where narrow specialists spent a lot of time passing a problem back and forth: each used only his own tool and believed the problem was not his, so finding the cause was delayed. We are certainly not going to get rid of specialists; the solution is to use tools that bring information from different sources into a single console, which makes it possible to combine efforts.

Another problem created by silo-based management concerns change management. At the beginning of this chapter I outlined one example: database specialists are ready to make their changes to the system, but these conflict with changes another group is planning. Managing the windows in the schedule for making changes becomes extremely difficult. Applications and services have to run around the clock, which leaves very small windows for changes and makes different groups of specialists compete for them.

"Boss, this patch should have been installed long ago, but it can only be done at night. It will take us four hours, which fits in the window. But all of last week other groups were using that window, and their work kept us from doing ours at the same time." This situation is not rare. It becomes hard for management to keep track of which configuration changes need to be made and how to divide the already limited maintenance windows among them.

Poor visibility into windows and the competition for them often makes it impossible to take sound management decisions. For example, if management can see a number of pending changes competing with each other, it may decide to extend the maintenance window long enough to complete them. Or it may decide not to. Either way, it will be a conscious decision rather than a serious problem being ignored.

The way out is software that makes it easy to coordinate different departments. Think about it: if you already use a service desk solution to track tickets, tickets can also be created for planned changes. These tickets can be assigned to a technician, sent for approval or revision, and so on, all through the workflow you have defined.

Incidentally, this is a very good way to implement ITIL processes. The tickets can then be placed on a unified calendar built into the service desk, and planners can put together a workable schedule. They can see the agreed maintenance windows, manage conflicts between competing changes, and so on. With this information in a familiar calendar view, they can also decide whether to extend maintenance windows when that is necessary and benefits the organization. Figure 3.5 shows a change management calendar:

Figure 3.5: Schedule for managing changes in a calendar view.
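
Once changes are recorded as tickets with a time slot and a list of affected systems, spotting competing changes is a simple overlap check. The following is a minimal sketch under assumed field names (`id`, `systems`, `start`, `end`); the sample tickets mirror the database and power supply example from the start of the chapter.

```python
from datetime import datetime
from itertools import combinations


def overlaps(a, b):
    """Half-open slots [start, end) overlap when each starts before the other ends."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def find_conflicts(changes):
    """Return pairs of change IDs that overlap in time and touch a shared system."""
    conflicts = []
    for a, b in combinations(changes, 2):
        if overlaps(a, b) and set(a["systems"]) & set(b["systems"]):
            conflicts.append((a["id"], b["id"]))
    return conflicts


# Hypothetical change tickets for one night.
changes = [
    {"id": "CHG-1", "team": "DBMS", "systems": ["db01"],
     "start": datetime(2024, 1, 10, 0, 0), "end": datetime(2024, 1, 10, 4, 0)},
    {"id": "CHG-2", "team": "Data center", "systems": ["db01", "web01"],
     "start": datetime(2024, 1, 10, 1, 0), "end": datetime(2024, 1, 10, 3, 0)},
    {"id": "CHG-3", "team": "Network", "systems": ["router7"],
     "start": datetime(2024, 1, 10, 1, 0), "end": datetime(2024, 1, 10, 2, 0)},
]
```

Here the database change and the data center work collide because both touch `db01` in the same hours, while the network change shares the time slot but not the system, so it is not flagged.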

This is just another way to keep everyone in a single cycle. Management now has a clear visual picture of the changes and of the competition for the schedule. Such a calendar can even be made available to users so that they can plan their own work accordingly.

Communication: how to engage users in a single cycle

The idea of keeping users informed is certainly not new, but many companies that have tried to involve their users have failed. Too often, "bringing users into the cycle" takes the form of a self-service web portal where registered users can view the status of their tickets or of a particular service. That is all well and good, but portals of this kind do not always match how users naturally think. Most users who run into a problem do not think of checking a website; they just call the help desk.

But users spend a lot of time checking their e-mail inbox. Why not make that your communication channel? Organizations tend to avoid this method, partly because it can easily eat up a lot of the IT team's time. "I am fully occupied with solving the problem, and I am also supposed to send hourly status updates?" It sounds like something out of a Dilbert cartoon.

In fact, a good service desk solution can do this for you. Sending an e-mail update when, for example, a user's ticket changes is a simple task for software. Such messages can be informative and relieve users of most of their worries about the status of their requests. Figure 3.6 shows what this might look like.

Figure 3.6: Informing users through detailed emails.

Even more useful is a service desk solution that can actually receive requests by e-mail instead of waiting for users to visit a website and open a ticket. Face it: your users are much more likely to pick up the phone than to visit a website and file a ticket, unless of course you put significant artificial barriers in their way, such as complex voice menus in the phone system. Users are more willing to send e-mail. If your service desk software, rather than a technician, can receive these messages and turn them into tickets, you have created a system that users will welcome with open arms. Such tickets can be assigned and routed automatically, helping the technician start on the problem sooner. E-mail messages are valuable for routine, non-problem processes as well. When a request is approved, rejected, in progress, completed, and so on, an e-mail keeps the user informed without any additional human effort.
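
To make the e-mail intake idea concrete, here is a minimal sketch that turns a raw message into a ticket record using Python's standard `email` package. The ticket fields are hypothetical, and the sketch assumes a plain, single-part text message; a real mailbox integration would also need to handle multipart mail, attachments, and replies to existing tickets.

```python
from email import message_from_string
from email.utils import parseaddr


def ticket_from_email(raw: str) -> dict:
    """Build a ticket dictionary from a raw RFC 5322 message (plain text only)."""
    msg = message_from_string(raw)
    _, sender = parseaddr(msg.get("From", ""))
    # For a single-part message get_payload() returns the body as a string.
    body = "" if msg.is_multipart() else msg.get_payload()
    return {
        "requester": sender,
        "title": msg.get("Subject", "(no subject)"),
        "description": body.strip(),
        "status": "open",
    }
```

The resulting record can then go through the same assignment and routing rules as any other ticket.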

Note
I would like to stress that self-service portals are a good thing. They can improve the user's experience, point a user in the right direction when he tries to solve a problem on his own, and much more, but they should not be the only way to communicate with people.

Service Level Agreements: Setting and Meeting Realistic Expectations

Unless you have been living in the wilderness for the last ten years, the phrase 'service level agreement' (SLA) should be familiar. In its simplest form, an SLA is a commitment by IT to provide a specific level of performance and availability for a particular service. "The mail service will be available 99.999% of working time over the year" is an example of the simplest SLA.
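
It helps to translate an availability percentage into an actual downtime budget. The short calculation below is an illustration only; it assumes a round-the-clock service and a 365-day year, whereas the example SLA above speaks of working time, which would make the budget even smaller.

```python
def allowed_downtime_minutes(availability_percent: float, period_hours: float) -> float:
    """Minutes of downtime permitted by an availability target over a period."""
    return period_hours * 60 * (100 - availability_percent) / 100


HOURS_PER_YEAR = 365 * 24  # assumes round-the-clock service and a non-leap year

five_nines = allowed_downtime_minutes(99.999, HOURS_PER_YEAR)
three_nines = allowed_downtime_minutes(99.9, HOURS_PER_YEAR)
print(f"99.999%: about {five_nines:.2f} minutes per year")
print(f"99.9%:   about {three_nines / 60:.2f} hours per year")
```

Under these assumptions "five nines" leaves only a little over five minutes of downtime per year, while 99.9% leaves several hours, which is why the target has to be chosen against what you can realistically deliver.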

SLAs quickly become complicated, and a single number cannot capture them. What level of service can you reasonably provide? What is your historical level of service? What does the business need? How do you track SLAs once they are in force, and how do you make sure they still fit the business? And, ideally, do you get advance warning when an agreement is in danger of being breached?

SLAs may not be the only kind of agreement that needs to be adopted and tracked. Some organizations also use agreements with third-party service providers (underpinning contracts, UCs) or operational level agreements (OLAs) for various internal and external services, which often support the SLAs.

Well-built service desk and monitoring solutions will help you comply with these agreements more accurately. You can start by defining top-level SLAs and then derive UCs and OLAs from them as needed. (Allowance must be made for local legislation and for how third parties actually fulfil their obligations in practice. If, for example, all your local telecom providers have a recovery time of 72 hours, it is unreasonable to sign your own SLA committing to a shorter period; finding a provider with a shorter recovery time requires a strong reason and management approval. Very often, shortening such seemingly long intervals leads to a disproportionate increase in the cost of the service, and in practice it then turns out not to be so necessary. Translator's note.)

Once you have defined the parameters of the agreement, the software should track current performance and availability and report on them, perhaps as a simple indicator like the one in Figure 3.7 that shows your compliance with the current SLA. You may also have access to more complete and detailed reports on SLA metrics.

Figure 3.7: SLA management. Indicators of the current status of the SLA.

More importantly, however, the software you use should let you define SLA rules under which tickets are created when there is a danger of a breach and then, as discussed earlier, routed to specific specialists. The solution should support escalation rules: if a metric covered by the SLA reaches its threshold and does not return to normal within a certain time, the application should automatically try to bring in backup capacity, call in additional technical specialists, notify management, and so on.
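
An escalation rule of this kind can be expressed as a simple ladder keyed on how long a metric has stayed past its threshold. The time limits and actions below are invented for illustration; real values would come from your own SLA definitions.

```python
# Hypothetical escalation ladder: minutes spent past the threshold -> action taken.
ESCALATION_LADDER = [
    (0, "open ticket for on-call technician"),
    (30, "call in additional technicians"),
    (60, "notify management"),
]


def escalation_actions(minutes_past_threshold: float) -> list:
    """Return every action that should have been triggered by now, in order."""
    return [
        action
        for after_minutes, action in ESCALATION_LADDER
        if minutes_past_threshold >= after_minutes
    ]
```

With this ladder, a metric that has been past its threshold for 45 minutes has already triggered the first two actions but has not yet reached management.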

It must be admitted that no SLA is perfect. Sometimes, for whatever reason, the business may decide that a service should be taken offline, for example for a software update or maintenance work. In these cases you are NOT violating the SLA; you are agreeing, together with the part of the business that is affected, to suspend the SLA temporarily until the work is completed. Service desk software should support these kinds of exceptions, including SLAs that apply only during specific periods, correct handling of weekends and holidays, agreed maintenance windows, changes in the length of maintenance windows, and so on.
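
One way to picture such exceptions is to leave agreed maintenance windows out of the availability calculation altogether. The sketch below is an assumption about how this might be computed, not a description of any product: times are minutes from the start of the reporting period, and maintenance windows are assumed not to overlap each other.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two intervals, or 0 if they do not intersect."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def sla_availability(period_minutes, outages, maintenance):
    """Availability in percent, ignoring downtime inside agreed maintenance windows.

    outages and maintenance are lists of (start, end) pairs in minutes from the
    start of the period; maintenance windows are assumed not to overlap.
    """
    excluded = sum(end - start for start, end in maintenance)
    counted_downtime = 0
    for o_start, o_end in outages:
        covered = sum(overlap(o_start, o_end, m_start, m_end)
                      for m_start, m_end in maintenance)
        counted_downtime += (o_end - o_start) - covered
    agreed_minutes = period_minutes - excluded
    return 100.0 * (agreed_minutes - counted_downtime) / agreed_minutes
```

An outage that falls entirely inside an agreed window does not count against the SLA, while the same outage without an agreed window would.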

The idea is to automate the definition and management of SLAs, and also the notifications associated with them. If an SLA is breached, you can arrange for the affected business users to be notified automatically. They will then know that IT is aware of their problem and is working on it, without having to visit the self-service portal and open tickets. This kind of proactive response goes a long way towards improving relations between IT and users, and it helps IT be judged fairly from the business point of view and meet the needs of the business.

Tell me what you really think

IT managers love it when IT thinks of users as "customers." In some cases your users really are customers, in the sense that they actually pay you money for the services you provide. In other cases your users are internal, but they are still customers in the sense that they consume the services that you, the IT department, provide and for which you are, in effect, paid a salary.

A very big problem is that IT constantly struggles with how it is perceived by its customers. Do users really think you are doing a good job? And what counts as a good job?

Monitoring end-user experience (EUE) metrics, which we discussed in the first chapter, is becoming a hot trend in the IT industry. You may see that your server's performance is within the normal range, but by the time everything has passed through old client computers, routers, cabling, and everything else involved in delivering the service, users may perceive that performance quite differently. Measuring end-user experience gives you a view of the overall picture of what your users, or customers if you prefer, actually have to deal with.

Many kinds of businesses use another important way of finding out what users think: the satisfaction survey. Call your bank, and the automated answering system may tell you that you have been selected for a short satisfaction survey that will start as soon as your conversation with the representative ends. Visit an amusement park, and a smiling employee with a tablet will probably ask you a few questions. Look at the receipt from your last purchase, and you will likely find a chance to win a gift card or other prize for completing an online survey about your visit to the store.

Satisfaction surveys are an effective way to find out what users really think, and a good service desk solution should let you survey your customers about your work. You may want them to give their opinion when each request is completed. Or you may want to be less intrusive and ask only on every third or fourth request. Whatever you decide, the service desk software should be able to automate the process. You may even want to run special surveys on users' views of day-to-day tasks, service levels, and so on.
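
The "every third or fourth request" policy is easy to automate. The function below is a minimal sketch with a hypothetical name and parameters; it assumes the service desk keeps a per-user count of closed requests.

```python
def should_survey(closed_requests_so_far: int, every_nth: int = 3) -> bool:
    """True when this user's latest closed request is the Nth, 2Nth, ... one."""
    if every_nth < 1:
        raise ValueError("every_nth must be at least 1")
    return closed_requests_so_far > 0 and closed_requests_so_far % every_nth == 0
```

Setting `every_nth` to 1 gives the "ask after every request" behaviour, and a larger value makes the surveys less frequent.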

Of course, such surveys are useless without a way to collect the data and see what is being done and how well. The results should be presented as reports, possibly with tables and charts that help you visualize how your service is perceived.
Compare such a report with your SLA compliance report. See the difference?

If your SLAs show that you are doing a perfect job but the survey results are far from brilliant, then either your SLAs are not set at the right level or SLAs are not the only metrics worth considering.

I have worked with a number of clients in this situation: "Our SLAs are being met, but our users still do not think we are doing a good job. What could be wrong?" We found the answer with the help of several targeted surveys about "soft" issues, such as the attitude of the IT team when helping users. It turned out that the employees who dealt with customers were curt and sometimes rude. We spent some time alongside the team and found that they were under enormous pressure because of the large number of tickets they were handling. In the end, the company developed internal metrics to track each specialist's workload and worked to bring it down to an acceptable, manageable level, while continuing to track the "soft" issues in the relationship between specialists and users.
The moral of this story is that SLAs are not the only metrics to keep in mind, and built-in surveys can help uncover information that is critical for understanding the overall effectiveness of your services.

When not everyone needs to see everything: the multi-tenant approach

Multi-tenancy is a growing trend in IT solutions from various software vendors, and there are good reasons for it. If you are a service provider or, more precisely, a managed service provider (MSP), you should know about tools that can be configured and partitioned for each of your customers. Customer A needs one set of dashboards, while Customer B needs another. Customer B must not see Customer A's tickets (and Customer A does not want Customer B to see them!). Until recently, multi-tenancy of this kind was found only in solutions built specifically for MSPs.

Today, however, things are changing. Large companies with many divisions want to deploy software that can serve the needs of all divisions without a separate solution for each one. Multi-tenant solutions can help here: they let you take a single solution, tailor it to specific needs, partition it, and present it to each department as if it were that department's own solution, although in fact it serves everyone.

Different divisions can have different views of their parts of the environment. For example, Division A may see one set of indicators, while Division B sees something completely different. Multi-tenancy is far from something every company needs. It is, however, a useful capability to keep in your back pocket and use when it becomes necessary. This functionality should therefore be taken into account when you are evaluating several solutions, even if multi-tenant features are not an immediate need. And of course, if you are an MSP, your solutions must have it.
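
At its simplest, multi-tenancy means that every record carries a tenant label and every view is filtered by it. The sketch below uses hypothetical field and function names to show one shared ticket store presenting a different slice and a different dashboard to each tenant.

```python
def visible_tickets(tickets, tenant):
    """Return only the tickets that belong to the given tenant."""
    return [t for t in tickets if t["tenant"] == tenant]


def dashboard_for(tenant, dashboards, default=("open_tickets",)):
    """Look up the tenant's configured widgets, falling back to a default set."""
    return dashboards.get(tenant, default)
```

Real products enforce this separation much deeper, in access control and storage, but the principle of scoping every query by tenant is the same.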

This relates directly to the chapter's theme of bringing everyone into a single management cycle: the ability to provide specific, customized environments to different groups of users, both external and internal, helps those users have more accurate information about the state of affairs.

Call it a private cloud: cost sharing

There is one more thing worth paying attention to once everyone is involved: the costs associated with these users, and the ability to give your customers detailed reports on their use of the infrastructure and, if necessary, to issue real bills based on those figures.
Figure 3.8 shows an example of such a report:

Figure 3.8: Billing report based on measured user activity.

Again, although this type of report is an obvious and mandatory feature for MSPs, demand for it is also growing in organizations that work only with internal users. One of the key elements of cloud computing is billing for actual resource use. The cloud provider builds and manages an infrastructure that is divided among customers, and each customer pays for the parts and features he has used. This is the obvious and well-understood model for cloud computing, and it is also the model for private clouds. Instead of treating IT as one giant cost bucket, companies are increasingly looking for ways to distribute IT costs among the consumers of IT services. "Marketing wants to deploy a dozen virtual web servers for a new website? Well, do they have the budget for it?"

There is nothing new about chargeback, as this is called. But monitoring and service desk solutions are increasingly able to provide the level of detail needed to do the accounting and actually issue invoices. The technological advances that made public clouds possible can be brought into private data centers fairly quickly and serve the same purpose there: billing (or cost allocation) for actual use.
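
A chargeback calculation boils down to multiplying measured usage by a rate and summing per consumer. The following is a minimal sketch; the resource names, rates, and record layout are assumptions made up for the example.

```python
from collections import defaultdict


def chargeback(usage_records, rates):
    """Sum the cost of measured usage per department.

    usage_records: dicts with "department", "resource" and "quantity" keys.
    rates: price per unit for each resource name.
    """
    totals = defaultdict(float)
    for rec in usage_records:
        totals[rec["department"]] += rec["quantity"] * rates[rec["resource"]]
    return dict(totals)


# Hypothetical rates and one month of usage.
rates = {"vm_hour": 0.5, "gb_month": 0.25}
usage = [
    {"department": "Marketing", "resource": "vm_hour", "quantity": 12 * 720},
    {"department": "Marketing", "resource": "gb_month", "quantity": 100},
    {"department": "Finance", "resource": "vm_hour", "quantity": 200},
]
bill = chargeback(usage, rates)
```

With these made-up numbers, Marketing's dozen virtual servers dominate its bill, which is exactly the kind of visibility that lets the business decide whether the spend is justified.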

Tying IT costs directly to the consumers of IT services is a great way to help IT make better business decisions. Instead of casting IT as Cerberus, deciding who may and may not access particular IT services, the organization's management should decide who can spend how much money on which services. That is how it should be. On the other hand, from a business point of view IT has always looked like an outsourced, non-core activity. Although the IT department is paid for by the organization, it is not directly involved in generating profit; it is a separate unit. So if the business sees IT as an "outsourced" team (even though it is actually internal), why shouldn't IT keep accounts and bill its users, just as any IT service provider does?

This is also another way to tie everyone together in a single cycle. Even if you do not use internal billing or invoices to actually charge anyone, they can help top management understand where costs come from and what value IT investments deliver. The IT manager can tell the CEO: "Yes, you spent a huge amount of money on IT last quarter, but here is when and how the organization used that investment. If you want to save money, start with the users and find ways for them to consume fewer of our services."

Conclusion

This chapter was about keeping everyone in a single IT management cycle. From keeping users informed about IT processes, to keeping IT specialists better informed about current events, to giving management the information it needs to make better and more informed decisions, it was all about communication. Very little of what I have described in this chapter is beyond the reach of any organization right now, provided it puts in enough effort. The key is to use specialized software with features that make this possible at minimal cost.

In the next chapter ...

In the next chapter we will look at a challenge that IT faces more and more often: key services and IT components that live outside your own data center. You may call them "cloud" or simply "outsourced services," but whatever you call them, they are still critical to the business, and you need to treat them exactly as you would your local services. They should not be treated as a separate technology silo, because then you will end up managing them separately from everything else. Naturally, monitoring outsourced services plays by different rules than monitoring local services, so we will have to look for some non-trivial solutions.


Source: https://habr.com/ru/post/174267/
