
OTRS, pumped up REG.RU-style




There is probably no need to explain what OTRS is. Many companies use it to manage services and support their customers. The project's history goes back to 2001, which has both pluses and minuses. It is a very powerful tool with a huge range of functionality covering almost any need of a small or medium business, and it is free. Paid support is only for those who find the basic functionality insufficient or need help setting it up.



This article is about that tool, which our company has been actively using since 2008. Or rather, about what became of it in the hands of REG.RU's impatient Perl programmers.



What do we use OTRS for?



Yes, for the same things as everyone else:



OTRS is the main tool of all our customer services, so it is important to us that it runs stably, quickly and without interruption.


Some statistics and interesting facts




Long gone are the days when we recognized most of our customers by their voice on the phone. Yes, in the early days there really was such a time: you heard a familiar voice on the line and, instead of the usual phrase, said "Hello, Ivan Ivanovich! Glad to hear from you!", while already opening the client's card, ready to listen to a new request or a description of the problem. Now, by many measures, we are number one among registrars and hosting providers in Russia (well, I won't count Hetzner as a Russian company, forgive me), so the number of requests per day has long exceeded what an ordinary customer service employee can keep in their head.



For 2014–2015





Number of active tickets by year



Active tickets are those created by real customers, i.e., not spam and not automatic service notifications.



At present





Maximum response time charts for August 2015



Hosting Technical Support







Domain technical support







Cloud technical support







Billing technical support







To some, these volumes will seem large, to others, scanty. But then, we don't support World of Warcraft ;-)



Now let's look at what allowed us to achieve these results: how we improved and modernized OTRS.



Let's start with the fact that at the beginning of last year we gave up backward compatibility with the mainline OTRS branch maintained by its developers. There were many reasons for this. OTRS does have mechanisms for extending its functionality, but using only the tools provided, our options were limited. Besides, we won't hide that earlier (a few years ago) all improvements were made live in the code, often by programmers who were not working on the OTRS project full-time but had simply been asked to fix something urgently in a neighboring project. They were given a little time to get to grips with the issue and quickly fix it or implement some feature. Naturally, with this approach, compatibility and the overall quality of the OTRS code suffered.



Two years ago we attempted to migrate to the then-new OTRS 3.3 while keeping all our own modifications. The operation was difficult and dragged on for several months, and in the end the result was not worth the effort we spent on it. A bitter experience, but one with its upsides: thanks to it, we dug into almost every OTRS file, understood how many of its mechanisms work, and identified several dozen places that should be refactored or rewritten entirely. There were some frankly deplorable constructs (not ours) that horrified not only newcomers to Perl and web programming but also experienced lead programmers. All this is to be expected, though: a powerful product that has been around for 14 years could not fully get rid of some code that has survived since the very beginning.



By then we had finally understood a few things:





Major rework



Search



For a long time the biggest problem was, of course, ticket search. Remember the numbers from the previous chapter? Now imagine an employee typing "reg.ru" (or any other domain) into the full-text search field to search the entire database without specifying a date range, which, incidentally, is a routine task for an employee. And what does OTRS on MySQL do? It runs a query of the following form:



SELECT * FROM ticket, article
WHERE article.ticket_id = ticket.id
  AND (`from` LIKE '%reg.ru%' OR `to` LIKE '%reg.ru%'
       OR title LIKE '%reg.ru%' OR body LIKE '%reg.ru%')
-- ...and so on, a LIKE on every searchable field.




Needless to say, queries like this hung our database: a LIKE pattern with a leading % cannot use an index, so each one scans the tables in full. Even when such a query completed successfully, it took a very long time. And what if several such "searchers" were at work at once? To put it mildly, it was hard. During working hours mytop was running almost constantly so that we could track long queries and kill them by hand.



So the first important improvement was search via Sphinx. It took us a long time to write, but the result was excellent: a CPAN module, OTRS::SphinxSearch, that will help fellow enthusiasts deploy Sphinx-based search. I am sure the module's author will answer your questions or respond to any bugs you find.



The Sphinx base index, as you know, is kept in RAM. For our tickets with all their messages, excluding the SPAM and Raw queues, it took about 4.5 GB of RAM. A fairly low price for instant search.



Also, at first Sphinx worked with delta indexes rebuilt by cron: five-minute, fifteen-minute, hourly, six-hour and, finally, daily. "Complicated and confusing," you will say, and you will be absolutely right. It also had several unpleasant consequences. First, up to 5 minutes passed between a message arriving and it becoming searchable. Second, during indexing the CPU was heavily loaded, so all of OTRS consistently slowed down, which hurt the productivity of the entire support staff.

At some point we were back where we started: Sphinx reindexing queries were hanging MySQL. This needed an urgent fix.



The second incarnation of search used real-time (RT) indexes. It worked much faster and did not load the CPU; it only added one SQL query to Sphinx's SQL interface (SphinxQL): an INSERT or a REPLACE, depending on the situation.
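To illustrate that single statement, here is a minimal Python sketch. The index name `otrs_articles_rt` and its columns are hypothetical (in practice they must match the `rt_field`/`rt_attr_*` declarations in sphinx.conf), and the choice between INSERT and REPLACE is reduced to one flag.

```python
from typing import Tuple

# Hypothetical RT index name and columns; adjust to your sphinx.conf.
RT_INDEX = "otrs_articles_rt"


def rt_index_statement(article: dict, already_indexed: bool) -> Tuple[str, tuple]:
    """Build the one SphinxQL statement that keeps the RT index current.

    A new article is INSERTed; an article already in the index is REPLACEd,
    which overwrites the document with the same id.
    """
    verb = "REPLACE" if already_indexed else "INSERT"
    sql = (
        f"{verb} INTO {RT_INDEX} (id, ticket_id, subject, body) "
        "VALUES (%s, %s, %s, %s)"
    )
    params = (article["id"], article["ticket_id"], article["subject"], article["body"])
    return sql, params
```

Since searchd speaks the MySQL wire protocol (typically configured on port 9306), such a statement can be sent through an ordinary MySQL client library that substitutes `%s` parameters on the client side, such as PyMySQL. No batch reindexing is involved, which is why the CPU spikes disappear.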



This search still serves us today, with only a few minor code changes since. We are looking forward to Sphinx 3.0 so we can take advantage of all its cool features.



Motivation: "Solve everything"



The second problem, which worried not only employees but also customers, was ticket response time, especially when the question was complicated.



Previously, a technical support employee's performance was assessed by the number of tickets they processed per month. This had its drawbacks: if there were 10 tickets in the queue, 2 difficult and 8 easy, the easiest and often not the oldest tickets were taken first. The statistics credited the employee with 8 completed tickets. Meanwhile, new easy tickets appeared, and the employee kept solving only the simplest questions, while the complex tickets could hang for hours or even days, until a stern manager came along and personally pointed at the ticket that inspired so much fear.



To motivate employees to take the topmost ticket in the queue (the top one is the oldest) regardless of its complexity, a points system was developed. Honestly, I don't really know how it is used for motivation; that is the business of the customer service heads. But from the programmer's side, the following was done:

  1. Each ticket is assigned a number of points depending on its position in the queue. Ticket points are recalculated in real time, individually for each employee, based on their roles. The older the ticket, the higher it is in the queue and the more points it is worth.
  2. When an employee takes a ticket, its points are credited to their account. If the employee responds to the ticket, the points stay with them. If the employee unlocks the ticket without answering the client, a penalty several times larger than the ticket's points is charged, so it is even possible to go into the negative.
  3. At the end of the month, managers use the collected statistics to determine who worked well and who did not. And they probably even hand out cash bonuses or impose fines.




This task is in fact very extensive and involved many small innovations.



For example, we added a so-called ticket pool. On arriving at work, an employee joins the pool, and immediately (or almost immediately) a ticket opens for them to respond to. As soon as the employee sends a response, the next ticket from their queue opens. A kind of conveyor belt for tickets.



A funny story about how we arrived at the pool
Right after points for tickets were introduced, there was some confusion. That is understandable: no one knew why the points were needed or what they affected. I'll let you in on a secret: the managers themselves didn't really know why they had come up with the points or how we would use them. So what can you expect from ordinary support staff, who weren't told everything until the last moment?



Seeing some incomprehensible "points" in the interface, the employees gave in to a perfectly natural panic, and then ingenuity came to their aid: "If points are awarded for every ticket, then the more points an employee has, the better." I should also note that the points system was introduced in summer, when the support load naturally drops (clients are on vacation and not busy with their sites and domains). Spurred by the innovation, everyone got a second wind and started solving tickets and taking new ones with triple energy. Something unprecedented for us happened: the hosting ticket queue, which always had at least a hundred tickets in it, was empty within two days! Everyone was in shock, customers included. Response times improved significantly.



Now the staff had a new problem: there were no tickets, but points were needed. What to do? At first, everyone frantically pressed F5, waiting for a new ticket so they could grab it as soon as possible. After all, you remember about the points, right? They were still credited for a successful response, and what the points were good for was still unknown.



Some particularly enterprising support staff decided to make life easier and skip the monkey work: "Why should I press F5 myself when a script can do it for me?" The first bots appeared, written in different languages and working on different principles. A healthy (and sometimes not so healthy :) competition developed within the team. Almost no one shared their bots; everyone wrote their own.

The bots were amazingly varied and capable. There were plug-ins for Chrome and Firefox, scripts running in the background on employees' computers, and even daemons running around the clock on employees' own VPSes. The variety of languages was impressive too: JavaScript, PHP, Python.



Then the bots ran into a problem of their own: competition. A single thread can't grab much, so the process has to be parallelized. And so our newly minted programmers from technical support studied multithreading and asynchronous requests and applied them in practice. Not bad practice, by the way.



This went on for a while, but clearly something had to be done about it. Five or six bots, each with 10 to 100 threads, were hammering our poor OTRS server. I still don't understand how it didn't fall over; it groaned and strained, but it processed that many requests.



In the end, management banned such bots and clarified what the points were for. But this story will stay in our memory for a long time. It was a fun time.


In other words, this is not a queue of tickets but a queue of employees, in which everyone receives exactly one ticket and then goes to the back of the line to wait for the next one. The employee does not see the ticket queue at all, only what the soulless machine hands out, without knowing how difficult or easy the ticket is.
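A queue of employees can be sketched with two FIFO queues, one of tickets (oldest first) and one of waiting employees, paired up whenever both are non-empty. The `TicketPool` class and its method names are hypothetical; the article does not describe the real pool's internals.

```python
from collections import deque


class TicketPool:
    """Hypothetical dispatcher: a queue of employees, not of tickets.

    Tickets are handed out strictly oldest-first; each employee holds at
    most one ticket and rejoins the end of the line after answering.
    """

    def __init__(self):
        self.tickets = deque()   # oldest ticket on the left
        self.waiting = deque()   # employees waiting for work, in arrival order
        self.assigned = {}       # employee -> ticket currently opened for them

    def add_ticket(self, ticket_id):
        self.tickets.append(ticket_id)
        self._dispatch()

    def join(self, employee):
        # Entering the pool puts the employee at the end of the line.
        self.waiting.append(employee)
        self._dispatch()

    def answered(self, employee):
        # After replying, the employee goes to the back of the line.
        del self.assigned[employee]
        self.waiting.append(employee)
        self._dispatch()

    def _dispatch(self):
        # Pair the longest-waiting employee with the oldest ticket.
        while self.tickets and self.waiting:
            employee = self.waiting.popleft()
            self.assigned[employee] = self.tickets.popleft()
```

Because employees never choose, the only ordering that matters is ticket age, which is exactly what removes the incentive to cherry-pick easy tickets.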



This is roughly how we achieved fairness: all tickets are handled in strict order. I hope our customers have noticed, because for hosting questions the maximum response time is 8 hours.



Our own OTRS API



While developing the tool, we really wanted to reduce the time it takes to receive requests. The standard way to get tickets into OTRS is email, and OTRS can receive mail in several ways. It can poll the mail server via cron and fetch mail from there. Or, again via cron, it can read mail from the sendmail directory and create tickets from it. Either way, at least a minute passes between mail checks, and in practice two or more. The standard mail processing script is also rather heavy: on large messages it can spend tens of seconds parsing and writing to the database.



We wanted something light, modern and fresh. So we wrote our own API, which accepts JSON and also properly handles requests with file attachments.



At this point an experienced OTRS programmer will object: why write your own API when OTRS has rpc.pl? And they will be partly right. That API does exist, but we chose not to use it, and did so deliberately. One reason is that rpc.pl works in XML-RPC format, and, as I said, we wanted something modern and light. Besides, developing our own took only a few days. Our solution also lets us not only retrieve information about tickets but also create new ones, carry on correspondence, open and close tickets, and much more. We now use this API to create tickets from the REG.RU website, and creating requests takes much less time.
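As an illustration of the kind of request such an API might accept, here is a minimal validation step in Python. The field names (`email`, `subject`, `body`, `attachments`, `content_base64`) are hypothetical, and base64 encoding of attachments is an assumption, since JSON cannot carry raw binary; creating the ticket in OTRS itself is out of scope.

```python
import base64
import binascii
import json

REQUIRED_FIELDS = ("email", "subject", "body")  # hypothetical payload schema


def parse_ticket_request(raw: bytes):
    """Validate a JSON ticket request and return (http_status, result).

    On success, result is a dict with decoded attachment bytes; on failure,
    it is an error message suitable for a 400 response.
    """
    try:
        payload = json.loads(raw)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return 400, "body is not valid JSON"
    if not isinstance(payload, dict):
        return 400, "expected a JSON object"

    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    if missing:
        return 400, "missing fields: " + ", ".join(missing)

    attachments = []
    for item in payload.get("attachments", []):
        try:
            # validate=True rejects characters outside the base64 alphabet.
            data = base64.b64decode(item["content_base64"], validate=True)
        except (KeyError, TypeError, binascii.Error):
            return 400, "bad attachment encoding"
        attachments.append({"filename": item.get("filename", "attachment"), "data": data})

    ticket = {name: payload[name] for name in REQUIRED_FIELDS}
    ticket["attachments"] = attachments
    return 200, ticket
```

Validation up front keeps the expensive part (writing to the OTRS database) for requests that are already known to be well-formed.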



And what if OTRS is unavailable for some reason, such as network problems or maintenance? Will customer tickets not be created, and will all those lines written with such love be lost?



For this we have a solution: a task queue based on Redis. You can read more about the queue in Ivan Sokolov's presentation on FastQueue. If the OTRS server returns anything other than 200, we create a task in the queue and put into it the entire content of the message and even a bit more, along with the attachment files in binary form. For a few hours afterwards, the queue keeps trying to send the new ticket. So even after short server outages, everything recovers within a few hours and the new tasks reach the employees.

But for some of the staff even this is not enough: ideas are floating around about sending batches of tasks through the API.
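The retry behavior can be sketched without Redis by keeping tasks in a heap ordered by their next attempt time. Everything here is an assumption: `RetryQueue` is not FastQueue's API, and the doubling delay, 30-minute cap and four-hour cutoff merely stand in for the retry window described above.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical retry policy.
FIRST_DELAY = 60            # seconds before the first retry
MAX_DELAY = 30 * 60         # cap on the gap between attempts
GIVE_UP_AFTER = 4 * 3600    # stop retrying after about four hours


@dataclass(order=True)
class Task:
    next_try: float
    seq: int                                  # tie-breaker for equal times
    created: float = field(compare=False)
    payload: dict = field(compare=False)      # full message, attachments included
    delay: float = field(compare=False, default=FIRST_DELAY)


class RetryQueue:
    """In-memory stand-in for a Redis-backed task queue."""

    def __init__(self, send: Callable[[dict], int]):
        self.send = send                # posts the ticket, returns the HTTP status
        self.heap = []
        self.seq = itertools.count()
        self.failed = []                # tasks that ran out of retries

    def submit(self, payload: dict, now: float) -> None:
        if self.send(payload) == 200:
            return
        # Non-200 response: park the message for a later attempt.
        heapq.heappush(self.heap, Task(now + FIRST_DELAY, next(self.seq), now, payload))

    def run_due(self, now: float) -> None:
        # Retry every task whose time has come; reschedule with a longer delay.
        while self.heap and self.heap[0].next_try <= now:
            task = heapq.heappop(self.heap)
            if self.send(task.payload) == 200:
                continue
            if now - task.created >= GIVE_UP_AFTER:
                self.failed.append(task.payload)
                continue
            task.delay = min(task.delay * 2, MAX_DELAY)
            task.next_try = now + task.delay
            heapq.heappush(self.heap, task)
```

A worker would call `run_due(time.time())` periodically; growing the delay keeps a long outage from turning into a flood of retries against a server that is just coming back.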



Tags



Another fairly far-reaching addition is tags. They help customer service managers collect metrics and analyze incoming message traffic.



Tags were originally built on dynamic fields (the standard OTRS mechanism). In the end, though, rather than loading the already slow dynamic field tables any further, we created our own mechanism with separate tables for storing tags. Tags can also be searched and filtered. Several types of tags can be attached to a ticket.



First, the source tag, i.e., where the request came from. There are several sources: the authenticated part of the REG.RU site (the user is logged in to REG.RU), the unauthenticated part, email, and a few rarer ones. This helps determine how popular each channel is.



Second, participant tags. Each time an employee answers a ticket, a participant tag is attached to it. This also helps us keep track of how we handle our customers' requests.



Third, notification tags. They let you quickly filter out emails that were sent automatically. Such notifications are mostly warnings about exceeded hosting limits or increased load.
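Separate tag tables of this kind might look like the following sketch, which uses SQLite from Python's standard library to stay self-contained. The table names, the three `type` values and the helper functions are hypothetical; the article does not publish REG.RU's schema.

```python
import sqlite3

# Hypothetical schema: tags live in their own tables, not in OTRS dynamic fields.
SCHEMA = """
CREATE TABLE tag (
    id   INTEGER PRIMARY KEY,
    type TEXT NOT NULL CHECK (type IN ('source', 'member', 'notification')),
    name TEXT NOT NULL,
    UNIQUE (type, name)
);
CREATE TABLE ticket_tag (
    ticket_id INTEGER NOT NULL,
    tag_id    INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (ticket_id, tag_id)
);
CREATE INDEX ticket_tag_by_tag ON ticket_tag (tag_id);
"""


def attach_tag(db, ticket_id, tag_type, name):
    """Attach a tag, creating it on first use."""
    db.execute("INSERT OR IGNORE INTO tag (type, name) VALUES (?, ?)", (tag_type, name))
    (tag_id,) = db.execute(
        "SELECT id FROM tag WHERE type = ? AND name = ?", (tag_type, name)
    ).fetchone()
    # Attaching the same tag twice (e.g. a member answering again) is a no-op.
    db.execute("INSERT OR IGNORE INTO ticket_tag VALUES (?, ?)", (ticket_id, tag_id))


def tickets_with_tag(db, tag_type, name):
    """Filter tickets by tag through an indexed join instead of a LIKE scan."""
    rows = db.execute(
        """SELECT tt.ticket_id FROM ticket_tag tt
           JOIN tag t ON t.id = tt.tag_id
           WHERE t.type = ? AND t.name = ?
           ORDER BY tt.ticket_id""",
        (tag_type, name),
    )
    return [row[0] for row in rows]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
attach_tag(db, 101, "source", "site-authorized")
attach_tag(db, 102, "source", "email")
attach_tag(db, 101, "member", "alice")
attach_tag(db, 102, "notification", "hosting-limit")
print(tickets_with_tag(db, "source", "email"))  # [102]
```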



Email Confirmation



One of the first improvements to OTRS was an email confirmation mechanism. It is always used when a client writes from the unauthenticated part of the site. Its purpose is obvious: to make sure the request really comes from the client and not from an attacker who has found out the client's email address.



This mechanism has repeatedly saved us from brazen "Reinstall my VPS" requests from people trying to impersonate our clients.



It has since been reworked into a tool built into the client-facing part of the site.
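The article does not describe how the confirmation works internally. One common approach, shown here purely as an assumption, is a signed link: the server mails the claimed address a token bound to the request and the address, and accepts the request only when that token comes back. `SECRET`, `TOKEN_TTL` and both function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"replace-with-a-real-secret"   # hypothetical server-side key
TOKEN_TTL = 24 * 3600                    # hypothetical: links valid for one day


def _sign(request_id: str, email: str, issued: int) -> str:
    msg = f"{request_id}:{email.strip().lower()}:{issued}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def make_token(request_id: str, email: str, now: Optional[float] = None) -> str:
    """Token for the confirmation link mailed to the address the request claims."""
    issued = int(time.time() if now is None else now)
    return f"{issued}.{_sign(request_id, email, issued)}"


def confirm(request_id: str, email: str, token: str, now: Optional[float] = None) -> bool:
    """True only if the token was issued for this request and mailbox and is fresh."""
    try:
        issued_str, sig = token.split(".", 1)
        issued = int(issued_str)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if current - issued > TOKEN_TTL:
        return False
    # compare_digest avoids leaking the signature through timing differences.
    return hmac.compare_digest(_sign(request_id, email, issued), sig)
```

Only someone who can read that mailbox ever sees the token, so a stranger who merely knows the address cannot get a "Reinstall my VPS" request past this step.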



Our own statistics with metrics and filters



Serious technical support requires serious tools. To track every answer and make sure it was high-quality and complete, and to monitor average response times and employee workload, we developed our own ticket statistics for OTRS.



Again, experts will point out that OTRS has quite a few built-in reports and lets you create new ones. But we abandoned this built-in tool for a number of reasons.



The built-in reports are slow, still use the same LIKE queries when selecting data, and cannot all be displayed on one page. Whenever you change the filter parameters, you have to wait for the system to rebuild everything from scratch.



We wanted a lightweight tool with filters, tables, graphs, histograms, pie charts and other goodies, updated in real time without loading the main database, and with access control. In short, we wanted a lot. We didn't want to put all this on the shoulders of OTRS. Recall, moreover, that only with version 4.0 (released quite recently) did OTRS get a decent Template::Toolkit-based template engine. The previous one was home-grown and did not even support loops (OK, it did, but they had to be prepared in the controller beforehand).



And so another project was born to collect and display various metrics. Among the data it collects:



  1. employee performance:

    • unique tickets per employee;
    • quality ratings of responses;
    • the number of replies, postponements, notes, etc.;
  2. department performance:

    • queue load history;
    • ticket lock time and count;
    • department response times;
  3. reports:

    • the number of tickets in queues;
    • hourly load distribution.
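Two of the metrics listed above, hourly load distribution and department response time, reduce to simple aggregations over ticket events. In this sketch the row format (department, created, first reply) and the sample rows are made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Made-up sample rows: (department, ticket created at, first reply at).
events = [
    ("hosting", datetime(2015, 1, 5, 9, 15), datetime(2015, 1, 5, 10, 0)),
    ("hosting", datetime(2015, 1, 5, 9, 40), datetime(2015, 1, 5, 13, 40)),
    ("domains", datetime(2015, 1, 5, 14, 5), datetime(2015, 1, 5, 14, 35)),
]


def hourly_load(rows):
    """Number of new tickets per hour of the day."""
    return Counter(created.hour for _, created, _ in rows)


def max_reaction_minutes(rows):
    """Longest wait before the first reply, per department, in minutes."""
    worst = defaultdict(float)
    for dept, created, replied in rows:
        minutes = (replied - created).total_seconds() / 60
        worst[dept] = max(worst[dept], minutes)
    return dict(worst)
```

Computing these from a separate event store, rather than from the live OTRS tables, is what keeps such dashboards from loading the main database.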


And much more. The statistics are constantly updated, with new sections and filters added all the time. I hope this helps our customer service managers analyze the quality of work of the departments as a whole and of individual employees in particular.



Jabber and SMS alerts



This is one of our recent innovations, and it has finally taken on a decent form. The idea of notifying employees about urgent or important tickets came up long ago, but only recently (within the last year) did we implement it in a form that suits everyone.



It all started with the introduction of the "Callback" service. For a while, we were satisfied that a callback request immediately appeared at the top of the list and an employee picked it up. But that didn't last long: we wanted to shorten the time between a request appearing and the actual call.



To begin with, we implemented Jabber notifications via the Net::XMPP module, sent to the work accounts of the responsible employees. But we ran into problems, unsolvable so far, with our Jabber provider's XMPP implementation: such frequent "broadcasts" to a dozen recipients quickly got us a temporary ban, and no one received notifications. A lot of blood was shed before we decided to abandon this in favor of push notifications straight to the browser.



Now the employees authorized to call a client and solve their problem, if they are at work, receive a notification right on their computer screen and can respond quickly (open the ticket page, check the details, click the Call Client button).



Besides callbacks, there are a number of requests that must be processed immediately, preferably right now. These include, for example, requests and tasks for Colocation services. The employee who can solve the client's problem is not always at their workplace; a trivial example is a day off. For such cases we implemented SMS notifications to mobile phones containing the ticket number, customer ID and message subject.



These SMS notifications let us respond instantly to urgent client tasks during off-hours and holidays. By the way, working hours are tracked with a calendar specific to each department, and holidays with Date::Holidays::RU. So we (and you too) can be sure that no urgent message will go unattended for more than 2-3 minutes.
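The off-hours rule can be sketched as follows. The 9:00 to 18:00 working day, the weekend rule and the hard-coded `HOLIDAYS` set are assumptions standing in for the per-department calendar and Date::Holidays::RU; the 160-character cap is the length of a single GSM-7 SMS.

```python
from datetime import date, datetime, time
from typing import Optional

# Hypothetical department calendar.
WORK_START = time(9, 0)
WORK_END = time(18, 0)
HOLIDAYS = {date(2015, 11, 4)}   # e.g. National Unity Day


def is_working_time(moment: datetime) -> bool:
    if moment.date() in HOLIDAYS or moment.weekday() >= 5:   # Saturday or Sunday
        return False
    return WORK_START <= moment.time() < WORK_END


def sms_for_urgent_ticket(ticket_number: str, customer_id: str, subject: str,
                          moment: datetime, limit: int = 160) -> Optional[str]:
    """Return the SMS text for an urgent ticket, or None during working hours."""
    if is_working_time(moment):
        return None   # in this sketch, only off-hours tickets trigger an SMS
    text = f"#{ticket_number} client {customer_id}: {subject}"
    return text[:limit]
```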



Small innovations



We have covered the most ambitious changes, but besides them there are plenty of small improvements that make working in OTRS more convenient and faster.



Quick search for answers from the corporate wiki

A small but very useful change. As you have probably noticed, many of the tools originally available in OTRS were replaced for one reason or another. The OTRS FAQ did not escape this fate: we use our own tools to search our internal MediaWiki-based FAQ and insert quick answers from it.



The ability to change the queue when adding notes

A tiny change: we added the ability to change the queue right when adding a note. This saves opening the ticket twice; everything is done in one go.



Search in the REG.RU customer base

In the client's card in OTRS, we show a link to their REG.RU account with the option to quickly open the list of their services. Clients are matched to requests by several parameters, such as email, REG.RU login and a few others.
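Matching a request to a REG.RU account could look like this minimal sketch. The account shape and the login-before-email priority are assumptions; the article names only email and login among the matching parameters and does not say how they are ranked.

```python
from typing import Iterable, Optional

# Hypothetical account shape: {"login": str, "emails": [str, ...]}.


def find_account(accounts: Iterable[dict], email: Optional[str] = None,
                 login: Optional[str] = None) -> Optional[dict]:
    """Return the first account matching by login, falling back to email."""
    accounts = list(accounts)
    if login:
        for acc in accounts:
            if acc["login"].lower() == login.lower():
                return acc
    if email:
        wanted = email.strip().lower()
        for acc in accounts:
            if wanted in (e.lower() for e in acc["emails"]):
                return acc
    return None
```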



Access control for queues

In general, there is hardly any programming work to point to here: we simply make proper use of roles and groups, which come with the basic OTRS configuration.



Quickly moving a ticket back to the previous queue

. , . .



,

, , - . , , , , .





OTRS , , . , .



«-» —

. , . , , . , , , . OTRS, .





. , , , . . , , , .



RS ( SIP-)

. OTRS , . , . « ». , , . Asterisk Restful Interface (ARI) .



, ? « ». , . , , , , - . , .



Redis

OTRS . ( -), inode . , - .



Redis. , -, . -, . , . OTRS Redis ( ), .





, . . , . Google - . , HTML, .



LDAP-

LDAP- . , OTRS, . OTRS.





. , select 80 , . , « » 55 . , .



SQL-

- , . , ( ), .



Results



, . , , . - . 2 5 , , . , -, , . ! . Thank them for that.



. , , . .





OTRS. , . OTRS EventHandler, .



.



- , , - .



?



, : « RS ?». , , , .



, , , , . « ». «» OTRS. - . - , - , , . , .



OTRS, , , . REG.RU . , .



? . — .

, — , . , . . .



OTRS API, , .



Thanks



. , .

, .

, , .



Chips , banaking , shikin , imir . , , . , .

Source: https://habr.com/ru/post/267925/


