"Tumbler", or how to make the operation of the call-center exclude possible interruptions in its work

A call-center handles calls 24 hours a day, 7 days a week, 365 days a year; in other words, around the clock and without interruption. For call-centers that provide commercial services this requirement is highly desirable, but for some call-centers it is mandatory: the “09” service and the emergency services “01” to “04” and “112”. Despite vendors' assurances about the reliability and fault tolerance of their call-center platforms, the hardware and software complex does sometimes fail, and handling incoming calls becomes impossible. Whether the failure is caused by software or by hardware no longer matters, because the interruption in call handling is critical in itself.

Not every call-center platform supports full “hot” backup, and on many platforms that do, the budget for it is comparable to buying a second call-center of the same kind.

So how can call-center capacity be backed up when the platform architecture does not allow “hot” backup, or when the cost of a backup scheme has to be kept down? Let me note right away that this solution is aimed mainly at large call-centers, although in some cases it may also be useful with a small number of operators.

Recently we implemented a “hot” backup scheme for a real call-center whose platform did not provide one. The call-center has 75 operators and is connected to the public switched telephone network (PSTN) by four E1 streams (EDSS1) from a single telecom operator. The peak load on the lines in the busy hour is no more than 90 simultaneous calls in the system (conversations with operators plus calls waiting in the queue).

Our strategic task was to rule out any interruption in call handling. We started from the observation that, even in the busy hour, losing one E1 stream is unpleasant but not fatal. The task could not be solved within the existing call-center platform, so we widened the scope: we built two independently installed call-center servers, with the number later to grow to four. Ideally, the four E1 streams would be spread over four independently installed call-centers. If any one of them “falls”, the other platforms must keep handling calls, and the operator workplaces that lose contact with their call-center server must switch automatically to another server. The system as a whole then keeps handling calls, having lost at most one call-center server and one E1 stream.
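
The claim that losing one E1 stream is tolerable can be checked with simple arithmetic. The sketch below assumes the standard 30 voice channels per E1 stream and uses the figures quoted above; the function name is illustrative only.

```python
CHANNELS_PER_E1 = 30   # voice channels carried by one E1 stream
E1_STREAMS = 4         # streams from the telecom operator
PEAK_CALLS = 90        # busy-hour maximum of simultaneous calls


def remaining_channels(failed_streams):
    """Voice channels still available after some E1 streams fail."""
    return (E1_STREAMS - failed_streams) * CHANNELS_PER_E1


for failed in range(E1_STREAMS + 1):
    left = remaining_channels(failed)
    verdict = "covers" if left >= PEAK_CALLS else "falls short of"
    print(f"{failed} stream(s) down: {left} channels, {verdict} the busy-hour peak")
```

With one stream down the remaining three streams still carry the busy-hour peak, although with no headroom left.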

When any of the independently installed call-centers “falls”, the E1 stream that connects it to the telecom operator goes down as well. We therefore treat the events “call-center failure” and “loss of the E1 stream” as equivalent.

Work on the carrier side. The telecom operator's equipment can route calls to alternative destinations when the primary one is unavailable. That is, if a call is routed to a particular E1 stream and that direction is unavailable (“loss of communication on the E1 stream”), the telecom operator routes the same call to an alternative direction, namely the other E1 streams. We solved this together with the service provider by configuring “cross-routing” of calls between E1 streams on the operator's side. If the operator's switch detects a loss of communication on any E1 stream, calls are routed to the other E1 streams, and when communication is restored the original routing scheme resumes. We also agreed with the service provider on the priority order in which the E1 streams receive calls. With call routing and distribution solved, the next step was to organize call handling inside the help desk.
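
The cross-routing rule can be sketched in a few lines. This is only an illustration of the behaviour described above, not the carrier's actual switch configuration; the stream names and the `route_call` function are hypothetical.

```python
def route_call(preferred, priority, link_up):
    """Return the E1 stream that should carry a call aimed at `preferred`.

    `priority` is the agreed order of E1 streams, and `link_up` maps each
    stream name to True while communication on it is alive.
    """
    if link_up.get(preferred, False):
        return preferred              # normal routing scheme
    for stream in priority:           # cross-routing in priority order
        if stream != preferred and link_up.get(stream, False):
            return stream
    return None                       # every stream is down


priority = ["E1-1", "E1-2", "E1-3", "E1-4"]   # hypothetical names
link_up = {stream: True for stream in priority}
link_up["E1-2"] = False
print(route_call("E1-2", priority, link_up))
```

Because the preferred stream is checked first on every call, the original routing resumes by itself as soon as the link is restored.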

Cross-routing of calls between E1 streams completely eliminated the possibility that a call would be refused because one of the call-centers had fallen.

Work at the help desk site. Once incoming calls were guaranteed to reach the help desk site in every case (even if any of the E1 streams “fell”), several routing problems between the independent call-centers remained. We assumed that controlling how many operators are connected to each call-center would be administratively difficult, so the number of operators logged in to any call-center is not regulated: at any moment any call-center may have any number of operators, even zero. The load on operators must also be uniform: over the same time interval, every operator, whichever call-center they are connected to, should receive approximately the same number of calls. In addition, an operator's workplace must switch automatically to another call-center if the current one “falls”. Finally, call-handling statistics from all call-centers must be combined, while real-time statistics continue to be displayed in each call-center separately.

First, we interconnected the call-centers so that each one was linked to every other by VoIP channels capable of carrying 30 voice connections. This redundancy made it possible, when necessary, to hand over all 30 calls arriving on an E1 stream to any neighboring call-center. The next task was to implement call-routing logic between the call-centers that would balance the load across help desk operators and prevent a queue from building up at one call-center while operators were free at another.

This logic for exchanging calls between call-centers was built as follows. Whenever a call arrives at a call-center from the E1 side, that call-center calls a stored procedure in an external database and passes it the following parameters:
- number A (the caller's number, ANI)
- number B (the dialed number)
- the number of free operators (Fi), that is, operators logged in to the system with the status “ready to serve the call”
- the total number of operators (Ni) servicing calls, that is, operators logged in with any status (“ready”, “busy”, “post-call processing”) except “break”
- the number of callers in the queue (Qi)
- the estimated response time (Ti) if the call is distributed to this server

As its output parameter, the stored procedure returns the name of the server to which the call should be redirected. The stored procedure uses two constants: the time interval (P) during which information received from a server is considered relevant, and the error (E) within which the load on operators is considered equal.
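
To make the interface concrete, here is a minimal sketch of the data one call-center passes on each call, written in Python rather than as a real stored procedure. The class, the field names and `store_report` are hypothetical, and the value of P is an assumption, since the article does not state it; E = 0.10 is the value quoted later in the text.

```python
from dataclasses import dataclass

P_SECONDS = 10.0   # relevance interval P (assumed value, not from the article)
E = 0.10           # error E within which loads count as equal


@dataclass(frozen=True)
class ServerReport:
    server: str            # name of the reporting call-center server
    a_number: str          # number A (caller's number)
    b_number: str          # number B (dialed number)
    free_operators: int    # Fi
    total_operators: int   # Ni
    queue_length: int      # Qi
    expected_wait: float   # Ti, in seconds
    received_at: float     # time the report arrived, in seconds


def store_report(latest, report):
    """Keep only the most recent report from each server."""
    latest[report.server] = report
    return latest
```

Keeping one record per server is enough here, because the decision described next uses only instantaneous indicators.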

The decision about which server should receive the call is based on an instantaneous indicator of operator workload (Ri) on each server, valid only at the current moment. The indicator is defined as follows: a) if there is a queue, it is either the expected waiting time in the queue (Ti) or the ratio of the queue length (Qi) to the total number of operators in the system (Ni); b) if there is no queue, it is the reciprocal of the number of free operators (1 / Fi). The indicator for the no-queue case raises no doubts, but the choice of indicator for the queue case required further study. Experiments showed that, for balancing load between call-center platforms of different kinds, the ratio of queue length to the total number of operators (Qi / Ni) gives the best results. The reason is that different call-center platforms (in our case, two platforms from two manufacturers) calculate the expected waiting time with their own algorithms, which differ in accuracy and in how the values are sampled. When the load is balanced between two identical call-center platforms, using the expected response time is more justified.
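
The two cases of the indicator translate directly into code. The following is a sketch under stated assumptions: the function and parameter names are hypothetical, and the zero cases simply raise an error here because the formula itself is undefined for them.

```python
def workload(free, total, queued, expected_wait=None, identical_platforms=False):
    """Instantaneous workload indicator Ri for one call-center server."""
    if queued > 0:
        # Case a): there is a queue.
        if identical_platforms and expected_wait is not None:
            return expected_wait      # Ti, comparable only between identical platforms
        if total <= 0:
            raise ValueError("Qi / Ni is undefined when Ni is zero")
        return queued / total         # Qi / Ni
    # Case b): no queue.
    if free <= 0:
        raise ValueError("1 / Fi is undefined when Fi is zero")
    return 1 / free                   # 1 / Fi


print(workload(free=4, total=20, queued=0))   # no queue, four free operators
print(workload(free=0, total=20, queued=5))   # five callers waiting for 20 operators
```

A smaller value means a less loaded server in both cases.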

In this way an instantaneous indicator of operator workload was calculated for each call-center server. The calculation handles the exceptional cases in which the number of free operators or the total number of operators is zero. Workload indicators (Ri) that were received longer ago than the specified time interval (P) are discarded.
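
The article does not say how the zero cases are resolved, so the sketch below makes two explicit assumptions: a server with no operators at all (Ni = 0) is left out of the decision, and a server with no queue and no free operator (Fi = 0) gets an infinite indicator, the limit of 1 / Fi. The value of P and all names are hypothetical as well.

```python
import math

P_SECONDS = 10.0   # relevance interval P (assumed value)


def current_loads(reports, now, max_age=P_SECONDS):
    """Map server name -> Ri, using only fresh and usable reports.

    `reports` maps a server name to (received_at, free, total, queued).
    """
    loads = {}
    for server, (received_at, free, total, queued) in reports.items():
        if now - received_at > max_age:
            continue                      # older than P: no longer relevant
        if total == 0:
            continue                      # assumption: nobody can take calls
        if queued > 0:
            loads[server] = queued / total
        elif free == 0:
            loads[server] = math.inf      # assumption: limit of 1 / Fi
        else:
            loads[server] = 1 / free
    return loads
```

An empty result means that no server has reported within the interval P, and the caller has to decide what to do in that case.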

The final step is to choose the least loaded call-center server. If the difference in workload between servers does not exceed the specified error (in practice we used 10%, E = 0.10), cyclic distribution is used instead: the first call goes to the first server, the second to the second, and so on.
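
A possible shape of this final step is sketched below. Two points are assumptions rather than facts from the article: the 10% is read as a relative difference from the least loaded server, and with more than two servers only those within the error take part in the cyclic distribution. The class and method names are hypothetical.

```python
from itertools import count

E = 0.10   # error within which loads count as equal


class ServerSelector:
    """Pick the least loaded server, rotating among near-equal ones."""

    def __init__(self, error=E):
        self.error = error
        self._turn = count()   # running call counter for cyclic distribution

    def choose(self, loads):
        """`loads` maps server name -> non-negative workload indicator Ri."""
        if not loads:
            raise ValueError("no server has a current workload indicator")
        least = min(loads.values())
        # Servers whose load is within the relative error of the minimum.
        tied = sorted(
            name for name, r in loads.items()
            if r == least or (r - least) / r <= self.error
        )
        return tied[next(self._turn) % len(tied)]


selector = ServerSelector()
loads = {"cc-1": 0.10, "cc-2": 0.105, "cc-3": 0.50}
print([selector.choose(loads) for _ in range(4)])
```

In this example `cc-1` and `cc-2` differ by less than 10% and share the calls in turn, while `cc-3` is clearly busier and receives none.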

That is essentially the whole call distribution scheme. One addition: if the call cannot be handed to the server that the stored procedure named as the target, we hand it to the next server and note this in the database.
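
The fallback rule might look like the following sketch. It assumes that “the next server” means the next one in a fixed server order, wrapping around at the end; `deliver` and the `send` callback are hypothetical stand-ins for the real call transfer.

```python
def deliver(call_id, target, servers, send):
    """Hand a call to `target`, falling back to the following servers.

    `send(server, call_id)` returns True when the server accepted the call.
    Returns the server that took the call, or None if all of them refused.
    """
    start = servers.index(target)     # raises ValueError for an unknown target
    for step in range(len(servers)):
        candidate = servers[(start + step) % len(servers)]
        if send(candidate, call_id):
            return candidate
    return None


servers = ["cc-1", "cc-2", "cc-3", "cc-4"]   # hypothetical names
unreachable = {"cc-2"}
print(deliver("call-1", "cc-2", servers, lambda s, c: s not in unreachable))
```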

Organization of workplaces at the help desk site. All help desk workplaces were logically divided into 4 zones. The workplaces of each zone were connected to the call-center that served as the main one for that zone (see figure). Alternative (additional) call-center servers were also specified in the workplace settings; the operator's workplace connects to them if the connection to its call-center is lost and the call-center does not respond to requests from the workplace client.
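
The workplace settings can be pictured as an ordered server list per zone: the main call-center first, then the alternatives. The zone names, server names and the order of the alternatives below are hypothetical, and `connect` only illustrates the switching rule.

```python
# Main server first, then the alternative servers (order is an assumption).
ZONES = {
    "zone-1": ["cc-1", "cc-2", "cc-3", "cc-4"],
    "zone-2": ["cc-2", "cc-3", "cc-4", "cc-1"],
    "zone-3": ["cc-3", "cc-4", "cc-1", "cc-2"],
    "zone-4": ["cc-4", "cc-1", "cc-2", "cc-3"],
}


def connect(zone, responds):
    """Return the first server in the zone's list that answers the client.

    `responds(server)` is a stand-in for the workplace client's request.
    """
    for server in ZONES[zone]:
        if responds(server):
            return server
    return None   # no call-center server is reachable


# The main server of zone 2 has fallen, so its workplaces move on.
print(connect("zone-2", lambda server: server != "cc-2"))
```

Giving each zone a different first alternative spreads the operators of a failed call-center over the surviving ones only if the lists are staggered like this; with identical lists they would all land on the same server.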

Another issue was collecting statistics in one place. This part is quite simple: statistics from the different databases are gathered into one and brought to a common form. There is nothing more to describe here.

As a result, the help desk is organized so that the “fall” of any one server costs the call-center only 25% of its capacity. Incoming calls continue to be processed, and the operators switch automatically to another working call-center server.

Sakhabutdinov Ayrat

PS: the distributed system for two different call-centers from different manufacturers, including load balancing and statistics collection, is now built and working. In progress: a complete distributed system on 4 E1 streams based on a call-center solution from a single manufacturer.

Source: https://habr.com/ru/post/108726/
