
Hedgehogs on wheels: how we maintain communication quality in Moscow

In the spring and summer of 2017, Roskomnadzor tested mobile operators and published its communication quality results. MegaFon came out best in terms of successful voice connections and mobile Internet speed. The Internet tests were run with Cat.4 terminals (up to 150 Mbit/s), while in early August we became the first in Russia to launch Gigabit LTE (up to 1 Gbit/s). So with Cat.6 or faster devices, the gap with competitors could be even larger. In this post we describe how we achieve such results.



MegaFon's strong performance is the result of coordinated work by all departments of the technical division. Contributions come from engineers responsible for network construction, planning and optimization, operations systems, fault monitoring, and network quality control.

Network monitoring is the foundation of any well-run operations and quality control system. MegaFon's first network control and monitoring center (TsUM) was deployed in 2006 on Vyatskaya Street in Moscow.


In May 2015, following global trends, MegaFon consolidated monitoring into a single network management center (ECUS, also known as GNOC, Global Network Operational Center) located at two sites, in Samara and St. Petersburg. The advantage of two sites is that if problems arise at one NOC, the other can quickly take over its tasks.

The center's engineers monitor the network elements of all subsystems around the clock and, when necessary, initiate prompt troubleshooting by engaging the relevant specialized units. For example, if a base station (BS) fails, ECUS engineers notify a local engineering team, who travel to the BS and fix the problem. There is a lot to say about ECUS and its capabilities, but that is a subject for a separate post. Here we focus on the principles and approaches MegaFon uses to manage network quality.

"Robots are driving, not people", or the automatic Self-Optimization Network (SON) system


To monitor the countless elements and parameters of a modern mobile network, we use the SON (Self-Optimization Network) system. Operating 24/7, it automatically collects equipment parameters, network statistics, and traces from every network element. Based on this data, SON performs three main functions:

  1. It responds quickly to changes in the network, such as sudden load surges. For example, if the number of subscribers on a BS suddenly increases, SON can change cell parameters or antenna tilt angles to adjust the coverage area of that BS, thereby shifting part of the load to neighboring BSs.
  2. It maintains correct settings and relations between BSs. SON continuously checks network settings and optimizes the neighbor relations ("neighborhoods") between BSs. These relations are needed so that subscribers can perform a “handover”, that is, switch between cells of one BS or of neighboring BSs without interrupting the service. SON adds neighbor relations it considers potentially useful, checks how existing ones perform, and removes unused ones.
  3. It automates routine network optimization tasks. Mobile networks now have a very complex structure, with three technologies (2G/3G/4G) on one BS and several bands per technology (LTE-800/1800/2600, etc.), so a single BS can have about a thousand neighbor relations. No single company has enough engineers to handle this volume manually. SON takes this burden off engineers, who only need to supervise the system, freeing human resources for more complex and creative tasks such as introducing new technologies.
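The add/check/remove cycle for neighbor relations described in point 2 can be illustrated with a minimal sketch. It assumes that SON judges a relation "unused" by its handover attempt count and judges a candidate "useful" by how often phones report the target cell as strong. The thresholds, the `Relation` type and the `update_neighbors` function are hypothetical and are not MegaFon's or any vendor's actual logic.

```python
from dataclasses import dataclass

# Hypothetical thresholds: real SON products use their own, tuned values.
MIN_HANDOVERS_TO_KEEP = 5   # fewer handover attempts than this = "unused"
MIN_REPORTS_TO_ADD = 20     # phones must report a candidate this often


@dataclass(frozen=True)
class Relation:
    source: str  # serving cell ID
    target: str  # neighbor cell ID


def update_neighbors(current, ho_attempts, candidate_reports):
    """One optimization pass over a cell's neighbor relations.

    current: set of Relation objects already configured
    ho_attempts: Relation -> handover attempts in the observation period
    candidate_reports: Relation -> how often phones reported the target
        cell as strong while it was not yet a configured neighbor
    Returns (to_add, to_remove) as sets of Relation.
    """
    to_remove = {r for r in current
                 if ho_attempts.get(r, 0) < MIN_HANDOVERS_TO_KEEP}
    to_add = {r for r, n in candidate_reports.items()
              if r not in current and n >= MIN_REPORTS_TO_ADD}
    return to_add, to_remove


ab, ac, ad = Relation("A", "B"), Relation("A", "C"), Relation("A", "D")
to_add, to_remove = update_neighbors({ab, ac}, {ab: 120, ac: 1}, {ad: 35})
print(sorted(r.target for r in to_add), sorted(r.target for r in to_remove))
# ['D'] ['C']
```

A real system would also weigh handover success ratios and operator blacklists before acting, and it would repeat a pass like this for every cell.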

On average, the system performs 200,000 to 400,000 corrective operations per day. By our estimates, this reduces the workload on engineers by 40-60%.


The SON system has a modular structure in which each module is responsible for performing a specific set of functions. Examples of basic modules:


For SON mechanisms to work efficiently, their logic must be regularly adapted to changing network conditions. At MegaFon, a dedicated team of engineers supports this area. Their task is to continuously extend SON functionality by developing and commissioning new modules and automatic algorithms for managing the software parameters of the radio subsystem.
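A modular structure like this, where new modules can be commissioned without touching existing ones, could look roughly like the following sketch. It assumes a hypothetical plug-in interface (`SonModule.propose`) and two illustrative modules. The names, the snapshot format, and the 80% load limit are invented for illustration. The sketch only proposes changes and does not apply them.

```python
from abc import ABC, abstractmethod


class SonModule(ABC):
    """One pluggable SON function (hypothetical interface)."""
    name = "base"

    @abstractmethod
    def propose(self, snapshot: dict) -> list:
        """Inspect a network snapshot and return proposed changes."""


class LoadBalancing(SonModule):
    name = "load_balancing"

    def __init__(self, max_load=0.8):  # assumed load limit
        self.max_load = max_load

    def propose(self, snapshot):
        return [f"{cell}: reduce coverage (load {load:.0%})"
                for cell, load in snapshot.get("load", {}).items()
                if load > self.max_load]


class NeighborCleanup(SonModule):
    name = "neighbor_cleanup"

    def propose(self, snapshot):
        return [f"remove neighbor {src}->{dst}"
                for (src, dst), hos in snapshot.get("handovers", {}).items()
                if hos == 0]


class SonEngine:
    def __init__(self):
        self.modules = []

    def register(self, module):
        self.modules.append(module)

    def run_cycle(self, snapshot):
        # Every module sees the same snapshot; results are keyed by module.
        return {m.name: m.propose(snapshot) for m in self.modules}


engine = SonEngine()
engine.register(LoadBalancing())
engine.register(NeighborCleanup())
result = engine.run_cycle({
    "load": {"BS1-1": 0.93, "BS1-2": 0.40},
    "handovers": {("BS1-1", "BS7-3"): 0},
})
print(result)
```

Keeping each function behind one small interface is what allows a team to add, test, or disable modules independently.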



Technical Control Department


SON's capabilities are not unlimited, and difficult situations require human intervention. For this, MegaFon has a technical control department (QC), which steps in when ECUS cannot determine the cause of quality degradation on the equipment. All accumulated statistics on network operation flow into this department. Using a wide range of proprietary network KPIs and accumulated experience, we identify patterns of deterioration at various nodes, find their causes, and fix the problems.
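The text does not describe how deterioration patterns are detected. As a simple illustration only, a node's latest KPI value can be compared against its own historical baseline. The sketch below flags nodes whose "lower is better" KPI (for example, call drop rate in %) is more than `k` standard deviations above their mean. The rule, the value of `k`, and the node names are assumptions, not QC's actual method.

```python
from statistics import mean, stdev


def degraded_nodes(history, current, k=3.0):
    """history: node -> past values of a 'lower is better' KPI;
    current: node -> latest value.
    Returns node -> increase over baseline for nodes above mean + k*sigma."""
    flagged = {}
    for node, past in history.items():
        if len(past) < 2 or node not in current:
            continue  # not enough data to build a baseline
        mu, sigma = mean(past), stdev(past)
        threshold = mu + k * max(sigma, 1e-9)  # guard against sigma == 0
        if current[node] > threshold:
            flagged[node] = round(current[node] - mu, 3)
    return flagged


history = {
    "RNC-1": [0.50, 0.52, 0.48, 0.51, 0.49],  # drop rate, %
    "RNC-2": [0.70, 0.75, 0.72, 0.68, 0.74],
}
current = {"RNC-1": 0.90, "RNC-2": 0.73}
flagged = degraded_nodes(history, current)
print(flagged)  # RNC-1 is flagged, RNC-2 stays within its usual range
```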

A number of tools help us in this:

  1. Dashboards on various subsystems with customized KPIs.
  2. Subscriber traces. They allow detailed analysis of the signaling exchange between a phone and the network to identify the cause of a subscriber's problem.
  3. Measurements from in-house and crowd-sourced systems.
  4. Benchmarking against other operators: SpeedTest services for data transfer speed and Vigo for video quality.
  5. The SQM system, which takes the newest approach to E2E analysis: it relies not only on network metrics but also looks at each specific service as a whole. Suppose the overall network quality is fine, but the Facebook app specifically does not work. In this case, the system lets you see in detail at which stage the problems arise.
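Per-service, stage-by-stage analysis of the kind described in point 5 can be sketched as follows. The sketch assumes that each app session is broken into ordered stages and stops at its first failure. The stage names (`radio`, `dns`, `tcp`, `tls`, `http`) and the record format are hypothetical and are not taken from MegaFon's SQM product.

```python
from collections import Counter

# Hypothetical ordered stages of one app session.
STAGES = ["radio", "dns", "tcp", "tls", "http"]


def failing_stage(sessions):
    """sessions: list of dicts stage -> bool (True = success); a session
    stops at its first failed stage. Returns (failure ratio per stage among
    sessions that reached it, worst stage)."""
    reached, failed = Counter(), Counter()
    for s in sessions:
        for stage in STAGES:
            if stage not in s:
                break
            reached[stage] += 1
            if not s[stage]:
                failed[stage] += 1
                break
    ratios = {st: failed[st] / reached[st] for st in STAGES if reached[st]}
    worst = max(ratios, key=ratios.get) if ratios else None
    return ratios, worst


ok = {st: True for st in STAGES}
tls_fail = {"radio": True, "dns": True, "tcp": True, "tls": False}
ratios, worst = failing_stage([ok] * 7 + [tls_fail] * 3)
print(worst, ratios[worst])  # tls 0.3
```

Counting failures only among sessions that actually reached a stage keeps early failures from hiding problems further down the chain.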

We will not dwell on dashboards and subscriber traces, since almost every cellular operator has them in one form or another; the differences lie only in the nuances of problem analysis and testing. The remaining tools deserve a closer look.

Crowd-source data: My Network application


The My Network app, developed by Metricell and customized specifically for MegaFon, collects information about communication quality from subscribers and passes it to MegaFon's engineers for analysis. The app is available to everyone on Play Market. My Network can operate in two modes: passive and active.



In passive mode, basic information about network coverage (signal level, connection quality, etc.) and failures is collected automatically and sent to a MegaFon server. There it is accumulated and later used for many purposes, from network planning to marketing campaigns. In active mode, the user checks communication quality themselves and, if there are problems, reports them through a simple form.



“My Network” helps identify network problems inside complex urban buildings and in rural areas that neither engineers with portable measuring kits nor vehicle-mounted radio measurement systems have reached.
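One plausible way to turn passive-mode reports into a list of weak-coverage spots is to bin them into a geographic grid and flag squares with a low median signal level. This is a minimal sketch. The grid step, the -110 dBm threshold, the minimum sample count, and the `(lat, lon, rsrp_dbm)` report format are all assumptions, not the app's real data model.

```python
import math
from collections import defaultdict
from statistics import median

CELL_DEG = 0.005       # grid step in degrees (assumed)
WEAK_RSRP_DBM = -110   # hypothetical "weak coverage" threshold
MIN_SAMPLES = 3        # ignore squares with too few reports


def weak_squares(samples):
    """samples: iterable of (lat, lon, rsrp_dbm) from app reports.
    Returns grid square -> median RSRP for squares with weak coverage."""
    grid = defaultdict(list)
    for lat, lon, rsrp in samples:
        key = (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))
        grid[key].append(rsrp)
    return {key: median(vals) for key, vals in grid.items()
            if len(vals) >= MIN_SAMPLES and median(vals) < WEAK_RSRP_DBM}


samples = [
    (55.7512, 37.6181, -115), (55.7513, 37.6181, -118),
    (55.7514, 37.6181, -112),                    # weak spot
    (55.7601, 37.6301, -85), (55.7601, 37.6302, -90),
    (55.7602, 37.6301, -88),                     # good coverage
]
weak = weak_squares(samples)
print(weak)
```

Using the median rather than the mean keeps a single outlier report from marking a whole square as bad.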

Field radio measurements


One popular song has the line: “Because according to statistics, for every ten girls there are nine guys.” It gives the numerical ratio but says nothing about who these girls and guys are, how old they are, or where they live. Monitoring works the same way: the “big data” of network statistics offers plenty of food for thought, but behind the global trends it is easy to miss individual problems. Most quality control tools rely on network statistics, which are essentially a vast array of information about the state of the network and the operation of services. Like any Big Data, network statistics show trends well but do not always reveal individual problem cases. Moreover, if an area has no mobile coverage at all, the network statistics will contain no information about that problem either.
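A small worked example shows how an aggregate can hide a local problem. The numbers are invented: 99 cells with a 0.5% drop rate and one cell with an 8% drop rate, all carrying the same traffic. The network-wide figure stays below 0.6%, even though one cell is sixteen times worse than the rest.

```python
# Hypothetical data: cell -> (calls, dropped calls).
cells = {f"cell-{i}": (10_000, 50) for i in range(99)}  # 0.5 % drops each
cells["cell-99"] = (10_000, 800)                        # 8 % drops

total_calls = sum(calls for calls, _ in cells.values())
total_drops = sum(drops for _, drops in cells.values())
network_rate = total_drops / total_calls  # aggregate looks healthy

# Per-cell view: the cell with the highest drop rate.
worst = max(cells, key=lambda c: cells[c][1] / cells[c][0])
print(f"network drop rate {network_rate:.2%}, worst cell {worst}")
```

An area with no coverage at all would not appear in `cells` at all, which is the blind spot that field measurements are meant to cover.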

In such cases, to identify and eliminate problems and to measure how the network actually performs, we use field radio measurements: local tests with an engineer on site, and automated control measurements using vehicles fitted with measuring equipment.

Local measurements are carried out at specific places, often in response to a subscriber complaint or where automated systems cannot be used because a car cannot drive there, for example in enclosed courtyards, shopping centers, the subway, etc. They are also used to test new services and technologies live.



The wearable measuring kit fits into an ordinary backpack. It consists of a scanning receiver, several measuring smartphones mounted on a special chassis, and a control device. With the scanning receiver, the engineer evaluates cellular coverage and observes the signals on all operating frequencies at a specific location. For example, the engineer can detect a weak signal from a distant base station that creates interference, or estimate the coverage of all four operators in the area.



Special firmware on the measuring smartphones allows their functions to be controlled at the deepest level, down to connecting only to a specific band or base station. Up to 8 smartphones can be installed in the maximum configuration, but usually 4-5 devices are used, matching the number of basic tests. Using the smartphones, the engineer assesses the quality of voice calls (including MOS, the Mean Opinion Score, which put simply reflects speech intelligibility), data transfer speed, video viewing quality, instant messengers, and other parameters of standard services.
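The article does not say how the measuring kit computes MOS, and field tools typically use their own listening-quality algorithms. To show what the 1-to-4.5 MOS scale means numerically, here is one well-known, simple mapping: the ITU-T G.107 E-model conversion from the transmission rating factor R to an estimated MOS. This is an illustration of the scale only, not the kit's method.

```python
def r_to_mos(r: float) -> float:
    """ITU-T G.107 E-model mapping from rating factor R to estimated MOS."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


for r in (50, 70, 93.2):
    print(r, round(r_to_mos(r), 2))
```

With the E-model's default R of 93.2, this gives an estimated MOS of about 4.41. Values around 3.6 and below usually indicate that many users would be dissatisfied.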



The smartphones are fully controlled via Bluetooth, so there is no need to open the backpack every time. A smartphone, tablet, or laptop can serve as the controller. Through it, the engineer starts a test on one or all smartphones simultaneously, either once or in cyclic mode.
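The controller's "run on one or all phones, once or in cycles" behavior maps naturally onto concurrent tasks. In this minimal sketch, `run_test` is a hypothetical placeholder that only simulates a short delay. It does not implement any real Bluetooth or measurement-kit command protocol.

```python
import asyncio


async def run_test(phone: str, test: str) -> str:
    # Placeholder for sending a command to a phone over Bluetooth and
    # waiting for its result; here we only simulate a short delay.
    await asyncio.sleep(0.01)
    return f"{phone}:{test}:done"


async def run_campaign(phones, test, cycles=1):
    """Run `test` on all given phones at the same time, `cycles` times."""
    results = []
    for _ in range(cycles):
        batch = await asyncio.gather(*(run_test(p, test) for p in phones))
        results.extend(batch)  # gather keeps the order of `phones`
    return results


results = asyncio.run(
    run_campaign(["phone-1", "phone-2", "phone-3"], "voice_call", cycles=2))
print(results)
```

Running a single phone is the same call with a one-element list. A true "cyclic mode" would repeat until the engineer stops it, rather than for a fixed `cycles` count.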



Logs of all tests are recorded in the smartphones' memory for later analysis in the office. If necessary, tests can also be run on the control device itself. During radio measurements, the engineer can assess an identified problem on the spot and try to resolve it in real time together with specialists in the office.

For large-scale control measurements, instead of engineers with backpacks we use special automated radio measurement systems installed in cars.



The driver of such a car does not operate the measuring systems and is responsible only for following a given route. However, our current drivers are so experienced that, when necessary, they can perform simple operations themselves, such as checking the system status or rebooting it.


The "hedgehog" on the car roof is due to the large number of frequency bands used in MegaFon's cellular network.

The system starts up as soon as the ignition key is turned. It is a rack with 3-4 units, each containing 4 devices that emulate smartphones. As with manual testing, the number of devices is determined by the number of tests performed. The units have connectors for external antennas.



The system is managed entirely remotely: an engineer can set its test sequence or change its configuration directly from the office. In terms of capabilities, it differs little from the backpack version, but it is more reliable, which allows round-the-clock testing and the collection of huge amounts of data. It can also measure network coverage, make voice calls, and test mobile data services. All measurement results are accumulated in logs and transmitted to the office in real time over cellular channels.



Using control radio measurements, we regularly evaluate the networks of other mobile operators and compare them with ours. We also collect and analyze specific settings used on their equipment. Such benchmarks let us assess our quality relative to other operators and learn useful things from competitors.

Service Quality Management (SQM) for network quality control


The latest SQM system deployed at MegaFon contains several modules:





No mass event in Moscow and the Moscow region goes unnoticed by MegaFon.
For each event, a location (a grouping of the cells serving the event) is created in the system to match the event's territory.



For each location, the system provides engineers every five minutes with the network quality indicators at the event that affect customer experience. If the indicators change sharply, we immediately localize the problematic network elements and engage specialized units to fix them promptly. This approach was used extensively during the Confederations Cup.
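A location-based check of this kind can be sketched as follows. This sketch assumes a location is simply a set of cell IDs and that each 5-minute interval gives per-cell attempts and successes. It then flags a sharp drop in the location's success rate and lists the worst cells. The cell names, the 5-percentage-point drop threshold, and all numbers are hypothetical.

```python
# Hypothetical location: the set of cells serving an event area.
STADIUM = {"C101", "C102", "C103"}


def location_success_rate(cell_stats, location):
    """cell_stats: cell -> (attempts, successes) for one 5-minute interval."""
    att = sum(cell_stats[c][0] for c in location if c in cell_stats)
    ok = sum(cell_stats[c][1] for c in location if c in cell_stats)
    return ok / att if att else None


def sharp_drop(prev_rate, cur_rate, max_drop=0.05):
    """True if the success rate fell by more than max_drop (absolute)."""
    return (prev_rate is not None and cur_rate is not None
            and prev_rate - cur_rate > max_drop)


def worst_cells(cell_stats, location, n=2):
    """Cells of the location sorted by success rate, worst first."""
    rates = {c: s / a for c, (a, s) in cell_stats.items()
             if c in location and a}
    return sorted(rates, key=rates.get)[:n]


prev = {"C101": (1000, 990), "C102": (800, 792), "C103": (900, 891)}
cur = {"C101": (1000, 985), "C102": (800, 560), "C103": (900, 890)}
p, c = location_success_rate(prev, STADIUM), location_success_rate(cur, STADIUM)
if sharp_drop(p, c):
    print("alarm:", worst_cells(cur, STADIUM, n=1))  # alarm: ['C102']
```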



At mass events, MegaFon's QC engineers use non-standard solutions. For example, employees watched a concert by the band Leningrad through public live broadcasts on Periscope and Instagram. This makes it possible to assess, in real time, the quality of the end service at specific points in the event area and to act promptly if it deteriorates.




Moscow and the Moscow region are divided into 470 clusters that are monitored by KQIs (Key Quality Indicators). Automatic alarm generation is configured on the equipment for sharp quality deterioration even when no faults are visible, so that nothing is missed. When a problem occurs, the system produces within 5 minutes a list of cells with degraded KPIs and the main ClearCode (cause of the problem).
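Producing "cells with degraded KPIs plus the main ClearCode" amounts to grouping failures by cell and taking the most frequent cause. In this minimal sketch, the 2% failure threshold and the cause labels (`RLF`, `CORE_REJECT`) are hypothetical placeholders, not the actual ClearCode values used by MegaFon's system.

```python
from collections import Counter, defaultdict


def degraded_with_cause(failures, attempts, max_fail_rate=0.02):
    """failures: list of (cell, clear_code) for failed sessions in the window;
    attempts: cell -> total attempts in the same window.
    Returns cell -> (failure rate, most frequent clear code)."""
    by_cell = defaultdict(Counter)
    for cell, code in failures:
        by_cell[cell][code] += 1
    report = {}
    for cell, codes in by_cell.items():
        rate = sum(codes.values()) / attempts[cell]
        if rate > max_fail_rate:
            top_code, _ = codes.most_common(1)[0]
            report[cell] = (round(rate, 4), top_code)
    return report


failures = [("C7", "RLF")] * 30 + [("C7", "CORE_REJECT")] * 5 + [("C9", "RLF")] * 3
attempts = {"C7": 1000, "C9": 1000}
report = degraded_with_cause(failures, attempts)
print(report)  # {'C7': (0.035, 'RLF')}
```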









It can also assess how well they perform.



MegaFon actively uses these metrics because customers judge network quality not only by objective indicators but also by how their terminal (smartphone) performs. Using this data, we identify problem devices and work extensively with their manufacturers, helping them develop new firmware for phones and equipment. This information also shows how subscribers use the capabilities of the network and their devices, and we take it into account when shaping new offers. For example, a large proportion of subscribers could use VoLTE but do not, because their terminal has not been updated to the required firmware version.
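Finding subscribers who "could use VoLTE but do not" because of outdated firmware boils down to a version comparison per device model. This sketch uses a hypothetical table of minimum VoLTE firmware per model and invented subscriber records. It pads version tuples so that "2.1" and "2.1.0" compare as equal.

```python
# Hypothetical table: device model -> minimum firmware that enables VoLTE.
VOLTE_MIN_FW = {"PhoneX": "2.1.0", "PhoneY": "5.0"}


def parse_version(v: str) -> tuple:
    return tuple(int(part) for part in v.split("."))


def version_lt(a: str, b: str) -> bool:
    """True if version a is older than b; shorter versions are zero-padded."""
    x, y = parse_version(a), parse_version(b)
    n = max(len(x), len(y))
    return x + (0,) * (n - len(x)) < y + (0,) * (n - len(y))


def needs_update(subscribers):
    """subscribers: list of (subscriber_id, model, firmware). Returns IDs whose
    model supports VoLTE but whose firmware is older than required."""
    return [sid for sid, model, fw in subscribers
            if model in VOLTE_MIN_FW and version_lt(fw, VOLTE_MIN_FW[model])]


subs = [("sub-1", "PhoneX", "2.0.5"), ("sub-2", "PhoneX", "2.1"),
        ("sub-3", "PhoneY", "4.9.9"), ("sub-4", "PhoneZ", "1.0")]
print(needs_update(subs))  # ['sub-1', 'sub-3']
```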



Service Quality Management can also collect statistics on roaming in foreign cellular networks and rank those networks by quality indicators, issuing recommendations for manual network selection. Engineers also see data on the terminals of roaming subscribers registered in the MegaFon network and track their key operating parameters.
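The source does not say how the roaming rating is calculated. A weighted score over a few success ratios is one simple possibility, sketched below. The metric names, weights, and partner names are all hypothetical.

```python
# Hypothetical weights; each metric is a ratio in [0, 1], higher is better.
WEIGHTS = {"attach_success": 0.4, "call_success": 0.3, "data_success": 0.3}


def score(metrics):
    return sum(w * metrics[m] for m, w in WEIGHTS.items())


def rank_partners(stats):
    """stats: partner network -> dict of metric -> value.
    Returns partner names sorted from best to worst weighted score."""
    return sorted(stats, key=lambda p: score(stats[p]), reverse=True)


stats = {
    "Partner-2": {"attach_success": 0.97, "call_success": 0.99, "data_success": 0.90},
    "Partner-1": {"attach_success": 0.99, "call_success": 0.97, "data_success": 0.95},
    "Partner-3": {"attach_success": 0.90, "call_success": 0.85, "data_success": 0.80},
}
print(rank_partners(stats))  # ['Partner-1', 'Partner-2', 'Partner-3']
```

The top of such a list would be the natural candidate for a "select manually" recommendation in a given country.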

One example of MegaFon's integrated approach to network quality is the Sapsan high-speed train, which runs daily between the two capitals, Moscow and St. Petersburg. We provide LTE coverage along the route and at the same time study the user experience in each car at any moment. For this purpose, we installed Metricell Automobile measuring devices in every Sapsan car. Every second they collect statistics on services important to subscribers and send the data to a server for further analysis.



This lets us find problems in the operation of Sapsan equipment that cannot be localized from any other statistics or fault data.

Tools for quality control of popular services


According to forecasts, video will account for up to 75% of all mobile traffic within the next five years. So it is important for us to assess how well our network supports video viewing. MegaFon uses the Vigo toolkit for this.

Vigo is a Russian SDK embedded in many popular video services (Vkontakte, Ivi.ru, Megogo, STS, Rain, etc.) that collects basic quality metrics such as delays, playback interruptions, buffering time, speed, video resolution, etc. Vigo also lets us compare ourselves with other operators on the quality of this service.
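To make metrics such as buffering time concrete, here is how startup delay and rebuffering ratio could be derived from a player's event log. The event format is hypothetical and is not Vigo's API or data model. The ratio here is defined as stall time divided by the time from first playback to session end. Other definitions exist.

```python
def video_qoe(events):
    """events: time-ordered (t_seconds, kind) tuples with kind in
    {"request", "play", "stall", "resume", "end"} (hypothetical format).
    Returns (startup_delay_s, rebuffer_ratio)."""
    t_request = t_play = t_end = stall_start = None
    stalled = 0.0
    for t, kind in events:
        if kind == "request":
            t_request = t
        elif kind == "play" and t_play is None:
            t_play = t  # first frame shown
        elif kind == "stall":
            stall_start = t
        elif kind == "resume" and stall_start is not None:
            stalled += t - stall_start
            stall_start = None
        elif kind == "end":
            t_end = t
            if stall_start is not None:  # session ended while stalled
                stalled += t - stall_start
                stall_start = None
    if t_request is None or t_play is None or t_end is None or t_end <= t_play:
        raise ValueError("session lacks request/play/end events")
    return t_play - t_request, stalled / (t_end - t_play)


events = [(0, "request"), (1.5, "play"), (20, "stall"), (23, "resume"),
          (61.5, "end")]
startup, ratio = video_qoe(events)
print(startup, ratio)  # 1.5 0.05
```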

The map below shows examples of Vigo data. In green areas MegaFon has the best performance; in yellow areas it is on par with other operators.
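A map coloring like this needs a rule for when an operator is "best" and when it is "at parity". One hypothetical rule compares our metric with the best competitor using a relative tolerance. The 5% tolerance and the "behind" category are assumptions. The source describes only the green and yellow categories.

```python
def classify_area(ours, competitors, tolerance=0.05):
    """ours/competitors: a 'higher is better' metric for one map area,
    e.g. share of views without stalls. Returns 'best', 'parity' or 'behind'."""
    best_rival = max(competitors)
    if ours > best_rival * (1 + tolerance):
        return "best"    # would be green on the map
    if ours >= best_rival * (1 - tolerance):
        return "parity"  # would be yellow on the map
    return "behind"


print(classify_area(0.95, [0.85, 0.80]))  # best
print(classify_area(0.90, [0.92, 0.70]))  # parity
print(classify_area(0.70, [0.90]))        # behind
```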





This is how MegaFon assesses network quality with numerous online and offline tools. If you have any questions about our story, leave a comment and we will answer.

Source: https://habr.com/ru/post/342786/

