Video calls are the main channel of communication between teacher and student on the Vimbox platform. We abandoned Skype long ago, tried several third-party solutions, and eventually settled on a combination of WebRTC and the Janus gateway. For a while we were happy with everything, but problems kept surfacing. As a result, a separate video team was created.
I asked Kirill Rogoviy, the head of the new team, to describe the evolution of video calling at Skyeng, the problems we found, and the solutions and workarounds we eventually applied. We hope the article will be useful to companies that are also building their own video calling into a web application.
In the summer of 2017, Skyeng's head of development, Sergey Safonov, gave a talk at Backend Conf about how we "abandoned Skype and implemented WebRTC". Those interested can watch the recording of the talk (about 45 minutes); here I will briefly outline its essence.
For the Skyeng school, video calling has always been the primary channel of teacher-student communication. Initially we used Skype, but it did not suit us for a number of reasons, primarily the lack of logs and the inability to integrate it directly into the web application. So we ran all sorts of experiments.
Our requirements for video calling were roughly these:
- stability;
- low price per lesson;
- recording lessons;
- tracking how much each participant talks (it is important to us that students speak more in class than the teacher);
- linear scaling;
- the ability to use both UDP and TCP.
First, in 2013, we tried Tokbox. Everything worked well, but it turned out to be very expensive, 113 rubles per lesson, and ate into our profit.
Then, in 2015, we integrated Voximplant. It had the talk-time tracking we needed, and it was much cheaper: with audio-only recording it came to 20 rubles per lesson. However, it worked only over UDP and could not switch to TCP. Even so, about 40% of students ended up using it.
A year later we started getting corporate clients with their own specific requirements. For example, everything has to work in the browser, and the company only has HTTP and HTTPS open; that is, no Skype and no UDP. Corporate clients mean money, so we went back to Tokbox, but the price problem did not go away.
We decided to use WebRTC, the browser platform for peer-to-peer video calling. It is responsible for establishing the connection, encoding and decoding streams, synchronizing tracks, and controlling quality while handling network glitches. On our side, we have to read the streams from the camera and microphone, render the video, manage the connection, set up the WebRTC connection and pass the streams to it, and relay signaling messages between clients to establish the connection (WebRTC itself specifies only the data format, not the transport mechanism). If clients are behind NAT, WebRTC uses STUN servers, and if that does not help, TURN servers.
Plain p2p connections are not enough for us, because we want to record lessons for later review in case of complaints. So we route WebRTC streams through the Janus Gateway relay from Meetecho. As a result, clients do not know each other's addresses and see only the address of the Janus server; it also acts as the signaling server. Janus has many of the features we need: it automatically falls back to TCP if the client has UDP blocked; it can record both UDP and TCP streams; it scales; there is even a built-in plugin for echo tests. When necessary, STUN and TURN servers from Twilio are connected automatically.
In the summer of 2017 we had two Janus servers plus an additional server for processing the recorded raw audio and video files, so as not to load the processors of the main servers. On connection, the Janus server was chosen by the parity of the connection number (even or odd). At the time this was enough: by our estimate it gave roughly a fourfold safety margin, and the rollout stood at about 80%. Meanwhile, the price dropped to about 2 rubles per lesson, plus development and support.
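The even-odd choice described above can be sketched in a few lines. This is only an illustration: the hostnames and the function name are hypothetical, and the article does not say which server received even numbers.

```python
from typing import Sequence

# Hypothetical hostnames; the article only says there were two Janus servers.
JANUS_SERVERS = ("janus-1.example.com", "janus-2.example.com")


def pick_server_by_parity(connection_number: int,
                          servers: Sequence[str] = JANUS_SERVERS) -> str:
    """Return the server for a connection based on the number's remainder.

    With two servers this is exactly an even/odd split: even connection
    numbers go to the first server, odd ones to the second (an assumed order).
    """
    return servers[connection_number % len(servers)]


print(pick_server_by_parity(10))
print(pick_server_by_parity(11))
```

Such a scheme balances the number of connections but knows nothing about load or geography.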
We constantly monitor feedback from students and teachers in order to spot and fix problems in time. By the summer of 2018, call quality had firmly taken first place among complaints. On the one hand, this meant we had successfully dealt with other shortcomings. On the other hand, something had to be done urgently: if a lesson is disrupted, we risk losing its value, sometimes along with the purchase of the next package, and if an introductory lesson is disrupted, we lose a potential client altogether.
At that time, video calling was still in MVP mode. Simply put: we launched it, it worked, we scaled it once and understood how to do that, and that was good enough. If it works, don't fix it. Nobody was deliberately working on call quality. By August it became clear that this could not go on, and we set up a separate team to find out what was wrong with WebRTC and Janus.
The starting point for the new team was: an MVP solution, no metrics, no goals, no improvement process, and 7% of teachers complaining about call quality (there was no data on students at all).
The team looks like this:
To begin with, we set up a reasonably reliable metric that tracked changes in call quality ratings (averaged by day, week, and month). At first these were ratings from teachers; ratings from students were added later. Then we started forming hypotheses about what was not working, fixing things, and watching how the metric changed. We picked the low-hanging fruit first: for example, we replaced the VP8 codec with VP9, and the numbers improved. We also tried tuning Janus settings and ran other experiments; in most cases they led nowhere.
At the second stage, a hypothesis appeared: WebRTC is a peer-to-peer solution, yet we use a server in the middle. Could the problem lie there? We started digging, and this is where we have found the most significant improvement so far.
At that time, a server was picked from the pool by a rather naive algorithm: each server had a "weight" that depended on its channel and capacity, and we tried to send the user to the server with the greatest weight, ignoring where the user was located geographically. As a result, a teacher from St. Petersburg could talk to a student from Siberia through Moscow rather than through our Janus server in St. Petersburg.
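As an illustration of a metric averaged by day, week, and month, here is a minimal sketch. The rating scale (1 to 5), the sample data, and all function names are assumptions; the article does not describe how the ratings were stored or computed.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def average_by(ratings, key):
    """Group (day, score) pairs by key(day) and average each group."""
    buckets = defaultdict(list)
    for day, score in ratings:
        buckets[key(day)].append(score)
    return {k: mean(v) for k, v in sorted(buckets.items())}


def by_day(d: date) -> str:
    return d.isoformat()


def by_week(d: date) -> str:
    iso = d.isocalendar()  # (ISO year, ISO week, ISO weekday)
    return f"{iso[0]}-W{iso[1]:02d}"


def by_month(d: date) -> str:
    return f"{d.year}-{d.month:02d}"


# Hypothetical ratings on an assumed 1..5 scale.
ratings = [
    (date(2018, 9, 3), 4),
    (date(2018, 9, 3), 5),
    (date(2018, 9, 5), 3),
    (date(2018, 9, 10), 5),
]

print(average_by(ratings, by_day))
print(average_by(ratings, by_week))
print(average_by(ratings, by_month))
```

Comparing the same average at several granularities helps separate a real trend from day-to-day noise.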
We changed the algorithm: now, when a user opens our platform, we use Ajax to collect pings from the user to all servers. When establishing a connection, we choose the server for which the pair of pings (teacher to server and student to server) has the smallest sum. A lower ping means a shorter network distance to the server; a shorter distance means a lower probability of packet loss; and packet loss is the biggest negative factor in video calling. The share of negative feedback halved over three months (in fairness, other experiments were running at the same time, but this one almost certainly had the biggest effect).
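The selection rule can be sketched as follows. The server names and ping values are hypothetical, and the handling of ties and of servers measured by only one participant is an assumption, since the article does not cover those cases.

```python
from typing import Mapping


def pick_server(teacher_pings: Mapping[str, float],
                student_pings: Mapping[str, float]) -> str:
    """Return the server with the smallest teacher + student ping sum.

    Only servers measured by both participants are considered (assumption).
    Ties are broken by server name so the result is deterministic.
    """
    common = teacher_pings.keys() & student_pings.keys()
    if not common:
        raise ValueError("no server was measured by both participants")
    return min(common, key=lambda s: (teacher_pings[s] + student_pings[s], s))


# Hypothetical pings in milliseconds: a teacher in St. Petersburg
# and a student in Siberia.
teacher = {"spb": 5.0, "msk": 15.0}
student = {"spb": 60.0, "msk": 55.0}

print(pick_server(teacher, student))  # spb: 65 ms total versus 70 ms via msk
```

Minimizing the sum optimizes the whole teacher-to-student path rather than either leg alone, which is why a server close to just one participant can still win.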
Recently we discovered another non-obvious but apparently important thing: instead of one powerful Janus server on a fat channel, it is better to have two simpler ones with lower bandwidth. We found this out after buying powerful machines in the hope of cramming as many rooms (communication sessions) onto them as possible. Servers have a bandwidth limit, which we can translate accurately into a number of rooms: we know how many can be opened at, say, 300 Mbps. As soon as there are too many rooms on a server, we stop choosing it for new lessons until the load drops. The idea was that, having bought a powerful machine, we would load its channel to the maximum, so that we would eventually hit the limits of the processor and memory rather than of the bandwidth. But it turned out that beyond a certain number of open rooms (420), even though processor, memory, and disk utilization were still far from their limits, complaints started arriving at tech support. Apparently something degrades inside Janus; perhaps it has limits of its own. We started experimenting and lowered the bandwidth limit from 300 to 200 Mbps, and the problems went away. We have now bought three new servers with deliberately low limits and specifications, and we expect this to bring a stable improvement in call quality. Of course, we never figured out what was actually going on there; workarounds are our everything. In our defense, at that moment we needed to solve an urgent problem as quickly as possible, not to make things elegant; besides, Janus is a black box written in C for us, and digging into it is very expensive.
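The rule of translating a bandwidth limit into a room cap and skipping full servers can be sketched like this. The per-room bandwidth figure, the class, and the function names are hypothetical; the article gives only the 300 and 200 Mbps limits, not the per-room cost.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JanusServer:
    name: str
    bandwidth_limit_mbps: float
    open_rooms: int = 0


def max_rooms(server: JanusServer, mbps_per_room: float) -> int:
    """Translate the bandwidth limit into a whole number of rooms."""
    return int(server.bandwidth_limit_mbps // mbps_per_room)


def accepts_new_rooms(server: JanusServer, mbps_per_room: float) -> bool:
    """A server is eligible only while it is below its room cap."""
    return server.open_rooms < max_rooms(server, mbps_per_room)


def eligible_servers(servers: List[JanusServer],
                     mbps_per_room: float) -> List[JanusServer]:
    return [s for s in servers if accepts_new_rooms(s, mbps_per_room)]


# Assumed average bandwidth per room; not a figure from the article.
MBPS_PER_ROOM = 0.5

pool = [
    JanusServer("big", bandwidth_limit_mbps=300, open_rooms=600),
    JanusServer("small", bandwidth_limit_mbps=200, open_rooms=120),
]
print([s.name for s in eligible_servers(pool, MBPS_PER_ROOM)])
```

Lowering `bandwidth_limit_mbps` is then the only knob needed to cap the number of rooms per machine, which matches the workaround described above.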
Well, in the process we:
The experiments and the changes that followed reduced the share of teachers dissatisfied with call quality from 7.1% in January 2018 to 2.5% in January 2019.
Stabilizing our Vimbox platform is one of the company's main projects for 2019. We very much hope to keep up the momentum and no longer see video calling at the top of the complaints list. We understand that a significant portion of these complaints is caused by lag on users' computers and internet connections, but we need to measure that portion and fix the rest. Everything else is a technical problem, and it seems we should be able to deal with it.
The main difficulty is that we do not know how far quality can realistically be raised. Finding that ceiling is the main task. Therefore, two experiments were planned:
These two experiments will allow us to define an achievable goal and concentrate on it.
In addition, there are a number of tasks to be handled as routine work:
Starting in April, video calling becomes a full-fledged separate project within Skyeng with its own product, rather than just a part of Vimbox. This means we are starting to look for people to work on video full time. And, as always, we are looking for lots of good people.
And, of course, we continue to talk actively with people and companies working with video. If you want to exchange experience with us, we will be happy to! Leave a comment or get in touch; we will reply to everyone.
Source: https://habr.com/ru/post/446444/