WebRTC in Mail.Ru Agent and ICQ - pros, cons and prospects

Hi, Habr!

We recently released a new version of Mail.Ru Agent 5.10 for the Windows platform, and an inconspicuous line appeared in the list of new features for this version: “the quality of voice and video calls has been improved.”

At first glance, a minor change - but only at first glance. In fact, we completely abandoned the media library that we have used so far, and switched to using our own “engine” built on the open solution of WebRTC. We would like to tell about it in more detail today.

Story
')
So, until recently (that is, before version 5.9) we used in our desktop client a “engine” licensed several years ago by Global IP Solutions (GIPS). The idea of licensing someone else's product seemed quite successful at that time: we received a ready-made solution from one of the market leaders, and with it the opportunity to add call functionality to Mail.Ru Agent, the need for which became apparent.

True, in practice everything turned out to be not so cloudless: out of the box, the GIPS library did not work correctly, its closed code did not contribute to quick problem solving, and as a result we even had to take in the Mail.Ru Group’s office “troops” of Swedish engineers who right on the spot corrected errors in his "engine". However, in the end, we got what we wanted - calls to the Agent, realized from scratch in a short time (2-3 months).

Since the launch of this service, we have received a huge amount of feedback and suggestions, as well as many complaints and bug reports. Unfortunately, it was not in our power to satisfy all user requests - the closeness of the library and dependence on a third-party company deprived us of the necessary flexibility in solving the problems that arise. Therefore, starting from a certain point, we began to think about abandoning the GIPS solution. The final nail in the coffin of the licensed “engine” was driven by Google, which bought Global IP Solutions in the summer of 2009. After the purchase, it became clear that there was no point in waiting for improvements in the library (Google usually buys the team and technology, not the customer base). Thus, we lost not only the bug fixes in the future, but also hopes to conquer mobile platforms.

In addition, at about the same time, we ourselves acquired ICQ from AOL, and the question of a single media platform came up very sharply.

We considered many options, from proprietary development to licensing the “engine” from another supplier, but Google again made adjustments, which unexpectedly opened part of the GIPS source code as part of its new project WebRTC .

WebRTC

The essence of the WebRTC initiative is to provide an opportunity to make voice and video calls directly from web pages, without installing additional plug-ins. It is assumed that for this, developers must embed some set of media components into their web browsers, and provide access to them through the generally accepted (standard) high-level JavaScript API .

Well, the idea is good - however, its status is still rather vague. The only browser in which WebRTC is currently supported is the developer version of Chrome (the so-called “nightly builds”), this technology is still far from the W3C standard status, and Microsoft is not asleep, actively promoting its Silverlight.

In short, no revolution in calls from the web with the release of WebRTC has yet occurred. However, we were interested in something completely different in this library - directly the code of media components: acoustic filters, codecs, mechanisms for overcoming NAT, etc.

The first studies of the WebRTC repository gave encouraging results:

BSD license (the ability to use and modify the library without obligation to disclose the source code of the entire product);
availability of all necessary components for the implementation of calls between customers;
API semantics close to the GIPS engine that we used earlier;
frequent update of the code, indicating the active development of the project.

However, a more detailed acquaintance with WebRTC led to the first unpleasant discoveries:

the project was full of bugs, out of the box almost nothing worked;
video quality was rather low (especially in terms of adaptation to channel losses);
only the Windows platform was supported (the embryo code for Android was present in the repository, but at the time of its study it was not even going to);
the realization of one of the most important acoustic components, the Acoustic Echo Canceller (AEC), was worse in WebRTC than in GIPS;
The VP8 codec included in WebRTC turned out to be worse in some situations than the proprietary GIPS LSVX codec we used earlier;
a large amount of "garbage" - unused code.

The main conclusion that we made at the end of the study - the part responsible for the video in the WebRTC library is “broken” and requires careful work to correct errors.

Nevertheless, despite the many shortcomings found, we decided that WebRTC is a good basis for our own engine, and we began developing it on our own. We considered that full control over the code and independence from a third party in the long run are more important for us than the quick launch of Mail.Ru Agent and ICQ with a licensed engine from a new supplier.

So, our strategy was as follows: first, to bring the WebRTC code to production-quality, and then to solve more complex tasks: rewrite a number of especially problematic modules, add support for the h.264 video codec (out of the box, only VP8 goes), port the code to mobile platforms etc.

Actually, we successfully solved the first part of the task, and Mail.Ru Agent 5.10 is a kind of proof of concept - the first real product on which the WebRTC library was “run-in” with our modifications.

What's new in calls to Agent 5.10?

If we compare the new version with the previous ones (in terms of calls), then the changes, at first glance, are few. However, the sharply increased session establishment speed between clients immediately attracts attention. This was achieved through the use of libjingle , one of the components of WebRTC.

Libjingle is a pretty heaped framework for organizing P2P sessions, originally developed for implementing calls in the Google Talk messenger. Since Google Talk uses XMPP protocol, a fair amount of libjingle code solves the signaling problem (that is, the exchange of data on session parameters) over this IM protocol.

Since we do not use XMPP , we saved our signaling, however, we took from libjingle the code responsible for organizing the connection in all possible ways, including passing through various NATs, firewalls, etc.

The idea of overcoming NAT is simple in itself - it is enough for two clients to simply “agree” on external IP addresses, as well as on ports on which you need to open outgoing connections and which you need to “listen”, and establish a two-way transport connection with these parameters (in in our case - via RTP / RTCP over UDP). However, in practice, this is not so easy to do - because of the different types of NAT behind which clients may be located, because of firewalls (on which certain port ranges can be closed or even entire protocols — for example, UDP), because for errors in the implementation of network components in various home routers and other features of network configurations.

Previously, we solved the problem of passing NAT using a "self-made" (and rather primitive) mechanism. In libjingle, a much smarter connection setup logic is implemented. Among the most interesting features are the optimization of sorting “candidates” (IP: port pairs), the use of all network interfaces and IP addresses available on a computer (if more than 1 IP address is assigned to one interface), the encapsulation of UDP packets in TCP connection (in case UDP is blocked on the network), use of STUN servers, etc.

The use of libjingle significantly improved the Mail.Ru connectivity of the Agent: the number of connections established directly between clients (without using intermediate relays) increased 3-fold compared to our previous solution, which had a very positive effect on the quality of sound and video by reducing the number of delays and packet loss along the way. Connection speed has also been significantly increased. Both of these effects are quite noticeable visually.
We also did some work on improving adaptation to the channel, and also switched to the VP8 codec. This also contributed to the improvement of quality, although here we still have something to work on.

You can test the new “engine” by downloading the latest version of the client from the official site (5.10). Of course, your interlocutor must also have this version installed.

Plans

At the moment we have 3 main priorities:

video quality improvement (including better adaptation to channel bandwidth and packet loss);
expanding the range of possible video resolutions (now, as in the previous version, only CIF is supported, that is, 352x288 - we plan to add at least VGA and 720p);
Porting the “engine” to Android and iOS mobile platforms.

The changes in the code that we assume to solve these tasks are so large that in the future we do not plan to keep synchronization with the WebRTC repository - that is, in essence, the media gate becomes completely our development. It is abstracted into a separate library that is not tied to a specific platform or product, and over time we will be happy to publish its code. But, unfortunately, we are now making so many changes to it that it’s too early to talk about a stable public release.

To continually improve the subjective quality, we change the key components of the “engine”: jitter buffer, ARS (Automatic Rate Selection), FEC (Forward Error Correction), AEC (Acoustic Echo Canceller), etc. We also began a research comparison of h.264 versus VP8 — it is very likely that in future versions we will use different video codecs with different call parameters (bandwidth, type and performance of the client device).

As for mobile platforms, each of them has its own specific problems, which also have to be solved. The main ones are performance and compatibility with "iron", and in many ways they are intertwined. This is especially true for the Android platform, which now produces a huge number of devices built on completely different hardware solutions.

For example, most modern SoC solutions (System-on-a-Chip) for the ARM architecture contain a hardware accelerator for encoding and decoding video in the h.264 standard (which is why we began work on incorporating the h.264 codec into our engine). The difficulty lies in the fact that the accelerator control interfaces of different manufacturers have nothing in common. This is not a problem for the “native” software of the phone or tablet: this software is enough to be able to work with only one accelerator. However, a third-party application needs to support all currently popular chipsets - and this is not so easy, given that open documentation is often missing.

In general, most of the problems on Android are mainly connected with various levels of OS abstraction, that is, with software “layers” between the application and the hardware. At the same time, a part of such “interlayers” (for example, drivers) often differ from manufacturer to manufacturer, from model to model. The main efforts are connected with the support of this “zoo”. In addition, in some cases, it is necessary to optimize the resource-intensive code, originally designed for the x86 platform, for the ARM platform.

Everything is compounded by a wide variety of versions of the Android OS, which differ significantly from each other in the most unexpected places. For example, the behavior of the Wi-Fi stack when the screen is off (the normal state attached to the ear of the phone) changes from version 2.2 to version 4.0.

In iOS, their fun. Of course, there are significantly fewer problems with hardware there - the number of configurations is strictly limited, testing is not difficult. But here the limitations of the operating system come to the fore - many things cannot be done through public APIs (for example, access to the same encoding accelerator), and the use of private methods is fraught with problems with “censorship” before entering the App Store.

Despite all these difficulties, the work on porting is actively underway, and in the near future we will introduce a client for Android with support for calls.

Stay tuned!

Ilya Naumov,
Project Manager Mail.Ru Agent

Source: https://habr.com/ru/post/140062/

All Articles

WebRTC in Mail.Ru Agent and ICQ - pros, cons and prospects

More articles: