In the previous article I touched on the available ways of organizing voice communication in the browser. This time the task is harder: we want to make video calls from the browser to a remote party who is using a softphone or a SIP-capable device. This may be needed, for example, in the following cases:
- we want to build an online consultation system for online stores that lets site visitors hold a video conversation with a consultant who uses an ordinary messenger.
- we want to extend a Polycom-based teleconferencing system so that participants who have nothing but a browser can join.
Technology
I will not repeat all the reasoning from the previous article and will go straight to the conclusions. If we want:
- all desktop browsers to be supported
- no additional software to be installed
- the system to be robust to network interference
- delays to be minimal
then at the moment we have no option other than to build the solution on Adobe Flash Player and the RTMFP protocol, however sad that may sound. A bright future is not far off: Google has promised to add support for the very interesting WebRTC technology to Chrome soon, and I will definitely cover it in a separate article. In the meantime, we use what users already have.
Video support in Adobe Flash Player
Flash is currently able to play streams compressed with several codecs:
- H264
- Sorenson Spark H263
- On2 VP6
- Flash Screen Video
Of all this “wealth”, we are interested only in H264, because finding support for the other options in softphones and SIP devices is almost impossible.
Capturing and encoding video from the camera is in much worse shape. H264 encoding support appeared only in the recently released FP11; before that, the only option was Sorenson Spark. Unfortunately, the vast majority of users do not have version 11 installed yet, so we have to reckon with those who only have FP10.
We must also not forget whom we are dealing with. Adobe managed to break playback of certain types of H264 streams in Flash Player versions 11.0 - 11.2. The problem affects streams packetized with packetization-mode 0, which is exactly the mode most softphones use. Details about the bug can be found in the company's bug tracker.
The overall picture is as follows. To connect successfully to an H264 SIP client, we need to:
- transcode Sorenson -> H264 in one direction if the user's FP version is lower than 11
- transcode H264 -> H264 (to change the packetization) in one direction if the user has FP 11 with the bug mentioned above
- pass the traffic through unchanged in all other cases.
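The three cases above can be written down as a small decision function. This is only a sketch: the names `Pipeline` and `choose_pipeline` are made up for illustration, and the version boundaries are taken directly from the text (below 11, and 11.0 - 11.2 for the playback bug). How the Flash Player version reaches the server is left out.

```python
from enum import Enum


class Pipeline(Enum):
    """Hypothetical labels for the three server-side cases from the list."""
    SORENSON_TO_H264 = "transcode Sorenson -> H264"
    REPACKETIZE_H264 = "transcode H264 -> H264 (change packetization)"
    PASSTHROUGH = "pass traffic through unchanged"


def choose_pipeline(fp_major: int, fp_minor: int) -> Pipeline:
    """Pick the handling for a browser with the given Flash Player version."""
    if fp_major < 11:
        # Before FP11 the camera stream can only be encoded as Sorenson Spark.
        return Pipeline.SORENSON_TO_H264
    if fp_major == 11 and 0 <= fp_minor <= 2:
        # 11.0 - 11.2 have the packetization-mode 0 playback bug.
        return Pipeline.REPACKETIZE_H264
    return Pipeline.PASSTHROUGH


if __name__ == "__main__":
    for version in [(10, 3), (11, 1), (11, 3)]:
        print(version, "->", choose_pipeline(*version).value)
```

Note that the two transcoding cases apply to opposite directions: the Sorenson case concerns video captured in the browser, while the repacketization case concerns video played back in the browser.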
The ffmpeg and libx264 bundle is well suited for transcoding. For transcoding performance, it is extremely important that the server supports MMX, SSE and similar instruction sets, the more recent the better. Video codecs can use them and run noticeably faster as a result.
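One quick way to see what a server offers is to look at the CPU flags. The sketch below assumes a Linux x86 host, where `/proc/cpuinfo` contains a `flags` line; the function name `simd_flags` and the default list of flags to look for are illustrative choices, not something the transcoder requires.

```python
from typing import Iterable, List

# Flag names as they appear in /proc/cpuinfo on Linux x86 (illustrative selection).
DEFAULT_WANTED = ("mmx", "sse", "sse2", "ssse3", "sse4_1", "sse4_2", "avx")


def simd_flags(cpuinfo_text: str, wanted: Iterable[str] = DEFAULT_WANTED) -> List[str]:
    """Return the wanted SIMD flags found in the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in wanted if flag in present]
    return []  # no 'flags' line, e.g. a non-x86 or non-Linux host


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(simd_flags(f.read()))
```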
Video is not voice
At first glance, it may seem that the only difference between video and voice transmission is the bandwidth required. That difference is real, but there are several other significant ones.
The audio stream is usually divided into frames of 10-20 ms, each of which is encoded and decoded independently of the others. For video this would be too wasteful, so for most frames the encoder stores not the image itself but its difference from the previous frame. For even better compression, the difference is taken against a slightly “shifted” previous frame to compensate for the movement of objects. Whole series of articles could be written about video compression, so I will not dwell on it here.
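The idea of storing differences can be shown with a toy example. The sketch below treats a frame as a short list of pixel values and leaves out everything a real codec does (motion compensation, transforms, quantization); the `encode` and `decode` functions are invented purely for illustration.

```python
def encode(frames):
    """Return a list of ('key', pixels) for the first frame and ('delta', diff) for the rest."""
    encoded = []
    previous = None
    for frame in frames:
        if previous is None:
            # The first frame is stored in full: a key frame.
            encoded.append(("key", list(frame)))
        else:
            # Later frames store only the per-pixel difference from the previous frame.
            encoded.append(("delta", [cur - prev for cur, prev in zip(frame, previous)]))
        previous = frame
    return encoded


def decode(encoded):
    """Rebuild the frames by applying each delta on top of the previous picture."""
    frames = []
    current = None
    for kind, data in encoded:
        if kind == "key":
            current = list(data)
        else:
            current = [prev + d for prev, d in zip(current, data)]
        frames.append(current)
    return frames


if __name__ == "__main__":
    frames = [[10, 10, 10, 10], [10, 12, 10, 10], [10, 12, 11, 10]]
    print(encode(frames))
    print(decode(encode(frames)) == frames)
```

When little changes between frames, the deltas are mostly zeros, which is what makes them cheap to compress. It also means every delta frame is meaningless without the frame before it.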
Something else matters here. If we lose one frame of the audio stream, we can simply conceal the loss, for example by playing the previous frame again, and few people will notice. With video this trick does not work, because subsequent frames have to be applied on top of the lost one. This produces artifacts that will not go away by themselves until the remote side is asked to send an independently compressed frame (a key frame). In SIP, this can be done in two ways: at the signaling level via SIP INFO, and at the media level via RTCP.
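For the RTCP route, one commonly used message is the Picture Loss Indication (PLI) from RFC 4585. The text does not say which RTCP message is meant, so treating it as PLI is an assumption here; Full Intra Request (RFC 5104) is another option. The sketch below only builds the 12-byte PLI packet. The function name `build_pli` is illustrative, and in a real session the packet would normally be sent as part of a compound RTCP packet.

```python
import struct

RTCP_VERSION = 2
PT_PSFB = 206  # payload-specific feedback message (RFC 4585)
FMT_PLI = 1    # feedback message type: Picture Loss Indication


def build_pli(sender_ssrc: int, media_ssrc: int) -> bytes:
    """Build an RTCP PLI packet asking the media source for a key frame."""
    first_byte = (RTCP_VERSION << 6) | FMT_PLI  # version 2, padding bit 0, FMT 1
    length_words = 2  # packet length in 32-bit words minus one: 12 bytes -> 2
    return struct.pack("!BBHII", first_byte, PT_PSFB, length_words,
                       sender_ssrc, media_ssrc)


if __name__ == "__main__":
    # The SSRC values are arbitrary examples.
    print(build_pli(0x11111111, 0x22222222).hex())
```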
Next, you need to take into account the MTU of the path between the participants, which is usually about 1500 bytes (you cannot rely on IP fragmentation when NAT is involved). Any audio frame fits within this limit, but a video frame most often does not. Hence the need to split frames into pieces, which is called packetization, and this is precisely where the bug in some versions of Flash Player lies.
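As an illustration of packetization, here is a sketch of splitting one H264 NAL unit into FU-A fragments as defined in RFC 6184. FU-A belongs to packetization-mode 1. In mode 0 only single NAL unit packets are allowed, so the encoder itself has to produce NAL units small enough to fit. The function name `packetize_nal` and the `max_payload` parameter are illustrative; for IPv4 without options, a 1500-byte MTU leaves at most 1500 - 20 - 8 - 12 = 1460 bytes of RTP payload.

```python
from typing import List

FU_A = 28  # NAL unit type used for FU-A fragmentation units (RFC 6184)


def packetize_nal(nal: bytes, max_payload: int) -> List[bytes]:
    """Return RTP payloads for one NAL unit, fragmenting with FU-A if needed."""
    if len(nal) <= max_payload:
        return [nal]  # fits as a single NAL unit packet
    if max_payload < 3:
        raise ValueError("max_payload must leave room for 2 header bytes plus data")

    header = nal[0]
    fu_indicator = (header & 0xE0) | FU_A  # keep F and NRI bits, set type to 28
    nal_type = header & 0x1F               # original type goes into the FU header
    body = nal[1:]                         # the NAL header byte itself is not repeated
    chunk = max_payload - 2                # FU indicator + FU header take 2 bytes

    packets = []
    for offset in range(0, len(body), chunk):
        fu_header = nal_type
        if offset == 0:
            fu_header |= 0x80  # S bit: first fragment
        if offset + chunk >= len(body):
            fu_header |= 0x40  # E bit: last fragment
        packets.append(bytes([fu_indicator, fu_header]) + body[offset:offset + chunk])
    return packets


if __name__ == "__main__":
    nal = bytes([0x65]) + bytes(5000)  # a fake 5001-byte IDR slice
    print([len(p) for p in packetize_nal(nal, 1460)])
```

The receiver reverses the process: it rebuilds the NAL header from the F and NRI bits of the FU indicator and the type in the FU header, then concatenates the fragment payloads.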
Result
As a result, if you carefully work around all the pitfalls described above, you can get a fully working solution. We managed to integrate support for video calls from the browser into our cloud platform RTCKit, which in turn makes it possible to integrate this functionality into any web service in a matter of hours, saving a lot of time.
You can test all this without registering on our test page. Video resolution there is limited to 352x288. We tested with the Jitsi and LinPhone softphones; it would be interesting to hear feedback about other clients with H264 support. We will try to withstand the load from the Habr effect! Important note: if you call via RTCKit from browser to browser and your NAT is sufficiently friendly, RTMFP peer-to-peer is used instead of everything described above.
In future articles we will cover the topic of voice and video conferencing, call routing, and interaction with mobile devices. Stay tuned.