
Recording a video call from the browser: we hoped to knock it out in a week

When we started out, our Voximplant cloud platform supported only voice calls. But progress does not stand still, and over time we added video, text messaging, presence, and many other features. Recently we finished the video recording feature: during a video call, it is enough to call the record function from the JavaScript code that manages the call to get a link to the recorded video file.

For our clients everything looks and works very simply, but for us the task turned out to be harder than we expected. It took our far-from-weak developers several months to solve a number of technical problems and build a solution that works properly. Below is the story of our struggle with codecs, file formats, and WebRTC.

Video calls come in different kinds. You can call from the browser using the fashionable WebRTC technology. If the browser is old and lacks WebRTC, you can fall back to Flash. You can skip the browser and use a phone application, a so-called “SIP softphone”. You can connect a physical video phone with SIP support to the Internet and call through it. You can build a mobile application that calls over the Internet: we have SDKs for iOS, Android, and React Native for exactly this.

What unites all these methods is the SDP protocol, through which the parties to a video call agree on the video resolution, the codecs to use, and the many other things needed to set up the call. And this is where we hit the first problem.

Difficulties with codecs


Codecs differ too. SIP phones may try to negotiate any codec. For Flash, it is H.264. For WebRTC the story is more interesting. The WebRTC standardization committee puzzled for a long time over which codec to make mandatory: VP8 requires no patent fees, but H.264 is in every refrigerator. In the end they gave up on choosing and made both codecs mandatory to implement. So although WebRTC de facto uses VP8, H.264 support is written into the standard and may well gain some adoption.
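To make the negotiation concrete, here is a minimal sketch of reading the offered video codecs out of an SDP body. The function name video_codecs and the sample SDP are illustrative assumptions, not part of our platform; the only thing taken from the standard is the "a=rtpmap:<payload> <codec>/<clock rate>" line format.

```python
import re

# Matches attribute lines such as "a=rtpmap:96 VP8/90000".
RTPMAP = re.compile(r"^a=rtpmap:(\d+) ([A-Za-z0-9.-]+)/(\d+)")


def video_codecs(sdp: str) -> list[str]:
    """Return codec names from the video section, in the order of the m= line."""
    names_by_payload = {}
    payload_order = []
    in_video = False
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # A new media section starts; remember whether it is the video one.
            in_video = line.startswith("m=video")
            if in_video:
                # m=video <port> <proto> <payload type> <payload type> ...
                payload_order = line.split()[3:]
        elif in_video:
            match = RTPMAP.match(line)
            if match:
                names_by_payload[match.group(1)] = match.group(2).upper()
    return [names_by_payload[p] for p in payload_order if p in names_by_payload]


# Hypothetical, heavily shortened SDP used only for illustration.
SAMPLE_SDP = """v=0
o=- 0 0 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
m=video 9 UDP/TLS/RTP/SAVPF 96 98
a=rtpmap:96 VP8/90000
a=rtpmap:98 H264/90000
"""

print(video_codecs(SAMPLE_SDP))
```

The list only shows what one side offers; the codec actually used is known once both sides have exchanged their descriptions, which is why the recorder cannot assume it up front.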

Where is the problem? Oddly enough, in containers. We do not know in advance which codec the parties will agree on. Moreover, video can be switched on and off during a call. Not all containers are equally useful, and for most containers the libraries require the video codec to be specified up front, which we do not yet know when recording starts. Customers, meanwhile, want the video in the most common containers and with good browser compatibility. And browsers do not support every possible combination on playback: most browsers cannot play an MP4 file with VP8 video, for example. And that is before we get to the “bad” combinations of video and audio codecs.

The most robust solution we found is to record video and audio into an intermediate file and then convert it to the desired format without re-encoding. The data can be written either as raw files or into a container. After some discussion we chose the MKV container: it supports all the codecs we need and is easy to convert to other formats.
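The choice of the final container can be thought of as a lookup by video codec. The sketch below is only an illustration of that idea: the table and the function name pick_container are assumptions, and it covers just the two codecs discussed here.

```python
# Assumed mapping from the recorded video codec to a browser-friendly container.
CONTAINER_FOR_CODEC = {
    "H264": "mp4",
    "VP8": "webm",
}


def pick_container(video_codec: str) -> str:
    """Return the target container for a codec name such as 'VP8' or 'h.264'."""
    key = video_codec.upper().replace(".", "")
    try:
        return CONTAINER_FOR_CODEC[key]
    except KeyError:
        # Anything else would need its own decision; the sketch refuses to guess.
        raise ValueError(f"no target container known for codec {video_codec!r}") from None


print(pick_container("VP8"))
print(pick_container("h.264"))
```

Because the intermediate MKV file accepts either codec, this decision can wait until the call is over and the tracks can be inspected.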

Video recording


At first we tried to take the easy path and use the de facto standard video library, the great and powerful ffmpeg. But reality often adjusts ambitious plans. It turned out that its mkv module, which writes the container of the same name, is complex. No, not quite: it is DIFFICULT. After several days of a not very successful fight with the API, and of reading numerous unanswered questions in Google search results, we changed tactics and started looking for a replacement. We did not have to look far: libmatroska, from the MKV team, is easy to use and works out of the box, which could not fail to please our developers.

The result is the following architecture:

  1. The JavaScript code running in our cloud calls the call.record() method with the setting “{video: true}” to record video.
  2. Our server creates the MKV file and records into it the video of the corresponding call participant (if there is any) and two audio tracks, one per participant.
  3. If the JavaScript code is subscribed to the recording-started event, the corresponding callback is called with the URL of the file being written (more on that below).
  4. After the call ends, a Python script runs that analyzes the tracks in the MKV file and converts it to either MP4 or WebM, depending on the codec used. The audio tracks are combined into one, stereo or mono as chosen. In the stereo case, the voice of the first participant goes to the left channel and the voice of the second to the right.
  5. The resulting file is uploaded to Amazon, where it becomes available at the previously issued URL. The file name in the URL has no extension (since we do not know it when recording starts), so the MIME type is passed to the client in an HTTP header field.
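Steps 4 and 5 can be sketched roughly as follows. This is not our production script: it assumes the conversion is done with the ffmpeg command-line tool, that audio is re-encoded after mixing (with the aac and libopus encoders, which must be present in the ffmpeg build), and all function names are hypothetical. The code only builds the command and the header; it does not run anything.

```python
# Assumed values for the two target containers.
MIME_TYPES = {"mp4": "video/mp4", "webm": "video/webm"}
AUDIO_ENCODERS = {"mp4": "aac", "webm": "libopus"}


def audio_filter(stereo: bool) -> str:
    """Build an ffmpeg filter graph that merges the two per-participant tracks."""
    if stereo:
        # First participant goes to the left channel, second to the right.
        return "[0:a:0][0:a:1]join=inputs=2:channel_layout=stereo:map=0.0-FL|1.0-FR[aout]"
    # Mono: mix both voices into a single track.
    return "[0:a:0][0:a:1]amix=inputs=2[aout]"


def build_command(src: str, dst: str, container: str, stereo: bool) -> list[str]:
    """Return an ffmpeg argument list that copies video and rebuilds audio."""
    return [
        "ffmpeg", "-i", src,
        "-filter_complex", audio_filter(stereo),
        "-map", "0:v:0?",      # the '?' keeps audio-only recordings from failing
        "-map", "[aout]",
        "-c:v", "copy",        # video is remuxed without re-encoding
        "-c:a", AUDIO_ENCODERS[container],
        "-f", container,       # dst has no extension, so name the format explicitly
        dst,
    ]


def upload_headers(container: str) -> dict[str, str]:
    """HTTP header that tells the client the type of the extensionless file."""
    return {"Content-Type": MIME_TYPES[container]}


cmd = build_command("call.mkv", "recording", "webm", stereo=True)
print(" ".join(cmd))
print(upload_headers("webm"))
```

Naming the output format explicitly matters here because the destination has no extension for the tool to infer it from, which mirrors the reason the MIME type has to travel in the HTTP header.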


Bonus for Habr


We love open source no less than JavaScript and WebRTC. Having spent so much time on call recording, we can and should share the result with the community. Especially for Habr readers, we have published on GitHub the C++ library we wrote for recording MKV video in a few lines of code. So if you ever face a similar task, you can take advantage of our work and save some time. The library is available under the MIT license in our official repository.

Conclusions


Initially we wanted to “knock out” video recording in a couple of weeks. It took a couple of months. Why does this happen? The rapid growth of the software industry has led to rapid changes in technology and to a huge number of narrow specializations. So when planning without a specialist in the relevant area on the team, you can seriously “miss” by overlooking some nuance. The agile approach recommends running a series of experiments before planning technically complex projects, to make sure the technology stack is properly understood and holds no major surprises. That is exactly what we did in the first week: we looked at different libraries for recording video and tried them on our data. But even carefully laid straw cannot always protect you from surprises later in development. For us, such surprises were the ability to pause the video, packet loss, and other small nuances. As it turned out, recording a video call is much harder than recording a TCP stream from a camera.

Our video call recording made a favorable impression on the first customers who tried it, and they have already started sending us their ideas for “improvement and expansion”. We will see how well the architecture we built fits the task: I hope that in half a year we will not be writing an article called “how we redid video recording” :). In the near future we also plan to convert not only to the “native” container but to both at once. The load on our computing resources will of course increase, but clients will be able to play the recorded video in any modern browser.

A sample recorded video call (a 10-megabyte file) can be downloaded here:

Source: https://habr.com/ru/post/271921/

