We are promised real-time video without freezes and stutter
Every time I launch Skype, Zoom or Hangouts, I look forward to a fresh batch of video and audio glitches. The technology rarely disappoints: croaking, background noise, voice dropouts, video breaking up into "squares", frozen frames and other joys have haunted video calls for as long as I can remember. My interest is largely professional: besides programmable telephony for regular phones, web pages and mobile apps, we also do video at Voximplant. I want Full HD, in real time, without freezes, in any browser, and a conference for 50 people. Interestingly, in the lab it works exactly like that. But somewhere in a park on 3G, a video consultation with a doctor can turn into a turn-based strategy game: packets get lost! The modern technology stack cannot yet cope with a flaky connection on equal terms, but research is ongoing. Below the cut is a translation, adapted for Habr, about Salsify: a fusion of a video codec and a network protocol that minimizes problems when transmitting video in real time.
A team from Stanford ran an experiment: they replaced the entire patchwork of modern video conferencing technologies with a single system that handles both compression and network transmission.
Video conferencing: lllllag, fffffreeze and stutter
After a while, the problems go away by themselves. Sometimes they take the picture with them, leaving a black screen. The resulting trouble ranges from "wait a couple of minutes, the network is acting up" to "the remote surgery can be called off, the patient has died". Scientists from Stanford took a fundamental approach to the problem: they built the network stack, the codec and the data transfer method from scratch with a single goal, to do better than Skype, FaceTime, Hangouts and Chrome + WebRTC.
Stanford graduate student Sadjad Fouladi, who led the study, presented the results at the NSDI'18 conference. The ideas behind this "from scratch" solution are available to everyone and can be used in commercial products. Provided, of course, that someone is willing to replace the entire stack.
“Video transmission over the Internet has evolved for decades. By now the technology stack looks more like a patchwork quilt,” says computer science associate professor Keith Winstein. “Sadjad showed how these pieces can be assembled in a different way to get better video quality with less delay.”
Winstein is more cautious about when this will be adopted. “Right now we are thinking about changes that will one day make live video transmission more reliable. That will be very useful in telemedicine and robotic surgery,” he says. “But all these changes are hard to make in the software that is in use today.”
New approach, new name
The Stanford team called its framework “Salsify” (salsify, also known as goatsbeard, is a flower that vaguely resembles a young dandelion; translator's note). The framework addresses a problem that arises because “real-time video transmission” is currently built from two separate technologies. One is the codec, which compresses the video. The other is the network protocol, which sends small pieces of data over the network and tries to guess when to send the next ones so that they are not dropped somewhere along the way because the network is overloaded. The trouble is that these two components evolved separately, often at different companies, and were only later combined in products such as Skype or FaceTime.
Fouladi is convinced that to solve the problem of freezes and lag, the codec and the network stack must work together. It is not enough just to send a packet over the network: the packet has to carry the right data, not a piece of video from 3 seconds ago that the receiver will discard anyway as "too old". As the project lead puts it, “when the transport protocol and the codec fall out of sync, the problems begin”. So the team built a new codec that is tightly integrated with the transport protocol. A single algorithm controls the compression of video frames, the formation of network packets and their sending. As a result, the video stream "knows" the state of the network in real time and tries to "fit" into it as closely as possible.
Even a single frame sent at the wrong moment can cause stutter and freezes. Salsify never sends a frame if doing so could lead to network problems.
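The idea that one algorithm decides both how heavily to compress a frame and whether to send it at all can be illustrated with a minimal sketch. This is not Salsify's code: the names `EncodedFrame`, `send_budget` and `pick_frame`, the byte-budget formula and the assumption that the encoder offers several versions of the same frame are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EncodedFrame:
    """One compressed version of a video frame (hypothetical structure)."""
    quality: int      # illustrative quality level, higher means better picture
    size_bytes: int   # size of this version after compression


def send_budget(capacity_bytes_per_s: int, target_delay_ms: int,
                bytes_in_flight: int) -> int:
    """Bytes that can be sent now without exceeding the target delay.

    Assumption: the transport side supplies a capacity estimate and the
    amount of data already sent but not yet acknowledged.
    """
    allowed = capacity_bytes_per_s * target_delay_ms // 1000
    return max(0, allowed - bytes_in_flight)


def pick_frame(candidates: Sequence[EncodedFrame],
               budget_bytes: int) -> Optional[EncodedFrame]:
    """Return the best-quality version that fits the budget, or None.

    Returning None means "skip this frame": nothing is sent rather than
    sending data that would overload the network.
    """
    fitting = [c for c in candidates if c.size_bytes <= budget_bytes]
    if not fitting:
        return None
    return max(fitting, key=lambda c: c.quality)


# Usage: two versions of the same frame, checked against the current budget.
versions = [EncodedFrame(quality=1, size_bytes=6_000),
            EncodedFrame(quality=2, size_bytes=11_000)]
budget = send_budget(capacity_bytes_per_s=125_000, target_delay_ms=100,
                     bytes_in_flight=4_000)
chosen = pick_frame(versions, budget)
```

In this sketch the codec side and the transport side meet in a single decision point: the frame is chosen against the most recent network estimate, so a stale or oversized frame is dropped before it ever reaches the wire.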
Seeing is believing
The researchers ran many tests comparing Salsify with Microsoft Skype, Google Hangouts, Apple FaceTime, and Google Chrome + WebRTC. On average, Salsify cuts delay by a factor of four (!!!) and improves image quality by 60%, as measured by the structural similarity index (SSIM). The team prepared a side-by-side comparison with WebRTC in Chrome 65 and set up a separate website dedicated to the project. The project is open source: you can download it, study it and use its ideas.
Everyone has problems with video conferencing. It's very cool to work on a project that aims to make a difference.