How Chrome and Firefox agree to transfer two video streams
Among the pitfalls of WebRTC, one stands apart: how browsers negotiate the transfer of media streams between themselves. Codecs, bitrates, video resolution, the whole story. While there is a single media stream, all is well. But when there are two of them (and video with sound is, mind you, two media streams: one for video, the other for sound), browsers sharply disagree on the format for describing the situation. Making a video call from Chrome to Firefox is fairly easy. A video call with sound, however, no longer just works. Below is a short story of how things got this way, what was cooked up in the new Safari, and what special path Microsoft Edge has taken.
A combine harvester in the field of voice and video calls
WebRTC is a combine harvester: a lot of protocols and various JavaScript APIs under one name, doing different things:
Capture video from the camera and / or voice from the microphone.
Encoding and decoding with different codecs supported by the browser.
Establishing Peer-to-Peer connections between browsers using the ICE approach and the specified servers: STUN servers to study the network topology, and TURN servers to relay traffic through an external server when NAT could not be traversed.
Transferring video and audio over the network, plus estimating the channel bandwidth and adjusting the codec bitrate to fit it.
Playing back what was received.
Data transfer in UDP or TCP style.
Screen Sharing.
The hardest part of this story is establishing a Peer-to-Peer connection. Unless this is local communication between tabs, or the devices are on the same network or have real IP addresses with open ports, we need some intermediate servers to help the peers "come to an agreement". Usually these servers are set up by the developer who wants to use WebRTC. The exception is STUN: these echo servers, which answer the question "what is my public IP?", are publicly available from Google.
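As a rough sketch of how that server list is usually expressed, the snippet below builds a configuration object in the shape accepted by RTCPeerConnection. The types are redeclared locally so the sketch runs outside a browser; the helper name buildPeerConfig, the TURN host and the credentials are hypothetical.

```typescript
// One ICE server entry, mirroring the items of RTCPeerConnection's
// `iceServers` option (redeclared here so the sketch runs without a browser).
interface IceServerEntry {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfigSketch {
  iceServers: IceServerEntry[];
}

interface TurnAccess {
  url: string;
  username: string;
  credential: string;
}

// Always include a public STUN server; add our own TURN relay when we have one.
function buildPeerConfig(turn?: TurnAccess): PeerConfigSketch {
  const servers: IceServerEntry[] = [
    // Public STUN server run by Google: answers "what is my public IP?"
    { urls: "stun:stun.l.google.com:19302" },
  ];
  if (turn) {
    // TURN relays the media itself, so it is self-hosted and needs credentials.
    servers.push({ urls: turn.url, username: turn.username, credential: turn.credential });
  }
  return { iceServers: servers };
}

const peerConfig = buildPeerConfig({
  url: "turn:turn.example.org:3478", // hypothetical host
  username: "demo",                  // hypothetical credentials
  credential: "secret",
});
// In a browser this would be passed as: new RTCPeerConnection(peerConfig)
```

The TURN entry is optional on purpose: a relay costs traffic, so it is only worth adding when direct connections fail.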
A Peer-to-Peer connection is established depending on what the developer intends to transmit: voice, video, or arbitrary data. WebRTC creates "offer", "answer" and "ice candidate" text packets that the developer must somehow pass between the connecting browsers (usually via their own signaling server). In these packets, both browsers describe their capabilities and what they are going to send, and WebRTC tries to choose the best connection method.
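WebRTC does not define how these packets travel, so every application invents its own envelope. The sketch below shows one possible JSON envelope for a signaling server; the field names (kind, from, to, payload) and the peer ids are assumptions made for illustration, not part of any standard.

```typescript
// The three kinds of text packets WebRTC asks the developer to relay.
type SignalKind = "offer" | "answer" | "candidate";

// Hypothetical envelope for a signaling server of our own.
interface SignalEnvelope {
  kind: SignalKind;
  from: string;    // sender id, as known to the signaling server
  to: string;      // recipient id
  payload: string; // SDP text, or a serialized ICE candidate
}

function encodeSignal(msg: SignalEnvelope): string {
  return JSON.stringify(msg);
}

// Parse and validate a message received from the signaling server.
function decodeSignal(raw: string): SignalEnvelope {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Signal must be a JSON object");
  }
  const rec = data as Record<string, unknown>;
  const kind = rec["kind"];
  const sender = rec["from"];
  const recipient = rec["to"];
  const payload = rec["payload"];
  const allowedKinds: string[] = ["offer", "answer", "candidate"];
  if (typeof kind !== "string" || allowedKinds.indexOf(kind) === -1) {
    throw new Error("Unknown signal kind");
  }
  if (typeof sender !== "string" || typeof recipient !== "string" || typeof payload !== "string") {
    throw new Error("Signal fields must be strings");
  }
  return { kind: kind as SignalKind, from: sender, to: recipient, payload };
}

// Round trip: what one browser sends is what the other one receives.
const wireText = encodeSignal({ kind: "offer", from: "alice", to: "bob", payload: "v=0\r\n" });
const roundTrip = decodeSignal(wireText);
```

The transport underneath (WebSocket, long polling, anything else) is left to the application.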
SDP: a telephony legacy
The packets that WebRTC exchanges, through the developer's hands, use the SDP format. It is very old, text-based, came from telephony (WebRTC tries to minimize the developer's effort when calling from the browser to telephone networks and back) and looks a bit like HTTP. In its simplest form, an SDP packet says: "this browser wants to establish a Peer-to-Peer connection to another browser, but does not yet know what it will transmit over the network."
If the developer wants to start or stop transferring data, voice or video, WebRTC immediately demands a "renegotiation": restarting the Peer-to-Peer connection in order to check that the network route is optimal for the transmitted data and to clarify the codecs. In the resulting SDP packet, WebRTC announces its desire to transmit video in a separate m=video section.
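To give a feel for the format, here is a minimal sketch that splits an SDP text into its "type=value" lines. The sample is hand-written and simplified for illustration (it is not captured from a browser); it contains only session-level lines and no media sections, which matches the "does not yet know what it will transmit" case.

```typescript
// One SDP line: a single-letter type, "=", and a free-form value.
interface SdpLine {
  type: string;
  value: string;
}

// Split SDP text into typed lines; SDP uses CRLF, but plain LF is tolerated.
function parseSdpLines(sdp: string): SdpLine[] {
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .map((line) => {
      // A well-formed line has "=" right after its one-letter type.
      if (line.indexOf("=") !== 1) {
        throw new Error(`Malformed SDP line: ${line}`);
      }
      return { type: line.charAt(0), value: line.slice(2) };
    });
}

// Hand-written, simplified sample: session-level lines only, no m= sections.
const sampleSessionSdp = [
  "v=0",                                        // protocol version
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1", // origin / session id
  "s=-",                                        // session name (unused)
  "t=0 0",                                      // timing (unbounded)
].join("\r\n") + "\r\n";

const sessionLines = parseSdpLines(sampleSessionSdp);
```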
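After renegotiation, what the browser intends to transmit shows up as "m=" lines in the SDP. The sketch below lists them; the sample SDP is again hand-written and heavily shortened for illustration, and the function name listMediaSections is an assumption.

```typescript
// What a single "m=" line announces.
interface MediaSectionInfo {
  kind: string;           // "audio", "video" or "application"
  port: number;
  protocol: string;
  payloadTypes: string[]; // codec payload type numbers, detailed in a=rtpmap lines
}

// Collect every media section announced in an SDP text.
function listMediaSections(sdp: string): MediaSectionInfo[] {
  const found: MediaSectionInfo[] = [];
  for (const line of sdp.split(/\r?\n/)) {
    if (line.indexOf("m=") !== 0) continue;
    // m=<kind> <port> <protocol> <payload type> <payload type> ...
    const [kind = "", port = "0", protocol = "", ...payloadTypes] = line.slice(2).split(" ");
    found.push({ kind, port: Number(port), protocol, payloadTypes });
  }
  return found;
}

// Hand-written, shortened sample announcing a single video section.
const sampleVideoSdp = [
  "v=0",
  "o=- 4611731400430051336 3 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 97",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:97 rtx/90000",
  "a=sendrecv",
].join("\r\n") + "\r\n";

const announcedSections = listMediaSections(sampleVideoSdp);
```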
WebRTC has been with us for many years and is still in beta status. Recently, the JavaScript API was completely rewritten from callbacks to promises, the way of working with voice and video streams changed, and Microsoft put together an alternative API, ORTC. A lot of interesting things happened. The format for describing media streams in the SDP packet changed as well. "Plan B", used for many years, with its hierarchical structure, was deprecated and replaced by "Unified Plan", in which each stream gets a separate section in the SDP packet.
When it comes to beta versions of web technologies, their implementations in browsers sometimes differ greatly and may lag behind the current version of the standard for years. That is what happened with WebRTC. Many years ago, Google Chrome implemented support for several media tracks in the Plan B format and still has not switched the implementation to Unified Plan. The corresponding ticket was opened a couple of years ago; the developers discuss how important it is and reassign the ticket to each other, but nothing has moved. Firefox, characteristically, implements only Unified Plan, so the only thing that works without problems is a single media track: voice, or video without sound. Need more? Welcome to the world of adapters and polyfills!
Microsoft Edge, which initially supported only its own implementation, the ORTC API, has added support for the WebRTC API and Unified Plan in recent versions. In Safari, WebRTC support will arrive only in the next version, a beta of which is already available to developers. And, sadly, it comes with Plan B, because the implementation is based on Chromium's.
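The difference between the two formats can be sketched with a rough heuristic: in Plan B several tracks of one kind share a single "m=" section and are told apart by "a=ssrc ... msid" lines, while in Unified Plan every track gets an "m=" section of its own. The detector below is an assumption-laden illustration, not a complete classifier, and both samples are hand-written and shortened.

```typescript
type SdpDialectGuess = "plan-b" | "unified-plan" | "ambiguous";

// Heuristic only: looks at how tracks are distributed over m= sections.
function guessSdpDialect(sdp: string): SdpDialectGuess {
  const sectionsPerKind: Record<string, number> = {};
  let tracksInSection: string[] = [];
  let severalTracksInOneSection = false;

  for (const line of sdp.split(/\r?\n/)) {
    if (line.indexOf("m=") === 0) {
      const kind = line.slice(2).split(" ")[0] || "";
      sectionsPerKind[kind] = (sectionsPerKind[kind] || 0) + 1;
      tracksInSection = []; // a new section starts a new track list
      continue;
    }
    // Plan B style: a=ssrc:<ssrc> msid:<stream id> <track id>
    const match = /^a=ssrc:\d+ msid:\S+ (\S+)/.exec(line);
    const trackId = match ? match[1] : undefined;
    if (trackId !== undefined && tracksInSection.indexOf(trackId) === -1) {
      tracksInSection.push(trackId);
      if (tracksInSection.length > 1) severalTracksInOneSection = true;
    }
  }

  if (severalTracksInOneSection) return "plan-b";
  for (const kind of Object.keys(sectionsPerKind)) {
    if ((sectionsPerKind[kind] || 0) > 1) return "unified-plan";
  }
  // With one track per kind the two formats look alike.
  return "ambiguous";
}

// Hand-written samples: two video tracks ("camera" and "screen") in each format.
const planBSample = [
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=mid:video",
  "a=ssrc:1111 msid:streamA camera",
  "a=ssrc:2222 msid:streamA screen",
].join("\r\n");

const unifiedSample = [
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=mid:0",
  "a=msid:streamA camera",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=mid:1",
  "a=msid:streamA screen",
].join("\r\n");

const planBGuess = guessSdpDialect(planBSample);
const unifiedGuess = guessSdpDialect(unifiedSample);
```

The "ambiguous" result is deliberate: a packet with a single track per kind does not reveal which format produced it.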
How to make cross-browser calls?
As we can see, Chrome, the most popular browser, has stayed with the outdated Plan B format. Safari, whose mobile version lives in the iPhone, is in the same camp. Firefox and the new Microsoft Edge use the new Unified Plan.
For voice or video without audio this makes no difference, but with several media tracks you will have to modify the SDP manually or use an adapter. I really hope that sooner or later all browsers will switch to Unified Plan. But for now, the harsh reality is that most desktop browsers and the vast majority of mobile browsers support Plan B, and for compatibility with Firefox and Edge you will have to add code. And do a lot of debugging.
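The rule of thumb above can be written down as a small check. The table simply restates the situation described in this article at the time of writing, and the function name and the lowercase browser keys are assumptions for illustration.

```typescript
type SdpFormat = "plan-b" | "unified-plan";

// Snapshot of the situation described in this article; it will go stale.
const formatByBrowser: Record<string, SdpFormat> = {
  chrome: "plan-b",
  safari: "plan-b",
  firefox: "unified-plan",
  edge: "unified-plan",
};

// Following the article's rule of thumb: a single media track works anywhere,
// several tracks need SDP conversion when the two sides use different formats.
function needsSdpConversion(caller: string, callee: string, trackCount: number): boolean {
  const callerFormat = formatByBrowser[caller];
  const calleeFormat = formatByBrowser[callee];
  if (callerFormat === undefined || calleeFormat === undefined) {
    throw new Error("Unknown browser in the table");
  }
  return trackCount > 1 && callerFormat !== calleeFormat;
}

const chromeToFirefoxVideoOnly = needsSdpConversion("chrome", "firefox", 1);
const chromeToFirefoxWithSound = needsSdpConversion("chrome", "firefox", 2);
const chromeToSafariWithSound = needsSdpConversion("chrome", "safari", 2);
```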