
How Chrome and Firefox agree to transfer two video streams


Among the pitfalls of WebRTC, one stands out: how browsers negotiate the transfer of media streams between themselves. Codecs, bitrates, video resolution, the whole story. With a single media stream, all is well. But when there are two of them (and video with sound is, after all, two media streams: one for video, the other for sound), browsers disagree sharply on the format for describing the situation. Making a video call from Chrome to Firefox is pretty easy. A video call with sound is a different matter. Below is a short story of why this happened, what was shipped in the new Safari, and what special path Microsoft Edge has taken.

A combine harvester in the field of voice and video calls


WebRTC is a combine harvester: many protocols and several JavaScript APIs gathered under one name, each doing a different job.


The hardest part of this story is establishing a Peer-to-Peer connection. Unless the communication is local between tabs, the devices are on the same network, or they have real IP addresses with open ports, intermediate servers are needed to help the peers “agree”. Usually these servers are run by the developer who wants to use WebRTC. The exception is STUN: these echo servers, which answer the question “what is my public IP”, are publicly available from Google.

A Peer-to-Peer connection is established according to what the developer intends to transmit: voice, video, or arbitrary data. WebRTC creates “offer”, “answer” and “ice candidate” text packages that the developer must somehow pass between the connecting browsers (usually via their own signaling server). In these packages, both browsers describe their capabilities and what they are going to send, and WebRTC tries to choose the best connection method.
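How the packages travel between browsers is left entirely to the developer. As a rough sketch, a signaling server could relay them in a small JSON envelope. The field names `kind`, `to` and `payload` and both helper functions below are assumptions made for illustration; WebRTC itself does not define them.

```python
import json
from typing import Tuple

# Hypothetical envelope: "kind", "to" and "payload" are illustrative names,
# not something defined by WebRTC.
ALLOWED_KINDS = {"offer", "answer", "candidate"}


def wrap_signal(kind: str, to_peer: str, payload: dict) -> str:
    """Serialize one signaling message for transport (for example over a WebSocket)."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown signaling message kind: {kind!r}")
    return json.dumps({"kind": kind, "to": to_peer, "payload": payload})


def unwrap_signal(raw: str) -> Tuple[str, str, dict]:
    """Parse a received message back into (kind, recipient, payload)."""
    msg = json.loads(raw)
    if msg.get("kind") not in ALLOWED_KINDS:
        raise ValueError("not a signaling message")
    return msg["kind"], msg["to"], msg["payload"]
```

The server only needs to read `to` and forward the rest untouched; the SDP text inside `payload` is interpreted by the browsers, not by the relay.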

SDP: a legacy of telephony


The packages that WebRTC exchanges through the developer's hands use the SDP format. It is very old, text-based, came from telephony (WebRTC tries to minimize the developer’s effort when calling from the browser to telephone networks and back) and is similar to HTTP. The simplest SDP package says, in effect: “this browser wants to establish a Peer-to-Peer connection to another browser, but does not yet know what it will transmit over the network.”

If the developer wants to start or stop transferring data, voice or video, WebRTC immediately requires a “renegotiation”: the Peer-to-Peer connection is restarted in order to check that the network route is optimal for the transmitted data and to settle on the codecs. This is what the SDP packet looks like in which WebRTC announces its intent to transmit video:

type: offer, sdp: v=0
o=- 6268223368571881674 2 IN IP4 127.0.0.1
s=-
t=0 0
m=video 9 UDP/TLS/RTP/SAVPF 96 98 100 102 127 97 99 101 125
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:Q64h
a=ice-pwd:UPO8gbng2uE2JsOt2pB163Df
a=fingerprint:sha-256 F7:FB:E8:90:A3:DE:F8:2E:02:70:30:D8:2E:19:02:61:A9:E0:FD:8E:E9:D5:EB:D9:65:20:32:B0:CF:35:21:2C
a=setup:actpass
a=mid:video
a=extmap:1 urn:ietf:params:rtp-hdrext:toffset
a=extmap:2 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:3 urn:3gpp:video-orientation
a=extmap:4 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:5 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
a=sendrecv
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtpmap:98 VP9/90000
a=rtcp-fb:98 ccm fir
a=rtcp-fb:98 nack
a=rtcp-fb:98 nack pli
a=rtcp-fb:98 goog-remb
a=rtcp-fb:98 transport-cc
a=rtpmap:100 H264/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=fmtp:100 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
a=rtpmap:102 red/90000
a=rtpmap:127 ulpfec/90000
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=98
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100
a=rtpmap:125 rtx/90000
a=fmtp:125 apt=102
a=ssrc-group:FID 1732143492 2116247900
a=ssrc:1732143492 cname:Vsw2XRlFtKOgvIT7
a=ssrc:1732143492 msid:986f56f0-4d6e-49fa-8b01-1cb6e8bbd6d0 f2e5b805-3a98-4ab1-9cc2-df2694fcc9a1
a=ssrc:1732143492 mslabel:986f56f0-4d6e-49fa-8b01-1cb6e8bbd6d0
a=ssrc:1732143492 label:f2e5b805-3a98-4ab1-9cc2-df2694fcc9a1
a=ssrc:2116247900 cname:Vsw2XRlFtKOgvIT7
a=ssrc:2116247900 msid:986f56f0-4d6e-49fa-8b01-1cb6e8bbd6d0 f2e5b805-3a98-4ab1-9cc2-df2694fcc9a1
a=ssrc:2116247900 mslabel:986f56f0-4d6e-49fa-8b01-1cb6e8bbd6d0
a=ssrc:2116247900 label:f2e5b805-3a98-4ab1-9cc2-df2694fcc9a1
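The structure of such a packet is easy to take apart: everything before the first `m=` line describes the session, and each `m=` line opens a media section whose `a=rtpmap` lines name the codecs. A minimal Python sketch, with the hypothetical helper names `split_sdp` and `codecs`, could look like this:

```python
import re


def split_sdp(sdp: str):
    """Return (session_lines, media_sections).

    Each media section is a list of lines that starts with its m= line.
    """
    session, media = [], []
    for line in sdp.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("m="):
            media.append([line])      # a new media section begins
        elif media:
            media[-1].append(line)    # attribute of the current section
        else:
            session.append(line)      # still in the session-level part
    return session, media


def codecs(section):
    """Map payload type to codec name using the a=rtpmap lines of one section."""
    result = {}
    for line in section:
        m = re.match(r"a=rtpmap:(\d+) ([^/]+)/", line)
        if m:
            result[int(m.group(1))] = m.group(2)
    return result
```

Applied to the video offer above, `split_sdp` yields a single media section, and `codecs` maps payload type 96 to VP8, 98 to VP9 and 100 to H264, among others.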

Rapidly changing standard


WebRTC has been with us for many years and is still in beta status. Recently, the JavaScript API was completely rewritten from callbacks to promises, the way voice and video streams are handled changed, and Microsoft scrapped its alternative API, "ORTC". A lot of interesting things happened. The format for describing media streams in the SDP package changed as well. “Plan B”, used for many years with its hierarchical structure, was deprecated and replaced with “Unified Plan”, in which each stream is described in a separate section of the SDP package. Compare.

Before:

type: offer, sdp: v=0
o=- 6268223368571881674 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
a=msid-semantic: WMS 986f56f0-4d6e-49fa-8b01-1cb6e8bbd6d0
m=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:Q64h
a=ice-pwd:UPO8gbng2uE2JsOt2pB163Df
a=fingerprint:sha-256 F7:FB:E8:90:A3:DE:F8:2E:02:70:30:D8:2E:19:02:61:A9:E0:FD:8E:E9:D5:EB:D9:65:20:32:B0:CF:35:21:2C
a=setup:actpass
a=mid:audio
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=sendrecv
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=rtcp-fb:111 transport-cc
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:110 telephone-event/48000
a=rtpmap:112 telephone-event/32000
a=rtpmap:113 telephone-event/16000
a=rtpmap:126 telephone-event/8000
a=ssrc:1333610373 cname:Vsw2XRlFtKOgvIT7
a=ssrc:1333610373 msid:986f56f0-4d6e-49fa-8b01-1cb6e8bbd6d0 3ff337f5-9edc-494d-a5c1-6da2ce9ed142
a=ssrc:1333610373 mslabel:986f56f0-4d6e-49fa-8b01-1cb6e8bbd6d0
a=ssrc:1333610373 label:3ff337f5-9edc-494d-a5c1-6da2ce9ed142
m=video 9 UDP/TLS/RTP/SAVPF 96 98 100 102 127 97 99 101 125
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:Q64h
a=ice-pwd:UPO8gbng2uE2JsOt2pB163Df
a=fingerprint:sha-256 F7:FB:E8:90:A3:DE:F8:2E:02:70:30:D8:2E:19:02:61:A9:E0:FD:8E:E9:D5:EB:D9:65:20:32:B0:CF:35:21:2C
a=setup:actpass
a=mid:video
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:4 urn:3gpp:video-orientation
a=extmap:5 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=extmap:6 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
a=sendrecv
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:96 VP8/90000
a=rtcp-fb:96 ccm fir
a=rtcp-fb:96 nack
a=rtcp-fb:96 nack pli
a=rtcp-fb:96 goog-remb
a=rtcp-fb:96 transport-cc
a=rtpmap:98 VP9/90000
a=rtcp-fb:98 ccm fir
a=rtcp-fb:98 nack
a=rtcp-fb:98 nack pli
a=rtcp-fb:98 goog-remb
a=rtcp-fb:98 transport-cc
a=rtpmap:100 H264/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtcp-fb:100 goog-remb
a=rtcp-fb:100 transport-cc
a=fmtp:100 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f
a=rtpmap:102 red/90000
a=rtpmap:127 ulpfec/90000
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96
a=rtpmap:99 rtx/90000
a=fmtp:99 apt=98
a=rtpmap:101 rtx/90000
a=fmtp:101 apt=100
a=rtpmap:125 rtx/90000
a=fmtp:125 apt=102
a=ssrc-group:FID 1732143492 2116247900
a=ssrc:1732143492 cname:Vsw2XRlFtKOgvIT7
a=ssrc:1732143492 msid:986f56f0-4d6e-49fa-8b01-1cb6e8bbd6d0 f2e5b805-3a98-4ab1-9cc2-df2694fcc9a1
a=ssrc:1732143492 mslabel:986f56f0-4d6e-49fa-8b01-1cb6e8bbd6d0
a=ssrc:1732143492 label:f2e5b805-3a98-4ab1-9cc2-df2694fcc9a1
a=ssrc:2116247900 cname:Vsw2XRlFtKOgvIT7
a=ssrc:2116247900 msid:986f56f0-4d6e-49fa-8b01-1cb6e8bbd6d0 f2e5b805-3a98-4ab1-9cc2-df2694fcc9a1
a=ssrc:2116247900 mslabel:986f56f0-4d6e-49fa-8b01-1cb6e8bbd6d0
a=ssrc:2116247900 label:f2e5b805-3a98-4ab1-9cc2-df2694fcc9a1

After:

v=0
o=mozilla...THIS_IS_SDPARTA-55.0 5576035894611904766 0 IN IP4 0.0.0.0
s=-
t=0 0
a=sendrecv
a=fingerprint:sha-256 98:7E:DA:2C:ED:DC:F9:CE:ED:CC:43:2F:58:36:AC:BE:17:B4:E2:84:69:60:91:38:11:9D:2B:0C:F8:12:FF:0B
a=group:BUNDLE sdparta_0 sdparta_1
a=ice-options:trickle
a=msid-semantic:WMS *
m=audio 47296 UDP/TLS/RTP/SAVPF 109 9 0 8 101
c=IN IP4 95.213.228.4
a=candidate:0 1 UDP 2122121471 192.168.15.145 1038 typ host
a=candidate:7 1 UDP 2122252543 172.29.0.1 1039 typ host
a=candidate:14 1 UDP 2122187007 192.168.80.1 1040 typ host
a=candidate:21 1 TCP 2105377023 192.168.15.145 58157 typ host tcptype passive
a=candidate:21 1 TCP 2105393407 192.168.15.145 9 typ host tcptype active
a=candidate:26 1 TCP 2105508095 172.29.0.1 57536 typ host tcptype passive
a=candidate:26 1 TCP 2105524479 172.29.0.1 9 typ host tcptype active
a=candidate:31 1 TCP 2105442559 192.168.80.1 57467 typ host tcptype passive
a=candidate:31 1 TCP 2105458943 192.168.80.1 9 typ host tcptype active
a=candidate:0 2 UDP 2122121470 192.168.15.145 1041 typ host
a=candidate:7 2 UDP 2122252542 172.29.0.1 1042 typ host
a=candidate:14 2 UDP 2122187006 192.168.80.1 1043 typ host
a=candidate:21 2 TCP 2105377022 192.168.15.145 53990 typ host tcptype passive
a=candidate:21 2 TCP 2105393406 192.168.15.145 9 typ host tcptype active
a=candidate:26 2 TCP 2105508094 172.29.0.1 59650 typ host tcptype passive
a=candidate:26 2 TCP 2105524478 172.29.0.1 9 typ host tcptype active
a=candidate:31 2 TCP 2105442558 192.168.80.1 54837 typ host tcptype passive
a=candidate:31 2 TCP 2105458942 192.168.80.1 9 typ host tcptype active
a=candidate:1 1 UDP 1685921791 195.91.179.50 1038 typ srflx raddr 192.168.15.145 rport 1038
a=candidate:4 1 UDP 92085759 95.213.228.4 47296 typ relay raddr 95.213.228.4 rport 47296
a=candidate:6 1 UDP 92085247 95.213.228.4 53983 typ relay raddr 95.213.228.4 rport 53983
a=candidate:22 1 TCP 1669160959 195.91.179.50 58157 typ srflx raddr 192.168.15.145 rport 58157 tcptype passive
a=candidate:23 1 UDP 8200191 95.213.228.4 48383 typ relay raddr 95.213.228.4 rport 48383
a=candidate:25 1 UDP 8200191 95.213.228.4 55022 typ relay raddr 95.213.228.4 rport 55022
a=candidate:1 2 UDP 1685921790 195.91.179.50 1041 typ srflx raddr 192.168.15.145 rport 1041
a=candidate:4 2 UDP 92085758 95.213.228.4 33164 typ relay raddr 95.213.228.4 rport 33164
a=candidate:6 2 UDP 92085246 95.213.228.4 55111 typ relay raddr 95.213.228.4 rport 55111
a=candidate:22 2 TCP 1669160958 195.91.179.50 53990 typ srflx raddr 192.168.15.145 rport 53990 tcptype passive
a=candidate:23 2 UDP 8200190 95.213.228.4 47176 typ relay raddr 95.213.228.4 rport 47176
a=candidate:25 2 UDP 8200190 95.213.228.4 45231 typ relay raddr 95.213.228.4 rport 45231
a=sendrecv
a=end-of-candidates
a=extmap:1/sendonly urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=fmtp:109 maxplaybackrate=48000;stereo=1;useinbandfec=1
a=fmtp:101 0-15
a=ice-pwd:e01708a35bd9fb67502a714e51a11644
a=ice-ufrag:ac97477f
a=mid:sdparta_0
a=msid:{89226294-ff67-4c59-aa43-6d55e4eeabeb} {09b10fc1-6364-4e6e-b96d-b6a33377a5c8}
a=rtcp:33164 IN IP4 95.213.228.4
a=rtcp-mux
a=rtpmap:109 opus/48000/2
a=rtpmap:9 G722/8000/1
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=setup:actpass
a=ssrc:2414570955 cname:{2d684c3d-4555-45a3-a880-030771197dc8}
m=video 47296 UDP/TLS/RTP/SAVPF 120 121 126 97
c=IN IP4 95.213.228.4
a=candidate:0 1 UDP 2122121471 192.168.15.145 1044 typ host
a=candidate:7 1 UDP 2122252543 172.29.0.1 1046 typ host
a=candidate:14 1 UDP 2122187007 192.168.80.1 1048 typ host
a=candidate:21 1 TCP 2105377023 192.168.15.145 53513 typ host tcptype passive
a=candidate:21 1 TCP 2105393407 192.168.15.145 9 typ host tcptype active
a=candidate:26 1 TCP 2105508095 172.29.0.1 63978 typ host tcptype passive
a=candidate:26 1 TCP 2105524479 172.29.0.1 9 typ host tcptype active
a=candidate:31 1 TCP 2105442559 192.168.80.1 55068 typ host tcptype passive
a=candidate:31 1 TCP 2105458943 192.168.80.1 9 typ host tcptype active
a=candidate:0 2 UDP 2122121470 192.168.15.145 1049 typ host
a=candidate:7 2 UDP 2122252542 172.29.0.1 1050 typ host
a=candidate:14 2 UDP 2122187006 192.168.80.1 1051 typ host
a=candidate:21 2 TCP 2105377022 192.168.15.145 50228 typ host tcptype passive
a=candidate:21 2 TCP 2105393406 192.168.15.145 9 typ host tcptype active
a=candidate:26 2 TCP 2105508094 172.29.0.1 60348 typ host tcptype passive
a=candidate:26 2 TCP 2105524478 172.29.0.1 9 typ host tcptype active
a=candidate:31 2 TCP 2105442558 192.168.80.1 58818 typ host tcptype passive
a=candidate:31 2 TCP 2105458942 192.168.80.1 9 typ host tcptype active
a=sendrecv
a=extmap:1 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=fmtp:126 profile-level-id=42e01f;level-asymmetry-allowed=1;packetization-mode=1
a=fmtp:97 profile-level-id=42e01f;level-asymmetry-allowed=1
a=fmtp:120 max-fs=12288;max-fr=60
a=fmtp:121 max-fs=12288;max-fr=60
a=ice-pwd:e01708a35bd9fb67502a714e51a11644
a=ice-ufrag:ac97477f
a=mid:sdparta_1
a=msid:{89226294-ff67-4c59-aa43-6d55e4eeabeb} {95959808-b921-4070-a3e2-4bbadd7bc9b2}
a=rtcp:1050 IN IP4 172.29.0.1
a=rtcp-fb:120 nack
a=rtcp-fb:120 nack pli
a=rtcp-fb:120 ccm fir
a=rtcp-fb:120 goog-remb
a=rtcp-fb:121 nack
a=rtcp-fb:121 nack pli
a=rtcp-fb:121 ccm fir
a=rtcp-fb:121 goog-remb
a=rtcp-fb:126 nack
a=rtcp-fb:126 nack pli
a=rtcp-fb:126 ccm fir
a=rtcp-fb:126 goog-remb
a=rtcp-fb:97 nack
a=rtcp-fb:97 nack pli
a=rtcp-fb:97 ccm fir
a=rtcp-fb:97 goog-remb
a=rtcp-mux
a=rtpmap:120 VP8/90000
a=rtpmap:121 VP9/90000
a=rtpmap:126 H264/90000
a=rtpmap:97 H264/90000
a=setup:actpass
a=ssrc:3311343959 cname:{2d684c3d-4555-45a3-a880-030771197dc8}
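One visible difference between the two dumps is where the track identifier lives: the Chrome offer attaches `msid` to every `a=ssrc` line, while the Firefox offer has a single media-level `a=msid` line per section. A rough heuristic built on that observation is sketched below in Python. The function name `guess_sdp_plan` is made up for this example, and real-world SDP may need more careful checks.

```python
import re


def guess_sdp_plan(sdp: str) -> str:
    """Guess the SDP dialect from where the msid information appears.

    This is a heuristic based on the two dumps above, not a spec-level test.
    """
    lines = [line.strip() for line in sdp.splitlines()]
    # Media-level "a=msid:" lines appear in the Unified Plan dump.
    # ("a=msid-semantic:" does not match because of the colon position.)
    if any(line.startswith("a=msid:") for line in lines):
        return "unified-plan"
    # Otherwise, msid carried on a=ssrc lines points at Plan B.
    if any(re.match(r"a=ssrc:\d+ msid:", line) for line in lines):
        return "plan-b"
    return "unknown"
```

The media-level check comes first because an offer could carry both kinds of lines; an offer with neither is reported as `unknown` rather than guessed.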


Chrome vs Firefox vs Edge vs Safari


When it comes to beta versions of web technologies, their implementations in browsers sometimes vary greatly and may lag behind the current version of the standard for years. That is what happened with WebRTC. Many years ago, Google Chrome implemented support for several media tracks in the Plan B format and has not yet switched its implementation to Unified Plan. The corresponding ticket was opened a couple of years ago; the developers discuss how important it is and reassign the ticket to each other, but nothing has moved. Firefox, characteristically, implements only Unified Plan, so without trouble you can exchange only a single media track: voice, or video without sound. Need more? Welcome to the world of adapters and polyfills!

Microsoft Edge, which initially supported only its own “ORTC” API, has added support for the WebRTC API and Unified Plan in recent versions. In Safari, WebRTC support will arrive only in the next version, a beta of which is already available to developers. And, sadly, it uses Plan B, because its implementation is based on Chromium's.

How to make cross-browser calls?


As we can see, Chrome, the most popular browser, remains on the outdated Plan B format. So does Safari, whose mobile version lives on the iPhone. Firefox and the new Microsoft Edge use the new "Unified Plan".

For voice alone or video without audio this makes no difference, but with several media tracks you will have to modify the SDP manually or use the adapter. I really hope that sooner or later all browsers will switch to Unified Plan. But for now the harsh reality is that most desktop browsers and the vast majority of mobile browsers support Plan B, and compatibility with Firefox and Edge requires extra code. And a lot of debugging.
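Before modifying anything, it helps to see where the two formats diverge structurally. In Plan B, all tracks of one kind share a single `m=` section and are told apart by their `a=ssrc ... msid` lines, whereas Unified Plan gives every track its own section. The sketch below uses hypothetical helper names and is not part of any adapter library. It lists the tracks found in each section and flags offers in which one section carries more than one track; it is a diagnostic, not a converter.

```python
import re


def tracks_per_section(sdp: str):
    """Return [(mid, [track_id, ...]), ...], one entry per m= section.

    Track ids are read from "a=ssrc:<id> msid:<stream> <track>" lines;
    several SSRCs of the same track (e.g. video plus its rtx) count once.
    """
    sections = []
    current = None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            current = {"mid": None, "tracks": []}
            sections.append(current)
        elif current is None:
            continue  # session-level line, no media section yet
        elif line.startswith("a=mid:"):
            current["mid"] = line[len("a=mid:"):]
        else:
            m = re.match(r"a=ssrc:\d+ msid:(\S+) (\S+)", line)
            if m and m.group(2) not in current["tracks"]:
                current["tracks"].append(m.group(2))
    return [(s["mid"], s["tracks"]) for s in sections]


def needs_splitting(sdp: str) -> bool:
    """True if some m= section describes more than one track."""
    return any(len(tracks) > 1 for _, tracks in tracks_per_section(sdp))
```

Sections that come back with more than one track are the ones a Plan B to Unified Plan rewrite would have to split into separate `m=` sections.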


Source: https://habr.com/ru/post/334498/

