
A little investigation: how YouTube uses WebRTC for streaming


WebRTC is a JavaScript API available in modern browsers for video and voice calls, screen sharing, NAT traversal, disclosing local addresses, and other interesting things. Over the past couple of years, major players have started switching from proprietary APIs and browser extensions to WebRTC: Skype for Web runs on it, Hangouts partially does, and now YouTube lets you broadcast directly from the browser. So far it works only in Chrome and with a delay of about five seconds, but every big thing starts small. Below the cut we offer a translation, adapted for Habr, of a detective story in which WebRTC experts dig into the YouTube client code and explain what the Google developers did and how.

Last Thursday, logging into my YouTube account, I found a new camera icon with a "Go Live" tooltip in the upper-right corner (translator's note: apparently it hasn't rolled out to all users; commenters report seeing it with YouTube Red subscriptions). Naturally, I clicked it immediately, and it turns out we can now stream directly from the browser. That smelled of WebRTC, so I habitually opened chrome://webrtc-internals/, and yes, it was WebRTC. As developers, we have always been curious about large-scale uses of the technology, so I immediately contacted reverse-engineering master Philipp "fippo" Hancke and asked him to dig into YouTube's internals. Now we can get acquainted with the results of his work.


The Chrome service page, webrtc-internals, served us well back in 2014, when we figured out how Hangouts works, and nothing prevented us from using it again. Since new YouTube broadcasters can't go live until 24 hours after signing up, we took advantage of a dump kindly provided by Tsahi Levent-Levi (translator's note: yes, the same Tsahi who spoke with us at Intercom and whom we regularly translate). You can use this tool to load such a dump into Chrome and see what happened through the eyes of WebRTC.

Judging by what we saw, the new YouTube feature uses WebRTC only on the client side, to capture the camera stream; on the server side they have something of their own. What does that mean? That it's not real-time. Although our long-time good friend Chris Kranky says the delay is under five seconds. We really hope he'll pull out some interesting technical details.
In the meantime, let's dig into the technical details we managed to pull out ourselves...

getUserMedia calls


After importing the dump, at the very beginning we see the getUserMedia JavaScript API calls that YouTube makes. The calls show that the service modestly asks for the camera at 1080p resolution:


A separate getUserMedia call is made to get the microphone.

This screenshot doesn't show the very first getUserMedia call, which requests the camera and microphone at once so that the user sees a single browser permission prompt instead of two.
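The exact constraint objects aren't reproduced above, so here is a hypothetical sketch of what a 1080p camera request and a separate microphone request typically look like; the specific values and structure are our assumption, not YouTube's actual code:

```javascript
// Hypothetical constraints for a 1080p camera request, modeled on what
// webrtc-internals shows; the exact object YouTube builds may differ.
const videoConstraints = {
  video: {
    width:  { ideal: 1920 },
    height: { ideal: 1080 }
  }
};

// A separate call requests only the microphone:
const audioConstraints = { audio: true };

// In the browser these would be passed to getUserMedia:
// navigator.mediaDevices.getUserMedia(videoConstraints).then(stream => { /* ... */ });
// navigator.mediaDevices.getUserMedia(audioConstraints).then(stream => { /* ... */ });
```

Requesting audio and video in one call, as YouTube does first, yields a single permission prompt; the split calls seen later let the page handle camera and microphone failures independently.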

RTCPeerConnection calls


After examining the getUserMedia calls, we can move on to the RTCPeerConnection calls. If you want to learn more about WebRTC, I recommend the results of the previous study, "How Hangouts Works", or the more general material about webrtc-internals on our TestRTC blog.



ICE Servers


The log shows that the RTCPeerConnection object was created with an empty list of ICE servers (translator's note: no wonder this only works in Chrome so far; Firefox wouldn't even allow such an object to be created).

{ iceServers: [], iceTransportPolicy: all, bundlePolicy: balanced, rtcpMuxPolicy: require, iceCandidatePoolSize: 0 } 

Later it will become clear why TURN servers aren't needed for this use case (translator's note: ICE is a "framework", a textual set of instructions for establishing peer-to-peer connections between hosts with sad 192.168.x.x addresses, and TURN servers are not its most important part. The most important part is STUN servers, which answer the fundamental question "what is my external IP address?". Without at least one STUN server specified, most WebRTC setups simply won't work).
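For comparison, here is a minimal sketch of the two configurations: the empty one from the log above, and a more typical one with a public STUN server (the STUN URL is purely illustrative):

```javascript
// The configuration seen in the YouTube dump: no ICE servers at all.
const youtubeConfig = {
  iceServers: [],
  iceTransportPolicy: 'all',
  bundlePolicy: 'balanced',
  rtcpMuxPolicy: 'require',
  iceCandidatePoolSize: 0
};

// A more typical configuration, where a STUN server lets the client
// discover its external IP address (URL shown for illustration only).
const typicalConfig = {
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
};

// In the browser:
// const pc = new RTCPeerConnection(youtubeConfig);
```

An empty iceServers list means the client can only offer its host addresses; as we'll see, that is enough here because the server side has public addresses of its own.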

The client then adds the MediaStream using the addStream API. Funnily enough, this API is deprecated. It's odd that the authors don't use the newer addTrack API, which is available natively starting with Chrome 64 and in older versions via the adapter.js polyfill.

Signaling and setLocalDescription


After creating the RTCPeerConnection object, the client creates a WebRTC "offer" with the full list of audio and video codecs available in Chrome. The offer is set, without modifications, as the description of the local endpoint using setLocalDescription. The absence of modifications, by the way, means that simulcast (simultaneous broadcasting of several streams at different video qualities, which avoids transcoding everything on the server and reduces latency and load) is not used.

Following WebRTC logic, after setLocalDescription is called, Chrome gathers several "candidates": options for how a remote machine could try to connect to the local one. Most likely they aren't used, since it's the client (Chrome) that connects to the server (the YouTube backend).
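The sequence described above can be sketched as follows. This is a browser-only illustration of the observed flow, not YouTube's actual code; `sendToSignalingServer` is a placeholder for whatever HTTP request the page really makes:

```javascript
// Sketch of the client-side offer flow observed in the dump.
// `pc` is an RTCPeerConnection, `stream` a MediaStream from getUserMedia,
// `sendToSignalingServer` a hypothetical function posting the offer SDP.
async function startBroadcast(pc, stream, sendToSignalingServer) {
  // Deprecated addStream, as seen in the dump (addTrack is the modern API).
  pc.addStream(stream);

  // Candidates gathered after setLocalDescription; probably unused here,
  // since it is the client that connects out to the server.
  pc.onicecandidate = (event) => {
    if (event.candidate) console.log('local candidate:', event.candidate.candidate);
  };

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);   // offer set without modifications
  const answerSdp = await sendToSignalingServer(offer.sdp);
  await pc.setRemoteDescription({ type: 'answer', sdp: answerSdp });
}
```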



Update: Finding the signaling server and the protocol used was not hard. Filtering the Chrome network log for the keyword "realtimemediaservice" shows us the HTTP request and the response to it. No tricky schemes, no trickle-ICE optimizations for faster connection setup, no other magic; everything is as simple as possible.

setRemoteDescription


The next step is the setRemoteDescription call, based on information received from the server, where, as we remember, WebRTC is not used. And here everything gets interesting! The SDP used in setRemoteDescription looks as if it was produced on the other side by Chrome or a WebRTC library with the full list of codecs at the ready. And we know for sure that YouTube does not use "ice-lite", unlike Hangouts.

In the SDP received from the server, the H.264 codec is marked as preferred (payload number 102; see here if you're curious how SDP text packets are structured):

 m=video 9 UDP/TLS/RTP/SAVPF 102 96 97 98 99 123 108 109 124 
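In an SDP m= line the payload types are listed in preference order, so the preferred codec is simply the first number after the media type, port, and transport. A small parser (our own illustration, not YouTube code) makes this easy to check:

```javascript
// Extract the payload types from an SDP m= line.
// Format: m=<media> <port> <transport> <pt> <pt> ...
// The first payload type is the preferred codec (102 is H.264 here).
function payloadTypes(mLine) {
  return mLine.trim().split(' ').slice(3);
}

const mLine = 'm=video 9 UDP/TLS/RTP/SAVPF 102 96 97 98 99 123 108 109 124';
const pts = payloadTypes(mLine);   // '102' comes first, so H.264 is preferred
```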


Studying the statistics (partially displayed after the dump is loaded) confirms that the H.264 codec is used; the curious can search the dump for the keyword "send-googCodecName".

In addition to the SDP answer, the server sends Chrome several candidates for establishing the connection:

 a=candidate:3757856892 1 udp 2113939711 2a00:1450:400c:c06::7f 19305 typ host generation 0 network-cost 50
 a=candidate:1687053168 1 tcp 2113939711 2a00:1450:400c:c06::7f 19305 typ host tcptype passive generation 0 network-cost 50
 a=candidate:1545990220 1 ssltcp 2113939711 2a00:1450:400c:c06::7f 443 typ host generation 0 network-cost 50
 a=candidate:4158478555 1 udp 2113937151 66.102.1.127 19305 typ host generation 0 network-cost 50
 a=candidate:1286562775 1 tcp 2113937151 66.102.1.127 19305 typ host tcptype passive generation 0 network-cost 50
 a=candidate:3430656991 1 ssltcp 2113937151 66.102.1.127 443 typ host generation 0 network-cost 50

We can see IPv4 and IPv6 UDP candidates, "ICE-TCP" candidates (yes, in a pinch WebRTC can run over TCP, although it doesn't like to), and the SSL-TCP candidates on port 443 that we've already seen Chrome use with Hangouts. In this scenario a TURN server would not improve the chances of establishing a connection, since either way it would be Chrome connecting out to a real public IP address. Apparently that is why no TURN server is used.
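Each a=candidate line encodes, among other things, the transport, address, and port, which is what makes the UDP/TCP/SSL-TCP mix above visible at a glance. A small parser (our own illustration):

```javascript
// Parse the fields we care about from an ICE candidate line:
// a=candidate:<foundation> <component> <transport> <priority> <ip> <port> typ <type> ...
function parseCandidate(line) {
  const parts = line.replace('a=candidate:', '').split(' ');
  return {
    transport: parts[2],        // udp, tcp, or Chrome's ssltcp
    ip: parts[4],
    port: Number(parts[5]),
    type: parts[7]              // host, srflx, relay, ...
  };
}

const c = parseCandidate(
  'a=candidate:3430656991 1 ssltcp 2113937151 66.102.1.127 443 typ host generation 0'
);
```

Note that all six candidates are of type "host": the server advertises its real public addresses directly, which is exactly why STUN and TURN add nothing here.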

Codecs


There is no simulcast, which is expected: Chrome has no H.264 simulcast support. There is, however, a bug report with a sad lack of feedback. Overall, H.264 is a reasonable choice: the encoding side can offload the work to the video card, and most players can play this format back without transcoding.

Transcoding can't be avoided entirely, however: without simulcast, the server has to produce lower-bitrate, lower-resolution streams for "weak" clients itself. Most likely YouTube already has transcoding as part of the infrastructure it has long used for streaming.

WebRTC statistics


The statistics alone don't reveal anything new. The most interesting chart is "picture loss indications" (PLI), data sent by the server (translator's note: WebRTC statistics are interesting because each end of the connection both collects local statistics and receives the remote ones; we wrote about this last week):



pliCount increases every 10 seconds, and accordingly the client sends a keyframe to the server every 10 seconds. This is probably done to make it easier for YouTube servers to record or transcode the video.
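This pattern is easy to verify from two stats samples: divide the elapsed time by the growth in pliCount. A tiny helper (our own sketch; the exact field names in getStats output vary between stats API versions):

```javascript
// Estimate the keyframe request interval from two pliCount samples
// taken from WebRTC stats (timestamps in milliseconds).
function keyframeIntervalSeconds(sampleA, sampleB) {
  const plis = sampleB.pliCount - sampleA.pliCount;
  if (plis <= 0) return null;               // no PLIs in this window
  return (sampleB.timestamp - sampleA.timestamp) / plis / 1000;
}

const interval = keyframeIntervalSeconds(
  { timestamp: 0,     pliCount: 0 },
  { timestamp: 60000, pliCount: 6 }
);   // 6 PLIs over 60 seconds: one keyframe request every 10 seconds
```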

Summary


YouTube uses WebRTC as a user-friendly way to get a video stream from the camera. This will most likely not affect professional streamers with expensive, finely tuned rigs, but it significantly lowers the entry barrier for beginners.

Unfortunately, the feature does not work in Firefox. This is yet another example of Google shipping solutions that only work in Chrome. Nils Ohlmeier from Mozilla tried to get it running by faking the user agent, but ran into the use of the obsolete registerElement JavaScript API. From the WebRTC standpoint, though, everything should work, so we'll return to this question once the frontend bugs are fixed.

Update: Unfortunately, further study showed that the JavaScript code of this feature also uses the webkitRTCPeerConnection API instead of the modern RTCPeerConnection. We look forward to the prefix being removed in Chrome.

Source: https://habr.com/ru/post/358222/

