WebRTC is a JavaScript API for video calls in modern browsers, and also for voice calls, screen sharing, NAT traversal, disclosing the local address, and other interesting things. Over the past couple of years, major players have started switching from proprietary APIs and browser extensions to WebRTC: Skype for Web runs on it, Hangouts partially, and now YouTube lets you broadcast directly from the browser. So far only from Chrome, and with a five-second delay, but every big thing has a small beginning. Below the cut is an adaptation for Habr of a detective story in which WebRTC experts pick apart the YouTube client code and explain what the Google developers did and how.
Last Thursday, logging into my YouTube account, I found a new camera icon with a “Go Live” tooltip in the upper-right corner (translator's note: apparently it has not been rolled out to all users yet; judging by the comments, YouTube Red subscribers have it). Naturally, I clicked it immediately, and it seems we can now stream directly from the browser. It smacked of WebRTC, so out of habit I opened chrome://webrtc-internals/, and yes, it was WebRTC. As developers, we have always been interested in large-scale uses of the technology, so I immediately contacted reverse-engineering master Philipp “fippo” Hancke and asked him to dig into YouTube's internals. Now we can get acquainted with the results of his work.
The Chrome service page webrtc-internals served us well back in 2014, when we figured out how Hangouts works, and nothing prevented us from using it again. Since YouTube makes the new feature available to broadcasters only 24 hours after they sign up, we took advantage of a dump kindly provided by Tsahi Levent-Levi (translator's note: yes, the same Tsahi who spoke with us at Intercom and whom we regularly translate). You can use this tool to load a dump into Chrome and see what is happening through the eyes of WebRTC.
Judging by what we saw, the new YouTube feature uses WebRTC only on the client side, to capture the camera stream; on the server side they have something of their own. What does that mean? That it is not real time. Although our long-time good friend Chris Koehncke says the delay is less than five seconds. We really hope he will dig up some interesting technical details.
In the meantime, let's dig into the technical details we managed to extract ourselves...
getUserMedia calls
After importing the dump, right at the beginning we see the getUserMedia JavaScript API calls that YouTube makes. The calls show that the service modestly wants the camera at 1080p resolution:
And a separate getUserMedia call is made to get the microphone.
What this screenshot does not show is the very first getUserMedia call, which requests the camera and microphone at once, so that the user sees a single browser permission prompt instead of two.
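The exact constraint objects are not reproduced in the text, so here is a plausible sketch of what such calls look like; the 1920x1080 figures follow from the 1080p request described above, everything else is illustrative:

```javascript
// Hypothetical sketch, not YouTube's actual code: constraint objects for
// the three getUserMedia calls described above.

// The very first call: camera and microphone together, so the user sees a
// single permission prompt instead of two.
function combinedConstraints() {
  return {
    audio: true,
    video: { width: { ideal: 1920 }, height: { ideal: 1080 } }
  };
}

// A separate call for the camera alone, modestly asking for 1080p.
function cameraConstraints() {
  return {
    audio: false,
    video: { width: { ideal: 1920 }, height: { ideal: 1080 } }
  };
}

// And a separate call for just the microphone.
function micConstraints() {
  return { audio: true, video: false };
}

// In the browser these would be used roughly as:
//   const stream = await navigator.mediaDevices.getUserMedia(combinedConstraints());
```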
RTCPeerConnection calls
Having examined the getUserMedia calls, we can move on to the RTCPeerConnection calls. If you want to learn more about WebRTC, I recommend the results of our previous study, "How Hangouts Works", or the more general material about webrtc-internals on our TestRTC blog.
ICE Servers
The log shows that the RTCPeerConnection object was created with an empty list of ICE servers (translator's note: it is no surprise that this only works in Chrome so far; Firefox would not allow such an object to be created at all).
{ iceServers: [], iceTransportPolicy: all, bundlePolicy: balanced, rtcpMuxPolicy: require, iceCandidatePoolSize: 0 }
Later it will become clear why TURN servers are not needed for this use case (translator's note: ICE is a “framework”, a textual set of instructions for establishing peer-to-peer connections between hosts with sad 192.168.x.x addresses. TURN servers are not the most important part of that framework; the most important are STUN servers, which answer the fundamental question “what is my external IP address?”. Without at least one STUN server specified, most WebRTC setups simply will not work).
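For comparison, here is the configuration from the log next to what a typical peer-to-peer application would pass instead (the STUN URL is Google's well-known public server, used here purely as an example):

```javascript
// The configuration YouTube used, as seen in the webrtc-internals log:
// no ICE servers at all.
const youtubeConfig = {
  iceServers: [],
  iceTransportPolicy: 'all',
  bundlePolicy: 'balanced',
  rtcpMuxPolicy: 'require',
  iceCandidatePoolSize: 0
};

// What a typical peer-to-peer app would pass instead: at least one STUN
// server, so that each side can learn its external address.
const typicalConfig = {
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
};

// In the browser: new RTCPeerConnection(youtubeConfig);
```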
The client then adds the MediaStream using the addStream API. Funnily enough, this API is deprecated. It is strange that the authors do not use the newer addTrack API, which is available natively starting with Chrome 64 and, in older versions, via the adapter.js polyfill.

Offer and setLocalDescription
After creating the RTCPeerConnection object, the client creates a WebRTC “offer” with a list of all audio and video codecs available in Chrome. The offer is set, without any modifications, as the description of the local endpoint via setLocalDescription. The lack of modifications, by the way, means that simulcast (simultaneously broadcasting several streams of different video quality, which lets the server avoid re-encoding everything and reduces delay and load) is not used.
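The "no modifications" observation is telling because legacy simulcast in Chrome is usually enabled by munging the SDP between createOffer and setLocalDescription, grouping several SSRCs with an a=ssrc-group:SIM line. A toy checker (our own helper, not YouTube code) shows what such a check looks like:

```javascript
// Toy helper (not YouTube code): detect whether an SDP offer was munged
// for legacy Chrome simulcast before being passed to setLocalDescription.
// That munging adds an "a=ssrc-group:SIM ..." line grouping several SSRCs
// that carry the same video at different resolutions.
function usesSimulcastMunging(sdp) {
  return sdp.split('\n').some(line => line.startsWith('a=ssrc-group:SIM'));
}

const plainOffer =
  'v=0\nm=video 9 UDP/TLS/RTP/SAVPF 102\na=ssrc:1111 cname:abc';
const mungedOffer =
  plainOffer + '\na=ssrc-group:SIM 1111 2222 3333';

usesSimulcastMunging(plainOffer);  // false, which is what we see in the dump
usesSimulcastMunging(mungedOffer); // true
```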
Following WebRTC logic, after setLocalDescription is called, Chrome gathers several “candidates”: options for how a remote machine could try to connect to the local one. Most likely they go unused, since it is the client (Chrome) that connects to the server (the YouTube backend).
Update: finding the signaling server and the protocol used was not very difficult. Filtering the Chrome network log for the keyword "realtimemediaservice" shows us the HTTP request and its response. No tricky schemes, no trickle-ICE connection-setup optimizations, no other magic; everything is as simple as possible.
setRemoteDescription
The next step is calling setRemoteDescription with the information received from the server, where, as we remember, WebRTC is not used. And this is where it gets interesting! The SDP passed to setRemoteDescription looks as if it had been produced on the other side by Chrome, or by a WebRTC library with the full list of codecs at the ready. And we know for sure that YouTube does not use “ice-lite”, as Hangouts does.
In the SDP received from the server, the H.264 codec is indicated as preferred (payload number 102; see here if you are interested in how SDP text packets are structured):
m=video 9 UDP/TLS/RTP/SAVPF 102 96 97 98 99 123 108 109 124
Examining the statistics (partially displayed once the dump is loaded) confirms that the H.264 codec is used; the curious can search the dump for the keyword "send-googCodecName".
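In SDP, the payload types listed after the transport profile on the m= line are in preference order, so the first one wins. A toy parser (our own helper) makes that explicit:

```javascript
// Toy parser (our own helper, not YouTube code) for the m=video line above.
// The tokens after "m=<media> <port> <proto>" are RTP payload types in
// preference order, so the first one (102) is the preferred codec.
function payloadPreference(mLine) {
  // "m=video 9 UDP/TLS/RTP/SAVPF 102 96 97 ..." -> ["102", "96", "97", ...]
  return mLine.trim().split(' ').slice(3);
}

const mLine = 'm=video 9 UDP/TLS/RTP/SAVPF 102 96 97 98 99 123 108 109 124';
payloadPreference(mLine)[0]; // "102", mapped to H.264 by its a=rtpmap line
```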
In addition to the SDP answer, the server sends Chrome several candidates for establishing the connection:
a=candidate:3757856892 1 udp 2113939711 2a00:1450:400c:c06::7f 19305 typ host generation 0 network-cost 50
a=candidate:1687053168 1 tcp 2113939711 2a00:1450:400c:c06::7f 19305 typ host tcptype passive generation 0 network-cost 50
a=candidate:1545990220 1 ssltcp 2113939711 2a00:1450:400c:c06::7f 443 typ host generation 0 network-cost 50
a=candidate:4158478555 1 udp 2113937151 66.102.1.127 19305 typ host generation 0 network-cost 50
a=candidate:1286562775 1 tcp 2113937151 66.102.1.127 19305 typ host tcptype passive generation 0 network-cost 50
a=candidate:3430656991 1 ssltcp 2113937151 66.102.1.127 443 typ host generation 0 network-cost 50
We can see IPv4 and IPv6 UDP candidates, “ICE-TCP” candidates (yes, in a pinch WebRTC can run over TCP, although it does not like doing so), and the pseudo-SSL-TCP candidates on port 443 that we are used to seeing in Hangouts. In this scenario a TURN server would not improve the chances of establishing a connection, since either way it would be Chrome connecting to a real public IP address. Apparently that is why no TURN server is used.
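To see the structure of those candidate lines, here is a toy parser (our own helper) that pulls out just enough fields to classify the transports on offer:

```javascript
// Toy parser (our own helper, not YouTube code) for the a=candidate lines
// above, extracting enough fields to see which transports the server offers.
// Layout: a=candidate:<foundation> <component> <transport> <priority>
//         <ip> <port> typ <type> ...
function parseCandidate(line) {
  const parts = line.replace('a=candidate:', '').split(' ');
  return {
    transport: parts[2].toLowerCase(), // "udp", "tcp", or Chrome's "ssltcp"
    ip: parts[4],
    port: Number(parts[5]),
    type: parts[7]                     // "host" for all candidates here
  };
}

const c = parseCandidate(
  'a=candidate:1545990220 1 ssltcp 2113939711 2a00:1450:400c:c06::7f 443 ' +
  'typ host generation 0 network-cost 50'
);
// c.transport === 'ssltcp', c.port === 443, c.type === 'host'
```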
Codecs
There is no simulcast, which is in fact expected: Chrome has no H.264 simulcast. There is a bug report about it, with a sad lack of feedback. Overall, H.264 is a reasonable choice: the encoding side can use the video card to lighten the load, and most players can play the format back without transcoding.
That said, transcoding cannot be avoided entirely: without simulcast, the server has to produce lower-bitrate, lower-resolution streams for “weak” clients. Most likely YouTube already has transcoding as part of the infrastructure it has long been using for streaming.
WebRTC statistics
The statistics on their own do not reveal anything new. The most interesting chart is “picture loss indications” (PLI), data sent by the server (translator's note: WebRTC statistics are interesting because each end of the connection both collects local statistics and receives the remote ones; we wrote about this last week):
pliCount increases every 10 seconds and, accordingly, every 10 seconds the client sends a keyframe to the server. This is probably done to make it easier for YouTube's servers to record or transcode the video.
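The 10-second figure is easy to derive from the stats: pliCount is cumulative, so the interval is elapsed time divided by the growth in the counter. A toy illustration (our own helper, with made-up sample values):

```javascript
// Toy illustration (our own helper, not YouTube code): given timestamped
// cumulative pliCount samples from getStats, estimate how often the server
// requests a keyframe via PLI.
function averagePliIntervalSeconds(samples) {
  // samples: [{ t: seconds, pliCount: cumulative count }, ...]
  const first = samples[0];
  const last = samples[samples.length - 1];
  const plis = last.pliCount - first.pliCount;
  return plis === 0 ? Infinity : (last.t - first.t) / plis;
}

// pliCount growing by one every ~10 seconds, as observed in the dump
// (the sample values themselves are invented for the example):
averagePliIntervalSeconds([
  { t: 0, pliCount: 0 },
  { t: 30, pliCount: 3 }
]); // 10
```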
Summary
YouTube uses WebRTC as a user-friendly way of getting streaming video from a camera. This is unlikely to affect professional streamers with expensive, finely tuned rigs, but it significantly lowers the entry barrier for beginners.
Unfortunately, the feature does not work in Firefox. It is yet another example of Google launching solutions that only work in Chrome.
Nils Ohlmeier from Mozilla tried to get it working by faking the user agent, but ran into the use of the obsolete registerElement JavaScript API. From the WebRTC point of view, however, everything should work, so we will return to this question once the frontend bugs are fixed.
Update: unfortunately, further investigation showed that the JavaScript code of this feature also uses the webkitRTCPeerConnection API instead of the modern RTCPeerConnection. We look forward to the prefix being removed in Chrome.