
A gentle dive into the guts of WebRTC voice and video transmission

WebRTC is an interesting technology, but a confusing one. First of all because it is not a single technology but a whole combination of them: capturing video from the camera and sound from the microphone; establishing a peer-to-peer connection between two browsers, punching through NAT wherever possible; transmitting audio and video over that connection with the understanding that it is realtime data, which means codecs, bandwidth, frame loss and all that; and finally, playing back the result in the window of another browser. Or not a browser, but that is beyond our scope here. Oh, and one more thing: realtime transmission of arbitrary user data in the same way, for games, sensors and everything else where TCP WebSocket lag is unacceptable. At Voximplant we are constantly digging into the guts of this technology so that customers get high-quality sound and video in all cases, and not just on the local 100-megabit network. So we were very pleased to read an interesting article last week that explains how to dig into these guts properly. We offer you an adapted translation, specially for Habr!

WebRTC 1.0 uses SDP so that the two connecting parties can learn each other's capabilities. Many people dislike relying on a protocol from 1990s telephony, but the harsh reality is that SDP will be with us for a long time. And if you want to get really deep into the guts of WebRTC, switching codecs or changing the channel bandwidth, you will have to get your hands dirty with SDP.

Recently, a conference on WebRTC was held in Boston. Nick Gauthier from MeetSpace described how he modified SDP and used other tricks to run a video conference for 10 people without a single server, meaning each browser sent a stream to 9 others. Such tasks come up infrequently, but the ability to manually control the bandwidth of a WebRTC channel can be very useful. The video of the presentation can be seen here. And below I will explain how he did it all.

Left to its own devices, PeerConnection uses all available bandwidth to provide the maximum video quality. Or audio quality. Which is great if the video conference is the only thing your computer is doing right now. But what if you are using Gmail in parallel? Or you are on a mobile connection with fluctuating bandwidth? Or, as with us at MeetSpace, you establish a 10-way call and the PeerConnections compete with each other?
In this post I want to show you how to parse and modify SDP on the fly with JavaScript in order to cap the bandwidth used.

Where to modify SDP


First we need to get the SDP data. The very first SDP blob is created when the PeerConnection object creates an Offer, which you then need to pass to the other negotiating side:

peerConnection.createOffer(
  function(offer) {
    console.debug("The offer SDP:", offer.sdp);
    peerConnection.setLocalDescription(offer);
    // your signaling code to communicate the offer goes here
  }
);
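In newer browsers the same flow can be written with the promise-based createOffer API. This is a sketch, not code from the original post; negotiateOffer and sendToPeer are illustrative names, not part of WebRTC:

```javascript
// A sketch of the same offer flow with the promise-based API.
// negotiateOffer and sendToPeer are illustrative names, not part of WebRTC.
function negotiateOffer(peerConnection, sendToPeer) {
  return peerConnection.createOffer()
    .then(function(offer) {
      console.debug("The offer SDP:", offer.sdp);
      // setLocalDescription first, then hand the offer to signaling
      return peerConnection.setLocalDescription(offer).then(function() {
        return offer;
      });
    })
    .then(function(offer) {
      // your signaling code to communicate the offer goes here
      sendToPeer(offer);
    });
}
```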


What do we need to do? Modify the SDP before handing it to the other side. Fortunately, WebRTC deliberately leaves "signaling" out of the standard, and transferring Offers between the two connecting parties is the developer's responsibility:

peerConnection.createOffer(
  function(offer) {
    peerConnection.setLocalDescription(offer);
    // modify the SDP after calling setLocalDescription
    offer.sdp = setMediaBitrates(offer.sdp);
    // your signaling code to communicate the offer goes here
  }
);


In the code above we call the setMediaBitrates function, which applies the modifications we need and returns the modified SDP (I will explain the details later). A curious nuance: you cannot change the SDP between the calls to createOffer / createAnswer and setLocalDescription. So we change it just before passing it to the other party. When the packet reaches the other side, we will also have to change the second SDP packet, the one WebRTC creates there as the "Answer". This is necessary because the "Offer" says "this is the bandwidth I can use", but the "Answer" also says "and this is the bandwidth I can use". The pipe has to be limited at both ends:

peerConnection.setRemoteDescription(new RTCSessionDescription(offer)).then(function() {
  peerConnection.createAnswer().then(function(answer) {
    peerConnection.setLocalDescription(answer);
    // modify the SDP after calling setLocalDescription
    answer.sdp = setMediaBitrates(answer.sdp);
    // your signaling code to communicate the answer goes here
  });
});


Now that we have chosen the places to modify the SDP, we can begin the modification itself!

How to parse SDP


I highly recommend reading Antón Román's post "Anatomy of a WebRTC SDP", which will help you understand what SDP is and how it works; my own adventure started with that post. I also recommend the specification, RFC 4566, SDP: Session Description Protocol. The link takes you to section 5 on page 7, where the format is described. For those who don't like reading long specs, the short summary: SDP is UTF-8 text broken into lines of the form "<type>=<value>".
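To make the "type=value" structure concrete, here is a tiny invented SDP fragment and how its one-character line types split apart in JavaScript (a sketch, not taken from the original post):

```javascript
// A minimal, invented SDP fragment: every line is "<type>=<value>"
var sdp =
  "v=0\r\n" +
  "o=- 46117317 2 IN IP4 127.0.0.1\r\n" +
  "s=-\r\n" +
  "m=audio 49170 RTP/AVP 0\r\n" +
  "a=rtpmap:0 PCMU/8000\r\n";

// The type is the single character before "=" on each line
var types = sdp.trim().split("\r\n").map(function(line) {
  return line.charAt(0);
});
console.log(types); // ["v", "o", "s", "m", "a"]
```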

Note an important thing hidden in the depths of section 5 of the spec: the order of the type fields. I won't repeat a huge chunk of text here; again, the short version. The SDP starts with a session-level section, followed by repeated "media descriptions". Within them the order of fields is always the same: "m", "i", "c", "b", "k", "a".

That's not all. Next you need to look at RFC 3556, Session Description Protocol (SDP) Bandwidth Modifiers for RTP Control Protocol (RTCP) Bandwidth. This specification describes how to set the bandwidth using a field of type "b". The corresponding SDP line is "b=AS:XXX", where XXX is the bandwidth we want to set. The abbreviation "AS" stands for "Application Specific Maximum", that is, the maximum allowed bandwidth. The RFC also tells us that the value is specified in kilobits per second, kbps. So our code will work according to this algorithm:

Find the "m=audio" or "m=video" line;
skip the lines of type "i" and "c" that follow it;
if the next line is of type "b", replace it with our value;
otherwise insert a new line of type "b".
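As a quick illustration of the "b=AS:XXX" form (the media section below is invented, and getBandwidth is our own helper name, not part of any API):

```javascript
// Read the bandwidth cap out of a "b=AS:XXX" line, in kbps
function getBandwidth(sdp) {
  var match = sdp.match(/^b=AS:(\d+)/m);
  return match ? parseInt(match[1], 10) : null;
}

var mediaSection =
  "m=video 9 UDP/TLS/RTP/SAVPF 96\r\n" +
  "c=IN IP4 0.0.0.0\r\n" +
  "b=AS:500\r\n";
console.log(getBandwidth(mediaSection)); // 500
console.log(getBandwidth("m=audio 0 RTP/AVP 0\r\n")); // null
```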


How to modify SDP


In most WebRTC video calls the SDP will contain one media description for video and one for audio. In our example we limit the video stream to 500 kbps and the audio stream to 50 kbps:

function setMediaBitrates(sdp) {
  return setMediaBitrate(setMediaBitrate(sdp, "video", 500), "audio", 50);
}

function setMediaBitrate(sdp, media, bitrate) {
  var lines = sdp.split("\n");
  var line = -1;
  for (var i = 0; i < lines.length; i++) {
    if (lines[i].indexOf("m="+media) === 0) {
      line = i;
      break;
    }
  }
  if (line === -1) {
    console.debug("Could not find the m line for", media);
    return sdp;
  }
  console.debug("Found the m line for", media, "at line", line);

  // Pass the m line
  line++;

  // Skip i and c lines
  while (lines[line].indexOf("i=") === 0 || lines[line].indexOf("c=") === 0) {
    line++;
  }

  // If we're on a b line, replace it
  if (lines[line].indexOf("b") === 0) {
    console.debug("Replaced b line at line", line);
    lines[line] = "b=AS:"+bitrate;
    return lines.join("\n");
  }

  // Add a new b line
  console.debug("Adding new b line before line", line);
  var newLines = lines.slice(0, line);
  newLines.push("b=AS:"+bitrate);
  newLines = newLines.concat(lines.slice(line, lines.length));
  return newLines.join("\n");
}
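A quick sanity check of the approach, with an invented minimal SDP; the helper below condenses the same algorithm (find the m-line, skip "i"/"c" lines, then replace or insert the "b" line):

```javascript
// Condensed version of the same algorithm, for demonstration only:
// find the m-line, skip i/c lines, then replace or insert the b-line.
function setMediaBitrate(sdp, media, bitrate) {
  var lines = sdp.split("\n");
  var i = lines.findIndex(function(l) { return l.indexOf("m=" + media) === 0; });
  if (i === -1) return sdp; // no such media section
  i++;
  while (lines[i] && (lines[i].indexOf("i=") === 0 || lines[i].indexOf("c=") === 0)) {
    i++;
  }
  if (lines[i] && lines[i].indexOf("b=") === 0) {
    lines[i] = "b=AS:" + bitrate; // replace an existing b line
  } else {
    lines.splice(i, 0, "b=AS:" + bitrate); // insert a new b line
  }
  return lines.join("\n");
}

var sdp = "v=0\nm=video 9 RTP/AVP 96\nc=IN IP4 0.0.0.0\na=rtpmap:96 VP8/90000";
console.log(setMediaBitrate(sdp, "video", 500));
// The b=AS:500 line lands between the c line and the a line
```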


That's all! To be honest, I was quite intimidated when I first encountered SDP: there is an overwhelming number of small details you need to understand. But by and large it is just a set of lines, each of which defines something about the connection. We don't even need regexps, since the sections always come in the same order. In our case we simply replaced the line of type "b", so we didn't even have to really parse anything.

I hope this article helps you better understand how WebRTC works and how to bend it to your needs.

Source: https://habr.com/ru/post/316840/

