Video calls through the browser - how to make technology work for your company

I got curious about how to set up video calls through the browser inside a company, and how useful that would be. Besides, Skype is “listened to”, and passwords sent through it get parsed by robots ...

There is Google+ Hangouts, of course, and people use it a lot, but it is still not WebRTC: it is a proprietary cloud technology. Who knows whether colleagues from another ~~competitor~~ company, with notebooks and sincere smiles on their shining faces, are watching our business planning meeting?

So the topic of private, secure video conversations inside the company is as relevant as ever. Many people need it, but how do you set it up? We did it. It turns out to be fairly simple, if you know how :-) (that is, after studying a dozen RFCs, W3C standards and their implementations, and getting to the bottom of why things work the way they do).

Below I will try to lay out the main technological pitfalls of the implementation, the rakes we have stepped on and will most likely step on again, and for dessert: a short technology digest and a business TODO, without brain-melting details.

Technology

One of the breakthrough technologies of HTML5 is undoubtedly WebRTC, developed by the W3C with the support of Google, Mozilla and Opera.

Its essence: with a camera, a microphone and a browser on my computer, I can make a video call to another person with a browser, camera and microphone anywhere, whether a colleague at the next desk or the boss sunbathing on a beach with a laptop, surrounded by beautiful female bodies. The connection is encrypted, that's one, and it goes only between the two of you, peer-to-peer, that's two. Cool? But that is not all.

The action ~~takes place with the use of strong black magic~~ is almost transparent, "inside" the browser: you and your programmers do not have to bother with codecs, firewalls, negotiation, SIP and so on. In the simplest case, video calls are, at first glance, solved by a dozen lines of JS code, and here it is, happiness: private video conversations work in your company.

Do not delve into this scheme, otherwise the brain may explode. Decades of media technologies and protocols have been crammed into the browser: a television studio, a radio station and a telephone exchange.

But as you know, the devil is in the details. Let's talk about them ...

Signaling

Descriptions of WebRTC come in two strikingly different flavors: romantic success stories of the "switched it on and it just worked" kind, and wild low-level hardcore crammed with terms like STUN, TURN, ICE, SDP ... That is, either you copy-paste and it works, though it is not clear how, or you spend half a year studying source code and RFCs :-)

One of the hardcore terms that drives people to despair is “signaling”. In fact, it performs three simple tasks:

1) Match up the configuration of the two browsers (audio / video streams, codecs, addresses and ports, in SDP format)

SDP: this is the nightmare that gets passed between browsers, and its details can be ignored. But an unpleasant aftertaste remains, even after reading the RFC.

2) Exchange the passwords (keys) needed to set up an encrypted connection between the browsers
3) Initiate actions: place a call (connect client A’s stream to client B’s stream via callbacks in JS), hang up, etc. That is, in the browser's JS you do very simple things with the RTCPeerConnection object, but inside that object is real hell.
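To make the SDP "nightmare" a little less scary, here is a minimal TypeScript sketch that pulls the media sections and codec names out of an SDP blob. The sample text is a hand-written, heavily shortened illustration rather than real browser output, and `parseSdpMedia` is a hypothetical helper, not part of any WebRTC API.

```typescript
// One media section ("m=" line) of an SDP description.
interface MediaSection {
  kind: string;     // "audio" or "video"
  port: number;
  codecs: string[]; // codec names taken from "a=rtpmap" lines
}

// Hypothetical helper: extracts media sections and their codecs from SDP text.
function parseSdpMedia(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  const lines = sdp.split(/\r?\n/);
  for (const line of lines) {
    if (line.indexOf("m=") === 0) {
      // m=<media> <port> <proto> <payload types...>
      const parts = line.substring(2).split(" ");
      sections.push({ kind: parts[0], port: parseInt(parts[1], 10), codecs: [] });
    } else if (line.indexOf("a=rtpmap:") === 0 && sections.length > 0) {
      // a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]
      const mapping = line.substring("a=rtpmap:".length).split(" ")[1];
      sections[sections.length - 1].codecs.push(mapping.split("/")[0]);
    }
  }
  return sections;
}

// Hand-written, shortened sample (real SDP is many times longer).
const sampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 0",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

const media = parseSdpMedia(sampleSdp);
```

For the sample above, `media` holds one audio section with the codecs opus and PCMU and one video section with VP8. The signaling layer itself never needs to do this parsing: it only carries the text from one browser to the other.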

That is, through signaling, which you can write in anything and any way you like, the browsers get matched up and you can ... make video calls. And, surprisingly, you do not need to understand what sits in the protocol stack: making calls turns out to be surprisingly simple.
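The three tasks above fit into a handful of message types. Below is a minimal TypeScript sketch of a signaling relay, assuming an in-memory registry of connected users. `SignalingHub`, the message shape and the user names are all hypothetical, and a real deployment would put a WebSocket or similar transport where the `deliver` callbacks are.

```typescript
// Message kinds a simple signaling channel has to carry.
type SignalKind = "offer" | "answer" | "candidate" | "hangup";

interface SignalMessage {
  kind: SignalKind;
  from: string;
  to: string;
  payload: string; // SDP text or an ICE candidate line; opaque to the hub
}

type Deliver = (message: SignalMessage) => void;

// Hypothetical in-memory relay: it never inspects the payload, it only routes it.
class SignalingHub {
  private clients: { [user: string]: Deliver } = {};

  register(user: string, deliver: Deliver): void {
    this.clients[user] = deliver;
  }

  unregister(user: string): void {
    delete this.clients[user];
  }

  // Returns true if the recipient is online and the message was handed over.
  send(message: SignalMessage): boolean {
    if (!Object.prototype.hasOwnProperty.call(this.clients, message.to)) {
      return false;
    }
    this.clients[message.to](message);
    return true;
  }
}

// Usage: Alice calls Bob, Bob answers, and nobody named Carol is online.
const hub = new SignalingHub();
const aliceInbox: SignalMessage[] = [];
const bobInbox: SignalMessage[] = [];
hub.register("alice", (m) => { aliceInbox.push(m); });
hub.register("bob", (m) => { bobInbox.push(m); });

hub.send({ kind: "offer", from: "alice", to: "bob", payload: "<SDP offer>" });
hub.send({ kind: "answer", from: "bob", to: "alice", payload: "<SDP answer>" });
const reachedCarol = hub.send({ kind: "offer", from: "alice", to: "carol", payload: "<SDP offer>" });
```

The hub deliberately knows nothing about SDP or ICE. That is the point of the design: business rules (who may call whom, employee lists and so on) live in this layer, while the media details stay inside the browsers.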

Signaling looks frightening in a diagram, but in fact the browsers simply “pass” SDP descriptions of media streams to each other via ICE ... Eh, I wanted to put it simply, and it did not work out.

Browsers finding each other

Calling is easy on a local network, but when employees are on different networks, and behind firewalls on top of that, the browsers cannot connect without outside help.

Technologically there is even more hardcore here than in the section above, I warn you right away, but explained in simple terms it comes down to this:
1) To “punch through” company firewalls, the employees' browsers must talk to a central server using the STUN / TURN protocols. Either your system administrators set up this server, or you use Google's free STUN server, which has limited capabilities (no TURN relay mode).
2) If the firewall cannot be “punched”, the only option left is to send the media streams through a third-party server rather than peer-to-peer between browsers. This mode of a TURN server is called “relay”. You will have to deal with this server yourself: there are open-source solutions, but you will need to configure them for WebRTC on your own. That said, according to statistics, only about 10% of video calls go through “relay” mode.

Once again, briefly: for video calls to work reliably inside your company, set up your own STUN / TURN server or use someone else's.
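In the browser, the STUN / TURN servers are handed to `RTCPeerConnection` as an `iceServers` list. A minimal TypeScript sketch of building that configuration follows. The TURN host name and credentials are placeholders, `buildIceConfig` is a hypothetical helper, and the Google STUN address is the commonly cited public one, quoted here as an assumption rather than a guarantee.

```typescript
// Shape of one "iceServers" entry, declared locally so the sketch is self-contained.
interface IceServer {
  urls: string[];
  username?: string;
  credential?: string;
}

interface IceConfig {
  iceServers: IceServer[];
}

// Hypothetical helper: a public STUN server plus an optional company TURN relay.
function buildIceConfig(turnHost?: string, username?: string, credential?: string): IceConfig {
  const servers: IceServer[] = [
    { urls: ["stun:stun.l.google.com:19302"] }, // STUN only: no relay mode
  ];
  if (turnHost) {
    servers.push({
      urls: ["turn:" + turnHost + ":3478?transport=udp"],
      username: username,
      credential: credential,
    });
  }
  return { iceServers: servers };
}

// Placeholder host and credentials, for illustration only.
const iceConfig = buildIceConfig("turn.example.com", "video-user", "change-me");
// In a browser this object would be passed on: new RTCPeerConnection(iceConfig)
```

Without the TURN entry, calls still work whenever the firewalls can be "punched"; the relay entry only matters for the remaining minority of calls.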

This is roughly how media streams travel between users' browsers: either directly or through a relay server. Yes, it is ugly and unaesthetic, but no better way to “punch” firewalls has been invented yet. Skype works on the same principle.

Without understanding that ICE is not ice but a technology, enshrined in an RFC, for choosing the optimal route for exchanging video streams, you cannot go further. The aftertaste from the excessive, ugly and illogical complexity of the STUN / TURN / ICE protocol stack has reached a critical level.
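How ICE prefers one route over another can be illustrated with the candidate priority formula recommended in the ICE RFC (RFC 5245): priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID). The TypeScript sketch below uses the type preferences recommended there (host 126, server-reflexive 100, relayed 0). The addresses are documentation placeholders, and real ICE goes on to rank candidate pairs and run connectivity checks, which this sketch leaves out.

```typescript
type CandidateType = "host" | "srflx" | "relay";

interface Candidate {
  type: CandidateType;
  address: string;
  localPreference: number; // 0..65535
  componentId: number;     // 1 = RTP
}

// Type preferences recommended by the ICE RFC: direct routes beat relayed ones.
const TYPE_PREFERENCE: Record<CandidateType, number> = {
  host: 126,
  srflx: 100,
  relay: 0,
};

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(c: Candidate): number {
  return 16777216 * TYPE_PREFERENCE[c.type] + 256 * c.localPreference + (256 - c.componentId);
}

// Highest priority first; the input array is left untouched.
function sortByPriority(candidates: Candidate[]): Candidate[] {
  return candidates.slice().sort((a, b) => candidatePriority(b) - candidatePriority(a));
}

// Placeholder addresses, in the order a browser might happen to gather them.
const gathered: Candidate[] = [
  { type: "relay", address: "198.51.100.7", localPreference: 65535, componentId: 1 },
  { type: "host", address: "192.168.1.10", localPreference: 65535, componentId: 1 },
  { type: "srflx", address: "203.0.113.5", localPreference: 65535, componentId: 1 },
];

const ranked = sortByPriority(gathered);
```

After sorting, the local (host) address comes first, the address seen through STUN second, and the relay last, which matches the intuition that a relay is the fallback rather than the default.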

Video conferencing

WebRTC does not support video conferencing directly. There are video / audio streams and there are browsers, and combining them is up to you. In Skype it is a paid service. Google+ Hangouts (which, by the way, does not use WebRTC, but works through its own Chrome plugin with specific codecs!) has a limit of 10-15 people.

See the difficulty? All video streams must be gathered somewhere, merged into a single, personal video stream for each participant, and sent back to that participant. That is, with 10 people, I take the frames from each participant's video stream, overlay them along the bottom of the lead frame, and send the resulting frame to a specific participant, and so on for everyone. And the frames must be assembled with codecs supported by WebRTC-compatible browsers. Imagine the amount of computation. There are open implementations of such an “MCU server”:
1) licode - IMHO still raw, and it has no real MCU, just multiplexing of streams.
2) MCU video multiconference server - but it did not start right away after being combined with the Chrome sources and the latest Asterisk, which supports WebRTC via WebSockets. And it is a bit scary: Java on top of Java, driven by more Java :-)
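The compositing step itself, before any decoding or encoding, can be sketched as plain geometry. This TypeScript sketch assumes one possible layout (equal thumbnails in a row along the bottom edge of the lead frame). `thumbnailLayout`, `framesComposedPerSecond` and the frame sizes are hypothetical and not taken from licode or any other MCU.

```typescript
interface ThumbnailRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Places `count` thumbnails side by side along the bottom of the lead frame,
// keeping the frame's aspect ratio for each thumbnail.
function thumbnailLayout(frameWidth: number, frameHeight: number, count: number): ThumbnailRect[] {
  const rects: ThumbnailRect[] = [];
  if (count <= 0) {
    return rects;
  }
  const width = Math.floor(frameWidth / count);
  const height = Math.floor(width * frameHeight / frameWidth);
  for (let i = 0; i < count; i++) {
    rects.push({ x: i * width, y: frameHeight - height, width: width, height: height });
  }
  return rects;
}

// Each participant gets a personal composite, so the MCU assembles
// one output frame per participant for every frame interval.
function framesComposedPerSecond(participants: number, fps: number): number {
  return participants * fps;
}

// Ten people: each personal frame shows the lead plus nine thumbnails.
const thumbs = thumbnailLayout(1280, 720, 9);
const composedPerSecond = framesComposedPerSecond(10, 30);
```

Under these assumptions a ten-person conference at 30 frames per second means 300 composed frames every second, each of which also has to be encoded, which is where the heavy computation comes from.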

It looks awesome, I agree.

Of course, there are commercial products and a whole industry for organizing videoconferences - but in this article we focus only on WebRTC.

With WebRTC you can do it more simply: in a video conference, each browser holds a video stream to every other participant. You have to program this in the browser in JS yourself and tie it together with signaling. Yes, traffic grows, but for calls among a few people inside the company this solution is impressively simple and effective.
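The price of this "everyone connects to everyone" approach is easy to estimate. A minimal TypeScript sketch, assuming every participant sends the same bitrate to each peer; `meshCost` is a hypothetical helper and the 500 kbit/s figure is an illustrative assumption, not a measurement.

```typescript
interface MeshCost {
  connectionsPerBrowser: number; // peer connections each browser maintains
  totalConnections: number;      // peer-to-peer links in the whole conference
  uplinkKbps: number;            // what each browser uploads
  downlinkKbps: number;          // what each browser downloads
}

function meshCost(participants: number, streamKbps: number): MeshCost {
  const peers = Math.max(participants - 1, 0);
  return {
    connectionsPerBrowser: peers,
    totalConnections: participants * peers / 2,
    uplinkKbps: peers * streamKbps,
    downlinkKbps: peers * streamKbps,
  };
}

const fourPeople = meshCost(4, 500);
const tenPeople = meshCost(10, 500);
```

With four people each browser uploads 1500 kbit/s over 3 connections; with ten it is already 4500 kbit/s over 9 connections, and the conference as a whole holds 45 links. That growth is why the approach suits calls among a few people.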

Browsers are different

Video calls work reliably between Chrome and Chrome. The other combinations of Chrome, Firefox and Opera either do not work, or work formally but not in practice. Still, reliable Chrome-Firefox video calls are promised by the New Year.

Looking at it philosophically, not every company is tied to a specific browser version, so you can move communicating colleagues to Chrome, perhaps just for the duration of the call. And the New Year is coming soon, well, you understand ...

WebRTC and telephony

And really, why not call from a browser to a landline / mobile phone and back? It is possible, but ... here you cannot do without diving into the technology of audio and video transmission. The first “shock” is the SIP protocol. IP telephony and SIP are practically inseparable, yet the WebRTC standard does not mention SIP at all :-) Whom to blame, and what to do?

It is actually simple: put in plain terms, SIP is a kind of powerful and flexible signaling (see above). But if you write the signaling yourself, why would you ~~need an intergalactic-level starship with~~ need SIP ~~photon rockets~~ ?

This little mismatch started to breed monsters, such as JS libraries ( ~~in assembly language~~ ) that understand SIP and can connect browsers over WebRTC. And Asterisk began to support WebSockets for integration with browsers that want to communicate via WebRTC ... But the obvious mismatch between the technologies and the complexity of the resulting stack call into question the use of SIP for your own WebRTC calls inside the company ... (tomatoes are flying, I understand, but you must agree that JS + SIP is overkill in this case, though it can be done).

SIP does what signaling does in WebRTC, only in a more serious, complex and distributed way.

An alternative way to solve this problem is to use the services of a company that specializes in such calls. You do not dive into the details: you just use a special JS library and ... you can call any landline or mobile number through the company's web service.

Thus, the task of calling a phone number from the browser is also solved.

Summing up

Let us now sum up the facts and weigh the opportunities and risks. You can deploy secure video calls in your company on your own and in reasonable time. The technologies are open and documented, and you do not have to climb into the jungle, since I have tried to describe all the risks objectively and to give a digest of the hardcore above.

To launch video calls via WebRTC inside a company, it turns out you need to:

1) write signaling that takes into account the specifics of the company's business processes, has access to employee lists in AD, etc.
2) set up 1-2 STUN / TURN servers outside the company's local network, so that video calls work from the company's different offices and from mobile devices. In some cases the video traffic will go through these servers, though it is encrypted.
3) try to integrate with very raw or rather complex products for multi-party video conferencing, or write your own WebRTC implementation of video conferencing
4) integrate with "gateways" to be able to make calls to and from regular phone numbers

Technologically, the tasks are solvable. But if you do not have time for all this, you can use, for example, our product or cloud service (free registration and 100 rubles on the account for any calls anywhere, so you can test the technology and see whether you need it), which supports all of the video calling technologies above out of the box and uses our cluster of video servers with STUN / TURN support.

Operating the technology

You have chosen an implementation and started using browser video calls inside the company. There is one thing that people either keep quiet about or do not know: in rare cases a call simply cannot get through. This happens when the browsers could not reach each other directly (establish a peer-to-peer connection via ICE) and try the last resort, connecting through an external relay server, but ... the system administrators have blocked outgoing UDP traffic in the employee's subnet. And the TURN relay always forwards media to the peer over UDP, and in the usual setup the browser also talks to the TURN server over UDP (details in RFC 5766). Allow that traffic, and video calls will work again.
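One way to spot this situation is to look at the ICE candidates the browser managed to gather: if a TURN server is configured but no `relay` candidate shows up, the path to the TURN server is probably blocked. A minimal TypeScript sketch, assuming candidate lines in the standard `candidate:... typ <type>` form; the sample lines, the addresses and the `diagnose` helper are made up for illustration.

```typescript
// Extracts the candidate type ("host", "srflx", "relay", ...) from a candidate line.
function candidateType(line: string): string | null {
  const match = /\btyp\s+(\w+)/.exec(line);
  return match ? match[1] : null;
}

// Hypothetical diagnostic: what kind of connectivity do the gathered candidates allow?
function diagnose(candidateLines: string[], turnConfigured: boolean): string {
  let hasRelay = false;
  let hasReflexive = false;
  for (const line of candidateLines) {
    const type = candidateType(line);
    if (type === "relay") {
      hasRelay = true;
    }
    if (type === "srflx") {
      hasReflexive = true;
    }
  }
  if (turnConfigured && !hasRelay) {
    return "no relay candidates: check that traffic to the TURN server is allowed";
  }
  if (!hasReflexive && !hasRelay) {
    return "only local candidates: calls will work inside the local network only";
  }
  return "ok";
}

// Made-up sample: UDP is blocked, so only a local (host) candidate was gathered.
const blockedCandidates = [
  "candidate:1 1 udp 2130706431 192.168.1.10 54321 typ host",
];

const verdict = diagnose(blockedCandidates, true);
```

In a browser the candidate lines would come from the `icecandidate` events of `RTCPeerConnection`; here they are plain strings so the logic can be checked on its own.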

Another common question is traffic and video size. In relay mode on a TURN server, one connection consumes hundreds of kilobytes per second: traffic depends directly on the size of the video in the browser. So if you expect heavy video conferencing in relay mode, plan for it in advance.
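A back-of-the-envelope estimate helps with that planning. The TypeScript sketch below assumes each relayed call has two participants and that every byte enters and leaves the TURN server once. `relayLoad` is a hypothetical helper, and the call count and per-stream rate are illustrative assumptions in the "hundreds of kilobytes per second" range.

```typescript
interface RelayLoad {
  inboundKBps: number;      // kilobytes per second arriving at the TURN server
  outboundKBps: number;     // kilobytes per second leaving it
  gigabytesPerHour: number; // inbound + outbound, counting 1 GB as 1,000,000 KB
}

function relayLoad(relayedCalls: number, streamKBps: number): RelayLoad {
  // Two participants per call, each sending one stream through the relay.
  const inbound = relayedCalls * 2 * streamKBps;
  // Everything the relay receives is forwarded to the other side.
  const outbound = inbound;
  return {
    inboundKBps: inbound,
    outboundKBps: outbound,
    gigabytesPerHour: (inbound + outbound) * 3600 / 1000000,
  };
}

// Assumed load: 10 simultaneous relayed calls at 200 KB/s per stream.
const relayEstimate = relayLoad(10, 200);
```

Under these assumptions the server receives 4000 KB/s and sends as much, which adds up to roughly 28.8 GB per hour, so a relay that serves conferences needs a correspondingly generous channel.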

That is probably everything interesting and useful I wanted to share. I wish everyone good luck in the coming New Year, working projects and happy employees!

Source: https://habr.com/ru/post/206200/
