
Our platform
VoxImplant consists of several parts: the cloud, the API, and SDKs for different platforms. The browser SDK connects to the cloud over WebSocket and lets you make and receive calls, both to other VoxImplant users and to regular phones. It used to rely on Flash, but modern browsers use WebRTC, a technology built specifically for voice and video. WebRTC is good but rather hard to use: peer-to-peer communication, one of the technology's key features, has to be set up entirely by hand. For two browsers to start a voice or video chat with each other, the developer has to gather the computers' IP addresses, somehow pass that information between the browsers, run NAT traversal, and feed it all to WebRTC. And if NAT traversal fails, the developer also has to provide a relay server to carry the data.
We recently came across an interesting article that covers the technical details of this “information transfer” between browsers. A translation adapted for Habr is below the cut.
Where is the signaling layer, and where is the transport layer?
WebRTC as a protocol does not include "signaling" mechanisms. This means that you, as a developer, will need to take care of them yourself.
The first step is to choose a protocol. Or, more precisely, two protocols: transport and signaling. In most cases we do not see the difference (or do not want to see it), but sometimes it matters a great deal. I recently received a question on one of my posts, and it prompted me to write this explanation.
Web browsers and WebRTC transport protocols
We need the transport protocol to send messages from one device to another. In this case, it does not matter what is inside the message or how the message is structured - only that it can be sent. And then received.
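To make this concrete, here is a minimal TypeScript sketch of what "transport" means in this sense: something that moves opaque messages from one side to the other and never looks inside them. The names Transport and LoopbackTransport are made up for illustration, and the in-memory pair only stands in for a real network connection.

```typescript
// Hypothetical names for illustration; not part of WebRTC or any library.
type MessageHandler = (payload: string) => void;

// A transport only promises two things: a message can be sent,
// and it can be received on the other side.
interface Transport {
  send(payload: string): void;
  onMessage(handler: MessageHandler): void;
}

// In-memory stand-in for a network connection between two endpoints.
class LoopbackTransport implements Transport {
  private handlers: MessageHandler[] = [];
  private peer: LoopbackTransport | null = null;

  static createPair(): [LoopbackTransport, LoopbackTransport] {
    const a = new LoopbackTransport();
    const b = new LoopbackTransport();
    a.peer = b;
    b.peer = a;
    return [a, b];
  }

  send(payload: string): void {
    if (this.peer === null) {
      throw new Error("transport is not connected");
    }
    // The payload is delivered as-is; its content is never inspected.
    for (const handler of this.peer.handlers) {
      handler(payload);
    }
  }

  onMessage(handler: MessageHandler): void {
    this.handlers.push(handler);
  }
}
```

Whatever is layered on top (SIP, XMPP, or your own JSON messages) is the signaling protocol; the transport does not care which one it carries.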
HTTP/1.1
Five years ago, browsers were simple as far as protocols go. Essentially, we had HTTP/1.1 and all the hacks on top of it, known as XHR, SSE, BOSH, and Comet. If you want to know more about how they work, leave a comment and I will try to explain in future articles, although you can easily find an explanation yourself with a little googling.
I call this group of solutions built on HTTP/1.1 workarounds. They use HTTP/1.1 because there was simply no alternative at the time, but they use it in ways that make little technical sense.
Yes, you can use REST. But, again, that is a minor detail on top of HTTP/1.1.
After that, three technologies emerged: WebSocket, WebRTC and, most recently, HTTP/2.
WebSocket
WebSocket was added to do what HTTP/1.1 cannot: provide a bidirectional mechanism in which both the client and the web server can send messages to each other. What those messages are, what they mean, and what format they use is up to the web page developer.
There is also socket.io and the less popular SockJS. Both provide a client-side mechanism that emulates WebSocket in cases where it cannot be used.
When WebSocket works well, socket.io and SockJS work well too. But sometimes it does not (more on this below, in the HTTP/2 part).
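As a rough illustration of "the developer decides what the messages are", here is a sketch of a thin signaling wrapper around a WebSocket-like object. SocketLike is a hypothetical minimal interface modelled on the parts of the browser WebSocket API used here (readyState, send, addEventListener). SignalingSocket and the JSON message shape are assumptions for the example, not part of any library.

```typescript
// Hypothetical minimal interface, modelled on the browser WebSocket API.
interface SocketLike {
  readyState: number;
  send(data: string): void;
  addEventListener(type: string, listener: (event: any) => void): void;
}

const SOCKET_OPEN = 1; // same numeric value as WebSocket.OPEN in browsers

// Sends and receives JSON messages; buffers outgoing ones until the socket opens.
class SignalingSocket {
  private socket: SocketLike;
  private pending: string[] = [];

  constructor(socket: SocketLike, onMessage: (message: unknown) => void) {
    this.socket = socket;
    socket.addEventListener("open", () => this.flush());
    // Assumes the server sends text frames containing JSON.
    socket.addEventListener("message", (event) => onMessage(JSON.parse(event.data)));
  }

  send(message: object): void {
    const text = JSON.stringify(message);
    if (this.socket.readyState === SOCKET_OPEN) {
      this.socket.send(text);
    } else {
      this.pending.push(text);
    }
  }

  private flush(): void {
    for (const text of this.pending) {
      this.socket.send(text);
    }
    this.pending = [];
  }
}
```

In a browser, the socket passed in would typically be created with new WebSocket(url), where the URL of your signaling server is whatever your service defines.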
WebRTC Data Channel
To some extent, the WebRTC Data Channel can be used for signaling.
Yes, you first need to negotiate the IP addresses and run ICE, and for that you need an additional signaling and transport layer (from the list in this post). But once the connection is established, you can use the data channel for signaling.
The Data Channel can carry signaling directly between two devices or through intermediaries, depending on the task.
Why use the Data Channel as a transport protocol?
- Lower signaling latency. The Data Channel is, in theory, the fastest option available.
- Lower server load. The server no longer receives every message just to forward it somewhere else; you send it only what is meant for it.
- Better privacy and security of personal data. When messages do not go through the server, it cannot peek at what is being sent, and may not even notice that an exchange is taking place.
In truth, though, this option is rarely used. In the WebRTC world, the signaling transport matters most BEFORE a connection is established, when the Data Channel is not yet available. And using the Data Channel of one connection as the signaling transport for another is odd.
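The availability problem can be sketched in a few lines of TypeScript. DataChannelLike is a hypothetical minimal interface modelled on the browser RTCDataChannel (readyState, send, onmessage); attachPeerSignaling and the SignalMessage shape are invented for this example. The returned sender reports false while the channel is not open, so the caller knows it must fall back to some other transport.

```typescript
// Hypothetical minimal interface, modelled on the browser RTCDataChannel.
interface DataChannelLike {
  readyState: string; // "connecting", "open", "closing" or "closed"
  send(data: string): void;
  onmessage: ((event: any) => void) | null;
}

// Assumed message shape; a real service defines its own.
interface SignalMessage {
  type: string;
  payload?: unknown;
}

// Wires a data channel up as a signaling transport.
// Returns a send function that yields false when the channel is not open.
function attachPeerSignaling(
  channel: DataChannelLike,
  onSignal: (message: SignalMessage) => void
): (message: SignalMessage) => boolean {
  channel.onmessage = (event) => {
    onSignal(JSON.parse(event.data) as SignalMessage);
  };
  return (message) => {
    if (channel.readyState !== "open") {
      return false; // not connected yet: use the server-based transport instead
    }
    channel.send(JSON.stringify(message));
    return true;
  };
}
```

In a browser, such a channel would come from createDataChannel on an RTCPeerConnection that was itself set up through some other signaling transport.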
HTTP/2
I have written about HTTP/2 before. Since then, HTTP/2 has spread further and become even more popular.
HTTP/2 removes many of the limitations of HTTP/1.1, so it may be a good contender for transporting signaling protocols in the near future.
How HTTP/2 may affect the need for WebSocket is well described by Alan Denis.
WebRTC Signaling Protocols
“Signaling” is where you express yourself. Or your service. You want one user to be able to connect with another. Or with a group of people who join a virtual room. You decide what types of messages you need, what they mean, what they look like, and so on.
This is your signaling protocol.
Unlike with the transport protocol, you are limited not by what the browser allows but by what you are trying to achieve.
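For a sense of scale, a signaling protocol for a simple "rooms" service might be nothing more than a handful of message types. The TypeScript sketch below is purely illustrative: the message names, fields, and the routeKey helper are assumptions, not a standard. Only the offer/answer/candidate vocabulary comes from WebRTC itself.

```typescript
// Hypothetical message set for a small room-based service.
type SignalingMessage =
  | { type: "join"; room: string; user: string }
  | { type: "leave"; room: string; user: string }
  | { type: "offer"; from: string; to: string; sdp: string }
  | { type: "answer"; from: string; to: string; sdp: string }
  | { type: "candidate"; from: string; to: string; candidate: string };

const KNOWN_TYPES = ["join", "leave", "offer", "answer", "candidate"];

// The wire format is plain JSON text, so any transport can carry it.
function encode(message: SignalingMessage): string {
  return JSON.stringify(message);
}

// Checks only the type tag; a real service would validate every field.
function decode(text: string): SignalingMessage {
  const value = JSON.parse(text);
  if (typeof value !== "object" || value === null || KNOWN_TYPES.indexOf(value.type) === -1) {
    throw new Error("unknown signaling message");
  }
  return value as SignalingMessage;
}

// What a message means for delivery: room-wide events versus user-to-user ones.
function routeKey(message: SignalingMessage): string {
  switch (message.type) {
    case "join":
    case "leave":
      return "room:" + message.room;
    case "offer":
    case "answer":
      return "user:" + message.to;
    case "candidate":
      return "user:" + message.to;
  }
}
```

Deciding on types like these, and on what the server does with each of them, is the design work that "signaling" refers to here.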
Consider the three main signaling protocols that are often used with WebRTC.
SIP
SIP came from the world of telephony. Its original transport was UDP. TCP and TLS were added later as transports, and then SCTP followed. There is no point in going through them, since you cannot use any of them from a browser. That is why WebSocket was added as a SIP transport and simply called “SIP over WebSocket”. SIP over WebSocket was standardized earlier than WebRTC (which itself was not yet standardized) and, among other things, already has its own RFC. Why does all this matter? Because in a browser, SIP can be used with WebRTC only over WebSocket.
So much for SIP. If you know SIP, love it, or need it, you can use it as the signaling protocol for WebRTC.
XMPP
I hate XMPP.
But I do not quite understand why. Perhaps because whenever I say something bad about it, all the hardcore fans / followers / fanatics of the XMPP protocol rush to defend it in the comments. And that makes me laugh.
XMPP is built around user status and instant messaging. If those are your only requirements, XMPP really does win, especially when the developer already knows what can be done with XMPP.
If you love XMPP enough, do not forget to reply in the comments below.
Proprietary
I hate NIH (not invented here). Even so, a signaling protocol of your own has many advantages.
Very often, all you want is to put two users on the same page. Nothing more. I know I am oversimplifying, but otherwise you end up carrying all the overhead of a general-purpose protocol that you will never need.
In many other cases, you really do not want to add another web server just to handle signaling. You want one server to serve your entire web application. That is how you end up with your own signaling protocol, even if you do not call it that or do not think of it as a signaling protocol.
How to make a choice?
Always start with the signaling protocol.
Use SIP if there is existing infrastructure or an external service that you want to connect to. If not, skip it.
If you love XMPP, or need user status information and instant messaging features, then use it.
If the service you are adding WebRTC to has its own logic, it may already have signaling of some kind. In that case, you simply add the messages you need to that proprietary signaling.
In all other cases, my advice is to use a proprietary signaling solution that precisely meets your requirements. You can even use a SaaS solution for this.