
All the pain of p2p development

Good day, Habr community! Today I would like to talk about a magical and wonderful project of the Tensor company: the remote assistant. It is a remote access system that connects millions of customers and operators within the common VLSI client base. The remote assistant is now closely integrated with online.sbis.ru. Every day we register more than ten thousand connections and dozens of hours of session time. In this article we will talk about how we establish p2p connections and what we do when that is not possible.



Experience is the son of difficult mistakes


There are quite a few remote access systems out there: all sorts of variations of free VNC, as well as fairly powerful paid solutions offering a wide range of features. Initially, our company used an adaptation of one of them, UltraVNC. It is an excellent free system that lets you connect to another PC if you know its IP. What to do when the PC has only indirect access to the Internet has already been covered on Habr, so we will not touch on that topic here. Such a solution is sufficient only for a relatively small number of simultaneous connections. One step to the left, one step to the right, and the headaches begin: scaling, usability, integration into the overall system, and the complexity of the improvements that inevitably appear during the software life cycle. We ran into all of them.

So it was decided to reinvent the wheel and build our own system for managing remote desktops, one that could be integrated into the common VLSI ecosystem. Of course, the simplest way to pair two PCs, and the one that everyone and their dog uses, is a numeric identifier. In our implementation we use random 6-digit numbers without binding them to a specific client.
One very famous person once said:

Theory is when everything is known, but nothing works.
Practice is when everything works, but no one knows why.
We combine theory and practice: nothing works ...
and no one knows why!

At the very beginning of our journey this quotation rang very true: we understood how to "introduce" the client and the operator to each other, but in practice things turned out to be not so trivial.

Introduction to p2p


To connect two devices, we use a signal server: an intermediary that both sides can reach. Its role is to register the participants and exchange information between them in real time. Through it, without unnecessary fuss, we exchange endpoints (an IP address and port pair, i.e. an access point) in order to establish a connection.



This signal server, which we call the remote helper manager (RHM), is a pool of Node.js services that keep the whole system fault-tolerant. Well, more precisely, "fault-tolerant"... we hope so :). Connections are distributed across the servers in round-robin fashion, so the client and the operator may end up on different servers, and all the mechanics of synchronizing and coordinating them are completely removed from the desktop application.

All the work comes down to exchanging service packets, which let the parties unambiguously identify each other and perform certain actions more or less synchronously: for example, starting the collection of connection candidates or beginning the connection attempt itself.
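
To make the idea more concrete, here is a tiny sketch of what such service packets might look like; the JSON format and field names are assumptions for illustration, not the actual RHM protocol.

```python
import json

# Two hypothetical service packets passed through the signal server (RHM).
# The field names are illustrative only and are not the real protocol.
register = json.dumps({
    "type": "register",
    "session": 123456,        # the 6-digit session number
    "role": "operator",
})

candidates = json.dumps({
    "type": "candidates",
    "session": 123456,
    "endpoints": [
        {"transport": "udp", "ip": "203.0.113.10", "port": 12345},
        {"transport": "tcp", "ip": "203.0.113.10", "port": 54321},
    ],
})

print(register)
print(candidates)
```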

By the way, do not repeat our mistake and shoot yourself in the foot: if you use TCP port 443, use TLS rather than plain traffic. More and more firewalls block plain traffic on that port and break the connection, often on the provider's side.


The most common transport protocols on the Internet are UDP and TCP. UDP is fast and simple, but it has no built-in way to guarantee packet delivery or ordering. TCP is free of these shortcomings but is a little trickier when establishing a p2p connection. And given the latest trends, it seems to me that direct TCP connections may even sink into oblivion.

Whether a p2p connection can be established does not always depend on the ability to work with network protocols. For the most part it depends on the specific network configuration, most often the type of NAT (Network Address Translation) and/or the firewall settings.

NAT is customarily divided into 4 types, which differ in how packets from the external network are translated to the end host: full cone, (address-)restricted cone, port-restricted cone and symmetric NAT.


By the way, some advanced devices let you choose the NAT mode right in the configuration panel, but that is not what this article is about.

In most cases NAT can be traversed by initiating a data transfer to the network node from which you expect to receive a response. For this, the remote side needs to learn its external endpoint and communicate it to us, and we, in turn, need to do the same.

To find out our IP address and port on the external device (for simplicity we will call it a router), we use STUN (Session Traversal Utilities for NAT) and TURN (Traversal Using Relays around NAT) servers: STUN to determine the external IP:port (endpoint) over UDP, TURN over TCP.

Why go this way, when it would be much simpler to get the external IP from our own signal server?

There are at least 4 arguments in favor:

  1. The ability to transparently expand the list of servers (both our own and publicly available ones) used to collect endpoints, thereby increasing the fault tolerance of the system.
  2. The complementary nature and widespread adoption of the STUN and TURN protocols let us spend minimal effort on collecting endpoints and relaying traffic.
  3. The STUN and TURN protocols are very similar. Once you have dealt with the structure of STUN packets, TURN comes almost for free. And TURN gives us the ability to relay traffic when a direct connection cannot be established.
  4. We had already used the coturn STUN/TURN server in our video call project, which meant we could reuse its capacity with minimal additional investment in hardware.

Coturn is an open-source implementation of a TURN and STUN server. As practice has shown, its use is by no means limited to WebRTC. In my opinion it is a fairly flexible and undemanding tool. True, it has no built-in horizontal scaling, but that can be solved, for example, with the help of a signal server.

How to communicate with the server using the STUN / TURN protocol


The steps for obtaining endpoints are documented in RFCs 3489, 5389, 5766 and 6062.
Every STUN or TURN message consists of a fixed header followed by a list of attributes.



The header is laid out as follows:

  1. 2 bytes for the message type
  2. 2 bytes for its length (the size of all subsequent attributes)
  3. 12 bytes for a random transaction identifier in TURN packets and 16 bytes in STUN packets. The 4-byte difference is reserved in TURN packets for the MagicCookie constant.

In total, the service information occupies the first 20 bytes of the packet.
Each attribute, in turn, consists of:

  1. 2 bytes for the attribute type
  2. 2 bytes for its length
  3. the attribute value itself

It is important that the total length of an attribute be a multiple of 4 bytes. If, say, the attribute value is 7 bytes long, the attribute (2 + 2 + 7 = 11 bytes) must be padded at the end with 1 byte of empty data to bring it up to 12.
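
As a minimal illustration of this layout and padding rule, a small Python helper might look like this (a sketch for the article, not the product's actual code):

```python
import struct

def pack_attribute(attr_type: int, value: bytes) -> bytes:
    """Pack a STUN/TURN attribute: 2-byte type, 2-byte length, the value,
    then zero padding so the whole attribute ends on a 4-byte boundary."""
    padding = (4 - len(value) % 4) % 4
    return struct.pack("!HH", attr_type, len(value)) + value + b"\x00" * padding

# A 7-byte value gets 1 byte of padding: 2 + 2 + 7 + 1 = 12 bytes in total.
assert len(pack_attribute(0x0006, b"abcdefg")) == 12
```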

Collecting an endpoint over UDP looks like this:

  1. Connect to the server.
  2. Send a packet containing a binding request.
  3. Receive a packet containing the binding response.
  4. Parse the packet and extract the MAPPED-ADDRESS attribute:
    0x00 0x01 - attribute type, corresponding to MAPPED-ADDRESS
    0x00 0x08 - length of the attribute value
    0x00 0x01 - reserved byte and address family, 0x01 corresponds to IPv4
    0x30 0x39 - port, with a value of 12345

Each of the next four bytes then corresponds to one octet of the IPv4 address: 123.123.123.123.
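
Putting the UDP steps together, here is a rough Python sketch of a classic (RFC 3489-style) binding request and MAPPED-ADDRESS parsing. Note that modern servers usually answer with XOR-MAPPED-ADDRESS instead, and real code also needs to verify the transaction identifier and handle errors.

```python
import os
import socket
import struct

BINDING_REQUEST = 0x0001
MAPPED_ADDRESS  = 0x0001   # attribute type from the example above

def get_udp_endpoint(stun_host: str, stun_port: int = 3478) -> tuple[str, int]:
    # Classic binding request: 2-byte type, 2-byte length (no attributes),
    # 16-byte random transaction id.
    request = struct.pack("!HH", BINDING_REQUEST, 0) + os.urandom(16)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(2.0)
    sock.sendto(request, (stun_host, stun_port))
    data, _ = sock.recvfrom(2048)

    # Skip the 20-byte header and walk the attributes.
    offset = 20
    while offset + 4 <= len(data):
        attr_type, attr_len = struct.unpack_from("!HH", data, offset)
        if attr_type == MAPPED_ADDRESS:
            family, port = struct.unpack_from("!xBH", data, offset + 4)
            ip = socket.inet_ntoa(data[offset + 8:offset + 12])
            return ip, port
        offset += 4 + attr_len + (4 - attr_len % 4) % 4   # attributes are 4-byte aligned
    raise RuntimeError("MAPPED-ADDRESS not found in the response")
```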

Collecting an endpoint for TCP is somewhat different, since we obtain it via the TURN protocol. Why? It is all about minimizing the number of sockets connected to the TURN server, so that potentially more people can "hang" on a single traffic relay server.

To collect a candidate via the TURN protocol, you must (a small decoding sketch follows this list):

  1. Connect to the server.
  2. Send a packet containing an allocate request.
  3. If the TURN server requires authorization, we will receive an allocate failure response with error 401. In that case, the allocate request must be repeated with the user name and a MESSAGE-INTEGRITY attribute generated from the message itself, the user name, the password and the REALM attribute taken from the server's response.
  4. If registration succeeds, the server sends an allocate success response containing the attribute with the port allocated on the TURN server, as well as XOR-MAPPED-ADDRESS, which is our public endpoint for TCP. To work with the IP further, each octet must be XORed (exclusive OR) with the corresponding byte of the MagicCookie constant: 0x21 0x12 0xA4 0x42.
  5. To keep working with this TURN connection, the registration must be renewed regularly by sending a refresh request. This is done to discard "dead" connections.
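
A small sketch of step 4, un-XORing the XOR-MAPPED-ADDRESS value for the IPv4 case, using the same example address and port as in the UDP section:

```python
import socket
import struct

MAGIC_COOKIE = bytes([0x21, 0x12, 0xA4, 0x42])

def decode_xor_mapped_address(value: bytes) -> tuple[str, int]:
    """value[0] is reserved, value[1] is the address family (0x01 = IPv4).
    The port is XORed with the top 16 bits of the MagicCookie and the address
    with all 32 bits, so we simply undo the XOR."""
    port = struct.unpack("!H", value[2:4])[0] ^ struct.unpack("!H", MAGIC_COOKIE[:2])[0]
    ip = socket.inet_ntoa(bytes(b ^ m for b, m in zip(value[4:8], MAGIC_COOKIE)))
    return ip, port

# 123.123.123.123:12345 obfuscated with the MagicCookie:
sample = bytes([0x00, 0x01, 0x11, 0x2B, 0x5A, 0x69, 0xDF, 0x39])
assert decode_xor_mapped_address(sample) == ("123.123.123.123", 12345)
```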

So, we have a server through which we exchanged collected endpoints with the remote side.

Of course, it all seems simple and clear now, but looking back at the RFCs and realizing that without hints from Wireshark things would never get off the ground, you brace yourself for a deep dive... It reminds me of an old joke:

Learn first, kid, and then you'll get the keys...


How to establish a connection?


The simplest option is UDP hole punching.
To do this, you have to artificially create translation rules on your NAT.



We simply send a series of packets to the remote endpoint and wait for an answer from it. Several packets are needed to create the corresponding rule on the NAT and to avoid the "race" over whose packet gets through to the other side first. Well, and nobody has canceled packet loss on UDP.

Then the parties exchange control phrases, and the connection can be considered established.
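
A simplified sketch of that punching loop; the probe payload, the number of attempts and the timeout are arbitrary illustrative choices:

```python
import socket
from typing import Optional

def udp_hole_punch(local_port: int, remote_ip: str, remote_port: int,
                   attempts: int = 10, timeout: float = 0.5) -> Optional[socket.socket]:
    """Send a burst of probes to the remote endpoint and wait for any reply.

    The outgoing packets create the translation rule on our NAT; the remote
    side does the same, so it does not matter whose probe arrives first."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", local_port))
    sock.settimeout(timeout)
    for _ in range(attempts):
        sock.sendto(b"punch", (remote_ip, remote_port))
        try:
            _, addr = sock.recvfrom(1024)
            if addr[0] == remote_ip:
                return sock            # the hole is punched, keep this socket
        except socket.timeout:
            continue                   # packet lost or rule not there yet, retry
    sock.close()
    return None
```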

Organizing a TCP hole punch is a bit more complicated, although the general idea remains exactly the same.

The difficulty is that, by default, only one socket can occupy a given local endpoint, and an attempt to connect to another address leads to an automatic disconnect from the first one. However, there are socket options that remove this restriction: REUSE_ADDRESS and EXCLUSIVEADDRUSE. After setting the first and clearing the second on a socket, other sockets can occupy the same local endpoint.

All that remains is a mere trifle: bind to the local endpoint opened by the socket connected to TURN, and try to connect to the remote side's endpoint.
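
In Python terms, preparing such a socket might look roughly like this; SO_REUSEADDR plays the role of REUSE_ADDRESS, and SO_EXCLUSIVEADDRUSE exists only on Windows:

```python
import socket

def make_reusable_socket(local_port: int) -> socket.socket:
    """Create a TCP socket that can share its local endpoint with another socket,
    mirroring the REUSE_ADDRESS / EXCLUSIVEADDRUSE options described above."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    if hasattr(socket, "SO_EXCLUSIVEADDRUSE"):        # Windows-only option
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_EXCLUSIVEADDRUSE, 0)
    sock.bind(("0.0.0.0", local_port))
    return sock

# One such socket keeps the TURN connection alive, while another, bound to the
# same local port, simultaneously tries to connect() to the remote endpoint.
```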

Slightly more involved still, but no less important for a stable connection, is relaying traffic through TURN (a rough sketch of the final binding step follows this list):

  1. Since we already have a registration on the TURN server, all we need to do is add the remote side's registration to the permissions on TURN. To do this, we send a CreatePermission packet specifying the remote registration.
  2. The connection initiator sends a ConnectRequest packet specifying the XORed endpoint of the remote registration and signs the packet with MESSAGE-INTEGRITY.
  3. If all goes well and the remote side has sent CreatePermission with our registration, the initiator receives a connect success response, and the client receives a connection attempt indication. In both cases the incoming packet contains the CONNECTION-ID attribute.
  4. Then, within a short period of time, you need to connect with a new socket to the same TURN server IP and port as the original one (a classic TURN server can listen on both 3478 and 443 TCP) and send a ConnectionBind packet from that new socket, specifying the CONNECTION-ID received earlier.
  5. Wait for the packet containing the connection bind success response, and voila: the connection is established. Yes, two sockets are used here: the control socket, which is responsible for maintaining the connection, and the transport socket, which can be treated as a direct connection, since everything sent or received on it is passed through as is.
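
A rough sketch of steps 4 and 5: opening a second, "transport" socket to the TURN server and tying it to the relayed connection with ConnectionBind. The method and attribute codes are taken from RFC 6062, and MESSAGE-INTEGRITY signing is omitted, so treat it as an outline rather than working client code.

```python
import os
import socket
import struct

CONNECTION_BIND = 0x000B     # ConnectionBind request method (RFC 6062)
CONNECTION_ID   = 0x002A     # CONNECTION-ID attribute type (RFC 6062)
MAGIC_COOKIE    = 0x2112A442

def bind_data_socket(turn_host: str, turn_port: int, connection_id: bytes) -> socket.socket:
    """connection_id is the 4-byte value received earlier in the CONNECTION-ID
    attribute of the connect success response / connection attempt indication."""
    sock = socket.create_connection((turn_host, turn_port), timeout=5)

    attr = struct.pack("!HH", CONNECTION_ID, len(connection_id)) + connection_id
    header = struct.pack("!HHI12s", CONNECTION_BIND, len(attr), MAGIC_COOKIE, os.urandom(12))
    sock.sendall(header + attr)

    response = sock.recv(2048)
    msg_type = struct.unpack("!H", response[:2])[0]
    # Success responses have class bits 0b10 (mask 0x0110 -> 0x0100); after a
    # connection bind success response, everything on this socket is raw peer data.
    if msg_type & 0x0110 != 0x0100:
        raise RuntimeError("ConnectionBind failed")
    return sock
```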

In terms of priority we have the following hierarchy: direct TCP > direct UDP > relay (traffic relaying).

Why did we put direct UDP in second place?


Well, UDP, for all its simplicity and speed, has a significant drawback: there is no guarantee of delivery or ordering. And while a video stream can somehow live with that (the occasional graphical artifact), things are more serious when it comes to transferring files.

To provide delivery and ordering guarantees, we implemented a mechanism similar to reliable UDP. Yes, it consumes somewhat more resources, but it gives us what we need.
How did we go about it? First you need to know the MTU (maximum transmission unit), that is, the maximum size of a UDP packet that can be sent without being fragmented at intermediate nodes.

To do this, we take 512 bytes as the initial maximum packet size and set the IP_DONTFRAGMENT option on the socket. We send the packet and wait for its confirmation. If the answer arrives within a fixed time, we increase the maximum size and repeat the iteration. If in the end no confirmation arrives, we begin refining the MTU size: we decrease the maximum block size in small steps and require a stable confirmation 10 times in a row. No confirmation received: lower the MTU and start a new cycle.
The optimal MTU size is found.
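
A toy version of that probing loop, assuming the remote side simply echoes every probe back (a convention invented for this sketch); the don't-fragment flag shown is the Linux one, on Windows the equivalent is IP_DONTFRAGMENT:

```python
import socket

def probe_size(remote: tuple[str, int], size: int, timeout: float = 1.0) -> bool:
    """Send one 'don't fragment' probe of the given size and wait for an echo."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if hasattr(socket, "IP_MTU_DISCOVER") and hasattr(socket, "IP_PMTUDISC_DO"):
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MTU_DISCOVER, socket.IP_PMTUDISC_DO)
    sock.settimeout(timeout)
    try:
        sock.sendto(b"\x00" * size, remote)
        data, _ = sock.recvfrom(size + 64)
        return len(data) >= size
    except (OSError, socket.timeout):
        return False   # dropped, refused because of fragmentation, or too big to send
    finally:
        sock.close()

def find_mtu(remote: tuple[str, int], start: int = 512, step: int = 128) -> int:
    """Grow the probe until one fails and keep the last size that got through."""
    size = start
    while probe_size(remote, size + step):
        size += step
    return size
```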

Next we perform segmentation: we cut the whole large block into many small segments, each marked with its own number and the number of the final segment of the block. After splitting, the segments are added to the send queue. Sending is repeated until the remote party informs us that it has received them. The retransmission interval is 1.2 times the maximum ping measured while finding the MTU.
On the receiving side, we look at the received segment, add it to the incoming queue, and try to assemble the nearest complete block. If that works, we clean up the queue and try to assemble the next one.
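
A minimal sketch of that segmentation and reassembly logic. Here each segment carries its index and the total segment count, which serves the same purpose as the "final segment number" mentioned above; acknowledgements and retransmission are left out.

```python
import struct
from typing import Dict, Optional

HEADER = struct.Struct("!IHH")   # block id, segment index, total number of segments

def segment(block_id: int, payload: bytes, mtu: int) -> list:
    """Cut one large block into MTU-sized segments, each carrying its index and
    the total count so the receiver knows when the block is complete."""
    chunk = mtu - HEADER.size
    parts = [payload[i:i + chunk] for i in range(0, len(payload), chunk)] or [b""]
    return [HEADER.pack(block_id, i, len(parts)) + part for i, part in enumerate(parts)]

def reassemble(received: Dict[int, Dict[int, bytes]], datagram: bytes) -> Optional[bytes]:
    """Store an incoming segment; return the whole block once every piece has arrived."""
    block_id, index, total = HEADER.unpack_from(datagram)
    segments = received.setdefault(block_id, {})
    segments[index] = datagram[HEADER.size:]
    if len(segments) == total:
        del received[block_id]
        return b"".join(segments[i] for i in range(total))
    return None
```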

Here, of course, the most attentive readers who have "survived" to this paragraph may rightly ask: why not use the x264 or x265 codec? And they would be partially right. Honestly, we are leaning that way too, and then this reliable-UDP bicycle could be sacrificed. But what about, say, transferring binary files? In that case we are back to needing guarantees of delivery and packet order.

In conclusion, I would like to note that with this connection scheme we see no more than 2-3% of failed connections per day, most of them caused by incorrect proxy or firewall settings; once those are adjusted, the connection is established without problems.

If this topic turns out to be interesting to you (and it seems to us that it will), then in the following articles we will talk about running the application with the highest privileges in the system, the problems this creates and how we deal with them, about compression algorithms, virtual desktops and much more.

Author: Vladislav Yakovlev asmsa

Source: https://habr.com/ru/post/347534/

