We implement secure VPN protocol

Again the topic of VPN, confidential data transfer, security and cryptography. Now, in the post-Snowden era, it has become fashionable to rivet safe, respectful privacy, unbreakable, protected from special services and censorship programs. However, this article is not just another marketing campaign, but rather a demonstration of how cryptographic primitives are used and what you should pay attention to when developing such software.

The result of this work is to create a working client server suitable for review by developers (that is, some high-level language code) that is productive enough to be used in an industrial environment with a high security threshold: GoVPN .

What is not satisfied with the mass of known existing solutions in the form of SSH (it can work with TUN / TAP devices and be used for VPN), TLS , OpenVPN or IPsec ? The complexity of both the protocol and the code. Hence the complexity of the review, the security question. Dependence on US organizations that dictate which algorithms can be used. Only SSH out of the box offers such fast, secure, security-independent algorithms like ChaCha20 , Poly1305 and Curve25519 . TLS-libraries and the protocol themselves massively showed a bad side because of its complexity. OpenVPN, when it does not use PSK (pre-shared key - a key known in advance to both sides), also depends on TLS with all the consequences.
')

Transport protocol

To solve the problem of creating a simple VPN, modern operating systems offer such a convenient piece as TUN / TAP : virtual network interfaces. It is very simple to write a client server application that reads frames from the interface, wrapping them in UDP packets and sending them to the remote side. But we want to ensure the confidentiality of the transmitted data, authentication by both sides of the connection (to make sure that they pretend to be what we expect).

The entire protocol consists of two parts globally: the handshake process and the transport. The handshake will be considered as a black box at the exit from which we will receive: an IP address with the opposite port for sending UDP packets and a common symmetric encryption key. With a handshake, any communication begins, and it is enough to authenticate the parties only at this stage.

The transport is responsible for encrypted (ensuring the confidentiality of the payload) transmission of messages and their reception with decryption. As the algorithm / encryption function, choose Salsa20 (or a variant of this algorithm ChaCha20). At the moment, it has very good cryptographic security (no fatal flaws in cryptanalysis have been identified), the highest performance and ease of implementation in the code. This is a stream cipher: that is, we can assume that the result of its work is a pseudo-random sequence of bytes, which must be XOR with data that requires encryption.

With stream ciphers you should be very careful. Unlike block ones, they are much more difficult to analyze (that is why popular block ciphers are now all block ones), and if at least somewhere the key (and, accordingly, pseudo-random output of this function) was used twice, this instantly leads to the possibility of easy decryption of previously intercepted messages key. However, there are also nice simple sides: the size of the plaintext is equal in length to the size of the encrypted one, no need to think about additions (up to the multiplicity of the cipher block size), about various encryption modes (CBC, ECB, CTR, and others).

Salsa20 accepts, besides a key with data, also nonce: a number that must be used only once for a given key. Using twice is fatal. As nonce, we use a simple, constantly incremented counter. Every incoming packet increases it. This requires storing the state of the transport connection on both sides. Alternatively, it would be possible to use a pseudo-random number generator (PRNG) and access it every time, but it is much slower and finding a quality one (which, while running a single-key cipher, is guaranteed not to spit out the same number) PRNG is problematic. The receiving party for decryption should also know this nonce. If the packets were not lost and reached guaranteed in the specified order, then it is sufficient to simply store the nonce state of the opposite side counter and increment it with the arrival of the packet. But UDP packets are lost and therefore we prescribe a nonce value for each packet at the beginning.

This is already a workable protocol that ensures the confidentiality of data transmission, however, it is vulnerable to attacks inherent in all stream ciphers: ciphertext can be consciously changed and the required substitution of plaintext can be made after decryption. To prevent them, it is necessary to authenticate the transmitted data packets: confirm and make sure that this is the exact data set that was transmitted. As a rule, message authentication codes (MAC - message authentication code) are used. The input MAC accepts the key and the data to be authenticated, and the output receives the so-called tag attached to the message during transmission.

We choose Poly1305 as the MAC. Like Salsa20 from the same author, it has the highest security ceiling, performance and ease of implementation. Poly1305 has a different API in different libraries, but in the simplest case, we can assume that its keys are disposable and one key is used to authenticate only one message. As a key, you can have some kind of secret (known only to two sides) part with a counter attached to it. But it has become a normal practice to generate an authentication key for each message based on the Salsa20 PRNG output of this packet. That is, at the time of packet encryption, we take the first 256 bits of the Salsa20 output and use them as the authentication key for Poly1305. To encrypt them, of course, no longer use. Since the internal state of Salsa20 is 512 bits, then for every fireman we also ignore the next 256 bits of output. Yes, this is a loss of performance, but insignificant, and giving a simple (and therefore potentially more reliable in terms of security) code. This is already a practiced practice in SSH and TLS protocols.

In this way, we get an authenticated encryption mode that can actually be used in practice. The only thing that is slightly complemented is obfuscation / scrambling of the transmitted nonce. From the point of view of security, there are no complaints about it, but we are looking to ensure that, ideally, our protocol is indistinguishable from noise, from random data, in order to make life difficult for the receiving DPI traffic analysis systems.

Nonce is a 64 bit data block. Ideally, you can just encrypt it, turning it into a kind of noise. We already have the Salsa20 cipher, however, it is a PRF function (pseudorandom function is a pseudo-random function), and we want to have a PRP (pseudorandom permutation) pseudo-random permutations): just mix the bits. You can make PRP from PRF and Luby-Rackoff proved it. However, we still introduce another primitive: the XTEA block cipher function. The choice fell on her because of the ease of implementation, so as not to write her own network of Feistel from Salsa20. XTEA is not the fastest, not the safest (although it has a high threshold), but this is all not so critical due to the fact that it will be called exactly once when sending a packet. Since nonce is not repeated, then you don’t need to think about encryption modes, initialization vectors and the like. Since nonce is a multiple of the size of the cipher block, no addition is needed.

In our case, the nonce encryption key will be Salsa20 output with zero (which is not used by both parties) nonce. It is calculated only once (after a handshake) and remains so permanently.

There remains the last thing to remember: what to do with a replay attack. Its essence is that nobody forbids to intercept a packet and resend it. There are a lot of solution options: remembering packages and looking at whether duplications occur (this requires memory), looking at time stamps (this requires good clock synchronization), or, in our case, just remembering the last nonce of the opposite side. All packets with nonce below the last will be ignored. Unfortunately, due to the nature of UDP packets in the process of transmission, they may be reordered and therefore their losses will occur with this approach. In our implementation (this doesn’t concern the protocol as such) we optionally allow some window of allowable variations of nonce values: a compromise between absolute protection against repeated packets and a relatively small window when such an attack can be carried out in practice.

The resulting transport protocol looks like this:

Handshake

Now let's look at the handshake protocol, as a result of which the most important thing is to obtain a common encryption key and trust in the opposite side (authenticate it). As a result, this part of the protocol will work only once, for a total of four packets, but the entire protocol security completely depends on this stage of the handshake.

We do not use and do not want to use PSK for encryption: a long-lived key is, of course, simple, but if it is lost, compromised, we will be able to decrypt all previously intercepted traffic. I want to have such a property as perfect direct secrecy . For this, we each time, for each new session, at each connection, generate new keys. They are temporary, session and are only in the memory of programs and are forgotten when they are turned off. In addition, there are restrictions on the maximum size of data that should be encrypted with one key. For Salsa20, this border is quite large, but for peace of mind, it is worth remembering about it and, for example, exchanging keys (a handshake from scratch) every N gigabytes of encrypted traffic.

In order for both parties to agree on the use of the same key, it is necessary to use a key exchange protocol / algorithm, such as Diffie-Hellman . From the user's point of view, the essence of his work is simple: both parties generate a pair from a public and private key, exchange public parts over an unprotected communication channel, make calculations and they have a shared secret key. Having intercepted the public parts of the keys it is nontrivial to find out their private parts or to find out the resulting common.

In our protocol, we will use Curve25519 : again, one of the simplest from the point of view of implementation in the program, which has excellent (possibly at the moment and unsurpassed) cryptographic security, amazing performance. The algorithm is based on working with elliptic curves and therefore has a compact public key size: 256 bits. The private key is generated simply by taking 256 bits of data from the PRNG operating system. Curve25519 this function, roughly speaking, allows to multiply points on a curve. The public key is obtained by multiplying the point, which is a private key, with a constant string inserted into the code. The shared secret is obtained by multiplying the point of the private key on one side with the point of the public key opposite.

By transmitting one packet with a public key from each side, we are able to establish a common key, a shared secret. However, we do not authenticate the parties: we do not know with whom we communicate. Very often, modern protocols make key exchange, then they are used to encrypt the communication channel, and already use a separate authentication protocol: for example, transferring passwords using either CHAP or using asymmetric cryptography such as RSA. First, it means a large number of round-trip-packages over the network, which is not very fast. Secondly, if asymmetric cryptography is used, then it is resource-intensive and complex.

In our protocol, we use the parties' shared authentication key. In fact, the password, but only high entropy. If the task would be to authenticate only the client, then on top of the channel encrypted with the session key it would be possible to simply send this password and compare it with what is in the database. However, if there is an attacker's server, he will know the password. You can use the CHAP protocol: it is simple, fast, but the attacker's server learns the hash from the password / key. On the one hand, the loss is not great, but this makes it possible to attack already on the hash function.

The best choice for us is the EKE : Encrypted Key Exchange - encrypted key exchange. This is a subtype of the PAKE protocol family: password authenticated key exchange - a key exchange that is authenticated by passwords. The essence of the protocols is that it is at the time of the key exchange that one or two parties are authenticated. In the simplest case, each of the two DH packets is symmetrically encrypted; a common password known to each other is used as the key. If the password does not match, then the parties receive incorrectly decrypted public keys, which, when multiplied, will not give a matching common key.

At the same time, there is such a property as zero-knowledge proof (zero-knowledge proof), in which the attacker will not know a bit of information about the PSK key: he receives encrypted data (noise) and the only thing he can do is try to decrypt it, but he cannot find out if he found the public key correctly (whether he decrypted it), since it is also a noise. It is this property that is decisive in our choice of the DH-EKE protocol and is not in possession of the popular SSH, TLS, and others - they all have attacks on either asymmetric cryptography, or on hash functions, or on symmetric ciphers.

All the same Salsa20 is used for encryption. Since he has a nonce at the entrance, which should never be repeated, each time we generate a random nonce R for it and send it along with the encrypted public key DH from the client.

R, enc(PSK, R, ClientDHPubKey) --> Server

The server, knowing the common authentication key, is able to correctly decrypt the message and immediately calculate the shared secret.

 R, enc(PSK, R, ClientDHPubKey) --> Server enc(PSK, R+1, ServerDHPubKey) --> Client

To authenticate the parties, we will transmit random numbers ( RS from the server side) and expect to receive them again in response after re-encryption:

 R, enc(PSK, R, ClientDHPubKey) --> Server enc(PSK, R+1, ServerDHPubKey), enc(K, R, RS) --> Client enc(K, R+1, RS) --> Server

The client, after receiving the public key of the server, is able to calculate the common key K , respectively, correctly decrypt the RS, encrypt it, and send it to the server. The server compares the RS with what it sent (the state should be stored in memory) and thus make sure that the client really knows the PSK. For server authentication, a similar action is taken with a random number from the RC client:

 R, enc(PSK, R, ClientDHPubKey) --> Server enc(PSK, R+1, ServerDHPubKey), enc(K, R, RS) --> Client enc(K, R+1, RS + RC) --> Server enc(K, 0, RC) --> Client

In the latter case, as nonce, we use zero, but we can use R + 2 or something else like that. Not fundamentally: just not repeated when using a single key, and the key K is already unique for each session. The client, having received the correctly decrypted RC from the server, makes sure that he and the server actually have a common key and that the server knows the PSK.

Shared key K may not have the best entropy. Curve25519 still has not all 256 bits involved, respectively, its cryptographic strength is slightly less than the symmetric analogue of the 128-bit threshold. In addition, the dishonest party can use specially created public keys, which, when multiplied, can give weak K keys: in theory, this is not excluded, although in practice it was not noticed for Curve25519. But nevertheless, paranoid moods can be satisfied by separately generating with each party a qualitative high-entropy key and transmitting it in packets encrypted with a K-key. The resulting shared key, which will already be used in the transport protocol, is obtained by the XOR of the parts created by the client and server. If any side is dishonorable, then all the same one high-entropy part of the key is enough for a good key. In the final diagram below, part of the client key is designated as SC , and the server, SS .

And the final touch is to send the client ID. This part is not related to protocol security: it is solely for ease of administration. When we have several clients, you need to somehow determine which PSK keys you need to use. It is possible to distinguish between clients and servers by ports, but we decided to send the identifier explicitly. The identifier is not secret information, but by sending it in an explicit form, we do not have anonymity and give the opportunity to use the DPI and clearly see which user is making the connection.

Currently, the identifier is sent as an encrypted R (which is already present in the original handshake packet from the user), where the client identifier is used as the key. XTEA is used as a cipher. Since R is different each time, then again, there is no need to think about initialization vectors, encryption modes. The server cannot determine deterministically at once what key was used to encrypt it and it simply goes through all the client identifiers known to it and finds someone whose decryption result matches R. A symmetric cipher is a quick operation and therefore the costs are irrelevant here, in fact, by orders of magnitude smaller than DH work. In this case, we, of course, provide the attacker with both plain text and ciphertext, but XTEA cryptanalysis shows that there is nothing to worry about. In the worst case, the attacker finds out the client ID, but not the PSK key.

The final version of the handshake looks like this:

What else to do or fix?

At the moment, the handshake packets have an explicit label (ends in zero bytes): the server can immediately understand that this is not a transport packet. If we want to confront DPI, then we need to remove this label and make handshake packets indistinguishable from noise. Fixed in version 2.3

The protocol during the handshake and PSK key generation depends on the quality of the PRNG. If PRNG is predictable, weak, then there is no need to talk about security. Running software under a closed proprietary operating system and using the word "security" is meaningless. Popular Microsoft Windows and Apple OS X are known to have sources of entropy and PRNG that are unsuitable for cryptography. We could build our own PRNG, for example, on the basis of Fortuna , but this is only a delay in the inevitable key leakage, since the OS still has full access to the program memory and no one guarantees that the information from them is not used for loopholes. Fixed in version 3.4 : third-party EGD-compatible PRNG sources can be used.

The protocol provides confidentiality, authenticity of the payload, two-way zero-knowledge authentication of the parties, but data on the size and time of occurrence of packets flow into the network. This can be solved by their fragmentation and the addition of "noise": outsiders, not related to the real packet traffic. Fixed in version 3.0

Read also the continuation of the article Let's realize an even more secure VPN protocol !

All the best, do not switch!

© Sergey Matveev, Python and Go developer ivi.ru

Our previous publications:
" Unnecessary items or how we balance between servers
» Blowfish on guard ivi
» Non-personalized recommendations: the association method
" By cities and villages or as we balance between CDN nodes
" I am Groot. We do our analytics on events
» All on one or as we built CDN

Source: https://habr.com/ru/post/256365/

All Articles

We implement secure VPN protocol

Transport protocol

Handshake

What else to do or fix?

More articles: