Implementing an even more secure VPN protocol

This publication continues an earlier article on our blog, "Implementing a secure VPN protocol". Here we do not redesign or rewrite the protocol; we only modify it slightly further. Everything described below is already implemented in GoVPN 3.1.

The transport protocol is slightly modified so that it can generate noise. The handshake protocol is changed to support augmentation and password strengthening. The details follow.

Hiding the size and timing of the payload

At the end of the previous article, I noted that we ensure the confidentiality of the transmitted data but hide neither the size of the packets nor the fact that they are sent. Sometimes the timing alone (the period at which packets appear) indicates with high probability that, for example, DHCP is running over the encrypted channel: the traffic is encrypted, yet we still know which processes are at work inside. One can also correlate traffic arriving from one client with traffic leaving elsewhere and thereby de-anonymize the client.
We solve this problem quite simply, although it is somewhat costly in terms of resources: we add noise to the traffic.

In the transport protocol, two bytes containing the size of the payload are added after the nonce; they are encrypted along with the rest. The size can be zero, which is convenient for heartbeat packets that show the client or server is still “alive” on the network. As a side effect, the MTU of the virtual TAP interface shrinks by these two bytes.

Before encryption, each packet is padded with zeros up to the maximum packet size that GoVPN sends. After encryption the whole packet looks like noise, and an observer cannot tell where the payload ends and the useless data begins.
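
The framing described above can be sketched in Python. This is a minimal illustration, not GoVPN's code: the byte order of the size field, the constant `MAX_PLAINTEXT` and the function names are assumptions, and the encryption step is omitted.

```python
import struct

# Hypothetical size of the plaintext part of every packet:
# size field + payload + zero padding.
MAX_PLAINTEXT = 1500
SIZE_FIELD = struct.Struct(">H")  # two bytes; big-endian order is an assumption


def pad_payload(payload: bytes) -> bytes:
    """Prefix the payload with its length and zero-pad it to MAX_PLAINTEXT."""
    room = MAX_PLAINTEXT - SIZE_FIELD.size
    if len(payload) > room:
        raise ValueError("payload does not fit into one packet")
    return SIZE_FIELD.pack(len(payload)) + payload + bytes(room - len(payload))


def unpad_payload(plaintext: bytes) -> bytes:
    """Recover the payload of a decrypted packet; b"" means a heartbeat."""
    if len(plaintext) != MAX_PLAINTEXT:
        raise ValueError("unexpected packet size")
    (size,) = SIZE_FIELD.unpack_from(plaintext)
    if size > MAX_PLAINTEXT - SIZE_FIELD.size:
        raise ValueError("corrupt size field")
    return plaintext[SIZE_FIELD.size:SIZE_FIELD.size + size]
```

Whatever the payload length, `pad_payload` returns the same number of bytes, so after encryption an empty heartbeat and a full packet have the same size on the wire.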

So we have hidden the size of the messages, but not the fact that messages are being sent. This is solved by maintaining a constant packet rate. Technically it is simple: a tick generator is started, and on each tick we check whether there is a packet to send. If there is none, an empty packet is sent. All packets are padded to the maximum size, so after encryption they all look like noise.
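
A constant packet rate of this kind can be sketched as follows. The function name, the `send` callback and the sleep-based timer are assumptions made for illustration; a real daemon would use a more precise tick source and would pad and encrypt each payload before sending.

```python
import queue
import time


def run_ticks(outgoing, send, ticks, interval):
    """Call send() exactly once per tick, whether or not real data is waiting.

    outgoing: queue.Queue of payloads waiting to be sent
    send:     callable that pads, encrypts and transmits one payload
    ticks:    number of ticks to run (a daemon would loop forever)
    interval: seconds between ticks
    """
    for _ in range(ticks):
        try:
            payload = outgoing.get_nowait()
        except queue.Empty:
            payload = b""  # nothing to send: emit an empty (heartbeat) packet
        send(payload)
        time.sleep(interval)
```

With two payloads queued and five ticks, `send` is called five times: twice with data and three times with an empty payload. An observer sees five equally spaced packets either way.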

The transport-layer packet is formed according to the following scheme:

[Figure: formation of a transport-layer packet]

Strong password authentication protocol

As user cebka correctly noted in the comments to the previous publication, a 256-bit Curve25519 public key is not a random set of bytes but a point on an elliptic curve. Therefore, when trying to decrypt it, an attacker will see that the result is not random data but a valid point, and will thus know that the shared authentication key has been guessed correctly.

In the previous implementation of GoVPN, the shared authentication key was assumed, even in the examples, to come from a PRNG rather than from a password. So in practice, of course, brute-forcing the key was not feasible. However, if we want to use passwords, this becomes a problem, since passwords have far less entropy and are vulnerable to dictionary attacks.

Why do we want to use passwords? Because the shared authentication key must be protected somehow in any case. Either it is stored on a disk with full disk encryption, or it is encrypted with PGP, for example, and its decrypted version is placed in memory (on a temporary RAM disk) when used. Both the disk and PGP are in turn protected by passphrases. Why not use these passphrases directly in the GoVPN protocol, to have fewer software dependencies and attack vectors?

A small digression: passphrases should be used, not passwords. Technically there may be no difference between them for a computer, but for a person the difference is significant: a password is usually a short string of high-entropy (random) characters, while a passphrase is a long string of low-entropy ones. Low entropy means it is easy for a human to memorize. Ordinary English text is believed to contain 1-2 bits of entropy per character, so a hundred characters give at least a hundred bits in total while usually remaining easy to remember. The only "but" from a technical point of view: a password can still be stored in a database as is (do not do that, of course), whereas a passphrase is inconvenient to store this way, so its hash is stored instead.
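
The arithmetic behind this comparison can be sketched in Python. The 94-symbol alphabet and the 16-character password length used in the note below are illustrative assumptions, not figures from GoVPN.

```python
import math


def passphrase_bits(length: int, bits_per_char: float) -> float:
    """Estimated entropy of natural-language text of the given length."""
    return length * bits_per_char


def random_password_bits(length: int, alphabet_size: int) -> float:
    """Entropy of a password whose characters are drawn uniformly at random."""
    return length * math.log2(alphabet_size)
```

Under these assumptions, `passphrase_bits(100, 1.0)` gives 100 bits, roughly the same as `random_password_bits(16, 94)`, which is a little under 105 bits. A memorable hundred-character phrase is thus about as strong as sixteen random printable characters.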

For an authentication protocol to be called “strong”, it must be safe to use even with weak passwords. In our case, the password “foobar” will quickly be found with a dictionary, and a successful decryption of the public key during the handshake will confirm that the guess was right. That is, this is not a zero-knowledge protocol.

This can be fixed by applying Elligator, a special encoding of elliptic-curve points that makes them indistinguishable from noise. That is enough for the protocol to become zero-knowledge and safe even with weak passwords, and thus to deserve the name “strong authentication protocol”. Elligator is applied to the public key on one side before encryption and inverted on the other side after decryption.

Elligator cannot be applied to every Curve25519 key pair: on average, about half of the points cannot be encoded as a random-looking string. So when generating a Curve25519 key pair, we try to encode the public key and check whether it works; if not, we repeat the procedure. The unpleasant side effect is that each side needs, on average, twice as much entropy and computation to generate its Curve25519 keys.
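
The retry loop can be sketched in Python. This is a minimal illustration, not GoVPN's code: the X25519 ladder follows RFC 7748, the encodability test is the Elligator 2 condition (u != -A and -2u(u + A) is a nonzero square modulo the field prime), and all function names are hypothetical. The sketch only checks whether a public key could be encoded; it does not compute the encoding itself, and it is not constant-time.

```python
import os

P = 2**255 - 19   # field prime of Curve25519
A = 486662        # Montgomery coefficient of Curve25519
A24 = 121665      # (A - 2) / 4, used by the ladder


def x25519_base(scalar: bytes) -> int:
    """RFC 7748 Montgomery ladder: u-coordinate of scalar * base point (u = 9)."""
    k = bytearray(scalar)
    k[0] &= 248    # clamp the scalar as RFC 7748 prescribes
    k[31] &= 127
    k[31] |= 64
    k = int.from_bytes(k, "little")
    x1, x2, z2, x3, z3, swap = 9, 1, 0, 9, 1, 0
    for t in reversed(range(255)):
        bit = (k >> t) & 1
        if swap ^ bit:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        da, cb = d * a % P, c * b % P
        x3 = pow(da + cb, 2, P)
        z3 = x1 * pow(da - cb, 2, P) % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, z2 = x3, z3
    return x2 * pow(z2, P - 2, P) % P


def elligator_encodable(u: int) -> bool:
    """True if -2*u*(u + A) is a nonzero square mod P (Euler's criterion)."""
    w = -2 * u * (u + A) % P
    return pow(w, (P - 1) // 2, P) == 1


def generate_encodable_keypair(rand=os.urandom):
    """Generate key pairs until the public key passes the Elligator check."""
    tries = 0
    while True:
        tries += 1
        private = rand(32)
        public_u = x25519_base(private)
        if elligator_encodable(public_u):
            return private, public_u, tries
```

Since roughly half of the candidates are rejected, the `tries` counter should average about two over many runs, which is where the doubled entropy and CPU cost comes from.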

Password strengthening

After applying Elligator, the protocol becomes zero-knowledge and is suitable for authentication with weak passwords. But the authentication data is stored on both the server and the client. The client may keep nothing on its hard disk, since the passphrase is entered by hand, but on the server it is a separate file. If the contents of the server's hard disk are compromised and the authentication key database leaks, an attacker can run a dictionary attack against the passwords. This is a very powerful attack that can recover a huge number of the passwords and passphrases people use.

If we keep a hash of the password on the server (since a hash is convenient to store), the attacker will simply compute hashes of candidate passwords and compare them with what is on the hard disk. Hashes are fast to compute. Therefore stored passwords and passphrases must always and everywhere be strengthened.

Common password strengthening methods are PBKDF2, bcrypt and scrypt. We will not describe these algorithms in detail, since there are plenty of articles on the subject (because people still manage to use none of them, showing no regard for their users' secrets).

Personally, I do not consider bcrypt an option, since its input is limited to 72 bytes (a property of Blowfish), which is too short: all my own passphrases are 90-110 characters long. The main argument in favor of bcrypt is that the function is slower. True, but what prevents us from increasing the number of PBKDF2 iterations? The difference between the two is rather blurry overall: the idea is the same, only the building blocks differ slightly.

Scrypt is interesting, but there are also many arguments against it, albeit debatable ones. It would deserve a closer look were it not for the final stage of the Password Hashing Competition, which aims to produce a high-quality password strengthening function that takes memory load and timing side-channel attacks into account. The competition has genuinely interesting ideas and implementations, and "judges" who know the subject well. But until a winner is selected, GoVPN uses PBKDF2-SHA512.

As a rule, strengthening consists of adding entropy to the password and performing some expensive operation. The extra entropy, the so-called "salt", is needed so that, at the very least, identical passwords do not produce identical results; it also protects against precomputed tables of hash values. The expensive operation in the case of PBKDF2 is a large number (thousands) of hash function iterations.

GoVPN uses the already existing 128-bit client ID as the salt; it is not secret and need not be hidden.

The server stores the already strengthened version, and it is this version that is used in the handshake protocol. Before connecting, the user enters a passphrase; it is strengthened using the user ID as the salt, and the result serves as the authentication key in the handshake with the server. The strengthening operation is expensive, but it is performed only on the client, when the daemon starts.
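
The client-side strengthening step can be sketched with Python's standard library. PBKDF2-SHA512 and the 128-bit client ID as salt come from the text; the iteration count, the 32-byte output length and the function name are assumptions, not GoVPN's actual parameters.

```python
import hashlib

ITERATIONS = 100_000  # hypothetical work factor; tune to the hardware
KEY_LENGTH = 32       # assumed length of the derived authentication key


def strengthen(passphrase: str, client_id: bytes,
               iterations: int = ITERATIONS) -> bytes:
    """Derive the authentication key from a passphrase with PBKDF2-SHA512.

    The 128-bit client ID serves as the (non-secret) salt.
    """
    if len(client_id) != 16:
        raise ValueError("client ID must be 128 bits long")
    return hashlib.pbkdf2_hmac(
        "sha512", passphrase.encode("utf-8"), client_id,
        iterations, dklen=KEY_LENGTH,
    )
```

Two clients that happen to choose the same passphrase still obtain different keys, because their IDs differ, so a precomputed table would have to be built per client.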

Authentication augmentation

An attacker who compromises the database of client authentication keys on the server is unlikely to recover user passwords easily. But he does obtain the strengthened results, which are what the parties actually authenticate with. With this data the attacker can impersonate a client and connect to the VPN server.

If the server could store something that can only verify the authentication data but cannot be used in its place, the problem would be solved. This process is commonly called augmentation and is described in the article on EKE. Instead of passwords, the server side holds so-called "verifiers".

There are many ways to solve this problem. We base ours on an asymmetric signature algorithm, specifically Ed25519, by the same author as Curve25519, Salsa20 and Poly1305, which we already use. It is an easy-to-implement, fast and reliable (well cryptanalyzed) algorithm for creating and verifying signatures. In addition, it requires no extra entropy when creating signatures.

The essence of augmentation in this case is that the public key of an Ed25519 key pair generated from the strengthened password serves as the verifier. This verifier, instead of the strengthened password, is used to encrypt the Diffie-Hellman public keys. In addition, at the end of the handshake the client signs the shared key K obtained from Diffie-Hellman and sends the signature to the server. Since the verifier is simply a public key, the server can check the signature with it and make sure that the client really holds the private part of the key, which can only be obtained by knowing the password in clear. An attacker cannot create the signature and so cannot impersonate the client.

The verifier is created in advance on the client side with a utility included in GoVPN. The user enters his ID (which either side may generate) and the passphrase, from which the strengthened version and the Ed25519 key pair are derived, and then sends the verifier to the server administrator. As a side effect, handshake traffic grows by the length of the signature, and CPU time is spent on the client to create the signature and on the server to verify it.
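
The verifier idea can be sketched with a small pure-Python Ed25519 that follows the reference code in RFC 8032. It is for illustration only: it is slow and not constant-time, the function names are hypothetical, and using the 32-byte strengthened password directly as the Ed25519 seed is an assumption about how the key pair is derived.

```python
import hashlib

P = 2**255 - 19                                      # field prime
L = 2**252 + 27742317777372353535851937790883648493  # order of the base point
D = -121665 * pow(121666, P - 2, P) % P              # Edwards curve constant d
SQRT_M1 = pow(2, (P - 1) // 4, P)                    # square root of -1 mod P


def _add(p1, p2):
    """Add two points given in extended coordinates (X, Y, Z, T)."""
    a = (p1[1] - p1[0]) * (p2[1] - p2[0]) % P
    b = (p1[1] + p1[0]) * (p2[1] + p2[0]) % P
    c = 2 * p1[3] * p2[3] * D % P
    d = 2 * p1[2] * p2[2] % P
    e, f, g, h = b - a, d - c, d + c, b + a
    return (e * f % P, g * h % P, f * g % P, e * h % P)


def _mul(s, pt):
    """Double-and-add scalar multiplication."""
    acc = (0, 1, 1, 0)  # neutral element
    while s > 0:
        if s & 1:
            acc = _add(acc, pt)
        pt = _add(pt, pt)
        s >>= 1
    return acc


def _recover_x(y, sign):
    """Recover the x-coordinate with the given low bit, or None if impossible."""
    x2 = (y * y - 1) * pow(D * y * y + 1, P - 2, P) % P
    x = pow(x2, (P + 3) // 8, P)
    if (x * x - x2) % P != 0:
        x = x * SQRT_M1 % P
    if (x * x - x2) % P != 0 or (x == 0 and sign):
        return None
    return P - x if (x & 1) != sign else x


_GY = 4 * pow(5, P - 2, P) % P
_GX = _recover_x(_GY, 0)
BASE = (_GX, _GY, 1, _GX * _GY % P)


def _compress(pt):
    zinv = pow(pt[2], P - 2, P)
    x, y = pt[0] * zinv % P, pt[1] * zinv % P
    return (y | ((x & 1) << 255)).to_bytes(32, "little")


def _decompress(data):
    y = int.from_bytes(data, "little")
    sign, y = y >> 255, y & ((1 << 255) - 1)
    x = _recover_x(y, sign) if y < P else None
    return None if x is None else (x, y, 1, x * y % P)


def _expand(seed):
    h = hashlib.sha512(seed).digest()
    a = int.from_bytes(h[:32], "little")
    return (a & ((1 << 254) - 8)) | (1 << 254), h[32:]


def _hash_mod_l(data):
    return int.from_bytes(hashlib.sha512(data).digest(), "little") % L


def make_verifier(seed: bytes) -> bytes:
    """Client side: public key derived from the strengthened password."""
    a, _ = _expand(seed)
    return _compress(_mul(a, BASE))


def sign(seed: bytes, message: bytes) -> bytes:
    """Client side: sign the shared key K at the end of the handshake."""
    a, prefix = _expand(seed)
    public = _compress(_mul(a, BASE))
    r = _hash_mod_l(prefix + message)
    r_bytes = _compress(_mul(r, BASE))
    h = _hash_mod_l(r_bytes + public + message)
    return r_bytes + ((r + h * a) % L).to_bytes(32, "little")


def verify(verifier: bytes, message: bytes, signature: bytes) -> bool:
    """Server side: check the signature using only the stored verifier."""
    if len(verifier) != 32 or len(signature) != 64:
        return False
    a_pt, r_pt = _decompress(verifier), _decompress(signature[:32])
    s = int.from_bytes(signature[32:], "little")
    if a_pt is None or r_pt is None or s >= L:
        return False
    h = _hash_mod_l(signature[:32] + verifier + message)
    left, right = _mul(s, BASE), _add(r_pt, _mul(h, a_pt))
    return ((left[0] * right[2] - right[0] * left[2]) % P == 0
            and (left[1] * right[2] - right[1] * left[2]) % P == 0)
```

The server stores only the output of `make_verifier`. It can call `verify` on the client's signature of K, but it has no way to call `sign`, so a leaked verifier database does not let an attacker pass this step.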

The final handshake protocol now looks like this:

[Figure: final handshake protocol]

Notation:

rand(Xbit)	read X bits from the PRNG
CDHPriv	client's private Diffie-Hellman key
SDHPriv	server's private Diffie-Hellman key
CDHPub	client's public Diffie-Hellman key
SDHPub	server's public Diffie-Hellman key
enc(K, N, D)	Salsa20 encryption of data D with key K and nonce N
H()	HSalsa20 hash function; the choice is not essential here, it could be SHA2
El()	Elligator encoding of a curve point, as well as the inverse operation
DSAPub	client's public Ed25519 key, generated from its passphrase
DSAPriv	client's private Ed25519 key, generated from its passphrase
Sign(K, D)	Ed25519 signature of data D made with private key K
Verify(K, D)	verification of the Ed25519 signature of data D with public key K

What else remains to be done or fixed?

The dependence on a quality PRNG has not gone away, and safe use of GoVPN under closed proprietary operating systems is technically impossible. This can only be fixed by switching to a good OS / platform. Fixed in version 3.4: third-party EGD-compatible PRNG sources can be used.

The only indirect sign that traffic is GoVPN-specific is that at the beginning, during the handshake, packets of always clearly defined sizes are exchanged, and only then does the “noise” start. Handshake messages are indistinguishable from noise and do not reveal the client ID, but their size is not hidden. Fixed in version 4.0: handshake messages can be padded with noise.

A few statistics, which are not current:

Transport protocol overhead	26 bytes per Ethernet packet of the TAP interface
Handshake protocol overhead	264 bytes; 2 packets from the client, 2 from the server
IPv4 TCP throughput	786 Mbps on amd64 FreeBSD 10.1, Intel i5-2450M CPU 2.5 GHz, Go 1.5.1, one core loaded by the daemon
Code size of the transport protocol encryption and decryption functions	1 screen, 1 screen
Code size of the server and client parts of the handshake protocol	2 screens, 1.5 screens
Supported platforms	i386 / amd64 GNU/Linux and FreeBSD
Available as packages in	Arch Linux, FreeBSD

All the best, and stay tuned!

Sergey Matveyev, Python and Go developer at ivi.ru

Our previous publications:
» Implementing a secure VPN protocol
» Unnecessary items or how we balance between servers
» Blowfish on guard ivi
» Non-personalized recommendations: the association method
» By cities and villages or as we balance between CDN nodes
» I am Groot. We do our analytics on events
» All on one or as we built CDN

Source: https://habr.com/ru/post/257431/
