
How HTTPS Secures Your Connection: What Every Web Developer Should Know



How does HTTPS work? This is a question I struggled with for several days while working on a project.

As a web developer, I knew that using HTTPS to protect user data is a very, very good idea, but I never had a crystal-clear understanding of how HTTPS actually works.
How is the data protected? How can a client and server establish a secure connection if someone is already listening on their channel? What is a security certificate, and why do I have to pay someone to get one?

Pipeline


Before we dive into how it works, let's briefly talk about why it is so important to protect Internet connections and what HTTPS protects against.

When a browser makes a request to your favorite website, the request passes through many different networks, any of which could be used to eavesdrop on or interfere with the connection.



From your own computer to other computers on your local network, through routers and switches, through your provider and many other intermediate providers - a huge number of organizations relay your data. If an attacker controls even one of them, they can see what data is being transmitted.

As a rule, requests are sent over plain HTTP, in which both the client's request and the server's response travel in clear text. There are several weighty reasons why HTTP does not encrypt by default:

• Encryption requires more computing power.
• More data is transmitted.
• Intermediate proxies cannot cache encrypted responses.

But in some cases, when extremely important information is transmitted via a communication channel (such as passwords or credit card information), it is necessary to provide additional measures to prevent such connections from being tapped.

Transport Layer Security (TLS)


Now we are going to plunge into the world of cryptography, but no special background is needed - we will cover only the most general points. Cryptography lets you protect a connection from potential intruders who want to tamper with it or simply listen in.

TLS, the successor to SSL, is the protocol most commonly used to secure HTTP connections (the result is what we call HTTPS). TLS sits below HTTP in the OSI model. Put simply, this means that when a request is made, everything related to the TLS connection happens first, and only then everything related to HTTP.
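The ordering can be sketched with Python's standard library: first a TCP connection, then the TLS handshake, and only then the HTTP request. This is a minimal sketch, not a full HTTP client; the helper names build_http_request and fetch_over_tls and the host example.com are placeholders chosen for illustration, not something from the original article.

```python
import socket
import ssl


def build_http_request(host, path="/"):
    """Build a plain HTTP/1.1 request; TLS is what will encrypt these bytes."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_over_tls(host, port=443):
    """Open TCP, run the TLS handshake, then speak HTTP inside the tunnel."""
    context = ssl.create_default_context()
    # Step 1: an ordinary TCP connection.
    with socket.create_connection((host, port), timeout=10) as tcp_sock:
        # Step 2: the TLS handshake happens here, before any HTTP is sent.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            # Step 3: only now does the HTTP request travel, already encrypted.
            tls_sock.sendall(build_http_request(host))
            chunks = []
            while True:
                data = tls_sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Calling fetch_over_tls("example.com") needs network access; the HTTP bytes are exactly what plain HTTP would send, the only difference is the encrypted socket they travel through.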

TLS is a hybrid cryptographic system. This means that it uses several cryptographic approaches, which we will consider further:

1) Asymmetric encryption (a public key cryptosystem) to generate a shared secret key and to authenticate (i.e., to certify that you are who you claim to be).
2) Symmetric encryption, which uses the secret key to encrypt the subsequent requests and responses.

Public key cryptosystem


A public key cryptosystem is a type of cryptographic system in which each side has a public key and a private key that are mathematically related. The public key is used to encrypt a message into “gibberish”, while the private key is used to decrypt it and recover the original text.
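To make the public/private split concrete, here is a minimal sketch using textbook RSA with deliberately tiny primes. RSA and the specific numbers are chosen here for illustration only; real keys are thousands of bits long and real systems add padding, so this is not something to use in practice.

```python
# Toy textbook RSA with tiny primes - for illustration only, never for real use.
p, q = 61, 53
n = p * q                      # 3233, part of both keys
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent: (n, e) is the public key
d = pow(e, -1, phi)            # private exponent: (n, d) is the private key


def encrypt(message, public_key):
    """Anyone who knows the public key can encrypt."""
    modulus, exponent = public_key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, private_key):
    """Only the holder of the private key can decrypt."""
    modulus, exponent = private_key
    return pow(ciphertext, exponent, modulus)


public_key, private_key = (n, e), (n, d)
ciphertext = encrypt(65, public_key)
recovered = decrypt(ciphertext, private_key)
print(ciphertext, recovered)   # 2790 65
```

The message 65 turns into the unrelated-looking number 2790, and only the private exponent d brings it back.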

Since the message has been encrypted with the public key, it can only be decrypted with the corresponding private key. Neither key can perform both functions. The public key can be published openly without exposing the system to any threat, but the private key must never reach anyone who is not entitled to decrypt the data. So, we have two keys: public and private. One of the most impressive advantages of asymmetric encryption is that two parties who have never met can establish a secure connection while initially exchanging data over an open, unprotected channel.
The client and the server each use their own private key together with the published public key to create a shared secret key for the session.

This means that even if someone sits between the client and the server and observes the connection, they still cannot learn the client's private key, the server's private key, or the session's secret key.

How is this possible? Maths!

Diffie-Hellman algorithm


One of the most common approaches is the Diffie-Hellman (DH) key exchange algorithm. It allows the client and the server to agree on a shared secret key without ever sending that key over the connection. Attackers listening on the channel therefore cannot determine the secret key, even if they intercept every single packet.

Once the DH key exchange has taken place, the resulting secret key can be used to encrypt the rest of the session with much simpler symmetric encryption.

A bit of math ...


The mathematical functions underlying this algorithm have an important distinguishing feature: they are relatively easy to compute in the forward direction but practically impossible to compute in reverse. This is where very large prime numbers come into play.
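The asymmetry can be illustrated with modular exponentiation. This is a minimal sketch with toy numbers picked for illustration: the forward direction is a single fast call, while the reverse direction here has to try exponents one by one, which stops being feasible once the prime has hundreds of digits. The function names are hypothetical.

```python
def forward(base, exponent, prime):
    """Easy direction: modular exponentiation, fast even for huge numbers."""
    return pow(base, exponent, prime)


def reverse_by_brute_force(base, result, prime):
    """Hard direction: try every exponent until one reproduces the result."""
    value = 1
    for exponent in range(1, prime):
        value = (value * base) % prime
        if value == result:
            return exponent
    raise ValueError("no exponent found")


result = forward(5, 6, 23)
print(result)                                  # 8
print(reverse_by_brute_force(5, result, 23))   # 6
```

With a prime of 23 the search takes at most 22 steps; with a 300-digit prime the same loop would need more steps than could ever be performed.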

Let Alice and Bob be the two parties exchanging keys with the DH algorithm. First, they agree on a base, called root (usually a small number such as 2, 3 or 5), and a very large prime number, called prime (more than 300 digits). Both values are sent in the clear over the communication channel without compromising the security of the connection.

Recall that Alice and Bob each have their own private key (more than 100 digits long), which is never transmitted over the channel.

What does travel over the channel is a mixture computed from each private key together with the prime and root values.

That is:
Alice's mixture = (root ^ Alice's secret) % prime
Bob's mixture = (root ^ Bob's secret) % prime
where ^ denotes exponentiation and % is the remainder of division.

Thus, Alice computes her mixture from the agreed constants (root and prime), and Bob does the same. Once they have received each other's mixtures, they perform one more calculation to obtain the shared secret key for the session. Namely:

Alice's calculation
(Bob's mixture ^ Alice's secret) % prime

Bob's calculation
(Alice's mixture ^ Bob's secret) % prime

The result of these operations is the same number for both Alice and Bob, because (root ^ Bob's secret) ^ Alice's secret and (root ^ Alice's secret) ^ Bob's secret are equal modulo prime. This number becomes the shared secret key for the session. Note that neither party had to send its private key over the channel, and the resulting secret key was never transmitted over the open connection either. Magnificent!
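The whole exchange fits in a few lines of Python. The numbers below are toy values chosen for illustration (a real prime has hundreds of digits); the variable names follow the formulas in the text.

```python
prime = 23          # public; real deployments use primes of hundreds of digits
root = 5            # public
alice_secret = 6    # never leaves Alice
bob_secret = 15     # never leaves Bob

# Each side computes its mixture and sends it over the open channel.
alice_mixture = pow(root, alice_secret, prime)   # sent to Bob
bob_mixture = pow(root, bob_secret, prime)       # sent to Alice

# Each side combines the other's mixture with its own secret.
alice_key = pow(bob_mixture, alice_secret, prime)
bob_key = pow(alice_mixture, bob_secret, prime)

print(alice_mixture, bob_mixture, alice_key, bob_key)  # 8 19 2 2
```

An eavesdropper sees 23, 5, 8 and 19, but neither of the secrets nor the resulting key 2.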

For those who are less mathematically inclined, Wikipedia has an excellent picture that explains the process using color mixing:

[Image: Diffie-Hellman key exchange illustrated by mixing colors]

Notice how the initial color (yellow) ends up as the same “mixed” color for both Bob and Alice. The only things transmitted over the open channel are the half-mixed colors, which are meaningless to anyone listening in.

Symmetric encryption


Key exchange takes place only once per session, while the connection is being established. Once the parties have agreed on a secret key, the client and server communicate using symmetric encryption, which is much more efficient for transferring data because it needs far less computation than public key operations.

Using the secret key obtained earlier, and having agreed on an encryption mode, the client and the server can exchange data safely, encrypting and decrypting each other's messages with the secret key. An attacker who taps the channel will see only “garbage” travelling back and forth across the network.
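The idea that one shared key both encrypts and decrypts can be sketched with a toy stream cipher built from SHA-256. This construction is an assumption made purely for illustration: it has no nonce and no integrity protection, and it is not what TLS uses (real TLS relies on ciphers such as AES-GCM or ChaCha20-Poly1305). The key and the message are made-up values.

```python
import hashlib


def keystream(key, length):
    """Derive `length` pseudo-random bytes from the key by hashing a counter."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(stream[:length])


def xor_cipher(key, data):
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


shared_key = (2).to_bytes(32, "big")   # stand-in for the secret agreed via DH
message = b"password=hunter2"

ciphertext = xor_cipher(shared_key, message)      # what the attacker sees
plaintext = xor_cipher(shared_key, ciphertext)    # what the other side recovers
print(plaintext == message)                       # True
```

The same function and the same key serve both directions, which is exactly what makes the scheme symmetric.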

Authentication


The Diffie-Hellman algorithm lets two parties obtain a shared secret key. But how can each party be confident that it is really talking to the other? We have not talked about authentication yet.

What if I call my friend and we perform a DH key exchange, but it turns out that my call was intercepted and I was in fact talking to someone else?! I can still communicate securely with that person - nobody else can listen to us - but it is not the person I think I am talking to. That is not very secure!
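The interception scenario can be simulated with the same toy DH arithmetic. This is a sketch under stated assumptions: the attacker, called Mallory here (a name not used in the article), is able to replace each mixture in transit with her own, and all numbers are toy values.

```python
prime, root = 23, 5                    # public parameters (toy-sized)
alice_secret, bob_secret, mallory_secret = 6, 15, 13

alice_mixture = pow(root, alice_secret, prime)
bob_mixture = pow(root, bob_secret, prime)
mallory_mixture = pow(root, mallory_secret, prime)

# Mallory intercepts both mixtures and forwards her own instead,
# so Alice and Bob each combine their secret with Mallory's mixture.
alice_key = pow(mallory_mixture, alice_secret, prime)
bob_key = pow(mallory_mixture, bob_secret, prime)

# Mallory derives one key per victim from the mixtures she intercepted.
mallory_key_with_alice = pow(alice_mixture, mallory_secret, prime)
mallory_key_with_bob = pow(bob_mixture, mallory_secret, prime)

print(alice_key == mallory_key_with_alice)  # True
print(bob_key == mallory_key_with_bob)      # True
print(alice_key == bob_key)                 # False
```

Both victims hold a perfectly valid secret key, just not with each other, and nothing in the exchange itself reveals that.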

To solve the authentication problem, we need a public key infrastructure (PKI) that lets us be sure that parties are who they claim to be. This infrastructure exists to create, manage, distribute and revoke digital certificates. Certificates are those annoying things you have to pay for so that your site can work over HTTPS.

But, in fact, what is this certificate, and how does it provide us with security?

Certificates


As a very rough approximation, a digital certificate is a file that uses a digital signature (more on that in a minute) to associate a computer's public key with its identity. A digital signature on a certificate means that someone vouches for the fact that a given public key belongs to a specific person or organization.

In essence, certificates associate domain names with a specific public key. This prevents the possibility that the attacker will provide his public key, posing as the server that the client is accessing.

In the phone example above, a hacker may try to show me his public key while impersonating my friend, but the signature on his certificate will not come from anyone I trust.

For a certificate to be trusted by any web browser, it must be signed by an accredited certificate authority (CA). CAs are companies that manually verify that whoever is applying for a certificate meets the following two conditions:

1. is real;
2. controls the domain for which the certificate is requested.

Once the CA has verified that the applicant is real and actually controls the domain, it signs the certificate for the site, in effect stamping its approval on the fact that the site's public key really belongs to the site and can be trusted.
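What "signing a certificate" means can be sketched with a toy RSA signature. Everything here is an assumption made for illustration: the CA key is built from two well-known Mersenne primes (far too small and too structured for real use), the certificate is a plain dictionary rather than an X.509 structure, and the function names are hypothetical.

```python
import hashlib

# Toy CA key pair built from two known Mersenne primes. Illustration only.
P, Q = 2**61 - 1, 2**89 - 1
CA_N = P * Q
CA_E = 65537                                   # public: shipped with the browser
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))        # private: known only to the CA


def digest(domain, site_public_key):
    """Hash the certificate contents down to a number below the CA modulus."""
    data = f"{domain}|{site_public_key}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N


def ca_sign(domain, site_public_key):
    """The CA binds a domain to a public key by signing their hash."""
    signature = pow(digest(domain, site_public_key), CA_D, CA_N)
    return {"domain": domain, "public_key": site_public_key, "signature": signature}


def browser_verify(cert):
    """The browser checks the signature using only the CA's public key."""
    expected = digest(cert["domain"], cert["public_key"])
    return pow(cert["signature"], CA_E, CA_N) == expected


cert = ca_sign("example.com", 123456789)
forged = dict(cert, public_key=987654321)      # attacker swaps in their own key
print(browser_verify(cert), browser_verify(forged))  # True False
```

The signature only matches the exact domain and key the CA looked at, so swapping in a different public key invalidates it, and producing a fresh signature requires the CA's private exponent.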

The list of accredited CAs comes preloaded in your browser. If a server returns a certificate that is not signed by an accredited CA, a large red warning appears. Otherwise, anyone could sign bogus certificates.


So even if a hacker took his own server's public key and generated a digital certificate claiming that this key is associated with facebook.com, the browser would not trust it, because the certificate is not signed by an accredited CA.
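Non-browser clients behave the same way. As a small sketch, Python's standard ssl module creates client contexts that load the system's trusted CA certificates and insist on a valid, matching certificate by default:

```python
import ssl

# The default client context loads the platform's trusted CA certificates
# and refuses servers whose certificate chain or host name does not check out.
context = ssl.create_default_context()

print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                     # True
```

With these settings, a handshake with a server presenting a certificate that does not chain to a trusted CA fails with ssl.SSLCertVerificationError instead of silently continuing.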

Other things you need to know about certificates


Extended Validation

In addition to ordinary certificates, there are Extended Validation (EV) certificates, which use the same X.509 format but promise a higher level of trust. When issuing such a certificate, the CA performs even more checks on whoever is receiving it (usually using passport details or bank accounts).

When a site presents such a certificate, the browser displays a green bar in the address bar in addition to the usual lock icon.

Serving multiple websites on one server

Since the TLS exchange takes place before the HTTP connection even starts, problems can arise when several websites live on the same web server and share one IP address. Virtual host routing is done by the web server, but the TLS connection is established earlier. A single certificate for the whole server is used for requests to any site hosted on it, which can cause problems on servers with multiple hosts.
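The ordering problem can be modelled as a lookup the server has to make before it has seen any HTTP header. TLS does define an extension for this, Server Name Indication (SNI), in which the client states the host name at the start of the handshake; whether a particular client or hosting setup supports it is something to check. The toy model below is an assumption-laden sketch: the host names, the certificate strings and the function name are all hypothetical.

```python
# Hypothetical certificate store on one server with one IP address.
CERTIFICATES = {
    "shop.example": "certificate for shop.example",
    "blog.example": "certificate for blog.example",
}
DEFAULT_CERTIFICATE = CERTIFICATES["shop.example"]


def pick_certificate(requested_name=None):
    """Choose a certificate during the handshake, before any HTTP is seen.

    Without a host name from the client (no SNI), the server can only
    fall back to a single default certificate.
    """
    if requested_name is None:
        return DEFAULT_CERTIFICATE
    return CERTIFICATES.get(requested_name, DEFAULT_CERTIFICATE)


print(pick_certificate())                 # certificate for shop.example
print(pick_certificate("blog.example"))   # certificate for blog.example
```

In Python's ssl module, the client side of this is the server_hostname argument of SSLContext.wrap_socket, which supplies the name sent in the handshake.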

If you use shared web hosting, you will most likely need to purchase a dedicated IP address in order to use HTTPS. Otherwise, you will have to keep obtaining new certificates (and re-verifying them) every time the set of sites on the server changes.

There is plenty of material on this topic on Wikipedia, and there is a course on Coursera. Special thanks to the guys in the chat on security.stackexchange.com, who answered my questions this morning.

Translator's Notes:

1) Thanks to Habr user wowkin for the excellent link on the topic (the video was translated and voiced by Habr user freetonik):



2) Following the discussion in the comments (thanks to Habr users a5b, Foggy4 and Allen for taking part), I am supplementing the main article with the following information:

According to Netcraft's recent SSL survey (2.4 million SSL sites, June 2013), most SSL connections do not use algorithms that provide perfect forward secrecy: news.netcraft.com/archives/2013/06/25/ssl-intercepted-today-decrypted-tomorrow.html

The situation is especially bad with IE (even version 10), which supports Diffie-Hellman only on elliptic curves (with RSA and ECDSA certificates) or classic Diffie-Hellman with the rarer DSS (DSA) certificates.
According to Netcraft's estimates, 99.7% of connections from IE and about 66% each from Chrome, Opera and Firefox will not use Diffie-Hellman.

This was also noted in the discussion on Hacker News.

It is nice that the article covers Diffie-Hellman, but in practice it is often not used (neither regular DH nor EC-based DH). Instead, the secret is encrypted using the server's public key (RFC 5246: 7.4.7.1, 8.1.1).
This is important and interesting, but not everyone realizes that DH is used less often in reality. Most SSL and TLS sessions actually exchange keys by encrypting them with RSA.

Source: https://habr.com/ru/post/188042/

