Strength training: rolling out HTTPS under high loads

In September, Mail.Ru Mail enabled HTTPS encryption for all users.

The advantages of a secure connection are obvious to all developers of large Internet projects. Most modern web servers (nginx, Apache, etc.) and browsers support HTTPS. Yet few sites have the secure protocol always on and enabled by default. Why is that? What difficulties did we face when adding HTTPS support? Read on.

How SSL behaves on high-load systems
In essence, HTTPS is ordinary HTTP running over SSL, a layer that makes an insecure network channel secure by encrypting it.

What needs to be done to support SSL?

It would seem enough to order a certificate, deploy it to the servers, enable the option in nginx, and welcome to the world of secure traffic. But reality is not that simple. Or rather, it would be simple if all we needed to secure was a static page with the text “Hello world”. For a high-load system with a wide variety of content, such as Mail.Ru Mail, the most interesting part starts right after the certificates have been bought and deployed.

What is the difference between Mail.Ru Mail and a hypothetical “Hello world” page? As far as HTTPS support is concerned, there are two main differences. First, Mail has many pages (dozens: the message list, the messages themselves, settings, the address book, etc.), and each page contains hundreds of elements (images, JS files, CSS, etc.). Second, Mail runs under a huge load, processing hundreds of thousands of HTTP requests per second (including, of course, requests for static files). These two aspects are what made implementing HTTPS support so difficult.

How do these aspects affect HTTPS support? The first one works like this: with a single page, the server has a certificate, the browser recognizes it, and everything works. Problems begin when there are many pages and, moreover, when those pages contain external elements that run in the page's context. The browser considers the connection secure only if every last element of the page was delivered over a secure protocol: this applies to all images, JS, CSS, etc. If even one element was delivered over an insecure protocol, the browser treats the whole page as insecure and warns the user.
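To make the rule above concrete, here is a minimal sketch of a check that scans a page's HTML for subresources requested over plain HTTP, the kind of element that breaks the browser's "secure" indicator. It is an illustration under simple assumptions, not Mail.Ru's tooling: it only looks at static `src`/`href`/`data`/`poster` attributes, treats `<a href>` as navigation rather than a subresource, and ignores anything loaded later by scripts.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Attributes that load a subresource into the page context (assumed list).
RESOURCE_ATTRS = {"src", "href", "data", "poster"}


class MixedContentFinder(HTMLParser):
    """Collects subresource URLs that would be fetched over plain HTTP."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in RESOURCE_ATTRS or not value:
                continue
            # <a href> is a link to another page, not a subresource of this one.
            if tag == "a" and name == "href":
                continue
            # Protocol-relative ("//host/...") and https:// URLs are fine.
            if urlparse(value).scheme == "http":
                self.insecure.append((tag, value))


def find_mixed_content(html):
    finder = MixedContentFinder()
    finder.feed(html)
    return finder.insecure
```

For example, a page with an `http://` tracking pixel, an `https://` script and an ordinary `http://` link reports only the pixel, which is exactly the element that would make the browser flag the page.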

The second problem, caused by Mail's heavy load, is that encryption and decryption are fairly expensive operations in terms of both CPU and memory. CPU time goes into encrypting and decrypting. Memory goes into the SSL cache, larger nginx buffers, and a larger number of web server workers, which stay in memory longer because each request takes longer to process. SSL is a beast that will happily eat a lot of hardware and memory without choking.

How to deal with it?

Banner system

Mail is a heterogeneous system. We have our own banner system. It is not specific to Mail: it is the common banner system of the whole Mail.Ru portal, and it knew nothing about HTTPS. Since all data on all pages had to be served over HTTPS, the banner system needed substantial rework.

Besides teaching the web servers to serve all banners, scripts and static files over HTTPS, we had to deal with partner pixels. A pixel is a 1x1 image supplied by a partner that is requested whenever a banner is shown. Not all of our partners supported HTTPS, but, as noted above, a page must not contain any insecure elements, even 1x1 pixels. We could, of course, have routed all the pixels through a proxy, but we first tried a simpler way: we talked to the partners, explained how great and necessary all this is, and now they supply us with HTTPS pixels. It was not easy, but it worked. The catch is that every new partner now has to be taught that the image link must be secure.

There is another problem with banners: they are entered not only by technical specialists but also by other employees with varying levels of computer literacy. As a result, a banner may well bring non-SSL images, JS, and insecure counters onto the page.

To solve this, we went through all the banners once and replaced every insecure element with a secure one. That, however, does not guarantee that nothing will change in the future. So we reworked the banner system: we introduced an SSL-ready bit that is assigned per project. So far it is set only for Mail. The bit means that the project will not show banners containing insecure content. All banners that were shown in Mail also received the SSL-ready bit. For such banners, the admin panel rejects, at input time, any edit that replaces secure content with insecure content or adds insecure content. This completely eliminated the human factor when editing banners: if someone creates a new banner with insecure content, it cannot be shown in Mail, because it cannot be marked as secure. And if someone edits an existing banner that is already shown in Mail, the admin panel will not let them make it insecure.
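The admin-panel rule can be sketched as an input validator. Everything here is hypothetical (the `Project` type, the `validate_banner` function, the regex for "insecure content"); the article only states that a per-project SSL-ready bit exists and that edits introducing insecure content are refused for such projects.

```python
import re
from dataclasses import dataclass

# Hypothetical definition of "insecure": any absolute plain-HTTP URL in the markup.
INSECURE_URL = re.compile(r"""(?i)\bhttp://[^\s"'<>]+""")


@dataclass
class Project:
    name: str
    ssl_ready: bool  # the per-project SSL-ready bit


class UnsafeBannerError(ValueError):
    pass


def validate_banner(markup, projects):
    """Reject a banner edit that contains insecure URLs if the banner
    is targeted at any SSL-ready project; otherwise accept it."""
    insecure = INSECURE_URL.findall(markup)
    if not insecure:
        return None
    blocked = [p.name for p in projects if p.ssl_ready]
    if blocked:
        raise UnsafeBannerError(
            f"insecure URLs {insecure} are not allowed for SSL-ready projects {blocked}"
        )
    return None
```

Checking at input time, rather than when the page is rendered, mirrors the design described above: an unsafe banner never gets saved for an SSL-ready project, so no page ever has to decide what to do with one.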

Ubiquitous HTTPS

Next, we had to support HTTPS wherever possible. A lot of content has to be served over HTTPS, and it is heterogeneous: images, static files, avatars, attachments. Supporting SSL on the servers that serve this content is one thing; making the links correct and protocol-independent is another (HTTPS mode is optional, so a link must work whether or not the current protocol is secure). As a result, we had to change a significant number of templates that were not designed for this. We worked semi-automatically: first with scripts, then we cleaned everything up by hand. Naturally, the test plan changed significantly as well: our testers now check all of this, get it fixed, run automated tests, and so on.
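A scripted first pass over templates might look like the following sketch. It assumes a hypothetical allow-list of hosts known to serve both HTTP and HTTPS and rewrites their absolute `http://` links into protocol-relative `//host/...` links, which inherit the scheme of the current page. Links to any other host are left untouched for the manual cleanup step.

```python
import re

# Hypothetical hosts that already serve the same content over HTTP and HTTPS.
DUAL_PROTOCOL_HOSTS = ("img.example.ru", "static.example.ru")

# Matches src="http://host or href='http://host (case-insensitive).
_ABS_LINK = re.compile(
    r"""(?P<attr>\b(?:src|href)=["'])http://(?P<host>[^/"']+)""", re.IGNORECASE
)


def make_protocol_relative(template):
    """Rewrite http:// links to known dual-protocol hosts as //host links."""

    def repl(m):
        if m.group("host").lower() in DUAL_PROTOCOL_HOSTS:
            return f"{m.group('attr')}//{m.group('host')}"
        return m.group(0)  # unknown host: leave it for manual review

    return _ABS_LINK.sub(repl, template)
```

The allow-list matters: blindly rewriting every link would produce broken `https` requests to hosts that do not support SSL, which is why a manual pass still follows the script.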

In addition, we developed an image proxy. Most of the images users see in emails are hosted externally, and we are not yet able to make the entire Internet work over HTTPS. So we built a fast, lightweight, single-threaded proxy that routes all external links through itself and always hands the browser HTTPS content.

Here is how it works. The proxy takes an HTTP URL and requests it, getting back either a redirect or content. In the redirect case, the proxy follows the redirects up to a limit set in the config. If it eventually receives a redirect to an HTTPS URL, it passes that redirect on to the client, that is, the browser, which then loads the image directly from there. If the chain ends in content instead, we fetch the whole image ourselves, wrap it in SSL, and serve it to the browser. This way every image is shown securely, even if it comes from an insecure source. Note that we also added anti-bruteforce protection and safeguards against malicious use of the proxy.
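The redirect-handling decision described above can be sketched as a pure function. This is an assumption-laden illustration, not the actual proxy: the `fetch` callable, the `Response`/`ProxyResult` types and `MAX_REDIRECTS` (standing in for the config value) are hypothetical, and the anti-bruteforce and abuse protection are not modeled. Injecting `fetch` keeps the logic testable without network access.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urljoin, urlparse

MAX_REDIRECTS = 5  # hypothetical stand-in for the limit set in the config
REDIRECT_CODES = (301, 302, 303, 307, 308)


@dataclass
class Response:
    status: int
    location: Optional[str] = None
    body: bytes = b""


@dataclass
class ProxyResult:
    redirect_to: Optional[str] = None  # send the browser straight to this HTTPS URL
    body: Optional[bytes] = None       # or serve these bytes ourselves over HTTPS


def resolve(url: str, fetch: Callable[[str], Response]) -> ProxyResult:
    """Follow redirects from an HTTP image URL, up to MAX_REDIRECTS of them."""
    for _ in range(MAX_REDIRECTS + 1):
        resp = fetch(url)
        if resp.status in REDIRECT_CODES and resp.location:
            url = urljoin(url, resp.location)  # Location may be relative
            if urlparse(url).scheme == "https":
                return ProxyResult(redirect_to=url)  # browser can load it directly
            continue
        if resp.status == 200:
            return ProxyResult(body=resp.body)  # we re-serve the content over SSL
        raise ValueError(f"upstream returned {resp.status}")
    raise ValueError("too many redirects")
```

Handing back an HTTPS redirect whenever possible means the proxy only carries image bytes when no secure source exists, which keeps its own traffic down.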

We also had to move to SSL the part of the My World project from which Mail gets avatars, as well as the servers that host the logging system. The Web Agent and the servers that serve attachments and their previews were switched to SSL too.

Another point worth noting: we built smart SSL support for mobile devices. Not all mobile devices handle SSL correctly, so we developed a system that identifies the device type in server-side code and behaves according to the result.
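A server-side switch of this kind could be as simple as matching the User-Agent header. The sketch below is hypothetical: the regexes and the `default_scheme` function are illustrative, and only the fact that HTTPS is on for iPhone and iPad comes from the article. Real device detection is considerably messier than two regular expressions.

```python
import re

# Devices on which, per the article, HTTPS is enabled by default.
HTTPS_CAPABLE = re.compile(r"\b(iPhone|iPad)\b")
# Crude, hypothetical markers for "some other mobile device".
MOBILE = re.compile(r"Mobile|Android|Opera Mini|Windows Phone", re.IGNORECASE)


def default_scheme(user_agent: str) -> str:
    """Pick the default protocol for a client based on its User-Agent."""
    if HTTPS_CAPABLE.search(user_agent):
        return "https"
    if MOBILE.search(user_agent):
        return "http"  # keep plain HTTP until this device class is verified
    return "https"  # desktop browsers
```

The allow-list is checked first because iOS User-Agent strings also contain the generic `Mobile` token.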

Currently, HTTPS is enabled on the iPhone and iPad; other mobile devices still have some problems that we are working on. The day is not far off when Mail.Ru will run over HTTPS by default on all modern mobile devices.

Optimization

The last thing I want to talk about is optimization. We could, of course, have bought more hardware to cover the extra CPU time and memory, but that is not our way.

As usual, we optimized starting with the narrowest bottlenecks. What is the bottleneck for SSL? Establishing the connection, because every new connection requires a handshake. Once the handshake is complete, the client and server share data they can use to encrypt and decrypt traffic within that particular connection. Since the handshake is very expensive, every connection is expensive for the system. To reduce the number of connections, we set the keepalive timeout to 2 minutes (instead of one second, as before).
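A back-of-the-envelope model shows why the keepalive timeout matters so much. The sketch assumes one connection per client, requests served instantly, and a full handshake whenever the previous connection has been idle longer than the timeout; the request timestamps are made up for illustration and are not Mail.Ru data.

```python
def count_handshakes(request_times, keepalive_timeout):
    """Count full handshakes for one client whose requests arrive at the
    given times (seconds), assuming the server closes a connection after
    keepalive_timeout seconds of idleness."""
    handshakes = 0
    last = None
    for t in sorted(request_times):
        if last is None or t - last > keepalive_timeout:
            handshakes += 1  # connection was closed (or never opened): handshake again
        last = t
    return handshakes


# Hypothetical user clicking through the mailbox every 20-30 seconds.
clicks = [0, 20, 45, 70, 100]
print(count_handshakes(clicks, keepalive_timeout=1))    # 5
print(count_handshakes(clicks, keepalive_timeout=120))  # 1
```

In this toy case the longer timeout turns five handshakes into one; the cost is that idle connections are held open longer, which is part of the memory overhead mentioned earlier.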

Next, we set up an SSL cache to cut the cost of the handshake itself by reusing key data for a given IP address for a limited time.
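The idea behind such a cache can be sketched as a small time-limited map. Note one assumption: the article speaks of reuse per IP address, while standard TLS session resumption is keyed by the session ID the server issued, so this hypothetical `SessionCache` uses a session ID as the key. In nginx, the corresponding knobs are the `ssl_session_cache` and `ssl_session_timeout` directives. A cache hit lets the server resume a session with an abbreviated handshake instead of repeating the full key exchange.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SessionCache:
    """Minimal TTL cache for TLS session parameters (illustrative sketch)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[bytes, Tuple[float, bytes]] = {}

    def put(self, session_id: bytes, master_secret: bytes) -> None:
        # Remember the negotiated secret until now + ttl.
        self._entries[session_id] = (self.clock() + self.ttl, master_secret)

    def get(self, session_id: bytes) -> Optional[bytes]:
        entry = self._entries.get(session_id)
        if entry is None:
            return None  # unknown session: full handshake
        expires, secret = entry
        if self.clock() >= expires:
            del self._entries[session_id]  # expired: full handshake
            return None
        return secret  # hit: abbreviated handshake
```

Keeping the lifetime short ("not too long", as the article puts it) bounds both the memory the cache uses and how long a leaked secret stays useful.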

Finally, a small hack that also significantly improved performance: we made /dev/random a link to /dev/urandom. This is faster because /dev/random is blocking and /dev/urandom is not, so the I/O wait is shorter. And since the disks on our web servers are fairly heavily loaded (logging, incoming static files, saved attachments, etc.), additional I/O wait has a significant impact on the server as a whole.

Conclusion

SSL on a large, high-load project with many interacting components really is rocket science, and it is not as easy as it seems at first. It is not just a matter of deploying certificates. It is many, many seemingly small tasks. Something like assembling a jigsaw puzzle; the difference is that once a puzzle is assembled it will not fall apart, whereas in our case changing even one component can have unexpected consequences. That is why it is important to set aside time and resources for additional monitoring and for extensive testing, including automated tests, so that the system keeps pleasing users with the green padlock of a secure connection.

If you have any questions about implementing HTTPS under high load, ask them in the comments; I will be happy to answer.

Denis Anikin,
Technical Director, Mail.Ru Mail

Source: https://habr.com/ru/post/158603/
