
Using additional CPU instructions in one of the PHP tasks to speed up performance

When building large PHP projects, many developers run into performance problems, even on powerful servers. Even a small piece of code can significantly affect the whole project, both in terms of revenue and in terms of support and maintenance costs. Here I will share my experience with a non-standard approach to solving one such problem.



Over the course of a year we kept adding new functionality: more code, more modules, modules built on modules, and more tables with millions of records taking part in cross-table queries. The project grew quickly. The team of developers changed more than once, which, although not critical, still hurt the project and added unnecessary problems. In short, it was a fairly large project of the kind large companies tend to have.



When everything was already written, working and still being developed, when there was neither time nor budget to redo anything for the sake of performance, and we simply had to keep moving forward as fast as possible, I received another task. At first I treated it as a regular ticket: all personal user information (last name, address, telephone number, identification code) had to be stored in the database in encrypted form and be accessible only by querying with the decryption keys. Since this was my first serious experience with data encryption, I started searching Google for ways to solve the problem in PHP, and naturally came across the well-known mcrypt library. It did not take long to figure out how to work with it. The library worked, and the forums were full of examples, comments and discussions. It seemed an ideal option for my problem, especially given how little time there was.



As a result, I used code that is located right on the page describing the mcrypt_encrypt function:

http://us2.php.net/manual/en/function.mcrypt-encrypt.php

<?php
$iv_size = mcrypt_get_iv_size(MCRYPT_RIJNDAEL_256, MCRYPT_MODE_ECB);
$iv = mcrypt_create_iv($iv_size, MCRYPT_RAND);
$key = "This is a very secret key";
$text = "Meet me at 11 o'clock behind the monument.";
echo strlen($text) . "\n";
$crypttext = mcrypt_encrypt(MCRYPT_RIJNDAEL_256, $key, $text, MCRYPT_MODE_ECB, $iv);
echo strlen($crypttext) . "\n";
?>




Everything works well, except for one small BUT: the fifth parameter of mcrypt_encrypt, $iv (the initialization vector, IV), is out of place, since it is not used at all in ECB (Electronic Codebook) mode. I still wonder why this example is in the documentation at all: it is confusing.



Our engineering lead conducted a code review and made two well-reasoned comments:

  1. The IV is not used in ECB mode (as I wrote above). This concerns security, not performance.
  2. mcrypt is too heavy and slow to be called on every page load; it is better to find the pieces of code that really need this data and decrypt it only there.


The first is not a problem: a little more googling immediately leads to CBC (Cipher Block Chaining) mode. The second is harder: it means searching through all the modules, because users' names appear on almost every page of the site. That is too much, I thought, given the deadline and the risks; after all, everything would still have to pass QA.



One evening I was discussing everyday work problems over a beer with a friend who is far from PHP and "these" problems but very experienced in low-level programming and C++. It turned out to be not only a pleasant pastime but also very useful for work.

He revealed a secret to me (in fact it was a secret only to me; to C++ programmers it is, of course, obvious): with certain processor instructions you can make computational tasks, including data encryption, 10 times faster. Newer Intel processors support instructions for accelerating encryption and decryption, the Advanced Encryption Standard (AES) instruction set. Fortunately, as it turned out, our project runs on servers with Intel Xeon E5645 processors, which already provide these instructions (AES New Instructions, AES-NI).



But how to use all this in PHP?


We will write our own PHP module that takes a value from PHP and encrypts or decrypts it using the capabilities of the processor. After several sleepless nights spent comparing performance results and working out the overall concept (what the module should do, where and how to store the vector alongside the data, since it is needed for decryption), the following emerged.

The PHP module consists of two parts:

  1. Botan ( http://botan.randombit.net/ ) is an open-source library written in C++ that implements many encryption algorithms, including the AES-256 we need, and can use AES-NI.

  2. libaecrypt, our own part, is an adapter that exposes the C++ interface of the Botan library as a C interface (functions, not classes) that can be called from the module's main C file.


In the module, we implemented three functions:

  1. Random key generator: returns N bytes of random data that can be used as a key or a vector.
  2. Encryption
  3. Decryption


Encryption / decryption - uses as parameters:





The algorithm looks like this: a random IV is generated; the data is encrypted using the data key and the IV; the IV itself is then encrypted using the vector key. The encrypted vector is appended to the encrypted data with the separator # and stored in the database. Decryption runs in the reverse order.
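The storage layout described above can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions: the article does not say how the binary ciphertext is made safe to join with #, so the sketch hex-encodes both parts. The names to_hex, from_hex, pack_record and unpack_record are hypothetical, and the encryption calls themselves are left out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hex-encode raw bytes so that the '#' separator can never occur inside a part.
// (Assumption: the article does not say which encoding the real module used.)
std::string to_hex(const std::string& raw) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(raw.size() * 2);
    for (unsigned char c : raw) {
        out.push_back(digits[c >> 4]);
        out.push_back(digits[c & 0x0F]);
    }
    return out;
}

// Convert one hex digit to its numeric value, rejecting anything else.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Decode a hex string back into raw bytes.
std::string from_hex(const std::string& hex) {
    if (hex.size() % 2 != 0) throw std::invalid_argument("odd hex length");
    std::string out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        out.push_back(static_cast<char>(hex_value(hex[i]) * 16 + hex_value(hex[i + 1])));
    }
    return out;
}

// Value stored in the database: hex(encrypted data) + '#' + hex(encrypted IV).
std::string pack_record(const std::string& enc_data, const std::string& enc_iv) {
    return to_hex(enc_data) + "#" + to_hex(enc_iv);
}

// Reverse step: split the stored value into {encrypted data, encrypted IV}.
std::pair<std::string, std::string> unpack_record(const std::string& stored) {
    const std::size_t sep = stored.find('#');
    if (sep == std::string::npos) throw std::invalid_argument("missing '#' separator");
    return { from_hex(stored.substr(0, sep)), from_hex(stored.substr(sep + 1)) };
}
```

On decryption, the second part would first be decrypted with the vector key to recover the IV, and the first part would then be decrypted with the data key and that IV.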



The main feature of Botan:


I do not think it makes sense to post the full code listings in the article, because there are a lot of them, so I will only describe the module initialization, which is the most interesting part.

The Botan library includes an auxiliary tool for detecting the processor and its instruction sets (botan/cpuid.h). To speed up encryption and decryption, the module checks whether the processor has AES-NI and, if not, whether it has SSSE3.

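int Init()
{
    // Initialize the Botan library; the object must stay alive while Botan is in use
    pInitObj = new Botan::LibraryInitializer();

    // Detect the processor and its supported instruction sets
    CPUID::initialize();

    // Prefer the fastest AES-256 implementation available:
    // AES-NI first, then the SSSE3-based SIMD one, otherwise the portable core
    if(CPUID::has_aes_ni())
        global_state().algorithm_factory().set_preferred_provider("AES-256", "aes_isa");
    else if(CPUID::has_ssse3())
        global_state().algorithm_factory().set_preferred_provider("AES-256", "simd");
    else
        global_state().algorithm_factory().set_preferred_provider("AES-256", "core");

    return 1;
}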
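For readers without Botan at hand, the same decision order can be sketched as a small standalone C++ fragment. This is not the module's code: choose_provider, cpu_has_aes_ni, cpu_has_ssse3 and detect_provider are hypothetical names, and the detection relies on the __builtin_cpu_supports builtin, which is an assumption that only holds for GCC or Clang on x86. Elsewhere the sketch simply falls back to the portable choice.

```cpp
#include <string>

// Pure decision logic, mirroring the order of checks in Init():
// AES-NI first, then SSSE3, then the portable implementation.
std::string choose_provider(bool has_aes_ni, bool has_ssse3) {
    if (has_aes_ni) return "aes_isa";
    if (has_ssse3)  return "simd";
    return "core";
}

// Feature detection without Botan (assumption: GCC or Clang on x86).
// On other compilers or architectures these report "not available".
bool cpu_has_aes_ni() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("aes") != 0;
#else
    return false;
#endif
}

bool cpu_has_ssse3() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("ssse3") != 0;
#else
    return false;
#endif
}

// Combine detection and decision into the provider name for this machine.
std::string detect_provider() {
    return choose_provider(cpu_has_aes_ni(), cpu_has_ssse3());
}
```

Keeping the decision separate from the detection makes the fallback order easy to check on a machine that lacks the instructions.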





As a result, the ab (Apache Benchmark) load testing tool showed the difference between the mcrypt implementation of the same algorithm and our module: approximately 600 requests/second versus 1400 requests/second, in favor of our module.



Conclusion:




OpenSSL, which PHP also ships with, has been able to use AES-NI (and SSSE3) instructions since its version 1.0.1, released on March 14, 2012 (after all our torments). In terms of performance, the same algorithm written in PHP with OpenSSL falls behind our module by only about 200 requests per second (OpenSSL from version 1.0.1 is on the "Software supporting AES instruction set" list).

Personally, I will use OpenSSL instead of mcrypt in the future. Besides being slower, mcrypt in this configuration requires a 32-byte initialization vector, which is not standard: MCRYPT_RIJNDAEL_256 is Rijndael with a 256-bit block, whereas AES always uses a 128-bit block, so OpenSSL, Botan and, as far as I understand, many other libraries implementing AES-256 in CBC mode accept a 16-byte IV. Data encrypted with mcrypt in this way can only be decrypted with mcrypt.
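The size mismatch mentioned above can be made concrete with a little arithmetic. The C++ sketch below is only an illustration with hypothetical names (kAesBlockBytes, kRijndael256BlockBytes, cbc_iv_bytes, zero_padded_len, pkcs7_padded_len). It relies on three facts: a CBC IV is one cipher block long, mcrypt pads the last block with zero bytes, and OpenSSL pads with PKCS#7 by default.

```cpp
#include <cstddef>

// Block sizes in bytes. AES is Rijndael restricted to a 128-bit block;
// MCRYPT_RIJNDAEL_256 selects the 256-bit-block variant, which is not AES.
constexpr std::size_t kAesBlockBytes = 16;
constexpr std::size_t kRijndael256BlockBytes = 32;

// In CBC mode the IV is exactly one cipher block long.
constexpr std::size_t cbc_iv_bytes(std::size_t block_bytes) {
    return block_bytes;
}

// Zero padding (mcrypt): round up to a whole number of blocks;
// nothing is added when the input is already block-aligned.
constexpr std::size_t zero_padded_len(std::size_t n, std::size_t block) {
    return (n + block - 1) / block * block;
}

// PKCS#7 padding (OpenSSL's default): always adds between 1 and `block` bytes.
constexpr std::size_t pkcs7_padded_len(std::size_t n, std::size_t block) {
    return (n / block + 1) * block;
}
```

For a 42-byte input, zero_padded_len(42, kRijndael256BlockBytes) gives 64 and pkcs7_padded_len(42, kAesBlockBytes) gives 48, so even the ciphertext lengths differ between the two setups, in addition to the 32-byte versus 16-byte IV.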



UPD1: As for the code samples and the link to my module: the problem is that I signed a contract that does not allow me to publish the project's source code, because this may affect security (we are talking about 100 thousand US users). But I will try to post a modified version of the module today so that it can be viewed without violating the terms of the contract.



UPD2: I was surprised by the sharp negativity and the downvotes to my karma, so I want to say this: I wanted to share my experience and tell you that if you work in PHP and deal with encryption, mcrypt is not the best choice, since this library has performance problems. PHP also ships with OpenSSL, which since version 1.0.1 (as I wrote above) uses processor instructions, runs much faster and handles data encryption perfectly well. After the release of the new OpenSSL version our self-written module no longer matters, but unfortunately we wrote it before that release, and again, we had too little time.



UPD3: Once again, please note that the 25k page views and the performance problems of our project are not the main point. Please focus on the main conclusion from my experience: using AES-NI (processor instructions for speeding up encryption) and OpenSSL vs. mcrypt. Thanks to everyone who commented and shared their opinion. I will try to rewrite the article as soon as possible to pay more attention to AES-NI, OpenSSL vs. mcrypt and how to write a module for PHP.

Source: https://habr.com/ru/post/142823/


