
Protecting the Scrypt Encryption Utility with the Intel® Tamper Protection Toolkit

In this article, we show how the Intel Tamper Protection Toolkit helps protect critical code sections and valuable data in the Scrypt encryption utility against static and dynamic reverse engineering and against tampering. Scrypt is a modern, secure password-based key derivation function that is widely used in practice. However, an attacker who can falsify the parameters of the scrypt function can weaken the protection of user passwords. The toolkit helps reduce this threat. We define a threat model for the application and explain how to refactor it for protection, taking into account the requirements that the Tamper Protection tools impose on the code.
The main purpose of this article is to demonstrate how the Intel Tamper Protection Toolkit protects critical code sections and valuable data in real-world applications. The toolkit counteracts static and dynamic reverse engineering through obfuscation, and it prevents changes to the protected application by checking its integrity at run time.

Here we consider only one component of the toolkit, the obfuscator iprot, and apply it to version 1.1.6 of the Scrypt encryption utility. The utility is a simple implementation of password-based key derivation with the scrypt function. We chose it for several reasons. First, its code contains features that applications use frequently: file reads and writes, memory allocation, cryptographic functions, and system calls. Second, it relies on nontrivial mathematics. Third, the utility is quite compact, yet it exposes a wide range of problems that developers may face in practice when protecting their own applications. Finally, scrypt is a modern and secure key derivation function that is actively used in practice; for example, Android 4.4 includes disk encryption based on scrypt.

Code obfuscation


Consider the source code of the function sensitive, shown in the listing below, and compile it into a dynamic library.

    #define MODIFIER (0xF00D)

    int __declspec(dllexport) sensitive(const int value)
    {
        int result = 0;
        int i;
        for (i = 0; i < value; i++) {
            result += MODIFIER;
        }
        return result;
    }

Sensitive source code
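To experiment with the function outside a DLL, a portable variant can be compiled on any platform. The sketch below is only a test aid and is not part of the original utility: it drops the Windows-only __declspec(dllexport) attribute and otherwise keeps the loop unchanged.

```c
#include <stddef.h>

#define MODIFIER (0xF00D)

/* Same loop as the listing above, without the Windows-only export attribute. */
int sensitive(const int value)
{
    int result = 0;
    int i;
    /* For non-negative inputs the loop computes value * MODIFIER. */
    for (i = 0; i < value; i++) {
        result += MODIFIER;
    }
    return result;
}
```

For example, sensitive(3) returns 3 * 0xF00D = 184359, so changing MODIFIER in the binary changes every result the function produces.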
We reverse engineer the resulting library with IDA Pro. The figure shows the control flow of the code together with its logic and the data used in the calculation. An attacker can therefore easily find the value of MODIFIER in the code and change it.


Execution order in disassembled code

Code obfuscation techniques help hide implementation details, complicate reverse engineering, and prevent code changes. Obfuscation transforms code into a functionally equivalent form whose logic and data are difficult to reverse engineer and understand. It is used to prevent the theft or modification of secret data embedded in the code and to protect the developer's intellectual property.

Intel Tamper Protection Toolkit


Intel Tamper Protection Toolkit is a product that obfuscates code and checks application integrity at run time for executables on Microsoft Windows* and Android*. With the Tamper Protection Toolkit, you can protect valuable code and data in an application from static and dynamic reverse engineering and from changes. Executables protected by the toolkit do not require special loaders or additional software and can run on any Intel processor.
The Intel Tamper Protection Toolkit Beta can be downloaded here.
In this article, we use the following Intel Tamper Protection Toolkit components to obfuscate critical code sections and protect the encryption utility from possible attacks:

The obfuscator takes a dynamic library (.dll) and a list of exported functions as input and produces a dynamic library in which those functions are obfuscated. Starting from the addresses of the exported functions, it parses the code of the input library and converts it into an internal representation. Branches, jumps, and calls that are reachable from these functions are also analyzed and transformed. For code to be obfuscatable, it must satisfy several restrictions: it must not contain low-level memory manipulation, calls to external functions that the obfuscator cannot reach, indirect jumps, or global variables.
In this article, we describe the pitfalls we encountered when changing the code to make it obfuscatable, and how to avoid them.
We obfuscate the dynamic library from the previous section using iprot:
 iprot sensitive.dll sensitive -o sensitive_obf.dll 

Let's try to reverse engineer the obfuscated code using IDA Pro.

    sensitive PROC NEAR
            jmp     ?_001
    ?_001:  push    ebp
            push    eax
            call    ?_002
    ?_002   LABEL NEAR
            pop     eax
            lea     eax, [eax+0FECH]
            mov     dword ptr [eax], 608469404
            mov     dword ptr [eax+4H], 2308
            mov     dword ptr [eax+8H], -443981824
            mov     dword ptr [eax+0CH], 1633409
            mov     dword ptr [eax+10H], -477560832
            mov     dword ptr [eax+14H], 15484359
            mov     dword ptr [eax+18H], -1929379840
            mov     dword ptr [eax+1CH], -1048448
            <….>

Disassembled Obfuscated Code

The obfuscated code clearly differs from the original. IDA Pro could not display a control-flow graph for the obfuscated code, and the MODIFIER value no longer appears in the listing. In addition, the obfuscated code is protected from static and dynamic modification.

Password-based key derivation functions


Password-based key derivation functions (PBKDF) convert a password entered by a user into a key (a block of binary data) that can be used in cryptographic algorithms. A PBKDF is a very important component of application protection, because a user-entered password has too little entropy to be used safely in a cryptographic algorithm directly. These functions are widely used to protect applications; for example, PGP systems derive cryptographic keys from passwords to encrypt and decrypt data on disk. Operating systems also use these functions to verify user passwords (authentication).

In general, a PBKDF can be written as
y = F(P, S, d, t_1, ..., t_n)
where y is the key generated by the function, P is the password, S is the salt, d is the length of the generated key, and t_1, ..., t_n are parameters that determine the amount of hardware resources, such as processor time and RAM, required to compute the function. The salt S makes it possible to derive different keys from the same password. The parameters t_1, ..., t_n control the hardware resources consumed by the computation. They can be tuned to make the function more expensive to compute and to strengthen protection against brute-force attacks that are parallelized in hardware, for example on an ordinary GPU.


PBKDF Usage Scheme

There are two ways to recover a user password:
  1. The attacker recovers the password from a derived key that has leaked;
  2. The attacker recovers the password from data encrypted or signed with the key.

In the first case, the Intel Tamper Protection Toolkit helps prevent key leakage by hiding the code that generates the key and then uses it.
The Intel Tamper Protection Toolkit cannot prevent the second case, but it helps ensure that an attacker has not changed the key derivation parameters to unsafe values.
Here are examples of password-based key derivation functions used in practice:

The most modern and secure of these functions is scrypt, developed by Colin Percival. It has the following form:

y = F(P, S, d, N, r, p),

where y is the key generated by the function, d is the length of the generated key, P is the user password, S is the salt, and p, r, and N are parameters that set the processor time and the amount of RAM required to generate the key. The values of N, r, p, and d can be public, and they are usually stored with the key or with the encrypted data.

Depending on the values of N, r, and p, generating the same key can require different amounts of processor time and memory. For example, if the parameters demand about 100 ms and about 20 MB, a brute-force attack on an ordinary GPU is much less effective against scrypt than against PBKDF2, which needs little RAM and therefore allows many passwords to be tried in parallel on the GPU.

Scrypt encryption utility


The Scrypt encryption utility processes input files with the AES algorithm in CTR mode, using a key that the scrypt function derives from the user password. It accepts required and optional parameters.
Required parameters:

Optional parameters:

For example, running the utility with the command

scrypt enc infile -t 0.1 -M 20971520

will require 100 ms of processor time and 20 MB of RAM to generate a key. Such parameter values make it hard to parallelize a brute-force attack.
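To make the relation between the -M limit and the scrypt parameters concrete, here is a minimal sketch that picks the largest power-of-two N whose main working array fits in a memory budget. It assumes the standard scrypt memory cost of about 128 * r * N bytes and ignores the CPU-time limit (-t), so it is a simplification and not the utility's actual parameter selection. The function name pick_log2_N is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return log2 of the largest power-of-two N such that
 * 128 * r * N <= memlimit. Returns 0 when no N >= 2 fits (or when r is 0).
 */
int pick_log2_N(uint64_t memlimit, uint32_t r)
{
    uint64_t max_N;
    int log2_N = 0;

    if (r == 0) {
        return 0;
    }
    /* Largest N (not yet rounded to a power of two) that fits in the budget. */
    max_N = memlimit / (128u * (uint64_t)r);

    /* Grow the exponent while the next power of two still fits. */
    while (log2_N < 62 && ((uint64_t)1 << (log2_N + 1)) <= max_N) {
        log2_N++;
    }
    return log2_N;
}
```

With the 20971520-byte budget from the command above and r = 8, the function returns 14, that is N = 16384, which needs about 16 MiB; the next power of two would need 32 MiB and no longer fits.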
The figure below shows how the Scrypt utility works when the user has supplied the name of the input file to encrypt, the password, and the parameters that define the required hardware resources.
The utility performs the following steps when encrypting:
  1. Collect and convert parameters. The program takes the limits on processor time and RAM for key generation and converts them into the parameters accepted by the scrypt function.
  2. Generate the key. The scrypt function generates a 64-byte key from the user password and the N, r, p parameters calculated in the previous step. The lower 32 bytes of the key, dk_1, are used to calculate the authentication codes for the parameters N, r, p, and salt and for the encrypted data. This makes it possible, during decryption, to check that the entered password is correct and that the encrypted data is intact. The upper 32 bytes of the key, dk_2, are used to encrypt the input file with the AES algorithm in CTR mode.
  3. Calculate the authentication code for the scrypt parameters. In this step, an authentication code is calculated for the N, r, p, and salt parameters used to generate the key.
  4. Encrypt with OpenSSL AES in CTR mode. The input is encrypted with AES in CTR mode, using the 32-byte key dk_2.
  5. Calculate the authentication code for the encrypted data. Finally, an authentication code is computed over the encrypted data, using dk_1, to ensure integrity. The output file contains the encrypted data, the N, r, p parameters, the salt used in encryption, and the authentication codes that ensure the integrity of the encrypted data and the parameters.
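The key split in step 2 can be sketched as follows. The sketch assumes that "lower" means bytes 0..31 of the derived key and "upper" means bytes 32..63; the type and function names are hypothetical and only illustrate how one 64-byte derived key yields two independent 32-byte keys.

```c
#include <stdint.h>
#include <string.h>

#define DK_LEN   64  /* length of the derived key in bytes */
#define HALF_LEN 32  /* length of each half */

typedef struct {
    uint8_t dk1[HALF_LEN]; /* key for the authentication codes */
    uint8_t dk2[HALF_LEN]; /* key for AES in CTR mode */
} split_key;

/* Assumption: dk1 is bytes 0..31 and dk2 is bytes 32..63 of the derived key. */
void split_derived_key(const uint8_t dk[DK_LEN], split_key *out)
{
    memcpy(out->dk1, dk, HALF_LEN);
    memcpy(out->dk2, dk + HALF_LEN, HALF_LEN);
}
```

Keeping the two halves separate means that the key used to authenticate the header and the data is never the key used for encryption.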



Scrypt encryption scheme

Possible threats


By analyzing how the utility works in encryption mode, we define the threat model. The values of N, r, p, the salt, and the generated key, which appear in the intermediate steps, are critical data and must be protected against modification at run time. For example, in a debugger an attacker can set different values of N, r, and p in order to weaken the key's resistance to brute-force attacks.

The figure below illustrates decryption, where the user supplies a password and the name of an input file that contains the ciphertext, the parameters N, r, p, the salt, and the authentication codes.
The utility performs the following steps when decrypting:
  1. Read the scrypt parameters. The input file contains the encrypted data, the authentication codes hmac_1 and hmac_2, and the parameters N, r, p, and salt used for encryption. In this step, the parameters are read from the input file and passed to the key derivation function.
  2. Generate the key. The scrypt function generates a key from the password and the N, r, p, and salt parameters obtained in the previous step. The lower 32 bytes and the upper 32 bytes of this key are labeled dk_1 and dk_2 in the figure.
  3. Check the integrity of the parameters and the password. The integrity of N, r, p, and salt and the correctness of the password are verified with an authentication code. The utility calculates the authentication code for N, r, p, and salt using dk_1 and compares the result with hmac_1. If they match, the password is correct.
  4. Check the integrity of the encrypted data. To verify that the encrypted data has not been changed, an authentication code for the data is calculated using dk_1 and compared with hmac_2. If they match, the data is intact and can be decrypted in the next step.
  5. Decrypt with OpenSSL AES in CTR mode. Finally, the data is decrypted with AES in CTR mode, using the 32-byte key dk_2. The output file contains the decrypted data.



Scrypt decryption scheme

Porting the utility to Windows


Our goal is to protect the Scrypt encryption utility on Windows using the Tamper Protection Toolkit. The original utility is written for Linux, so the first task is to port it to Windows.
Platform-dependent code is placed between the following conditional directives:

    #if defined(WIN_TP)
    // code for Windows
    #else
    // code for Linux
    #endif // defined(WIN_TP)


The WIN_TP preprocessor symbol marks code intended for Windows. WIN_TP must be defined when building for Windows; otherwise the Linux code is selected.
We use the Microsoft* Visual Studio 2013 development environment to build and debug the utility. Windows and Linux differ in areas such as process, thread, memory, and file management, service infrastructure, user interfaces, and so on. We had to take all these differences into account when porting the utility. We describe them below.
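As an illustration of the WIN_TP split, the sketch below wraps one platform difference, reading a monotonic clock, behind a single function. The helper name tp_now_ms is hypothetical and the choice of timing as the example is an assumption; the article does not show which functions were actually wrapped this way.

```c
#if !defined(WIN_TP) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime in strict C modes */
#endif

#include <stdint.h>

#if defined(WIN_TP)
#include <windows.h>
#else
#include <time.h>
#endif

/* Hypothetical helper: monotonic time in milliseconds, 0 on failure. */
uint64_t tp_now_ms(void)
{
#if defined(WIN_TP)
    /* Windows: milliseconds elapsed since the system was started. */
    return (uint64_t)GetTickCount64();
#else
    /* Linux: POSIX monotonic clock. */
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return 0;
    }
    return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
#endif
}
```

Callers use tp_now_ms() on both platforms, so the #if blocks stay confined to one small function instead of spreading through the utility.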


Utility Protection with Intel Tamper Protection Toolkit


We now refactor the utility code to protect all the important data identified in our threat model. This data is protected by obfuscating the code with iprot, the obfuscating compiler from the toolkit. We keep the effort proportionate and obfuscate only the functions that create, process, and use important data.
As described earlier, the obfuscator accepts a dynamic library as input and generates a binary file that contains only the protected functions specified on the command line. We therefore place all functions that work with important data inside a dynamic library that will later be obfuscated. The remaining functions, such as command-line parsing and password reading, stay unprotected in the main executable.

The new structure of the protected utility is shown in the figure below. The utility is divided into two parts: the main executable and the dynamic library that will be obfuscated. The main executable is responsible for parsing command-line arguments, reading the password, and loading the input file into memory. The dynamic library contains exported functions, such as scryptenc_file and scryptdec_file, that work with important data (N, r, p, salt).

The key data structure used in the dynamic library is called the scrypt context. It holds the parameters of the scrypt function (N, r, p, and salt) together with the HMAC used to verify them. Trusted functions, namely scrypt_ctx_enc_init, scrypt_ctx_dec_init, scryptenc_file, and scryptdec_file, which were added during refactoring, use this HMAC to check the integrity of the protected parameters. These trusted functions resist modification because we deliberately obfuscate them with the tool. The two new functions scrypt_ctx_enc_init and scrypt_ctx_dec_init initialize the scrypt context for encryption and decryption, respectively.


Architecture of the protected utility Scrypt

The figure shows how the utility works in each of its modes, encryption and decryption. We describe each in detail.
Encryption:
  1. The utility uses the getopt() function to parse the command-line arguments. The list of arguments is given above.
  2. The input file for encryption/decryption is read into memory.
  3. The main executable calls scrypt_ctx_enc_init to initialize the scrypt context and compute the scrypt parameters (N, r, p) from the limits maxmem, maxmemfrac, and maxtime. An HMAC is computed over the context so that later changes to it can be detected. The function returns the amount of memory required to compute scrypt.
  4. The main executable allocates the requested memory.
  5. The main executable calls scrypt_ctx_enc_init a second time. The function verifies the scrypt context against its HMAC. If the check passes, it stores the addresses of the allocated memory in the context and recomputes the HMAC.
  6. The exported encryption function scryptenc_file is called with the entered password. The function checks the integrity of the scrypt context, which holds the parameters (N, r, p, and salt) used to generate the key. If the check passes, the scrypt algorithm is called to generate the key, which is then used for encryption. The exported function produces the same output as the original function of the scrypt utility, so the output carries the same hash values used to verify the integrity of the encrypted data and the correctness of the password during decryption.

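Decryption:
  1. The utility uses the getopt() function to parse the command-line arguments.
  2. The input file for encryption/decryption is read into memory.
  3. The main executable calls scrypt_ctx_dec_init to initialize the scrypt context for decryption. The function returns the amount of memory required to compute scrypt.
  4. The main executable allocates the requested memory.
  5. The main executable calls scrypt_ctx_dec_init a second time. The function verifies the context and stores the addresses of the allocated memory in it.
  6. The exported decryption function scryptdec_file is called with the entered password. The function checks the integrity of the scrypt context with the parameters (N, r, p) used to generate the key. If the check passes, the scrypt algorithm is called to generate the key, which is then used to verify and decrypt the data.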

In the protected utility, we replace the OpenSSL implementations of AES in CTR mode and of the authentication code with equivalent functions from the Intel Tamper Protection Toolkit crypto library. Unlike OpenSSL, the crypto library satisfies all the restrictions on source code, so it can be obfuscated with iprot and used from obfuscated code without modification. AES is called inside the scryptenc_file and scryptdec_file functions to encrypt and decrypt the input file with the key derived from the password. The authentication code function is called in the exported functions (scrypt_ctx_enc_init, scrypt_ctx_dec_init, scryptenc_file, and scryptdec_file) to check the integrity of the scrypt context before its data is used. In the protected utility, all exported functions of the dynamic library are obfuscated with iprot.

Tamper Protection helps us achieve the goal of reducing the threats. Our solution is a reworked utility with a dynamic library obfuscated by iprot. The solution resists the attacks defined earlier, and the argument is as follows: the scrypt context can be updated only through the exported functions, because only they contain the HMAC key needed to recompute the HMAC value in the context. These functions and the HMAC verification data are themselves protected from modification and reverse engineering by the obfuscator. Other important data, such as the key generated by the scrypt function, is also protected, because key generation takes place inside the obfuscated exported functions scryptenc_file and scryptdec_file. The obfuscating compiler iprot generates code that modifies itself at run time and is protected from tampering and debugging.

Consider how the scrypt_ctx_enc_init function protects the scrypt context. The main executable uses the buf_p pointer to indicate which call to scrypt_ctx_enc_init is being made. If the pointer is NULL, the function is being called for the first time; otherwise it is the second call. During the first call, the scrypt parameters are initialized, the HMAC is calculated, and the amount of memory required to compute the scrypt function is returned. The following code illustrates this.

    // First call: initialize the parameters and report the memory needed by scrypt
    if (buf_p == NULL) {
        // Initialize the scrypt parameters
        // <...>
        // Compute the HMAC
        itp_res = itpHMACSHA256Message((unsigned char *)ctx_p,
            sizeof(scrypt_ctx)-sizeof(ctx_p->hmac),
            hmac_key, sizeof(hmac_key),
            ctx_p->hmac, sizeof(ctx_p->hmac));
        *buf_size_p = (r << 7) * (p + (uint32_t)N) + (r << 8) + 253;
    }
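The buffer-size expression from the listing can be checked in isolation. The sketch below wraps it in a hypothetical function, scrypt_buf_size, and widens the operands to size_t so that larger parameter values are less likely to overflow 32-bit arithmetic; apart from that widening, the expression is the one shown above.

```c
#include <stddef.h>
#include <stdint.h>

/* Buffer size in bytes: (r << 7) * (p + N) + (r << 8) + 253. */
size_t scrypt_buf_size(uint64_t N, uint32_t r, uint32_t p)
{
    return ((size_t)r << 7) * ((size_t)p + (size_t)N)
         + ((size_t)r << 8)
         + 253;
}
```

For N = 16384, r = 8, and p = 1 the function returns 16780541 bytes, slightly more than 16 MiB; the 128 * r * N term dominates the total.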


During the second call, buf_p points to the allocated memory passed to scrypt_ctx_enc_init. The function uses the HMAC value to check the integrity of the context and so confirm that no data was changed between the first and second calls. It then initializes the addresses inside the context from the buf_p pointer and recalculates the HMAC for the updated context. The code executed on the second call is shown below.

    // Second call: store the memory addresses in the scrypt context
    if (buf_p != NULL) {
        // Verify the HMAC
        itp_res = itpHMACSHA256Message(
            (unsigned char *)ctx_p,
            sizeof(scrypt_ctx)-sizeof(ctx_p->hmac),
            hmac_key, sizeof(hmac_key),
            hmac_value, sizeof(hmac_value));
        if (memcmp(hmac_value, ctx_p->hmac, sizeof(hmac_value)) != 0) {
            return -1;
        }
        // Initialize the addresses inside the scrypt context:
        // ctx_p->addrs.B0 = …
        // Recompute the HMAC
        itp_res = itpHMACSHA256Message(
            (unsigned char *)ctx_p,
            sizeof(scrypt_ctx)-sizeof(ctx_p->hmac),
            hmac_key, sizeof(hmac_key),
            ctx_p->hmac, sizeof(ctx_p->hmac));
    }
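The two-call pattern can be tried out with a self-contained sketch. Everything in it is an assumption made for illustration: demo_ctx is a simplified stand-in for the real scrypt context, whose layout the article does not show, and toy_mac is a keyed FNV-1a checksum that only plays the role of itpHMACSHA256Message. It is not a secure MAC and must not be used for real protection.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAG_LEN 8

/* Simplified, hypothetical stand-in for the scrypt context. */
typedef struct {
    uint64_t N;
    uint32_t r;
    uint32_t p;
    uint8_t *buf;           /* working memory supplied by the caller */
    uint8_t  tag[TAG_LEN];  /* integrity tag over all fields above */
} demo_ctx;

/* Toy keyed checksum (FNV-1a). NOT a secure MAC; placeholder for HMAC-SHA256. */
static void toy_mac(const uint8_t *data, size_t len, uint64_t key,
                    uint8_t out[TAG_LEN])
{
    uint64_t h = 14695981039346656037ULL ^ key;
    size_t i;
    for (i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    memcpy(out, &h, TAG_LEN);
}

/* Two-call initializer: buf == NULL on the first call, non-NULL on the second. */
int demo_ctx_init(demo_ctx *ctx, uint8_t *buf, size_t *buf_size, uint64_t key)
{
    const size_t covered = offsetof(demo_ctx, tag); /* bytes protected by the tag */
    uint8_t check[TAG_LEN];

    if (buf == NULL) {
        /* First call: fix the parameters, tag the context, report the size. */
        memset(ctx, 0, sizeof(*ctx));
        ctx->N = 16384;
        ctx->r = 8;
        ctx->p = 1;
        toy_mac((const uint8_t *)ctx, covered, key, ctx->tag);
        *buf_size = ((size_t)ctx->r << 7) * ((size_t)ctx->p + (size_t)ctx->N)
                  + ((size_t)ctx->r << 8) + 253;
        return 0;
    }
    /* Second call: reject the context if anything changed since the first call. */
    toy_mac((const uint8_t *)ctx, covered, key, check);
    if (memcmp(check, ctx->tag, TAG_LEN) != 0) {
        return -1;
    }
    ctx->buf = buf;
    toy_mac((const uint8_t *)ctx, covered, key, ctx->tag); /* re-tag updated context */
    return 0;
}
```

If a caller lowers ctx->r between the two calls, the second call returns -1 because the recomputed tag no longer matches. The sketch measures the protected range with offsetof rather than a difference of sizeof values, so that any trailing padding in the structure cannot shift the range into the tag itself.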


As noted earlier, the obfuscator imposes restrictions on the source code: to be obfuscatable, the code must contain no relocations and no indirect jumps. C constructs that involve global variables, system calls, or standard C library functions can generate relocations and indirect jumps. The code above calls one standard C function, memcmp, which makes it impossible to obfuscate with iprot. For this reason, we implemented our own versions of the standard C functions used in the utility, such as memcmp, memset, and memmove. We also replaced all global variables in the dynamic library with local ones and made sure that the data is initialized on the stack.
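A minimal sketch of such replacements is shown below. The tp_ names are hypothetical, and the article does not show the implementations that were actually used. The functions access memory only through byte pointers and call nothing external; the volatile qualifiers are there to make it less likely that an optimizing compiler turns the loops back into library calls.

```c
#include <stddef.h>

/* Compare n bytes; returns <0, 0, or >0 like the standard memcmp. */
int tp_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    size_t i;
    for (i = 0; i < n; i++) {
        if (pa[i] != pb[i]) {
            return (pa[i] < pb[i]) ? -1 : 1;
        }
    }
    return 0;
}

/* Fill n bytes with the value c. */
void *tp_memset(void *dst, int c, size_t n)
{
    volatile unsigned char *p = (volatile unsigned char *)dst;
    size_t i;
    for (i = 0; i < n; i++) {
        p[i] = (unsigned char)c;
    }
    return dst;
}

/* Copy n bytes; safe for overlapping regions. */
void *tp_memmove(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = (volatile unsigned char *)dst;
    const volatile unsigned char *s = (const volatile unsigned char *)src;
    size_t i;
    if (d < s) {
        for (i = 0; i < n; i++) {      /* copy forward */
            d[i] = s[i];
        }
    } else if (d > s) {
        for (i = n; i > 0; i--) {      /* copy backward */
            d[i - 1] = s[i - 1];
        }
    }
    return dst;
}
```

Inside the library, calls such as memcmp(hmac_value, ctx_p->hmac, sizeof(hmac_value)) would then be replaced by the corresponding tp_ function.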

We also ran into a problem with obfuscating code that contains double values, which is not described in the tool's documentation. For example, the code below shows that the pickparams function uses a double with the value 32768 when bounding the number of salsa20/8 operations. This value is not initialized on the stack; the compiler places it in the data segment of the executable, which generates a relocation in the code.

    double opslimit;
    #if defined(WIN_TP)
        // unsigned char d_32768[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xE0, 0x40};
        unsigned char d_32768[sizeof(double)];
        d_32768[0] = 0x00;
        d_32768[1] = 0x00;
        d_32768[2] = 0x00;
        d_32768[3] = 0x00;
        d_32768[4] = 0x00;
        d_32768[5] = 0x00;
        d_32768[6] = 0xE0;
        d_32768[7] = 0x40;
        double *var_32768_p = (double *) d_32768;
    #endif

    /* Enforce a minimum of 32768 salsa20/8 operations. */
    #if defined(WIN_TP)
        if (opslimit < *var_32768_p)
            opslimit = *var_32768_p;
    #else
        if (opslimit < 32768)
            opslimit = 32768;
    #endif


We solved this problem by initializing on the stack the byte sequence, written in hexadecimal, that represents the required double value, and by creating a double pointer to the address of that sequence. A small utility such as double2hex could help developers obtain the hexadecimal representation of double values and serve as an auxiliary tool.
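A helper in the spirit of double2hex could look like the sketch below. It is an assumption about what such a tool might do, not an existing utility, and the function names are hypothetical. It runs on the developer's machine, outside the obfuscated library, so it is free to use standard library functions.

```c
#include <stdio.h>
#include <string.h>

/* Copy the in-memory bytes of a double into out[]; order follows host endianness. */
void double_to_bytes(double value, unsigned char out[sizeof(double)])
{
    memcpy(out, &value, sizeof(double));
}

/* Rebuild a double from its in-memory bytes without pointer casts. */
double bytes_to_double(const unsigned char in[sizeof(double)])
{
    double value;
    memcpy(&value, in, sizeof(double));
    return value;
}

/* Print the bytes as a C initializer that can be pasted into source code. */
void print_double_bytes(double value)
{
    unsigned char b[sizeof(double)];
    size_t i;
    double_to_bytes(value, b);
    printf("{ ");
    for (i = 0; i < sizeof(double); i++) {
        printf("0x%02X%s", b[i], (i + 1 < sizeof(double)) ? ", " : " ");
    }
    printf("}\n");
}
```

On a little-endian x86 host with IEEE 754 doubles, print_double_bytes(32768.0) prints { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xE0, 0x40 }, which matches the byte sequence used in the listing above.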
To obfuscate the dynamic library with iprot, we use the following command:
 iprot scrypt-dll.dll scryptenc_file scryptdec_file scrypt_ctx_enc_init scrypt_ctx_dec_init -c 512 -d 2600 -o scrypt_obf.dll 

The interface of the protected utility has not changed. Compare the original and the obfuscated code: the disassembly below shows a significant difference between the two.

    # Original code
    scrypt_ctx_enc_init PROC NEAR
            push    ebp
            mov     ebp, esp
            sub     esp, 100
            mov     dword ptr [ebp-4H], 0
            mov     eax, 1
            imul    ecx, eax, 0
            mov     byte ptr [ebp+ecx-1CH], 1
            mov     edx, 1
            shl     edx, 0
            mov     byte ptr [ebp+edx-1CH], 2
            mov     eax, 1
            shl     eax, 1
            mov     byte ptr [ebp+eax-1CH], 3
            mov     ecx, 1
            <…>

    # Obfuscated code
    scrypt_ctx_enc_init PROC NEAR
            mov     ebp, esp
            sub     esp, 100
            mov     dword ptr [ebp-4H], 0
            mov     eax, 1
            imul    ecx, eax, 0
            mov     byte ptr [ebp+ecx-1CH], 1
            push    eax
            pop     eax
            lea     eax, [eax+3FFFD3H]
            mov     dword ptr [eax], 608469404
            mov     dword ptr [eax+4H], -124000508
            mov     dword ptr [eax+8H], -443981569
            mov     dword ptr [eax+0CH], 1633409
            mov     dword ptr [eax+10H], -477560832
            <…>

As a result of obfuscation, the utility became slower and the library became larger. The obfuscator lets developers trade security against performance through two options: the cell size and the distance between mutation points. In our case, the obfuscator uses 512-byte cells and a mutation distance of 2600 bytes. A cell is a subsequence of instructions from the original executable. Cells in the obfuscated code stay encrypted until the code stored in them has to be executed. After a cell has been decrypted and its code has run to completion, the cell is encrypted again.
The source code of the utility protected by the Intel Tamper Protection Toolkit will soon appear on GitHub.

Acknowledgments


We thank Raghudip Kannavar for the idea of protecting the Scrypt encryption utility and Andrei Somsikov for numerous useful discussions.

References


  1. K. Grasman. getopt_port on GitHub.
  2. C. Percival. The scrypt encryption utility.
  3. C. Percival. “Stronger key derivation via sequential memory-hard functions”.
  4. C. Percival, S. Josefsson (2012-09-17). “The scrypt Password-Based Key Derivation Function”. IETF.
  5. N. Provos, D. Mazières (1999). “A Future-Adaptable Password Scheme”. Proceedings of 1999 USENIX Annual Technical Conference: 81–92.
  6. W. Shawn. FreeBSD sources on GitHub.

Authors: Roman Kazantsev, Denis Katerinsky, Thaddeus Letnes
{Roman.Kazanstev, Denis.Katerinskiy, Thaddeus.C.Letnes}@intel.com

Source: https://habr.com/ru/post/274045/

