
In this article, we show how the Intel Tamper Protection Toolkit helps protect critical sections of code and valuable data in the Scrypt encryption utility against static and dynamic reverse engineering and tampering. scrypt is a modern and secure password-based key derivation function that is widely used in practice. However, an attacker who tampers with the parameters of the scrypt function can make user passwords vulnerable. The toolkit helps reduce this threat. We define a threat model for the application, then explain how to refactor it for protection, taking into account the requirements that the Tamper Protection tool imposes on the application.
The main purpose of this article is to demonstrate how the Intel Tamper Protection Toolkit protects critical sections of code and valuable data in real-world applications against attack. The toolkit counteracts static and dynamic reverse engineering through obfuscation, and prevents changes to the protected application by checking its integrity at run time.
Here we consider only one component of the toolkit, the obfuscator iprot, and apply it to version 1.1.6 of the Scrypt encryption utility. The utility is a simple application of the scrypt function, which derives a key from the entered password. We chose it for several reasons. First, its code contains features that applications use frequently: file reads and writes, memory allocation, cryptographic functions, and system calls. Second, it includes specific mathematical machinery. Third, the utility is quite compact, yet it exposes a wide range of problems that developers may face in practice when protecting their own applications. Finally, scrypt is a modern and secure key derivation function that is actively used in practice; for example, the disk encryption introduced in Android 4.4 is based on scrypt.
Code obfuscation
Consider the source code of the sensitive function in the listing below, and compile it into a dynamic library.
    #define MODIFIER (0xF00D)

    int __declspec(dllexport) sensitive(const int value)
    {
        int result = 0;
        int i;
        for (i = 0; i < value; i++) {
            result += MODIFIER;
        }
        return result;
    }
Source code of the sensitive function
We reverse engineer the resulting library using IDA Pro. The figure shows the control flow of the code, together with its logic and the data used in the calculation. An attacker can therefore easily find the value of MODIFIER in the code and change it.
Execution order in disassembled code
Various code obfuscation techniques help to hide implementation details, complicate reverse engineering, and prevent code changes. Code obfuscation transforms code into a functionally equivalent form whose logic and data are difficult to reverse engineer and understand. Obfuscation is used to prevent the theft or modification of secret data in the code and to protect the developer's intellectual property.
Intel Tamper Protection Toolkit
Intel Tamper Protection Toolkit is a product used to obfuscate code and check the integrity of an application at run time, for executable files on Microsoft Wind* and Android*. With the Tamper Protection Toolkit, you can protect valuable code and data in an application against static and dynamic reverse engineering and tampering. Executable files protected by the tool do not require special loaders or additional software and can run on any Intel processor.
The beta version of the Intel Tamper Protection Toolkit is available for download.
In this article, we use the following Intel Tamper Protection Toolkit components to obfuscate critical sections of code and protect the encryption utility from possible attacks:
- iprot: an obfuscator that creates self-modifying and self-encrypting code;
- the crypto library: a library of basic cryptographic operations, including secure hash algorithms, message authentication codes, and symmetric ciphers.
The obfuscator takes as input a dynamic library (.dll) and a list of exported functions. The output is a dynamic library in which those exported functions are obfuscated. Starting from the addresses of the exported functions, the obfuscator parses the code of the input library and converts it into a special internal representation. Branches, jumps, and calls are also parsed and transformed, as long as they are reachable. For code to be obfuscated, it must respect several restrictions: it must not contain low-level memory manipulation, calls to external functions that the obfuscator cannot reach, indirect jumps, or global variables.
In this article, we describe the pitfalls we encountered when changing the code to make it obfuscatable, and how to avoid them.
Let us obfuscate the dynamic library discussed in the previous section using iprot:
    iprot sensitive.dll sensitive -o sensitive_obf.dll
Let's try to reverse engineer the obfuscated code using IDA Pro.
    sensitive PROC NEAR
            jmp     ?_001
    ?_001:  push    ebp
            push    eax
            call    ?_002
    ?_002   LABEL NEAR
            pop     eax
            lea     eax, [eax+0FECH]
            mov     dword ptr [eax], 608469404
            mov     dword ptr [eax+4H], 2308
            mov     dword ptr [eax+8H], -443981824
            mov     dword ptr [eax+0CH], 1633409
            mov     dword ptr [eax+10H], -477560832
            mov     dword ptr [eax+14H], 15484359
            mov     dword ptr [eax+18H], -1929379840
            mov     dword ptr [eax+1CH], -1048448
            <….>
Disassembled obfuscated code
The obfuscated code clearly differs from the original. IDA Pro was unable to display a control flow graph for the obfuscated code, and the MODIFIER value is no longer visible. The obfuscated code is also protected against static and dynamic modification.
Password-based key derivation functions
Password-based key derivation functions (PBKDF) convert a password entered by a user into a key (a block of binary data) that can be used in cryptographic algorithms. A PBKDF is a very important component of application protection, because a password entered by a user is not safe to use directly in a cryptographic algorithm: it has too little entropy. These functions are widely used to protect applications. For example, PGP systems derive cryptographic keys from passwords to encrypt and decrypt data on disk, and operating systems use these functions to verify user passwords (authentication).
In general, a PBKDF can be written as
$$y = F(P, S, d, t_1, \ldots, t_n)$$
where $y$ is the key generated by the function, $P$ is the password, $S$ is the salt, $d$ is the length of the generated key, and $t_1, \ldots, t_n$ are parameters that determine the hardware resources needed to compute the function, such as processor time and the amount of RAM. The salt $S$ makes it possible to derive different keys from the same password. The parameters $t_1, \ldots, t_n$ set the hardware resources consumed by the computation and can be tuned to make it more expensive, which adds protection against brute-force attacks that rely on hardware-level parallelism on an ordinary GPU.
PBKDF usage scheme
There are two ways an attacker can recover a user password:
- the attacker recovers the password from a generated key that has leaked;
- the attacker recovers the password from data encrypted or signed with the key.
In the first case, the Intel Tamper Protection Toolkit helps prevent key leakage by hiding the code that generates the key and then uses it.
The toolkit cannot prevent the second case, but it helps to verify that the attacker has not changed the key generation parameters to unsafe values.
Here are examples of password-based key derivation functions used in practice:
- Password-Based Key Derivation Function 2 (PBKDF2). This is a function of the form $y = F(P, S, c)$, where $c$ is the number of iterations, which controls the processor time required to compute $F$ for any $P$, $S$. PBKDF2 can be implemented on systems with very little RAM, which makes GPU brute-force attacks against it very effective. Despite this, many products continue to use PBKDF2.
- bcrypt. This function is more resistant to GPU-based attacks of this kind because it uses a larger, fixed amount of RAM.
The most modern and secure of these functions is scrypt, developed by Colin Percival. It has the following form:
$$y = F(P, S, d, N, r, p)$$
where $y$ is the key generated by the function, $d$ is the length of the generated key, $P$ is the user password, $S$ is the salt, and $p$, $r$, and $N$ are the parameters that set the processor time and the amount of RAM required to generate the key. The values of $N$, $r$, $p$, and $d$ can be public and are usually stored with the key or with the encrypted data.
Depending on the values of the N, r, p parameters, generating the same key may require different amounts of processor time and memory. For example, if the parameters demand about 100 ms and about 20 MB, a brute-force attack on an ordinary GPU is far less effective against scrypt than against PBKDF2, which needs little RAM and therefore allows many passwords to be tried in parallel on the GPU.
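To make the dependence on N, r, and p more concrete, the following sketch estimates the working memory of scrypt from its parameters. It assumes the buffer sizes used by the scrypt reference implementation (128·r·N bytes for the large table V, 128·r·p bytes for the block array B, and a scratch area of 256·r + 64 bytes). The function name scrypt_mem_bytes is hypothetical and is not part of the utility.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: approximate scrypt working memory in bytes. */
static uint64_t scrypt_mem_bytes(uint64_t N, uint32_t r, uint32_t p)
{
    uint64_t v, b, xy;

    assert(N >= 2 && (N & (N - 1)) == 0);  /* N must be a power of two */
    assert(r > 0 && p > 0);

    v  = 128ULL * r * N;   /* large table V, dominates the total */
    b  = 128ULL * r * p;   /* block array B */
    xy = 256ULL * r + 64;  /* scratch area */
    return v + b + xy;
}
```

With N = 2^14, r = 8, p = 1, the estimate is about 16 MiB, which is of the same order as the ~20 MB figure in the example above.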
Scrypt encryption utility
The Scrypt encryption utility processes input files with AES in CTR mode, using a key that the scrypt function generates from the user password. It takes both required and optional arguments.
The required arguments are:
- the password, which the scrypt function uses to generate the key;
- the mode: encryption or decryption;
- the input file name.
The optional arguments are:
- -t: time in seconds allowed for generating the key;
- -m: fraction of RAM used to generate the key;
- -M: number of bytes of RAM used to generate the key;
- the name of the output file.
For example, running the utility with the command
    scrypt enc infile -t 0.1 -M 20971520
requires about 100 ms of processor time and 20 MB of RAM to generate a key. Such parameter values make a brute-force attack harder to parallelize.
The figure below shows how the Scrypt utility works when the user supplies the name of the input file to encrypt, the password, and the parameters that define the required hardware resources.
The utility performs the following steps when encrypting:
- Collect and convert the scrypt parameters. The program takes the processor time and amount of RAM allowed for key generation and converts them into the parameters accepted by the scrypt function.
- Generate the key with scrypt. The scrypt function generates a 64-byte key from the user password and the N, r, p parameters calculated in the previous step. The lower 32 bytes of the key, dk1, are used to calculate the authentication codes for the parameters N, r, p, the salt, and the encrypted data. This makes it possible, during decryption, to check that the entered password is correct and that the encrypted data is intact. The upper 32 bytes of the key, dk2, are used to encrypt the input file with AES in CTR mode.
- Calculate the authentication code for the scrypt parameters. In this step, an authentication code is calculated for the N, r, p, and salt parameters used to generate the key.
- Encrypt with OpenSSL AES in CTR mode. The input message is encrypted with dk2 using AES with a 32-byte (256-bit) key in CTR mode.
- Calculate the authentication code for the encrypted data. Finally, an authentication code is computed for the encrypted data using dk1 to ensure integrity. The output file contains the encrypted data, the N, r, p parameters, the salt used in encryption, and the authentication codes that ensure the integrity of the encrypted data and the parameters.
Scrypt encryption scheme
Possible threats
By analyzing how the utility works in encryption mode, we define the threat model. The values of the parameters N, r, p, the salt, and the generated key, all obtained in intermediate steps, are critical data and must be protected against modification at run time. For example, in a debugger an attacker can set different values for N, r, p in order to weaken the key's resistance to a brute-force attack.
The figure below illustrates the decryption process, in which the user supplies a password and the name of an input file that contains the encrypted text, the parameters N, r, p, the salt, and the authentication codes.
The utility performs the following steps when decrypting:
- Set the scrypt parameters. The input file for decryption contains the encrypted data, the authentication codes hmac1 and hmac2, and the parameters N, r, p, and salt used for encryption. In this step, these parameters are read from the input file and passed to the key generation function.
- Generate the key with scrypt. The scrypt function generates a key from the password and the N, r, p, and salt parameters obtained in the previous step. The lower 32 bytes and the upper 32 bytes of this key are labeled dk1 and dk2 in the figure, respectively.
- Check the integrity of the parameters and the password. The integrity of N, r, p, and salt, and the correctness of the password, are verified with an authentication code. The utility calculates the authentication code for the parameters N, r, p, and salt using dk1 and compares the result with hmac1. If they match, the password is correct.
- Check the integrity of the encrypted data. To verify that the encrypted data has not been changed, an authentication code for the data is calculated using dk1 and compared with hmac2. If they match, the data has not been corrupted and can be decrypted in the next step.
- Decrypt with OpenSSL AES in CTR mode. Finally, the data is decrypted with dk2 using AES with a 32-byte (256-bit) key in CTR mode. The output file contains the decrypted data.
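The key split and the tag comparison in the steps above can be sketched as follows. This is an illustration under assumptions, not the utility's code: the names split_dk and tag_equal are hypothetical, and the comparison without an early exit (so that timing does not reveal where two tags first differ) is a common practice that the article does not claim the utility follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split a 64-byte derived key into the two halves described in the text. */
static void split_dk(const uint8_t dk[64],
                     const uint8_t **dk1, const uint8_t **dk2)
{
    assert(dk != NULL && dk1 != NULL && dk2 != NULL);
    *dk1 = dk;       /* lower 32 bytes: key for the authentication codes */
    *dk2 = dk + 32;  /* upper 32 bytes: AES key */
}

/* Compare two authentication tags without an early exit.
 * Returns 1 if all len bytes are equal, 0 otherwise. */
static int tag_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < len; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);  /* accumulate any difference */
    return diff == 0;
}
```

In the decryption flow, a tag computed with dk1 over N, r, p, and the salt would be compared against hmac1 with a function such as tag_equal before any data is decrypted with dk2.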
Scrypt decryption scheme
Porting the utility to Windows
Our goal is to protect the Scrypt encryption utility on Windows using the Tamper Protection toolkit. The original utility was written for Linux, so the first task is to port it to Windows.
Platform-dependent code is placed under the following conditional directive:
    #if defined(WIN_TP)
The WIN_TP preprocessor macro marks code intended for Windows. WIN_TP must be defined when building for Windows; otherwise the Linux code is selected.
We use the Microsoft* Visual Studio 2013 development environment to build and debug the utility. Windows and Linux differ in areas such as process, thread, memory, and file management, service infrastructure, user interfaces, and so on. We had to take all these differences into account when porting the utility. We describe them below.
- The utility uses the getopt() function to parse command line arguments. The list of available arguments is given above. getopt() is declared in the POSIX header unistd.h. We use the getopt() implementation from the open-source getopt_port project: we add the getopt.h and getopt.c files from getopt_port to our project.
- The gettimeofday() function, another POSIX API call, is used by the utility to measure salsa opps, the number of salsa20/8 operations per second performed on the user's platform. The utility uses the salsa opps metric to select safe values for the N, r, and p parameters, so that scrypt performs at least the minimum number of salsa20/8 operations needed to resist brute-force attacks. We added an implementation of gettimeofday() to the scryptenc_cpuperf.c file.
- Before running the parameter selection algorithm, the utility asks the operating system how much RAM may be used for key generation, by calling the POSIX function getrlimit(RLIMIT_DATA, ...). On Windows, we set both the hard and the soft limit for the maximum size of the process data segment (initialized data, uninitialized data, and heap) to 4 GB, as shown in the code below.
    #if defined(WIN_TP)
        rl.rlim_cur = 0xFFFFFFFF;
        rl.rlim_max = 0xFFFFFFFF;
        if ((uint64_t)rl.rlim_cur < memrlimit) {
            memrlimit = rl.rlim_cur;
        }
    #else
        if (getrlimit(RLIMIT_DATA, &rl))
            return (1);
        if ((rl.rlim_cur != RLIM_INFINITY) &&
            ((uint64_t)rl.rlim_cur < memrlimit))
            memrlimit = rl.rlim_cur;
    #endif
- In addition, the inline functions in the sysendian.h file are declared with the MSVC-specific __inline keyword when building for Windows.
    #if defined(WIN_TP)
    static __inline uint32_t
    #else
    static inline uint32_t
    #endif
- We ported the tarsnap_readpass(...) function, which reads the password from the terminal without showing it. The function disables character echo in the terminal window, so the typed password is not displayed. The password is stored in a dedicated buffer and passed on to the Scrypt configuration and key generation functions.
    #if defined(WIN_TP)
        if ((usingtty = _isatty(_fileno(readfrom))) != 0) {
            GetConsoleMode(hStdin, &mode);
            if (usingtty)
                mode &= ~ENABLE_ECHO_INPUT;
            else
                mode |= ENABLE_ECHO_INPUT;
            SetConsoleMode(hStdin, mode);
        }
    #else
        if ((usingtty = isatty(fileno(readfrom))) != 0) {
            if (tcgetattr(fileno(readfrom), &term_old)) {
                warn("Cannot read terminal settings");
                goto err1;
            }
            memcpy(&term, &term_old, sizeof(struct termios));
            term.c_lflag = (term.c_lflag & ~ECHO) | ECHONL;
            if (tcsetattr(fileno(readfrom), TCSANOW, &term)) {
                warn("Cannot set terminal settings");
                goto err1;
            }
        }
    #endif
- To obtain a random sequence, the original getsalt() function reads the special file /dev/urandom provided by Unix-like operating systems. On Windows, we use the rdrand instruction of the hardware random number generator, available on Intel Xeon and Core processors starting with Ivy Bridge. The standard C function for generating pseudo-random numbers is deliberately not used, because with it the getsalt() function could not be obfuscated by the Tamper Protection obfuscator. The getsalt() function must be protected by the obfuscator against static and dynamic modification and reverse engineering, because our threat model classifies the salt it produces as a protected object. The changes made to the code that obtains the salt are shown below.
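    #if defined(WIN_TP)
        size_t i;
        unsigned int rnd;

        /* Fill the buffer one byte at a time from the hardware RNG. */
        for (i = 0; i < buflen; i++, buf++) {
            /* _rdrand32_step() returns 0 if no random value was available. */
            if (!_rdrand32_step(&rnd))
                goto err0;
            /* Keep one byte so we never write past the end of the buffer. */
            *buf = (uint8_t)rnd;
        }
    #else
        /* Open /dev/urandom. */
        if ((fd = open("/dev/urandom", O_RDONLY)) == -1)
            goto err0;

        /* Read bytes until we have filled the buffer. */
        while (buflen > 0) {
            if ((lenread = read(fd, buf, buflen)) == -1)
                goto err1;

            /* The random device should never EOF. */
            if (lenread == 0)
                goto err1;

            /* We're partly done. */
            buf += lenread;
            buflen -= lenread;
        }

        /* Close the device. */
        while (close(fd) == -1) {
            if (errno != EINTR)
                goto err0;
        }
    #endif /* defined(WIN_TP) */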
Utility Protection with Intel Tamper Protection Toolkit
Now we refactor the utility code to protect all the important data identified in our threat model. This data is protected by obfuscating the code with iprot, the obfuscating compiler from the toolkit. We keep the effort reasonable and obfuscate only the functions that create, process, and use important data.
As described earlier, the obfuscator takes a dynamic library as input and generates a binary file containing only the protected functions specified on the command line. We therefore place all functions that work with important data inside a dynamic library, which is then obfuscated. The remaining functions, such as command line parsing and password reading, are left unprotected in the main executable file.
The new structure of the protected utility is shown in the figure below. The utility is divided into two parts: the main executable file and the dynamic library that will be obfuscated. The main executable file is responsible for parsing command line arguments, reading the password, and loading the input file into memory.
The dynamic library contains export functions such as scryptenc_file and scryptdec_file, which work with important data (N, r, p, salt). The key data structure used in the dynamic library is called the scrypt context. It contains HMAC verification information for the parameters of the scrypt function: N, r, p, and salt. The HMAC information in the context is used to verify the integrity of the monitored parameters by trusted functions: scrypt_ctx_enc_init, scrypt_ctx_dec_init, scryptenc_file, and scryptdec_file, which were added during refactoring. These trusted functions are resistant to modification because we deliberately obfuscate them with the tool. The two new functions scrypt_ctx_enc_init and scrypt_ctx_dec_init are needed to initialize the scrypt context for each mode, encryption and decryption.
Architecture of the protected Scrypt utility
Below is a detailed description of the figure, showing how the utility works in each mode, encryption and decryption.
Encryption:
- The utility uses the getopt() function to parse command line arguments. The list of arguments is given above.
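- The main executable reads the password and loads the input file into memory.
- scrypt_ctx_enc_init is called for the first time. It computes the scrypt parameters (N, r, p) from maxmem, maxmemfrac, and maxtime, calculates the HMAC for the context, and returns the amount of memory required to compute the scrypt function.
- The main executable allocates the requested amount of memory.
- scrypt_ctx_enc_init is called a second time. It uses the HMAC to verify the integrity of the scrypt context, stores the address of the allocated memory in the context, and recalculates the HMAC for the changed context.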
- The export encryption function scryptenc_file is invoked with the entered password. The function checks the integrity of the scrypt context containing the parameters (N, r, p, and salt) used to generate the key. If the check passes, the scrypt algorithm is called to generate the key, which is then used for encryption. The export function produces the same output as the original function of the Scrypt utility: the output carries the same hash values, which are used during decryption to verify the integrity of the encrypted data and the correctness of the entered password.
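Decryption:
- The utility uses the getopt() function to parse command line arguments.
- The main executable reads the password and loads the input file into memory.
- scrypt_ctx_dec_init is called for the first time to initialize the scrypt context, and returns the amount of memory required to compute the scrypt function.
- The main executable allocates the requested amount of memory.
- scrypt_ctx_dec_init is called a second time. It verifies the integrity of the context and stores the address of the allocated memory in it.
- The export decryption function scryptdec_file is invoked with the entered password. The function checks the integrity of the scrypt context containing the parameters (N, r, p). If the check passes, the scrypt algorithm is called to generate the key, which is then used for decryption.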
In the protected utility, we replace the OpenSSL implementation of AES in CTR mode and of the authentication code calculation with the equivalent functions from the Intel Tamper Protection Toolkit crypto library. Unlike OpenSSL, the crypto library satisfies all the restrictions on source code, so it can be obfuscated with iprot and used from obfuscated code without modification. AES is called inside scryptenc_file and scryptdec_file to encrypt or decrypt the input file with the key derived from the password. The authentication code function is called in the export functions (scrypt_ctx_enc_init, scrypt_ctx_dec_init, scryptenc_file, and scryptdec_file) to check the integrity of the scrypt context data before it is used. In the protected utility, all export functions of the dynamic library are obfuscated with iprot.
Tamper Protection helps us achieve the goal of reducing the threats. Our solution is a reworked utility with a dynamic library obfuscated by iprot. The solution resists the attacks defined earlier, for the following reason: the scrypt context can only be updated through the export functions, because only they contain the HMAC key needed to recalculate the HMAC value in the context. These functions and the HMAC verification data are in turn protected from modification and reverse engineering by the obfuscator. Other important data, such as the key generated by the scrypt function, is also protected, because it is generated inside the obfuscated export functions scryptenc_file and scryptdec_file. The obfuscating compiler iprot generates code that modifies itself at run time and is protected against tampering and debugging.
Consider how the scrypt_ctx_enc_init function protects the scrypt context. The main executable uses the buf_p pointer to indicate which call of scrypt_ctx_enc_init is being made.
If the pointer is empty (its value is null), the function is being called for the first time; otherwise it is the second call. During the first call, the scrypt parameters are initialized, the HMAC is calculated, and the amount of memory required to compute the scrypt function is returned. All of this is illustrated in the following code.
During the second call, buf_p points to the allocated memory passed to the scrypt_ctx_enc_init function. Using the HMAC value, the function checks the integrity of the context to make sure that no data has been changed between the first and second calls. It then stores the address from buf_p inside the context and recalculates the HMAC value for the changed context. The code that is executed on the second call is shown below.
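As a rough illustration only (this is not the utility's code), the two-call pattern can be sketched as follows. Everything here is hypothetical: the demo_ctx structure, the function names, the fixed parameter values, and in particular the toy FNV-style checksum, which merely stands in for the HMAC used by the real functions and offers no cryptographic protection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the utility's scrypt context. */
typedef struct {
    uint32_t logN, r, p;  /* scrypt parameters */
    uint8_t *buf;         /* working memory, set on the second call */
    uint32_t tag;         /* toy integrity tag; the article uses an HMAC */
} demo_ctx;

/* FNV-1a style mixing of one 64-bit value. NOT a secure MAC. */
static uint32_t mix(uint32_t h, uint64_t v)
{
    int i;
    for (i = 0; i < 8; i++) {
        h ^= (uint32_t)(v & 0xFF);
        h *= 16777619u;
        v >>= 8;
    }
    return h;
}

/* Tag over all protected fields, seeded with a hypothetical embedded key. */
static uint32_t demo_tag(const demo_ctx *c)
{
    uint32_t h = 2166136261u ^ 0x5EC2E7u;
    h = mix(h, c->logN);
    h = mix(h, c->r);
    h = mix(h, c->p);
    h = mix(h, (uint64_t)(uintptr_t)c->buf);
    return h;
}

/* First call (buf_p == NULL): set parameters, tag them, report memory need.
 * Second call: verify the tag, attach the buffer, re-tag.
 * Returns 0 on success, -1 if the context was modified between calls. */
static int demo_ctx_enc_init(demo_ctx *c, uint8_t *buf_p, size_t *need)
{
    assert(c != NULL);
    if (buf_p == NULL) {
        assert(need != NULL);
        c->logN = 14; c->r = 8; c->p = 1;  /* fixed here for brevity */
        c->buf = NULL;
        c->tag = demo_tag(c);
        *need = (size_t)128 * c->r * ((size_t)1 << c->logN);
        return 0;
    }
    if (c->tag != demo_tag(c))
        return -1;
    c->buf = buf_p;
    c->tag = demo_tag(c);
    return 0;
}
```

The point of the pattern is that the caller never computes the tag itself: only the (obfuscated) init function knows how to produce it, so a change to r or N made between the two calls is detected on the second call.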
As described earlier, the obfuscator imposes restrictions on the source code for it to be obfuscatable: the code must contain no relocations and no indirect jumps. C constructs that involve global variables, system calls, and standard C functions can generate relocations and indirect jumps. The code above contains one standard C function, memcmp, which makes it impossible to obfuscate with iprot. For this reason, we implement our own versions of the standard C functions used in the utility, such as memcmp, memset, and memmove. We also replace all global variables in the dynamic library with local ones and make sure that the data is initialized on the stack.
In addition, we ran into a problem with obfuscating code that contains double values, which is not described in the tool's documentation. For example, the code below shows that the pickparams function uses a variable of type double with the value 32768 to limit the number of salsa20/8 operations. This value is not initialized on the stack: the compiler places it in the data segment of the executable file, which generates a relocation in the code.
    double opslimit;
    #if defined(WIN_TP)
We solved this problem by initializing on the stack the sequence of bytes, written in hexadecimal, that represents the required double value, and creating a double pointer to the address of this sequence. A small utility such as double2hex can help developers obtain the hexadecimal representation of double values.
To obfuscate the dynamic library with iprot, we use the following command:
    iprot scrypt-dll.dll scryptenc_file scryptdec_file scrypt_ctx_enc_init scrypt_ctx_dec_init -c 512 -d 2600 -o scrypt_obf.dll
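The stack-initialized double workaround described above might look like the following sketch. It is an assumption-laden illustration rather than the authors' code: the function name opslimit_on_stack is hypothetical, a union is used instead of the byte array and pointer cast mentioned in the text (to stay within C's aliasing rules), and the code assumes that double is a 64-bit IEEE 754 value with the same byte order as integers, as on x86. Whether the compiler really keeps the constant out of the data segment still has to be checked in the generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Build 32768.0 from its IEEE 754 bit pattern held in a local variable,
 * with the intent that no floating-point constant lands in the data segment. */
static double opslimit_on_stack(void)
{
    union {
        uint64_t bits;
        double   value;
    } u;

    assert(sizeof(double) == sizeof(uint64_t));

    /* 32768 = 2^15: sign 0, biased exponent 1023 + 15 = 0x40E, mantissa 0. */
    u.bits = 0x40E0000000000000ULL;
    return u.value;
}
```

The bit pattern for any other constant can be obtained the same way in reverse, by storing the double in the union and printing the integer member in hexadecimal.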
The interface of the protected utility has not changed. Compare the non-obfuscated and obfuscated code: the disassembly below shows a significant difference between the two.
As a result of obfuscation, the utility became slower and the library larger. The obfuscator lets developers trade security against performance through two options: the cell size and the distance between mutation points. In our case, the obfuscator uses 512-byte cells and a 2600-byte mutation distance. A cell is a subsequence of instructions from the original executable file. The cells in the obfuscated code stay encrypted until the code stored in them needs to be executed. After a cell has been decrypted and its code fully executed, it is encrypted again.
The source code of the utility protected by the Intel Tamper Protection Toolkit will soon appear on GitHub.
Thanks
We thank Raghudip Kannavar for the idea of protecting the Scrypt encryption utility and Andrei Somsikov for numerous useful discussions.
Links
- K. Grasman. getopt_port on GitHub.
- C. Percival. The scrypt encryption utility.
- C. Percival. "Stronger key derivation via sequential memory-hard functions".
- C. Percival, S. Josefsson (2012-09-17). "The scrypt Password-Based Key Derivation Function". IETF.
- N. Provos, D. Mazières (1999). "A Future-Adaptable Password Scheme". Proceedings of 1999 USENIX Annual Technical Conference: 81–92.
- W. Shawn. FreeBSD sources on GitHub.
Authors: Roman Kazantsev, Denis Katerinsky, Thaddeus Letnes
{Roman.Kazanstev, Denis.Katerinskiy, Thaddeus.C.Letnes}@intel.com