
Hardware support for the AES algorithm in modern processors


In 2008, Intel proposed new instructions for the x86 architecture that add hardware support for the AES (Advanced Encryption Standard) symmetric encryption algorithm. AES is currently one of the most popular block ciphers, so a hardware implementation should speed up programs that use it (OpenSSL, The Bat, TrueCrypt ...). The new instruction set extension is called AES-NI. It contains the following instructions: AESENC, AESENCLAST, AESDEC, AESDECLAST, AESKEYGENASSIST and AESIMC.

Since much has already been said about the AES encryption algorithm itself, in this post we will look at how you can use these instructions.


First, let's remember how AES works. This is required in order to understand what mechanisms are implemented in these instructions.
The AES algorithm uses 4 functions:
  1. AddRoundKey - XOR (exclusive or) of the block with the round key,
  2. SubBytes - substitution function,
  3. ShiftRows - cyclic shift of fields in a block according to a given rule,
  4. MixColumns - the mixing procedure.

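The encryption algorithm itself looks like this: an initial AddRoundKey, then Nr-1 rounds of SubBytes, ShiftRows, MixColumns and AddRoundKey, and a final round that omits MixColumns (Nr is 10, 12 or 14 for 128-, 192- and 256-bit keys).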

Getting started


First, make sure that the processor supports the AES-NI extension. The CPUID instruction, executed with eax = 0x00000001, returns feature bits in its output registers that describe the extensions present. For AES-NI this is bit 25 of the ECX register.
AES-NI verification code:
    mov  eax, 0x00000001
    cpuid
    test ecx, 0x2000000    ; bit 25 of ECX = AES-NI
    je   L_no_AES          ; bit is clear: AES-NI is not available

If the bit is set to 1, then we can proceed to encryption.
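The same check can be sketched in C. This is a minimal sketch that assumes GCC or Clang, whose `<cpuid.h>` header provides `__get_cpuid`; the names `cpu_has_aesni` and `AESNI_ECX_MASK` are made up for this illustration.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Bit 25 of ECX for CPUID leaf 1, the same mask as in the assembly check. */
#define AESNI_ECX_MASK (1u << 25)   /* 0x2000000 */

/* Hypothetical helper: returns 1 if the CPU reports AES-NI, 0 otherwise.
   Builds for non-x86 targets always return 0. */
static int cpu_has_aesni(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                         /* CPUID leaf 1 is not supported */
    return (ecx & AESNI_ECX_MASK) != 0;
#else
    return 0;
#endif
}
```

A program would call `cpu_has_aesni()` once at startup and fall back to a software AES implementation when it returns 0.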

Key Expansion / ExpandKey


The key expansion algorithm in pseudocode looks like this:
    KeyExpansion(byte key[4*Nk], word w[Nb*(Nr+1)], Nk)
    begin
        word temp
        i = 0
        while (i < Nk)
            w[i] = word(key[4*i], key[4*i+1], key[4*i+2], key[4*i+3])
            i = i+1
        end while
        i = Nk
        while (i < Nb * (Nr+1))
            temp = w[i-1]
            if (i mod Nk = 0)
                temp = SubWord(RotWord(temp)) xor Rcon[i/Nk]
            else if (Nk > 6 and i mod Nk = 4)
                temp = SubWord(temp)
            end if
            w[i] = w[i-Nk] xor temp
            i = i + 1
        end while
    end
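For reference, the pseudocode can be turned into a small portable C routine. This is only a sketch for checking results, not production code: to stay short it computes each S-box entry on the fly (a brute-force inverse in GF(2^8) followed by the affine transform), which is slow and not constant-time. The helper names `gf_mul`, `sbox`, `sub_word`, `rot_word` and `key_expansion` are chosen for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two elements of GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0));
        b >>= 1;
    }
    return p;
}

/* S-box entry: multiplicative inverse (0 maps to 0), then the affine transform. */
static uint8_t sbox(uint8_t x)
{
    uint8_t inv = 0;
    for (int c = 1; c < 256 && x; c++)
        if (gf_mul(x, (uint8_t)c) == 1) { inv = (uint8_t)c; break; }
    uint8_t s = inv;
    for (int i = 1; i <= 4; i++)
        s ^= (uint8_t)((inv << i) | (inv >> (8 - i)));   /* rotate left by i */
    return (uint8_t)(s ^ 0x63);
}

/* Apply the S-box to each of the four bytes of a word. */
static uint32_t sub_word(uint32_t w)
{
    return ((uint32_t)sbox((uint8_t)(w >> 24)) << 24) |
           ((uint32_t)sbox((uint8_t)(w >> 16)) << 16) |
           ((uint32_t)sbox((uint8_t)(w >> 8)) << 8) |
            (uint32_t)sbox((uint8_t)w);
}

/* Cyclic left rotation by one byte: [a0,a1,a2,a3] -> [a1,a2,a3,a0]. */
static uint32_t rot_word(uint32_t w) { return (w << 8) | (w >> 24); }

/* Nk is 4, 6 or 8; w must hold 4*(Nk+7) words (Nb = 4, Nr = Nk + 6). */
static void key_expansion(const uint8_t *key, uint32_t *w, int Nk)
{
    assert(Nk == 4 || Nk == 6 || Nk == 8);
    const int total = 4 * (Nk + 6 + 1);
    uint8_t rc = 0x01;                    /* Rcon[i/Nk] is rc in the top byte */

    for (int i = 0; i < Nk; i++)
        w[i] = ((uint32_t)key[4*i] << 24) | ((uint32_t)key[4*i+1] << 16) |
               ((uint32_t)key[4*i+2] << 8) | (uint32_t)key[4*i+3];

    for (int i = Nk; i < total; i++) {
        uint32_t temp = w[i-1];
        if (i % Nk == 0) {
            temp = sub_word(rot_word(temp)) ^ ((uint32_t)rc << 24);
            rc = gf_mul(rc, 2);           /* next round constant */
        } else if (Nk > 6 && i % Nk == 4) {
            temp = sub_word(temp);
        }
        w[i] = w[i-Nk] ^ temp;
    }
}
```

For a 128-bit key, call `key_expansion(key, w, 4)` with `uint32_t w[44]`; the words come out in the big-endian layout used by the pseudocode.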

For hardware support, use the AESKEYGENASSIST instruction, which computes:
    AESKEYGENASSIST xmm1, xmm2/m128, imm8
    Tmp := xmm2/LOAD(m128)
    X3[31-0] := Tmp[127-96]
    X2[31-0] := Tmp[95-64]
    X1[31-0] := Tmp[63-32]
    X0[31-0] := Tmp[31-0]
    RCON[7-0] := imm8
    RCON[31-8] := 0
    xmm1 := [RotWord(SubWord(X3)) XOR RCON, SubWord(X3),
             RotWord(SubWord(X1)) XOR RCON, SubWord(X1)]

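As is easy to see, the instruction does not perform the final step:
    w[i] = w[i-Nk] xor temp

These operations have to be done with ordinary SSE2 instructions (PSHUFD, PSLLDQ, PXOR), since the operands are XMM registers rather than MMX ones.
Example of 128-bit key expansion (one round key):
    ; assumes xmm1 and xmm4 both hold the previous round key [w3, w2, w1, w0]
    aeskeygenassist xmm2, xmm1, 0x1   ; round 1, Rcon = 0x01
    pshufd xmm2, xmm2, 0xff           ; broadcast RotWord(SubWord(w3)) xor Rcon
    movups xmm3, xmm4                 ; xmm3 = previous round key
    pxor   xmm2, xmm3                 ; dword 0 = w4
    pshufd xmm2, xmm2, 0x00           ; broadcast w4
    pshufd xmm3, xmm3, 0x39
    pslldq xmm3, 0x4                  ; xmm3 = [w3, w2, w1, 0]
    pxor   xmm2, xmm3                 ; dword 1 = w5 = w4 xor w1
    pshufd xmm2, xmm2, 0x14           ; xmm2 = [w4, w5, w5, w4]
    pshufd xmm3, xmm3, 0x38
    pslldq xmm3, 0x4                  ; xmm3 = [w3, w2, 0, 0]
    pxor   xmm2, xmm3                 ; dword 2 = w6 = w5 xor w2
    pshufd xmm2, xmm2, 0xA4           ; xmm2 = [w6, w6, w5, w4]
    pshufd xmm3, xmm3, 0x34
    pslldq xmm3, 0x4                  ; xmm3 = [w3, 0, 0, 0]
    pxor   xmm2, xmm3                 ; xmm2 = [w7, w6, w5, w4], the new round key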
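The same step can also be written with compiler intrinsics. The sketch below is not a translation of the assembly above: it uses the shorter shift-and-XOR sequence commonly seen in intrinsics code, which produces the same round key. It assumes GCC or Clang on x86-64 (the `target("aes")` attribute lets it compile without `-maes`), and the names `aes128_next_key`, `round_key1_ni` and `aes128_round_key1` are invented for this example.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <cpuid.h>
#include <emmintrin.h>   /* SSE2: shuffles, shifts, XOR */
#include <wmmintrin.h>   /* AES-NI intrinsics */

/* Combine the previous round key with the AESKEYGENASSIST result. */
__attribute__((target("aes")))
static __m128i aes128_next_key(__m128i key, __m128i assist)
{
    /* Broadcast RotWord(SubWord(w3)) xor Rcon to all four dwords. */
    assist = _mm_shuffle_epi32(assist, 0xff);
    /* After three shift-and-XOR steps: [w3^w2^w1^w0, w2^w1^w0, w1^w0, w0]. */
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    return _mm_xor_si128(key, assist);   /* [w7, w6, w5, w4] */
}

__attribute__((target("aes")))
static void round_key1_ni(const uint8_t key[16], uint8_t out[16])
{
    __m128i k0 = _mm_loadu_si128((const __m128i *)key);
    /* The round constant must be a compile-time immediate. */
    __m128i k1 = aes128_next_key(k0, _mm_aeskeygenassist_si128(k0, 0x01));
    _mm_storeu_si128((__m128i *)out, k1);
}
#endif

/* Writes round key 1 to out and returns 0, or zeroes out and returns -1
   when AES-NI is not available (or the build is not x86-64). */
static int aes128_round_key1(const uint8_t key[16], uint8_t out[16])
{
#if defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 25))) {
        round_key1_ni(key, out);
        return 0;
    }
#endif
    (void)key;
    memset(out, 0, 16);
    return -1;
}
```

The remaining nine round keys follow by repeating the step with the constants 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b and 0x36.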



Encryption


To implement one round of encryption, use the AESENC instruction, which performs the following actions:

    AESENC xmm1, xmm2/m128
    Tmp := xmm1
    RoundKey := xmm2/m128
    Tmp := ShiftRows(Tmp)
    Tmp := SubBytes(Tmp)
    Tmp := MixColumns(Tmp)
    xmm1 := Tmp xor RoundKey

The last round of encryption is implemented using the AESENCLAST instruction:
    AESENCLAST xmm1, xmm2/m128
    Tmp := xmm1
    RoundKey := xmm2/m128
    Tmp := ShiftRows(Tmp)
    Tmp := SubBytes(Tmp)
    xmm1 := Tmp xor RoundKey

The only difference from AESENC is that the MixColumns operation is not performed in the last round.
Sample encryption procedure:
    ; xmm1 = state, xmm2 = round key, xmm3 = last round key
    aesenc     xmm1, xmm2    ; one regular round
    aesenclast xmm1, xmm3    ; final round, no MixColumns
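Put together, encrypting one 16-byte block with a 128-bit key might look like the following sketch. It assumes GCC or Clang on x86-64 and is not the C++ class mentioned at the end of the post; the names `next_key`, `NEXT_KEY`, `encrypt_ni` and `aes128_encrypt_block` are hypothetical. The full sequence is an initial XOR with round key 0, nine AESENC rounds and one AESENCLAST round.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <cpuid.h>
#include <emmintrin.h>
#include <wmmintrin.h>
#define AESNI_FN __attribute__((target("aes")))

/* One AES-128 key expansion step (previous key + AESKEYGENASSIST output). */
AESNI_FN static __m128i next_key(__m128i key, __m128i assist)
{
    assist = _mm_shuffle_epi32(assist, 0xff);
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    return _mm_xor_si128(key, assist);
}

/* The round constant has to be an immediate, hence a macro and no loop. */
#define NEXT_KEY(k, rcon) next_key((k), _mm_aeskeygenassist_si128((k), (rcon)))

AESNI_FN static void encrypt_ni(const uint8_t key[16], const uint8_t in[16],
                                uint8_t out[16])
{
    __m128i rk[11];
    rk[0]  = _mm_loadu_si128((const __m128i *)key);
    rk[1]  = NEXT_KEY(rk[0], 0x01);
    rk[2]  = NEXT_KEY(rk[1], 0x02);
    rk[3]  = NEXT_KEY(rk[2], 0x04);
    rk[4]  = NEXT_KEY(rk[3], 0x08);
    rk[5]  = NEXT_KEY(rk[4], 0x10);
    rk[6]  = NEXT_KEY(rk[5], 0x20);
    rk[7]  = NEXT_KEY(rk[6], 0x40);
    rk[8]  = NEXT_KEY(rk[7], 0x80);
    rk[9]  = NEXT_KEY(rk[8], 0x1b);
    rk[10] = NEXT_KEY(rk[9], 0x36);

    __m128i s = _mm_loadu_si128((const __m128i *)in);
    s = _mm_xor_si128(s, rk[0]);              /* initial AddRoundKey */
    for (int r = 1; r < 10; r++)
        s = _mm_aesenc_si128(s, rk[r]);       /* rounds 1..9 */
    s = _mm_aesenclast_si128(s, rk[10]);      /* round 10, no MixColumns */
    _mm_storeu_si128((__m128i *)out, s);
}
#endif

/* Returns 0 on success; zeroes out and returns -1 if AES-NI is unavailable. */
static int aes128_encrypt_block(const uint8_t key[16], const uint8_t in[16],
                                uint8_t out[16])
{
#if defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 25))) {
        encrypt_ni(key, in, out);
        return 0;
    }
#endif
    (void)key; (void)in;
    memset(out, 0, 16);
    return -1;
}
```

Real code would expand the key once and reuse the round keys for every block instead of recomputing them per call as this sketch does.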


Decryption


To implement the decryption procedure, use the AESDEC instruction:

    AESDEC xmm1, xmm2/m128
    Tmp := xmm1
    RoundKey := xmm2/m128
    Tmp := InvShiftRows(Tmp)
    Tmp := InvSubBytes(Tmp)
    Tmp := InvMixColumns(Tmp)
    xmm1 := Tmp xor RoundKey

To obtain the decryption round keys (InvKey), apply the InvMixColumns operation to the encryption round keys; the first and last round keys are used unchanged. The instruction that does this is AESIMC xmm1, xmm2/m128.
For the last decryption round, the AESDECLAST instruction is used:
    AESDECLAST xmm1, xmm2/m128
    State := xmm1
    RoundKey := xmm2/m128
    Tmp := InvShiftRows(State)
    Tmp := InvSubBytes(Tmp)
    xmm1 := Tmp xor RoundKey

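An example of the decryption procedure:
    ; xmm1 = state, xmm2 = round key, xmm3 = last round key
    aesimc     xmm2, xmm2    ; InvMixColumns on the round key
    aesdec     xmm1, xmm2    ; one regular round
    aesdeclast xmm1, xmm3    ; final round, no InvMixColumns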
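A complete single-block decryption for a 128-bit key can be sketched the same way. The assumptions are as before: GCC or Clang on x86-64, and the names `next_key`, `NEXT_KEY`, `decrypt_ni` and `aes128_decrypt_block` are invented for this example. The round keys are used in reverse order, and AESIMC is applied only to round keys 1 to 9.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <cpuid.h>
#include <emmintrin.h>
#include <wmmintrin.h>
#define AESNI_FN __attribute__((target("aes")))

/* One AES-128 key expansion step (previous key + AESKEYGENASSIST output). */
AESNI_FN static __m128i next_key(__m128i key, __m128i assist)
{
    assist = _mm_shuffle_epi32(assist, 0xff);
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    return _mm_xor_si128(key, assist);
}

#define NEXT_KEY(k, rcon) next_key((k), _mm_aeskeygenassist_si128((k), (rcon)))

AESNI_FN static void decrypt_ni(const uint8_t key[16], const uint8_t in[16],
                                uint8_t out[16])
{
    /* Encryption round keys first; decryption walks them backwards. */
    __m128i rk[11];
    rk[0]  = _mm_loadu_si128((const __m128i *)key);
    rk[1]  = NEXT_KEY(rk[0], 0x01);
    rk[2]  = NEXT_KEY(rk[1], 0x02);
    rk[3]  = NEXT_KEY(rk[2], 0x04);
    rk[4]  = NEXT_KEY(rk[3], 0x08);
    rk[5]  = NEXT_KEY(rk[4], 0x10);
    rk[6]  = NEXT_KEY(rk[5], 0x20);
    rk[7]  = NEXT_KEY(rk[6], 0x40);
    rk[8]  = NEXT_KEY(rk[7], 0x80);
    rk[9]  = NEXT_KEY(rk[8], 0x1b);
    rk[10] = NEXT_KEY(rk[9], 0x36);

    __m128i s = _mm_loadu_si128((const __m128i *)in);
    s = _mm_xor_si128(s, rk[10]);                 /* undo the last AddRoundKey */
    for (int r = 9; r >= 1; r--)                  /* AESIMC turns rk[r] into InvKey */
        s = _mm_aesdec_si128(s, _mm_aesimc_si128(rk[r]));
    s = _mm_aesdeclast_si128(s, rk[0]);           /* final round, key unchanged */
    _mm_storeu_si128((__m128i *)out, s);
}
#endif

/* Returns 0 on success; zeroes out and returns -1 if AES-NI is unavailable. */
static int aes128_decrypt_block(const uint8_t key[16], const uint8_t in[16],
                                uint8_t out[16])
{
#if defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 25))) {
        decrypt_ni(key, in, out);
        return 0;
    }
#endif
    (void)key; (void)in;
    memset(out, 0, 16);
    return -1;
}
```

In practice the AESIMC results would be computed once and stored next to the encryption keys rather than recomputed for every block.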



So, hardware support should give a decent boost to encryption speed. To close the post, the references include a C++ class that implements encryption and decryption in ECB mode. In a test run, encryption reached 320 MB/s on a single core of an i5-3740 (3.2 GHz).

References:


  1. Intel Advanced Encryption Standard (AES) New Instructions Set
  2. List of processors supporting the AES-NI instruction set extension
  3. C++ class with AES-NI assembler inserts
  4. Animation of how AES works
  5. Wikipedia article with a list of libraries and programs using AES-NI instructions

Source: https://habr.com/ru/post/201114/

