
In 2008, Intel proposed new instructions for the x86 architecture that add hardware support for the AES (Advanced Encryption Standard) symmetric encryption algorithm. AES is currently one of the most popular block ciphers, so a hardware implementation should improve the performance of programs that use it (OpenSSL, The Bat, TrueCrypt, ...). The new instruction set extension is called AES-NI. It contains the following instructions:
- AESENC - Perform one round of AES encryption,
- AESENCLAST - Perform the last AES encryption round,
- AESDEC - Perform one round of AES decryption,
- AESDECLAST - Perform the last AES decryption round,
- AESKEYGENASSIST - Assist in generating the AES round keys,
- AESIMC - Perform the inverse MixColumns transformation (InvMixColumns).
Since much has already been said about the AES encryption algorithm itself, in this post we will look at how you can use these instructions.
First, let's remember how AES works. This is required in order to understand what mechanisms are implemented in these instructions.
The AES algorithm uses 4 functions:
- AddRoundKey - XOR (exclusive or) of the state with the round key,
- SubBytes - byte substitution through the S-box,
- ShiftRows - cyclic shift of each row of the state by a fixed offset,
- MixColumns - mixing of the bytes within each column of the state.
The encryption algorithm itself looks like this:
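The overall round structure can be sketched in portable C. This is a minimal software model for illustration only, not the code from this post: the function names, the 16-byte state layout (byte (row, column) stored at index row + 4*column) and the on-the-fly S-box computation are assumptions of the sketch, and it makes no attempt at speed or side-channel safety. The round keys are taken as given: 11 keys of 16 bytes each for AES-128.

```c
#include <stdint.h>
#include <string.h>

/* Multiply two elements of GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gmul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
        b >>= 1;
    }
    return p;
}

/* S-box entry: multiplicative inverse followed by the affine transform. */
static uint8_t sbox(uint8_t x) {
    uint8_t inv = 0;
    for (int y = 1; y < 256 && x; y++)
        if (gmul(x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
    uint8_t s = inv;
    for (int i = 1; i <= 4; i++)
        s ^= (uint8_t)((inv << i) | (inv >> (8 - i)));
    return s ^ 0x63;
}

static void sub_bytes(uint8_t s[16]) {
    for (int i = 0; i < 16; i++) s[i] = sbox(s[i]);
}

/* Row r is rotated left by r positions. */
static void shift_rows(uint8_t s[16]) {
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[r + 4 * c] = s[r + 4 * ((c + r) % 4)];
    memcpy(s, t, 16);
}

static void mix_columns(uint8_t s[16]) {
    for (int c = 0; c < 4; c++) {
        uint8_t *p = s + 4 * c;
        uint8_t a0 = p[0], a1 = p[1], a2 = p[2], a3 = p[3];
        p[0] = gmul(a0, 2) ^ gmul(a1, 3) ^ a2 ^ a3;
        p[1] = a0 ^ gmul(a1, 2) ^ gmul(a2, 3) ^ a3;
        p[2] = a0 ^ a1 ^ gmul(a2, 2) ^ gmul(a3, 3);
        p[3] = gmul(a0, 3) ^ a1 ^ a2 ^ gmul(a3, 2);
    }
}

static void add_round_key(uint8_t s[16], const uint8_t rk[16]) {
    for (int i = 0; i < 16; i++) s[i] ^= rk[i];
}

/* One full round. SubBytes and ShiftRows commute, so their order is free. */
void aes_round(uint8_t s[16], const uint8_t rk[16]) {
    sub_bytes(s); shift_rows(s); mix_columns(s); add_round_key(s, rk);
}

/* The final round omits MixColumns. */
void aes_final_round(uint8_t s[16], const uint8_t rk[16]) {
    sub_bytes(s); shift_rows(s); add_round_key(s, rk);
}

/* AES-128: initial AddRoundKey, 9 full rounds, 1 final round.
   rk points to 11 consecutive 16-byte round keys. */
void aes128_cipher(uint8_t s[16], const uint8_t *rk) {
    add_round_key(s, rk);
    for (int r = 1; r < 10; r++) aes_round(s, rk + 16 * r);
    aes_final_round(s, rk + 160);
}
```

With the worked example from FIPS-197 Appendix B, applying aes_round to the state 19 3d e3 be a0 f4 e2 2b 9a c6 8d 2a e9 f8 48 08 with round key a0 fa fe 17 88 54 2c b1 23 a3 39 39 2a 6c 76 05 gives a4 9c 7f f2 68 9f 35 2b 6b 5b ea 43 02 6a 50 49.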

Getting started
First, you need to make sure that the processor supports the AES-NI extension. The CPUID instruction is used for this: when called with eax = 0x00000001, it returns feature flags in the registers, one bit per supported extension. For AES-NI, this is bit 25 of the ECX register.
AES-NI verification code:
    mov  eax, 0x00000001
    cpuid
    test ecx, 0x2000000    ; bit 25 of ECX: AES-NI
    je   L_no_AES          ; bit is clear: no AES-NI support
If the bit is set to 1, then we can proceed to encryption.
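The same check can be made from C without inline assembly. The sketch below assumes GCC or Clang on x86, where the <cpuid.h> header provides __get_cpuid; the names cpu_has_aesni and CPUID_ECX_AESNI are made up for this example. On other targets it simply reports that AES-NI is unavailable.

```c
#include <stdbool.h>

/* CPUID leaf 1 reports AES-NI support in bit 25 of ECX. */
#define CPUID_ECX_AESNI (1u << 25)   /* == 0x2000000, the mask used above */

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>

bool cpu_has_aesni(void) {
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & CPUID_ECX_AESNI) != 0;
}
#else
/* Not an x86 GCC/Clang build: report "no AES-NI". */
bool cpu_has_aesni(void) { return false; }
#endif
```

A program would typically call cpu_has_aesni() once at startup and fall back to a software AES implementation when it returns false.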
Key expansion / KeyExpansion
The key expansion algorithm in pseudocode looks like this:
    KeyExpansion(byte key[4*Nk], word w[Nb*(Nr+1)], Nk)
    begin
        word temp
        i = 0
        while (i < Nk)
            w[i] = word(key[4*i], key[4*i+1], key[4*i+2], key[4*i+3])
            i = i + 1
        end while
        i = Nk
        while (i < Nb * (Nr+1))
            temp = w[i-1]
            if (i mod Nk = 0)
                temp = SubWord(RotWord(temp)) xor Rcon[i/Nk]
            else if (Nk > 6 and i mod Nk = 4)
                temp = SubWord(temp)
            end if
            w[i] = w[i-Nk] xor temp
            i = i + 1
        end while
    end
Hardware support for this step is provided by the AESKEYGENASSIST instruction, which performs the following:
    AESKEYGENASSIST xmm1, xmm2/m128, imm8
    Tmp := xmm2/LOAD(m128)
    X3[31-0] := Tmp[127-96]
    X2[31-0] := Tmp[95-64]
    X1[31-0] := Tmp[63-32]
    X0[31-0] := Tmp[31-0]
    RCON[7-0] := imm8
    RCON[31-8] := 0
    xmm1 := [RotWord(SubWord(X3)) XOR RCON, SubWord(X3),
             RotWord(SubWord(X1)) XOR RCON, SubWord(X1)]
As is easy to see, the instruction does not perform the final step:
    w[i] = w[i-Nk] xor temp
This step has to be carried out with ordinary SSE2 instructions (pshufd, pslldq, pxor).
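Example of 128-bit key expansion. xmm1 holds the current round key [w0, w1, w2, w3], and the next round key [w4, w5, w6, w7] ends up in xmm2; dword lists in the comments run from bits 31-0 to bits 127-96:
    aeskeygenassist xmm2, xmm1, 0x1   ; imm8 = round constant for round 1
    pshufd xmm2, xmm2, 0xff           ; broadcast temp = RotWord(SubWord(X3)) xor RCON
    movups xmm3, xmm1                 ; copy of the current round key
    pxor   xmm2, xmm3                 ; dword 0 = w0 xor temp = w4
    pshufd xmm2, xmm2, 0x00           ; broadcast w4
    pshufd xmm3, xmm3, 0x39           ; [w1, w2, w3, w0]
    pslldq xmm3, 0x4                  ; [0, w1, w2, w3]
    pxor   xmm2, xmm3                 ; dword 1 = w4 xor w1 = w5
    pshufd xmm2, xmm2, 0x14           ; [w4, w5, w5, w4]
    pshufd xmm3, xmm3, 0x38           ; [0, w2, w3, 0]
    pslldq xmm3, 0x4                  ; [0, 0, w2, w3]
    pxor   xmm2, xmm3                 ; dword 2 = w5 xor w2 = w6
    pshufd xmm2, xmm2, 0xA4           ; [w4, w5, w6, w6]
    pshufd xmm3, xmm3, 0x34           ; [0, 0, w3, 0]
    pslldq xmm3, 0x4                  ; [0, 0, 0, w3]
    pxor   xmm2, xmm3                 ; dword 3 = w6 xor w3 = w7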
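To check the result of such a sequence without AES-NI hardware, the instruction and the missing XOR step can be modelled in portable C. This is only a sketch of the semantics described above: the names keygenassist_model and next_round_key are hypothetical, each 128-bit register is represented as four little-endian 32-bit words (index 0 is bits 31-0), and the S-box is computed on the fly instead of being read from a table.

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gmul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
        b >>= 1;
    }
    return p;
}

/* S-box entry: multiplicative inverse followed by the affine transform. */
static uint8_t sbox(uint8_t x) {
    uint8_t inv = 0;
    for (int y = 1; y < 256 && x; y++)
        if (gmul(x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
    uint8_t s = inv;
    for (int i = 1; i <= 4; i++)
        s ^= (uint8_t)((inv << i) | (inv >> (8 - i)));
    return s ^ 0x63;
}

/* SubWord: apply the S-box to each byte of a 32-bit word. */
static uint32_t sub_word(uint32_t w) {
    uint32_t r = 0;
    for (int i = 0; i < 4; i++)
        r |= (uint32_t)sbox((uint8_t)(w >> (8 * i))) << (8 * i);
    return r;
}

/* RotWord on a little-endian dword: byte 1 moves down to byte 0. */
static uint32_t rot_word(uint32_t w) {
    return (w >> 8) | (w << 24);
}

/* Model of AESKEYGENASSIST: src[0] is bits 31-0, src[3] is bits 127-96. */
void keygenassist_model(const uint32_t src[4], uint8_t imm8, uint32_t dst[4]) {
    dst[0] = sub_word(src[1]);
    dst[1] = rot_word(sub_word(src[1])) ^ imm8;
    dst[2] = sub_word(src[3]);
    dst[3] = rot_word(sub_word(src[3])) ^ imm8;
}

/* Next AES-128 round key: adds the XOR chain the instruction leaves out. */
void next_round_key(const uint32_t prev[4], uint8_t rcon, uint32_t next[4]) {
    uint32_t assist[4];
    keygenassist_model(prev, rcon, assist);
    next[0] = prev[0] ^ assist[3];        /* w[i] = w[i-Nk] xor temp */
    for (int i = 1; i < 4; i++)
        next[i] = prev[i] ^ next[i - 1];  /* temp is now the previous word */
}
```

For the FIPS-197 example key 2b 7e 15 16 28 ae d2 a6 ab f7 15 88 09 cf 4f 3c and round constant 0x01, the model yields the round key a0 fa fe 17 88 54 2c b1 23 a3 39 39 2a 6c 76 05, which is what the assembly sequence should leave in xmm2.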
Encryption
To implement one round of encryption, use the AESENC instruction, which performs the following actions:

    AESENC xmm1, xmm2/m128
    Tmp := xmm1
    RoundKey := xmm2/m128
    Tmp := ShiftRows(Tmp)
    Tmp := SubBytes(Tmp)
    Tmp := MixColumns(Tmp)
    xmm1 := Tmp xor RoundKey
The last round of encryption is implemented using the AESENCLAST instruction:
    AESENCLAST xmm1, xmm2/m128
    Tmp := xmm1
    RoundKey := xmm2/m128
    Tmp := ShiftRows(Tmp)
    Tmp := SubBytes(Tmp)
    xmm1 := Tmp xor RoundKey
The difference between this instruction and AESENC is that the MixColumns operation is not performed in the last round.
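Sample encryption procedure (xmm1 holds the state, xmm2 and xmm3 hold round keys):
    aesenc     xmm1, xmm2   ; one full round (repeated for every round except the last)
    aesenclast xmm1, xmm3   ; last round, without MixColumns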
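For comparison, a complete single-block AES-128 encryption can be written with compiler intrinsics instead of assembly. The sketch below is one possible arrangement, not the code from this post: it assumes GCC or Clang on x86 (the target attribute lets the file build without -maes), the names aes128_encrypt_block, encrypt_block_ni, expand_step and EXPAND are made up for the example, and the round keys are recomputed on every call purely to keep it short. The function returns -1 when AES-NI is not available.

```c
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#include <immintrin.h>

/* Let the compiler emit AES-NI/SSE2 code in the marked functions only. */
#define AESNI_FN __attribute__((target("aes,sse2")))

/* Fold one AESKEYGENASSIST result into the previous round key. */
static AESNI_FN __m128i expand_step(__m128i key, __m128i assist) {
    assist = _mm_shuffle_epi32(assist, 0xff);          /* broadcast dword 3 */
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));  /* three shift-and-xor */
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));  /* steps build the     */
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));  /* running XOR of w0-3 */
    return _mm_xor_si128(key, assist);
}

/* The round constant must be a compile-time immediate, hence a macro. */
#define EXPAND(rk, i, rcon) \
    ((rk)[i] = expand_step((rk)[(i) - 1], \
                           _mm_aeskeygenassist_si128((rk)[(i) - 1], (rcon))))

static AESNI_FN void encrypt_block_ni(const uint8_t key[16],
                                      const uint8_t in[16], uint8_t out[16]) {
    __m128i rk[11];
    rk[0] = _mm_loadu_si128((const __m128i *)key);
    EXPAND(rk, 1, 0x01); EXPAND(rk, 2, 0x02); EXPAND(rk, 3, 0x04);
    EXPAND(rk, 4, 0x08); EXPAND(rk, 5, 0x10); EXPAND(rk, 6, 0x20);
    EXPAND(rk, 7, 0x40); EXPAND(rk, 8, 0x80); EXPAND(rk, 9, 0x1b);
    EXPAND(rk, 10, 0x36);

    __m128i s = _mm_loadu_si128((const __m128i *)in);
    s = _mm_xor_si128(s, rk[0]);              /* initial AddRoundKey */
    for (int r = 1; r < 10; r++)
        s = _mm_aesenc_si128(s, rk[r]);       /* rounds 1..9 */
    s = _mm_aesenclast_si128(s, rk[10]);      /* round 10, no MixColumns */
    _mm_storeu_si128((__m128i *)out, s);
}

/* Returns 0 on success, -1 when the CPU does not report AES-NI. */
int aes128_encrypt_block(const uint8_t key[16], const uint8_t in[16],
                         uint8_t out[16]) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 25)))
        return -1;
    encrypt_block_ni(key, in, out);
    return 0;
}
#else
/* Not an x86 GCC/Clang build: AES-NI cannot be used here. */
int aes128_encrypt_block(const uint8_t key[16], const uint8_t in[16],
                         uint8_t out[16]) {
    (void)key; (void)in; (void)out;
    return -1;
}
#endif
```

On a CPU with AES-NI, the FIPS-197 Appendix B key 2b 7e 15 16 28 ae d2 a6 ab f7 15 88 09 cf 4f 3c and plaintext 32 43 f6 a8 88 5a 30 8d 31 31 98 a2 e0 37 07 34 should produce the ciphertext 39 25 84 1d 02 dc 09 fb dc 11 85 97 19 6a 0b 32. Real code would expand the key once and reuse the round keys for every block.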
Decryption
To implement the decryption procedure, use the AESDEC instruction:

    AESDEC xmm1, xmm2/m128
    Tmp := xmm1
    RoundKey := xmm2/m128
    Tmp := InvShiftRows(Tmp)
    Tmp := InvSubBytes(Tmp)
    Tmp := InvMixColumns(Tmp)
    xmm1 := Tmp xor RoundKey
To obtain the decryption round key (InvKey), you must apply the InvMixColumns operation to the encryption round key. The instruction that does this is AESIMC xmm1, xmm2/m128.
For the last decryption round, the AESDECLAST instruction is used:
    AESDECLAST xmm1, xmm2/m128
    State := xmm1
    RoundKey := xmm2/m128
    Tmp := InvShiftRows(State)
    Tmp := InvSubBytes(Tmp)
    xmm1 := Tmp xor RoundKey
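An example of the decryption procedure (xmm1 holds the state, xmm2 and xmm3 hold round keys):
    aesimc     xmm2, xmm2   ; apply InvMixColumns to the round key
    aesdec     xmm1, xmm2   ; one full decryption round
    aesdeclast xmm1, xmm3   ; last round, without InvMixColumns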
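The reason the key has to go through AESIMC is that AESDEC applies InvMixColumns to the state before XORing in the round key, and InvMixColumns is linear with respect to XOR, so transforming the key in advance compensates for the changed order of operations. The transform itself can be modelled in portable C as a rough sketch; the function names and the layout (column c occupies bytes 4c to 4c+3) are assumptions of this example.

```c
#include <stdint.h>

/* Multiply two elements of GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gmul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
        b >>= 1;
    }
    return p;
}

/* InvMixColumns: the transform AESIMC applies to its 128-bit source.
   Each column is multiplied by the matrix with rows (0e 0b 0d 09),
   (09 0e 0b 0d), (0d 09 0e 0b), (0b 0d 09 0e). */
void inv_mix_columns(uint8_t s[16]) {
    for (int c = 0; c < 4; c++) {
        uint8_t *p = s + 4 * c;
        uint8_t a0 = p[0], a1 = p[1], a2 = p[2], a3 = p[3];
        p[0] = gmul(a0, 14) ^ gmul(a1, 11) ^ gmul(a2, 13) ^ gmul(a3, 9);
        p[1] = gmul(a0, 9)  ^ gmul(a1, 14) ^ gmul(a2, 11) ^ gmul(a3, 13);
        p[2] = gmul(a0, 13) ^ gmul(a1, 9)  ^ gmul(a2, 14) ^ gmul(a3, 11);
        p[3] = gmul(a0, 11) ^ gmul(a1, 13) ^ gmul(a2, 9)  ^ gmul(a3, 14);
    }
}
```

In an AES-128 decryption schedule built this way, only the nine middle round keys are transformed; the first and last round keys are used unchanged, because they are applied by a plain XOR and by AESDECLAST, neither of which involves InvMixColumns.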
So, hardware support should give a decent boost to encryption speed. To close the post, I provide a C++ class that implements encryption and decryption in ECB mode. In a test run, the encryption speed on a single core of an i5-3740 (3.2GHz) reached 320 MB/s.
References:
- Intel Advanced Encryption Standard (AES) New Instructions Set
- List of processors supporting the AES-NI instruction set extension
- C++ class with AES-NI inline assembly
- Animation of how AES works
- Wikipedia article with a list of libraries and programs using AES-NI instructions