📜 ⬆️ ⬇️

Secure crypto programming. Part 1

In this post, we would like to acquaint Habr's users with the basic rules of programming cryptographic algorithms. This set of rules, called the “Cryptography coding standard”, was created in 2013 at the initiative of one of the modern cryptography gurus Jean-Philippe Omasson . Despite the fact that the approaches described in it are well known to those who are professionally engaged in the development of protection systems, beginners and students, we think, it will be interesting to get acquainted with the proposed text, which is a translation of the rule set from cryptocoding.net .

1. Compare the lines of secret data for a constant time.


Problem


A byte string comparison can be used to implement attacks that use information about the execution time of a program (so-called timing attacks), for example, to fake MAC message authentication codes (see Google Keyczar vulnerability in the crypto library ).

Built-in implementations of comparison functionality, such as the memcmp function, the Arrays.equals method (Java), or the == (Python) operator, are usually not executed in constant time. Consider, for example, the implementation of [memcmp] from Microsoft CRT :

EXTERN_C int __cdecl memcmp(const void *Ptr1, const void *Ptr2, size_t Count) { INT v = 0; BYTE *p1 = (BYTE *)Ptr1; BYTE *p2 = (BYTE *)Ptr2; while(Count-- > 0 && v == 0) { v = *(p1++) - *(p2++); } return v; } 

Decision


')
Use comparison functions performed in a fixed time, such as the one offered in the NaCL library:

 int crypto_verify(const unsigned char *x,const unsigned char *y) { unsigned int differentbits = 0; #define F(i) differentbits |= x[i] ^ y[i]; F(0) F(1) F(2) F(3) F(4) F(5) F(6) F(7) F(8) F(9) F(10) F(11) F(12) F(13) F(14) F(15) return (1 & ((differentbits - 1) >> 8)) - 1; /* returns 0 if equal, 0xFF..FF otherwise */ } 

A more general version of this approach is as follows:

 int util_cmp_const(const void * a, const void *b, const size_t size) { const unsigned char *_a = (const unsigned char *) a; const unsigned char *_b = (const unsigned char *) b; unsigned char result = 0; size_t i; for (i = 0; i < size; i++) { result |= _a[i] ^ _b[i]; } return result; /* returns 0 if equal, nonzero otherwise */ } 

Consider examples of the implementation of the comparison functions and tests for 32-bit values ​​performed in a fixed time:

 #include <stdint.h> /*    */ /*  1   , 0   */ int ct_isnonzero_u32(uint32_t x) { return (x|-x)>>31; } int ct_iszero_u32(uint32_t x) { return 1 ^ ct_isnonzero_u32(x); } int ct_neq_u32(uint32_t x, uint32_t y) { return ((xy)|(yx))>>31; } int ct_eq_u32(uint32_t x, uint32_t y) { return 1 ^ ct_neq_u32(x, y); } int ct_lt_u32(uint32_t x, uint32_t y) { return (x^((x^y)|((xy)^y)))>>31; } int ct_gt_u32(uint32_t x, uint32_t y) { return ct_lt_u32(y, x); } int ct_le_u32(uint32_t x, uint32_t y) { return 1 ^ ct_gt_u32(x, y); } int ct_ge_u32(uint32_t x, uint32_t y) { return 1 ^ ct_lt_u32(x, y); } /*    */ /*  1   , 0   */ int ct_isnonzero_s32(uint32_t x) { return (x|-x)>>31; } int ct_iszero_s32(uint32_t x) { return 1 ^ ct_isnonzero_s32(x); } int ct_neq_s32(uint32_t x, uint32_t y) { return ((xy)|(yx))>>31; } int ct_eq_s32(uint32_t x, uint32_t y) { return 1 ^ ct_neq_s32(x, y); } int ct_lt_s32(uint32_t x, uint32_t y) { return (x^((x^(xy))&(y^(xy))))>>31; } int ct_gt_s32(uint32_t x, uint32_t y) { return ct_lt_s32(y, x); } int ct_le_s32(uint32_t x, uint32_t y) { return 1 ^ ct_gt_s32(x, y); } int ct_ge_s32(uint32_t x, uint32_t y) { return 1 ^ ct_lt_s32(x, y); } /* Generate a mask: 0xFFFFFFFF if bit != 0, 0 otherwise */ uint32_t ct_mask_u32(uint32_t bit) { return -(uint32_t)ct_isnonzero_u32(bit); } /*  x  y      bit*/ /* : return bit ? x : y */ uint32_t ct_select_u32(uint32_t x, uint32_t y, uint32_t bit) { uint32_t m = ct_mask_u32(bit); return (x&m) | (y&~m); /* return ((x^y)&m)^y; */ } 

These approaches offer no guarantees: C and C ++ use the “as if” rule, which allows the compiler to implement operations in an arbitrary manner if the observed behavior of the program (execution time is not considered to be observed in both languages) remains unchanged. For example, consider the following ct_select_u32 variant:

 uint32_t ct_select_u32(uint32_t x, uint32_t y, _Bool bit) { uint32_t m = ct_mask_u32(bit); return (x&m) | (y&~m); } 

If we now compile this code with the parameters clang-3.5 -O2 -m32 -march = i386, we get the result:

 ct_select_u32: mov al, byte ptr [esp + 12] test al, al jne .LBB0_1 lea eax, dword ptr [esp + 8] mov eax, dword ptr [eax] ret .LBB0_1: lea eax, dword ptr [esp + 4] mov eax, dword ptr [eax] ret 

Due to the uneven speed of execution of conditional jump instructions, this code can potentially reveal the selected secret value. Since the compiler has almost unlimited freedom in generating code that can be executed in different times, it is necessary to check the final assembly for execution in a fixed time.

Avoid branching the program depending on secret data.


Problem


If the conditional branch of the program (if, switch, while, for) depends on the secret data, then the executable code, as well as the time it is executed, depends on the secret data.

A classic example of the use of this vulnerability is timing attacks on exponentiation algorithms based on squaring and multiplication (or doubling and addition for algorithms that use elliptic curves).

A particular case of this problem is the cycles in which the limit value of the counter depends on the secret value.

Decision


Leaks in program execution time can be counteracted by inserting dummy operations into program branches in order to guarantee a fixed time for its execution. However, it is much more reliable to completely avoid branching, for example, by implementing conditional statements in the form of a straight-line program. For example, the choice of two inputs | a | and | b | depending on the control bit | bit | can be implemented as follows:

 unsigned select (unsigned a, unsigned b, unsigned bit) { /* -0 = 0, -1 = 0xff....ff */ unsigned mask = - bit; unsigned ret = mask & (a^b); ret = ret ^ a; return ret; } 

For Intel processors, a faster solution is possible, which is based on using CMOV conditional branch instructions .

Avoid referring to a table with addressing depending on secret data.


Problem


The time to access a table element may vary depending on the value of its index (for example, if the element is not in the cache). This effect has been used in a number of cache attacks on AES.

Decision


Replace table references with a sequence of logical operations with constant execution time, for example, using the bit-slice technique.

Avoid cycles in which the limit value of the counter depends on the secret value.


Problem


A loop with a counter value of a counter that depends on a secret value directly makes the program vulnerable to execution time attacks. For example, the implementation of the Montgomery ladder (exponentiation algorithm) in OpenSSL 0.9.8o allowed for the leakage of logarithm (secret random number) information in the ECDSA algorithm, which could be used to obtain the server's TLS secret key. The corresponding program code is presented below. Here | scalar | - secret random number, and | scalar-> d | - pointer to array of its bits:

 /* find top most bit and go one past it */ i = scalar -> top - 1; j = BN_BITS2 - 1; mask = BN_TBIT ; while (!( scalar -> d[i] & mask )) { mask >>= 1; j --; } mask >>= 1; j - -; /* if top most bit was at word break , go to next word */ if (! mask ) { i - -; j = BN_BITS2 - 1; mask = BN_TBIT ; } for (; i >= 0; i - -) { for (; j >= 0; j - -) { if ( scalar ->d[ i] & mask ) { if (! gf2m_Madd ( group , & point ->X , x1 , z1 , x2 , z2 , ctx )) goto err ; if (! gf2m_Mdouble ( group , x2 , z2 , ctx )) goto err ; } else { if (! gf2m_Madd ( group , & point ->X , x2 , z2 , x1 , z1 , ctx )) goto err ; if (! gf2m_Mdouble ( group , x1 , z1 , ctx )) goto err ; } mask >>= 1; } j = BN_BITS2 - 1; mask = BN_TBIT; } 

Decision


Make sure that the number of iterations in all cycles is limited to a constant (or at least some non-secret variable).

In particular, make sure, as far as possible, that the boundaries of cycles and situations in which the counter values ​​exceed these boundaries or do not reach them, do not depend on user-controlled input data (does anyone really need their own Heartbleed ?).

To be continued…

Source: https://habr.com/ru/post/268113/


All Articles