⬆️ ⬇️

The implementation of the block cipher "Grasshopper" with CFB mode in C ++

Today we will talk about the new Grasshopper block encryption algorithm from the GOST R 34.12 2015 standard. Recently there have been many publications devoted to this standard. From a theoretical point of view, the described algorithm is described in them, the features of hotel transformations are studied, and optimization methods are proposed by including insertions of the code in assembly language.



In this article, I propose to the reader to become familiar with the implementation of this block cipher in C ++. It should be noted that when writing this program, the goal was not to achieve the greatest efficiency, and the main task was to show how the algorithm works. Read the description of the algorithm in the official documentation .



Program structure



The program consists of three parts.



Used notation


#define BLOCK_LENGTH 16 typedef unsigned char BYTE; typedef unsigned short WORD; 


Auxiliary means



Class byteblock


Interface byteblock
 class ByteBlock { BYTE * pBlocks; size_t amount_of_bytes; public: // Construct block of bytes which contsists of // size_ blocks each of them with init_value in it ByteBlock(size_t size_ = 0, BYTE init_value = 0); // Construct block with size_ first bytes of pBlocks_ // The value will be copied, source stays untouchable ByteBlock(BYTE * pBlocks_, size_t size_); // Move constructor // Copy constructor thus implicitly deleted // Object to move turn to null ByteBlock(ByteBlock && rhs); // Destructor, yeah ~ByteBlock(); // Move assigment operator // Object to move turn to null void operator = (ByteBlock && rhs); // This cast may be convenient to use the ByteBlock // in functions which takes raw pointers as argument BYTE * byte_ptr(); const BYTE * byte_ptr() const; // Indexing operator with evident functionality BYTE & operator [] (size_t index); BYTE operator [] (size_t index) const; bool operator == (const ByteBlock & lhs) const; bool operator != (const ByteBlock & lhs) const; // Replace body of the current block with pBlocks_ // Old value will be zerod, and then, deleted // New value copied into the block, // source stays untouchable void reset(const BYTE * pBlocks_, size_t size_); // Return amount of bytes in block size_t size() const; // It'll return deep copy of the block, which // points to different place in memory ByteBlock deep_copy() const; // It'll return slice of current ByteBlock ByteBlock operator () (size_t begin, size_t length) const; // Changes values between two ByteBlock-s friend void swap(ByteBlock & lhs, ByteBlock & rhs); }; 


Objects of this class (hereinafter referred to as “messages”) are commonly used in the program for storing information in binary form. The interface was designed in such a way that the following tasks were effectively solved:

')



Uniqueness is ensured by the fact that the copy constructor and the copy assignment operator are prohibited (implicitly), since similar moving functions are described.



Objects of the ByteBlock class always own the memory pointed to. Even when an object is initialized with a certain section of memory, the designer copies it to a new location, and the object works with a copy of the original information. In a sense, this class is similar to the smart pointer from STL - std :: unique_ptr.

Memory reset is provided by the memset function. It is worth noting that when building this program, it is not necessary to specify optimization options, since some compilers tend to ignore the memset function call, knowing that the memory will not be used later, and will soon be deleted.



Translation of hex strings to ByteBlock and back


 std::string hex_representation(const ByteBlock & bb); ByteBlock hex_to_bytes(const std::string & s); 


These functions convert the input data from hexadecimal to their byte representation, which is necessary for the further work of the program.



Algorithm "Grasshopper"



Kuznyechik interface
 class Kuznyechik { std::vector<ByteBlock> keys; static bool is_init; public: static const int block_lenght { BLOCK_LENGTH }; Kuznyechik(const ByteBlock & key); Kuznyechik(const Kuznyechik & rhs); ~Kuznyechik(); void encrypt(const ByteBlock & src, ByteBlock & dst) const; void decrypt(const ByteBlock & src, ByteBlock & dst) const; }; 


The keys field is an iterative key that is calculated once during the initialization of the object using the specified key. The is_init field is a flag that indicates whether Kuznyechik type objects have ever been created. This flag is necessary due to the fact that at the time of launch of the program many coefficients and parameters of the algorithm are missing. At the first initialization, they are calculated and stored in memory until the program is completed.



Taking into account the above, it is necessary to comment on the presence of the copy constructor, whereas inside the class there are ByteBlock objects that do not have this constructor. The fact is that when copying, elementwise deep copying of iteration keys from the keys array takes place.



Global variables and their initialization


Used global variables
 const vector<BYTE> nonlinear_transform_perm = { 252, 238, 221, 17, 207, 110, 49, 22, 251, 196, 250, 218, 35, 197, 4, 77, 233, 119, 240, 219, 147, 46, 153, 186, 23, 54, 241, 187, 20, 205, 95, 193, 249, 24, 101, 90, 226, 92, 239, 33, 129, 28, 60, 66, 139, 1, 142, 79, 5, 132, 2, 174, 227, 106, 143, 160, 6, 11, 237, 152, 127, 212, 211, 31, 235, 52, 44, 81, 234, 200, 72, 171, 242, 42, 104, 162, 253, 58, 206, 204, 181, 112, 14, 86, 8, 12, 118, 18, 191, 114, 19, 71, 156, 183, 93, 135, 21, 161, 150, 41, 16, 123, 154, 199, 243, 145, 120, 111, 157, 158, 178, 177, 50, 117, 25, 61, 255, 53, 138, 126, 109, 84, 198, 128, 195, 189, 13, 87, 223, 245, 36, 169, 62, 168, 67, 201, 215, 121, 214, 246, 124, 34, 185, 3, 224, 15, 236, 222, 122, 148, 176, 188, 220, 232, 40, 80, 78, 51, 10, 74, 167, 151, 96, 115, 30, 0, 98, 68, 26, 184, 56, 130, 100, 159, 38, 65, 173, 69, 70, 146, 39, 94, 85, 47, 140, 163, 165, 125, 105, 213, 149, 59, 7, 88, 179, 64, 134, 172, 29, 247, 48, 55, 107, 228, 136, 217, 231, 137, 225, 27, 131, 73, 76, 63, 248, 254, 141, 83, 170, 144, 202, 216, 133, 97, 32, 113, 103, 164, 45, 43, 9, 91, 203, 155, 37, 208, 190, 229, 108, 82, 89, 166, 116, 210, 230, 244, 180, 192, 209, 102, 175, 194, 57, 75, 99, 182 }; const map<BYTE, BYTE> direct_permutation, inverse_permutation; const vector<WORD> linear_transform_coeff = { 148, 32, 133, 16, 194, 192, 1, 251, 1, 192, 194, 16, 133, 32, 148, 1 }; const WORD linear_transform_modulus = 0x1C3; const vector<ByteBlock> iteration_constants; 


It is here that we see those variables that, when launched, do not store the values ​​necessary for the operation of the algorithm. These are direct_permutation, inverse_permutation, which are used for non-linear transformation, and iteration_constants, which are used to expand the key. They are filled in as follows:



Global variable initialization
 void init_perms() { map<BYTE, BYTE> *p_direct, *p_inverse; p_direct = const_cast< map<BYTE, BYTE> * >(&direct_permutation); p_inverse = const_cast< map<BYTE, BYTE> * >(&inverse_permutation); for(int i = 0; i < nonlinear_transform_perm.size(); i++) { (*p_direct)[i] = nonlinear_transform_perm[i]; (*p_inverse)[nonlinear_transform_perm[i]] = i; } } void init_consts() { vector<ByteBlock> * p = const_cast< vector<ByteBlock> * >(&iteration_constants); ByteBlock v128; for(BYTE i = 1; i <= 32; i++) { v128 = ByteBlock(BLOCK_LENGTH, 0); v128[BLOCK_LENGTH - 1] = i; iteration_linear_transform_direct128(v128.byte_ptr()); p->push_back(std::move(v128)); } } 


Implementation of the transformations used in the algorithm


All conversions work with ordinary pointers: the conversion from ByteBlock to BYTE * does not require additional costs, and the check that the size of the allocated memory corresponds to the cipher parameters has been performed at a higher level.



Nonlinear direct conversion
 void nonlinear_transform_direct128(BYTE * target) { BYTE * p_end = target + BLOCK_LENGTH; while(target != p_end) { *target = direct_permutation.at(*target); target++; } } 


A nonlinear transformation is nothing more than an ordinary permutation.



Linear direct transform
 void iteration_linear_transform_direct128(BYTE * target) { for(int i = 0; i < 16; i++) linear_transform_direct128(target); } void linear_transform_direct128(BYTE * target) { BYTE buffer = linear_transform_core128(target); for(int i = BLOCK_LENGTH - 1; i > 0; i--) target[i] = target[i-1]; *target = buffer; } BYTE linear_transform_core128(const BYTE * target) { WORD result = 0; for(int i = 0; i < BLOCK_LENGTH; i++) result ^= multiply(target[i], linear_transform_coeff[i]); return result; } WORD multiply(WORD lhs, WORD rhs) { WORD result = 0, modulus = linear_transform_modulus << 7; for(WORD detecter = 0x1; detecter != 0x100; detecter <<= 1, lhs <<= 1) if(rhs & detecter) result ^= lhs; for(WORD detecter = 0x8000; detecter != 0x80; detecter >>= 1, modulus >>= 1) if(result & detecter) result ^= modulus; return result; } 


It will be interesting to dwell on the function multiply. Its peculiarity lies in the fact that when performing a linear transformation, all calculations are carried out in the factor ring GL (2) [x] / p (x), where p (x) = x ^ 8 + x ^ 7 + x ^ 6 + x + 1. In the first cycle, the polynomials multiplied by their coefficients from GL (2) are multiplied. In the second cycle, the value modulo p (x) is calculated step by step.



Deploy iteration keys


Functions for generating iterative keys
 void keys_transform128(BYTE * k1, BYTE * k2, int iconst) { BYTE buffer[BLOCK_LENGTH]; memcpy(buffer, k1, BLOCK_LENGTH); xor128(k1, k1, iteration_constants[iconst].byte_ptr()); nonlinear_transform_direct128(k1); iteration_linear_transform_direct128(k1); xor128(k1, k2, k1); memcpy(k2, buffer, BLOCK_LENGTH); } void key_derivation128(BYTE * k1, BYTE * k2, BYTE * k3, BYTE * k4, int ipair) { if(k1 != k3) memcpy(k3, k1, BLOCK_LENGTH); if(k2 != k4) memcpy(k4, k2, BLOCK_LENGTH); for(int i = 0; i < 8; i++) keys_transform128(k3, k4, ipair * 8 + i); } 


And finally, the encryption algorithm in our notation will be as follows:



 void encrypt128(BYTE * target, const vector<ByteBlock> & keys) { xor128(target, target, keys[0].byte_ptr()); for(int i = 1; i < 10; i++) { nonlinear_transform_direct128(target); iteration_linear_transform_direct128(target); xor128(target, target, keys[i].byte_ptr()); } } 


Here are only options for functions acting in the "direct" direction. In other words, performing encryption. Decryption functions are implemented in exactly the same way.



CFB Encryption Mode



Interface CFB_Mode
 template <typename CipherType> class CFB_Mode { const CipherType algorithm; const ByteBlock iv; void decrypt_with_iv(const ByteBlock & src, ByteBlock & dst, const ByteBlock & iv_) const; public: CFB_Mode(const CipherType & alg, const ByteBlock & init_vec); void encrypt(const ByteBlock & src, ByteBlock & dst) const; void decrypt(const ByteBlock & src, ByteBlock & dst) const; void parallel_decrypt(const ByteBlock & src, ByteBlock & dst) const; }; 


The choice of block encryption mode was random. By analogy, other modes are easily written. Those familiar with the CryptoPP library may have a sense of deja vu, and this will be justified. After all, this approach to the implementation of the interaction of the block cipher and encryption mode used in this library.



In order to be able to use the block cipher in tandem with this template class, the class that implements it must meet the following requirements:





Obviously, our class Kuznyechik meets these requirements.



Encryption and decryption function


In these algorithms, auxiliary functions are used, which break all messages into blocks, multiples of the length at which a particular block cipher works, merge them back, as well as element-wise xor.



 std::vector<ByteBlock> split_blocks(const ByteBlock & src, size_t length); ByteBlock join_blocks(const std::vector<ByteBlock> & blocks); void xor_blocks(ByteBlock & to_assign, const ByteBlock & lhs, const ByteBlock & rhs); 


Encryption function
 template <typename CipherType> void CFB_Mode<CipherType>::encrypt(const ByteBlock & src, ByteBlock & dst) const { auto blocks = split_blocks(src, CipherType::block_lenght); ByteBlock tmp; algorithm.encrypt(iv, tmp); xor_blocks(tmp, tmp, blocks[0]); blocks[0] = std::move(tmp); for(int i = 1; i < blocks.size(); i++) { algorithm.encrypt(blocks[i-1], tmp); xor_blocks(tmp, tmp, blocks[i]); blocks[i] = std::move(tmp); } dst = join_blocks(blocks); } 


Decryption function
 template <typename CipherType> void CFB_Mode<CipherType>::decrypt(const ByteBlock & src, ByteBlock & dst) const { decrypt_with_iv(src, dst, iv); } template <typename CipherType> void CFB_Mode<CipherType>::decrypt_with_iv(const ByteBlock & src, ByteBlock & dst, const ByteBlock & iv_) const { auto blocks = split_blocks(src, CipherType::block_lenght); ByteBlock tmp; algorithm.encrypt(iv_, tmp); xor_blocks(tmp, blocks[0], tmp); swap(tmp, blocks[0]); for(int i = 1; i < blocks.size(); i++) { algorithm.encrypt(tmp, tmp); xor_blocks(tmp, blocks[i], tmp); swap(tmp, blocks[i]); } dst = join_blocks(blocks); } 


The decision to split the decryption function into multiple components seems redundant. This would be the case if the ciphertext feedback encryption mode did not support parallelism for this procedure. Next will be considered a variant of the decryption algorithm using the std :: threads thread from the C ++ 11 standard.



Concurrency decryption function
 template <typename CipherType> void CFB_Mode<CipherType>::parallel_decrypt(const ByteBlock & src, ByteBlock & dst) const { // length in blocks of CipherType::block_lenght unsigned long const length = src.size() / CipherType::block_lenght + (src.size() % CipherType::block_lenght ? 1 : 0); // amount of threads which can perform really simultaniously unsigned long const hardware_threads = std::thread::hardware_concurrency(); // blocks of size CipherType::block_lenght to perform on by one thread unsigned long const min_per_thread = 1; // amount of threads to satisfy current condition unsigned long const max_threads = (length + min_per_thread - 1) / min_per_thread; // amount of threads to create unsigned long const num_threads = std::min( hardware_threads != 0 ? hardware_threads : 2, max_threads ); // if we aren't able to use multiple threads call common decryptor if(num_threads <= 1) { decrypt(src, dst); return; } std::cerr << "Running " << num_threads << " threads." << endl; unsigned long const block_size = (length / num_threads) * CipherType::block_lenght; std::vector<ByteBlock> init_vectors(num_threads); std::vector<ByteBlock> results(num_threads); std::vector<std::thread> threads(num_threads - 1); init_vectors[0] = iv.deep_copy(); for(int i = 1; i < num_threads; i++) init_vectors[i] = src(i * block_size - CipherType::block_lenght, CipherType::block_lenght); unsigned long start_pos = 0; for(unsigned long i = 0; i < num_threads - 1; i++) { threads[i] = std::thread( &CFB_Mode<CipherType>::decrypt_with_iv, this, src(start_pos, block_size), std::ref( results[i] ), std::ref( init_vectors[i] ) ); start_pos += block_size; } decrypt_with_iv( src(start_pos, src.size() - start_pos), results[num_threads - 1], init_vectors[num_threads - 1] ); for(auto & t : threads) t.join(); dst = join_blocks(results); } 




An example of a program that encrypts a message



 #include "mycrypto.hpp" #include "Kuznyechik.hpp" int main() { ByteBlock key = hex_representation( "8899aabbccddeeff0011223344556677fedcba98765432100123456789abcdef" ); ByteBlock iv = hex_representation("abcdef12345600dacdef94756eeabefa"); ByteBlock msg = hex_representation("1122334455667700ffeeddccbbaa9988"); ByteBlock result; CFB_Mode<Kuznyechik> encryptor(Kuznyechik(key), iv); encryptor.encrypt(msg, result); return 0; } 


The repository of the project is located here .

Source: https://habr.com/ru/post/313932/



All Articles