I believe many people know that since 2007, the National Institute of Standards and Technology (NIST) has been holding a competition to develop a hash algorithm to replace SHA-1 and the family of algorithms SHA-2. However, this topic is somehow deprived of attention on the site. Actually this led me to you. I bring to your attention a series of articles on hash algorithms. In this cycle, we will study together the basics of hash functions, consider the most famous hash algorithms, plunge into the atmosphere of the SHA-3 competition and consider algorithms that claim to win it, we will definitely test them. If possible, Russian hashing standards will be considered.
About myself
Student of the department of information security.
About hashing
Currently, virtually no cryptographic applications can do without the use of hashing.
Hash functions are functions designed to “compress” an arbitrary message or data set, usually written in a binary alphabet, into some bit combination of fixed length, called convolution. Hash functions have a variety of uses when conducting statistical experiments, testing logical devices, and building fast search algorithms and checking the integrity of records in databases. The main requirement for hash functions is the uniform distribution of their values ​​with a random choice of argument values.
A cryptographic hash function is any hash function that is crypto-resistant, that is, satisfies a number of requirements specific to cryptographic applications. In cryptography, hash functions are used to solve the following tasks:
- building data integrity control systems during their transfer or storage,
- data source authentication.
A hash function is any function
h: X -> Y , easily computable and such that for any message
M the value
h (M) = H (convolution) has a fixed bit length.
X is the set of all messages,
Y is the set of binary vectors of fixed length.
')
As a rule, hash functions are based on the so-called one-step contraction functions
y = f (x 1 , x 2 ) of two variables, where
x 1 ,
x 2 and
y are binary vectors of length
m ,
n and
n, respectively, and
n is the convolution length and
m is the length of the message block.
To get the value of
h (M), the message is first divided into blocks of length
m (if the message is not a multiple of
m, then the last block is complemented to the full in some special way), and then to the received
M 1 , M 2 , .., M blocks
N apply the following sequential procedure to calculate the convolution:
H o = v,
H i = f (M i , H i-1 ), i = 1, .., N,
h (M) = H NHere
v is a constant, often referred to as an initializing vector. She is chosen
for various reasons, and may be a secret constant or a set of random data (sample date and time, for example).
With this approach, the properties of the hash function are completely determined by the properties of the one-step compression function.
There are two important types of cryptographic hash functions - key and keyless. Key hash functions are called message authentication codes. They make it possible, without additional funds, to guarantee both the correctness of the data source and the integrity of the data in systems with users trusting each other.
Keyless hash functions are called error detection codes. They make it possible with the help of additional means (encryption, for example) to guarantee the integrity of the data. These hash functions can be used in systems with both trusting and non-trusting users.
About statistical properties and requirements
As I said before, the main requirement for hash functions is the uniform distribution of their values ​​with a random choice of argument values. For cryptographic hash functions, it is also important that with the slightest change in the argument the value of the function changes dramatically. This is called an avalanche effect.
The key requirements for hashing are as follows:
- the impossibility of fabrication,
- impossibility of modification.
The first requirement means the high complexity of the selection of the message with the correct value of convolution. The second is the high complexity of the selection for a given message with a known convolution value of another message with the correct convolution value.
To keyless functions make demands:
- one-pointedness
- resistance to collisions,
- resistance to finding the second preimage.
Unidirectionality is understood to mean the high complexity of finding a message by a given value of convolution. It should be noted that at the moment there is no used hash functions with proven unidirectionality.
By collision resistance, we understand the difficulty of finding a pair of messages with the same convolution values. Usually, finding a way to construct collisions with cryptoanalytics serves as the first signal of the algorithm’s obsolescence and the need to replace it soon.
Under the resistance to finding the second preimage, we understand the difficulty of finding the second message with the same convolution value for a given message with a known convolution value.
This was the theoretical part, which will be useful to us in the future ...
About popular hash algorithms
Algorithms
CRC16 / 32 - checksum (non-cryptographic transformation).
Algorithms
MD2 / 4/5/6 . They are the work of Ron Rayvest, one of the authors of the RSA algorithm.
The MD5 algorithm was once very popular, but the first prerequisites for hacking appeared in the late nineties, and now its popularity is rapidly falling.
The MD6 algorithm is a very interesting algorithm from a constructive point of view. He was nominated for the SHA-3 competition, but, unfortunately, the authors did not have time to bring him up to standard, and this algorithm is missing from the list of candidates who passed in the second round.
SHA Rule Algorithms Algorithms that are now widespread. There is an active transition from SHA-1 to the standards version of SHA-2. SHA-2 is the collective name of the SHA224, SHA256, SHA384 and SHA512 algorithms. SHA224 and SHA384 are essentially analogous to SHA256 and SHA512, respectively, only after calculating the convolution of the information in it is discarded. It is worth using them only for ensuring compatibility with the equipment of old models.
Russian standard -
GOST 34.11-94 .
In the next article
Overview of MD algorithms (MD4, MD5, MD6).
Literature
A. P. Alferov, Fundamentals of Cryptography.
Bruce Schneier, Applied Cryptography.