Two of my acquaintances, who asked questions about the same in the course of the week (approximately in spirit:
“I heard that MD5 / SHA-1 has already been hacked, why do we still use them?” ), Pushed me to write this note, although the main events described below occurred more than 3 years ago.
General information (for specialists - to pass without a doubt)
As you know, cryptographic hash sums differ from
regular hash sums in that, besides the basic properties required from any hash function:
- the ability to convert an input value (usually text) of arbitrary length into an output value of fixed length,
- statistical uniformity of outliers,
- good “scatteredness” (about half the bit difference) of output values even for insignificantly (possibly only 1 bit) differing input texts;
Additional requirements are imposed on
cryptographic hash algorithms :
- The digit capacity of the output values should be far beyond the brute force on modern technology, both in processing speed and in storage (in practice, this is 128, 160, 256 or more bits);
- there should not be a way (significantly more efficient than a complete enumeration of input values) to calculate any pair of input texts that give the output the same hash value (no matter what) - a successful attack on this requirement is called a "collision" of the hash function;
- There should not be a way (significantly more efficient than a complete enumeration of input values) by the value of the hash function to pick up any input text giving the output of the algorithm this hash value — a successful attack on this requirement is called “inversion” of the hash function.
The three requirements for crypto-resistant hash functions listed above transform them in fact into fixed-length identifiers for texts, files, and generally arbitrary blocks of information. In addition, these identifiers:
- unique throughout the world of the modern information society;
- irreversible, that is, do not reveal absolutely nothing about the content of the original document.
This allows using hash functions in tasks:
- checking the integrity of files, archives, assemblies, etc. (it is enough to transfer the hash amount of the file to the recipient with confidence and then any unauthorized changes in the file will immediately change its control hash amount);
- storing passwords in an irreversible form (not the passwords themselves are stored in the repository, but the hash sums from the string, for example, “constant (salt)” + “password” type, which allows to keep the vocabulary password phrase secret when disclosing the hash);
- digital signatures of documents (the EDS itself is always installed not on the file itself, which, firstly, would be very slow, and secondly, it would carry some security problems with the signature key itself, but its hash sum, providing exactly same level of protection against modifications).
To date, the overwhelming share of applications of hash functions is “assumed” by the algorithms MD5, SHA-1, SHA-256, and in our country also GOST R 34.11-94, which is de facto almost a monopolist in FAPSI / FSB-certified crypto products. Of course, there are many other lesser known, or common only in narrow communities of algorithms (for example, RIPEMD, TIGER, PANAMA, etc.)
What happened?
For quite a long time (from the mid-1990s), the outgoing generation of cryptographic functions was considered very, very persistent. And although both MD5 and SHA-1 were based on ideas embodied in the MD4 algorithm, which was cracked in the early 1990s, improvements made at the design stage (namely, an increase in the number of passes (rounds) when calculating the hash value and more thorough elaboration of the applied mathematical operations) led to the fact that until 2004 the family was considered invulnerable to any kind of attacks.
Thunder out of the blue sky sounded on August 16, 2004, when a group of Chinese researchers (under the direction of X.Wang) published on an open server of scientific publications
eprint.iacr.org without any explanation of a pair of real
collisions for MD4, MD5, RIPEMD and HAVAL with a single remark about that the selection of these input texts took them
1 hour 5 minutes . It was a blast. The strongest cryptographers again turned their attention to the MD5 / SHA-1 family and, of course, to the examples given in the breakthrough article. Certain regularities were clarified, and soon a methodical device for building collisions to similar ciphers was built, which was improved throughout 2004–2006 and reached a speed of
less than a second on an ordinary personal computer today to generate a pair of texts that create a collision. Today, indisputably hacked
(but only in a collision class attack!) MD4, MD5, RIPEMD, HAVAL, SHA-0.
')
With SHA-1, the situation is somewhat different. The structure of mathematical transformations, which is somewhat more thoughtful than that of MD5 and RIPEMD-160, limits the best known attack implementations in open circles (proposed, by the way, by the same Chinese) to the computational complexity of 2 ^ 63 operations. This is objectively a lot. Even taking into account the exponential growth of productivity of computing. It is unlikely that the selection of at least one collision will be possible with one computer in the next twenty years. However, do not forget that there are distributed projects. Healthy ambitions (getting the first artificial collision for SHA-1, by the way officially being the standard for crypto-hashing in the USA) pushed researchers from the
University of Graz (Austria) to launch a distributed search on the server
boinc.iaik.tugraz.at , optimistic forecasts expect the first SHA-1 collision in next
5 years .
And finally, the question that is always touched upon when discussing this technique is
whether GOST R 34.11-94 is
vulnerable to this attack ? The answer is
no . Domestic cryptostandard uses a different scheme of “mixing” blocks of the source text during hashing, therefore, to date, the described methods for the GOST have not been applied. The best known attack (which triggered some agiotage headlines in the media in 2008) was proposed by researchers from the same Graz and improves the collision selection process 2 ^ 23 times. It is based on a different principle and has the computational complexity of
2 ^ 105 operations , which is incredibly far from reality, even for distributed projects and supercomputers.
Effects
What do we have in the bottom line?
The MD4, MD5, SHA-1, RIPEMD, HAVAL cryptographic algorithms are clearly compromised with respect to
collision generation attacks. However (!) Fortunately,
not a single physically feasible attack on the hash function inversion (even for MD4) has been published to date .
What does it threaten with?
The attacker has the ability to create two different documents that will have the same hash value (while he does not have the ability to "control" what value it will turn out). This means that it cannot
create a “twin” to an existing foreign document / text / file. He controls the situation only when he can design the first and second documents himself.
Can an intruder ...?
- ... create another file / archive / etc., Which will have the same hash sum as someone else's original one, in order to later replace it with the original one, for example, on a WEB server or in a source / binary repository? Not. However, if he is the author of the original file, he can create two versions of the file with the same hash sum: harmless and malicious , give the public study harmless, and then at the distribution stage, imperceptibly change the version to malicious - check based only on the hash function of this will not notice.
- ... recover the client's password by finding out its hash in one way or another? Definitely not.
- ... modify the previously created file (yours or someone else's), protected by EDS? Not. However, as in the first case, he can initially create two versions of the source file , sign one of them with his EDS or give someone’s signature for signature (for example, when a document is deposited by a third party or notarized when the document was created), and then after receiving the signed document the unit responsible for the signature on the second version of the document and the EDS will be correct , which of course is unacceptable. To date, cryptanalysts have already demonstrated examples of collisions of payment orders, differing only in the amount of payment, as well as collisions of X.509 certificates with different public keys, however, the same hash sums, and therefore the same certifying signatures of certification authorities. All this gives unfortunately a large enough space for fantasy.
What to use today?
- Of the hash algorithms widely known for quite a long time, there are currently no complaints about GOST R 34.11-94 (developed by FAPSI, 1995) and TIGER (developers are world-famous cryptographers E. Biham and R. Anderson, 1995).
- The new generation of SHA algorithms from the American Institute of Standardization NIST (2001-2002) has a new numbering SHA-224, SHA-256, SHA-384, SHA-512 (from the bitness of output values) and is sometimes combined under the general code name SHA-2. The algorithms have a different structure of cryptotransformations and so far no information about attacks on them has been received by the first or second kind.
- Not satisfied with this, NIST launched in 2007 an open competition for the third-generation hash standard SHA-3 (exactly repeating, in its principles, how the AES block encryption standard was chosen in 2000). Currently, only the first phase of the competition is underway, 51 algorithms from all over the world are allowed to participate in it.
- And finally, in principle, no one forbids you to use to check two or more vulnerable functions at the same time . Creating a pair of documents that would simultaneously give the same MD5 sums and the same SHA-1 sums are impossible for today.