📜 ⬆️ ⬇️

Non-cryptographic hash functions and DoS attack on them

Non-cryptographic hash functions are used where speed is important and the possibility of attacking the characteristics of a function is not so important. Recently, an attack on the algorithmic complexity of hash tables by creating multiple collisions of a hash function that can lead to DoS has been actively discussed. We will consider modern non-cryptographic hash functions, conditions for their use, possible methods of protection against attacks on hash tables, and why it turned out that this is not so easy to fix.

Non-cryptographic hash functions


If everyone has a lot of cryptographic hash functions, then little is known about non-cryptographic (general-purpose hash functions). Non-cryptographic functions are used where data is not affected by third parties (the attacker). For example, such functions can be used to build hash tables.

Criteria that are important for non-cryptographic hash functions:


Since usually the input value space is larger than the output value space, collisions are inevitable. In an ideal function, the output values ​​have a uniform distribution, so the probability of collisions is minimal. This can be checked chi-square test (χ²).
')
FNV-1a

2003 Speed: 4.5 cycles per byte.
Very simple function [1] (RFC [5]).
1. Uniform distribution: poor, up to 14 bits can be used for a hash table (up to 2 14 hash table slots). Slightly better in the FNV-1a variation, which is recommended by default.
2. Avalanche test: bad. Also bad in FNV-1a.
Both criteria are significantly improved in modified FNV-1a [2], where at the end mixing is added to FNV-1a.

Bob Jenkins' Hash (lookup3)

2006, length 32 and 64 bits, 32-bit instructions, speed: 2.62 cycles per byte (for messages 16 bytes long).
1. Uniform distribution: excellent for all bits.
2. Avalanche test: good. There is a small problem with the upper bits.
The first version of the function was called One-at-a-Time.

CRC32

1975, length: 32, 64 and 128 bits, speed: 1.2–0.13 cycles per byte (for messages shorter than 64 bits, depending on the CPU).
1. Uniform distribution: poor distribution of lower bits. Use the upper bits for the hash table, then all tested bits can be used - up to 16 upper bits (up to 2 16 hash table slots).
2. Avalanche test: very bad. Avalanche effect is absent. The purpose of the hash is affected: its output bits are designed to detect errors in the data transmission channel. Also, the CRC has an important property: you can split the message into parts, count the hash for each part, then merge the message, and count the hash of the entire message from the hashes of its parts.

MurmurHash 2

2008, length 32 and 64 bits, 32 and 64-bit instructions.
There is a problem with the uniform distribution of some keys [3].

MurmurHash 3

2011, length 32 and 128 bits, 32 and 64-bit instructions, speed: 1.87 cycles per byte (for messages 16 bytes long).
1. Uniform distribution: good (in one case out of three “almost suspicious”).
2. Avalanche test: excellent.



CityHash

2011, length 32, 64, 128 and 256 bits, 64-bit instructions. Rather difficult function.
For the fastest option, you need a CRC32 processor instruction (SSE 4.2 - Intel Nehalem).
1. Uniform distribution: good (in one case out of three “almost suspicious”).
2. Avalanche test: good. There is a small problem with the lower bits.

Spookyhash

2011, length 32, 64 and 128 bits, speed: 0.33 cycles per byte. By Bob Jenkins
1. Uniform distribution: average (in one case out of three “suspicious”).
2. Avalanche test: excellent.

Statistical analysis of hash functions can be done using the tools [17], using the results of the analysis [2], [4], [18].

CityHash is more complex and has more bandwidth (speed on long messages) than MurmurHash. MurmurHash is simpler, has a low latency for short messages (<20 bytes).

CityHash uses CRC instructions for new processors, providing up to 0.17 cycles per byte, which is faster than all other hashes; for comparison, SHA-1 - 4.6 cycles per byte (Ivy Bridge). Without the support of these instructions, CityHash is two times slower than SpookyHash (according to the author of the latter).

For the 32-bit platform, MurmurHash 3 can be faster than SpookyHash and CityHash.

Perfect hash function

If the array of messages is known in advance, then an ideal hash function can be used: no collisions and a search in one comparison operation. Example: gperf.

If the hash function has a normal distribution, then the probability of a collision is as follows:




How real are the collisions? For 32-bit hashes, they are met regularly. For example, CRC32 coincides with “codding” and “gnu”, “exhibitors” and “schlager”, “curl” and “natural”, etc. This is not the fault of the statistical features of the hash function, but a consequence of the limited size of the hash - only 4 bytes.

All listed functions, except for CRC32 and FNV-1, have good statistics. For a 32-bit system, MurmurHash may be the fastest. If the system cannot do unaligned memory read, for example, ARM or SPARC, then MurmurHash may also work faster than anyone. CityHash speed is highly dependent on the compiler, since a lot of complex code is used, it can be either more or less than the rest.

Recommendations

You can use MurmurHash 3, CityHash, SpookyHash V2, gperf as a general-purpose hash function.

DoS attack hash table


You can imagine a DoS attack, when an attacker feeds data into a hash table so that they all fall into one slot, and you get the worst speed for finding one element - O (n), inserting each new collision - O (n²), the memory size increases ( hash-flooding). To launch an attack, you need to be able to place arbitrary data in a hash table, and find hash function collisions.
The attack is also called hash DoS or, more generally, an attack on algorithmic complexity attack.

Collisions can be created in several ways:


We list the properties of cryptographic hash functions (not applicable to general-purpose hash functions):


Sometimes the last two criteria are called resistance to collisions of the first and second kind, respectively. We will call the collision only the case when neither the message nor the hash was originally specified.

Usually the “easiest” attack is the search for collisions. Due to the “birthdays” paradox, the complexity of this attack is at least two times less than the length of the hash. This attack exists for SHA-1: with a theoretical collision complexity O (2 80 ), the attack reduces the difficulty to O (2 61 ). There is no recovery attack of the first or second preimage for SHA-1.

To increase collision resistance, you can:


A MAC based on a cryptographic function has the following properties:


If we decide to use a cryptographic hash function (for example, SHA-3) or a MAC based on it, we will encounter the following disadvantages:


A universal hash function besides a message also receives a key, using which selects a hash function from a finite family of functions. With a key known to the attacker, the function must be resistant to the restoration of the second prototype, but there is no requirement for collision resistance. If the key is unknown to the attacker, then the function will be resistant to collisions. Considering our speed requirements, it is advisable to do without cryptographic functions.

To complicate the search for collisions, they sometimes try to build a MAC based on general-purpose hash functions, using seed as the key. But the resulting “MAC” will not have the required properties, since it is not built on the basis of a cryptographic function. The following attacks are possible:


Therefore, although this approach may make life more difficult for the casual attacker, it often does not withstand the simplest cryptanalysis (oCERT [8]). It has been shown that most general-purpose hash functions are not resistant to cryptographic attacks:


Until 2008-2010, most languages ​​and frameworks used fairly simple hash functions:


Exception: Perl 4, used One-at-a-Time with a random seed since 2003 (this year they discussed this attack at the conference), later (5.17.6) used MurmurHash 3.

To reduce the number of collisions and complicate the DoS attack, many languages ​​over the last year switched to modern hash functions with a random seed. And when it turned out that this does not solve the problem with the attack, many have changed the hash and the second time.

An example of a collision-resistant MAC :


Siphash

2012, length 64 bits, key 128 bits.
Speed: 15.50 cycles per byte for 8 bytes of a message, 3.66 cycles per byte for 64 bytes of a message, up to 1.96 cycles per byte for long messages. For comparison, the fastest finalist SHA-3 BLAKE-512: 134 cycles per byte for 8 bytes, 20 cycles per byte for 64 bytes.
You can use any number of compression and finalization rounds. For example, SipHash-2-4 - 2 and 4 rounds, respectively.

It was developed as a collision-resistant hash function based on a cryptographic pseudo-random function (PRF), fast for short messages. Since it is intended to be used with a random key, there is no requirement for collision resistance. It differs from non-cryptographic hash functions in that it works correctly and safely with the key, providing both the MAC properties and the properties of the universal hash function.



The situation by language currently looks like this:


Open tasks:


Other methods of protection

It is believed that the fight against this attack is not the task that the hash function should perform. You can instead:


Recommendations

For data structures that use untrusted data, apply algorithms with complexity O (log n ), for example, a balanced tree. If it is not possible, then use a hash function randomly selected from the family of functions (MAC), for example, VMAC or SipHash.

Source: https://habr.com/ru/post/178955/


All Articles