📜 ⬆️ ⬇️

XXH3: a new record for hashing speed


Benchmarks are made in the program SMHasher on Core 2 Duo 3.0 GHz

On Habré, they repeatedly told about non-cryptographic hash functions that are an order of magnitude faster than cryptographic ones. They are used where speed is important and it makes no sense to use slow MD5 or SHA1. For example, to build hash tables with storage of key-value pairs or to quickly check the checksum when transferring large files.

One of the most popular is the xxHash family of hash functions, which appeared about five years ago. Although initially these hashes were thought to check the checksum when compressing LZ4, but they began to be used for a variety of tasks. This is understandable: just look at the table above with a comparison of the performance of xxHash and some other hash functions. In this test, xxHash outperforms its closest competitor in performance by half. The new version XXH3 raises the bar even higher.


Here and further diagrams are clickable.
')
The author of the program, Jan Kollet, writes that the idea of ​​improving the algorithm appeared in the process of implementing the Bloom filter, which required fast generation of 64 pseudo-random bits based on small variable-length input data. In principle, the standard XXH64 could handle this, but processing small input data has never been a priority when it is developed. In other words, optimization is possible. As a result of this optimization, a new hash function XXH3 has appeared, in which ideas from several other hash algorithms are implemented. What is most interesting, XXH3 is significantly faster than all existing xxHash options, not only in small input data, but in almost all use cases.

XXH3 eliminates the main disadvantage of XXH64 - a 64-bit hash limit. The author says that for this reason he was often asked to release at least a 128-bit version. So, the XXH3 hash function is theoretically capable of generating hashes up to 256 bits.

In XXH3, an internal loop that is optimally processed by vectorization. Due to this, the function uses hardware support on SSE2, AVX2 and NEON instruction sets. Performance depends on the compiler. Unexpectedly, the version compiled by clang far surpasses the rest. Jan Kolle even thought that the performance of the hash function here would exceed the memory bandwidth. This version of the graph corresponds to the dotted line.



As a result, it turned out that the hash function with support for AVX2 has a much higher throughput than RAM, so the cache size becomes an important factor. However, in many tasks it is necessary to process data that is already in the cache, so working with a speed faster than memory makes sense.

Support for SSE2 instructions allows the new hash function to significantly bypass XXH32 on 32-bit applications that are still common in the real world (for example, WASM bytecode).



On small input data (20-30 bytes), the XXH3 hash function is not much, but still surpasses Google CityHash , as well as FarmHash, which used to be clear leaders.



The following graph shows the results of the most realistic test with variable-length input data (random variable from 1 to N).



In another test, the delay is taken into account: here the hashing begins on a signal. The idea is to simulate a real workload, where the hash function does not work continuously, but is called in other processes at a certain moment.



The author writes that this test with the multiplication of 64 × 64 => 128 bits prefers the mumv2 algorithms from Vladimir Makarov and t1ha2 from Leo Yuriev.



Finally, here is the final and most important graph that shows the hash rate on variable-length input data, taking into account the delay. It reflects the actual use of the hash function, for example, in databases.



XXH3 is part of the xxHash v0.7.0 package . It is marked "experimental", and for the unlock you need to apply the macro XXH_STATIC_LINKING_ONLY . The author explains that the hash function can still be used only on test ephemeral data, but not for the actual storage of hashes. According to the results of the experimental period and the collection of user feedback, the XXH3 function will receive a stable status in the next version of xxHash.

Certificates of the signature of documents Microsoft Office, Adobe PDF, LibreOffice, etc.
GlobalSign is a great opportunity to implement trusted digital signatures . From desktop, server to cloud implementation options. Read more

Source: https://habr.com/ru/post/444144/


All Articles