Yesterday I finally released the first public version of
Lamer News , this is also a real example of using Redis in the form of a site like Hacker News, and a project of a completely independent site about news from the world of programming.
The project was well received by the community, and has been in the HN top for some time. Thanks for the feedback.
After the release, I received several requests to change the hash function, which I used to hash passwords in the database:
')
This code uses SHA1 with salt. As readers have noted, this is not the safest choice, as there are ways to calculate SHA1 very quickly. After some time, people started tweeting and writing the same sentence in the comments: “use BCrypt”. I suggested using nested SHA1 in a loop to avoid adding new dependencies in the code (if you check the README, one of the goals is to make the code simple and with as few dependencies as possible). And here it happened: the encryption dogma. No reasoning about crypto-primitives and their possible applications and combinations, just stupidly “use BCrypt”. In the eyes of these comrades, programmers are simply stupid drones who execute the guidelines, which by no means can argue about cryptography. But more about that later ...
Let's take a step back and consider the original problem with all of this, and how insecure this code is.
Problem
The problem is fairly easy to understand, but better explained in detail. In order not to store passwords in the database in the clear, they are usually hashed.
HP = HASH (password) # HP – hashed password,
When the server part needs to authenticate the user, it receives the password in the clear, hashes it again and compares it with what is in the database. If it matches, then authentication is successful.
However, what happens if an attacker (let's call her Eve) gets a dump of the database? Eva will have a set of hashed passwords, let's call them HP1, HP2, HP3, ... Her goal is to find an attack that will turn HP back into P.
The hashing algorithm is open, so the first thing Eve has to do is try to apply it to a dictionary of common words. If the hash from such a word matches what is in the database, the password has been found. Note that in English there are not so many words, so this attack can be carried out very easily and quickly.
But perhaps our user, Bob, has chosen a password that cannot be found in the dictionary, and is not very long.
Eva can generate all the combinations for passwords, say up to 6 characters in length, and calculate their hash. This attack is more complex computationally. If the password is a binary string, say, of 6 characters, then there are a total of 256 ^ 6 combinations, namely 281474976710656.
If our attacker can calculate a billion hashes per second (modern GPUs allow this, and you don’t have to go broke), then breaking this password will require:
281474976710656 / 1000000000 = 281474
It's only three days, or, on average, even a half. Not good! It's too easy. There is another problem, the user is unlikely to use all 256 characters with equal probability. Consider the worst case, it's only 26 lowercase letters, no numbers and other characters. Take this time a password of 8 characters.
There are 26 ^ 8 possible passwords, namely 208 827 064 576. This time our password can be cracked in 208 seconds (or half the average time).
This is definitely not good. How long should our alphabetic password be to be inaccessible to an attacker who can calculate a billion hashes per second?
14 characters will be selected on average 1024 years. For 16 characters, the attacker must spend 1382824 years. 12 characters will resist for a year and a half (again, on average), this is definitely not enough for most applications.
So is SHA1 safe for password hashing? Yes, if the user has chosen a strong password of 14 characters or more. Otherwise not very safe. It all depends on the length of the password, and it is no secret that users have a bad habit of picking up short and unreliable passwords.
But even worse
Unfortunately, in fact, everything is worse. For example, attacks against our 12 characters of the password can be carried out instantly: you can spend about three years and calculate hashes for all combinations and write them into a table like P -> HP.
However, such a table takes up a lot of space (86,792 terabytes, to be exact), and this is also assuming that we store data so compactly that we spend only one byte per pair of P, HP (poorly accessible target). However, this is a real threat when the size of the table is within reasonable limits.
The bottom line is that often in cryptography attacks can be carried out at the expense of more space, instead of more time.
The good news is that there is a way to prevent the calculation of a
single table suitable for all sites using the same algorithm . It is about the use of "salt". Salt is a (public) string that we add to our password before hashing. For example, if the salt is “lame” and the password is “foo”, then we will calculate
HP = HASH ("foolame")
Thus, for a tabular attack, Eva will need to calculate the hashes of all combinations of passwords with the same salt. Such a table would be useless to attack another site that uses a different salt.
Random salt
We can improve a little more and store not just HP, but also random salt. When we create an account for a user, we also generate a random salt for it and save it in the account. With this approach, our passwords are even safer, because now the attacker will have to calculate the table
for each user separately . And one more interesting observation: if we have one global salt, then the probability that it will fall into the hands of an attacker is greater, even if the passwords remain secure. If we have individual salts, then this is unlikely.
We slow down the hash function
However, even if we prevent all tabular attacks, there is still a fundamental problem: if the password is short and Eve is able to calculate HASH 1 billion times per second, then we have problems.
We can do something in this case: use a hash function, which is slow.
There are algorithms that are very slow in calculating both software and hardware. Or we can take the existing algorithm and make it very slow, using it in a loop.
For example, Blowfish is an encryption algorithm with a slow key-splitting algorithm (the algorithm itself is quite fast after performing key scheduling, so Blowfish is not good only if you want to encrypt many short messages with different keys, but it can be fast, if you need to encrypt a large message with one key).
The fact that Blowfish has a slow key-splitting algorithm makes it a good candidate for HASH.
So Niels Provos and David Mazières developed an algorithm called Bcrypt, which can be used to hash passwords. The algorithm was introduced in 1999 and uses a modified version of the Blowfish key-splitting algorithm. I'm not sure that Blowfish analysis can be applied to Bcrypt. Also, I do not know to what extent the analysis of Bcrypt itself was performed, so I cannot make comments on the security of this algorithm.
However, this is a popular choice, Provos and Mazières are two well-known cryptographers, so the algorithm probably has no obvious drawbacks.
As soon as we start using a slow hash function, the attacker starts having problems. For example, Bcrypt can be customized, making it faster or slower. If you make it so slow that even on a good hardware it will not be possible to calculate more than a thousand hashes per second, then this will probably still be acceptable for your users, but already impractical for Eva. For example, a password of 8 characters using only lowercase letters:
26 ^ 8 / 1000 / 3600 / 24 / 365 = 6,6218627782
3.3 years (on average) for hacking a password of 8 characters. Probably still not enough security, but it's better than a few seconds.
However, please note that we are still not protected from a dictionary attack . If the user chose a common word, then everything is hopeless. 30k hashes can still be computed in a reasonable amount of time.
About dogmas
At the moment we have considered some interesting thoughts. For example, there is no such hashing scheme that protects users with bad passwords. It is very important to force users to add capital letters and / or numbers and other characters to the password (in case security is important for your application).
It is important to understand how things work. This smoothly leads us to the next thought. After the “use bcrypt” choir, I replied that I could use another solution based on repeating SHA1. But apparently, many believe that the programmer should not understand cryptography. He should take it as a dogma. If you have dogmas, you will be an unimportant programmer, most likely. What if there is no bcrypt support on your system, but you need to work on it. I suggested a trivial scheme:
SHA1 (SHA1 (SHA1 (...( SHA1 (password | salt))))
What a heresy! I was considered foolish, not realizing that it is not safe to chain hash functions in this way into a chain, and so on. But what if you think a little?
SHA1 is a hash function. It consists of small steps, called
rounds , which are performed again and again. There is no key sharing, because it is not a block cipher, this function simply compresses the input bitstream into a fixed-length output.
It is important to understand that many cryptographic algorithms are based on the idea that you can take a simpler function and perform it several times to enhance its effect. This concept is so important that sometimes an attack on an algorithm becomes impractical if you add more rounds. Sometimes cryptographers use a variant of the algorithm with a reduced number of rounds simply to analyze the algorithm in a more vulnerable form, in order to better understand how strong the option is with the full number of rounds.
So why not just add a lot of rounds? It is slow. Even an amateur cryptographer can develop an algorithm that is safe and slow. A good cryptographer can find a compromise between security and speed.
But ... because now we know about the concept of rounds, we know that in SHA1 there is no key separation algorithm, that the output value of the function depends only on its input, and that SHA1 was not designed with the possibility of reversing (getting the initial value from the hash).
So it’s quite natural that the proposed SHA1 scheme (SHA1 (SHA1 (..))) will do just that, add rounds to SHA1. It follows from the fundamental properties of SHA1 that it is computationally impossible to write such a function SHA1000, which will be equivalent to a thousand-fold nested call to SHA1 and at the same time will be easily computable.
Please note that the
result of SHA1 (SHA1 (..)) is not the same as adding more rounds to the inside of SHA1, since there is also pre- and post-processing.
And you know what? This morning I discovered that the PBKDF1 algorithm described in RFC2898 does exactly what I suggested.
There are people who are very happy to show you the way, but if you look closely at them, you will find that they have no idea what they are talking about. Please use proven standards, try writing secure code, but also use your mind, learn more about cryptography and how you can combine primitives. Dogma - this sucks.
Perhaps it would not be a good idea for a programmer to invent his block cipher and then try to use it somewhere for useful purposes. There are specially trained people for this. It is much more important to understand individual “bricks”, what can be done with the help of cryptography, how to properly use protocols and so on.
In conclusion, I want to say that I am calm when people are rude to me, if at the same time they are right. Arrogance - not very big trouble, if it goes along with the mind. If instead arrogance is met with ignorance, then this is indeed a sad story.