Analyzing the capabilities of mass auditing based on leaked hashes from LinkedIn

A week ago, the LinkedIn hash base leaked , for others, this event can be remarkable in itself, but for me, first of all, it means being able to analyze the current possibilities of password cracking. And I'm not going to talk about how many times the word “password” has been encountered among passwords and how long it takes to go through six-character combinations. I’ll rather frighten users by how difficult passwords can be hacked in a few hours. And I will tell programmers how it can be effectively implemented, and as a small gift I will attach a program that I wrote for a mass audit. There is also some educational program on the use of rainbow tables with simple conclusions.

And so, in an hour we managed to “recover” about 2.5 million passwords on an average working configuration, without special dictionaries and rainbow tables. Among the passwords found there are 16-character alphanumeric combinations, and not in a single copy.

But let's start in order. Imagine that you come across a database of 6.5 million fresh hashes (only 5,787,239 unique), what are the ways to recover the maximum number of passwords in a reasonable (say 7 days) time?

Rainbow tables ;
Brute force;
Dictionary attack (with rules);
Frequency analysis.

A small educational program . Many believe in the wonderful properties of rainbow tables , supposedly they are “something capable of breaking everything at once.” This is a huge mistake, moreover, they are completely unsuitable for mass audit (thousands or millions of hashes). Therefore, forget the phrase “a hacker can generate a rainbow table!”.

Why? Take a set of tiny rainbow tables of the size of a hundred gigabytes , which are able, with a probability of 98%, to recover any password up to 8 alphanumeric characters of one register. By the way, such tables can be generated somewhere for a couple of months on a fairly powerful machine, but may we already have them, as a kind of divine gift.
')
The time required to find a password for a single hash value in such tables is about a minute. During this time, it is necessary to perform chain_length of hashing, reduction and search operations, thus 100Gb.

It’s not possible to optimally search multiple passwords for different hash values at once; for each hash it is required to build a separate rainbow chain and look for it in the table. Thus, we will need approximately 5.7 million minutes to search.

It's about 10 years. Therefore, our divine gift in this case can not be considered a modest gift.

However, a good set of rainbow tables will help restore the password to a single hash value in minutes (if you take into account the same modest restrictions of 8-9 characters of length).

Direct search is somewhat different from rainbow tables for mass hacking - it is easily optimized for searching for enumerated values in large sets of hash values.

We need to take each line from the set {a..z0..9} ^ 8, count its hash, and search the database of hash values, which in this case, LinkedIn carefully provided us .

And the search is an operation that in this case is quite easy to optimize. Looking ahead, in my program I achieved O (1) search even on such large bases.

The search is based on simple filtering - do not try to search for those hashes that we will not find. Create an array of bit values (checkup) with the size of SIZE, about a hundred megabytes and build a function that maps the hash value to the index of this array. Oddly enough, such an object is also called a hash function, but not a cryptographic one, and is often called a convolution. For each hash from LinkedIn we compute the convolution, and write “1” to the corresponding bits of the checkup array.

When iterating, we consider the convolution == j from the obtained value, we look at checkup [j], if there is 0, then there is no point in looking for such a value in the set (this is checked for O (1)). Otherwise, we use a binary search, which already copes with O (log (N)).

If you go back to numbers, a direct search with similar optimization will take about a month on the same hardware, or several days on video cards.

That is, for mass hacking even direct bust is more profitable than tables!

But we would like to deal with passwords longer than 8 characters and dictionaries come to the rescue. Dictionaries are very cool! They contain those passwords that were already someone's, and the probability of being among ours - they are much higher than those of random lines. And if you add some set of replacement rules, then you can work wonders. In this case, the speed of such a search will be comparable to the speed of a direct search.

But there is one drawback - dictionaries have to come from somewhere. That is, the selection of words should not be random, but should be the result of some analysis. And by downloading the dictionary from the Global Network, you can get a complete “junk”, the effectiveness of the search in which will be lower than the direct search (I, for example, will specifically write there the lines that hardly anyone will put as a password).

Do it yourself

Here we come to the optimal, in my opinion, mass hacking strategy related to frequency analysis . And we do not need anything at all except the merged list of hashes.

Step one . We run a brute force on the set of all characters that can be entered from the keyboard, up to 5-6 characters long. In essence, we have no task to get all the passwords of such length, we just want to get a certain number, hundreds of thousands - for further analysis. If 6 characters is too short, then you can take 7-8, again, we do not need to exhaust the entire range for enumeration.

The main thing is to work for 5-10 minutes.

So, we found some passwords. Now you can conduct a frequency analysis, highlighting the most likely combinations of characters coming in a row. For example, “pass” is one of such combinations, and on LinkedIn and “link” too.

Step two . We launch search on concatenation of the dictionary of frequent combinations with itself. Now I just remind, but if the essence is not clear, I recommend reading my previous note .

Let it also work for 5-10 minutes, and notice, it will be much faster to find passwords than last time.

And the third step. Having typed the "critical mass" of the found passwords, say a tenth of all, we repeat the frequency analysis and run the search again, using the obtained dictionaries. Now you can not stop - he will do his job.

At the moment, the search has been working for me for two hours without stopping, and finds “by sight” about 50-100 passwords per second.

Here is an example of the dictionary: dl.dropbox.com/u/243445/md5h/relevant.txt
You can check the security of your password by trying to “collect it from pieces” of the dictionary.
If four or fewer pieces are enough, it is bad, change it.

What managed to "break"

linkedinmel1234, andrea71103245, hockey101155239, magmag624222, carlito5657224, linda@790212, supercow779212, jesus143mary143, linkedin#239133, linkedinpassword123, thepassword1776000, 13051987159000, meatballstew123, latenightbreeze, whatthedillyo, friendofkellyg, hannah11emily9, linkedin7barry5, linkedin.passwd, linkedinrocksforeva, philip23marcus, 54fordpickup, nabe1959@, ge0rgin@, #1dust67, logic123tree456, ramgopal@123456, Jk971423, tiger!376400, ...

UPD here is a screen with counted statistics for found passwords (I can calculate some more parameters if needed):

We see that neither the length of the password, nor the use of special characters, nor the combination of registers and numbers, nor chance is saved.

Program

As promised - here is the program that managed to do it. And by the way, you are free to try to repeat the result (or achieve the best).

Sources: dl.dropbox.com/u/243445/md5h/src.7z
Binary: dl.dropbox.com/u/243445/md5h/MD5BLAST.exe

Do not be surprised that it is called MD5-, but not SHA- because, as a piece, I took my program, which I already mentioned .

The CUDA Toolkit is still needed: developer.nvidia.com/cuda-toolkit-32-downloads#Windows

The dictionary search speed for SHA1 on the GTX460 with 5.7 million unique hashes on the list is more than 60 mpwd / s. And do not scold the "low" speed - it's the same:
* SHA1;
* 5.7 million hashes for search;
* concatenation of a dictionary of strings of arbitrary length.

There are still no analogs with a higher speed for this task.

To run a search, you must save the hashes in the file hash_list.txt, the dictionary in the file words.txt and call:
MD5Blast words.txt 3 0.0
where 3 is the maximum degree (the number of concatenations of the dictionary), and 0.0 is the initial progress in percent.
For step 1 of the above scheme, words.txt should contain all characters entered from the keyboard, one for each line:

a
b
c
...

To get a list of relevant combinations:
MD5Blast _found.txt relevant.txt 1.0 4.0 16.0
where the first file is the source one for analysis, the second one is for recording the results (yes, a bit is not the Unix-way),
and the remaining three parameters are adaptive frequency thresholds for two, three, and four-character combinations, respectively (the number is higher — fewer combinations as a result, you can experiment with them).

Small conclusions

Rainbow tables for mass hacking are useless;
The length of the password, the presence of special characters, its meaninglessness, different registers - separately do not constitute security;
Hashing passwords without using salt is like storing half of them in clear text ( we learn from our mistakes );
Frequency analysis allows you to generate good dictionaries "from nothing."

Source: https://habr.com/ru/post/145843/

All Articles

Analyzing the capabilities of mass auditing based on leaked hashes from LinkedIn

Do it yourself

What managed to "break"

Program

Small conclusions

More articles: