📜 ⬆️ ⬇️

Quick password recovery using MD5-hash brute-force method

Probably every one of us at least once forgot the password from some important site, and then tried to decrypt it by the cookie that was saved in the browser. Perhaps these were not even your cookies, but this is not important - if you are interested in the topic of high-speed brute force, then welcome under the cat!

I’ll say right away that the acceleration techniques described in this article are suitable for any hashing algorithm, but because of the widespread prevalence, I chose md5.

Perhaps brute force is not always advisable to use to solve a specific problem, but, nevertheless, it is the most universal method, which is guaranteed to produce results, even the longest.

There are two ways to improve performance: hardware and software. The first is to buy more powerful hardware and launching scripts for brute force on them, but this is not our method, so we will choose the second one - thorough optimization and maximum use of what we have. And we have a fairly weak by current standards system unit with a Dual Core 1.8 Ghz processor, 2 GB Ram and a Nvidia 9800 GT video card. So, let's try to squeeze all the juice out of it!
')

0. Select programming language


In this case, the assembler is probably the obvious choice, but I would like to preserve the portability of the code, so I chose C as the fastest option after the assembler.

1. Acceleration md5


The md5 algorithm is designed in such a way that it does not immediately operate with all the data entirely, but breaks the data into blocks of 64 bytes and processes each block separately. Considering that enumeration of up to 64 characters (and to be precise, up to 59, because the rest of the block is occupied by service information) is physically impossible even on supercomputers, since takes dozens of years, then this partition can be eliminated altogether and a pre-allocated array of 64 bytes in length, where and write down possible password options. Also, in order not to compare with each other all 16 bytes of the received hash, it is advisable to immediately translate the original hash into 32-bit numbers A, B, C and D and already compare them with those obtained during the brute force, since comparison of four 32-bit numbers on a modern processor will be made much faster than sixteen 8-bit ones.

2. Optimal CPU utilization


First, let's try playing with the compiler. I used GCC version 4.4.5. We compile the brute-forser with no additional flags and run the test:

~$ make PARAMS=""
~$ time ./brute 827ccb0eea8a706c4c34a16891f84e7b
...
real 0m22.164s
user 0m22.080s
sys 0m0.020s

12345 is used as a test password. The alphabet used in the search consists of numbers, uppercase and lowercase English letters - this is enough for a test.

Next, try adding the -O3 flags (maximum optimization level) and -static (static linking, all libraries are included directly in the executable file) and run the test again:

~$ make PARAMS="-O3 -static"
~$ time ./brute 827ccb0eea8a706c4c34a16891f84e7b
...
real 0m6.422s
user 0m6.370s
sys 0m0.020s

As a result, we get the acceleration as much as ~ 3.5 times!

Since the processor has two cores, we will try to run the same test in two threads:

~$ make PARAMS="-O3 -static -DTHREADS=2"
~$ time ./brute 827ccb0eea8a706c4c34a16891f84e7b
...
real 0m3.763s
user 0m6.530s
sys 0m0.020s

We get acceleration by ~ 2 times, which, in general, is quite logical.

Also, you can try to compile a brute forcer using a trial version of the Intel C ++ Compiler , which optimizes the code best of all (assuming that the code is compiled for an Intel processor, of course), but it didn’t give me a noticeable performance boost, so I won’t give measurements here .

3. Enable the GPU


Indeed, if we have another processor (or rather, as many as 112 small processors), which is bored without work, so why not use it? For the organization of calculations on the GPU, there are two solutions: CUDA and BrookGPU , for NVIDIA and AMD video cards, respectively. Since we only have a video card from NVIDIA, we will use CUDA to calculate md5-hashes on the GPU.

The main problem here is that a completely different principle of programming is used under the GPU and it is not easy to organize a cycle that enumerates all possible passwords, because each thread has a timeout after which the thread automatically ends.
You can get around this if you put the main loop on the CPU and calculate the possible password options for each of the GPU threads in advance, and on the GPU itself already read one hash per stream and compare it with the initial value.
Let's try it in action:

~$ make PARAMS="-O3 -static -DTHREADS=2 -DUSEGPU"
~$ time ./brute 827ccb0eea8a706c4c34a16891f84e7b
...
real 0m1.967s
user 0m3.933s
sys 0m0.120s

The results of the joint efforts of CPU + GPU managed to achieve acceleration by ~ 2 times! These are already quite good numbers, and, if you have patience, it is quite possible to find passwords of 9 or more characters in length, even if the alphabet includes punctuation marks and other languages.

PS The sources used in the tests can be obtained using git: git clone git://github.com/VladX/md5-bruteforcer.git

Source: https://habr.com/ru/post/110809/


All Articles