Hack Hill's cipher? Easy

Purpose: Hack the Hill cipher

Good day, dear readers! Today I wanted to share a way that helped me uncover the text that was encrypted using Hill’s method. I will not describe what Hill method is: experienced craftsmen have already tried to convey to me the features of this method. Link to post .

What do we have?

I will say right away that there was neither a plaintext nor a key on my hands. It was known that the text of 6286 characters was encoded with a 7 x 7 matrix. Therefore, for our convenience, we divide the text into 898 lines of 7 characters each. The text does not contain the letters '' and '' . In order to prudence, I will not give all the encrypted text, but only part of it:

...

Seemingly pointless nonsense, for now ...
')

How are we going to break?

Consider a brute force attack. It was stated above that two letters are excluded from the alphabet, therefore all linear combinations when encrypting (as well as when decrypting) are taken mod 31 (taking into account that this is a simple number, the text becomes a bit more secure).
If we consider brute force inverse key matrices, then we will have to go through

$ inline $ 31 ^ {49} $ inline $ combinations (this number is approximately 75 characters). Therefore, this method is eliminated instantly, though! If from this set it would be possible to sort through a subset of non-degenerate matrices in some more or less fast way, then perhaps the task would be easier. Unfortunately, I don’t know such a way and am not sure that such a one exists at all!

TO How to be?

I propose to slightly "soften" our attack and use with sharpness. Imagine that we know the original matrix and take one of its lines. Let it be the first line . If we apply the multiplication of the first key line to the first block of size 7, then we will get the first letter of the plaintext. If we multiply by the second block, we have the seventh letter of plaintext, etc. In this way, we can get 898 characters of plaintext and collect some statistics from it! Great, then we have to pick up those seven lines of keys that satisfy some criteria. With this criterion, I took the index of Russian coincidences . To do this, you need to collect all 898 letters, calculate the frequency for each letter and calculate their coincidence index (IC). the recalculation of the information system directly caused doubts for the following reason: the use of the double type prevents the program from running quickly. I was lucky to find the finest article about FI-test for mono-alphabeticity. Very briefly:

Calculates the "critical" point $ inline $ PHI (p) = 0.0529 * N * (N-1) $ inline $ where N is the length of the text;
Random text is taken, calculated $ inline $ PHI (o) = \ sum F (F- 1) $ inline $ where F is the number of times a letter occurs in the text;
We learn how much better $ inline $ PHI (O) $ inline $ approximates $ inline $ PHI (P) $ inline $ .

Rather, you notice that there is nothing new there. All the same can be done by breaking the Vizhener cipher. However, this will allow us to use only integer types in the program, due to which the conversion performance will only accelerate.

Start breaking ...

Immediately make a reservation: yes, I am a ~~cattle~~ coder. I have no doubt that the professionals of the C language will find something to complain about here :). The program was parallelized on 8 threads.
I will say that for our text

$ inline $ PHI (p) = 0.0529 * 898 * 897 = 42611 $ inline $ . I will give the code of one stream:

 int chast[898]; /*   */ int mass[31]; /*   */ int c1; int c2; int c3; register __int32 PHI_O; /*PHI(O)*/ for (int k1 = 0; k1 < 4; k1++) { /*   for ...*/ for (int k7 = 0; k7 < 31; k7++) { /*       PHI(o) */ memset(mass, 0, sizeof(mass)); PHI_O = 0; /*     */ for (int i = 0; i < 898; i++) { c1 = k1 * sym[7 * i] + k2 * sym[7 * i + 1]; c2 = k3 * sym[7 * i + 2] + k4 * sym[7 * i + 3]; c3 = k5 * sym[7 * i + 4] + k6 * sym[7 * i + 5] + k7* sym[7 * i + 6]; chast[i] = (c1 + c2 + c3) % 31; /* - */ mass[chast[i]]++; /*    */ } for (int i = 0; i < 31; i++) PHI_O += mass[i] * (mass[i] - 1); if(PHI_O > 39000) { cout << PHI_O << '\n'; printf("%d %d %d %d %d %d %d\n", k1, k2, k3, k4, k5, k6, k7); } } /*      */ }

So one thread goes through

$ inline $ 4 * 31 ^ {6} $ inline $ lines of the matrix of the key and made fi-test. As a result, I received a lot of candidates for the "good" keys. Some were so elegant that their IP of 898 characters was 0.0529 ! All is not gold that glitters . I was not too lazy to recalculate the frequency of letters obtained using the key-string and noticed an amazing thing:

This series of letters fully satisfied the criterion of the fi-test, but the frequency of the letters was not plausible. In other words, the letter “b” could meet as often as the letter “o” occurs in a natural text, and “o” could not be met at all, but the IP property was preserved. I have not been able to give this a mathematical explanation. I rule out that this is a coincidence. So, we had to ~~separate flies from cutlets~~ to choose the best candidates, which are well combined with the frequency of the letters of our beloved and native Russian language. Surprisingly, such a tough selection was satisfied by the candidacies of several fighters:

3 16 5 24 17 20 25
6 30 5 1 13 12 3
7 11 18 12 17 27 15
12 17 0 15 5 16 1
15 19 27 14 21 10 2
25 14 27 20 22 21 22
29 3 1 10 30 28 26

There is nothing left ... Arrange them in the right sequence. To do this, I had to create a key matrix, change the line indexes between each other using the permutation function (i.e., the permutation went only between indices) and apply the resulting matrix to the ciphertext, I output the whole thing to a text file (wonderfully) and tried to catch at least for some meaningful text. Looking for ... looking for ... stop! It seems something turned out! Yes!

By the way, find among 7! lines of meaningful text is very easy, take no more than 2 minutes. It seems that is all!

findings

Calculations on the CPU on 8 threads took me 10 hours. The computer puffed like a steam locomotive. Faced with difficulties more than once. First, there were heaps of errors inside the code, which after 10 hours gave an incorrect result. Secondly, it was difficult to find a suitable criterion. Thirdly, the program was not given a good optimization (in this article I brought a piece of non-optimized code). With regard to the theoretical side of the question: Divide and conquer .

This article is a must-read for anyone interested in classical cryptanalysis. The technique is great for small texts when you do not apply your frequency analysis skills broadly. The “hard” method is recursive in the sense that it strongly depends on all previous results. For this reason, it cannot be parallelized (by the way, it cannot be optimized even if it is desired), but the method is dynamic and should lead to the correct result (in theory).

What in practice ? CUDA ~~without him~~ ! If I had my own titan farm (at least stuff 10), Hill’s cipher would have been ripped apart in less than a minute. The concept of parallel computing on CUDA, like a pack of hungry piranhas, does not give a chance to remain crypto-resistant to this encryption method, compressing time several thousand times. I am not a huge expert in this matter, but I almost do not doubt that this idea is easily implemented and I hope someone will implement it (if they really want to).

It's all! I hope my article will be just a little useful for you! Thanks for attention.

Source: https://habr.com/ru/post/345876/

All Articles