Meet: Hash steganography. Very slow but top secret

Yes, dear reader, you read it right: top secret . Moreover, I ask you to note that it is completely secret in the strictest mathematical sense: it is completely secret according to Kaschen , because the Kullback – Leibler distance in my mathematical construction will be zero; and not “almost zero”, but a genuinely zero, without any “infinitely small” and other vulgar approximations!

How? It's very simple - I won't sprinkle anything in the stegcontainer at all. Indeed, if we do not inject anything, then the empty container is indistinguishable from the stegocontainer , right?

“Wait, but if we don’t sprinkle anything at all, then we do n’t give anything at all !!!” - the reader will reasonably argue with me.
')
Absolutely right! We will not intersperse! There is a way, without distorting the container, nevertheless to transfer information. How?

Schematic Hash steganography ɔ⃝
can be represented as:

Text explanation for the picture under the cut.

Hash steganography

The idea is very simple. We take a big bunch of pictures. In the modern world, an oversupply of gadgets, self-helpers and cheap memory for storing the photographs, the generation of a large number of pictures, I think, will not make any special problems ...
If you are too lazy to generate, there are many sites that provide a huge number of photos (and other pictures).
We write a simple script in Python and upload data.

Now we take some good hash function . Of all the properties of the "goodness" of the hash function, we are only interested in equiprobability ; that is, you can even take (from a cryptographic point of view) not the best hash transformation, for example, MD5, which rightly lost public trust .

We run all the pictures through the hash function, the data is entered into the table:
picture -> hash

For example:

1.jpg -> 0xd131dd02c5e6eec4693d9a0698aff95c
2.jpg -> 0x55ad340609f4b30283e488832571415a
...

Fix some integer n .
For example, n = 2 .

We compose a new tablet, choosing the first (or the last ones, who like it) n nibbles (nibbles) from the hash.
For our example:

1.jpg -> 0xd1
2.jpg -> 0x55
...

Since no definitions can not do, we introduce the concept of a hash code .
The hash code is the first n nibbles of the hash image.
We denote the function of the computation by H (...) .
For example: H (cat1.jpg) = 0xd1; H (cat2.jpg) = 0x55

We make an index on a hash code in our table. This is necessary so that we quickly enough (namely, O (log (M) , where M is the number of pictures in the database) could find the necessary string in the hash code database.

Suppose we want to send a message (in hexadecimal form): 55d134237598 . Divide it into n nibbles. As an example, we chose n = 2 , then the message 55d134237598 should be split into 2 nibbls. If we “break” we will have spaces, then it will look like this: 55 d1 34 23 75 98 . The resulting "pieces", each of which contains n nibbles, will be called words . That is, we all got 6 words .

Next we do the following.

Take the word . We are looking for a picture in which the hash code matches our word . If there are several such pictures, we take any of them.
We transfer the picture to the channel
take the second word , look for a hash code that matches the second word
We transfer the picture to the channel
go to the third word ... and so on ...

Let's look at the GIF one more time for understanding, which shows the principle of transferring 3 words: 55 d1 34

Please note that if we take an arbitrary image in the database, then the probability that
its hash code will coincide with an arbitrarily given word equal to (1/2 ^ (4 * n)) ( ~~oh, why there is still no LaTeX on a habr~~ ).

For n = 1, this value is 1/16 , for n = 2, this value is already 1/256 .
The lower this probability, the more photos need to be placed in the database to meet our steganographic needs.

On the other hand, the greater the n , the greater the speed of the code .
Unfortunately, with increasing n , the speed increases linearly , but the need for the number of photos is exponential .

For this reason, hash steganography is very slow. However, for the transfer of short messages, it is quite acceptable.

Formal description

Now let's give a formal description of the stegosystem.

We define the stegosystem as a set of objects I.
Define a set of words S
We define the function H (i) , which for each i of I puts the word s from S into correspondence.
Let's call this function a hash code .

To send a message s1, s2, s3, ..., sm you should find such i1, i2, i3, ..., im ,
for which it is true: H (i1) = s1, H (i2) = s2, ..., H (im) = sm .

That's the whole mathematical model. As you can see - nothing complicated.

In order for each word s not to sort through the pictures, until H ( i ) becomes equal to s
It is reasonable to select a sufficiently large number of images, pre-calculate their hash codes, write to the database and build an index by hash codes.

How many images should there be so that with a probability of 99.999% you can send an arbitrary message s1, s2, s3, ..., sm of length m ? We denote this number by M.

For me, this question is open ... There are no other ways except for the Monte Carlo method for calculating M from m and n .

Upgrades

Note that our hash steganography is a typical pure steganography, so it would be nice to encrypt the data itself. For example, you can use a stream cipher; in this case, before submitting the desired stego-message to the input of the stegosystem, it should be summed up with a certain gamma. For example, the desired message 55 d1 34 23 75 98 can be encrypted with the Vernama cipher and an already encrypted sequence sent to the input of the stegosystem.

Another, in my opinion, an important upgrade is to remove the image from the database after using it; so that after using the picture it is not reused.

You can also try to play with probabilities and with the reliability of the system. For example, what happens if we want to transfer the word 0x55 , but we already have images with the hash code 0x55 in our database ?
For now, there are two options: either pass with an error or in some way generate a lot of new photos in the hope that at least one of them will have a hash code equal to 0x55 ...

However, you can think of another option: use robust coding . Then we have the opportunity to make a few mistakes and the ECC will fix them.

And, of course, you can not break into nibbles, but bytes, bits, arbitrary numbers of an arbitrary number system.

It should also give an explanation about the channel. By channel, I meant abstraction in the most general sense of the word. Most importantly, the order of one word after another must be given. In principle, you can simply put the files en masse in a folder, enumerating them. In this case, the folder with the files will be the channel.

Examples

Offhand a few examples of a specific implementation of hash steganography.

Publication of images in a social network in a certain sequence.
Data transfer via BitTorrent protocol in the right order (so that the necessary hash codes for the transfer of the right words were needed)
If the protocol X has the ability to encode the same data in different ways, then the choice of this encoding method to transmit the right words . For example, suitable protocol ZIP
Completely arbitrary, meaningful substitution in human language until we get the desired hash code . For example, the sentence “I went to the forest and saw a gray wolf” can be replaced with a similar meaningful sentence “I went to the forest and saw a gray wolf”. The meaning is the same, but the hash code may be different. It should be replaced until we get the desired hash code.

Let's sum up

The points.

Hash steganography does not distort container data. Therefore, it is top secret (according to Kashen)
Hash steganography is slow enough. As n grows, speed increases linearly, and complexity exponentially.
Hash steganography is not sensitive to container data. There is no LSB for MP3, steganography cannot be applied in prosody for JPEG ... It is only important for hash steganography that the data is digitally representable.
Due to claim 1, item 2, item 3, hash steganography can be a good solution for sending short messages
Since hash steganography is pure steganography (steganography without a key), encryption is necessary to protect the data.
Hash steganography can be a “subalgorithm” of another steganography algorithm. True, in this case, we will most likely have to sacrifice perfection . For example, LSB can be divided into U parts of the image. We take the part and change 1 bit of data in it, we calculate the hash code for this part. If it coincided with the word , then go to the next part and take the next word . If not, then try to change the other data bit. Thus, we changed all U data bits, but transferred 4 * U * n data bits. Increasing n and u can achieve very good results.

Source: https://habr.com/ru/post/272935/

All Articles