📜 ⬆️ ⬇️

Coding with the withdrawal of information. Part 2, Mathematical

Introduction


In the previous part, the principal possibility of coding was considered, in which, in case you can isolate the common part of the key and the message, you can transmit less information than there is in the original message.


image

Let me tell you a little about where this topic came from. A long time ago, from one good person, I took ivlad to read and now I can’t give (please forgive) an interesting book [1], where it is written: “In turn, the cryptography itself can be divided into two areas, known as permutation and replacement” .

Accordingly, almost immediately the following questions appeared:



What do the crow and the desk have in common?


image
The picture is taken from here .
The riddle of Carroll in the title is given as an illustration of the problem that needs to be solved. Namely: define a couple of functions E(k,m)- coding and D(k,s)- decoding satisfying the following conditions:



Here I(m)- the amount of information on Shannon [2] (assuming that all characters are equally likely, which is not true in general, but is used for simplicity of presentation) in the message m, k- key, c=E(k,m)- coded message.


In order to simplify further reasoning, let us restrict ourselves to further messages that can be numerically constrained, which on the one hand does not particularly limit us (since you can assign an appropriate numeric code to any character), and on the other, allow you to use mathematics in the right amount.


So, consider first for example two numbers k=746,130and m=50133666and using the main theorem of arithmetic [4] (any natural can be decomposed into a product of simple) we find the following parameters:



Let's write all this in tabular form, where the general part of the information between the key and the message is marked in blue, the information is unique for the key in yellow, and the message is unique in orange:


image

Immediately it becomes clear that Gcd(k,m)is a common part between the key kand message mthat both the sender and the recipient have. Thus, ideally, of course, and not in reality, it is enough to transmit information about:
')


Mark now the positions of simple dividers Gcd(k,m)common with simple dividers k, units, and the rest, except maybe the leading position - zeros.


image

The resulting number s1=1010011=$8sufficiently characterizes GCD, thereby allowing the original message to be restored. Those. getting two numbers to the input [39979,83]you need to do a fairly simple inverse transformation:

image

If we now estimate the length of the original message in binary form len2(m)=len2([1111010000011100100000100])=$2and messages len2([c0,c1])=len2([1001110000101011,1010011])=$2you can see that you can save the message length in 3bit, the same with the amount of information I(m)=13bit, I(c)=I([c0,c1])=11.5bit Here is the function len2( cdot)- the length of the number in binary form.

image

Thus, to put it in words from the well-known anecdote, "in Scotland there is at least one sheep, black at least on one side" [3].

So, at the moment we have - this coding method is suitable, but only for pairs of numbers that have the same divisor in the decomposition. Otherwise, if, for example, we take simple numbers as a key and as a message, we obviously cannot reduce anything, but only increase the length of the message, as well as the amount of information transmitted, which is not what we wanted. The same situation is repeated in the case when, for example, GCD(k,m)<2or len2(c1)≥len2(gcd(k,m))because savings will be either minimal or absent in principle.

What to do…


in order to increase the likelihood of success with this coding method?
In general, if you look closely at the key, it is clear that it is not bad divided into prime numbers 2,3,5,7,11,17,19what is done specifically for the success of the operation of finding Gcd(k,m). Since such an operation can negatively affect the number of keys used, then determining the order of simple factors and making it part of the key information, we obtain, although not radical, an increase in the number of encoding options that positively affect the number of keys. Using a number instead of a key [746,130]and a couple [746130,0125763]where the second number represents the order of multiplication and determines the value c1.


The next problem to be effectively solved is the ratio of the frequencies of the appearance of primes and powers of two, it is easy to notice that the coding efficiency of powers of two rather quickly decreases with increasing length c1(see the graph of the corresponding sequences if there is one common simple divisor) and depends mainly on the number of key dividers and their values.


image

In case, there are two common dividers and one of them, for example, is 3, then opportunities for savings increase very noticeably.

image

Which means that for efficient coding you need to come up with a more efficient coding method c1.

That is, it makes sense to continue the subject further.


Example


As usual, they are laid out on github [5] - Excel is used because many have it and there is no need to think about compatibility. There are two bookmarks in the file:



Links


  1. Singh Simon. Book cipher. The secret history of ciphers and their decryption. M. Astrel 2007.
  2. https://ru.wikipedia.org/wiki/%D0%98%D0%BD%D1%84%D0%BE%D1%80%D0%BC%D0%B0%D1%86%D0%B8%D0 % BE% D0% BD% D0% BD% D0% B0% D1% 8F_% D1% 8D% D0% BD% D1% 82% D1% 80% D0% BE% D0% BF% D0% B8% D1% 8F
  3. http://www.gaelic.ru/Biblioteka/Folklore/anekdoty.htm
  4. https://ru.wikipedia.org/wiki/%D0%9E%D1%81%D0%BD%D0%BE%D0%B2%D0%BD%D0%B0%D1%8F_%D1%82%D0 % B5% D0% BE% D1% 80% D0% B5% D0% BC% D0% B0_% D0% B0% D1% 80% D0% B8% D1% 84% D0% BC% D0% B5% D1% 82 % D0% B8% D0% BA% D0% B8
  5. https://github.com/rumiantcev/nod_code

Source: https://habr.com/ru/post/345962/


All Articles