📜 ⬆️ ⬇️

Paradoxes about data compression

The task of data compression in its simplest form can refer to numbers and their designations. Numbers can be denoted by numerals ( "eleven" for the number 11), mathematical expressions ( "two in the twentieth" for 1048576), string expressions ( "five nines" for 99999), proper names ( "the number of the beast" for 666, "the year of Turing's death " For 1954), or arbitrary combinations thereof. Any designation according to which the interlocutor will be able to unambiguously determine what number is meant. It is obvious that to inform the interlocutor of "factorial eight" more effectively than the equivalent designation "forty thousand three hundred twenty . " Here a logical question arises: what is the shortest designation for a given number?

The philosopher Bertrand Russell published the “Berry Paradox” in 1908, which addresses the question of the designation of numbers from the opposite side: what is the smallest number that eighty letters are not enough to indicate?
Such a number must exist: out of eighty Russian letters and spaces, you can make up only 34 80 symbols, which means that with the use of eighty letters you can indicate no more than 34 80 numbers. It means that a certain number, not greater than 34 80 , cannot be designated in this way.

This means that this number will correspond to the designation “the smallest number, to designate which is not enough eighty letters” , in which only 78 letters! On the one hand, this number must exist; on the other hand, if this number exists, then its designation does not correspond to it. Paradox!
')
The easiest way to dismiss this paradox is to refer to the informality of verbal designations. Like, if only a concretely defined set of expressions were allowed in the notation, then “the smallest number to designate which is not enough eighty letters” would not be a valid designation, whereas practically useful notation of the “factorial eight” type would remain valid.

Are there any formal ways to describe the sequence (algorithm) of operations on numbers? There are, and in abundance - they are called programming languages. Instead of verbal notation, we will use programs (for example, in Python) that output the necessary numbers. For example, the program print("9"*5) suitable for five nines. We will continue to be interested in the shortest program for a given number. The length of such a program is called the Kolmogorov complexity of a number; This is the theoretical limit to which a given number can be compressed.

Instead of the Berry paradox, we can now consider a similar one: what is the smallest number for which a kilobyte program is not sufficient?

We will argue in the same way as before: there are 256 1024 kilobyte texts, which means that no more than 256 1024 numbers can be output with kilobyte programs. It means that a certain number, not greater than 256 1024 , cannot be derived in this way.

But we will write in Python a program that generates all possible kilobyte texts, launches them for execution, and if they print a certain number, then it adds this number to the reachable dictionary. After checking all 256 1024 possibilities, no matter how long it takes - the program searches for the smallest number in the dictionary and displays this number. It seems obvious that such a program will fit into a kilobyte of code - and will output the same number that cannot be output with a kilobyte program!

What is the catch now? It can no longer be written off for the informality of designations

If you are confused by the fact that our program will require an astronomical amount of memory to work - a dictionary (or bitmap) of 256 1024 elements - then you can do the same without it: for each of 256 1024 numbers, iterate through all 256 1024 possible programs until you find the right one. It does not matter that such a search will last a very long time: after checking less than (256 1024 ) 2 pairs from the number and the program, it will end, and it will find that cherished number.

Or not completed? After all, among all the programs that will be tried, there will be a while True: pass (and its functional counterparts) - and further verification of such a program will no longer work!

In contrast to the Berry paradox, where the trick was in the informality of notation, in the second case we have a well-disguised reformulation of the “problem of stopping” . The fact is that according to the program it is impossible to determine its output in a finite time. In particular, the Kolmogorov complexity is not computable : there is no algorithm that would allow for a given number to find the length of the shortest program that outputs this number; therefore, there is no solution for the Berry problem — to find the length of the shortest verbal notation for a given number.

Source: https://habr.com/ru/post/446976/


All Articles