SGVsbG8gd29ybGQh, or the history of base64

Brief background

It all began a long time ago. So long ago that there are hardly any witnesses left of the holy wars of those days, when it was being decided how many bits a byte should contain.

It now seems self-evident to us that 1 byte = 8 bits and that a byte can encode 256 different values. But this was not always so. History remembers seven-bit encodings, six-bit encodings, and even more exotic systems (for example, the Setun computer, which used ternary logic: one ternary digit, a trit, could take three values rather than two, and for it the ratio was 1 tryte = 6 trits). Leaving the exotic aside, though, the mainstream encodings used 6, 7 or 8 bits per byte.

Six-bit encodings (for example, BCD) allowed 64 different values to be encoded in one byte, which seemed quite enough for alphanumeric characters, and an “extra” seventh bit extended the encoding to 128 characters.
However, soon the eight-bit byte became generally accepted.

Problem of the eighth bit

The adoption of eight-bit encodings as a de facto standard brought many problems. By that time a certain infrastructure already existed that used exactly seven-bit encodings, and the holy wars flared up with new force.

They have reached us in the form of problems with “stripping the eighth bit” in e-mail systems. The eight-bit byte gave 256 different values per byte, which in turn made it possible to fit both the common symbols (digits, punctuation marks, Latin letters) and, say, Cyrillic letters into one code table. It looked like pure convenience: text could be typed in Russian or in English, and if necessary there was room for German umlauts too!

But, as always, the devil was in the details. The hardware and software already built and in service had often been designed for seven-bit encodings, which led to various problems.

For example, when relaying a message, a mail server could well zero the high bit in each byte, which inevitably caused trouble: often the information was simply lost beyond recovery.

Several temporary solutions to this problem were proposed. One of them was the KOI-8 encoding. The solution, it must be admitted, is quite elegant - in this encoding the Russian letters were arranged in the order of the similar-sounding Latin letters and differed from them by exactly that most significant bit. Thus, when this bit was cut off, the Russian “А” turned into a Latin “a”, “Б” into “b”, and so on (only the letter case was inverted); the message was simply transliterated and could still be read. True, even here there was a skeleton in the closet - sorting in Russian alphabetical order in KOI-8 became a nightmare...

But what about other languages, peoples and encodings? And binary data? In any case, transliteration encodings did not solve the fundamental problem - the loss of the eighth bit, and with it the loss of information. This is how the Base64 encoding (or rather, algorithm) was born.
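
The effect of a relay that clears the eighth bit is easy to reproduce. The sketch below assumes Python's built-in `koi8_r` codec as a stand-in for KOI-8; the helper name `strip_high_bit` is made up for illustration.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the most significant bit of every byte, as a 7-bit relay would."""
    return bytes(b & 0x7F for b in data)


original = "Привет"
damaged = strip_high_bit(original.encode("koi8_r")).decode("ascii")
print(damaged)  # pRIWET
```

The result still reads as a transliteration, although in this codec the letter case comes out inverted.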

Base64 algorithm

The idea of base64 is simple: a reversible encoding that translates every character of an eight-bit code table into characters that are guaranteed to survive transfer across any network and between any devices.

The algorithm regroups three groups of eight bits (24 bits) into four groups of six bits (also 24 bits) and represents each six-bit group as an ASCII character. The result is a reversible encoding whose only drawback is that the data grows during encoding, in the ratio 4:3.
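
The 4:3 growth can be checked directly. This is a small sketch using Python's standard `base64` module; `encoded_length` is a hypothetical helper that assumes padded output, where every started group of three bytes yields four characters.

```python
import base64


def encoded_length(n_bytes: int) -> int:
    """Length of padded base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


# Compare the formula with what the standard library actually produces.
for n in (1, 2, 3, 5, 300):
    actual = len(base64.b64encode(bytes(n)))
    print(n, encoded_length(n), actual)
```

For inputs that are not a multiple of three bytes, the padding makes the growth slightly larger than 4:3.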

Example:
Take the Russian text "АБВГД". In binary form in the Windows-1251 encoding, it is 5 bytes:
11000000
11000001
11000010

11000011
11000100
(00000000) - an extra zero byte is needed so that the total number of bits is divisible by 6

Divide these bits into groups of 6:
110000
001100
000111
000010

110000
111100
010000
000000

We take the array of characters "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/" and translate the resulting numbers into these characters, using the numbers as array indices. For the first seven groups we get "wMHCw8Q". The eighth group consists entirely of bits from the extra zero byte added in the first step, so instead of a character from the array we put one “=” at the end to mark that byte, and get the final result:

“АБВГД”: base64 = “wMHCw8Q=”

The inverse transformation is just as easy; try, for example, decoding the title of this article.
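
The same steps can be written out by hand. Below is a minimal sketch, not a replacement for a library implementation; the names `ALPHABET` and `b64_encode` are chosen for illustration, and the input is the five Windows-1251 bytes 0xC0..0xC4 from the example.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(data: bytes) -> str:
    """Encode bytes by regrouping 8-bit bytes into 6-bit indices."""
    pad = (3 - len(data) % 3) % 3            # zero bytes needed to complete the last triple
    padded = data + b"\x00" * pad
    chars = []
    for i in range(0, len(padded), 3):
        triple = int.from_bytes(padded[i:i + 3], "big")   # 24 bits
        for shift in (18, 12, 6, 0):
            chars.append(ALPHABET[(triple >> shift) & 0x3F])
    if pad:
        chars[-pad:] = "=" * pad             # mark the groups that came only from padding
    return "".join(chars)


data = bytes([0xC0, 0xC1, 0xC2, 0xC3, 0xC4])
print(b64_encode(data))                      # wMHCw8Q=
print(base64.b64encode(data).decode("ascii"))  # same result from the standard library
```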
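
For reference, the inverse transformation might look like this. It is a sketch under assumptions: the input is well-formed, padded base64 whose length is a multiple of four, and the name `b64_decode` is made up for illustration.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_decode(text: str) -> bytes:
    """Decode padded base64 text by regrouping 6-bit indices into bytes."""
    pad = len(text) - len(text.rstrip("="))
    body = text.rstrip("=") + "A" * pad      # "A" has index 0, i.e. six zero bits
    out = bytearray()
    for i in range(0, len(body), 4):
        quad = 0
        for ch in body[i:i + 4]:
            quad = (quad << 6) | ALPHABET.index(ch)
        out += quad.to_bytes(3, "big")       # 24 bits back into three bytes
    return bytes(out[:len(out) - pad])       # drop the bytes that were only padding


print(b64_decode("wMHCw8Q=").hex())  # c0c1c2c3c4
```

The same function can be applied to the string in the title.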

Application

To this day, the base64 algorithm is used wherever careful handling of your data cannot be guaranteed - for example, when encoding email attachments. PGP also uses base64 to encode binary data.

Other uses for base64 can be imagined - for example, when saving to a database: if you do not know the environment in advance (oh, that magic_quotes in PHP!) and there is no need to index or search the text, you can use base64.

base64 can also be used when computing hashes, for example with the md5 algorithm, as a measure against lookup-table attacks on the hash: the data, for example a user's password in the system, is converted to base64 before hashing.

Finally, there is the Data URI, which embeds base64-encoded data directly in a link.
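
As an illustration of the idea only, here is how hashing the base64 form instead of the raw data might look with Python's `hashlib` and `base64` modules. The function name `md5_of_base64` and the UTF-8 encoding of the password are assumptions, not part of any particular system.

```python
import base64
import hashlib


def md5_of_base64(password: str) -> str:
    """MD5 hex digest of the base64-encoded UTF-8 password."""
    encoded = base64.b64encode(password.encode("utf-8"))
    return hashlib.md5(encoded).hexdigest()


plain = hashlib.md5("secret".encode("utf-8")).hexdigest()
wrapped = md5_of_base64("secret")
print(plain != wrapped)  # True: a table built for raw passwords does not match
```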
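
A Data URI carrying base64 content can be assembled in a couple of lines. In this sketch, `make_data_uri` is a hypothetical helper and the default MIME type is an assumption.

```python
import base64


def make_data_uri(data: bytes, mime_type: str = "text/plain") -> str:
    """Build a data: URI that carries the payload as base64."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


print(make_data_uri(b"hi"))  # data:text/plain;base64,aGk=
```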

Links

ru.wikipedia.org/wiki/Base64
base64.ru

Source: https://habr.com/ru/post/88077/
