ZBase32, Base32 and Base64 encoding algorithms

Hello!

Many people use Base64 encoding, fewer use Base32, and fewer still use ZBase32 (have you heard of it?), but not everyone understands how these algorithms work. In this article I describe the advantages and disadvantages of these encodings and walk through their implementation.

Not so long ago, I needed to pass encrypted data in the address of an HTTP link. Strictly speaking, only the scheme and host of a URL are case-insensitive, but I could not rule out that some proxy server or browser would change the letter case of the rest of the address, which would corrupt data written in a case-sensitive encoding.

Given these requirements, I chose ZBase32 as the encoding.
As it turned out, .NET has no standard implementation of it (unlike Base64), so I had to write my own. To my surprise, I had difficulty finding a clear explanation of Base32 and ZBase32. I found some ready-made solutions, but I did not want to use them without understanding the algorithm, and reading the magic of long formulas and bit shifts was hard without a verbal description. Now that all of this is behind me, I would like to share a little knowledge about elementary encoding. The article is academic in nature.

Advantages and disadvantages

Base64

Encodes information represented as a sequence of bytes using only 64 characters: A-Z, a-z, 0-9, /, +. The encoded sequence may end with one or two padding characters (usually “=”).

Benefits:

Represents any sequence of bytes using printable characters.
Compared with the other Base encodings discussed here, it is the most compact: the result is only 133.(3)% of the length of the original data (4 characters for every 3 bytes).

Disadvantages:

Case-sensitive encoding.

Base32

Uses only 32 characters: A-Z (or a-z) and 2-7. The encoded sequence may end with several padding characters (by analogy with Base64).

Benefits:

Represents any sequence of bytes using printable characters.
Case-insensitive encoding.
Digits that are easily confused with letters are not used (for example, 0 looks like O, and 1 looks like l).

Disadvantages:

The encoded data is 160% of the length of the original (8 characters for every 5 bytes).
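
The two percentages quoted above follow from simple arithmetic: Base64 turns every 3 bytes into 4 characters, and Base32 turns every 5 bytes into 8. Below is a small sketch of that calculation. The function names Base64Length and Base32Length are my own, chosen for illustration, and the code assumes C# 9 or later (top-level statements) and padded output.

```csharp
using System;

// Hypothetical helpers for illustration only.
// Base64: every started group of 3 bytes becomes 4 characters (with "=" padding).
int Base64Length(int byteCount) => 4 * ((byteCount + 2) / 3);

// Base32: every started group of 5 bytes becomes 8 characters (with padding).
int Base32Length(int byteCount) => 8 * ((byteCount + 4) / 5);

foreach (var n in new[] { 15, 100, 1000 })
{
    Console.WriteLine($"{n} bytes: Base64 = {Base64Length(n)} chars, Base32 = {Base32Length(n)} chars");
}
// Prints:
// 15 bytes: Base64 = 20 chars, Base32 = 24 chars
// 100 bytes: Base64 = 136 chars, Base32 = 160 chars
// 1000 bytes: Base64 = 1336 chars, Base32 = 1600 chars
```

For 15 bytes the ratios are exactly 20/15 = 133.(3)% and 24/15 = 160%; for lengths that are not multiples of 3 or 5 the padding makes the result slightly longer.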

ZBase32

The encoding is similar to Base32, with the following differences:

A human-oriented alphabet of 32 characters. The character table was carefully designed to make encoded information easier to write, pronounce and memorize. The authors moved the characters that are most convenient for a person to the positions that are used most often. How they did it, I do not know. The alphabet is given below.
There are no padding characters at the end of the encoded result.

You can read more about each of the encodings on Wikipedia; here I would like to focus on the implementation of ZBase32.

Description of the ZBase32 encoding algorithm

To make the description of the algorithm easier to follow, I will show the calculations in C#.

So, we have a 32-character alphabet with the following contents:

static string EncodingTable = "ybndrfg8ejkmcpqxot1uwisza345h769";

The input is an array of bytes (8 bits each, of course), which we want to translate into characters from the alphabet.

 public static string Encode(byte[] data) {

The alphabet is a string of 32 characters, so each character corresponds to a number from 0 to 31 (its index in the string). Any number from 0 to 31 can be written in binary using 5 bits. It follows that if we treat the original bytes as one continuous array of bits and split it into pieces of 5 bits (see the figure below), we get a sequence of indexes of characters in the alphabet. That's all.
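
To make the idea concrete, here is a deliberately naive sketch that really does build one continuous string of bits and cuts it into pieces of 5. It is for illustration only and is not how the encoder below works. The helper name SplitIntoFiveBitPieces and the sample input are my own, and the code assumes C# 9 or later.

```csharp
using System;
using System.Linq;

const string Alphabet = "ybndrfg8ejkmcpqxot1uwisza345h769";

// Hypothetical helper: returns the 5-bit pieces of the input as strings such as "01101".
string[] SplitIntoFiveBitPieces(byte[] data)
{
    // Write all bytes as one continuous string of '0' and '1'.
    string bits = string.Concat(data.Select(b => Convert.ToString(b, 2).PadLeft(8, '0')));
    // Pad with zero bits on the right so that the last piece is also 5 bits long.
    bits = bits.PadRight((bits.Length + 4) / 5 * 5, '0');
    return Enumerable.Range(0, bits.Length / 5)
                     .Select(k => bits.Substring(k * 5, 5))
                     .ToArray();
}

byte[] sample = { 0x68, 0x69 };   // ASCII "hi"
foreach (string piece in SplitIntoFiveBitPieces(sample))
{
    int index = Convert.ToInt32(piece, 2);   // 5 bits -> number from 0 to 31
    Console.WriteLine($"{piece} -> {index,2} -> {Alphabet[index]}");
}
// Prints:
// 01101 -> 13 -> p
// 00001 ->  1 -> b
// 10100 -> 20 -> w
// 10000 -> 16 -> o
```

The two bytes give 16 bits, so the last piece contains only one real bit and four zero bits of filler.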

The Base32 and Base64 algorithms are similar to ZBase32; they differ only in the alphabet (in composition for Base32, in composition and size for Base64) and in the size of the “pinched-off” pieces (6 bits for Base64).
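
As a quick check of the Base64 case, the sketch below pinches 6-bit pieces off one full group of 3 bytes and compares the result with the built-in Convert.ToBase64String. The helper EncodeTriple is my own illustration and handles only a complete 3-byte group, with no padding logic. It assumes C# 9 or later.

```csharp
using System;
using System.Text;

const string Base64Alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encodes exactly one 3-byte group (24 bits) as four 6-bit indexes.
string EncodeTriple(byte b0, byte b1, byte b2)
{
    int buffer = (b0 << 16) | (b1 << 8) | b2;   // 24 continuous bits
    var result = new StringBuilder(4);
    for (int shift = 18; shift >= 0; shift -= 6)
    {
        result.Append(Base64Alphabet[(buffer >> shift) & 0x3f]);   // 0x3f = six 1-bits
    }
    return result.ToString();
}

byte[] man = Encoding.ASCII.GetBytes("Man");
Console.WriteLine(EncodeTriple(man[0], man[1], man[2]));   // TWFu
Console.WriteLine(Convert.ToBase64String(man));            // TWFu
```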

Before splitting the source data into pieces of 5 bits, let's prepare a place to write the result. To avoid dealing with indexes in a fixed-size array, let's use a StringBuilder.

 var encodedResult = new StringBuilder((int)Math.Ceiling(data.Length * 8.0 / 5.0));

During initialization we immediately specify the capacity of the resulting string, so that no time is wasted on growing the buffer while the algorithm runs.

Now it remains to run through the original byte array and split it into 5-bit pieces. For convenience, I suggest working with groups of 5 bytes, since that is 40 bits, a multiple of the piece length. But do not forget that nobody guarantees the input length is a multiple of 5, so we must allow for a final group that is shorter.

 for (var i = 0; i < data.Length; i += 5)
 {
     var byteCount = Math.Min(5, data.Length - i);

Since we work with groups of 5 bytes, we need a buffer in which a continuous run of bits is formed (40 bits in total). Let's declare a variable of type ulong (64 bits at our disposal) and put the current group of bytes there.

 ulong buffer = 0;
 for (var j = 0; j < byteCount; ++j)
 {
     buffer = (buffer << 8) | data[i + j];
 }

The final stage is “pinching off” pieces of 5 bits from the buffer and building the result.

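 var bitCount = byteCount * 8;
 while (bitCount > 0)
 {
     // Take the next 5 bits from the top of the buffer; if fewer than 5 remain,
     // shift them left so that the missing low bits are zeros.
     var index = bitCount >= 5
         ? (int)((buffer >> (bitCount - 5)) & 0x1f)
         : (int)(buffer & (ulong)(0x1f >> (5 - bitCount))) << (5 - bitCount);
     encodedResult.Append(EncodingTable[index]);
     bitCount -= 5;
 }
 } // end of the for loop over 5-byte groups

 return encodedResult.ToString();
 } // end of Encode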
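
For convenience, the fragments above can be assembled into a single function roughly as follows. This is a sketch rather than the exact code of the full implementation: it is written as a local function with top-level statements (C# 9 or later) instead of a class member, the mask is applied before the cast to int, and the short-tail branch uses an equivalent shift-then-mask form.

```csharp
using System;
using System.Text;

const string EncodingTable = "ybndrfg8ejkmcpqxot1uwisza345h769";

string Encode(byte[] data)
{
    // One character per started 5-bit piece.
    var encodedResult = new StringBuilder((data.Length * 8 + 4) / 5);

    for (var i = 0; i < data.Length; i += 5)
    {
        // The last group may contain fewer than 5 bytes.
        var byteCount = Math.Min(5, data.Length - i);

        // Collect the group into one continuous run of bits.
        ulong buffer = 0;
        for (var j = 0; j < byteCount; ++j)
        {
            buffer = (buffer << 8) | data[i + j];
        }

        // Pinch off 5 bits at a time, starting from the most significant ones.
        var bitCount = byteCount * 8;
        while (bitCount > 0)
        {
            var index = bitCount >= 5
                ? (int)((buffer >> (bitCount - 5)) & 0x1f)
                : (int)((buffer << (5 - bitCount)) & 0x1f);   // short tail, zero-filled
            encodedResult.Append(EncodingTable[index]);
            bitCount -= 5;
        }
    }

    return encodedResult.ToString();
}

Console.WriteLine(Encode(Encoding.ASCII.GetBytes("hi")));   // pbwo
```

The bytes of "hi" are 0x68 and 0x69, which split into the indexes 13, 1, 20 and 16, that is, the characters p, b, w and o.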

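The last code example may not be entirely clear at first glance, but with a little concentration everything falls into place. While at least 5 bits remain, the buffer is shifted right so that the next piece lands in the low 5 bits, and the mask 0x1f cuts it out. When fewer than 5 bits remain, they are taken as they are and shifted left, so the missing bits become zeros.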

Decoding works the same way as encoding, only in the opposite direction.
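
A minimal sketch of what "the opposite direction" might look like is given below. It is my own illustration and not the code of the full implementation: the name Decode, the use of a FormatException for unknown characters and the lowercase normalization are assumptions. Each character is turned back into its 5-bit index, the bits are accumulated in a buffer, and a byte is emitted whenever at least 8 bits are available. The code assumes C# 9 or later.

```csharp
using System;
using System.Text;

const string EncodingTable = "ybndrfg8ejkmcpqxot1uwisza345h769";

byte[] Decode(string text)
{
    // Only whole bytes are restored; leftover filler bits at the end are dropped.
    var result = new byte[text.Length * 5 / 8];
    ulong buffer = 0;
    var bitCount = 0;
    var position = 0;

    // Lowercasing first makes the decoder case-insensitive.
    foreach (var c in text.ToLowerInvariant())
    {
        var index = EncodingTable.IndexOf(c);
        if (index < 0)
        {
            throw new FormatException($"Character '{c}' is not in the ZBase32 alphabet.");
        }

        buffer = (buffer << 5) | (ulong)index;   // append 5 more bits
        bitCount += 5;

        if (bitCount >= 8)
        {
            bitCount -= 8;
            result[position++] = (byte)((buffer >> bitCount) & 0xff);
        }
    }

    return result;
}

Console.WriteLine(Encoding.ASCII.GetString(Decode("pbwo")));   // hi
Console.WriteLine(Encoding.ASCII.GetString(Decode("PBWO")));   // hi
```

The second call shows the property that motivated the choice of encoding: changing the letter case of the encoded text does not damage the data.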

You can see the full implementation in ZBase32Encoder.

Conclusion

And, of course, in conclusion, I want to say the following.

 4nq7bcgosuemmwcq4gy7ddbcrdeadwcn4napdysttuea6egosmembwfhrdemdwcm4n77bcby4n97bxsozzea9wcn4n67bcby4nhnbwf94n9pbq6oszemxwf74nanhegow8em9wfo4gy7bqgos8emhegos9emyegosmem5wfa4n6pbcgozzemtwfirr

Source: https://habr.com/ru/post/190054/
