ASN.1 in simple terms (REAL type encoding)

Introduction for Habr

The text below is actually the first two chapters of my article "ASN.1 in simple terms". Since the article itself is quite large by Habr's standards, I decided to first check whether knowledge of the coding of simple types is in demand on this resource. If the audience reacts positively, I will continue to publish the remaining chapters.

Introduction

I have been dealing with ASN.1 for a fairly long time. I was lucky enough to work both on cryptographic software and in telecommunications. In both fields the ASN.1 standard has been used actively and widely from the very beginning.

However, both while creating cryptographic programs and while creating programs for the telecommunications industry, I constantly met the same opinion: ASN.1 is a complex and incomprehensible format, and therefore it is better to use third-party compilers for encoding and decoding (or even other encoding standards for the transmitted information).

One of the reasons why the overwhelming majority of software developers consider the ASN.1 standard difficult is the lack of books on the subject. Yes, in spite of the venerable age of this standard and the many freely distributed compilers and assorted articles, there are still very few books (or even articles on the Internet) where the coding of simple ASN.1 types is explained in simple and understandable language, with many examples.

To help correct this situation, this article serves as a kind of manual that lets even a person who has never encountered the format understand the intricacies of ASN.1 coding. The article covers only the coding of the basic ASN.1 types: REAL, INTEGER, OBJECT IDENTIFIER, all kinds of strings, BOOLEAN, NULL, SEQUENCE, SET. For each type it explains the coding rules in detail and gives detailed examples. In a separate file attached to this article you can find the C++ code that produces all the examples from the article. That sample file also contains additional materials not covered here. All materials are based on the latest ASN.1 standard, from 2008, all of whose constituent sub-standards can be downloaded in one file at http://www.itu.int/rec/T-REC-X.680-X.693-200811-i/en . Unless stated otherwise, the examples in the article encode the types according to ASN.1 BER (Basic Encoding Rules).

In most manuals and books on ASN.1, the study of coding begins with the simplest types and ends with the most complex. In this article the order is strictly the opposite: the reader is first asked to study the coding of complex types, and only then do we gradually move on to the simplest ones. Having once mastered the coding methods for a complex type, the reader will easily and quickly understand the coding of simpler types.

Chapter 1. ASN.1 General Encoding Rules

First, it is necessary to clarify some basics of coding in the ASN.1 format.

To begin with, let us explain why this standard was created. There are many different computers in the world, and many standards for representing data in them. ASN.1 was created as a common standard that allows arbitrary information to be described in a way that any computer aware of the standard can understand. The ASN.1 standard therefore imposes strict coding rules down to the level of individual bits and their mutual arrangement. It must also be said that ASN.1 encodes information not as text but as binary sequences. Variants of the encoding rules that present data as text (XML) have since appeared, but a review of them is beyond the scope of this article. Here we consider only the most difficult case: binary encoding (the ASN.1 BER format, Basic Encoding Rules).

The data encoded in the ASN.1 format is a sequence of bytes (or "octets") that follow one another without any gaps. A sequence encoded in ASN.1 can be transmitted over communication lines or saved to a file: a block of ASN.1-encoded information already contains the necessary description of its total length and content.

To enable such a description of the information contained in an encoded block, every block follows the same general structure. Each block contains 3 mandatory parts (in some cases only the first two parts remain, but these cases are described separately):

The block identifier part (up to several octets);
The block length part (up to several octets);
The part containing the actual value that the block carries (up to several octets);

In addition, there may be an optional 4th part: the octets that mark the end of the block value (several octets). This part is discussed later.

Let us proceed to the description of each part of the ASN.1-coded block.

The block identifier part consists of at least one octet. The format of this first octet is strictly fixed.

Bits 8 and 7 (the high bits, usually written leftmost) encode the so-called "class" of the current block;
Bit 6 must be set to 0 if the current block contains only one value (primitive form) and must be set to 1 if the block value itself consists of further ASN.1-coded blocks (constructed form);
Bits 5 through 1 encode the actual type identifier for this block;

If the type identifier for a block is in the range 0-30, the identifier part consists of only one octet. If the type identifier is 31 or higher, then bits 5-1 are all set to 1, and the identifier number itself is encoded in the subsequent octets. The number is encoded as an unsigned integer decomposed on the base 128. In each octet encoding the type identifier, the high-order bit must be equal to 1, except for the final octet (the encoding method is exactly the same as the one used for the sub-identifiers (SIDs) of an OBJECT IDENTIFIER, see below).
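
The identifier rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name encode_identifier and its parameters are hypothetical and are not taken from the C++ sample file attached to the article.

```python
def encode_identifier(tag_class: int, constructed: bool, tag_number: int) -> bytes:
    """Build the identifier octets of a BER block (hypothetical helper)."""
    if not 0 <= tag_class <= 3:
        raise ValueError("tag class must fit in two bits")
    if tag_number < 0:
        raise ValueError("tag number must be non-negative")
    # Bits 8-7: class, bit 6: primitive (0) or constructed (1).
    first = (tag_class << 6) | (0x20 if constructed else 0x00)
    if tag_number <= 30:
        return bytes([first | tag_number])
    # Tag numbers of 31 and above: bits 5-1 are all ones,
    # then the number follows as base-128 digits.
    digits = []
    n = tag_number
    while True:
        digits.append(n & 0x7F)
        n >>= 7
        if n == 0:
            break
    digits.reverse()
    # The high bit is 1 in every octet except the last one.
    body = [d | 0x80 for d in digits[:-1]] + [digits[-1]]
    return bytes([first | 0x1F] + body)


print(encode_identifier(0, False, 9).hex())    # UNIVERSAL 9 (REAL): 09
print(encode_identifier(0, True, 16).hex())    # UNIVERSAL 16, constructed: 30
print(encode_identifier(2, False, 201).hex())  # class 10, tag 201: 9f8149
```

The last call shows the multi-octet case: 201 = 1 * 128 + 73, so the two base-128 digits become the octets 81 and 49.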

The length part contains at least 1 octet and encodes the length of the value that the block contains (this is only the length of the part holding the encoded value, not the total length of the entire encoded block together with the identifier part and the length part!). In the simplest case, when the length does not exceed 127, it is encoded in a single octet as an unsigned integer in bits 7-1. Bit 8 (the high bit) serves as an additional flag and is 0 in this case. If the length exceeds 127, bit 8 of the first octet of the length part is set to 1, and the remaining 7 bits encode, as an unsigned integer, the number of subsequent octets; those octets in turn encode the actual length as an unsigned integer on the base 256.

For example, if the total block length is L = 201, then it will be encoded using two octets:

1000 0001 (81)
1100 1001 (C9)
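
As a rough sketch of the length rules (the helper name encode_length is an assumption made for this illustration), the explicit length form can be produced like this. The sketch assumes the BER rule that lengths up to 127 fit into a single octet.

```python
def encode_length(length: int) -> bytes:
    """Encode an explicitly known length of a block value (hypothetical helper)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length <= 127:
        # Single octet, bit 8 is 0.
        return bytes([length])
    # Several octets: the first one has bit 8 set and holds the number
    # of octets that follow; those octets hold the length itself.
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    if len(body) > 126:
        raise ValueError("length is too large for this form")
    return bytes([0x80 | len(body)]) + body


print(encode_length(5).hex())     # 05
print(encode_length(201).hex())   # 81c9
print(encode_length(1000).hex())  # 8203e8
```

The second call reproduces the example L = 201 from the text.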

In addition to explicitly specifying the length, it is possible to determine the end of a block directly during decoding. This is important when it is not known at the start of encoding exactly how many octets the block will contain (stream coding). In this case, the first octet of the length part must be equal to 80 (bit 8 is 1 and all other bits are 0). The end of the whole block is then marked by two successive octets 00 00 following the block value.

Chapter 2. Coding Type REAL

General description of the type:

Tag class - UNIVERSAL (00);
Tag Number - 9;
The value coding form is primitive (not constructed);

First, a little theory about floating-point numbers. A floating-point number is usually composed of three parts: a mantissa, a base, and an exponent. This is most easily explained with the formula REAL = (mantissa) * (base)^(exponent). For the usual decimal numbers this formula becomes REAL = (mantissa) * 10^(exponent). Since in ASN.1 both the mantissa and the exponent can be positive or negative, it is possible to represent arbitrarily large and arbitrarily small values with an arbitrary sign.

Unlike the usual machine representation of floating-point numbers (IEEE 754), the ASN.1 type REAL is practically unlimited both in the size of the mantissa (which can consist of a practically unlimited number of octets and represent an arbitrarily large number) and in the size of the exponent (which can also consist of an arbitrary number of octets). Restrictions are imposed only on the value of the “base”: only the numbers 2, 8, 16 or 10 can be selected.

The following three basic blocks are used for encoding type REAL:

The service information octet;
The value of the exponent of the number;
The value of the mantissa of the number;

The service information octet contains the following information:

Bits 8 and 7 (leftmost) select the kind of encoding:
- Bit 8 = 1 - binary coding is applied (on one of the bases 2, 8 or 16);
- Bit 8 = 0 and bit 7 = 0 - decimal coding is applied (in fact, the standard string representation of the number is encoded, see below);
- Bit 8 = 0 and bit 7 = 1 - the encoded value is a "special value" (NaN, INFINITY, etc.) or "-0";
In binary coding (bit 8 = 1), bit 7 is the sign: it is set to 0 when the number to be encoded is positive and to 1 when it is negative;
In binary coding, the combination of bits 6 and 5 defines the base:
- 00 - the coded number is decomposed on the base 2;
- 01 - the coded number is decomposed on the base 8;
- 10 - the coded number is decomposed on the base 16;
- 11 - reserved for possible future changes;
Bits 4 and 3 encode the value of the "scaling factor" (F, see below) as an unsigned binary number;
Bits 2 and 1 encode the format of the exponent:
- 00 - the next octet is the only octet encoding the value of the exponent;
- 01 - the next two octets encode the value of the exponent;
- 10 - the next three octets encode the value of the exponent;
- 11 - the next octet contains the number of subsequent octets encoding the value of the exponent (this count is an ordinary unsigned number, so only positive values are allowed), and the octets after it encode the value of the exponent;
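
To make the bit layout of the service information octet easier to follow, here is a small Python sketch that splits one octet into its fields. The function name and the dictionary keys are assumptions made for this illustration only.

```python
def describe_service_octet(octet: int) -> dict:
    """Split the service information octet of a REAL into fields (hypothetical helper)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("an octet must be in the range 0..255")
    if octet & 0x80:
        # Bit 8 = 1: binary coding.
        base_bits = (octet >> 4) & 0x03          # bits 6 and 5
        if base_bits == 3:
            raise ValueError("base bits 11 are reserved")
        exp_format = octet & 0x03                # bits 2 and 1
        return {
            "form": "binary",
            "negative": bool(octet & 0x40),      # bit 7 is the sign
            "base": (2, 8, 16)[base_bits],
            "scaling_factor": (octet >> 2) & 0x03,  # bits 4 and 3
            # None means: the count is stored in the next octet.
            "exponent_octets": exp_format + 1 if exp_format < 3 else None,
        }
    if octet & 0x40:
        # Bit 8 = 0, bit 7 = 1: special value or minus zero.
        return {"form": "special"}
    # Bit 8 = 0, bit 7 = 0: decimal (character) coding.
    return {"form": "decimal", "nr_form": octet & 0x3F}


print(describe_service_octet(0x80)["base"])            # 2
print(describe_service_octet(0xAC)["scaling_factor"])  # 3
```

For example, the octet AC (1010 1100) decodes as binary coding, positive sign, base 16, scaling factor 3 and a one-octet exponent.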

The value of the exponent is encoded as an integer consisting of an arbitrary number of octets. Here it is necessary to make a small digression and explain exactly how positive and negative integers are encoded in ASN.1.

Positive integers in ASN.1 are a sequence of coefficients of the decomposition of the number on the base 256. That is, an integer given in the usual decimal format is first decomposed on the base 256, and then the coefficients of the corresponding powers of 256 are written out as the encoding octets. As a visual example, take the number 32639. It decomposes on the base 256 as: 32639₁₀ = 127 * 256¹ + 127 * 256⁰. Therefore, the coefficients of the corresponding powers of 256 are (127, 127). Writing the decimal value 127 as a sequence of bits, we get 127 = 0111 1111, or, writing each group of four bits as a digit from 0 to F, 127 = 0111 1111 = 7F. Thus, the number 32639 is encoded by the sequence of two octets 7F 7F.
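
The decomposition on the base 256 can be checked with a short Python sketch. The function name to_base256 is a hypothetical helper introduced for this illustration.

```python
def to_base256(n: int) -> list[int]:
    """Return the base-256 coefficients of a non-negative integer, highest power first."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    digits = []
    while True:
        digits.append(n % 256)  # coefficient of the current power of 256
        n //= 256
        if n == 0:
            break
    digits.reverse()
    return digits


print(to_base256(32639))               # [127, 127]
print(bytes(to_base256(32639)).hex())  # 7f7f
```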

The above method can encode an arbitrarily large positive integer. But what about negative integer values? For negative integers a special encoding procedure is applied.

For example, again take the number 32639, but now let it be negative (-32639). The encoding of negative integers is constructed in such a way that not one, but two integer values are actually encoded - one basic value and another integer value that must be subtracted from the basic value. That is, when decoding to obtain a coded negative number, simply calculate the result (x - y). As can be seen from this simplest formula, if the value of "x" is less than the value of "y", then the result will be less than zero (that is, a negative number).

The above two numbers (the main number and the number that must be subtracted from the main one) are formed according to the following rules:

Let an ASN.1-encoded number consist of a sequence of N bits;
The number to be subtracted from the main number also consists of N bits, with all the bits except the highest one (the leftmost bit) set to 0;
The main number also consists of N bits, but its most significant bit is set to 0. All the other bits are equal to the corresponding bits of the encoded number (remain unchanged);

Let us turn to the coding of the specific number from the example (-32639). Since the subtracted number must be greater than the main number, the encoding of a negative integer begins with the choice of the subtracted number. According to the rules, all bits of the subtracted number are 0 except the first one, so a possible subtracted number consists of the leading octet 80 (1000 0000) followed by some number of octets 00. That is, 80 (128₁₀), 80 00 (32768₁₀), 80 00 00 (8388608₁₀), etc. can be used as the subtracted number. To encode our number -32639, we choose the first suitable subtracted number that is not smaller than the absolute value of the number being coded (32639). The nearest such number is 32768 (80 00).

Now we need the value of the main number. To find it, we solve the simple equation x - 32768 = -32639, which gives x = 129 = 129 * 256⁰, so the number 129 is encoded with one octet, 81₁₆. Looking at the rules more closely, however, the number of bits in the main and subtracted numbers must be equal. The subtracted number has 16 bits, while the main number has only 8. To increase the number of bits in the main number, we simply add non-significant leading zeros. Then 129 = 0 * 256¹ + 129 * 256⁰, and the main number is encoded with two octets as 00 81. Setting the first bit of this two-octet main number to 1 gives the final value that encodes -32639: the two octets 80 81. Once again: the main number is formed from all the bits of the encoded number except the most significant bit (00 81 in our case), and the subtracted number is formed from the most significant bit alone, with all the other bits set to 0 (80 00 in our case).

And now some pleasant news: modern computer systems already store integers (both positive and negative) in the format described above, known as two's complement. That is, to encode integers in ASN.1 you need no special conversion: you just write out their bytes, most significant byte first, and that's it.
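
The "main number minus subtracted number" rule can be tried out in Python. This is a minimal sketch: encode_signed and decode_signed are hypothetical names, and the encoder simply relies on Python's built-in int.to_bytes with signed=True, looking for the shortest octet string that fits.

```python
def encode_signed(n: int) -> bytes:
    """Shortest big-endian octet string holding n as a signed integer."""
    length = 1
    while True:
        try:
            return n.to_bytes(length, "big", signed=True)
        except OverflowError:
            length += 1  # the value does not fit yet, try one more octet


def decode_signed(octets: bytes) -> int:
    """Decode by the rule from the text: main number minus subtracted number."""
    if not octets:
        raise ValueError("at least one octet is required")
    bits = 8 * len(octets)
    raw = int.from_bytes(octets, "big")
    subtracted = raw & (1 << (bits - 1))   # only the highest bit is kept
    main = raw & ((1 << (bits - 1)) - 1)   # the highest bit is cleared
    return main - subtracted


print(encode_signed(-32639).hex())            # 8081
print(decode_signed(bytes.fromhex("8081")))   # -32639
print(encode_signed(-5).hex())                # fb
```

For the octets 80 81 the subtracted number is 32768 and the main number is 129, which gives 129 - 32768 = -32639.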

The value of the mantissa is always an unsigned integer. That is, the mantissa of a number encoded in ASN.1 is always non-negative. To encode negative floating-point numbers, a separate sign bit (bit 7) is provided in the service octet (see above).

The mantissa is encoded as a sequence of octets representing the coefficients of the decomposition of the number on the base 256. That is, if the mantissa in decimal form is 32639, then the encoded mantissa consists of the two octets 7F 7F (32639₁₀ = 127 * 256¹ + 127 * 256⁰ = 7F₁₆ * 256¹ + 7F₁₆ * 256⁰).

Examples of coding REAL numbers in ASN.1 in binary representation:

For example, take the number 0.15625. To begin with, we encode it in binary representation on the base 2. The expansion of this number on the base 2 is: 0.15625₁₀ = 1 * 2^-3 + 1 * 2^-5. That is, the mantissa for our test number is M = 101₂, and the exponent value is -5 (101₂ * 2^-5 = 5 / 32 = 0.15625). The service octet for this number is 1000 0000₂ = 80₁₆. The exponent is encoded by one octet: -5 = 123 - 128, so the main number is 123₁₀ = 7B₁₆ and the subtracted number is 128₁₀ = 80₁₆. The final octet encoding the number -5 is therefore FB₁₆. The mantissa is also encoded by one octet: 101₂ = 05₁₆. Now we know all the parts of the block, and the entire encoding of the value 0.15625 in binary form on the base 2 consists of the three octets 80 FB 05.
Now we encode the same number 0.15625, but on the base 8. The expansion of this number on the base 8 is: 0.15625₁₀ = 1 * 8^-1 + 2 * 8^-2. That is, the mantissa for our test number is M = 12₈ = (001 010)₂ (in the base 8 system, three bits are required for each digit). The exponent value is -2. The service octet for this number is 1001 0000₂ = 90₁₆. The exponent is encoded by one octet, where the main and subtracted numbers are found from the formula -2 = 126 - 128. Therefore, the octet encoding the exponent -2 is FE₁₆. The mantissa is also encoded with one octet, 0A₁₆, so the whole value is encoded as 90 FE 0A.
Finally, let us decompose the number 0.15625 on the base 16. The expansion is: 0.15625₁₀ = 2 * 16^-1 + 8 * 16^-2. Therefore the mantissa is M = 28₁₆ = (0010 1000)₂ and the exponent value is E = -2.

Now we impose an additional condition: the value of the mantissa should be "normalized", that is, it should not contain zeros in the lowest bits of the number (this requirement is also often stated as "the mantissa must be odd", since if the lowest bit is 1, then 1 * 2^0 is added to the higher powers of two and the whole number is odd). How can such a "normalization" condition be fulfilled? Obviously, the main way is to change the value of the exponent, shifting the floating point. In the case of base 2, everything is simple: changing the exponent by 1 shifts the floating point (or adds / removes zeros in the lowest bits of the mantissa) by exactly one position. However, with bases 8 and 16, changing the exponent by 1 shifts the floating point in the mantissa by 3 and 4 bits at once, respectively (since a base 8 digit takes 3 bits and a base 16 digit takes 4 bits). Consequently, a mantissa obtained for bases 8 and 16 cannot always be "normalized" simply by changing the exponent.

For finer tuning of the floating point shift in the mantissa, an additional quantity was introduced: the scaling factor F. The scaling factor shifts the floating point in the mantissa to the right (or, equivalently, adds the required number of zero bits to the right of the number). When decoding, the mantissa is obtained as the product M = N * 2^F, where N is the encoded mantissa value.

It is well known that multiplying an integer by 2 is equivalent to a left shift by 1 bit. Accordingly, multiplying by 2^F is equivalent to a left shift by F bits. Thus, we obtain the following process of encoding / decoding the mantissa when it is required to be normalized:
1. Let the mantissa be 0010 1000;
2. When encoding, we "normalize" it (shift it to the right by 3 bits), obtaining 0000 0101, and at the same time set the scaling factor F = 3;
3. When decoding, we multiply the encoded mantissa by 2^F, that is, shift it back to the left by F = 3 bits;
Consequently, the entire floating-point number from our example (with the mantissa "normalized") is encoded with the following sequence of octets, where the service octet is 1010 1100₂ = AC₁₆ (base 16, F = 3):

AC FE 05
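
The three binary examples can be verified with a small decoder. This is only a sketch under stated assumptions: the function name decode_binary_real is hypothetical, the input is just the value part of the block (without the identifier and length parts), and fractions.Fraction is used so that the result is exact.

```python
from fractions import Fraction


def decode_binary_real(contents: bytes) -> Fraction:
    """Decode the value octets of a binary-coded REAL (hypothetical helper)."""
    first = contents[0]
    if not first & 0x80:
        raise ValueError("bit 8 is 0: this is not binary coding")
    sign = -1 if first & 0x40 else 1
    base_bits = (first >> 4) & 0x03
    if base_bits == 3:
        raise ValueError("base bits 11 are reserved")
    base = (2, 8, 16)[base_bits]
    scale = (first >> 2) & 0x03            # scaling factor F
    exp_format = first & 0x03
    if exp_format == 3:
        exp_len = contents[1]              # exponent length stored explicitly
        start = 2
    else:
        exp_len = exp_format + 1
        start = 1
    exponent = int.from_bytes(contents[start:start + exp_len], "big", signed=True)
    n = int.from_bytes(contents[start + exp_len:], "big")  # unsigned mantissa N
    # value = sign * N * 2^F * base^exponent
    return sign * n * 2**scale * Fraction(base) ** exponent


for example in ("80FB05", "90FE0A", "ACFE05"):
    print(example, decode_binary_real(bytes.fromhex(example)))
```

All three octet sequences decode to the same fraction, 5/32, which is exactly 0.15625.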

In addition to coding all parts of a floating-point number in binary form on one of the power-of-two bases, ASN.1 also makes it possible to represent such numbers in the usual string form in which we normally see them. In this case the number is considered to be encoded on the base 10.

When coding on the base 10, the concept of "number representation forms" is additionally introduced. There are 3 such forms in total (NR1, NR2 and NR3), and they are described in a separate standard, ISO 6093. Since that standard is not free, I can recommend its “ancestor”, ECMA-63, which is easy to find on the Internet.

When a floating-point number is encoded on the base 10, the code of the number representation form is specified in the service information octet (01, 02 or 03 for the forms NR1, NR2 and NR3 respectively), and the character codes representing the number follow immediately after the service information octet. The following character codes are allowed:

The characters denoting the digits 0-9 (codes 30-39, respectively);
Space (code 20);
The separator character "." (code 2E);
The separator character "," (code 2C);
The exponent character "E" (code 45), or the alternative exponent character "e" (code 65);
The "-" character (code 2D);
The "+" character (code 2B);

All other characters are not allowed to be encoded (when decoding characters other than the above, the ASN.1 decoder is required to give an error).

Examples of encoding a floating-point number in decimal form:

For example, let us encode the ordinary number 1. In the form NR1, the number is encoded by the string "1" (or "+1").
In the form NR2, the number must be encoded with a separator character, so all the strings below are equivalent:
1. "1,"
2. "+1.0"
3. "1.000000"
4. "   1.0" (there may be an unlimited number of spaces at the beginning of the string)
Now let us represent 1 in the form NR3. Here both a separator character and an exponent character must be used. In the form NR3, according to the standard, 1 can be represented as "+1,0E+0" ("+1.0E+0" in the case of the separator character "."); with the mantissa written as 1.0, the exponent value is zero.
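
A minimal sketch of the decimal coding is shown below. It assumes that the value part of the block is simply the service octet (01, 02 or 03) followed by the character codes; the name encode_decimal_real is hypothetical. Note that the sketch only checks the set of allowed characters and does not verify that the string really follows the grammar of the chosen NR form.

```python
# Codes of the characters allowed in the decimal coding of REAL.
ALLOWED = set(b"0123456789 .,Ee+-")


def encode_decimal_real(text: str, nr_form: int) -> bytes:
    """Build the value octets of a decimal-coded REAL (hypothetical helper)."""
    if nr_form not in (1, 2, 3):
        raise ValueError("the representation form must be 1, 2 or 3")
    data = text.encode("ascii")
    if not set(data) <= ALLOWED:
        raise ValueError("the string contains a character that is not allowed")
    # Service octet: bits 8 and 7 are 0, the low bits hold the form number.
    return bytes([nr_form]) + data


print(encode_decimal_real("1", 1).hex())        # 0131
print(encode_decimal_real("+1.0", 2).hex())     # 022b312e30
print(encode_decimal_real("+1.0E+0", 3).hex())  # 032b312e30452b30
```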

In addition to the usual numbers, ASN.1 allows you to also encode a number of “special” numbers:

PLUS-INFINITY (plus infinity);
MINUS-INFINITY (minus infinity);
NOT-A-NUMBER (the so-called "non-number");
minus zero (for the possibility of coding "-0");

All special numbers are encoded with only one service information octet, without specifying the octets for the exponent and the mantissa:

PLUS-INFINITY - 40₁₆;
MINUS-INFINITY - 41₁₆;
NOT-A-NUMBER - 42₁₆;
minus zero - 43₁₆;
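
The table of special values maps naturally onto ordinary floating-point values. The sketch below assumes Python floats as the target representation; the names SPECIAL_VALUES and decode_special_real are hypothetical.

```python
import math

# Service octet -> special value.
SPECIAL_VALUES = {
    0x40: math.inf,    # PLUS-INFINITY
    0x41: -math.inf,   # MINUS-INFINITY
    0x42: math.nan,    # NOT-A-NUMBER
    0x43: -0.0,        # minus zero
}


def decode_special_real(contents: bytes) -> float:
    """Decode a REAL whose value part is a single special-value octet."""
    if len(contents) != 1 or contents[0] not in SPECIAL_VALUES:
        raise ValueError("not a special REAL value")
    return SPECIAL_VALUES[contents[0]]


print(decode_special_real(b"\x40"))  # inf
print(decode_special_real(b"\x41"))  # -inf
```

Minus zero compares equal to zero in Python, so math.copysign(1.0, value) is one way to tell the two apart after decoding.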

UPDATE: a list of subsequent chapters of my article

UPDATE # 2: Link to encoding example file for all data types

UPDATE # 3: In case someone missed it, here is a C++ implementation of an ASN.1 coder / decoder with support for the type REAL. And here is an implementation in JavaScript, but so far without the REAL type.

Source: https://habr.com/ru/post/150757/
