The text below is actually the first two chapters of my article "ASN.1 in simple words." Since the article as a whole is quite large by Habr standards, I decided to first check whether knowledge about encoding simple types is in demand on this site. If the audience reacts positively, I will publish the remaining chapters.
I have been dealing with ASN.1 for quite a long time. I have been lucky to work both on cryptographic software and in telecommunications, and in both fields the ASN.1 standard has been very actively and widely used from the start.
However, both when writing cryptographic software and when writing software for the telecommunications industry, I kept running into the same opinion: ASN.1 is a complex and incomprehensible format, so it is better to use third-party compilers for encoding/decoding (or even other standards for encoding transmitted information).
One reason most software developers consider the ASN.1 standard difficult is the lack of books on the subject. Despite the venerable age of the standard and the many freely distributed compilers and articles, there are still very few books (or even articles on the Internet) that explain the encoding of simple ASN.1 types in plain, understandable language with many examples.
Most ASN.1 manuals and books start with the encoding of the simplest types and end with the most complex. In this article the order is strictly the opposite: the reader will first study the encoding of complex types, and only then will we gradually move on to the simplest ones. Once the encoding of a complex type has been mastered, the techniques for simpler types are easy and quick to understand.
First, though, some basics of ASN.1 encoding need to be explained.
To begin with, let us explain why this standard was created. There are many different computers in the world, and many standards for representing data in them. ASN.1 was created as a general standard for describing arbitrary information so that it can be understood by any computer that knows this standard. For this reason, ASN.1 imposes strict encoding rules down to the level of individual bits and their arrangement. It must also be said that ASN.1 encodes information not as text but as binary sequences. Encoding variants that present data as text (XML) have since appeared, but they are beyond the scope of this article. Here we consider only the most complex one, the binary encoding (ASN.1 BER, Basic Encoding Rules).
Data encoded in ASN.1 is a sequence of bytes (or "octets") that follow one another without any gaps. Such a sequence can be transmitted over communication lines or saved to a file, because a block of ASN.1-encoded information already describes its own total length and content.
To make such a self-description possible, every block has the same general structure. Each block contains at least three mandatory parts: the identifier octets, the length octets, and the value (contents) octets (in some cases only the first two parts are present, but these cases are described separately).
In addition, there may be a fourth, optional part: the end-of-contents octets that terminate the block's value. This part is discussed later.
Let us proceed to the description of each part of the ASN.1-coded block.
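The identifier part consists of at least one octet, and the format of this first octet is strictly fixed: bits 8-7 hold the class of the tag (universal, application, context-specific, or private), bit 6 indicates whether the encoding is primitive (0) or constructed (1), and bits 5-1 hold the type identifier (tag number).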
If the type identifier (tag number) is in the range 0-30, the identifier part consists of a single octet. If it is 31 or higher, bits 5-1 of the first octet are all set to 1, and the number itself is encoded in the subsequent octets as an unsigned integer in base 128. In every octet that encodes the tag number, the high-order bit must be 1, except in the very last octet (exactly the same method is used to encode the SIDs of an OBJECT IDENTIFIER, see below).
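The identifier rules can be illustrated with a short Python sketch. The function name `encode_identifier` and its parameters are hypothetical, chosen only for this example, and the class numbering (0 = universal, 1 = application, 2 = context-specific, 3 = private) follows the usual X.690 layout of bits 8-7.

```python
def encode_identifier(tag_class: int, constructed: bool, tag_number: int) -> bytes:
    """Build BER identifier octets (hypothetical helper for illustration).

    tag_class goes into bits 8-7, constructed sets bit 6.
    """
    first = (tag_class << 6) | (0x20 if constructed else 0)
    if tag_number <= 30:
        # Low tag numbers fit directly into bits 5-1 of the single octet.
        return bytes([first | tag_number])
    # High tag numbers: bits 5-1 are all ones, the number follows in base 128.
    groups = []
    n = tag_number
    while True:
        groups.append(n & 0x7F)   # next 7-bit group, least significant first
        n >>= 7
        if n == 0:
            break
    groups.reverse()              # most significant group goes first
    # Bit 8 is 1 in every subsequent octet except the last one.
    tail = [g | 0x80 for g in groups[:-1]] + [groups[-1]]
    return bytes([first | 0x1F] + tail)
```

For example, `encode_identifier(0, False, 9)` gives the single octet 09 (universal REAL), while a tag number of 201 needs two extra octets, because 201 = 1 · 128 + 73.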
The length part contains at least one octet and encodes only the length of the value (that is, the length of the contents octets, not the total length of the whole encoded block including the identifier and length parts!). In the simplest case (the short form), the length is a single octet holding an unsigned integer from 0 to 127, with bit 8 (the high bit) equal to 0. Bit 8 serves as a flag: if the length is 128 or more, bit 8 of the first length octet is set to 1, and the remaining 7 bits encode, as an unsigned integer, the number of subsequent octets that hold the actual length (as an unsigned integer in base 256).
For example, if the block length is L = 201, it is encoded with two octets: 81 C9 (81 means the long form followed by one length octet, and C9 = 201).
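Both length forms can be sketched in Python as follows. The names `encode_length` and `decode_length` are hypothetical, and the sketch assumes definite lengths only (the indefinite form 80 is rejected by the decoder).

```python
def encode_length(length: int) -> bytes:
    """Encode a definite BER length (hypothetical helper)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length <= 127:
        # Short form: one octet, bit 8 clear.
        return bytes([length])
    # Long form: big-endian base-256 octets, preceded by an octet
    # whose bit 8 is set and whose bits 7-1 hold the octet count.
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def decode_length(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (length, position after the length octets)."""
    first = data[pos]
    if first < 0x80:
        return first, pos + 1
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite length (80) has no numeric value")
    value = int.from_bytes(data[pos + 1:pos + 1 + count], "big")
    return value, pos + 1 + count
```

With this sketch, `encode_length(201)` yields 81 C9, and `encode_length(127)` still fits into the single octet 7F.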
Instead of stating the length explicitly, the end of the block can also be found during decoding. This is important when the encoder does not know in advance how many octets the block will contain (stream encoding). In this case the first (and only) length octet is 80 (bit 8 is 1 and all other bits are 0), and the end of the block is marked by two consecutive octets 00 00. This indefinite form may only be used for constructed encodings.
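Finding the end of an indefinite-length block requires walking over the nested elements until 00 00 appears at an element boundary. The following Python sketch shows the idea; `read_tlv` is a hypothetical name, and for brevity it assumes single-octet identifiers (tag numbers 0-30).

```python
def read_tlv(data: bytes, pos: int = 0) -> tuple[int, bytes, int]:
    """Return (identifier octet, contents, next position) for the block at pos."""
    ident = data[pos]             # single identifier octet assumed
    first = data[pos + 1]         # first length octet
    pos += 2
    if first != 0x80:
        if first < 0x80:
            length = first                                   # short form
        else:
            count = first & 0x7F                             # long form
            length = int.from_bytes(data[pos:pos + count], "big")
            pos += count
        return ident, data[pos:pos + length], pos + length
    # Indefinite form: only allowed when bit 6 (constructed) is set.
    if not ident & 0x20:
        raise ValueError("indefinite length on a primitive encoding")
    start = pos
    while data[pos:pos + 2] != b"\x00\x00":
        if pos >= len(data):
            raise ValueError("missing end-of-contents octets 00 00")
        # Skip one nested element; it may itself be indefinite.
        _, _, pos = read_tlv(data, pos)
    return ident, data[start:pos], pos + 2
```

For the octets 30 80 02 01 05 00 00 (a constructed block containing one nested element), the sketch returns the contents 02 01 05 and stops right after the closing 00 00.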
General description of the REAL type:
First, a little theory about floating-point numbers themselves. A floating-point number usually consists of three parts: the mantissa, the base, and the exponent, related by the formula $\mathrm{REAL} = M \cdot B^{E}$, where $M$ is the mantissa, $B$ the base, and $E$ the exponent. For ordinary decimal numbers this becomes $\mathrm{REAL} = M \cdot 10^{E}$. Since in ASN.1 both the mantissa and the exponent can be positive or negative, arbitrarily large and arbitrarily small values of either sign can be represented.
Unlike the usual machine representation of floating-point numbers (IEEE 754), the ASN.1 REAL type is practically unlimited both in the size of the mantissa (which can consist of a practically unlimited number of octets and represent an arbitrarily large number) and in the size of the exponent (which can also consist of an arbitrary number of octets). The only restriction is on the base: it can be 10, 2, 8, or 16.
A REAL value is encoded using three basic parts, in this order: the service information octet, the exponent octets, and the mantissa octets.
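The service information octet contains the following information. For binary encoding, bit 8 is 1; bit 7 is the sign of the number (0 for positive, 1 for negative); bits 6-5 select the base (00 = 2, 01 = 8, 10 = 16); bits 4-3 hold a binary scaling factor F from 0 to 3, so that the value is $\pm M \cdot 2^{F} \cdot B^{E}$; bits 2-1 give the exponent format (00, 01, 10: the exponent occupies one, two, or three octets; 11: the next octet holds the number of exponent octets). For decimal encoding, bits 8-7 are 00, and for the special values they are 01.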
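The bit fields of the service information octet can be pulled apart with a few shifts and masks. This Python sketch assumes the X.690 layout for binary REAL encoding (bit 8 = 1, bit 7 = sign, bits 6-5 = base, bits 4-3 = scaling factor F, bits 2-1 = exponent format); the function name `parse_service_octet` and the dictionary keys are hypothetical.

```python
def parse_service_octet(octet: int) -> dict:
    """Split the first contents octet of a REAL into its fields."""
    if octet & 0x80:
        # Binary encoding.
        base = {0: 2, 1: 8, 2: 16}.get((octet >> 4) & 0x03)
        if base is None:
            raise ValueError("base bits 11 are reserved")
        return {
            "encoding": "binary",
            "negative": bool(octet & 0x40),   # bit 7: sign
            "base": base,                     # bits 6-5
            "scale": (octet >> 2) & 0x03,     # bits 4-3: factor F
            "exp_format": octet & 0x03,       # bits 2-1
        }
    if octet & 0x40:
        # Bits 8-7 = 01: one of the special values.
        return {"encoding": "special", "code": octet}
    # Bits 8-7 = 00: decimal encoding, bits 6-1 hold the NR form.
    return {"encoding": "decimal", "form": octet & 0x3F}
```

For instance, the octet AC (1010 1100) parses as a positive binary value with base 16, F = 3, and a one-octet exponent.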
The exponent is encoded as a signed integer occupying an arbitrary number of octets. Here we need a short digression to explain exactly how positive and negative integers are encoded in ASN.1.
Positive integers in ASN.1 are a sequence of coefficients of the corresponding powers of 256. That is, an integer written in ordinary decimal form is first decomposed in base 256, and then the coefficients of the powers of 256 are written out as the encoding octets. As an example, take the number 32639. In base 256 it decomposes as $32639_{10} = 127 \cdot 256^{1} + 127 \cdot 256^{0}$, so the coefficients of the powers of 256 are (127, 127). Writing the decimal value 127 as bits gives 127 = 0111 1111, or, writing each group of four bits as a hexadecimal digit from 0 to F, 127 = 0111 1111 = 7F. Thus the number 32639 is encoded as the two octets 7F 7F. (If the highest bit of the first octet would be 1, an extra leading 00 octet is added, so that the number is not read as negative.)
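The base-256 decomposition can be written out directly in Python. This is a sketch with a hypothetical function name; the extra leading 00 for values whose first octet would have its high bit set follows the X.690 INTEGER rules.

```python
def encode_positive(value: int) -> bytes:
    """Encode a non-negative INTEGER as base-256 coefficients, most significant first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = []
    while True:
        digits.append(value % 256)   # coefficient of the next power of 256
        value //= 256
        if value == 0:
            break
    digits.reverse()
    if digits[0] & 0x80:
        # A leading 1 bit would make the octets read as a negative number.
        digits.insert(0, 0x00)
    return bytes(digits)
```

Here `encode_positive(32639)` produces 7F 7F, while 129 needs two octets, 00 81, because its single octet 81 starts with a 1 bit.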
This method can encode an arbitrarily large positive integer. But what about negative integers? For them, a special encoding procedure is used.
For example, take the number 32639 again, but now negative (-32639). Negative integers are encoded in such a way that effectively not one but two integers are encoded: a main number and a second number that must be subtracted from it. That is, to decode the negative number you simply compute x - y. As this formula shows, if x is less than y, the result is less than zero (a negative number).
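These two numbers (the main number and the number subtracted from it, the subtrahend) are formed according to the following rules: both have the same number of bits as the encoded value; the subtrahend has only its most significant bit set to 1 and all other bits set to 0; the main number consists of all bits of the encoded value except the most significant one.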
Now let us encode the number from our example (-32639). Since the subtrahend must be greater than the main number, encoding a negative integer starts with choosing the subtrahend. According to the rules, the subtrahend has its most significant bit set to 1 and all other bits set to 0, so the possible subtrahends are a leading octet 80 (1000 0000) followed by some number of 00 octets: 80 (decimal 128), 80 00 (decimal 32768), 80 00 00 (decimal 8388608), and so on. To encode -32639, choose the smallest subtrahend that is not less than the absolute value of the number (here, not less than 32639). The nearest one is 32768 (80 00).
Now we need the main number. To get it, solve the simple equation $x - 32768 = -32639$, which gives $x = 129 = 129 \cdot 256^{0}$, so 129 is encoded as the single octet 81 (hexadecimal). If you look at the rules more closely, however, you will see that the main number and the subtrahend must have the same number of bits. The subtrahend has 16 bits, while the main number has only 8. To widen the main number, simply add non-significant zero high-order bits: $129 = 0 \cdot 256^{1} + 129 \cdot 256^{0}$, so the main number is encoded with two octets as 00 81. Now set the most significant bit of this two-octet main number to 1, and we get the final encoding of -32639: the two octets 80 81. Once again: the main number is formed from all bits of the encoded number except the most significant one (here, 00 81), and the subtrahend is formed from a single most significant bit set to 1 with all other bits set to 0 (here, 80 00).
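The "main number minus subtrahend" rule translates into a small Python decoder. The function name `decode_signed` is hypothetical; the result agrees with Python's built-in two's complement conversion `int.from_bytes(..., signed=True)`.

```python
def decode_signed(octets: bytes) -> int:
    """Decode INTEGER octets as (main number) - (subtrahend)."""
    bits = 8 * len(octets)
    raw = int.from_bytes(octets, "big")
    top = 1 << (bits - 1)            # only the most significant bit: 80 00 ...
    main = raw & (top - 1)           # all bits except the most significant one
    subtrahend = top if raw & top else 0
    return main - subtrahend
```

For 80 81 the main number is 00 81 = 129 and the subtrahend is 80 00 = 32768, so `decode_signed(bytes([0x80, 0x81]))` gives 129 - 32768 = -32639.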
And now some pleasant news: modern computer systems already store integers (both positive and negative) in exactly this format, known as two's complement. So encoding an integer for ASN.1 needs no arithmetic at all: you only write out its bytes with the most significant byte first (many processors, such as x86, keep the least significant byte first in memory) and drop redundant leading 00 or FF octets, since the encoding must be as short as possible.
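In Python the two's complement conversion is built in, so an INTEGER encoder reduces to choosing the minimal number of octets. A sketch, with `encode_integer` as a hypothetical name:

```python
def encode_integer(value: int) -> bytes:
    """Minimal big-endian two's complement octets for an INTEGER."""
    # A non-negative value needs bit_length() bits plus one sign bit;
    # for a negative value, ~value (= -value - 1) gives the magnitude bits.
    magnitude_bits = value.bit_length() if value >= 0 else (~value).bit_length()
    length = magnitude_bits // 8 + 1   # ceil((bits + 1) / 8)
    return value.to_bytes(length, "big", signed=True)
```

With this sketch, `encode_integer(-32639)` gives 80 81 and `encode_integer(128)` gives 00 80, matching the hand calculations above.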
The mantissa is always an unsigned integer, that is, the mantissa of a number encoded in ASN.1 is always positive. To encode negative floating-point numbers, ASN.1 provides a separate bit (bit 7) in the service information octet (see above).
The mantissa is encoded as a sequence of octets holding the coefficients of the base-256 decomposition of the number. That is, if the mantissa is 32639 in decimal, it is encoded as the two octets 7F 7F: $32639 = 127 \cdot 256^{1} + 127 \cdot 256^{0} = \mathrm{7F}_{16} \cdot 100_{16}^{1} + \mathrm{7F}_{16} \cdot 100_{16}^{0}$.
Examples of encoding REAL numbers in binary form:
Consequently, the entire floating-point number from our example (provided that the mantissa is “normalized”) will be encoded with the following sequence of octets:
AC FE 05
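To check a binary encoding such as AC FE 05, the contents octets can be decoded back into an exact fraction. This Python sketch assumes the X.690 binary layout (bit 7 = sign, bits 6-5 = base, bits 4-3 = scaling factor F, bits 2-1 = exponent format, then a signed exponent and an unsigned mantissa); the function name is hypothetical, and `fractions.Fraction` is used only to avoid rounding.

```python
from fractions import Fraction


def decode_binary_real(contents: bytes) -> Fraction:
    """Decode binary REAL contents as sign * M * 2**F * base**E."""
    first = contents[0]
    if not first & 0x80:
        raise ValueError("not a binary-encoded REAL")
    sign = -1 if first & 0x40 else 1
    base = {0: 2, 1: 8, 2: 16}[(first >> 4) & 0x03]   # 11 is reserved (KeyError)
    scale = (first >> 2) & 0x03
    fmt = first & 0x03
    if fmt == 3:
        exp_len, pos = contents[1], 2    # explicit exponent length octet
    else:
        exp_len, pos = fmt + 1, 1        # one, two or three exponent octets
    exponent = int.from_bytes(contents[pos:pos + exp_len], "big", signed=True)
    mantissa = int.from_bytes(contents[pos + exp_len:], "big")
    return sign * mantissa * Fraction(2) ** scale * Fraction(base) ** exponent
```

Decoding AC FE 05 gives base 16, F = 3, exponent FE = -2, and mantissa 5, that is $5 \cdot 2^{3} \cdot 16^{-2} = 5/32 = 0.15625$. The same value can also be written as 80 FB 05 ($5 \cdot 2^{-5}$), which shows that one number can have several binary encodings.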
Besides encoding all parts of a floating-point number in binary form as a decomposition into powers of two, ASN.1 also allows such numbers to be represented in the familiar character string form in which we usually see them. In this case the number is considered to be encoded in base 10.
When encoding in base 10, the concept of "number representation forms" is additionally introduced. There are three such forms (NR1, NR2, and NR3), described in a separate standard, ISO 6093. Since that standard is not freely available, I can recommend its "ancestor", ECMA-63, which is easy to find on the Internet.
When a floating-point number is encoded in base 10, the service information octet holds the code of the representation form (01, 02, or 03 for NR1, NR2, and NR3), and it is immediately followed by the character codes representing the number. The following character codes are allowed:
No other characters may be encoded (if a decoder encounters any other character, it must report an error).
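A decimal REAL is simply the form code followed by the character codes, so a sketch needs very little code. Both function names are hypothetical, the allowed character set below (digits, space, plus, minus, full stop, comma, E, e) is an assumption to be checked against ISO 6093 or ECMA-63, and converting to a Python `float` is for illustration only, since it can lose precision.

```python
# Assumed ISO 6093 character repertoire; verify against ISO 6093 / ECMA-63.
ALLOWED = set("0123456789+-.,Ee ")


def encode_decimal_real(text: str, form: int) -> bytes:
    """Contents octets of a base-10 REAL: form code, then the characters."""
    if form not in (1, 2, 3):
        raise ValueError("form must be 1 (NR1), 2 (NR2) or 3 (NR3)")
    if not set(text) <= ALLOWED:
        raise ValueError("character not allowed in a decimal REAL")
    return bytes([form]) + text.encode("ascii")


def decode_decimal_real(contents: bytes) -> float:
    form, text = contents[0], contents[1:].decode("ascii")
    if form not in (1, 2, 3) or not set(text) <= ALLOWED:
        raise ValueError("invalid decimal REAL")
    # Python's float() expects '.' as the decimal mark, so map ',' to '.'.
    return float(text.replace(",", "."))
```

For example, the NR3 string "1.5E1" becomes 03 31 2E 35 45 31, which decodes back to 15.0.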
Examples of encoding a floating-point number in decimal form:
Besides ordinary numbers, ASN.1 can also encode a number of "special" values:
All special numbers are encoded with only one service information octet, without specifying the octets for the exponent and the mantissa:
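The special values map naturally onto IEEE 754 values in Python. This sketch assumes the octet codes used by X.690 (40 = PLUS-INFINITY, 41 = MINUS-INFINITY, and, in later editions, 42 = NOT-A-NUMBER and 43 = minus zero); the names `SPECIAL_REAL`, `encode_special_real`, and `decode_special_real` are hypothetical. An ordinary zero needs no service octet at all: its contents are empty.

```python
import math

# One service-information octet per special REAL value.
SPECIAL_REAL = {
    0x40: math.inf,     # PLUS-INFINITY
    0x41: -math.inf,    # MINUS-INFINITY
    0x42: math.nan,     # NOT-A-NUMBER (later editions of X.690)
    0x43: -0.0,         # minus zero (later editions of X.690)
}


def encode_special_real(value: float) -> bytes:
    if math.isnan(value):
        return b"\x42"
    if value == math.inf:
        return b"\x40"
    if value == -math.inf:
        return b"\x41"
    if value == 0.0 and math.copysign(1.0, value) < 0:
        return b"\x43"   # -0.0 compares equal to 0.0, so check the sign bit
    raise ValueError("not a special REAL value")


def decode_special_real(octet: int) -> float:
    return SPECIAL_REAL[octet]
```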
UPDATE: a list of the subsequent chapters of my article
UPDATE #2: a link to a file with encoding examples for all data types
UPDATE #3: In case anyone missed it, here is an implementation of a C++ ASN.1 encoder/decoder with support for the REAL type. And here is an implementation in JavaScript, so far without the REAL type.
Source: https://habr.com/ru/post/150757/