ASN.1 in simple terms (part 2)

I continue publishing on Habr the chapters of my article "ASN.1 in simple terms". The previous part is "ASN.1 in simple terms (encoding of the REAL type)".

Chapter 3. Encoding the OBJECT IDENTIFIER Type

General description of the type:

Tag class - UNIVERSAL (00);
Tag Number - 6;
The value encoding form is primitive (not constructed);

By itself, an OBJECT IDENTIFIER is a sequence of unsigned integers separated by the "." character. Examples of possible OBJECT IDENTIFIER values: 0.1.1, 1.1.1, 2.1234.1234.1234.1234.

Encoding an OBJECT IDENTIFIER (OID) consists of sequentially encoding all of its components, each of which is an unsigned integer (SID, sub-identifier).
To reduce the overall size of the encoded OID, the following two rules apply to the first two SIDs:

The first unsigned integer (SID1) must be 0, 1, or 2; other values are invalid. SID1 is not encoded on its own; instead, its value is recovered when SID2 is decoded (see below);
Before the second SID is encoded, the following formula is applied to it: SID2 = SID1 * 40 + SID2 (on the right is the original SID2, on the left is the value that is actually encoded);
To guarantee that SID1 can be determined from the resulting value, the following requirements are imposed on SID2:
If SID1 = 0 or SID1 = 1, then SID2 must be in the range from 0 to 39 (both inclusive);
If SID1 = 2, then SID2 can be an arbitrary unsigned integer;

Encoding examples for the first two SIDs:

"0.39" is encoded as the single integer 0 * 40 + 39 = 39;
"1.0" is encoded as the single integer 1 * 40 + 0 = 40;
"1.39" is encoded as the single integer 1 * 40 + 39 = 79;
"2.0" is encoded as the single integer 2 * 40 + 0 = 80;
"2.39" is encoded as the single integer 2 * 40 + 39 = 119;
"2.339" is encoded as the single integer 2 * 40 + 339 = 419;

Each SID is encoded independently of the others. All SIDs are encoded one after another without additional delimiters. The SID encoding rules are the same for all SIDs except the first two (see above).
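The rule for the first two SIDs can be sketched in Python as follows. This is only an illustration; the function names combine_first_two and split_first_two are invented for this example and do not come from any library.

```python
from typing import Tuple


def combine_first_two(sid1: int, sid2: int) -> int:
    """Merge SID1 and SID2 into the single integer that is actually encoded."""
    if sid1 not in (0, 1, 2):
        raise ValueError("SID1 must be 0, 1 or 2")
    if sid2 < 0:
        raise ValueError("SID2 must be an unsigned integer")
    if sid1 < 2 and sid2 > 39:
        raise ValueError("SID2 must be in 0..39 when SID1 is 0 or 1")
    return sid1 * 40 + sid2


def split_first_two(value: int) -> Tuple[int, int]:
    """Recover (SID1, SID2) from the combined integer when decoding."""
    if value < 0:
        raise ValueError("combined value must be an unsigned integer")
    if value < 40:
        return 0, value
    if value < 80:
        return 1, value - 40
    # Everything from 80 upwards belongs to SID1 = 2.
    return 2, value - 80


print(combine_first_two(2, 339))  # 419
print(split_first_two(419))       # (2, 339)
```

The range limit on SID2 for SID1 = 0 and SID1 = 1 is what makes split_first_two unambiguous: each combined value maps back to exactly one pair.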

It should be noted that, in theory, each SID can be an arbitrarily large integer, so a single SID may need an arbitrary number of octets. Variable-length values are usually encoded with an explicit length: an octet holding the number of octets precedes the encoded octets themselves. SID coding uses a different approach: the most significant bit of every octet that encodes a SID is a flag that tells whether this octet is the last one for the SID or not.

That is, since one SID can be encoded with more than one octet (in the case of a large integer), the following rule applies: in all octets encoding the SID value except the last (least significant) one, the most significant bit must be set to 1; in the last octet it is 0. Thus only the lower 7 bits of each 8-bit octet carry bits of the SID value. For this reason, before encoding, each SID is converted from decimal form to its base-128 decomposition (the bits are grouped by 7).

SID coding examples:

SID = 643₁₀ = 5 * 128¹ + 3 * 128⁰ = (05 03)₁₂₈. Since all octets except the least significant one must have the high-order bit set, the SID equal to 643 is encoded in ASN.1 using two octets (85 03)₂₅₆;
SID = 113549₁₀ = 6 * 128² + 119 * 128¹ + 13 * 128⁰ = (06 77 0D)₁₂₈. Since all octets except the least significant one must have the high-order bit set, the SID equal to 113549 is encoded in ASN.1 using three octets (86 F7 0D)₂₅₆;
SID = 49152₁₀ = 3 * 128². In this case the decomposition contains no lower powers of 128. But since the exponent is not stored anywhere in the SID encoding, for the purposes of coding 49152 must be represented as (3 * 128² + 0 * 128¹ + 0 * 128⁰) = (03 00 00)₁₂₈ (that is, the number of octets in the encoding is always one more than the highest power of 128 in the decomposition). Since all octets except the least significant one must have the high-order bit set, the SID equal to 49152 is encoded in ASN.1 using three octets (83 80 00)₂₅₆;
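The base-128 scheme described above can be sketched in a few lines of Python. The function name encode_sid is hypothetical, chosen for this illustration only.

```python
def encode_sid(value: int) -> bytes:
    """Encode one unsigned SID as base-128 digits with continuation flags."""
    if value < 0:
        raise ValueError("a SID must be an unsigned integer")
    # The least significant 7-bit group goes last and keeps its high bit clear.
    groups = [value & 0x7F]
    value >>= 7
    # Every more significant group gets the high bit set (0x80).
    while value:
        groups.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(groups))


print(encode_sid(643).hex())     # 8503
print(encode_sid(113549).hex())  # 86f70d
print(encode_sid(49152).hex())   # 838000
```

Because the loop stops as soon as no value bits remain, the result never starts with a redundant 80 octet.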

When encoding SIDs, the minimum number of octets must be used. That is, SID = 643 must not be represented as (0 * 128² + 5 * 128¹ + 3 * 128⁰) = (00 05 03)₁₂₈ and, therefore, must not be encoded as (80 85 03)₂₅₆. The simplest test for a correct SID encoding is that the first octet of the encoded SID must not be 80₂₅₆.
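A decoder can apply the same test while reading the value octets. The sketch below assumes it receives only the value octets of an OID (without the tag and length octets); the function name decode_oid_content and the sample OID 1.2.643 are made up for the example.

```python
def decode_oid_content(content: bytes) -> str:
    """Decode the value octets of an OBJECT IDENTIFIER into dotted form."""
    values = []
    current = 0
    start_of_sid = True
    for octet in content:
        # A SID whose first octet is 0x80 carries a useless leading zero group.
        if start_of_sid and octet == 0x80:
            raise ValueError("non-minimal SID encoding (leading 0x80 octet)")
        current = (current << 7) | (octet & 0x7F)
        # A clear high bit marks the last octet of the current SID.
        start_of_sid = not (octet & 0x80)
        if start_of_sid:
            values.append(current)
            current = 0
    if not start_of_sid:
        raise ValueError("truncated SID: last octet has the high bit set")
    if not values:
        raise ValueError("empty OBJECT IDENTIFIER")
    # Undo the SID1 * 40 + SID2 merge of the first two components.
    sid1 = min(values[0] // 40, 2)
    sid2 = values[0] - 40 * sid1
    return ".".join(str(v) for v in [sid1, sid2] + values[1:])


print(decode_oid_content(bytes.fromhex("2a8503")))  # 1.2.643
```

Passing the non-minimal octets 2A 80 85 03 to this function raises ValueError instead of returning a value.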

Chapter 4. Encoding the INTEGER Type

General description of the type:

Tag class - UNIVERSAL (00);
Tag Number - 2;
The value encoding form is primitive (not constructed);

The coding of integers (INTEGER) has already been considered earlier (in the chapter on encoding of the REAL type). However, we once again recall the main features of this coding.

In ASN.1, both positive and negative numbers can be encoded. An integer is practically unlimited in size (that is, integers of arbitrarily large absolute value can be encoded in ASN.1).

Each integer is encoded as a sequence of octets, each holding 8 bits of information. Each octet is the "weight" (coefficient) of the corresponding power of 256 in the base-256 decomposition of the number. That is, to encode a number, it is first decomposed in base 256, and then the weights of the corresponding powers of 256 are encoded as octets. For example, the number 8388607 is encoded as follows:

Decompose the number in base 256: 8388607₁₀ = 127 * 256² + 255 * 256¹ + 255 * 256⁰;
The "weights" of the corresponding powers of 256 are therefore 127, 255 and 255;
Convert each weight into a sequence of bits and write that sequence in groups of 4 bits (hexadecimal digits). The weights become 7F FF FF;
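The decomposition steps above can be reproduced with a short Python sketch. It handles non-negative numbers only and ignores the sign rules discussed next; the function name base256_weights is invented for the example.

```python
from typing import List


def base256_weights(n: int) -> List[int]:
    """Return the base-256 'weights' of a non-negative integer, most significant first."""
    if n < 0:
        raise ValueError("this sketch covers non-negative numbers only")
    weights = [n % 256]
    n //= 256
    while n:
        weights.append(n % 256)
        n //= 256
    return weights[::-1]


weights = base256_weights(8388607)
print(weights)               # [127, 255, 255]
print(bytes(weights).hex())  # 7fffff
```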

Negative integers are encoded according to separate rules. In fact, an encoded negative integer stores not one but two integers: the main number and the number that must be subtracted from it to recover the originally encoded negative number. When decoding, the original number is obtained by the formula N = X - Y, where X is the main number and Y is the subtracted number. Clearly, for N to be negative, the condition Y > X must hold.

Detailed rules for the formation of both the main and subtracted numbers have already been described earlier, so I refer the reader to the chapter on coding type REAL.

Examples of encoding integers:

Suppose we already have the encoded number 80₂₅₆ = 128 * 256⁰ = 128₁₀ = (1000 0000)₂. Here the main number is formed by the seven low-order (rightmost) bits and equals 0; the subtracted number is obtained by masking out all bits except the most significant one and equals (1000 0000)₂ = 80₂₅₆ = 128₁₀. Hence the encoded number is 0 - 128 = -128;
The positive value +128 is encoded by the octet sequence 0 * 256¹ + 128 * 256⁰ = (00 80)₂₅₆, that is, a most significant zero octet is added. Here, too, both the main number and the subtracted number can be calculated: the main number equals 128, and the subtracted number equals 0;
Let us encode the number -136₁₀. The number 136₁₀ = 88₂₅₆ = (1000 1000)₂. Since the most significant bit of this number is already set, for a correct encoding the subtracted number must be formed by two octets and equal (80 00)₂₅₆ = 128 * 256¹ + 0 * 256⁰ = 32768₁₀. Consequently, the main number for encoding (-136) is found by solving the equation x - 32768 = -136. That is, x = 32632 = 127 * 256¹ + 120 * 256⁰ = (7F 78)₂₅₆. Setting the high bit of this number, we find that (-136) is finally encoded by two octets (FF 78)₂₅₆;
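The "main number minus subtracted number" model can be checked with a small Python sketch. The names decode_integer and encode_integer are made up for this illustration; the encoder relies on the standard int.to_bytes method with signed=True, which produces the same octets.

```python
def decode_integer(content: bytes) -> int:
    """Decode INTEGER value octets as N = X - Y (main minus subtracted number)."""
    if not content:
        raise ValueError("an INTEGER needs at least one octet")
    raw = int.from_bytes(content, "big")
    top_bit = 1 << (8 * len(content) - 1)
    main = raw & (top_bit - 1)   # all bits except the most significant one
    subtracted = raw & top_bit   # only the most significant bit
    return main - subtracted


def encode_integer(n: int) -> bytes:
    """Encode an integer using the smallest number of octets that fits it."""
    length = 1
    while not -(1 << (8 * length - 1)) <= n < (1 << (8 * length - 1)):
        length += 1
    return n.to_bytes(length, "big", signed=True)


print(decode_integer(bytes.fromhex("80")))    # -128
print(decode_integer(bytes.fromhex("0080")))  # 128
print(encode_integer(-136).hex())             # ff78
```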

In fact, a requirement is imposed on the encoded integer (when the encoding is longer than one octet): the upper 9 bits of the encoding must not all be 1 and must not all be 0 (that is, the upper 9 bits must not all be equal to each other). If the upper 9 bits are all 0, the high-order octet (upper 8 bits) can be discarded without changing the encoded value (when decoding, it would merely add the term 0 * 256ⁿ, which does not affect the value). If the upper 9 bits are all 1, the encoded number is negative and can be re-encoded with fewer octets (extra zero octets were added to the subtracted number during encoding).

An example of adding extra zero octets to the subtracted number. Take the number (-128). Let the subtracted number be (80 00)₂₅₆ = 32768₁₀; the main number is found by solving the equation x - 32768 = -128, so x = 32640₁₀ = (7F 80)₂₅₆. Setting the most significant bit gives the final encoding (FF 80)₂₅₆ (here the 9 most significant bits are all equal). But we have already encoded (-128) earlier and know that its ASN.1 encoding has a shorter form: a single octet (80)₂₅₆.
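The 9-bit rule is easy to test mechanically. Below is a minimal Python sketch; the helper name is_minimal is hypothetical.

```python
def is_minimal(content: bytes) -> bool:
    """Check that the upper 9 bits of an INTEGER encoding are not all equal."""
    if len(content) == 0:
        return False  # an INTEGER needs at least one octet
    if len(content) == 1:
        return True   # a single octet cannot be shortened any further
    # Upper 9 bits: the whole first octet plus the top bit of the second one.
    top9 = (content[0] << 1) | (content[1] >> 7)
    return top9 not in (0x000, 0x1FF)


print(is_minimal(bytes.fromhex("ff80")))  # False: the single octet 80 is enough
print(is_minimal(bytes.fromhex("ff78")))  # True
print(is_minimal(bytes.fromhex("0080")))  # True: the zero octet is required for +128
```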

Finally, at the end of this chapter we remind the reader once again that modern computer systems encode integers (automatically) according to the rules described above; this representation is known as two's complement. There is one caveat: because the storage width is fixed, negative numbers are still stored with "extra zero octets added to the subtracted number" (see the previous paragraph). That is, if an integer is stored in 4 octets (bytes), the number (-128) is encoded as FF FF FF 80 (the "subtracted number" here is (80 00 00 00)₂₅₆ = 2147483648).
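This is easy to observe with Python's standard struct module, which packs a value into a fixed-width, big-endian, 4-byte signed integer. The variable names below exist only for this example.

```python
import struct

# Fixed-width 4-byte representation, as a typical CPU or file format stores it.
fixed = struct.pack(">i", -128)
print(fixed.hex())  # ffffff80

# The shortest form, as the ASN.1 rules above require: a single octet.
minimal = (-128).to_bytes(1, "big", signed=True)
print(minimal.hex())  # 80

# Both octet strings decode to the same value.
print(int.from_bytes(fixed, "big", signed=True))    # -128
print(int.from_bytes(minimal, "big", signed=True))  # -128
```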

Chapter 5. String Coding

In ASN.1, encoding is applied for a fairly wide range of string types. Here is a complete list of them:

NumericString;
PrintableString;
TeletexString (the type completely coincides with T61String);
VideotexString;
VisibleString;
IA5String;
GraphicString;
GeneralString;
UniversalString;
BMPString;
UTF8String;

Some of these string types are obsolete and no longer used (for example, VideotexString). Each string type defines a set of characters that strings of that type may contain. In addition, strings may contain so-called "control sequences", which adjust how a particular string is processed on terminals that support this feature. Anyone who has programmed in C/C++ has often used the control sequence \n (newline). The standards already define many control sequences; in addition, you can create your own, which will be processed only by specialized terminals (for example, to switch the color of selected lines to red, to underline lines, and so on).

The most advanced and modern formats are string formats based on the Unicode standard. The basis of the standard is ISO 10646.

Type UniversalString (tag class UNIVERSAL, tag number 28, encoding form - primitive). Encodes Unicode strings where each character is encoded with 4 bytes (octets).

Type BMPString (tag class UNIVERSAL, tag number 30, encoding form - primitive). A subset of Unicode characters, each character is always encoded with 2 bytes (octets).

Type UTF8String (tag class UNIVERSAL, tag number 12, encoding form - primitive). Represents Unicode characters with additional processing that encodes each character as a variable-length sequence of bytes (from 1 to 4 bytes for the current Unicode range; the original UTF-8 definition allowed up to 6 bytes).
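The difference between the three Unicode-based types is visible in the value octets alone. The Python sketch below uses the standard codecs utf-32-be, utf-16-be and utf-8; the function name content_octets and the sample string are invented for this illustration, and tag and length octets are left out.

```python
from typing import Dict


def content_octets(text: str) -> Dict[str, bytes]:
    """Return the value octets of one string for the three Unicode-based types."""
    # BMPString covers only the Basic Multilingual Plane (code points up to U+FFFF).
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the BMP cannot go into a BMPString")
    return {
        "UniversalString": text.encode("utf-32-be"),  # always 4 octets per character
        "BMPString": text.encode("utf-16-be"),        # always 2 octets per character
        "UTF8String": text.encode("utf-8"),           # variable number of octets
    }


# Latin "A", Cyrillic "ya" (U+044F) and the euro sign (U+20AC).
for name, octets in content_octets("A\u044f\u20ac").items():
    print(name, len(octets), octets.hex())
```

For this three-character sample the lengths are 12, 6 and 6 octets: in UTF-8 the characters take 1, 2 and 3 octets respectively.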

Chapter 6. Date and Time Coding

All types describing date and time in ASN.1 are ordinary character strings, where the values for the year, month, and so on are written in a specific format (for these ASCII-only values the octets coincide with UTF-8; UTCTime and GeneralizedTime are formally defined in terms of VisibleString). The formats for each type are described in the ISO 8601 standard.

UTCTime type. The simplest time type. It can encode only the date and UTC (Coordinated Universal Time) time. The string encoding UTCTime can be either YYMMDDHHMMSSZ (where YY is the last two digits of the year, the first MM is two digits of the month, DD is two digits of the day, HH is two digits of the hour (24-hour format), the second MM is two digits of minutes, SS is two digits of seconds, and Z marks UTC), or it can carry the time together with the offset of the corresponding time zone in the form YYMMDDHHMMSS+hhmm (or YYMMDDHHMMSS-hhmm).
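Both UTCTime forms can be produced with Python's standard datetime module. This is only a sketch: the function name to_utctime is hypothetical, and it assumes that in the offset form the digits are the datetime's own clock fields followed by its UTC offset.

```python
from datetime import datetime, timedelta, timezone


def to_utctime(dt: datetime) -> str:
    """Format a timezone-aware datetime as a UTCTime string."""
    if dt.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    if dt.utcoffset() == timedelta(0):
        # Pure UTC: YYMMDDHHMMSSZ
        return dt.strftime("%y%m%d%H%M%S") + "Z"
    # With a zone offset: YYMMDDHHMMSS+hhmm or YYMMDDHHMMSS-hhmm
    return dt.strftime("%y%m%d%H%M%S%z")


print(to_utctime(datetime(2012, 9, 5, 14, 30, 0, tzinfo=timezone.utc)))
# 120905143000Z
print(to_utctime(datetime(2012, 9, 5, 18, 30, 0, tzinfo=timezone(timedelta(hours=4)))))
# 120905183000+0400
```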

Type GeneralizedTime. An extension of UTCTime that uses 4 digits for the year. In addition, GeneralizedTime allows a fractional value for the last time component present (hours, minutes, or seconds). Accordingly, the formats may be as follows:

YYYYMMDDHHMMSSZ;
YYYYMMDDHHMMSS+hhmm;
YYYYMMDDHHMMSS-hhmm;
YYYYMMDDHHMMSS.nn (the number of characters after the point is not limited);
YYYYMMDDHHMM.nn (the number of characters after the point is not limited);
YYYYMMDDHH.nn (the number of characters after the point is not limited);
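The fractional forms are easier to understand with an example: the fraction always belongs to the last component present, so ".5" after the hours means half an hour. The Python sketch below parses only the forms without "Z" or a zone offset; the function name parse_generalizedtime_local is invented for this illustration.

```python
from datetime import datetime, timedelta
from fractions import Fraction


def parse_generalizedtime_local(text: str) -> datetime:
    """Parse YYYYMMDDHH[MM[SS]][.nn] (no 'Z' and no zone offset) into a datetime."""
    digits, _, frac = text.partition(".")
    formats = {10: "%Y%m%d%H", 12: "%Y%m%d%H%M", 14: "%Y%m%d%H%M%S"}
    if len(digits) not in formats or not digits.isdigit():
        raise ValueError("unsupported GeneralizedTime form: " + text)
    base = datetime.strptime(digits, formats[len(digits)])
    if not frac:
        return base
    if not frac.isdigit():
        raise ValueError("the fraction must consist of digits only")
    # The fraction scales the last component present: hour, minute or second.
    unit_seconds = {10: 3600, 12: 60, 14: 1}[len(digits)]
    extra = Fraction(int(frac), 10 ** len(frac)) * unit_seconds
    return base + timedelta(microseconds=round(extra * 1_000_000))


print(parse_generalizedtime_local("2012090514.5"))      # 2012-09-05 14:30:00
print(parse_generalizedtime_local("201209051430.25"))   # 2012-09-05 14:30:15
print(parse_generalizedtime_local("20120905143000.5"))  # 2012-09-05 14:30:00.500000
```

Fraction is used instead of float so that values such as ".25" of a minute are converted exactly before rounding to microseconds.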

All the types described below are new in the latest version of the standard, X.680:2008.

TIME type. Describes only the time, in the format HH:MM:SS or in the format HHMMSS.

Type TIME-OF-DAY. The same as the TIME type, except that the format can only be HHMMSS.

DATE type. Describes only the date, in the format YYYYMMDD.

Type DATE-TIME. Describes both the date and the time on that date. The presentation format is YYYYMMDDHHMMSS. If the trailing time components are 0 (for example, 0 seconds), the zero value can be omitted (the string is shortened).

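Type DURATION. Describes the length of a time interval (the difference between two points in time). The presentation format is nnYnnMnnDTnnHnnMnnS (in ISO 8601 notation such a value is preceded by the designator P, for example P1Y2M3DT4H5M6S).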

UPDATE

The final part of my article has been published: "ASN.1 in simple terms (part 3, final)".

Source: https://habr.com/ru/post/150820/
