Problems of recognition of ID-documents on mobile devices on the example of machine-readable zones

Fig. 1 - Passport of the Russian Federation with MRZ-zone (Image source: en.wikipedia.org/wiki/Russian_passport )

Hello, today we want to tell you about the features of the problem of identification of identity documents, using a mobile phone. As an example, we consider the problem of recognizing machine-readable zones of the MRZ in images and frames of a video stream received from a camera of a mobile device.

')

1. What is MRZ?

Machine-Readable Zone (MRZ - Machine-Readable Zone) is a part of an identity document, made in accordance with international recommendations, enshrined in the document Doc 9303 - Machine Readable Travel Documents of the International Civil Aviation Organization .

An example of a machine-readable zone, made in accordance with these recommendations, is the MRZ of foreign passports of citizens of the Russian Federation (Fig. 1 - below).

2. Recognition of MRZ using scanners (including specialized ones)

Consider the features of the use of scanning equipment in the task of optical recognition of documents. When scanning the document is located in a plane perpendicular to the optical axis at a fixed distance from the recording matrix. This achieves homotheticity of the original document and its image, and minor distortions with small deviations from such an arrangement are easily detected and corrected. During the scanning process, the document is stationary during the exposure, therefore, defects associated with the displacement of the original document (blurring) of the image are excluded. Illumination in the scanner is formed by special powerful backlight lamps, which guarantee stable illumination characteristics and the absence of shadows.

A special case of scanning equipment is specialized document readers and hardware-software systems, in which the image is obtained according to the principles of a flatbed, planetary or slit scanner. The document in such devices is either pressed against the glass, or is inserted into a special slot (Fig. 2), which practically eliminates the deformation of the scanned document page.

Fig. 2 - Examples of the location of the document when using readers

Such readers make it possible to obtain images of documents in various lighting schemes (white, infrared, ultraviolet, white to the light). At the same time, for optical recognition, a scheme with white and infrared illumination can be used, which gives a high-contrast image with a low level of interference from background padding and protection elements.

Fig. 3 - Scan of the passport of Japan in the white and infrared bands (Image source: bersisteknoloji.com.tr/index_htm_files/Regula%208703_en.pdf )

The known mutual arrangement of the elements of illumination (lamps, LEDs) relative to the working surface on which the document resides allows completely eliminating (in the process of designing the device) or significantly simplifying the glare compensation (in the process).

Depending on the model of this kind, specialized equipment allows obtaining images in a resolution of 200 DPI and higher, with the majority of modifications having the ability to obtain images with a resolution sufficient for optical text recognition (300-400 DPI).

Thus, scanning devices provide high-quality images with minimal distortion, which allows optical text recognition with high quality and high reliability.

3. Shooting with small digital cameras

3.1. Common problems

Compared to scanners, the optical design of the camera is more complex and in itself introduces more distortion due to aberrations, glare and reflections inside the optical system. The use of photosensors (matrices) and analog electronics by devices for recording images inevitably leads to distortion of images, called digital noise. Sources of digital noise is the process of digitizing the analog signal itself (signal quantization errors, thermal noise and charge transfer on the matrix) and its further amplification. Digital noise is noticeable in the image as a superimposed mask of pixels of random color and brightness. Noise is more noticeable on monochromatic areas of the image, especially on dark ones. Unlike scanning, when quality lighting is guaranteed, when shooting with digital cameras, there is often a lack of illumination, and the effect of digital noise naturally increases many times over. Another source of distortion is image compression algorithms, which is especially typical for frames of a video stream.

Fig. 4 - Examples of distorted images of the symbols of the MRZ document

Depending on the characteristics of the lens and the position of the document relative to the plane of focus, some or all of the document image may be “blurred”. If due to the movement of the document or camera itself there is a shift during exposure, then “blurring” appears (Fig. 5), which is enhanced in low light conditions.

Fig. 5 - Examples of “blurred” symbol images

3.2. Projective and nonlinear distortions

Unlike scanners when shooting with a camera, the document itself is located in an arbitrary plane relative to the plane of the focused image. Deviation from the plane perpendicular to the optical axis leads to projective distortion of the document image. With minor deviation angles, it is possible to recognize the machine-readable zone without additional projective correction, but in the general case it is necessary to estimate the parameters of the projective basis and produce optical recognition for the projectively corrected image. In this case, errors are possible in determining the parameters of the projective correction, which leads to geometric distortions of symbol images. Moreover, as an object of the physical world, the original document is subject to mechanical deformations. For example, documents made on paper are subject to “bending” and “twisting” (most often along or across the main reading direction), and sometimes “waves” occur when the bends in different parts of the page are multidirectional. When shooting with a camera, it is difficult or simply impossible to ensure the absence of deformations (Fig. 6).

Fig. 6 - Different deformations

Mechanical deformation of the document page is combined with projective image distortion. Even in projective normalization, the characters in the image aligned in parallel lines on the source document may not have baselines. Moreover, not only the lines themselves, but also the images of individual characters are distorted. That is, even after proper projective normalization of the entire document, the image of a symbol from a region physically deformed on the original document will differ from the image of the same symbol from the undeformed region.

Fig. 7 - Examples of distorted symbol images due to projective and non-linear deformations.

3.3. Background problems

For the machine-readable zone, ICAO 9303 establishes that typing should be visually legible and black (at wavelengths B425 – B680 according to the ISO 1831 standard), and the paint should absorb well in the near infrared range (in the B900 range in accordance with ISO 1831). Thus, the requirements for contrast are imposed only for the infrared region of the spectral range. In practice, this leads to the fact that, subject to the standard, some countries use inks to print the background filling of the machine-readable zone, which are “transparent” in the infrared range and, at the same time, rather “dense” in the optical (Fig. 8).

Fig. 8 - Examples of zones with “dark” and “variegated” filling in the optical range

For small-format cameras of mobile devices, shooting in the infrared range is impossible; therefore, inhomogeneous background significantly complicates the process of optical zone recognition, especially in “unsuccessful” lighting conditions.

The document lighting scheme in scanners minimizes the appearance of shadows and highlights, even for “glossy” pages of documents. When shooting a camera in natural scenes on the images often occur differences in brightness (shadows, reflections, reflexes, etc.) and color distortions that complicate the task of image analysis and recognition, for example, due to the loss of existing or the appearance of fake object borders. The pages of most documents with a machine-readable zone are either made of special plastic or covered with a protective film and have good reflective properties. Such physical properties of objects in the survey result in glare on the document (Fig. 9). Additionally, document security elements often contain areas with “holographic” elements that also distort the image.

Fig. 9 - Fragments of the zone: highlight from an extended light source, holographic protection elements

3.4. Problems using the OCR-B font

Consider the effect of the above difficulties when using small-format digital cameras on the recognition of single characters.

For printing lines of text in the machine readable zone, ICAO 9303 establishes a valid subset of OCR-B characters, with some characters having similar outlines.

The most difficult to distinguish among themselves are the letter “O” and the digit zero, whose images differ only in proportions and a slight difference in the “curvature”. The insignificance of differences in outlines under conditions of even minor distortions or not very high resolution leads to the fact that even a person either distinguishes them with great difficulty or cannot distinguish them at all (Fig. 10).

- - - - -

Fig. 10 - Examples of hard-to-distinguish characters 0 (zero, left) and O (letter, right)

Thus, when using small-format digital cameras for obtaining images of documents, in general, it is impossible to guarantee a high quality image of a symbol. This leads to a significantly lower quality and reliability of the recognition results of individual characters, and contextual processing mechanisms begin to play a significantly more important role (compared to scanning).

4. Problems of language model

In modern systems of recognition and identification of structured documents, statistical correction mechanisms are used to improve the accuracy of recognition. These mechanisms use information about the structure of the document, about the “context” of recognition, and rely on the language model of the document being recognized (or of a recognized field). Algorithms of a similar statistical correction, or post-processing, are based on a group of related methods, such as Hidden Markov Models (HMM), finite automata, N-gram and dictionary methods, and mechanisms using weighted end converters (Weighted Finite-State Transducers, WFST).

4.1. Context power

Consider some text field F. From the point of view of the structure of a document, field F has some semantic structure. From the point of view of the presentation of the document, the F field also has some syntactic structure. On the basis of the semantics of the document and the syntactic structure of the presentation of the document for the field, you can define some language model. For example, let F be the “date of birth of the holder” of the machine-readable zone of a foreign passport of the Russian Federation. Then, according to the semantic structure, F contains information about the year, month and birthday of the holder. Since the MRZ of the foreign passport of the Russian Federation is made in accordance with the recommendations of ICAO 9303, a separate fixed position is allocated for the F field in the MRZ data structure (the second line of the MRZ, characters 14-19, with the checksum in the 20th symbol) and the syntactic structure is defined for it : the date is written in the format YYMMDD, where YY is the last two decimal digits of the year, MM is the decimal representation of the month number, DD is the decimal representation of the day number in the month, or as a string “<<<<<” of six placeholders, if date of birth is unknown. The checksum is represented as one decimal digit and its value is calculated by the algorithm specified by the ICAO recommendations 9303.

Based on certain semantic and syntactic field structures, a language model can be defined that will encode a set of possible field values. Such a language model can be presented in several ways, for example, using a BNF grammar, or in the form of a regular language encoded by a state machine. One of the possible ways of representing a language model is to build a testing grammar G on a set of all possible strings composed of alphabet characters based on the predicate of the word P. That is, the word S corresponds to the language model G if the predicate P takes the true value on the word S. Since the ICAO 9303 document provides for each field some rules restricting the set of possible field values (ie, reinforcing the predicate P), as well as a checksum mechanism, the representation of the field language model as a testing grammar G is quite convenient.

The task of statistical correction of the recognition result of the field F with the checking grammar G is quite simple: in the weighted set of possible alternative values of the field F to find the value with the maximum weight on which the predicate P is executed. If the number of all possible values of F is finite (for example, the maximum field length is limited), one can define “context power” as the ratio of the power of the falsity area of the predicate P to the power of the set of all possible values of F. The greater this ratio, the more powerful the field context, and accordingly, the greater the likelihood of successful correction of the recognition result. For example, from all possible lines of length 7, consisting of decimal digits, less than 0.4% are valid dates (taking into account the checksum), respectively, the context power for this field exceeds 99.6%.

4.2. MRZ Document Code

The field "document code" (document code) is a two-character type identifier of the MRZ-document. The document code is located at the very beginning of the first line of the MRZ zone, regardless of the type of the MRZ document, and the alphabet of its first character is strictly fixed ('P' for passports, 'V' for visas, 'A', 'C' or 'I 'for other identity documents), which allows you to build a sufficiently reliable procedure for correcting the recognition result for this symbol. However, the second character of the document code is left to the discretion of the issuing organization. Since the total checksum (see clause 4.7) does not cover the “document code” field, a language model (besides the general alphabet limit) for the second character of the document type cannot be built in the general case. It is also worth noting that there are organizations that issue special documents that resemble MRZ documents in their syntactic structure, but they are not. Such documents may contain the first symbol of the document code, which is not stipulated by the ICAO standard 9303. An example of such documents is the MRZ-like zone on a driver's license of the Republic of Moldova of the sample of 1995-2010 (Fig. 11). The structure of the MRZ-like zone on this type of documents coincides with the structure of documents of type TD-2, provided for in ICAO 9303, with the exception of the “document code” field.

Fig. 11 - MRZ-like zone on Moldova’s driver’s license sample 1995-2010 (Image source: www.skyscrapercity.com/showthread.php?t=1540248 )

4.3. Issuer Code and Citizenship

The fields “issuing state / authority” and “citizenship” (nationality) define, respectively, the unique code of the organization that issued the document containing the MRZ-zone and the nationality of the document holder. These codes are based on three-letter state codes in accordance with ISO 3166-1 with some extensions (codes corresponding to special non-governmental organizations authorized to issue identity documents and codes for persons without definite citizenship have been added). The language model of both fields can be a dictionary - i.e. just a finite set of all possible three-letter codes. The share of admissible codes from all possible three-letter words is ~ 1.4%, therefore the context power of such a language model is rather high - ~ 98.6%.

4.4. Document holder name

The “name” field is one of the most difficult fields from the standpoint of standardization, taking into account the diversity of the name structure in different countries and in different languages. ICAO 9303 describes some of the design requirements for a name, on the basis of which primary verification rules can be made: the “name” field can consist of one or two sections, separated by two placeholders (“<”), each section can consist of one or multiple words separated by a single placeholder character. Each word must consist only of letters of the Latin alphabet. There are no additional verification mechanisms provided by ICAO 9303 (the total checksum of the MRZ document for the “name” field does not apply). For the “name” field, you can use well-known post-processing methods for similar fields, such as N-gram models and dictionary methods.

4.5. Document number and personal number

The fields “document number” and “personal number” (personal number / optional data) are fields with a weakly fixed syntactic structure, in connection with which it is difficult to build a sufficiently powerful statistical correction mechanism. The alphabet of these fields is not limited (i.e. limited only by the characters possible in the MRZ document). If there is a recommendation for the “document number” field, according to which the number should not contain placeholders in the beginning and middle of the number (i.e., the field can be supplemented with placeholders with the required length, but only at the end), then the syntactic structure of the field “ personal number ”is completely left to the discretion of the organization issuing the document. Both fields have a checksum, but even with its use, the effectiveness of the post-processing mechanism is not high enough: since the alphabet contains both letters and numbers, the effectiveness of post-processing drops due to the features of calculating the checksum according to the algorithm described in ICAO 9303 (see clause 4.7). The power of the context for both fields can be increased using the result of other fields, such as the “issuing authority code”. Some organizations issuing identity documents define their own syntactic structure for the “document number” and “personal number” fields. Accordingly, after the recognition result of the “issuing authority code” field has been obtained (and corrected, see clause 4.3), the syntactic structure of the “document number” and / or “personal number” fields can be specified if the limitations of the issuing organization are known in advance.

4.6. Dates of birth and expiration of the document

The syntactic structure of the “birth date” and “expiry date” fields is described above in paragraph (4.1). These fields are the most successful from the point of view of the language model - the alphabet of their symbols is rigidly fixed (only numbers, with the exception of the separately considered case of an unknown date) and a language model with a sufficiently powerful context can be built on the basis of the semantic structure of the date field. When building a combined post-processing algorithm for several fields, you can also take into account the fact that the document’s expiration date cannot be earlier than the holder’s birth date, so the context power for considering these fields together can be further increased.

4.7. Checksums

According to ICAO 9303, a checksum is provided for the fields “document number”, “date of birth”, “expiration date” and “personal number”. There is also a so-called “composite check digit” (composite check digit), with the help of which the repeated validation of these four fields takes place. However, the total checksum is not provided for in all types of documents (it is not in the so-called MRV-A and MRV-B - in the variants of machine-readable visas). The checksum occupies one character of the MRZ-zone for each field, and is calculated as follows:

Each field symbol is assigned its weight. The first character is assigned a weight of 7, the second - 3, the third - 1. The fourth - 7, the fifth - 3, etc. repeating the weights 7, 3 and 1 cyclically.
The code of each character is multiplied by its weight. The code of the placeholder symbol ('<') is equal to zero, the code of each decimal digit is equal to the value of this digit, the code of each letter of the Latin alphabet is 9 + <letter number in the alphabet> (The code of the letter 'A' is 10, the code of the 'B' is 11 and so on. The code of the letter 'Z' is 35).
The resulting works are summarized. The value of the check digit is the remainder of the amount received modulo 10.

Since the final sum of weighted character codes is taken modulo 10, a significant number of collisions occur. Particular difficulties are caused by collisions on pairs of characters that are difficult to distinguish by single-character recognition mechanisms in the recognition conditions of mobile devices from the camera (see clauses 3.1, 3.2). So, the same codes (taken modulo 10) have the characters' F 'and' P ',' H 'and' R ',' G 'and' 6, 'S' and '8'. In such fields as “document number” and “personal number”, both numbers and letters of the Latin alphabet can be found, and the checksum is the main method of validation. However, if at the single character recognition stage one of the characters from the above pairs was mistakenly recognized as another member of this pair, the checksum will not change, and the likelihood that after the post-processing the field recognition result will be greatly reduced.
The weights by which the character codes of the field being tested are multiplied also raise questions. For example, weights 7 and 3, applied to neighboring symbols, add up to 10. This means that the same symbols (or different symbols, but with the same modulo 10 codes) with weights 7 and 3 together will give a zero contribution to checksum, no matter what characters it is. This in turn means that if there is a local distortion in the photo or on the frame of the video stream on which recognition of the MRZ document occurs, due to which two adjacent characters were recognized with an error (for example, the pair of digits' 00 'was recognized as a pair of letters' OO '), and these two characters are in the field positions with weights 7 and 3, then they cannot be corrected with the checksum. This is especially pronounced for the fields “document number” and “personal number”, since these fields have the widest alphabet of all fields of the MRZ document (both numbers and letters are allowed in their records).

In order to increase the reliability of the mechanism of validation of sensitive data, ICAO 9303 contains a general checksum for some types of MRZ documents. However, the total checksum does not apply to the entire MRZ document, but only to those of its fields that are already protected by its own checksum.

As a result, from the point of view of language modeling in order to build mechanisms for correcting the recognition results of a MRZ document, some fields provided for by ICAO 9303 allow building sufficiently powerful contexts. However, for individual fields (such as “document number”, “personal number”), the definition of a more strict syntactic structure would increase the recognition quality, both in systems working with cameras of mobile devices, and in traditional systems based on scanners. Also, improving the quality and reliability of recognition of MRZ documents would allow the introduction of checksums for all significant fields, or for total checksums that apply to the entire document.

5. Conclusion

We have described to you the main problems that we had to face when developing our software Smart 3D OCR MRZ - Software Developer Kit for offline recognition of MRZ documents on mobile devices. In the future, we plan to provide you with an overview article on the architecture and a number of articles on the algorithms that we use in our developments related to the recognition of documents in a video stream.

Source: https://habr.com/ru/post/259649/

All Articles