
Recognizing the RF Passport on a mobile phone (UPD 03/28/2015: the program is now also on the App Store)

Today it is hard to find a person who has never, directly or indirectly, encountered document recognition. In a world where any matter of any seriousness requires personal identification, we constantly hear “May I see your passport?” so that our data can once again be entered into a computer to check whether we are allowed in, whether we have any unpaid debts, and so on.



Of course, the giants of data recognition could not pass by such a well-posed task in our age of global automation. Today there are many programs and hardware-software systems (both from large companies and from relatively new players in this market) that solve this particular practical problem. Despite the local differences between the proposed solutions (some recognize better, some have a more thoughtful and modern interface, some are easier to integrate, some are cheaper or more expensive), all existing software solves the problem in essentially the same way: a passport image is obtained with a scanner and then recognized on a personal computer. This approach lets you enter passport data in 2 to 26 seconds (depending on the performance of the scanner), which is tens of times faster and more reliable than manual input.

It would seem that the problem is solved! But how often do we actually see such “smart” passport recognition solutions in real life? Alas, in post offices, many banks, and even police stations (which probably deal with passports more often than anyone else), passport data is still entered manually. What is the stumbling block? Why is such a reliable, high-quality solution to a specific applied problem not used everywhere?

To understand the essence of the problem, let us turn to another innovative technology that has no direct relation to recognition: digital photography. Recall the 1990s, when the first consumer digital cameras began to appear on the market. It seemed that happiness had arrived: no film, instant viewing of the pictures taken, easy storage of photos. Just shoot and enjoy. In practice, most people used their cameras as rarely as before: on vacation, at celebrations, and at memorable events. The real photo boom happened when the camera appeared in the smartphone. Digital photography immediately got a second life and gained enormous popularity. Many other technologies in completely different areas went through the same thing: maps and navigation, Wi-Fi, social networks, and much more.

Let us return to document recognition and try to draw a parallel. Perhaps the low popularity of passport recognition systems is connected with the inconvenience of the process itself rather than with its quality? Indeed, it is hard to imagine a district police officer setting up a laptop and a scanner on a lawn to check a migrant's documents. It would be quite another matter if a passport could be recognized and checked right in the hand, using some compact device that is always at hand (for example, a smartphone). That is how we came up with the idea of writing an ID document recognition program for a mobile phone. And of course, we decided to start with the passport of a citizen of the Russian Federation.



To make the rest more interesting to read, we will show our application in action. Federal Law 152-FZ forbids us from publishing images of real passports, so for demonstration purposes we use a synthesized passport image from Wikipedia, printed on paper.

Formulation of the problem


So, the final goal is to recognize the passport of a citizen of the Russian Federation on a mobile phone. Stated this way, however, the task is very vague. Let's clarify it by setting constraints “along the axes”, forming something like a set of technical requirements.

Target platform. We need an application that works on modern Android devices as well as on the Apple iPhone 5s and later. These restrictions came from analyzing the current state of the mobile device market. An important point is that the program must perform recognition on the mobile device itself, rather than act as a thin layer that captures images, sends them to the cloud, and gets the result back. And the reason is not the slow mobile Internet, as it may seem at first glance. In our country the federal law “On Personal Data” (152-FZ) is in full force, and it strictly regulates the processing of personal data. Under this law, the requirements for all private and state-owned companies and organizations, as well as for individuals who store, collect, transmit, or process personal data (including last name, first name, and patronymic), are substantially increased in Russia. Therefore, from the point of view of the law, the sooner a recognition program forgets personal data, the better (and it is certainly not worth sending either the data itself or the passport image anywhere).

Object of recognition. In most applications the client mainly needs the passport series and number, photo, last name, first name, patronymic, gender, and date of birth. All of this data is located on the third page of the passport (according to the page numbering). Therefore, we first solve the problem of recognizing these “main” fields, that is, the problem of recognizing the third page of the passport of the Russian Federation.

Input data. In contrast to the classical approach (recognition of a scanned image), a smartphone provides a video sequence. Combining recognition results from different frames can significantly improve the quality of the system as a whole. This advantage holds, however, only if individual frames can be processed very quickly, which leads directly to the question of performance.

Performance. According to our competitors, the best passport recognition software today copes with this task in about 1-3 seconds on a computer of average performance, excluding scanning time. We therefore set ourselves the goal of solving the problem on a mobile phone in no more than 3 seconds. At the same time, we want to process data at a rate of at least three frames per second on devices like the Apple iPhone 5s. In other words, the average processing time of one frame should not exceed roughly 0.3 seconds. If we recall that one frame consists of approximately 2 million pixels, and that recognition runs on devices much weaker than an average PC (see Table 1), the task looks all but unsolvable. I admit, we had to sweat a lot over code optimization and the development of fast algorithms before we reached that speed. Later we will write a separate post about approaches to optimizing recognition programs on mobile devices. For now I will only recall that a year ago we boldly answered “Challenge accepted” to this ambitious speed requirement.

Quality. Recognition quality is often the decisive factor when choosing a particular system. Therefore, at the very beginning of development we set ourselves a rather high bar: in the first version of the product, 95% of passports should be recognized correctly (excluding passports that cannot be recognized automatically). In general, assessing the quality of such recognition systems is a serious task in its own right, and we want to talk about it in future posts on Habr.
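To put the performance requirement above into numbers, here is a small sketch of the per-frame budget arithmetic. The function names and the 1920x1080 frame size are illustrative assumptions (the text only says that a frame has approximately 2 million pixels).

```python
def frame_budget_seconds(frames_per_second: float) -> float:
    """Maximum average processing time per frame for a target frame rate."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return 1.0 / frames_per_second


def nanoseconds_per_pixel(budget_seconds: float, width: int, height: int) -> float:
    """Average time available per pixel if every pixel is touched once."""
    return budget_seconds * 1e9 / (width * height)


# Assumed frame size: 1920x1080, i.e. roughly 2 million pixels.
budget = frame_budget_seconds(3)  # three frames per second
per_pixel = nanoseconds_per_pixel(budget, 1920, 1080)

print(f"budget per frame: {budget:.2f} s")
print(f"time per pixel:   {per_pixel:.0f} ns")
```

At three frames per second the budget is about a third of a second per frame, which the text rounds to 0.3 seconds. Under the assumed frame size this leaves on the order of 160 nanoseconds per pixel for the whole pipeline, which illustrates why the speed requirement was so demanding.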

New problems with recognition on the smartphone


As our colleagues from various organizations have repeatedly stressed, recognizing an RF passport is an extremely difficult task. The complexity comes both from the security elements of the passport form itself (guilloche background, holographic elements, glossy laminate film) and from the high variability of the filled-in data (inaccurate printing of personal data, non-standard fonts, mechanical damage).

However, when a passport is recognized on a phone, fundamentally new problems, never encountered when working with a scanner, are added to all of the above:


Besides the new problems that emerge at the image acquisition stage, no less serious difficulties await us further on. For example, the coarse localization and identification of the document in the frame becomes relevant. Unlike a scanned image, when recognizing a video sequence you need to make sure that the target document is actually present in each incoming frame. Moreover, this problem usually has to be solved before projective normalization.

Moving on. To position text lines accurately, it is necessary to find the boundaries of the passport and define a projective basis. This requires detecting straight boundaries, corners, rounded corners, and other primitives in noisy conditions, then generating and selecting the candidate document boundaries that best fit the model. Once the projective basis is determined, the image zone must be projectively corrected and the fields positioned.
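As an illustration of the projective correction step, the sketch below computes a 3x3 projective transform (homography) from four corner correspondences and maps points through it. This is the textbook four-point construction in plain Python, not necessarily the method used in our product; the corner coordinates and the 1250x880 target rectangle are made-up values, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so that row operations touch both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_corners(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Projective transform that maps four src points onto four dst points."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four point pairs are required")
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, with h33 fixed to 1.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear_system(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h: List[List[float]], point: Point) -> Point:
    """Map a point through the homography, including the projective division."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


# Hypothetical page corners found in a frame (clockwise from top-left)
# and the rectangle they should be normalized to.
corners_in_frame = [(212.0, 140.0), (1710.0, 95.0), (1760.0, 980.0), (180.0, 1010.0)]
page_rectangle = [(0.0, 0.0), (1250.0, 0.0), (1250.0, 880.0), (0.0, 880.0)]

h_matrix = homography_from_corners(corners_in_frame, page_rectangle)
for corner in corners_in_frame:
    print(corner, "->", apply_homography(h_matrix, corner))
```

With such a transform in hand, every field zone defined on the normalized page can be located in the frame (or the frame can be resampled into the normalized page), which is what makes field positioning possible.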

Now we are ready to recognize. Data recognition requires special optical recognition methods for both individual characters and text fragments. A distinctive feature of video stream processing is the rather low source resolution (no more than 150-200 DPI) combined with noise and distortion, in particular glare and uneven illumination, defocus, and motion blur.
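To give a feeling for where a figure like 150-200 DPI comes from, here is a rough sketch of the effective-resolution arithmetic. The page width of 125 mm and the 900-pixel span are assumptions chosen purely for illustration, and the function name is hypothetical.

```python
MM_PER_INCH = 25.4


def effective_dpi(pixels_across: float, physical_mm: float) -> float:
    """Dots per inch when a physical length is imaged onto a given number of pixels."""
    if physical_mm <= 0:
        raise ValueError("physical_mm must be positive")
    return pixels_across / (physical_mm / MM_PER_INCH)


# Assumption: the long side of the page (taken as 125 mm) spans 900 pixels in the frame.
dpi = effective_dpi(900, 125.0)
print(f"effective resolution: {dpi:.1f} DPI")
```

The farther the phone is from the document, the fewer pixels the page spans and the lower the effective resolution, so in practice the figure varies from frame to frame.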

Once all the difficulties of processing an individual frame have been overcome, new tasks arise that concern the recognition of the video sequence as a whole: context analysis and integration of the results. This topic is very interesting, and we will definitely devote a separate article to it. For now, we limit ourselves to announcing that such tasks exist.
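As a very simple illustration of what integrating results across frames can mean, the sketch below combines several per-frame readings of one field by majority vote at each character position. This is a deliberately naive scheme shown only to make the idea concrete, not a description of our actual integration algorithm; the function name and the sample readings are hypothetical.

```python
from collections import Counter
from typing import List


def combine_field(readings: List[str]) -> str:
    """Combine per-frame readings of one field by per-character majority vote.

    Readings whose length differs from the most common length are discarded,
    then each character position is decided by a vote among the rest.
    """
    if not readings:
        return ""
    common_length = Counter(len(r) for r in readings).most_common(1)[0][0]
    candidates = [r for r in readings if len(r) == common_length]
    return "".join(
        Counter(chars).most_common(1)[0][0] for chars in zip(*candidates)
    )


# Hypothetical readings of the same surname field from four frames,
# each with a different single-character error or none at all.
frames = ["IVANOV", "IVAN0V", "IVANOV", "1VANOV"]
print(combine_field(frames))
```

Even this naive vote shows why fast per-frame processing matters: the more frames are recognized within the time limit, the more independent readings are available to outvote an occasional error caused by glare or blur.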

Conclusion


Thus, while solving the seemingly “simple” task of recognizing the passport of a citizen of the Russian Federation, we faced more than a dozen interesting problems, not only in computer vision but also in efficient software architecture and in writing high-performance programs for mobile devices.

This post is introductory in nature and gives our dear readers a general picture of our tasks, problems, and interests. We will definitely continue this series of publications on Habr with posts about concrete scientific and technical results, in which we will describe solutions to individual subtasks of document recognition (and more) on mobile devices.

As for the ready-made solution for recognizing the RF passport on a mobile device, we are happy to tell you that the recognition demo program can be downloaded right now for Android ( Smart PassportReader at Google play ) and for iOS ( Smart PassportReader at the App Store ). And if, by the nature of your work, you are interested in the SDK of our product and would like to try it hands-on and embed it in your mobile applications, write to us at support@smartengines.biz. We will be happy to tell you how to do this and to answer any other questions you may have.

And finally, a few screenshots of our program for the Apple iPhone.

Source: https://habr.com/ru/post/252703/

