
We have already
talked about the recognition of text from a video stream, its advantages in comparison with photo processing and scenarios, where it is especially useful.
Today we are launching the
ABBYY Real-Time Recognition SDK for Android and iOS mobile platforms. Therefore, we want to talk about the features of data recognition on a mobile device, namely, about extracting information in a video stream using the example of one of the most complex documents - a civil passport.
We all often have to use our passport information. You need a passport to register with a mobile bank or payment system, purchase tickets, or rent a car. Now many people use a smartphone for these tasks. Typing on a small keyboard of a mobile device is very inconvenient. A particularly unpleasant input field is the data on the place of issue of the passport: they usually take a couple of lines and contain many abbreviations.
')
Carsharing companies told us that the main part of potential customers "falls off" exactly at the stage of manual entry of these data. So, the problem is not that people are afraid to declassify information about themselves, but that they feel sorry for wasting time on the dreary reprinting of text from a passport. All smartphones have cameras that can be used to simplify and speed up the entry of information from documents. The task is - you need to solve.
In fact, ABBYY already has a solution for this task running on a PC. However, a modern smartphone, though powerful, is still not a computer. We needed to make a correction for the allowable size of the application (not more than 50 MB) and the available RAM. Therefore, the option "just transfer existing technology to mobile platforms" we shoal and started a new development.
Why is it difficult to recognize a passport?
It would seem that difficult to recognize a passport? The structure of the passport is known: on the “half with photo”, there is a vertical line on the side with a series and number, to the left of it - 6-7 horizontal lines with personal data. But the difficulty lies not in figuring out which field is where, but in correctly recognizing the desired text.
A passport is not the easiest document to recognize. There are a number of problems that all those who are trying to process it face sooner or later.
Defects of shooting
In this blog,
we have already discussed more than once what difficulties arise when recognizing a photograph, and not a scan. The main difficulty is that the camera of the smartphone is not a scanner :) When photographing, we hold the device (and sometimes the document being photographed) in our hands, and the hand may waver. The result is blurred text on a photo that is simply impossible to recognize.
In the case of a passport, the situation is complicated by the fact that the most necessary pages are laminated. So, when photographing them, glare will inevitably occur, which will make the text unreadable.
Figure 1. Blik completely closes the name (passport data is changed)Difficult background
To protect the document from forgery, the FMS officers add holograms and further complicate the task of recognizing and retrieving passport data.
I think, having seen a photo of my passport, many were indignant because the double-headed eagle hit the face and finally spoiled the photo. In the same way, he can “spoil” the surname, date of birth and other fields, turning them into barely readable text on a photo - this is a problem for recognition systems.
Figure 2. The “RF” hologram covers the middle name (passport data is changed)Text size and color
Another unpleasant moment. Captions to the data fields (for example, "Date of birth", "Middle name", etc.) are written in very small print. Even good cameras of smartphones do not always allow to get an image on which this text can be recognized qualitatively. As a result, signatures are turned into garbage from different symbols, which can no longer be a reference point for searching data fields and interfere with the analysis of results.
In addition, often there are documents that are clearly regretted paint. The text is so pale that it can hardly be read. Recognizing such text on a photo is even more difficult.
Figure 3. Poorly contrast text (passport data is changed)What can help in recognition?
So, recognizing a passport is not easy. But difficulties exist in order to overcome them. And we want to share some ideas useful for recognizing a passport on a mobile phone.
Recognition from the video stream
The first problem we mentioned is the difficulty of working with a photo due to the presence of glare, blurred text, etc. To cope with it, you can try to level these defects. For example, use the functions of the
ABBYY Mobile Imaging SDK to assess the quality of the image and its suitability for recognition. If it becomes clear that there are no good results for a photo, the application may ask the user to take a second shot.
Another way to deal with defects is to refuse to work with photos and go to video stream processing, i.e. sequences of images (frames) received from the camera device.
It is difficult for a person to keep his hands straight when shooting a passport. This means that the defects will “travel” through the frames: in one, the flare may be located in the area of the Name field, and in the other - somewhere in the photo of the passport holder. By recognizing different frames separately, and then combining the results for each field, you can significantly improve the accuracy of the extracted data. This is just a statistic: for each passport field we will have a whole set of values obtained after recognizing successive frames, for them we will be able to calculate the average value and use it as the final result, which will be returned to the user. It is obvious that the average will change when adding data of the new processed frames, and it is logical to return the result only after its stabilization. In our workings for evaluating stability, we use a probabilistic criterion and stop processing only when the probability of a change in the result becomes extremely small.
Recognition Dictionaries
When recognizing text, the machine goes through hypotheses and assesses how much a fragment of an image is similar to one or another symbol of a language. In the case of processing images of poor quality is likely to choose the wrong hypothesis. But we have the opportunity to “prompt” to the discriminator which of the available hypotheses is more likely by connecting dictionaries containing the necessary vocabulary. In the case of a passport, this gives a noticeable increase in the accuracy of processing, correcting such “classical” recognition errors, such as 'AND' -> 'TH', 'O' -> '0', etc.
Regular expressions for numeric fields (dates, codes, and numbers) will also help. Their use will allow to exclude situations where the numbers are confused with the letters ('B' -> '8', '' -> '3', etc.).
Filtering small text
Dictionaries will help to improve the quality of recognition of the main text, but they cannot cope with small reference text (field signatures). That he did not interfere, it must be filtered. You can do this, for example, based on its color and size. These characteristics are noticeably different for the font, which is written the basic data of the passport, but still with their use errors are possible. Therefore, the filtering criteria should not be too strict. It is better to additionally use the information about the coordinates of the found text and throw away the garbage symbols located at a considerable distance from the main data.
Normalization of Recognized Values
Unfortunately, it’s not always possible to achieve 100% quality of the recognition of the main text and completely filter out all the small interfering garbage, and the remaining errors strongly affect the quality of the extracted data. To improve the result, you can use special post-processing:
- For numeric fields, it suffices to specify a regular expression describing this field and discard “extra” characters that fall into the result but are not suitable for the format. For example, there is not enough space in the passports for the “Date of Issue” field, therefore, often an incorrectly recognized signature to the field can “stick” to the value itself. As a result, we will see, for example, the following result: ":: May 16, 2010". Simple normalization will allow you to leave only the correct value for this field: "05/16/2010".
In addition, when recognizing from a video stream, you can use format checks to speed up the process, completing processing when getting a stable result that matches the format. Our internal tests show that this approach allows us to speed up the recognition process by about 15% without significant quality drawdown compared to the “full stabilization” described above.
- For vocabulary fields, you can use additional checking and correcting errors in the dictionary. The dictionary feature will also be useful for speeding up the processing process — returning the result for the field where all the words have passed the test, without waiting for the “full stabilization” of the meaning.
Of course, this is not all ideas and solutions that can be useful when recognizing a passport. In our future articles, we will continue to talk about this problem.
Passport recognition is one of the many data entry tasks you need to solve on a smartphone. In practice, such tasks are much more. For example, now almost all banks in their mobile applications offer the service of payment of housing and communal payments. Automatic recognition of the data needed for this (subscriber code, personal account number, etc.) will significantly simplify and speed up the process.
We at ABBYY are aware of the importance of such tasks, and therefore our team is actively working on a specialized API that will allow developers to quickly and as easily as possible create mobile applications to extract data from any documents. And the
ABBYY Real-Time Recognition SDK is available now.
Keep for updates :)
Olga Titova,
product department for developers