PassportVision - an easy way to recognize documents

You have surely been in this situation: you go to some institution (a post office, bank, hospital, cash desk, etc.) where you have to present your passport to get things done. The queue looks short, only five people, but the wait drags on because Aunt Masha spends several minutes typing the data from each passport into the computer. All you can do is watch her index finger slowly hover over the keyboard in search of the next key.

We were saddened by this state of affairs in modern society and wrote PassportVision, a program that recognizes data from various documents and presents the result to the user in a convenient form. The task turned out to be harder than it looks at first glance: while working on the project, we learned a lot about Russian documents, computer vision and user interfaces. Our heads are already full of ideas for the program's future development, but we decided to take the time to share the experience and knowledge we have gained.

In today's issue:

Popular misconceptions about passports
A little about the technologies used
Our interface approach
What is the best way to give data to the user

Popular misconceptions about passports

It would seem: what is so hard about recognizing data from a passport? After all, it is 2014, and humanity has long since learned to recognize text in images. Apply a couple of filters, call recognize(), and you're done!

Alas, it is not that simple. First of all, the passport of a citizen of the Russian Federation is a document with its own peculiarities. We had no idea how surprising passports could be until we had looked at a few hundred of them. So here is a list of misconceptions, in the best tradition of the lists of misconceptions about names and time.

All these assumptions are wrong.

Passport text is always typed in one font.
There are only two or three fonts used to type passports.
Okay, but the font can't be bold or italic.
Well, at least it won't be bold and italic at the same time.
All the data in one passport is typed in the same font.
Nobody fills in a passport by hand.
Every passport field is in a strictly defined place.
Every passport field is at least near its strictly defined place.
Well, at least the surname won't drift three lines down to the patronymic.
Data is never printed over the form's preprinted captions.
Well, at least the fields won't overlap each other.
Text is always horizontal.
Okay, but the text is never tilted by more than 10-15 degrees.
All the data is tilted at the same angle.
Text is always black.
Well, at least all the labels are exactly the same color.
A passport always has a machine-readable zone.
The machine-readable zone is always present in passports issued after July 1, 2011.
The machine-readable zone is always correct and meets the standards.
All letters in a passport are uppercase.
All the data in one passport uses the same letter case.
All dates are always in the same format.
A name can't be split across two lines with a hyphen.
All passports are printed on identical forms with a fixed background.
The place of birth and the place of issue can't contain digits.
The holder's signature can't overlap the passport data.
All passport data is always present.
There is no extra data.
The place of issue always takes up exactly three lines.
If the "Name" field says "Anna", the holder is necessarily female.
Passport data doesn't fade over time.
A passport has no mechanical damage.
A passport can't be stained with jam.
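The items about the machine-readable zone are worth taking seriously in code. Assuming the zone follows the ICAO Doc 9303 check-digit scheme (weights 7, 3, 1 repeating; digits keep their value, letters A to Z map to 10 to 35, the filler `<` counts as 0), a recognizer can cheaply flag fields whose check digit does not match, whether the cause is an OCR error or a zone that was printed incorrectly in the first place. A minimal Python sketch of that check, not PassportVision's own code:

```python
def mrz_char_value(ch: str) -> int:
    """Numeric value of one MRZ character under the ICAO 9303 scheme."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10, B=11, ..., Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, taken modulo 10."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(c) * weights[i % 3] for i, c in enumerate(field))
    return total % 10


def field_is_consistent(field: str, check: str) -> bool:
    """True if the printed check character matches the computed digit."""
    return len(check) == 1 and "0" <= check <= "9" and mrz_check_digit(field) == int(check)


# Document number, birth date and expiry date from the ICAO specimen passport.
print(mrz_check_digit("L898902C3"))  # 6
print(mrz_check_digit("740812"))     # 2
print(mrz_check_digit("120415"))     # 9
```

A mismatch alone cannot tell you whether the OCR or the printed zone is at fault, so it is better treated as a reason to mark the field as suspicious than as grounds to overwrite the data read from the rest of the page.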

A little about the technologies used

Today there really will be little about technology. While writing the program, we ran into many interesting problems, both in computer vision and in software architecture and the organization of our working environment. We plan to cover this in a separate post; today we will only outline the technology stack.

The main programming language is C#. Some may find this choice strange, but it has fully justified itself: C# is great for building a large-scale architecture, and laying out a complex interface is fairly easy (thank you, creators of WPF). Our target audience works on Windows, so the platform is not a problem. We handle various minor internal tasks with Python scripts.

We use OpenCV for image processing and Tesseract for text recognition (more precisely, their .NET wrappers, OpenCvSharp and Tesseract). Performance-critical algorithms are written in C++, but there are few of them. We also work a lot with scanners and have to support both WIA and TWAIN (for the latter we use the TwainDotNet wrapper).

Workflow tools: version control, Git; repository browser, FishEye; continuous integration, Bamboo; bug tracker, JIRA; code review, Crucible (yes, we love Atlassian).

Our interface approach

Everything would be fine if we could recognize data from any passport with absolute certainty. Alas, we can't. Given a good 300 dpi scan of a passport, the program will most likely recognize it without errors. But a poor-quality photo of a passport (especially one where the text is very pale and tilted and the fields are out of place) will cause problems. So we have no choice but to ask the user to check the recognized data. This makes it crucial to design a really good, user-friendly interface that lets you check all the data quickly and correct errors: if checking a passport takes about as long as typing it in by hand, the whole idea loses its point. We don't claim to have built the best interface in the world for this task, but we can say with full responsibility that it is a good one. After all, we eat our own dog food: we have been using our data checker every day for more than a year, so every rough edge and inconvenience quickly caught our eye and begged to be fixed. Here is what we paid special attention to in order to make life easier:

Navigation between fields. The main window shows the detected passport image on the left and the recognized fields on the right. Each field is outlined with a frame on the passport image, and clicking the frame takes you to the corresponding field. If the same field appears in the image in several places (for example, the passport series and number can be found as many as three times) and the recognition results differ, we turn the TextBox into a ComboBox so that you can easily pick the right option.
Suspicious characters. During recognition, we mark some characters as suspicious: these are characters whose recognition we are not very confident about. If there are errors, they are almost certainly among the suspicious characters. So we highlight them in red and provide easy navigation: a hotkey takes you to the next or previous group of suspicious characters. The navigation is fairly smart, too: if almost all the characters in a field are suspicious, jumping to the error selects the whole field rather than just the red letters.
Tooltips. Originally, to check a field you had to look at the field, then look at the passport image and find the same field there, then go back to the TextBox, then back to the passport, and then think and compare the text. That is slow and uncomfortable. So right next to the TextBox we show a tooltip with the corresponding fragment of the passport. To make finding specific characters even easier, we outline the text the user selects on the image. If the user has already typed some characters of their own into the field, the program is quite good at working out where they should be on the image and still outlines them.
Auto-generated form. What if the user doesn't need all the fields? Say you only need the full name. Why force the user to check all the results? Fortunately, the list of fields can be customized to show only the ones you actually need.
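The suspicious-character navigation can be sketched roughly as follows. This is a hypothetical Python illustration, not the program's C# code: it assumes the recognizer reports a per-character confidence between 0 and 1, and `threshold` (below which a character counts as suspicious) and `whole_field_ratio` (the share of suspicious characters above which the whole field is selected) are made-up parameters with arbitrary defaults. The `cursor` passed to the navigation functions is assumed to be the start index of the current selection, or -1 before the first jump.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Span:
    start: int  # index of the first character in the group (inclusive)
    end: int    # index just past the last character (exclusive)


def suspicious_spans(confidences: Sequence[float], threshold: float = 0.6) -> List[Span]:
    """Group runs of consecutive low-confidence characters into spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, conf in enumerate(confidences):
        if conf < threshold:
            if start is None:
                start = i  # a new suspicious run begins here
        elif start is not None:
            spans.append(Span(start, i))  # the run ended at the previous character
            start = None
    if start is not None:
        spans.append(Span(start, len(confidences)))  # run reaches the end of the field
    return spans


def selection_for_field(confidences: Sequence[float], threshold: float = 0.6,
                        whole_field_ratio: float = 0.7) -> List[Span]:
    """Spans that hotkey navigation should visit within one field."""
    if not confidences:
        return []
    spans = suspicious_spans(confidences, threshold)
    suspicious = sum(s.end - s.start for s in spans)
    if suspicious / len(confidences) >= whole_field_ratio:
        # Almost everything is doubtful: select the whole field instead of fragments.
        return [Span(0, len(confidences))]
    return spans


def next_span(spans: List[Span], cursor: int) -> Optional[Span]:
    """First span starting after `cursor`, wrapping around to the first one."""
    for s in spans:
        if s.start > cursor:
            return s
    return spans[0] if spans else None


def prev_span(spans: List[Span], cursor: int) -> Optional[Span]:
    """Last span starting before `cursor`, wrapping around to the last one."""
    for s in reversed(spans):
        if s.start < cursor:
            return s
    return spans[-1] if spans else None
```

Grouping consecutive characters, rather than stepping one character at a time, keeps the number of hotkey presses proportional to the number of problem spots instead of the number of bad characters.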

All in all, we spent a very long time polishing the UI, and much of what we did can't be described briefly. We will write a separate post describing all the usability decisions in the program in detail. If you design interfaces, you will probably be interested not only in the final result but also in the process: how we arrived at this interface and why we did it this way and not otherwise.

What is the best way to give data to the user

It would seem that everything is ready: the image has been processed, the text recognized, and the user has checked everything. What else do we need to be happy? Let's think: if the data is being recognized, someone needs it. This someone probably plans to use it later, presumably in some program of their own. So the recognized data must somehow be passed to that program. The logical question is: how?

PassportVision Office. Alas, there are many programs, and each one needs its own approach. A survey showed that most of our target audience enters passport data into documents prepared in MS Word. So we made a separate edition of PassportVision for working with Word.

The idea is this: a special tab added to the ribbon lets you create a template with special markers. Once the data has been recognized and the user has checked it and clicked OK, the data is inserted into the template in place of the prepared markers, as if by magic. Documents often involve several parties, so for each marker you can specify which person's passport it refers to. Put the first passport in the scanner, then the second, and the document is ready! The markers are fairly smart: for a date, for example, you can specify the display format, and the endings of individual words can depend on the person's gender.
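To make the marker idea concrete, here is a small Python sketch of template substitution. The marker syntax (`{{1.surname}}`, `{{1.birth_date:%d.%m.%Y}}`, `{{1.gender:he|she}}`) is invented for this example; the article does not describe how the Word edition actually stores its markers. The number before the dot selects the person, so one template can serve several parties, and the part after the colon carries either a date format or the male and female word forms.

```python
import re
from datetime import date

# Hypothetical marker syntax: {{<person>.<field>}} or {{<person>.<field>:<option>}}
MARKER = re.compile(r"\{\{(\d+)\.(\w+)(?::([^}]*))?\}\}")


def render(template: str, people: dict) -> str:
    """Replace every marker with data from the matching person's passport."""

    def substitute(match) -> str:
        person = people[int(match.group(1))]  # which party the marker refers to
        field, option = match.group(2), match.group(3)
        if field == "gender":
            # option holds "<male form>|<female form>"
            male_form, female_form = option.split("|", 1)
            return male_form if person["gender"] == "M" else female_form
        value = person[field]
        if isinstance(value, date):
            # option is a strftime pattern; fall back to DD.MM.YYYY
            return value.strftime(option or "%d.%m.%Y")
        return str(value)

    return MARKER.sub(substitute, template)


people = {
    1: {"surname": "Ivanova", "gender": "F", "birth_date": date(1985, 3, 7)},
    2: {"surname": "Petrov", "gender": "M", "birth_date": date(1990, 12, 31)},
}
print(render("{{1.surname}} ({{1.birth_date}}), {{1.gender:he|she}} agrees", people))
# Ivanova (07.03.1985), she agrees
```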

PassportVision Adaptive. Alas, not everyone uses Word; the world is full of other programs into which data gets entered. Writing a separate edition for each of them is impractical, so we made a universal edition that can be adapted to any application. PassportVision Adaptive emulates the user's actions: which button to press, where to click. You only need to create a special macro that tells the program what you would do yourself with the recognition results. Admittedly, building a macro for complex software may not be easy, but it is a one-time operation. Once everything is set up, the data lands in the right places of your target program at the press of special hotkeys. And if you do run into trouble writing a macro, we help all our clients with it.
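A macro of this kind can be thought of as a recorded list of steps in which some steps are placeholders for recognized fields. The Python sketch below is a hypothetical model of that idea, not PassportVision Adaptive's format: the step types `Click`, `Keys` and `Field` are assumptions, and instead of sending real input to another window it only expands the macro into a list of actions that an OS-level input layer would then replay.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple, Union


@dataclass(frozen=True)
class Click:
    x: int  # screen coordinates recorded when the macro was created
    y: int


@dataclass(frozen=True)
class Keys:
    text: str  # literal keystrokes, e.g. "{TAB}" to move to the next input


@dataclass(frozen=True)
class Field:
    name: str  # name of a recognized passport field to type at this point


Step = Union[Click, Keys, Field]
Action = Tuple[object, ...]


def expand(macro: Sequence[Step], record: Dict[str, str]) -> List[Action]:
    """Turn a macro plus one recognized passport into concrete input actions."""
    actions: List[Action] = []
    for step in macro:
        if isinstance(step, Click):
            actions.append(("click", step.x, step.y))
        elif isinstance(step, Keys):
            actions.append(("keys", step.text))
        elif isinstance(step, Field):
            # Missing fields become empty input rather than aborting the macro.
            actions.append(("type", record.get(step.name, "")))
        else:
            raise TypeError(f"unknown macro step: {step!r}")
    return actions


macro = [Click(120, 45), Field("surname"), Keys("{TAB}"), Field("name"), Field("patronymic")]
for action in expand(macro, {"surname": "Ivanov", "name": "Ivan"}):
    print(action)
```

Keeping the macro separate from the data is what makes it a one-time setup: the same step list is replayed for every new passport, and only the `record` changes. Missing fields expand to empty input because, as the list of misconceptions notes, not all passport data is always present.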

PassportVision SDK. Some users want to use the recognition results in their own software. If you are a developer, you can use a dedicated API to get all the data in the format you need. If your application is not built on .NET, don't worry: we carefully wrapped the API in COM, so the SDK can be used from C++ and Delphi.

Other editions of PassportVision. Development is in full swing, and we are working on many different editions of the program so that everyone can choose the solution that suits them. For example, a version for 1C is coming soon (passport data often needs to be entered there too), and the Adaptive edition will gain special macros for web forms (you just specify which fields to fill in, and JavaScript magic does the rest).

Instead of a conclusion

If you want to use our program to spare people from typing in passport data by hand and make the world a better place, contact us and we will tell you more about PassportVision. And if you are not interested in workflow automation but are curious how software development works in different companies, upcoming posts will cover our work organization, computer vision and usability approaches in detail. Development is actively continuing: we are now adding support for other types of documents (passports, birth certificates, etc.; they already work in alpha), new product editions and cool features. Along the way we run into some very interesting technical problems that call for creative solutions. If there is interest, we will also publish posts about solving the most interesting of them; we hope someone will benefit from this experience.

Source: https://habr.com/ru/post/219535/
