
Flexible, very flexible forms... in ABBYY FlexiLayout Studio

When I was first hired as a technical writer at ABBYY, I could hardly imagine the scale of the system I would have to describe. No joke: the general name ABBYY FlexiCapture covered four different products (ABBYY FlexiCapture, Scan Station, ABBYY FlexiLayout Studio, ABBYY FormDesigner), one of which had five different installation and usage options at the time. And as luck would have it, at the very first step I ran into an application that stands apart even within a system of that size. Or rather, every component is special in its own way and has its own charm. ABBYY FlexiLayout Studio, the subject of this article, stands out because it is the most difficult product for the user. Strictly speaking, even the word “user” is not quite appropriate here: a person who works with ABBYY FlexiLayout Studio is closer to a programmer.

ABBYY FlexiLayout Studio is designed for developing flexible descriptions. What is a flexible description, and how does it differ from a rigid one? A rigid description corresponds to a standard form. Before they are filled in, all copies of such a document are identical when "held up to the light", as they say: if you stack them, the same fields end up in the same places. It is enough to define the coordinates of those places, and the field values will be recognized during processing. Everything is simple and clear.

But the situation is not always that simple. Many documents from which you want to extract data are not rigid forms. For example, ATM checks from different banks contain broadly the same kinds of information, but they differ in where it is placed and often in size as well. Creating a rigid template for such documents is, of course, impossible. So what can be done?

You can use ABBYY FlexiLayout Studio 10 to create a flexible description. A flexible description lets you work not with the coordinates of the fields you want to extract, but with their position relative to reference elements and to each other, as well as with the type of data to be recognized and its possible structure. For example, take two scanned ATM checks:





Obviously, the layouts are not the same. On the other hand, some elements are common: the bank name at the top, the date and time of the operation below it, then the amount withdrawn and the balance (labeled differently, though), plus technical information. Suppose we want to automate entering information from checks into some program. We are interested in the date and time of the withdrawal, the amount, and the balance. For simplicity, we assume that only two types of checks will be processed: Raiffeisen Bank and Sberbank.

First we need to find the reference element. In our case, this is the name of the bank: Raiffeisen BANK or OJSC Sberbank of Russia. To do this, we create a Header element (the header) and, inside it, a BankName subelement of the Static Text type, and specify that it must contain one of the two bank names. This element must be required: it identifies the check, and if it is not found, further processing makes no sense.
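
To make the idea of a required reference element more concrete, here is a minimal Python sketch of the same logic applied to already recognized text lines. It is not FlexiLayout Studio's own language or API: the names `BANK_NAMES`, `find_bank_name` and `process_check` are hypothetical, and the case-insensitive, whitespace-tolerant matching is an assumption made for the illustration.

```python
from typing import List, Optional

# The two accepted header texts from this article's example.
BANK_NAMES = ("Raiffeisen BANK", "OJSC Sberbank of Russia")


def find_bank_name(lines: List[str]) -> Optional[str]:
    """Return the first known bank name found in the text lines, or None."""
    for line in lines:
        # Assumption: ignore case and collapse repeated whitespace before comparing.
        normalized = " ".join(line.split()).casefold()
        for name in BANK_NAMES:
            if name.casefold() in normalized:
                return name
    return None


def process_check(lines: List[str]) -> dict:
    """Reject the document if the required BankName element is missing."""
    bank = find_bank_name(lines)
    if bank is None:
        raise ValueError("required element BankName not found")
    return {"BankName": bank}
```

If neither bank name is present, `process_check` stops immediately, which mirrors the point above: without the reference element there is nothing to anchor the other fields to.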





Next, in the header we can create a separate Date element of the Date type to find the date of the transaction, specifying that it is located below the bank name. Our case is very simple: the date format is almost the same on both checks, DD-MM-YY(YY), and differs only in the separator and in how the year is written. So we specify that the order is always day-month-year, that the year can consist of two or four digits, and that the separator can be a slash or a period. We also limit the allowed date values to the range from January 1, 2010 (suppose older checks do not need processing) to December 31, 2100 (ATM checks are unlikely to still be relevant by then). The range makes the date search more reliable.
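
The constraints on the Date element can be approximated in ordinary code. The Python sketch below is only an illustration of the rules described above, not what the program does internally: `DATE_RE` and `find_date` are hypothetical names, and two choices are assumptions of the sketch, namely that two-digit years mean 20YY and that the same separator is used twice within one date.

```python
import re
from datetime import date
from typing import Optional

# Day, separator (slash or period), month, same separator, 4- or 2-digit year.
DATE_RE = re.compile(r"\b(\d{2})([/.])(\d{2})\2(\d{4}|\d{2})\b")

MIN_DATE = date(2010, 1, 1)     # older checks are not processed
MAX_DATE = date(2100, 12, 31)   # upper bound from the article


def find_date(text: str) -> Optional[date]:
    """Return the first day-month-year date in the allowed range, or None."""
    for match in DATE_RE.finditer(text):
        day, month = int(match.group(1)), int(match.group(3))
        year = int(match.group(4))
        if len(match.group(4)) == 2:
            year += 2000  # assumption: two-digit years belong to the 2000s
        try:
            candidate = date(year, month, day)
        except ValueError:
            continue  # e.g. month 13 or day 45: not a real date, keep looking
        if MIN_DATE <= candidate <= MAX_DATE:
            return candidate
    return None
```

The range check is what discards look-alike strings: a sequence of digits that happens to fit the pattern but falls outside 2010 to 2100 is skipped rather than accepted.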

Then we create a Time element of the Character String type to find the time of the operation, since ABBYY FlexiCapture 10 does not yet support a dedicated format for this. On both checks the time is located to the right of the date and has the form NN:NN:NN, which is how we describe this field.
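
As a rough illustration of describing the time as a character pattern located to the right of the date, here is a small Python sketch. The names `TIME_RE` and `find_time_right_of` are hypothetical, and "to the right" is simplified to "later in the same text line", which is an assumption of the sketch and not how the program measures positions on the page.

```python
import re
from typing import Optional

# NN:NN:NN, restricted to valid hours, minutes and seconds.
TIME_RE = re.compile(r"\b([01]\d|2[0-3]):([0-5]\d):([0-5]\d)\b")


def find_time_right_of(line: str, date_end: int) -> Optional[str]:
    """Search for a time only in the part of the line after index date_end."""
    match = TIME_RE.search(line, date_end)
    return match.group(0) if match else None
```

Here `date_end` is the index where the already found date ends, so a time printed before the date on the same line is deliberately ignored.
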
To find the amount withdrawn and the balance, you could use a similar approach: create a Static Text element containing the row label and look for a Currency element with the sum to its left. But it is easier to use the ready-made Labeled Field element, which is simply a combination of a label and a data field. We create two such elements, Amount and Balance, and specify the label “Sum” for the first and “Balance” or “Available balance” for the second. We indicate that the data fields are of the Currency type.
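
The label-plus-value idea behind Labeled Field can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program's implementation: `LABELS`, `AMOUNT_RE`, `find_labeled_amount` and `extract_amounts` are hypothetical names, the value is assumed to sit on the same text line as its label, and amounts are assumed to use a period or comma before two decimal digits, with optional spaces between digit groups.

```python
import re
from decimal import Decimal
from typing import Dict, List, Optional

# Digits with optional spaces between groups and an optional 2-digit fraction.
AMOUNT_RE = re.compile(r"\d[\d ]*(?:[.,]\d{2})?")

# Field name -> accepted labels, longest first so the most specific label wins.
LABELS: Dict[str, List[str]] = {
    "Amount": ["Sum"],
    "Balance": ["Available balance", "Balance"],
}


def find_labeled_amount(lines: List[str], labels: List[str]) -> Optional[Decimal]:
    """Find a line containing one of the labels and parse the amount on it."""
    for line in lines:
        for label in labels:
            hit = re.search(rf"\b{re.escape(label)}\b", line, re.IGNORECASE)
            if hit is None:
                continue
            # Look at the rest of the line, on either side of the label.
            rest = line[:hit.start()] + " " + line[hit.end():]
            amount = AMOUNT_RE.search(rest)
            if amount:
                raw = amount.group(0).replace(" ", "").replace(",", ".")
                return Decimal(raw)
    return None


def extract_amounts(lines: List[str]) -> Dict[str, Optional[Decimal]]:
    """Extract every labeled field defined in LABELS."""
    return {name: find_labeled_amount(lines, labels)
            for name, labels in LABELS.items()}
```

Keeping the accepted labels in one table is what lets the same lookup handle both banks: supporting a differently worded balance line only means adding another label string.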

After that, all that remains is to add blocks, the actual areas from which information will be extracted. In our simple case, they coincide with the Date and Time elements and with the Field subelements of the Amount and Balance elements. We check that everything is recognized correctly, export the description to ABBYY FlexiCapture 10, and voila: you no longer need to enter withdrawal dates manually.

This is what the main window of the program looks like when manually creating a flexible description:



On the right are the document images, with the hypothesis trees below them (there you can trace exactly how the fields are searched for). On the left, from top to bottom: the list of pages with information about them, the FlexiLayout structure, and the properties of the selected element.

This is, of course, the simplest case. In reality, things are usually much more complicated, and the program's capabilities are very broad. You can create classifiers for quick sorting of documents. You can create multi-page descriptions. You can create and train flexible descriptions automatically. You can use a special language for programming flexible descriptions. There is a lot more: the ABBYY FlexiLayout Studio 10 help runs to hundreds of pages. If you are interested, we will cover these features in future posts, since this one has already turned out rather long.

Note: by the way, the personal information in the pictures was hidden manually after the screenshots and scans were made. But ABBYY FlexiCapture 10 (not ABBYY FlexiLayout Studio 10) can, if necessary, black out such fields automatically.
Happy birthday, Dimych!
Pavel Sokolov
Department of Data Entry Products

Source: https://habr.com/ru/post/125851/

