What's new in ABBYY FineReader 11?

ABBYY FineReader 11 is released today, so let us tell you how it differs from version 10. First of all, we noticeably tuned the algorithms that locate text, pictures, and tables on a page: what we call “Document Analysis” and the rest of the world calls by the clear word “zoning”. Our main goal while working on the new version was to improve how the program “understands” the documents users encounter every day: books, contracts, magazines. One easily noticeable change is that FineReader 11 has learned to detect vertical headers and footers.

As mentioned above, the new FineReader detects blocks of different types more precisely, which helps it “assemble” lines of text correctly. For example, the previous version was sometimes stumped by the ultra-fashionable book layout with author's notes in the margins:

With this block selection, the program decided that the lines in the second column continued the lines in the first, so the text was assembled incorrectly.

Now we know that such books exist and have taught our offspring about them. As a result, the blocks are detected correctly.

By the way, the images above also show another step FineReader has taken toward perfection: tables are now split into cells more accurately. On average, compared with version 10, the number of table segmentation errors dropped by 25%. In addition, errors in detecting headers and footers fell by 40%, and pictures and diagrams are detected 15% better. The question of how to measure the number of analysis errors is quite subtle, though, and perhaps deserves a separate post. With tables, everything is clear: one error is either splitting one cell into two or, conversely, merging two cells into one.
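The counting rule for tables can be sketched in a few lines of Python. This is a minimal illustration, not ABBYY's evaluation code: it assumes each cell is represented by the set of word ids it contains (a hypothetical representation), and it extends the rule so that a cell split into k pieces, or k cells merged into one, counts as k - 1 errors.

```python
from typing import FrozenSet, Hashable, List

# Hypothetical representation: a cell is the set of word ids it contains.
Cell = FrozenSet[Hashable]


def count_table_errors(truth: List[Cell], predicted: List[Cell]) -> int:
    """Count split and merge errors between ground-truth and predicted cells.

    A true cell covered by k predicted cells contributes k - 1 split errors;
    a predicted cell covering k true cells contributes k - 1 merge errors.
    """
    splits = 0
    for t in truth:
        overlapping = sum(1 for p in predicted if t & p)
        splits += max(overlapping - 1, 0)
    merges = 0
    for p in predicted:
        overlapping = sum(1 for t in truth if t & p)
        merges += max(overlapping - 1, 0)
    return splits + merges


# Ground truth: three cells.
truth = [frozenset({1, 2}), frozenset({3}), frozenset({4})]
# Prediction: cell {1, 2} was split in two, and cells {3} and {4} were merged.
predicted = [frozenset({1}), frozenset({2}), frozenset({3, 4})]
print(count_table_errors(truth, predicted))  # 2
```

Counting from both sides keeps splits and merges symmetric: a perfect segmentation scores zero, and each wrong cell boundary, whether extra or missing, costs exactly one error.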

What else? Handling of large document batches (more than 100 files) has become more stable, and they are now processed as fast as individual documents.

Processing of photographed documents has changed as well: automatic distortion correction works better. In addition, there is an updated image editor in which you can manually adjust brightness, contrast, and the intensity levels of highlights and shadows, or correct trapezoidal (keystone) distortion.

Handling of multipage documents has also improved: recognized pages are now combined into a single document more cleanly, with fewer unnecessary section breaks. The new version of FineReader correctly determines margin sizes and the positions of headers and footers, which lets it preserve formatting when exporting to RTF.

Export to PDF has changed too. For different kinds of tasks, you can choose one of three new saving modes: “Best quality”, “Small size”, and “Balanced mode”. These modes set the parameters used to save the images in your PDF. Choose the first mode if you need high-quality pictures: the program saves them at 300 dpi and compresses them with the lossless ZIP and LZW formats, as well as with JPEG and J2K at a quality setting of 80. Black-and-white images are compressed with CCITT4 and JBIG2. The second mode, “Small size”, makes sense when you save a file for archiving or whenever file size matters. The PDF is compressed as much as possible while keeping the document readable: images are saved at 150 dpi with a JPEG quality setting of 50. “Balanced mode” is a compromise between quality and file size: 300 dpi with a JPEG quality setting of 60. On the technology side, the MRC compression technology has also been improved (we have written about MRC before).
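To make the three modes easier to compare, here is a Python sketch that simply restates the parameters above as data and computes how many pixels an A4 page image has at each resolution. The class, field, and dictionary names are hypothetical, not FineReader settings or an API; the codec lists are filled in only for “Best quality”, because the text names codecs only for that mode.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PdfImagePreset:
    """Image-saving parameters of one PDF export mode (illustrative names only)."""
    name: str
    dpi: int
    jpeg_quality: int  # also applies to J2K in "Best quality"
    lossless_codecs: Tuple[str, ...] = ()
    bilevel_codecs: Tuple[str, ...] = ()  # for black-and-white images


PRESETS: Dict[str, PdfImagePreset] = {
    "best": PdfImagePreset("Best quality", 300, 80,
                           lossless_codecs=("ZIP", "LZW"),
                           bilevel_codecs=("CCITT4", "JBIG2")),
    "small": PdfImagePreset("Small size", 150, 50),
    "balanced": PdfImagePreset("Balanced mode", 300, 60),
}


def page_pixels(preset: PdfImagePreset,
                width_mm: float = 210.0, height_mm: float = 297.0) -> int:
    """Pixel count of one page image at the preset's resolution (default: A4)."""
    width_px = round(width_mm / 25.4 * preset.dpi)
    height_px = round(height_mm / 25.4 * preset.dpi)
    return width_px * height_px


for key, preset in PRESETS.items():
    print(preset.name, preset.dpi, page_pixels(preset))
print(page_pixels(PRESETS["best"]) / page_pixels(PRESETS["small"]))  # 4.0
```

Halving the resolution from 300 to 150 dpi leaves a quarter of the pixels per page (2480 × 3508 versus 1240 × 1754 for A4), which is one reason “Small size” files are so much lighter. The actual file size also depends on the codec, the quality setting, and the page content, so this is only a rough comparison.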

Good news for Linux users: FineReader 11 can save and convert document images and PDF files to the ODT format (OpenOffice.org Writer). As usual, the formatting of the original document carries over to the new format, so you do not have to waste time on additional editing. Export to DjVu has also been added.

FineReader 11 can also turn paper books into e-books: besides HTML (as in version 10), recognition results can now be saved in Electronic Publication (.ePub) and FictionBook (.fb2) formats, optimized for smartphones, e-readers, and tablets. Moreover, the task for creating e-books is available right in the “New Task” window that appears when the program starts.

The document style editor has been improved: you can now configure all style parameters in a single dialog, and changes are applied throughout the document at once.

The interface has changed as well. The “New Task” window now shows the most frequently used functions. In addition, in the Corporate Edition you can customize the interface to your needs by adding your own tasks or importing tasks created by other users.

If you are in a hurry and want to recognize a document as quickly as possible, you can use another new feature: black-and-white recognition mode.

Of course, you may pay for the speed with some loss of quality, but we tried to keep that price low.

Everything described above, except for customizing the functions in the “New Task” window, applies to both editions: ABBYY FineReader 11 Professional Edition and ABBYY FineReader 11 Corporate Edition. The Corporate Edition also lets you redact confidential information in editing mode: you mark it, and it is removed on export.

For users of the Corporate Edition, we have also prepared a bonus: the ABBYY Business Card Reader program. Many of you already know the mobile version of this program; now it has moved to the desktop. You can scan a business card and, with a single button and a predefined task, load it into Microsoft Outlook with the contact details distributed into the appropriate fields. Of course, you can check and edit the data if necessary.

I would also like to say a few words about languages. We finally support Arabic recognition, with quality on par with our competitors or even better. Customers have long asked us for Arabic OCR, so here it is. FineReader has also learned to recognize Turkmen (Latin script) and Vietnamese. Dictionary support has been added for Arabic, Vietnamese, Japanese, two variants of Korean, and Latin.

Recognition of some Asian languages has also become faster: Korean recognition speed increased by as much as 30% with no loss of quality, and Japanese by 10%. Our developers spent quite a long time hunting for bottlenecks with various profilers (in addition to home-grown timers, they used Intel VTune and AQTime from AutomatedQA Corp.). Unfortunately, we never had that magic moment of finding a single spot in the code that could be rewritten to make the whole program dramatically faster, and writing about a week of work that gained two percent hardly seems worthwhile, so we will leave the topic there.

We have covered the main changes in the new version of FineReader, which you can already buy in our store. For more details, see the ABBYY website.

~~If you would like to try the new FineReader and write a review of it, let us know in the comments, and we will give you a promo code.~~ ~~We have three of them!~~
Three people who want to write a review have already signed up in the comments. Coming soon to Habr!

Source: https://habr.com/ru/post/126850/
