Recognize text using Hamming distance

This article was prompted to me by the article by Alex Povetkin - “ Pattern Recognition by the Method of Potential Functions ”

So, we are going to write a program in Delphi (I use version 6), which can translate characters from the image into text. The task is quite popular on the Internet, and for every post “ I want to implement character recognition !!! Help "the most frequent answers," read on the Internet "or" do not touch, use the file reader "and the like.

I, like many others, began by studying the basic algorithms. Of course, such monsters as FineReader spend huge amounts of money on the algorithmic component, and we don’t know their secrets, but a decent amount of information was found to understand the basic methods. But let's start from afar.

')

The essence of recognition is a comparison with the image.

In childhood, looking at the text, we see only a picture. We are not yet endowed with the necessary criteria for its analysis, in order to understand that the picture is not just a drawing, but a text, and not just a text, but some kind of story. We grow, and we begin to receive images of symbols, we learn that the drawing with the letter “A” denotes the letter A, and not the steamer. As we get older, we are recruiting an image base (or image criteria base), by which, by comparison and analysis, we can tell which character is in front of us.
Exactly the same situation with the computer. First, he simply receives a picture in memory, then searches for some areas on it, and then compares these areas with his base and carries out an analysis, the result of which will be the definition of a symbol in the image. Everything is simple in words - we transfer the picture to memory, look for some areas in the picture, compare it with the base, but the methods for comparing pictures are still not clear. We go to the Internet.

Comparison methods

Modern methods of comparing images are divided into two categories:

1. Comparison with the standard
2. Comparison by criteria

And the method of neural networks stands apart, since it is applicable in both cases. We will not use it and review its works, we will try to make everything as simple and clear as possible.

Comparison by criteria is most difficult, because it is difficult to perform an algorithmic task of determining the character traits of a character, and the efficiency of such an algorithm will probably be small. This is due to the similarity of characters, and if we recognize low-resolution text, then the cost of honing such an algorithm will be many times greater. However, for such a complex task, a source code was found with a solution.

Were also found the source code of uncomplicated programs that check the extreme points and bend points for the presence of a black area:
Zip Code Recognition
Yandex Image Recognition (CY)

Problems of recognition of these programs are obvious - if the images do not match (as well as the presence of noise, extraneous objects, etc.), these programs are ineffective. However, they cheerfully cope with a narrow circle of tasks assigned to them. We want to get a more universal example of the implementation of recognition.

Comparison with the benchmark - the task is simpler and more efficient. Everything is as simple as possible - to create a reference image and compare them. There are a lot of comparison methods - pixel-by-pixel, overlay, overlap with offset and others. The articles on this topic are full of internet. The most interesting work:
Pattern Recognition by Mobile Robot
The use of neural networks in image recognition
Implementation of the simplest pattern recognition algorithm

We will use the simplest comparison method - pixel-by-pixel . The point is alternate comparison of two corresponding pixels of two black and white images. Since the two pixels must be appropriate, the images must be the same in size.

Comparison is a very long algorithm, so we have to optimize the work of our program for comparing not two pictures, say 100x100 pixels (10,000 comparison operations), but something smaller, but everything is also relevant (comparing 8x8 pixels pictures, it’s quite difficult to distinguish a letter therefore, the more pixels we take, the greater the likelihood of correctness of recognition and the longer our algorithm requires). Experimentally, the optimal size was selected - 16x16 pixels. But before resizing it is necessary to find what we change.

Allotment

see function TForm1.PreParse1 (Pic: TPicture): string;
So, we need to find the characters in the picture, and find them all. To do this, the picture is translated into black and white ( see procedure TForm1.Mono (Bmp: TBitmap) ), and all objects are searched for through comparison with the extreme pixels of the hypothetical character. A red frame is built around each found symbol; it is for this frame that the image is resized (StretchBlt function) and copied for comparison ( see function TForm1.Parce3 (Pic: TPicture; x1, y1, x2, y2: Integer): TBitmap ; )

Comparison algorithm

See function TForm1.ParceBMP (bmp1: Tbitmap): string;
The algorithm is simple - we compare pixel by pixel ( see function TForm1.Compare (b1, b2: TBitmap): integer ), if two pixels are different (one is black, the other is white), then increment the counter R. After fully comparing the two pictures, we substitute the counter in the formula

The value obtained as a result of the comparison is called the Hamming distance .
Then we form an array of the results of calculations using this formula for different comparisons (our picture with all the standards). Take the maximum value of the array and the corresponding symbol. This is our result!

Base building

The next stage in the work of our program is the creation of a database (an array of records of a symbol and its graphic representation). We will make the standard database of images of Russian characters and numbers without the use of graphic editors - in the source code. To do this, you need to set an image of the desired size, open the canvas mode, and write the necessary letter in the image.
It is also necessary to process the resulting image as we process the original one, so that the standard is as close as possible to the recognizable image and vice versa (read about it below).

Var
myarray:array of record //
ch:char; //
bmp:TBitmap; //
end;
i : integer; //
Img:TBitmap; //()
im:TPicture; // ,
begin
SetLength(myarray, 73);//
im:=TPicture.Create; //
Img:=TBitmap.Create;
for i := 0 to 31 do
with Img.Canvas do //
begin
Img.Height:=16; //
Img.Width:=16;
Brush.Color := clWhite; //
Pen.Color := clWhite;
Rectangle(0,0,16,16); //
Pen.Color := clBlack; //
Font.Color := clBlack;
Font.Charset:=form1.FontDialog1.Font.Charset; // ( )
Font.Size := Form1.FontDialog1.Font.Size;
Font.Style := Form1.FontDialog1.Font.Style;
Font.Name := Form1.FontDialog1.Font.Name;
TextOut(0, 0, CHR(ORD('')+i));//
myarray[i].bmp:=TBitmap.Create;//
myarray[i].ch:=CHR(ORD('')+i);//
im.Bitmap.Assign(Img);//
myarray[i].bmp:=PreParse2(Im);// (
end;

Errors

Where do without them? We will also need a method for handling recognition errors. To do this, before recognition, we must make a special button that will show the characters and the result of the program, and ask us if the recognition is correct. If yes, then do nothing, otherwise enter into the database, the structure of which is the following - the index file, in which the number of characters in the database is specified, the following lines are a symbol, and a link to the image file ( see procedure TForm1.Learn (Pic: TPicture); ).

Short flowchart

Implementation

Define the tasks and draw the form:

The resulting source here .

Screen running program:

What is missing or future plans

- The program cannot recognize letters consisting of two figures: Y, Y.
- When turning / noise / other recognition is not correct.
- Sometimes it recognizes small letters as large, large as small (which is logical for this method)

Conclusion

Here is the next “recognizer” of the text. Who is it useful? Virtually no one. This is not a FineReader, and not Cuneiform. Everything is much simpler and clearer, and less effective. But despite this, I hope this article will be useful to the IT community. This system can be changed for any other needs.

In the near future I want to add a story with graphs of recognition results, as well as a comparison of this recognition method with others.

In the course of writing materials were used:
Pattern recognition using potential functions
Recognition of postal codes (taken image selection algorithm)

UPD: there were problems with the database, I decided, I reloaded the source code, thanks AmoN '

Source: https://habr.com/ru/post/90867/

All Articles