Testing GUI Applications Using OCR

Functional testing of an application's graphical interface (GUI) is important and necessary, but not always trivial. The main question is how to simulate the user: a plain, ordinary user who will work with your software directly, day after day.

So what does text recognition have to do with it?

What is usually tested and automated, and how

This part is rather dry and based mostly on experience.

The usual starting point is the software that needs to be automated. For Windows applications, for example, you can look at MSAA and its successor, UI Automation. This framework exposes controls and on-screen text quite well through its API. Web applications can be "poked" with Selenium, WatiN, and similar tools.

Such methods are not always suitable, for example when a user works with an application through a remote desktop, or when a Web application is stuffed with third-party controls (ActiveX, Java applets, etc.).
This issue is described very well here, and a list of software is here.

Idea

We will go another way © Mayakovsky

What if we take a screenshot, remember some area of it and, as the test, recognize the text in that area? If the expected text is found in the specified area, we can click there.

It sounds good, so it is worth a try.
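The idea can be sketched in a few lines before any OCR engine is involved. This is only an illustration: `TextCheck`, `OcrResult`, `RegionTest` and the `recognize` delegate are hypothetical names, not part of the article's code, and the OCR call itself is left to the caller.

```csharp
using System;

// Hypothetical types for illustration; none of these names come from the article.
public sealed record OcrResult(string Text, float Confidence);

public sealed record TextCheck(int X, int Y, int Width, int Height,
                               string Expected, float MinConfidence);

public static class RegionTest
{
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    // Collapse whitespace and ignore case, so "  SAVE \n" matches "save".
    public static string Normalize(string s) =>
        string.Join(" ", (s ?? "").Split(Whitespace, StringSplitOptions.RemoveEmptyEntries))
              .ToLowerInvariant();

    // The check passes when the recognized text contains the expected text
    // and the recognizer reports enough confidence.
    public static bool Passes(TextCheck check, Func<TextCheck, OcrResult> recognize)
    {
        OcrResult result = recognize(check);
        return result.Confidence >= check.MinConfidence
            && Normalize(result.Text).Contains(Normalize(check.Expected));
    }

    // Where to click once the check has passed: the center of the region.
    public static (int X, int Y) ClickPoint(TextCheck check) =>
        (check.X + check.Width / 2, check.Y + check.Height / 2);
}
```

Passing the recognizer in as a delegate keeps the comparison logic testable without a screen or an OCR engine; a real recognizer would capture the region and run it through OCR.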

The thorny path of implementation

First you need to choose a text recognition engine or write one yourself. Writing your own is a non-trivial task, so let's take something ready-made and preferably Open Source. For starters, read the manuals.

There is such a miracle as Tesseract OCR, and there is even a .NET API wrapper for it! It sounds simple, but not everything is so simple: take a picture, recognize it and ... nothing. Tesseract did not see the text in our picture! More precisely, it saw nothing in a cropped fragment of the picture.

How to take a picture of a rectangular area of the screen

using System.Drawing;
using System.Drawing.Imaging;

// Copies the given screen rectangle into a new 24-bit bitmap.
// The caller is responsible for disposing the returned bitmap.
public static Bitmap GetAreaFromScreen(Rectangle area)
{
    var bmp = new Bitmap(area.Width, area.Height, PixelFormat.Format24bppRgb);
    using (var g = Graphics.FromImage(bmp))
        g.CopyFromScreen(area.Left, area.Top, 0, 0, bmp.Size, CopyPixelOperation.SourceCopy);
    return bmp;
}

A piece of 'bad' code that didn't recognize anything

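using System;
using System.Diagnostics;
using System.Drawing;
using Tesseract;

// engine and threshold are fields of the enclosing class.
public string RecognizeText(Bitmap source)
{
    try
    {
        // The raw screenshot goes straight to Tesseract as a single text line.
        using (var page = engine.Process(source, PageSegMode.SingleLine))
        {
            var text = page.GetText();
            threshold = page.GetMeanConfidence(); // remember the confidence of this result
            return text;
        }
    }
    catch (Exception e)
    {
        Trace.TraceError(e.ToString());
        return "";
    }
}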

After digging a bit ~~into the intricacies and mechanisms of recognition~~ around the Internet, I found this: stackoverflow.com/questions/9480013/image-processing-to-improve-tesseract-ocr-accuracy

So, first you need to:

1. fix DPI (if needed), 300 DPI is the minimum
2. fix text size (e.g. 12 pt should be ok)
3. try to fix text lines (deskew and dewarp text)
4. try to fix illumination of image (e.g. no dark part of image)
5. binarize and de-noise image

And only then try to recognize something there!

Here the AForge.NET library came to our rescue. It is, by the way, Open Source, like Tesseract.

A piece of 'good' magic code with AForge

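using System;
using System.Diagnostics;
using System.Drawing;
using AForge.Imaging.Filters;
using Tesseract;

// engine and threshold are fields of the enclosing class.
public string RecognizeText(Bitmap source)
{
    try
    {
        var seq = new FiltersSequence();
        seq.Add(new ResizeBilinear(source.Width * 2, source.Height * 2)); // enlarge the text
        seq.Add(new Grayscale(0.2126, 0.7152, 0.0722));                   // BT.709 luma weights
        seq.Add(new OtsuThreshold());                                     // binarize
        // Kept from the original; on an already binarized image
        // this second pass should not change anything.
        seq.Add(new Threshold(100));

        // Dispose the intermediate bitmap once Tesseract is done with it.
        using (var temp = seq.Apply(source))
        using (var page = engine.Process(temp, PageSegMode.SingleLine))
        {
            var text = page.GetText();
            threshold = page.GetMeanConfidence();
            return text;
        }
    }
    catch (Exception e)
    {
        Trace.TraceError(e.ToString());
        return "";
    }
}

Let me go through the points: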

DPI - the DPI of a screenshot is good enough for us;
fix text size - this depends on the font size; sometimes the text has to be enlarged, sometimes made thicker or taller. ResizeBilinear helps here;
try to fix text lines - this depends on how accurately the screen area is selected. Select the text precisely, trying not to catch any "artifacts"; this is manual work;
try to fix illumination of image - Grayscale and that's it!
binarize and de-noise image - we binarize and de-noise with OtsuThreshold and Threshold(100).

Now it works much better.
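To make the grayscale and binarization steps a little less magical, here is a rough sketch of what they compute, written in plain C# over byte arrays. It is not AForge's implementation and may differ from it in details such as rounding or how the threshold boundary is treated; `Binarizer` and its method names are made up for this illustration.

```csharp
using System;

public static class Binarizer
{
    // Weighted sum of the color channels using BT.709 luma weights,
    // the same coefficients that are passed to Grayscale in the article's code.
    public static byte ToGray(byte r, byte g, byte b) =>
        (byte)Math.Round(0.2126 * r + 0.7152 * g + 0.0722 * b);

    // Otsu's method: choose the threshold that maximizes the
    // between-class variance of "dark" and "bright" pixels.
    public static int OtsuThreshold(byte[] gray)
    {
        var hist = new int[256];
        foreach (byte p in gray) hist[p]++;

        long total = gray.Length;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (double)i * hist[i];

        double sumBack = 0, bestVariance = -1;
        long weightBack = 0;
        int best = 0;
        for (int t = 0; t < 256; t++)
        {
            weightBack += hist[t];
            if (weightBack == 0) continue;        // no dark pixels yet
            long weightFore = total - weightBack;
            if (weightFore == 0) break;           // no bright pixels left

            sumBack += (double)t * hist[t];
            double meanBack = sumBack / weightBack;
            double meanFore = (sumAll - sumBack) / weightFore;
            double diff = meanBack - meanFore;
            double between = (double)weightBack * weightFore * diff * diff;
            if (between > bestVariance) { bestVariance = between; best = t; }
        }
        return best;
    }

    // Pixels brighter than the threshold become white (255), the rest black (0).
    public static byte[] Binarize(byte[] gray, int threshold)
    {
        var result = new byte[gray.Length];
        for (int i = 0; i < gray.Length; i++)
            result[i] = gray[i] > threshold ? (byte)255 : (byte)0;
        return result;
    }
}
```

The practical point for screenshots is that the threshold is derived from the image itself, so dark text on a light background and light text on a dark background both end up as a clean two-level picture without hand-tuning a constant.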

The core of the program is ready. What remains is ~~just a little, and everything will be cool~~ to write an application with some semblance of a test designer, where you can mark out a screen area and save it in a kind of test plan, and to write a test player. ~~A 15-minute job.~~ Of course, I had to tinker here, but that is a topic for another article.

In the end, putting everything together, we got a simple test designer and a simple player.
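The article does not show the test plan format or the player, so the following is only a guess at what a minimal version might look like. The `PlanStep` record, the JSON storage and the `recognize` and `click` delegates are all assumptions made for this sketch (it relies on System.Text.Json from .NET 5 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical plan format: a named screen area and the text expected in it.
public sealed record PlanStep(string Name, int X, int Y, int Width, int Height, string Expected);

public static class TestPlayer
{
    // The "designer" side: store the marked-up steps as JSON.
    public static string Save(IReadOnlyList<PlanStep> steps) =>
        JsonSerializer.Serialize(steps);

    public static List<PlanStep> Load(string json) =>
        JsonSerializer.Deserialize<List<PlanStep>>(json) ?? new List<PlanStep>();

    // The "player" side: run the steps in order and click where the text matched.
    // Returns the index of the first failing step, or -1 when every step passed.
    public static int Run(IReadOnlyList<PlanStep> steps,
                          Func<PlanStep, string> recognize,
                          Action<int, int> click)
    {
        for (int i = 0; i < steps.Count; i++)
        {
            PlanStep step = steps[i];
            string text = recognize(step) ?? "";
            if (text.IndexOf(step.Expected, StringComparison.OrdinalIgnoreCase) < 0)
                return i;

            // Click the center of the area where the expected text was found.
            click(step.X + step.Width / 2, step.Y + step.Height / 2);
        }
        return -1;
    }
}
```

In a real player, `recognize` would capture the step's area from the screen and pass it to the OCR routine, and `click` would send a mouse click to the given screen coordinates.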