Some time ago I was asked to expand an old comment into a full-fledged post. On its own I don't think the subject is interesting enough, but then I had an idea: why not combine business with pleasure and take a closer look at a curious tool, news of which has recently spread across IT sites?
Problem
The main task in this post is preparing scans and photographs of written sources (books, lecture notes, etc.) for printing, compact storage, packing into DjVu, and so on.
Photoshop and FineReader will not be considered. They provide a number of useful tools, but they cost money.
With a scanner everything is usually simple: the images are of good enough quality that minimal processing suffices.
Photos are more interesting: lighting problems and geometric distortions are added. Alas, correcting geometric distortions is hard to automate. But lighting and background can be dealt with, and that is what we will do.
Tools
Paint.NET is a raster graphics editor for Windows, with support for layers and filters.
Sikuli is essentially a tool for automating interaction with a graphical interface. It also offers features for testing applications, but this article does not touch on them. We will use Sikuli to compensate for the lack of full macro support in Paint.NET.
Sikuli's main killer feature is supposed to be the clarity and simplicity of its scripts, following the principle "What You See Is How It Works". True, the overall rawness of the project spoils the impression. I worked with version 0.09. The recently released version 0.10 removes the worst pitfalls, but many familiar things are still missing, such as an Undo function in the editor.
By the way, I recently stumbled upon the QAliber project. It appears to have a number of advantages in how it interacts with the interface under test and in overall polish. But as for clarity... well, you can see and feel the difference :) Still, I may try QAliber when the occasion arises.
The Sikuli architecture includes several layers written in various languages:
- The top level is the Jython API. Sikuli scripts are essentially Python programs that call functions provided by the Jython API. (Each project is stored in a %scriptname%.sikuli folder, which contains the %scriptname%.py file and images in PNG format.) The author mentions that the top level could be implemented in any other language running on the JVM. You can also work with the Sikuli Java API directly from your own program.
- The middle level is a Java API. It works with the keyboard and mouse, and also interacts with the OpenCV library to search the screen for predefined graphical templates.
- Accordingly, the bottom, platform-dependent level is the OpenCV library, implemented in C/C++.
I have described the architecture somewhat differently from the author, but the main thing is that it gives an idea of the system.
Theory
Since our task is essentially separating a useful signal from noise, two analogies help explain the idea: a band-pass filter and an active noise cancellation system.
A simple Threshold filter acts as a band-pass filter, simply "cutting off" pixels whose brightness is below the given boundary (setting their brightness to 0, and to 255 for all others). The more advanced Levels filter sets two boundaries between which values change smoothly.
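To make the difference between the two filters concrete, here is a minimal Python sketch that works on a plain list of brightness values (0 to 255). The function names `threshold` and `levels` and the sample row are illustrative assumptions, not Paint.NET's implementation; the real Levels dialog also has gamma and output points, which are left out here.

```python
def threshold(pixels, cutoff):
    """Binarize: brightness below cutoff becomes 0, everything else 255."""
    return [0 if p < cutoff else 255 for p in pixels]


def levels(pixels, black_point, white_point):
    """Stretch [black_point, white_point] linearly to [0, 255], clamping outside."""
    span = float(white_point - black_point)
    out = []
    for p in pixels:
        scaled = (p - black_point) / span * 255.0
        out.append(int(round(min(255.0, max(0.0, scaled)))))
    return out


row = [10, 90, 128, 200, 250]   # made-up brightness values
print(threshold(row, 128))      # hard cut at a single boundary
print(levels(row, 100, 200))    # smooth transition between two boundaries
```

With Threshold every pixel ends up either black or white; with Levels, values between the two boundaries keep intermediate shades.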
If brightness varies widely across the image, a band-pass filter alone cannot "cut off" the noise without losing useful signal. We need a more cunning method.
The principle of active noise cancellation can be summed up as: "(Signal + Noise) - (Noise) = (Signal)".
(Signal + Noise) is our photo. (Noise) is the background, everything except the text. (Signal) is, accordingly, the text.
At the start we only have (Signal + Noise), but in our case (Noise) is easy to obtain from it by exploiting one property of the useful signal (text): it consists of thin lines.
We need a filter that neatly "blurs away" the text so that the image looks like a blank sheet. Median Blur is suitable for this. In Paint.NET it sits, for some reason, in the Noise menu as a noise-reduction tool. Well, we will use it for the opposite purpose, removing the useful signal :)
True, with illustrations things may not go so smoothly, and they will have to be processed separately...
The algorithm is as follows:
1. Apply the Median Blur filter to the original image to get a clean background without text;
2. Compute the difference between the original image and the one obtained in step 1;
3. Invert the image obtained in step 2 (we need dark text on a white background);
4. Apply the Levels filter to even out the contrast and remove the slight noise left after steps 1-2.
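The four steps can be sketched in plain Python on a tiny grayscale image stored as a list of rows. This is only an illustration under simplifying assumptions: `median_blur` and `clean_page` are hypothetical helpers, the median window is clipped at the image borders, and the Levels step is reduced to a linear stretch without gamma. Paint.NET's own filters may well differ in detail.

```python
def median_blur(img, radius):
    """Replace each pixel with the median of its (2r+1)x(2r+1) neighbourhood."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = sorted(
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out


def clean_page(img, radius, black_point, white_point):
    """Median background, difference, inversion, then a linear Levels stretch."""
    background = median_blur(img, radius)                     # step 1
    span = float(white_point - black_point)
    result = []
    for src_row, bg_row in zip(img, background):
        row = []
        for p, b in zip(src_row, bg_row):
            inverted = 255 - abs(p - b)                       # steps 2 and 3
            scaled = (inverted - black_point) / span * 255.0  # step 4
            row.append(int(round(min(255.0, max(0.0, scaled)))))
        result.append(row)
    return result


# Made-up sample: background brightens left to right, thin dark stroke in column 5.
page = [[80, 100, 120, 140, 160, 100, 200]] * 3
cleaned = clean_page(page, 1, 200, 235)
print(cleaned[0])
```

In the sample row the stroke (100) is brighter than the darkest background pixel (80), so no single Threshold value could separate them, whereas subtracting the blurred background isolates the stroke.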
There could have been nice diagrams and illustrations here, but I could not reconcile my perfectionism with my design abilities (or rather, their absence). I hope the idea is clear enough without pictures.
Automation
So, the automation task is to use Sikuli to open a set of images in Paint.NET and process them with the described algorithm.
I could not think of anything better than opening the folder with the images in advance and letting Sikuli walk through the icons, launching Paint.NET via the context menu...
Open the Sikuli IDE and start a new script by declaring the necessary variables:
patterns = [
    <image>,
    <image>,
    <image>,
]
openwith_img = <image>
paintnet_img = <image>
waitfor_img = <image>
edited_text = "_edited"
base_timeout = 30000
negation_mode = <image>
difference_mode = <image>
- patterns: an array with icons of the file formats to be processed;
- openwith_img, paintnet_img: the context menu items to click;
- waitfor_img: opening Paint.NET takes some time and is considered complete when this fragment appears on the screen;
- edited_text: the suffix appended to the names of processed files;
- base_timeout: the base wait time for all resource-intensive operations (in milliseconds), so that timeouts do not have to be changed throughout the script if needed;
- negation_mode, difference_mode: while writing the script I experimented with these two layer blend modes, so it was convenient to declare them as variables.
Here it is worth noting the fundamental problem of the Sikuli approach: limited portability of scripts.
Your icons for graphic formats are almost certainly different, so you will have to add them to the script yourself. The remaining images may be affected by the OS and the visual style in use. In my case this is Windows XP with the Opus OS visual style from b0se.
The necessary functions follow.
def OpenWith(x, y, w):
    rightClick(x)
    click(openwith_img)
    click(y)
    wait(w, timeout=base_timeout * 3)
Opening a file through the context menu. The function receives three patterns: a file icon, the menu item for the desired application (Paint.NET, for example), and a fragment that appears on the screen once loading is complete.
May Habr readers forgive me the meaningless variable names.
def SaveFile(suffix):
    type("f", KEY_ALT)
    click(<image>)  # the "Save As..." menu item
    type(Key.END + suffix)
    sleep(1)
    type(Key.ENTER)
    sleep(1)
    type(Key.ENTER)
    sleep(7)
Saving the file in Paint.NET. Alt+F opens the File menu.
(The script does not use every possible keyboard shortcut for menu navigation, although that would shorten the script and reduce the number of image fragments. I found that combinations with Ctrl+Shift did not always work in Sikuli, so I took the more reliable route.)
After clicking the "Save As..." menu item, input focus is on the file name field, and we append the suffix to it. I could not come up with a reliable sign that saving has finished, so the function ends with a pause of sufficient length (7 seconds).
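One possible sign of completion, if the script knows the path of the output file (the script above does not, since it works purely from icons on the screen, so this is an assumption), is to poll the file system until the file exists and its size stops changing. A minimal sketch in plain Python; `wait_for_file` is a hypothetical helper, not part of Sikuli:

```python
import os
import time


def wait_for_file(path, timeout=30.0, poll=0.5):
    """Return True once `path` exists and its size is the same on two consecutive polls."""
    deadline = time.time() + timeout
    last_size = -1
    while time.time() < deadline:
        if os.path.exists(path):
            size = os.path.getsize(path)
            if size > 0 and size == last_size:
                return True
            last_size = size
        time.sleep(poll)
    return False  # timed out: the file never appeared or kept changing
```

A stable size is only a heuristic: an application that pauses in the middle of writing would fool it, so a fixed pause may still be the more predictable choice.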
def DoBlackWhite():
    type("a", KEY_ALT)
    click(<image>)  # the B/W menu item
    wait(<image>, timeout=base_timeout)  # its entry in the History panel
The B/W filter is the first filter we need. Alt+A opens the Adjustments menu, where we select the required item. The filter has no parameters. We wait until the corresponding entry appears in the History panel. (That panel turned out to be very handy.)
def DoDuplicateLayer():
    type("l", KEY_ALT)
    click(<image>)
    wait(<image>, timeout=base_timeout)
Duplicating the layer. The process is similar. In our case there is no need to switch between layers, which is good, because otherwise we would have to fiddle with the Layers panel.
def DoInvertColors():
    type("a", KEY_ALT)
    click(<image>)
    wait(<image>, timeout=base_timeout)
The negative filter. Similar to the previous one.
def DoOilPaint(a, b):
    type("c", KEY_ALT)
    click(<image>)
    click(<image>)
    sleep(0.1)
    type(a + Key.TAB + Key.TAB + Key.TAB + b + Key.ENTER)
    wait(<image>, timeout=base_timeout * 2)
The Oil Painting filter. I used it initially, but ultimately abandoned it in favor of Median Blur. I keep it here for history :)
(There is no point worrying about dead code in this case; it may come in handy for someone... In fact, all the functions for working with Paint.NET would be worth moving into a separate file, if Sikuli supported that.)
This is the first filter with a settings dialog. The function takes a couple of parameters, which are typed into the corresponding form fields.
def DoMedian(a, b):
    type("c", KEY_ALT)
    click(<image>)  # the Noise submenu
    click(<image>)  # the Median Blur item
    sleep(0.1)
    type(a + Key.TAB + Key.TAB + Key.TAB + b + Key.ENTER)
    wait(<image>, timeout=base_timeout * 2)
The Median Blur filter is in the Effects > Noise menu. It is configured like the previous one, and it is the one that really serves us.
def DoLayerBlend(mode):
    type(Key.F4)
    click(<image>)
    click(mode)
    type(Key.ENTER)
    wait(<image>, timeout=base_timeout)
    type("m", KEY_CTRL)
    wait(<image>, timeout=base_timeout)
Blending layers. F4 opens the layer properties dialog, where we select the desired blend mode (passed as a parameter). Then we immediately merge the layers with Ctrl+M.
def DoLevels(iwp, ibp, ogamma):
    k_del = Key.DELETE + Key.DELETE + Key.DELETE + Key.DELETE
    type("a", KEY_ALT)
    click(<image>)
    type(k_del)
    type(iwp)
    type(Key.TAB + Key.TAB)
    type(k_del)
    type(ogamma)
    type(Key.TAB)
    type(k_del)
    type(ibp)
    sleep(0.1)
    type(Key.ENTER)
    wait(<image>, timeout=base_timeout)
The Levels filter. The dialog lets you configure five parameters: Input White Point, Input Black Point, Output White Point, Output Black Point, and Output Gamma. We want maximum contrast at the output, so we leave OWP and OBP untouched. The rest are passed as parameters.
The input fields in this dialog behave differently from those in other dialogs: they have to be cleared explicitly by simulating presses of Delete.
def DoFilter():
    DoBlackWhite()
    DoDuplicateLayer()
    DoMedian("35", "50")
    DoLayerBlend(difference_mode)
    DoInvertColors()
    DoLevels("235", "200", "1")
Now we start assembling the pieces into a whole. In fact, the rest of the script exists only to support this function. Here the sequence of filters is called with the required parameters.
(It is advisable to tune the DoLevels() parameters for each set of images, although the examples at the end of the article were produced in one pass with the parameters shown...)
def RunTaskOverImage(x):
    OpenWith(x, paintnet_img, waitfor_img)
    sleep(2)
    DoFilter()
    sleep(1)
    SaveFile(edited_text)
    sleep(1)
    closeApp("paint.NET")
    sleep(1)
Opening, processing, saving and closing a single file. The parameter is the found region containing the file icon to be processed (or a pattern).
def main():
    for pat in patterns:
        setThrowException(False)
        find_regs = findAll(Pattern(pat).similar(0.95))
        setThrowException(True)
        if find_regs:
            for region in find_regs:
                RunTaskOverImage(region)
Finding all files on the screen and processing those found.
setThrowException() changes Sikuli's behavior when findAll() finds no region matching the pattern. In this case it does not matter to us if some pattern is not found on the screen.
Pattern(pat).similar(0.95): the pattern search is performed with some tolerance, which should partly compensate for differences in interface settings between machines. The default factor of 0.7 is too lenient a condition: all my icons were treated as identical, and the script tried to run three times in a loop (once per pattern in the array). Setting 1.0 is not a good idea either: OpenCV may then miss even the icons you need.
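The effect of the similarity factor can be illustrated with a toy model. The score below (one minus the normalized mean absolute pixel difference) is an assumption chosen for simplicity and is not the measure Sikuli or OpenCV actually uses; the icon data and the names `similarity` and `matches` are made up for the example.

```python
def similarity(a, b):
    """Toy score in [0, 1]: 1 minus the mean absolute pixel difference over 255."""
    total = sum(abs(p - q) for p, q in zip(a, b))
    return 1.0 - total / (255.0 * len(a))


def matches(template, candidates, min_similarity):
    """Names of the candidates whose score reaches the threshold."""
    return [name for name, pixels in candidates
            if similarity(template, pixels) >= min_similarity]


# Hypothetical 8-pixel "icons": the template and two icons found on screen.
png_template = [200, 200, 200, 200, 40, 40, 200, 200]
on_screen = [
    ("a.png", [198, 200, 201, 200, 42, 40, 200, 199]),  # same icon, slight rendering noise
    ("b.jpg", [200, 200, 200, 200, 40, 200, 200, 40]),  # a different but similar icon
]

for factor in (0.7, 0.95, 1.0):
    print(factor, matches(png_template, on_screen, factor))
```

In this toy setup a lenient factor accepts both icons, a strict one accepts only the intended icon, and an exact-match requirement rejects even that one because of the rendering noise.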
sleep(1)
main()
popup("done")
The final chord: call main() and report that the script has finished.
main() is a separate function for ease of debugging: in its place you can substitute a call to any of the functions described and debug it separately.
Download source code archive
View full source code
Testing
For the tests I used: the picture from the comments that prompted this post; a couple of loose shots from my own archive; and a random shot from the internet.
Speed was measured on a laptop with a 2 GHz Pentium M and 2 GB of RAM. Script execution time over the 4 test images:
- Run 1: 6:32
- Run 2: 6:57
- Run 3: 6:47
- Run 4: 6:38
Average time: 6 minutes 43 seconds. Average processing time per image: 1 minute 41 seconds.
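The averages are easy to re-derive from the four runs. A small sketch (the helper name `to_seconds` is mine):

```python
def to_seconds(stamp):
    """Convert an "M:SS" string to a number of seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


runs = ["6:32", "6:57", "6:47", "6:38"]
images_per_run = 4

mean_run = sum(to_seconds(r) for r in runs) / float(len(runs))
mean_image = mean_run / images_per_run

print(mean_run)    # 403.5 seconds, i.e. 6 min 43.5 s per run
print(mean_image)  # 100.875 seconds, i.e. about 1 min 41 s per image
```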
Most of the time is consumed by the filters. Still, I think optimizing the script could save about ten seconds per image...
Conclusions
- If a person can extract useful information from an incoming data stream (read text, solve a captcha...), then an algorithm that extracts this information can be written for a computer as well. The complexity and generality of that algorithm is a separate question: the more we want, the more details the algorithm has to account for. The algorithm described here cleans up photographed text in harder cases than a simple Threshold filter can handle, but it has its limits too.
- It is hard to treat the Sikuli IDE as a serious tool today, and not because "programming with pictures" is a silly idea. Using computer vision to drive an interface is simply not very reliable, and the available tooling is not very convenient and can add trouble even to simple tasks. Next time a similar task comes up, I will try QAliber.
- For a number of tasks, I think the Sikuli Java API will come in handy as a convenient wrapper over OpenCV for use in your own testing tools and the like.
Resources
Paint.NET official website
Official site of Sikuli. Download links, documentation, etc.
Blog with announcements and sample scripts
Sikuli Documentation Version 0.10
Sikuli page on LaunchPad
PS: Thanks to free0u for support. I apologize to those who had to wait and for whom this article would have been more useful before the exam session than after.
UPD: Moved to "Algorithms". If there is a better option, let me know.