Imagine for a minute that you are an unlucky programmer who commutes to work every day on a bus whose schedule is impossible to predict. In that situation it would be very convenient to receive an SMS while you are having breakfast at home, telling you that the bus will reach your stop in five minutes.
Fortunately, the transport company (which never bothered to draw up a clear timetable) has a website where you can see in real time, on Google Maps, where a given bus is located. Then one day your patience runs out and you sit down to write a script that will send the SMS you want. But it's not that simple. You would have to tinker for a long time to estimate the speed and arrival time of a real vehicle from a map with a small circle on it. You can't just write: “I want to receive an SMS when this point enters this rectangle on the map”. Or can you?
"Eyes of God"
That is how "sikuli" translates from the language of the Huichol, an indigenous people of Mexico.
Sikuli is a technology for finding GUI elements and automating work with them, based on images (screenshots). The project's authors are Rob Miller (a professor at the Massachusetts Institute of Technology), Tsung-Hsiang Chang (a graduate of MIT) and Tom Yeh (a student at the University of Maryland). Their project was named the best student work at the User Interface Software and Technology conference, held under the auspices of the ACM (Association for Computing Machinery).
So what is Sikuli? In essence, it is an API for writing scripts (in Python) that automate work with a user interface. The big advantage is that you do not need access to the source code of the applications you want to control. Sikuli uses only screenshots to find specific GUI elements, and it emulates user actions such as mouse clicks and keyboard input to drive the application. As a result, you can in principle automate any application with a visible interface on any operating system where Sikuli is installed.
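To make the idea of "find the picture on the screen, then click it" more concrete, here is a minimal sketch in plain Python. It is not Sikuli's actual algorithm: Sikuli looks for the most similar area of the screen, while this toy version only finds an exact match in a grid of numbers standing in for pixels. The names find_image and click_target are made up for this illustration.

```python
def find_image(screen, pattern):
    """Return (row, col) of the first exact occurrence of pattern in screen, or None."""
    s_rows, s_cols = len(screen), len(screen[0])
    p_rows, p_cols = len(pattern), len(pattern[0])
    for r in range(s_rows - p_rows + 1):
        for c in range(s_cols - p_cols + 1):
            # Compare the pattern row by row against the window at (r, c).
            if all(screen[r + i][c:c + p_cols] == pattern[i] for i in range(p_rows)):
                return (r, c)
    return None


def click_target(screen, pattern):
    """Return the (x, y) centre of the matched area, where a click would be sent."""
    hit = find_image(screen, pattern)
    if hit is None:
        return None
    r, c = hit
    return (c + len(pattern[0]) // 2, r + len(pattern) // 2)


# A tiny "screenshot" and a 2x2 "button image"; the numbers stand in for pixels.
screen = [
    [0, 0, 0, 0, 0],
    [0, 0, 7, 8, 0],
    [0, 0, 9, 6, 0],
    [0, 0, 0, 0, 0],
]
button = [
    [7, 8],
    [9, 6],
]

print(find_image(screen, button))    # (1, 2)
print(click_target(screen, button))  # (3, 2)
```

A real implementation has to tolerate small differences in rendering, which is why a similarity score rather than exact equality is used in practice.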
It’s not easy to describe everything in words, so I’ll give you a fragment of a script that automatically configures an Internet connection:
And this is not a diagram or a slide from a presentation. This is the script code itself. I don't know about you, but I had never before been able to pass a picture of a button as a function argument =)
IDE
Clearly, scripts like this cannot be written in an ordinary text editor. So the developers created a very minimal (and still rough around the edges) programming environment: Sikuli IDE.
Here’s what the main window looks like:
The toolbar offers the programmer only five actions:
- capture a region of the screen, which then miraculously becomes a literal (if I may call it that) in the script.
- load an existing image for later use in the script.
- define a geometric region (a rectangular area of the screen) to narrow the search area for interface elements.
- run the script.
- a kind of debug mode: run the script while displaying every action it performs.
You can save your work as source (a bundle containing the screenshots, a plain-text Python script and a visual version of the script in HTML) or export it as an executable .skl script. Such a script can be run either by double-clicking it (currently Mac only) or from the command line:
Mac: open /Applications/Sikuli-IDE.app xxx.skl
Windows: PATH-TO-SIKULI/sikuli-ide.bat xxx.skl
Linux: PATH-TO-SIKULI/sikuli-ide.sh xxx.skl
Sikuli IDE can be installed on Mac OS X, Linux and Windows. Jython serves as the interpreter, so Java 5 or later is required.
Sikuli API
A few words about the API the developers provide. It contains only two classes: Key and VDict. The first is a set of constants for special (non-character) keys such as Enter, Tab and Home. The second is an analogue of a Python dictionary that uses images as keys. In addition, there are several dozen functions, most of which take images (screenshots of interface elements) as parameters. Here are some of them:
click(img, modifiers=0) - clicks on the area of the screen most similar to img (a file name or the screenshot itself); the search is performed with the find(img) function; modifiers is a mask of keyboard modifiers
closeApp(app) - closes the application named app
hover(img) - finds the area of the screen most similar to img and moves the cursor over it
popup(msg) - displays a dialog box with the message msg
switchApp(app) - transfers focus to app; if no running application with that name is found, openApp(app) is called automatically (on Windows the search is by the text in the title bar rather than by application name)
type(*args) - text input (the Key class is useful here, for example for pressing Enter)
untilNotExist(img, timeout=3000) - waits until the image img disappears from the screen; timeout is the wait time in milliseconds
Thus, your script can do everything that an ordinary computer user can do.
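To show how these calls might fit together, here is a rough sketch of a small scenario. So that it can be run outside Sikuli IDE, the functions are replaced by a hypothetical RecordingUI class that merely logs each call; inside the IDE you would call switchApp, click, type and the rest directly, without the ui. prefix. The image file names and the ENTER placeholder are assumptions made for this example.

```python
class RecordingUI:
    """Hypothetical stand-in that records calls instead of driving a real GUI."""

    def __init__(self):
        self.log = []

    def switchApp(self, app):
        self.log.append(("switchApp", app))

    def click(self, img, modifiers=0):
        self.log.append(("click", img))

    def type(self, *args):
        self.log.append(("type",) + args)

    def untilNotExist(self, img, timeout=3000):
        self.log.append(("untilNotExist", img, timeout))

    def popup(self, msg):
        self.log.append(("popup", msg))


ENTER = "\n"  # placeholder; in Sikuli the Key class supplies the Enter constant


def search_hello(ui):
    """Type 'hello' into a search field, wait for a spinner to vanish, then report."""
    ui.switchApp("Firefox")
    ui.click("search_field.png")  # hypothetical screenshot of the input field
    ui.type("hello" + ENTER)
    ui.untilNotExist("loading_spinner.png", timeout=5000)  # hypothetical screenshot
    ui.popup("Search finished")


ui = RecordingUI()
search_hello(ui)
for step in ui.log:
    print(step)
```

Recording the calls instead of performing them also makes it easy to check the order of steps before letting the script loose on a real desktop.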
Unit testing for GUI
Another, and probably more serious, use of Sikuli is writing scripts that test user interfaces.
Here again the developers of Sikuli IDE tried to minimize the amount of code needed. The test code is automatically wrapped in a Python class derived from junit.framework.TestCase, so the developer only needs to implement the standard setUp() and tearDown() methods and the test methods themselves.
The test panel is opened via the View / Unit Test menu item or with the Ctrl + U shortcut. It contains a Run button that starts the tests.
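The setUp / test / tearDown structure can be sketched with Python's standard unittest module, whose TestCase follows the same JUnit conventions. This is only an analogy: in Sikuli IDE the class wrapper is generated for you, and the methods would drive a real application through screenshots. FakeEditor is a made-up stand-in for the application under test.

```python
import unittest


class FakeEditor:
    """Hypothetical application under test; a real script would drive a GUI."""

    def __init__(self):
        self.text = ""
        self.is_open = True

    def type(self, s):
        self.text += s

    def close(self):
        self.is_open = False


class EditorTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test: this is where the application would be opened.
        self.app = FakeEditor()

    def tearDown(self):
        # Runs after every test: this is where the application would be closed.
        self.app.close()

    def test_new_document_is_empty(self):
        self.assertEqual(self.app.text, "")

    def test_typing_appears_in_document(self):
        self.app.type("hello")
        self.assertEqual(self.app.text, "hello")


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(EditorTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
print(result.wasSuccessful())
```

Because setUp and tearDown run around every test method, each test starts from a freshly opened application, which matters a great deal when the state lives in a GUI.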
Personal impressions
The project is definitely interesting. So far there is only a beta version, which still needs a lot of work on both performance and the accuracy of finding controls.
The script I wrote to launch iTunes, force a podcast update, sync a connected iPod, eject it and close iTunes has not yet managed the task. Instead, it launched iTunes, rummaged through the audiobooks (maybe it was looking for something?), immediately ejected the iPod and closed the program =)
A simpler, though less useful, task I did manage to solve with Sikuli. The script was able to send "hello" to Google through the Google toolbar in Firefox on its own, although not right away. I had to tell it which part of the screen to search for the input field. Otherwise it tried to type 'hello' into the address bar and, forgetting all about me, wander off into the wilds of the World Wide Web.
Returning to the bus: for now, that problem probably cannot be solved quite so easily. But the Sikuli project is the beginning of a new approach to programming, one that may soon open up unprecedented possibilities for automating tasks like this. Perhaps the approach does not look very serious, but that does not make it useless. And if you prefer more serious programming, go right ahead and contribute to this wonderful open-source project. After all, behind Sikuli's outward simplicity lies a far from simple implementation.
Demo video
Here is the official video demonstrating work with Sikuli IDE.
"Bibliography"
Python411 Podcast Series - not the most important link, but it is where I first heard about the Sikuli project (the latest episode covers it). An interesting Python podcast by an amateur programmer.
Sikuli Project Official Site - the official site of the project. There you can watch demos, download the Sikuli IDE source code and the installer for your OS and, of course, read the documentation.
Picture-driven computing - an introductory article about the technology and the main goals of the project. The "tale" about the bus and some other fragments of this post were borrowed from it.
Sikuli in Launchpad - more detailed information about the project for programmers: source code, releases, bug reports and a way to contact the developers. Thanks to the_toster for the link.