Vasily Ryabov (vasily-v-ryabov) of Aquantia explains how to start testing desktop interfaces with Python. From the talk you will learn about open source tools and about accessibility-technology support in the pywinauto library. The video and transcript are aimed mainly at those who test software for Windows, but the speaker also says a little about Linux and macOS.
- Hello. I have a technical talk about automating desktop GUI tests. It will be purely about open source; I will not mention paid tools. I will mention almost all the open source solutions that can be used.
I do not claim to give a full review and comparison, but I will tell you what I have used and what our small community is working on.
Automating a desktop GUI is harder than automating the web, because there are no uniform standards. Sending input actions to a control, for example, can be done more or less easily, though even that is not so simple.
We also need to be able to read the text of controls in order to verify some information, and not only for that: we also need to know exactly where to click, so as not to click just anywhere.
There are three common approaches. The most unreliable one: you take hardcoded coordinates and poke somewhere, a shot in the dark. Then the screen resolution changes, or the theme, or the window size, or the window opens in another place, and everything drifts. Maintenance is impossible; it is easier to test by hand.
The second approach is more reliable, but still not that stable: it is based on image recognition. There are tools now that have gained some popularity. This is not about text recognition, but rather about searching the screen for a similar square or rectangle so that you can click there. The most popular of them is Sikuli. Lackey is an analogue in pure Python. The talk will mostly be about Python tools, but I will mention other languages too.
Finally, the most reliable and fastest approach is based on accessibility technologies. Unfortunately, it is not always applicable; if you have no other option at all, you can fall back on image recognition.
What are accessibility technologies? You have mostly heard of them: the good old Win32 API and the newer, though no longer that new, technology from Microsoft, MS UI Automation. It partially covers the Win32 API, but its hierarchy is different, so it is not that simple. It supports many more applications.
Surely many of you have used object inspectors like Spy++. It is an automation engineer's best friend. UI Automation has its own object inspector, which also lets you view the window hierarchy and see how the tree is laid out. It is Inspect.exe, located in the Windows SDK in the Program Files folder. If the SDK is installed, you can find it there.
There are also such technologies on Linux and Mac; I will mention them on just one slide. They exist, and there are even tools that use them.
First, let's talk about the open source tools for the good old Win32 API. It may be old, but it is still capable.
There are quite well-known tools such as AutoIt and AutoHotkey. Each has its own scripting language, which is not very universal, but they are popular and have been around for a long time. AutoIt is positioned by its developers more for admin tasks and automation, for deploying something. Nevertheless, many testers have adapted it for GUI tests.
At some point I came across a Python library called pywinauto. I did not know about AutoIt or AutoHotkey then. It is pure Python, has a rather elegant interface, and the license is convenient too. So I chose pywinauto for my tasks.
To illustrate, here is an example of how easy it is to write code with pywinauto. We simply start the application, then refer to the main window by its name, as an attribute of the object, just like accessing a class member. In the same way we refer to the combo box, namely the one that has Color written next to it, and select an item in it by its text. Then we click the button and check that our window is no longer visible, that is, that it has closed.
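The slide itself is not part of this transcript, so the following is only a minimal sketch of the kind of script described here. The executable name, the window title, the item text and the button name are all hypothetical, and the calls assume the pywinauto 0.6.x API.

```python
def change_color_and_close(cmd_line="app.exe"):
    """Start a (hypothetical) application, pick a color, confirm, check the window closed."""
    # Imported lazily: pywinauto can only be used on Windows.
    from pywinauto.application import Application

    app = Application().start(cmd_line)

    # The main window is addressed by its title, as a plain attribute.
    dlg = app.MainWindow                  # hypothetical title "Main Window"

    # The combo box is found through the "Color" label next to it.
    dlg.ColorComboBox.select("Green")     # hypothetical item text

    # Click the button, then make sure the window is no longer visible.
    dlg.OK.click()                        # hypothetical button name
    dlg.wait_not("visible")
    return app
```

Nothing runs on import; on a Windows machine with pywinauto installed you would call `change_color_and_close()` with the path of a real application and adjust the names to its controls.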
It looks quite convenient, but let's look in more detail at how it all works. A lot lies beneath this simple outward concept.
First, a little history. The library appeared in 2006, and until 2010 it was developed by its author, Mark McMahon. I started using it in 2010, just as it stopped being maintained. That is how it turned out, such is life. Around then our applications moved to 64 bits, and when we wrote the first tests we realized that we already needed to test 64-bit builds. I had to rework the insides of the library myself.
For a while we used our internal clone at Intel. We lived with this clone for a very long time, it was quite stable, and by the end of 2014 we came to understand that it had to be returned to open source, because the project was not going to come back to life by itself. I had to revive it.
At the same time it was ported from Python 2 to Python 3. After all the bureaucratic steps, a new major release, version 0.5.4, came out, and I continued to work on the project in my spare time.
The next major release, which came out this fall, added support for MS UI Automation, that is, for a much wider range of applications. That release was already produced by a community of developers. We continue to maintain the library, and there are still many plans to support other platforms.
About the UI Automation technology itself. You have heard the name, I guess. Some people think it is a pure .NET technology. In fact it is not. There is a .NET wrapper, but the core of the library, on the client side, has a native COM interface.
There are other open source tools, written in C#, that use the .NET wrapper. These are TestStack.White, which has existed for quite a long time, and the younger Winium.Desktop, which offers a Selenium-style interface.
Either library can be used if you prefer C#. I once gave students the task of dragging a file from explorer.exe to Yandex.Disk or Google Drive using drag-and-drop. They managed fine with both White and Winium; both tools are quite usable. The only thing that is not quite clear is how, being in C#, they could become cross-platform in the future.
In popularity the pywinauto library sits in the middle between them; everything is still ahead.
Everyone would use UI Automation if it were perfect. Unfortunately, although it is better than the Win32 API, it is still somewhat cumbersome. In addition, using the .NET wrapper may not be a great idea, because it can sometimes simply miss some controls. This is a rarely reproduced but known bug.
Besides, the COM interface, although native, is not quite standard. It is not IDispatch, as in MS Excel or Word; custom interfaces are used instead.
And of course this technology cannot handle everything either. Java has its own distinctive window system. So does GTK+, as far as I know. They have essentially nothing to do with accessibility technologies on Windows. But they are nowhere near 80% of the market, if we are talking about Java and especially GTK+. So a wide range of applications can still be covered.
At one point we tried to use UI Automation from .NET. There is a Python interpreter that runs directly on .NET, IronPython, but it is not perfect either; I had to prop it up with a small C# library because of an unpleasant bug with ArrayList. Moreover, the project was for some time on the verge of dying. IronPython seems to have been resurrected since. In general, not an option.
But a solution was found for plain Python, that is, CPython. This is the Python that everyone can get from python.org, or as ActiveState Python.
There is a third-party Python package, not part of the standard library, called comtypes. It supports these custom COM interfaces. It really saved us and let us do a lot. Of course, it has its limitations. Apparently there is a deep-seated bug, small but unpleasant and hard to find, that prevents loading the library functions needed for handling custom controls. Maybe someone will solve this problem in the future, but it is hard to count on that.
Nevertheless, we can support all the standard controls. The changes that had to be made to the pywinauto code: we introduced the concept of a low-level layer, kept Win32 API support, and added what we call another backend, uia.
At first glance the only difference is that we do not just create the application object, but also state that it uses another backend. And that is all. After that almost everything is the same. Well, not quite: the window hierarchy differs slightly between accessibility technologies. But fundamentally the approach is the same.
What is this approach? First of all, we need an entry point to start from. This is the application object. You can start an application with it or connect to one that is already running. There is a wide range of criteria by which we can connect: by the exe, by windows in some way. There can be more criteria; only examples are shown here.
In some desktop applications, especially on Windows 10, even the window hierarchy can be spread across several processes, as with the calculator, for example. To traverse the whole hierarchy it is sometimes more convenient to start from the root element, that is, from the desktop object, which we implemented recently specifically for this purpose.
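The entry points mentioned here can be sketched roughly as follows. This is an illustration under assumptions, not the slide from the talk: the executable, the window titles and the criteria names (`path`, `title_re`, `title`) follow the pywinauto 0.6.x API, and the titles depend on the Windows version and locale.

```python
def get_entry_points(backend="uia"):
    """Return the three kinds of starting points: started app, connected app, desktop."""
    # Imported lazily: pywinauto can only be used on Windows.
    from pywinauto import Desktop
    from pywinauto.application import Application

    # 1. Start a new process; the backend is chosen when the object is created.
    started = Application(backend=backend).start("notepad.exe")

    # 2. Connect to a process that is already running,
    #    by its executable or by a property of one of its windows.
    by_path = Application(backend=backend).connect(path="notepad.exe")
    by_title = Application(backend=backend).connect(title_re=".*Notepad.*")

    # 3. Start from the root of the hierarchy when the windows of interest
    #    are spread across several processes.
    calc = Desktop(backend=backend).window(title="Calculator")

    return started, by_path, by_title, calc
```

Passing `backend="win32"` instead selects the older Win32 API layer; the rest of the calls stay the same, although the resulting window hierarchy may differ.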
From the application or the root element we can create a window specification. This is the core concept, the very foundation of how the pywinauto interface is built. A window specification is just a description. The window may not exist yet, or may no longer exist on the screen. From this description we can search for the control.
Once we have found it, we create a wrapper object: a wrapper attached to a real, existing button or a real, existing edit box, something green with tentacles that can pull on the control and operate it.
Let us look in more detail at what kinds of descriptions you can create.
The first and simplest is to go from the application to the main window by its name, and then to the button by its name, for example. But attribute access through a dot has a limitation. What if the text is in Russian or Chinese? Then we have to use access by key, as with an ordinary Python dictionary. It is almost the same.
To be perfectly precise, this is equivalent to the third option. The window is searched for approximately, tolerating typos, because in the first option spaces cannot be typed into attribute access. We can also do this in a more detailed way, by creating detailed descriptions. There can be more than one criterion, for example two or more at once, so that both the text and the control type match. When we specify the control type explicitly, the search runs faster. This is a small trick for writing faster code.
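The kinds of description discussed here could look roughly like this. It is a sketch only: the window title "Main Window" and the button text "OK" are hypothetical, and the criteria names (`best_match`, `title`, `class_name`) are those of pywinauto 0.6.x.

```python
def describe_ok_button(app):
    """Build four window specifications for the same (hypothetical) OK button.

    `app` is an already started or connected pywinauto Application object.
    Nothing is searched for here: every line only creates a description.
    """
    # 1. Attribute access: short, but the name must be a valid Python identifier.
    by_attribute = app.MainWindow.OK

    # 2. Key access: works for any text, including spaces or non-ASCII titles.
    by_key = app["Main Window"]["OK"]

    # 3. The explicit form of the two lines above (approximate name matching).
    by_best_match = app.window(best_match="Main Window").child_window(best_match="OK")

    # 4. Several exact criteria at once; naming the control type narrows the search.
    #    With the uia backend, control_type="Button" would play the same role.
    precise = app.window(title="Main Window").child_window(title="OK", class_name="Button")

    return by_attribute, by_key, by_best_match, precise
```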
How does this whole machinery work? From the window specification a wrapper has to be created somehow. The first blue line and the second one work in exactly the same way and do the same thing; it is just that the creation of the wrapper object is hidden by means of Python. Python can do more than that.
If we leave out the click and simply write the statements shown at the bottom, the two will return slightly different things. The first option returns the window specification, the second returns the wrapper, which you can work with. While debugging, while developing a test, it is convenient to call the wrapper object explicitly and see what methods it has, what we can do with the control. In production code, to avoid clutter, you can remove these wrapper object calls. Then the wrappers will be created automatically, and the click method or whatever else will be called.
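The difference between a description and a wrapper can be modelled in a few lines of plain Python. This is a toy model written for this transcript, not pywinauto's implementation: the class names `Spec` and `ButtonWrapper` and the dictionary standing in for the screen are invented for the illustration. Only the method name `wrapper_object` mirrors the real library.

```python
class ButtonWrapper:
    """Stands in for a wrapper attached to a real, existing control."""

    def __init__(self, name):
        self.name = name
        self.clicks = 0

    def click(self):
        self.clicks += 1
        return self


class Spec:
    """Stands in for a window specification: a description resolved lazily."""

    def __init__(self, name, screen):
        self._name = name
        self._screen = screen  # dict of controls that currently "exist"

    def wrapper_object(self):
        # The actual search happens only here.
        try:
            return self._screen[self._name]
        except KeyError:
            raise LookupError(f"control {self._name!r} not found") from None

    def __getattr__(self, attr):
        # Called only for attributes Spec itself lacks, such as click():
        # resolve the real control first, then forward the attribute lookup.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return getattr(self.wrapper_object(), attr)
```

With this model, `Spec("OK", screen).click()` and `Spec("OK", screen).wrapper_object().click()` do the same thing, while a bare `Spec("OK", screen)` is still only a description and can be created before the control exists.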
Until now we have not stated explicitly what class of object we are creating. Pywinauto can automatically find the right wrapper class and create an object of it. It works through a metaclass in Python, another piece of Python black magic. It is a kind of registry of classes, the so-called registry pattern. Depending on the type of control, usually given by its class name string, it can create an object of the required class.
If we call the base class, HwndWrapper, then the second branch, new_class, automatically looks up the required wrapper and creates an object of that class. If we create a derived class, it is simply created explicitly. If we want, we can explicitly create the specified class, but in most cases that is not necessary, which slightly lowers the entry barrier for people with little or no programming experience.
This is how a class derived from HwndWrapper is implemented. It declares which class names it accepts; here that means support for the standard ComboBox and for the Windows Forms one. Then come the methods that can drive the control.
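The registry-through-metaclass idea can be shown with a small standalone sketch. It is a simplified model, not the library's source: the names `WrapperMeta`, `BaseWrapper`, `find_wrapper` and the `select` body are invented here, and only the idea of a derived class listing the window class names it accepts follows the description in the talk.

```python
import re


class WrapperMeta(type):
    """Metaclass that registers every wrapper class under its window class names."""

    registry = {}  # regex pattern -> wrapper class

    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        for pattern in attrs.get("windowclasses", ()):
            WrapperMeta.registry[pattern] = cls

    @staticmethod
    def find_wrapper(class_name, default):
        for pattern, wrapper in WrapperMeta.registry.items():
            if re.fullmatch(pattern, class_name):
                return wrapper
        return default


class BaseWrapper(metaclass=WrapperMeta):
    windowclasses = ()

    def __new__(cls, class_name):
        target = cls
        if cls is BaseWrapper:
            # Base class requested: look up the most specific registered wrapper.
            target = WrapperMeta.find_wrapper(class_name, default=BaseWrapper)
        # Derived class requested: it is created explicitly, as asked.
        return object.__new__(target)

    def __init__(self, class_name):
        self.class_name = class_name


class ComboBoxWrapper(BaseWrapper):
    # Window class names (regular expressions) this wrapper accepts.
    windowclasses = ("ComboBox", r"WindowsForms\d*\.COMBOBOX\..*")

    def select(self, item):
        return f"select {item!r} in {self.class_name}"
```

Calling `BaseWrapper("ComboBox")` in this model yields a `ComboBoxWrapper`, while an unregistered class name falls back to a plain `BaseWrapper`; the caller never has to name the concrete class.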
Consider part of the example with dragging from explorer.exe to Yandex.Disk, but without Yandex.Disk for now. The example is not that trivial: in Explorer not everything can be done through the Win32 API. We have the pywinauto folder open, and we want to click on a file and view its properties; it is a training example.
It does not take many lines of pywinauto. We connect to the explorer process that has the window title we need. We make the window active. Next we get the list of files.
We have to add a call that waits until the CPU usage drops, because there is lazy initialization; more on that later. That is one more line.
We call right_click_input, which performs the click, and then invoke Properties in the context menu with one line. Then, since the properties open not in explorer.exe but in another process, we reach them through the desktop object, get the dialog, and simply click the Cancel button. That is all it takes.
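The steps just described can be sketched as below. This is a reconstruction under assumptions rather than the code from the slide: the folder title, the file name, the control names `ItemsView` and `ContextMenu`, the threshold value and the English dialog title all depend on the Windows version and locale, and the calls assume pywinauto 0.6.x with the uia backend.

```python
def show_file_properties(folder_title="pywinauto", file_name="README.md"):
    """Open the Properties dialog of a file in an Explorer window, then cancel it."""
    # Imported lazily: pywinauto can only be used on Windows.
    from pywinauto import Desktop
    from pywinauto.application import Application

    # Connect to the explorer process that owns the folder window.
    app = Application(backend="uia").connect(path="explorer.exe", title=folder_title)
    window = app.window(title=folder_title)
    window.set_focus()

    # Explorer fills the file list lazily, so wait until its CPU usage settles.
    app.wait_cpu_usage_lower(threshold=5, timeout=10)

    # Right-click the file and choose Properties in the context menu.
    item = window.ItemsView.get_item(file_name)
    item.right_click_input()
    app.ContextMenu.Properties.invoke()

    # The dialog lives in another process, so look for it from the desktop root.
    props = Desktop(backend="uia").window(title_re=".*Properties$")
    props.Cancel.click()
    props.wait_not("visible")  # make sure the dialog has closed
```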
Result.
A little bit of magic. Supervision
Where do we get the identifiers by which we can refer to controls? Suppose there is a main window; it has a title and it is visible. Then you can print all the identifiers. Some of them can be copied straight into Python code, into your own script.
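Printing the identifiers takes one call on a window specification. A minimal sketch, assuming the pywinauto 0.6.x method names `wait` and `print_control_identifiers`; the helper name `dump_identifiers` is made up for this example.

```python
def dump_identifiers(dlg, timeout=10):
    """Print the suggested names of all controls inside a window.

    `dlg` is a pywinauto window specification, for example app.window(title="...").
    """
    # Make sure the window is really on the screen before inspecting it.
    dlg.wait("visible", timeout=timeout)

    # Prints the control tree together with the names accepted for each control.
    dlg.print_control_identifiers()
```

The printed names are the ones that can be pasted into a script as attributes or keys of the window specification.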
For example, the combo box has two suggested names: plain ComboBox and border-sized ComboBox. To its left is just a static control.
We have five ways to cast the spell. We can refer to a control simply by its text: the OK button. We can say that it is not just OK, not just some caption, but also a button. If there are many buttons and they all have the same text, we can refer to them by index: Button1, Button2 and so on. If a control is dynamic, like an edit box whose text changes constantly as we type and delete, we need something static so that we can address the control the same way every time. A label to its left, or some other static text, can serve as that. For a tab control and other list-like elements, we can refer to the control type plus the name of one of the tabs; the inner text can also take part in the name. This scheme fully covers how to address controls.
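The five naming rules can be illustrated with a toy name generator and an approximate lookup. This is a simplified model for the purpose of the explanation; the function names, the data layout and the use of `difflib.get_close_matches` with a 0.6 cutoff are assumptions of this sketch, and the matching in the real library is more involved.

```python
import difflib


def candidate_names(control_type, index, text=None, label=None, item=None):
    """Build the names under which a control could be addressed."""
    names = []
    if text:
        names.append(text)                    # 1. by text: "OK"
        names.append(text + control_type)     # 2. by text and type: "OKButton"
    names.append(f"{control_type}{index}")    # 3. by type and index: "Button1"
    if label:
        names.append(label + control_type)    # 4. by a static label nearby: "ColorComboBox"
    if item:
        names.append(control_type + item)     # 5. by type and inner item text
    return names


def best_match(query, controls):
    """Return the id of the control whose name is closest to `query`."""
    table = {name: ctrl_id for ctrl_id, names in controls.items() for name in names}
    hits = difflib.get_close_matches(query, list(table), n=1, cutoff=0.6)
    if not hits:
        raise LookupError(f"no control matches {query!r}")
    return table[hits[0]]


controls = {
    "ok": candidate_names("Button", 1, text="OK"),
    "color": candidate_names("ComboBox", 1, label="Color"),
    "tabs": candidate_names("TabControl", 1, item="General"),
}
```

Because the lookup is approximate, a slightly misspelled name such as `"OKButon"` still resolves to the OK button in this model, which is the tolerance to typos mentioned earlier.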
There is also a very detailed way, when we create a window specification by calling the window method. It does not matter how the specification is created. With it we can wait for the window to appear or to disappear, or for simple states such as enabled or visible.
Another useful thing, which comes in handy for that same Explorer, is waiting for the CPU load to drop, not for the whole system but for this particular process. The Win32 API has the means for this, and we implemented it as a simple wrapper. We wait for the CPU load of the given process to fall below 10%, and if that has not happened within 10 seconds, we throw an exception.
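The waiting logic described here is an ordinary polling loop. The sketch below shows that loop in plain Python with an injected sampling function, so it does not measure real CPU load; the helper name `wait_until_below` and its parameters are invented for the illustration. In pywinauto itself the corresponding call is `app.wait_cpu_usage_lower(threshold=10, timeout=10)`.

```python
import time


def wait_until_below(sample, threshold, timeout, interval=0.5,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `sample()` until it returns a value below `threshold`.

    Raises TimeoutError if that has not happened within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        value = sample()
        if value < threshold:
            return value
        if clock() >= deadline:
            raise TimeoutError(
                f"value stayed at or above {threshold} for {timeout} s (last: {value})"
            )
        sleep(interval)
```

For a real process, `sample` would be a function that returns the current CPU usage of that process in percent.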
For waiting there are also methods that do not throw exceptions but simply return true or false: exists, visible and so on.
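The two kinds of waiting can be put side by side in a short sketch. It assumes the pywinauto 0.6.x methods `wait`, `wait_not` and `exists` on a window specification; the helper name `close_dialog` and the Cancel button are hypothetical.

```python
def close_dialog(dlg, timeout=10):
    """Cancel a dialog and report whether it still exists afterwards.

    `dlg` is a pywinauto window specification.
    """
    # Raising waits: an exception is thrown if the state is not reached in time.
    dlg.wait("exists enabled visible ready", timeout=timeout)
    dlg.Cancel.click()
    dlg.wait_not("visible", timeout=timeout)

    # Non-raising check: simply returns True or False.
    return dlg.exists(timeout=1)
```

The raising form suits test steps that must fail loudly, while the boolean form is convenient in conditions, for example when an optional dialog may or may not appear.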
It may look straightforward, but desktop automation is a complex topic with a lot of pitfalls.
It takes some experience to write GUI tests. Programming experience is very desirable.
Now about common problems that do not depend on the library. You can run into them with pywinauto and with any other library alike.
GUI tests often need an active desktop. If you lock the system and go to lunch, there is no active desktop. If you connected over remote desktop and minimized it, again there is no active desktop. If it is restored, there is one. But you want the script to run and test something while you are at lunch. With remote desktop, instead of minimizing it you can leave full-screen mode, keep it in a window, start the script, and quickly switch to your local laptop, for example. The script then works fine, but the method is rather makeshift, because in the end you need to turn the laptop off and go home.
To run tests in a lab you need to set up a VNC server; then, even if we disconnect from the session, the active desktop remains. This is one of the tricks that had to be learned and applied.
And of course, if you run the script from, say, a jenkins-slave that, God forbid, runs as a service, then again nothing will work. This is an operating system limitation: a service cannot work with the graphical interface.
A little about other platforms. On Linux there is also a Python package, pyatspi2, which uses the AT-SPI technology. On OS X there is pyatom. They are not very convenient to use and, in particular, require compilation. Pywinauto does not require compilation; it installs with one line. To Java people I recommend the Jemmy library, as one of those that can really be used.
Recently our small community has been working in this direction: the AT-SPI technology on Linux, and OS X. There is a simple prototype, which is still at the exploratory stage.
As for Java, work there has not even started. But among the more or less usable options there is the JPype library, which lets you call Java code from plain CPython. I know there is also Jython, which runs directly in the Java virtual machine. It is a Python interpreter too, and it has its own peculiarities that are not very compatible with regular CPython. For development toward Java, I see JPype as the more promising direction.
Of course, I want more features. Some other things have already been built into pywinauto, but they are still little used. Global hooks are implemented, with which we can subscribe to events in the operating system.