Here is a translation of my article on Medium.com: part 1, part 2. Since the first part mostly repeats what was already said in this post, I am translating only the second part.
In the first part of the article, I talked about simple approaches to building a scalable Selenium cluster without writing any code. In this part we will look at subtler aspects of working with Selenium:
All the new tools described in the first part are, in fact, smart lightweight proxies that redirect user requests to real Selenium hubs and nodes. If you think about it a little, some questions arise:
One way is to use a physical server with one Selenium hub and multiple nodes with different browsers. This looks reasonable but is actually inconvenient:
The easiest way to keep the same number of nodes per hub is to run the hub and its nodes inside a single virtual machine. If each browser version lives in a separate virtual machine, counting the total number of available browsers becomes an elementary school exercise. You can easily add and remove virtual machines that contain compatible versions of the node and the browser. We recommend this approach when setting up a Selenium cluster in the cloud with a constant number of browsers of each version.
What else, besides the Selenium hub and node, must be inside the virtual machine for everything to work?
Virtual machines have no monitor, so the Selenium node must run inside a special X server that emulates a display. This implementation is called Xvfb. It is started like this:
```bash
xvfb-run -l -a -s '-screen 0 1600x1200x24 -noreset' \
    java -jar /path/to/selenium-server-standalone.jar -role node <...>
```
Please note that Xvfb is only needed for the Selenium node process.
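Virtual machines usually have no sound card either. The following script, run as root, sets up a dummy one:

```bash
#!/bin/bash
# Install ALSA packages and the extra kernel modules for the running kernel.
apt-get -y install linux-sound-base libasound2-dev alsa-utils alsa-oss
apt-get -y install --reinstall linux-image-extra-"$(uname -r)"

# Load the dummy sound card module now...
modprobe snd-dummy
# ...and on every boot.
if ! grep -Fxq "snd-dummy" /etc/modules; then
    echo "snd-dummy" >> /etc/modules
fi

# Add the user to the audio group; under sudo, SUDO_USER is the invoking user
# (plain whoami would return root there).
adduser "${SUDO_USER:-$(whoami)}" audio
```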
Selenium is a Java application, so to run it you need to install a Java Virtual Machine (JVM). The smallest Java installation package, the JRE, is about 50 megabytes. The latest Selenium JAR, version 3.0.1, adds another 20 megabytes. Add the operating system, the necessary fonts and the browser itself, and you easily reach several hundred megabytes. Hard drives are cheap now, but we can do better. Selenium 2.0 and 3.0 are also called Selenium WebDriver, because support for different browsers is implemented by separate applications called web drivers.
Here's how it works:
Now that we have figured this out, the question is: isn't it too expensive to spend hundreds of megabytes on simple proxying? A year ago the answer was definitely no, because there was no driver application for Firefox, the browser most frequently used with Selenium. Selenium itself had to launch Firefox, load a special extension into it and proxy requests to the port opened by this extension. Over the past year the situation has changed: starting with Firefox 48.0, Selenium talks to the browser through a separate binary driver, Geckodriver. This means that for most desktop browsers we can now remove Selenium Server entirely and proxy requests directly to the drivers.
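Here is what "proxy requests directly to the drivers" can look like: a minimal Go sketch built on the standard library's `httputil.ReverseProxy`. It is an illustration, not Selenoid's code. So that it runs anywhere, an `httptest` server stands in for a real driver. In practice the target would be something like a local chromedriver, which listens on port 9515 by default. The names `newDriverProxy` and `demo` are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"strings"
)

// newDriverProxy forwards every incoming WebDriver request, unchanged,
// to the driver process listening at driverURL.
func newDriverProxy(driverURL string) (http.Handler, error) {
	target, err := url.Parse(driverURL)
	if err != nil {
		return nil, err
	}
	return httputil.NewSingleHostReverseProxy(target), nil
}

// demo wires the proxy to a fake driver (so no browser is needed)
// and returns what the "driver" saw for a new-session request.
func demo() (string, error) {
	fakeDriver := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "driver got %s %s", r.Method, r.URL.Path)
	}))
	defer fakeDriver.Close()

	proxy, err := newDriverProxy(fakeDriver.URL)
	if err != nil {
		return "", err
	}
	front := httptest.NewServer(proxy)
	defer front.Close()

	resp, err := http.Post(front.URL+"/session", "application/json", strings.NewReader("{}"))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	out, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // driver got POST /session
}
```

The proxy forwards method, path and body untouched. The client therefore behaves exactly as if it were talking to the driver directly, and no Selenium Server is needed in between.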
In the previous sections, I described how to build a Selenium cluster from virtual machines in the cloud. With this approach, the virtual machines are always running and constantly cost money. In addition, the number of available browsers of each version is fixed, so during peak loads the available browsers can run out completely. I have heard of working, and even patented, complex solutions that launch and warm up a pool of virtual machines depending on the current load so that browsers are always available. This works, but can we do better? The main problem of hypervisor virtualization is speed: starting a new virtual machine can take a few minutes. But do we really need a separate operating system for each browser? No, simple disk and network isolation is enough. This is why container virtualization becomes relevant. At the moment, containers work mostly on Linux only, but, as I said, Linux covers 80% of the most popular browsers. Browser containers start in seconds and stop even faster.
What should be inside the container? Almost the same as inside the virtual machine: the browser itself, fonts and Xvfb. For older versions of Firefox (< 48.0), you still need to install Java and Selenium Server. For Chrome, Opera and newer versions of Firefox, the driver application can be the main container process. With a lightweight Linux distribution (for example, Alpine), the containers can be very small.
At the moment, the most popular and well-known container platform is Docker. The Selenium developers provide a set of ready-made Docker images for running Selenium in Standalone or Grid mode. To run a cluster of such images, you have to start and stop containers manually or with tools like Docker Compose. This is much better than installing Selenium from packages, but it would be even better to have a server that behaves as follows:
We built such a daemon... and more.
Over years of running Selenium server at large scale, we realized that using the JVM and the "thick" Selenium JAR just to proxy requests is very inefficient. So we looked for a more lightweight technology and chose the Go programming language, also known as Golang. Why is Go better for our goals?
We never came up with a good name for the daemon described above, so we simply called it Selenoid. To try Selenoid, follow three simple steps:
```json
{
  "firefox": {
    "default": "latest",
    "versions": {
      "49.0": {
        "image": "selenoid/firefox:49.0",
        "port": "4444"
      },
      "latest": {
        "image": "selenoid/firefox:latest",
        "port": "4444"
      }
    }
  },
  "chrome": {
    "default": "54.0",
    "versions": {
      "54.0": {
        "image": "selenoid/chrome:54.0",
        "port": "4444"
      }
    }
  }
}
```
As in the XML file for Gridrouter, the file lists the available browser versions. Selenoid starts containers on the same machine or through the Docker API, so there is no need to specify host names and regions. For each browser version, you specify the Docker image (name and tag) and the port on which the container's main process listens.
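The following small Go sketch shows one way to read such a file. It decodes the configuration above into structs and looks up the image for a requested browser. The type and function names are hypothetical. Treating an empty version as a request for the `default` entry is our assumption about how that key is meant to be used, not a description of Selenoid's internals.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Version is one entry under "versions" in browsers.json.
type Version struct {
	Image string `json:"image"` // Docker image name with tag
	Port  string `json:"port"`  // port the container's main process listens on
}

// Browser is one top-level entry such as "firefox" or "chrome".
type Browser struct {
	Default  string             `json:"default"`
	Versions map[string]Version `json:"versions"`
}

// Config maps browser names to their entries.
type Config map[string]Browser

func loadConfig(data []byte) (Config, error) {
	var cfg Config
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

// resolve finds the image for a browser and version; an empty version
// falls back to the browser's "default" key (assumption, see above).
func (c Config) resolve(name, version string) (Version, error) {
	b, ok := c[name]
	if !ok {
		return Version{}, fmt.Errorf("unknown browser %q", name)
	}
	if version == "" {
		version = b.Default
	}
	v, ok := b.Versions[version]
	if !ok {
		return Version{}, fmt.Errorf("%s %q is not configured", name, version)
	}
	return v, nil
}

const sampleConfig = `{
  "firefox": {"default": "latest", "versions": {
    "49.0":   {"image": "selenoid/firefox:49.0", "port": "4444"},
    "latest": {"image": "selenoid/firefox:latest", "port": "4444"}}},
  "chrome": {"default": "54.0", "versions": {
    "54.0": {"image": "selenoid/chrome:54.0", "port": "4444"}}}
}`

func main() {
	cfg, err := loadConfig([]byte(sampleConfig))
	if err != nil {
		panic(err)
	}
	v, err := cfg.resolve("firefox", "")
	if err != nil {
		panic(err)
	}
	fmt.Println(v.Image, v.Port) // selenoid/firefox:latest 4444
}
```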
```bash
$ selenoid -limit 10 -conf /etc/selenoid/browsers.json
```
By default, Selenoid listens on port 4444, just like a regular Selenium hub.
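Because Selenoid listens where a hub would, an existing Selenium client only needs its hub URL changed. As an illustration, this Go sketch builds, without sending, the new-session request such a client would POST. It assumes the JSON Wire Protocol `desiredCapabilities` format used by Selenium 3 clients of this period, and the usual `/wd/hub/session` path. The `browserName` and `version` values correspond to the browser and version keys in browsers.json. Function and type names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type capabilities struct {
	BrowserName string `json:"browserName"`
	Version     string `json:"version"`
}

type newSessionBody struct {
	DesiredCapabilities capabilities `json:"desiredCapabilities"`
}

// sessionPayload encodes a JSON Wire Protocol new-session body.
func sessionPayload(browser, version string) ([]byte, error) {
	return json.Marshal(newSessionBody{capabilities{browser, version}})
}

// newSessionRequest builds (but does not send) the request a client
// would POST to a Selenium-compatible hub such as Selenoid.
func newSessionRequest(hubURL, browser, version string) (*http.Request, error) {
	body, err := sessionPayload(browser, version)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, hubURL+"/wd/hub/session", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newSessionRequest("http://localhost:4444", "firefox", "49.0")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL) // POST http://localhost:4444/wd/hub/session
	// http.DefaultClient.Do(req) would actually ask the hub for a browser.
}
```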
Our experiments show that even containers with a standard Selenium server inside start in a few seconds. In return, you get a guaranteed disk and memory state: the browser is always as it would be right after installation on a clean operating system. In addition, you can install Selenoid on a large cluster of hosts that share the same set of supported browsers stored as Docker images. This gives you a large Selenium cluster that scales automatically with browser demand. For example, if current user requests need more Chrome browsers, more Chrome containers are launched automatically. When there are no requests for Chrome, the Chrome containers are stopped and the freed hosts can be used for other browsers.
To spread the load better across the cluster, Selenoid limits the total number of concurrent sessions on each host and queues all requests that exceed the limit. Queued requests are processed as earlier sessions on the same host complete.
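A common Go idiom for "at most N sessions, everyone else waits" is a buffered channel used as a semaphore. The sketch below is a simplified model built on that assumption, not Selenoid's actual queue. `Acquire` blocks once all slots are taken and unblocks when a running session calls `Release`. The `-limit 10` flag above would correspond to a channel of capacity 10. Names are hypothetical. Go does not guarantee strict first-come, first-served ordering among blocked goroutines.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type sessionLimiter struct {
	slots chan struct{}
}

func newSessionLimiter(n int) *sessionLimiter {
	return &sessionLimiter{slots: make(chan struct{}, n)}
}

// Acquire blocks while all n slots are taken; blocked callers form the queue.
func (l *sessionLimiter) Acquire() { l.slots <- struct{}{} }

// Release frees a slot so one waiting caller can proceed.
func (l *sessionLimiter) Release() { <-l.slots }

// runSessions starts total sessions under the given limit and returns
// the highest number of sessions that were ever running at once.
func runSessions(limit, total int) int {
	l := newSessionLimiter(limit)
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		running int
		peak    int
	)
	for i := 0; i < total; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.Acquire()
			defer l.Release()

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			time.Sleep(10 * time.Millisecond) // stands in for a running test session

			mu.Lock()
			running--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak concurrent sessions:", runSessions(10, 50))
}
```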
But Selenoid can do more than run containers: it can also start web driver processes on demand. The main use case is replacing Selenium Server on Windows. In this case, Selenoid starts an IEDriverServer process directly, which saves memory and avoids proxying errors in Selenium itself.
You may remember that the original Gridrouter is a Java application. We wrote a lightweight implementation of this proxy in Go from scratch and simply called it Go Grid Router (ggr). What are the advantages of the new version over the old one?
Together with Selenoid, ggr lets you build a scalable and reliable Selenium cluster:
In this part, I described the latest technologies you can use to build a modern Selenium cluster:
Finally, here are links to all the products mentioned in the article, collected in one place:
Source: https://habr.com/ru/post/322742/