
We put Selenium Grid on Apache Mesos wheels

Hi, Habr! My name is Nastya, and I do not like queues. So, using the example of Alpha Laboratories and our research, I will tell you how to organize the infrastructure and architecture for running tests so that results arrive several times faster. For example, we got the total test time for an application down to 5 minutes. To do this, we had to change the way we launch Selenium Grid.



Before I start talking about Selenium Grid itself and everything related to it, I want to explain the problem we were trying to solve.
Last year, we introduced DevOps as a process. At some point, while automating everything we could, we realized that the time to market for each artifact at the testing stage should not exceed 30 minutes. Conceptually, we wanted releases that do not need acceptance testing to pass through automatically. For artifacts that have to be checked by hand, 30 minutes is the time in which the tester receives the results of the autotest run, analyzes them, and also performs acceptance testing. At the same time, autotests should be launched automatically within our pipeline.

To achieve this goal, we needed to speed up the autotest runs. But besides speeding them up, we had to make sure that, with so many projects, no queue formed to launch them.

Most often, the task of speeding up an autotest run is solved in one of two ways:


Our company follows the second approach, but not because we have no money. I am an engineer and, like many engineers, lazy about such matters. So I decided to take the more difficult and more interesting path, and at the same time save the bank that bag of money.

So, the goal is clear: speed up autotest runs and eliminate the queues without raising additional funding.

At the very beginning we had a rather small fleet of 15 virtual machines.


In total, we have about 20 projects with autotests, which are launched at different times and with different frequencies.

Our teams:


All teams focus on delivering value to the customer quickly. Of course, no one wants to “hang” in a queue to run autotests.

We were sorely short of resources. Why? Let's look at a specific example:

  1. We have a project with about 30 tests (this is an average figure).
  2. If we run the tests in one thread, the run takes at least 30 minutes.
  3. Our goal is to fit within 10 minutes, which means we need to parallelize the test run across several browsers and, accordingly, across several machines.
  4. So, we run these tests in parallel in at least 3 threads. In practice, each project generates from 5 to 10 threads.
  5. Now let's remember our 20 projects. If everyone wants to run autotests at the same time, at least 60 test sessions have to be started to avoid a queue.
  6. Only 40 of them will start, given 2 sessions per virtual machine.
  7. The rest will wait in the queue for at least 10 minutes.
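
The arithmetic in the list above can be sketched in a few lines of Python. This is only an illustration: the function names are made up, the figure of one minute per test is an assumption derived from "30 tests take at least 30 minutes", and the 40 available sessions are taken from the list as stated.

```python
import math


def threads_for_deadline(tests: int, minutes_per_test: float, deadline_minutes: float) -> int:
    """Smallest number of parallel threads that fits the run into the deadline."""
    return math.ceil(tests * minutes_per_test / deadline_minutes)


def sessions_needed(projects: int, threads_per_project: int) -> int:
    """Sessions required if every project starts its tests at the same moment."""
    return projects * threads_per_project


def sessions_queued(needed: int, available: int) -> int:
    """Sessions that have to wait because they cannot be started right away."""
    return max(0, needed - available)


if __name__ == "__main__":
    # Assumption: about 1 minute per test (30 tests, 30 minutes in one thread).
    threads = threads_for_deadline(tests=30, minutes_per_test=1, deadline_minutes=10)
    needed = sessions_needed(projects=20, threads_per_project=threads)
    queued = sessions_queued(needed, available=40)
    print(threads, needed, queued)  # prints: 3 60 20
```

With the more realistic 5 to 10 threads per project, the same functions give 100 to 200 sessions, so the queue only gets longer.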

Note that we have considered a very optimistic case, with few tests per project and only 3 threads. There is not enough hardware, so we need to think about how to lighten the load on the virtual machines. What if we move from virtual machines to Docker containers?

We did the math:


Let's look at the configuration of one Docker container that lets us run tests in 1 thread, and compare it with what we had when using virtual machines:
500 MB of RAM, 0.01% of a core, and 400 MB of disk.



It turns out that we can run 120 containers at the same time!

This not only covers our need for 60 sessions, but also gives us a reserve for the future. After all, the number of teams keeps growing, which means the number of projects being launched grows too. So it became obvious that we need to take the available resources and combine them into a single pool of computing power, also called a sandbox. Once they are combined, we do not want to think in terms of individual hosts or virtual machines. We just want a space that we can connect to through an API and in which we can create our own Docker containers to run tests.

Dynamic sandbox


So, we need to create a sandbox of computing resources. It has to be dynamic, i.e. we should be able to connect resources to it and disconnect them from it at any time. The hosts we connect can have different configurations and sit on different subnets; the only thing that matters to us is that they can communicate with each other over certain IPs and ports. A dynamic sandbox is also called a cloud or a cluster, and it gives us an interface for creating and managing Docker containers.

Once we understood how we wanted to solve the problem, we built our sandbox by combining our hosts into a cluster with Apache Mesos and Marathon.



Thus, we get a common space of computing resources with its own API. The API is provided by Marathon, and Apache Mesos unites the hosts.

Test orchestrator: Selenium Grid to the rescue


We decided that we needed a cluster, and even created one. But how are we going to run tests in it? Remember that we still want to receive test results in no more than 10 minutes.

This is where parallelizing the test run should come to our aid.

To solve this problem, we need a centralized tool that can run tests and parallelize them across several threads for each project. There are several popular tools.


Although my story is about how we ran Selenium Grid in Docker containers, we will first look at how the grid works on virtual machines.



In fact, the whole procedure consists of 3 actions:

1. We copy Selenium Standalone Server (the version we need) to some directory.
2. Then we run the command that launches this server in the mode we need: hub or node. Note that the same jar file, copied to different hosts, performs both roles.

$ java -jar selenium-server-standalone.jar -role hub 

3. We configure the node. The set of browsers and their parameters is specified either on the command line or in a JSON file.

 $ java \
     -jar selenium-server-standalone.jar \
     -role node \
     -hub http://host1:4444/grid/register

What the hub does after the grid starts



What the node does



How is starting the grid in Docker containers different?


1. The node at the time of start is already configured.

Let's look inside the node container. The JSON config for the node is generated inside the container, and the server reads its parameters from this file:

 /opt/selenium/generate_config > /opt/selenium/config.json 

Moreover, if we look at the Dockerfile of the node, we will see that environment variables are set while the node environment is configured, and these are then written to this config. Thus, we do not need to dig into the “guts” of the container to change the node's launch parameters; we only need to override the values of the variables specified in the Dockerfile. And that's all.

2. When we start a node in a container, we can always be sure that the environment already contains a browser and its driver, because everything is installed and configured when the image is built.

 /opt/selenium$ ls
 chromedriver-2.29  selenium-server-standalone.jar  config.json

3. There is also a shell script that runs when the container starts. In this script we can see that the Java server is launched as soon as the container is up.

 $ java ${JAVA_OPTS} -jar /opt/selenium/selenium-server-standalone.jar \
     -role node \
     -hub http://$HUB_HOST:$HUB_PORT/grid/register \
     -nodeConfig /opt/selenium/config.json \
     ${SE_OPTS} &

The same applies to the hub.

As a result, launching Selenium Grid in a container comes down to one command: starting the Docker container.

Static grid problem


Although the hub handles queues and timeouts well, we ran into timeout problems as soon as we started using a static grid. If the hub and a node had been idle for a long time, then on the next connection the session being created on the node would sometimes drop, either because of timeouts or because RemoteWebDriver could not start the browser. All these problems were cured by restarting the grid, and that is when we realized that an on-demand Selenium Grid would be the solution for us.

We also did not want a static grid to simply take up space in a cluster that is already small in our case. And what do we do when different projects need different grid configurations, when one project needs one browser version and another needs a different one? Obviously, keeping grids running all the time is not a good idea.

Selenium Grid On-Demand


Therefore, we wanted to bring up Selenium Grid on request. I will explain with an example:


It looks like an ideal concept. With this approach we solve two problems at once: the degradation of the grid, and the lack of space in the cluster for keeping various grid configurations.

Automating the creation of Selenium Grid On-Demand


To solve this problem, we had to write a script that creates the grid automatically. We did it with Ansible, writing the necessary roles. I will not explain what Ansible is. But you can just as well write such a script in bash or in any programming language, as long as it gives you two commands: one to create a grid and one to delete it.

Remember that starting a grid consists of running a couple of commands, and each command has its own parameters. To automate the launch of these commands, the parameters only need to be calculated automatically before the command runs. Or hardcoded.

We cannot hardcode them, because we do not know in advance on which host and port the Selenium Grid components will come up: Apache Mesos decides that for us.

Of course, we could work around this and manually track the open ports and hosts on which we bring up Selenium Grid, but then why would we need Apache Mesos and Marathon at all if we do everything by hand?

So, we had to automate the calculation of the following parameters:


The Marathon API helped us here: through it we obtained the host and port on which the hub came up, and then passed these values to the node before starting it. So, here is what we have:
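
As an illustration of that lookup, here is a minimal Python sketch. It assumes the hub runs as a Marathon app and that its address is read from Marathon's `GET /v2/apps/{appId}/tasks` endpoint, whose response lists each task with its `host` and `ports`. The Marathon URL, the app id and the function names are hypothetical. Our real implementation lives in Ansible roles, so this is not that code.

```python
import json
from typing import Tuple
from urllib.request import Request, urlopen


def hub_address(tasks_payload: dict) -> Tuple[str, int]:
    """Extract (host, port) of the first hub task from a Marathon tasks response."""
    tasks = tasks_payload.get("tasks", [])
    if not tasks:
        raise LookupError("the hub task is not running yet")
    task = tasks[0]
    return task["host"], task["ports"][0]


def fetch_hub_address(marathon_url: str, app_id: str) -> Tuple[str, int]:
    """Ask Marathon where the hub app was started (performs an HTTP request)."""
    url = f"{marathon_url.rstrip('/')}/v2/apps/{app_id.strip('/')}/tasks"
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=10) as response:
        return hub_address(json.load(response))


def register_url(host: str, port: int) -> str:
    """Build the value passed to the node through its -hub parameter."""
    return f"http://{host}:{port}/grid/register"


if __name__ == "__main__":
    # Hypothetical response fragment; real responses contain more fields.
    sample = {"tasks": [{"host": "192.168.1.5", "ports": [20345]}]}
    print(register_url(*hub_address(sample)))
```

A caller would typically poll `fetch_hub_address` until the hub task appears, and only then start the nodes with the resulting URL.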

Deploy Selenium Grid
 $ ansible-playbook -i inventory play-site.yml \
     -e test_id=mytest \
     -e nodes_type=chrome \
     -e nodes_count=4

test_id:
nodes_count:
nodes_type: [chrome|firefox]

Delete Selenium Grid
 $ ansible-playbook -i inventory play-site.yml \
     -e test_id=mytest \
     -e clean=true

Shell scripts executed on Jenkins calculate the values of these variables automatically and pass them on before the Ansible playbook runs. The test run is built into the pipeline using Job DSL.

 export grid_name=testproject
 export nodes_count=$(find tests -name "*feature" \
     | grep -v build | grep -v classes | grep features | wc -l)
 cd ansible
 ansible-playbook -i inventory play-site.yml \
     -e test_id=$grid_name \
     -e nodes_type=chrome \
     -e nodes_count=$nodes_count
 export hub_url=$(cat hub.url)
 currentdir=$(pwd)
 cd ../tests
 ./gradlew clean generateCucumberReport \
     -i -Pbrowser=$browser -PremoteHub=$hub_url

As soon as we solved this problem and learned to bring up Selenium Grid in our cluster, we hurried to run the tests, and this is where we were disappointed. The tests did not run; moreover, the hub could not even open a session on a node.

The problem of raising Selenium Grid On-Demand in a distributed cluster


Let's see what our scripts lacked.

Take another look at what the command would look like if we started a Selenium Grid node in a Docker container by hand every time:

 $ docker run -d -p 6666:5555 selenium/node-chrome 

Do you see two ports? Some of you are probably wondering where the second port came from. Docker has an internal port and an external port. The external port is opened on the host and forwarded into the container. The internal port is the one that the Selenium Standalone Server process, running in node mode, listens on inside the container.

In this example, all requests to port 6666 of the host are forwarded to port 5555 of the node inside the container.

Running a node in Marathon


When configuring an Apache Mesos cluster, we specify a range of ports for each host. This range is used for the containers started by Marathon.

For example, if we set a range of 20000-21000, then our containers will receive a random port from this range.

On the agent, Marathon runs something like this:

 $ docker run -d -p <?>:5555 selenium/node-chrome 

When the container is launched, the next free port is selected and substituted for the question mark. Thus, when the node starts in bridge network mode, we get a port mapping:

 $ docker run -d -p 20345:5555 selenium/node-chrome 

Marathon starts a container on a random host and a random port.
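
For illustration, here is roughly how such a node could be described to Marathon. This Python sketch builds an app definition in the format used by Marathon versions before 1.5 (Docker `network` set to `BRIDGE` and a `portMappings` entry with `hostPort` 0, which asks for a random port from the configured range). The app id scheme and the `cpus` and `mem` values are placeholders, not our production settings.

```python
import json


def node_app_definition(test_id: str, index: int,
                        image: str = "selenium/node-chrome",
                        container_port: int = 5555) -> dict:
    """Marathon app definition for one grid node in Docker bridge mode."""
    return {
        "id": f"/{test_id}/node-{index}",  # hypothetical naming scheme
        "instances": 1,
        "cpus": 0.5,   # placeholder resource values
        "mem": 512,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                "portMappings": [
                    # hostPort 0: a free port from the range is assigned
                    # and exposed to the container as PORT0.
                    {"containerPort": container_port, "hostPort": 0, "protocol": "tcp"}
                ],
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(node_app_definition("mytest", 1), indent=2))
```

The resulting JSON is what would be sent to Marathon's `POST /v2/apps` endpoint to start the node.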

The node sends the wrong coordinates.


Docker containers run in bridge mode by default. What does this mean for us? It means the node will not see its real IP and port! Suppose Apache Mesos has brought up a node for us on host 192.168.1.5 and port 20345. The node process inside the container will think that it is running on some 172.17.0.2 and that its port is 5555.

 host = 172.17.0.2 port = 5555 

And it will register on the hub with that return address. Naturally, the hub will not find it at this address, and when tests run, the hub will not be able to start a browser session.


Solving the problem of registering nodes on the hub


But there is also host mode, in which a container uses the host's ports directly and there is no such thing as an internal port.

When we thought about solving this problem, we naturally asked ourselves: why start the container with a network bridge at all, and why not use host mode? We specify the one port we come up on, and both the container and the Selenium server use it directly.

But no such luck. A Docker container has no display, yet our tests have to run in it and we also need to take screenshots, so we use an Xvfb server, which also occupies a port when the container starts. Because of that, host mode does not suit us at all. We will have to make bridge mode work somehow.

Container environment variables


When Marathon starts a container, it puts the actual host and ports on which the container was started into the container's environment variables.

That is, the container has the variables HOST and PORT0.
This means that inside the container we know which host it is deployed on and which external ports it has.

To get everything working, the host and port values sent in the registration request must contain the values of the container's HOST and PORT0 variables.

 { … "host": "$HOST", "port": "$PORT0", … } 

The host is easy to specify: Selenium has a dedicated setting for it.

The port is harder. If we pass PORT0 as the port, Selenium will not only register on the hub with it, but also start listening on it! Why is this a problem?

For example, Apache Mesos gave us external port 20765. When the container starts, the mapping 20765:5555 is created. The second number is hardcoded in our config from the start. Docker expects the node inside the container to listen on 5555, and forwards connections from external port 20765 there.

But if we pass -port 20765 to the node, it will listen on 20765 inside the container, not on 5555, and no request from outside will be handled.


You may have already guessed that the problem can be solved by splitting the notion of a port into two separate ones: the port on which the node listens, and the port that it reports to the hub. In a Docker environment, these values usually do not match.

How do we tell the node about these two ports?
We can't.

Out of the box, Selenium Standalone Server cannot do this.
We need to patch Selenium.
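
The reasoning about the two ports can be checked with a toy model. The Python sketch below is not Docker or Selenium code. It only encodes the two conditions described above: the hub must call a port that is actually published on the host, and the node must listen on the container port that Docker forwards to. The function name is made up for this example.

```python
def hub_can_reach_node(host_port: int, container_port: int,
                       listen_port: int, advertised_port: int) -> bool:
    """Toy model of a Docker port mapping host_port:container_port.

    listen_port     - port the node process listens on inside the container
    advertised_port - port the node reports to the hub at registration
    """
    published = advertised_port == host_port   # the hub knocks on an open host port
    served = listen_port == container_port     # Docker forwards to a listening port
    return published and served


if __name__ == "__main__":
    # Default bridge mode: the node listens on 5555 and also reports 5555.
    print(hub_can_reach_node(20765, 5555, listen_port=5555, advertised_port=5555))    # False
    # Passing -port 20765: correct report, but nothing listens on 5555 any more.
    print(hub_can_reach_node(20765, 5555, listen_port=20765, advertised_port=20765))  # False
    # Two separate ports: listen on 5555, report 20765.
    print(hub_can_reach_node(20765, 5555, listen_port=5555, advertised_port=20765))   # True
```

Only the last combination works, which is why a single port setting is not enough in bridge mode.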

Patching Selenium Server


The code of Selenium itself is on GitHub. So we decided to add some more... wonderful code to Selenium Standalone Server.

We added an advertisePort parameter.

 @Expose( serialize = false )
 @Parameter(
   names = "-advertisePort",
   description = "<Integer> : advertise port of Node. " +
     "This port is sent to Hub for backward communication with this node."
 )
 public Integer advertisePort;

And the condition in the registration method on the server.

 if (registrationRequest.getConfiguration().advertisePort != null) {
   registrationRequest.getConfiguration().port =
     registrationRequest.getConfiguration().advertisePort;
 }

Now, if the advertisePort parameter is set when the node starts, it is used instead of the standard port during registration on the hub. This is a local patch; we have not yet made a pull request to the Selenium repository. Once we have fully run in our scheme, we will.

With this parameter, nodes register on the hub correctly. We checked: it works. The tests run.

And yes, we used Marathon because our developers use it. This is essentially a proof of concept. In general, though, this framework is not ideal for running Selenium Grid, because it is geared towards long-running tasks such as services and UI applications.
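
To show how the pieces fit together, here is a hedged Python sketch of how a container start script could derive the node's arguments from the HOST and PORT0 variables set by Marathon. The `-advertisePort` flag exists only in our local patch, not in upstream Selenium. The helper names are hypothetical, and `registered_port` merely mirrors the condition from the patch.

```python
import os
from typing import List, Mapping, Optional


def registered_port(listen_port: int, advertise_port: Optional[int]) -> int:
    """Port sent to the hub: the advertised port if set, else the listen port."""
    return advertise_port if advertise_port is not None else listen_port


def node_arguments(env: Mapping[str, str], listen_port: int = 5555) -> List[str]:
    """Node command-line arguments derived from Marathon's HOST and PORT0."""
    return [
        "-role", "node",
        "-host", env["HOST"],             # external host the hub should call back
        "-port", str(listen_port),        # port the node listens on inside the container
        "-advertisePort", env["PORT0"],   # local patch only: external port sent to the hub
    ]


if __name__ == "__main__":
    # Fall back to example values when run outside a Marathon container.
    env = {"HOST": os.environ.get("HOST", "192.168.1.5"),
           "PORT0": os.environ.get("PORT0", "20345")}
    print(" ".join(node_arguments(env)))
```

The node keeps listening on 5555, where Docker forwards traffic, while the hub receives the external port from PORT0.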

Conclusions


A dynamic organizational environment requires dynamic resource management. A static setup will break against process problems.

Therefore, our test run system consists of the following components:


We did not need additional funding. And we sped up the test run not just to 10 minutes, but to 5. The average for our projects is now exactly 5 minutes: 2 minutes for bringing the grid up and tearing it down, building the project, and so on, and 3 minutes for running the test suites.

Was the result worth the effort? Of course: in the end, we sped up the test run at least twofold.

If you do not like queues either and are trying to speed up your test runs, perhaps our experience will be useful to you.

By the way, if posts about testing make your heart beat faster and you feel like doing something similar, please note that we have a vacancy for a tester.

And if you have any questions or clarifications, be sure to write in the comments.

Source: https://habr.com/ru/post/331434/

