Automating stability testing of the RTLS server under internal load

Introduction

In this article, I describe how the quality control department at RTL-Service automatically tests the stability of the RealTrac server while it services a large number of mobile radio devices at the same time. First, some useful terminology:
RealTrac RTLS server (server): the RealTrac server software that interacts with the system hardware and calculates the location of devices.

RealTrac Application Server (application server): the server software that backs the web application and provides a programming interface to the main functions of the system.

The RealTrac access point (AP) is a device that transfers data between mobile network devices and the system server. Access points are permanently installed at the facility; their coordinates are marked on the map in the client software and stored in the database on the system server. An AP can operate in gateway or repeater mode. The mode is determined by whether the AP has a wired Ethernet connection to the network (gateway access point) or not (repeater access point). Only gateways exchange data with the server. An example of an access point is shown in Figure 1.
Fig. 1. Example access point.

A mobile device (MU) is a mobile radio node that makes it possible to determine, in real time, the location of the person or object it is attached to. Depending on its type, the device can perform additional functions, such as audio transmission. An example of a mobile device is shown in Figure 2.

Fig. 2. An example of a mobile device.

Polling cycle (alive cycle): the period with which devices send live data about their state.
Figure 3 shows the RTLS system architecture. The components tested in this task are inside the area outlined in black.

Fig. 3. High-level RealTrac architecture.

The internal load on the server is created by the devices of the RealTrac wireless segment that interact with it, and is characterized by their number and the intensity of data exchange. The data is transmitted over the internal protocol, INCP.
The external load consists of requests to the public API, RTLSCP.

Problem statement

The task was to determine whether the server software remains stable under a long-term load of 2000 mobile devices with a polling cycle of 2 seconds, i.e. to check for freezes, crashes, or errors in the software. We also needed to measure processor and memory consumption (maximum, minimum, and average).

Initially, this was done manually, but it quickly became clear that the task should be automated as soon as possible.
Inside the department, the problem was divided into the following subtasks:
1. Determine the testing approach and tools.
2. Write configuration files for the internal load generator.
3. Implement the testing system.
4. Automate the launch of tests.

Testing approach

Due to the specifics of the problem, we decided to implement our own small system for automated stability testing. The core of the test system is a controller application, which at certain points in time sends scripts to the "slave" machines over ssh and runs them. The scripts perform two main functions: deploying the system on a remote machine and monitoring resources. Figure 4 shows the interaction scheme.

Fig. 4. The interaction scheme of test servers.
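
The controller's two operations against a slave, copying a script and then running it, can be sketched as follows. This is a minimal illustration in Python rather than the Java controller itself; the user name, host address, script names, and paths are hypothetical, and the functions only build the command lines without executing them.

```python
# Sketch only: builds scp/ssh command lines for a "slave" machine.
# The user, host, script name and paths below are hypothetical examples.

def build_copy_command(user, host, local_script, remote_dir):
    """Command that copies a local script to the slave over scp."""
    return ["scp", local_script, "{0}@{1}:{2}".format(user, host, remote_dir)]


def build_run_command(user, host, remote_script, *args):
    """Command that runs a previously copied script on the slave over ssh."""
    return ["ssh", "{0}@{1}".format(user, host), remote_script] + list(args)


if __name__ == "__main__":
    copy_cmd = build_copy_command("user", "192.168.1.2", "./monitor.py", "/home/user/scripts/")
    run_cmd = build_run_command("user", "192.168.1.2", "/home/user/scripts/monitor.py", "rtls")
    print(" ".join(copy_cmd))
    print(" ".join(run_cmd))
```

Returning the command as a list keeps the arguments separate, so the same lists could later be passed to a process runner without extra quoting.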

The next task was to automate the testing cycles. Here we did not reinvent the wheel and used our local build server: the tests run as part of a build, and the build is started on a predetermined schedule.

Internal load generation

To avoid constantly using real devices in tests and to be able to load the server with any number of devices, the company developed incptester, an application that emulates wireless segment devices (APs and MUs).
From the tester's point of view, it is enough to know that incptester is set up through a configuration file. The file has the following form:

[general]
geo_lat=61.786838
geo_lng=34.353548
geo_alt=1

[tracks]
id=1,type=POLY,TF=0,VRT=(20:0:0)(21:0:30)(60)(40:0:0)(60)

[devices]
mac=CF0000000000,devtype=1,ip=127.0.1.1,cycle=30000,x=0.0,y=0.0,z=0
mac=C00000000001,devtype=2,ip=127.0.0.2,cycle=30000,x=5.0,y=0.0,z=0
mac=000000BAD001,devtype=4,cycle=2000,track=1
mac=000000BAD002,devtype=6,cycle=2000,track=1

The [general] block defines the geographic coordinates of the center point, from which the local coordinates (x, y, z) are measured. The [tracks] block describes the trajectories along which mobile devices move; a trajectory is described by a sequence of points. The [devices] block lists the emulated devices, which imitate the behavior of real ones.

In addition to the MAC address, each device has a devtype parameter that determines the type of device (1 and 2 are stationary APs, 3 to 6 are MUs) and a cycle parameter that sets the polling cycle of the device. The values in the sample appear to be in milliseconds, so 2000 corresponds to the 2-second cycle from the problem statement. For stationary access points, you specify a fixed position through coordinates. For mobile devices, you can set the trajectory along which they move.
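
To make the file layout concrete, here is a minimal Python sketch that reads such a config into sections and records. It is not part of incptester; the parsing rules (one `key=value` list per line, sections in square brackets) are an assumption inferred from the sample above, and the function names are hypothetical.

```python
# Sketch only: parses incptester-style config text into sections of records.
# The layout is inferred from the sample config shown above.

def parse_record(line):
    """Split 'key=value,key=value' into a dict; values stay strings."""
    record = {}
    for pair in line.split(","):
        key, _, value = pair.partition("=")
        record[key.strip()] = value.strip()
    return record


def parse_config(text):
    """Return {section_name: [record, ...]} for every [section] in the text."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(parse_record(line))
    return sections


SAMPLE = """\
[general]
geo_lat=61.786838

[devices]
mac=CF0000000000,devtype=1,ip=127.0.1.1,cycle=30000,x=0.0,y=0.0,z=0
mac=000000BAD001,devtype=4,cycle=2000,track=1
"""

if __name__ == "__main__":
    config = parse_config(SAMPLE)
    # devtype 3..6 marks mobile devices, as described in the text.
    mobiles = [d for d in config["devices"] if 3 <= int(d["devtype"]) <= 6]
    print(len(config["devices"]), "devices,", len(mobiles), "mobile")
```

A parser like this is handy for sanity checks, for example counting how many mobile devices a generated config actually contains before starting a long run.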

So, to generate the required load, you need to create an incptester configuration file with the required number of devices.
To avoid entering 2000 devices into the config by hand, we wrote a small bash script that appends the specified number of lines to the base file.

add_devices.sh

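#!/bin/bash
# Appends <dev_num> emulated mobile devices to an incptester config.
set -e

if [ "$#" -ne 2 ]; then
    echo "Usage: ${0} <base_config> <dev_num>"
    exit 1
fi

INCPTESTER_CONF_PATH="./${1}.conf"

# Start from the base config if the target file does not exist yet.
if [ ! -f "${INCPTESTER_CONF_PATH}" ]; then
    cat ./incptester_geo-base.conf > "${INCPTESTER_CONF_PATH}"
fi

# Print the sum of two hex numbers as 12 upper-case hex digits.
# A function is used because aliases are not expanded in non-interactive bash.
get_next_mac() {
    python -c 'import sys; print("{:012X}".format(int(sys.argv[1], 16) + int(sys.argv[2], 16)))' "$1" "$2"
}

# Take the MAC address from the last non-empty line of the config.
last_mac=$(tac "${INCPTESTER_CONF_PATH}" | grep -m 1 . | grep -o -E "[A-Fa-f0-9]{12}")

for count in $(seq "${2}"); do
    next_mac=$(get_next_mac "0x${last_mac}" "0x1")
    echo "mac=${next_mac},devtype=6,cycle=2000,track=2" >> "${INCPTESTER_CONF_PATH}"
    last_mac=${next_mac}
done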
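
The same idea, consecutive MAC addresses appended as device lines, can also be sketched in Python alone. This is an illustrative alternative, not the script we use; the function names are hypothetical, and the default devtype, cycle, and track values simply mirror those in add_devices.sh.

```python
# Sketch only: generates device lines with consecutive MAC addresses.
# devtype, cycle and track defaults mirror the values used in add_devices.sh.

def next_mac(mac, step=1):
    """Return the MAC that follows `mac` as 12 upper-case hex digits."""
    return "{:012X}".format(int(mac, 16) + step)


def device_lines(last_mac, count, devtype=6, cycle=2000, track=2):
    """Yield `count` config lines, continuing after `last_mac`."""
    mac = last_mac
    for _ in range(count):
        mac = next_mac(mac)
        yield "mac={0},devtype={1},cycle={2},track={3}".format(mac, devtype, cycle, track)


if __name__ == "__main__":
    for line in device_lines("000000BAD002", 3):
        print(line)
```

Treating the MAC address as one hexadecimal number makes the carry between digits automatic, so 000000BAD00F is followed by 000000BAD010.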

Software used

We decided to implement the controller application in Java. To describe the test scenarios, we used the Cucumber library and, accordingly, the Gherkin language. We took Gradle as the build tool. The build was integrated into our local Hudson server.

Since the system runs on Debian Linux, it makes sense to write the scripts that interact with the OS in bash. This includes installing and removing deb packages and configs. To monitor running processes, we used the psutil package for Python, periodically dumping the values of consumed resources to a csv file.

The main scenario

The scenario is divided into the following steps:
1. Remove previous packages from test servers.
2. Prepare configuration files for deb packages.
3. Copy configuration files and scripts to test servers.
4. Deploy the system to test servers.
5. Run incptester with the configuration for 2000 mobile devices on slave1.
6. Start the server with internal load for a certain amount of time on slave1 with parallel monitoring of resources.
7. Start the application server on slave2, configured to work with the main server running on the other slave, with parallel monitoring of resources.

Implementation

In this section, I briefly present the main points of the testing system implementation.
The test scenario translates easily into a Gherkin feature. Steps can include parameters, which are passed to the method that executes the step. It looks something like this:

@Load_stability_geo
Feature: Load_stability_geo
This test starts the large number of the devices and monitors the system resources

@Install
Scenario: Installation of RealTrac system in geo mode
Given I delete the previous Realtrac-server from the both test-servers
And I prepare all deb configs
And I copy all configs and scripts to the test servers
And I install the main Realtrac-server on the test-server and incptester and stop service rtlserm for geo configuration
Then Run first part of the test with the inside load for 11520 steps
Given I install the app server
Then Run second part of the test with the inside load for 11520 steps

For the scenario, we write a separate class, LoadStabilityGeo. The class contains the methods that perform the steps from the feature. Here is an example of passing a parameter to a method; the parameter is extracted from the step text by a regular expression.

import rtls.test.utils.RTLSUtils;
import cucumber.api.java.en.And;
import cucumber.api.java.en.Then;
import cucumber.api.java.en.When;

public class LoadStabilityGeo {

    // some other methods

    // The number of steps is captured from the step text by the (\\d+) group.
    @Then("^Run first part of the test with the inside load for (\\d+) steps")
    public void First_Monitoring(int count_time) throws ScriptFaildException, Exception {
        int times = 0;
        int check_time_min = 10;   // resources are verified on every 10th step
        double minutes = 0.5;      // pause between steps, in minutes

        monitoring_file = NameFileDataFormat("monitoring", "csv");
        path_monitoring_file = "/home/" + user + "/TestResult/" + monitoring_file;

        while (times <= count_time) {
            // Append the current resource usage of the server to the csv file.
            run_ssh_cmd("resource_get.py " + path_monitoring_file + " rtls", "main_server");
            if ((times % check_time_min) == 0 && times != 0) {
                checkResource("rtls", rtlscp_port, rtlscpip, check_time_min);
            } else {
                Sleep_time(minutes);
            }
            times = times + 1;
        }
        System.out.println("The first part of the stress test in geo-mode is successfully finished");
    }
}
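
To get a feel for how long a run with 11520 steps lasts, the loop in First_Monitoring can be mirrored in a few lines of Python. This is only an arithmetic sketch with a hypothetical function name: it counts samples, checks, and pauses, and does not account for the time spent inside the monitoring call or inside checkResource, so the real run is somewhat longer.

```python
# Sketch only: mirrors the loop structure of First_Monitoring to estimate
# how many samples, checks and pauses a run contains. It runs no tests.

def plan_run(count_time, check_every=10, pause_minutes=0.5):
    """Return (samples, checks, minutes spent in pauses) for the loop."""
    samples = checks = pauses = 0
    times = 0
    while times <= count_time:
        samples += 1                      # one monitoring sample per step
        if times % check_every == 0 and times != 0:
            checks += 1                   # resource check instead of a pause
        else:
            pauses += 1
        times += 1
    return samples, checks, pauses * pause_minutes


if __name__ == "__main__":
    samples, checks, minutes = plan_run(11520)
    print(samples, checks, minutes, round(minutes / 60.0, 1))
```

For 11520 steps this gives 11521 samples, 1152 checks, and 5184.5 minutes of pauses, which is about 86.4 hours.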

A separate class RTLSUtils was also written, containing static methods for working with scripts (execution / checking of results) and other general methods. An example of a method for running a command in the OS:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class RTLSUtils {

    // some other methods

    public static void executeCommand(String command) throws ScriptFaildException {
        try {
            String line;
            System.out.println("Execute " + command);
            String[] env = new String[]{"DEBIAN_FRONTEND=noninteractive"};
            Process p = Runtime.getRuntime().exec(command, env);

            // Echo the standard output of the command, then its error output.
            BufferedReader input = new BufferedReader(new InputStreamReader(p.getInputStream()));
            BufferedReader error = new BufferedReader(new InputStreamReader(p.getErrorStream()));
            while ((line = input.readLine()) != null) {
                System.out.println(line);
            }
            input.close();
            while ((line = error.readLine()) != null) {
                System.out.println(line);
            }
            error.close();

            // Wait for the process to exit so that the exit code is always available.
            if (p.waitFor() != 0) {
                throw new ScriptFaildException(new Exception("error to execute command " + command));
            }
        } catch (IOException ex) {
            throw new ScriptFaildException(ex);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new ScriptFaildException(ex);
        }
    }
}

Now let us turn to the scripts. The bash scripts perform the simple job of removing and installing deb packages. Here is an example of installing packages on a remote machine via apt-get.

install_deb.sh

#!/bin/bash
set -e
set -x

# Directory that contains this script
SCRIPT_PATH="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

# Test server settings
TEST_USER=user
TEST_SERVER_IP=192.168.1.2

# Run commands directly on a local server,
# or over ssh on a remote one
if [ "${TEST_SERVER_IP}" == "127.0.0.1" ]; then
    COMMAND_PREFIX=
else
    COMMAND_PREFIX="ssh ${TEST_USER}@${TEST_SERVER_IP}"
fi

APT_CONFIG_DIR="/home/${TEST_USER}/apt-get/"
VERSION=$1

# Copy the deb configuration to the test server
if [ "${TEST_SERVER_IP}" == "127.0.0.1" ]; then
    if [ ! -d ${APT_CONFIG_DIR} ]; then
        mkdir ${APT_CONFIG_DIR}
    fi
    cp ${SCRIPT_PATH}/debconf.dat $APT_CONFIG_DIR
else
    scp ${SCRIPT_PATH}/debconf.dat ${TEST_USER}@${TEST_SERVER_IP}:${APT_CONFIG_DIR}
fi

# Install the package
${COMMAND_PREFIX} sudo debconf-set-selections ${APT_CONFIG_DIR}debconf.dat
# A failed update must not abort the script despite set -e
${COMMAND_PREFIX} sudo apt-get update || true
${COMMAND_PREFIX} sudo DEBIAN_FRONTEND=noninteractive apt-get install --force-yes -y some-package-${VERSION}

The Python script that dumps process data to a csv file.

resources.py

#!/usr/bin/python
# -*- coding: utf-8 -*-
# depends on:
#   sudo pip install psutil
# Usage: python resources.py </path/to/file> <proc_name>

import time
import datetime
import psutil
import sys
import csv


def convert_bytes(num_bytes):
    '''
    Convert a size in bytes to a human-readable string.
    :param num_bytes: size in bytes
    :return: string such as '1.29G'
    '''
    num_bytes = float(num_bytes)
    if num_bytes >= 1099511627776:
        terabytes = num_bytes / 1099511627776
        size = '%.2fT' % terabytes
    elif num_bytes >= 1073741824:
        gigabytes = num_bytes / 1073741824
        size = '%.2fG' % gigabytes
    elif num_bytes >= 1048576:
        megabytes = num_bytes / 1048576
        size = '%.2fM' % megabytes
    elif num_bytes >= 1024:
        kilobytes = num_bytes / 1024
        size = '%.2fK' % kilobytes
    else:
        size = '%.2fb' % num_bytes
    return size


def get_string(proc, proc_name):
    '''
    Build one log record for a process.
    :param proc: psutil process object
    :param proc_name: process name to write to the log
    :return: record as a space-separated string
    '''
    data_time = datetime.datetime.fromtimestamp(time.time()).strftime("%Y-%m-%d %H:%M:%S")
    ram = convert_bytes(proc.memory_info().rss)
    ram_percent = round(proc.memory_percent(), 2)
    cpu = proc.cpu_percent(interval=1)
    wrt_str = ("{0} {1} {2} {3} {4} {5}".format(data_time, proc_name, proc.pid, ram, ram_percent, cpu))
    return wrt_str


if __name__ == "__main__":
    if (len(sys.argv) == 3):
        pathresult_log = sys.argv[1]
        target_proc = sys.argv[2]
        # Open the result file for the requested process type
        if (target_proc == "rtls"):
            res_log_rtls = open(pathresult_log, 'a')
            writer_rtls_log = csv.writer(res_log_rtls)
        elif (target_proc == "rtlsapp"):
            res_log_app = open(pathresult_log, 'a')
            writer_app_log = csv.writer(res_log_app)
        elif (target_proc == "all"):
            res_log_rtls = open(pathresult_log, 'a')
            res_log_app = open(pathresult_log, 'a')
            writer_rtls_log = csv.writer(res_log_rtls)
            writer_app_log = csv.writer(res_log_app)
        else:
            print("Types of the test are not correct")
            sys.exit(1)
        proc_rtls = 0
        proc_rtlsapp = 0
        try:
            # Find the server and application server processes
            procs = [p for p in psutil.process_iter()]
            for proc in procs:
                if (proc.name() == 'java' and proc.username() == 'rtlsadmin'):
                    proc_rtls = proc
                elif (proc.name() == 'node' and proc.username() == 'rtlsapp'):
                    proc_rtlsapp = proc
        except psutil.NoSuchProcess:
            pass
        else:
            # Write the current resource usage.
            # Two separate checks, so that "all" logs both processes.
            if (target_proc == "rtls" or target_proc == "all"):
                if (proc_rtls != 0):
                    proc_name = "rtls-server"
                    str_rtls = get_string(proc_rtls, proc_name)
                    writer_rtls_log.writerow(str_rtls.split())
            if (target_proc == "rtlsapp" or target_proc == "all"):
                if (proc_rtlsapp != 0):
                    proc_name = "rtls-app"
                    str_rtlsapp = get_string(proc_rtlsapp, proc_name)
                    writer_app_log.writerow(str_rtlsapp.split())
        finally:
            # Close the files
            if (target_proc == "rtls" or target_proc == "all"):
                res_log_rtls.close()
            if (target_proc == "rtlsapp" or target_proc == "all"):
                res_log_app.close()
    else:
        print("Input parameters are not correct")
        sys.exit(1)

Results

Test results are reflected in csv files. Example:

2016-03-04,11:03:55,rtls-server,30237,1.29G,32.71,167.8
2016-03-04,11:04:27,rtls-server,30237,1.33G,33.63,166.9
2016-03-04,11:04:59,rtls-server,30237,1.34G,34.0,172.8

Here, each line contains the date, time, process name, process identifier, amount of memory used, percentage of memory used, and percentage of processor usage. The processor figure can exceed 100% because psutil reports it relative to a single core.
This data is already enough for analysis and for building graphs. It is also convenient to send the collected data to a monitoring service with graphical dashboards, for example Grafana.
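
Since the task asked for the maximum, minimum, and average resource consumption, here is a minimal Python sketch of how these figures could be derived from such a file. It is an illustration rather than our actual analysis code; the function names are hypothetical, and the column indices assume the row layout described above.

```python
# Sketch only: computes min, max and average for a numeric csv column.
# Column order follows the rows written by the monitoring script.
import csv
import io

SAMPLE = """\
2016-03-04,11:03:55,rtls-server,30237,1.29G,32.71,167.8
2016-03-04,11:04:27,rtls-server,30237,1.33G,33.63,166.9
2016-03-04,11:04:59,rtls-server,30237,1.34G,34.0,172.8
"""

MEMORY_PERCENT = 5   # index of the memory percentage column
CPU_PERCENT = 6      # index of the processor percentage column


def read_rows(stream):
    """Read csv rows from a file-like object, skipping blank lines."""
    return [row for row in csv.reader(stream) if row]


def summarize(rows, column):
    """Return (min, max, average) of the numeric column with this index."""
    values = [float(row[column]) for row in rows]
    return min(values), max(values), sum(values) / len(values)


if __name__ == "__main__":
    rows = read_rows(io.StringIO(SAMPLE))
    print("memory %:", summarize(rows, MEMORY_PERCENT))
    print("cpu %:", summarize(rows, CPU_PERCENT))
```

For a real run, `read_rows` would be given the opened monitoring file instead of the in-memory sample.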

That's all. In the future, we plan to describe in more detail how the external load is implemented, and how the server is tested under combined internal and external load.
Author: Nikita Davydovsky

Source: https://habr.com/ru/post/302138/
