Testing at Yandex: ObjectBuilders for describing and generating synthetic test data

Hello! My name is Denis Chernilevsky. At Yandex, I lead the test automation group for the display advertising system. Both at Yandex and at previous jobs, I have led teams of 10+ people, streamlined processes, and devised approaches to automating the testing of various systems. And it so happened that in each of these projects we had to think about preparing test data. After a fairly long period of reflection, we came up with an approach that solves this problem in a general form and can be applied in different projects. In addition to presenting it at the Test Environment event, I decided to describe it in detail here.

By the way, if you can't come to our event for testers, you can watch the broadcast, which begins tomorrow, Saturday, November 30, at 11:00.

This article is based on our experience of preparing complex sets of synthetic test data while automating the testing of the Yandex display advertising system. We are certainly not the first to face this task, so we began by analyzing existing approaches and solutions. The result was the ObjectBuilders library (in Python), which can be applied in projects that need to create hierarchically related data sets. It lets you define their relationships, parameters, and properties, and it also provides several side benefits.

Below I will try to describe this tool and show how it works with clear examples, but first let me explain in more detail the problem being solved and the possible alternative solutions.

Task: testing the display advertising system

So, “what do I do if, for a test, I need to create 30 (50, 100...) objects in the system, linked to each other many-to-many, many-to-one, one-to-many, and one-to-one, and each having 10 (20, 40...) properties?”

Let me make a reservation right away: this is about functional black-box testing, not unit or load testing.

So, how do we approach this? First, what is our system? A database with advertising campaign data, a back end that runs the ad auction and calculates statistics, and a front end that accepts HTTP requests and returns responses in a certain format.

We will build the infrastructure, set up test stands, configure CI, and write tests, say, in Python, and everything will be fine! Since it's Python, we'll choose PyTest for the tests: it is convenient and elegant, produces reports in JUnit format, and is easy to extend with plugins. That's it, we can write tests! Of course, all of the above actually took a couple of person-years, but that's not what this article is about.

It all seems simple: create one or several advertising campaigns with different parameters, start the system, send an HTTP request, get a response, check that it returns what we expect, and maybe check some side effects: logs, statistics, etc.

A day (or rather, a week) went by in attempts to get the system to show at least one banner. It turns out that to create one advertising campaign in the system and show at least one banner, you need to load 30 objects with 15 parameters each into the database, correctly linked to each other, and in such a way that different test data do not affect each other. And it is also desirable that the test can be run several times in a row!

But we also need different advertising campaigns in each test, often several of them, and with different settings.

At this point it seems we have hit a wall. But something has to be done.

After some thought, we compiled the following list of requirements for the test data creation procedure and for the test data itself:

The created data must ensure correct operation of the system.
When the logic of links or object parameters changes (for example, when a new feature is developed), it must be possible to repair ALL tests that use these objects easily and quickly; in other words, tests must be easy to maintain in the future.
The data of one test should not affect the behavior of another test.
It must be possible to rerun a test several times. Therefore, the data must either be restored to its original state each time or be generated anew.
From the description of the data, it should be possible to understand exactly how it differs from all other cases and why these particular settings are needed for this test.

Solutions

Option 1. Taking existing data from the production system does not work:

The data may change, and the test will behave unpredictably; such changes are hard to notice (the test will keep passing, but it will no longer check what it should).
The data may disappear, and the test will stop working altogether.
One test can change the data used by another test.
In general, it is unclear what data each test uses and how.
The only advantage of this solution is that no effort is spent on actually preparing the data.

Option 2. Preparing a database in advance with data for each specific test is not suitable either:
It is laborious: there are too many parameters and objects.

It is easy to make mistakes when creating the data, and such errors are hard to find.
It is difficult to understand exactly which parameters in a particular data set influence the test's behavior (and therefore what the test checks and why).
The test cannot simply be rerun: it may change the parameters of its initial data set, so you will have to reload the data before each run.
When the logic of the product under test changes, you will have to fix every affected object by hand, possibly several times, since it may appear in different tests.
Compared to option 1, this solution has one advantage: independent data can be prepared, so tests do not affect each other.

We rejected these options right away, since they violate most of our requirements. A good idea is to use the Builder pattern. Let's consider its pros and cons in more detail.

Builder pattern, or why we didn’t use it

Builder is a design pattern that separates the construction of an object from its representation. I will not describe it in detail; suffice it to say that its main idea is to create classes that build objects of other classes through a clear interface. The builder thus encapsulates all the logic of creating and configuring objects.
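To make the idea concrete, here is a minimal sketch of a hand-written builder in Python. The Campaign class, its fields, and the CampaignBuilder methods are all hypothetical, invented for illustration; they are not part of ObjectBuilders or of the Yandex system. The builder holds sensible defaults, lets a test override only what matters through chained methods, and gives every built object a unique name so that a test can be rerun.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Campaign:
    # Hypothetical domain object, for illustration only.
    name: str
    budget: int
    banners: list = field(default_factory=list)


class CampaignBuilder:
    """Hypothetical hand-written builder for the hypothetical Campaign class."""

    _ids = itertools.count(1)  # shared counter, so names are unique across builds

    def __init__(self):
        # Defaults describe a "typical" campaign; tests override only what matters.
        self._budget = 1000
        self._banner_count = 1

    def with_budget(self, budget):
        self._budget = budget
        return self  # returning self enables method chaining

    def with_banners(self, count):
        self._banner_count = count
        return self

    def build(self):
        n = next(self._ids)
        banners = [f"banner-{n}-{i}" for i in range(self._banner_count)]
        return Campaign(name=f"campaign-{n}", budget=self._budget, banners=banners)


campaign = CampaignBuilder().with_budget(500).with_banners(3).build()
```

Note the cost that the article goes on to discuss: every object type needs its own class like this, with its own set of `with_...` methods.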

From the examples on Wikipedia, it is clear that this approach satisfies all our requirements:

Does the created data ensure correct operation of the system? Yes: by correctly describing the object configuration logic in the builder, you can ensure that every created object has the right links and parameters.
When the logic of links or object parameters changes, is it possible to quickly and easily repair ALL tests that use these objects? Yes: since the object creation logic is encapsulated in the builder, we can change it in one place; in other words, we extract it into a separate layer.
Does the data of one test leave the behavior of other tests unaffected? Yes, because at the start of each test we use the builder to generate our own set of objects rather than using existing ones.
Is it possible to rerun the test several times? Yes: inside the builder you can implement logic for creating unique objects, for example by giving them unique names or IDs each time.
Is it possible, from the description of the data, to understand how it differs from all other cases and why exactly these settings are needed for this test? Hmm, almost. Creating objects will look like a chain of builder method calls. If the methods are named properly, it will be possible to intuitively understand what they configure.

Fine!

Moreover, the Builder can be used to create Composite objects, that is, entire graphs of related objects that can be treated as a single logical object. That is exactly what we need: to create many objects with different kinds of links between them, and to be able to configure any parameters of both the whole object tree and its individual components in a uniform way.

It would seem that all is well, BUT these patterns have a drawback: for each type of object, you need to implement your own Builder and possibly your own Composite as separate classes. You also have to write the methods that modify the objects being built, their links, and their properties.

Being lazy people, we thought it would be nice to describe the data model formally, so that a Builder could work from that description and the model could be modified easily.

As a result, a small Python library called ObjectBuilders was born. It lets you create graphs of related objects, manage the links and properties of the objects themselves, record the modifications applied to the default configuration in the form of patches, and reuse them later.

A bit of theory and description of the approach

As described above, our system under test operates with a set of interconnected objects with a certain set of parameters (this is how most systems work). These objects and their connections can be represented as a graph or tree (vertices are objects, edges are connections between objects). Actually, our task is to construct a graph of objects and their parameters for each specific test.

Here the following fact helps us: if you write your tests carefully and cover only a small piece of functionality with each test, then in most tests the graphs will coincide up to the tested parameter or link.

Moreover, in each test you should always explicitly set the values of all parameters that affect its behavior; we do not care about the other parameters, as long as everything works. If that is not the case and there are other influencing parameters, see above: you must set them! There will be many influencing parameters if your test checks a lot of functionality at once, and few if it checks a specific piece of functionality.
All this means that we can fix in advance the most frequently used template of the graph and its parameters, and only “tweak” (patch) it slightly in each test.

Plus, it would be great to accumulate knowledge about configurations and reuse “patches” to the graph in various combinations, i.e. to accumulate knowledge about the system that will later help those who have forgotten (or never knew) to understand how the system behaves depending on the data supplied to it.

In effect, we decided to turn the Builder pattern “inside out”: instead of describing how each specific object is created and then assembling a linked graph from those objects, we build the object graph itself and then apply modifications to the graph, its vertices, and its links.

A simplified example: a car factory

The original task of testing an advertising system is rather cumbersome and involves many terms that most people are unfamiliar with, so we will use the following example to illustrate how our library works.

Suppose there is a car factory that we need to test: we submit a car configuration to the assembly line's control computer and see what comes out.

We will feed in various car configurations. Each configuration consists of the following parts: chassis (passenger, all-terrain), engine (type: diesel or petrol, and volume), wheels (number of wheels, diameter, alloy or stamped), body (sedan, coupe, convertible, all-terrain), gearbox (manual, automatic). Ideally, we need to check all valid combinations (except those prohibited by the specification).
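To get a feel for the size of the space, here is a short sketch that enumerates the combinations of the categorical options listed above with `itertools.product`. The `is_allowed` function is a hypothetical placeholder for the specification's prohibitions, which the article does not list, so here it accepts everything.

```python
from itertools import product

CHASSIS = ["passenger", "all-terrain"]
ENGINE_TYPES = ["diesel", "petrol"]
WHEEL_TYPES = ["alloy", "stamped"]
BODIES = ["sedan", "coupe", "convertible", "all-terrain"]
GEARBOXES = ["manual", "automatic"]


def is_allowed(config):
    """Hypothetical placeholder for the combinations the specification forbids."""
    return True


configurations = [
    config
    for config in product(CHASSIS, ENGINE_TYPES, WHEEL_TYPES, BODIES, GEARBOXES)
    if is_allowed(config)
]
print(len(configurations))  # 2 * 2 * 2 * 4 * 2 = 64
```

Engine volume, number of wheels, and wheel diameter multiply this further, which is why the approach below starts from one base configuration and varies only what a given test needs.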
Clearly, the plant can produce different cars, but some kinds are more common than others. Based on general logic and market demand, it is reasonable to assume that we will most often produce sedans with 4 wheels, 15-inch stamped wheels, a 1.6 petrol engine, and a manual gearbox.

At the very least, most of them will have 4 wheels and a sedan body! This will be our base configuration. The remaining parameters are chosen on the same principle.

It is also worth considering how much functionality is checked with each configuration: if most of the checks fall on some configuration, choose it as the base one. In other words, if, regardless of market demand, you are going to test an SUV thoroughly, then that configuration should be your base one, since it will occur most often in your tests.

Next we only need to change some of the parameters of the basic configuration for different cars. Let's see how these principles are implemented in our library.

Implementing and using the ObjectBuilders library

Our tool works on 2 levels:

Constructs: a tool for defining links between objects and generating the objects themselves.
Modifiers: a tool for describing and applying patches (modifiers) to the object graph.

Modifiers, in turn, are divided into 2 types:

InstanceModifiers - object modifiers.
ConstructModifiers - Construct modifiers.

For example:

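```python
class Bar:
    bar = 1


class Foo:
    baz = 10
    bars = Collection(Bar)  # level 1: Foo holds a collection of Bar objects


# Level 2: ask for a Foo with 5 Bar objects inside.
my_foo = Builder(Foo).withA(NumberOf(Foo.bars, 5)).build()
```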

Here the construct bars = Collection(Bar) is level 1: we declare that class Foo can contain N Bar objects, which can be accessed via my_foo.bars[i]. The NumberOf(Foo.bars, 5) construct is level 2: we want to get a Foo object with 5 Bar objects inside.

By default, a graph of 2 objects would be created: Foo and a nested collection of Bar objects consisting of one element.
NumberOf(Foo.bars, 5) is exactly the modifier that can be applied to the graph so that 5 Bar objects are embedded in the Foo object.

Let's return to our car factory.

Object model

To begin with, we need to describe the classes of objects and their properties that our factory operates with.

```python
# Chassis types
CHASSIS_LIGHT = 0
CHASSIS_HEAVY = 1
# Engine types
ENGINE_PETROL = 0
ENGINE_DIESEL = 1
# Wheel types
WHEEL_STAMPED = 0
WHEEL_ALLOY = 1
# Body types
BODY_SEDAN = 0
BODY_COUPE = 1
BODY_CABRIO = 2
BODY_HEAVY = 3
# Transmission types
TRANSMISSION_MANUAL = 0
TRANSMISSION_AUTO = 1


class Chassis:
    type = CHASSIS_LIGHT


class Engine:
    type = ENGINE_PETROL
    volume = 1.6


class Wheel:
    radius = 15
    type = WHEEL_STAMPED


class Body:
    type = BODY_SEDAN
    number = None  # placeholder: the body number has to be generated for each car


class Transmission:
    type = TRANSMISSION_MANUAL


# An optional part: a car may or may not have a spoiler.
class Spoiler:
    foo = None
```

Fine!

Here, the values used to initialize the fields of our classes were chosen based on the most common configuration. What is still missing are the links between objects and the dynamic parameters: we need to specify how the individual parts are assembled into a car and set the parameters.

For example, by hand:

```python
from random import random

chassis = Chassis()
wheels = [Wheel() for _ in range(4)]
engine = Engine()
body = Body()
...
chassis.wheels = wheels
chassis.engine = engine
engine.chassis = chassis  # back link from the engine to its chassis
chassis.body = body
body.number = random()
...
```

For small graphs this is easy. When a graph has dozens of objects and many parameters, it becomes very costly. To make this task easier, ObjectBuilders provides Constructs.

Constructs

Above was a short Foo-Bar example that used the Collection() construct.
Our library has several types of constructs for different needs.

class Collection(typeToBuild, number=1)

A collection of objects of type typeToBuild. After Builder.build() is called, this construct becomes a list of objects of type typeToBuild. The default number of objects is 1.

class Unique(typeToBuild)

After Builder.build() is called, it becomes a unique object of type typeToBuild: unique in the sense that even if our graph already contains an object of type typeToBuild somewhere, a new one will still be generated.

class Reused(typeToBuild, local=False, keys=[])

Unlike the Unique construct, if the graph already contains an object of type typeToBuild, it will be reused; if not, a new object will be created. The keys parameter lets you specify which fields of the typeToBuild class must match for objects to be considered the same.
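The difference between Unique and Reused can be illustrated with a toy model. ToyGraph below is a hypothetical class written for this explanation, not ObjectBuilders code, and it ignores the `local` parameter entirely. Unique always creates a new object; Reused first looks for an existing object of the same type whose `keys` fields match and only creates one if none is found. In this sketch every name listed in `keys` must also be passed as a field value.

```python
class ToyGraph:
    """Toy model of how Unique and Reused might resolve (not ObjectBuilders code)."""

    def __init__(self):
        self.objects = []

    def unique(self, cls, **fields):
        # Always create a new object, even if one of this type already exists.
        obj = cls()
        for name, value in fields.items():
            setattr(obj, name, value)
        self.objects.append(obj)
        return obj

    def reused(self, cls, keys=(), **fields):
        # Reuse an existing object of the same type whose `keys` fields match;
        # every name in `keys` is expected to be passed in `fields`.
        for obj in self.objects:
            if type(obj) is cls and all(getattr(obj, k, None) == fields[k] for k in keys):
                return obj
        return self.unique(cls, **fields)


class Transmission:
    type = 0


class Engine:
    volume = 1.6


graph = ToyGraph()
shared = graph.reused(Transmission)  # created on first request
again = graph.reused(Transmission)   # the same object is returned
```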

class Maybe(construct)

This construct indicates that the link to another object (and, accordingly, the object itself) may or may not be present. By default, it is converted to None when Builder.build() is called. To switch it on, use the Enabled() modifier, which is described below.

class Random(start=1, end=100500, pattern=None)

At the build() stage, this construct is converted either into a random int from start to end or, if pattern is specified, into a string. The pattern must contain one %d marker, which is replaced with a random number from start to end.
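The described behavior of Random fits in a few lines of plain Python. `resolve_random` is a hypothetical stand-in written for illustration, not the library's implementation; it assumes “from start to end” includes both ends, which is what `random.randint` does.

```python
import random


def resolve_random(start=1, end=100500, pattern=None):
    """Toy stand-in for resolving Random(start, end, pattern) at build time."""
    number = random.randint(start, end)  # randint includes both ends
    if pattern is None:
        return number
    return pattern % number  # the pattern must contain exactly one %d marker


body_number = resolve_random(pattern="body-%d")  # e.g. "body-4217"
```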

class Uplink()

Allows you to configure the connection of objects in both directions, for example: foo.bar.foo = foo.

Let's try to rewrite our model using Constructs so that it contains information about the connection of objects with each other.

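```python
# The type constants are the ones defined above.
# Classes are declared in dependency order: a construct such as Unique(Engine)
# is evaluated when the class body runs, so Engine must already exist
# when Chassis is defined.

class Transmission:
    type = TRANSMISSION_MANUAL


class Spoiler:
    foo = None


class Engine:
    type = ENGINE_PETROL
    volume = 1.6
    transmission = Reused(Transmission)   # new: the shared transmission


class Wheel:
    radius = 15
    type = WHEEL_STAMPED
    transmission = Reused(Transmission)   # new: the shared transmission


class Body:
    type = BODY_SEDAN
    number = Random()                     # new: generated at build time
    spoiler = Maybe(Unique(Spoiler))      # new: optional, None by default


class Chassis:
    type = CHASSIS_LIGHT
    engine = Unique(Engine)               # new
    body = Unique(Body)                   # new
    wheels = Collection(Wheel, number=4)  # new: 4 wheels by default
    transmission = Reused(Transmission)   # new
```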

As a result, we described that:

The engine, body, wheels, and transmission are attached to the chassis.
Several wheels are mounted: 4 by default, but this can be changed dynamically later.
The chassis, the engine, and the wheels all attach to the transmission, and to the same one. This is indicated by the Reused construct.
A spoiler can be attached to the body (optional; absent by default).
The body number will be generated during vehicle assembly.
All other parameters are set by default (but they can also be changed dynamically later).

What can we do with this now? We can get a complete car with a single call! The base one, true.

```python
car = Builder(Chassis).build()
```

The car object is of type Chassis, therefore:

```python
>>> car.engine.volume
1.6
>>> car.wheels[0].radius
15
>>> car.body.spoiler is None
True
```

Well, and so on.

That is, in addition to the Chassis object itself, all objects linked to it will be created: Engine, Wheel x 4, Body, Transmission. The Spoiler object will not be created, since by default the Maybe() construct does not create an object and is converted to None.

In general, we can start building a car from any part (any vertex of our graph) by passing any of our classes to Builder(typeToBuild).
But since the edges of our graph are directed, only the vertices reachable from the starting one will be built.

In the current model, from the Chassis vertex you can reach all the vertices, from Engine only Transmission, and from Body only Spoiler.
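Which objects get built is a plain graph-reachability question. The sketch below, which is an illustration and not ObjectBuilders code, writes the car model as a hypothetical adjacency dict (an edge A → B means a field of A holds a construct of B) and runs a breadth-first search. It also adds a reverse edge for every forward edge, roughly what the Uplink construct is for; note that this adds one more back link (Transmission → Wheel) than the model below declares.

```python
from collections import deque

# Directed edges of the car model: A -> B means a field of A builds a B.
CAR_MODEL = {
    "Chassis": ["Engine", "Body", "Wheel", "Transmission"],
    "Engine": ["Transmission"],
    "Wheel": ["Transmission"],
    "Body": ["Spoiler"],
    "Transmission": [],
    "Spoiler": [],
}


def reachable(model, start):
    """Breadth-first search: every vertex reachable from start, including start."""
    seen = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbour in model[vertex]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def with_reverse_edges(model):
    """Add a back edge for every forward edge (a rough analogue of Uplinks)."""
    both = {vertex: list(edges) for vertex, edges in model.items()}
    for vertex, edges in model.items():
        for target in edges:
            if vertex not in both[target]:
                both[target].append(vertex)
    return both


print(sorted(reachable(CAR_MODEL, "Engine")))  # ['Engine', 'Transmission']
```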

That is, for example:

```python
>>> engine = Builder(Engine).build()
>>> engine.transmission.type == TRANSMISSION_MANUAL
True
>>> engine.transmission.engine
AttributeError: Transmission instance has no attribute 'engine'
```

Only these two objects will be created.

To be able to assemble a complete car starting from any of its parts, we need to make our links bidirectional.

```python
# The type constants are the ones defined above.

class Transmission:
    type = TRANSMISSION_MANUAL
    chassis = Uplink()                    # new
    engine = Uplink()                     # new


class Spoiler:
    foo = None
    body = Uplink()                       # new


class Engine:
    type = ENGINE_PETROL
    volume = 1.6
    transmission = Reused(Transmission)
    chassis = Uplink()                    # new


class Wheel:
    radius = 15
    type = WHEEL_STAMPED
    transmission = Reused(Transmission)
    chassis = Uplink()                    # new


class Body:
    type = BODY_SEDAN
    number = Random()
    spoiler = Maybe(Unique(Spoiler))
    chassis = Uplink()                    # new


class Chassis:
    type = CHASSIS_LIGHT
    engine = Unique(Engine)
    body = Unique(Body)
    wheels = Collection(Wheel, number=4)
    transmission = Reused(Transmission)


# Uplinks are wired once all classes exist, since each linksTo() call
# refers to a class (and its construct) that may be defined later.
Engine.chassis.linksTo(Chassis, Chassis.engine)
Wheel.chassis.linksTo(Chassis, Chassis.wheels)
Body.chassis.linksTo(Chassis, Chassis.body)
Transmission.chassis.linksTo(Chassis, Chassis.transmission)
Transmission.engine.linksTo(Engine, Engine.transmission)
Spoiler.body.linksTo(Body, Body.spoiler)
```

Now, when calling Builder(typeToBuild).build(), we can build a car starting from any part! Moreover:

```python
>>> engine = Builder(Engine).build()
>>> engine.transmission.chassis.engine == engine
True
>>> engine.transmission.chassis.wheels[0].transmission.engine == engine
True
```

The builder, following the links (including Uplinks), creates all the necessary objects, which makes the model easier to use.
For example, if you mostly work with objects of type Engine, it is more convenient to use Builder(Engine), and all the other necessary objects will still be created and linked to each other.

So, we have seen how to conveniently describe the object model of our system under test and build a whole car with it. But this car is still the base one.

Modifiers

To modify the basic configuration of our objects graph, several types of modifiers are provided in the ObjectBuilders library. As mentioned above, modifiers are divided into two types: InstanceModifier and ConstructModifier.

InstanceModifier allows you to change the fields of objects that are represented by values, as well as perform some actions on the finished objects.
ConstructModifier allows you to change the parameters of the Constructs objects that are assigned to some fields of our classes.

Let's go through the full list of modifiers and their capabilities (their use is demonstrated in the examples below).

class InstanceModifier(classToRunOn)

The first and (so far) only InstanceModifier-level modifier. It lets you change the field values of objects or perform some actions on them. When Builder.build() is called, it is applied to every object of type classToRunOn in our object graph.

It has two methods.

def thatSets(self, **kwargs) changes the field values of objects. The arguments are key=value pairs, where key is the name of the field of the classToRunOn object to which value must be assigned.
def thatDoes(self, action) - allows you to perform some action on objects of type classToRunOn. action is a method that takes an object of type classToRunOn as an argument.

class Enabled(what)

A ConstructModifier-level modifier. It switches a Maybe construct to the active state, so that Builder.build() starts creating an object for it. what is the Maybe construct, for example Body.spoiler in our car example.

class Given(construct, value)

A ConstructModifier-level modifier. At the graph construction stage, it replaces any Construct (construct) with a specific object or value.

class HavingIn(what, *instances)

A ConstructModifier-level modifier. It is applied to Collection constructs (what) and lets you add specific objects (*instances) to them.
Alternatively, if one of the *instances elements is an int, the size of the collection increases by that int. At the same time, the number of objects that the Collection construct generates itself is reduced by the number of objects we added explicitly.

class NumberOf(what, amount)

A ConstructModifier-level modifier. It is applied to Collection constructs (what) and resizes them to amount.

class OneOf(what, *modifiers)

A ConstructModifier-level modifier. It is applied to Collection constructs (what) and applies the set of modifiers *modifiers to one of the objects in the collection.

That is the entire set of modifiers available at the moment. They make it convenient to describe almost any modification of the base configuration of our graph.

Modifiers are applied to our object graph at the build stage by calling the Builder.withA(*modifiers) method.

Let's see how the modifiers work in examples.

Example 1. We want a car with a spoiler!

```python
spoiler_option = Enabled(Body.spoiler)
car = Builder(Chassis).withA(spoiler_option).build()
```

```python
>>> car.body.spoiler is not None
True
```

Example 2. We want a car with a six-liter diesel engine!

```python
big_diesel = InstanceModifier(Engine).thatSets(type=ENGINE_DIESEL, volume=6.0)
car = Builder(Chassis).withA(big_diesel).build()
```

```python
>>> car.engine.volume == 6.0
True
>>> car.engine.type == ENGINE_DIESEL
True
```

You can do the same thing with InstanceModifier.thatDoes():

```python
def make_big_engine(engine):
    # Any computation or condition can go here; it runs on each Engine object.
    engine.type = ENGINE_DIESEL
    engine.volume = 6.0


big_diesel = InstanceModifier(Engine).thatDoes(make_big_engine)
```

This is usually used when parameters need to be computed dynamically at test execution time, since make_big_engine can perform any calculations or implement any conditions.

Example 3. We want 6 wheels and a heavy chassis.

```python
six_wheeled_heavy_chassis = [
    NumberOf(Chassis.wheels, 6),
    InstanceModifier(Chassis).thatSets(type=CHASSIS_HEAVY),
]
car = Builder(Chassis).withA(six_wheeled_heavy_chassis).build()
```

```python
>>> len(car.wheels) == 6
True
>>> car.type == CHASSIS_HEAVY
True
```

Note that the Builder.withA() method accepts a single Modifier object, a list of such objects, or a list with nested lists of any depth.
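Accepting arbitrarily nested lists boils down to recursive flattening. The function below is a hypothetical sketch of how such an argument could be normalized; it is not taken from the ObjectBuilders source.

```python
def flatten(modifiers):
    """Flatten a single modifier, a list of modifiers, or arbitrarily nested lists."""
    if not isinstance(modifiers, (list, tuple)):
        return [modifiers]  # a lone modifier becomes a one-element list
    flat = []
    for item in modifiers:
        flat.extend(flatten(item))  # recurse into nested lists of any depth
    return flat


print(flatten([["big_diesel"], ["six_wheels", ["heavy_chassis"]], "heavy_body"]))
# ['big_diesel', 'six_wheels', 'heavy_chassis', 'heavy_body']
```

This is what makes it cheap to combine ready-made modifier lists, as the next example does.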

Example 4. We want a powerful all-terrain vehicle with a heavy body and lots of wheels.

```python
rover_capabilities = (
    [big_diesel]
    + six_wheeled_heavy_chassis
    + [InstanceModifier(Body).thatSets(type=BODY_HEAVY)]
)
rover = Builder(Chassis).withA(rover_capabilities).build()
```

The key point is that we reused the big_diesel and six_wheeled_heavy_chassis modifiers we had already written. In real life, the more tests you write, the more ready-made modifiers you accumulate. This gives three advantages:

You can reuse modifiers and write tests faster, thinking about how to test rather than how to prepare the necessary data.
Well-named modifiers make it easy to understand exactly what data a test uses and how it is configured.
When the system configuration changes (for example, developers ship a feature that changes the links between objects), you can easily repair all the tests in one place by fixing only the modifiers that have become wrong!

Example 5. We want to check how the all-terrain vehicle drives if one wheel has a radius of 14 inches, another 16 inches, and the remaining 4 are 15 inches.

```python
def wheel_radius(radius):
    # Apply the radius to one wheel of the collection only.
    return OneOf(Chassis.wheels, InstanceModifier(Wheel).thatSets(radius=radius))


car = (
    Builder(Chassis)
    .withA(rover_capabilities)
    .withA(wheel_radius(14), wheel_radius(16))
    .build()
)
```

Example 6. Two modifier types remain unused: HavingIn and Given. The only difference between them is that HavingIn is applied to a Collection construct, while Given is applied to any other construct, to put a ready-made object in its place.

Let's look only at the HavingIn example. Suppose we already have a wheel from a previous car and want to put it on a new one.

```python
wheel = Wheel()  # a wheel left over from a previous car
car = Builder(Chassis).withA(HavingIn(Chassis.wheels, wheel)).build()
```

The Given modifier works similarly.

BUT: be careful with these modifiers. A ready-made object passed to them is substituted into the graph as-is: no links will be created in it, and nothing in it will be changed.

Here is the algorithm behind the call Builder(typeToBuild).withA(*modifiers).build():

1. Create a current object of type typeToBuild.
2. If the *modifiers list contains Construct modifiers, apply them to the Construct objects of the corresponding type among the properties of the current object.
3. Convert all Construct objects among the properties of the current object into objects, according to the rules of each individual Construct.
4. If the *modifiers list contains InstanceModifier(currentObjectType).thatSets(), apply them to the current object.
5. Recursively descend into all objects obtained from the converted Constructs and repeat the algorithm from step 2.
6. If the *modifiers list contains InstanceModifier(clazz).thatDoes(), apply them to the objects of type clazz in the constructed graph.

That's all! This simple library has made our lives much easier and saved many person-hours of work.

The bottom line: what do we have and why is all this necessary?

ObjectBuilders makes it easy to describe, in a readable form, the properties of data defined by an object model, and to generate this data dynamically.
The logic of links between objects and their modifications are moved to a separate logical layer. This lets you reuse previously described data configurations and, when necessary, easily fix the link logic and data configuration in one place instead of in every place where the data is used.
Tests can now describe only the properties that actually affect their behavior, without worrying about all the necessary "side" settings and objects, which makes the relationship between the data and the system's behavior easy to see right in the test.

However, our approach at the current stage of implementation has its drawbacks:

In the case of large object models, it becomes quite time-consuming to correctly describe all relationships.
As the number of modifier “packages” grows, so does the temptation to write your own instead of reusing existing ones.
When simultaneously applying a large set of modifiers to a graph, it is difficult to notice / find data inconsistencies in case of an error.

The first drawback is not critical, since the model usually changes either very rarely or only slightly. The second is inherent in any evolving codebase (copy-paste lovers are everywhere). The third gradually fades as ready-made (read: debugged) modifier packages accumulate; besides, inconsistencies still become visible when the test runs on the generated data.

Even with all its shortcomings, our approach saves time when writing tests, when maintaining them, and when figuring out how a particular system behavior depends on the data. Everything said so far concerned objects in memory of a running Python process, which have nothing to do with the data actually entering the system.

, , XMLRPC API SQL Alchemy . ObjectBuilders JSON XML. ObjectBuilders, . , - . .

GitHub.
PyPI.

Enjoy!

Source: https://habr.com/ru/post/204192/
