
The story of how a system for automatically testing methods that interact with the database was developed, with a detailed description of the pitfalls encountered while developing it and introducing it into the project environment.
Introduction
When we started working on a new project, the idea was to use the TDD approach and, accordingly, to write unit tests for each of the components. Since the whole project works closely with the database, the question naturally arose of what to do with the components that talk to it directly, and how to test them. We did not find a quick way to solve this problem and decided to put the issue aside. But it would not let me go, and one day, within the working project, I managed to develop a framework that allowed us to quickly test the interaction with the database.
The testing system was developed within the context of a working project that uses its own ORM layer, which accepts parameterized SQL queries and independently parses the resulting tabular data into objects. This imposed certain restrictions on the design of the testing system.
I want to say right away that I call the developed tests unit tests, although strictly speaking they are integration tests that merely have all the convenience and advantages of unit tests.
Motivation
Why do we force ourselves to write unit tests? To improve the quality of the application we develop. The larger a project grows, the easier it is to break something, especially during refactoring. Unit tests let you quickly verify that the system's components work correctly. But writing unit tests for methods that interact with the database is not such an easy task, because a correctly working test requires a properly configured environment, namely:
- Configured database server
- Correctly configured database connection strings
- Test database with all required tables
- Correctly populated tables in the test database
All of this looks rather complicated, so most developers do not even consider it. Accordingly, my goal was to develop a system that would simplify writing such tests to the point where they are no different from ordinary unit tests for the other components of the project.
Description of the problem
Before starting the story about the development, I would like to describe how such things are tested in most cases and what that can lead to.
1. Testing a query in the database
Before writing a method that interacts with the database, the developer writes an ordinary SQL query with parameters and checks how it behaves on a local copy of the database. This is the fastest and easiest way to check a query for errors. But with this approach it is easy to make a mistake while transferring the query into the project code, or in the code that executes the query. For example, you can accidentally skip the initialization of one of the parameters, which will make the query work incorrectly.
2. Testing through a direct method call
Another simple way is to open a section of code that is always executed in the application, for example Main, call the method that interacts with the database there, set a breakpoint and look at the result... Very similar to debugging by printing informational messages to the screen.
3. Testing through the user interface of the application
Another very interesting and quite popular way to test the written method: take the application and perform a chain of sequential actions that eventually leads to a call of your method that interacts with the database. This approach is rather laborious and does not guarantee that the query is actually executed: instead of executing it, the application could hit a cache or perform a completely different operation, because that is what some setting told it to do.
All these difficulties mean that you have to spend a lot of time testing fairly simple things. And, as you know, if something takes a long time to check, the developer will most likely be too lazy to check it, which leads to errors. According to our statistics, about 5% of the errors registered in the bug tracker are related to broken database queries. I should also note that most of these errors are discussed verbally, and quite often, so unfortunately they never make it into these statistics.
Development start
Now that I have convinced you of the need for such a system, we can proceed to the description of how it was created.
How to develop an automated testing system?
The first questions were: how should such a system be built, and what requirements must it meet? I had to spend time researching the subject area, even though I write unit tests all the time; developing a testing system is a completely different field. In this research I tried to find existing solutions and possible implementation approaches. But first I had to recall the principles of good unit tests:
- One test must correspond to one scenario.
- A test must not depend on time or random values.
- A test must be atomic.
- Test run time must be short.
After that, I needed to develop a testing process that would meet these requirements.
What testing process is suitable?
As a result of long deliberation, a simple three-stage process was developed:
- Initialization: a connection to the database server is established, and the initialization scripts for the new database are loaded and parsed. After that, a new empty database is created for the test method to run against and is, of course, initialized with the necessary structure.
- Execution: the test method itself is run, structured according to the AAA (Arrange-Act-Assert) approach.
- Completion: the resources used are released: the test database is deleted and open connections to the database server are closed.
Thanks to this process, we managed to satisfy the requirements for unit tests: since each test method runs against its own dedicated database, the tests are atomic and can therefore be run in parallel. A minimal sketch of this lifecycle is shown below.
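To make the process concrete, here is a minimal sketch of such a per-test lifecycle, assuming MSTest and a MySQL server; the connection string, class name, and the schema-loading step are illustrative, not the project's actual code:

```csharp
using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using MySql.Data.MySqlClient;

[TestClass]
public class InboxMessagesProviderTests
{
    private string _databaseName;
    private MySqlConnection _connection;

    [TestInitialize]
    public void Initialize()
    {
        // Stage 1 (Initialization): create a dedicated database for this single
        // test, so that tests stay atomic and can run in parallel.
        _databaseName = "test_" + Guid.NewGuid().ToString("N");
        _connection = new MySqlConnection("Server=localhost;Uid=test;Pwd=test;");
        _connection.Open();
        Execute($"CREATE DATABASE `{_databaseName}`");
        Execute($"USE `{_databaseName}`");
        // ...load and execute the schema initialization script here.
    }

    [TestCleanup]
    public void Cleanup()
    {
        // Stage 3 (Completion): drop the test database and close the connection.
        Execute($"DROP DATABASE `{_databaseName}`");
        _connection.Dispose();
    }

    private void Execute(string sql)
    {
        using (var command = new MySqlCommand(sql, _connection))
        {
            command.ExecuteNonQuery();
        }
    }
}
```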
Analysis of the developed system
Performance
Many might now think that this approach will not be fast on a large database, and they would be right. The project is large, and the database contains many tables and a lot of initial data, so initializing a new database takes prohibitively long. The table below shows the result for a single test:
Initialization stage | Time, ms | Share
---|---|---
File upload | 6 | <1%
Script preparation | 19 | <1%
Script parsing | 211 | 1%
Execution of script commands | 14660 | 98%
Total | 14896 | 100%
As you can see, initializing a single test database takes about 15 seconds. But a project will clearly contain more than one test: if we assume about 100 tests, their total run time would approach half an hour! It immediately became clear that such tests violate one of the basic principles: short execution time.
I had to sit down and analyze the system's performance. I identified four main stages of test initialization that could be optimized, which produced the table presented above. As it shows, 98% of the time is spent executing the commands of the test database initialization script. We had two main ideas for fixing this: using transactions, and using only those tables of the tested database that the test actually needs.
The first option was to test a set of methods against a single database created in advance. Each test method, together with the insertion of its test data, was wrapped in a transaction; after the test method finished, the transaction was rolled back, and the database remained in its pristine state... or rather, was supposed to remain in it. As it turned out, most of the tables in the project use the MyISAM engine, which, as is well known, does not support transactions. But that was not the only reason this option was dropped: as mentioned earlier, a test must be completely atomic so that tests can be run in parallel, and since this approach relies on one shared database, it violates test atomicity and fails the requirements above.
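For illustration, the rejected transaction-based variant would have looked roughly like this (a sketch assuming the same MSTest fixture shape as above; `SharedConnectionString` is a placeholder):

```csharp
private const string SharedConnectionString = "Server=localhost;Database=shared_test;Uid=test;Pwd=test;";

private MySqlConnection _connection;
private MySqlTransaction _transaction;

[TestInitialize]
public void Initialize()
{
    // One shared, pre-created database for all tests.
    _connection = new MySqlConnection(SharedConnectionString);
    _connection.Open();
    // Wrap the test data inserts and the tested method in a transaction...
    _transaction = _connection.BeginTransaction();
}

[TestCleanup]
public void Cleanup()
{
    // ...and roll everything back to return the database to its pristine state.
    // On MyISAM tables this silently does nothing: the engine ignores transactions.
    _transaction.Rollback();
    _connection.Dispose();
}
```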
In the second option, the developer, when writing a test, explicitly indicates which tables the test needs. The system automatically finds these tables in the database initialization script, together with all the necessary dependencies, such as seed data or additional tables referenced by a foreign key. This approach ensures the atomicity of each test. After the new mechanism was applied to the already written tests, I measured the average test initialization time again; the result is presented in the table below:
Initialization stage | Time, ms | Share
---|---|---
File upload | 6 | 1%
Script preparation | 22 | 5%
Script parsing | 254 | 62%
Execution of script commands | 134 | 32%
Total | 416 | 100%
As you can see, this optimization saved 97% of the time! A good step toward fast tests of database queries. The table also shows that there is still room for optimization, but for now this run time fully meets our needs and requirements.
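The article does not show how exactly a developer lists the tables a test needs. One plausible shape for such a declaration (the attribute name and mechanism here are hypothetical, not the project's actual API) is a method-level attribute that the system reads before building the per-test database:

```csharp
using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical declaration: the framework reads this attribute and extracts only
// the listed tables (plus their foreign-key dependencies and seed data) from the
// full database initialization script.
[AttributeUsage(AttributeTargets.Method)]
public sealed class RequiredTablesAttribute : Attribute
{
    public string[] Tables { get; private set; }

    public RequiredTablesAttribute(params string[] tables)
    {
        Tables = tables;
    }
}

[TestClass]
public class InboxMessagesProviderTests
{
    [TestMethod]
    [RequiredTables("Message", "InboxMessage")]
    public void GetInboxMessages_ShouldReturnInboxMessages()
    {
        // ...arrange, act, assert as in the listings below.
    }
}
```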
Development of an automatic data generation system
Performance was now fine, but for some reason most of the developers on the project still avoided the system. I had to find out why. As it turned out, writing tests for it also took a lot of time and skill, because all the necessary data had to be filled in by hand. Just take a look at this code:
```csharp
[TestMethod]
public void GetInboxMessages_ShouldReturnInboxMessages()
{
    const int validRecipient = 1;
    const int wrongRecipient = 2;
    var recipients = new[] { validRecipient };

    // Arrange: create a message and two inbox entries referencing it
    var message = new MessageEntity();
    HelperDataProvider.Insert(message);

    var validInboxMessage = new InboxMessageEntity()
    {
        MessageId = message.MessageId,
        RecipientId = validRecipient
    };
    var wrongInboxMessage = new InboxMessageEntity()
    {
        MessageId = message.MessageId,
        RecipientId = wrongRecipient
    };
    HelperDataProvider.Insert(validInboxMessage);
    HelperDataProvider.Insert(wrongInboxMessage);

    // Act
    var collection = _target.GetInboxMessages(recipients);

    // Assert: only the message addressed to the valid recipient is returned
    Assert.AreEqual(1, collection.Count);
    Assert.IsNotNull(collection.FirstOrDefault(x => x.Id == validInboxMessage.Id));
    Assert.IsNull(collection.FirstOrDefault(x => x.Id == wrongInboxMessage.Id));
}
```
The test is written using the AAA approach: the first part creates new entities, links them together, and inserts them into the database; then the method under test is called; finally, the result is checked. And this is a simple case: there is a foreign-key link to just one table, and no additional requirements on the populated fields. Now let's look at the same example, but using the automatic entity generation system:
```csharp
[TestMethod]
public void GetInboxMessages_ShouldReturnInboxMessages()
{
    const int validRecipient = 1;
    const int wrongRecipient = 2;
    const int recipientsCount = 2;
    const int messagesCount = 3;
    var recipients = new[] { validRecipient };
    // ...the rest of the listing did not survive the transfer; a hedged
    // reconstruction of the generator setup is sketched after the list of rules below.
}
```
Just a few lines of generator setup, and at the output we get a database fully prepared for the test, with all the necessary data. The system is based on rules for entities, and on grouping of these rules, which makes it possible to configure N:N or N:1 relationships between entities. The system provides the following rules:
- DataSetterRule - sets a specific value for one of the entity's fields.
- EnumerableDataRule - sets a list of values that alternate across entities: the first created entity receives the first value from the list, the second entity the second, and so on, cycling through the list.
- RandomDataRule - generates a random value from the list of available ones; very convenient for generating large data sets when testing a complex query for performance.
- UniqueDataRule - generates a random unique value for the specified entity field; useful when you need to create a set of entities for a table column with a unique constraint.
- ForeignKeyRule - the most useful rule; it links two entities. By configuring entity grouping for this rule, you can obtain N:N or N:1 relationships between entities.
By combining these rules, you can create different data sets. When CreateMany or CreateSingle is called to create an entity, the builder walks through all the applicable rules, fills the entity, and saves it into a separate internal buffer. Only when InsertAll is called does the builder send all the accumulated entities to the database. The overall flow: CreateMany/CreateSingle → rules applied → internal buffer → InsertAll → database.
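Since the generator listing above is truncated in the source, here is a hedged reconstruction of how the setup might continue. The rule and method names come from the text, but the builder API shape (`EntityGenerator`, `AddRule`) is assumed:

```csharp
var generator = new EntityGenerator();

// Create several messages; the inbox entries will reference them (N:1).
var messages = generator.CreateMany<MessageEntity>(messagesCount);

// Alternate the recipient between the valid and the wrong one.
generator.AddRule(new EnumerableDataRule<InboxMessageEntity>(
    x => x.RecipientId, new[] { validRecipient, wrongRecipient }));

// Link each inbox entry to one of the created messages via the foreign key.
generator.AddRule(new ForeignKeyRule<InboxMessageEntity, MessageEntity>(
    x => x.MessageId, messages));

var inboxMessages = generator.CreateMany<InboxMessageEntity>(recipientsCount);

// Entities were only buffered so far; this single call writes them all.
generator.InsertAll();

var collection = _target.GetInboxMessages(recipients);
Assert.AreEqual(1, collection.Count);
```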
Implementation of the system in the project environment
Of course, integrating this system into the deployment process was mandatory, because few developers run tests manually. Running the tests on every application deployment is the perfect way to solve this problem.
Unfortunately, it is not advisable to build this system into the build process: the build server should not have access to test database servers, and the testing process itself is resource-intensive. It was therefore decided to move the integration test run to the test environment. For this, a separate deployment step was created to run the tests, with a set of scripts that automatically start the testing agent and analyze the results of its work. The standard test agent from Microsoft, MSTestAgent, was used to run the tests. Writing the analysis scripts was made easier by the fact that the test results file is written in XML format, so the whole analysis of the results boiled down to a couple of simple XQuery queries. Reports were built from the resulting data and later sent to the developers by mail or, when necessary, to the team chat.
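The source does not include the analysis scripts themselves; as a sketch of the idea, here is an equivalent check in C# using LINQ to XML instead of XQuery, assuming the standard TRX (XML) result file schema produced by MSTest:

```csharp
using System;
using System.Xml.Linq;

static class TrxSummary
{
    static void Main(string[] args)
    {
        // MSTest writes its results as a .trx file (XML); the pass/fail counters
        // live in the ResultSummary element under the TestRun root.
        XNamespace ns = "http://microsoft.com/schemas/VisualStudio/TeamTest/2010";
        var doc = XDocument.Load(args[0]); // path to the .trx file

        var counters = doc.Root.Element(ns + "ResultSummary").Element(ns + "Counters");
        int passed = (int)counters.Attribute("passed");
        int failed = (int)counters.Attribute("failed");

        Console.WriteLine("Passed: {0}, Failed: {1}", passed, failed);

        // A non-zero exit code fails the deployment step when any test failed.
        Environment.Exit(failed > 0 ? 1 : 0);
    }
}
```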
Conclusion
During development, we had to solve two serious problems, either of which could have forced us to abandon the system altogether: long test execution time, and the complexity of initializing test data.
After the system for automatically testing database queries was developed, writing and testing code that queries the database became much easier. The TDD approach can now safely be applied not only to "classic" components, but also to components that work closely with the database. After this system was integrated into the project environment, it became much easier to monitor the quality of the project, since incorrect behavior of components is detected when the project is deployed and is immediately visible to all developers.
And finally, I would like to ask: how is the data access layer tested on your projects?