
Some time ago an unpleasant incident happened to me; it triggered a small project on GitHub and eventually turned into this article.
An ordinary day, a regular release: every task had been checked inside and out by our QA engineer, so with serene confidence we rolled out to stage. The application behaved well; the logs were silent. We decided to make the switch (stage <-> prod). We switched and looked at the dashboards...
A couple of minutes pass; the flight is stable. The QA engineer runs a smoke test and notices that the application is unnaturally slow. We write it off as cache warm-up.
A couple more minutes pass, and the first complaint arrives from first-line support: customer data takes forever to load, the application is slow, responses take ages, and so on. We start to worry, digging through the logs for possible causes.
A few more minutes, and a letter arrives from the DB admins: the execution time of queries to the database (hereinafter the DB) has broken all conceivable limits and tends to infinity.
I open monitoring (I use JavaMelody) and find those queries. I launch PGAdmin and reproduce them. Really slow. I add EXPLAIN, look at the execution plan... that's it, we forgot about the indexes.
Why is code review not enough?
That incident taught me a lot. Yes, I "put out the fire" within an hour by creating the necessary index right in production, something like this (do not forget the CONCURRENTLY option):
CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_pets_name ON pets_table (name_column);
You will agree this was tantamount to a deployment with downtime. For the application I work on, that is unacceptable.
I drew my conclusions and added a special bold item to my code-review checklist: if I see that one of the Repository classes was added or changed during development, I check the SQL migrations for a script that creates or modifies an index. If there is none, I ask the author: are you sure an index is not needed here?
Quite possibly the index is not needed if there is little data, but if we are working with a table whose row count runs into the millions, a missing index can become fatal and lead to the story described at the beginning of this article.
In that case, I ask the author of the pull request (hereinafter PR) to make 100% sure that the query they wrote in HQL is at least partially covered by an index (i.e., an Index Scan is used). To do this, the developer:
- starts the application;
- finds the translated (HQL -> SQL) query in the logs;
- opens PGAdmin or another database administration tool;
- generates an amount of data acceptable for the tests (at least 10K-20K rows) in a local database, so that nobody interferes with the experiments;
- executes the query;
- requests its execution plan;
- studies the plan carefully and draws the appropriate conclusions;
- adds or changes an index until the execution plan looks right;
- notes in the PR that query coverage has been checked;
- finally, weighing the risks and the seriousness of the query, I may double-check the author's work.
That is a lot of routine and plenty of room for human error, but for a while it suited me and I lived with it.
On the way home
They say it is useful, at least occasionally, to walk home from work without music or podcasts along the way. Just thinking about life, you can arrive at interesting conclusions and ideas.
One day I was walking home, reflecting on my day. There had been several reviews; for each one I had gone through the checklist and performed the whole series of actions described above. I was so tired of it that I thought: what the hell? Can't this be done automatically?.. I quickened my pace, eager to try the idea out as soon as possible.
Problem statement
What is the most important thing for a developer in an execution plan?
Of course, a Seq Scan over a large amount of data, caused by a missing index.
So I needed a test that:
- runs against a DB whose version and configuration match production;
- intercepts the database queries issued by the JPA repository (Hibernate);
- obtains their execution plans;
- parses each execution plan into a data structure convenient for checks;
- verifies expectations using a convenient set of assert methods, for example that no Seq Scan is used.
The quickest way to test this hypothesis was to build a prototype.
Solution architecture

The first problem to solve was running the test against a real database that matches the one used in production in version and settings.
Thanks to Docker and Testcontainers, this problem is already solved.
SqlInterceptor, ExecutionPlanQuery, ExecutionPlanParser, and AssertService are the interfaces I currently implement for Postgres; implementations for other databases are planned. If you want to participate, you are welcome. The code is written in Kotlin.
I put all this together on GitHub and called it checkinx-utils. You do not need to repeat any of this yourself: it is enough to add a checkinx dependency in Maven/Gradle and use the convenient asserts. I describe how in more detail below.
How the CheckInx components interact
ProxyDataSource
The first problem to solve is intercepting the queries that are ready to be executed against the database: with all parameters already bound, without "?" placeholders, and so on.
To do this, the real DataSource is wrapped in a proxy of sorts, which lets you plug into the query execution pipeline and intercept the queries.
Such a ProxyDataSource has been implemented many times already. I used the ready-made ttddyy datasource-proxy solution, which lets me register my own Listener to intercept the queries I need.
I substitute the proxy for the original DataSource using the DataSourceWrapper class (a BeanPostProcessor).
SqlInterceptor
In essence, its start() method registers a Listener on the proxyDataSource and begins intercepting queries, saving them in an internal list of statements. The stop() method removes that Listener.
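The mechanics can be illustrated with a small self-contained model. The listener and proxy classes below are simplified stand-ins for ttddyy's QueryExecutionListener and ProxyDataSource, not the actual checkinx-utils code; only start(), stop(), and the statements list mirror the interface described above.

```kotlin
// Stand-in for the proxy's listener hook (ttddyy's QueryExecutionListener in reality)
fun interface QueryListener {
    fun onQuery(sql: String)
}

// Stand-in for ProxyDataSource: lets listeners observe every executed query
class FakeProxyDataSource {
    private val listeners = mutableListOf<QueryListener>()
    fun addListener(l: QueryListener) { listeners.add(l) }
    fun removeListener(l: QueryListener) { listeners.remove(l) }
    fun execute(sql: String) = listeners.forEach { it.onQuery(sql) }
}

class SqlInterceptor(private val proxy: FakeProxyDataSource) {
    val statements = mutableListOf<String>()
    private val listener = QueryListener { sql -> statements.add(sql) }

    fun start() {
        statements.clear()
        proxy.addListener(listener)    // begin collecting queries
    }

    fun stop() {
        proxy.removeListener(listener) // stop collecting
    }
}

fun main() {
    val proxy = FakeProxyDataSource()
    val interceptor = SqlInterceptor(proxy)
    interceptor.start()
    proxy.execute("select * from pets where location = ?")
    interceptor.stop()
    println(interceptor.statements[0]) // select * from pets where location = ?
}
```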
ExecutionPlanQuery
Here the original query is transformed into a query for its execution plan. In the case of Postgres, this means prepending the EXPLAIN keyword.
This query is then executed against the same database from Testcontainers, and a "raw" execution plan (a list of lines) is returned.
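A sketch of the transformation step only (the function name is illustrative; the real ExecutionPlanQuery also runs the resulting statement against the Testcontainers database over JDBC and collects the returned lines):

```kotlin
// Turn an intercepted SQL statement into an EXPLAIN request (Postgres style)
fun toExplainQuery(sql: String): String = "EXPLAIN " + sql.trim()

fun main() {
    println(toExplainQuery("select * from pets where location = 'Moscow'"))
    // EXPLAIN select * from pets where location = 'Moscow'
}
```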
ExecutionPlanParser
A "raw" execution plan is inconvenient to work with, so I parse it into a tree of nodes (PlanNode).
Let's examine the PlanNode fields using a real execution plan as an example:
Index Scan using ix_pets_age on pets  (cost=0.29..8.77 rows=1 width=36)
  Index Cond: (age < 10)
  Filter: ((name)::text = 'Jack'::text)
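The parsing idea can be sketched with a simplified, self-contained approximation. The real PlanNode carries more fields (parsed cost, row estimates, and so on), and all names here are illustrative: child operations appear as "->" lines, while other indented lines ("Index Cond:", "Filter:", ...) are treated as properties of the nearest node.

```kotlin
// Simplified sketch: build a PlanNode tree from the indented "raw" plan
data class PlanNode(
    val operation: String,                                 // e.g. "Index Scan using ix_pets_age on pets"
    val properties: MutableList<String> = mutableListOf(), // e.g. "Index Cond: (age < 10)"
    val children: MutableList<PlanNode> = mutableListOf()
)

fun parsePlan(raw: List<String>): PlanNode {
    // strip the trailing "(cost=... rows=... width=...)" part of a node line
    fun operationOf(line: String) = line.replace(Regex("""\s*\(cost.*"""), "").trim()

    val root = PlanNode(operationOf(raw.first()))
    val stack = ArrayDeque(listOf(0 to root)) // pairs of (indent, node)
    for (line in raw.drop(1)) {
        val indent = line.indexOfFirst { it != ' ' }
        val content = line.trim()
        if (content.startsWith("->")) {
            // a new child operation; pop nodes that are at the same or deeper indent
            val node = PlanNode(operationOf(content.removePrefix("->")))
            while (stack.last().first >= indent) stack.removeLast()
            stack.last().second.children.add(node)
            stack.addLast(indent to node)
        } else {
            // an attribute line belongs to the most recently opened node
            stack.last().second.properties.add(content)
        }
    }
    return root
}
```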
AssertService
The data structure returned by the parser is much easier to work with. CheckInxAssertService is a set of checks over the PlanNode tree described above. It lets you supply your own lambda checks or use the predefined ones that are, in my view, the most common: for example, asserting that your query involves no Seq Scan, or that a specific index is (or is not) used.
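The lambda-check idea can be shown with a self-contained sketch. The name assertPlan and this pared-down PlanNode shape are illustrative, not the published CheckInxAssertService API:

```kotlin
// Pared-down plan node: just the operation name and children
data class PlanNode(val operation: String, val children: List<PlanNode> = emptyList())

// Walk the whole plan tree and fail loudly if any node violates the check
fun assertPlan(root: PlanNode, check: (PlanNode) -> Boolean) {
    require(check(root)) { "Plan check failed at node: ${root.operation}" }
    root.children.forEach { assertPlan(it, check) }
}

fun main() {
    val good = PlanNode("Sort", listOf(PlanNode("Index Scan using ix_pets_location on pets")))
    // a "no Seq Scan anywhere" style check passes for this plan
    assertPlan(good) { !it.operation.startsWith("Seq Scan") }

    val bad = PlanNode("Seq Scan on pets")
    // the same check fails for a plan that fell back to a sequential scan
    val failed = runCatching { assertPlan(bad) { !it.operation.startsWith("Seq Scan") } }.isFailure
    println(failed) // true
}
```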
CoverageLevel
A very important enum; I will describe it separately:
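This excerpt does not spell out the enum's constants, so below is a hypothetical sketch of how such levels might be interpreted. Only HALF is confirmed by the usage example later in the article; ZERO, FULL, and the interpretation function are my assumptions, not the actual checkinx-utils definition.

```kotlin
// Hypothetical sketch: CoverageLevel as the minimum acceptable degree of
// index coverage for a query (only HALF appears in the article's examples)
enum class CoverageLevel { ZERO, HALF, FULL }

// Illustrative interpretation: given the operation names from a plan,
// decide whether the requested coverage level is met
fun meets(operations: List<String>, level: CoverageLevel): Boolean = when (level) {
    CoverageLevel.ZERO -> true                                                // any plan is acceptable, even Seq Scan
    CoverageLevel.HALF -> operations.none { it.startsWith("Seq Scan") }       // at least partially covered by an index
    CoverageLevel.FULL -> operations.all { it.startsWith("Index Only Scan") } // answered entirely from the index
}

fun main() {
    // a plan that still contains a Seq Scan meets ZERO but not HALF
    println(meets(listOf("Seq Scan on pets"), CoverageLevel.HALF)) // false
}
```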
Below we will walk through some usage examples.
Test examples using CheckInx
I made a separate project on GitHub, checkinx-demo, where I implemented a JPA repository for a pets table, plus tests for this repository that check coverage, indexes, and so on. It is a useful starting point.
You might have a test like this:
@Test
fun testFindByLocation() {
    sqlInterceptor.start()
    repository.findByLocation("Moscow")
    sqlInterceptor.stop()
    checkInxAssertService.assertCoverage(CoverageLevel.HALF, sqlInterceptor.statements[0])
}
Its execution plan could look like this:
Index Scan using ix_pets_location on pets pet0_  (cost=0.29..4.30 rows=1 width=46)
  Index Cond: ((location)::text = 'Moscow'::text)
...or like this, if we forgot about the index (and the test goes red):
Seq Scan on pets pet0_  (cost=0.00..19.00 rows=4 width=84)
  Filter: ((location)::text = 'Moscow'::text)
In my project I most often use the simplest assert, which states that there is no Seq Scan in the execution plan:
checkInxAssertService.assertCoverage(CoverageLevel.HALF, sqlInterceptor.statements[0])
The presence of such a test shows that I have, at a minimum, studied the execution plan. It also makes intent in the project more explicit and improves the code's documentation and predictability.
Mode for the experienced: I recommend using CheckInxAssertService, but if the need arises you can walk the parse tree (from ExecutionPlanParser) yourself or even analyze the "raw" execution plan (the result of ExecutionPlanQuery) directly.
@Test
fun testFindByLocation() {
    sqlInterceptor.start()
    repository.findByLocation("Moscow")
    sqlInterceptor.stop()
    // instead of CheckInxAssertService, walk the tree produced by
    // ExecutionPlanParser (or the raw plan from ExecutionPlanQuery) yourself
}
Connecting to your project
In my project I put such tests into a separate group called intensive integration tests.
Connecting and using checkinx-utils is easy. Let's start with the build script.
First, add the repository. Someday I will publish checkinx to Maven Central, but for now the artifact can only be downloaded from GitHub via JitPack.
repositories {
    // ...
    maven { url 'https://jitpack.io' }
}
Next, add the dependency:
dependencies {
    // ...
    implementation 'com.github.tinkoffcreditsystems:checkinx-utils:0.2.0'
}
We finish the setup by adding a configuration. For now, only Postgres is supported.
@Profile("test")
@ImportAutoConfiguration(classes = [PostgresConfig::class])
@Configuration
open class CheckInxConfig
Pay attention to the test profile; otherwise you will find a ProxyDataSource in your production application.
PostgresConfig wires up several beans:
- DataSourceWrapper
- PostgresInterceptor
- PostgresExecutionPlanParser
- PostgresExecutionPlanQuery
- CheckInxAssertServiceImpl
If you need some customization that the current API does not provide, you can always replace one of the beans with your own implementation.
Known Issues
Sometimes DataSourceWrapper cannot replace the original dataSource because of Spring's CGLIB proxies: in that case the BeanPostProcessor receives not a DataSource but a ScopedProxyFactoryBean, and type checking fails.
The simplest solution is to create the HikariDataSource for tests manually. Your configuration will then look like this:
@Profile("test")
@ImportAutoConfiguration(classes = [PostgresConfig::class])
@Configuration
open class CheckInxConfig {

    @Primary
    @Bean
    @ConfigurationProperties("spring.datasource")
    open fun dataSource(): DataSource {
        return DataSourceBuilder.create()
            .type(HikariDataSource::class.java)
            .build()
    }

    @Bean
    @ConfigurationProperties("spring.datasource.configuration")
    open fun dataSource(properties: DataSourceProperties): HikariDataSource {
        return properties.initializeDataSourceBuilder()
            .type(HikariDataSource::class.java)
            .build()
    }
}
Development plans
- I would like to understand whether anyone besides me needs this. For that I will create a poll; I would be glad to get honest answers.
- See what is really needed and extend the standard set of assert methods.
- Write implementations for other databases.
- The construct sqlInterceptor.statements[0] does not look very readable; I want to improve it.
I would be glad if someone wants to join in, write some code, and play with Kotlin.
Conclusion
I am sure there will be comments along these lines:
"It is impossible to predict how the query planner will behave in production; everything depends on the collected statistics."
Indeed, using previously collected statistics, the planner may build a plan different from the one under test. But the point is slightly different.
The planner's job is to improve the query, not to make it worse. It will not suddenly switch to a Seq Scan for no good reason, whereas you can introduce one unknowingly.
With a CheckInx test in place, you will not forget to study the query's execution plan and to consider creating an index; or, on the contrary, the test will state explicitly that no index is needed here and a Seq Scan suits you. Either way, it saves you from unnecessary questions during code review.
Links
- https://github.com/TinkoffCreditSystems/checkinx-utils
- https://github.com/dsemyriazhko/checkinx-demo
- https://github.com/ttddyy/datasource-proxy
- https://mvnrepository.com/artifact/org.testcontainers/postgresql
- https://github.com/javamelody/javamelody/wiki