Java recently turned 20 years old. It would seem that by now everything has already been written in Java: any idea, any project, any tool you can think of already exists somewhere. Especially when it comes to something as mundane as a database connection pool, used by millions of developers around the world. And yet, there was still room for something new. Meet the
HikariCP project - the fastest JDBC connection pool in Java to date.
HikariCP is another vivid example that it is always worth questioning the effectiveness of established solutions, even ones used by millions of people for decades. Hikari is a great illustration of how micro-optimizations that individually never give you more than a 0.00001% gain can, taken together, produce a very fast and efficient tool.
This post is a loose, partial translation of the article
Down the Rabbit Hole by the author of HikariCP, interleaved with my own commentary.

Down the Rabbit Hole
This article is a recipe for our secret sauce. When you look at benchmarks of any kind, you should, like any reasonable person, treat them with a healthy dose of skepticism. When you think about the performance of a connection pool, it is hard to avoid the tempting idea that the pool itself is the most important part. In fact, that is not true. The number of getConnection() calls is quite small compared with other operations in typical JDBC usage. A large share of the performance gains comes from optimizing the wrappers around Connection, Statement, and so on.
In order to make HikariCP fast (which it is), we had to dig down to the bytecode level and below. We used all the tricks we know to help JIT. We studied the compiled bytecode for each method and even modified the methods to fit the inline limit. We reduced the number of inheritance levels, limited access to some variables to reduce their scope, and removed any type conversions.
Sometimes, seeing that a method exceeded the inlining limit, we thought about how to restructure it to shave off a few bytecode instructions. For example:
public SQLException checkException(SQLException sqle)
{
   String sqlState = sqle.getSQLState();
   if (sqlState == null)
      return sqle;

   if (sqlState.startsWith("08"))
      _forceClose = true;
   else if (SQL_ERRORS.contains(sqlState))
      _forceClose = true;

   return sqle;
}
A fairly simple method that checks whether an error means the connection was lost. And now the bytecode:
0: aload_1
1: invokevirtual #148
It is probably no secret to anyone that the inlining limit in the HotSpot JVM is 35 bytes of bytecode. So we gave this method some attention and shortened it as follows:
String sqlState = sqle.getSQLState();
if (sqlState != null && (sqlState.startsWith("08") || SQL_ERRORS.contains(sqlState)))
   _forceClose = true;

return sqle;
That got us close to the limit, but still 36 bytes. So we tried this:
String sqlState = sqle.getSQLState();
_forceClose |= (sqlState != null && (sqlState.startsWith("08") || SQL_ERRORS.contains(sqlState)));

return sqle;
Looks simpler, doesn't it? In fact, this code is worse than the previous version: 45 bytes.
One more attempt:
String sqlState = sqle.getSQLState();
if (sqlState != null)
   _forceClose |= sqlState.startsWith("08") | SQL_ERRORS.contains(sqlState);

return sqle;
Note the use of the non-short-circuiting OR (|). This is a great example of sacrificing theoretical performance (in theory, || should be faster) for real performance (the method can now be inlined). The resulting bytecode:
0: aload_1
1: invokevirtual #153
Just under the 35-byte limit. This is a small method, and not even a heavily used one, but you get the idea. Small methods not only allow the JIT to inline them; they also mean fewer actual machine instructions, which increases the amount of code that fits in the processor's L1 cache. Now multiply all this by the number of such changes in the library, and you will understand why HikariCP is really fast.
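The semantic difference between | and || that makes this trick safe is easy to demonstrate: | always evaluates both operands, while || skips the right side when the left is already true. A minimal sketch (names are illustrative, not from HikariCP):

```java
public class OrDemo {
    private static int evaluations = 0;

    private static boolean sideEffect() {
        evaluations++;
        return true;
    }

    // Returns how many times sideEffect() ran for || vs |.
    public static int[] compare() {
        evaluations = 0;
        boolean shortCircuit = true || sideEffect(); // right side skipped
        int withDoubleBar = evaluations;             // 0

        evaluations = 0;
        boolean eager = true | sideEffect();         // right side always runs
        int withSingleBar = evaluations;             // 1

        return new int[] { withDoubleBar, withSingleBar };
    }

    public static void main(String[] args) {
        int[] counts = compare();
        System.out.println(counts[0] + " " + counts[1]); // prints "0 1"
    }
}
```

In the checkException case both operands are cheap and side-effect-free, so the eager | costs almost nothing at runtime while removing the extra branch from the bytecode.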
Micro optimizations
HikariCP contains a lot of micro-optimizations. Individually, they make almost no difference, but together they significantly increase overall performance. Some of these optimizations save fractions of a microsecond over millions of calls.
ArrayList
One of the least obvious optimizations was replacing the ArrayList<Statement> collection in the ConnectionProxy class, which was used to track open Statement objects. When a Statement is closed, it must be removed from this collection; and when the connection is closed, the code must iterate over the collection, close any open Statements, and then clear it. As is well known, ArrayList checks the index range on every get(index) call. But since we can guarantee a valid index, this check is superfluous. Moreover, the remove(Object) implementation scans the list from beginning to end, whereas the common JDBC pattern is to close Statements either immediately after use or in the reverse order of opening (FILO). For such cases, a scan that starts at the end of the list is faster. So we replaced the ArrayList<Statement> with a FastStatementList, which has no range check and removes elements starting from the end.
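The idea can be sketched as follows. This is a simplified illustration of the technique, not HikariCP's actual FastStatementList: get() adds no explicit bounds check of its own, and remove() scans from the tail to match the FILO close order of statements.

```java
import java.util.Arrays;

// Simplified sketch of a list tuned for the JDBC Statement-tracking pattern.
public class FastListSketch<T> {
    private Object[] elements = new Object[16];
    private int size;

    public void add(T element) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = element;
    }

    @SuppressWarnings("unchecked")
    public T get(int index) {
        // No explicit range check: the caller guarantees a valid index,
        // so we rely only on the JVM's built-in array bounds check.
        return (T) elements[index];
    }

    public boolean remove(Object element) {
        // Scan from the end: statements are usually closed in reverse
        // order of creation, so the match is typically found immediately.
        for (int i = size - 1; i >= 0; i--) {
            if (element == elements[i]) { // identity, not equals()
                System.arraycopy(elements, i + 1, elements, i, size - i - 1);
                elements[--size] = null;
                return true;
            }
        }
        return false;
    }

    public int size() {
        return size;
    }
}
```

The identity comparison (==) is deliberate: the pool tracks the exact wrapper objects it created, so calling equals() would be wasted work.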
Slow singleton
To generate proxies for Connection, Statement, and ResultSet objects, HikariCP initially used a singleton factory. In the case of ConnectionProxy, for example, this factory lived in the static field PROXY_FACTORY, and several dozen places in the code referred to this field.
public final PreparedStatement prepareStatement(String sql, String[] columnNames) throws SQLException
{
   return PROXY_FACTORY.getProxyPreparedStatement(this, delegate.prepareStatement(sql, columnNames));
}
In bytecode, it looked like this:
public final java.sql.PreparedStatement prepareStatement(java.lang.String, java.lang.String[]) throws java.sql.SQLException;
  flags: ACC_PRIVATE, ACC_FINAL
  Code:
    stack=5, locals=3, args_size=3
     0: getstatic #59
You can see that the getstatic call comes first, to fetch the value of the static field PROXY_FACTORY. Note also the final invokevirtual call of the getProxyPreparedStatement() method on the ProxyFactory object.
The optimization was that we removed the singleton factory and replaced it with a class with static methods. The code began to look like this:
public final PreparedStatement prepareStatement(String sql, String[] columnNames) throws SQLException
{
   return ProxyFactory.getProxyPreparedStatement(this, delegate.prepareStatement(sql, columnNames));
}
Here getProxyPreparedStatement() is a static method of the ProxyFactory class. And this is what the bytecode looks like:
private final java.sql.PreparedStatement prepareStatement(java.lang.String, java.lang.String[]) throws java.sql.SQLException;
  flags: ACC_PRIVATE, ACC_FINAL
  Code:
    stack=4, locals=3, args_size=3
     0: aload_0
     1: aload_0
     2: getfield #3
Three things deserve attention here. The getstatic call is gone. invokevirtual has been replaced by invokestatic, which the virtual machine optimizes more readily. And finally, a point that is easy to miss: the stack size decreased from 5 elements to 4. Before the optimization, the invokevirtual call also required a reference to the ProxyFactory object itself to be pushed onto the stack, which in turn meant an extra pop of that reference when getProxyPreparedStatement() was invoked. In sum: we got rid of the static field access, removed the extra push and pop operations on the stack, and made the method call more amenable to JIT optimization.
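The shape of the refactoring can be shown schematically. This is an illustrative sketch, not HikariCP's actual code; both calls do the same work, but they compile to different call instructions:

```java
public class ProxyFactorySketch {
    // Before: a singleton instance reached through a static field.
    // Every call site pays a getstatic plus an invokevirtual, and the
    // receiver reference occupies a slot on the operand stack.
    public static final ProxyFactorySketch PROXY_FACTORY = new ProxyFactorySketch();

    public String wrapInstance(String delegate) {
        return "proxy(" + delegate + ")";
    }

    // After: a plain static method. The call site compiles to a single
    // invokestatic with no receiver pushed onto the stack.
    public static String wrapStatic(String delegate) {
        return "proxy(" + delegate + ")";
    }

    public static void main(String[] args) {
        // Both styles produce the same result; only the bytecode differs.
        System.out.println(PROXY_FACTORY.wrapInstance("stmt")); // prints "proxy(stmt)"
        System.out.println(wrapStatic("stmt"));                 // prints "proxy(stmt)"
    }
}
```

Compiling this class and comparing the two call sites with javap -c shows the getstatic/invokevirtual pair on one side and a lone invokestatic on the other.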
The end.
The full original: Down the Rabbit Hole.
UPDATE:
In the comments, the "Slow singleton" part of the article caused a lot of discussion. apangin argues that all these micro-optimizations are meaningless and give no real gain. His comment includes a simple benchmark showing that invokevirtual and invokestatic perform identically, followed by a benchmark of Odnoklassniki's connection pool, which is supposedly 4 times faster than HikariCP. To this, the author of HikariCP gives the following answer:
First, I would like to comment on the @odnoklassniki claim that their pool is 4x faster. I added their pool to the benchmark. Here is the result vs. HikariCP:
./benchmark.sh clean quick -p pool=one,hikari ".*Connection.*"

Benchmark                       (pool)   Mode  Cnt      Score      Error  Units
ConnectionBench.cycleCnnection  one     thrpt   16   4991.293 ±   62.821 ops/ms
ConnectionBench.cycleCnnection  hikari  thrpt   16  39660.123 ± 1314.967 ops/ms
This is showing HikariCP at 8x faster than one-datasource.
Since then the JMH test harness itself has changed, so I could not simply check out the old code and re-run it against current JMH. Instead, I ran both versions using the benchmark harness available at that time:
Before static proxy factory methods:
Benchmark                            (pool)   Mode  Samples      Mean  Mean error  Units
ConnectionBench.testConnectionCycle  hikari  thrpt       16  9303.741      67.747 ops/ms
After static proxy factory methods:
Benchmark                            (pool)   Mode  Samples      Mean  Mean error  Units
ConnectionBench.testConnectionCycle  hikari  thrpt       16  9436.699      71.268 ops/ms
It shows a slight improvement.
Not a dramatic gain, but a measurable one.
EDIT: And wow has HikariCP performance improved since January 2014!