
HikariCP - the fastest connection pool in Java

Java recently turned 20 years old. It would seem that by now everything has already been written in Java. Any idea, any project, any tool? In Java it already exists. Especially when it comes to something as trivial as a database connection pool, which is used by millions of developers around the world. But it turns out that is not so! Meet the HikariCP project, the fastest Java connection pool to date.

HikariCP is another vivid example of why it is always worth questioning the efficiency of established solutions, even if they are used by millions of people and have been around for decades. Hikari is a great example of how micro-optimizations that individually can never give you more than a 0.00001% gain together make it possible to create a very fast and efficient tool.

This post is a loose, partial translation of the HikariCP article Down the Rabbit Hole, mixed with my own stream of thought.



Down the Rabbit Hole



This article is the recipe for our secret sauce. When you look at benchmarks of any kind, you should, like any normal person, have a healthy dose of skepticism toward them. When you think about performance and connection pools, it is hard to avoid the insidious idea that the pool is the most important part of the equation. In fact, this is not so. The number of getConnection() calls is quite small compared with the other JDBC operations in a typical application. A large share of the performance gains comes from optimizing the wrappers around Connection, Statement, and so on.

In order to make HikariCP fast (and it is), we had to dig down to the bytecode level and below. We used every trick we know to help the JIT. We studied the compiled bytecode of each method and even modified methods to fit under the inlining limit. We reduced the number of inheritance levels, restricted access to some variables to narrow their scope, and removed all type casts.
Sometimes, seeing that a method exceeded the inlining limit, we thought about how to change it so as to shave off a few bytecode instructions. For example:

    public SQLException checkException(SQLException sqle) {
        String sqlState = sqle.getSQLState();
        if (sqlState == null)
            return sqle;

        if (sqlState.startsWith("08"))
            _forceClose = true;
        else if (SQL_ERRORS.contains(sqlState))
            _forceClose = true;
        return sqle;
    }


A fairly simple method that checks whether the error indicates a lost connection. And now the bytecode:

     0: aload_1
     1: invokevirtual #148         // Method java/sql/SQLException.getSQLState:()Ljava/lang/String;
     4: astore_2
     5: aload_2
     6: ifnonnull 11
     9: aload_1
    10: areturn
    11: aload_2
    12: ldc #154                   // String 08
    14: invokevirtual #156         // Method java/lang/String.startsWith:(Ljava/lang/String;)Z
    17: ifeq 28
    20: aload_0
    21: iconst_1
    22: putfield #144              // Field _forceClose:Z
    25: goto 45
    28: getstatic #41              // Field SQL_ERRORS:Ljava/util/Set;
    31: aload_2
    32: invokeinterface #162, 2    // InterfaceMethod java/util/Set.contains:(Ljava/lang/Object;)Z
    37: ifeq 45
    40: aload_0
    41: iconst_1
    42: putfield #144              // Field _forceClose:Z
    45: aload_1
    46: areturn


It is probably no secret to anyone that the inlining limit in the HotSpot JVM is 35 bytecode instructions (strictly speaking, 35 bytes of bytecode, the default value of -XX:MaxInlineSize). So we spent some time on this method to shorten it, and changed it as follows:

    String sqlState = sqle.getSQLState();
    if (sqlState != null && (sqlState.startsWith("08") || SQL_ERRORS.contains(sqlState)))
        _forceClose = true;
    return sqle;


That got pretty close to the limit, but it was still 36 instructions. So we tried this:

    String sqlState = sqle.getSQLState();
    _forceClose |= (sqlState != null && (sqlState.startsWith("08") || SQL_ERRORS.contains(sqlState)));
    return sqle;


It looks simpler, doesn't it? In fact, this code is worse than the previous version: 45 instructions.
One more attempt:

    String sqlState = sqle.getSQLState();
    if (sqlState != null)
        _forceClose |= sqlState.startsWith("08") | SQL_ERRORS.contains(sqlState);
    return sqle;


Note the use of the unconditional (non-short-circuit) OR operator (|). This is a great example of sacrificing theoretical performance (in theory, || would be faster) for the sake of real performance (the method can now be inlined). The resulting bytecode:

     0: aload_1
     1: invokevirtual #153         // Method java/sql/SQLException.getSQLState:()Ljava/lang/String;
     4: astore_2
     5: aload_2
     6: ifnull 34
     9: aload_0
    10: dup
    11: getfield #149              // Field forceClose:Z
    14: aload_2
    15: ldc #157                   // String 08
    17: invokevirtual #159         // Method java/lang/String.startsWith:(Ljava/lang/String;)Z
    20: getstatic #37              // Field SQL_ERRORS:Ljava/util/Set;
    23: aload_2
    24: invokeinterface #165, 2    // InterfaceMethod java/util/Set.contains:(Ljava/lang/Object;)Z
    29: ior
    30: ior
    31: putfield #149              // Field forceClose:Z
    34: return


Just below the limit of 35 bytecode instructions. This is a small method, and not even a heavily used one, but you get the idea. Small methods not only let the JIT inline them; they also mean fewer actual machine instructions, which increases the amount of code that fits in the processor's L1 cache. Now multiply all this by the number of such changes in our library and you will understand why HikariCP is really fast.
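
To make the trade-off concrete, here is a minimal, self-contained sketch that puts the first and the last variant of the method side by side. The class name ForceCloseTracker, the helper flagAfter, and the contents of SQL_ERRORS are assumptions made purely for illustration; this is not HikariCP's actual code. The sketch only shows that both forms set the flag for the same inputs, while the | form always evaluates both operands.

```java
import java.sql.SQLException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical holder class; only the two method bodies mirror the article.
class ForceCloseTracker {
    // Assumed sample of SQLSTATE codes treated as fatal (illustrative only).
    static final Set<String> SQL_ERRORS = new HashSet<>(Arrays.asList("57P01", "01002"));

    boolean forceClose;

    // First variant: early return, branches, one assignment per branch.
    SQLException checkExceptionBranching(SQLException sqle) {
        String sqlState = sqle.getSQLState();
        if (sqlState == null)
            return sqle;
        if (sqlState.startsWith("08"))
            forceClose = true;
        else if (SQL_ERRORS.contains(sqlState))
            forceClose = true;
        return sqle;
    }

    // Last variant: one null check, then a non-short-circuit OR.
    // Both operands of | are always evaluated, which removes two branches.
    SQLException checkExceptionCompact(SQLException sqle) {
        String sqlState = sqle.getSQLState();
        if (sqlState != null)
            forceClose |= sqlState.startsWith("08") | SQL_ERRORS.contains(sqlState);
        return sqle;
    }

    // Demo helper: runs one variant on a fresh tracker and reports the flag.
    static boolean flagAfter(String sqlState, boolean compact) {
        ForceCloseTracker tracker = new ForceCloseTracker();
        SQLException e = new SQLException("demo", sqlState);
        if (compact)
            tracker.checkExceptionCompact(e);
        else
            tracker.checkExceptionBranching(e);
        return tracker.forceClose;
    }

    public static void main(String[] args) {
        for (String state : new String[] {"08006", "57P01", "42000", null}) {
            System.out.println(state + " -> " + flagAfter(state, false) + " / " + flagAfter(state, true));
        }
    }
}
```

To see what the compiler and the JVM actually do with such a class, javap -c prints the bytecode, and running with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining logs HotSpot's inlining decisions; the exact numbers and output depend on the JDK version.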

Micro optimizations



HikariCP has a lot of micro-optimizations. Individually, of course, they make no real difference. But together they noticeably increase overall performance. Some of these optimizations amount to fractions of a microsecond over millions of calls.

ArrayList



One of the less trivial optimizations was eliminating the ArrayList<Statement> collection in the ConnectionProxy class, which was used to track open Statement objects. When a Statement is closed, it must be removed from this collection. And when the connection is closed, the collection has to be iterated to close any Statement that is still open, and then cleared. As is well known, ArrayList performs a range check on every get(index) call. But since we can guarantee that the index is valid, this check is superfluous. In addition, the remove(Object) implementation scans the list from beginning to end. Yet the common pattern in JDBC is either to close a Statement immediately after use or to close statements in the reverse order of opening (FILO). For such cases, a scan that starts at the end of the list is faster. So we replaced ArrayList<Statement> with a FastStatementList, which has no range check and searches for the element to remove starting from the end.
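
The idea can be sketched roughly as follows. This is an illustrative reimplementation under my own assumptions (the class name FastListSketch, the initial capacity of 16, doubling growth, and identity comparison in remove), not the real FastStatementList. Note that the JVM still performs its built-in array bounds check; what disappears is the explicit comparison of the index against size.

```java
import java.util.Arrays;

// Illustrative sketch, not HikariCP's FastStatementList: a minimal
// array-backed list with no explicit range check and tail-first removal.
class FastListSketch<T> {
    private Object[] elements = new Object[16];
    private int size;

    // Appends at the end, doubling the backing array when it is full.
    void add(T element) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = element;
    }

    // No explicit range check: the caller guarantees 0 <= index < size.
    @SuppressWarnings("unchecked")
    T get(int index) {
        return (T) elements[index];
    }

    // Scans from the tail, so the most recently added element is found first.
    boolean remove(Object element) {
        for (int i = size - 1; i >= 0; i--) {
            if (elements[i] == element) {          // identity comparison (assumption)
                int moved = size - i - 1;
                if (moved > 0) {
                    System.arraycopy(elements, i + 1, elements, i, moved);
                }
                elements[--size] = null;           // drop the stale reference
                return true;
            }
        }
        return false;
    }

    int size() {
        return size;
    }

    // Nulls out the used slots so the elements can be garbage collected.
    void clear() {
        Arrays.fill(elements, 0, size, null);
        size = 0;
    }

    public static void main(String[] args) {
        FastListSketch<String> open = new FastListSketch<>();
        String first = "stmt-1";
        String second = "stmt-2";
        open.add(first);
        open.add(second);
        open.remove(second);   // found on the first comparison, at the tail
        System.out.println("still open: " + open.size());
    }
}
```

With the FILO closing pattern described above, remove usually hits on its first comparison, whereas a head-first scan would walk the whole list.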

Slow singleton



To generate proxies for Connection, Statement, and ResultSet objects, HikariCP initially used a singleton factory. In the case of ConnectionProxy, for example, this factory was held in the static field PROXY_FACTORY, and several dozen places in the code referred to this field.

    public final PreparedStatement prepareStatement(String sql, String[] columnNames) throws SQLException
    {
        return PROXY_FACTORY.getProxyPreparedStatement(this, delegate.prepareStatement(sql, columnNames));
    }


In bytecode, it looked like this:

    public final java.sql.PreparedStatement prepareStatement(java.lang.String, java.lang.String[]) throws java.sql.SQLException;
      flags: ACC_PUBLIC, ACC_FINAL
      Code:
        stack=5, locals=3, args_size=3
           0: getstatic #59            // Field PROXY_FACTORY:Lcom/zaxxer/hikari/proxy/ProxyFactory;
           3: aload_0
           4: aload_0
           5: getfield #3              // Field delegate:Ljava/sql/Connection;
           8: aload_1
           9: aload_2
          10: invokeinterface #74, 3   // InterfaceMethod java/sql/Connection.prepareStatement:(Ljava/lang/String;[Ljava/lang/String;)Ljava/sql/PreparedStatement;
          15: invokevirtual #69        // Method com/zaxxer/hikari/proxy/ProxyFactory.getProxyPreparedStatement:(Lcom/zaxxer/hikari/proxy/ConnectionProxy;Ljava/sql/PreparedStatement;)Ljava/sql/PreparedStatement;
          18: areturn


You can see that the getstatic instruction comes first, fetching the value of the static field PROXY_FACTORY. Also note the final invokevirtual call of the getProxyPreparedStatement() method on the ProxyFactory object.
The optimization was to remove the singleton factory and replace it with a class with static methods. The code now looks like this:

    public final PreparedStatement prepareStatement(String sql, String[] columnNames) throws SQLException
    {
        return ProxyFactory.getProxyPreparedStatement(this, delegate.prepareStatement(sql, columnNames));
    }


Here getProxyPreparedStatement() is a static method of the ProxyFactory class. And this is what the bytecode looks like:

    public final java.sql.PreparedStatement prepareStatement(java.lang.String, java.lang.String[]) throws java.sql.SQLException;
      flags: ACC_PUBLIC, ACC_FINAL
      Code:
        stack=4, locals=3, args_size=3
           0: aload_0
           1: aload_0
           2: getfield #3              // Field delegate:Ljava/sql/Connection;
           5: aload_1
           6: aload_2
           7: invokeinterface #72, 3   // InterfaceMethod java/sql/Connection.prepareStatement:(Ljava/lang/String;[Ljava/lang/String;)Ljava/sql/PreparedStatement;
          12: invokestatic #67         // Method com/zaxxer/hikari/proxy/ProxyFactory.getProxyPreparedStatement:(Lcom/zaxxer/hikari/proxy/ConnectionProxy;Ljava/sql/PreparedStatement;)Ljava/sql/PreparedStatement;
          15: areturn


Three things are worth noting here. The getstatic instruction is gone. invokevirtual has been replaced by invokestatic, which the virtual machine optimizes more readily. And the last point, which is easy to miss: the stack size has decreased from 5 elements to 4, because before the optimization, with invokevirtual, a reference to the ProxyFactory object itself also had to be pushed onto the stack. That also means an extra pop of this reference from the stack when getProxyPreparedStatement() is called. To sum up, we got rid of a static field access, removed an extra push and pop on the stack, and made the method call easier for the JIT to optimize.
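
For readers who want to reproduce the two call shapes, here is a small sketch. All names in it (Handle, ProxyHandle, SingletonFactory, StaticFactory, FactoryDemo) are hypothetical stand-ins, not HikariCP classes; the only point is that the first call site goes through a static field and an instance method, while the second is a plain static call.

```java
// Demo entry point; it comes first so the file can be run directly.
class FactoryDemo {
    // Call through the singleton: javac emits getstatic + invokevirtual here.
    static Handle viaSingleton(Handle raw) {
        return SingletonFactory.PROXY_FACTORY.getProxy(raw);
    }

    // Call through the static method: javac emits a single invokestatic here.
    static Handle viaStatic(Handle raw) {
        return StaticFactory.getProxy(raw);
    }

    public static void main(String[] args) {
        Handle raw = () -> "raw";
        System.out.println(viaSingleton(raw).describe());
        System.out.println(viaStatic(raw).describe());
    }
}

// Hypothetical stand-in for a JDBC object that gets wrapped in a proxy.
interface Handle {
    String describe();
}

// Hypothetical proxy that simply decorates its delegate.
final class ProxyHandle implements Handle {
    private final Handle delegate;

    ProxyHandle(Handle delegate) {
        this.delegate = delegate;
    }

    @Override
    public String describe() {
        return "proxy(" + delegate.describe() + ")";
    }
}

// Before: a singleton instance reached through a static field.
class SingletonFactory {
    static final SingletonFactory PROXY_FACTORY = new SingletonFactory();

    Handle getProxy(Handle raw) {
        return new ProxyHandle(raw);
    }
}

// After: a class that only has static methods and is never instantiated.
final class StaticFactory {
    private StaticFactory() {
    }

    static Handle getProxy(Handle raw) {
        return new ProxyHandle(raw);
    }
}
```

Both paths produce the same proxy. Running javap -c on the compiled FactoryDemo class should show the difference in the two call sites; whether that difference is measurable at runtime is exactly what the discussion in the update below is about.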

The end.

The full original: Down the Rabbit Hole.

UPDATE:
In the comments, the “Slow singleton” part of the article sparked a lot of discussion. apangin argues that all these micro-optimizations are pointless and give no gain. His comment includes a simple benchmark showing that invokevirtual and invokestatic cost the same, followed by a benchmark of the Odnoklassniki connection pool, which is supposedly 4 times faster than HikariCP. The author of HikariCP gave the following answer:

First I would like to comment on @odnoklassniki comment that their pool is 4x faster. I have added their pool. Here is the result vs. HikariCP:

    ./benchmark.sh clean quick -p pool=one,hikari ".*Connection.*"

    Benchmark                       (pool)   Mode  Cnt      Score      Error   Units
    ConnectionBench.cycleCnnection     one  thrpt   16   4991.293 ±   62.821  ops/ms
    ConnectionBench.cycleCnnection  hikari  thrpt   16  39660.123 ± 1314.967  ops/ms


This is showing HikariCP at 8x faster than one-datasource.

The JMH test harness itself has changed since then. I checked out the code from just before and just after the change and ran both using the benchmark harness available at that time:

Before static proxy factory methods:
    Benchmark                            (pool)   Mode  Samples      Mean  Mean error   Units
    ConnectionBench.testConnectionCycle  hikari  thrpt       16  9303.741      67.747  ops/ms


After static proxy factory methods:
    Benchmark                            (pool)   Mode  Samples      Mean  Mean error   Units
    ConnectionBench.testConnectionCycle  hikari  thrpt       16  9436.699      71.268  ops/ms


It shows a slight improvement.

If you’re not sure, you’re not sure what you’ve seen.

EDIT: And wow has HikariCP performance improved since January 2014!

Source: https://habr.com/ru/post/269023/

