
Database connection pools: why and what for

When your project becomes popular and every millisecond of request processing becomes critical, you have to look for bottlenecks in the system. Often, most of the time goes to executing SQL queries against the database. Let's try to figure out what can be optimized in how the application works with the database.

Theory


For a better understanding, consider what steps are performed when the application sends a query to the database, for example a SELECT query:
  1. The client opens a connection to the database and sends the query to the server.
  2. The server parses the SQL query.
  3. The server optimizes the query based on rules and on table statistics. As a result, a query execution plan is built.
  4. The server executes the query according to the previously constructed plan and sends the results to the client.

What can we save time on?
The first step, opening a connection to the server, is quite slow. We can eliminate it by preparing a pool of already open connections in advance and handing connections from it to the application as needed.
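The idea can be sketched without a database at all: a pool is essentially a bounded blocking queue of pre-opened connections. In this illustration (FakeConnection and SimplePool are made-up names standing in for a real driver connection and pool), a pool of 5 serves 100 requests while paying the expensive open step only 5 times:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolSketch {
    // Counts how many times we "open" a connection.
    static final AtomicInteger opened = new AtomicInteger();

    // Stand-in for a real driver connection: constructing one is the expensive step.
    static class FakeConnection {
        FakeConnection() { opened.incrementAndGet(); }
    }

    // A pool is a bounded queue of pre-opened connections.
    static class SimplePool {
        private final BlockingQueue<FakeConnection> free;

        SimplePool(int size) {
            free = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) free.add(new FakeConnection());
        }

        FakeConnection get() throws InterruptedException { return free.take(); }
        void put(FakeConnection c) { free.add(c); }
    }

    public static void main(String[] args) throws InterruptedException {
        SimplePool pool = new SimplePool(5);
        for (int i = 0; i < 100; i++) {  // 100 "requests"
            FakeConnection c = pool.get();
            pool.put(c);                 // return to the pool instead of closing
        }
        System.out.println("connections opened: " + opened.get()); // 5, not 100
    }
}
```

A real pool also has to validate connections and handle waiting and timeouts, but the queue of reusable connections is the core of it.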

It is also possible to avoid repeating steps two and three if we use bound variables in our queries and cache the result of step three, which we receive from the server.
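On the client side, reusing the cached plan amounts to handing back the same prepared statement for the same SQL text. A minimal sketch of such a statement cache, with a made-up FakeStatement standing in for a real PreparedStatement (in real code its constructor would call connection.prepareStatement(sql)):

```java
import java.util.concurrent.ConcurrentHashMap;

public class StatementCacheSketch {
    static class FakeStatement {
        final String sql;
        FakeStatement(String sql) { this.sql = sql; } // stand-in for prepareStatement(sql)
    }

    // One statement per SQL text: a repeated lookup skips parsing and planning.
    static final ConcurrentHashMap<String, FakeStatement> cache = new ConcurrentHashMap<>();

    static FakeStatement prepare(String sql) {
        return cache.computeIfAbsent(sql, FakeStatement::new);
    }

    public static void main(String[] args) {
        FakeStatement a = prepare("SELECT a.* FROM test.test_table a WHERE id = ?");
        FakeStatement b = prepare("SELECT a.* FROM test.test_table a WHERE id = ?");
        System.out.println(a == b); // true: the same statement object is reused
    }
}
```

Using computeIfAbsent also avoids the check-then-put race that a plain get/put pair would have under concurrent use.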

Currently, most database drivers support connection pooling. However, there is always the temptation to write your own implementation in the hope that it will be faster. Let's check how much we gain from connection pooling and statement caching, both with the off-the-shelf solution and with a self-written one.

Measurement method


For the tests we use the freely available PostgreSQL DBMS, and the client will be written in Java. In the database, create a small table test.test_table (about 10 rows) consisting of a primary key id and a string value value. Let our clients query the database in parallel: we create threads that run simple lookups by primary key in this table. When creating the threads we pass in different connection pool implementations, which lets us compare performance: each thread measures the total time it spends executing 100 queries.

class TestThread extends Thread {
    private DBPool pool;
    private long workTime = 0;
    private long foundStr = 0;

    @Override
    public void run() {
        workTime = System.currentTimeMillis(); // start timing
        Connection con = null;
        PreparedStatement st = null;
        ResultSet rs = null;
        Random rnd = new Random(); // random ids to look up
        for (int i = 0; i < 100; i++) {
            try {
                con = pool.getConnection(); // take a connection from the pool
                // a lookup by primary key with a bound variable
                st = con.prepareStatement("SELECT a.* FROM test.test_table a WHERE id = ?");
                st.setObject(1, rnd.nextInt(10));
                rs = st.executeQuery(); // run the query
                if (rs.next()) {
                    String tmp = rs.getString(2); // read the value column
                    if (tmp != null) {
                        foundStr++;
                    }
                }
            } catch (SQLException ex) {
                // report the error and keep going
                System.out.println("Pool " + pool + " exception " + ex);
            } finally {
                // release resources and return the connection to the pool
                try { if (rs != null) rs.close(); } catch (SQLException e) { /* ignore */ }
                try { if (st != null) st.close(); } catch (SQLException e) { /* ignore */ }
                try { if (con != null) pool.putConnection(con); } catch (SQLException e) { /* ignore */ }
            }
        }
        workTime = System.currentTimeMillis() - workTime; // total time for 100 queries
    }
}

Now let's implement several pools and compare their performance.
The first is the classic approach: open a connection to the server for each request and close it once the request completes.
class DBPool {
    private String url, user, password;

    protected DBPool() { } // for subclasses that manage connections themselves

    DBPool(String url, String user, String password) throws ClassNotFoundException {
        this.url = url;
        this.user = user;
        this.password = password;
        Class.forName("org.postgresql.Driver");
    }

    public Connection getConnection() throws SQLException {
        // a new physical connection for every request
        return DriverManager.getConnection(url, user, password);
    }

    public void putConnection(Connection connection) throws SQLException {
        connection.close(); // close it as soon as the request is done
    }
}

The second will use the caching data source class from the PostgreSQL JDBC driver, PGPoolingDataSource, which lets you set the pool size as well as the initial number of connections. In addition, the driver's statement interface has a setting, setPrepareThreshold, that controls the number of executions after which the query is cached server-side and no longer requires parsing and building an execution plan.
class DBPoolCache extends DBPool {
    private PGPoolingDataSource source;

    DBPoolCache(String host, String database, String user, String password) {
        source = new PGPoolingDataSource();
        source.setDataSourceName("A Data Source");
        source.setServerName(host);
        source.setDatabaseName(database);
        source.setUser(user);
        source.setPassword(password);
        source.setMaxConnections(20);     // pool size limit
        source.setInitialConnections(20); // connections opened up front
    }

    public Connection getConnection() throws SQLException {
        return source.getConnection();
    }

    public void putConnection(Connection connection) throws SQLException {
        connection.close(); // returns the connection to the pool rather than really closing it
    }
}

Finally, our own pool implementation, in which we ourselves cache both the database connections and the results of SQL query parsing (PreparedStatement).
class DBPoolCacheMy extends DBPool {
    private PGSimpleDataSource source;
    private BlockingQueue<Connection> connections = new ArrayBlockingQueue<Connection>(20);

    DBPoolCacheMy(String host, String database, String user, String password) throws SQLException {
        source = new PGSimpleDataSource();
        source.setServerName(host);
        source.setDatabaseName(database);
        source.setUser(user);
        source.setPassword(password);
        for (int i = 0; i < 20; i++) { // fill the pool up front
            connections.add(new MyConnection(source.getConnection()));
        }
    }

    public Connection getConnection() throws SQLException {
        try {
            // wait up to two seconds for a free connection
            return connections.poll(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            return null;
        }
    }

    public void putConnection(Connection connection) throws SQLException {
        connections.add(connection); // return the connection to the queue
    }
}

We also have to implement our own database connection class, which caches PreparedStatement objects.
class MyConnection implements Connection {
    private Connection connection;
    private ConcurrentHashMap<String, PreparedStatement> statements =
            new ConcurrentHashMap<String, PreparedStatement>();

    protected MyConnection(Connection connection) {
        this.connection = connection;
    }

    public PreparedStatement prepareStatement(String sql) throws SQLException {
        // reuse the cached statement if this SQL text was prepared before
        PreparedStatement statement = statements.get(sql);
        if (statement == null) {
            statement = new MyStatement(connection.prepareStatement(sql));
            statements.put(sql, statement);
        }
        return statement;
    }

    // ... the remaining Connection methods delegate to the wrapped connection
}

Plus a class that implements the PreparedStatement interface and ignores close() calls.
class MyStatement implements PreparedStatement {
    private PreparedStatement statement;

    MyStatement(PreparedStatement statement) throws SQLException {
        this.statement = statement;
        // use a server-side prepared statement starting from the first execution
        ((PGStatement) statement).setPrepareThreshold(1);
    }

    public void close() throws SQLException {
        // ignore: the statement stays cached and reusable
    }

    // ... the remaining PreparedStatement methods delegate to the wrapped statement
}


Conclusion



Finally, let's compare the performance of the three connection pool implementations by running the tests with 1 to 10 parallel threads. The result is the following dependence of total task execution time on the number of threads.



The graph makes it clear that caching database connections is clearly worthwhile: it gives a significant increase in system performance. Writing your own implementation of connection and PreparedStatement caching, however, does not bring tangible benefits over the driver's.

Source: https://habr.com/ru/post/194142/

