H2 Evolution - window functions, CTE, JSON / XML in embedded database

There are open source projects that have become commercially successful mainstream, such as PostgreSQL / Elasticsearch. Others, for example RethinkDB, lost in the market and stopped development. And the H2 database project of an embedded database written in java language develops and flourishes in its niche.

To demonstrate the functionality of SonarQube, Jira, Confluence when you first start using the H2 database. H2 is the basis for running SQL tests in memory, in almost any JVM project. There is an example of the application less well known to users - this is using H2 in a distributed ignite-sql and this is already a production ready script for using an embedded database as part of another solution. Less than a month ago, version 1.4.199 was released in which you can now write quite complex SQL queries.

In projects, I never relied on H2 as a full-fledged database with data saved to disk. Rather, as a data transformation module in JVM memory, with good SQL support. But for this application it was severely limited by the lack of window functions . And now, after more than half a year from the beginning of the development of the functional , now H2database caught up with SQLite . And this is a great merit of Yevgeny Ryazanov from Irkutsk - he has never seen such a pace of development as in non-commercial open source projects. Also in the project repository regularly appear komits of other Russian-speaking contributors. And in the moments of release - the founder of the project Thomas Mueller .

H2 has support for recursive queries (CTE) . This is a standard SQL method for working with hierarchical data in tables and query decomposition (here you can step on the scheduler's rake). About recursive queries available in the publication with examples.
')
To work with poorly structured data in a future version, an implementation of the SQL / JSON standard will appear. In the meantime, I have expanded H2 for my needs using the XQuery 3.1 user-defined processor function, based on BaseX . The code is available in the github project H2XQueryAdapter . This is a table function that, using XML or JSON format using XQuery, can extract data and check for the values returned by the function, the type and constraint not null. Next, all the power of SQL expressions is available to transform the result of an XQuery conversion in the JVM process memory.

The table function xquery () is overloaded and has two options - with one parameter xQuery query and a variant with xQuery query and the second - sql query string for generating parameters for xQuery itself.

The approach with such transformations proved to be excellent in a project for processing a petabyte volume of raw data in a project of a biomedical data warehouse .

create table xresult (GR VARCHAR(500) not null,AR varchar, VER VARCHAR(50)) as select * from xquery('declare variable $getHeader as xs:boolean external := false(); declare variable $getData as xs:boolean external := true(); <csv> { if($getHeader) then( <record><mavengr>VARCHAR(500) not null</mavengr><artifactname>varchar</artifactname><versionname>VARCHAR(50)</versionname></record> ), if($getData) then(( for $row in doc("http://central.maven.org/maven2/org/springframework/spring-context/5.1.4.RELEASE/spring-context-5.1.4.RELEASE.pom")//*:dependency return <record><mavengr>{$row/*:groupId/text()}</mavengr><artifactname>{$row/*:artifactId/text()}</artifactname><versionname>{$row/*:version/text()}</versionname></record> )) } </csv>')

This implementation has restrictions on the xquery request format.

First, you need to declare two external variables getHeader and getData - it is necessary that the table function does not cause data transformation several times, discarding unnecessary results, in order to get only the names and type of columns. Multiple function calls are features of the H2 database with java functions that return a list of values.

Secondly, the format of the result should be as for csv serialization

 <csv> <record><1> </1>...<N> </N></record> <record><1></1>...<N></N></record> </csv>

Thirdly, it is necessary to declare the data type, the dimension and the mandatory field, for example: decimal (20,4) not null

I tried to reuse the SQL parser from H2 with the column data type parsing, but it turned out to be so strongly connected with other database objects that I refused to work without creating a database and a session. I console myself with the thought that the developers have done so to simplify the design of the application and not to make a parser for all occasions for any BNF grammars.

You can run this example in debug java mode. New transformation rules can be developed in your familiar xquery editor or in the open source BaseX GUI .

It is possible to convert almost any java collection or POJO into a virtual table H2. The code of the project is H2POJOTable on github. In the example by reference, the platform MemoryManagerMXBeans is turned into a table function H2. Perhaps the approach will be a small consolation for those who are sad about the lack of LINQ and support for set operations in Java.

 try (Statement statement = connection.createStatement()) { String pojoTableAlias = "create alias MemoryManagerMXBeans as $$ \n" + "import java.lang.management.ManagementFactory;\n" + "import java.lang.management.MemoryManagerMXBean;\n" + "import org.h2.expression.function.pojo.*;\n" + "import java.sql.*;\n" + "import java.util.Collections;\n" + "@CODE\n" + " ResultSet getRuntimeStat(Connection connection) throws Exception{\n" + " return H2PojoAdapter.toTable(connection, new CollectionWraper<>(MemoryManagerMXBean.class," + " ManagementFactory::getMemoryManagerMXBeans, Collections.emptyMap()));\n" + " }\n" + "\n$$"; statement.executeUpdate(pojoTableAlias); } try (Statement statement = connection.createStatement()) { try (ResultSet resultSet = statement.executeQuery("select * from MemoryManagerMXBeans()")) { int columnCount = assertResultSet(resultSet, new String[]{"memoryPoolNames", "name", "valid"}); assertThat(columnCount).isGreaterThan(1); } }

Sometimes the old functionality in H2 breaks down and is not so much in demand by users. As an example, in my work projects, programs read data from an AWS S3 URL. Therefore, I hope that a known error will be repaired by accepting my pull request . On the way to fix this error are unstable tests for TLS, which also do not work under Java 11.

H2 allows you to use the PostgreSQL ODBC driver by emulating a subset of its network protocol. Which also theoretically allows you to bind it through FDW in PostgreSQL.

In addition to the database itself, H2 also includes a minimalist web console with support for auto-completion when editing, with a servlet or standalone launch option. H2 looks like a “Swiss knife” for developers - a compact and versatile tool if your project already uses a JVM. When I tried to use this console with “crooked” jdbc, the Redshift DBMS driver made its first red-eyed pull request into the project. Noel Grandin , one of the project participants, helped me with code review and accepted corrections.

If you need an analogue of Berkeley DB Java Edition - there is a MVStore in the project - a persistent storage for key-value data and the default MVCC “engine” in the latest versions of the database. It is amazing that the database even has basic support for geofunctions and full-text search.

Thanks to the H2database contributor, everyone uses this database and reports errors! H2 database has been developing since 2005, and now supports window functions, recursive queries, is one of the most powerful SQL “engines” for processing data in JVM memory and is expanding with tabular functions for working with weakly structured data.

Source: https://habr.com/ru/post/446380/

All Articles

H2 Evolution - window functions, CTE, JSON / XML in embedded database

More articles: