
Comparative analysis of frameworks for working with ontologies for .NET and Java

It's no secret that the lion's share of Semantic Web projects is developed in Java. Frameworks for working with semantic ontologies are no exception: all the major projects (Jena, OWL API, Sesame, etc.) are written in Java. The only serious representative on the .NET side is Intellidimension, with its RDF Gateway and Semantics.SDK products.



In this article I will describe my experience with the above frameworks and share the results of testing.



Introduction



This article is not a comprehensive review of the above frameworks. It focuses on the performance of their basic capabilities: ontology loading, inference, and SPARQL query execution.



Before diving into technical details, I'll say a few words about the framework from Intellidimension (as the product least known to the Java-centric community). Unlike the other frameworks considered in this article, which are open-source projects, RDF Gateway and Semantics.SDK are closed-source and cost a fair amount of money. For example, RDF Gateway 3.0 Enterprise alone costs $10,000 (although version 2.0 cost only $2,000). Incidentally, the reasoners used during the tests, Pellet and Owlim, are not entirely free either: Pellet is distributed under a dual license, and Owlim offers only its in-memory version for free; the version that works with persistent storage costs 700 euros per processor core used.


Testing



I was faced with the task of choosing a framework for a .NET project, so Java projects in their pure form did not interest me (initially I didn't even plan to test them). I needed an interop tool between Java and .NET. My choice fell on ikvm.net, which lets you convert JAR files into .NET DLLs. Having obtained .NET versions of Jena, the OWL API and Sesame, I set about testing them. However, the testing would be incomplete without results for the Java frameworks in their native environment. Thus, the following took part in the testing: Intellidimension Semantics.SDK 1.0, OWL API 2.2.0 + Pellet 2.0rc5 (under both Java and .NET), Jena 2.5.7 + Pellet 2.0rc5 (under both Java and .NET) and, partly, Sesame 2.24 + SwiftOwlim 3.0b9.



Sesame had to be excluded from the initial testing because the inference logic in Owlim (the main reasoner used with Sesame) differs from that of Semantics.SDK and Pellet. Pellet and Semantics.SDK are geared towards inference at query time (query-time reasoning), although they also include means of inferring in advance; Owlim, on the other hand, aims at computing the full set of inferences up front (full materialization). We will talk more about this in the section General Information about Reasoners.



The NCI Thesaurus ontology, version 09.02d, was chosen as the test data. In its original form, though, it contains a number of inconsistencies, which were identified after corresponding with the support team. I used a modified version of 09.02d (which you can download from my dropbox), although version 09.04d, which has no inconsistencies, is already available.



The following scenario was modeled for testing:

1. First, the ontology was loaded from a file into the model;

2. Then, three SPARQL queries were executed against this model in succession (query texts can be downloaded here).
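
The two-stage scenario above can be sketched as a small timing harness. This is only an illustration of how per-stage times might be collected, not the code used for the tests: the `StageTimer` class and the placeholder stages are hypothetical, and a real run would call the framework's loading and query methods inside the lambdas.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class StageTimer {
    // Wall-clock time per stage in milliseconds, kept in execution order.
    private final Map<String, Long> timings = new LinkedHashMap<>();

    // Runs one stage, records how long it took, and returns its result.
    public <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        T result = work.get();
        timings.put(stage, (System.nanoTime() - start) / 1_000_000);
        return result;
    }

    public Map<String, Long> timings() {
        return timings;
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        // Placeholder stages (assumption): a real test would load the
        // ontology and execute the three SPARQL queries here.
        String model = timer.time("load", () -> "model");
        for (int i = 1; i <= 3; i++) {
            final int n = i;
            timer.time("query " + n, () -> model + " result " + n);
        }
        timer.timings().forEach(
            (stage, ms) -> System.out.println(stage + ": " + ms + " ms"));
    }
}
```

Keeping the loading stage and each query as separate entries makes it possible to report them separately, as the two result tables in this article do.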



First, consider the test results of the first stage:





Although the .NET / ikvm frameworks are 2-3 times slower than their Java counterparts, they still turned out to be faster than Intellidimension.



In terms of memory management, .NET left a better impression. The JVM requires you to specify the maximum amount of RAM that can be allocated for the heap (the Xmx parameter); .NET uses what is, in my opinion, a more logical policy: it consumes as much memory as it needs (unless a limit is set). The Xmx restriction is a very old "bug" that, unfortunately, is still not fixed. The suggested solution is simply to set Xmx with a margin (if the amount of RAM allows, of course); however, in that case (testing was done with Xmx set to 12g) the garbage collector does not exert itself, as the test results show. The other option is to find the minimum Xmx value for the specific input data, at the risk of running into an OutOfMemoryError. That way you can bring the JVM's memory consumption close to the CLR's figures, albeit with some additional performance loss from more frequent garbage collection.
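
To see what heap ceiling a given JVM run is actually working with, the standard `Runtime` API can be queried. The snippet below is a minimal sketch; the class name `HeapLimit` is made up for illustration, and the values it prints depend on the Xmx setting and the JVM's defaults.

```java
public class HeapLimit {
    private static final long MIB = 1024L * 1024L;

    // Maximum heap the JVM will attempt to use, in MiB
    // (reflects the Xmx setting, or the JVM default if none is given).
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    // Heap currently occupied by objects, in MiB.
    public static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap:  " + maxHeapMiB() + " MiB");
        System.out.println("used heap: " + usedHeapMiB() + " MiB");
    }
}
```

Running it as `java -Xmx512m HeapLimit` and then with a larger value shows the ceiling changing, which is one way to check a candidate Xmx value before loading a large ontology.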



Regarding the limit on the maximum amount of memory, a rather curious case occurred. After successfully loading the Thesaurus ontology with the OWL API converted to .NET, I decided to open the same ontology in Protégé (which is based on the same version of the OWL API) to familiarize myself with its structure. However, instead of a tree of concepts and instances, I got an OutOfMemoryError (even though there was plenty of free memory). And although increasing the Xmx value resolved the problem, such inflexibility in Java's memory management is not encouraging. The curious part is that a Java application fails to run in its native environment (without tinkering with JVM settings), yet works after conversion to .NET with ikvm.



Now let's move on to the second stage of testing: execution of SPARQL queries. Unfortunately, at this stage we have to leave behind the leader of the first test, the OWL API. The OWL API interface that reasoners implement contains no methods for executing SPARQL queries. This is because SPARQL was created as a query language for RDF graphs (and it is not very friendly with OWL), while the OWL API, as you might guess from the name, is focused on OWL. Work on the SPARQL-DL standard is under way, and support for it may be implemented in one of the next versions of the OWL API. For now, the only option is to use class expressions, which let you write queries in the Manchester syntax. Class expressions are, of course, not SPARQL, but they are sufficient for most tasks.



So, the test results:





First, I will comment on the dashes in the Intellidimension column. Six hours after the test application was launched, the process had grown to about 6 GB and still had produced no result. I did not have the patience to wait longer, so I had to score it as a technical defeat for Semantics.SDK. To be fair, Semantics.SDK copes with smaller ontologies: it performs inference and processes the queries. However, comparing its results with those of Jena + Pellet, I can confidently say that Semantics.SDK does not always produce the complete result.



Pellet completed the inference in a very reasonable time (the queries were not trivial) and, as in the first stage of testing, the .NET / ikvm frameworks look preferable to Intellidimension.



This concludes the testing. To summarize, the winner is the Jena + Pellet combination, and the author's choice award goes to OWL API + Pellet.



General information about reasoners



In general, there are two approaches to implementing logical inference: rule-based (using forward-chaining and/or backward-chaining algorithms) and tableau-based (semantic tableaux). Semantics.SDK and Owlim are rule-based, while Pellet is based on semantic tableaux.
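
The difference between forward and backward chaining can be illustrated with a toy example that applies a single rule, the transitivity of the subclass relation. This is a sketch under strong simplifying assumptions: the `SubclassReasoner` class is hypothetical and has nothing to do with how Owlim, Semantics.SDK or Pellet are actually implemented. Forward chaining derives all facts up front, which corresponds to full materialization, while backward chaining answers a single question on demand, which corresponds to query-time reasoning.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SubclassReasoner {
    // Asserted facts: class -> its direct superclasses.
    private final Map<String, Set<String>> direct = new HashMap<>();

    public void addSubClassOf(String child, String parent) {
        direct.computeIfAbsent(child, k -> new HashSet<>()).add(parent);
    }

    // Forward chaining: apply the transitivity rule until no new facts
    // appear, producing every class -> all superclasses entry in advance.
    public Map<String, Set<String>> materialize() {
        Map<String, Set<String>> closure = new HashMap<>();
        direct.forEach((c, ps) -> closure.put(c, new HashSet<>(ps)));
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Set<String> ancestors : closure.values()) {
                for (String a : new ArrayList<>(ancestors)) {
                    Set<String> further = closure.get(a);
                    if (further != null && ancestors.addAll(further)) {
                        changed = true;
                    }
                }
            }
        }
        return closure;
    }

    // Backward chaining: answer one query by walking up from the child,
    // touching only the facts that are relevant to this question.
    public boolean isSubClassOf(String child, String ancestor) {
        Deque<String> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        stack.push(child);
        while (!stack.isEmpty()) {
            String current = stack.pop();
            if (!seen.add(current)) {
                continue; // already visited, avoids looping on cycles
            }
            Set<String> parents =
                direct.getOrDefault(current, Collections.<String>emptySet());
            for (String parent : parents) {
                if (parent.equals(ancestor)) {
                    return true;
                }
                stack.push(parent);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SubclassReasoner r = new SubclassReasoner();
        r.addSubClassOf("Dog", "Mammal");
        r.addSubClassOf("Mammal", "Animal");
        System.out.println(r.materialize().get("Dog"));
        System.out.println(r.isSubClassOf("Dog", "Animal"));
    }
}
```

Materialization pays its cost once at load time and makes later lookups cheap; the on-demand variant loads quickly but repeats work on each query. The same trade-off is why the two kinds of reasoner are hard to compare in a single test.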



As far as I can tell, rule-based reasoners are better suited to languages with low expressiveness, and tableau-based reasoners to languages with high expressiveness. Pellet (OWL-DL) and Owlim (OWL-Tiny) confirm this observation, being on opposite sides of the barricade, whereas Semantics.SDK (OWL-FULL) is an exception (and, judging by the performance tests, not a good one).



Consider the diagram from the official website of Owlim:



As you can see, Owlim supports only OWL Tiny. This restriction is what allows it to achieve very high performance.



Owlim is the only reasoner (of those reviewed) that supports multi-threaded inference. Thus, Owlim exploits an advantage of rule-based systems over semantic tableaux: the possibility of parallelization (there are no algorithms yet for parallelizing the construction of a semantic tableau). The developers of Semantics.SDK claim that their inference is also parallelized, but this is not implemented in the trial version (in my opinion, not the best restriction for a trial version). My request for a short-term license for testing went unanswered, so we have to take the developers at their word.



In an ideal world, there would be a reasoner that analyzes an ontology, determines its level of expressiveness and applies the appropriate inference algorithms. At the moment no such reasoner exists, and it is unlikely that one ever will.



Conclusion



This article emphasizes performance, but when choosing a semantic framework for a particular project, the emphasis should be primarily on functionality (support for (non-relational) storage, support for OWL2, etc.). Discussion of these issues is beyond the scope of this article (and I am not familiar enough with each of the frameworks to do such an analysis). In terms of functionality, I wanted to praise the products from Intellidimension: the combination of RDF Gateway and Semantics.SDK is a very powerful framework that has no analogues in the Java world. However, the quirks of their reasoner have cooled my feelings for it.



P.S. Many thanks to Klin Pavel for the crash course he gave me while I was writing this article.

P.P.S. The original article is on my blog.

Source: https://habr.com/ru/post/60334/


