What is happening with RDF repositories?

The Semantic Web and Linked Data are like outer space: there is no life there. As for going there for any length of time... I don't know what you were told as a child in response to "I want to become an astronaut." But you can watch what is happening while staying on Earth; becoming an amateur astronomer, or even a professional one, is much easier.

This article discusses recent trends, none more than a few months old, from the world of RDF stores. The metaphor in the first paragraph was inspired by the epic size of the advertising picture below.

Epic picture

I. GraphQL to access RDF

GraphQL is said to aspire to become the universal language for database access. What about using GraphQL to access RDF?

The following offer this capability out of the box:

Stardog (blog, documentation);
TopQuadrant products (webinar, documentation).

If the store does not offer this capability, you can implement it yourself by writing the appropriate "resolver". This was done, for example, in the French project DataTourisme. Or you no longer need to write anything at all and can simply take HyperGraphQL.

From the point of view of an orthodox adherent of the Semantic Web and Linked Data, all this is of course sad, because it appears to be meant for integrations built around yet another data silo rather than around the platforms suited to the job (RDF stores, of course).

Comparing GraphQL with SPARQL leaves a twofold impression.

On the one hand, GraphQL looks like a distant relative of SPARQL: it solves the over-fetching and multiple-request problems typical of REST, without which it could hardly be considered a query language, at least for the web, or carry the "-QL" suffix;
On the other hand, GraphQL's rigid schema is disappointing. Accordingly, its introspection seems very limited compared with the full reflexivity of RDF. And it has no analogue of property paths, so it is not even clear why it is called "Graph-".
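To make the property path remark concrete, here is a minimal Python sketch of what a SPARQL path such as `:knows+` asks for: everything reachable over one or more edges of a given predicate. The triples and the `:knows` predicate are invented for illustration, and the traversal is a plain breadth-first search, not how any particular store evaluates paths.

```python
from collections import deque

# Hypothetical data: (subject, predicate, object) triples.
TRIPLES = {
    (":alice", ":knows", ":bob"),
    (":bob", ":knows", ":carol"),
    (":carol", ":knows", ":dave"),
    (":alice", ":worksWith", ":erin"),
}


def one_or_more(triples, start, predicate):
    """Return every node reachable from `start` via one or more `predicate` edges."""
    reached = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p == predicate and o not in reached:
                reached.add(o)
                queue.append(o)
    return reached


# Roughly what `SELECT ?x { :alice :knows+ ?x }` asks for.
reachable = one_or_more(TRIPLES, ":alice", ":knows")
print(sorted(reachable))
```

In GraphQL the nesting depth of a query is fixed by the query text itself, so a traversal of unknown length like this one has no direct counterpart.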

II. Adapters to MongoDB

This trend complements the previous one.

In Stardog, it is now possible to configure the mapping of MongoDB data to virtual RDF graphs, in particular using that same GraphQL;
Ontotext GraphDB has recently made it possible to embed MongoDB query fragments in SPARQL.

More broadly, when it comes to adapters to JSON sources that represent JSON as RDF more or less on the fly, it is worth recalling the long-established SPARQL-Generate, which can be plugged, for example, into Apache Jena.
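As a rough illustration of what "representing JSON as RDF on the fly" involves, here is a naive Python sketch that flattens a JSON document into triples. It is not the mapping used by SPARQL-Generate, Stardog or GraphDB; the `http://example.org/` namespace, the sample document and the scheme for naming nested nodes are all assumptions made for the example.

```python
import json

BASE = "http://example.org/"  # hypothetical namespace


def json_to_triples(doc, subject):
    """Flatten a JSON object into (subject, predicate, object) triples."""
    triples = []
    for key, value in doc.items():
        predicate = BASE + key
        values = value if isinstance(value, list) else [value]
        for i, item in enumerate(values):
            if isinstance(item, dict):
                # Nested objects become new nodes linked from the parent.
                child = f"{subject}/{key}/{i}"
                triples.append((subject, predicate, child))
                triples.extend(json_to_triples(item, child))
            else:
                # Scalars are kept as literal values.
                triples.append((subject, predicate, item))
    return triples


raw = '{"name": "Alice", "tags": ["a", "b"], "address": {"city": "Paris"}}'
triples = json_to_triples(json.loads(raw), BASE + "person/1")
for t in triples:
    print(t)
```

A real adapter lets you declare which keys become which predicates and how node identifiers are minted; here both are derived mechanically from the JSON structure.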

Summing up the first two trends, RDF stores show full readiness for integration and for operating under polyglot persistence. It is known, however, that the latter is no longer in vogue and that multi-model is coming to replace it. So what about multi-model in the world of RDF stores?

In short, there is none. I would like to devote a separate article to multi-model DBMSs. For now, note that there are currently no multi-model DBMSs whose primary model is a graph model (of which RDF can be considered a variety). A modest form of multi-model, namely RDF stores supporting the alternative LPG (labeled property graph) model, is discussed in section V.

III. OLTP vs. OLAP

However, Gartner also writes that multi-model is a sine qua non primarily for operational DBMSs. This is understandable: under polyglot persistence, the main problems arise with transactions.

But where do RDF stores sit on the OLTP vs. OLAP scale? I would say: neither here nor there. A third abbreviation is needed to describe what they are meant for. As an option, I would suggest OLIP, Online Intellectual Processing.

Nevertheless:

the MongoDB integration mechanisms implemented in GraphDB are intended not least to work around write performance problems;
Stardog goes even further and is completely rewriting its engine, again with the goal of improving write performance.

Now let me introduce a new player on the market: AnzoGraph, from the creators of IBM Netezza and Amazon Redshift. The advertising picture for a product based on it appears at the beginning of this article. AnzoGraph positions itself as a GOLAP solution. How do you like SPARQL with window functions?

SELECT ?month (COUNT(?event) OVER (PARTITION BY ?month) AS ?events) WHERE { … }
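For readers who have not met window functions, the following Python sketch shows what `COUNT(...) OVER (PARTITION BY ...)` computes, as opposed to `GROUP BY`: every input row survives and is annotated with the size of its partition. The rows stand in for whatever the elided `WHERE` clause would bind, so they are purely hypothetical, as is the helper name `count_over_partition`.

```python
from collections import Counter

# Hypothetical bindings that the elided WHERE clause might produce.
rows = [
    {"event": ":e1", "month": "2019-01"},
    {"event": ":e2", "month": "2019-01"},
    {"event": ":e3", "month": "2019-02"},
]


def count_over_partition(rows, key, counted, alias):
    """Annotate every row with the size of its partition; no rows are collapsed."""
    sizes = Counter(r[key] for r in rows if r.get(counted) is not None)
    return [{key: r[key], alias: sizes[r[key]]} for r in rows]


result = count_over_partition(rows, "month", "event", "events")
for r in result:
    print(r)
```

With `GROUP BY ?month` the same data would yield two rows; here it yields three, one per event, each carrying its month's total.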

IV. RocksDB

The announcement of Stardog 7 Beta, already linked above, said that Stardog is going to use RocksDB, a key-value store and Facebook's fork of Google's LevelDB, as its underlying storage system. Why does this amount to a trend?

First, judging by the Wikipedia article, RDF stores are not the only ones moving to RocksDB. There are projects to use RocksDB as a storage engine in ArangoDB, MongoDB, MySQL and MariaDB, and Cassandra.

Second, projects (as opposed to products) in the relevant subject area are being built on RocksDB.

For example, eBay uses RocksDB in the platform for its "knowledge graph". By the way, it makes for amusing reading: SPARQL. As in the joke: however many knowledge graphs we build, it still comes out as RDF.

Another example is the Wikidata History Query Service, which appeared a few months ago. Before it appeared, historical information had to be retrieved from the standard MediaWiki API via MWAPI. Now much can be done in pure SPARQL. Under the hood there is RocksDB as well. By the way, WDHQS was made, it seems, by the person who imported Freebase into the Google Knowledge Graph.

V. LPG support

Let me recall the main difference between LPG graphs and RDF graphs.

In LPG, scalar properties can be attached to edge instances, whereas in RDF properties can be attached only to edge "types" (though not just scalar properties, but ordinary links as well). This limitation of RDF relative to LPG can be overcome by one modeling technique or another. The limitations of LPG relative to RDF are harder to overcome, but LPG graphs look more like the pictures in Harary's textbook than RDF graphs do, so people want them.

Obviously, the task of supporting LPG falls into two parts:

changing the RDF model so that LPG constructs can be emulated in it;
changing the RDF query language so that data in this modified model can be accessed, or making it possible to query this model in popular LPG query languages.

V.1. Data model

There are several possible approaches.

V.1.1. Singleton property

The most literal approach to reconciling RDF and LPG is probably the singleton property:

Instead of, for example, :isMarriedTo, predicates such as :isMarriedTo1 are used.
These predicates then become the subjects of new triples: :isMarriedTo1 :since "2013-09-13"^^xsd:date, etc.
The link between these predicate instances and the common predicate is established by triples of the form :isMarriedTo1 rdf:singletonPropertyOf :isMarriedTo.
Obviously, rdf:singletonPropertyOf rdfs:subPropertyOf rdf:type, but think about why you should not simply write :isMarriedTo1 rdf:type :isMarriedTo.

Here the task of supporting LPG is solved at the RDFS level. Such a solution requires inclusion in the corresponding standard. Some changes may be required from RDF stores that support inference, but for now the singleton property can be regarded as just another modeling technique.
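The bookkeeping this technique implies can be sketched in a few lines of Python. Triples are plain tuples; the edge `:bob :isMarriedTo1 :alice` is assumed by analogy with the other examples in this article, and the helper `edges_with_attributes` is hypothetical rather than part of any store's API.

```python
SINGLETON_OF = "rdf:singletonPropertyOf"

# The edge itself (assumed), its attribute, and the link to the generic predicate.
TRIPLES = [
    (":bob", ":isMarriedTo1", ":alice"),
    (":isMarriedTo1", ":since", "2013-09-13"),
    (":isMarriedTo1", SINGLETON_OF, ":isMarriedTo"),
]


def edges_with_attributes(triples, generic_predicate):
    """Yield (subject, object, attributes) for each instance of `generic_predicate`."""
    singletons = {s for s, p, o in triples
                  if p == SINGLETON_OF and o == generic_predicate}
    for s, p, o in triples:
        if p in singletons:
            # Everything said about the singleton predicate describes this one edge.
            attrs = {ap: ao for asub, ap, ao in triples
                     if asub == p and ap != SINGLETON_OF}
            yield s, o, attrs


marriages = list(edges_with_attributes(TRIPLES, ":isMarriedTo"))
print(marriages)
```

Note that a query for the generic `:isMarriedTo` has to go through `rdf:singletonPropertyOf` first, which is exactly the extra step that inference support would be expected to hide.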

V.1.2. Reification Done Right

Less naive approaches stem from the realization that property instances are fully represented by triples. Given the ability to say something about triples, we can say something about property instances.

The most rigorous of these approaches is RDF*, also known as RDR, which was born within Blazegraph. AnzoGraph also chose it from the very beginning. The rigor of the approach lies in the fact that corresponding changes to RDF Semantics are proposed as part of it. The essence, however, is extremely simple. In the Turtle serialization of RDF, you can now write something like this:

 <<:bob :isMarriedTo :alice>> :since "2013-09-13"^^xsd:date .
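In data-structure terms, the idea is simply that a triple may itself occupy the subject position. A minimal Python sketch, using nested tuples, might look as follows; whether the quoted triple also counts as asserted depends on the RDF* mode, and including it in the set here is an assumption of the example, as is the `annotations` helper.

```python
# A quoted triple is just a tuple, so it can sit in subject position.
EDGE = (":bob", ":isMarriedTo", ":alice")
TRIPLES = {
    EDGE,                            # the edge itself (assumed to be asserted)
    (EDGE, ":since", "2013-09-13"),  # a statement about the edge
}


def annotations(triples, edge):
    """Return the properties asserted about `edge` itself."""
    return {p: o for s, p, o in triples if s == edge}


print(annotations(TRIPLES, EDGE))
```

Unlike the singleton property, no auxiliary predicate has to be minted: the edge is identified by its own subject, predicate and object.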

V.1.3. Other approaches

One can skip formal semantics and simply assume that triples have identifiers, which are of course URIs, and create new triples with these URIs. All that remains is to expose these URIs in SPARQL. This is what Stardog does.

AllegroGraph took an intermediate path. Triple identifiers are known to exist in AllegroGraph, but the implementation of triple attributes does not expose them. Formal semantics, however, is a long way off here too. Notably, triple attributes are not URIs, and their values can only be literals. LPG adherents get exactly what they want. In the specially devised NQX format, an example similar to the RDF* one above looks like this:

 :bob :marriedTo :alice {"since" : "2013-09-13"}

V.2. Query languages

Having supported LPG one way or another at the model level, one must also make it possible to query data in such a model.

For querying RDF*, Blazegraph supports SPARQL* and Gremlin. A SPARQL* query looks like this:

  SELECT * { <<:bob :isMarriedTo ?wife>> :since ?since }

AnzoGraph also supports SPARQL* and is going to support Cypher, the query language of Neo4j.
Stardog supports its own SPARQL extension and, again, Gremlin. In SPARQL, the URI of a triple and its "meta-information" can be obtained with something like this:

 SELECT * { BIND (stardog:identifier(:bob, :isMarriedTo, ?wife) AS ?id) ?id :since ?since }

AllegroGraph also supports its own SPARQL extension:

  SELECT * { ("since" ?since) franz:attributesNameValue ( :bob :marriedTo ?wife ) }

By the way, GraphDB at one time supported TinkerPop/Gremlin without supporting LPG, but this stopped in version 8.0 or 8.1.

VI. Tightening licenses

There have been no recent additions to the intersection of the "triplestore of choice" and "open source triplestore" sets. The new open source RDF stores are far from being a good choice for everyday use, and the source code of the new triplestores one would like to use (AnzoGraph, for instance) is closed. If anything, we can talk about losses...

Of course, what was open source before does not become closed, but some open source stores are gradually ceasing to be seen as worth choosing. Virtuoso, which has an open source edition, is in my opinion drowning in bugs. Blazegraph was acquired by AWS and formed the basis of Amazon Neptune; it is now unclear whether there will be even one more release. Only Jena remains...

If open source is not that important and you just want to try things out, the picture is also less rosy than before. For example:

Stardog has stopped distributing its free version (although the trial period of the regular version has doubled);
in GraphDB Cloud, where you could previously choose a free basic plan, registration of new users has been suspended.

In general, space is becoming less and less accessible to the ordinary IT person; its exploration is becoming the preserve of corporations.

Source: https://habr.com/ru/post/451206/
