
Triggers, access rights and versioning at the SPARQL access point

Anyone who tries to use a SPARQL access point as a replacement for a database in an industrial project will run into several problems. One of them is the lack of access control, triggers, and versioning in such products. Having studied everything offered on the market today, we concluded that we had to implement this functionality ourselves.
Apache Fuseki served as the “guinea pig”, although the same approach can be applied to any other SPARQL endpoint.

Architecture and functionality

The only way to implement this, short of modifying the product itself, is to create a proxy layer on top of the access point's software interface. That is what we did. All SPARQL queries addressed to the service go through the proxy, where they are analyzed and processed further.
When accessing the proxy, an application can authenticate under a specific user account (for this, the standard program interface had to be slightly extended), or it can connect anonymously, staying within the standard.
The proxy has its own back-end, which lets administrators configure the access rights of users and user groups to ontology classes. Rights are inherited, and the access level ranges from no access through read-only and moderated changes to unrestricted editing.

Each ontology object can simultaneously be an instance of any number of classes (including via inheritance). Of all the levels applicable through its parent classes, the most stringent rights win.
The ability to edit class definitions and properties is governed by granting access rights on the standard types themselves, such as owl:Class.
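The “most stringent rights win” rule can be sketched as follows. This is a minimal illustration, not the proxy's actual code; the level names and the `effective_access` helper are assumptions made for the example.

```python
from enum import IntEnum

# Hypothetical access levels, ordered from most to least restrictive.
class Access(IntEnum):
    NONE = 0       # object is hidden from query results
    READ = 1       # SELECT/ASK/CONSTRUCT only
    MODERATED = 2  # DELETE/INSERT accepted, but queued for approval
    EDIT = 3       # changes applied immediately

def effective_access(object_classes, class_rights, default=Access.EDIT):
    """An object may be an instance of many classes (directly or via
    inheritance); of all applicable rights, the most stringent wins."""
    levels = [class_rights[c] for c in object_classes if c in class_rights]
    return min(levels, default=default)
```

For an object that is both a `Pump` (moderated) and an `Equipment` (read-only), the effective level is read-only, the stricter of the two.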

In our case, it was important to support collaborative work on the ontology. The “change with moderation” access level allows a user to execute DELETE / INSERT requests, but their result is not applied to the database immediately; instead, it is submitted for approval to users with the appropriate rights. Once a day, the back-end notifies those users about the accumulated changes, and they can apply or reject them.
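The moderation workflow described above might look roughly like this. This is a hedged sketch under the assumption of an in-memory queue; the class names, the `execute` callback, and the daily-notification detail are all illustrative, not the article's implementation.

```python
from dataclasses import dataclass

@dataclass
class PendingChange:
    author: str
    update_query: str          # the parked DELETE/INSERT text
    status: str = "pending"

class ModerationQueue:
    """DELETE/INSERT requests from 'change with moderation' users are
    parked here until a moderator applies or rejects them."""
    def __init__(self, execute):
        self._execute = execute  # runs an update on the real endpoint
        self.changes = []

    def submit(self, author, update_query):
        self.changes.append(PendingChange(author, update_query))

    def approve(self, idx):
        change = self.changes[idx]
        self._execute(change.update_query)
        change.status = "applied"

    def reject(self, idx):
        self.changes[idx].status = "rejected"

    def pending(self):
        """What a daily notification to moderators would report."""
        return [c for c in self.changes if c.status == "pending"]
```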
All changes made by users to the ontology are stored in a log kept in the back-end's service database (a relational one, which also stores the access rights settings). As a result, it is possible to build a history of changes for every property of each ontology object, with the date and author of each change.
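Rebuilding an object's state at a point in time then amounts to replaying the log. A minimal sketch, assuming one log row per changed property; the row layout `(timestamp, author, op, subject, predicate, value)` is an assumption for illustration:

```python
def state_at(log, subject, moment):
    """Replay the change log to rebuild the properties of one object
    as they were at the given point in time (timestamps are assumed
    to be sortable, e.g. ISO 8601 strings)."""
    props = {}
    for ts, author, op, subj, pred, value in sorted(log, key=lambda r: r[0]):
        if subj != subject or ts > moment:
            continue
        if op == "INSERT":
            props[pred] = value
        elif op == "DELETE":     # the old value is removed
            props.pop(pred, None)
    return props
```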
Returning to access rights: every request received by the proxy passes a rights check, either before or after execution. If the query reads data (SELECT, ASK, CONSTRUCT), solutions containing objects the current user may not access are excluded from its result set (for an anonymous query, only solutions consisting entirely of instances of unrestricted classes remain). If the request is a DELETE / INSERT / UPLOAD, the set of triples it will affect is determined first, and if the user lacks edit rights on even one of them, the entire request is canceled. Of course, the front-ends that work with our proxy had to be taught to interpret the error messages, as well as the warnings that a change went to moderation.
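The two checks can be sketched separately: post-filtering of read results and an all-or-nothing pre-check for updates. All names here are illustrative assumptions, not the proxy's API.

```python
class AccessDenied(Exception):
    pass

def filter_select_results(bindings, readable):
    """SELECT/ASK/CONSTRUCT: drop any solution that mentions an object
    the current user may not see. `readable` is a predicate over IRIs
    (for an anonymous user it would accept only unrestricted objects)."""
    return [row for row in bindings
            if all(readable(v) for v in row.values())]

def check_update(affected_triples, editable):
    """DELETE/INSERT/UPLOAD: first compute the affected triples; if the
    user lacks edit rights on even one subject, cancel the whole request."""
    for s, p, o in affected_triples:
        if not editable(s):
            raise AccessDenied(f"no edit rights on {s}")
```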
Paired DELETE / INSERT requests are detected, and if the INSERT half is canceled, its DELETE pair is canceled as well. In general, writing the proxy called for some interesting workarounds. For example, the response to a SELECT query must not include inaccessible objects, yet those objects still participate in computing the solution. Such a situation arises, for example, when executing the query

SELECT ?prop WHERE { ?object <has_property_1> "some value". ?object <has_property_2> ?prop } 


in case the user has no access rights to some of the ?object bindings. Our proxy rewrites such queries and returns ?prop only for accessible objects. Similar processing had to be applied to queries that return a COUNT(*) value.
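One way such a rewrite might work is to inject a VALUES clause restricting the variable to accessible IRIs. This is a deliberately naive sketch (it assumes a single flat WHERE block and does string surgery); the article does not describe the actual rewriting technique.

```python
def restrict_to_accessible(query, var, accessible_iris):
    """Illustrative rewrite: constrain ?var so it can only bind to IRIs
    the current user is allowed to read."""
    values = " ".join(f"<{iri}>" for iri in accessible_iris)
    return query.replace(
        "WHERE {", f"WHERE {{ VALUES ?{var} {{ {values} }}", 1)
```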

After all of the above, implementing trigger functionality was quite easy. A trigger is a procedure executed after a data change request, in case the request affects instances of certain classes. In our project, triggers are used to notify external systems of changes (messages are sent to a bus), but the same mechanism can be used, for example, for cascading changes within the database itself.
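The trigger mechanism reduces to a registry of procedures keyed by ontology class, fired after a committed change. A minimal sketch with assumed names:

```python
from collections import defaultdict

class TriggerRegistry:
    """Procedures registered per ontology class, executed after a data
    change request touches instances of that class."""
    def __init__(self):
        self._triggers = defaultdict(list)

    def on_change(self, cls, proc):
        self._triggers[cls].append(proc)

    def fire(self, changed):
        """`changed`: iterable of (object_iri, class_iri) pairs affected
        by a committed DELETE/INSERT. Runs every matching procedure."""
        for obj, cls in changed:
            for proc in self._triggers.get(cls, ()):
                proc(obj)
```

A trigger that posts a message to the bus would simply be registered with `on_change` for the classes it cares about.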

Results and performance


In terms of functionality, we achieved everything we intended. The system enforces access rights regardless of which application sends a request to the access point, and also sends notifications about data changes. The change log makes it possible to restore the state of any object at an arbitrary point in time. The “change with moderation” functionality provides full moderation of changes to the ontology.
It remained to find out how much the additional processing affects query execution speed. We were primarily concerned that SELECT performance should not suffer, since our product serves as a master data catalog for several other information systems.
After analyzing the incoming SELECT queries, as well as the queries that the proxy itself sends to the real SPARQL access point, we found that more than half of them are simple assertions like “A is a subclass of B” or “A is a member of class B”. Such requests are easy to cache, with the cache updated via our trigger mechanism whenever the contents of the real database change. As a result, the proxy answers queries of this kind (and some more complex ones) without contacting the real access point, and also makes heavy use of the cache in its rights-calculation algorithms. The result exceeded our expectations: under real load, the system is only 13% slower than direct access to the endpoint without rights control.
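The caching idea can be sketched as a memoized ASK check whose cache is cleared by the trigger mechanism on data changes. The `endpoint_ask` callback and the rdfs:subClassOf* pattern are assumptions for the example, not the proxy's actual code:

```python
class TaxonomyCache:
    """Caches 'A is a subclass of B'-style answers; invalidated via the
    trigger mechanism whenever the real store changes."""
    def __init__(self, endpoint_ask):
        self._ask = endpoint_ask   # sends an ASK to the real endpoint
        self._cache = {}

    def is_subclass(self, a, b):
        key = ("subClassOf", a, b)
        if key not in self._cache:
            self._cache[key] = self._ask(
                f"ASK {{ <{a}> rdfs:subClassOf* <{b}> }}")
        return self._cache[key]

    def invalidate(self):
        """Registered as a trigger on class-definition changes."""
        self._cache.clear()
```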

With data change requests, the situation is less optimistic: their execution became 6 times slower, since the processing of DELETE / INSERT (and, all the more, UPLOAD) is much more complicated and hard to optimize. We had to accept this on the production system, trading some performance for functionality.

Source: https://habr.com/ru/post/240295/
