
Semantic Web technologies for integrating information systems

Semantic Web technologies periodically attract attention because interesting new tools are built on top of them. Most recently, Facebook's Graph Search appeared: the first graph-based search tool available to a truly wide range of users.
However, the scope of semantic technologies is not limited to social networks and search services. The idea of using them to organize data exchange between information systems suggests itself: if one system transmits to another not only the data itself but also information about its meaning (its semantics), the exchanging systems can be abstracted from each other far better than with XML dumps or SOA web services.
Figure: Encoding information in semantic form during transmission
Today there are several implementations of this approach. Most of them, naturally, come from foreign companies, but there are Russian developments as well. In this article I will describe the architecture of one such system, which I have implemented in practice.

When designing the system, I started from the assumption that several information systems need to inform each other very quickly (within a few seconds) about changes taking place in their data stores. The architecture therefore strongly resembles a message queue bus, with the key difference that message content is expressed in RDF syntax, i.e., as a set of triples. On the side of each integrated system there is a client exchange module that interprets incoming messages and, when necessary, applies the corresponding changes to that system's data store. The client module is also active: if changes occur in the local store that other systems need to know about, it encodes information about those changes in RDF and sends it as a message on the bus.
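To make the sending path concrete, here is a minimal sketch in Python using the rdflib library. It is an illustration, not the actual implementation (which, as noted below, is PHP/MySQL); the ontology namespace, URI scheme and field names are my assumptions.

```python
# Illustrative sketch of the client module's sending path: a changed
# record is encoded as RDF triples and handed to the bus transport.
# The ex: namespace, URI scheme and field names are assumptions.
from rdflib import RDF, Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/ontology#")  # assumed shared ontology

def encode_change(cls: str, row_id: int, fields: dict) -> str:
    """Serialize one changed record as an RDF message in Turtle syntax."""
    g = Graph()
    g.bind("ex", EX)
    subject = URIRef(f"http://example.org/{cls.lower()}/{row_id}")
    g.add((subject, RDF.type, EX[cls]))
    for name, value in fields.items():
        g.add((subject, EX[name], Literal(value)))
    return g.serialize(format="turtle")

def send_to_bus(message: str) -> None:
    """Stand-in for the transport: the real module would POST the message
    to the intermediary server or push it onto a queue."""
    print(message)

# A customer record was just updated in the local store:
send_to_bus(encode_change("Client", 42, {"name": "ACME Ltd", "city": "Moscow"}))
```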
The role of router on the bus is played by a central intermediary server, which knows the access rights of each information system to the various types of data, guarantees delivery, monitors data integrity (and, if necessary, tries to restore it), and performs a number of other useful functions; a sketch of its routing step is given after the figure below. The figure shows the software components involved in the exchange. As an example, two exchanging systems are taken: a PHP/MySQL web application on one side and a 1C configuration on the other. The intermediary server is currently also implemented in PHP/MySQL, though it will eventually be rewritten on a more suitable platform. As for databases, the client component does not care which DBMS it works with: MySQL, PostgreSQL or Oracle.
Figure: The exchange scheme between clients and the server
Of course, in practice there can be more than two exchanging systems.
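The routing step on the intermediary server might look roughly like this. The access rules, system names and delivery mechanism below are invented for illustration; the real server additionally handles guaranteed delivery, integrity checks and duplicate detection.

```python
# Hedged sketch of the intermediary server's routing step. All names
# and rules here are illustrative assumptions.
ACCESS = {
    # which data types each system is allowed to publish
    "crm":       {"client", "deal"},
    "logistics": {"shipment"},
}
SUBSCRIBERS = {
    # which systems receive changes of each data type
    "client":   ["logistics", "web_app"],
    "shipment": ["crm"],
}

def route(sender: str, data_type: str, rdf_message: str) -> list[str]:
    """Check the sender's rights and fan the message out to subscribers."""
    if data_type not in ACCESS.get(sender, set()):
        raise PermissionError(f"{sender} may not publish {data_type}")
    recipients = [s for s in SUBSCRIBERS.get(data_type, []) if s != sender]
    for recipient in recipients:
        deliver(recipient, rdf_message)  # in reality: queued, with delivery guarantee
    return recipients

def deliver(recipient: str, rdf_message: str) -> None:
    print(f"-> {recipient}: {len(rdf_message)} bytes of RDF")
```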
The advantages of this approach over XML dumps or a SOAP web service are fairly obvious, especially once three, four or more systems need to be interconnected. They may operate with the same types of objects (for example, the concept of a "client" is almost certainly present in every information system of any company), yet hold different sets of data about them and use them in different contexts. An XML dump will almost always be redundant, rigidly tied to the data structure of the source database, and will require writing code for both export and import. If, for example, a CRM system stores tables of customers and the deals concluded with them, the XML upload will look something like this:
Figure: Upload to XML
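The original illustration is not preserved, but such an upload might look roughly like this (all element and attribute names are invented):

```xml
<!-- Hypothetical reconstruction of a rigid, structure-bound upload -->
<export date="2013-02-01">
  <customer id="42" name="ACME Ltd" city="Moscow">
    <deal id="1007" date="2013-01-30" amount="125000">
      <item sku="A-15" qty="10" price="12500"/>
    </deal>
  </customer>
</export>
```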
It is clear that one consumer of this information will need the deals detailed down to individual goods, while another will not, but will need the consignee and the delivery time instead, and so on. Using dumps becomes extremely problematic if any data can be changed retroactively. And if any of the recipient systems has the right to modify this data (for example, information about the actual shipment of goods may come from the logistics department's system), dumps can be kept going only out of sheer stubbornness.
A more progressive option is a SOAP web service. The interaction between the source system and the consumers of its information then looks like this:
Figure: Integration with a SOAP web service
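For contrast, a consumer call against one of these services might look like this: a sketch using the Python zeep library, where the WSDL address and the GetCustomer operation are hypothetical and stand for one of the many per-object-type services.

```python
# Hypothetical SOAP consumer; the endpoint and operation name are assumptions.
from zeep import Client

client = Client("http://crm.example.org/services/customers?wsdl")
customer = client.service.GetCustomer(customerId=42)
print(customer)
```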
This is convenient as long as the number of services (which grows in proportion to the number of object types being exchanged) does not exceed a few dozen. Beyond that, monitoring the health of the services, documenting them and supporting them becomes truly critical. Moreover, this does not solve the problem of feedback: in the example above, where the logistics department's system not only takes information from the CRM but can also write data back into it, both systems would have to implement web services, which is thoroughly inconvenient. Programmers would also have to take care of service security, data integrity maintenance, error handling and so on; and the list of problems does not end there.
In larger infrastructures, MDM systems are used (requiring investment and effort of a completely different order), as well as messaging buses. Buses are good because they let you connect a large number of information systems without the complexity of the exchange growing with the number of systems. We are, in fact, implementing a bus, but unlike a "classic" bus, ours carries messages that look like this:
Figure: Information encoded as RDF triples
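The original figure is not preserved; a message of this kind, serialized in Turtle, might look roughly as follows (the ex: ontology and the URIs are assumptions carried over from the sketches above):

```turtle
# Hypothetical bus message: "client 42 was created/updated in the CRM"
@prefix ex: <http://example.org/ontology#> .

<http://example.org/client/42>
    a          ex:Client ;
    ex:name    "ACME Ltd" ;
    ex:city    "Moscow" ;
    ex:source  <http://example.org/system/crm> .
```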
As I said above, each sending system converts the information it wants to share into this format before sending, and each receiving system interprets it according to its own rules.
The advantage of this approach over a classic messaging bus is that the information systems do not need to invent an "exchange protocol" defining the meaning of messages. That role is played by an ontology known to all exchanging systems and to the server (the server holds the most complete version of the ontology; client systems may use subsets of it). In the settings of each client module, ontology elements are mapped onto the structural elements of the local data store (in other words, onto database tables and fields), as the sketch below shows. For entities and relationships that have no one-to-one mapping onto the database structure you can write custom handlers, but in practical implementations the amount of required code is measured in dozens of lines. Also, unlike a classic messaging bus, in our case the server performs a number of high-level functions, such as the already mentioned restoration of data integrity, detection of duplicate objects, and conflict resolution. Along the way, the server can populate a SPARQL-queryable store that accumulates information from all the exchanging systems in the form of a graph. This promises analytics capabilities that none of the integrated systems can provide on its own.
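A minimal sketch of such a mapping, assuming the hypothetical ex: ontology from the examples above and invented table names:

```python
# Illustrative client-module configuration: ontology terms on the left,
# local database structure on the right. All names are assumptions.
ONTOLOGY_TO_DB = {
    "ex:Client": {"table": "customers", "key": "id"},
    "ex:name":   {"table": "customers", "column": "full_name"},
    "ex:city":   {"table": "customers", "column": "city"},
    "ex:Deal":   {"table": "deals",     "key": "id"},
    "ex:amount": {"table": "deals",     "column": "total"},
}
```

Triples whose predicates appear in the mapping can then be applied to the local store mechanically; anything else falls through to the custom handlers mentioned above.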
The approach described here solves all the integration problems listed above: the exchange modules of the different systems are independent of each other's database structures, and we gain fault tolerance, security, flexible configuration, the ability to easily use the same information in different contexts, and so on.
Of course, the described approach is far from the only application of semantic technologies to data integration; I will cover possible alternatives in the next post. The question of choosing an ontology for encoding the transmitted information is also beyond the scope of this post.


Source: https://habr.com/ru/post/167419/

