ISO 15926 vs Semantics: a comparative analysis of semantic models

The idea of using semantic models in corporate information systems has been around for a long time, but an established practice of using them has not yet emerged. Semantic models can be used for data integration, analytics, and knowledge management; however, there is no generally accepted view on how to assess their usefulness, or on which methods should be used to build them.
The objective of this article is to use a practical example to compare the analytical potential of two kinds of models: those built according to the rules of the ISO 15926 integration standard, which prescribes OWL and SPARQL for expressing models and working with them, and “ordinary” semantic models built without this standard. Answering this question will help identify the range of tasks for which it makes sense to apply high-level semantic modeling paradigms such as ISO 15926.

Formulation of the problem

Let us briefly outline the history of the issue and the essence of the relationship between ISO 15926 and “ordinary” semantics. ISO 15926 is an information exchange standard intended for use in industry (primarily oil and gas). Historically, the emphasis in developing the standard was on exchanging data between different organizations, i.e. between different information infrastructures. Its main features are a specific approach to classifying objects and their relations, accounting for the time dimension of objects (4D modeling), and the ability to model the life cycle of systems (not just the current state of a particular system). The standard contains an ontological core and implies the use of shared reference data libraries for building applied information models. All this accounts both for its advantages (the ability to create high-quality, relevant models of system life cycles, and excellent potential for exchanging information between organizations through a common “ontological dictionary”) and for its disadvantages (the increased complexity of the resulting models and a high “entry threshold” in terms of the knowledge needed to master and use the standard).
The development of the standard began in the 1990s. When Semantic Web technologies appeared in the mid-2000s, they were adopted as the technical basis for expressing data in accordance with ISO 15926. Thus, the basic concepts of the standard were laid down before the Semantic Web, but these technologies provided the technical foundation needed for a way of expressing data according to the standard that has the potential for truly wide adoption. The ideological affinity, but not identity, of the two laid the foundation for the “contradiction” we want to resolve. Because the principles of modeling under ISO 15926 do not fully correspond to the principles of representing objects and the relations between them in, for example, OWL, the combination of the two technologies turned out to be somewhat artificial. Data built according to ISO 15926 can be decomposed into elementary elements, RDF triples, but analyzing the relations between the model's information objects in this form by means of SPARQL is difficult.
The contradiction we want to shed light on is as follows. It is claimed that semantic models built in accordance with ISO 15926 differ qualitatively from semantic models built without such high-level guidance, using only the “ordinary” technologies of the Semantic Web, so that the two kinds of models are of a fundamentally different nature (at least ideologically). It is further claimed that the two types of models compete with each other, that the only argument in favor of “ordinary” models is their relative simplicity, and that by all other measures ISO 15926 models are more correct and more useful.

Below we examine these claims using a practical example designed to show clearly the relationship, similarities, and differences between ISO 15926 and “ordinary” semantic models (by “ordinary” we mean semantic models that are not built in accordance with ISO 15926). First, however, let us turn to the ideas behind creating and applying semantic data models.

Semantic models and their applications

The main driving idea behind semantic technologies was the need for computer algorithms to understand the meaning (semantics) of data. Accordingly, their original task was analytics: making it possible to extract knowledge from linked sets of information.
As these technologies developed and were tried out in various fields, it turned out that they are extremely convenient for combining (linking) data from different sources. This gave rise to the second direction in the development of tools based on the ideas of semantic networks: the integration of information systems.
There is no contradiction between the analytical and the integration uses of semantics; on the contrary, they are inseparable. After all, the goal of integration is usually to extract new knowledge from the combined data set, knowledge that could not be obtained from any single source on its own. Semantic technologies can also simplify the transfer of information from one information system to another, but that is more of a by-product of their development.

Semantic information analysis technologies have enabled significant breakthroughs in a number of applications. These successes are especially convincing in medicine and biotechnology. For example, databases built on semantic technologies combine information about drugs and their effects, clinical histories, and genetic data, and analyzing them helps researchers develop new medicines. This is an excellent example of a situation where relational databases cannot adequately reflect the diversity of links between information objects or provide tools for analyzing those links, while semantic technologies can. Semantic databases are also used in public health (for analyzing the spread of diseases) and in many other applications.
Semantic analysis tools are also entering everyday life. For example, the developers of Facebook Graph Search came up with an excellent way to demonstrate, at an everyday level, what is fundamentally new about semantic search (analysis): no existing search engine based on full-text search can answer questions such as “Which restaurants do my friends like?” or “Which cities do my relatives live in?” Searching a graph built from a formalized set of information objects (people, restaurants, cities) and the relations between them (likes, lives in) gives the answer quickly and accurately. The terms of the query can be varied within the limits allowed by the ontology (the set of object types and the relations between them): you can ask similar questions about films instead of restaurants, or about classmates instead of relatives. The entire content of the Facebook social network is, in effect, a single huge information graph with billions of nodes and links. The ability to analyze and use these links is what gives it all its value, and the owners of the resource understand this perfectly.
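To make the idea of graph search concrete, here is a minimal Python sketch of basic graph pattern matching, the core operation behind such questions. It is an illustration under assumptions, not how Facebook Graph Search actually works: the `http://example.org/social#` vocabulary, the people, and the liked items are all hypothetical, and the naive nested-loop `match` function merely stands in for a real query engine.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
S = "http://example.org/social#"  # hypothetical vocabulary

# Hypothetical social graph: who is whose friend, who likes what, what is what.
TRIPLES: List[Triple] = [
    (S + "me", S + "friend", S + "alice"),
    (S + "me", S + "friend", S + "bob"),
    (S + "alice", S + "likes", S + "Luigis"),
    (S + "bob", S + "likes", S + "Casablanca"),
    (S + "carol", S + "likes", S + "SushiBar"),  # carol is not my friend
    (S + "Luigis", RDF_TYPE, S + "Restaurant"),
    (S + "SushiBar", RDF_TYPE, S + "Restaurant"),
    (S + "Casablanca", RDF_TYPE, S + "Film"),
]


def unify(term: str, value: str, binding: Binding) -> bool:
    """Bind a '?variable' to a value, or check that a constant term matches."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


def match(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Return every variable binding that satisfies all triple patterns."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                candidate = dict(binding)
                if all(unify(t, v, candidate) for t, v in zip(pattern, triple)):
                    extended.append(candidate)
        bindings = extended
    return bindings


def liked_by_friends(kind: str) -> List[str]:
    """'Which <kind> do my friends like?' as a three-edge graph pattern."""
    query = [
        (S + "me", S + "friend", "?friend"),
        ("?friend", S + "likes", "?item"),
        ("?item", RDF_TYPE, kind),
    ]
    return sorted({b["?item"] for b in match(query, TRIPLES)})


print(liked_by_friends(S + "Restaurant"))  # ['http://example.org/social#Luigis']
print(liked_by_friends(S + "Film"))        # ['http://example.org/social#Casablanca']
```

Asking about films instead of restaurants changes only the class passed as `kind`; the shape of the pattern stays the same, which is the sense in which the ontology bounds the questions that can be asked.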

Based on the above, we can formulate criteria that information models built on the principles of semantic technologies should meet. Some of them follow from general requirements for any model, others from the specifics of these technologies and the conditions for their practical usefulness. We list them.
1. The result of performing any action in the real system and in the model should be the same (the correspondence between the model and the system in the initial and final states is described by the same rules). This requirement gives the model its predictive power: if it is satisfied, we can simulate the evolution of the system and then apply the simulation results.
2. The model should reflect the properties of objects and the relations between them in a way that makes it possible to extract knowledge from the model using existing technologies (such as SPARQL). This is a purely practical requirement that ensures the model is suitable for analysis. In effect, it states that computations can be performed on the model.
3. The model should allow extension and scaling (integration and refinement of detail) without revising its ontological core. This requirement constrains the choice of methods for classifying objects and for distinguishing objects from their properties; it could be elaborated in considerable detail.

Example: creating and analyzing a “conventional” semantic model

We now consider the two “conflicting” approaches to building models and evaluate their practical usefulness against these criteria. We take an example from the industrial domain that is “native” to the standard under discussion.

Let us describe, as a semantic model, information about an event: the installation of a pump in a pipeline. The model should contain the following information:

The place where the pump is installed (identified on the plant diagram by a specific identifier; following ISO 15926 terminology, we will call it a “functional place”);
The pump itself, as a physical object of a certain type, with a specific serial number assigned to it at a certain point in time;
Information that this pump is suitable for this place (can be installed in it);
The date and time of installation.

First, we model this information without relying on ISO 15926 (there are, of course, many ways to do this; we pick one arbitrarily).

In the diagram of this model, objects corresponding to events are shown in yellow and material objects in green; literals have no frame. The dotted box encloses the class definition, which, strictly speaking, belongs to reference data rather than to this particular model. Arrows, the edges of the graph, show the relations of objects with each other and of objects with literals (properties).
When this ontology is imported into a SPARQL endpoint, it yields a set of 16 triples (edges of the graph). They correspond to the lines shown in the diagram, plus one triple for the type of each object. The scheme is, of course, simplified: for example, the “model” should not be a literal but a reference to the corresponding object.

The RDF representation of this model, as well as the set of triples it turns into after import into a SPARQL endpoint, can be downloaded from this link.

Let us now analyze this model. Suppose, for example, that we want to find out where the pump with a known serial number was installed. This requires the following sequence of simple queries:

SELECT * WHERE { ?pump <http://example.org/DC#Model> "Centrifugal Pump Model AB-123C"^^<http://www.w3.org/2001/XMLSchema#string> }

This query returns the identifier of the object describing the pump. Next, we find the “pump installation” object that refers to it:

 SELECT * WHERE { ?installation <http://example.org/DC#InstalledItem> <http://example.org/DC#DE-1234F> }

Finally, we find the place of installation:

 SELECT * WHERE { <http://example.org/OurInstallation> <http://example.org/DC#InstalledPlace> ?place }

We traversed three edges of the graph; of course, these queries can be combined into one.
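As a rough illustration of combining them, the Python sketch below evaluates the three queries as one basic graph pattern in which the shared variables ?pump and ?installation join the edges, which is what a single SPARQL query does; the equivalent query is given in the comments. The triples are a hypothetical three-triple excerpt reconstructed only from the IRIs used in the queries above (the real model has 16 triples and is not reproduced here), and taking http://example.org/DC#R4598459832 as the place identifier is an assumption borrowed from a later query in this article. The `match` helper is a naive stand-in for a SPARQL engine, not part of any library.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

DC = "http://example.org/DC#"
INSTALLATION = "http://example.org/OurInstallation"

# Hypothetical excerpt of the "ordinary" model (not the full 16-triple set).
TRIPLES: List[Triple] = [
    (DC + "DE-1234F", DC + "Model", "Centrifugal Pump Model AB-123C"),
    (INSTALLATION, DC + "InstalledItem", DC + "DE-1234F"),
    (INSTALLATION, DC + "InstalledPlace", DC + "R4598459832"),  # assumed place ID
]


def unify(term: str, value: str, binding: Binding) -> bool:
    """Bind a '?variable' to a value, or check that a constant term matches."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


def match(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Naive evaluation of a basic graph pattern (a nested-loop join)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                candidate = dict(binding)
                if all(unify(t, v, candidate) for t, v in zip(pattern, triple)):
                    extended.append(candidate)
        bindings = extended
    return bindings


# One query instead of three; in SPARQL it would read roughly:
#   PREFIX dc: <http://example.org/DC#>
#   PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
#   SELECT ?place WHERE {
#     ?pump dc:Model "Centrifugal Pump Model AB-123C"^^xsd:string .
#     ?installation dc:InstalledItem ?pump .
#     ?installation dc:InstalledPlace ?place .
#   }
QUERY: List[Triple] = [
    ("?pump", DC + "Model", "Centrifugal Pump Model AB-123C"),
    ("?installation", DC + "InstalledItem", "?pump"),
    ("?installation", DC + "InstalledPlace", "?place"),
]

places = [b["?place"] for b in match(QUERY, TRIPLES)]
print(places)  # ['http://example.org/DC#R4598459832']
```

The three triple patterns correspond one-to-one to the three edges traversed by the separate queries; only the intermediate identifiers no longer have to be copied by hand.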

Example: creating and analyzing a model according to ISO 15926

According to ISO 15926, our event, the installation of a pump, should be described by the InstallationOfTemporalPartMaterializedPhysicalObjectInFunctionPlace template. In simplified form, the role structure of this template, which can express roughly the same information as the example above, can be represented as follows:

In this diagram, template instances are shown in yellow and object instances in green.
Filled with the minimum necessary data (without annotations), this structure turns into 36 triples when imported into a SPARQL endpoint (the OWL file and the resulting set of triples can be downloaded from this link). Note that the structure of this data as triples turns out to be quite reasonable and not very different from that of the model built without the standard. The number of triples more than doubles compared to the “ordinary” semantic model because of the additional information objects and the references to definitions of basic types that many of them contain. Converting reference data into triples, however, and template definitions in particular, gives much worse results in terms of how well the graph is structured. The definition of the InstallationOfTemporalPartMaterializedPhysicalObjectInFunctionPlace template alone takes 148 triples, many of which include blank nodes (graph nodes that have no identifiers of their own); in particular, many triples connect two blank nodes. Working with such structures using SPARQL is very difficult, and in practice this seriously increases the complexity of any software that lets users create or view templates. For comparison, the “ordinary” semantic model of the same data fits into just 38 triples, i.e. it is an order of magnitude more compact than the ISO 15926 model (keep in mind that the 148 triples mentioned above describe only one template, while our example uses four, plus the definitions of the required standard types). Another important difference is that the ISO model contains links to external elements located outside the endpoint that hosts the current ontology, in particular in the RDL (Reference Data Library, a catalog of reference data; we will return to these libraries below).

Now let us look at how a model built according to the rules of ISO 15926 can be analyzed, performing the same task as for the “ordinary” semantic model above. Suppose we want to know where the pump with a known serial number was installed. In the ISO model, this requires the following sequence of simple queries:

 SELECT * WHERE { ?temporalpart <http://standards.iso.org/iso/15926/tpl#valIdentifier> "S/N DE-1234F"^^<http://www.w3.org/2001/XMLSchema#string> }

The result is the identifier of an instance of the ClassifiedIdentificationOfTemporalPart template. Next, we ask which physical object (the pump) this template instance refers to:

 SELECT * WHERE { <http://example.com/tpl#CITP456> <http://standards.iso.org/iso/15926/tpl#hasTemporalWhole> ?pump }

This gives us the pump's identifier (an object of type MaterializedPhysicalObject). Now we can retrieve the instances of the template that describe installations of this pump:

 SELECT * WHERE { ?installation <http://standards.iso.org/iso/15926/tpl#hasTemporalWholeOfInstallable> <http://example.com/tpl#MPO456> }

We obtain the identifier of an instance of the InstallationOfTemporalPartMaterializedPhysicalObjectInFunctionPlace template. Finally, we find out in which functional place the installation was made:

 SELECT * WHERE { <http://example.com/tpl#T123> <http://standards.iso.org/iso/15926/tpl#hasTemporalWholeOfFunctionPlace> ?place }

So we had to traverse four edges of the graph. Importantly, to compose these queries the programmer must be thoroughly familiar with the principles of ISO 15926 and have access to an annotated template library (which, in fact, is not publicly available).
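The four ISO 15926 queries can likewise be folded into a single pattern. The Python sketch below joins them through ?temporalpart, ?pump, and ?installation using a naive pattern matcher (a stand-in for a SPARQL engine, not a library API). The four triples are a hypothetical minimal excerpt built from the identifiers used in the queries above (the real instance data has 36 triples), and treating http://example.com/tpl#FP123 as the functional place is an assumption taken from the later queries in this article. The pattern length makes the extra hop visible: four triple patterns against three in the “ordinary” model.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

TPL = "http://standards.iso.org/iso/15926/tpl#"
EX = "http://example.com/tpl#"

# Hypothetical minimal excerpt of the ISO 15926 instance data.
TRIPLES: List[Triple] = [
    (EX + "CITP456", TPL + "valIdentifier", "S/N DE-1234F"),
    (EX + "CITP456", TPL + "hasTemporalWhole", EX + "MPO456"),
    (EX + "T123", TPL + "hasTemporalWholeOfInstallable", EX + "MPO456"),
    (EX + "T123", TPL + "hasTemporalWholeOfFunctionPlace", EX + "FP123"),
]


def unify(term: str, value: str, binding: Binding) -> bool:
    """Bind a '?variable' to a value, or check that a constant term matches."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value


def match(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Naive evaluation of a basic graph pattern (a nested-loop join)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                candidate = dict(binding)
                if all(unify(t, v, candidate) for t, v in zip(pattern, triple)):
                    extended.append(candidate)
        bindings = extended
    return bindings


# Serial number -> identification template -> pump -> installation -> place.
QUERY: List[Triple] = [
    ("?temporalpart", TPL + "valIdentifier", "S/N DE-1234F"),
    ("?temporalpart", TPL + "hasTemporalWhole", "?pump"),
    ("?installation", TPL + "hasTemporalWholeOfInstallable", "?pump"),
    ("?installation", TPL + "hasTemporalWholeOfFunctionPlace", "?place"),
]

places = [b["?place"] for b in match(QUERY, TRIPLES)]
print(places)  # ['http://example.com/tpl#FP123']
```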

Comparison of the analytical potential of models

Another interesting aspect of analyzing this graph concerns time (handling the temporal aspect is one of the strengths of ISO 15926). To find out which pump was installed in a given functional place at a given time, we have to use SPARQL's rather inconvenient facilities for working with dates. Let us build the necessary query.
First, knowing the identifier of the functional place, we retrieve the pump installation episodes:

 SELECT * WHERE { ?inst <http://standards.iso.org/iso/15926/tpl#hasTemporalWholeOfFunctionPlace> <http://example.com/tpl#FP123>. }

Next, we get the pump identifier; for variety, let us do it within the same query:

 SELECT * WHERE { ?inst <http://standards.iso.org/iso/15926/tpl#hasTemporalWholeOfFunctionPlace> <http://example.com/tpl#FP123>. ?inst <http://standards.iso.org/iso/15926/tpl#hasTemporalWholeOfInstallable> ?pump. }

The pump will most likely have a link to its type in the RDL, from which we could find out what kind of pump it is, but we left this part of the model out of our example. It remains to add a condition on the installation date:

 SELECT ?pump WHERE { ?inst <http://standards.iso.org/iso/15926/tpl#hasTemporalWholeOfFunctionPlace> <http://example.com/tpl#FP123>. ?inst <http://standards.iso.org/iso/15926/tpl#hasTemporalWholeOfInstallable> ?pump. ?inst <http://standards.iso.org/iso/15926/tpl#valStartTime> ?time. FILTER (?time < "2013-05-09T12:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>) } ORDER BY DESC (?time) LIMIT 1

Combining a FILTER on the installation date, sorting with ORDER BY DESC, and limiting the output with LIMIT 1 yields exactly one result: the most recent pump installation episode preceding the specified date.

On the “conventional” semantic model, this query has exactly the same structure and exactly the same number of elements:

 SELECT ?pump WHERE { ?inst <http://example.org/DC#InstalledPlace> <http://example.org/DC#R4598459832>. ?inst <http://example.org/DC#InstalledItem> ?pump. ?inst <http://example.org/DC#Created> ?time. FILTER (?time < "2013-05-09T12:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>) } ORDER BY DESC (?time) LIMIT 1
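In both models the FILTER / ORDER BY DESC / LIMIT 1 combination amounts to “the latest installation strictly before the given moment”. A minimal Python sketch of the same selection logic follows; the `Installation` record, the episode identifiers other than T123, the pumps MPO111 and MPO789, and all dates are hypothetical. Like the queries, it assumes that the most recent installation before the moment is still in place, since removals are not modeled.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple, Optional

EX = "http://example.com/tpl#"


class Installation(NamedTuple):
    """One installation episode: which pump went into which place, and when."""
    episode: str
    place: str
    pump: str
    start: datetime


# Hypothetical installation history of one functional place.
EPISODES: List[Installation] = [
    Installation(EX + "T100", EX + "FP123", EX + "MPO111",
                 datetime(2011, 1, 15, 8, 0, tzinfo=timezone.utc)),
    Installation(EX + "T123", EX + "FP123", EX + "MPO456",
                 datetime(2012, 3, 1, 10, 0, tzinfo=timezone.utc)),
    Installation(EX + "T200", EX + "FP123", EX + "MPO789",
                 datetime(2013, 6, 1, 9, 0, tzinfo=timezone.utc)),
]


def pump_at(episodes: List[Installation], place: str,
            moment: datetime) -> Optional[str]:
    """Pump installed in `place` as of `moment`, or None if there was none yet."""
    # FILTER (?time < moment), restricted to the requested place
    earlier = [e for e in episodes if e.place == place and e.start < moment]
    # ORDER BY DESC (?time)
    earlier.sort(key=lambda e: e.start, reverse=True)
    # LIMIT 1
    return earlier[0].pump if earlier else None


cutoff = datetime(2013, 5, 9, 12, 0, tzinfo=timezone.utc)
print(pump_at(EPISODES, EX + "FP123", cutoff))  # http://example.com/tpl#MPO456
```

`max(earlier, key=lambda e: e.start)` would give the same answer in one step; the explicit sort is kept only to mirror the structure of the query.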

Another interesting aspect of models built in accordance with ISO 15926 is the use of RDLs, reference data libraries. They contain definitions of device types, their functions, and so on. These libraries are available at external SPARQL endpoints, usually operated by an industry association. Our graph contains one reference to an RDL: the definition of the type of the functional object, which tells us that it should be a device with a pump function. If we request information about the type of our FunctionalPhysicalObject,

 SELECT * WHERE {<http://example.com/tpl#FP123> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?rdl }

then we get a link to the RDL, <http://rdl.example.org/sampleReferenceData#R4598459832> (and at the same time we learn that our functional object belongs to the WholeLifeIndividual class and to several other ISO 15926 root classes, which is not very useful information for us). If we now want to know what this definition means, we have to query another endpoint, the one where this RDL is stored:

 SELECT * WHERE { <http://example.com/tpl#FP123> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?a. SERVICE <http://rdl.example.org/sparql> { ?a ?b ?c. }. }

Such a query returns all the information that the RDL holds for this type of device; the SERVICE clause is what turns it into a federated query across two endpoints. Reference data libraries (catalogs) maintained by industry associations and regulatory authorities, as well as private catalogs, such as those of particular equipment suppliers, can serve as RDLs.
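Conceptually, the SERVICE query joins two graphs that live in different places: the local graph supplies the type ?a, and the remote RDL supplies everything it knows about ?a. The Python sketch below mimics that join with two in-memory triple lists standing in for the two endpoints. The label triple in the RDL and the `http://example.org/core#WholeLifeIndividual` IRI are hypothetical placeholders, not real ISO 15926 reference data, and no network access is involved, so the sketch shows only the shape of the join, not how a SPARQL engine dispatches the remote part.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDL_TYPE = "http://rdl.example.org/sampleReferenceData#R4598459832"
FP123 = "http://example.com/tpl#FP123"

# Local endpoint: the project model (hypothetical excerpt).
LOCAL: List[Triple] = [
    (FP123, RDF_TYPE, RDL_TYPE),
    (FP123, RDF_TYPE, "http://example.org/core#WholeLifeIndividual"),  # placeholder IRI
]

# Remote endpoint: the reference data library (hypothetical content).
RDL: List[Triple] = [
    (RDL_TYPE, RDFS_LABEL, "device with pump function"),  # hypothetical label
]


def describe_types(subject: str, local: List[Triple],
                   remote: List[Triple]) -> List[Triple]:
    """Local part: subject rdf:type ?a. Remote part (SERVICE): ?a ?b ?c."""
    types = [o for s, p, o in local if s == subject and p == RDF_TYPE]
    # Join on ?a: keep remote triples whose subject is one of the local types.
    return [(s, p, o) for a in types for s, p, o in remote if s == a]


for triple in describe_types(FP123, LOCAL, RDL):
    print(triple)
```

The placeholder core type finds nothing in the RDL, so only the triples describing the RDL type come back, which matches the behaviour described above.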
“Ordinary” semantic models can use federated queries too. We can create a shared catalog, of equipment for example, and publish it at an open endpoint. The only question is whose “authority” will back such a repository of information. Lending authority to catalogs is a function of various associations. At the same time, to be frank, whether or not a catalog conforms to a particular standard adds practically nothing to its “authority”. And if the catalog of a specific company, such as an equipment supplier, is used as an RDL, the question of “authority” does not arise at all.

Similarities and Differences between “Ordinary” Semantics and ISO 15926

The conclusions from these examples of using semantic models are fairly clear:
1. From a technical point of view, “ordinary” semantic models are closely analogous to ISO 15926 models as far as design data (information about specific systems and processes) is concerned. ISO models are more complex, and the gap relative to “conventional” models grows linearly with the size of the model. This is due to the separate entities used to express the temporal parts of objects and to the need to classify objects according to the classifier of top-level types.
2. As for the computational potential of these models, computations on ISO models are also somewhat more complicated than in “ordinary” semantics, but the difference is not radical. More importantly, building queries requires not only familiarity with the model but also a command of ISO 15926 concepts and access to a template navigator (which, as far as we know, is not publicly available; the set of templates and the procedure for approving them also leave much to be desired, as far as we know).
3. The ISO 15926 system of reference data and high-level entities is very complicated compared to “ordinary” semantics (by a factor of 10 or more, if we measure by the number of triples needed to express a model). This is especially true of libraries of high-level entities such as templates. Working with the definitions (not the instances!) of these entities by means of semantic technologies is much more difficult. Yet any application that lets the user work with templates while hiding their low-level representation must offer a wide range of such operations (searching and viewing, creating and editing template definitions, filling them in). A partial solution is to work with templates expressed as OWL files rather than as triples in an RDF store.
4. The ISO 15926 concepts that are considered its “know-how” and the source of its special value (federated access and RDLs, and the modeling of temporal parts) are also available in “ordinary” semantics. It all depends on how the data model is built and how the data is divided into design and reference data. Note, incidentally, that nothing in practice prevents applications whose data models do not conform to ISO 15926 from using RDLs built in accordance with it.
5. The real value of the standard lies above all in its status as a standard: agreed classification methods and the classifiers themselves (together with the procedures for maintaining them) make it possible to use the standard for integration between different enterprises, but somewhat complicate computational tasks on models. This is natural: universality always comes at the cost of speed.

Thus, ISO 15926 is one of the ways to build semantic models, with certain advantages and disadvantages compared to other methods that involve less high-level formalism. In terms of practical implementation and potential uses, there is no fundamental difference between the standard and other methods that would justify treating them as opposing technologies. Proclaiming such a difference might pass as a marketing device for promoting the standard, were it not for the fact that it scares off specialists already familiar with “ordinary” semantic technologies (as is happening in practice now). Moreover, it is extremely difficult to explain the technical difference between ISO 15926 and “ordinary” semantics to people who are not IT specialists but who make decisions about building one software infrastructure or another.

High-quality system models can be created both with and without this standard. The decision to use it should depend on the context in which the information system being developed will be applied: above all, on whether the model may need to take part in integration processes, and on the requirements for performing computations on the model. Following the standard can be given as a general recommendation; under certain circumstances, however, the need for fast computations on a model may call for a more efficient ontology. Trying to solve computational performance problems with hardware alone is usually not effective.
The main obstacles to the adoption of the standard are:

a high “entry threshold”: the amount of knowledge needed to use this technology successfully;
the lack of a full-fledged methodological base, documentation, well-developed support communities, and collections of usage examples;
the lack of widely available software for working with data models built in accordance with the standard;
the lack of convincing, publicly available examples of successful use of the standard (beyond mere statements that such-and-such companies use it for such-and-such purposes).

It is these problems that must be tackled in order to promote the standard as a best practice. Artificially opposing it to “ordinary” semantic technologies can only hinder this process.

P.S. Thanks to Viktor Agroskin for clarifying the example of the ISO 15926 model.

Source: https://habr.com/ru/post/178973/
