
Elements of the semantic web

The structure of the modern information society keeps growing more complex, and with it the demands on the efficiency of information processing algorithms. The most popular fields in this area today are Data Mining (DM), Knowledge Discovery in Databases (KDD), and Machine Learning (ML). Together they provide a theoretical and methodological basis for studying, analyzing, and understanding huge amounts of data.
However, these methods are not enough when the data itself is structured as poorly for machine analysis as it historically has been on the Internet.
To solve this problem, a global initiative is under way to reorganize the structure of Internet data and turn it into the Semantic Web, which enables effective search and analysis of data by both humans and software agents.
This article reviews the main technologies that make the Semantic Web possible.


The most significant drawback of the Internet's existing structure is that it makes almost no use of machine-readable data representation standards: all information is intended primarily for human perception. For example, to find the working hours of a family doctor, a person only has to go to the clinic's site and find the doctor in the list of practicing physicians. What is easy for a person, however, is almost impossible for a software agent to do automatically, unless the agent is built around the rigid structure of that specific site.

Knowledge Disposition Process

To solve such problems, ontologies are used: they describe a subject domain in terms a machine can understand and make it possible to use mobile agents effectively.
With this approach, each page carries, in addition to the information a person sees, service information (metadata) that lets software agents use the data effectively.
Ontologies, in turn, are an integral part of the global vision of taking the Internet to a new level, called the Semantic Web (SW).

A stack of semantic web concepts

The most important concepts of the Semantic Web


Achieving a goal as complex as the global reorganization of the World Wide Web requires a whole set of interrelated technologies. The figure above shows the general structure of Semantic Web concepts. A brief description of the key technologies follows.

Semantic web


The concept of the Semantic Web is central to the modern understanding of how the Internet will evolve. The expectation is that in the future, data on the network will be presented both as ordinary pages and as metadata, in roughly equal proportion, which will let machines use it for logical inference and realize the full benefits of ML methods. Uniform Resource Identifiers (URIs) and ontologies will be used everywhere.
However, not everything is so rosy: there are doubts about whether the Semantic Web can be fully realized. The main arguments of the skeptics are:
• The human factor: people can lie, be too lazy to add metadata, or use incomplete or simply incorrect metadata. One solution is to use automated tools for creating and editing metadata.
• Excessive duplication of information, since each document must carry a complete description for both the person and the machine. This is partly solved by the introduction of microformats.

Besides the metadata itself, the most important part of the SW is semantic Web services. They are the data sources for Semantic Web agents: they are designed from the start to interact with machines and have the means to advertise their capabilities.

URI (Uniform Resource Identifier)


A URI is a uniform identifier for any resource. It is a unique character string that can identify either a virtual or a physical object. The best-known kind of URI today is the URL, which identifies a resource on the Internet and additionally contains information about the resource's location.
Basic URI format
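
As a rough illustration of the basic URI format, the sketch below splits two URIs into their components with Python's standard urllib.parse module. Both URIs are made-up examples (the example.org address and the ISBN are assumptions chosen only for illustration); the second one is a URN, which identifies a resource without saying where it is located.

```python
from urllib.parse import urlsplit

# Hypothetical URL: identifies a resource and also says where to find it.
url = "http://www.example.org:8080/clinic/doctors?name=smith#hours"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.hostname)  # www.example.org
print(parts.port)      # 8080
print(parts.path)      # /clinic/doctors
print(parts.query)     # name=smith
print(parts.fragment)  # hours

# Hypothetical URN: a URI that names a resource but carries no location.
urn = urlsplit("urn:isbn:0451450523")
print(urn.scheme)      # urn
print(urn.netloc)      # (empty string: no host part)
print(urn.path)        # isbn:0451450523
```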

Ontologies


As applied to Machine Learning, an ontology is a structure, a conceptual schema that describes (formalizes) the meanings of the elements of some subject domain. An ontology consists of a set of terms and rules that describe the relationships between them.
Typically, ontologies are built from instances, concepts, attributes, and relationships.

Since points of intersection can be established between different ontologies, ontologies make it possible to look at one subject domain from different points of view and, depending on the task, to work at different levels of detail. The level of detail of an ontology is a key concept. For example, to indicate the color of a traffic light it is sometimes enough to say "green", whereas for the color of a car's paint even a description as detailed as "dark green, close in tone to pine needles" may not be enough.

Consider the general scheme for using ontologies.

Part of possible address ontology

Here is an example of a possible rule in the address ontology. With this ontology, to send a letter to an American university it is enough to give its name: the software agent finds the address from the standard address information on the university's site. If the letter is meant for a particular department, the agent retrieves the list of all departments from the university's site, selects the right one, and takes the address from that department's site. Then, using the ontology above, the program determines the address format used in the USA.

A computer does not understand the information in the full sense of the word, but ontologies let it use the available data far more efficiently and meaningfully.


Of course, many questions remain. For example, how does the agent find the site of the required university in the first place? Tools for this have already been developed, such as the Web services ontology OWL-S, which lets services advertise their capabilities.

Taxonomy


A taxonomy is one way of implementing an ontology. A taxonomy defines the classes into which the objects of a subject domain are divided and the relationships that exist between those classes. Unlike ontologies in general, taxonomies have a clearly bounded task: the hierarchical classification of objects.
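
A hierarchical classification of this kind can be sketched as a simple child-to-parent map. The class names below are hypothetical and serve only to show how an "is a" question is answered by walking up the hierarchy; this is a minimal sketch, not a representation used by any particular tool.

```python
# Hypothetical taxonomy: each class maps to its direct parent (None for the root).
PARENT = {
    "Vehicle": None,
    "Car": "Vehicle",
    "Truck": "Vehicle",
    "SportsCar": "Car",
}


def ancestors(cls):
    """Return the chain of superclasses of cls, nearest first."""
    chain = []
    parent = PARENT[cls]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_a(cls, other):
    """True if cls is the same class as other or one of its descendants."""
    return cls == other or other in ancestors(cls)


print(ancestors("SportsCar"))        # ['Car', 'Vehicle']
print(is_a("SportsCar", "Vehicle"))  # True
print(is_a("Truck", "Car"))          # False
```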

Modern ontology description languages


RDF (Resource Description Framework) is a language for describing resource metadata. Its main purpose is to represent statements in a form that is equally well understood by humans and machines.
The atomic unit in RDF is a triple: subject, predicate, object. The assumption is that any object can be described in terms of simple properties and the values of those properties.
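
To make the triple model concrete, here is a minimal sketch that stores triples as plain Python tuples and queries them by pattern. It does not use a real RDF library; the example.org namespace, the property names, and the match helper are all assumptions invented for this illustration.

```python
# Hypothetical namespace; every URI below is made up for illustration.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) tuple.
triples = {
    (EX + "doctor/smith", EX + "terms/worksAt", EX + "clinic/1"),
    (EX + "doctor/smith", EX + "terms/officeHours", "Mon-Fri 09:00-17:00"),
    (EX + "doctor/jones", EX + "terms/worksAt", EX + "clinic/1"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Who works at clinic 1?
staff = [s for s, _, _ in match(triples, p=EX + "terms/worksAt", o=EX + "clinic/1")]
print(staff)

# What are Dr. Smith's office hours?
hours = match(triples, s=EX + "doctor/smith", p=EX + "terms/officeHours")[0][2]
print(hours)  # Mon-Fri 09:00-17:00
```

This is the kind of lookup the family doctor example needs: the agent asks for a property of a resource instead of scraping a page layout.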
Sample table with highlighted parameters


The part before the colon must be a Uniform Resource Identifier (URI); to save traffic, however, it can be replaced by a short namespace prefix.
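
The sketch below shows how such a shortened name might be expanded back into a full URI. The rdf and dc entries are the standard RDF and Dublin Core namespaces; the ex prefix and the expand helper are hypothetical and exist only for this example.

```python
# Prefix table. "rdf" and "dc" are well-known namespaces;
# "ex" is a hypothetical namespace used only in this sketch.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://example.org/terms/",
}


def expand(name):
    """Expand a 'prefix:local' name into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {name!r}")
    return PREFIXES[prefix] + local


print(expand("dc:title"))    # http://purl.org/dc/elements/1.1/title
print(expand("rdf:type"))    # http://www.w3.org/1999/02/22-rdf-syntax-ns#type
print(expand("ex:worksAt"))  # http://example.org/terms/worksAt
```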
To make RDF easier for people to read, RDF schemes are also often presented as graphs.

Example of an RDF diagram in the form of a graph
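
A set of triples maps directly onto a labeled directed graph: subjects and objects become nodes, predicates become edge labels. The following sketch builds an adjacency map from a few triples and also renders them as Graphviz DOT text. The prefixed names and the helper functions are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical triples written with short prefixed names.
triples = [
    ("ex:smith", "ex:worksAt", "ex:clinic1"),
    ("ex:smith", "ex:name", "John Smith"),
    ("ex:clinic1", "ex:locatedIn", "ex:springfield"),
]


def to_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return dict(graph)


def to_dot(triples):
    """Render the triples as Graphviz DOT text with labeled edges."""
    lines = ["digraph rdf {"]
    for s, p, o in triples:
        lines.append(f'  "{s}" -> "{o}" [label="{p}"];')
    lines.append("}")
    return "\n".join(lines)


adjacency = to_adjacency(triples)
print(adjacency["ex:smith"])
print(to_dot(triples))
```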

OWL (Web Ontology Language) is a Web ontology language created to represent the meaning of terms and the relationships between those terms in vocabularies. Unlike RDF, it works at a higher level of abstraction, which lets the language use an additional terminological vocabulary alongside its formal semantics.
An important advantage of OWL is that it rests on a clear mathematical model: description logics.
OWL's place in the general structure of the Semantic Web from the point of view of the W3C consortium

In order of increasing expressiveness, there are three OWL dialects: OWL Lite, OWL DL, and OWL Full.

Currently, OWL is the main tool for describing ontologies.

Software (mobile, user) agents (SA)


In this subject domain, a software agent (SA) is a program that acts on behalf of the user and independently collects information over some, possibly long, period of time. Equally important is an agent's ability to interact with other agents and services to achieve its goal.
Unlike search engine bots, which simply scan ranges of Web pages, agents move from server to server: an agent is destroyed on the sending server and recreated on the receiving server with the full set of information it has collected so far. This model lets the agent use data sources that are available to the server but not accessible through the Web interface.
Clearly, the receiving server must run a service that accepts the agent and handles its requests. It is also important to attend to the security and integrity of agents. For this, a sandbox approach is used: the agent works in a safe environment with limited rights and limited ability to affect the system.
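
The migration step described above can be sketched as serializing the agent's state, discarding the original instance, and rebuilding it on the other side. This is only an illustration under strong assumptions: the Agent class, its fields, and the use of JSON as the transfer format are all hypothetical, and a real agent platform would also have to handle transport, authentication, and sandboxing.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Agent:
    """Hypothetical mobile agent: a goal plus the facts gathered so far."""
    owner: str
    goal: str
    collected: list = field(default_factory=list)

    def dump(self) -> str:
        """Serialize the full state for transfer to another server."""
        return json.dumps(asdict(self))

    @classmethod
    def load(cls, payload: str) -> "Agent":
        """Recreate the agent from a serialized state."""
        return cls(**json.loads(payload))


# On the sending server: collect some data, then serialize and destroy.
agent = Agent(owner="alice", goal="find doctor working hours")
agent.collected.append({"source": "server-a", "fact": "Dr. Smith works Mon-Fri"})
payload = agent.dump()
del agent

# On the receiving server: recreate the agent with everything it had collected.
restored = Agent.load(payload)
restored.collected.append({"source": "server-b", "fact": "Clinic opens at 09:00"})
print(restored.goal, len(restored.collected))
```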
By implementation, agents are divided into ordinary and learning agents.
The former are designed to perform well-defined tasks, while the latter are built for flexibility and are usually based on neural networks. Neural networks let the agent adapt continuously to the user's requirements and interact with the Internet more effectively.

Microformats


Microformats are an attempt to create semantic markup for various entities on Web pages that is equally well understood by humans and machines. Information in a microformat requires no technologies or namespaces beyond plain (X)HTML. A microformat specification is simply an agreement on naming conventions for the classes of page markup elements, each of which holds the corresponding data.
As an example, consider the hCalendar format.
This microformat is a subset of the iCalendar format (RFC 2445) and is designed to describe the dates of future or past events so that search agents can aggregate them automatically.

<div class="vevent">
  <a class="url" href="http://www.web2con.com/">
    http://www.web2con.com/
  </a>
  <span class="summary">
    Web 2.0 Conference
  </span>:
  <abbr class="dtstart" title="2007-10-05">
    October 5
  </abbr>
  -
  <abbr class="dtend" title="2007-10-20">
    19
  </abbr>,
  at the
  <span class="location">
    Argent Hotel, San Francisco, CA
  </span>
</div>


This example shows how to create a root container for an event (class="vevent") and associate the event with specific dates in the standard ISO date format.
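
To show how a search agent might read such markup, the sketch below extracts the event fields from the hCalendar snippet using only Python's standard html.parser module. The HCalendarParser class is a hypothetical helper written for this illustration; it handles just the five class names used in the example and is not a complete hCalendar implementation.

```python
from html.parser import HTMLParser

# The hCalendar example from the text, compacted onto fewer lines.
HTML = """
<div class="vevent">
  <a class="url" href="http://www.web2con.com/">http://www.web2con.com/</a>
  <span class="summary">Web 2.0 Conference</span>:
  <abbr class="dtstart" title="2007-10-05">October 5</abbr>-
  <abbr class="dtend" title="2007-10-20">19</abbr>,
  at the <span class="location">Argent Hotel, San Francisco, CA</span>
</div>
"""


class HCalendarParser(HTMLParser):
    """Hypothetical helper: collects the fields of a single vevent."""

    def __init__(self):
        super().__init__()
        self.event = {}
        self._field = None  # text field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "url" in classes and attrs.get("href"):
            self.event["url"] = attrs["href"]
        elif "dtstart" in classes or "dtend" in classes:
            # Machine-readable ISO dates live in the title attribute.
            name = "dtstart" if "dtstart" in classes else "dtend"
            self.event[name] = attrs.get("title", "")
        elif "summary" in classes or "location" in classes:
            self._field = "summary" if "summary" in classes else "location"
            self.event[self._field] = ""

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field:
            self.event[self._field] += data


parser = HCalendarParser()
parser.feed(HTML)
parser.close()
event = {key: value.strip() for key, value in parser.event.items()}
print(event)
```

The human-readable text ("October 5", "19") is ignored for the dates; the agent relies on the ISO values in the title attributes instead.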

Currently, the most common microformats are


There are many new developments in this area; for example, the automatic construction of classifiers can use different levels of ontologies depending on the data under study.
This article is an attempt to combine information from various sources to give an idea of the general structure of Semantic Web development.

Source: https://habr.com/ru/post/79210/

