
How will we interact with the data network?


The semantic web is a shared information space of linked data, intended for machines rather than people. Is that so? Yes and no. Machine-readable data with precise semantics, published online, together with the ability to link data across distributed sets, are indeed the defining features of the semantic web. Together, these features let us collect and combine disparate data on an unprecedented scale, with machines doing all the routine work for us.

However, all this is meaningless without a person to reap the benefits of the emerging opportunities. The network of machine-readable data (the semantic web, or data network) is far from removing people from the process. On the contrary, it opens up great prospects for interaction between people and machines.
To date, the semantic web community has been mainly engaged in developing the technical infrastructure that makes the data network feasible in principle, and in publishing linked data sets to fill it with content. If we want to make full use of the prospects and capabilities of the data network, we need to move beyond this initial stage and work on understanding how the paradigm of user interaction with the network is changing.

In this article, I will look at some of the ways in which our interaction with the data network may differ from our interaction with the existing network of documents, and what this may mean for both users and content creators.

Semantic Web: from vision to reality


In 1999, Jakob Nielsen wrote about an emerging crisis: the network was growing at an incredible speed, and he predicted that without more attention to user interface principles it would become a useless dump of documents. Almost 10 years have passed since then, and the network is entering a new phase of its development. The data network, or semantic web, anticipated for more than a decade, is now here, the result of many years of work on its underlying technologies. Although we can treat them as different concepts, the data network is another step in the development of the web we know rather than something completely different from the existing network of hypertext documents.

Today, even growth statistics are no longer given in terms of pages or sites. Instead, people talk about the number of triples published on the data network using the Resource Description Framework (RDF) model, and the number of links those triples create between different data sets.

RDF is a W3C specification for making statements about entities in machine-readable form. Each statement consists of three parts, the subject, the predicate, and the object, and is therefore called a triple. In most cases the subject of a triple is a uniform resource identifier (URI), which can identify anything the data's creator wants: a person, a place, a document on the network, an abstract concept, anything at all. The predicate defines the nature of the relationship between the subject and the object; it is taken from vocabularies published on the network and is itself identified by a URI. The object of an RDF triple is usually a string literal or another URI. If the object is a URI from another namespace, that is, it identifies something in another data set, then the triple creates a link between those sets, joining isolated islands of data into a giant distributed repository built on the architecture of the Internet. This is the real data network.
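The structure of a triple, and the way a triple can link two data sets, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Triple` type and the `links_datasets` helper are hypothetical names, the `example.org` and `example.net` URIs are invented, and "different data set" is approximated here as "different host name", which is cruder than a real namespace comparison.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI of a vocabulary term naming the relationship
    obj: str        # a literal value or another URI


def is_uri(value: str) -> bool:
    """Crude check: treat anything starting with http(s):// as a URI."""
    return value.startswith(("http://", "https://"))


def links_datasets(triple: Triple) -> bool:
    """True when the object is a URI on a different host than the subject."""
    if not is_uri(triple.obj):
        return False
    return urlparse(triple.subject).netloc != urlparse(triple.obj).netloc


# Hypothetical example data; none of these entity URIs come from the article.
name = Triple("http://people.example.org/alice",
              "http://xmlns.com/foaf/0.1/name",
              "Alice")
based_near = Triple("http://people.example.org/alice",
                    "http://xmlns.com/foaf/0.1/based_near",
                    "http://places.example.net/london")

print(links_datasets(name))        # False: the object is a literal
print(links_datasets(based_near))  # True: the object lives in another data set
```

The second triple is the interesting one: its object belongs to a different data set, so the statement doubles as a link between the two.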

When participants in the original Linking Open Data project last tried to calculate the size of the data network, their conservative estimates showed that its data sets contained more than two billion RDF triples, three million of which were links between sets. The network is growing so fast that any estimate seems out of date by the time it is published.

Another feature of RDF is that you can combine triples contained in any number of documents distributed over the network. Source documents can be merged painlessly, with no need for the resulting graph to conform to any particular schema. One consequence is far less headache when integrating heterogeneous data.
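To make the merging idea concrete, here is a minimal sketch in which each document's graph is modeled as a Python set of (subject, predicate, object) tuples, so that merging is simply set union. The `merge` and `describe` functions and all entity URIs are assumptions made for illustration, and the sketch ignores blank nodes, which a real RDF merge has to handle.

```python
# Each graph is a set of (subject, predicate, object) tuples.
ALICE = "http://people.example.org/alice"  # hypothetical URI

doc_a = {
    (ALICE, "http://xmlns.com/foaf/0.1/name", "Alice"),
}
doc_b = {
    (ALICE, "http://xmlns.com/foaf/0.1/name", "Alice"),  # same statement again
    (ALICE, "http://xmlns.com/foaf/0.1/homepage", "http://blog.example.com/"),
}


def merge(*graphs):
    """Merge any number of graphs; set union drops duplicate triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect every (predicate, object) pair the graph holds for one subject."""
    return {(p, o) for s, p, o in graph if s == subject}


combined = merge(doc_a, doc_b)
print(len(combined))  # 2: the duplicated name triple is counted once
```

Neither source had to agree on a schema beforehand; the only shared convention is the triple itself and the URIs used inside it.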

Throw out your homepage!


In the network of documents, individuals and organizations often devote a lot of attention to designing visually appealing websites that create the right impression on their target audience. But if RDF lets you combine data from multiple sources into a consistent picture of some entity, how will this affect the way we publish data on the network? The result will be that web pages, in the form we are used to today, simply disappear.

The developers of Web 2.0 mashups have been demonstrating this for some time, combining data from several sources to present it in a new form that none of the original sources could offer on its own. The data network is a logical extension of this: it lets developers create links between data sources on the network so that others can use them to build large-scale, special-purpose mashups, and at the same time it eases the integration of heterogeneous data.

Documents will always be useful as data repositories, but in many cases, I believe, that will be the limit of their role. On the semantic web, you cannot control how the information you publish will be presented: it's just data. As far as visual design is concerned, RDF continues the long-standing principle of separating content from presentation. This may alarm some content creators: how can you maintain a brand with less control over presentation? For others, it is a chance to stop worrying about appearance and focus on publishing relevant, high-quality data, giving everyone the opportunity to build whatever view they want rather than settle for one that someone else prepared.

At the data level, creators may have some influence over what their data is linked to, mainly by creating those links themselves and publishing them for others to use. However, in a data network, no one can control with any degree of confidence which sources their data ends up associated with. As a result, data can be reused, and this is exactly what we want! As already described, data published on the network in a reusable form makes it possible to create new views whose value is greater than the sum of their parts, and which the creators of the original data could not have anticipated.

It is for these reasons that I propose abandoning home pages. Researchers know well how hard it is to connect all the pieces of their professional activity into a single whole: projects, papers, memberships of committees and editorial boards, blog entries, and photo albums are scattered across isolated islands on the network. They may be copied to a personal website or connected by hyperlinks, or, given the effort involved, they may not.

A home page on a data network can take many forms. In the simplest case, it may be just a collection of RDF triples that link together the data we want to present, wherever it is scattered. Collecting this data into a single representation suitable for human use is the machine's job.

To practice what I preach, the next time I print business cards I will put on them not the address of my home page but my URI, confident that a person with a browser, semantic or not, will be able to look up this URI and find what the network knows about me.

What should a semantic browser look like?


Taking these ideas further, the document that holds an RDF graph serves first of all to indicate where the data came from, rather than acting as a rigid container for it.

What matters more than the documents themselves are the entities they describe: people, places, and concepts. I use the term “data network” here, but really as shorthand for “entity data network”, a network of data about arbitrary entities. We may not be able to retrieve a car over HTTP, but we can identify it with an HTTP URI and use the network to get a description of the car in the form of RDF.
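One common way to ask for such a description is HTTP content negotiation: the client requests the entity's URI and states in the `Accept` header that it wants RDF rather than HTML. The sketch below only builds the request and does not send it. The car URI and the `rdf_request` helper are hypothetical, and whether a given server honors these media types is an assumption.

```python
from urllib.request import Request

# Media types a client might ask for; a particular server may support others.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def rdf_request(entity_uri: str) -> Request:
    """Build (but do not send) a GET request asking for RDF about an entity."""
    return Request(entity_uri, headers={"Accept": RDF_ACCEPT})


# Hypothetical URI that identifies a car, not a document about the car.
req = rdf_request("http://cars.example.org/id/my-car")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; the response, if the server cooperates, is a document describing the entity rather than the entity itself.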

Data network browsers should operate at the entity level. Building simple browsers that display RDF triples and the documents containing them is one way for people to interact with this information space. We saw this approach in the early semantic web browsers, but it probably misses the point. Viewing one page at a time, as we are used to doing on the existing network of documents, throws away the potential of a combined presentation of data collected from many places.

Thus, semantic web browsers are not intended to simply display a low-level view of data. Instead, they must treat entities (in the widest sense) as basic interface elements. The entity in question should be the center of attention, while the browser collects and organizes information related to it in a transparent way for the user.

We see hints of a similar approach in such semantic browsers as Tabulator and DBpedia Mobile, where the entity in question is the focus of attention, and specific documents only provide pieces of data that together make up the full picture. Despite this movement in the right direction, we still have much to strive for.

Conventional browsers have, on the whole, failed to convey the original vision of the network as a medium for both reading and writing. Although that vision is gradually being realized through, for example, blogs, wikis, and special services with tag support such as Flickr, a considerable degree of indirection remains when it comes to editing documents on the network. In some cases the process still involves launching an HTML editor, making the necessary edits, and using another application (such as an FTP client) to publish the modified document.

Browsers for the semantic web, which I prefer to call "entity browsers", have a chance to offer far more scope for direct manipulation in their interfaces. Different types of objects imply different types of actions, and knowing the type of the object the user is focused on will let browsers offer a set of actions specific to that object, and perhaps even adapt them to the context.

For example, if a user is viewing information about a person, the browser may let the user send that person a message, share an item with them, or make an appointment, without the person having to state explicitly that any of these functions is available. Instead, the semantic web as a whole can supply the knowledge and capabilities needed to perform them: for example, a definition describing “making an appointment” as an action that can be performed on an entity of type “person”, a definition of what a meeting consists of, or suggestions for a meeting place based on the relationship between the participants and the time of day.
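The idea of choosing actions from an entity's type can be sketched as a lookup keyed by type URI. Everything below is an assumption made for illustration: the `ACTIONS_BY_TYPE` registry, the `actions_for` helper, the action names, and the example URIs. In the scenario described above this knowledge would come from the data network itself rather than from a hard-coded table.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PERSON = "http://xmlns.com/foaf/0.1/Person"
PLACE = "http://example.org/vocab/Place"  # hypothetical type URI

# Hypothetical registry: which actions make sense for which entity type.
ACTIONS_BY_TYPE = {
    PERSON: ["send message", "share item", "make appointment"],
    PLACE: ["show on map"],
}


def actions_for(graph, entity):
    """Return the actions offered for an entity, based on its rdf:type triples."""
    actions = []
    for s, p, o in sorted(graph):  # sorted for a stable order
        if s == entity and p == RDF_TYPE:
            actions.extend(ACTIONS_BY_TYPE.get(o, []))
    return actions


ALICE = "http://people.example.org/alice"  # hypothetical URI
graph = {(ALICE, RDF_TYPE, PERSON)}
print(actions_for(graph, ALICE))
```

The person described never declared that appointments are possible; the browser infers the offer from the type alone.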

Obviously, the data network does not let us operate on real things, such as cars or dogs, which are not and never will be online. However, on the data network we can refer explicitly to anything, not just to documents. This has great potential for reducing the level of indirection in network interfaces. We no longer have to link to web pages about entities; we can link to the entities themselves.

In case there is any doubt: this is not a passing fad but a direction whose realization will take years and may take various forms. Speaking at the World Wide Web conference in 2007, Bill Buxton of Microsoft Research stated that “in the near future, the variety of web browsers will match that of ink browsers in terms of differences in shape, function, location, and importance.” I did not get the impression that Buxton had the data network in mind when he said this, but the statement seems plausible nonetheless. A real network of entities will require a similar variety of interfaces through which we will use it. The browser is just one of the approaches.

Back button for semantic web?


Accepting the transition from documents to entities, and from predefined representations to dynamically created ones, will require not only completely new interfaces but also changes to interaction elements we already know. If browsing is no longer just moving from one document to another via links, but also involves a generalized view of data collected from various sources, the “back” button takes on a slightly different meaning. The browser should return the user not to the previous document but to the previous entity under consideration. More importantly, an “undo” button of the kind found in word processors may become critical in an environment where a huge amount of data can be collected with minimal effort, but not all of it suits the current task.
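An entity-level “back” button might be modeled as a history of entity URIs rather than of fetched documents. The `EntityHistory` class below is a hypothetical sketch, not a description of any existing browser, and the URIs are invented.

```python
class EntityHistory:
    """Tracks the entities a user has focused on, not the documents fetched."""

    def __init__(self):
        self._visited = []

    def focus(self, entity_uri):
        """Make an entity the current focus."""
        self._visited.append(entity_uri)

    @property
    def current(self):
        """The entity in focus, or None if nothing has been viewed yet."""
        return self._visited[-1] if self._visited else None

    def back(self):
        """Return to the previously viewed entity, if there is one."""
        if len(self._visited) > 1:
            self._visited.pop()
        return self.current


history = EntityHistory()
history.focus("http://places.example.net/london")  # hypothetical URIs
history.focus("http://people.example.org/alice")
print(history.back())  # back to the London entity
```

However many documents were retrieved to build the view of each entity, going back takes one step per entity.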

The range of potential sources providing data about an entity will be enormous. Imagine entering the URI of London in the address bar of your semantic web browser. All the information available on the network about London cannot fit into one interface. Users must decide which sources to add depending on the current task or context, or let the browser make that decision for them, with the option to undo the addition of particular sources. This functionality becomes even more important if automatic reasoning over semantic data on the network creates new knowledge that did not previously exist in explicit form in any of the individual sources.
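A minimal sketch of source selection with undo, assuming a view keeps the triples of each source separately so that any source can be dropped again. The `EntityView` class, its method names, and the source URLs are hypothetical.

```python
class EntityView:
    """A view of one entity assembled from a user-controlled set of sources."""

    def __init__(self, entity_uri):
        self.entity_uri = entity_uri
        self.sources = {}  # source URL -> set of triples from that source
        self._undo = []    # source URLs in the order they were added

    def add_source(self, url, triples):
        """Add a source's triples; a URL that is already present is ignored."""
        if url not in self.sources:
            self.sources[url] = set(triples)
            self._undo.append(url)

    def undo(self):
        """Remove the most recently added source and return its URL."""
        if not self._undo:
            return None
        url = self._undo.pop()
        del self.sources[url]
        return url

    def triples(self):
        """Union of the triples from the sources currently included."""
        merged = set()
        for graph in self.sources.values():
            merged |= graph
        return merged


LONDON = "http://places.example.net/london"  # hypothetical URI
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
view = EntityView(LONDON)
view.add_source("http://a.example/london.rdf", [(LONDON, LABEL, "London")])
view.add_source("http://b.example/london.rdf", [(LONDON, LABEL, "Londres")])
removed = view.undo()
print(removed, len(view.triples()))
```

Keeping triples grouped by source is the design choice that makes undo cheap; a view that merged everything into one set on arrival could not tell which statements to withdraw.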

Managing the set of data sources becomes a pressing issue. When several colleagues and I evaluated a demonstration of various semantic web technologies to delegates of the European Semantic Web Conference in 2006, one of the main issues was “integrity”. Delegates were shown various semantic web applications and expected the data to be combined and presented as a whole. For various reasons (described in other publications) this was impossible, which left the delegates disappointed.

Key to the development of data browsers are search services like Sindice, which provide a way to find other RDF documents on the semantic web that mention a given entity. Services of this kind can help ensure that the data a user receives is complete, that is, that it includes everything the user expects. But the question of checking whether a particular view of the data is useful remains open.
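What such a lookup service does can be approximated locally with an inverted index from entity URIs to the documents that mention them. This is not Sindice's API; `build_index`, the document URLs, and the entity URIs are all hypothetical, and a real service would crawl and index documents at a very different scale.

```python
from collections import defaultdict


def build_index(documents):
    """Map each URI used as a subject or object to the documents mentioning it."""
    index = defaultdict(set)
    for doc_url, triples in documents.items():
        for s, _p, o in triples:
            index[s].add(doc_url)
            if o.startswith(("http://", "https://")):  # skip literal objects
                index[o].add(doc_url)
    return index


LONDON = "http://places.example.net/london"  # hypothetical URI
documents = {
    "http://a.example/alice.rdf": {
        ("http://people.example.org/alice",
         "http://xmlns.com/foaf/0.1/based_near", LONDON),
    },
    "http://b.example/london.rdf": {
        (LONDON, "http://www.w3.org/2000/01/rdf-schema#label", "London"),
    },
}

index = build_index(documents)
print(sorted(index[LONDON]))  # both documents mention London
```

A browser could use such an index to discover candidate sources for the entity in focus, leaving the decision about which ones to include to the user or to a relevance model.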

Any system designed to integrate heterogeneous data in real time and present the result to the user will have to use sophisticated models of relevance, quality, and reliability that take into account the user's current task and its context. How this can be achieved remains a question for the future.

IEEE Internet Computing

Original (English): How Do We Interact with Web of Data?

Translation: daeq, dulanov, vvvolf, jupy (the article was chosen and translated within the webofdata.ru distribution group).

License: www.ieee.org/web/publications/rights/privacy.html


Source: https://habr.com/ru/post/47093/

