WRIO Internet OS. Architecture: Linked Data and JSON-LD

Following the results of the survey in “WRIO Internet OS. Introduction,” this is the first post in a series covering the technical details. It will be of interest to developers who want to use the following technologies in their projects: JSON-LD, blockchain, Node.js and React. At the end of the post you will find a survey that will help us learn which topic the Habr community would find useful and interesting next.

Introductory video about the project:
www.youtube.com/watch?v=JUiMijJ6tEg English version
www.youtube.com/watch?v=DxA6t2kax_k Russian version

Today's topic: Linked Data and JSON-LD. Drawing on our own experience, we will explain why this format is interesting and what advantages it offers.


Linked Data, JSON-LD

What are Linked Data and JSON-LD? This topic has already been discussed many times on Habr, but we will try to add some details and show ways to apply it.

Wikipedia tells us that:

Linked Data is a collection of interconnected datasets on the world wide web. This term can also be understood as a description of the methods of publishing related structured data. These methods are based on web standards: HTTP, RDF and URIs and allow you to distribute information in a machine-readable form. This makes it possible to work with data from different sources, including building queries.

Building queries opens the way for another feature - the consistency of data, which we describe below.

In his article Design Issues: Linked Data, Sir Timothy John Berners-Lee outlines four basic principles of Linked Data:

use URIs to identify entities
use HTTP URIs so that these entities can be looked up by both people and software clients
when a URI is dereferenced, provide useful information about the entity using standards such as RDF and SPARQL
when publishing data on the web, include links to related entities in the description, using the URIs of those entities

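The Web enables us to link related documents. Similarly it enables us to link related data. The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. Key technologies that support Linked Data are URIs (a generic means to identify entities or concepts in the world), HTTP (a simple yet universal mechanism for retrieving resources, or descriptions of resources), and RDF (a generic graph-based data model with which to structure and link data that describes things in the world).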
Contributor: Tom Heath, including excerpts from Bizer, Heath and Berners-Lee (2009) (PDF)

The idea can be reduced to one sentence: Linked Data is not a collection of separate files but a set of pages linked into a single semantic network.

If you are a web developer, you have probably noticed more than once that, because data and presentation are mixed, processing information from a third-party site means studying its API or writing a parser. Today's web pages are also easy for humans to read, but not for machines and search services such as Google, which have to parse each page and try to analyze what they receive. To rank higher in search results, some sites separately prepare data for search robots in a form the robots understand. This is usually JSON, and then no parsing is needed. However, this approach takes additional effort and money to develop and maintain.

The idea of WRIO OS is that the same site structure should suit both users and machines. As a result, we keep the data clean: minimal HTML markup and none of the usual extra elements needed to render the page for the user. That functionality is implemented separately by WRIO OS (see below).

To show how search robots “see” JSON-LD pages, we prepared examples with a JSON-LD content renderer:
https://wrioos.com/jsonld-vis/view/?https://webrunes.com
https://wrioos.com/jsonld-vis/view/?https://aa.wr.io/ru/ru/

Each such page is atomic data: it represents the entire set of data about a single, indivisible entity (a person, organization, book, film, song, etc.) and is built on one of the schemas in JSON-LD format. The format is textual, simple and in many respects similar to JSON. It allows free, decentralized processing and reading of pages and is well suited to building all sorts of services on a shared pool of linked data. That is, you can read and use files on third-party servers as easily as if they were on your own (open data, see below).
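To make the idea of an atomic page more concrete, here is a minimal sketch in Python. The page content, the example.org URLs and the `linked_ids` helper are illustrative assumptions, not the actual WRIO schema; the field names are borrowed from the schema.org vocabulary.

```python
import json

# A hypothetical page describing one entity (an article).
# The URLs and values are made up for illustration.
PAGE = """
{
  "@context": "https://schema.org",
  "@type": "Article",
  "@id": "https://example.org/articles/linked-data",
  "name": "Linked Data in practice",
  "author": {"@type": "Person", "@id": "https://example.org/people/alice"},
  "mentions": [
    {"@type": "Article",
     "@id": "https://example.com/articles/json-ld",
     "name": "What is JSON-LD?"}
  ]
}
"""


def linked_ids(node, own_id=None):
    """Recursively collect the @id of every nested entity except the page itself."""
    found = []
    if isinstance(node, dict):
        node_id = node.get("@id")
        if node_id and node_id != own_id:
            found.append(node_id)
        for value in node.values():
            found.extend(linked_ids(value, own_id))
    elif isinstance(node, list):
        for item in node:
            found.extend(linked_ids(item, own_id))
    return found


page = json.loads(PAGE)
links = linked_ids(page, own_id=page["@id"])
# links now holds the URLs of the author and of the mentioned article
print(links)
```

Because the page is plain JSON, the standard `json` module is enough to read it and to follow its links to other entities, including those hosted on other servers.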

In addition, using static JSON-LD files instead of a database provides many other benefits:

no database to install and maintain; files are connected to each other through links in the form of mentions
content consistency (see below)
a file contains only data and is therefore small. In addition, JSON-LD is text, so it compresses well, which further reduces traffic.
text format = security. All scripts can be ignored except the one that launches the node (see the section “How it works in practice”)
the files are easy to encrypt end-to-end, which protects against wiretapping and surveillance
the text format makes a P2P internet possible by transferring files via DHT
caching in the browser. The system can work offline-first.
decentralization. Anyone can help extend the JSON-LD pool simply by adding new files and linking them to existing ones.
supporting and distributing static data is simple, cheap and reliable: if a server fails, only the part of the data stored on that server can be lost
search engines understand the data well, which can improve a page's ranking, and the data can be shown in search results as rich snippets
and the main feature: it opens the door to the semantic web, where SEO is no longer needed
semantics makes it possible to create new types of services and offers new opportunities for recommender systems, personalized advertising, etc.
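The point about compression is easy to check. The sketch below builds a hypothetical page with fifty mentions (the structure and URLs are assumptions for illustration) and compares its size before and after gzip, using only the Python standard library.

```python
import gzip
import json

# A hypothetical hub page that mentions fifty other pages.
mentions = [
    {
        "@type": "Article",
        "@id": f"https://example.org/articles/{i}",
        "name": f"Article {i}",
    }
    for i in range(50)
]
page = {
    "@context": "https://schema.org",
    "@type": "Article",
    "name": "Hub",
    "mentions": mentions,
}

raw = json.dumps(page).encode("utf-8")
packed = gzip.compress(raw)

print(f"{len(raw)} bytes raw, {len(packed)} bytes gzipped")
```

The exact ratio depends on the content, but repeated keys such as `@type` and `@id` are precisely the kind of redundancy that general-purpose compressors remove.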

The drawbacks are addressed by one of our projects, Taglang, which will provide the database-like functionality for working with JSON-LD. We will cover it separately if the Habr community is interested. The project is built on a blockchain and is a publicly accessible decentralized database of links to JSON-LD files with tags and metadata.

The logical continuation of Linked Data is open data, the idea that data should be freely available for machine-readable use and republishing without restrictions from copyright, patents and other control mechanisms (Wikipedia, En). Open data, in turn, is closely related to the concept of Open Content (En) and is a necessary element of a free society beyond the framework of outdated copyright. It opens the way to the new paradigm of Open Copyright, another of our projects, which we will come to in due course.

Why is open data important? Tim Berners-Lee answered this question as follows:

“The problem is the dominance of one search engine, one big social network, one Twitter for microblogging. We don’t have a technology problem, we have a social problem.” The problem is not Google, or Amazon, or Facebook. Follow-up winner-take-all distribution. It is a structural distribution independent of the type of service. Making a new service will not change that.

Loosely paraphrased: the problem is the dominance of one search engine, one social network, one Twitter for microblogging. The problem is not technical but social, and it is not about Google, Amazon or Facebook. The problem is that users who hand their data to centralized services feed a “winner takes all” effect. This is a structural problem of how data is distributed, and creating yet another service will not fix it.

In other words, the solution is a free data pool that is not owned by any single service, is accessible to everyone and is stored in a decentralized way, as interconnected JSON-LD files spread across the web.

Now back to consistency, which we mentioned at the beginning of the article in the context of building queries. We will cover the technical details in one of the following posts if readers are interested. For now we only describe the problem that data consistency (or content consistency) solves.

Today, if you want to add text from one page to another, in most cases you can only copy and paste it. Naturally, when the original changes, your insertion becomes outdated, because from the moment of its creation it is a piece of text living separately from its source. Simply copying text may also infringe the copyright of perpetually dissatisfied rights holders. Data consistency solves the first problem, keeping the copied passage connected to the original: it is enough to specify where the quotation begins and ends in the source file, and instead of a copy, the text will display the corresponding part of the original file. Graphically, this can be represented as:

Note that the quoted JSON-LD file has to be downloaded in full for the insertion to work, but even large articles are only tens of kilobytes, so this is hardly a significant problem.
The bottom line: data consistency solves the problem of inserting information from one source into another, a sort of global oEmbed (En) that requires no effort from the authors.
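The quoting mechanism described above can be sketched as follows. This is only an illustration under stated assumptions: the `Quote` type, the use of character offsets as boundaries, and the in-memory `WEB` dictionary standing in for HTTP requests are all hypothetical, since the post does not specify how WRIO OS stores quotation boundaries.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    """A reference to a passage in another page instead of a copy of it."""
    source: str  # URL of the quoted page
    start: int   # offset of the first quoted character (assumed boundary format)
    end: int     # offset just past the last quoted character


# Stand-in for the web: URL -> parsed JSON-LD page (hypothetical content).
WEB = {
    "https://example.org/a": {
        "@type": "Article",
        "text": "Linked data connects pages into one network.",
    }
}


def resolve(quote, fetch=WEB.get):
    """Load the whole source page and cut out the quoted part."""
    page = fetch(quote.source)
    return page["text"][quote.start:quote.end]


quote = Quote("https://example.org/a", 0, 11)
before = resolve(quote)

# The author of the original fixes the capitalization...
WEB["https://example.org/a"]["text"] = "Linked Data connects pages into one network."
# ...and the quoting page picks up the change without being edited.
after = resolve(quote)

print(before, "->", after)
```

Plain character offsets break as soon as text is inserted before the quoted passage, so a real implementation would need more robust boundaries; the sketch only shows that the quoting page stores a reference and the text itself always comes from the original.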

Open Content, which we mentioned earlier, solves the copyright problem: in our case the author receives a share of donations and is therefore interested in having the text used on other pages. We will say more in the post on Open Copyright, but even without it, many people today are willing to invest their effort, knowledge and time in creating freely available information, as Wikipedia's experience confirms.

To summarize: instead of separate sites with closed information, we get a common pool of data that anyone can help develop, extend and process, and the ability to write queries on the command line without needing an API opens up bright prospects for new types of interconnected services. Presentation is not mixed with data, and the data itself is stored in an open, structured and widely used text format that is easy to store, process, encrypt and transmit.

If you want to learn more about Linked Data and JSON-LD, we have selected some links on the topic (in English):
http://json-ld.org/ Data is messy and disconnected. JSON-LD organizes and connects it, creating a better Web.
http://linkeddata.org/ and http://linkeddata.org/guides-and-tutorials
https://developers.google.com/search/docs/guides/intro-structured-data
https://www.youtube.com/watch?v=4x_xzT5eF5Q What is Linked Data?
https://www.youtube.com/watch?v=vioCbTo3C-4 What is JSON-LD?
https://developers.google.com/structured-data/testing-tool/ check JSON-LD
http://www.markus-lanthaler.com/research/on-using-json-ld-to-create-evolvable-restful-services.pdf
SEO:
http://www.seoskeptic.com/json-ld-google-knowledge-graph-schema-org-seo/
http://manu.sporny.org/2013/json-ld-google-search/
https://ignitevisibility.com/everything-to-know-about-json-ld-for-seo/

How it works in practice

To create a page in JSON-LD format, you can use our service (it is still in development, may occasionally go offline, and currently offers only basic functionality) or any text editor. Our pages still contain many empty fields, which the Google testing tool reports as errors; this will be corrected over time. Nevertheless, you can already use https://webrunes.com/ru as an example. Its repository is here: https://github.com/webRunes/webrunes.com-WRIO-Hub . An edited JSON-LD file can be saved on any server (GitHub, Dropbox, Google Drive, etc.) to get your own hub.

HTML with JSON-LD inside. Black arrows are mention links, which often lead to other pages on third-party servers.
Each file (1) contains a link (2) of the form
```
<body><script type="text/javascript" src="https://wrioos.com/start.js"></script></body> 
```
which leads to one of the WRIO OS nodes (anyone can set up their own). This link is only needed for new users; otherwise it is ignored and the node selected by the user is used.
start.js processes the JSON-LD and dynamically generates a page (3) based on the selected theme. So far we have only one: https://github.com/webRunes/Default-WRIO-Theme
In addition, the script adds functionality provided by separate Node.js applications (microservices): authorization, commenting, payments, and so on. Each such application can be viewed as a plugin, and users will be able to assemble their own set of applications.
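The flow above (read the JSON-LD from the file, then build the visible page from it) can be sketched in Python. This is not the start.js implementation: the sketch assumes the JSON-LD is embedded in a `<script type="application/ld+json">` element, which is the common convention, and the `JsonLdExtractor` class and `render` function are hypothetical stand-ins for the real script and theme.

```python
import html
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects every JSON-LD document embedded in an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def render(doc):
    """A toy 'theme': turns the data into markup for the reader."""
    return f"<h1>{html.escape(doc['name'])}</h1>"


# A hypothetical file: data in JSON-LD plus the link to a WRIO OS node.
PAGE = """<!DOCTYPE html>
<html>
<head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "name": "Hello, WRIO"}
</script>
</head>
<body><script type="text/javascript" src="https://wrioos.com/start.js"></script></body>
</html>"""

parser = JsonLdExtractor()
parser.feed(PAGE)
parser.close()
documents = parser.documents
rendered = [render(doc) for doc in documents]
print(rendered)
```

The file itself stays free of presentation markup; everything the reader sees is produced from the data at display time, so a different theme only needs a different `render`.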

If you have any questions, please leave them in the comments or email us at info@webrunes.com and we will answer.

Thank you for your time and attention.

Source: https://habr.com/ru/post/304298/
