Embedding semantic data in HTML

I would also like to join the discussion of the semantic web that was started here and here.

I have spent some time researching the principles and trends of the semantic web, and I want to share my main findings and thoughts.

Why do I need it?

The answer is very simple: the need to separate the wheat from the chaff, i.e. “information” from “information noise”.
How can this qualitatively affect the web:

if you enter a query containing the name of some topic or news item into a search engine, you can see that 80% of the results are the same text “embedded” in the graphical interface of one site or another
focus on information, not on banners, lists of links, friends of friends, etc.
more accurate search, because only relevant content is taken into account
your own suggestion?

What do we have at the moment?

If the need for and advantages of the “semantic web” are more or less clear, the implementation raises some concerns.

At the moment we operate with concepts such as URIs (Uniform Resource Identifiers) and ontologies, which are described in languages such as RDF and OWL.

To be honest, my attempts to get to grips with these languages and the ways of using them failed: I found them hard to understand, ambiguous and in need of improvement. My search for working, understandable tools was also unsuccessful. In my view, this is the main factor holding back development in this direction.

There are also microformats, which, it seems to me, have advanced further conceptually, but unfortunately not far enough.

Of what I have seen, OpenCalais deserves attention: it lets you extract some semantic information from texts and web resources. The service can determine which category of knowledge (technology, education, politics, etc.) a text belongs to, extract terms, and return other similar information. For all its apparent elegance, it is still too early to use this service seriously.

Manual labor or automation?

The second limiting factor is that semantic data has to be entered manually, which raises the questions of who will do it and who will pay for it.

My opinion is this: automation can help, but one cannot rely on it fully, for the simple reason that understanding and the logical connections between concepts are a subjective assessment that cannot be formalized at this stage of development.

Problem statement and solutions

So, when creating a website, we draw a unique design, adapt it to all known search engines and web standards, lay out the pages for different browsers, choose hosting providers and promotion specialists and pay them money, and, most importantly, everyone considers this normal. So why can't the semantic component be part of this process?

From the site author's point of view, there is no sense in bothering with semantics, because:

it requires additional labor costs (this is not so bad)
it requires learning new standards and languages (the same RDF and OWL)
search engines support semantics weakly or not at all

The first item is largely a matter of money and can often be solved outright, and the third depends on the search leaders, so let us try to do something about the second.

Integration of semantic data

After analyzing (with a little imagination) various methods of integrating semantic data, some complex and some barely feasible, I settled on a simple and obvious one: integration in the form of tag attributes and/or in CSS notation.

Example:

<div id="content" xml:semantic="keywords: mathcad; contentType: content; category: math;">Mathcad is desktop software for performing and documenting engineering and scientific calculations.</div>

For the code to validate, we add the attribute declarations to the DOCTYPE:

<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
[
<!ATTLIST div keywords CDATA #IMPLIED>
<!ATTLIST div contentType (copyright|content|definition|links|references|bibliographic|related|image) #IMPLIED>
<!ATTLIST div category (Business_Finance|Entertainment_Culture|Environment|Health_Medical_Pharma|Hospitality_Recreation|Law_Crime|Politics|Sports|Technology_Internet|Weather|Other) #IMPLIED>
...
<!ATTLIST div progLang CDATA #IMPLIED>
]>
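
To illustrate how a consumer of this markup (a crawler or a CMS plugin, say) might read the inline form, here is a minimal Python sketch that uses only the standard library. The names parse_semantic and SemanticExtractor are hypothetical, and the "name: value;" syntax is assumed from the example above rather than taken from any standard.

```python
from html.parser import HTMLParser


def parse_semantic(value):
    """Split 'name: value; name: value;' into a dict of stripped strings."""
    props = {}
    for declaration in value.split(";"):
        name, sep, raw = declaration.partition(":")
        if sep and name.strip():
            props[name.strip()] = raw.strip()
    return props


class SemanticExtractor(HTMLParser):
    """Collect (element id, properties) for every tag with xml:semantic."""

    def __init__(self):
        super().__init__()
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        value = attributes.get("xml:semantic")
        if value:
            self.blocks.append((attributes.get("id"), parse_semantic(value)))


# The example block from the article, shortened.
page = (
    '<div id="content" xml:semantic="keywords: mathcad; '
    'contentType: content; category: math;">Mathcad is desktop software.</div>'
)
extractor = SemanticExtractor()
extractor.feed(page)
print(extractor.blocks)
```

Empty or malformed declarations are simply skipped here; a real implementation would have to decide whether to report them instead.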

In CSS notation:

<div id="content" xml:semanticClass="mySemanticClass">Mathcad is desktop software for performing and documenting engineering and scientific calculations.</div>

.mySemanticClass {
keywords: mathcad;
contentType: content;
category: math;
}

We link the semantic file to our HTML file just like an ordinary CSS file.
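
Reading such a semantic file could look roughly like the Python sketch below. It is only an assumption about how the file would be processed: the function name parse_semantic_sheet is hypothetical, and the regular expression handles just simple ".className { name: value; }" rules, not full CSS syntax (no comments, nesting or combined selectors).

```python
import re

# Matches ".className { ...declarations... }" for simple, non-nested rules.
RULE_RE = re.compile(r"\.([A-Za-z_][\w-]*)\s*\{([^}]*)\}")


def parse_semantic_sheet(text):
    """Return {class name: {property: value}} for every rule in the sheet."""
    classes = {}
    for class_name, body in RULE_RE.findall(text):
        props = {}
        for declaration in body.split(";"):
            name, sep, value = declaration.partition(":")
            if sep and name.strip():
                props[name.strip()] = value.strip()
        classes[class_name] = props
    return classes


sheet = """
.mySemanticClass {
keywords: mathcad;
contentType: content;
category: math;
}
"""
semantic_classes = parse_semantic_sheet(sheet)
print(semantic_classes["mySemanticClass"])
```

A block marked with xml:semanticClass="mySemanticClass" would then be looked up in the resulting dictionary by its class name.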

Attributes and categories

In my work, I identified the main attributes that I would like to have right away. Here is the list:

contentType defines content type (top, bottom, advertisement, content, links, references, bibliographic, related, image, video etc.);
keywords defines key phrases for block content;
synonyms defines related terms and synonyms (e.g. “Obama” and “president”);
category defines content category (Business_Finance, Entertainment_Culture, Environment, Health_Medical_Pharma, Hospitality_Recreation, Law_Crime, Politics, Sports, Technology_Internet, Weather, Other) [6];
float value from 0 to 1;
reference attribute;
it will be considered a parent block;
authoring copyright code, can be used for citations, proverbs, programming code;
progLang defines a programming language.
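
A simple check of these values could be sketched in Python as follows. The function name validate is hypothetical. The contentType list in the article ends with "etc.", so treating it as a closed set is an assumption made only for this sketch; the category values are the ones listed above.

```python
# Values copied from the attribute list; contentType is open-ended ("etc.")
# in the article, so this closed set is an assumption of the sketch.
CONTENT_TYPES = {
    "top", "bottom", "advertisement", "content", "links",
    "references", "bibliographic", "related", "image", "video",
}
CATEGORIES = {
    "Business_Finance", "Entertainment_Culture", "Environment",
    "Health_Medical_Pharma", "Hospitality_Recreation", "Law_Crime",
    "Politics", "Sports", "Technology_Internet", "Weather", "Other",
}


def validate(props):
    """Return a list of human-readable problems found in a property dict."""
    problems = []
    content_type = props.get("contentType")
    if content_type is not None and content_type not in CONTENT_TYPES:
        problems.append(f"unknown contentType: {content_type}")
    category = props.get("category")
    if category is not None and category not in CATEGORIES:
        problems.append(f"unknown category: {category}")
    return problems


print(validate({"contentType": "content", "category": "Technology_Internet"}))
print(validate({"contentType": "content", "category": "math"}))
```

Note that the value category: math used in the earlier examples is not in the category list, so a strict check like this one would flag it.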

Consider the advantages of this approach:

you can integrate semantic data immediately during the creation of HTML
this can be done by either an HTML coder or a programmer familiar with CSS (and there are more of them than RDF experts)

Well, yes, dreams, dreams...

But the future is already here!

This approach has one obvious drawback: it needs the support of the search giants, who would have to use it when indexing pages. But the idea can already be built into CMSs and blog engines: you need to add the appropriate code and a few extra input fields to the engine, and then use this information in your own search and filtering logic.

PS As someone aptly put it, nobody gets beaten for an idea. So it would be interesting to discuss this option for the development of the semantic web. Thanks for your attention!
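
As an illustration of such in-house filtering logic, here is a minimal Python sketch. It assumes the engine has already extracted each page block as a pair of semantic properties and text; the function name searchable_text and the sample blocks are hypothetical.

```python
def searchable_text(blocks, wanted_types=("content",)):
    """Join the text of blocks whose contentType is in wanted_types."""
    return " ".join(
        text
        for props, text in blocks
        if props.get("contentType") in wanted_types
    )


# Hypothetical blocks of one page: (semantic properties, block text).
blocks = [
    ({"contentType": "advertisement"}, "Buy now!"),
    ({"contentType": "content", "keywords": "mathcad"}, "Mathcad is desktop software."),
    ({"contentType": "links"}, "Home | About"),
]
print(searchable_text(blocks))
```

Only the block marked as content reaches the site's own search index here; banners and link lists are left out, which is the "information" versus "information noise" separation described at the start.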

Source: https://habr.com/ru/post/79399/
