HTML 5 semantics

I'm going to make a bold prediction. Long after you and I are gone, HTML will still be around, not only in the billions of archived pages of our era but as a living, breathing entity. Too much effort, energy and investment has gone into developing the web's tools, protocols and platforms for all of it to be thrown away lightly.

Let's stop to consider our responsibility. By an accident of history, we are associated with the development of an important tool of our civilization, one that will be used to communicate for decades to come. So when we turn our minds, idly or seriously, to improving HTML, we need to understand how far-reaching the consequences of our decisions can be.

HTML 5, the W3C's recently redoubled effort to shape the next generation of HTML, has gained significant momentum over the past year or so. It is a huge project that covers not only the structure of HTML but also parsing models, error-handling models, the DOM, algorithms for resource fetching, media content, 2D graphics, data templating, security models, page-loading models, client-side data storage and much more.

There are also changes to the structure, syntax and semantics of HTML, some of which Lachlan Hunt describes in his article “A Preview of HTML 5” (a translation is available on Habr).
But in this article, let's consider only the semantics of HTML. This is something that I have been interested in for many years and I believe that this is very important for the future of HTML.

The BBC recently announced that it would drop the hCalendar microformat from its TV programme listings because of accessibility and usability concerns with the abbr design pattern. This shows that we have, beyond any doubt, pushed the semantic capacity of HTML far beyond what was ever intended, and indeed beyond what the language can bear. We have simply run out of HTML elements and attributes that can enrich the semantics of a document. If we keep resorting to clever tricks with existing HTML constructs, more and more problems like this will arise, because HTML suffers from a fundamental flaw as a semantic markup language: its semantics are fixed, not extensible.
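For readers who have not met it, the abbr design pattern looks roughly like the sketch below. The class names come from hCalendar; the programme name and time are invented for illustration. The machine-readable timestamp lives in the title attribute, and a screen reader set to expand abbreviations may read that raw timestamp aloud, which is where the accessibility concern comes from.

```html
<!-- hCalendar event using the abbr design pattern (illustrative data) -->
<div class="vevent">
  <span class="summary">Evening News</span>,
  <abbr class="dtstart" title="2008-06-23T18:00:00+01:00">6pm</abbr>
</div>
```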

This is not just a theoretical problem. Hundreds of thousands of developers use class and id to make their markup more semantic (they also use them as “hooks” for CSS styles, but that is another matter). Almost always, these developers use ad hoc vocabularies whose values they make up themselves, rather than values drawn from existing schemes. This is pseudo-semantic markup at best.
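A small sketch of what such ad hoc vocabularies look like in practice. Every id and class name here is invented; the point is only that two authors can describe the same thing with unrelated names, so no tool can rely on either.

```html
<!-- Site A: names that mean something only to this page's author -->
<div id="masthead">
  <div class="promo-box">Spring sale starts Monday</div>
</div>

<!-- Site B: the same content, described with different made-up names -->
<div id="banner">
  <div class="special-offer">Spring sale starts Monday</div>
</div>
```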

Many pages around the web use microformats to add more structured semantics than the impoverished set of HTML elements and attributes allows. In this case, the values used for the class attribute come from agreed vocabularies, sometimes taken from other standards, such as vCard, and sometimes newly created where no formal standard exists (as is the case with hReview).
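As an illustration of class values taken from an agreed vocabulary, here is a minimal hCard sketch. The class names (vcard, fn, org, url, tel) are hCard property names derived from vCard; the person and contact details are made up.

```html
<!-- hCard: class values drawn from the vCard vocabulary (illustrative data) -->
<div class="vcard">
  <span class="fn">Jane Example</span>,
  <span class="org">Example Widgets</span>,
  <a class="url" href="http://example.com/">example.com</a>,
  <span class="tel">+1-555-0100</span>
</div>
```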

Extensible Semantics

There is a very serious problem to be solved here. We need mechanisms in HTML that clearly and unambiguously let developers add richer, more expressive semantics, rather than pseudo-semantics, to their markup. This is perhaps the most pressing task facing the HTML 5 project.

But it is not easy to come up with a mechanism for adding richer semantics to HTML content: any solution faces significant constraints. Perhaps the biggest of these is backward compatibility. The solution cannot break the hundreds of millions of browsing devices in use today, which will remain in use for years to come. Any solution that is not backward compatible will not be widely adopted by developers, for fear of losing readers. It will quickly wither on the vine.

The solution should also be forward compatible. Not in the sense that it must work in future browsers (that is the job of browser developers), but in the sense that it must be extensible. We cannot expect any single solution we devise now to meet every imaginable and unimaginable need for semantics in the future. What we can do is devise a solution that can be extended to meet future needs as they arise.

Taken together, these constraints present a huge challenge. But for a language whose major revisions arrive at roughly ten-year intervals, and whose role as a global communication platform is of paramount importance, it is a challenge that has to be met.

So how does HTML 5 address this issue? HTML 5 introduces a number of new elements. Some I would call “structural”: section, nav, aside, header, and footer. The dialog element is similar in kind to blockquote. There are also a number of data elements, such as meter, which represents a “scalar measurement within a known range or a fractional value, such as disk usage”, and the time element (http://www.w3.org/html/wg/html5/#the-time), which represents a date and/or time.
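To make the list concrete, here is a minimal sketch of a page fragment that uses several of these elements. The element names are those listed above; the content, the URLs and the exact attribute values are invented, and attribute details may differ between drafts of the specification.

```html
<!-- Element names from the HTML 5 draft; the content is made up -->
<header>
  <h1>Example Site</h1>
  <nav>
    <a href="/">Home</a>
    <a href="/archive">Archive</a>
  </nav>
</header>
<section>
  <h1>Disk report</h1>
  <p>Usage: <meter value="6" min="0" max="10">6 out of 10 GB</meter></p>
  <p>Checked at <time datetime="2009-01-06T09:30">9:30 this morning</time>.</p>
  <aside><p>Reports are generated nightly.</p></aside>
</section>
<footer>
  <p>Contact the site administrator with questions.</p>
</footer>
```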

These elements may be useful, and they have attracted some interest, but do they really solve the problem? Let's test them against the two constraints identified above: backward compatibility and forward compatibility.

Let's consider each obstacle in turn.

Backward compatibility

How do current browsers handle new elements such as section? Well, the latest versions of Safari, Opera, Mozilla and even IE7 will all render a page like the following:

  <h1>Top Level Heading</h1>
  <section>
    <h1>Second Level Heading</h1>
    <p>this is text in a section element</p>
    <section>
      <h1>Third Level Heading</h1>
    </section>
  </section>

So far, so good. But when we try to style the section element with CSS, for example like this:

  section {color: red}

... most of the browsers mentioned apply the style, but IE7 (and, needless to say, IE6) does not.

So we have a backward compatibility problem with 75% of the browsers currently in use. Given the half-life of Internet Explorer versions, we can expect most users to be on IE6 or IE7 for several years to come.

If HTML 5 introduces new elements, how likely is it that the vast majority of developers will use them, given that they are incompatible with most browsers in use?

Let's turn to forward compatibility, the next problem.

Forward compatibility

First, let's ask: “Why are we inventing these new elements?” A reasonable answer would be: “Because HTML lacks semantic richness, and by adding these elements we increase the semantic richness of HTML. That can't be a bad thing, can it?”

By adding these elements we do address the need for greater semantic capability in HTML, but only within a narrow scope. No matter how many elements we introduce, we will always think of more semantics we want HTML to express. So however many elements we add, we will not have solved the problem. We do not need to add specific terms to the HTML vocabulary; we need to add a mechanism that lets us extend the semantics of a document as required. In technical terms, we need to make HTML extensible. HTML 5 does not offer an extensibility mechanism.

Thus, HTML 5 implements a feature that breaks in a significant percentage of current browsers, and that does not really allow us to add richer semantics to the language at all.

We are left with a few questions about the new elements. Where did their names come from? How was it decided that the navigation element should be called “nav”? What about the differences between page-level, site-level and meta-site-level navigation?

Why not adopt an existing vocabulary, such as DocBook? Its vocabulary for document structure is richer, and it has been developed by publishing experts over many years. This is not an argument for DocBook as such. The point is that the extremely important task of designing a mechanism for semantics in HTML is going ahead with little regard for a body of practice that began more than 30 years ago. (The original work on GML began in the early 1970s.)

Some solution ideas

So, while the current effort is extremely important, I have some practical suggestions for how to solve this problem. Let's start with one.

If adding new elements is off the table, at least for the sake of this discussion, attributes are the other logical part of HTML to focus on. After all, we have been using the class and id attributes for nearly ten years as mechanisms for extending the semantics of HTML. Many developers are already familiar and comfortable with this. The microformats project has shown, however, that the existing attributes are not sufficient as a mechanism for extending HTML semantics. So if we want to use attributes to solve the problem, we need to introduce one or more new attributes. Before moving on to the mechanics of how this might work, it is only fair to hold this proposal to the same requirements as the new elements in HTML 5. The most important question about new attributes is whether they are backward compatible. If they are, do they provide a workable mechanism for extending semantics in HTML?

Let's invent a new attribute. Let's call it “structure”, but the name is not important. We can use it like this:

  <div structure="header">

Let's see how our browsers cope with this.

Of course, all our browsers will handle the following CSS rule:

  div {color: red}

How about this:

  div[structure] {font-weight: bold}

In fact, almost all browsers, including IE7, will style a div that carries the structure attribute, even though HTML defines no such attribute. Unfortunately, our luck runs out there, because IE6 will not. But we can use the attribute in our HTML and every existing browser will accept it. We can even style our HTML through the attribute in all modern browsers. And if we want a workaround for older browsers, we can add a class with the same value for styling. Compared with the HTML 5 solution, which adds new elements that cannot be styled in Internet Explorer 6 or 7, this is clearly the more backward-compatible option.
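The workaround described above might look like the following sketch. The structure attribute is the hypothetical one proposed in this article, and the class name header is simply chosen to mirror its value.

```html
<style>
  /* Browsers that support attribute selectors (including IE7) */
  div[structure="header"] { font-weight: bold }
  /* Fallback hook for browsers without attribute selectors, such as IE6 */
  div.header { font-weight: bold }
</style>

<!-- "structure" is the hypothetical attribute proposed in this article -->
<div structure="header" class="header">Site title</div>
```

The two rules are kept separate on purpose: a browser that does not understand a selector may drop the whole rule, so grouping both selectors with a comma could lose the fallback as well.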

Extensibility through attributes

Instead of new elements, HTML 5 should adopt a number of new attributes. Each of these attributes would correspond to a category or type of semantics. For example, as I have outlined in detail in another article, HTML involves structural semantics, rhetorical semantics, role semantics (adopted from XHTML), and other classes and categories of semantics.

These new attributes could then be used much as the class attribute is: to attach semantics to an element, to describe the nature of an element, or to add metadata to an element.

This is not unlike the role attribute in XHTML, except that, rather than having a single attribute for all of an element's semantics, we would identify the different types of semantics and keep them separate.

For example, the XHTML role attribute works as follows:

  <ul role="navigation sitemap">
    <li href="downloads">Downloads</li>
    <li href="docs">Documentation</li>
    <li href="news">News</li>
  </ul>

The value of the role attribute is a space-separated list of words taken from a standard vocabulary or from another defined vocabulary.

Why not simply adopt the role attribute as it is? Because there are other types of semantics to which the notion of a role does not apply. For example:

  <p rhetoric="irony">He's a fantastic person.</p>

This illustrates a hypothetical type of semantics, “rhetorical”, which could be used to mark up the rhetorical character of parts of a document. The element clearly does not play the role of irony in the document. Rather, its content is ironic.

Here is another example. It is becoming more and more obvious that HTML lacks a way to attach a machine-readable equivalent to human-readable content, such as a date. This is what lies behind the BBC's problem with the hCalendar microformat, which we discussed earlier. A phrase such as “May Day next year” is perfectly readable to a person but means little to a machine, so it needs a machine-readable equivalent attached to it.
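One way to picture this, as a sketch only: a hypothetical attribute, here called equivalent, carries the machine-readable form while the element's content stays readable to people. The attribute is not part of any specification, and the date value and surrounding sentence are illustrative.

```html
<!-- "equivalent" is a hypothetical attribute; the date value is illustrative -->
<p>
  The parade is planned for
  <span equivalent="2010-05-01">May Day next year</span>.
</p>
```

Unlike the abbr design pattern, nothing here is placed in the title attribute, so there is no raw timestamp for assistive technology to read aloud.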

Again, whether we use the specific term “equivalent” as the attribute name, or some other term for this kind of semantics, is not the issue. What matters is that this is not simply a case of using the class attribute, or the role attribute, to squeeze every kind of semantic information into a single attribute. For a properly extensible solution that offers backward compatibility and enough flexibility, this is a direction worth exploring.

I called this section “Some Solution Ideas”, since a significant amount of work needs to be done in order to create a truly workable solution. Open questions include the following.

How many different semantic attributes should there be? Should these categories be extensible and, if so, how?
How do we define a vocabulary?
Do we simply invent the terms we want, much as developers have done with class values, or should the possible values be defined by a standardized specification?
If two vocabularies conflict, for example by defining the same term in two different ways, how do we resolve the conflict?
Do we need namespaces, or is there some other mechanism?
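To show what such a conflict can look like, here is a small sketch. In hCard, the class value title means a person's job title, while an author might equally use it for the title of a page. The prefixes in the second half are invented purely for illustration; they are one conceivable mechanism, not a proposal from any specification.

```html
<!-- The same term, two meanings -->
<div class="vcard">
  <span class="fn">Jane Example</span>,
  <span class="title">Senior Editor</span>   <!-- hCard: a job title -->
</div>
<h1 class="title">HTML 5 semantics</h1>      <!-- ad hoc: the title of a page -->

<!-- A hypothetical prefix naming the vocabulary each term comes from.
     The "hcard:" and "doc:" prefixes are invented for this sketch. -->
<span class="hcard:title">Senior Editor</span>
<h1 class="doc:title">HTML 5 semantics</h1>
```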

Rather than rushing to answer these questions, I have raised them to bring the unresolved issues to light and to start a dialogue. The ramifications and scope of the decisions made in HTML 5 are too great for these decisions to be made without drawing on expertise in linguistics, semantics, semiotics and related fields.

I hope it is clear that simply introducing new elements into HTML is not a solution to the problem of expanding semantics in HTML.

Let's not rush into an easy decision. With climate change, we are already burdening our grandchildren with problems enough. At the very least, let's leave them the best HTML we can.

Source: https://habr.com/ru/post/49734/
