Bulletproof HTML: 37 steps to perfect markup

Note: this is not a complete translation of the article; I picked only the points I found most interesting. The article is not new, but even people experienced in HTML markup may find something useful in it. It covers several aspects of markup semantics with concrete examples.

It is a long read, so every item has its own heading to help you find your way around.

2. What HTML versions are there?

The first version of HTML (1989) had no version number; it was simply called "HTML". The first standardized version, published by the Internet Engineering Task Force (IETF) in 1995, was called HTML 2.0.
Later the World Wide Web Consortium (W3C) was formed. It published its first standardized version, HTML 3.2, in 1997. Its successor, HTML 4.0, was released in 1998 and was soon replaced by HTML 4.01 in 1999. This is the latest version of HTML. The W3C has announced that it will not release any further versions of HTML. HTML 4.01 is the recommended version for creating HTML documents.

Despite this, the Web Hypertext Application Technology Working Group (WHATWG) is working on the so-called HTML5, in the hope that the W3C will eventually make it a recommendation.

5. What does the DOCTYPE declaration do?

A DOCTYPE declaration, which must precede any markup in a document, usually looks like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

It specifies the type of the root element of the document (<html>), the public identifier, and the system identifier.

The public identifier (-//W3C//DTD HTML 4.01//EN) states three things: who owns the document type definition, or DTD (W3C); the name of the DTD (DTD HTML 4.01); and the language in which the DTD is written (EN, i.e. English). Note that it does not indicate the language of the web page itself, only the language of the DTD.
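The following minimal Python sketch splits the public identifier into its parts to show its structure. It assumes the four-field, "//"-separated layout used by the HTML 4.01 DTDs. The helper name split_public_id is hypothetical. The field names (text class, description) come from SGML's formal public identifier syntax, not from the article.

```python
def split_public_id(public_id: str) -> dict:
    """Split a public identifier such as "-//W3C//DTD HTML 4.01//EN" into its parts.

    Assumes exactly four "//"-separated fields, as in the HTML 4.01 DTDs.
    """
    registration, owner, text, language = public_id.split("//")
    # The third field starts with the text class ("DTD"), followed by a description.
    text_class, _, description = text.partition(" ")
    return {
        "registered": registration == "+",  # "-" marks an unregistered owner, as with W3C
        "owner": owner,
        "text_class": text_class,
        "description": description,
        "language": language,  # language of the DTD itself, not of the page
    }


parts = split_public_id("-//W3C//DTD HTML 4.01//EN")
```

For the Strict identifier above, parts["owner"] is "W3C", parts["description"] is "HTML 4.01" and parts["language"] is "EN".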

The system identifier (http://www.w3.org/TR/html4/strict.dtd) is the URL of the DTD in use.

The DOCTYPE declaration tells a validator (a program that checks the syntactic "correctness" of a web page) which DTD to check the document against. Browsers do not validate documents against the DTD. However, modern browsers use the declaration to decide whether the page is "modern" (rendered according to the specification) or written "the old way" (rendered with the bugs of old browsers taken into account). The DOCTYPE affects how the page is rendered in Internet Explorer, Opera, Firefox (and other Mozilla-based browsers), Safari, and most other modern browsers. A full DOCTYPE declaration, including the system identifier, tells the browser that this is a modern document. The browser instead assumes the page was written "the old way" and renders it in "quirks mode" in two cases: when there is no DOCTYPE declaration at all, or when the system identifier is missing (for the Transitional and Frameset DTDs).
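The Python sketch below shows roughly how this mode switch works. It is a simplification built on several assumptions:

- rendering_mode is a hypothetical helper.
- It checks only a handful of the public identifiers listed in the current HTML standard's quirks-mode rules; real browsers consult a much longer list.
- It reports the "almost standards" mode that browsers use for HTML 4.01 Transitional and Frameset documents that do include a system identifier.

```python
import re

# A few of the public identifiers listed in the HTML standard's quirks-mode rules
# (compared case-insensitively). Real browsers check a much longer list.
QUIRKS_PREFIXES = (
    "-//ietf//dtd html 2.0//",
    "-//w3c//dtd html 3.2//",
    "-//w3c//dtd html 3.2 final//",
)

# These give quirks mode without a system identifier, "almost standards" with one.
TRANSITIONAL_PREFIXES = (
    "-//w3c//dtd html 4.01 transitional//",
    "-//w3c//dtd html 4.01 frameset//",
)

# Matches <!DOCTYPE html>, optionally followed by PUBLIC "public id" "system id".
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+html(?:\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?)?\s*>',
    re.IGNORECASE,
)


def rendering_mode(source: str) -> str:
    """Roughly guess the mode a browser picks: "standards", "almost standards" or "quirks"."""
    match = DOCTYPE_RE.search(source)
    if match is None:
        return "quirks"  # no recognizable DOCTYPE at all
    public_id = (match.group(1) or "").lower()
    system_id = match.group(2)  # None when the system identifier is missing
    if public_id.startswith(QUIRKS_PREFIXES):
        return "quirks"
    if public_id.startswith(TRANSITIONAL_PREFIXES):
        return "almost standards" if system_id else "quirks"
    return "standards"
```

With the Strict declaration shown earlier, the function returns "standards". A Transitional declaration without a system identifier gives "quirks".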

7. What is the difference between Strict, Transitional and Frameset DTDs?

These DTDs differ in which elements and attributes they declare, and in which nesting of elements they allow or require.

HTML 4.01 Strict DTD - emphasizes the separation of content from presentation and behavior. This DTD is recommended by the W3C for all new documents.
HTML 4.01 Transitional DTD - a stepping stone in the transition from "old-school" markup to modern markup. It is not recommended for new documents. It contains 11 presentational elements (translator's note: elements that carry no semantic meaning and are used only to change appearance; for example, the  element) and a full set of presentational attributes, all of which are deprecated and absent from the Strict DTD. The Transitional DTD is often needed for pages inside frames, because it includes the target attribute, which is required to open a link in another frame.
HTML 4.01 Frameset DTD - used for frame-based pages. The W3C does not recommend using frames. For modern sites it is better to solve such problems on the server side.

8. Which DOCTYPE to choose?

If we are creating a new page, the W3C recommends HTML 4.01 Strict (translator's note: of course, we all know it is better to use XHTML 1.0 Strict).

If we are converting old HTML 2.0 or HTML 3.2 documents, we can use HTML 4.01 Transitional until all presentation has been moved into CSS and all behavior into JavaScript.

11. Why does the validator complain about the <embed> tag?

<embed> has never been part of the HTML specification: it is a non-standard element that most browsers nevertheless support.

During the "browser wars" of the late 1990s, browser makers such as Microsoft and Netscape competed to come up with more "cool" features for styling HTML pages. The problem was that these features were not standardized and, in most cases, did not work across browsers.

Other widely used elements (for example, marquee) have never been part of the specification either. Avoid them if you can.

Non-standard attributes were also widespread; one example is marginwidth.

13. What is a BOM?

A BOM, or byte order mark, is used with Unicode encodings whose code units are longer than 8 bits, such as UTF-16. Processors can store multi-byte integers in one of two orders: "big-endian" or "little-endian". The BOM is written at the very beginning of the file and tells the browser which order is used. In UTF-16 it is two bytes: FE FF (big-endian) or FF FE (little-endian). UTF-8 has no byte-order issue, but a BOM may still be written at the start of a UTF-8 file as the three-byte signature EF BB BF.

Unfortunately, many older browsers cannot handle this information and display these bytes as text instead. If you see strange characters at the top of a page, the most likely cause is that the browser did not process the BOM (or that the encoding was set incorrectly).

The only reliable solution is not to use a BOM. Editors that can save documents in UTF-8 usually let you choose whether to write one.
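The following minimal Python sketch checks for a BOM and drops it before the bytes are served. The helper strip_bom is hypothetical and recognizes only the UTF-8 and UTF-16 marks. The constants codecs.BOM_UTF8, codecs.BOM_UTF16_BE and codecs.BOM_UTF16_LE, and the "utf-8-sig" codec (which writes a BOM), are part of Python's standard library.

```python
import codecs
from typing import Optional, Tuple

# Byte order marks for the encodings discussed above.
BOMS = {
    codecs.BOM_UTF8: "utf-8",          # EF BB BF (a signature only; UTF-8 has no byte order)
    codecs.BOM_UTF16_BE: "utf-16-be",  # FE FF
    codecs.BOM_UTF16_LE: "utf-16-le",  # FF FE
}


def strip_bom(data: bytes) -> Tuple[bytes, Optional[str]]:
    """Return the data without a leading BOM, plus the encoding the BOM indicates (or None)."""
    for bom, encoding in BOMS.items():
        if data.startswith(bom):
            return data[len(bom):], encoding
    return data, None


with_bom = "<p>Hello</p>".encode("utf-8-sig")  # "utf-8-sig" writes a BOM first
without_bom = "<p>Hello</p>".encode("utf-8")   # plain "utf-8" does not
# What a BOM-unaware reader assuming Latin-1 would see:
garbled = with_bom.decode("latin-1")
```

Here garbled starts with "ï»¿", the kind of stray characters described above; Windows-1252 gives the same result for these three bytes. strip_bom(with_bom) returns the markup without the mark, together with "utf-8".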

14. What encoding should I use?

Translator's note: I did not translate this item; I think everyone knows by now that UTF-8 is the way to go. Use UTF-8, and when saving a document, choose UTF-8 without BOM.

16. Why write &amp; instead of &?

Some characters have a special meaning in HTML: < (less-than), > (greater-than), & (ampersand), " (double quote) and ' (apostrophe). When we want to use these characters as plain text, we sometimes have to replace them with character entity references.

For the first four characters, the references look like this:

&lt; (less-than)
&gt; (greater-than)
&amp; (ampersand)
&quot; (double quote)

XML defines an entity for the apostrophe (&apos;), but HTML 4 does not include it. In HTML, an apostrophe can therefore only be replaced by a numeric character reference (&#39;). Translator's note: out of curiosity I ran a small experiment. In practice, every browser I tried (FF3, Opera 9, Safari 3, Google Chrome) renders &apos; as an apostrophe, except IE (all versions).
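In Python, the standard library's html.escape performs this replacement. The wrapper escape_text below is a hypothetical helper for illustration. Note that html.escape writes the apostrophe as &#x27;, the hexadecimal form of the numeric reference &#39;. Because it is a numeric reference rather than &apos;, it is safe in HTML 4 as well as XML.

```python
import html


def escape_text(text: str) -> str:
    """Escape &, <, >, " and ' so text can go into HTML content or a quoted attribute."""
    # html.escape handles "&" first, so the ampersands it inserts are not escaped again.
    # With quote=True it also turns " into &quot; and ' into the numeric reference &#x27;.
    return html.escape(text, quote=True)


sample = "Tom & Jerry's \"5 < 6\" trick"
escaped = escape_text(sample)  # 'Tom &amp; Jerry&#x27;s &quot;5 &lt; 6&quot; trick'
```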

Because every one of these references starts with an ampersand, the ampersand itself must always be escaped. This applies inside attribute values too, in particular in the href attribute of links. Unfortunately, ampersands are very common in URIs, where they separate query arguments.

In HTML, an unescaped ampersand usually breaks nothing (XHTML is another story). But what if a query parameter happens to have the same name as a character entity?
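The following minimal Python sketch illustrates the problem. It assumes a hypothetical URL whose query contains a parameter named copy, which is also the name of the © entity. href_attribute is a hypothetical helper; html.escape and html.unescape come from the standard library.

html.unescape applies the rules for ordinary text, where "&copy" is recognized even without a semicolon. Inside attribute values, the current HTML standard makes an exception when such a reference is followed by "=" or a letter or digit. Older browsers did not make that exception, and escaping the ampersand removes the ambiguity either way.

```python
import html


def href_attribute(url: str) -> str:
    """Build an href attribute with the URL's ampersands escaped as &amp;."""
    return f'href="{html.escape(url)}"'


url = "search.php?q=html&copy=1"  # hypothetical URL; "copy" is also the name of the © entity
attribute = href_attribute(url)   # 'href="search.php?q=html&amp;copy=1"'

# Decoded with text-content rules, the raw "&copy" turns into the © sign:
misread = html.unescape(url)      # 'search.php?q=html©=1'
# Decoding the escaped form gives back the original URL:
restored = html.unescape("search.php?q=html&amp;copy=1")
```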

21. What to use, <p> or <br>?

The p element marks up paragraphs of text. A paragraph is one or more sentences that develop a single thought.

A line break (br) is mostly a presentational device, and presentation belongs in CSS rather than HTML. However, in a few cases a line break carries meaning, for example in lines of poetry or song lyrics, in postal addresses, and in code examples. In these cases using br is justified, but using br to separate paragraphs is not acceptable.

p, on the other hand, has a fairly clear meaning: it marks up a paragraph. Some web developers treat p as a general-purpose container, but it is not one. label and input elements inside p in forms are a common sight, but I would call that semantically incorrect: labels and input fields are not paragraph content.

23. Should I replace <b> and <i> with <strong> and <em>?

Only if you really want to emphasize something. These elements are not equivalent.

In the bad old days, authors used b and i to add semantic emphasis to words.

In the no less bad present, authors use strong and em just to make text bold or italic.

em indicates emphasis: content marked with it would be stressed when read aloud (for example, spoken louder or more slowly). strong indicates even stronger emphasis, but many consider it redundant, since nested em elements can express increasing emphasis. Some experts recommend using strong only for a few elements on the page that must stand out unambiguously (for example, the "current page" indicator), rather than for words and phrases in the body text.

b and i carry no semantic meaning; they only make the font bold or italic. They are useful for established typographic conventions that have no semantically suitable HTML element. For example, ship names are traditionally set in italics, but HTML has no <ship> element, so we can write <i>Titanic</i>.

27. How to use the element <address>?

address is used for contact information on the page. This may be a postal address, a telephone number, or any other contact information. address is a block element that can contain only text and inline elements. Most browsers render it in italics by default, but this is easily changed with CSS.

It is a common misconception that address can be used only for postal addresses.

28. How to use the element <dfn>?

dfn marks the defining instance of a term. It follows a typographic convention, especially common in scientific documents, of italicizing a new term that the reader may not know at the point where the term is first defined in the text. By default, dfn is displayed in italics.

It is a common misconception that dfn means "abbreviation", and many authors use it the way abbr and acronym are used (giving the full term in the title attribute). A term should be marked with dfn only once in a document: where it is first used and explained.

29. How to use the <var> element correctly?

var is used to mark up variables, or replaceable parts of a text. It follows the typographic convention of italicizing placeholders that are replaced by real data in actual use. For example, in a telephone system manual, the instructions for forwarding incoming calls to another extension might look like this:

<kbd>*21*<var>extension</var>#</kbd>

Here, var marks the "extension" placeholder, which will be rendered in italics. To forward calls to extension 942, you would dial *21*942#. So var does not mean that you should literally type the word "extension"; it means that the word stands for a number you supply.

It is a common misconception that var should be used to specify variables in examples of program code.

31. What is the difference between the <abbr> and <acronym> tags?

Nobody can really answer this question! Even the HTML specification contradicts itself to some extent.

abbr was a Netscape extension to HTML during the "browser wars", and acronym was a Microsoft extension. The two mean roughly the same thing. Both elements were included in the HTML specification with different semantics. The problem is that nobody can really explain what those semantics are.

Let's turn to the dictionary:
Abbreviation - a shortened form of a word or phrase.
Acronym - a word formed from the first letter or first few letters of each word in a phrase or compound term.

By this definition an acronym is a word, i.e. it can be pronounced. Thus "NATO" is an acronym, since it is formed from the initial letters of "North Atlantic Treaty Organization". "FBI", on the other hand, is not an acronym by this definition. It is not pronounced as a word but letter by letter: "eff-bee-eye". This is where the mess begins. Technically, "FBI" is an initialism, which the dictionary defines as:

Initialism - 1) a name or term formed from the first letters or from the first few letters of words, which are pronounced as separate words; 2) a group of first letters meaning the name, organization, etc., which are pronounced separately.

The first definition is almost the same as that of acronym, while the second is more distinct. Even so, the specification has no initialism element. The confusion is made worse by the fact that, in everyday American English, the word "acronym" is used as a synonym for "initialism".

The HTML specification offers the following definitions:

abbr - indicates an abbreviated form (for example, WWW, HTTP, URI, Mass., etc.).
acronym - indicates an acronym (for example, WAC, radar, etc.).

The specification seems to follow the dictionary definitions, which would mean that "FBI" should be marked up with abbr, since it cannot be pronounced as a word. Yet a few paragraphs later the specification says:

Western languages make extensive use of acronyms such as "GmbH", "NATO", and "FBI", as well as abbreviations like "M.", "Inc.", "et al.", "etc."

Confused yet? I certainly am. It is safest to always use abbr, since all acronyms are also abbreviations, but not vice versa. There is one small problem, though. Microsoft was so upset by the W3C's decision to use abbr instead of acronym for abbreviations and initialisms that it refused to support the abbr element! (Microsoft did eventually add abbr support in Internet Explorer 7.)

So what are poor web developers to do? And why should we bother at all? Of course, it is nice to have an element to hang a title attribute on, but a span can do that too. The point is that marking up acronyms and abbreviations helps assistive technologies, in particular screen readers. But screen readers mostly ignore the abbr and acronym elements. Nobody knows exactly how to use them correctly, and Microsoft does not support abbr. It is a double-edged sword.

I don't know the answer to this question! Personally, I use abbr for obvious abbreviations such as "Inc." and for initialisms such as "FBI". I use acronym for abbreviations that are read as a word, for example "GIF". But given the specification, I cannot blame anyone for marking up "FBI" as an acronym. And what about "SQL", which some people spell out letter by letter and others pronounce "sequel"?

32. Why are certain features deprecated?

The feature newcomers ask about most often is the target attribute. It is not allowed in HTML 4.01 Strict, but is still supported in HTML 4.01 Transitional. Many elements and attributes are allowed in Transitional but prohibited in Strict.

The W3C deprecates some elements and attributes because it wants to separate content (HTML), presentation (CSS) and behavior (JavaScript). Centering an element is a question of presentation; it should be solved with CSS, not with the center element. Opening a link in a new window is a question of behavior; it should be solved with JavaScript rather than the target attribute.

Most deprecated features appeared during the browser wars of the 1990s. They were included in HTML 3.2 in an attempt to restore some order, but that was never the main purpose of HTML. With HTML 4, its authors tried to "retrain the Web" by removing the "disastrous" parts that had entered HTML 3.2, at least in the Strict DTD.

In other words, these features are deprecated for a reason. Avoid them whenever possible.

37. How do I embed an HTML page in another page?

If you use the Strict DTD, the only valid way is the object element:

<object type="text/html" data="http://example.com/foo.html">
Alternate content here for browsers that don't support OBJECT.
</object>

Unfortunately, Internet Explorer does not properly support object used this way.

With the Transitional DTD you can use an iframe:

<iframe src="http://example.com/foo.html">
Alternate content here for browsers that don't support IFRAME.
</iframe>

Source: https://habr.com/ru/post/51333/
