As the title suggests, this article is about an integral part of any Russian-language text (and not only Russian): the space. We will touch on the history of the space, its types, and its use in web typography.
Generally speaking, a space is any empty area in text that is handwritten, printed or displayed on any other medium. Spaces therefore come in different kinds:
- sinkage (the large vertical drop at the top of an opening page) and the white space at the end of a page,
- paragraph indents and the white space at the end of a paragraph,
- interline spaces (leading, between lines of text),
- word spaces (between words on one line),
- letter spacing (between letters in a word).
The discussion below focuses on word spaces, which separate words and functionally belong to the punctuation marks.
History of the word space
The word space is a relatively late invention in the history of human thought. Its story is told in depth in Paul Saenger's book "Space between Words: The Origins of Silent Reading" and, in less detail, in Johannes Friedrich's "The History of Writing".
There is also a good article on spaces and their history by Anton Bizyaev, "There were no spaces at the beginning", published in 1997 in the magazine "Publish".
In short, the space appeared rather late, in those writing systems where the lack of word division made reading difficult (so-called consonantal writing, in which only consonant sounds are recorded). In Greek and Latin, however, where vowels were written as well, the use of the space was lost. Paul Saenger links this to the fact that texts were read aloud, which made it easier to tell words apart when perceiving the text.
The space came back into use in approximately the VII–IX centuries AD. The tradition came from Ireland, where the native language of scribes and readers was Old Irish, while religious literature was written in Latin. Apparently, this is why the monks had difficulty reading aloud. The appearance of the space is believed to be closely connected with the gradual transition from reading aloud to silent reading. Examples of Latin books with word spaces are two monuments of the literature of the British Isles: the Book of Durrow (VII century) and the Book of Kells (VIII–IX centuries).
In Glagolitic and Cyrillic writing the space was also absent; in its usual sense it has been used only since the 17th century.
Before movable type was invented, there was no particular classification of word spaces: scribes simply placed spaces by eye. Recall (we wrote about this in the article "Justification") that manuscripts and woodcuts (xylography) are ways of producing text without movable letters. Naturally, the spaces could vary in width, since they were made by hand.
Spaces in hand typesetting
When movable letters appeared (this happened with the advent of metal type), a question naturally arose: how should spaces be set so that the lines remain justified?
In hand typesetting, the composed line is clamped tightly in the composing stick and then in the galley, so its width must match the measure (the width of the type page) almost exactly (for more on the technology, see M. Schulmeister's book of the same name, "Hand Typesetting").
A line was assembled from sorts (metal bars whose end carries a raised mirror image of a letter, which is printed onto paper), and word spaces were created with so-called spaces (spacing material): bars of various thicknesses with no printing surface on the end. Each type size, of course, had its own set of spaces of different widths. For example, for a 10-point font (the standard size for most text editions), spaces of 10, 5, 4, 3, 2 and 1 point were produced.
A space as wide as the point size was called an em space (in Russian terminology, a "round" space). A space half the point size was called an en space (a "half-round" space). There is also the term "thin space", meaning a space 1–2 points thick for font sizes of 8–12 points. For a 10-point font, a thin space is usually 2 points (that is, 1/5 of the point size). However, because the thin space has no precise definition, handbooks for publishers, editors and typesetters usually speak not of setting off with a thin space but of setting off by so many points (assuming a 10-point font size).
Thus, keep in mind that, depending on the font size, a given fraction of an em (a third, a quarter and so on) has a different width in points, and vice versa.
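The relation between points and fractions of the point size can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function names `space_width_pt` and `em_fraction_of` are our own and do not come from any handbook.

```python
from fractions import Fraction

def space_width_pt(font_size_pt, em_fraction):
    """Width in points of a space that is `em_fraction` of the point size."""
    return Fraction(font_size_pt) * Fraction(em_fraction)

def em_fraction_of(width_pt, font_size_pt):
    """Which fraction of the point size a space of `width_pt` points is."""
    return Fraction(width_pt) / Fraction(font_size_pt)

# A 2-point space is 1/5 of the point size at 10 points, but 1/4 at 8 points.
print(em_fraction_of(2, 10), em_fraction_of(2, 8))
# One fifth of the point size at 10 points is exactly 2 points.
print(space_width_pt(10, Fraction(1, 5)))
```

Exact fractions are used instead of floats so that thirds and fifths do not pick up rounding noise.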
Traditional word space width
So, having figured out what em and en spaces are, let's move on to the width of the word space itself as adopted in Russian typesetting.
Schulmeister writes (p. 94) that while a line is being composed, an en space is placed between words. When the line is composed to the end, its width in most cases turns out to be either less or greater than the measure. The typesetter therefore has to change the width of the spaces, reducing it to a minimum of 1/4 em or increasing it to a maximum of 3/4 em (so when setting 10-point type, word spaces can vary from 3 to 7 points). Naturally, there are nuances that depend on the format of the publication, but we will not go into them.
However, Schulmeister notes that although the word space itself is an en space, using a standard space of 1/3 em is more economical in terms of paper consumption and often looks better as well. Using an en space as the word space is also not recommended for narrow fonts.
With the advent of line-casting machines, spaces became uniform in width within a line, and the width of the word space came to vary around 1/3 em.
Computer typesetting and web typography
Today we are limited by the capabilities of the fonts in use and, of course, by the Unicode character set. Keep in mind that far from all fonts contain most of the Unicode white-space characters.
With the move to computer layout, space widths came to be specified in fractions of an em rather than in points, because fonts now scale easily to any size and white-space elements must remain proportional to the font size.
Unicode space characters
Unicode includes the following space characters for Western typesetting.
- Word space, U+0020, &#32; - width from 1/5 to 1/2 em depending on the font. For medium-width fonts the word space is about 1/4 em wide (Times New Roman, for example, has exactly such a space); for wide fonts it is about 1/3 em (Microsoft Verdana: 0.35 em, Microsoft Tahoma: 0.31 em).
- Non-breaking space, U+00A0, &nbsp; - has the same width as the regular word space, but a line break is prohibited at the non-breaking space.
Regular and non-breaking word spaces are included in any font and are displayed correctly by all user agents, with one exception: some word processors and browsers do not stretch or shrink the non-breaking space in justified text (which violates the recommendations). For example, Firefox scales non-breaking spaces correctly, while MSIE 7.0 does not scale them at all.
All other white-space characters have a fixed width and do not stretch in justified lines. However, according to the Unicode line breaking algorithm, all of them should be treated as line break opportunities.
- Em space, U+2003, &emsp; - as mentioned, its width equals the point size. It is called Em Space perhaps because the letter "M" had this width in some old fonts. This is now far from universally true, so the claim that Em Space always has the width of the letter "M" is a misconception.
- En space, U+2002, &ensp; - half an em. It is called En Space perhaps because the letter "N" had this width in some old fonts. This is now far from universally true, so the claim that En Space always has the width of the letter "N" is a misconception.
- One-third em space, U+2004, &#8196; - one third of an em. In English it is called Three-per-Em Space.
- Quarter em space, U+2005, &#8197; - one quarter of an em. In English it is called Four-per-Em Space.
- One-sixth em space, U+2006, &#8198;. In English it is called Six-per-Em Space.
- Thin space, U+2009, &thinsp; - usually 1/5 em wide (less often 1/6). Generally speaking, its width depends on the language being set and on the font manufacturer; in Cyrillic fonts the thin space is usually 1/5 em wide. This proportion corresponds exactly to a two-point space when setting 10-point type. In English it is called Thin Space.
- Hair space, U+200A, &#8202; - the narrowest space, about 1/10 to 1/16 em wide. This proportion corresponds approximately to a one-point space when setting 10-point type, or looks even narrower.
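The code points from the list above can be checked against the Unicode character database that ships with Python. The sketch below only prints the official names; the `SPACES` list and the `describe` helper are our own illustration.

```python
import unicodedata

# Code points taken from the list above.
SPACES = [
    "\u0020", "\u00A0", "\u2003", "\u2002", "\u2004",
    "\u2005", "\u2006", "\u2009", "\u200A",
]

def describe(ch):
    """Return the code point and the official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in SPACES:
    print(describe(ch))
```

The names say nothing about actual widths: those are defined by each font, as noted above.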
Using different spaces
Since the width of the word space is fixed in the font and changes automatically in justified text, using other white-space characters as word spaces is justified only in print typesetting, and only with a deep understanding of what you are doing.
In ordinary web layout, regular and non-breaking word spaces are enough to separate words.
At the same time, the rules of Russian typography require a thin space in a number of places (more precisely, the handbooks speak of a two-point space, but we will use the term "thin space" as the one that best fits established terminology and computer typesetting).
The basic rules for using spaces are described below; in general, we recommend the following principle for web layout.
When preparing HTML documents for publication on the Internet, only the regular space, the non-breaking space &nbsp; and the thin space &thinsp; should be used as white-space elements. If the author expects the page to be viewed with user agents that handle &thinsp; incorrectly, a regular or non-breaking space should be used instead of the thin space.
Using only the thin space out of the whole variety of white-space elements makes it possible, first, to preserve the harmonious look of the typeset text and, second, not to burden the author with assorted rules for spaces of various fractional widths.
How browsers and search engines handle spaces
While preparing the article we ran a small experiment on a specially prepared page. Yandex and Google cope well with non-standard characters, replacing all non-standard white space with regular spaces (we think this is the right behavior). That is, they make no distinction between the texts "two words", "two&ensp;words", "two&thinsp;words" and so on.
As it turned out, browsers render non-standard white-space elements poorly. Only Firefox 3.0 on Windows XP and *nix, and MSIE 7.0 and Safari on Windows XP, handle the task properly. We have no data on MSIE 8.0, but most likely it is fine as well.
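The observed behavior (all non-standard white space treated as a regular space) is easy to imitate. The sketch below is our own illustration and says nothing about how Yandex or Google actually implement it; `normalize_spaces` and `NONSTANDARD_SPACES` are hypothetical names.

```python
import re

# Non-breaking space plus the fixed-width spaces discussed in this article.
NONSTANDARD_SPACES = "\u00A0\u2002\u2003\u2004\u2005\u2006\u2009\u200A"
_PATTERN = re.compile(f"[{NONSTANDARD_SPACES}]")

def normalize_spaces(text):
    """Replace every non-standard space character with a regular space."""
    return _PATTERN.sub(" ", text)

# After normalization the three variants become the same string.
variants = ["two words", "two\u2002words", "two\u2009words"]
print({normalize_spaces(v) for v in variants})
```

The function works on decoded text, so entities such as &ensp; would have to be converted to characters first (for example with `html.unescape`).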
- Firefox before version 3.0 does not break the line at non-standard spaces at all. The widths of the spaces are displayed correctly.
- Opera 9.26 and 9.50, Firefox 3.0 for Mac and Safari for Mac break the line, but all the spaces have the same width.
- MSIE 5.5 and 6.0 on Windows 2000 display small squares instead of the spaces (perhaps the corresponding characters are simply missing from the system font).
It is not quite clear why all white-space elements have the same width in every browser on the Mac. It probably has to do with the built-in fonts.
Basic rules for using spaces
So, we emphasize once again that in all the rules listed below, the thin space &thinsp; is avoided only when the author sees a risk that site visitors use browsers that display the thin space incorrectly. These include some browsers on *nix (perhaps because of the built-in fonts), MSIE 6.0 and earlier, browsers for Mac (these can be disregarded, since the rendering error affects only the width) and possibly some browsers for mobile phones and PDAs.
If the use of such browsers is likely, we recommend using regular or non-breaking word spaces instead of thin spaces.
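Such a fallback can be applied mechanically at publication time. A minimal sketch, assuming the markup writes the thin space either as an entity or as the literal character; the function name `replace_thin_spaces` is hypothetical.

```python
def replace_thin_spaces(html, replacement="&nbsp;"):
    """Swap every spelling of the thin space for a safer replacement."""
    # Named entity, decimal and hexadecimal references, and the raw character.
    for token in ("&thinsp;", "&#8201;", "&#x2009;", "\u2009"):
        html = html.replace(token, replacement)
    return html

print(replace_thin_spaces("400&thinsp;m"))
print(replace_thin_spaces("1&thinsp;652", replacement=" "))
```

Choosing &nbsp; as the default keeps the line-break prohibition that most of the thin-space rules below require; pass a regular space where a break is acceptable.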
As described above, according to the Unicode recommendations the thin space is a space at which a line break is possible. Where the rules require both a thin space and no line break (for example, between digit groups when setting a number), use a construction like this:
<span style="white-space: nowrap;">250&thinsp;000</span>
The nobr HTML element is proprietary and must not be used.
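When pages are generated by a script, the wrapper can be produced by a small helper. This is a sketch only; the name `nowrap` is ours, and the fragment passed in is assumed to be already valid HTML.

```python
def nowrap(html_fragment):
    """Wrap an HTML fragment in a span that forbids line breaks inside it."""
    return f'<span style="white-space: nowrap;">{html_fragment}</span>'

print(nowrap("250&thinsp;000"))
```

On a real site a CSS class would usually be preferable to repeating the inline style, but the inline form mirrors the construction shown above.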
Next we describe those spacing rules which, in our observation, are most often violated when typesetting text. More detailed information on typesetting rules can be found, for example, in the "Handbook of the Publisher and Author" by A. E. Milchin and L. K. Cheltsova.
Abbreviations and Symbols
- In abbreviations such as и т. д. ("and so on"), и т. п. ("and the like"), т. к. ("because"), т. е. ("that is"), и др. ("and others"), до н. э. ("BC"), ю. ш. ("south latitude") and similar ones, all elements of the abbreviation are separated by non-breaking spaces.
и т. д. - и&nbsp;т.&nbsp;д.
и т. п. - и&nbsp;т.&nbsp;п.
т. к. - т.&nbsp;к.
т. е. - т.&nbsp;е.
и др. - и&nbsp;др.
до н. э. - до&nbsp;н.&nbsp;э.
ю. ш. - ю.&nbsp;ш.
- Initials are separated from each other and from the surname by a non-breaking space.
A. S. Pushkin - A.&nbsp;S.&nbsp;Pushkin
J. R. R. Tolkien - J.&nbsp;R.&nbsp;R.&nbsp;Tolkien
It is also possible to separate the initials from each other and from the following surname with a thin space, but breaking the line between the initials or before the surname is prohibited. Whichever style you choose, keep it consistent throughout the document or site.
V. V. Putin - V.&thinsp;V.&thinsp;Putin
V. Putin - V.&thinsp;Putin
Putin V. V. - Putin&nbsp;V.&thinsp;V.
Putin V. - Putin&nbsp;V.
- An abbreviated word is separated from the proper name it refers to by a non-breaking space.
ул. Щорса (Shchors Street) - ул.&nbsp;Щорса
г. Москва (the city of Moscow) - г.&nbsp;Москва
Метро им. Ленина (the Lenin Metro) - Метро им.&nbsp;Ленина
- A number and its corresponding counter word are separated by a non-breaking space.
12 billion rubles - 12&nbsp;billion rubles
Ch. IV - Ch.&nbsp;IV
pp. 3–6 - pp.&nbsp;3–6
Fig. 42 - Fig.&nbsp;42
XX в. (20th century) - XX&nbsp;в.
1941–1945 гг. (the years 1941–1945) - 1941–1945&nbsp;гг.
Ward № 6 - Ward №&nbsp;6
§ 22 - §&nbsp;22
25 % - 25&nbsp;%
97.5 ‰ - 97.5&nbsp;‰
16 ¢ - 16&nbsp;¢
- A number and its unit of measurement (except the degree, minute and second signs) are separated by a thin space; a line break is forbidden.
400 m - 400&thinsp;m
100 t - 100&thinsp;t
451 °F - 451&thinsp;°F
but 59°, 57′, 00″.
- The degree, minute and second signs are separated from the following digits by a thin space.
59° 57′ 00″ - 59°&thinsp;57′&thinsp;00″
Bear in mind that typographers have no fully established rule about setting off percent signs and currency symbols, so setting a percent sign or currency symbol tight against the number is not a mistake, provided this is done consistently throughout the site. However, we believe that a space improves readability in this case.
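The rules for joining a number with the sign or unit that follows it can be summarized in a short helper. This is a sketch under our own assumptions: the sets `ANGLE_SIGNS` and `NBSP_SIGNS` and the function `join_number_and_unit` are hypothetical, and the percent and currency branch reflects the preference stated above rather than a fixed rule.

```python
ANGLE_SIGNS = {"°", "′", "″"}   # set tight against the number
NBSP_SIGNS = {"%", "¢"}         # set off by a non-breaking space

def join_number_and_unit(number, unit):
    """Join a number and the sign or unit that follows it."""
    if unit in ANGLE_SIGNS:
        return number + unit
    if unit in NBSP_SIGNS:
        return number + "&nbsp;" + unit
    # Ordinary units of measurement are set off by a thin space.
    return number + "&thinsp;" + unit

for number, unit in [("400", "m"), ("451", "°F"), ("25", "%"), ("59", "°")]:
    print(join_number_and_unit(number, unit))
```

Because a line break is allowed at a thin space, the thin-space results still need a no-wrap wrapper (or a non-breaking fallback) to honor the ban on line breaks.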
Numbers and intervals
- The fractional and integer parts of a number are not separated from the decimal mark by a space: 0.62, 345.5.
- Digit groups of a number are separated from each other by a thin space, except in dates, numbers (for example, document numbers) and designations of machines and mechanisms.
25 563.42 - 25&thinsp;563.42
1 652 - 1&thinsp;652
1 298 300 - 1&thinsp;298&thinsp;300
but 1999, GOST 20283, incoming No. 982364
- In numeric ranges, the dash is not set off from the range boundaries.
50–100 m - 50–100&thinsp;m
1 500–2 000 - 1&thinsp;500–2&thinsp;000
1.5–2 thousand - 1.5–2&nbsp;thousand
15–20 % - 15–20&nbsp;%
- The unary plus, minus and plus-minus signs are not set off from the following number: +20 °C, −42, ±0.1.
- Binary signs of mathematical operations and relations are set off on both sides by a thin space.
2 + 3 = 5 - 2&thinsp;+&thinsp;3&thinsp;=&thinsp;5
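The digit-grouping rule in this section lends itself to automation. The sketch below groups the integer part of an unsigned decimal number in threes; `group_digits` is a hypothetical name, and deciding whether a string is a date or a document number (which must not be grouped) is left to the caller.

```python
def group_digits(number, sep="&thinsp;"):
    """Insert `sep` between three-digit groups of the integer part.

    `number` is a string of digits with an optional "." and fraction.
    """
    integer, dot, fraction = number.partition(".")
    groups = []
    while len(integer) > 3:
        groups.insert(0, integer[-3:])   # peel off the last three digits
        integer = integer[:-3]
    groups.insert(0, integer)
    return sep.join(groups) + dot + fraction

for n in ["25563.42", "1652", "1298300", "345.5"]:
    print(group_digits(n))
```

The fractional part is deliberately left untouched, in line with the first rule of this section.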
Punctuation marks
- The period, comma, colon, question mark, exclamation mark and semicolon are not set off by a space from the preceding word and are set off by a space from the following one: Ha, ha. Ha? Ha!
- An ellipsis is not set off from the preceding word if it stands at the end of a sentence or part of a sentence, and not from the following word if it stands at the beginning of a sentence: Wow… What? …Nothing.
- Quotation marks are not set off by spaces from the text they enclose: the battleship "Potemkin".
- Brackets are not set off by spaces from the text they enclose and are set off by spaces on the outside (except when the closing bracket is followed by a punctuation mark): Text in&nbsp;brackets interests no&nbsp;one (usually).
- The dash is set off from the preceding word by a non-breaking space and from the following word by a regular space (including when a range is given in words rather than digits).
Vitenka&nbsp;&mdash; well done!
we just need a cucumber fifteen&nbsp;&mdash; twenty centimeters long
the Molotov&nbsp;&mdash; Ribbentrop Pact
- If two numbers written in words do not form a range but mean "either one number or the other", a hyphen is placed between them and is not set off by spaces: I drank two-three glasses (in Russian, два-три стакана).
There is a recommendation to set the dash off with a thin space, or not to set it off at all after a period, comma or quotation mark. This can be justified when setting printed text in a specific font, since it makes the spacing more uniform. When text is laid out for the web, however, the user's fonts can vary widely, so the space to the left of the dash frequently ends up narrower than the one to the right.
Unwanted line breaks
- Short words and conjunctions (a, and, but, I, you and so on) should be set off from the following word by a non-breaking space, since a short word hanging at the end of a line impairs readability. In particular, it is highly desirable to prevent a line break between the negative particle (не) and the verb that follows it.
- The particles же, бы, ли are preferably set off from the preceding word by a non-breaking space: то&nbsp;же, сказал&nbsp;бы, подумал&nbsp;ли я
- It is advisable not to separate prepositions at the beginning of a sentence from the words that follow them (even prepositions longer than one or two letters).
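The first rule of this section (binding short words to the word that follows) can be applied automatically. A minimal sketch, assuming plain text without markup; the word list `SHORT_WORDS` and the function `bind_short_words` are illustrative and would need a real per-language list in practice.

```python
import re

# Illustrative list only; a production list depends on the language.
SHORT_WORDS = ("a", "and", "but", "I", "you")

def bind_short_words(text, words=SHORT_WORDS):
    """Replace the space after each listed short word with &nbsp;."""
    alternation = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(" + alternation + r") ", re.IGNORECASE)
    return pattern.sub(lambda m: m.group(1) + "&nbsp;", text)

print(bind_short_words("a cat and a dog"))
```

The word boundary `\b` keeps the pattern from firing inside longer words such as "panda", and the match is case-insensitive so that a short word at the start of a sentence is handled as well.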