
BIDI (Unicode Bidirectional Algorithm)

Multilingual sites are good, but rather tedious to build. For the most popular languages it is enough to have several variants of the text, but once RTL (right-to-left) languages are added, things get much worse. We have to create a separate set of styles that swaps right and left everywhere (for properties like float, padding, margin, etc.), and that is not all. Phrases in languages with different directions may appear side by side in the same document, and this is where bidi comes into play. If anyone is interested, read on.


Bidi


When a page contains text in languages with different writing directions (RTL and LTR), the characters are usually not displayed in the order in which they are stored in the browser's memory. Bidi is responsible for putting them in their places. The display order depends on the so-called base direction of the text. It is set by the dir attribute, which can be specified on most tags. The easiest way to make the base direction of the entire page RTL is to add the attribute to the root element: <html dir="rtl">. In addition, most Unicode characters have a direction of their own. On this basis they are divided into strongly typed, weakly typed, and untyped (neutral) characters.
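You can check this classification with Python's standard `unicodedata.bidirectional`, which returns a character's bidi class. The sketch below is only an illustration and is not part of the original text. It groups the classes the way UAX #9 lists them (strong: L, R, AL; weak: EN, ES, ET, AN, CS, NSM, BN; neutral: B, S, WS, ON). The helper name `bidi_group` is made up for this example.

```python
import unicodedata

# Grouping of bidi classes as listed in UAX #9 (Unicode Bidirectional Algorithm).
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}
EXPLICIT = {"LRE", "LRO", "RLE", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def bidi_group(ch: str) -> str:
    """Map one character to 'strong', 'weak', 'neutral', 'explicit' or 'unknown'."""
    cls = unicodedata.bidirectional(ch)
    for name, group in (("strong", STRONG), ("weak", WEAK),
                        ("neutral", NEUTRAL), ("explicit", EXPLICIT)):
        if cls in group:
            return name
    return "unknown"  # e.g. unassigned code points, whose class is ''


if __name__ == "__main__":
    for ch in ["a", "م", "1", " ", "!"]:
        print(repr(ch), unicodedata.bidirectional(ch), bidi_group(ch))
```

Latin letters report L, and Arabic letters report AL ("Arabic letter"). The algorithm turns AL into R early on (rule W3), so for ordering purposes these letters behave as RTL.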

Strongly typed characters

These are most letters. A sequence of LTR characters is displayed one after another from left to right, and a sequence of RTL characters from right to left:
[image: a sequence of LTR characters and a sequence of RTL characters]

A phrase made of several pieces with different directions is displayed in several passes. The direction of each piece does not depend on the base direction, only on the direction of its characters. Therefore, for example, Arabic words are read from right to left in both the RTL and the LTR version of the site. What changes is the order of these passes. In the following example, you will first read bahrain, then مصر, then kuwait:
[image: "bahrain مصر kuwait" with an LTR base direction]

If we add dir="rtl" to any parent element, the base direction, and hence the word order, changes, but the words themselves are read the same way as before:

[image: the same phrase with dir="rtl" on a parent element]

Note that the only difference between the code of these two pages is the dir attribute; the word order in the source is the same.

Non-typed characters

Spaces and punctuation marks are untyped, because they can appear in text in any language; they are called neutral. From bidi's point of view, a neutral character located between two strongly typed characters of the same direction takes that direction: a space between two RTL characters also becomes RTL. Thanks to this, the three Arabic words in the following example are displayed in a single pass:

[image: three Arabic words displayed in a single pass]

But when a neutral character stands between characters of different directions, it takes the base direction, which is not always correct. Suppose the title is taken from a database and must end with an exclamation mark:

 The title is <?php echo $title ?>! in <?php echo $lang ?>. 

For an Arabic title, the correct result would have the exclamation mark moved to the end of the title, that is, to its left:

[image: the desired rendering, with the exclamation mark at the end of the Arabic title]

But bidi will not do that. The exclamation mark (like the space next to it) sits between characters of different directions, so it takes the base direction and is displayed in the next, LTR, pass. That is, the first pass displays "The title is", the second "مفتاح معايير الويب", and the third "! in Arabic":

[image: the actual rendering, with the exclamation mark displayed in the LTR pass]

As a result, for Arabic-speaking visitors the title begins with the exclamation mark instead of ending with it.
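To make the "passes" concrete, here is a toy model of the reordering, written for this retelling rather than taken from the W3C article. It is a heavily simplified sketch of UAX #9 and assumes a single line containing only letters and neutral characters. Neutrals are resolved by the rule described above. Characters against the base direction get an embedding level one higher. Runs are then reversed from the highest level down (rule L2). The function names are hypothetical.

```python
import unicodedata


def strong_dir(ch: str):
    """'L' or 'R' for strongly typed characters, None for everything else."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):
        return "R"
    return None


def visual_order(text: str, base: str = "L") -> str:
    """Toy bidi reordering of one line; base is 'L' (dir="ltr") or 'R' (dir="rtl").

    Simplifying assumptions: only letters and neutral characters (no digits
    or other weak types), no embeddings or isolates, no bracket mirroring.
    """
    dirs = [strong_dir(c) for c in text]

    # Neutrals: take the direction of the nearest strong neighbours if both
    # agree, otherwise the base direction (line edges count as the base).
    resolved = []
    for i, d in enumerate(dirs):
        if d is None:
            before = next((x for x in reversed(dirs[:i]) if x), base)
            after = next((x for x in dirs[i + 1:] if x), base)
            d = before if before == after else base
        resolved.append(d)

    # Embedding levels: the paragraph level is 0 for LTR and 1 for RTL;
    # text against the base direction goes one level up.
    para = 0 if base == "L" else 1
    levels = [para + 1 if d != base else para for d in resolved]

    # Reorder: from the highest level down to 1, reverse every maximal run
    # of characters whose level is at least the current one.
    chars = list(text)
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)


if __name__ == "__main__":
    phrase = "bahrain مصر kuwait"
    print(visual_order(phrase, "L"))  # left to right on screen: bahrain, مصر, kuwait
    print(visual_order(phrase, "R"))  # left to right on screen: kuwait, مصر, bahrain
```

The model also reproduces the exclamation-mark case. For `visual_order("The title is مفتاح! in Arabic")`, the "!" stays to the right of the Arabic word, which is where an RTL reader starts. Adding a U+200F right-to-left mark after the "!" moves it to the left. Real engines additionally handle numbers, embeddings and brackets.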

Mirrored characters

These are all kinds of brackets, and they deserve a separate mention, because bidi not only moves them but also mirrors them. An opening bracket always "opens" in the direction it resolves to, and a closing bracket in the opposite direction. Here is how it looks:
[image: brackets around text of different directions]
The rules are the same: in the top line the closing bracket sits on the border between directions, so it takes the base direction, and is therefore moved and mirrored. Not pretty.
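Mirroring can also be sketched. Python's `unicodedata.mirrored` only reports whether a character has the Bidi_Mirrored property; it does not give the mirror image. The small bracket table below is therefore a hand-written subset of Unicode's BidiMirroring data, an assumption for this example rather than the full list. The hypothetical `render_rtl_run` lays out a run that has already been resolved to RTL. It reverses the run and swaps each mirrored bracket for its counterpart, as UAX #9 rule L4 prescribes for characters at RTL levels.

```python
import unicodedata

# Hand-written subset of BidiMirroring.txt (assumption: brackets only).
MIRROR = {"(": ")", ")": "(", "[": "]", "]": "[",
          "{": "}", "}": "{", "<": ">", ">": "<"}


def render_rtl_run(run: str) -> str:
    """Display order of a run resolved to RTL: reversed, brackets mirrored."""
    out = []
    for ch in reversed(run):
        if unicodedata.mirrored(ch):
            ch = MIRROR.get(ch, ch)  # mirrored chars missing from the table stay as-is
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # The logical "(" ends up on the right but is drawn as ")", so it still
    # "opens" toward the Arabic text it encloses.
    print(render_rtl_run("(مصر)"))
```

Since Unicode 6.3 the algorithm also has a paired-bracket rule (N0) that resolves matching brackets together. Newer engines may therefore render the example above differently from the browsers of that time.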

Weakly typed characters

Digits are treated as LTR characters, but they do not affect neutral ones:

[image: numbers between Arabic words]

Since the numbers in this example are between Arabic characters, they are displayed during the RTL pass, but from left to right and without affecting the spaces. There are also subtleties: for example, the dollar sign is considered part of the number rather than a neutral character, but I know little about them.
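In Unicode terms, the digits 0-9 have the weak class EN (European Number), Arabic-Indic digits have AN (Arabic Number), and the dollar sign belongs to ET (European Number Terminator). Rule W5 of UAX #9 turns a sequence of ET characters adjacent to a European number into EN, which is why "$100" behaves as one number. The function below is a hypothetical, simplified sketch of W5 alone. The real algorithm applies other weak-type rules before and after it.

```python
import unicodedata


def apply_w5(text: str) -> list:
    """Bidi classes of text after a simplified UAX #9 rule W5.

    A maximal run of ET characters ('$', '%', '#', ...) that touches an EN
    digit on either side is reclassified as EN, i.e. it joins the number.
    """
    classes = [unicodedata.bidirectional(c) for c in text]
    i = 0
    while i < len(classes):
        if classes[i] != "ET":
            i += 1
            continue
        j = i
        while j < len(classes) and classes[j] == "ET":
            j += 1
        touches_en = (i > 0 and classes[i - 1] == "EN") or (
            j < len(classes) and classes[j] == "EN")
        if touches_en:
            classes[i:j] = ["EN"] * (j - i)
        i = j
    return classes


if __name__ == "__main__":
    for sample in ["$100", "$ 100", "50%", "٣٤"]:
        print(sample, apply_w5(sample))
```

A "$" separated from the digits by a space stays ET, because a space is not part of the terminator run.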

What to do?


Problems apparently arise at the border between texts of different directions and involve neutral characters. Bidi cannot determine the direction of such a character on its own and falls back to the base direction. The simplest solution is to wrap the problem area in a <span> with the appropriate dir. This removes the ambiguity, but it requires knowing in advance the direction of the text that will end up in that place.
HTML5 adds another possible value for the dir attribute, auto, and even a whole new element, <bdi>. With either of them, bidi finds the first strongly typed character inside the element and takes its direction as the base direction, which solves most problems.
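The "first strongly typed character" rule is easy to sketch. The function below is a hypothetical Python approximation of what dir="auto" and <bdi> do, corresponding to rules P2 and P3 of UAX #9. It ignores details of the real algorithms, such as skipping isolates or nested elements that have their own dir. A server-side template could use it to emit an explicit dir where the browser does not support dir="auto". Falling back to "ltr" when no strong character is found is an assumption of this sketch.

```python
import html
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """'ltr' or 'rtl' according to the first strongly typed character, else None."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return None  # only weak and neutral characters, e.g. "123 !"


def wrap_with_dir(text: str) -> str:
    """Hypothetical helper: wrap text in a span with an explicit dir attribute."""
    direction = first_strong_direction(text) or "ltr"  # assumed fallback
    return f'<span dir="{direction}">{html.escape(text)}</span>'


if __name__ == "__main__":
    print(wrap_with_dir("مفتاح معايير الويب!"))
```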
Besides tags, you can use special directional marks. These are non-printing characters that act like a strongly typed LTR or RTL character. In HTML you can use the entities &lrm; and &rlm;.
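In terms of characters, &lrm; and &rlm; are U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK, whose bidi classes are L and R respectively. Below is a hypothetical Python counterpart of the PHP template from above that puts a mark after the exclamation mark. Like the <span dir> approach, it assumes the direction of the title is known in advance.

```python
import unicodedata

LRM = "\u200e"  # the character behind &lrm;
RLM = "\u200f"  # the character behind &rlm;


def title_line(title: str, lang: str, title_dir: str) -> str:
    """Hypothetical equivalent of: The title is <?php echo $title ?>! in <?php echo $lang ?>.

    A mark with the title's own direction after "!" puts the exclamation mark
    between two strong characters of the same direction, so it stays with
    the title instead of taking the base direction.
    """
    mark = RLM if title_dir == "rtl" else LRM
    return f"The title is {title}!{mark} in {lang}."


if __name__ == "__main__":
    print(unicodedata.bidirectional(LRM), unicodedata.bidirectional(RLM))
    print(title_line("مفتاح معايير الويب", "Arabic", "rtl"))
```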

Here is a small test:
Line | IE 10 | Opera 12.15 | FF 21 | Chrome 27.0.1453.94 m
Title is مفتاح معايير الويب! in Arabic | - | - | - | -
Title is مفتاح معايير الويب!&rlm; in Arabic | + | + | + | +
Title is <span dir="rtl">مفتاح معايير الويب!</span> in Arabic | + | + | + | +
Title is <span dir="auto">مفتاح معايير الويب!</span> in Arabic | - | - | + | +
Title is <bdi>مفتاح معايير الويب!</bdi> in Arabic | - | - | + | +
A plus means that the exclamation mark has moved to the left, i.e. it is displayed as part of the Arabic text; a minus means it has not.

One more fly in the ointment


Input fields are another place where characters of different directions can get mixed up.
Say we have an RTL site with a field for entering a phone number in free format. If you type the digits without separators, or with hyphens, everything is fine, but the way spaces behave is rather amusing. Here is an example. Also try enclosing a couple of digits in parentheses. Why does this happen? For the same reason as before. Digits are LTR characters, but because they are weakly typed they do not affect neutral characters, so the spaces take the base direction (RTL). Instead of one LTR string, which would be displayed in a single pass, we get several "words" separated by spaces, and bidi lays them out accordingly: the "words" go from right to left, while the characters inside each of them go from left to right. If we add a strongly typed LTR character at the beginning, it makes the neutral characters LTR as well, and the problem disappears. The only solution I see is to process the user input with JS somehow, for example by adding an &lrm; mark in front of each neutral character. If someone suggests a more elegant solution, I will be grateful.
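The idea of inserting &lrm; in front of each neutral character can be sketched as follows. It is written in Python for consistency with the other examples; in the browser it would be a JavaScript input handler. The function name is hypothetical, and "neutral" here means the bidi classes B, S, WS and ON. A hyphen has the weak class ES, so it is left alone, which matches the observation that hyphens cause no trouble.

```python
import unicodedata

LRM = "\u200e"
NEUTRAL = {"B", "S", "WS", "ON"}  # paragraph/segment separators, spaces, other neutrals


def add_lrm_before_neutrals(value: str) -> str:
    """Put an LRM in front of every neutral character of an input value.

    Existing LRMs are removed first, so the function can run on every input
    event without marks piling up.
    """
    clean = value.replace(LRM, "")
    return "".join(
        LRM + ch if unicodedata.bidirectional(ch) in NEUTRAL else ch
        for ch in clean
    )


if __name__ == "__main__":
    print(repr(add_lrm_before_neutrals("(050) 123 45")))
```

One caveat is that the marks become part of the submitted value, so the server would have to strip them. A single strong LTR character at the start of the value, as mentioned above, is a lighter-weight variant of the same idea.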

upd:
T2L proposed this solution: <input type="text" value="123456" dir="ltr" style="text-align: right" />
It simply sets the base direction to LTR and aligns the content to the right. I think this is the best option for fields that are not expected to contain strongly typed RTL characters.

Conclusion


The algorithm is good but not perfect, and this has to be taken into account. As for the input fields, I asked a question on Habr, but received only one answer (a very detailed one, though; thanks, yogev_ezra). The text is based on an article from the W3C that explained everything to me. I did not have the energy for a full translation, so I decided to retell it. The pictures are from the same source. Pointing out any errors is welcome.

PS
And thanks to gribozavr for the hint.

Source: https://habr.com/ru/post/181123/

