Introduction
I have long wanted to share my experience of adding bidirectional text support to my own text editor, though I admit my motives for finally writing are partly self-serving. In this article I describe how I integrated GNU FriBidi into my TextBox control to support Arabic. I hope it will be useful, since good material on supporting Arabic text is hard to find.
What we had
By the time we needed Arabic support, the TextBox control could already do a lot: edit text, move the cursor, select part of the text, cut and paste, handle multiple lines, alignment, and so on. It was no match for Word, of course, but it covered the basics. The TextBox was also used in an application for both Windows and Mac OS X.

FreeType was used to obtain the glyph images and OpenGL to draw them, although that is not important here. The control also took glyph metrics from FreeType in order to position the glyphs correctly.

The task
"Guys, let's add Arabic support, and quickly." That was the task.
At that time, all I knew about Arabic was that it is written from right to left and that the letters within a word are joined together in a cursive script. In reality it turned out to be a bit more complicated.
For this functionality we had to choose a library. The following are commonly used for this purpose: GNU FriBidi, Pango, HarfBuzz. We chose GNU FriBidi because it seemed the simplest and required minimal changes.
Some features of the Arabic language
At first glance, Arabic (اللغة العربية) seems very different from Russian or English, but the differences are smaller than they appear. During implementation I ran into the following features:
- Arabic is written from right to left, so the first character of a line is the rightmost one. The Right and Left arrow keys still move the cursor right and left on screen. The difference is in the string position: in Russian, pressing Right increases the cursor's position in the string, while in Arabic it decreases it.
Del deletes the character that follows the cursor, and Backspace the one before it. In Arabic text this means Del deletes the character to the left of the cursor and Backspace the one to the right.
But the fun begins when Arabic and Russian are mixed in one line.

This also applies to the selection of the text with the mouse. Try highlighting the text below:
اللغة العربية Russian language اللغة العربية English language اللغة العربية
- The second feature is joining. Because the script is cursive, almost every letter has separate Unicode characters for its different positions in a word: at the beginning, in the middle, at the end. For those interested, there is a good table on Wikipedia.
- Ligatures. When two particular letters follow one another, they can be replaced by a single glyph. For example, the two characters "ل" and "آ" are rendered as لآ.
- Diacritics. In Russian, the diacritics are the "¨" above ё and the "˘" above й. They cause no particular problems because they are already baked into the glyphs of these letters. That is, ё and й are separate characters in the font, and there is no need to take the breve separately and add it to и to get й. In Arabic, diacritics are far more varied and are not baked into the glyphs.

The black color represents the letters of the Arabic alphabet, while the gray color represents the vowels (diacritics).
As you can see in the picture there is one interesting case:

Two diacritics simultaneously on one letter.
- I am sure there are other peculiarities, but we did not run into them and users did not report any.
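The cursor rules from the first item can be sketched in a few lines. This is a minimal illustration, assuming a line that runs in a single direction; the names `Key`, `moveCursor`, `pressDelete` and `pressBackspace` are made up for this example and are not taken from the actual TextBox code.

```cpp
#include <cstddef>
#include <string>

enum class Key { Left, Right };

// Returns the new logical cursor position (0..length) after an arrow key.
// Assumption: the whole line has one direction, given by `rtl`.
std::size_t moveCursor(std::size_t pos, std::size_t length, Key key, bool rtl) {
    // In a left-to-right line, Right moves forward through the string;
    // in a right-to-left line, Right moves backward.
    const bool forward = (key == Key::Right) != rtl;
    if (forward) return pos < length ? pos + 1 : pos;
    return pos > 0 ? pos - 1 : pos;
}

// Del removes the character that logically follows the cursor.
void pressDelete(std::u32string& text, std::size_t pos) {
    if (pos < text.size()) text.erase(pos, 1);
}

// Backspace removes the character that logically precedes the cursor
// and returns the new cursor position.
std::size_t pressBackspace(std::u32string& text, std::size_t pos) {
    if (pos == 0) return 0;
    text.erase(pos - 1, 1);
    return pos - 1;
}
```

Because deletion works on logical positions, the same two functions serve both directions; only the arrow keys need to know which way the text runs. Mixed lines need more than this sketch offers.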
Implementation
How to use GNU FriBidi
Using GNU FriBidi is quite simple. The library takes a string of Unicode characters and, after a few function calls, returns a Unicode string that accounts for each letter's position in its word, ligatures, and the position of the letters in the line.
int nLength;
In the TextBox, I added a call to GNU FriBidi before updating the positions of the letters and the cursor.
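To give a feel for the "position of the letters in the word" part of that work, here is a toy version of choosing a contextual form for each letter. It is not FriBidi code and does not use its API: the table covers only four letters, and ligatures and diacritics are ignored. The names `Forms`, `kForms` and `shape` are invented for this sketch; the code points come from the Unicode Arabic Presentation Forms-B block.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Presentation forms of one letter. `joinsNext` is false for letters such
// as alef that never connect to the letter that follows them.
struct Forms {
    char32_t isolated, final_, initial, medial;
    bool joinsNext;
};

// Deliberately tiny table: alef, beh, lam, meem.
// Alef has no initial/medial form; those slots repeat the other two
// and are never selected because joinsNext is false.
const std::map<char32_t, Forms> kForms = {
    {0x0627, {0xFE8D, 0xFE8E, 0xFE8D, 0xFE8E, false}},  // alef
    {0x0628, {0xFE8F, 0xFE90, 0xFE91, 0xFE92, true}},   // beh
    {0x0644, {0xFEDD, 0xFEDE, 0xFEDF, 0xFEE0, true}},   // lam
    {0x0645, {0xFEE1, 0xFEE2, 0xFEE3, 0xFEE4, true}},   // meem
};

// Replaces each known letter with the form that matches its neighbours.
// Input and output are both in logical (typing) order.
std::u32string shape(const std::u32string& text) {
    std::u32string out = text;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const auto cur = kForms.find(text[i]);
        if (cur == kForms.end()) continue;  // unknown character: leave as is

        bool joinPrev = false;
        if (i > 0) {
            const auto prev = kForms.find(text[i - 1]);
            joinPrev = prev != kForms.end() && prev->second.joinsNext;
        }
        const bool joinNext = cur->second.joinsNext && i + 1 < text.size() &&
                              kForms.count(text[i + 1]) > 0;

        const Forms& f = cur->second;
        out[i] = joinPrev ? (joinNext ? f.medial : f.final_)
                          : (joinNext ? f.initial : f.isolated);
    }
    return out;
}
```

For the word باب (beh, alef, beh) this picks the initial form of the first beh, the final form of alef, and the isolated form of the last beh, because alef does not connect to the letter after it.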
Changes to existing code
To simplify calculating the letter positions, we had to complicate our data structure a little. Originally there was a single list of letters: the cursor moved along it, and the same list was used to calculate the position of each letter.
Position | 0 | 1 | 2 | 3 | 4 | 5 |
Letter | П | р | и | в | е | т |
For Arabic, we had to keep two lists. The first is the logical storage: the letters in the order the user typed them. The second holds the letters in drawing order, from left to right (even for Arabic). This approach also makes paragraph alignment easier to implement.
Example for mixed text:

Position | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Entry order | خ | ط | أ | О | ш | и | б | к | а |
Display order | О | ш | и | б | к | а | أ | ط | خ |
By and large, GNU FriBidi was used to build the list of letters in display order.
Thus all cursor work (inserting a character, selecting, deleting, moving) was done on the list in input order, while display and alignment used the list in display order. For Russian text, incidentally, the two lists are identical.
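The two lists can be sketched as follows. This is a simplified stand-in, not the Unicode Bidirectional Algorithm that FriBidi implements: it assumes a right-to-left paragraph and treats the basic Arabic block as right-to-left and everything else (including spaces) as left-to-right. `TextLine`, `isRtl` and `visualOrder` are hypothetical names, not the real TextBox types.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Assumption: only U+0600..U+06FF counts as right-to-left.
// Real bidi character classes are much richer.
bool isRtl(char32_t c) { return c >= 0x0600 && c <= 0x06FF; }

// Returns, for each visual slot (0 is the leftmost), the logical index of
// the letter drawn there. Assumes a right-to-left paragraph.
std::vector<std::size_t> visualOrder(const std::u32string& logical) {
    std::vector<std::size_t> order(logical.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;

    // Step 1: reverse every left-to-right run in place.
    std::size_t i = 0;
    while (i < logical.size()) {
        if (isRtl(logical[i])) { ++i; continue; }
        std::size_t j = i;
        while (j < logical.size() && !isRtl(logical[j])) ++j;
        std::reverse(order.begin() + i, order.begin() + j);
        i = j;
    }
    // Step 2: reverse the whole line. LTR runs are restored to reading
    // order, RTL runs are flipped, and the runs themselves swap sides.
    std::reverse(order.begin(), order.end());
    return order;
}

struct TextLine {
    std::u32string logical;           // typing order; the cursor works here
    std::vector<std::size_t> visual;  // logical indices in drawing order

    void rebuild() { visual = visualOrder(logical); }

    // Letters as they appear on screen, left to right.
    std::u32string displayString() const {
        std::u32string out;
        for (std::size_t idx : visual) out.push_back(logical[idx]);
        return out;
    }
};
```

For the mixed example in the table above, the letter typed first ends up in the rightmost slot and the first Russian letter in the leftmost one. Storing indices rather than copies of the letters in the second list means any visual slot can be traced back to the logical position the cursor uses.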
Result
As a result, we managed to add Arabic support fairly quickly. Everything seemed to work.

But then an Arabic-speaking user reported that diacritics were displayed incorrectly. FreeType could not cope with the difficult cases where diacritics are added as separate marks. The information FreeType provides is not enough, because the position of a diacritic depends on the letter it is attached to.
What's next...
Supporting diacritics required a more complex solution, but that is the topic of the next article. I will say just one thing: for that I used HarfBuzz.
Disclaimer
Yes, we are reinventing the wheel: we implemented our TextBox from scratch. We did not use Pango because we had a bad experience with it earlier. Maybe it would have been easier with Pango. I do not speak Arabic, so I may have missed something.
Useful links