
Adding bidirectional support to your own Textbox

Introduction


I have long wanted to share my experience of adding bidirectional text support to my own text editor, but what finally pushed me to do it were mercenary considerations. In this article I describe how I integrated GNU FriBidi into my TextBox to support Arabic. I hope it will be useful, since good material on supporting Arabic text is hard to find.

What we had


By the time we needed Arabic support, the TextBox control could already do a lot: edit text, move the cursor, select part of the text, paste, cut, handle multiple lines, align text, and so on. It was no match for Word, of course, but it covered the basics. The TextBox was used in an application for Windows and Mac OS X.

Hi Habra

FreeType was used to get the glyph images and OpenGL for output, although that is not important here. The control also took glyph metrics from FreeType to calculate glyph positions correctly.
FreeType Character Metrics

The task


"Guys, let's add Arabic support, and quickly."
That was the task.

At that time, all I knew about Arabic was that it is written from right to left and that the letters within a word are joined together in a cursive script. In reality it turned out to be a bit more complicated.
This functionality requires a library. The ones commonly used for this purpose are GNU FriBidi, Pango, and HarfBuzz. We chose GNU FriBidi because it seemed the simplest and required the fewest changes.

Some features of the Arabic language


At first glance, Arabic (اللغة العربية) seems very different from Russian or English, but the differences are not as great as they seem. During implementation I ran into the following features:

  1. Arabic is written from right to left, i.e. the first character is the rightmost one. Pressing the Right or Left key still moves the cursor right or left on screen. In Russian, pressing Right increases the cursor's position in the string; in Arabic it decreases it.
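    Del deletes the character after the cursor and Backspace the one before it, counted in typing order. In Arabic the next character sits to the left of the cursor and the previous one to the right, so Del removes the character on the left and Backspace the one on the right.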
    But the fun begins when Arabic and Russian are mixed in one line.
    Move the cursor in mixed text
    This also applies to the selection of the text with the mouse. Try highlighting the text below:
    اللغة العربية Russian language اللغة العربية English language اللغة العربية

  2. The second feature is cursive joining. To make a word look joined up, almost every letter has separate Unicode characters for its different positions in the word: at the beginning, in the middle, and at the end. For those interested, there is a good table on Wikipedia.
  3. Ligatures. If two particular letters follow one another, they can be replaced by a single glyph. For example, the two characters "ل" and "آ" are converted to لآ.
  4. Diacritics. In Russian, the diacritics are the "¨" above ё and the "˘" above й. They cause no particular problems, because they are already baked into the glyphs of those letters: и and й are separate characters in the font, and there is no need to take the breve separately and add it to и to get й. In Arabic, diacritics are far more varied and are not baked into the characters.
    An example of diacritics in Arabic
    The black color represents the letters of the Arabic alphabet, while the gray color represents the vowels (diacritics).
    As you can see in the picture there is one interesting case:
    2 diacritics over one letter
    Two diacritics simultaneously on one letter.
  5. I am sure there are more such features, but we did not run into them and users did not report any.
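
To make the joining feature more concrete, here is a minimal sketch in C of how a positional form could be chosen for a letter. It is not how GNU FriBidi does it internally. The three-letter table, the `JoinKind` and `LetterInfo` types, and the function `shape_positional` are all hypothetical names made up for this illustration. The sketch relies only on the fact that the Arabic Presentation Forms-B block stores the forms of a letter in the order isolated, final, initial, medial.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified joining behaviour: a right-joining letter connects only to the
   letter before it, a dual-joining letter connects on both sides. */
typedef enum { JOIN_RIGHT, JOIN_DUAL } JoinKind;

/* Hypothetical mini-table covering three letters only. `isolated` is the
   first code point of the letter's run in Arabic Presentation Forms-B,
   where the forms follow in the order isolated, final, initial, medial. */
typedef struct {
    uint32_t base;
    JoinKind kind;
    uint32_t isolated;
} LetterInfo;

static const LetterInfo kLetters[] = {
    { 0x0627, JOIN_RIGHT, 0xFE8D }, /* ALEF: isolated and final forms only */
    { 0x0628, JOIN_DUAL,  0xFE8F }, /* BEH */
    { 0x0644, JOIN_DUAL,  0xFEDD }, /* LAM */
};

static const LetterInfo *find_letter(uint32_t cp)
{
    for (size_t i = 0; i < sizeof kLetters / sizeof kLetters[0]; i++) {
        if (kLetters[i].base == cp) {
            return &kLetters[i];
        }
    }
    return NULL; /* not in the table: treated as non-joining */
}

/* Writes the positional form of every letter of text[0..len), given in
   logical (typing) order, into out[0..len). Unknown code points are copied. */
void shape_positional(const uint32_t *text, size_t len, uint32_t *out)
{
    assert(text != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        const LetterInfo *cur = find_letter(text[i]);
        if (cur == NULL) {
            out[i] = text[i];
            continue;
        }
        const LetterInfo *prev = i > 0 ? find_letter(text[i - 1]) : NULL;
        const LetterInfo *next = i + 1 < len ? find_letter(text[i + 1]) : NULL;
        /* The previous letter must be able to connect forward. */
        int joins_prev = prev != NULL && prev->kind == JOIN_DUAL;
        /* This letter must be able to connect forward to a joining letter. */
        int joins_next = cur->kind == JOIN_DUAL && next != NULL;
        /* Offsets: +0 isolated, +1 final, +2 initial, +3 medial. */
        out[i] = cur->isolated + (joins_prev ? 1u : 0u) + (joins_next ? 2u : 0u);
    }
}
```

For the word باب (BEH, ALEF, BEH) the sketch picks the initial form of BEH, the final form of ALEF, and the isolated form of the last BEH, because ALEF does not connect to the letter after it. A real shaper also has to handle ligatures, transparent marks, and the full joining table, which is why a library is used.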

Implementation


How to use GNU FriBidi

Using GNU FriBidi is quite simple. The library takes a string of Unicode characters and, after a series of function calls, returns a Unicode string that takes into account the position of each letter in its word, ligatures, and the position of the letters in the line.
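    int nLength;                              // length of the line in characters
    FriBidiChar *pInputLine;                  // input/output Unicode string
    FriBidiCharType *pBidiTypes;              // bidi type of each character
    FriBidiLevel *pEmbeddingLevels;           // embedding levels
    FriBidiJoiningType *pJtypes;              // joining type of each character
    FriBidiArabicProp *pArProps;              // Arabic joining properties of each character
    FriBidiStrIndex *pPositionLogicToVisual;  // position map, reordered together with the string
    // -------------------------
    // Get the bidi type of every character.
    fribidi_get_bidi_types(pInputLine, nLength, pBidiTypes);
    // Resolve the embedding levels (http://www.unicode.org/reports/tr9/#Resolving_Embedding_Levels).
    // The return value is the maximum level plus one, or 0 on failure.
    FriBidiParType baseDirection = FRIBIDI_PAR_RTL;
    FriBidiLevel resolveParDir = fribidi_get_par_embedding_levels(pBidiTypes, nLength, &baseDirection, pEmbeddingLevels);
    // Get the joining type of every character.
    fribidi_get_joining_types(pInputLine, nLength, pJtypes);
    // Work out how neighbouring Arabic letters join.
    memcpy(pArProps, pJtypes, nLength * sizeof(FriBidiJoiningType));
    fribidi_join_arabic(pBidiTypes, nLength, pEmbeddingLevels, pArProps);
    // Replace the Unicode characters with their shaped variants: mirrored characters,
    // positional (presentation) forms, and ligatures.
    fribidi_shape(FRIBIDI_FLAG_SHAPE_MIRRORING | FRIBIDI_FLAG_SHAPE_ARAB_PRES | FRIBIDI_FLAG_SHAPE_ARAB_LIGA, pEmbeddingLevels, nLength, pArProps, pInputLine);
    // Reorder the line into display order. The map is permuted in place along with
    // the string, so it should hold the identity (map[i] = i) before the call.
    FriBidiLevel res = fribidi_reorder_line(FRIBIDI_FLAGS_ARABIC, pBidiTypes, nLength, 0, baseDirection, pEmbeddingLevels, pInputLine, pPositionLogicToVisual);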

In the TextBox, we added a call to GNU FriBidi before updating the positions of the letters and the cursor.
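
To show what the reordering step does conceptually, here is a small sketch of rule L2 of the Unicode Bidirectional Algorithm, written without FriBidi so that it compiles on its own. The function name `reorder_by_levels` and its parameters are hypothetical. It assumes the embedding levels are already resolved, and it ignores everything else a real implementation handles (trailing whitespace, mirroring, combining marks).

```c
#include <assert.h>
#include <stddef.h>

/* Reverses a[0..n) in place. */
static void reverse_range(size_t *a, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        size_t tmp = a[i];
        a[i] = a[n - 1 - i];
        a[n - 1 - i] = tmp;
    }
}

/* Fills visual_to_logical[v] with the logical index of the character that is
   drawn at visual position v (counted from the left).
   Rule L2: from the highest level down to level 1, reverse every maximal run
   of characters whose level is at least the current level. */
void reorder_by_levels(const unsigned char *levels, size_t len,
                       size_t *visual_to_logical)
{
    assert(levels != NULL && visual_to_logical != NULL);
    unsigned max_level = 0;
    for (size_t i = 0; i < len; i++) {
        visual_to_logical[i] = i; /* start from the identity map */
        if (levels[i] > max_level) {
            max_level = levels[i];
        }
    }
    for (unsigned level = max_level; level >= 1; level--) {
        size_t i = 0;
        while (i < len) {
            if (levels[i] < level) {
                i++;
                continue;
            }
            size_t start = i;
            while (i < len && levels[i] >= level) {
                i++;
            }
            reverse_range(visual_to_logical + start, i - start);
        }
    }
}
```

With a right-to-left base direction, three Arabic letters get level 1 and six following Cyrillic letters get level 2. The sketch then yields the map 3, 4, 5, 6, 7, 8, 2, 1, 0: the Cyrillic word is drawn first, left to right, and the Arabic word follows with its letters in reverse order.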

Changes to existing code

To simplify calculating the positions of the letters, we had to make our data structure slightly more complicated. Initially there was a single list of letters: the cursor moved over it, and the same list was used to calculate the position of each letter.
Position | 0 | 1 | 2 | 3 | 4 | 5
Letter   | П | р | и | в | е | т

For Arabic, we had to keep two lists. The first is the logical storage of the letters, that is, the letters in the order in which the user entered them. The second holds the letters in the order in which they are drawn, from left to right (even for Arabic). This approach also makes paragraph alignment easier to implement.
Example for mixed text:
Mixed text example
Position | Entry order | Display order
0 | خ | О
1 | ط | ш
2 | أ | и
3 | О | б
4 | ш | к
5 | и | а
6 | б | أ
7 | к | ط
8 | а | خ

In essence, GNU FriBidi was used to build the list of letters in display order.
Thus all cursor work (inserting a character, selecting, deleting, moving) was done on the list of letters in input order, while display and alignment used the list in display order. For Russian, by the way, the two lists are identical.
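
As an illustration of how the two lists can be tied together, the sketch below inverts a display-order map so that the drawing slot of the letter under the cursor can be looked up from its input-order index. The function name `invert_map` and the map layout (one entry per letter, holding an index into the other list) are assumptions made for this example, not the data structure of the original TextBox.

```c
#include <assert.h>
#include <stddef.h>

/* visual_to_logical[v] is the input-order index of the letter drawn at slot v.
   Fills logical_to_visual[l] with the drawing slot of the letter that was
   entered at position l. The input must be a permutation of 0..len-1. */
void invert_map(const size_t *visual_to_logical, size_t len,
                size_t *logical_to_visual)
{
    assert(visual_to_logical != NULL && logical_to_visual != NULL);
    for (size_t v = 0; v < len; v++) {
        assert(visual_to_logical[v] < len);
        logical_to_visual[visual_to_logical[v]] = v;
    }
}
```

The cursor itself keeps living in input order: moving to the next letter is always `index + 1`, whatever the script. Only when the caret has to be drawn is the index translated through `logical_to_visual` to find the slot, and from there the pixel position computed from the glyph metrics.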

Result

As a result, we managed to add Arabic support fairly quickly. Everything seemed to work.
We did it!
But then we received a report from an Arabic-speaking user that diacritics were displayed incorrectly. FreeType could not cope with the difficult cases where diacritics are added as separate characters. The information from FreeType alone is not enough, because the position of a diacritic depends on the letter it is attached to.
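
The first half of the problem, working out which letter each diacritic belongs to, can be sketched in a few lines of C. The names `is_arabic_mark` and `assign_bases` are hypothetical, and the sketch only recognises the eight basic marks U+064B to U+0652, which is a subset of the Arabic combining marks. It does not compute where the mark should be drawn; that depends on positioning data in the font, which is outside the scope of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic Arabic vowel marks: U+064B (FATHATAN) through U+0652 (SUKUN).
   This is a deliberately small subset of the combining marks. */
int is_arabic_mark(uint32_t cp)
{
    return cp >= 0x064B && cp <= 0x0652;
}

/* For every code point of text[0..len), in logical order, stores the index
   of the base letter it belongs to in base_index. A base letter points at
   itself, a mark points at the nearest preceding base letter. A mark at the
   very start of the text is treated as its own base.
   Returns the number of clusters (base letter plus its marks). */
size_t assign_bases(const uint32_t *text, size_t len, size_t *base_index)
{
    assert(text != NULL && base_index != NULL);
    size_t clusters = 0;
    size_t current_base = 0;
    for (size_t i = 0; i < len; i++) {
        if (i == 0 || !is_arabic_mark(text[i])) {
            current_base = i;
            clusters++;
        }
        base_index[i] = current_base;
    }
    return clusters;
}
```

For BEH followed by SHADDA and FATHA, both marks point at the BEH, which is exactly the "two diacritics on one letter" case. Once the clusters are known, each mark still needs an offset that depends on the glyph of its base letter and on any mark already stacked there.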

What's next...


Supporting diacritics required a more complicated solution, but that is the topic of the next article. For now I will only say that I used HarfBuzz for it.

Disclaimer


Yes, we are reinventing the wheel: we implemented our TextBox from scratch. We did not use Pango because we had had a bad experience with it before; perhaps it would have been easier with Pango. I don't speak Arabic, so I may have missed something.

useful links


Source: https://habr.com/ru/post/262987/

