Entering text in Linux (ibus)

If your keyboard is labeled with Latin or Cyrillic letters and you need to type in another language, especially one with a complex, non-alphabetic script, this note on Linux input systems (or simply "keyboard layouts") may interest you.

I apologize in advance for fuzzy terminology and do not claim to give an exhaustive technical description. The main goal of the article is to describe the possibilities, not the implementation.

Input methods

The main input method (IM) in Linux is XKB. It is installed by default and is active immediately after the operating system is installed. XKB is designed for alphabetic scripts and cannot handle complex scripts such as Chinese characters or the syllabaries of India and Africa. The system can be configured with no more than 4 layouts at a time. This last limitation can be worked around by binding to hotkeys a command call with the desired combination of parameters for each language.
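A minimal sketch of the hotkey workaround, assuming setxkbmap is the command being bound. The helper name layout_args and the language table are hypothetical and only illustrate the idea; adjust them to your own languages.

```shell
#!/bin/sh
# Sketch: one helper per desktop hotkey, so the 4-layout limit no longer matters.
# layout_args and the table below are hypothetical, not part of XKB.

# Print the setxkbmap arguments for a language code.
layout_args() {
    case "$1" in
        en) echo "-layout us" ;;
        ru) echo "-layout ru -variant winkeys" ;;
        es) echo "-layout es" ;;
        *)  echo "unknown language: $1" >&2; return 1 ;;
    esac
}

# A hotkey would then run, for example:
#   setxkbmap $(layout_args ru)
```

Keeping the table in one function means each hotkey differs only in the language code it passes.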

If more flexibility is needed, turn to an input method framework. The main frameworks of this kind in Linux are IBus, SCIM and Fcitx. A framework cannot enter text by itself; individual scripts are connected as plug-ins (engines). From my experience with IBus and Fcitx, both support roughly the same number of plug-ins, and often these are nearly the same plug-ins. For example, the Pinyin input method for Chinese is implemented as the stand-alone libpinyin library and offers identical capabilities whether it is connected through IBus or Fcitx.

We can assume that over the last 6-7 years the differences between the frameworks have leveled off, although each may still have some features of its own. Below I list the main IBus plug-ins, since IBus is the system I know best.

First, IBus can transparently use XKB and all of its features. The only problem is that IBus cannot generate XKB configurations dynamically. The most popular ones are pre-registered in the file /usr/share/ibus/component/simple.xml, which can be modified and extended as necessary. (When IBus is updated, the file is replaced with the standard one.)

For example, the Russian layout is described as follows:

 <engine>
     <name>xkb:ru::rus</name>
     <language>ru</language>
     <license>GPL</license>
     <author>Peng Huang &lt;shawn.p.huang@gmail.com&gt;</author>
     <layout>ru</layout>
     <longname>Russian</longname>
     <description>Russian</description>
     <icon>ibus-keyboard</icon>
     <rank>99</rank>
 </engine>

Besides layout, you can specify layout_variant; the other setxkbmap parameters are not available. This rules out, among others, the well-known typographic layout by Ilya Birman, which XKB enables with the option misc:typo. To work around this limitation, or simply to create a layout for your own needs, you have to describe the layout in full. To do so, create a custom file in the /usr/share/X11/xkb/symbols folder (changes made to existing files will be overwritten when the system is updated) and define the layout there. For example, Russian with Ilya Birman's additions:

 partial alphanumeric_keys
 xkb_symbols "ru-typo" {
     include "ru(winkeys)"
     include "typo(base)"
     include "level3(ralt_switch)"

     // 1st keyboard row
     key <TLDE> { [ NoSymbol, NoSymbol, U0301, NoSymbol ] }; // "~"
 };

Here the include lines assemble the configuration from ready-made templates: the "winkeys" variant of the Russian layout is taken from the "ru" file, then supplemented with the "base" layout from the "typo" file, and finally AltGr is set as the third-level switch (see the "level3" file). This is equivalent to the command:

 setxkbmap -layout ru -variant winkeys -option lv3:ralt_switch,misc:typo

If you wish, you can make your own changes. In the example above, the accent mark U+0301 (Combining Acute Accent) is placed on the AltGr + ~ combination. Positions marked NoSymbol keep the definitions from the preceding templates: "ё" and "Ё" from "winkeys", "≈" from "typo":

 key <TLDE> { [ Cyrillic_io, Cyrillic_IO, NoSymbol, NoSymbol ] }; // winkeys
 key <TLDE> { [ NoSymbol,    NoSymbol,    NoSymbol, approxeq ] }; // typo
 key <TLDE> { [ NoSymbol,    NoSymbol,    U0301,    NoSymbol ] }; // custom
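The way the three definitions combine can be illustrated with a toy model. This is only a sketch of the rule described here (a later definition wins unless it says NoSymbol), not how xkbcomp is actually implemented; merge_levels is a hypothetical helper.

```shell
#!/bin/sh
# Toy model of the override rule, not the real XKB compiler.
# Each argument is one definition of a key: its levels separated by spaces.
# Later definitions win, except in positions where they say NoSymbol.
merge_levels() {
    printf '%s\n' "$@" | awk '
        {
            for (i = 1; i <= NF; i++)
                if ($i != "NoSymbol")
                    level[i] = $i
            if (NF > n)
                n = NF
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s%s", ((i in level) ? level[i] : "NoSymbol"), (i < n ? " " : "\n")
        }'
}

# winkeys, then typo, then the custom line
merge_levels \
    "Cyrillic_io Cyrillic_IO NoSymbol NoSymbol" \
    "NoSymbol NoSymbol NoSymbol approxeq" \
    "NoSymbol NoSymbol U0301 NoSymbol"
# prints: Cyrillic_io Cyrillic_IO U0301 approxeq
```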

Next, the new layout should be added to the /usr/share/ibus/component/simple.xml file as follows:

 <engine>
     <name>xkb:ru:typo:rus</name>
     <language>ru</language>
     <layout>custom,us</layout>
     <layout_variant>ru-typo,</layout_variant>
     <longname>Russian (with Typo)</longname>
     <description>Russian (with Typo)</description>
     <icon>ibus-keyboard</icon>
     <rank>1</rank>
 </engine>

Here custom is the name of the file in the /usr/share/X11/xkb/symbols folder, and ru-typo refers to the layout it contains. The additional us layout is specified so that hotkeys (Ctrl + C, Ctrl + V, etc.) work correctly. After restarting IBus (ibus restart), the new "Russian (with Typo)" layout will appear in the settings.

The second input method is m17n, a fairly rich library of keyboard layouts for a variety of scripts. IBus has a similar input method of its own, ibus-table, which is described as "slightly less powerful". I have only had to use the latter, to create a layout with a one-to-one correspondence between Latin letters and the letters of the target alphabet, without any complex logic, so I cannot judge which of the two layout description formats, m17n or ibus-table, is more functional and expressive. ibus-table includes a curious "LaTeX" layout for entering characters in the corresponding notation: "\Delta" for "Δ", "\ge" for "≥", etc.

The next universal input method is KMFL, the Linux version of the Keyman input method for Windows. It is not a very common IM, but it supports the rarest scripts. Unlike the original Keyman, which claims the ability to type in more than 1000 scripts, KMFL is less advanced, but it can still be useful. Layouts are described in a text format, and there is a program for creating them under MS Windows. I use the EuroLatin layout, in which the text "2//3" is converted into the fraction "⅔" and the sequence "-a" into "ā", a letter with a macron. This resembles the Compose key in XKB but does not require a separate modifier: KMFL recognizes the sequence as you type.

The remaining input methods specialize in individual scripts: "ibus-libpinyin" for Chinese, "ibus-unikey" for Vietnamese, etc. The settings for these plug-ins are also in /usr/share/ibus/component/. In the corresponding files you may need to specify a base keyboard layout; otherwise the engines will not work when you switch to them from a non-Latin layout. For example, in libpinyin.xml, find the "layout" parameter and enter "us" for a QWERTY keyboard, "fr" for AZERTY, etc.

 <layout>us</layout>

Switch Layouts

Most of the time I work with language pairs: Russian-English, Chinese-Spanish, etc. I therefore prefer to have one hotkey (CapsLock) that toggles between the two most recently used layouts, while the layouts themselves are selected with separate hotkeys (Win + 1 ... 9 on the numeric keypad). So I first choose the working layouts with Win + 1 (en) and Win + 2 (ru), and then switch between them with CapsLock (en <-> ru).

Two shortcut keys can be set in IBus: one for cycling through the list of layouts, the other for toggling between the last two layouts. You can also select the desired layout from the console and, accordingly, assign such a script to a hotkey.
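A minimal sketch of such a console script, assuming the "ibus engine NAME" command is used to activate an engine. The helper engine_for_slot and the slot table are hypothetical; the engine names on your system can be listed with "ibus list-engine".

```shell
#!/bin/sh
# Sketch: map a hotkey slot (Win + 1 ... 9) to an IBus engine name.
# engine_for_slot and the table are hypothetical; check the real names
# with "ibus list-engine" before using them.
engine_for_slot() {
    case "$1" in
        1) echo "xkb:us::eng" ;;
        2) echo "xkb:ru::rus" ;;
        3) echo "libpinyin" ;;   # assumed engine name of ibus-libpinyin
        *) echo "no engine for slot: $1" >&2; return 1 ;;
    esac
}

# A hotkey such as Win + 2 would then run:
#   ibus engine "$(engine_for_slot 2)"
```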

Note that remapping CapsLock with xmodmap does not work, because IBus resets those settings. I therefore prefer to redefine CapsLock globally as F14 through udev (file /etc/udev/hwdb.d/90-custom-keyboard.hwdb):

 evdev:input:b0003v1A2Cp0E24* # my keyboard id
  KEYBOARD_KEY_70039=f14 # bind capslock to f14

And use F14 as the hot key in IBus. In my experience, this provides the most stable configuration.

For more information on configuring udev, see the end of the article.

Virtual keyboard

Mass-produced keyboards labeled for a specific script exist only for languages with a large number of users, for example Russian (JCUKEN, ЙЦУКЕН). In neither Armenia nor Georgia can you buy a keyboard whose keys are labeled with the letters of the national alphabet. Similarly, people in Kazakhstan and Uzbekistan use Russian-English keyboards and have to memorize the positions of the letters that are missing from the standard Latin or Cyrillic alphabet.

If you are learning a new layout, I advise you to use a virtual keyboard. I like Onboard because it adapts itself to the active layout and updates when you switch to another one. However, this works only with XKB (including XKB used through IBus).

Onboard is very convenient for testing xkb layouts and allows you to see the assigned characters on all layers (AltGr, etc.).

Conclusion

Not all programs support input method frameworks correctly. In particular, Sublime Text 3 works only with SCIM; with IBus it prints only Latin letters, regardless of the chosen layout.

I have been using IBus for a long time and know the other systems only superficially. According to reviews on the Internet, Fcitx is more functional and better suited to entering Chinese text. In any case, IBus suits me completely for Chinese texts, and the differences should not be fundamental. The last time I had to use Fcitx (2 years ago), it did not allow switching layouts when the cursor was not in a text field. I hope this defect has been fixed by now.

Another tool for working with a variety of scripts is a silicone keyboard overlay. Chinese online stores offer overlays (保护膜 or 键盘膜) for the Apple Magic Keyboard for a wide variety of scripts. Keep in mind, though, that three generations of the Apple Magic Keyboard have been released (each in versions for the USA, Europe and Japan), and that Chinese replicas differ in dimensions and key layout. Sometimes I regret that there is no single standard for computer keyboards.

Key Signal Transformation Summary

The numeric code of the pressed key changes its value several times.

scancode: When you press a key, the keyboard (or the driver?) sends a scancode to the Linux kernel.
keycode: Next, the kernel converts the scancode to a keycode (Linux input API subsystem). You can control this conversion with udev, keyfuzz or setkeycodes.
keysym: The X Window System receives the keycode from the kernel and translates it into a keysym, which is the final character that the client program receives as input. This conversion is configured through XKB or xmodmap (deprecated).

The sequence above shows that remapping keys at the scancode > keycode stage is preferable, since it does not interfere with XKB.

Udev setup instructions

The scancode is translated into a keycode for each input device independently, so you first need to find out the unique identifier of the keyboard (in fact, evdev also handles a large class of peripheral devices that have buttons, from mice to printers and webcams). Arch Linux users can use the following script (for other distributions, you may need to adjust the paths):

 #!/bin/sh
 # Print the evdev match string for every device in /dev/input/by-id
 for DEVICE in /dev/input/by-id/*; do
     echo "$(basename "$DEVICE")"
     DEVID=$(basename "$(readlink "$DEVICE")")
     printf "evdev:input:b%sv%sp%se%s*\n\n" \
         "$(cat "/sys/class/input/$DEVID/device/id/bustype")" \
         "$(cat "/sys/class/input/$DEVID/device/id/vendor")" \
         "$(cat "/sys/class/input/$DEVID/device/id/product")" \
         "$(cat "/sys/class/input/$DEVID/device/id/version")"
 done

The same device can be represented in the system in several instances under different names, but the identifier will be the same. For example, my keyboard is defined as two devices:

 usb-SEM_USB_Keyboard-event-if01
 evdev:input:b0003v1a2cp0e24e0110*

 usb-SEM_USB_Keyboard-event-kbd
 evdev:input:b0003v1a2cp0e24e0110*

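Note: the identifier can be shortened (for example, to b0003v1A2Cp0E24*), which is useful for writing one rule for a series of similar models. The asterisk "*" acts as a wildcard. In hwdb match strings the hexadecimal digits of the vendor, product and version are written in uppercase, as in the samples in 60-keyboard.hwdb, whereas the files under /sys print them in lowercase.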
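The effect of the wildcard can be tried out with ordinary shell globbing, used here only as a stand-in for udev's own matcher. This is a sketch: hwdb_matches is a hypothetical helper, and the identifiers are the ones from this article.

```shell
#!/bin/sh
# Sketch: check whether a match pattern covers a device identifier.
# Shell globbing stands in for udev's matcher; hwdb_matches is hypothetical.
hwdb_matches() {
    pattern=$1
    id=$2
    case "$id" in
        $pattern) return 0 ;;   # unquoted on purpose: treat it as a glob
        *)        return 1 ;;
    esac
}

# The shortened pattern covers the full identifier of the keyboard:
if hwdb_matches "evdev:input:b0003v1A2Cp0E24*" "evdev:input:b0003v1A2Cp0E24e0110"; then
    echo "match"
fi
```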

Now create a 90-custom-keyboard.hwdb file in /etc/udev/hwdb.d/ with the following content (for samples, see /usr/lib/udev/hwdb.d/60-keyboard.hwdb):

 evdev:input:b0003v5C0Ap0003e0110* # keyboard id (hex digits in uppercase)
  KEYBOARD_KEY_70039=f14 # bind CapsLock to F14

The KEYBOARD_KEY line must start with a space; this is important. Then update the configuration:

 sudo udevadm hwdb --update && sudo udevadm trigger

After that, the configuration is applied automatically whenever the system is rebooted or the device is reconnected.

Key mappings are given as pairs KEYBOARD_KEY_<scancode>=<keycode>. The keycode values (which must be written in lower case) are listed in /usr/include/linux/input-event-codes.h (on Ubuntu 14.04, in /usr/include/linux/input.h).
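As an illustration, a mapping line can be assembled from a scancode and the name of a kernel constant. This is a sketch under the assumption that the hwdb keycode is the constant name without the KEY_ prefix, in lower case (KEY_F14 becomes f14, as in the example in this article); hwdb_entry is a hypothetical helper, not part of udev.

```shell
#!/bin/sh
# Sketch: build one hwdb mapping line from a scancode and a kernel constant.
# hwdb_entry is a hypothetical helper, not part of udev.
hwdb_entry() {
    scancode=$1
    # KEY_F14 -> f14: drop the KEY_ prefix and lower-case the rest.
    keycode=$(printf '%s' "$2" | sed 's/^KEY_//' | tr '[:upper:]' '[:lower:]')
    # The leading space is part of the hwdb syntax.
    printf ' KEYBOARD_KEY_%s=%s\n' "$scancode" "$keycode"
}

hwdb_entry 70039 KEY_F14
```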

You can find out the scancode with the evtest program. First determine the eventXX number: run the following command and find your keyboard in the output:

 > ls -l /dev/input/by-id/
 …
 usb-SEM_USB_Keyboard-event-if01 -> ../event11
 usb-SEM_USB_Keyboard-event-kbd -> ../event10
 …

Pick the "Keyboard-event-kbd" entry and note its number (10 in this example). Now you can run evtest:

 > sudo evtest /dev/input/event10
 …
 Event: time 1531562530.720076, type 4 (EV_MSC), code 4 (MSC_SCAN), value 70039
 …

Pressing the "CapsLock" key produces the value "70039"; this is the scancode you need.
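If you want only the scancodes, the evtest output can be filtered. A minimal sketch, assuming the MSC_SCAN line format shown in this article; scancodes is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: pull scancodes out of evtest output read from standard input.
# Assumes lines of the form "... (MSC_SCAN), value 70039".
scancodes() {
    sed -n 's/.*(MSC_SCAN), value \([0-9A-Fa-f]*\).*/\1/p'
}

# Usage:
#   sudo evtest /dev/input/event10 | scancodes
```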

Source: https://habr.com/ru/post/358440/
