
Russian-English, or a few words about spell-checking two languages at once


Ever since spell checking appeared in “ordinary” programs (Firefox, Miranda, Opera), I have been annoyed by having to constantly switch the dictionary from Russian to English and back.

For Firefox and Miranda there are (half-)solutions that switch the dictionary automatically according to the current keyboard layout. That is better than nothing, but still not very convenient: at any moment one half of the words or the other stays “red” and makes the real mistakes harder to spot.

The ideal solution would be a single dictionary that covers the spelling of both languages. One Firefox user did create such a dictionary (http://forum.ru-board.com/...). But after studying it more closely, I realized that I did not like it (see the details below).

So I finally got around to tackling this issue properly. After a couple of days of tinkering, I wrote a script that merges any two myspell/hunspell dictionaries into a single combined one.
Its output for Russian and English can be downloaded here: http://drop.io/spell_dict

These dictionaries were built from the latest versions of the OpenOffice.org 3.x dictionaries.

A caveat about the “without ё” dictionary is needed right away. For its “Russian half” I had to use the dictionary from OpenOffice.org 2.x, because every other “without ё” dictionary I came across turned out to be “with ё” and “without ё” at the same time.

But that dictionary does not feel complete: it has very few rules. The “with ё” dictionary has almost 1360, this one only about 120. On the other hand, it has almost 3 times as many words (although, in theory, it should be the other way around). So I would not recommend using the combined “without ё” dictionary.
(By the way, if anyone has a link to a better Russian “without ё” dictionary, please share it and I will fix this :)).

I want more!


In theory, any other languages can be added to these dictionaries to make even more “universal” sets. French? Belarusian?..

But, as practice has shown, combining closely related languages (for example, English and German) is undesirable. Many errors will go unnoticed, because a misspelled word in one language is often a valid word in the other: “Was is das?”
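A toy illustration of why this happens. The word lists are tiny and hand-picked for the example, and `misspelled` is a made-up helper, not part of any real spell checker:

```python
# Tiny hand-picked word lists standing in for real dictionaries (assumption).
ENGLISH = {"was", "is", "that", "what"}
GERMAN = {"was", "ist", "das"}


def misspelled(text, dictionary):
    """Return the words of `text` that are not found in `dictionary`."""
    words = [w.strip("?!.,") for w in text.lower().split()]
    return [w for w in words if w not in dictionary]


phrase = "Was is das?"
print(misspelled(phrase, GERMAN))            # the typo "is" is caught
print(misspelled(phrase, ENGLISH | GERMAN))  # nothing is flagged any more
```

With the German list alone the typo is reported; with the union of both lists every word is "known", so the phrase passes.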

If anyone has suggestions for expanding the set, write to me. Send links to the dictionary versions you picked, describe the “configuration”, and I will try to build the corresponding “sets”. :)

How to install


For Opera 10.x


Just download the chosen archive into the dictionaries folder of your Opera profile, restart Opera, and select the new dictionary from the context menu of a multi-line input field.

For Firefox


Extract the .aff and .dic files from the archive into the dictionaries folder in C:\Program Files\Firefox. After restarting, select the dictionary from the context menu of a multi-line input field.

For Miranda


You will need a spellcheck plugin.

If you have already installed the combined dictionary for Firefox, it is enough to tick the "Use dictionaries from other programs" checkbox in the plugin settings. Otherwise, create a dictionaries folder next to miranda.exe and unpack the .aff and .dic files there.

After that, you need to restart Miranda and select the desired dictionary in the drop-down list (the plugin defines it as “en_ru (yo)”, while the standard English dictionary will be designated as “English (United States) [en_US]”). Do not forget to uncheck "Use input language to select dictionary".

For Pidgin, gedit (and other GtkView/enchant programs)


Since these programs offer no convenient way to select the active dictionary (one option is to launch them with the $LANG environment variable set), it is best to replace the default .aff and .dic files.

On Ubuntu, for example, they are in /usr/share/myspell/dicts. Unpack the .aff and .dic files from the downloaded archive, make a backup copy of the en_US files (or ru_RU, if your Ubuntu uses a Russian locale), and rename the new files to en_US.
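The backup-and-rename step can be sketched in a few lines. This is only an illustration: the function name and the `.bak` suffix are my own choices, and the file names in the usage note are hypothetical, since the names inside the archive are not given here.

```python
import shutil
from pathlib import Path


def replace_default_dictionary(dict_dir, new_aff, new_dic, locale="en_US"):
    """Back up <locale>.aff/.dic in dict_dir, then overwrite them with the new files."""
    dict_dir = Path(dict_dir)
    for source, ext in ((Path(new_aff), ".aff"), (Path(new_dic), ".dic")):
        target = dict_dir / f"{locale}{ext}"
        backup = dict_dir / f"{locale}{ext}.bak"
        # Keep the original once; never overwrite an existing backup.
        if target.exists() and not backup.exists():
            shutil.copy2(target, backup)
        # Install the combined file under the default dictionary's name.
        shutil.copy2(source, target)
```

A call might look like `replace_default_dictionary("/usr/share/myspell/dicts", "en_ru.aff", "en_ru.dic")`, where `en_ru.aff` and `en_ru.dic` are placeholder names. Writing under /usr/share normally requires root privileges.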

P.S. Google Chrome uses dictionaries in a different format: BDic.

What's wrong with the "Firefox" combined dictionary


First, it merges the “with ё” and “without ё” dictionaries, whereas I want spellings such as «еще» and «ежик» (written without ё) to be flagged as errors.

Second, because so many dictionaries were merged without enough attention to the internal structure of the files, the same identifier in the combined dictionary ended up with a much wider set of rules.

Here are some examples.

In the original English dictionary, the word "vive" had the following rules:

SFX Z 0 rs e
SFX Z y iers [^aeiou]y
SFX Z 0 ers [aeiou]y
SFX Z 0 ers [^ey]

Their format is: SFX <identifier> <cut off> <append> <condition>

That is, for the first line: if the word ends in 'e', cut “nothing” off it and append 'rs', and you get a new valid word (vivers).
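To make the format concrete, here is a minimal sketch of parsing and applying one such rule. It is a simplification and not how hunspell itself is implemented: the names `SuffixRule`, `parse_sfx` and `apply_sfx` are invented for the example, the condition is simply treated as a regular expression anchored at the end of the word, and only five-field rule lines are handled (not the header line of a rule group).

```python
import re
from dataclasses import dataclass


@dataclass
class SuffixRule:
    flag: str       # identifier that words in the .dic file refer to
    strip: str      # characters cut from the end ("" when the file says 0)
    add: str        # characters appended afterwards
    condition: str  # pattern that the end of the word must match


def parse_sfx(line):
    """Parse one rule line such as 'SFX Z 0 rs e'."""
    kind, flag, strip, add, condition = line.split()
    if kind != "SFX":
        raise ValueError(f"not a suffix rule: {line!r}")
    return SuffixRule(flag, "" if strip == "0" else strip, add, condition)


def apply_sfx(rule, word):
    """Return the derived word, or None if the rule does not apply."""
    if not re.search(f"(?:{rule.condition})$", word):
        return None
    if not word.endswith(rule.strip):
        return None
    stem = word[: len(word) - len(rule.strip)]
    return stem + rule.add


print(apply_sfx(parse_sfx("SFX Z 0 rs e"), "vive"))  # vivers
```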

And in the "Firefox" dictionary, the word "vive" got:

SFX Z 0 ers [aeiou]y
SFX Z 0 ey [aiouy]
SFX Z 0 rs e
SFX Z 0 y [aeiouy]e
SFX Z e y [^aeiouy]e

(... and 19 more diverse rules)

And here (see the last rule listed) the word "vivy" will also be accepted as correct.

Similarly, for words ending in 'y' the original rules produce either -iers or -yers (depending on what precedes the 'y'), whereas the new rule set produces the correct -yers plus something new, -yey (and -iers got lost somewhere). And so on.
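The difference between the two rule sets can be checked with a short sketch. The rules are transcribed from the listings above with the spacing normalised; reading the second combined rule as "append ey" and the last one as "cut off e, append y" is my assumption about what the listing meant, and the other 19 rules are not reproduced. The helper `derived_forms` is invented for the example and treats each condition as a regular expression anchored at the end of the word.

```python
import re

# (cut off, append, condition) triples for identifier Z; "" stands for 0.
ORIGINAL_Z = [
    ("", "rs", "e"),
    ("y", "iers", "[^aeiou]y"),
    ("", "ers", "[aeiou]y"),
    ("", "ers", "[^ey]"),
]
# Only the five rules quoted in the text (assumed reading, see above).
COMBINED_Z = [
    ("", "ers", "[aeiou]y"),
    ("", "ey", "[aiouy]"),
    ("", "rs", "e"),
    ("", "y", "[aeiouy]e"),
    ("e", "y", "[^aeiouy]e"),
]


def derived_forms(word, rules):
    """Return every form the given rules derive from `word`."""
    forms = set()
    for strip, add, condition in rules:
        if re.search(f"(?:{condition})$", word) and word.endswith(strip):
            forms.add(word[: len(word) - len(strip)] + add)
    return forms


print(sorted(derived_forms("vive", ORIGINAL_Z)))   # ['vivers']
print(sorted(derived_forms("vive", COMBINED_Z)))   # ['vivers', 'vivy']
print(sorted(derived_forms("carry", ORIGINAL_Z)))  # ['carriers']
print(sorted(derived_forms("carry", COMBINED_Z)))  # ['carryey']
```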

(The author of this post needs an invite: sabio [at] tut [dot] by)

UPD: Added dictionaries: Russian-English-French and Russian-English-French-Italian. Download any of the dictionaries here.

Source: https://habr.com/ru/post/90130/

