
Pre-Reform Dictionary: Recipe

As many Habr users probably know, today, May 24, is the Day of Slavic Writing: the holiday of those for whom the word ОРЕХ means "nut" rather than "operational expenses" (OPEX). Today I will show how to build a dictionary of Russian in pre-reform spelling from a modern Russian morphological dictionary. Let's take everything in order.



As many of us know, the 1917 revolution in Russia abolished not only debt obligations but also several letters of the Russian alphabet. The pre-reform rules have not been forgotten, though: plenty of texts published before the reform survive (even my modest home library holds a couple of volumes), and building a morphological dictionary for that vintage grammar is an interesting problem in its own right. The reform withdrew several letters (і, ѣ, ѵ and ѳ) from circulation and also changed some rules that had no direct bearing on the use of those letters. Read more on Wikipedia.
Today we will talk about how to generate a morphological dictionary for the pre-reform language from the morphological dictionary of our usual modern Russian language.


What is a morphological dictionary, or a dictionary with morphology support? By this term I mean not a dictionary that simply lists every possible form of every word, but one that knows how to generate those word forms for each word. This not only saves space but also gives hope that we have not forgotten to add one form of the word "seeker" next to another (cf. the neuter gender). Each word belongs to a grammatical category, and that category is responsible for generating its word forms.
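To make the idea of "stems plus grammatical categories" concrete, here is a minimal sketch. The category name `noun_m_hard`, the two stems and the data layout are illustrative assumptions, not the format of any real dictionary.

```python
# Hypothetical mini-dictionary: each stem points to a grammatical category,
# and each category lists the endings used to generate the word forms.
CATEGORIES = {
    # Singular endings: nominative, genitive, dative, accusative,
    # instrumental, prepositional (inanimate hard-stem masculine nouns).
    "noun_m_hard": ["", "а", "у", "", "ом", "е"],
}

STEMS = {
    "стол": "noun_m_hard",
    "дом": "noun_m_hard",
}


def word_forms(stem: str) -> list:
    """Generate every word form of a stem from its grammatical category."""
    endings = CATEGORIES[STEMS[stem]]
    return [stem + ending for ending in endings]


print(word_forms("стол"))
```

Adding a new word then means adding one stem and naming its category; the forms come for free.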
In addition, to avoid a combinatorial explosion from words such as gray-brown-crimson (серо-буро-малиновый), so-called composite rules are added to the dictionary. Each composite rule generates such constructions according to its own laws. A composite can have an explicit division point (like the hyphen in this example) or an implicit one, where the parts simply dock to each other. A particular case of a composite rule is the formation of verbs with the prefix "пере-" (roughly the English "re-"): to rewrite, to remake, to move, and so on. For Russian, composites without an explicit division point may seem unnecessary, but those who know German will probably agree that they are needed.
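A composite rule can be pictured as a list of slots, each holding alternatives, plus a joiner. The sketch below is only an illustration; the function name `compose` and the tiny word lists are assumptions.

```python
from itertools import product


def compose(slots: list, joiner: str = "") -> list:
    """Combine one alternative from each slot, separated by `joiner`."""
    return [joiner.join(choice) for choice in product(*slots)]


# Implicit division point: the prefix and the verb simply dock to each other.
verbs = compose([["пере"], ["писать", "делать"]])

# Explicit division point: the parts are joined with a hyphen.
colours = compose([["серо"], ["буро"], ["малиновый"]], joiner="-")

print(verbs)
print(colours)
```

Keeping the parts separate until the very end is what later lets the spelling rules treat the joint between parts differently from the inside of a part.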

So, we are turning a Russian dictionary with morphology into a dictionary for pre-reform Russian. We will go through the differences and introduce them into the new dictionary one at a time. Let's begin with the simplest points:

The reform abolished ъ at the end of words ending in a consonant (except й). Putting it back is no problem.
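Restoring the final ъ can be sketched as a single regular-expression substitution. This is a minimal illustration, assuming plain text input; the function name `restore_hard_sign` is made up for the example, and abbreviations or foreign words would need separate handling.

```python
import re

# Cyrillic consonant letters, deliberately without й.
_CONSONANTS = "бвгджзклмнпрстфхцчшщ"

# A consonant immediately followed by a word boundary, i.e. a word-final one.
_FINAL_CONSONANT = re.compile(rf"([{_CONSONANTS}])\b", re.IGNORECASE)


def restore_hard_sign(text: str) -> str:
    """Append ъ to every word that ends in a consonant other than й."""
    return _FINAL_CONSONANT.sub(lambda m: m.group(1) + "ъ", text)


print(restore_hard_sign("в дом"))
```

Words ending in ь, й or a vowel are left alone because their last letter is not in the consonant list.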

By the time of the reform the letters ѵ and ѳ were living out their last days, and the list of words containing them is very short. They are easy enough to restore.

The letter і was used in the word міръ (the one meaning the universe, not the antonym of war), and in ordinary words before vowels and й, except where the two letters met through a composite rule (химическій, but семиязычный). In a dictionary of stems and grammatical categories this is not hard to fix: a regular-expression search and replace is an easy manipulation.
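The search and replace for і might look like the sketch below. It assumes lowercase input and that composite parts are processed separately, so that an и at the end of the first part is never followed by anything; the name `restore_decimal_i` is hypothetical. The special case of міръ is a homonym and has to be marked in the dictionary itself, so it is not covered here.

```python
import re

# и turns into і when the next letter is a vowel or й.
_I_BEFORE_VOWEL = re.compile(r"и(?=[аеёиоуыэюяй])")


def restore_decimal_i(part: str) -> str:
    """Replace и with і before a vowel or й inside one dictionary part."""
    return _I_BEFORE_VOWEL.sub("і", part)


print(restore_decimal_i("химический"))

# A composite is converted part by part, so the и at the joint is kept.
print("".join(restore_decimal_i(p) for p in ["семи", "язычный"]))
```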

The rules for з / с at the end of the prefixes из-, воз-, раз-, роз-, низ- (as in разсказъ) are also easy to apply, as is undoing the modification of the prefixes без-, чрез-, через- (безполезный, черезполосный).
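If prefixes are attached through composite rules, the з / с choice can be made at the joint. The sketch below encodes the pre-reform convention as I understand it: из-, воз-, раз-, роз-, низ- take с before a voiceless consonant other than с, while без-, чрез-, через- always keep з. The function name and data layout are assumptions for illustration.

```python
VOICELESS = set("кпстфхцчшщ")
ALTERNATING = {"из", "воз", "раз", "роз", "низ"}  # з or с, depending on the stem
FIXED = {"без", "чрез", "через"}                  # always written with з


def attach_prefix(prefix: str, stem: str) -> str:
    """Join a prefix (given in its з-form) to a non-empty stem, pre-reform style."""
    if prefix in FIXED:
        return prefix + stem
    if prefix in ALTERNATING:
        first = stem[0]
        # с is written before a voiceless consonant, except before с itself.
        if first in VOICELESS and first != "с":
            return prefix[:-1] + "с" + stem
        return prefix + stem
    raise ValueError(f"unknown prefix: {prefix}")


print(attach_prefix("раз", "сказ"))
print(attach_prefix("без", "полезный"))
```

Working at the joint avoids touching words that merely begin with the same letters as a prefix.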

Note that if our modern Russian dictionary managed without composite rules, then these changes, like keeping и at the end of the first part of a composite, will have to be made by hand.

Next, let's work on the endings. In the plural, adjectives get the endings -ыя and -ія in addition to -ые and -ие, and in the masculine singular accusative we replace -ого and -его with -аго and -яго. This is not difficult, and neither are the fairly simple changes to noun endings. We also add the words ея, онѣ, однѣ and the other case forms of однѣ (they can simply be entered as invariable words if you would rather not fiddle with grammatical categories for them).
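The adjective endings can be sketched as two small helpers. Two details below go beyond what the text says and are my assumptions about the old orthography: a stressed -ого is left unchanged, and after ж, ш, ч, щ the ending -его becomes -аго rather than -яго. The function names are hypothetical.

```python
from typing import Optional

SIBILANTS = set("жшчщ")


def extra_plural(form: str) -> Optional[str]:
    """Return the additional plural form: -ые -> -ыя, -ие -> -ія."""
    if form.endswith("ые"):
        return form[:-2] + "ыя"
    if form.endswith("ие"):
        return form[:-2] + "ія"
    return None


def old_masc_ending(form: str, ending_stressed: bool = False) -> str:
    """Rewrite a masculine singular form in -ого / -его with -аго / -яго."""
    if ending_stressed:
        return form  # assumption: a stressed -ого keeps its spelling
    if form.endswith("ого"):
        return form[:-3] + "аго"
    if form.endswith("его"):
        stem = form[:-3]
        # assumption: -аго after a sibilant, -яго otherwise
        return stem + ("аго" if stem and stem[-1] in SIBILANTS else "яго")
    return form


print(extra_plural("добрые"), old_masc_ending("синего"))
```

In a real dictionary these substitutions would live in the ending tables of the adjective categories rather than run over finished word forms.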

After these simple manipulations we come to the most interesting part. How do we restore ѣ?
The topic is not easy; Wikipedia has a separate article on it. First, let's deal with the simple parts. For the instrumental, dative and prepositional cases, and for the comparative and superlative forms of adjectives and for verbs, the corresponding grammatical category is corrected. The numerals for two are changed by hand, as are the reflexive pronouns. There are also a few adverbs and prepositions, but replacing them is a perfectly manageable task. But what do we do with the mass of ordinary dictionary words?
Here what comes to the rescue is ... the Ukrainian language! Unexpected, isn't it?

Because the thing is that ... oh, sorry, I got carried away. Ukrainian and Russian are very similar (who would have thought?), and in particular many of their words are similar. The rule is this: in many cases where Russian used ѣ, Ukrainian has a very similar word with the letter і in that position. Don't know what the second letter of the word for turnip was? Fine, we look in the Ukrainian dictionary, find ріпа with an і, and conclude рѣпа. The same goes for, say, the word repair. Of course, the meaning of a word sometimes differs between the two languages, but for our purposes that is not very important. It is worse for us when Ukrainian has no counterpart, as with the word “father”, for example. Well, manual work cannot be eliminated entirely, so let's be glad that its volume can be greatly reduced. With this simple knowledge and a Ukrainian morphological dictionary in hand, automating the markup becomes easier.
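The Ukrainian check can be sketched as follows: for each е in a modern Russian word, try the same word with і in that position and see whether the Ukrainian dictionary knows it. The function name `guess_yat` and the three-word stand-in for a Ukrainian dictionary are assumptions; a real run would use a full word list, and the output is only a suggestion that still needs review.

```python
def guess_yat(word: str, ukrainian_words: set) -> str:
    """Mark е as ѣ wherever swapping it for і yields a known Ukrainian word."""
    result = list(word)
    for i, ch in enumerate(word):
        if ch != "е":
            continue
        candidate = word[:i] + "і" + word[i + 1:]
        if candidate in ukrainian_words:
            result[i] = "ѣ"
    return "".join(result)


# Tiny stand-in for a real Ukrainian word list.
UKRAINIAN = {"ріпа", "ліс", "хліб"}

print(guess_yat("репа", UKRAINIAN))
```

Words whose Ukrainian counterpart differs in more than that one letter simply come back unchanged and fall into the manual pile.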
A small digression: linguistics
The reason for this phenomenon is probably that once, before the common proto-language split into Russian and Ukrainian, ѣ and е were pronounced differently; the two languages then diverged further, and in Russian ѣ came to be pronounced the same as е, while in Ukrainian it came to be pronounced as і.
There is, by the way, another checking rule: if ё appears in the root under stress, then the unstressed letter is е rather than ѣ. But this rule is not without exceptions either.


And if there is no Ukrainian dictionary at hand? Then шкода (a pity) :) We will have to rely on our own care and be glad at least that there are still fewer than 9000 roots with ѣ.

After all these manipulations you should deal with the pre-reform hyphenation rules, which are stricter than the modern ones, if you plan to support them in your dictionary.
As a result, we get a morphological dictionary of the Russian language in pre-reform orthography.
Thank you for your attention, and
Happy Day of Slavic Writing!

UPD: Examples added at the request of paulousky (and of the blog editor).

Source: https://habr.com/ru/post/223315/

