
Grammar checker

Most text editing systems have a tool that automatically checks for spelling errors, that is, words in which one or more letters are written incorrectly (a spell checker, or "speller"). It works as follows: the program analyzes each word in the text and looks it up in a database of all words and their various forms.

This kind of check ensures that every word in the text is spelled correctly (as in a dictionary), but it does not protect against agreement errors and syntax errors in the sentence. For example, in the original Russian a sentence such as “I read an interesting magazine” can be written with an adjective form that does not agree with the noun (the error does not survive translation into English). Every word is in the dictionary, so the text editing system will not flag the sentence or suggest the correct version.
A grammar checker helps to avoid such mistakes.

Many programs have been created that check spelling errors. But so far (as far as I know) there are no programs that check grammatical errors in the sentence for inflectional languages, in which words have many forms; that is, programs that check not only spelling, but also syntax and agreement errors. Therefore, I would like to present a program that does this (as far as it manages to).
image

In 2003-2007, I wrote a program (three versions of it) that analyzed text in Lithuanian as quickly as possible and suggested corrections not only for spelling errors, but also for grammatical errors and agreement errors in the sentence.
In January 2011, through Wikipedia, I came across habrahabr.ru, and then rg_software's articles “Notes on NLP” and “NLP: spell checking - look inside”, where I met the phrase “how to create a pattern for the rule ‘the predicate should be the same gender as the subject’?” I thought it was worth writing about my work; maybe it will be of interest to specialists and to others who care about the subject.
After that, I made the Russian version of the program in a month and a half.

The program works on the principle of a parser for programming languages: it uses a large “universal sentence pattern”, which describes all possible ways the words in a sentence can be combined, that is, all possible sentences.
The parser is slightly modified for a human language (for more details, see the PDF description). In addition, for the program to offer a better (in its opinion) version of the input sentence, it has to consider all the forms of each word in the input sentence. For example, for the sentence “This computer program recognizes simple text” it has to consider 13,934,592 combinations of word forms (24 * 24 * 12 * 7 * 24 * 12); this takes about 5 seconds.
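To make the idea of a sentence pattern built from sequences and alternatives more concrete, here is a minimal sketch. It is not the program's actual code or rule format: the class names `Word`, `And`, `Or`, the part-of-speech tags and the toy `SENTENCE` pattern are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence


@dataclass(frozen=True)
class Word:
    """Leaf node: matches one token with the given part-of-speech tag."""
    pos: str


@dataclass(frozen=True)
class And:
    """Sequence node: all parts must match one after another."""
    parts: tuple


@dataclass(frozen=True)
class Or:
    """Alternative node: any one of the options may match."""
    options: tuple


def match(pattern, tags: Sequence[str], start: int = 0) -> Iterator[int]:
    """Yield every index at which `pattern` can end when started at `start`."""
    if isinstance(pattern, Word):
        if start < len(tags) and tags[start] == pattern.pos:
            yield start + 1
    elif isinstance(pattern, And):
        positions = {start}
        for part in pattern.parts:
            # Continue each partial match with the next part of the sequence.
            positions = {end for p in positions for end in match(part, tags, p)}
        yield from sorted(positions)
    elif isinstance(pattern, Or):
        seen = set()
        for option in pattern.options:
            for end in match(option, tags, start):
                if end not in seen:
                    seen.add(end)
                    yield end


def accepts(pattern, tags: Sequence[str]) -> bool:
    """True if the pattern covers the whole list of tags."""
    return len(tags) in set(match(pattern, tags))


# Hypothetical toy pattern: noun phrase, verb, noun phrase.
NOUN_PHRASE = Or((Word("noun"), And((Word("adj"), Word("noun")))))
SENTENCE = And((NOUN_PHRASE, Word("verb"), NOUN_PHRASE))

print(accepts(SENTENCE, ["noun", "verb", "adj", "noun"]))  # True
print(accepts(SENTENCE, ["adj", "verb", "noun"]))          # False
```

A real pattern for an inflectional language would also have to compare case, number and gender between matched words; this sketch only checks word order.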
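The combination count above is simply the product of the number of forms of each word. The sketch below checks that arithmetic and shows one way such candidates could be enumerated lazily. The function `candidate_sentences` and the toy word forms are hypothetical, not taken from the program.

```python
from itertools import product
from math import prod

# Number of forms per word, as given in the text for the example sentence.
forms_per_word = [24, 24, 12, 7, 24, 12]
total = prod(forms_per_word)
print(total)  # 13934592


def candidate_sentences(word_forms):
    """Lazily yield every combination that takes one form of each word."""
    return product(*word_forms)


# Hypothetical toy input: two forms for each of three words.
toy_forms = [
    ["this", "these"],
    ["program", "programs"],
    ["recognizes", "recognize"],
]
candidates = list(candidate_sentences(toy_forms))
print(len(candidates))  # 8
print(candidates[0])    # ('this', 'program', 'recognizes')
```

Because `itertools.product` is lazy, the candidates can be checked one at a time without holding millions of them in memory.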

image

In addition to correcting grammar, the program:
* Supports multiple variants of words and phrases at all levels of the sentence's syntax tree.
* Graphically shows the tree of the selected (corrected) version of a sentence, or of its recognized part; the tree of the universal sentence pattern and the words of the input sentence with their variants are also shown.
** In the windows that display the structure of the corrected sentence and of the universal sentence pattern, you can move the view and click with the left or right mouse button: on an empty area (cancels the selection of an element), on elements (left button opens / collapses the element (link "-EXP-"), right button selects it), or on their parts (left button only, on the links "-AND-" (horizontal) and "-OR-" (vertical), opens / collapses them).

Here are links (for quick reference) to the program's page in different languages:
In Russian: sites.google.com/site/sergprogrammer/main/main_ru/grammar_ru
In English: sites.google.com/site/sergprogrammer/main/main_en/grammar_en
At the moment you can download two versions there: a "trimmed" Lithuanian version, in which the graphical display of all trees / graphs is turned off and some features are disabled, and a more open Russian version.

The latest version of the program (with a Russian interface, for working with the Russian language) is on the site of the “multigrammar” project at: sourceforge.net/projects/grammar-multi/files
There you can also download a large PDF or DOC description of roughly how it all works.
There is also a forum where I can answer questions, including those from unregistered users.

In the Russian version of the program, the interface text can be changed in the “messages.txt” file, the (microscopic) dictionary in the “dic.txt” file, and the universal sentence pattern in the “rules.txt” file.

I'm still trying to create the English version; it will be intended not for practical use but to demonstrate the principle of operation.

Update, November 15, 2011: I decided to add screenshots of the English version of Grammar here:

image

image

image

The last one, by the way, shows the structure of the parsed sentence as the program sees it, according to the current version of the "universal sentence pattern" file, rules.txt.

Source: https://habr.com/ru/post/126675/

