Testing linguistic technologies: a competition in automatic coreference and anaphora resolution

As promised, here is our report: the results of the competition in automatic anaphora and coreference resolution were announced recently. It was the first competition of this kind for the Russian language, and it was organized by a team from HSE and MSU.

We are sure that many of our readers are linguists who know perfectly well what anaphora and coreference are; for everyone else, here is a short explanation. The same real-world object can be mentioned in a text several times in different ways. "Vasya is a millionaire, he wants to buy an island." In this sentence, the pronoun “he” and the noun “Vasya” refer to the same person (that is, they have the same referent). If a text analysis system understands that “he” is “Vasya”, it is able to resolve anaphora.

Things get harder when Vasya appears in the text several more times, for example as “Ivanov”, “the client”, “the head of the company” or “the football player”. Here we are no longer dealing with pronominal anaphora but with coreference of noun phrases. The system's task in this case is to combine all the expressions that stand for this person into one coreference chain. Here are a few examples, which also show how our Compreno technology handles this.
1. Evgeni Plushenko is the only figure skater in the world who has won medals at four Winter Olympics. The athlete gained his first Olympic experience in 2002 at the Games in Salt Lake City, USA.

Thanks to syntax, the system understands that “Plushenko” and “figure skater” are the same person. This person is then merged with the person identified from “athlete” because the two are linked in the semantic hierarchy. In addition, anaphoric rules substitute the same “athlete” for the pronoun in the parse tree. The result is a coreference chain.
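To make the idea of a coreference chain more concrete, here is a minimal sketch in Python. It assumes that some earlier analysis has already decided which pairs of mentions corefer, and it only shows how such pairwise links can be merged into chains with a union-find structure. The function name and the mention strings are our own illustration (including the assumption that the pronoun in the English rendering is “his”); this is not a description of how Compreno is implemented.

```python
from collections import defaultdict


def build_chains(mentions, coreferent_pairs):
    """Merge pairwise coreference links into chains (union-find)."""
    parent = {m: m for m in mentions}

    def find(m):
        # Walk up to the chain representative, shortening the path on the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in coreferent_pairs:
        parent[find(a)] = find(b)

    chains = defaultdict(list)
    for m in mentions:  # mentions keep their text order inside each chain
        chains[find(m)].append(m)
    return list(chains.values())


# Hypothetical mentions from example 1, in text order.
mentions = ["Evgeni Plushenko", "figure skater", "The athlete", "his", "Salt Lake City"]
pairs = [
    ("Evgeni Plushenko", "figure skater"),  # link found through syntax
    ("figure skater", "The athlete"),       # link found through the semantic hierarchy
    ("The athlete", "his"),                 # link found by an anaphoric rule
]
chains = build_chains(mentions, pairs)
for chain in chains:
    print(chain)
```

In this toy version a mention is identified by its string alone, so a real system would need to key mentions by their position in the text as well.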

2. Darrell Lance Abbott was born in Arlington, Texas, a suburb of Dallas and Fort Worth, into the family of musician and producer Jerry Abbott. His father owned the Pantego Sound Studios in Pantego, where Darrell saw and heard many blues guitarists, but after hearing Ace Frehley from Kiss, he wanted to start playing the guitar.

Here the system correctly splits the name “Darrell Lance Abbott” into its parts and then matches mentions part by part. That is why Abbott's father, Jerry Abbott, did NOT end up in the coreference chain: the surname is the same, but the first name is different. In the next sentence, however, the system recognizes Darrell by his first name alone, without the surname.
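The part-by-part comparison can be sketched roughly as follows. This is a deliberately naive heuristic of our own, assuming that the first token of a name is the given name, the last token is the surname, and a single token is a given name; the helper names are hypothetical, and real name analysis has to deal with patronymics, initials, inflection and much more.

```python
def parse_name(name):
    """Split a name into (given, surname); a lone token is treated as a given name."""
    parts = name.split()
    if len(parts) == 1:
        return parts[0], None
    return parts[0], parts[-1]


def names_compatible(a, b):
    """Two names may denote one person if none of their shared parts conflict."""
    given_a, surname_a = parse_name(a)
    given_b, surname_b = parse_name(b)
    if given_a != given_b:
        return False
    # A missing surname does not contradict anything.
    return surname_a is None or surname_b is None or surname_a == surname_b


print(names_compatible("Darrell Lance Abbott", "Jerry Abbott"))  # same surname, different given name
print(names_compatible("Darrell Lance Abbott", "Darrell"))       # given name only
```

The first call rejects the father, the second accepts the bare first name, which mirrors the behavior described for example 2.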

3. Rosneft may gain control over all airports in Kyrgyzstan. The Russian company has signed a memorandum on the acquisition of at least 51% of Manas International Airport OJSC. Roman Trotsenko's Novaport, which previously acted as Rosneft's partner in the project, is likely to become the operator of the Kyrgyz airports.

Here again, because the semantic class “ROSNEFT” is a descendant of the semantic class “COMPANIES” in the semantic hierarchy, Compreno understands that the second sentence is also about Rosneft. This example shows how coreference resolution helps to extract the participants of events correctly: it is clear to us who signed the memorandum, even though the sentence only says “Russian company”.
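The hierarchy check itself is easy to picture. Below is a small sketch, assuming the hierarchy is stored as a child-to-parent mapping; only “ROSNEFT” and “COMPANIES” come from the example above, and every other class name, as well as the function name, is invented for illustration. The real Compreno hierarchy is of course far larger and richer.

```python
# Toy semantic hierarchy: each class points to its parent class.
# Only ROSNEFT -> COMPANIES is taken from the text; the rest is hypothetical.
PARENT = {
    "ROSNEFT": "COMPANIES",
    "COMPANIES": "ORGANIZATIONS",
    "AIRPORTS": "FACILITIES",
}


def is_descendant(cls, ancestor):
    """Return True if `ancestor` is reachable from `cls` by following parent links."""
    while cls in PARENT:
        cls = PARENT[cls]
        if cls == ancestor:
            return True
    return False


print(is_descendant("ROSNEFT", "COMPANIES"))   # candidate antecedent for "company"
print(is_descendant("AIRPORTS", "COMPANIES"))  # not a candidate
```

A check like this only narrows down the candidate antecedents of “the Russian company”; choosing between several companies mentioned in the same text needs further evidence.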

But back to the competition. Its goal was to assess the quality of the methods developed for analyzing anaphora and coreference in Russian. Seven participants took part: ABBYY, RCO, SemSyn, Open Corpora (St. Petersburg), Mail.ru, the Institute for Systems Analysis of the Russian Academy of Sciences, and Sergey Ponomarev. We stress once again that the goal was to compare algorithms, not companies' products. The results were announced at "Dialogue", the largest conference on computational linguistics in Russia.

On the first track, systems had to find complete coreference chains; on the second, they had to resolve anaphora, that is, find the antecedent of every pronoun. Both tasks are harder than syntactic and morphological analysis (competitions on those topics were held several years ago), and most systems use syntax and morphology to annotate the text collection before resolving anaphora.

Three participants competed on the first track and seven on the second, but the second track alone had seventeen “runs”. The systems were quite varied, from experimental ones (built to test specific anaphora resolution algorithms) to complex ones in which the module that identifies referential links is just one of the components.

How the competition was run

At first, participants were given the opportunity to tune their systems on a small hand-annotated text collection. It included 100 texts, most of which contained from 5 to 100 sentences (the longest had 170). In this corpus, 2000 anaphoric pairs of the form “pronoun, antecedent” (the antecedent being the word the pronoun refers to) were annotated. The systems then had to analyze a large text corpus. It was assembled specially for the competition and included excerpts from texts of various genres: news items, scientific articles, blog posts, and fiction. All texts were taken from open sources: the Open Corpus of the Russian Language (Open Corpora), the online library Lib.ru, the news site Lenta.ru, Wikipedia and other resources, 1342 texts in total.

The results were evaluated against the “Gold Standard”, a manually annotated part of the same corpus. The evaluation was semi-automatic (disputed cases were additionally checked by experts).
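As a rough illustration of what comparison with a gold standard means for the anaphora track, the sketch below scores a system's “pronoun, antecedent” pairs against gold pairs using the textbook definitions of precision, recall and balanced F-measure. The exact matching rules used by the organizers are not described in this post, so the exact-match criterion, the function name and the pair identifiers are all assumptions made for the example.

```python
def evaluate(gold_pairs, system_pairs):
    """Score system (pronoun, antecedent) pairs against gold pairs by exact match."""
    gold, system = set(gold_pairs), set(system_pairs)
    correct = len(gold & system)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical annotations: each mention is written as "word@position".
gold = [("he@3", "Vasya@0"), ("his@14", "athlete@12"), ("it@27", "company@21"), ("she@40", "Anna@35")]
system = [("he@3", "Vasya@0"), ("his@14", "athlete@12"), ("it@27", "memorandum@24")]

precision, recall, f1 = evaluate(gold, system)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Here the system finds two of the four gold pairs and makes one wrong guess, so precision is 2/3 and recall is 1/2.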

Results of the competition

The competition showed that existing systems resolve anaphora well (for example, Compreno, which took first place, showed an F-measure of 76% with a precision of over 80%), while full coreference analysis works worse. The methods used for English are not sufficient for Russian: free word order gets in the way, as do some other features of the language and the acute shortage of open annotated corpora (the one created by the organizers appears to be the first resource of its kind). Developers can use the new corpus to test their own algorithms, and the annotation guidelines that the organizers formulated while working on it will help researchers create new corpora for the same purpose.
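The two figures quoted for Compreno also say something about recall. Assuming that the “over 80%” figure is precision and that the F-measure is the balanced F1 (neither is spelled out in this post), recall follows from F1 = 2PR / (P + R), which gives R = F1 * P / (2P - F1). A quick sketch of the arithmetic, with a helper name of our own:

```python
def recall_from_f1(f1, precision):
    """Solve F1 = 2PR / (P + R) for recall R."""
    return f1 * precision / (2 * precision - f1)


# With F1 = 0.76 and precision taken at its lower bound of 0.80:
recall_bound = recall_from_f1(0.76, 0.80)
print(round(recall_bound, 3))
```

Because the implied recall decreases as precision grows for a fixed F1, a precision above 80% means recall was somewhat below this value, roughly 72% at most under the stated assumptions.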

An important result for ABBYY: our Compreno won on both tracks. Under the competition rules, we cannot disclose all the names of winners and losers in our blog. The point of these rules is that the competition (or, more precisely, the evaluation) is held not for PR but for the benefit of developers, who compare their algorithms with those of their colleagues and gain scores (which can be cited in scientific publications) and experience. In addition, every such competition produces an annotated test corpus, the Gold Standard, on which anyone (for example, students) can then run their own algorithms and compare them with the level achieved in the industry.

We cannot name the winners and losers in blogs and the media, but a detailed article analyzing the results of the competition, including the final ranking, will soon be posted on the Dialogue website. The organizers' article on the preparation of the competition and the evaluation method can be read here.

Source: https://habr.com/ru/post/229515/
