What to expect at “Dialogue”, the conference of linguists and data analysts

From May 29 to June 1, the 25th international scientific conference on computational linguistics and intellectual technologies, “Dialogue”, will be held at the Russian State University for the Humanities (RSUH). We have already written on Habr about what “Dialogue” is and why ABBYY is its main organizer. In this post we cover the main topics of the conference, the key speakers and their talks, and the four competitions for automatic text analysis systems held as part of Dialogue Evaluation.

This year, “Dialogue” will focus on several key topics:

The use of neural networks for language analysis. It is commonly held that deep learning transforms raw data directly into a result (the so-called end-to-end approach), and that the “logic” by which the result is obtained is quite difficult to interpret in meaningful linguistic terms. But why not use neural networks to obtain knowledge about the language itself?
The use of more complex language models in deep learning. Another important trend for “Dialogue”: distributional models (embeddings) are clearly evolving from crude “hospital average” methods of obtaining them toward the use of contextual, syntactic, and semantic information.
Applying big-data analysis methods to problems for which little data is available. The year 2019 has been declared the International Year of Indigenous Languages, so participants in one of the “Dialogue” sessions will discuss how machine learning can be used to describe and preserve “low-resource” languages (for example, Evenki or Selkup).
Multichannel corpora. Today there is a tendency to study the speech act as a whole, with all its components: the verbal part, intonation, facial expressions, and gestures. Such studies are especially important for training robots, intelligent assistants, and chatbots.

Well-known international specialists in computational linguistics are traditionally invited to “Dialogue”. This year's invited speakers include:

Chris Biemann of the University of Hamburg, one of the leading researchers in computational semantics. He will talk about adaptive machine learning technologies that make it possible to take individual experience into account. May 31 (Friday), 15:00-16:00.
Piek Vossen of Vrije Universiteit Amsterdam, founder and president of the Global WordNet Association. His main area of interest is spoken human-computer interaction. He will give a talk titled “A communicative robot that learns about people and the world”, presenting a model of a robot that acquires information about the world and its interlocutors through natural-language communication. The robot learns from everything people tell it, everything it observes in different situations, and everything it finds on the Internet. May 30 (Thursday), 15:00-16:00.

A total of 102 main-track papers and about 20 student papers will be presented at “Dialogue”. On May 29, the first day of the conference, the speakers will include:

Andrei Kibrik, Director of the Institute of Linguistics, Russian Academy of Sciences. He will report on new corpus methods, developed by his research group, for recording the speech and gestural elements of communication. May 29 (Wednesday), 10:30-11:50.

Igor Boguslavsky, a professor at the Technical University of Madrid, and his colleagues will talk about how a computer can be taught to correctly analyze so-called Winograd schemas, a newer and more demanding way than the traditional Turing test to evaluate how well artificial intelligence systems understand language. May 29, 12:20-13:30.

Valentina Apresyan, Professor at the HSE School of Linguistics. Her talk is devoted to implicatures: meanings and assumptions that are not stated explicitly but can be inferred from the text. Studying implicatures, especially false ones, makes it possible, for example, to identify unscrupulous publications in the media. May 29, 12:20-13:30.

Plenty of interesting things will happen on the other days as well. “Dialogue” traditionally pays great attention to new expressive possibilities of the language. For example, Maria Polinskaya from Harvard University and Irina Levontina from the Institute of OC will analyze popular emotive expressions like “We got enough to use the infinitive” (which, by the way, is also the title of their talk; you can hear it on May 30, 10:00-13:30). Antonina Laposhina from the Pushkin Institute, in her talk “Zazyab li Zui?”, analyzes the lexical composition of Russian-language textbooks for elementary school from the standpoint of modern corpus linguistics (May 29, 15:00-18:30).

Of course, many papers address the hot topic of applying neural networks to language analysis. For example, on May 31 a special section of “Dialogue” is devoted to such important research areas as language models in deep learning and transfer learning.

On May 30 at 19:00, there will be a round table on the prospects for modeling the speech act in human-computer interaction. This field is booming, and analytical multimodal linguistics struggles to keep pace with what modern methods for analyzing vast amounts of audiovisual data make possible.
On May 31 at 19:00, we invite you to the round table “Brave New DL Word: where is the NLP place in it?”. The panelists will discuss the “provocative” thesis that NLP today has “dissolved” into deep machine learning technologies and is losing its status as an independent scientific discipline. Of course, many researchers will disagree with this claim, and we expect lively presentations from the opposing side.

One of the key events of “Dialogue” is the announcement of the results of Dialogue Evaluation, the technology competitions among developers of linguistic text analysis systems. This year there were competitions on four tasks:

automatic generation of news headlines;
automatic analysis of low-resource languages (when very little data is available for machine learning);
automatic anaphora resolution and identification of coreference chains (different mentions of the same entity in a text);
automatic recovery of omitted words from context (certain types of ellipsis).

As usual, such competitions required specially prepared datasets for training the systems under test. For the first time, ABBYY technologies for natural-language text analysis were used to create the datasets for some of the competitions. This made the corpora much larger, because a large share of the preprocessing was done by computer. We will write about this in more detail on Habr soon. The results of Dialogue Evaluation will be presented at “Dialogue”:

May 30, 10:00-13:30: special session on the results of testing systems for automatic processing of gapping ellipsis.
May 31, 10:00-13:30: special session on the results of testing anaphora analysis systems, and special session on the results of testing systems for generating news headlines.
June 1, 10:00-13:30: special session on the results of testing systems for describing low-resource languages.

The working languages of the conference are Russian and English. The detailed conference program is posted here.

The conference proceedings will be published in the yearbook “Computational Linguistics and Intellectual Technologies”, which is indexed in the international citation database Scopus.

You can register here; registration is open until May 28. Terms of participation.

Elizaveta Titarenko, editor of the ABBYY corporate blog,
with the participation of Vladimir Selegay, director of linguistic research at ABBYY

Source: https://habr.com/ru/post/452944/
