Yandex answers tens of millions of queries every day, and the search engine has to process them quickly and accurately. Without linguistics, a search engine can only find exact matches in the indexed documents. To find relevant documents, the system needs to correctly identify the query language, correct typos, morphologically parse each word, expand the query with synonyms, or reformulate it altogether. In this lecture, Alexei Zobnin sets out to give the students of the Small ShAD answers to the following questions:
Why take morphology into account?
How and why do we identify the language of a query and of a document?
What is a language corpus?
What are language models and how are they used in search?
How does morphological analysis of out-of-vocabulary words work?
How do we determine the correct meaning and morphological paradigm of homonyms?
What kinds of typos are there, and how do we fix them?
What are query expansions, and how can they be useful?
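To make the pipeline described above more concrete, here is a minimal sketch in Python of three of its steps: typo correction, lemmatization, and synonym expansion. It is not Yandex's implementation. The vocabulary, the lemma table, the synonym table, and the function names `correct_typo` and `expand_query` are all hypothetical, and fuzzy matching from the standard `difflib` module stands in for a real spelling corrector. Language identification is omitted.

```python
from difflib import get_close_matches

# Hypothetical toy data; a real search engine relies on large dictionaries and corpora.
VOCABULARY = ["cheap", "flight", "flights", "hotel", "moscow"]
LEMMAS = {"flights": "flight"}  # word form -> dictionary form
SYNONYMS = {"cheap": ["inexpensive", "budget"], "flight": ["airfare"]}


def correct_typo(word):
    """Return the closest vocabulary word, or the word itself if nothing is close."""
    if word in VOCABULARY:
        return word
    matches = get_close_matches(word, VOCABULARY, n=1, cutoff=0.7)
    return matches[0] if matches else word


def expand_query(query):
    """For each query word, list the alternatives to search for:
    the corrected word, its lemma (if different), and the lemma's synonyms."""
    expanded = []
    for raw in query.lower().split():
        word = correct_typo(raw)
        lemma = LEMMAS.get(word, word)
        alternatives = [word]
        if lemma != word:
            alternatives.append(lemma)
        alternatives.extend(SYNONYMS.get(lemma, []))
        expanded.append(alternatives)
    return expanded
```

Calling `expand_query("cheep flihgts")` returns one list of alternatives per query word, so a document containing "budget airfare" can match even though it shares no exact word with the original query.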
Lectures at the Small ShAD are devoted to computer science, mathematics, linguistics, and related fields.
The speakers are leading scientists, specialists from high-tech companies, and teachers from well-known universities. Each lecture is followed by a discussion with the audience and a question-and-answer session.
We try to maintain the informal atmosphere of off-site schools and conferences. The lectures are completely independent of one another, so students are free to choose the topics that interest them. Classes are free of charge.