
Segment-statistical approach to the Internet as a corpus - a new seminar in the ABBYY Open series

We continue the ABBYY Open series of computational linguistics seminars. The next event will be held on January 31 at 17:00 in ABBYY's Moscow office. The topic is “Segment-statistical approach to the Internet as a corpus (using the analysis of the blogosphere as an example)”. The speaker is Vladimir Belikov, Doctor of Philology, Associate Professor at the Department of Theoretical and Applied Linguistics, Faculty of Philology, Moscow State University, and leading researcher at the Russian Language Institute of the Russian Academy of Sciences.



His talk focuses on sound methods for extracting reliable linguistic information from the Internet. It offers a comparative analysis of the Russian National Corpus and various Internet corpora as sources of information about different types of Russian vocabulary. Drawing on Russian explanatory dictionaries and individual linguistic studies, it analyzes typical errors and inaccuracies that resulted from ignoring modern corpus methods in lexicography.



The talk examines the segmental structure of the Russian-language blogosphere and shows results of analyzing it with the segment-statistical method, covering both the current (synchronic) state and the dynamics of change in general Russian and regional vocabulary, phraseology, and grammar. It also describes in detail a method of linguistically oriented search in the blogosphere and ways to overcome the difficulties that arise along the way.


Details and registration are available on the ABBYY Open page.



Update: the video of the seminar is posted here.

Source: https://habr.com/ru/post/137125/


