
Program analyzes neologisms on Wikipedia

A computer program called Zeitgeist, created by scientists from Ireland, was presented a few days ago at the European Conference on Artificial Intelligence.

The program searches Wikipedia for words that are missing from WordNet, the standard lexical reference database. WordNet is considered a standard in the sense that it is commonly used in computer systems for automatic analysis of the meaning of texts. These systems are actively used by marketers and by specialists in natural language processing (NLP).
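The lookup step described above can be sketched roughly as follows. This is only an illustration, assuming the lexicon is available as a plain set of lowercase words (a toy stand-in for WordNet, not its real interface) and that article titles arrive as a list; the names `find_candidate_neologisms`, `KNOWN_WORDS` and `TITLES` are hypothetical and do not come from Zeitgeist itself.

```python
def find_candidate_neologisms(titles, known_words):
    """Return the titles whose lowercase form is absent from the lexicon."""
    known = {word.lower() for word in known_words}
    return [title for title in titles if title.lower() not in known]


# Toy stand-in for a WordNet word list (assumption, not real WordNet data).
KNOWN_WORDS = {"pub", "cooking"}

# Hypothetical list of Wikipedia article titles.
TITLES = ["Pub", "Gastropub", "Cooking"]

if __name__ == "__main__":
    print(find_candidate_neologisms(TITLES, KNOWN_WORDS))
```

A real system would presumably also need to handle multi-word titles, inflected forms and proper names, none of which this sketch attempts.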

The Zeitgeist utility finds neologisms, that is, new words that have only recently appeared in the language. They may be widespread in the blogosphere and may even appear on Wikipedia, yet they are not part of the official vocabulary. Linguistic programs that analyze the blogosphere nevertheless need at least an approximate meaning for them. Zeitgeist solves this problem.

When the program comes across a neologism on Wikipedia, it examines the links on that page to find keywords that describe the neologism. The program does not read the linked documents; it uses only their titles. For example, the article “Gastropub” (a neologism for a pub that specializes in food) links to the articles “pub” and “cooking”, and this provides the key to understanding the word.
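The idea of describing a word by the titles of the articles it links to can be illustrated with a minimal sketch. It assumes the link structure is already available as a dictionary that maps an article title to the set of titles it links to; the `LINKS` data and the function name `describe_by_link_titles` are hypothetical and are not taken from the Zeitgeist implementation.

```python
def describe_by_link_titles(article, links):
    """Return the sorted titles of pages linked from `article`.

    Only the titles are used as keywords; the linked pages are never read.
    """
    return sorted(links.get(article, set()))


# Hypothetical link data: article title -> titles of the pages it links to.
LINKS = {
    "Gastropub": {"Pub", "Cooking"},
}

if __name__ == "__main__":
    print(describe_by_link_titles("Gastropub", LINKS))  # ['Cooking', 'Pub']
```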
According to the developers of Zeitgeist, Wikipedia's link structure reflects the relationships between different concepts and ideas. Unfortunately, people tend to place links anywhere. To keep this from interfering with the algorithms, Zeitgeist filters out links that are not reciprocated: if a linked document does not link back to the page about the neologism, that document is ignored.
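The reciprocal-link filter can be sketched in a few lines. As before, this assumes a dictionary that maps each article title to the set of titles it links to; the sample graph (including the "London" entry) and the name `reciprocal_links` are invented for illustration and are not real Wikipedia data or Zeitgeist code.

```python
def reciprocal_links(article, links):
    """Keep only the link targets that link back to `article`."""
    outgoing = links.get(article, set())
    return {target for target in outgoing
            if article in links.get(target, set())}


# Hypothetical link graph: article title -> titles of the pages it links to.
LINKS = {
    "Gastropub": {"Pub", "Cooking", "London"},
    "Pub": {"Gastropub", "Beer"},
    "Cooking": {"Gastropub"},
    "London": {"England"},  # no link back to "Gastropub", so it is dropped
}

if __name__ == "__main__":
    print(sorted(reciprocal_links("Gastropub", LINKS)))  # ['Cooking', 'Pub']
```

In this toy graph the one-way link to "London" is discarded, while "Pub" and "Cooking" survive as keywords because both link back.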

Admittedly, the program does not always get it right. For example, the article on "feminazi" (a derogatory neologism for feminists who supposedly hate men) links to the articles on feminism and Nazism, yet feminists have nothing to do with the doctrine of National Socialism. In such cases the program may fail, but this happens fairly rarely: in 75% of cases, Zeitgeist works reliably enough to derive a correct meaning for a given neologism.

Many commercial companies are interested in this technology because they want relevant and reliable reports on what people write about their products in blogs and forums. Such texts are full of slang and neologisms. Living language changes very quickly, and lexical databases lag behind. Wikipedia is therefore an ideal source of information for computational linguists, even though the encyclopedia's own rules officially prohibit the use of neologisms.

Source: https://habr.com/ru/post/4369/

