Sentiment analysis (in Russian, "tonality analysis") is the field of computational linguistics that studies the emotional coloring of texts; for more details, see Irokez's article. This is a very important area of machine learning: sentiment analysis is needed for a better “understanding” of texts and for translation from one language to another.
The difficulty of the task lies in the complex linguistic constructions that people often use. Even a person does not immediately recognize the negative sentiment in a phrase like “In this book, only the cover is good”. How do you teach this to a computer?
Until now, the best computer programs have determined sentiment with an accuracy of no more than 80%. A group of scientists from Stanford, with the participation of the well-known Andrew Ng, managed to bring it up to 85%, and with further training of the recursive neural network the accuracy may well increase to 95%, says one of the authors of the study. Note that 95% would be an absolutely phenomenal result: not all people can recognize sarcasm and determine the tone of words with such precision.
For the initial training of the neural network, the scientists used a dataset of 12,000 movie reviews, which were broken down into separate phrases using an automatic parser. The result was 215 thousand phrases. Each of them was read by three people, who marked its degree of positive or negative sentiment. The screenshot shows the interface that was offered to users of Amazon Mechanical Turk.

The authors created the NaSent model (Neural Analysis of Sentiment), a so-called Recursive Neural Tensor Network. It processes the individual words in each phrase, builds a tree of their relationships, and analyzes what emotional coloring each word carries and how the words affect one another.
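To make the idea of a tensor-based recursive network more concrete, here is a minimal sketch of how two child vectors in the tree could be merged into one parent vector. It assumes a composition step of the form p = tanh(xᵀVx + Wx), where x is the concatenation of the two children. The vector size, the random weights, the toy word vectors, and the names `make_params`, `compose`, and `encode` are all illustrative assumptions and are not taken from the authors' code, which had not been published at the time of writing.

```python
import math
import random

D = 3  # toy vector size (assumption; chosen only to keep the example small)


def make_params(d, seed=0):
    """Create small random weights: a tensor V (d slices of 2d x 2d) and a matrix W (d x 2d)."""
    rng = random.Random(seed)
    V = [[[rng.uniform(-0.1, 0.1) for _ in range(2 * d)]
          for _ in range(2 * d)]
         for _ in range(d)]
    W = [[rng.uniform(-0.1, 0.1) for _ in range(2 * d)] for _ in range(d)]
    return V, W


def compose(a, b, V, W):
    """Merge two child vectors into a parent vector: tanh(x^T V[k] x + W[k] x) per output unit k."""
    x = a + b  # list concatenation [a; b], length 2d
    n = len(x)
    parent = []
    for k in range(len(W)):
        bilinear = sum(x[i] * V[k][i][j] * x[j] for i in range(n) for j in range(n))
        linear = sum(W[k][j] * x[j] for j in range(n))
        parent.append(math.tanh(bilinear + linear))
    return parent


def encode(tree, vectors, V, W):
    """Compute a vector for a binary tree; a tree is a word (str) or a (left, right) pair."""
    if isinstance(tree, str):
        return vectors[tree]
    left, right = tree
    return compose(encode(left, vectors, V, W), encode(right, vectors, V, W), V, W)


# Hypothetical word vectors; a trained model would learn these.
vectors = {
    "weird": [-0.5, 0.1, 0.2],
    "but": [0.0, 0.3, -0.1],
    "likeable": [0.6, -0.2, 0.4],
}
V, W = make_params(D)
phrase_vector = encode(("weird", ("but", "likeable")), vectors, V, W)
print(phrase_vector)
```

In a full model, a classifier would read a sentiment label off the vector at every node of the tree, which is what allows each word, each phrase, and the whole sentence to receive its own score. That classifier and the training procedure are omitted here.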
In the online demo you can study how the program works. It builds a tree with an evaluation of each word, each phrase, and the entire text. What makes the program distinctive can be seen in the following two examples, which consist of the same words in a different order; the order changes the tone of the sentence, and the program understands this.
Analysis of the phrase "Unlike the surreal Leon, this movie is weird but likeable" gives an overall positive result (blue); the combination "weird but likeable" is correctly recognized as positive.
The phrase made of the same words in a different order, "Unlike the surreal but likeable Leon, this movie is weird", is correctly recognized as a negative review (red in the overall assessment).
By the way, the online demo is also a tool for training the neural network. Any user can submit arbitrary text to the program for analysis and then correct the result by fixing the errors (simply by clicking on a circle with the wrong score).
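The two example reviews also show why an approach that only counts words cannot tell them apart. The short sketch below compares their word counts; the `bag_of_words` helper is a hypothetical name introduced for illustration and is not part of the researchers' program.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase words, ignoring punctuation and word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


positive = "Unlike the surreal Leon, this movie is weird but likeable"
negative = "Unlike the surreal but likeable Leon, this movie is weird"

same_bag = bag_of_words(positive) == bag_of_words(negative)
print(same_bag)  # True: the two reviews contain exactly the same words
```

Because both reviews produce identical word counts, any model that sees only those counts must give them the same score. A model that follows the structure of the sentence, as the tree-based one described here does, has the information needed to separate them.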
Free online demo
Scientific paper (pdf)
Training dataset for the neural network (6 MB)
Program code (to be published before the EMNLP conference, which begins on October 18)