Sentiment analysis of text

Sentiment analysis of information flows has great potential for monitoring, analytical and alerting systems, for document workflow systems, and for advertising platforms that target ads by the subject of web pages.

This article introduces the concept of sentiment analysis, the main methods for determining tonality, and new approaches in this area.

A natural-language text can, in addition to information, express an emotional assessment of what is being reported. For example, the following sentence contains a negative assessment of what happened:

(1) In 2012, Armstrong was found guilty of doping following an investigation by the US Anti-Doping Agency.

And this one contains a positive assessment:

(2) Apple received final permission to build a new campus.

The emotional assessment expressed in a text is called its tonality or sentiment (feeling, opinion, mood). A person evaluates the world on many scales at once (good-bad, strong-weak, big-small, happy-unhappy, funny-sad, fast-slow, etc.), and these scales carry different emotional weight. For simplicity, however, we can assume that the emotional assessment reduces to a single scale: good-bad, or positive-negative.

Historically, the traditional approach to sentiment analysis treats it as the task of classifying a text (or part of a text) into two or three categories: negative and positive, optionally with neutral [Pang & Lee; Turney]. The field grew out of this task: evaluating the sentiment of reviews on some subject (cinema, restaurants, electronics, etc.).

However, this is neither the only nor the most important task that text analysis should solve. Today, readers are interested less in the overall emotional assessment of a text (which is as informative as the average temperature across a hospital) than in the sentiment toward a particular object mentioned in the text, or in the attitude of the subject of the utterance toward the object under discussion.

The object toward which the emotional evaluation is expressed is usually called the object of tonality. Thus, in sentence (1) the object of tonality is Armstrong, and in sentence (2) it is Apple. This kind of sentiment analysis is called object-based.

The carrier of the emotional evaluation expressed in a text is also usually a well-defined person; in the general case it is the author of the text. However, if the author refers to someone's opinion, as in sentence (3) below, or quotes another person's statement, as in sentence (4), then the carrier of the emotional evaluation, or the subject of tonality, is the person whose opinion is referenced.

(3) Religious studies, according to S. A. Buryanov, do not today constitute an exact science characterized by unity and by strict, generally accepted principles.

(4) Yesterday, the head of the Central Election Commission, Veshnyakov, once again praised the amendments to the electoral law and said that now the law is blocking many loopholes for abuse.

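Thus, the tonality of an utterance is determined by three components: the subject of tonality (who made the assessment), the object of tonality (about whom or what the assessment was made) and the tonal evaluation itself (how it was assessed). In our examples, the components of tonality are as follows:

Sentence (1): subject: the author; object: Armstrong; evaluation: negative.
Sentence (2): subject: the author; object: Apple; evaluation: positive.
Sentence (3): subject: S. A. Buryanov; object: religious studies; evaluation: negative.
Sentence (4): subject: Veshnyakov; object: the amendments to the electoral law; evaluation: positive.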

In one sentence, several emotional assessments can be made simultaneously regarding various objects of tonality:

(5) Samsung was ordered to pay Apple $290 million in compensation.

For Apple this is rather a positive event, which cannot be said for Samsung.

A single sentence can also express different tonalities toward the same object:

(6) The caramel-based lemonade “Favorite”, so beloved by buyers in our region, can provoke the development of diseases.

Here the object "lemonade" is mentioned both in a positive way and in a negative way.
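Examples (5) and (6) can be written down as sets of (subject, object, evaluation) triples. The sketch below is only an illustration: the `Opinion` class, the field names and the hand-made annotations are assumptions introduced here, not the output of any real analyzer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    holder: str    # subject of tonality: who gives the assessment
    target: str    # object of tonality: what is being assessed
    polarity: str  # "positive", "negative" or "neutral"


# Hand-written annotations for sentences (5) and (6).
# They reflect one possible reading of the examples (an assumption).
annotations = {
    5: [Opinion("author", "Apple", "positive"),
        Opinion("author", "Samsung", "negative")],
    6: [Opinion("buyers", "lemonade", "positive"),
        Opinion("author", "lemonade", "negative")],
}


def polarities_by_target(opinions):
    """Group polarity labels by the object of tonality they refer to."""
    grouped = {}
    for opinion in opinions:
        grouped.setdefault(opinion.target, set()).add(opinion.polarity)
    return grouped


for number, opinions in annotations.items():
    print(number, polarities_by_target(opinions))
```

Keeping one record per assessment, rather than one label per sentence, is what lets a single sentence carry several objects or several polarities for the same object.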

Another direction in sentiment analysis is identifying the negativity or positivity of individual attributes of the object of tonality (feature-based / aspect-based sentiment analysis). For example:

(7) Another plus of this smartphone is the light indicator, which significantly saves battery power; it supports flash drives up to 8 GB, but the camera is quite weak.

Here the object of tonality is the smartphone, but its tonality is made up of several attributes (light indicator, battery, flash drive support, camera), which may have different polarities. The task is therefore to identify the attributes of the product (object) and determine the tonality of each. Moreover, the same quality can be positive for one attribute and negative for another: for example, a “big battery” in a phone is rather good, while a “big weight” is rather bad.
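The dependence of polarity on the attribute can be illustrated with a lookup keyed by the (attribute, quality) pair rather than by the quality word alone. This is a minimal sketch: the `ASPECT_LEXICON` entries and function names are hypothetical, and a real system would extract the pairs from text instead of receiving them ready-made.

```python
# Hypothetical mini-lexicon: polarity depends on the (attribute, quality) pair,
# not on the quality word alone.
ASPECT_LEXICON = {
    ("battery", "big"): "positive",
    ("weight", "big"): "negative",
    ("camera", "weak"): "negative",
}


def aspect_polarity(attribute, quality):
    """Return the polarity of a quality for a given attribute, or "neutral" if unknown."""
    return ASPECT_LEXICON.get((attribute.lower(), quality.lower()), "neutral")


def analyze(pairs):
    """Map each attribute in a list of (attribute, quality) pairs to its polarity."""
    return {attribute: aspect_polarity(attribute, quality) for attribute, quality in pairs}


print(analyze([("battery", "big"), ("weight", "big"), ("camera", "weak")]))
```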

In addition to tonality itself, a text can be assessed for the subjectivity or objectivity of its judgments (opinion mining). If the text expresses the opinion of its author and contains a subjective assessment of what is described, it is considered subjective. Conversely, if it is a media report or an opinion shared by default by the participants in the dialogue, it is considered objective.

For example, a message from a social network:

(8) For now I'm sticking with mine: the Samsung Galaxy Note 3 is the best gadget that has ever passed through my hands!

It contains a subjective assessment of the smartphone. By contrast, a text from the media:

(9) Promsvyazbank strengthened its position in the top 10 Russian banks in terms of loans to organizations.

contains objective information.

Direct speech, indirect speech and quotations in a text (see examples 3 and 4) are also treated as subjective information. In such cases, automatically determining the subjectivity or objectivity of a statement is technically much easier than in the general case.

Methods for determining tonality

There are two main methods for automatically determining tonality:

Statistical method. This requires a collection (corpus) of texts labeled for tonality in advance. A model is trained on this corpus and then used to determine the tonality of a text or phrase.
Method based on dictionaries and rules. For this, dictionaries of positive and negative words and expressions are prepared in advance. This method can use both lists of templates and rules for combining tonal vocabulary within a sentence, based on grammatical and syntactic parsing.
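As an illustration of the dictionary-and-rules method, the sketch below scores a text with two small word lists and a single rule: a negation word flips the polarity of the word immediately after it. The word lists, the rule and the function name are assumptions chosen for the example; real systems rely on much larger dictionaries and on syntactic parsing.

```python
import re

# Toy tonal dictionaries (assumptions for this example only).
POSITIVE = {"good", "best", "praised", "beloved", "plus"}
NEGATIVE = {"bad", "weak", "guilty", "illegal", "diseases"}
NEGATIONS = {"not", "no", "never"}


def lexicon_tonality(text):
    """Classify a text as positive, negative or neutral by counting tonal words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        value = (token in POSITIVE) - (token in NEGATIVE)
        if value:
            score += -value if negate else value
        negate = False  # the rule: negation affects only the next word
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_tonality("The camera is quite weak"))
print(lexicon_tonality("This is not good"))
```

The single-word negation window is deliberately naive: a phrase such as "not very good" is not handled, which is exactly the kind of case that rules based on syntactic parsing are meant to cover.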

In addition, a mixed method is sometimes used (a combination of the first and second approaches).

In the statistical approach, support vector machines (SVM), Bayesian models and various regressions are widely used for the general classification of texts into tonality classes [Chetviorkin & Loukachevitch: according to their description of the ROMIP-2011 sentiment analysis track, almost all participants used SVM or Bayes].
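To make the statistical approach concrete, here is a minimal multinomial naive Bayes classifier with add-one smoothing, written with the standard library only. It is a sketch under simplifying assumptions (whitespace tokenization, a four-document toy corpus invented for the example) and does not reproduce any system evaluated at ROMIP.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocabulary.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus log likelihood of every known word.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in words:
                if word in self.vocabulary:  # unseen words are skipped
                    score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy labeled corpus (invented for illustration).
train_texts = [
    "great movie loved it",
    "wonderful food and friendly service",
    "terrible movie hated it",
    "awful food and rude service",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("loved the friendly service"))
print(model.predict("rude and awful movie"))
```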

If the goal is to determine the tonality toward a specific, predetermined object (or several objects), more complex statistical algorithms are used, such as conditional random fields (CRF) [Antonova and Soloviev], semantic similarity algorithms (for example, latent semantic analysis, LSA, or latent Dirichlet allocation, LDA) and others, as well as rule-based methods [Pazelskaya and Soloviev].

For attribute-level analysis, language models [García-Moya et al.], neural networks [Tarasov] or thematic thesauri are used.

SentiFinder tonality determination module

The SentiFinder module determines three types of tonality in Russian-language texts (positive, negative and neutral) with respect to a given object of tonality, both within a single sentence and averaged over the entire document.

The module is built on a Markov random field algorithm combined with tonal dictionaries. This made it possible to achieve both good quality (average accuracy across the three types of tonality is about 87%) and high text-processing speed (over 100 kB/s on a single thread).

A distinctive feature of the module is that it can estimate the strength of emotion. The user thus obtains not only a qualitative emotional assessment of the document as a whole with respect to the object of tonality of interest, but also the quantitative ratio of negative and positive attitudes toward it.
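One simple way to turn sentence-level labels for an object into a document-level label plus a quantitative ratio is sketched below. This is an assumption made for illustration: the article does not describe how SentiFinder computes these values, and the function name and output format are hypothetical.

```python
from collections import Counter


def tonality_profile(sentence_labels):
    """Aggregate per-sentence labels for one object into a document-level profile."""
    counts = Counter(sentence_labels)
    positive, negative = counts["positive"], counts["negative"]
    evaluative = positive + negative  # neutral sentences do not affect the ratio
    if evaluative == 0:
        return {"label": "neutral", "positive_share": 0.0, "negative_share": 0.0}
    if positive > negative:
        label = "positive"
    elif negative > positive:
        label = "negative"
    else:
        label = "neutral"
    return {
        "label": label,
        "positive_share": positive / evaluative,
        "negative_share": negative / evaluative,
    }


print(tonality_profile(["positive", "negative", "negative", "neutral"]))
```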

The module can work both with the "classic" texts of the news flow and with the "non-classical" language of social media messages.

You can try this service at eurekaengine.ru.

Bibliography

Bo Pang, Lillian Lee, Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. 2002. P. 79–86.
Peter Turney. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews // Proceedings of the Association for Computational Linguistics. 2002. P. 417–424. arXiv: cs.LG/0212032
Anna Antonova, Alexey Soloviev. Using the conditional random fields method for processing texts in Russian // Computational Linguistics and Intellectual Technologies: "Dialogue-2013". Collection of scientific articles. Vol. 12 (19). Moscow: RSUH Publishing, 2013. P. 27–44.
Chetviorkin I. I., Loukachevitch N. V. Sentiment Analysis Track at ROMIP-2012 // Computational Linguistics and Intellectual Technologies: "Dialogue-2013". Collection of scientific articles. Vol. 2. P. 40–50.
Anna Pazelskaya, Alexey Soloviev. Method for determining emotions in texts in Russian // Computational Linguistics and Intellectual Technologies: "Dialogue-2011". Collection of scientific articles. Vol. 11 (18). Moscow: RSUH Publishing, 2011. P. 510–523.
Tarasov D. S. Deep Recurrent Neural Networks for Multiple Language Aspect-Based Sentiment Analysis // Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference "Dialogue-2015". Issue 14 (21), Vol. 2. 2015. P. 65–74.
García-Moya L., Anaya-Sanchez H., Berlanga-Llavori R. Retrieving product reviews // IEEE Intelligent Systems. 28 (3). 2013. P. 19–27.

Source: https://habr.com/ru/post/262595/
