iKnow Review Analyzer (iKRA)

Intro

Using InterSystems iKnow technology, we built a rating system called iKnow Reviews Analyzer (iKRA). A description of the project's prototype can be found here. iKRA analyzes users' textual reviews and automatically assigns a numerical rating to the subject under study. This can be useful, for example, on online stores, topical forums, or media content collections: in other words, anywhere a community discusses things.

What does the solution consist of?

iKnow Reviews Analyzer can analyze any subject area, whether it is online sales of home appliances or hotel booking in warm countries. To get results, you go through the following key steps:

collect reviews in the subject area of interest;
create dictionaries - the base words for the calculation;
create a domain for loading and analyzing the data;
run the calculation;
drink coffee / wait;
see the results.

Usage example

Now let's see how this looks in practice. As an example, we will analyze smartphone reviews. We select five manufacturers:

Apple;
HTC;
LG;
Samsung;
Sony.

Suppose we are interested in two smartphone models from each of them. For each selected model we load 50 reviews, 500 in total. We take the reviews from Yandex.Market.

Each review is placed in a separate file; for convenience, we use the following file layout (Figure 1):

Figure 1. File location hierarchy

The number in parentheses is the overall rating of the smartphone that the user gave when writing the review. It is stored in the metadata and later used to optimize the calculation algorithm. The original reviews are here.
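A layout like this can be read back programmatically. Below is a minimal Python sketch, assuming a hypothetical path of the form <manufacturer>/<model>/<review id> (<rating>).txt. The real layout is the one in Figure 1 and may differ, so the pattern and the function name parse_review_path are illustrative only.

```python
import re
from pathlib import PurePosixPath

# Hypothetical file name pattern: "<review id> (<rating>)", e.g. "17 (4)".
_NAME_RE = re.compile(r"^(?P<review_id>.+?)\s*\((?P<rating>\d+)\)$")


def parse_review_path(path):
    """Split a review path into manufacturer, model, review id and user rating."""
    p = PurePosixPath(path)
    match = _NAME_RE.match(p.stem)
    # We need at least <manufacturer>/<model>/<file> and a rating in parentheses.
    if match is None or len(p.parts) < 3:
        raise ValueError(f"unexpected review path: {path}")
    return {
        "manufacturer": p.parts[-3],
        "model": p.parts[-2],
        "review_id": match.group("review_id"),
        "user_rating": int(match.group("rating")),
    }
```

The user rating extracted this way is the kind of value that would go into the review's metadata when it is loaded into the domain.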

For the analysis, you need to create an iKnow domain, a storage for unstructured data. We will not dwell on this, since it is covered in detail here.

Once the domain is created and filled with reviews, we proceed to analyze its contents. When choosing a smartphone, the following parameters are critical for me:

performance;
call quality;
comfort / convenience.

For simplicity, the rest of the article uses the following concepts:

category - a parameter to be rated;
functional marker (f-marker) - a term that characterizes the rated parameter / category;
functional dictionary - a set of f-markers;
emotional marker (e-marker) - a word that reflects the author's attitude toward the subject described;
emotional dictionary - a set of e-markers.

Based on the selected characteristics, we compile a functional dictionary: for each of the categories we select f-markers, its defining words. For example, for the "performance" category these could be "speed", "processor", "memory", "core" and so on. All f-markers are recorded in a special file. Figure 2 shows an example for the "Performance" category:

Figure 2. f-markers

Next, we compile the emotional dictionary, filling it with suitable e-markers. The full list is not given here, but for clarity here are a few of them: “good”, “convenient”, “liked”, “problematic”, “flaw”. e-markers give a sentence a positive or negative tone. Each e-marker has a numerical score; for simplicity, we use +1 for positive and -1 for negative. All e-markers are also recorded in a special file. Figure 3 shows an example of e-markers:

Figure 3. e-markers
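To make the role of the two dictionaries concrete, here is a minimal Python sketch of one possible scoring rule. It is an assumption for illustration, not the algorithm iKRA actually uses: a sentence counts toward a category when it contains at least one of that category's f-markers, and it contributes the sum of the scores of the e-markers it contains. It also matches plain words, whereas iKnow works with richer text units. The dictionary contents reuse the examples above; the names F_MARKERS, E_MARKERS and score_review are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical dictionaries built from the examples in the text;
# the real ones live in the project's dictionary files.
F_MARKERS = {
    "performance": {"speed", "processor", "memory", "core"},
}
E_MARKERS = {"good": 1, "convenient": 1, "liked": 1, "problematic": -1, "flaw": -1}


def score_review(text, f_markers=F_MARKERS, e_markers=E_MARKERS):
    """Return {category: score} for a single review.

    Assumed rule: a sentence counts toward a category when it contains at
    least one of its f-markers, and adds the sum of its e-marker scores.
    """
    scores = defaultdict(int)
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z']+", sentence)
        tone = sum(e_markers.get(word, 0) for word in words)
        if tone == 0:
            continue  # no emotional color (or it cancels out): nothing to add
        for category, markers in f_markers.items():
            if markers.intersection(words):
                scores[category] += tone
    return dict(scores)
```

With this rule, a sentence that praises the packaging but mentions no f-marker does not affect any category, which is the point of keeping the two dictionaries separate.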

Once the dictionaries are ready, you can calculate the ratings. To do this, select the desired domain on the "Domains" tab and click "Run calculation" (Figure 4):

Figure 4. Calculation of ratings

To see the result, open the table of the class ikra.Dictionary.MarksUnit, which contains ratings for each smartphone model, or the table of the class ikra.Dictionary.MarksReview, which contains ratings for each individual review. The data can be viewed in the Management Portal: open the SQL section and browse the table of interest. Figure 5 shows the table of the class ikra.Dictionary.MarksUnit.

Figure 5. Viewing the table ikra.Dictionary.MarksUnit

Let's use DeepSee to see what we got. We created a cube that uses the calculated ratings by category and plotted them for each device under study (Figure 6):

Figure 6. Ratings by category

Among the analyzed devices, the first places were distributed as follows:

performance - HTC ONE;
call quality - HTC ONE;
comfort / convenience - Samsung Galaxy S5 SM-G900F.

And what if you need to add another category?

Previously, each category to be rated required a corresponding class property to be written by hand. This was inconvenient: when analyzing new subject areas, the categories and their number changed, and the code had to be edited with every such change, which is hardly the most fun or productive use of time. To get out of this situation, we considered two possible solutions:

Reserving a large number of class properties;
Using the database.

The first option lets you forget about the ever-changing number of categories without bothering with the database structure. But storing that many properties is inconvenient, and besides, nothing guarantees that an even larger number of rated parameters will not be needed. We rejected this approach.

The second option solves the problem of an indeterminate number of categories and does not require a fixed amount of memory for each class instance. With the categories stored in the database, the system easily adapts to any subject area with any number of categories.

Given the advantages of the second option, it is the one we implemented in iKRA.
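The idea behind the database option can be illustrated with a small sketch: ratings are stored as one row per (unit, category) pair instead of one class property per category, so a new category is just a new row. The sketch uses Python's sqlite3 module as a stand-in for the actual database. The table and function names (category, mark, add_category, set_mark, marks_for) and the values in the usage note are hypothetical and do not reflect iKRA's real classes or results.

```python
import sqlite3


def open_store():
    """Create an in-memory store with one row per (unit, category) rating."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE category (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE mark (
            unit        TEXT    NOT NULL,
            category_id INTEGER NOT NULL REFERENCES category(id),
            value       REAL    NOT NULL,
            PRIMARY KEY (unit, category_id)
        );
        """
    )
    return conn


def add_category(conn, name):
    """Register a category; this is data, not a schema or code change."""
    conn.execute("INSERT OR IGNORE INTO category (name) VALUES (?)", (name,))


def set_mark(conn, unit, category, value):
    """Store (or overwrite) the rating of a unit in a known category."""
    row = conn.execute(
        "SELECT id FROM category WHERE name = ?", (category,)
    ).fetchone()
    if row is None:
        raise KeyError(category)
    conn.execute(
        "INSERT OR REPLACE INTO mark (unit, category_id, value) VALUES (?, ?, ?)",
        (unit, row[0], value),
    )


def marks_for(conn, unit):
    """Return {category name: rating} for one unit."""
    rows = conn.execute(
        "SELECT c.name, m.value FROM mark m "
        "JOIN category c ON c.id = m.category_id "
        "WHERE m.unit = ? ORDER BY c.name",
        (unit,),
    )
    return dict(rows.fetchall())
```

For example, after add_category(conn, "performance") and set_mark(conn, "Phone A", "performance", 3.0), a later add_category(conn, "camera") needs no change to the tables or to the code above.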

Adding a new category

“And then I realized that I needed to rate one more smartphone parameter: the camera! (Well, so that catching Pokémon is enjoyable.)”

Adding a new category is easy: we edit the functional dictionary and enter a new name, Camera (Figure 7).

Figure 7. Adding the Camera category

We define the category by adding f-markers on the corresponding tab (Figure 2).
On the "Domains" tab, select the desired domain and run the calculation (Figure 4).
We wait for it to finish and then view the result (Figure 8):

Figure 8. Updated rating chart by category

Hooray! We easily added a new category and rated it. Now the picture is as follows:

camera - iPhone 4S;
performance - HTC ONE;
call quality - HTC ONE;
comfort / convenience - Samsung Galaxy S5 SM-G900F.

To be continued

Now we can get ratings for any product category from its reviews quickly and without rewriting code. All it takes is configuring the dictionary and running the calculation. Loading reviews into the database is still a weak spot, but we will discuss that in the next article.

» Github

Source: https://habr.com/ru/post/308940/
