We will pick up where we left off last time: we have looked at several ways to solve the cold start problem, and now I propose to look at other problems of recommender systems (hereinafter simply SR) and to think about how different types of SR can complement each other. I should say up front that I will not go into detail on how to solve each particular problem. The purpose of this article is only to help developers find their way among the varieties of SR and the problems associated with them.
To begin with, we still need to extend the classification of SRs. Przemyslaw Kazienko and Pawel Kolodziejski proposed dividing all SRs into five types: statistical, demographic, associative, informational and collective. Let's start with the simplest.
Statistical SRs (statistical approaches) are systems based on statistics collected from users. Simply put, these are the familiar top lists: most downloaded, most read, most popular, and so on. Obviously, unlike all the SRs we considered earlier, this approach is not personalized and offers the same recommendations to every user.
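As a minimal sketch of the statistical approach, the top list can be built by simply counting events. The function name `most_popular` and the `(user_id, item_id)` event format are assumptions made for this illustration, not something taken from the paper.

```python
from collections import Counter

def most_popular(events, n=3):
    """Return the ids of the n most frequently seen items.

    `events` is an iterable of (user_id, item_id) pairs,
    for example downloads or page views (hypothetical format).
    """
    counts = Counter(item for _user, item in events)
    return [item for item, _count in counts.most_common(n)]

# Toy data: every user gets the same list, whoever they are.
events = [
    ("ann", "book"), ("bob", "book"), ("cat", "book"),
    ("ann", "film"), ("bob", "film"),
    ("cat", "song"),
]
top = most_popular(events, n=2)
print(top)
```

Note that the user id is ignored when building the list, which is exactly why this approach is not personalized.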
Demographic SRs (demographic recommendation) match the characteristics of an object against the characteristics of the user. These can be simple discrete properties (age, gender, place of residence, nationality) or more complex ones, such as the user's interest in a particular kind of object. Some of them are easy for the system to determine on its own: the country and even the city where the user lives can be determined from the IP address, and the preferred language can be read from the browser settings using JavaScript. For other data, the user can be asked to specify it explicitly. Although the amount of information obtained about the user will be quite limited, it can help make important decisions, for example, whom to recommend a doll and whom an adult video.
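A rough sketch of such matching is shown below. The profile fields (`age`, `language`) and the item constraints (`min_age`, `max_age`, `languages`) are hypothetical names chosen for the example; a real system would use whatever attributes it actually collects.

```python
def matches(user, item):
    """Check an item's audience constraints against a user profile."""
    # Missing constraints mean "no restriction" (an assumption of this sketch).
    age_ok = item.get("min_age", 0) <= user["age"] <= item.get("max_age", 150)
    lang_ok = "languages" not in item or user["language"] in item["languages"]
    return age_ok and lang_ok

def demographic_recommend(user, items):
    """Return the names of all items whose constraints fit the user."""
    return [item["name"] for item in items if matches(user, item)]

items = [
    {"name": "doll", "max_age": 12},
    {"name": "adult video", "min_age": 18},
    {"name": "english textbook", "languages": {"en"}},
]
child = {"age": 7, "language": "ru"}
adult = {"age": 30, "language": "en"}

print(demographic_recommend(child, items))
print(demographic_recommend(adult, items))
```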
Associative SRs (association rules) build recommendations from data about which objects are used together. A typical example is the analysis of user purchases. For instance, someone who has bought a new phone will quite likely need headphones, a case, a charger, an extra memory card and other accessories for it. Other SRs may be powerless here, because these objects have no common parameters and cannot be compared; the only thing that unites them is that they are used together.
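The simplest form of this idea can be sketched as counting how often two objects appear in the same purchase. This is plain co-occurrence counting rather than a full association-rule algorithm with support and confidence thresholds, and the names `build_cooccurrence` and `bought_together` are invented for the example.

```python
from collections import Counter
from itertools import permutations

def build_cooccurrence(baskets):
    """For every item, count how often each other item shares a basket with it."""
    co = {}
    for basket in baskets:
        # sorted(set(...)) removes duplicates and keeps the order deterministic
        for a, b in permutations(sorted(set(basket)), 2):
            co.setdefault(a, Counter())[b] += 1
    return co

def bought_together(co, item, n=2):
    """Return up to n items most often bought together with `item`."""
    return [other for other, _count in co.get(item, Counter()).most_common(n)]

baskets = [
    ["phone", "case", "headphones"],
    ["phone", "case"],
    ["phone", "memory card"],
    ["laptop", "mouse"],
]
co = build_cooccurrence(baskets)
print(bought_together(co, "phone", n=1))
```

The lookup needs a selected object ("phone") as a starting point, and a brand-new object has no counts at all until it shows up in somebody's basket.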
Informational SRs (content-based; I have not found a generally accepted translation, "recommender systems based on content" would be more accurate, but it is too long) are systems that look for objects similar to those the user has already rated positively. Unlike associative systems, they can work with almost any data: from the simplest binary values (bought or not bought) to complex text descriptions. In the previous article we looked at a couple of important methods used to implement systems of this type, including the popular vector model. Looking ahead, I will say that this is one of the two systems that can solve the cold start problem.
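As an illustration of the vector model, here is a small sketch that represents each description as a bag of words and ranks catalog entries by cosine similarity to a text the user liked. The catalog contents and the helper names are made up for the example, and raw word counts are used instead of the TF-IDF weighting a real system would more likely apply.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn a text into a bag-of-words vector (word -> count)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similar_items(liked_text, catalog, n=1):
    """Return up to n catalog names most similar to the liked text."""
    liked = vectorize(liked_text)
    scored = [(cosine(liked, vectorize(text)), name) for name, text in catalog.items()]
    scored.sort(reverse=True)
    # Items with zero similarity share no words with the liked text; skip them.
    return [name for score, name in scored[:n] if score > 0]

catalog = {
    "py-intro": "python tutorial",
    "py-data": "python data analysis",
    "pasta": "cooking pasta recipes",
}
liked = "python tutorial for beginners"
print(similar_items(liked, catalog, n=2))
```

Because only the object's own description is needed, a newly added object can be recommended immediately, without waiting for anyone to rate it.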
Collective (collaborative) SRs (collaborative filtering) are the most common systems; to predict the rating of a particular object, they rely on the ratings given by other users. Although such systems are quite effective, their accuracy depends strongly on how much the sets of rated objects overlap between individual users. The more numerous and frequent these overlaps are, the more accurate the system will be. This limits the scope of the model: if every user has unique content that nobody else has seen, collective SRs will be powerless.
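The dependence on overlaps can be seen in a minimal user-based sketch: the predicted rating is a similarity-weighted average of other users' ratings, and similarity is computed only over objects that both users have rated. The data, the function names and the choice of cosine similarity are assumptions of this example; production systems usually normalize ratings and use more robust measures.

```python
import math

def similarity(a, b):
    """Cosine similarity over the items both users have rated (0.0 if none)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(ratings, user, item):
    """Predict user's rating of item, or None if no similar user has rated it."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = similarity(ratings[user], other_ratings)
        num += weight * other_ratings[item]
        den += weight
    return num / den if den > 0 else None

ratings = {
    "ann": {"a": 5, "b": 1},
    "bob": {"a": 5, "b": 1, "c": 4},   # same taste as ann
    "cat": {"a": 1, "b": 5, "c": 2},   # opposite taste
    "dan": {"z": 3},                   # overlaps with nobody
}
print(predict(ratings, "ann", "c"))  # pulled toward bob's rating of 4
print(predict(ratings, "dan", "c"))  # None: nothing to compare dan with
```

For "dan", whose rated objects do not intersect with anyone else's, no prediction can be made at all.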
The pros and cons are shown more clearly in the table below, which I found in the paper by the already mentioned Przemyslaw Kazienko and Pawel Kolodziejski, "Personalized Integration of Recommendation Methods for E-commerce" (pdf), and to which I added one more column.
| Method | Data source | User attachment | Attachment to an object (context) | Solving the problem of a new object | Solving the problem of a new user | Solving the problem of intersection of objects | Considers quality aspects |
|---|---|---|---|---|---|---|---|
| Statistical | Ratings, views, downloads, etc. | - | - | - | + | + | + |
| Demographic | Object and user characteristics | + | - | + | + | + | - |
| Associative | Joint usage | - | + | - | + | - / + | - |
| Informational * | Object properties | + / - | + / - | + | + / - | + | - |
| Collective | Ratings | + | - | - | - | - | + |
* Here the authors assumed that the informational method is used only to compare two objects with each other; if the results of that comparison are then applied to a specific user, other problems arise.
User attachment means that the system needs to identify the user in order to give recommendations. Obviously, this problem affects all personalized SRs. It can be solved by an ordinary login or by other methods of user identification.
Attachment to an object indicates the context in which a particular SR can be applied. Some systems, for example associative and informational ones, require the user to select a specific object that the system can start from (to look for other objects similar or complementary to the selected one). Other systems can make recommendations in any context and on any topic.
Solving the problem of a new object is one variation of the cold start problem. A newly added object can be used right away only by systems that work directly with its properties, which are usually set when the object is created. For statistical and collective SRs, it will take some time until enough users have rated it. For associative systems the problem can be even harder, since new patterns of the object's use have to be discovered.
Solving the problem of a new user is another variation of the cold start problem, only in this case we know nothing about the user. Obviously, this is not a problem for associative and statistical SRs, since they do not depend on the user at all. For a demographic system it is not a problem either, provided that at least some data the system can use was specified when the user was created. A collective system, however, will need time until it learns the information it requires.
Solving the problem of intersection of objects: this problem does not arise for almost all SRs, except collective and, partially, associative ones. As I already wrote, their effectiveness depends on how often the compared objects intersect. For a collective system, this means that the more people view some material, the more accurately the system can form an opinion about it. For an associative system, it means that the more often an object occurs together with some narrow set of other objects, the easier it is for the system to identify patterns of its use. Note that if an object occurs together with a very large number of other objects, this will only confuse the system.
Considers quality aspects is the column that shows whether the system takes the quality of an object into account. It was not in the original table, but in my opinion it is also worth considering. The point is that most methods that look for similarities between objects ignore their quality. They can find two similar news items, but the fact that one of them is interesting does not mean that the other is too. In that case the system cannot be sure that what it recommends to the user is not complete trash. Only systems that take object ratings into account, that is, collective and statistical ones, can solve this problem. Moreover, ratings can be obtained both by explicit methods and by implicit observation.
The table clearly shows that different systems can effectively complement each other. To choose a successful combination of systems, you need to consider what data they will work with and in what context they will be applied. If you do not, all the enormous work of introducing a new system may yield no noticeable gain in accuracy.
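One simple way to combine systems, sketched below, is a fallback chain: recommenders are queried in priority order, and less personalized ones fill the gaps left by the more personalized ones. This is only an illustration and not the integration method from the cited paper; the three stand-in recommenders and their hard-coded data are hypothetical.

```python
def hybrid_recommend(recommenders, user, context, n=3):
    """Collect up to n distinct items from recommenders in priority order."""
    result = []
    for recommend in recommenders:
        for item in recommend(user, context):
            if item not in result:
                result.append(item)
            if len(result) == n:
                return result
    return result

# Stand-ins for real systems (hypothetical data).
def collective(user, context):
    # Knows something only about users who have already rated objects.
    return {"ann": ["film"]}.get(user, [])

def associative(user, context):
    # Needs a selected object as context; ignores the user.
    return {"phone": ["case", "headphones"]}.get(context, [])

def statistical(user, context):
    # Always has an answer: the global top list.
    return ["book", "film", "song"]

chain = [collective, associative, statistical]
print(hybrid_recommend(chain, "new_user", "phone"))
print(hybrid_recommend(chain, "ann", None))
```

A new user with a selected object still gets associative and statistical recommendations, while a known user without any context gets collective ones topped up from the global list.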
P.S. The article turned out to be rather long and without concrete examples, but I hope it has helped someone get a clearer picture of what the different kinds of recommender systems are. I am afraid I will not be able to continue the series in the near future, but to make up for the time you have spent, here are several references worth looking at for further study of the topic (unfortunately, they are all in English; I have not found anything on the Russian-language internet):
- Robin van Meteren and Maarten van Someren: "Using Content-Based Filtering for Recommendation" (pdf);
- Przemyslaw Kazienko and Pawel Kolodziejski: "Personalized Integration of Recommendation Methods for E-commerce" (pdf);
- Michael J. Pazzani: "A Framework for Collaborative, Content-Based and Demographic Filtering" (pdf);
- Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze: "Introduction to Information Retrieval" - everything related to the classification of texts.
Almost all of the mathematical algorithms mentioned in them are described on Wikipedia.