
Recommender systems: problem statement

Hello! My name is Sergey, I am a mathematician, and I lead the development of the Surfingbird recommender system. With this article we open a series devoted to machine learning, and to recommender systems in particular. I don't know yet how many installments the series will have, but I will try to publish them regularly. Today I will explain what recommender systems are in general and state the problem a bit more formally; in the following installments we will start talking about how to solve it and how our recommender system, Tachikoma, learns.


Recommender systems are models that know better than you do what you want. I recently heard a telling anecdote. Supermarket chains, as you know, usually try to predict what to advertise to you (which is itself an example of a recommender system); in particular, a supermarket may try to recognize from a shopper's changed preferences that a woman has become pregnant, and start acting on it. The story goes that an infuriated father once stormed into a supermarket's office because his schoolgirl daughter had started receiving offers for diapers and baby clothes by mail; the manager had to apologize at length and explain that all recommender models are probabilistic and mistakes are quite possible. A couple of months later the father came back and apologized himself: it turned out that he knew far from everything about his own daughter...
Collaborative filtering systems are models that try to predict how much you will like a particular product, given data on how you and other users have rated this and other products in the past. Collaborative filtering is the most popular type of recommender system today. In Surfingbird, for example, your ratings are the like and dislike buttons (plus the fact that you viewed a page without rating it, but more on that later). The more data we have about your preferences, the more interesting the pages we can recommend to you!
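To make the idea concrete, here is a minimal sketch of one classic and very simple collaborative filtering technique, user-based nearest neighbours with cosine similarity. It is not necessarily how Surfingbird or Tachikoma works. It assumes likes and dislikes are encoded as +1 and -1 (a hypothetical encoding; how views without a rating are treated is left aside), and all user and page names are made up.

```python
from math import sqrt

# Hypothetical encoding of feedback: like = +1, dislike = -1.
LIKE, DISLIKE = 1.0, -1.0

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two users, computed over co-rated items only."""
    common = u.keys() & v.keys()
    dot = sum(u[a] * v[a] for a in common)
    nu = sqrt(sum(u[a] ** 2 for a in common))
    nv = sqrt(sum(v[a] ** 2 for a in common))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(ratings: dict, user: str, item: str) -> float:
    """Similarity-weighted average of other users' ratings of `item`.
    Returns 0.0 (no opinion) when no comparable user has rated it."""
    mine = ratings.get(user, {})
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        s = cosine(mine, theirs)
        num += s * theirs[item]
        den += abs(s)
    return num / den if den else 0.0

ratings = {
    "alice": {"page1": LIKE, "page2": LIKE, "page3": DISLIKE},
    "bob":   {"page1": LIKE, "page2": LIKE, "page4": LIKE},
    "carol": {"page1": DISLIKE, "page3": LIKE, "page4": DISLIKE},
}
print(round(predict(ratings, "alice", "page4"), 6))  # 1.0
```

Alice agrees with Bob, who liked page4, and disagrees with Carol, who disliked it, so both neighbours push the prediction towards a like. Note that a user with no ratings gets 0.0 for everything, which is the cold start problem in miniature.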

Here are some other examples of well-known recommender systems.

But back to the point. Imagine that we have many users and many products (for Surfingbird these are web pages, for Netflix movies, for Last.fm songs), and some users have rated some products. Formally, the data consists of triples of the form $(i, a, r_{i,a})$, where $i$ denotes a user, $a$ a product, and $r_{i,a}$ the rating that user $i$ gave to product $a$.

You can picture this data as a matrix in which each row corresponds to a user and each column to a product. Our task is to predict the unknown entries of this matrix; more precisely, to predict which of the unknown entries will be the largest in their row, that is, which products a given user will like most.
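A minimal sketch of this problem statement, under stated assumptions: unknown entries are stored as NaN in a small dense matrix, and `predict` is a hypothetical stand-in for whatever model fills in the matrix. Here it is just each product's mean known rating, a deliberately naive baseline rather than a real collaborative filtering model. The point is only the shape of the task: score the unknown entries in a user's row and return the top ones.

```python
import math

def to_matrix(triples, users, products):
    """Rows correspond to users, columns to products; unknown entries are NaN."""
    row = {u: k for k, u in enumerate(users)}
    col = {p: k for k, p in enumerate(products)}
    m = [[math.nan] * len(products) for _ in users]
    for user, product, rating in triples:
        m[row[user]][col[product]] = rating
    return m

def item_mean_predictor(m):
    """Hypothetical baseline: predict a product's mean known rating."""
    def predict(_user_row, a):
        known = [r[a] for r in m if not math.isnan(r[a])]
        return sum(known) / len(known) if known else 0.0
    return predict

def recommend(m, user_row, predict, n=1):
    """Column indices of the n unknown entries in the user's row
    with the highest predicted rating."""
    unknown = [a for a, r in enumerate(m[user_row]) if math.isnan(r)]
    return sorted(unknown, key=lambda a: predict(user_row, a), reverse=True)[:n]

users = ["u1", "u2", "u3"]
products = ["p1", "p2", "p3", "p4"]
triples = [("u1", "p1", 1), ("u2", "p1", 1), ("u2", "p2", -1),
           ("u2", "p3", 1), ("u3", "p3", 1), ("u3", "p4", -1)]
M = to_matrix(triples, users, products)
top = recommend(M, users.index("u1"), item_mean_predictor(M))
print([products[a] for a in top])  # ['p3']
```

A dense list of lists is only workable for toy data; with millions of users you would store just the known entries (for example, a dictionary keyed by user), since almost all of the matrix is empty.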

Collaborative filtering systems have several common problems that any model has to solve in one way or another.
  1. The rating matrix is usually very sparse. There are many users and many products, yet the actual number of ratings is far smaller than their product (the number of users times the number of products), because the average user rates very few products. The remaining entries of the matrix are unknown to us, and these are exactly the ones we must predict.
  2. The cold start problem. For users: what do we do when a new user arrives who has no ratings yet? If there are no ratings at all, that is not so bad, since we can simply recommend the most popular products; but what if the user has already rated something, just very little so far? For products: how many ratings does a new product need before we can confidently recommend it? And where would those ratings come from if we do not recommend it to anyone?
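Both problems are easy to quantify. The sketch below, with made-up data, computes the density of the rating matrix (the share of known entries) and implements the simple cold-start fallback mentioned above: recommend the most popular products, skipping whatever the user has already rated. Defining "popular" as the number of positive ratings is an assumption made for illustration. For scale, a hypothetical service with 1,000,000 users, 100,000 products and 50,000,000 ratings would have a density of 5·10⁷ / 10¹¹ = 0.0005, so 99.95% of the matrix would be unknown.

```python
from collections import Counter

def density(triples, n_users, n_products):
    """Share of the n_users x n_products rating matrix that is known."""
    known = {(user, product) for user, product, _ in triples}
    return len(known) / (n_users * n_products)

def most_popular(triples, n=3, exclude=()):
    """Cold-start fallback (assumed definition of "popular"): products with
    the most positive ratings, excluding products the user already rated.
    Ties keep the order in which products were first seen."""
    likes = Counter(p for _, p, r in triples if r > 0 and p not in exclude)
    return [p for p, _ in likes.most_common(n)]

triples = [("u1", "p1", 1), ("u2", "p1", 1), ("u2", "p2", -1),
           ("u2", "p3", 1), ("u3", "p3", 1), ("u3", "p4", -1)]
print(density(triples, n_users=3, n_products=4))  # 0.5
print(most_popular(triples))                      # ['p1', 'p3'] for a brand-new user
print(most_popular(triples, exclude={"p1"}))      # ['p3'] for a user who already rated p1
```

This fallback covers only the "no ratings at all" case; it says nothing about a user with just a handful of ratings, or a new product that nobody has been shown yet.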

In the next installment we will talk about what to do about these and other problems, and about how to predict unknown ratings in general. Stay tuned!

Source: https://habr.com/ru/post/139022/

