10 Lessons from the Quora Recommender System

Hi, Habr! As Retail Rocket Analytics Director, I periodically attend various industry-specific events, and in September 2016 I was fortunate enough to attend the RecSys conference, which was a recommendation system, in Boston. There were a lot of interesting reports, but we decided to make a translation of one of them Lessons Learned from Building Real-Life Recommender Systems . It is very interesting from the perspective of how Machine Learning is used in production systems. There are a lot of articles written about ML itself: algorithms, application practice, Kaggle contests. But the output of algorithms in production is a separate and big job. I will tell you a secret, the development of the algorithm takes only 10% -20% of the time, and its withdrawal into battle is all 80-90%. There are a lot of restrictions here: what data to process where (online or offline), model training time, model application time on online servers, etc. A critical aspect is also the selection of offline / online metrics and their correlation. At the same conference, we gave a similar report on Hypothesis Testing: How to Eliminate Ideas as Soon as Possible , but chose the aforementioned training report from Quora, since it is less specific and can be used outside of recommender systems.

Information about Quora:

Quora is a social question-and-answer knowledge-sharing service founded in June 2009 by Adam d'Angelo and Charlie Cheever (one of the creators of the social networking site Facebook). Today, the service is visited by more than 450 million people monthly.
Quora creates a personalized tape of questions and answers, in accordance with the interests of the user. Questions and answers can be likes and dislikes, users can “follow” other users, thus creating a kind of social network around various knowledge.
')

Xavier Amatriain of Quora shares 10 important lessons about recommender systems that he has shaped over the years in the recommender industry.

Introduction

Quora's main mission is to share and develop knowledge around the world. Therefore, it is important for us to understand what knowledge really is. We use the question and answer mechanism to increase the amount of knowledge, and this distinguishes us from Wikipedia’s encyclopedic approach, so there are millions of answers and thousands of topics in our service.

If we go down a notch, we can single out three main things we care about: relevance, demand, and quality.

Relevance. This is the main task of any recommendation system, it is very important that each recommendation corresponds to the interests of a particular user.
Demand. How many people want to know the answer to this question? If only a few people are interested in any issue, then it may not be worth the effort on it.
Quality. But if we are talking about a large number of users who are interested in a particular question, then you should definitely take care of the quality of the answers in this question.

Most of the information on Quora is textual, but what is more interesting is not the work with texts, but what actions users can perform on the site.

This diagram shows the different types of relationships that exist between different types of content, users, their responses to content, etc.

Since Quora is a kind of social network, users can not only answer questions, but also “foul” each other, as well as put upvote and downvote questions and answers. Related topics (topics) are used to categorize questions and answers.

All of these interactions generate a tremendous amount of data that influences personal recommendations.

Among the recommendations that we use on the site, we can distinguish the main types:

Ranging ribbon items on the main page
Daily email newsletter
Ranking answers
Recommendations to those potentially interesting to the user.
Recommendations of users whose answers may be interesting.
Trends
Automatic detection of topics of questions
Questions related to the question being viewed

Different types of recommendations are formed by completely different algorithms based on different data types.

Since we solve problems that are very different from each other, we have to use many different machine learning algorithms:

Deep Neural Networks. Since we have a lot of text data, using deep neural networks we can get various signs from this text (feature)
Logistic regression
Elastic nets
Gradient boosted decision trees
Random forests
Lambdamart
Matrix Factorization
Lda
other

We are always looking for the best methods to solve each specific problem, and we do it not just to follow the fashion, but to get maximum efficiency.

Lesson 1. Implicit user feedback is almost always better than explicit.

Even while working at Netflix, my favorite post was that five-point ratings are useless, which is confirmed by Youtube. But why is this happening?

The main reason is that implicit data is much more “dense” (dense), allowing you to learn much more about the users themselves. The problem of explicit feedback is that we only receive it from those who have agreed to give it to us. Thus, an explicit response speaks more about demonstrative public behavior than about the real reaction of the user. Therefore, implicit feedback is more appropriate for the objective function. It also correlates better with AB test results.

But not in all cases the implicit response is better. This can be well illustrated with the help of an example, when a funny picture or a loud title attracts attention, and the user inadvertently fits the cursor or even clicks on a similar block. The activity indicators in this case will be quite high, you can get a lot of action, but this tactic is hardly correlated with long-term goals, so it is important to be able to combine implicit and explicit responses.

In Quora, we use both implicit responses, such as page visits, clicks on questions, answers and user profiles, as well as various explicit signals: likes, dislikes, sharing, etc. To prepare qualitative data for training a model, it is necessary to learn how to combine implicit and explicit responses.

Lesson 2. Be alert to the data on which your system is learning.

The second lesson relates to how you prepare your data, how it enters your recommendation system, and how the algorithm of the recommendation system will learn from it.

Even for the trivial binary classification problem, it is not always obvious which response is considered positive and which is negative.

In the case of Quora, you can find a lot of examples where it is very difficult to understand what is positive and what is negative. For example, what to do with funny, but meaningless answers that have many likes? Or what to say about the short but informative response of a famous expert? Should I consider it positive? Or how to regard a very long informative response that users do not read and who do not like? There may be many such examples.

If we talk about Netflix, then how to evaluate a movie that a user watched for five minutes and stopped? At first glance, this suggests that the film is not very interesting, but what if the user just paused it to watch tomorrow?

Thus, there are many hidden behavioral factors that you need to learn to understand, evaluate and take into account in the training of the recommendatory system.

Imagine that you have a more difficult task - the ranking, where you have to put different labels. In this case, a lot of time is spent on preparing data for training, but even more time can be spent on the separation of positive and negative examples. This can be a rather complicated and time consuming process, but it is critical to solve the problem.

Lesson 3. Your system can only learn what you teach it.

The third lesson is related to the training model. In addition to the data for learning the model, two more things are important: your objective function (for example, the probability of the user reading the answer) and the metric (for example, precision or recall). They form what your model will be trained on.
For example, we need to optimize the probability of watching a movie in the cinema and its appreciation by the user, using the history of previous views and ratings, it is worth trying the NDCG-metric as final, considering films with a rating of 4 or higher as a positive result. This is quite a long way, but it brings really good results.

After you have determined these three points, you can begin work, but note that there are many details and pitfalls. If you forget to identify at least one of these aspects, the risk is great that you will set the task incorrectly and will optimize something unrelated to your real goal.

What we do in Quora:

Data for learning models - implicit and explicit responses that are collected on the site
The objective function is the value of showing the question / answer to the user and the weighted sum of all user actions that they (questions / answers) will receive. It should correlate with the long-term goals of the company. Accordingly, you need to solve the regression problem, i.e. match user actions with target function
Metrics. Since we are talking about working with the tape of recommendations of questions and answers, and it is simply a list of elements, you need to select the appropriate ranking metric, such as the NDCG, which will make sense to solve this particular problem.

Lesson 4. Explanations may be as important as the recommendations themselves.

The fourth lesson is about the importance of explaining certain recommendations and their effect on perception. Explanations make it possible to understand why what you are recommended will be interesting to you. For example, Netflix recommends movies based on previous views. In the case of Quora, users are recommended topics, similar questions, people who can be “follow”, and explanations help to understand why it will be interesting.

Lesson 5. If you need to choose only one algorithm, rely on matrix factorization.

If you can use only one algorithm, then I would put on the matrix factorization. I often get questions about what to do if a startup has only one engineer and you need to build a recommendation system. My answer is matrix factorization. The main reason is that although it can be combined with other algorithms, it gives a good result by itself.

Factoring is, in a sense, learning without a teacher, because it simply reduces the dimension, as the PCA method does. But it can also be used for teaching tasks with a teacher. For example, you can reduce the dimension and then solve the regression problem with respect to your tagged data.

Matrix factorization fits different data types, and different implementations can be used for different data types. For implicit signals, you can use ALS or BPR, you can go to tensor factorization or even factorization machines.

We in Quora wrote our library, which was called QMF . This is a small library that implements the basic methods of matrix factorization. It is written in C ++ and can be easily used in production.

Lesson 6. Everything is an ensemble

The most interesting thing lies in the ensemble of algorithms. Suppose you started with factoring the matrices and hired a second engineer. He gets bored, and this is a great time to try a different method of training, and then combine them into an ensemble.

In the open competition of the Netflix Prize, the winning team used the ensemble as the last step of the model. The Bellcor team used GBDT (Gradient boosted decision trees) to create an ensemble, and when they teamed up with the BigChaos team, they used neural networks as the last layer of the ensemble, and thus approached some kind of deep learning.

If you have different models that do different transformations on different layers, you get unlimited possibilities of working with a neural network. It looks almost like a homemade deep neural network. At the same time, we are not talking about deep learning in the classical sense.

Thanks to ensembles, you can combine various algorithms using different models, such as logical regression, or non-linear Random Forest methods, or even neural networks.

Another interesting thing is that you transform any model in the ensemble into a feature. Thus, we erase the border between features and models. Interestingly, you can divide their production into different engineers, and then combine them - this leads to good scalability, as well as the possibility of use in other models.

Lesson 7. To build recommender systems it is also important to be able to build features properly.

To obtain optimal results, it is necessary to understand the data (domain knowledge), namely, what is behind them. If you do not understand your mission, but simply take some kind of matrix of features that interest you, some data, twist it, optimize the target function for the matrix, then with this you can optimize not what you really need.

In Quora, we need to rank the answers and show users the best of them. But how to choose the "best" to create the right ranking system? Evaluating various aspects, we deduced that a good answer should be truthful, informative (that is, give an explanation), useful for a long time, well formatted, etc.

It's hard enough to make a neural network understand what a good answer is. To do this, you must learn to interpret the data and build features that will be associated with those aspects that are important in order to optimize the model exactly for your purposes.

In our case, you need to have features that are related to the quality of the text, its formatting, user reaction, etc. Some of them are easy to make, others are quite difficult, some may need a separate ML model.

I would express the properties of a good feature as useful, reusable, transformable, interpretable and reliable.

Lesson 8. Why is it important to be able to answer questions

The value of the model is not in its accuracy and correctness, but in the fact that it introduces into the product and to what extent it satisfies the expectations of product managers. If your algorithm is at odds with their understanding, you will need to answer questions, why here the recommendations are exactly like that. It is very important to be able to debug the recommendation model, and even better the visual debugging system.

This allows you to check the model much faster than if it were a completely black box - in this case, finding the cause of bad recommendations is almost impossible.
For example, in any question / answer from the Quora tape, we can get all the features, analyze them and understand why the model recommended a particular question or answer to each user.

Lesson 9. The data and model are good, but the right approach to performance evaluation is even better.

When we received the data and the model, the next question arises - the correct method of evaluating the effectiveness.

This is a diagram that explains a certain ideal innovation process when introducing algorithms into recommender systems. The basic idea is that there are two processes.

The first is offline experiments, that is, the process of testing and training the model. If the model works better on the selected metric, we consider it good and we can run the second process - online AB testing. Is the AB test negative? The first process starts on a new one.

Thus, an iterative process arises that will make your work faster and better.

When evaluating test results, decisions should be based on data. This is often easier said than done, it should be an element of your company's culture.

You must select a common prime metric, such as for example user retention or other long-term metrics.

The main problem is that long-term metrics are difficult to measure, so sometimes you need to use more short-term metrics. The main thing is that they correlate with your main metric!

In offline testing, standard metrics are usually used to measure how well your model works. But it is critically important to find a way for these metrics to be correlated with online results.

Lesson 10. No need to use a distributed system for a recommender system.

The last advice would be quite contradictory - you do not need to use a distributed system to build a recommender system. Many follow a simple path, but based on my experience working with data in different companies, most of the tasks can be freely solved on a single machine.

Everything can be calculated on one machine instead of a cluster, if you follow certain requirements:

data sampling
offline computing
efficient parallel code

If you do not care about costs, delays, speed, debugging capabilities, etc., then you can continue to do this on a distributed cluster. You can simply ask for 200 servers, deploy your spark-cluster on them and build a recommendation system. Quora as a startup always thinks about cost reduction, because we have to pay for everything ourselves. We also constantly think about how complex things we create, and how easy it will be to debug.

If we talk about the matrix factorization algorithm, the first thing that is important to understand is that all factors are not necessarily needed - very often you can get good accuracy with the help of correct sampling, the result will be as if we were working with the full amount of data. Secondly, a lot of things can be done offline. For example, user factors you can count online, and factors of goods offline.

Full performance can be viewed at the link .

Retail rocket

If you want to apply these lessons, then you are welcome to us. We are looking for an engineer in our analytical team Retail Rocket . We are developing recommender algorithms on the Spark / Hadoop computing cluster (30+ servers, 2 Petabytes of data). The results of our team can be seen on hundreds of e-commerce sites in Russia and abroad. Our technological stack: Hadoop, Spark, Mongodb, Redis, Kafka. The main programming language is Scala. We already have processes for testing hypotheses (new algorithms) both offline and online (AB tests), and we are constantly looking for ways to improve our recommendations. To this end, we study the scientific articles of leading researchers and test them in practice. Job Link (ANALYTICS ENGINEER)

Source: https://habr.com/ru/post/341346/

All Articles