Personalization in e-commerce

Hi, Habr!

Today we begin a series of articles on how we build the Retail Rocket service. Over almost three years of work we have assembled a solid technology stack, become disillusioned with a large number of "fashionable" technologies, and built a very complex system.

In brief, Retail Rocket is a Big Data-based platform for multi-channel personalization of online stores. Our service analyzes the behavior of an online store's visitors, identifies their needs, and at the right moment shows them relevant offers on the site, in email, and in display campaigns. This increases the store's revenue through higher conversion, a higher average order value, and more frequent repeat purchases.
With this article we open the Retail Rocket engineering blog (we have been running a marketing blog for almost two years) with an overview of the approaches we use in data analysis and a short list of the technologies involved. We arrived at everything described here iteratively, and in the following articles we will try to describe our path in each of these areas in detail.

A few numbers that briefly describe our service:

More than 70 processing servers (mostly Hetzner).
About 100 million unique users (unique cookies) per month.
360,000 external requests per minute (on average).
35 man-years invested in the development.
10 engineers (developers, analysts, system administrators).

Approaches to data analysis

The essence of Retail Rocket's work is identifying the needs of a store's visitor using behavioral analysis and the store's product matrix. To generate personal recommendations, we needed from the start a mathematical foundation that would scale easily. Here is an almost complete list of the approaches we use today:

Content filtering.
Collaborative filtering.
Predictive models (predictive analytics) based on machine learning and Markov chains.
Bayesian statistics.
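
To make one of the approaches listed above more concrete, here is a minimal sketch of item-to-item collaborative filtering. It is not our production code: the function names `item_similarities` and `recommend` and the sample sessions are assumptions invented for illustration, and cosine similarity over session co-occurrence is just one common way to score item pairs.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(sessions):
    """Cosine similarity between items based on co-occurrence in sessions."""
    item_counts = defaultdict(int)  # sessions that contain the item
    pair_counts = defaultdict(int)  # sessions that contain both items
    for items in sessions:
        unique = sorted(set(items))
        for item in unique:
            item_counts[item] += 1
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), together in pair_counts.items():
        score = together / sqrt(item_counts[a] * item_counts[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims


def recommend(sims, item, top_n=3):
    """Return the items most similar to `item`, best first."""
    ranked = sorted(sims.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical viewing sessions, invented for this example
sessions = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["laptop", "mouse"],
    ["phone", "charger"],
    ["laptop", "mouse", "case"],
]
sims = item_similarities(sessions)
print(recommend(sims, "phone"))
```

On real traffic the same counting would have to be distributed across a cluster, which is exactly why the foundation needs to scale easily.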

Each of these topics deserves a series of articles or even a book :) I am sure that someday we will describe in detail how we implemented the subsystem that computes personal recommendations in real time, but for now we will briefly cover the technologies we use for this.

Analytical platform

For machine learning we use Spark running on the Hadoop YARN platform, a cluster computing system that suits our current tasks best.

We have now moved almost the entire data analysis system to Spark, using the functional programming language Scala. Before that, we wrote a lot in Pig, Hive, Python, and Java. From the Hadoop ecosystem we use Apache Flume for data delivery, the distributed machine learning library Mahout, and the Oozie task scheduler.

We chose Jenkins as the centralized solution for launching periodic recommendation calculation tasks (just under 100 at the time of writing). Although this is a rather unusual use of such a tool, after a year of operation we are pleased with it.

By the way, we have a repository on GitHub, where our team maintains several projects:

Engine for A/B tests in JavaScript.
Spark MultiTool library in Scala.
Scripts for deploying a Hadoop cluster using Puppet.
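
As an illustration of what an A/B testing engine has to do at its core, here is a minimal sketch of deterministic variant assignment. It does not reproduce the code in our repository: the function `assign_variant`, its parameters, and the hashing scheme are assumptions chosen for the example.

```python
import hashlib


def assign_variant(visitor_id, experiment, variants=("A", "B")):
    """Deterministically map a visitor to one variant of an experiment.

    Hashing the experiment name together with the visitor id keeps the
    assignment stable across visits and independent between experiments.
    """
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# Hypothetical visitor id and experiment name
print(assign_variant("cookie-123", "new-recommendation-block"))
```

Because the assignment depends only on the inputs, no per-visitor state needs to be stored to keep a visitor in the same segment.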

Frontend

Almost everything the user sees is served by Windows machines running the IIS web server; the code is written in C# on ASP.NET MVC.
All data is stored and distributed across three DBMSs: Redis, MongoDB, and PostgreSQL.

When we need distributed components to interact, for example when calculating a user segment by User-Agent for audience profiling, we use Thrift. And so that the various subsystems can receive the data stream from online stores needed for this, we use the Flume transport mentioned above.

Development process

In development, our team follows the practice of continuous delivery of new functionality to customers (today more than 500 stores are connected to us). To do this, we use the Git + GitLab + TeamCity chain, with unit tests, acceptance tests, and code review. This approach is the minimum production standard that allows us to maintain product quality and deploy to production with zero downtime.

What are we going to share

In this article we tried to give you a brief look into the Retail Rocket technology kitchen. We have a short list of topics we would like to cover in our blog and which, we think, will help the community solve engineering problems that cost us more than one day of our lives.

We will also be happy to hear from Habr readers which questions in the field of personalization interest them most. We will be sure to take your wishes into account in the following articles!

Source: https://habr.com/ru/post/246793/
