Non-obvious features of product sorting and the "dance of reality"

As usual, we are trying to solve a complex math problem with minimal resources and cost. The task is to sort the products in an online store in the way that is most convenient for the buyer.

The easiest way is to set the order manually. In physical stores this is exactly what is done on the shelves, and it is called the "display". In our stores the sales staff do it by following a planogram for each location (this is part of their training), and large grocery chains have dedicated merchandisers who make sure everything is in order. On the web, of course, we would like to do the same, but the method only works well up to about 50 items.
At the other end of the scale are big data methods, where everything known about you, from the browser build, the type of device (more precisely, its price) and the screen resolution to your full profile data and an evaluation of your actions on the site, feeds into an optimal result. The easiest way to use such data is to build your profile during your first 20-30 seconds on the site and compare it with the profiles of similar people. And then, for example, to offer you not the cheapest apartments and hotels but to start with the prices that will be acceptable to you. You probably know this kind of sorting, which the press for some reason serves up as "the most convenient for the client."

In my experience, the most convenient sorting for our customers is one they can understand and control.

If you look at how products are sorted in various online stores, you will see that there is always ordering by price, by availability, and by some voodoo-like parameter such as "by popularity", "the Ozon way" and so on. This magic is what we will talk about.

Where we started

We did it by hand. For up to 50 products this is the best way. In fact, with an assortment that size you could even do without sorting: the products fit on 2-3 screens, and even navigation is not always needed. You can dump everything in a heap and the user will be happy. So that is what we did.

However, when a buyer can no longer load all the products of a category (or of the whole catalog) into their own RAM, sorting becomes necessary. When we reached about a hundred games, we needed our first functional tool: a price ladder. Sorting by price is, in general, a very simple and straightforward thing. You have a budget and want to stay within it. Or you want to compare several similar products. Or you just want to see the "bottom" and the "top" of the category in order to understand what to go by when comparing products.

This sorting also lets you exploit one particular cognitive bias.

Many stores deliberately add expensive items to a category: this pushes the upper bound of the price range up, and the "golden mean" becomes something more expensive than before. Lacking information about current prices, a person buys something that is not the cheapest (because they want something good) but also not the most expensive. Purchases are spread within the range roughly according to a normal distribution. The best-known documented experiment goes like this (simplifying): a bar sold beer in sizes of 0.3, 0.5 and 0.7. People mostly took 0.5. The experimenters shifted the range to 0.5, 0.7 and 0.9. People began to take 0.7, although before that the usual serving had been 0.5.

Mechanically, it is implemented very simply: prices are compared. When prices are equal, the winner is the product that got into the comparison array earlier (most likely, the one that was added to the site earlier).
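The tie-breaking behaviour described above is what a stable sort gives you for free. Here is a minimal sketch in Python; the `Product` class, its fields and the sample catalog are invented for illustration and are not our actual data model.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str   # hypothetical field
    price: int  # hypothetical field, price in whole currency units


def sort_by_price(catalog, descending=False):
    # sorted() is stable: products with equal prices keep the relative
    # order they had in the input list, i.e. the order in which they
    # were added. This also holds when reverse=True.
    return sorted(catalog, key=lambda p: p.price, reverse=descending)


# The catalog is assumed to be stored in the order products were added.
catalog = [
    Product("A", 500),
    Product("B", 300),
    Product("C", 500),
    Product("D", 300),
]

cheapest_first = [p.name for p in sort_by_price(catalog)]
priciest_first = [p.name for p in sort_by_price(catalog, descending=True)]
```

With an unstable sort the order of equally priced products could change from one page load to the next, so stability matters here more than it seems.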

The first problem with sorting by price is that bestsellers end up buried quite deep. The second is that out-of-stock products may sit in the middle of the list. The third is that there is often a lot of junk at the top, that is, all sorts of consumables, packaging and so on, which are cheap but often have no direct relation to the product the person came for.

So this tool is needed, but as an additional one. The default sorting should be different.

We tried sorting by sales and at the same time began to collect product ratings on a scale from 1 to 5, in order to introduce sorting by rating.

When sorting by sales, it was not the bestsellers but the cheap products, which obviously sell more units than anything else, that started forcing their way to the top. Any product priced above average has a reduced chance of reaching the top of such a sort. So it had to be balanced somehow.

We thought it would be a good idea to build an ensemble of models and use the rating as a second weighting function in the overall score. And not only the rating, but more on that later.

It turned out that you cannot simply take the rating and plug it in, because a product with 4.5 "stars" from 100 reviews is clearly better than a product with 5 stars but only 1 review. The solution is obvious: change the weight of the rating in the overall scoring function depending on the number of reviews. We considered a variant where the weight varies nonlinearly from 1% to 50%; for example, 200 reviews give the rating a 40% weight, and 10 reviews give it 15%. But we never got the time to play with the coefficients, because sorting by ratings ran into a completely stupid problem.
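To make the idea concrete, here is a rough sketch of such a weight. The only numbers taken from the text are the 1%-50% range and the two reference points (10 reviews give 15%, 200 reviews give 40%). The logarithmic curve drawn through those two points, the function names and the way the two scores are blended are all assumptions made for the example; we never settled on real coefficients.

```python
import math

MIN_WEIGHT, MAX_WEIGHT = 0.01, 0.50

# Assumed shape: weight = intercept + slope * ln(review_count),
# fitted to pass through (10 reviews, 15%) and (200 reviews, 40%).
_SLOPE = (0.40 - 0.15) / (math.log(200) - math.log(10))
_INTERCEPT = 0.15 - _SLOPE * math.log(10)


def rating_weight(review_count):
    """Share of the overall score given to the rating, clamped to 1%..50%."""
    if review_count <= 0:
        return MIN_WEIGHT
    raw = _INTERCEPT + _SLOPE * math.log(review_count)
    return min(MAX_WEIGHT, max(MIN_WEIGHT, raw))


def combined_score(sales_score, rating, review_count):
    """Blend a sales score (assumed normalized to 0..1) with a 1..5 rating."""
    weight = rating_weight(review_count)
    rating_score = (rating - 1) / 4  # map 1..5 stars onto 0..1
    return (1 - weight) * sales_score + weight * rating_score
```

With equal sales, `combined_score(0.5, 4.5, 100)` comes out higher than `combined_score(0.5, 5.0, 1)`, which is exactly the "4.5 stars from 100 reviews beats 5 stars from 1 review" behaviour we wanted.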

The fact is that the ratings themselves also followed a normal distribution.

The damn Gaussian was shifted by just one notch from the bottom: after all, most of us remember the school grading system, where a grade of one effectively does not exist. That is, most products were rated between 2 and 5, with the peak somewhere around 3.5-4 points. All hits had 4.5, all non-hits had 4.

After some time, all the products became pretty much the same, and a separate "by rating" sort gave nothing useful.

Another problem was the old dilemma of on-site voting: you either have to authenticate the user by email or something similar (which is fairly hard to mass-produce), or skip that and filter out vote stuffing. Including the slow and gradual kind. I know this very well, because back when I was at university, thanks to some "security testing" of the university website, which was maintained by another group (we were mathematicians and systems programmers, they were information security), the university emblem got chosen along the way. By a vote. On a Saturday. On Monday morning, before the administrator arrived, the rector slammed his fist on the table and said, "Enough of this, look, 400 students voted over the weekend, we are taking this emblem." The catch was that the most common connection in the city was dial-up, and we browsed with graphics turned off. So we simply never saw what the vote was for. Two years later I told the head of the university's IT service a few more details than I have told you. After that I ran three blocks. That was when I understood, with my feet, how important it is to protect a vote against stuffing.

So. A product's rating could be rigged, because logging in just to rate something is the worst kind of hassle. Collecting ratings only from logged-in users (that is, after they order but, in most cases, before they receive the product) did not seem a very sound idea.

We turned off rating collection and the sorting based on it, and went back to thinking.

Second run

Comparing product popularity in physical sales and on the site, we came to the following conclusion: what matters is not so much the product's rating as the number of ratings it has. That is, a product with hundreds of ratings is clearly better than a product with a dozen, even though neither stands out much from the central region of the normal distribution.

We could have hung a weighting coefficient on this too, but it all resolved itself much more simply. Just as I was thinking about this problem (having turned off sorting by rating, because it was doing very little by then), Facebook rolled out its "Like" buttons across the Internet.

Three buttons, Google, VKontakte and Facebook, settled the question. We got a "by rating" sort where the rating was the sum of the like counts. And every vote in this system was reliably counted: thanks to the social networks and their authentication, we no longer had to think about that problem.

We again tried to cross sorting by rating with sorting by sales, but then the problem of product availability came up.

Availability

Those were hard years, and we sold however we could. Real-time availability on the site seemed like space technology to us, but we had to do it. The number of stores grew, and so did the number of mix-ups where a person came in but the game was not there, and the number of negative reviews along the lines of "I spent half an hour comparing three products, and two of them are not in stock."

Fairly quickly, but with a lot of blood, we synchronized each store's inventory database with the site in real time (well, almost, allowing for the cache).

Let me remind you that we had not been operating for very long at that point, so it came as a sudden realization: sorting by sales should be limited to a certain time window. Otherwise new games have no chance of breaking through, because the old ones sit at the top, having historically sold more.

We limited it to a month.

Next, we made an executive decision to exclude products that are currently out of stock from the sort. They are simply stacked at the bottom, below the main list, in order of "past merit", that is, their historical position in that same list. This was done so that a person would not compare products when one of them cannot be bought right now.
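Roughly, the two rules above (a one-month sales window, and out-of-stock products pushed below the list) can be sketched like this. The data layout, the function names and the sample figures are invented for the example, and "past merit" is interpreted here as the same windowed sales figure; the real implementation may have ranked the out-of-stock tail differently.

```python
from datetime import date, timedelta


def units_sold(sales, product_id, today, window_days=30):
    """Units sold within the last `window_days`; `sales` is a list of
    hypothetical (product_id, day, quantity) records."""
    since = today - timedelta(days=window_days)
    return sum(qty for pid, day, qty in sales if pid == product_id and day > since)


def sort_by_recent_sales(product_ids, sales, in_stock, today, window_days=30):
    def key(pid):
        # False sorts before True, so in-stock products come first;
        # within each group, more recent sales means a higher position.
        return (pid not in in_stock, -units_sold(sales, pid, today, window_days))

    return sorted(product_ids, key=key)


today = date(2015, 11, 1)
sales = [
    ("old_hit", date(2015, 6, 1), 500),   # outside the window, ignored
    ("old_hit", date(2015, 10, 20), 5),
    ("new_game", date(2015, 10, 25), 20),
    ("sold_out", date(2015, 10, 28), 50),
    ("gone", date(2015, 10, 28), 10),
]
in_stock = {"old_hit", "new_game"}

ranking = sort_by_recent_sales(
    ["old_hit", "new_game", "sold_out", "gone"], sales, in_stock, today
)
```

Note how `old_hit` loses to `new_game` despite its 500 historical sales, and how `sold_out` ends up below both even though it sold the most during the month.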

Then a problem came up. We had only nine locations in Moscow at the time, and a product did not necessarily run out everywhere at once. Plus there was the online store. How do you show a product on the site when the online store is out of it but the retail locations still have it?

The method suggested by one of the home appliance retail chains seemed right: on entering the site, before any browsing, the user either chooses a store (the one closest to home) or decides to buy with delivery. Then all the products "dance" according to that store's data. If an item is not there, it is sorted down and shown as available only in the online store's warehouse.

Observing our customers' behavior showed that the idea is so-so. For us specifically. Why? It is simple: our stores are on the metro's ring line and at interchange hubs, usually 1-2 minutes from the metro (or even inside the metro building, as at Kurskaya, if you are lucky). This means the buyer has no "favorite" store. They drop in at whichever one is most convenient on the way between home, the office, the bar and the mistress.

However, sorting by sales for the last month solved about 80% of the problem of a product being available only in certain stores: when a game is not selling across the whole chain, its position starts to fall sharply.

Current model

The next hypothesis was: what if we solved the problem of piles of small items selling in bulk by equating one "large" game to a couple of "medium" ones or a bunch of "small" ones? Then all the small stuff would stop climbing to the top of the sort.

The solution turned out to be very simple: instead of sorting by the number of units sold per month, we began sorting by turnover per month.
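The difference between the two metrics is easy to see on a toy example. The product names, prices and quantities below are made up, and the sales records are assumed to be already limited to the chosen time window.

```python
from collections import defaultdict


def units(unit_price, quantity):
    return quantity


def turnover(unit_price, quantity):
    return unit_price * quantity


def rank(sales, metric):
    """Rank product ids by the chosen metric, best first.
    `sales` is a list of hypothetical (product_id, unit_price, quantity)
    records for the current window."""
    totals = defaultdict(float)
    for product_id, unit_price, quantity in sales:
        totals[product_id] += metric(unit_price, quantity)
    return sorted(totals, key=totals.get, reverse=True)


sales = [
    ("card_sleeves", 100, 40),  # cheap consumable, sells in piles
    ("big_game", 3000, 5),
    ("medium_game", 1000, 8),
]

by_units = rank(sales, units)
by_turnover = rank(sales, turnover)
```

By units the card sleeves lead the list; by turnover they drop to the bottom, and one "large" game outweighs several "medium" ones without any hand-tuned coefficients.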

Then we also experimented with the time window and shortened it so that current trends were reflected in the sorting directly.

This type of sorting became the basis for deciding what to promote in the real world. That is, the products with the best turnover get the advertising. This already comes from the principle of physical shelf display: the best-turning product goes at chest height, so that it is closer to the hands and eyes.

The shorter the window, the more current the game at the top is. A longer window means a greater influence of past sales. So we raised the level of competition between products. Previously the "old-timers" sat at the top and could not be squeezed out. Now every new game that sold well got a solid chance. It just had to gain a foothold at the top, and from then on it sold well not only on its own merits but also thanks to its top position. But, of course, it could stay there only on current merit: if interest in the item faded, it went back to the "basement". This balanced seasonal products perfectly: Gorodki and Petanque climbed up vigorously in the summer.

The second sorting method is by the number of likes.

Current Model Issues

First, our default sorting does not take into account likes or the number of comments on a game. These certainly reflect its popularity, but they have been moved into a separate tool. Why? Simply because we have reached the stage where "voting with the wallet" gives more realistic data. In theory, of course, the speed and acceleration with which likes accumulate could be included as weights, but that is a matter for the future.

Second, we do not take into account the "temperature" of a game. That is, new releases are treated on the same basis as everything else. We do not have simultaneous launches because of geography (the farthest location is Yuzhno-Sakhalinsk, goods get there by plane; for the farthest locations on the mainland, delivery takes 3-4 weeks). So a game cannot get its initial acceleration everywhere at once, and the "shiny new thing" effect gets smeared across cities. It would probably be right to rank new releases higher for the first 3-4 weeks on account of novelty.
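We have not built this, so the following is only one way the novelty bonus could look. The linear decay, the 1.5x starting multiplier, the 28-day period and the function names are all assumptions; only the "first 3-4 weeks" comes from the text. The launch day would have to be tracked per city, since the game arrives at different times.

```python
from datetime import date


def novelty_boost(launch_day, today, boost_days=28, max_boost=1.5):
    """Multiplier that starts at `max_boost` on the local launch day and
    decays linearly to 1.0 over `boost_days`."""
    age = (today - launch_day).days
    if age < 0 or age >= boost_days:
        return 1.0  # not launched here yet, or no longer new
    return max_boost - (max_boost - 1.0) * age / boost_days


def boosted_score(turnover, launch_day, today):
    # Turnover over the sorting window, scaled by the novelty multiplier.
    return turnover * novelty_boost(launch_day, today)
```

A game launched today with a turnover of 10,000 would be ranked as if it had 15,000; after two weeks the bonus halves, and after four it is gone.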

Third, we have quite a lot of games that both children and adults can play. And almost every one makes a good gift. This automatically means that 2-3 categories will have similar bestseller lists, that is, the "top" of the sort will be the same. In that case it would be logical to compute the sorting separately for each category, but that is still hard to do: you need to know for whom each item is bought (this is already serious data mining, and we are slowly working on it; I will tell you about it when the occasion arises).

Fourth, yes, you can use product recommendation services that generate the sorting from the user's actions on the site. You probably know them. The general principle is analysis of input data (browser, screen, geography, etc.) and of behavior on the site (went to "for children", looked at these two products), then profiling, and then matching: what people with the same profile as you have bought is what should be shown to you. The problem is that, first, control is lost with this kind of sorting, and, second, we tested it anyway. And we found no significant difference between the "dance of reality" and our basic methods. I note that one possible reason is the fierce desire of such recommendation services to sell more at any cost, even against the customer's wishes, which affected the "aggressiveness" of the suggestions and algorithms. We turned it off.

As a result, it became clear that the right idea was to "feature" a game, that is, to add an "editor's choice" to the usual sorting. We tried this as part of testing for the gender holidays (February 14 and 23, March 8) this year. We will try again soon, but differently. Most likely it will work.

And yes, I know, the first product is Nefarius.


This is how it will look in the new interface.

That is, once again, we have come full circle: from fully manual sorting to smart automation, and then back to manual.

And yes, the problem is not completely solved; there is still much to do. In retrospect, everything described above seems logical and simple, but believe me, when we started it was not like that at all. I hope you found the rakes we stepped on along the way interesting.

Source: https://habr.com/ru/post/270129/
