
The greedy algorithm in A/B testing

Canadian developer Steve Hanov describes a simple way to improve A/B testing.

The idea is to use a greedy algorithm (the epsilon-greedy method) to solve the multi-armed bandit problem. In other words, instead of fully randomizing which A/B variant each visitor sees, Steve proposes showing, 90% of the time, the variant with the best statistics accumulated so far.
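The core decision rule fits in a couple of lines of Python. This sketch shows only the 90/10 split; the constant and function name are mine, not from the article:

```python
import random

EPSILON = 0.1  # fraction of visitors used for exploration

def explore_or_exploit():
    """Decide whether this visitor sees a random variant or the current best."""
    if random.random() < EPSILON:
        return "explore"   # ~10% of the time: pick a variant at random
    return "exploit"       # ~90% of the time: pick the best variant so far
```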

The method is ingeniously simple.

For example, suppose we are testing three button variants for a site. To start, each is assigned a score of 1 success out of 1 trial:

Orange       Green        White
1/1 = 100%   1/1 = 100%   1/1 = 100%
A visitor arrives and the system must choose which button to show. Since all the rates are tied, it can simply show the first one in the list. We show orange; the visitor does not click.

Orange       Green        White
1/2 = 50%    1/1 = 100%   1/1 = 100%
The next visitor arrives. Orange is no longer shown, since the choice is now between green and white. We show green; no click. The visitor after that is shown white, which now has the best rate; again no click. After a few such cycles, the table looks something like this:

Orange       Green        White
1/4 = 25%    1/4 = 25%    1/4 = 25%
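The exploitation step in this walk-through — pick the button with the highest observed click rate, with ties going to the first in the list — can be sketched as follows. The data structure is an illustrative assumption, not from the article:

```python
# Per-button statistics after the cycles above: clicks over times shown.
stats = {
    "orange": {"clicks": 1, "shows": 4},
    "green":  {"clicks": 1, "shows": 4},
    "white":  {"clicks": 1, "shows": 4},
}

def best_button(stats):
    """Return the button with the highest clicks/shows rate.

    max() keeps the first maximum it encounters, so ties go to the
    earliest button in insertion order, as in the walk-through.
    """
    return max(stats, key=lambda b: stats[b]["clicks"] / stats[b]["shows"])
```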
Then a visitor unexpectedly clicks the orange button, and the table is immediately updated via $.ajax(url:"/reward?testname=buy-button");

Orange       Green        White
2/5 = 40%    1/4 = 25%    1/4 = 25%
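On the server side, the /reward endpoint only needs to increment the reward total for whichever variant the visitor was shown. A minimal sketch of that bookkeeping, with a plain in-memory dict standing in for real storage (Hanov's snippet suggests redis, with the visitor's variant kept in a session key):

```python
# Stand-in for the storage behind /reward?testname=buy-button.
tests = {"buy-button": {"orange": {"clicks": 1, "shows": 5},
                        "green":  {"clicks": 1, "shows": 4},
                        "white":  {"clicks": 1, "shows": 4}}}

def reward(testname, choice, amount=1):
    """Record a click (reward) for the variant the visitor was shown."""
    tests[testname][choice]["clicks"] += amount
```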
Now the system will always show the orange button. A developer might object: this is obviously the worst variant, the smallest button, and because of one random click the greedy algorithm will now show it all the time?
But if it really is a bad variant, the situation corrects itself very quickly.

Orange       Green        White
2/9 = 22%    1/4 = 25%    1/4 = 25%
After many cycles, the best option, if one exists, will be found, and it will be shown 90% of the time.
Orange            Green             White
114/4071 = 2.8%   205/6385 = 3.2%   59/2264 = 2.6%
In practice, Steve notes, such a system can be implemented "in just 20 lines of code."

    import random

    def choose():
        if random.random() < 0.1:
            # Exploration!
            # Choose a random lever 10% of the time.
        else:
            # Exploitation!
            # For each lever, calculate the expectation of reward:
            # the total reward given by that lever divided by the
            # number of trials of the lever.
            # Choose the lever with the greatest expectation of reward.
        # Increment the number of times the chosen lever has been played.
        # Store test data in redis, choice in session key, etc.

    def reward(choice, amount):
        # Add the reward to the total for the given lever.
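Filling in that pseudocode gives a small runnable version. In-memory dicts stand in for the redis/session storage, and the class and parameter names are mine, not Hanov's:

```python
import random

class EpsilonGreedy:
    def __init__(self, levers, epsilon=0.1):
        # Start each lever at 1 reward from 1 trial, as in the tables above.
        self.epsilon = epsilon
        self.trials = {lever: 1 for lever in levers}
        self.rewards = {lever: 1 for lever in levers}

    def choose(self):
        if random.random() < self.epsilon:
            # Exploration: a random lever 10% of the time.
            lever = random.choice(list(self.trials))
        else:
            # Exploitation: the lever with the greatest expected reward,
            # i.e. total reward divided by number of trials.
            lever = max(self.trials,
                        key=lambda l: self.rewards[l] / self.trials[l])
        self.trials[lever] += 1
        return lever

    def reward(self, lever, amount=1):
        # Add the reward to the total for the given lever.
        self.rewards[lever] += amount
```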

Steve lists several further advantages of this approach in the original post.

Source: https://habr.com/ru/post/144977/
