
A/B tests are not needed*

* for websites and mobile applications

You're all faking it. A brazen myth of the IT industry is the claim that A/B testing is a universal and useful optimization tool.


A little about the author and why this text exists ;)
Hi, my name is Denis Chudinov. These days I run my own mobile app design and development studio, and I want to tackle the delusions and stereotypes that circulate among clients.
My path into IT began as a UX specialist (yes, really), that is, with design decisions and analytics. This post is, in a sense, the product of working in that field.


From my own experience and that of my colleagues, I can say that almost no company has proper A/B testing running as an established process. People claim to have seen it somewhere, but in practice honest A/B testing is nowhere to be found.
Why is that? Let's figure it out.

Once, on a project with daily traffic of about 800,000 unique users, we set out to implement A/B testing.

This is what we encountered:

1. It is hard to keep the experiment "clean"


Let's set A/B testing aside for now and analyze a "simpler" example: you added another ad banner to your site and measured the metrics.

People click on it; money starts trickling in.

But what happened to the other banners and their conversion rates? If you are unlucky, total revenue most likely stayed flat or even fell.

Now imagine you are lucky and revenue has grown. Was it really the banner? Maybe the traffic changed? Did seasonality kick in, or a one-off viral effect on social networks? While you are testing, the product keeps living and evolving, and it is very hard to find a "clean" month in which marketing isn't influencing and spoiling the experiment.

You need a solid understanding of the possible external causes, and that is almost always reading tea leaves. Of course, you can manically track every product metric... so that the tea-leaf reading becomes slightly more scientific.

How do you act in such a situation? Follow a simple algorithm (sketched in code right after this list):
  1. Come up with a hypothesis.
  2. Implement the change.
  3. After a month (or another period: a day, a quarter), measure the key metrics.
  4. Got better? Keep it.
  5. Got worse? Roll it back.
  6. Go to step 1.
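A minimal sketch of that loop in Python; every name here (measure_kpi, deploy_change, rollback_change) is a hypothetical placeholder for your own analytics export and release process:

```python
def run_iteration(deploy_change, rollback_change, measure_kpi):
    """One pass of the naive change-measure-keep/rollback loop."""
    baseline = measure_kpi()    # metric for the period before the change
    deploy_change()             # step 2: implement the change
    candidate = measure_kpi()   # step 3: measure over the next period
    if candidate > baseline:    # step 4: got better? keep it
        return "kept"
    rollback_change()           # step 5: got worse? roll it back
    return "rolled back"        # step 6: on to the next hypothesis
```

The catch, as the rest of this section argues, is that the bare candidate > baseline comparison ignores the natural noise in the metric.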

Spotting an improvement or a deterioration is easy. Explaining the cause of the change and scaling it up, now that is a thankless task.

2. You need a great analyst. Or great analytics


On our project, in addition to Google Analytics and Yandex.Metrica, we also used home-grown analytics and exported raw data to Excel for manual counting. As far as I know, large e-commerce projects live in roughly the same way (at least they used to). They measure everything in several systems because the systems count differently and produce different errors: on the same site, visit numbers in Yandex.Metrica and Google Analytics can differ substantially. And if only that were the main problem! Analytics systems are of little help when you need to look at commercial and product metrics simultaneously.

It may turn out that with the new banner, the month's revenue grew, but recurrence (that is, retention) started to fall. The core of the audience became more "annoyed." Which means that in a few months you will lose traffic and, coming full circle, earn less again.

What am I getting at? That it is practically impossible to account for all the causes and correctly measure the result of a change. The methodologically (mathematically) correct approach is to estimate the natural error of your metrics, and only if the experiment produces a change larger than that error should you even consider keeping the new solution.

Natural fluctuations in metrics can reach 10-20%, so if you put up a banner and got a 5% change in profit, it means nothing. Nothing at all.
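In code, the "is the change bigger than the noise?" rule might look like this; a sketch with invented monthly numbers:

```python
import statistics

# Month-over-month values of the metric before the change (invented numbers).
history = [950_000, 1_080_000, 1_010_000, 890_000, 1_050_000, 970_000]

baseline = statistics.mean(history)
noise = statistics.stdev(history)   # natural fluctuation of the metric

observed = 1_060_000                # the month after the new banner went up

# Treat the change as real only if it exceeds the natural noise band,
# here taken as two standard deviations around the historical mean.
if abs(observed - baseline) > 2 * noise:
    print("Change exceeds natural fluctuation; worth a closer look")
else:
    print("Within normal noise; means nothing. Nothing at all.")
```

With this history the noise band is roughly 14% of the mean, so the ~7% "growth" above lands squarely inside it.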

Repainted the button pink? Conversion grew by 9%?
Haha;)

3. A/B testing is very expensive


So how about showing different design variants to different people simultaneously, but from the same traffic source? Then we no longer depend on traffic variability, and we also rule out seasonality and marketing.

Great idea.

If the product has a history, high load, caching configured, separate servers for content, and many other joys, it was almost certainly not built with testing in mind. That is, architecturally the project is not ready for the test. Which means that if you come to a backend programmer and say:

- Kolya, let's show a different layout of the registration page to 8% of the audience, and they still have to be able to register there. Yes, the fields are different. Yes, the page also has to stay personalized when the user returns. Did I mention that statistics collection needs modifying too? Uh, why are you boiling over?!
Your first A/B test will be full of technical surprises and fun, especially if something breaks and you "mix" the audience. Of course, in ideal projects this never happens, but in reality it happens constantly.
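For reference, the standard way to avoid "mixing" the audience is deterministic, sticky assignment: hash a stable user ID so the same visitor always lands in the same variant. A minimal sketch (function and variant names are made up; the 8% echoes the example above):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, rollout_percent: int = 8) -> str:
    """Deterministically bucket a user: the same ID always gets the same variant."""
    # Salt with the experiment name so different experiments split independently.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "new_registration_page" if bucket < rollout_percent else "control"

# A returning user keeps seeing the same layout across visits:
assert assign_variant("user-42", "reg-page-v2") == assign_variant("user-42", "reg-page-v2")
```

Knowing the pattern does not make it cheap: statistics collection, caching, and personalization still all have to respect the split.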

Once you deal with all that and actually test something, you will find that small changes produce small results. If you round the corners of a button and change its color from blue to green, most users simply will not notice. If you want a tangible result, make "major" changes. The registration form had 12 input fields and now has 4? That is significant.

Which raises the main question: if you can get by with 4 fields instead of 12... why haven't you done it already? To reach the right conclusion here, do you really need an A/B test, or just the opinion of a competent UX specialist?

And even if you still decide to run an A/B test... be prepared to spend at least half of the page's original cost on preparing the second, test variant.

What do you think? Doesn't the payoff look very doubtful compared to the costs?

4. Other actions bring more benefit


The final nail in the coffin of A/B tests is the curious fact that it is easier for you to switch ad networks, run a contest on a social network, buy traffic somewhere else, optimize a campaign in Yandex.Direct, finish up a new feature, or fix bugs: in general, do something useful without redesigning the interface. Per unit of time and money spent, those actions will pay off with greater efficiency than A/B tests.

Why are A/B tests so popular?


I think it is because large companies use them and inadvertently popularize them. For those companies they really are necessary: they have already tried everything on their products and are now forced to squeeze water from a stone in search of crumbs of improvement. They have the resources, the money, and the desire.

For example, Yandex.Music uses eye tracking (a whole toolset for studying where a person's eyes look while they use a mobile application). Yes, a useful thing when you have the budget. Shall we now recommend it to everyone?

An A/B test is easy to sell to incompetent people. You can do something and then write in the report that "audience return from the Kamchatka region grew by 8%." How does that affect profit? The question is rarely raised. After all, analysts and designers want to earn their bread too ;)

The main conclusion about buttons and interfaces


Make it neat, convenient, and tasteful. That will be enough. If your product is mediocre and the call center is rude to customers, no interface will fix the situation.

Did you make a decent design, think through the usage scenarios, sweat over the copy, draw nice graphics? Great, you have already achieved 96% of the possible efficiency!

Squeezing out the remaining 4% through interface improvements is a utopia. Do not live in a utopia.

So, are A/B tests a dead end?


Of course not! The methodology itself is excellent when you work under more controlled conditions, for example when testing contextual advertising or email newsletters. Copy, in principle, is easy to test, unlike design. Landing pages also lend themselves to experiments, but be careful interpreting the results ;)
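For those controlled cases the statistics are simple enough to do by hand. A sketch of a two-proportion z-test for, say, two email subject lines (all numbers invented):

```python
from math import sqrt
from statistics import NormalDist

# Opens out of sends for subject line A vs subject line B (invented numbers).
opens_a, sends_a = 412, 5_000
opens_b, sends_b = 468, 5_000

p_a, p_b = opens_a / sends_a, opens_b / sends_b
p_pooled = (opens_a + opens_b) / (sends_a + sends_b)
se = sqrt(p_pooled * (1 - p_pooled) * (1 / sends_a + 1 / sends_b))

z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided

print(f"open rate A = {p_a:.1%}, B = {p_b:.1%}, z = {z:.2f}, p = {p_value:.3f}")
```

Here p comes out just under 0.05, which is exactly the kind of borderline result that gets over-interpreted ;)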

Source: https://habr.com/ru/post/321358/

