How not to lose money in a black box: billing testing methods

Testing paid services is one of the key engineering challenges in testing Badoo. Our application is integrated with 70 payment providers in 250 countries, and a bug in even one of them can lead to unpredictable consequences.

In this article I will talk about the testing methods we use at Badoo and the limits of their applicability - the stages of testing at which they are most effective.

This article will be useful for testers, developers and product managers whose projects are already integrated with payment providers or are just beginning the integration process. If you face the problem of choosing methods for testing such integrations in your work, read on!

My name is Vladimir Solodov, and I am a Billing QA Engineer at Badoo: I test integrations and payment processing. My colleague Viktor Koronevich helped me prepare this text: together we gave a talk at the Heisenbug conference (video). In this article we have expanded the scope to all the integrations with payment providers used at Badoo, and we have additionally classified and described the practices for eliminating external dependencies.

Using business cases as examples, I will explain why testing paid services deserves extra attention and how not to make problems worse if they do arise. Then we will move on to the technical problems of testing integrations and how to solve them.

We are already preparing the second part of the article, in which we will say more about automating the testing of paid services in iOS applications.

Go!

Specifics of billing testing

Usually the goal of a business is to generate income. At Badoo, a social network for dating, income comes from credits and premium subscriptions. Credits are Badoo's internal currency. With them you can, for example, move your profile to the top of the search results, send a gift to another user and so on. A premium subscription is valid for a certain period of time and gives you several options at once: turning on stealth mode, seeing people who are interested in you, confirming the authenticity of your account, and much more.

In order for these paid services to work, we use integrations with more than 70 payment providers. The choice of provider depends on the platform, country, device, mobile operator and other factors. Therefore, the issue of testing paid services is very serious.

To begin, consider why testing paid services should be approached with special attention. There are two reasons.

1. Billing bugs are critical for business.

The first problem is reputational. A user who has paid for a service becomes more sensitive to (and less tolerant of) bugs in the application. Any public feedback, be it a review of the application on a blog or a comment on the App Store or Google Play, will be more emotional if it comes from a user who has run into a bug in a paid service - and this leads to reputational losses.

The second problem is that as soon as you start taking money from users for a service, you become subject to consumer protection law. And reputational losses can easily turn into financial ones.

Companies lose money in three ways.

The first way is refunds. Suppose a user finds that the service you sold him does not quite match his expectations. He contacts your support service. Its employees investigate and find out that the user's expectations really were not met because of a bug in the application. You initiate a refund. The company is left with lost profit, and this is the most harmless way to lose money.

The second way is chargebacks. Suppose the same situation occurred, only the user contacted not your support service but the bank that issued his card, or the payment provider. The bank or provider initiates the return of the money. In this case we are dealing with a chargeback. The danger for the business here is not only lost profit. After a certain number of chargebacks, the company is fined and its risk rating goes down. The downgrade, in turn, makes the services of payment providers more expensive.

The third way is lawsuits (claims). In the most neglected cases there may be lawsuits (including class actions) with the most serious consequences. For example, in 2015, after a lawsuit from the Ofgem regulator, British Gas was forced to pay multimillion-dollar compensation to users who had been charged a higher fee due to an error in the billing system. Read more about this here.

2. For testing integrations, knowledge and expertise are needed.

Teams that are just starting to integrate with payment providers often face this problem. Not knowing all the possible billing cases, they miss important nuances when implementing the system's reaction to notifications from payment providers.

This can lead to unpredictable consequences - from lost profits to disgruntled users.

Let's turn to the diagram that lists the types of paid services and look at the problem more closely.

Figure 1. Possible billing cases

There are three main cases: an error, a successful payment and a refund to the user. But each case has subcases, and your system must handle each of them differently.

Errors can be critical and non-critical. An example of a non-critical error is a notification from the payment provider that there are not enough funds in the user's account; an example of a critical one is the blocking of the user's payment method. In the first case you can try to take the payment later; in the second it is worth finding out why the method is blocked. Perhaps the user has been caught committing fraud, and you should be more careful with his transactions.
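The difference in handling can be shown with a minimal sketch. The error codes and action names below are invented for illustration; each real provider has its own codes. The one design choice worth noting is that an unknown code is treated as critical, so that it is looked at by a person rather than silently retried.

```python
from enum import Enum


class ErrorSeverity(Enum):
    NON_CRITICAL = "non_critical"
    CRITICAL = "critical"


# Hypothetical provider error codes, invented for this example.
SEVERITY_BY_CODE = {
    "insufficient_funds": ErrorSeverity.NON_CRITICAL,
    "payment_method_blocked": ErrorSeverity.CRITICAL,
}


def next_action(error_code: str) -> str:
    """Decide how the system reacts to a provider error code."""
    # Unknown codes are treated as critical so that nobody retries them blindly.
    severity = SEVERITY_BY_CODE.get(error_code, ErrorSeverity.CRITICAL)
    if severity is ErrorSeverity.NON_CRITICAL:
        return "retry_later"
    return "investigate"
```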

Refunds. You already know that there are two types of returns: refunds and chargebacks. Your system must respond to them differently. For example, after a chargeback it makes sense to think about blocking some functions of your application for the user, because chargebacks are one of the most popular methods of fraud.

A successful payment can be a one-time payment or a subscription.

One-time payments can be consumable and non-consumable. We saw an example of a consumable payment at the very beginning of the article - credits in Badoo. An example of a non-consumable payment can be taken from games. Suppose you have a character that you play. You want to buy superpowers for him that are valid for some time. This purchase belongs to the class of non-consumable payments.

Subscriptions. Here the variety of cases is the greatest. In addition to the initial purchase of a subscription, you may have:

subscription renewal (renew);
subscription cancellation (cancel);
trial subscription (trial);
grace period: when we are unable to renew a subscription and keep trying to charge the payment for a period of time called the grace period. For the user, the grace period looks like this. Suppose you bought a monthly subscription to a newspaper. At the end of the first month, the company that sends you newspapers tries to charge you for the next month but cannot (the card is blocked, there are not enough funds in the account, etc.). If the grace period is ten days, then during this time the company keeps trying to charge the payment, and the subscription remains valid. If the company fails to charge the money, the subscription is canceled. If it succeeds, the subscription is renewed from the date of the last payment;
partial billing: for example, PayPal allows a partial payment if there are not enough funds in the user's account (partial pay), or a payment split into parts (partial invoice).
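The grace period logic from the newspaper example can be sketched as a small function. This is an illustration only: the status names and the ten-day default are assumptions taken from the example, not the rules of any specific provider.

```python
from datetime import date, timedelta


def status_without_renewal(paid_until: date, today: date,
                           grace: timedelta = timedelta(days=10)) -> str:
    """Status of a subscription whose renewal charge has not succeeded yet."""
    if today <= paid_until:
        return "active"
    if today <= paid_until + grace:
        # The subscription is still valid for the user while the charge is retried.
        return "grace_period"
    return "cancelled"
```

If a charge succeeds while the function still returns "grace_period", the subscription is renewed; once it returns "cancelled", the user has to buy the subscription again.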

You also need to take into account a characteristic that depends entirely on the payment provider: a subscription can be managed by your application or managed externally.

An internally managed subscription is, for example, a credit card or PayPal subscription, when after the first payment you receive a token that you use to charge the user again through the provider without holding any of the user's payment details.
An externally managed subscription is one where the payment aggregator takes over the management of your subscriptions and simply sends you notifications about their current states.

In the figure, the most obvious cases, the reaction to which is usually implemented first, are highlighted in purple. All the others begin to be taken into account much later, as expertise accumulates. This is largely due to the incorrect application of iterative development methodologies in the field of billing.

Figure 2. Billing cases that are usually implemented first

Such a phased implementation can lead to unpredictable consequences. For example, in one of the projects I worked on before Badoo, the possibility of a refund was not taken into account. As a result, all returns went not through refunds but through chargebacks, which hurt the company's risk rating and broke the collection of income statistics. Ignorance of the variety of billing cases can lead to lost profits or leave the company vulnerable to users who feel cheated.

So, on the one hand, bugs in payment processing must be found before the release, because they can lead to the most negative consequences. If that was not possible, it is important to realize as quickly as possible that the bug has reached the release version of the application, fix it and - most importantly, though many people forget this - reassure the users who ran into it.

On the other hand, the situation is complicated by the fact that integration with payment providers is always interaction with the “black box”, which adds many variables to the testing process.

Technical problems in the process of testing billing

Let's consider them using Badoo's integration with the App Store as an example.

App Store subscriptions belong to the class of externally managed ones, that is, they are fully managed on the provider's side, and our system can only request the current status or receive a notification about its change.

We specifically chose this integration because it is the most complex and contains all the diversity of cases that can be found in the process of integrating the service with other payment providers.

To begin, let's turn to a one-time consumable purchase.

Figure 3. The process of making a one-time consumable payment

In step 1, the user makes a request to purchase the service. The application decides that a payment must be made, and in step 2 control is transferred to the payment provider (App Store). Step 3: the user is shown a form for making the payment. Step 4: the user provides the payment data. Step 5: the provider performs the transaction and reports the result to the application, returning a receipt that contains full information about the purchase (date, service, status, etc.). Step 6: the receipt, supplemented with user data, is sent to the server for processing. The server processes the receipt data and, in step 7, generates a push notification for the application. In step 8, the notification is shown to the user.

The problem is that steps 3, 4 and 5 are performed on the payment provider's side, are practically outside our control and may vary. So the process is not actually linear, as shown in Figure 3, but branching (see Figure 4), and the application must handle each branch differently.

Figure 4. One-time payment status branch

Buying a subscription starts just like a one-time payment, but the rest of the process is quite difficult to control.

Figure 5. Externally managed subscription states

Recall that the Apple subscription we are using as an example is managed externally. This means that after the purchase the user can manage it asynchronously: close it, change the expiration date, request a refund. We see this at step 9. Since the action takes place outside our system, I marked it with a dotted line in the figure.

In step 10, the App Store can change the subscription status: renew it, close it or move it into the grace period.

So that we can find out what state the subscription is in, there is step 11, which is specific to aggregators such as the App Store and Google Wallet. At this step, our system sends a token - the receipt received at the very beginning, when the subscription was purchased, or after its previous renewal.

Step 12 is the provider's response. We receive a receipt with the current state of the subscription. The result of this step depends on the asynchronous steps 9 and 10.

In the fall of 2018, Apple made its server-to-server notification mechanism available to everyone; it notifies you about changes to a subscription. Receiving such a notification is shown as step 13. For most other providers, server-to-server notifications are the only mechanism, so it can be argued that the Apple example covers the whole variety of cases. For those providers, step 13 makes steps 11 and 12 unnecessary.

In step 14, the server generates a response telling the application to change the subscription state.
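On the server side, steps 12 to 14 come down to moving a stored subscription from one state to another. Below is a minimal sketch of such a transition table. The state and event names are hypothetical and chosen for readability; real providers use their own notification types, and a real table would have more rows.

```python
# Hypothetical states and events, not the notification types of any real provider.
TRANSITIONS = {
    ("active", "renewed"): "active",
    ("active", "renewal_failed"): "grace_period",
    ("grace_period", "renewed"): "active",
    ("grace_period", "expired"): "cancelled",
    ("active", "cancelled_by_user"): "cancelled",
    ("active", "refunded"): "refunded",
}


def apply_notification(state: str, event: str) -> str:
    """Return the new subscription state, or fail loudly on an unmapped pair."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"unexpected event {event!r} in state {state!r}") from None
```

Raising on an unmapped pair is deliberate: a combination that is missing from the table is exactly the kind of case that should be studied rather than ignored.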

Thus, we have a complete state graph that must be traversed in order to test the paid services.

Figure 6. The complete process of changing payment states (using an externally managed subscription as an example)

The parts that we do not control in our system are colored orange; they are black boxes for us.

Billing Testing Methods

So, the main technical problem when testing paid services is the presence of "black boxes" whose state we can barely control. This determines the set of methods that can cover the whole variety of cases.

There are not so many of these methods, and we have divided them into three categories: real payments, sandboxes, and the elimination of external dependencies on black boxes.

Real payments

Real payments as a test method are good because they give a clear idea of the state of the integration. An error when making a real payment is unconditional evidence of a bug.

In every other respect, real payments are bad. First, they are expensive: obviously, making a real payment means spending real money. You are mistaken if you think the entire amount eventually returns to the company. Providers charge a commission on each transaction; its size, as described above, depends on the organization's risk rating and can reach 40% (or even more). You can also lose money when testing payments in other countries because of the currency spread - the difference between the buying and selling rates of the currency (the purchase is made at the bank's selling rate, and the refund arrives at its buying rate).

Second, this method can take a long time, because you have to wait for subscription renewal periods and grace periods to end, and that may take months.

Sandboxes

Sandboxes are great. A sandbox is, in essence, the same functionality that the payment provider gives us for a real payment, but without spending real money. It is fully supported by the provider, which means that integration with the sandbox is cheap.

The problem of long test durations is usually solved with various tricks. For example, the App Store sandbox uses the following conversion of subscription durations.

Real subscription duration	Duration in the Apple sandbox
1 week	3 minutes
1 month	5 minutes
2 months	10 minutes
3 months	15 minutes
6 months	30 minutes
1 year	1 hour

Table 1. Real subscription duration versus subscription duration in the Apple sandbox
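In autotests it is convenient to keep this conversion as data, so that a test knows how long to wait for a given number of renewals. A minimal sketch built on the values from Table 1 (the function and constant names are ours):

```python
from datetime import timedelta

# Values copied from Table 1; the keys are real subscription durations.
APPLE_SANDBOX_DURATION = {
    "1 week": timedelta(minutes=3),
    "1 month": timedelta(minutes=5),
    "2 months": timedelta(minutes=10),
    "3 months": timedelta(minutes=15),
    "6 months": timedelta(minutes=30),
    "1 year": timedelta(hours=1),
}


def renewal_wait(period: str, renewals: int) -> timedelta:
    """How long a sandbox test has to wait for the given number of renewals."""
    return APPLE_SANDBOX_DURATION[period] * renewals
```

For example, renewal_wait("1 month", 5) gives 25 minutes, which is already a noticeable delay for a single autotest.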

The default duration of subscriptions in the Google Wallet sandbox is shown in Table 2; it can be configured in the merchant console.

Real subscription duration	Duration in the Google sandbox
1 week	5 minutes
1 month	5 minutes
3 months	10 minutes
6 months	15 minutes
1 year	30 minutes

Table 2. Subscription durations in the Google sandbox

Unlike the Apple sandbox, the Google sandbox also lets you check the trial, the grace period and so on, using the conversion from Table 3.

Real subscription feature	Duration in the Google sandbox
Trial period	3 minutes
Introductory period	Equal to the duration of the corresponding subscription
Grace period (3/7 days)	5 minutes
Temporary account lock (hold)	10 minutes
Pause (1/2/3 months)	5/10/15 minutes (respectively)

Table 3. Durations of additional features in the Google sandbox

Closing a subscription can also be implemented in different ways: in the App Store sandbox, the subscription is closed after the fifth renewal, and in Google Wallet it is closed from the merchant console or on a device from the Play Store.

The problem with sandboxes is that providers treat their quality differently. Our experience shows that of the more than 70 payment providers integrated into Badoo, only two have sandboxes with full functionality and stable operation: Adyen and PayPal. The remaining providers have either stable sandboxes with reduced functionality (like Google Wallet) or unstable sandboxes with heavily reduced functionality (like the App Store and Fortumo). And some providers have no sandbox at all and do not plan to have one.

Figure 7. Sandbox classification by stability and functionality

Elimination of external dependencies

If we have convinced you that testing with real payments is expensive and inefficient, and that payment providers do not offer sandboxes of proper quality, then what remains is to turn to the various ways of eliminating external dependencies. There are only three of them: mocks, fakes and stubs.

Mocks in billing mean that your system produces responses to requests with predetermined parameters without actually contacting the payment provider (see Figure 8). For example, a request to an SMS payment provider for the number +7111-111-11-11 is intercepted before it is sent to the provider and produces the system's response to a successful payment. A request for the number +7111-111-11-12 is also intercepted, but leads to the reaction to an error with the code "There are not enough funds for the transaction."

Figure 8. Mock scheme
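The interception itself can be very small. Below is a minimal sketch, assuming a hypothetical `provider_call` function that performs the real request; the response format is invented for the example, and only the two phone numbers come from the text above.

```python
# The two test numbers from the example above; the response shape is hypothetical.
MOCKED_RESPONSES = {
    "+7111-111-11-11": {"status": "success"},
    "+7111-111-11-12": {"status": "error", "code": "insufficient_funds"},
}


def send_sms_payment(phone: str, amount: int, provider_call) -> dict:
    """Send a payment request, intercepting the predefined test numbers."""
    mocked = MOCKED_RESPONSES.get(phone)
    if mocked is not None:
        # The request never reaches the provider; return a copy of the canned answer.
        return dict(mocked)
    return provider_call(phone, amount)
```

Any other number goes to the provider as usual, so the rest of the payment flow is not affected by the mock.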

Fakes in billing are forged notifications that look as if they came from a real provider (see Figure 9). Integration with each provider implies a limited set of system responses to a limited set of notification or receipt types. Based on this information, for each individual payment you can build a set of notifications (with signatures and other forged security attributes) that our system will treat as real notifications from the payment provider.

Figure 9. Fake scheme
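As an illustration, here is a sketch of a fake notification signed with HMAC-SHA256 using a test secret. The signature scheme and the payload fields are assumptions made for the example; every real provider defines its own format and its own way of signing.

```python
import hashlib
import hmac
import json


def build_fake_notification(payload: dict, secret: bytes) -> dict:
    """Serialize the payload and sign it the way our hypothetical provider would."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(secret, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_notification(notification: dict, secret: bytes) -> bool:
    """The check that the notification entry point on the server would perform."""
    expected = hmac.new(secret, notification["body"].encode("utf-8"),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, notification["signature"])
```

A fake built with the test secret passes the same verification as a real notification, so the server code behind the entry point is exercised without changes.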

Stubs in billing are a redirect to a page with a list of possible system responses instead of sending and processing a request (see Figure 10): we offer all the possible payment provider reactions for the current payment state and trigger the chosen reaction instead of sending a request to a real provider or sandbox.

Figure 10. Stub scheme
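The data behind such a page is a map from the current payment state to the reactions the provider could produce. A minimal sketch with hypothetical state and reaction names:

```python
# Hypothetical map of payment states to the provider reactions a tester may pick.
POSSIBLE_REACTIONS = {
    "new": ["success", "insufficient_funds", "payment_method_blocked"],
    "paid": ["refund", "chargeback"],
}


def stub_page(payment_state: str) -> list:
    """Reactions to offer on the stub page for the current payment state."""
    return list(POSSIBLE_REACTIONS.get(payment_state, []))


def trigger(payment_state: str, reaction: str) -> dict:
    """Produce the chosen reaction instead of calling a provider or sandbox."""
    if reaction not in stub_page(payment_state):
        raise ValueError(f"{reaction!r} is not possible in state {payment_state!r}")
    return {"state": payment_state, "provider_reaction": reaction}
```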

All these methods let you avoid wasting real money and time, but you cannot call them very cheap: to use them, you need to map all the possible billing states for each provider and keep the maps up to date. Also, all of the methods (except, perhaps, fakes) require significant changes to the code. In addition, as different ways of modeling a real payment, mocks, stubs and fakes only approximate reality to a certain degree and carry risks that must be taken into account.

Let's return to the process of making a one-time payment. Steps 3, 4 and 5 are the key ones for the integration: transferring control to the payment provider, sending a request to the provider and receiving a response. Each of the methods of eliminating external dependencies focuses on some of these steps: with a mock, we simulate the transfer of control and the sending of a request; with a stub, only the transfer of control; with a fake, only the receipt of the response. The remaining steps are left out.

Figure 11. Modeling the interaction of the application with the provider with different methods of eliminating external dependencies (using a one-time purchase in the App Store as an example)

On the one hand, leaving steps out creates risks (for example, you can miss a bug in the steps that are not tested). On the other hand, modeling every step makes the method more expensive, because it requires changes in the system. Therefore, in practice, we use a combination of methods. For example, mocks and fakes: sending a request to a specific number does not generate a system response directly; instead, a fake notification is sent to the notification entry point on our server. Or stubs and fakes: the fake notification is sent from the stub when a reaction is chosen. Naturally, such implementations must be limited to development environments and must never reach production.
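One simple way to keep such code out of production is an explicit environment check that defaults to "off". A sketch, assuming a hypothetical `APP_ENV` environment variable and environment names:

```python
import os

# Hypothetical environment names; anything else, including a missing
# variable, is treated as production.
ALLOWED_ENVIRONMENTS = {"dev", "test"}


def test_doubles_enabled(environ=None) -> bool:
    """Mocks, fakes and stubs may be switched on only in known non-production environments."""
    env = os.environ if environ is None else environ
    return env.get("APP_ENV", "production") in ALLOWED_ENVIRONMENTS
```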

Limitations of billing testing methods

None of the methods described is a panacea. How do you decide when it is better to use one or another? We suggest evaluating them by the following criteria:

reproducibility and coverage - which method helps to cover and reproduce as many cases as possible?
the ability to check end-to-end - what does the method do better: check the whole process of making a payment, or carefully and quickly test just one of its stages?
cheapness - estimate the full cost: not only the real money spent, but also the cost of writing and maintaining the code.

We have put the results of the evaluation into a table.

Table 4. Comparative characteristics of billing testing methods

Real payment. Covers a fairly limited number of cases: testing an annual subscription takes a year. But this is the only method that lets you test the entire integration process. It is quite expensive: we constantly spend real money, paying providers for transactions.

Sandbox. Sandboxes differ, for example, between Apple and Google. Therefore they cover a different number of cases (and certainly not all of them). A sandbox does not allow full end-to-end testing: even the code in the sandbox itself may differ from the code in production. However, this is probably the cheapest method.

Fakes, mocks, stubs. The most flexible method: we can cover the entire set of cases. Due to the nature of this method, we do not test the entire payment process. The method is not cheap: you need to write code and keep it up to date.

Choosing a test method

To determine which method to use at which stage, let us turn to the classic testing pyramid.

At the bottom of the pyramid is a large number of tests that should fully cover all the functionality of our system. These should be very small and fairly cheap cases.

At the top of the pyramid, coverage may be incomplete, and the cases may be expensive. The main thing we want to check here is the full path of our service, from the request to its delivery to the user.

If we relate this to the criteria for evaluating the testing methods, we get the following mapping: for the tests at the bottom of the pyramid - fakes, mocks and stubs; for the tests at the top of the pyramid - the integration-oriented methods: real payments and sandboxes.

Figure 12. Mapping of testing stages and methods onto the testing pyramid

Antipatterns when choosing a method

Information about what happens if the ratio of tests in the test pyramid is violated can be found in a large number of articles, for example, this one.

Let's look at examples of three testing anti-patterns that do not match the ratio in Figure 12 that we encountered in Badoo.

Real payments at the bottom of the pyramid

A special card was set up for testing with real payments. It was available only to a narrow circle of people. But one day a QA engineer from our team got hold of its details. With the best of intentions, he decided to use it in autotests. Naturally, at some point the bank saw that it was receiving requests for several thousand payments within a very short period of time, and blocked the card. Moreover, it blocked the card in such a way that we could not unblock it for about two weeks.

The conclusion: real payments should not be used everywhere.

Sandboxes at the top and bottom of the pyramid

The first problem arising from excessive dependence on sandboxes is their outages. For example, for a long time the sandbox was the only way to test Apple payments. As a result, we faced the consequences of its unstable operation. There were two occasions when the sandbox did not work at all. It was down for two weeks, and as a result we had to release four versions of the client application without adequate testing.

The second problem is the limitations that sandboxes have. First, it is difficult to change a subscription's validity period. Second, features such as the grace period, refunds and others are missing, so part of the functionality is not covered by tests at all.

Using sandboxes at the bottom of the pyramid leads to various infrastructure problems: when the same sandbox account is used for a large number of payments, the size of the transferred receipt or notification can grow, because Apple accumulates the purchase history. For one of the users, the receipt reached 1 GB - naturally, the test bench simply could not cope with transferring that much data.

Elimination of external dependencies at the top of the pyramid

For one of the payment providers, we used only a combination of mocks and fakes. As a result, when the notification format was changed for one of the operators, the tests kept passing and gave a falsely reassuring result. The difficulty with this provider was that a real payment was impossible, since it required a SIM card from a specific operator in another country.

In such cases, you need to assess the risks of eliminating external dependencies, and it is important to track real notifications and check them against a template or pattern (notifications that do not match should be studied separately).
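Such a check does not have to be complicated. Below is a minimal sketch that compares a real notification with the template our fakes are built from; the field names and types are hypothetical.

```python
# Hypothetical template: the fields and types that our fakes assume.
EXPECTED_FIELDS = {"payment_id": str, "status": str, "amount": int}


def template_mismatches(notification: dict) -> list:
    """List the ways a real notification differs from the template."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in notification:
            problems.append(f"missing field: {field}")
        elif not isinstance(notification[field], expected_type):
            problems.append(f"wrong type for {field}")
    for field in notification:
        if field not in EXPECTED_FIELDS:
            problems.append(f"unexpected field: {field}")
    return problems
```

A non-empty result on production traffic is a signal that the provider has changed the format and that the fakes no longer match reality.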

Conclusions

Paid services should be tested especially carefully, since even the most insignificant bugs can lead to unexpected consequences.
When implementing an integration with a payment provider (especially when using iterative development methodologies), it is important to study and map out all the possible provider states. Iterations can be used to make the system's reaction to particular states more sophisticated, but the system itself should classify states correctly from the very beginning.
The payment provider is always a "black box" for us, and testing its work is very difficult. You should not try to pick a single method and test everything with it - this will end badly. It is better to combine methods: all cases with fakes, mocks and stubs, and a couple of cases with a sandbox and a real payment to check the integration.
When using fakes, mocks and stubs, it is important to remember that they are models of a real payment and, like any model, they only approximate reality and carry risks. These risks should be assessed and covered either with real payments or with additional checks.

In the next article we will describe how we managed to achieve stable and inexpensive automation of testing paid services in an iOS application.

Thank you for your attention! We wish everyone big profits and fewer bugs!

Source: https://habr.com/ru/post/459650/
