Many consider testing in production a harmful practice: it does not prevent problems from reaching end users, it only confirms that they are already there. In addition, the tester departs from the standard workflow and techniques used in the test environment. My name is Olya Mikhalchuk, and I am a QA engineer at the fintech company ID Finance. In this post I will explain why testing in prod can significantly help your project.
Why do you need QA in prod if there is a pre-production environment?
In software development there are always several environments to which the application is deployed. The environment used by end users is, as you know, called production. It is usually assumed that testing should be carried out in a separate environment, most often QA or Staging (pre-prod), to prevent errors from reaching users. But there is also a technique known as QA in prod, which helps solve problems that are physically impossible to solve in a test environment.

What problems does QA in prod help solve?
1. The difference between the Staging and Production environments.
Staging is often considered a copy of the production environment that is inaccessible to end users but resembles production as closely as possible. When an application is quite complex, synchronizing and maintaining such a mini-copy becomes time-consuming and is not always a rational use of effort.
For example, in our project pre-prod is used mostly for functional testing with manually written test scenarios. It does not have technical resources comparable to the production environment. We also usually do not fully synchronize configurations and databases with production, which does not get in the way of functional tests. Why don't we copy the prod environment? Imagine how many resources it would take to create a copy of, say, Facebook, with the same powerful servers, services, databases and configurations as in production. In effect, it would mean deploying a second such application.
In addition, when integrating with third-party services, you always have different settings for the test and production environments (for the same API, for instance). I do not claim that the test and staging environments are meaningless. It is simply impossible to guarantee 100% that, when certain tests pass in one environment, the services will not fail in another. Additional testing in production can help solve this problem.
2. Real levels of multitasking and load.
Some errors can be detected only under sustained, real levels of multitasking and workload. This applies to memory leaks and to the stability and speed of the system. For example, we had a case where a performance problem arose because two resource-intensive tasks ran in the same time window. The developers optimized the tasks, the team tested the changes on pre-prod, delivered them, and then verified the result in production.
3. Deployment errors.
By definition, deployment is the installation by the team of a new version of the service code in the production infrastructure. Accordingly, the best way to see deployment errors is to test during the deployment process itself.
4. Lack of monitoring on pre-prod.
One of the best and most irreplaceable ways to make sure an application works as expected is to monitor certain metrics. Among the simplest and most critical examples: the number of new user registrations per hour, the conversion from one target action to the next, the number of loans issued. Of course, such monitoring makes sense only in production.
5. The ability to analyze the usage scenarios that end users actually follow.
Production is a storehouse of test cases for the tester. A tester who can see and process the scenarios end users go through can identify the most critical ones, find the cause of a defect that has appeared, or pay attention to non-trivial cases when testing on pre-prod.
6. The ability to maintain more reliable statistics and software quality metrics.
For example, the number of errors in the logs of an application or component, bug reports and other reports that a prod tester can produce show the quality of the software more realistically than the same reports from the test environment.
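As an illustration of the error-count metric, here is a minimal sketch that counts ERROR entries per component in a log. The line format ("date time LEVEL component: message") and the component names are assumptions made for the example, not the format used in our project.

```python
import re
from collections import Counter

# Assumed (hypothetical) line format: "<date> <time> <LEVEL> <component>: <message>"
LINE_RE = re.compile(r"^\S+ \S+ (?P<level>[A-Z]+) (?P<component>[\w-]+): ")


def errors_per_component(lines):
    """Count ERROR-level log lines, grouped by component name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        # Lines that do not match the assumed format are ignored.
        if match and match.group("level") == "ERROR":
            counts[match.group("component")] += 1
    return counts


sample_log = [
    "2024-01-01 12:00:00 INFO registration: user created",
    "2024-01-01 12:00:01 ERROR credit-policy: decision timeout",
    "2024-01-01 12:00:02 ERROR credit-policy: decision timeout",
    "2024-01-01 12:00:03 ERROR payments: gateway unavailable",
]
report = errors_per_component(sample_log)
```

Running the same count on prod and pre-prod logs over the same period gives a simple way to compare the two pictures of quality.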
7. It is always better for the tester to find an error in prod than for the end user.
Usually, after a task has been delivered, the tester runs basic checks of the new or changed functionality in prod. In addition, our company has a separate dedicated person, a prod tester. I want to reiterate that I do not position QA in prod as a replacement for pre-production testing; preventing bugs and taking preventive measures is, of course, still necessary. But such testing can be an excellent additional technique for ensuring the quality of your project.
Useful QA practices in production that work effectively on our project
1. Checking delivered tasks to make sure they are deployed correctly and work in the new environment.
For example, when we introduce an integration with a new partner, in addition to tests on pre-prod we always check the integration after delivery, because many settings depend on the environment (API, URLs, components). There are also third-party issues: errors that are not on our side but on the side of the services we integrate with.
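One way to decide what to re-check after delivery is to list the settings that differ between environments. The sketch below is only an illustration: the setting names, URLs and the idea of holding configs as plain dictionaries are assumptions, not a description of our tooling.

```python
_MISSING = "<missing>"  # marker for a setting present in only one environment


def differing_settings(staging, production):
    """Return {key: (staging_value, production_value)} for settings that differ."""
    diff = {}
    for key in sorted(staging.keys() | production.keys()):
        staging_value = staging.get(key, _MISSING)
        production_value = production.get(key, _MISSING)
        if staging_value != production_value:
            diff[key] = (staging_value, production_value)
    return diff


# Hypothetical configs for a partner integration.
staging_config = {
    "partner_api_url": "https://sandbox.partner.example/api",
    "timeout_seconds": 30,
}
production_config = {
    "partner_api_url": "https://partner.example/api",
    "timeout_seconds": 30,
    "retry_count": 3,
}
to_verify = differing_settings(staging_config, production_config)
```

Every key in `to_verify` marks something that pre-prod tests could not have covered and that deserves a check in prod.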
2. Logging and auditing.
Good logging helps developers and testers notice a problem before the end user runs into it, and also spot places that need optimization. An audit of actions and changes makes it easy to find the reasons for a particular behavior. For example, if the credit policy component cannot issue a loan decision, we first turn to the logs to analyze why. This point applies to both production and pre-production environments.
3. Monitoring and alerting.
As I mentioned above, monitoring certain metrics is one of the best ways to make sure everything is OK with our application. Moreover, when a problem arises, the interested parties must be notified (for example, if the number of loan applications is 20% lower than expected, we alert the IT and business departments; if CPU load is above the norm, we alert the admins and developers). You need to make sure that alerts are timely and relevant, and that they really indicate a problem.
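The two alert examples above can be sketched as threshold rules. This is a minimal illustration under stated assumptions: the metric names, the baseline values (100 applications per hour, an 85% CPU norm) and the recipient labels are hypothetical, and the sketch only decides who should be notified without sending anything.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical baselines; real values would come from historical data.
EXPECTED_LOAN_APPLICATIONS_PER_HOUR = 100.0
CPU_LOAD_NORM_PERCENT = 85.0


@dataclass(frozen=True)
class Rule:
    metric: str
    is_violated: Callable[[float], bool]
    recipients: tuple
    description: str


RULES = (
    Rule(
        metric="loan_applications_per_hour",
        # "20% lower than expected" is read here as "at least 20% below".
        is_violated=lambda v: v <= EXPECTED_LOAN_APPLICATIONS_PER_HOUR * 0.8,
        recipients=("it", "business"),
        description="loan applications at least 20% below expected",
    ),
    Rule(
        metric="cpu_load_percent",
        is_violated=lambda v: v > CPU_LOAD_NORM_PERCENT,
        recipients=("admins", "developers"),
        description="CPU load above the norm",
    ),
)


def evaluate(observed):
    """Return (description, recipients) for every rule the observed metrics violate."""
    alerts = []
    for rule in RULES:
        value = observed.get(rule.metric)
        if value is None:
            continue  # metric not reported; this sketch skips it silently
        if rule.is_violated(value):
            alerts.append((rule.description, rule.recipients))
    return alerts


alerts = evaluate({"loan_applications_per_hour": 75, "cpu_load_percent": 50})
```

Keeping the recipients next to each rule makes it easy to review whether an alert reaches the people who can act on it.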
4. Regression and stability testing.
It is good practice to run regression tests periodically to make sure that nothing has broken anywhere. This can help in narrow, specific cases where monitoring does not see the problem.
5. Reporting and statistics.
As in any testing, reporting and statistics on the results of prod testing make the process more transparent and make the quality of the software and the causes of defects easier to observe.
Not all errors can be caught on pre-prod, so some of them will reach production. If users are the ones who find them, the company's reputation suffers and, ultimately, money is lost. Testing in prod helps prevent this.