How much do unit tests cost?

Now, at the peak of the economic cycle, in an industry as hot as software development, it is not customary to count money. Development is often presented as a fundamentally creative activity in which nothing needs to be justified, because the artist knows best what to write and how. Unit tests and TDD in particular are the subject of much controversy, but unfortunately the debate always degenerates into one of two things. On one side there are unsubstantiated claims and emotional attacks, backed by carefully selected articles and books by methodologists who earn their living from consulting and selling training and who, in turn, offer no statistics or calculations whatsoever. On the other side there are sweeping accusations of hipsterism and other sins of youth.

Unlike such empty disputes, this article offers not only food for thought but also a methodology for assessing the economic feasibility of introducing unit tests on a specific project. I should stress up front that, like any valuation, ours will rest on assumptions about the future of the product, the team, and various indicators that can only be assessed subjectively. Nevertheless, asking a programmer for an expert estimate of indicators that are at least somewhat related to his own field is far better than asking him directly whether the company benefits from unit tests. After all, programmers are rarely inclined to think about even basic financial indicators, whereas not only developers but also testers and managers can estimate the time spent writing tests or the likelihood of a critical bug.
What cost is


First of all, if we are going to calculate the cost of something, we need to understand, at least in general terms, what cost is. When we buy a stick of sausage the question does not arise, because the price tag hangs right under it. A project, however, is not a one-time payment but a cash flow that stretches over a long period: first we invest, and only later do we get paid. To value it correctly, we must take into account that money received tomorrow is worth less than money received today. If sausage were sold by subscription this would be easy: the value of the subscription could be calculated by discounting the future payments at the rate of inflation. With a project, though, we must remember that a company investing money expects not only to recoup its costs but also to earn enough to cover its risks. The risks are countless, but they all come down to the fact that the return of the money invested in a project is not guaranteed and is hard to predict. You can take money to a bank or lend it to a solid borrower and receive guaranteed regular interest payments, but you cannot invest in a project and end up with a predictable cash flow that arrives strictly on schedule. Therefore, from the point of view of the investor (here, the company), the cash flow generated by the project, discounted at the required rate of return, must be positive. This quantity is called the net present value:

$ 0 < NPV = FCFF_1 / (1 + WACC) + FCFF_2 / (1 + WACC)^2 + FCFF_3 / (1 + WACC)^3 + \dots $


To calculate it, we need to know how much money the project will bring in or consume in the first, second, and third year, and so on, as well as the discount rate, which must not only exceed what a bank would pay but also cover the risks.

In fact, this formula is somewhat simplified. Strictly speaking, the rate varies from year to year, so the denominators should be (1 + WACC_1), (1 + WACC_1)(1 + WACC_2), (1 + WACC_1)(1 + WACC_2)(1 + WACC_3), and so on. In practice even professional appraisers neglect this: by virtue of how WACC is calculated, the market's expectations of future rates are already priced in, and making your own assumptions about them is unproductive.
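
The discounting described above can be sketched in a few lines of Python. This is only an illustration: the function names `npv` and `npv_varying` and the sample cash flows are hypothetical, and the first flow is assumed to arrive one year from now.

```python
def npv(cash_flows, rate):
    """Present value of yearly cash flows discounted at a constant rate.

    cash_flows[0] is assumed to arrive at the end of year one.
    """
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


def npv_varying(cash_flows, rates):
    """Same calculation, but with a separate discount rate for every year."""
    total, factor = 0.0, 1.0
    for cf, rate in zip(cash_flows, rates):
        factor *= 1 + rate          # cumulative product (1 + r1)(1 + r2)...
        total += cf / factor
    return total


# Hypothetical project: spend 100 in year one, then earn 60 a year for three years.
flows = [-100, 60, 60, 60]
print(round(npv(flows, 0.15), 2))
print(round(npv_varying(flows, [0.15, 0.15, 0.15, 0.15]), 2))
```

With a constant rate the two functions agree, which is the simplification the article adopts.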

There are different kinds of cash flow, but I have chosen the one most convenient for our purposes: free cash flow to the firm (FCFF), which takes into account the money due not only to the owners but also to the creditors. Of course, most IT companies carry no significant debt, simply because nobody lends to them without collateral and they have nothing to pledge. There are exceptions, though; for example, this approach is convenient when evaluating a project of the in-house development team of a lending company. The second reason we are interested in FCFF is that it is simple to calculate: FCFF is just operating profit minus taxes, net capital expenditure, and the change in working capital.

Since FCFF is the cash flow to both owners and creditors at once, it is discounted at the weighted average cost of capital (WACC), which covers both equity and debt.

In large companies the finance department tracks the cost of capital, so you can simply ask. For the general case, however, we still need a formula for calculating WACC:

$ WACC = Re * P / EV + Rd * (1 - P / EV) $


Here Re is the cost of equity, Rd is the cost of borrowed capital (that is, the effective rate for the company's debts), P is the market value of equity, EV is the total value of the enterprise (EV = P + D, where D is debt).

Next we need to determine Re. There are different models for this, but the simplest is CAPM, where Re = Rb + β * Premium. Here Rb is the risk-free rate, Premium is the extra return demanded for investing in equity rather than debt, and β is the risk coefficient, which shows how much riskier our project is than the business of an average company.
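
As a minimal sketch, the two formulas above translate directly into code. The function names and the input numbers below are purely illustrative assumptions and do not describe any real company.

```python
def cost_of_equity(risk_free, beta, premium):
    """CAPM: Re = Rb + beta * Premium."""
    return risk_free + beta * premium


def wacc(re, rd, equity, debt):
    """WACC = Re * P / EV + Rd * (1 - P / EV), where EV = P + D."""
    ev = equity + debt
    equity_share = equity / ev
    return re * equity_share + rd * (1 - equity_share)


# Hypothetical inputs: 5% risk-free rate, beta of 1.2, 5% equity premium,
# 8% cost of debt, and capital made up of 80 equity and 20 debt.
re = cost_of_equity(0.05, 1.2, 0.05)
print(round(re, 4))
print(round(wacc(re, 0.08, equity=80, debt=20), 4))
```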

How is quality assured and what are unit tests


Now we need to figure out what unit tests are. Oddly enough, many people, even those close to development, call any automated test a unit test, but that is certainly not correct.

Testing is divided into functional and non-functional. Non-functional testing covers things not directly related to what the software does, such as load testing or security testing. Functional testing checks compliance with the requirements and the absence of errors in their implementation, and it is functional testing that we will discuss.

The first step in ensuring quality is to take quality control away from the developers and hire a person who will be responsible for it. So a tester who does manual testing appears on the team. No serious project is thinkable without manual testing; it is the foundation that is vital for the project, and the overwhelming majority of problems discovered and fixed in time will be to the testers' credit. At this stage everything looks simple: if you want quality, hire a quality specialist.

As the project grows, less and less time remains for manual testing: testers are increasingly busy with new features and less likely to check the parts of the system that should not have changed. However, as the system grows more complex, explicit and implicit dependencies may arise between its components that developers can lose sight of, so some things are still worth checking before every release. The problem is particularly acute in agile methodologies with their short iterations and frequent releases. The logical consequence is the need to automate the testers' work: for example, to write a script that clicks the buttons and checks the result, or to use tools more effectively and turn an ordinary tester into a test automation specialist who can automate the routine part of the job.

These measures can provide a decent level of quality, but there is no limit to perfection. What testers do is called black-box testing. Knowing all the implementation details is not their responsibility, so their testing usually focuses on typical scenarios and does not aim to break the system or check its behavior under unusual conditions. In addition, some things are hard to check simply because they have no user interface: if, for example, the goal of an iteration is to develop a data access library or some specific API, then testing it requires writing an application, or at least something that uses the component. In such cases quality control has to be partially handed back to the developers, who are asked to write integration tests. This is the second type of automated test used on a project. Their goal is to check that components of the system written by different people interact correctly, that these components behave properly under boundary conditions, and that they respond correctly to failures in the environment.

So, we have testers who check the whole project against the requirements, tests that automate their work, and tests that check the parts of the project written by different developers. What else can be done? Unit tests lay claim to being the fourth level of quality control. They check code written by a single programmer, and as a rule they test the smallest piece of code that can be tested, for example a single class. In practice developers most often write unit tests for their own code, and the number of tests and the need for them are poorly controlled. In my experience, developers typically spend about 40% of the time needed to develop the feature itself on its unit tests, although this ratio can vary greatly. A widely known case is the open-source SQLite project, where the surplus of low-skilled free labor supplied by the many people eager to work on a famous project is put to use army-style, that is, on writing useless unit tests, whose volume at one point exceeded the code of the DBMS itself by a factor of 100. The opposite cases, in which unit tests are not written at all or are kept to a minimum, are not surprising either. After all, almost all software developed before the end of the 2000s, that is, before the era of outsourcing and Agile, was created without unit tests.

Costs, adjustment for complexity, and the mythical man-month


Of course, if you need to write unit tests or anything else, you will have to either allocate more time to the project or hire more developers. The main question is whether development time and cost depend linearly on the amount of code or follow some other law.

Once upon a time I had a free SVN repository on the well-known Assembla service, which offered source code hosting and collaboration tools: a tracker, statistics, and other nonsense. The freebie later ended, but the newsletters and alerts kept coming. In 2015 one of their employees published a short post called “How many people should discuss a task?”, which now survives only in the Web Archive. The gist was as follows: the employee collected statistics across clients and plotted the duration of a task against the number of people who discussed it. The result looked like this:

[Chart: task duration versus the number of people who discussed it, Assembla data]

The dependence is clearly nonlinear. Two people are usually involved in a task that takes two days, three people in one that takes four days, and four people in one that takes six. What are they doing there? One can assume that the task requires several stages of work: with two people, for example, Vasya does his part and then hands it over to Petya, so it takes two days. Three people can already spend an extra day quarreling, dividing up responsibilities, and working out who is to blame and what to do, while a group of seven will spend six additional days on discussions, approvals, and passing the buck to one another.

Of course, one can also assume that a friendly team of seven is simply given harder tasks, which come much more easily to it, and that the more people work on a task the more grandiose it can be, because friendship is magic! My arguments may therefore seem far-fetched, and I will not include them in the calculations that follow. However, if you want a more conservative estimate, it would not hurt to correct for the non-linear growth of costs as the project's code base grows (and unit tests are, of course, part of that code base), or to build a safety margin into the required NPV.

If we attribute the non-linearity of this chart solely to the growth of the team, the associated costs can be estimated from the following table, which shows the share of time lost to communication as a function of the size of the working group:

[Table: share of time lost to communication, by size of the working group]

For example, if the team has five developers and you decide to hire two more so that everyone can spend an additional 40% of their time on unit tests, be prepared for development costs to grow by more than 40%. The team will grow and become less efficient: instead of 5 * 0.625 = 3.125 conventional units of productivity it will have 7 * 0.539 = 3.77 units, while the workload will rise from 1 to 1.4 conventional units of work. As a result, the time required for development will increase by 16%.
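
The arithmetic of this example can be checked with a short sketch. The per-person efficiency values 0.625 and 0.539 are the ones quoted above for teams of five and seven; the function names are hypothetical, and the cost estimate additionally assumes that cost is proportional to headcount multiplied by duration.

```python
# Per-person efficiency for teams of 5 and 7 people, as quoted in the text.
EFFICIENCY = {5: 0.625, 7: 0.539}


def capacity(team_size):
    """Effective productivity of the whole team in conventional units."""
    return team_size * EFFICIENCY[team_size]


def relative_duration(workload, team_size):
    """Time needed to complete the workload, in conventional units."""
    return workload / capacity(team_size)


before = relative_duration(1.0, 5)   # no unit tests, five developers
after = relative_duration(1.4, 7)    # 40% more work, seven developers

time_growth = after / before - 1
# Assumption: cost is proportional to headcount times duration.
cost_growth = (7 * after) / (5 * before) - 1

print(f"time grows by {time_growth:.0%}, cost grows by {cost_growth:.0%}")
```

Under that assumption the cost increase comes out well above 40%, which is the point the paragraph makes.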

An interesting conclusion to be drawn from the chart is that once there are more than ten people, the value of each new participant is smaller than the additional cost of communication, and Brooks's law kicks in. All that remains is to split tasks into smaller ones or to bring in more experienced and efficient people.

Of course, one can hardly claim that the non-linearity of the Assembla chart is due solely to falling efficiency as the team grows. But it agrees well with the intuitive understanding of complexity and with Brooks's law, so if you do not want to take risks and need a conservative estimate, this data will be a good aid.

The benefit of unit tests


Besides costs, unit tests also bring benefits. Of course, in the overwhelming majority of cases a bug that unit tests could catch will be caught at other levels of quality control, but there is always some probability of a technical failure, and in theory unit tests can reduce it. I personally know of no such cases; fortunately, all the testers I have ever worked with were extremely responsible people. But when it comes to such low probabilities, personal experience can be unrepresentative. Failures can have different consequences. For example, a company may have an SLA whose violation leads to well-defined financial losses: say, the company has to give customers one month of free service as compensation, losing 1/12 of its annual revenue. In that case, tightening quality control so that the probability of an SLA violation during the year falls from 10% to 8% reduces the average annual loss by about 0.17% of revenue. This money is the positive component of the cash flow that must be added to the model.
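
The SLA example is an expected-value calculation, sketched below. The function name `expected_loss` is hypothetical; the probabilities and the 1/12 penalty are the ones from the paragraph above.

```python
def expected_loss(probability, loss_share):
    """Average yearly loss as a share of revenue."""
    return probability * loss_share


PENALTY = 1 / 12  # one month of free service, as a share of annual revenue

saving = expected_loss(0.10, PENALTY) - expected_loss(0.08, PENALTY)
print(f"expected saving: {saving:.2%} of annual revenue")
```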

Note that such a simple calculation applies only when the probability of a loss is small. If the probability exceeds 15-20% and the loss can lead to the bankruptcy or liquidation of the company, it is better to use option-based valuation models, such as a decision tree. Fortunately, in most cases some stupid bug cannot bankrupt a company, and we do not need to plunge into the horrors of option valuation.

Example One: Bison Company


Bison is a large online store that calls itself the No. 1 online retailer in Russia. The company is not public, but in a recent recapitalization deal its total capitalization was estimated at 50 billion rubles, twice its annual revenue. The recapitalization was needed because of operating losses. Shareholders nevertheless hope to reach an operating margin of 10% once the company wins a larger market share and doubles its revenue within the year. After that it will have to start earning, and revenue growth will slow to 30% in the second year and 20% in the third before settling at 10% in the fourth and subsequent years. Banks, however, are not so sure about this and lend to Bison cautiously: the company's total debt is only 10 billion rubles, at a rate of 11%. Bison is rather clumsy and poorly managed at the operational level. Uncontrolled hiring has already left it with 600 programmers, whose total payroll is 1.5 billion rubles a year and who spend about 30% of their working time on unit tests. The company has no obligations to customers, and a technical failure can only halt sales temporarily; rolling back to the old version of the site after a failure takes about an hour.

What is the NPV from using unit tests in Bison?

Bison’s revenue should be 50, 65, 78, and 86 billion rubles in the first, second, third, and fourth years, respectively. Let us take the annual probability of a failure to be 33%, that is, an incident capable of taking the site down for a long time happens about once every three years, which is not so bad. Suppose unit tests can reduce it to 25%, and no further, simply because besides developer errors there are also hardware failures, DDoS attacks, and other troubles. If the site of an online store is unavailable for an hour, the retailer loses no more than 0.023% of annual revenue, even allowing for the fact that buyers are active only about 12 hours a day. In other words, even if we generously credit unit tests with saving the entire cost of such an outage every year, they reduce the company's losses by 11.5 million rubles in the first year, 14.8 million in the second, 17.8 million in the third, and 19.6 million in the fourth.

Even without accounting for growth in headcount and developer salaries, unit tests cost 30% of the 1.5 billion ruble payroll, that is, 450 million rubles a year.

I think by now you already understand that unit tests inflict tremendous damage on Bison's finances, even without adjusting for the growth in complexity and the problems caused by loss of manageability. And this at a time when shareholders are forced to put up money to fund a loss-making company! No further calculation can rehabilitate unit testing in this case, but let us continue anyway in order to see how cash flow is discounted.

Back to the developers. Let us assume that the payroll grows by 10% per year. Then the total effect of using unit tests is an operating loss of -438, -480, -527, and -579 million rubles in the first, second, third, and fourth years, respectively, after which the loss grows by 10% annually. Unit tests here do not affect net capital expenditure or working capital, but the loss produces a tax saving equal to 20% of the loss, so it must be multiplied by 0.8: -351, -384, -421, and -463 million rubles.
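
The yearly figures can be reproduced with the sketch below. All constant and function names are hypothetical. Following the article, the sketch credits unit tests with the full revenue lost in a one-hour outage each year, and it takes that hour as 1 / (365 * 12) of annual revenue, which is an assumption about how the 0.023% figure was obtained.

```python
REVENUE = [50_000, 65_000, 78_000, 86_000]  # million rubles, years 1 to 4
OUTAGE_SHARE = 1 / (365 * 12)               # one hour out of 12 active hours a day
TEST_COST_YEAR1 = 0.30 * 1_500              # 30% of a 1.5 billion payroll, in millions
PAYROLL_GROWTH = 0.10
TAX_RATE = 0.20


def yearly_effect(year_index):
    """Operating effect of unit tests in a given year (negative means loss)."""
    saving = REVENUE[year_index] * OUTAGE_SHARE
    cost = TEST_COST_YEAR1 * (1 + PAYROLL_GROWTH) ** year_index
    return saving - cost


pre_tax = [yearly_effect(i) for i in range(4)]
after_tax = [x * (1 - TAX_RATE) for x in pre_tax]

print([round(x) for x in pre_tax])
print([round(x) for x in after_tax])
```

Differences of about a million rubles against the hand-rounded figures in the text are to be expected.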

The company's EV is 50 + 10 = 60 billion rubles, of which P accounts for 83% and D for 17%. We know that the cost of debt is 11% per annum, so to calculate WACC it only remains to find the cost of equity. Bison operates in Russia, so for the risk-free rate we take the effective yield of the government bonds with the longest duration, currently 7.6%. The premium for investing in equity varies from year to year but is usually around 4-6% per annum; we take 5%. To determine the β coefficient we turn to a reference table and find the debt-free risk coefficient for companies in the online retail industry (unlevered beta), equal to 1.3. But Bison does have debt, albeit small, so we need to adjust for it and calculate the beta with leverage (levered beta):

$ βl = βu * (1 + (1 - T) * D / P) = 1.3 * (1 + (1 - 0.2) * 10/50) = 1.51 $


Thus the WACC discount rate will be

$ (7.6 + 1.51 * 5) * 0.83 + 11 * 0.17 ≈ 14.4 $ percent, which we round up to 15 percent.
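
For readers who want to recompute the rate, here is a minimal sketch using the inputs given in the text. The names are hypothetical. It uses the exact capital weights 50/60 and 10/60, so the result lands slightly below the rounded 15 percent used in the rest of the example.

```python
def levered_beta(unlevered_beta, tax_rate, debt, equity):
    """Adjust an industry (unlevered) beta for the company's own debt."""
    return unlevered_beta * (1 + (1 - tax_rate) * debt / equity)


EQUITY, DEBT = 50.0, 10.0                        # billion rubles
RISK_FREE, PREMIUM, COST_OF_DEBT = 7.6, 5.0, 11.0  # percent per annum

beta = levered_beta(1.3, 0.2, DEBT, EQUITY)
cost_of_equity = RISK_FREE + beta * PREMIUM      # CAPM, in percent
ev = EQUITY + DEBT
wacc = cost_of_equity * EQUITY / ev + COST_OF_DEBT * DEBT / ev

print(f"levered beta {beta:.2f}, Re {cost_of_equity:.2f}%, WACC {wacc:.2f}%")
```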


Finally, let us calculate how much unit tests cost Bison. To do this we discount the first years of uneven growth separately, and for the subsequent years, when the loss grows by 10% per year, we use the Gordon model.

The present value of the first year's flow is $ -351 / 1.15 = -305 $ million rubles, of the second $ -384 / 1.32 = -290 $ million, and of the third $ -421 / 1.52 = -277 $ million.

Starting from the fourth year, losses grow evenly by 10% per year, so their value as of the end of the third year follows from the Gordon formula: $ -463 / (1.15 - 1.1) = -9260 $ million. This amount must in turn be brought to the present by discounting it over three years: $ -9260 / 1.52 ≈ -6090 $ million rubles.
In total, the damage from the use of unit tests is $ 305 + 290 + 277 + 6090 = 6962 $ million, or about 7 billion rubles.
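
The whole valuation can be sketched as follows, with hypothetical names and the after-tax flows taken from the text. Note two choices the sketch makes: it discounts the Gordon terminal value over three years, because the formula yields a value as of the end of year three, and it sums the discounted rather than the nominal flows. Under these choices the total comes out near 7 billion rubles.

```python
WACC = 0.15
GROWTH = 0.10
AFTER_TAX_FLOWS = [-351, -384, -421]   # million rubles, years 1 to 3
FIRST_PERPETUITY_FLOW = -463           # year 4, grows by GROWTH afterwards

# Present value of the explicitly forecast years.
explicit_pv = sum(cf / (1 + WACC) ** year
                  for year, cf in enumerate(AFTER_TAX_FLOWS, start=1))

# Gordon model: value at the end of year 3 of all flows from year 4 onwards.
terminal_value = FIRST_PERPETUITY_FLOW / (WACC - GROWTH)
terminal_pv = terminal_value / (1 + WACC) ** len(AFTER_TAX_FLOWS)

total = explicit_pv + terminal_pv
print(f"explicit years: {explicit_pv:.0f}, terminal: {terminal_pv:.0f}, "
      f"total: {total:.0f} million rubles")
```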

Example Two: Hyperstal Company


Vasily is a fresh graduate of the Chelyabinsk College of Innovative Technologies. There are no Amazons or Googles in Chelyabinsk, but there are plenty of steel companies, and he was lucky enough to land a job at one of them. As it later turned out, budgets there are modest and money is chronically short, so the company could only afford a programmer at a salary of no more than 50 thousand rubles a month, including all taxes and mandatory payments. Vasily's first task was to write software that controls the operation of a blast furnace. The project should take no more than two months and is unlikely to be supported or developed further.
During Vasily’s visit to the workshop, the specialist in charge of production told him, in general terms, the following: “Dear colleague! Please take a look at this giant ladle of molten metal. If something goes wrong, we will not only be extremely upset but will also face technological difficulties. The point is that if the blast furnace stops, the metal inside it will solidify, and it will take three months to deal with the consequences of the accident. Removing a giant lump of metal from the workshop and replacing it with a new blast furnace will not be easy.” Later, Vasily found out that an emergency shutdown of a blast furnace could cost the company 8 billion rubles.
Question: Is it worth it for Vasily to worry about writing unit tests?
Since I no longer have the strength or patience to calculate the obvious, I will give the answer right away: of course, yes. Vasily has no experience and is highly likely to make a mistake. I would put the probability that his program, written alone, without help from colleagues and without adequate quality control, will crash somewhere at 50 percent, and the probability that this leads to an accident at 10 percent. His time costs next to nothing, and the cost of an error is extremely high. Since this example concerns a short project that will be written and forgotten, there is no need to discount anything: it is enough to compare Vasily’s salary for two months, 100 thousand rubles, with the expected loss of about 10% * 8 billion = 800 million rubles.
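
The comparison is simple enough to do in one's head, but for symmetry with the Bison example here is the same arithmetic as a sketch. The variable names are hypothetical and the numbers are the ones given above.

```python
SALARY_PER_MONTH = 50_000          # rubles, including taxes
PROJECT_MONTHS = 2
ACCIDENT_PROBABILITY = 0.10
ACCIDENT_COST = 8_000_000_000      # rubles

developer_cost = SALARY_PER_MONTH * PROJECT_MONTHS
expected_loss = ACCIDENT_PROBABILITY * ACCIDENT_COST
ratio = expected_loss / developer_cost

print(f"expected loss is {ratio:,.0f} times the cost of Vasily's time")
```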

Example Three: XSoft


XSoft is a successful outsourcing company that has just signed a contract with yet another Western customer. The customer plans to hire 7 programmers and has budgeted 15 million rubles a year for this, of which XSoft will keep 3 million. The customer is a simpleton and knows nothing about development. From XSoft's point of view, should the developers write unit tests?

Of course! In this case the customer bears the cost of writing and maintaining unit tests, while for the contractor the extra work means only a longer project and additional profit, which is at the very least proportional to the man-hours spent on unit tests and at best grows even faster because of the larger code base and the increased complexity of the project. With your permission, I will not develop this idea further, delve into the intimate details of the outsourcer's relationship with its customer, or discount its cash flows, since the conclusion is already obvious.

Afterword


, , . , . , , NPV / IRR. , IT. Excel .

Source: https://habr.com/ru/post/457632/

