What we know and what we do not know about effort estimation in software development

The overwhelming majority of studies and reports confirm that software projects tend to exceed their budgets and deadlines. On average, the overrun is on the order of 30 percent ¹. Moreover, if we compare estimation accuracy in the 1980s with that reported in recent studies, we find no significant difference. (The only analysis suggesting a significant improvement in estimation quality comes from the Chaos Reports of the Standish Group. However, this most likely stems from the fact that the researchers improved the quality of their data, moving from a sample overloaded with problem projects to a more representative one ².) Estimation methods have not changed significantly either. Despite intensive research into formal estimation models, expert judgment remains the dominant method ³.

The apparent lack of breakthroughs in estimation methods does not mean that we have not learned more about effort estimation. In this article, I try to summarize some of the knowledge that, in my opinion, we now have. Some of this knowledge may improve estimation quality, some probably cannot, and some concerns what we know about what we do not know about effort estimation in software development. All the material I use to support my claims has been published ¹.

What we know

After reviewing the research on effort estimation, I selected seven findings that are consistent with most studies:

There is no "best" estimation model or method

Many studies compare the accuracy of estimates produced by different models and methods, and these accuracy contests yield a wide variety of “winners” ⁴. The main reason for this instability is that many key relationships, such as that between project size and development effort, vary significantly from context to context ⁵. In addition, the variables with the largest impact on effort also vary, which leads to the conclusion that estimation models and methods must be tailored to the context in which they are applied.
The instability of key relationships also explains why statistically advanced estimation models typically improve accuracy little or not at all compared with simple models. Statistically advanced models rely too heavily on historical data and may produce even worse results than simple models when applied in a different context. These results suggest that software companies are better off building their own estimation models than expecting universal methods and tools to be accurate in their particular context.

Customer fixation on low prices leads to budget overruns

The tendency to underestimate effort is most pronounced when the supplier is chosen on the basis of price, for example in a request for quotations. In less price-sensitive situations, such as in-house development, there is no such tendency; in fact, the opposite can even occur. This suggests that a key reason for underestimation is the customer's focus on getting the lowest possible price, which makes suppliers who underestimate their effort more likely to win the contract. It also suggests that customers can avoid cost overruns by paying less attention to the estimated cost and more to the contractor's competence.
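
The selection effect described above can be illustrated with a small simulation. It is only a sketch under stated assumptions: every supplier's estimate is unbiased but noisy, the customer always picks the lowest bid, and the function name `simulate_lowest_bid` and all numbers are hypothetical rather than taken from the cited studies.

```python
import random


def simulate_lowest_bid(true_cost=1000.0, bidders=5, noise=0.3,
                        rounds=10_000, seed=1):
    """Average winning bid when the lowest of several unbiased estimates wins.

    All parameters are illustrative assumptions, not figures from the article.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rounds):
        # Each estimate is unbiased: the true cost times a factor centred on 1.
        bids = [true_cost * rng.uniform(1 - noise, 1 + noise)
                for _ in range(bidders)]
        total += min(bids)  # the customer picks the cheapest bid
    return total / rounds


mean_winning_bid = simulate_lowest_bid()
mean_single_bid = simulate_lowest_bid(bidders=1)
```

Under these assumptions the average winning bid among five suppliers falls well below the true cost even though no individual supplier is biased, while with a single supplier the average stays close to the true cost.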

“Min-max” estimation intervals are too narrow

Estimation intervals, such as 90 percent confidence intervals, are systematically too narrow to reflect the actual uncertainty in effort. Despite strong evidence of our inability to estimate minimum and maximum effort, current estimation methods continue to assume that this is a solvable task. This is especially evident in PERT methods (three-point estimation), in which the expected effort is derived from the most likely, minimum and maximum estimates.
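
For reference, the classic three-point calculation can be written out as below. This is the textbook PERT weighting, not something the cited studies prescribe, and the input figures are hypothetical work-hours. The result is only as good as the minimum and maximum fed into it, which is exactly where expert judgment tends to be too narrow.

```python
def pert_estimate(minimum, most_likely, maximum):
    """Classic PERT expected value and standard deviation (textbook formula)."""
    if not minimum <= most_likely <= maximum:
        raise ValueError("expected minimum <= most_likely <= maximum")
    expected = (minimum + 4 * most_likely + maximum) / 6
    std_dev = (maximum - minimum) / 6
    return expected, std_dev


# Hypothetical expert inputs, in work-hours.
expected, std_dev = pert_estimate(80, 100, 180)
```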

Instead of using expert judgment to determine minimum and maximum effort, developers should use historical data on previous estimation errors to set realistic min-max intervals ⁶.
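
One possible way to do this is sketched below, assuming the company has recorded estimated and actual effort for finished projects. The function name `empirical_interval`, the interpolation rule, and the sample data are illustrative assumptions, not the exact procedure from ⁶.

```python
def empirical_interval(estimate, history, coverage=0.9):
    """Min-max interval for a new estimate from past actual/estimate ratios.

    history: list of (estimated_effort, actual_effort) pairs from
    finished projects.
    """
    ratios = sorted(actual / estimated for estimated, actual in history)
    if len(ratios) < 2:
        raise ValueError("need at least two finished projects")
    tail = (1 - coverage) / 2

    def quantile(q):
        # Linear interpolation between the two closest ranks.
        pos = q * (len(ratios) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ratios) - 1)
        return ratios[lo] + (ratios[hi] - ratios[lo]) * (pos - lo)

    return estimate * quantile(tail), estimate * quantile(1 - tail)


# Hypothetical (estimated, actual) work-hours from five finished projects.
history = [(100, 90), (100, 100), (100, 110), (100, 130), (100, 200)]
low, high = empirical_interval(200, history, coverage=0.5)
```

With this made-up history the interval is asymmetric around the new estimate, because past overruns were larger than past underruns. An interval derived this way reflects how wrong earlier estimates actually were rather than how confident the estimator feels.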

It is easy to mislead those who estimate, but hard to recover from being misled

Any effort estimate in software development, even one based on formal estimation models, requires expert judgment. But although expert judgment can be quite accurate, it is also strongly influenced by external factors. Probably the strongest (and negative) influence arises when those who estimate effort receive, before or during estimation, information about the budget, the customer's expectations, the time available, or other values that can act as “estimation anchors.” Without noticing it, experts will produce estimates that are unreasonably close to the anchor values. For example, knowing that a customer expects a low price or a small number of work-hours will most likely lead to an underestimate of effort. Expert judgment can also be affected by a request containing “loaded” words, such as “How much can such a small and simple project cost?”

Despite many studies on how to recover from being misled and how to neutralize estimation bias, there are still no reliable methods for doing so. The main conclusion is that those responsible for estimation should be shielded as far as possible from irrelevant and misleading information, for example by removing such information from the requirements documentation.

Relevant historical data and checklists improve estimation accuracy

One well-documented way to improve estimation accuracy is to use historical data and estimation checklists. When the historical data is relevant and the checklists are adapted to the company's needs, activities are less likely to be forgotten, a sufficient allowance for risk is more likely to be included, and previous experience is reused as fully as possible. This in turn leads to more realistic estimates. Accuracy improves significantly in particular when data on similar projects can be used for so-called “estimation by analogy” ⁷.
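
A minimal sketch of estimation by analogy follows. It assumes that finished projects are described by a few numeric features and that similarity can be measured by relative feature differences. The function name `estimate_by_analogy`, the feature names, and the data are hypothetical, and a real organization would choose its own features and similarity measure.

```python
def estimate_by_analogy(new_project, history, k=2):
    """Mean actual effort of the k most similar finished projects.

    new_project: dict of numeric features.
    history: list of (features, actual_effort) pairs.
    """
    if not history:
        raise ValueError("need at least one finished project")

    def distance(a, b):
        # Relative difference per feature, so large-valued features
        # do not dominate the comparison.
        return sum(abs(a[key] - b[key]) / max(abs(a[key]), abs(b[key]), 1e-9)
                   for key in a)

    nearest = sorted(history,
                     key=lambda item: distance(new_project, item[0]))[:k]
    return sum(effort for _, effort in nearest) / len(nearest)


# Hypothetical finished projects: features and actual work-hours.
history = [
    ({"screens": 10, "integrations": 2}, 400),
    ({"screens": 12, "integrations": 2}, 500),
    ({"screens": 40, "integrations": 8}, 2000),
]
estimate = estimate_by_analogy({"screens": 11, "integrations": 2}, history)
```

Here the two small projects are selected as analogues and the large one is ignored, so the estimate is the average of their actual effort.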

Despite the obvious usefulness of such tools, many companies still do not use them to improve the accuracy of their estimates.

Combining independent estimates improves estimation accuracy

The average of several estimates from different sources is likely to be more accurate than most of the individual estimates. The key to improving accuracy is the “independence” of the estimates, that is, the estimates should differ in the expertise and background of the experts and in the estimation process used. A Delphi-style estimation process such as “planning poker,” in which developers reveal their independently produced effort estimates (their “cards”), looks particularly useful for software development effort estimation.

A structured group estimation process adds value over the mechanical combination of estimates, because knowledge sharing increases the total knowledge of the group. The negative effects of group judgment, such as “groupthink” and a greater willingness to take risks in a group (compared with individual decision making), have not been documented for software development effort estimation.

Estimation models are, on average, less accurate than expert estimates. However, because models and experts arrive at their estimates in different ways, combining the two approaches is particularly useful for improving accuracy.
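
The mechanical part of combining estimates is simple, as the sketch below shows. The function name `combine_estimates` and all figures are hypothetical. What can be stated with certainty is a mathematical property: the absolute error of the mean estimate is never larger than the mean of the individual absolute errors. How much is gained in practice depends on how independent the estimates really are.

```python
from statistics import mean, median


def combine_estimates(estimates):
    """Return the mean and median of independently produced estimates."""
    if not estimates:
        raise ValueError("need at least one estimate")
    return mean(estimates), median(estimates)


# Hypothetical work-hour estimates from three experts and one model.
estimates = [300, 420, 380, 500]
actual = 450  # hypothetical outcome, known only afterwards

combined, _ = combine_estimates(estimates)
error_of_mean = abs(combined - actual)
mean_of_errors = mean(abs(e - actual) for e in estimates)
```

In this made-up case the combined estimate misses by less than the average individual estimate does, although one of the individual estimates happens to be closer still.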

Estimates can be harmful

Estimates not only predict the future but often influence it. Estimates that are too low can lead to lower quality, possible rework at later stages, and a higher risk of project failure; estimates that are too high can reduce productivity in line with Parkinson’s law, which states that “work expands so as to fill the time available for its completion.”

That is why it is important to consider carefully whether an effort estimate is really needed at this stage. If it is not strictly necessary, it may be safer to work without an estimate, or to postpone estimation to a later stage when more information is available. Agile software development methods, which plan only the next sprint or release using feedback from previous sprints or releases, can be a good way to avoid the harm caused by estimates made too early.

What we do not know

There are several estimation problems for which no satisfactory solution has been found, despite the volume of research. Three of them illustrate most clearly how limited our knowledge in this area is.

How to accurately estimate effort in very large, complex software projects

Mega-projects place greater demands on effort estimation, not only because the stakes are high, but also because relevant experience and historical data are lacking. Many activities typical of mega-projects, such as resolving organizational issues that involve many stakeholders with different interests and goals, are very difficult to estimate accurately, because they tend to involve changes to business processes and complex interactions between project participants and existing software.

How to measure software size and complexity for accurate estimation

Despite years of research into measuring software size and complexity, none of the proposed metrics is good enough for effort estimation. In some contexts, size and complexity measures seem to make accurate estimation easier, but such contexts are quite rare.

How to measure and predict productivity

Even if you estimate the size and complexity of the project well, a reliable effort estimate also requires a reliable prediction of the productivity of the teams and team members who will work on it. This prediction is complicated by surprisingly large differences in productivity between teams and within teams. There is no reliable method for such prediction (with the possible exception of having candidates perform realistic trial tasks, known as trialsourcing).

At the moment, we do not even know whether software development projects exhibit economies of scale (productivity increases with project size) or diseconomies of scale (productivity decreases with project size). Most empirical studies seem to indicate that, on average, software projects show economies of scale, while most practitioners believe the opposite. Unfortunately, the study results on economies of scale appear to be a consequence of how the studies are designed and do not reveal the underlying relationship between project size and productivity.
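
One common way to frame the question, assumed here for illustration, is to fit effort = a · size^b to project data: an exponent b below 1 would indicate economies of scale, and an exponent above 1 diseconomies. The sketch below fits b by ordinary least squares on log-transformed values. The function name `fit_scale_exponent` and the synthetic data are hypothetical, and, in line with the caveat above, the fitted value depends heavily on how the analysis is set up and should not be read as revealing the underlying relationship.

```python
import math


def fit_scale_exponent(projects):
    """Least-squares fit of log(effort) = log(a) + b * log(size).

    projects: list of (size, effort) pairs with positive values.
    Returns (a, b) for the model effort = a * size ** b.
    """
    xs = [math.log(size) for size, _ in projects]
    ys = [math.log(effort) for _, effort in projects]
    n = len(projects)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two different project sizes")
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data generated with a known exponent of 1.2 (diseconomies).
projects = [(size, 2 * size ** 1.2) for size in (10, 50, 100, 400)]
a, b = fit_scale_exponent(projects)
```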

Thus, what we currently know about effort estimation in software development does not, in fact, solve the problem of estimating effort in real projects. However, we can point to several practices that typically make such estimates more reliable. In particular, companies are likely to improve the accuracy of their estimates if they:
- develop and apply simple models, adapted to their context, in combination with expert judgment;
- use historical data on estimation errors to construct min-max intervals;
- avoid situations in which the expert is exposed to misleading or irrelevant information;
- use checklists approved by the organization;
- use structured group estimation methods that guarantee the independence of the estimates;
- avoid early estimates based on incomplete information.

Highly competitive tenders that focus on the lowest price will very probably lead to the selection of an overly optimistic contractor and, as a result, to missed deadlines and poor software quality. In other fields, this is called the winner's curse. In the long run, most customers will come to realize that their obsession with the lowest contract price in software development ultimately harms project success. Until then, development companies should watch for situations in which they can be selected only on the basis of an overly optimistic estimate, and have strategies ready to counter or avoid the winner's curse.

Sources:

1. T. Halkjelsvik and M. Jørgensen, “From Origami to Software Development: A Review of Studies on Judgment-Based Predictions of Performance Time,” Psychological Bulletin, vol. 138, no. 2, 2012, pp. 238–271.
2. M. Jørgensen and K. Moløkken-Østvold, “How Large Are Software Cost Overruns? A Review of the 1994 CHAOS Report,” Information and Software Technology, vol. 48, no. 4, 2006, pp. 297–301.
3. M. Jørgensen, “A Review of Studies on Expert Estimation of Software Development Effort,” J. Systems and Software, vol. 70, no. 1, 2004, pp. 37–60.
4. T. Menzies and M. Shepperd, “Special Issue on Repeatable Results in Software Engineering Prediction,” Empirical Software Eng., vol. 17, no. 1, 2012, pp. 1–17.
5. J.J. Dolado, “On the Problem of the Software Cost Function,” Information and Software Technology, vol. 43, no. 1, 2001, pp. 61–72.
6. M. Jørgensen and D.I.K. Sjøberg, “An Effort Prediction Interval Approach Based on the Empirical Distribution of Previous Estimation Accuracy,” Information and Software Technology, vol. 45, no. 3, 2003, pp. 123–136.
7. B. Flyvbjerg, “Curbing Optimism Bias and Strategic Misrepresentation in Planning: Reference Class Forecasting in Practice,” European Planning Studies, vol. 16, no. 1, 2008, pp. 3–21.

About the author

Magne Jørgensen is a researcher at Simula Research Laboratory and a professor at the University of Oslo. His main research areas are effort estimation, bidding processes, outsourcing, and the assessment of software developer competence. He can be reached by email at magnej@simula.no

Source: https://habr.com/ru/post/235271/
