Very soon, on December 17, a master class by Mark Paulk, a co-author of the Capability Maturity Model for Software, will be held at the Luxoft Training Center.
Mark Paulk develops and teaches courses on software development, software process improvement (CMM and CMMI), process maturity, agile methodologies, software project management and statistical analysis.
We invite you to read his article on the relationship between the organization of the software development process, software quality and personal productivity.
The impact of software process discipline on program quality and personal productivity in the Personal Software Process
Mark Paulk, Carnegie Mellon University
INTRODUCTION
One of the basic ideas in modern software process theory is that greater process discipline, or adherence to “best practices,” increases developer productivity and improves the quality of the resulting products. In practice, productivity and quality are hard to separate, because they are linked by a web of complex causal relationships; they are nevertheless commonly treated separately, since each is a complex, multidimensional concept. And although process discipline does not guarantee that a project will succeed, it does increase the likelihood of success.
The belief that process discipline adds value underlies models and standards such as the Capability Maturity Model for Software (CMM) and CMM Integration (CMMI). The nature of the improvement depends on the business context, but as a rule it shows up in productivity and quality. Data from the Personal Software Process (PSP) can be analyzed statistically to demonstrate the effect of process discipline on productivity and quality. The PSP data show productivity and quality rising as a disciplined development process is adopted, but they also illustrate some of the difficulties in defining these concepts. Watts Humphrey successfully applied the principles of process discipline in the CMM, the Team Software Process (TSP) and the PSP. Many studies show the effect of these principles on quality and productivity at the level of the organization, the team (project) and the individual, respectively.
The PSP incrementally applies the concepts of process discipline and quantitative management to the work of an individual developer in a classroom setting. There are four main processes in the PSP: PSP0, PSP1, PSP2 and PSP3. Each builds on the previous one by adding engineering or management activities. Adding activities step by step lets developers analyze the effect of each new technique on their personal effectiveness. Students typically complete 10 assignments. PSP data are well suited to research because many factors that affect project results and add noise to research data, such as requirements volatility and teamwork issues, are either controlled or eliminated entirely in the PSP. PSP students use a wide variety of programming languages, but to avoid confounding the data, only programs written in C were included in the sample (unless otherwise indicated), for a total of 2,435.
IMPACT ON QUALITY
For a simple analysis of quality, the PSP provides defect density data. In software development, the number of defects found in the first year of use (or the first six months, or some other period) is often used as a measure of quality. Since the PSP is taught in a classroom, its products are never field-tested. A reasonable substitute is the density of defects found during testing. The box plot (Fig. 1) shows measurable and statistically significant improvements in software quality across the four PSP processes. Quality improved by 79 percent, and variability decreased by 81 percent.
One caveat is necessary, however: although defect density can be used to measure quality, customers care about failures, not defects. Defects can go unnoticed for years without affecting the normal operation of the product. Even a quality measure such as mean time between failures is imperfect, since many aspects of a software product matter to customers (and the customer is the ultimate judge of quality). For example, Garvin identifies nine dimensions of quality (performance, performance, functionality, safety, standards compliance, reliability, durability, maintainability and aesthetics), which points to how complex the notion of “universal quality” is. The software quality characteristics listed in ISO 9126 are functionality, reliability, usability, efficiency, maintainability and portability. Unfortunately, in the PSP setting only defect density can be analyzed, without this larger context, and even there the difficulties can be serious.
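The defect density measure used in this section can be sketched as follows. This is a minimal illustration, not the study's analysis: the function names, the use of the median as the summary statistic, and the sample numbers are all assumptions made for the example.

```python
from statistics import median


def defect_density(defects_in_test: int, lines_of_code: int) -> float:
    """Defects found in testing per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects_in_test / (lines_of_code / 1000)


def percent_improvement(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Hypothetical (defects found in test, LOC) pairs; NOT data from the study.
psp0_programs = [(6, 120), (4, 100), (5, 90)]
psp3_programs = [(2, 200), (1, 180), (3, 250)]

# Summarizing by median is an assumption; the article does not say
# which statistic its percentages are based on.
d0 = median(defect_density(d, loc) for d, loc in psp0_programs)
d3 = median(defect_density(d, loc) for d, loc in psp3_programs)

print(f"PSP0 median density: {d0:.1f} defects/KLOC")
print(f"PSP3 median density: {d3:.1f} defects/KLOC")
print(f"Improvement: {percent_improvement(d0, d3):.0f}%")
```

Normalizing by size is what makes programs of different lengths comparable, which is also why the weaknesses of lines of code as a size measure carry over to this metric.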

Fig. 1. Improved PSP quality
In the box plot, the ends of each box mark the 25th and 75th percentiles of the data set, and the center line marks the median. In this variant of the box plot, the “whiskers” extend one and a half interquartile ranges from the box and can be used to identify outliers. The line running across the whole plot is the grand mean of the entire data set. The Each Pair Student's t and All Pairs Tukey-Kramer comparisons provide liberal and conservative tests, respectively; if the comparison circles do not overlap, or if the outside angle of their intersection is less than 90 degrees, the group means can be considered significantly different at the given significance level (α = 0.05).
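The box plot elements described above can be computed with a short sketch like the one below. The function name and the sample data are hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption; statistical packages differ slightly in how they interpolate percentiles.

```python
from statistics import mean, quantiles


def box_plot_summary(values):
    """Return the quartiles, 1.5*IQR fences, whisker ends and outliers."""
    data = sorted(values)
    # Three cut points dividing the data into four groups: Q1, median, Q3.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "mean": mean(data),
        # Whiskers end at the most extreme points still inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical defect densities (defects/KLOC); NOT data from the study.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
for name, value in summary.items():
    print(f"{name}: {value}")
```

With this sample the single extreme value falls outside the upper fence and is reported as an outlier rather than stretching the whisker.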

Fig. 2. Quality improvement by assignment
Another reasonable question is whether, given that the assignments are roughly the same size, the number of defects would be more informative than defect density. As Fig. 3 shows, the last five assignments are larger than the first five, but the variability in size underlines how poorly the number of lines of code (LOC) serves as a measure of project size. Since all students were given the same assignments and a single programming language was used, the differences must be attributed to the solution each programmer chose.

Fig. 3. Different sizes (number of lines of code) of PSP tasks
Despite the doubts about using lines of code as a size measure for normalizing defect counts, Fig. 4 shows that the number of defects in PSP assignments generally decreases even though the number of lines of code increases. This supports the hypothesis that process discipline in the PSP improves quality.

Fig. 4. The number of defects found during testing in the PSP
One may ask what factors other than process affect software quality. Plausible candidates include the programmer's experience and education, but the study found that neither affects quality. Analysis of a larger data set covering several programming languages suggests that programming language is not such a factor either, whereas the programmer's individual ability is, as shown in Fig. 5.
The programmers were divided into four ability levels based on their results on the first three assignments. As Fig. 5 shows, based on a repeated-measures model, the top-quartile (TQ) students consistently outperform the bottom-quartile (BQ) students, and the two groups in between, labeled B M2 and T M2, also keep their relative positions. Software quality improved by more than a factor of two for students in the top quartile and by more than a factor of four for students in the bottom quartile.
Programmer ability can of course be measured in many other ways. The measure chosen here focuses on software quality as observed in testing, which depends both on injecting few defects (high-quality development) and on finding and removing them effectively (high-quality peer review). The analysis does not separate these two causes.
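One way to picture the grouping into four ability levels is the sketch below. It is an illustration under stated assumptions, not the study's procedure: it assumes ability is scored as the mean defect density in testing over the first three assignments (lower is better), and the function name and data layout are hypothetical. Only the group labels are taken from the text.

```python
from statistics import mean

# Labels from the article, ordered from best to worst quality.
LABELS = ["TQ", "T M2", "B M2", "BQ"]


def ability_groups(early_densities: dict[str, list[float]]) -> dict[str, str]:
    """Assign each programmer to one of four equal-sized ability groups.

    `early_densities` maps a programmer ID to the defect densities of
    the first three assignments (an assumed scoring basis).
    """
    scores = {name: mean(vals) for name, vals in early_densities.items()}
    ranked = sorted(scores, key=scores.get)  # lowest density first
    n = len(ranked)
    return {name: LABELS[i * 4 // n] for i, name in enumerate(ranked)}


# Hypothetical data; NOT taken from the study.
sample = {
    "p1": [10, 12, 8],
    "p2": [55, 60, 65],
    "p3": [30, 28, 32],
    "p4": [90, 85, 95],
}
print(ability_groups(sample))
```

Because the groups are formed from rank order, they are always relative to the sample: the same programmer could land in a different group in a different class.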

Fig. 5. Quality trends by programmer ability
IMPACT ON PRODUCTIVITY
A similar analysis can be carried out for productivity (the ratio of output to resources expended), measured as lines of code per hour. As Fig. 6 shows, productivity across the PSP processes increased by 12 percent and variability decreased by 11 percent; the difference between PSP0 and PSP3 is statistically significant. Whether such an increase matters in practice is left to the reader's judgment. In many environments, factors such as requirements volatility would probably swamp so small an increase, but the effect is evident in the controlled PSP classroom environment.

Fig. 6. Productivity improvement in the PSP
The shortcomings of this analysis are even greater than those of the quality analysis. Lines of code per hour is a very poor measure of productivity. It is not clear, however, that alternatives such as function points, requirements or user stories per hour are any better, although all four measures (and others) are used in software projects. Another alternative is to analyze the number of hours spent on each assignment, as shown in Fig. 7. This measures productivity for each individual assignment but does not account for differences in the solutions chosen by each programmer (see Fig. 3).
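A small sketch shows how the two measures discussed above can disagree for the same assignment. The `Submission` class and the numbers are hypothetical and chosen only to illustrate the point.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """One programmer's solution to a single assignment (hypothetical)."""
    programmer: str
    lines_of_code: int
    minutes: float

    @property
    def hours(self) -> float:
        return self.minutes / 60

    @property
    def loc_per_hour(self) -> float:
        return self.lines_of_code / self.hours


# Two solutions to the same assignment; NOT data from the study.
submissions = [
    Submission("A", lines_of_code=300, minutes=240),  # long solution
    Submission("B", lines_of_code=120, minutes=150),  # compact solution
]

fastest_by_rate = max(submissions, key=lambda s: s.loc_per_hour).programmer
least_effort = min(submissions, key=lambda s: s.hours).programmer

for s in submissions:
    print(f"{s.programmer}: {s.loc_per_hour:.0f} LOC/hour, {s.hours:.1f} hours")
print("Highest LOC/hour:", fastest_by_rate)
print("Least effort:", least_effort)
```

Here the programmer with the longer solution looks more productive by lines of code per hour, while the one with the compact solution finished the same assignment in less time.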

Fig. 7. Effort spent on assignments
The analysis found no effect of education or years of experience on productivity. Across the entire PSP data set, C++ and Java showed “better productivity” than C and Visual Basic when measured in lines of code per hour. However, when the effort spent on the assignments is examined (as shown in Fig. 8 for assignment 10), no difference between programming languages appears.

Fig. 8. Differences in effort across programming languages (assignment 10)
When productivity is measured as lines of code per hour, programmer ability affects productivity. Moreover, productivity was found to rise with quality (as expected).
CONCLUSION
The results presented in this article echo those of earlier work on the PSP, but they also draw attention to some of the difficulties in interpreting such results. Doubts have been raised about whether conclusions about real projects can be drawn from assignments completed by students. However, PSP classes are often taught in industrial settings rather than university classrooms, and the developers in this sample have up to 34 years of experience, with a median of 7 years. In this respect, PSP students resemble typical project developers more than they resemble computer science students.
A more important issue is how well the assignments completed in PSP training correspond to real project work. Although each assignment falls within the scope of real-world development work, the PSP assignments do not touch on issues such as uncertain and highly volatile requirements or product integration, which cause the greatest difficulties in real projects and are the areas where experience and training can be decisive. The PSP teaches the fundamentals on which the Team Software Process (TSP) builds, and quality and productivity have been found to improve on TSP projects as well.
The hardest question raised by this study applies to the entire software industry: how can productivity and quality be measured reliably? As this work has shown, the common metrics of defect density and lines of code per hour have significant drawbacks. A potentially more reliable metric, such as rework (the percentage of time spent fixing defects), is not widely known and is rarely used. The metrics described here may still be useful, since better alternatives are unlikely to be found and adopted, but any conclusions based on these analyses should be weighed carefully before decisions are made. In evidence-based management, context matters a great deal.
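The rework metric mentioned above reduces to a simple ratio. The sketch below assumes that the time spent fixing each defect is available as a list of minutes, together with the total time for the program; the function name and the numbers are hypothetical.

```python
def rework_percentage(fix_minutes: list[float], total_minutes: float) -> float:
    """Share of total development time spent fixing defects, in percent."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return sum(fix_minutes) / total_minutes * 100


# Hypothetical defect-fix times for one program; NOT data from the study.
print(f"Rework: {rework_percentage([10, 5, 15], 120):.0f}% of total time")
```

Unlike lines of code per hour, this ratio does not depend on how the size of the program is counted, although it does depend on how carefully fix time is recorded.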
One reviewer of this article remarked that it would be fair to ask another question: if the PSP is so successful, why is it not used more widely? Unfortunately, many software engineering best practices are not adopted by companies despite strong evidence of their benefits. For example, a large body of research confirms the effectiveness and efficiency of inspections, yet how many organizations systematically use any form of peer review, let alone a more rigorous one? Researchers can collect evidence to support informed decisions and, as teachers and consultants, can advocate for the adoption of these and other best practices, but process discipline in software engineering is still relatively young.
Capability Maturity Model, CMM and CMMI are registered trademarks of Carnegie Mellon University.
CMM Integration, Personal Software Process, PSP, and SEI are service marks of Carnegie Mellon University.
Read more about Mark Paulk's master class on December 17 at Luxoft Training.