Program code and its metrics

One of the topics in programming, in which interest periodically appears and disappears, is the question of software code metrics. In large software environments, mechanisms for counting various metrics from time to time appear. Wave-like interest in the subject looks like this because so far the metrics have not come up with the main thing - what to do with them. That is, even if some tool allows you to calculate some metrics well, then what to do with this is often incomprehensible. Of course, metrics are both the quality control of the code (we do not write large and complex functions), and the "performance" (in quotes) of programmers, and the speed of development of the project. This article is an overview of the most well-known software code metrics.

Introduction

The article provides an overview of 7 classes of metrics and more than 50 of their representatives.

A wide range of software metrics will be presented. Naturally, it is not advisable to bring all existing metrics, most of them are never applied in practice either because of the impossibility of further use of the results, or because of the impossibility of automating measurements, or because of the narrow specialization of these metrics, but there are metrics that are sufficiently applied. often, and their review will be given below.
')
In the general case, the use of metrics allows project managers and enterprises to study the complexity of a developed or even developed project, to estimate the scope of work, the style of the program being developed and the effort spent by each developer to implement a particular solution. However, metrics can only serve as recommendations, they cannot be fully guided, since programmers, trying to minimize or maximize a measure for their program, may resort to tricks up to a program’s performance, while developing software. In addition, if, for example, a programmer wrote a small number of lines of code or made a small number of structural changes, this does not mean that he did not do anything, but could mean that the defect of the program was very difficult to find. The latter problem, however, can be partially solved by using complexity metrics, since in a more complex program, the error is harder to find.

1. Quantitative metrics

First of all, we should consider the quantitative characteristics of the source code of programs (in view of their simplicity). The most elementary metric is the number of lines of code (SLOC). This metric was originally developed to estimate project effort. However, due to the fact that the same functionality can be split into several lines or written into one line, the metric has become practically inapplicable with the advent of languages in which more than one command can be written into one line. Therefore, there are logical and physical lines of code. The logical lines of code are the number of program commands. This version of the description also has its drawbacks, as it strongly depends on the programming language used and the programming style [ 2 ].

In addition to SLOC, quantitative characteristics also include:

number of blank lines
number of comments
the percentage of comments (the ratio of the number of lines containing comments to the total number of lines, expressed as a percentage),
average number of lines for functions (classes, files),
average number of lines containing source code for functions (classes, files),
average lines for modules.

Sometimes they further distinguish the evaluation of the style of the program (F). It consists in dividing the program into n equal fragments and calculating the estimate for each fragment using the formula Fi = SIGN (Ncom.i / Ni - 0.1), where Ncomm. _i is the number of comments in the i-th fragment, N _i is the total number of lines of code in the i-th fragment. Then the total score for the entire program will be determined as follows: F = SUM F _i . [ 2 ]

Also, the group of metrics based on counting some units in the program code includes Holstead metrics [ 3 ]. These metrics are based on the following indicators:

n1 is the number of unique program statements, including characters

separators, procedure names and operation signs (operator dictionary),

n2 is the number of unique operands of the program (dictionary of operands),

N1 - the total number of operators in the program,

N2 - the total number of operands in the program,

n1 'is the theoretical number of unique operators,

n2 'is the theoretical number of unique operands.

Given the entered designations, you can determine:

n = n1 + n2 - dictionary of the program,

N = N1 + N2 - the length of the program,

n '= n1' + n2 'is the theoretical vocabulary of the program,

N '= n1 * log ₂ (n1) + n2 * log ₂ (n2) - the theoretical length of the program (for stylistically correct programs, the deviation of N from N' does not exceed 10%)

V = N * log ₂ n - the volume of the program,

V '= N' * log ₂ n 'is the theoretical scope of the program, where n * is the theoretical vocabulary of the program.

L = V '/ V - the level of programming quality, for an ideal program L = 1

L '= (2 n2) / (n1 * N2) - the level of programming quality, based only on the parameters of a real program without taking into account theoretical parameters,

E _C = V / (L ') 2 - the difficulty of understanding the program,

D = 1 / L '- the complexity of the coding program,

y '= V / D2 - expression language level

I = V / D - the information content of the program, this characteristic allows you to determine the mental cost of creating a program

E = N '* log ₂ (n / L) - assessment of the necessary intellectual efforts in the development of the program, characterizing the number of required elementary solutions when writing a program

When using Halstead metrics, the disadvantages associated with the ability to write the same functionality with different numbers of lines and operators are partially compensated.

Another type of quantitative software metrics is the Jilba metrics. They show the complexity of the software based on the saturation of the program with conditional operators or loop operators. This metric, despite its simplicity, fairly well reflects the complexity of writing and understanding the program, and adding such an indicator as the maximum level of nesting of conditional and cyclic operators, the effectiveness of this metric increases significantly.

2. Metrics of program control flow complexity

The next large class of metrics, no longer based on quantitative indicators, but on the analysis of the program control graph, is called the program control flow complexity metrics.

Before directly describing the metrics themselves, for a better understanding, the control graph of the program and the method of its construction will be described.

Let some program be presented. For this program, a directed graph is constructed that contains only one input and one output, while the vertices of the graph correspond to those parts of the program code that contain only sequential calculations, and there are no branching and cycle operators, and arcs correlate with transitions from block to block and branches of the program. The condition for constructing this graph: each vertex is reachable from the initial one, and the final vertex is reachable from any other vertex [4].

The most common estimate, based on the analysis of the resulting graph, is the cyclomatic complexity of the program (McCabe's cyclomatic number) [4]. It is defined as V (G) = e - n + 2p, where e is the number of arcs, n is the number of vertices, p is the number of connected components. The number of connected components of a graph can be considered as the number of arcs that must be added to convert a graph to strongly connected. A graph is strongly connected, any two vertices of which are mutually attainable. For graphs of correct programs, that is, graphs that do not have sections and “hanging” entry and exit points unreachable from the entry point, a strongly connected graph is usually obtained by closing an arc with a vertex denoting the end of the program to a vertex denoting the entry point into this program. In essence, V (G) determines the number of linearly independent contours in a strongly connected graph. So in correctly written programs, p = 1, and therefore the formula for calculating cyclomatic complexity takes the form:

V (G) = e - n + 2.

Unfortunately, this assessment is not able to distinguish between cyclic and conditional constructions. Another significant disadvantage of this approach is that programs represented by the same graphs may have completely different predicates in terms of complexity (a predicate is a logical expression containing at least one variable).

To correct this deficiency, G. Myers developed a new technique. He proposed to take the interval as an estimate (this estimate is also called interval) [V (G), V (G) + h], where h is zero for simple predicates and h = n-1 for n-place predicates. This method allows us to distinguish between different predicates of complexity, but in practice it is almost never used.

Another modification of the McCabe method is the Hansen method. The measure of the complexity of the program in this case is represented as a pair (cyclomatic complexity, number of operators). The advantage of this measure is its sensitivity to the structured software.

Chen's topological measure expresses the complexity of a program in terms of the number of border crossings between areas formed by the program graph. This approach is applicable only to structured programs that allow only a consistent connection of control structures. For unstructured programs, the Chen measure essentially depends on conditional and unconditional transitions. In this case, you can specify the upper and lower bounds of the measure. The upper one is m + 1, where m is the number of logical operators with their mutual nesting. The lower one equals 2. When the control graph of the program has only one connected component, the Chen measure coincides with the McCabe cyclomatic measure.

Continuing the topic of analysis of the control graph of the program, we can single out another subgroup of metrics - the Harrison and Meijel metrics.

These measures take into account the level of nesting and the length of the program.

Each vertex is assigned its own complexity in accordance with the operator that it represents. This initial vertex complexity can be calculated in any way, including the use of Holstead measures. For each predicate vertex, we select a subgraph generated by vertices that are the ends of arcs emanating from it, as well as vertices reachable from each such vertex (the lower boundary of the subgraph), and vertices lying on the paths from the predicate vertex to some lower boundary. This subgraph is called the sphere of influence of the predicate vertex.

The reduced complexity of the predicate vertex is the sum of the initial or reduced complexity of the vertices within its sphere of influence, plus the primary complexity of the predicate vertex itself.

The functional measure (SCOPE) of the program is the sum of the reduced difficulties of all the vertices of the control graph.

The functional relation (SCORT) is the ratio of the number of vertices in the control graph to its functional complexity, and the terminal ones are excluded from the number of vertices.

SCORT can take different values for graphs with the same cyclomatic number.

The Pivovarsky metric is another modification of the measure of cyclomatic complexity. It allows you to track differences not only between sequential and nested control structures, but also between structured and unstructured programs. It is expressed by the relation N (G) = v * (G) + SUMMAPi, where v * (G) is the modified cyclomatic complexity, calculated in the same way as V (G), but with one difference: the CASE operator with n outputs is treated as one logical operator, not as n - 1 operators.

Pi - nesting depth of the i-th predicate vertex. To calculate the depth of nesting of predicate vertices, the number of “spheres of influence” is used. The nesting depth is understood to be the number of all “spheres of influence” of predicates that are either completely contained in the sphere of the vertex under consideration or intersect with it. The depth of nesting increases due to nesting not of the predicates themselves, but of “spheres of influence”. Pivovarsky's measure increases with the transition from sequential programs to nested programs and further to unstructured ones, which is a huge advantage over many other measures of this group.

The measure of Woodward is the number of intersections of the arcs of the control graph. Since such a situation should not arise in a well-structured program, this metric is used mainly in weakly structured languages (Assembler, Fortran). The intersection point occurs when the control goes beyond two vertices, which are consecutive operators.

The boundary value method is also based on the analysis of the control graph of the program. To define this method, you must enter a few additional concepts.

Let G be the control graph of a program with a single initial and only final vertices.

In this graph, the number of incoming vertices at the arcs is called the negative degree of the vertex, and the number of outgoing arcs from the vertex is called the positive degree of the vertex. Then the set of vertices of the graph can be divided into two groups: vertices with a positive degree <= 1; vertices with a positive degree> = 2.

The vertices of the first group are called the host vertices, and the vertices of the second group are the vertices of selection.

Each receiving vertex has a reduced complexity equal to 1, except for the final vertex, the reduced complexity of which is 0. The reduced difficulties of all the vertices of the graph G are summed to form the absolute boundary complexity of the program. After that, the relative boundary complexity of the program is determined:

S0 = 1- (v-1) / Sa,

where S0 is the relative boundary complexity of the program, Sa is the absolute boundary complexity of the program, v is the total number of vertices of the program graph.

There is a Schneidevind metric, expressed in terms of the number of possible paths in the control graph.

3. Metrics of data management flow complexity

The next class of metrics is the metrics of data flow control complexity.

Chepin's metric: the essence of the method is to assess the informational strength of a single program module using the analysis of the nature of using variables from the input-output list.

The whole set of variables that make up the list of I / O, is divided into 4 functional groups:

1. P - input variables for calculations and for providing output,

2. M - modifiable, or created within the program variables,

3. C - variables involved in the management of the program module (control variables),

4. T - variables not used in the program ("parasitic").

Since each variable can simultaneously perform several functions, it is necessary to take it into account in each relevant functional group.

Metric Chepina:

Q = a1 * P + a2 * M + a3 * C + a4 * T,

where a1, a2, a3, a4 are weight coefficients.

The weights are used to reflect the different effects on the complexity of the program of each functional group. According to the author of the metric, the functional group C has the greatest weight, equal to 3, since it affects the program control flow. The weights of the other groups are distributed as follows: a1 = 1, a2 = 2, a4 = 0.5. The weight of the group T is not equal to 0, since the "parasitic" variables do not increase the complexity of the program data flow, but sometimes make it difficult to understand. Taking into account the weighting factors:

Q = P + 2M + 3C + 0.5T

The Spene metric is based on the localization of data accesses within each program section. Span is the number of statements containing the given identifier, between its first and last appearance in the program text. Therefore, an identifier that appears n times has a spen equal to n-1. With a large spine, testing and debugging is complicated.

Another metric that takes into account the complexity of the data flow is a metric that associates the complexity of programs with calls to global variables.

The pair “module-global variable” is denoted as (p, r), where p is the module that has access to the global variable r. Depending on the presence in the program of a real call to the variable r, two types of “module-global variable” pairs are formed: actual and possible. A possible reference to r with p shows that the region of existence of r includes p.

This characteristic is denoted by Aup and indicates how many times the modules Up actually accessed global variables, and the number Pup - how many times they could get access.

The ratio of the number of actual references to possible is determined

Rup = Aup / Pup.

This formula shows the approximate probability of a link of an arbitrary module to an arbitrary global variable. Obviously, the higher this probability, the higher the likelihood of an “unauthorized” change in any variable, which can significantly complicate the work associated with modifying the program.

Based on the concept of information flows, a Kafur measure has been created. To use this measure, the concepts of local and global flow are introduced: a local flow of information from A to B exists if:

1. Module A calls module B (direct local stream)

2. Module B causes module A and A to return B the value that is used in B (indirect local flow)

3. Module C calls modules A, B and transfers the result of module A to B.

Next, the concept of a global information flow should be given: a global information flow from A to B through the global data structure D exists if module A places information in D, and module B uses information from D.

Based on these concepts, the quantity I is introduced - the information complexity of the procedure:
I = length * (fan_in * fan_out) 2
Here:

length - the complexity of the text of the procedure (measured through any of the metrics of volume, such as the metrics of Halstead, McCabe, LOC, etc.)

fan_in - the number of local streams entering the procedure plus the number of data structures from which the procedure takes information

fan_out - the number of local streams outgoing from the procedure plus the number of data structures that are updated by the procedure

You can define the informational complexity of a module as the sum of the informational complexities of its member procedures.

The next step is to consider the informational complexity of the module with respect to some data structure. Informational measure of the complexity of the module relative to the data structure:

J = W * R + W * RW + RW * R + RW * (RW - 1)

W is the number of procedures that only update the data structure;

R - only read information from the data structure;

RW - and read and update information in the data structure.

Another measure of this group is the Oviedo measure. Its essence is that the program is divided into linear non-intersecting sections - rays of operators that form the control graph of the program.

The author of the metric proceeds from the following assumptions: the programmer can find the relationship between the defining and using occurrences of a variable inside the ray more easily than between the rays; the number of different determinative occurrences in each ray is more important than the total number of occurrences of the variables in each ray.

Let R (i) denote the set of defining occurrences of variables that are located within the radius of the ray i (the defining variable entry is in the radius of the ray if the variable is either local to it and has a defining entry, or for it is a defining entry in some previous ray, and there is no local definition along the way). Denote by V (i) the set of variables that use entries that already exist in ray i. Then the measure of the complexity of the i-th ray is given by:

DF (i) = SUM (DEF (v _j )), j = i ... || V (i) ||

where DEF (v _j ) is the number of defining occurrences of the variable v _j from the set R (i), and || V (i) || - power of the set V (i).

4. Metrics of control flow and program data complexity

The fourth class of metrics is metrics that are close to both the class of quantitative metrics, the class of metrics for the control flow of a program, and the class of metrics for the complexity of a data flow (strictly speaking, this class of metrics and the class of metrics for the control flow of a program are the same topological metrics, but it makes sense to separate them in this context for greater clarity). This class of metrics establishes the complexity of the program structure both on the basis of quantitative calculations, and on the basis of the analysis of control structures.

The first of these metrics is the testing M-measure [5]. A testing measure M is a measure of complexity that satisfies the following conditions:

The measure increases with the depth of nesting and takes into account the length of the program. A measure based on regular investment is closely related to the test measure. The idea of this measure of program complexity is to count the total number of characters (operands, operators, brackets) in a regular expression with the minimum necessary number of brackets describing the control graph of the program. All measures of this group are sensitive to the nesting of control structures and to the length of the program. However, the level of complexity of calculations increases.

Also, the measure of software quality is connectedness of program modules [6]. If the modules are strongly connected, then the program becomes difficult to modify and difficult to understand. This measure is not expressed numerically. Types of connectedness of modules:

Data connectivity - if the modules interact through the transfer of parameters and each parameter is an elementary information object. This is the most preferred type of connection.

Connectedness by data structure - if one module sends another composite information object (structure) for data exchange.

A control connection — if one sends an information object to another — a flag designed to control its internal logic.

Modules are linked by a common area if they refer to the same global data area. Connectivity (coupling) over a common area is undesirable, because, first, an error in a module using a global area may suddenly manifest itself in any other module; secondly, such programs are difficult to understand, since it is difficult for a programmer to determine which particular data is used by a specific module.

Connected content - if one of the modules is referenced to the inside of another. This is an unacceptable type of coupling, since it completely contradicts the principle of modularity, i.e. representation of the module in the form of a black box.

External connectivity - two modules use external data, such as a communication protocol.

Connectedness by means of messages is the most free kind of connectedness, modules are not directly connected with each other, they are communicated through messages that do not have parameters.

Lack of connectedness - modules do not interact with each other.

Subclass relatedness is the relationship between the parent class and the descendant class, with the descendant associated with the parent, and the parent with the descendant is not.

Time bound - two actions are grouped in one module only because, due to circumstances, they occur at the same time.

Another measure relating to the stability of the module is the Kolofello measure [7], which can be defined as the number of changes that need to be made in modules other than the module whose stability is checked, and these changes should apply to the module being tested.

The next metric from this class is the McClure metric. There are three stages of calculating this metric:

1. For each control variable i, the value of its complexity function C (i) is calculated by the formula: C (i) = (D (i) * J (i)) / n.

Where D (i) is a value that measures the scope of the variable i. J (i) is a measure of the complexity of the interaction of modules in terms of the variable i, n is the number of individual modules in the partitioning scheme.

2. For all modules included in the partitioning sphere, the value of their complexity functions M (P) is determined by the formula M (P) = fp * X (P) + gp * Y (P)
where fp and gp - respectively, the number of modules immediately preceding and immediately following the module P, X (P) is the difficulty of accessing the module P,

Y (P) is the complexity of call control from the P module of other modules.

3. The overall complexity MP of the hierarchical scheme of splitting the program into modules is given by the formula:

MP = SUM (M (P)) by all possible values of P - modules of the program.

This metric is focused on programs that are well structured, composed of hierarchical modules that define the functional specification and management structure. It is also assumed that in each module there is one entry point and one exit point, the module performs exactly one function, and the modules are invoked in accordance with the hierarchical control system that defines the call relationship on multiple program modules.

There is also a metric based on an informational concept - the Berlinger measure [8]. The measure of complexity is calculated as M = SUMMAf _i * log ₂ p _i , where f _i is the frequency of occurrence of the i-th symbol, p _i is the probability of its occurrence.

The disadvantage of this metric is that a program containing many unique characters, but in small quantities, will have the same complexity as a program containing a small number of unique characters, but in large quantities.

5. Object Oriented Metrics

In connection with the development of object-oriented programming languages, a new class of metrics, also called object-oriented metrics, has appeared. In this group, Martin’s metric sets and Chidamber and Kemerer’s metrics are the most commonly used. To begin, consider the first subgroup.

Before starting the consideration of Martin metrics, it is necessary to introduce the notion of a category of classes [ 9 ]. In reality, a class can rarely be reused in isolation from other classes. Virtually every class has a group of classes with which it works in cooperation, and from which it cannot be easily separated. To reuse such classes, you must reuse the entire class group. Such a group of classes is strongly connected and is called a category of classes. For the existence of a class category, the following conditions exist:

Classes within the class category are closed to any change attempts all together. This means that if one class should change, all classes in this category are more likely to change. If any of the classes is open to some kind of change, they are all open to that kind of change.

Classes in a category are reused only together. They are so interdependent and cannot be separated from each other. Thus, if any attempt is made to reuse one class in a category, all other classes must be reused with it.

Classes in a category share some common function or achieve some common goal.

Responsibility, independence and stability of a category can be measured by calculating the dependencies that interact with this category. Three metrics can be defined:

1. Ca: Centripetal clutch. The number of classes outside this category, which depend on the classes within this category.

2. Ce: Centrifugal clutch. The number of classes within this category, which depend on the classes outside this category.

3. I: Instability: I = Ce / (Ca + Ce). This metric has a range of values [0,1].

I = 0 indicates the most stable category.

I = 1 indicates the most unstable category.

You can define a metric that measures abstractness (if a category is abstract, then it is quite flexible and can be easily expanded) categories as follows:

A: Abstract: A = nA / nAll.

nA is the number of abstract_classes_in_category.

nAll - general_number_classes_in_category.

The values of this metric vary in the range [0,1].

0 = category is completely specific

1 = category is completely abstract.

Now, based on Martin’s metrics, we can construct a graph that reflects the relationship between abstraction and instability. If we construct on it a straight line defined by the formula I + A = 1, then on this straight line will lie the categories that have the best balance between abstractness and instability. This line is called the main sequence.

Then you can enter 2 more metrics:

Distance to the main sequence: D = | (A + I-1) / sqrt (2) |

The normalized distance to the main sequence: Dn = | A + I-2 |

Practically for all categories it’s true that the closer they are to the main sequence, the better.

The next subgroup of metrics is the Chidamber and Kemerer metrics [10]. These metrics are based on the analysis of class methods, inheritance tree, etc.

WMC (Weighted methods per class), total complexity of all class methods: WMC = SUMMA _i , i = 1 ... n, where c _i is the complexity of the i-th method, calculated by any of the metrics (Holsted, etc.) depending on the criterion of interest), if all methods have the same complexity, then WMC = n.

DIT (Depth of Inheritance tree) - the depth of the inheritance tree (the largest path through the hierarchy of classes to this class is from the ancestor class), the more the better, because with greater depth the data abstraction increases, the class saturation with methods decreases, however with a sufficiently large depth greatly increases the complexity of understanding and writing a program.

NOC (Number of children) - the number of descendants (immediate), the more, the higher the abstraction of data.

CBO (Coupling between object classes) - coupling between classes, shows the number of classes with which the original class is associated. For this metric, all the statements introduced earlier for connectedness of modules are true, that is, with a high CBO, data abstraction is reduced and class reuse is difficult.

RFC (Response for a class) - RFC = | RS |, where RS is the response set of a class, that is, a set of methods that can potentially be called by a class method in response to data received by a class object. That is, RS = (({M} ({R _i }), i = 1 ... n, where M is all possible methods of the class, R _i is all possible methods that can be called by the ith class. Then RFC will be the power of this set. The greater the RFC, the more difficult is testing and debugging.

LCOM (Lack of cohesion in Methods) - lack of cohesive methods. To determine this parameter, consider a class C with n methods M1, M2, ..., Mn, then {I1}, {I2}, ..., {In} are the sets of variables used in these methods. Now we define P - the set of pairs of methods that do not have common variables; Q - a set of pairs of methods that have common variables. Then LCOM = | P | - | Q |. A lack of coherence can be a signal that a class can be divided into several other classes or subclasses, so that it is better to increase the cohesion to increase data encapsulation and reduce the complexity of classes and methods.

6. Reliability metrics

The next type of metrics is metrics that are close to quantitative, but based on the number of errors and defects in the program. It makes no sense to consider the features of each of these metrics, it will be enough just to list them: the number of structural changes made since the last test, the number of errors detected during the code review, the number of errors detected during program testing, and the number of necessary structural changes necessary for correct work program. For large projects, these indicators are usually considered in relation to thousands of lines of code, i.e. average number of defects per thousand lines of code.

7. Hybrid metrics

In conclusion, it is necessary to mention another class of metrics, called hybrid. Metrics of this class are based on simpler metrics and represent their weighted sum. The first representative of this class is the Cocol metric. It is defined as follows:

H_M = (M + R1 * M (M1) + ... + Rn * M (Mn) / (1 + R1 + ... + Rn)

Where M is the base metric, Mi is other interesting measures, Ri is correct matched coefficients, M (Mi) - functions.

The functions M (Mi) and the coefficients Ri are calculated using regression analysis or task analysis for a specific program.

As a result of the research, the author of the metric identified three models for measures: The Holstead measure is used as the baseline, and these models are called the “best”, “random” and “linear” ones.

The metric of Zolnovsky, Simmons, Thayer also represents a weighted sum of various indicators. There are two variants of this metric:

(structure, interaction, volume, data) SUM (a, b, c, d).

(interface complexity, computational complexity, input / output complexity, readability) SUM (x, y, z, p).

The metrics used in each variant are selected depending on the specific task, the coefficients depending on the value of the metric for decision making in this case.

Conclusion

Summing up, I would like to note that not a single universal metric exists. Any controlled metric characteristics of the program should be controlled either depending on each other, or depending on the specific task, in addition, hybrid measures can be applied, but they also depend on simpler metrics and cannot be universal. Strictly speaking, any metric is only an indicator that depends heavily on the language and programming style; therefore, no measure can be raised to an absolute and any decisions can be made based only on it.

Source: https://habr.com/ru/post/106082/

All Articles