
Magic constants in algorithms

Introduction


Nowadays, principles of writing code (coding standards) that make it easier to maintain and develop are widely known. These principles are used by many software companies, and development tools and static code analyzers offer a variety of automation for enforcing them. At the same time, engineering problems in programming clearly call for a broader notion of "good code". We will try to work our way toward a discussion of "good" engineering code through a seemingly very narrow example: the practice of using constant parameters in algorithms.

I want to note right away that this article aims to raise the developer's self-awareness, especially at the start of their career. All the examples below should be treated as model situations, without reference to any particular computational problem or any particular algorithm. One could say that our goal is to formulate some patterns of high-quality engineering/algorithmic programming.

Let's think about whether the following code hurts our eyes:

if (abs(leftVoltage - rightVoltage - 150.7 * balanceCurrent) > 0.002) doSomethingImportant(); 

Personally, it hurts mine a lot. First of all, it is completely unclear what 150.7 is.
It may well be that 150.7 is the resistance of a precision resistor that varies from one instance of the device to another and should actually be read from a calibration file. And what is 0.002? Apparently it is some experimental threshold that may be refined later, although there is no comment to that effect.
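A minimal improvement could look something like the sketch below. The constant names and the loadCalibration() helper are assumptions made for illustration, not part of the original code:

// Hypothetical rewrite: name the constants, state their origin,
// and pull the per-device value from the calibration file.
const double preciseResistorOhm = loadCalibration("precise_resistor_ohm");
const double voltageToleranceV = 0.002;  // experimental threshold; document how it was chosen

double imbalance = leftVoltage - rightVoltage - preciseResistorOhm * balanceCurrent;
if (abs(imbalance) > voltageToleranceV) doSomethingImportant();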

First blood


Imagine a junior programmer, Alyosha, who has been given the task of automating quality control of rivets for Inhabited Galactic spacecraft. The rivets move along a conveyor belt and are photographed by a digital camera. Alyosha needs to write code that analyzes the photographs of these rivets and decides whether each rivet's shape conforms to the standard.



Suppose that, to the satisfaction of his superiors, Alyosha quickly implemented some rivet evaluation algorithm. The algorithm perfectly separated the images of good and bad rivets in the test set, the problem was considered solved, and Alyosha was moved to another project.

Then the following happened. Rivets started arriving a little faster, the conveyor speed was increased, and engineer Borya slightly shortened the camera's exposure time.
As a result, the images coming from the camera became somewhat darker, although the rivets were still clearly visible. At the same time, Alyosha's algorithm sometimes began to erroneously reject good rivets. What happened?

Let $I$ be the input monochrome image of size $W(I) \times H(I)$, with pixel coordinates $x$ and $y$. Alyosha noticed that in all the training images the rivet pixels $Z$ could be found by threshold filtering:

$$(x, y) \in Z \Longleftrightarrow I_{min} < I(x, y) < I_{max}.$$


At the same time, a good-quality rivet $Z_{good}$ could be recognized by its area:

$$A_{min} < \mathrm{Area}(Z_{good}) < A_{max}.$$


In total, Alyosha introduced four empirical constants $I_{min}$, $I_{max}$, $A_{min}$ and $A_{max}$, at least one of which ($I_{min}$) lost its relevance after the camera was reconfigured.
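As a model reconstruction (not the article's actual code; the image type, the constant values and all names below are assumptions), Alyosha's check might have looked roughly like this:

#include <cstdint>
#include <vector>

struct Image {
    int w, h;
    std::vector<uint8_t> px;              // row-major monochrome pixels
    uint8_t at(int x, int y) const { return px[y * w + x]; }
};

// Hardcoded empirical constants -- exactly the problem discussed above.
const uint8_t I_MIN = 40, I_MAX = 200;    // brightness band for rivet pixels
const int A_MIN = 500, A_MAX = 2000;      // area band, in absolute pixels

bool rivetIsGood(const Image& img) {
    int area = 0;
    for (int y = 0; y < img.h; ++y)
        for (int x = 0; x < img.w; ++x)
            if (I_MIN < img.at(x, y) && img.at(x, y) < I_MAX)
                ++area;                   // pixel belongs to the rivet Z
    return A_MIN < area && area < A_MAX;  // breaks when brightness or resolution changes
}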

With a little more experience, Alyosha could have done the following: moved the values of the necessary constants into a configuration file and supplemented the project documentation with a procedure for determining them in practice. That would be better than simply hiding these values inside the code, but it would saddle Alyosha's colleagues with the extra chore of manually tuning his algorithm. Of course, such a solution can hardly be called satisfactory, although in particularly difficult situations it too has a right to exist.
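A minimal sketch of that halfway solution, assuming a simple key=value config format (the file name and keys are invented for illustration):

#include <fstream>
#include <map>
#include <sstream>
#include <string>

// Toy reader for lines of the form "key=value"; shown only to
// illustrate moving the constants out of the code.
std::map<std::string, double> readParams(const std::string& path) {
    std::map<std::string, double> params;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        std::string key;
        double value;
        if (std::getline(ss, key, '=') && (ss >> value))
            params[key] = value;
    }
    return params;
}

// Usage: auto p = readParams("rivet_inspection.cfg");
//        double iMin = p["I_min"];  // a documented tuning procedure is still required!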

What would a more reasonable solution look like? The parameters of the band-pass filters for brightness and area could be converted into values relative to the average brightness and the image area. For example, $A_{min}$ could be replaced by $\tilde{A}_{min} \cdot W(I) \cdot H(I)$, where $\tilde{A}_{min}$ is still an empirical value, but a much more invariant one.
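In code the change is small (again a sketch; the relative values are made up for illustration):

// Relative, resolution-invariant parameters (values are illustrative).
const double REL_A_MIN = 0.002;   // fraction of total image area
const double REL_A_MAX = 0.008;

bool rivetAreaIsGood(int area, int w, int h) {
    const double total = static_cast<double>(w) * h;
    return REL_A_MIN * total < area && area < REL_A_MAX * total;
}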

Let's try to guess what happened next at the spacecraft factory.
Since the quality of rivet rejection was still insufficient, chief technologist Vitaly, a man far removed from the subtleties of programming, decided to urgently install a more expensive camera with higher sensitivity and twice the resolution on the evaluation stand. The logical result of such teamwork is 100% rejection, since every rivet became twice as large in pixel count and none of them passed the band-pass filter. Perhaps competent project management would have saved the day, but that is not our topic now.

Input normalization


Many problems with choosing constants can be avoided by reducing the input data to a uniform form: images to a single histogram shape, 3D models to a single resolution, and so on. A common argument against normalization is that it loses part of the original information. That is true, but the resulting simplification of working with the algorithm's parameters often outweighs this drawback.

The very act of posing the problem of maximally normalizing the input data prompts the developer to think: what happens if images arrive rotated by 90 degrees? If the dimensions arrive in inches? If someone feeds in a high-resolution 3D model that captures the fine texture of the material?
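As an illustration (a sketch, not a prescription), brightness normalization could bring every incoming image to a fixed mean level, making brightness thresholds like the ones above much more stable against exposure changes:

#include <algorithm>
#include <cstdint>
#include <vector>

// Rescales pixel values so the image mean lands on a fixed target.
// TARGET_MEAN is an internal convention, not an empirical constant:
// every image is brought to the same reference before thresholding.
void normalizeMeanBrightness(std::vector<uint8_t>& px) {
    const double TARGET_MEAN = 128.0;
    double sum = 0;
    for (uint8_t v : px) sum += v;
    if (px.empty() || sum == 0) return;
    const double gain = TARGET_MEAN / (sum / px.size());
    for (uint8_t& v : px)
        v = static_cast<uint8_t>(std::min(255.0, v * gain));
}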

An excellent example where normalization is nevertheless undesirable is searching for overexposed pixels (highlights) in photographs. Pixels at the maximum RGB brightness (255, 255, 255) can form thin streaks that get slightly smeared when the picture is scaled. The brightness of the overexposed pixels is then averaged with that of neighboring pixels, and it becomes almost impossible to distinguish them from merely bright points in the image. Accordingly, it becomes impossible to precisely select and cut the highlights out of the image.
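A minimal sketch of such a detector (the channel layout is assumed for illustration); it is only meaningful when run on the original, unscaled image:

#include <cstdint>
#include <vector>

// Detects saturated (overexposed) pixels by their exact (255,255,255)
// signature. After any resampling this signature is destroyed by
// averaging, so the check must run before normalization.
std::vector<bool> findHighlights(const std::vector<uint8_t>& r,
                                 const std::vector<uint8_t>& g,
                                 const std::vector<uint8_t>& b) {
    std::vector<bool> mask(r.size());
    for (size_t i = 0; i < r.size(); ++i)
        mask[i] = (r[i] == 255 && g[i] == 255 && b[i] == 255);
    return mask;
}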



Calibration Constants


When processing sufficiently low-level data, the classical calibration problem arises. It may be converting the voltage across a thermistor into a temperature, or converting a radar response into the coordinates of the observed object. All such transformations are determined by a set of constant parameters, and the procedure for determining these parameters is called calibration.

Calibration parameters usually come from external sources. It is a terrible sin to hardcode these parameters. If such code is not weeded out immediately during code review, it can quietly spoil the behavior of the software product as a whole for a long time while remaining above suspicion: the system will simply work slightly inaccurately, and an error message is unlikely to appear anywhere.
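A sketch of keeping calibration outside the code, using the thermistor example above (the Steinhart-Hart parameterization and all names here are illustrative assumptions, not the article's own example):

#include <cmath>

// Calibration parameters live outside the code and are injected here.
struct ThermistorCalibration {
    double a, b, c;    // Steinhart-Hart coefficients, per device instance
};

// Converts thermistor resistance (ohms) to temperature (kelvin) using
// the Steinhart-Hart equation: 1/T = a + b*ln(R) + c*ln(R)^3.
double resistanceToKelvin(double ohms, const ThermistorCalibration& cal) {
    const double lnR = std::log(ohms);
    return 1.0 / (cal.a + cal.b * lnR + cal.c * lnR * lnR * lnR);
}

// The coefficients are read from the device's calibration file at
// startup -- never hardcoded, since they differ between instances.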

For completeness, however, it should be noted that in some problems auto-calibration is possible, when the internal dependencies of the observed values also allow the calibration parameters to be computed. See, for example, the final chapters of Multiple View Geometry, or the work on Uncalibrated Structure from Motion, where camera calibration parameters are determined from a set of overlapping images taken from different angles.

General remarks


Give constants meaningful names and comment on their meaning. Remember that this is a very fragile part of the code, protected neither by the language's type system nor by the wisdom of the compiler.

Group algorithm parameters into structures, as in the sketch below. This usually improves the readability of the code and helps you avoid overlooking an important parameter while debugging.
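A minimal sketch of such grouping (names are illustrative; Image is the type from the earlier sketch):

// All tunable parameters of the inspection algorithm in one structure,
// so nothing gets lost during debugging.
struct RivetInspectionParams {
    double iMin, iMax;        // brightness band, after normalization
    double relAMin, relAMax;  // area band, as a fraction of W(I)*H(I)
};

// The whole configuration travels as a single argument:
bool rivetIsGood(const Image& img, const RivetInspectionParams& params);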

If you know that one parameter is expressed through others, be sure to reflect that dependency in the code. Otherwise someone, with the best of intentions, will change your suboptimal but consistent parameters into inconsistent ones. For example, suppose you are looking for circles in an image with radius $R = 10$ and area $S = 314$. This is bad, because you really have one parameter, the radius $R = 10$, with $S = \pi R^2$. Code with two parameters is easily broken by a programmer who has forgotten school math; code with one parameter looks far more reliable.
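In code, the dependent quantity should be computed rather than stored separately (a sketch):

#include <cmath>

// One empirical parameter (the radius); the area is derived from it,
// so the two can never drift out of sync.
const double CIRCLE_RADIUS = 10.0;
const double CIRCLE_AREA =
    std::acos(-1.0) * CIRCLE_RADIUS * CIRCLE_RADIUS;  // pi * R^2, ~314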

It also happens that some non-empirical constant enters the algorithm: for example, you may need to bound the tail of some probability distribution. The accuracy of such an estimate is determined mostly by the mathematical preparation of the algorithm's author.
Estimate constants can sometimes be improved, but such an improvement is usually a separate, isolated task. It is important to note this fact in a comment for other developers.
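For instance (an illustrative example, not from the original text), Chebyshev's inequality bounds the tail of any finite-variance distribution:

$$P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}.$$

The constant $1/k^2$ is not empirical: it cannot be tuned from data, but it can sometimes be sharpened mathematically (for example, to an exponential bound when $X$ is known to be sub-Gaussian), and that is exactly the kind of isolated improvement worth flagging in a comment.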

Summary


Empirical constants are a natural part of algorithmic code, but one that demands close attention from the developer. Let us list once more the rules for working with constants that we discussed above:

- Normalize the input data wherever possible, and express the remaining empirical parameters as relative, invariant quantities.
- Give constants meaningful names, comment on their meaning and origin, and group them into structures.
- If one parameter is expressed through others, reflect that dependency in the code instead of storing both.
- Take calibration constants from external sources; hardcoding them is a sin that code review should catch.
- Mark non-empirical estimate constants as such, so that nobody tries to tune them experimentally.

In general, the parameters that today appear to the developer as constants may well become the subject of optimization tomorrow.
If all the kinds of parameters have been introduced correctly, building such an optimization procedure is greatly simplified: calibration constants and estimate constants are left untouched, and only the empirical parameters are optimized. I hope this simplicity will serve you as one practical criterion of the quality of engineering code.

Source: https://habr.com/ru/post/328244/

