Proper typing: the underestimated aspect of clean code

Hello colleagues.

Not long ago, our attention was drawn to the nearly finished Manning book “Programming with Types”, which examines in depth the importance of correct typing and its role in writing clean, durable code.

On the author's blog we also found an article, apparently written in the early stages of work on the book, that gives an impression of its material. We invite you to discuss how interesting the author's ideas are and, potentially, the whole book.

Mars Climate Orbiter

The Mars Climate Orbiter spacecraft failed during orbital insertion and disintegrated in the Martian atmosphere because a software component developed by Lockheed produced momentum values in pound-force seconds, while another component developed by NASA expected momentum values in newton-seconds.

You can imagine a component developed by NASA, in approximately the following form:

    // Does not disintegrate as long as momentum is >= 2 N s
    void trajectory_correction(double momentum) {
        if (momentum < 2 /* N s */) {
            disintegrate();
        }
        /* ... */
    }

You can also imagine that the Lockheed component called the above code like this:

    int main() {
        trajectory_correction(1.5 /* lbf s */);
    }

A pound-force second (lbf s) is about 4.448222 newton-seconds (N s). So from Lockheed's point of view, passing 1.5 lbf s to trajectory_correction should be perfectly fine: 1.5 lbf s is approximately 6.672333 N s, well above the 2 N s threshold.

The problem lies in how the data is interpreted. The NASA component compares a value in lbf s against a threshold in N s without any conversion, mistakenly treating the lbf s input as N s. Since 1.5 is less than 2, the orbiter disintegrated. This is a well-known anti-pattern called “primitive obsession”.

Obsession with primitives

Primitive obsession occurs when we use a primitive data type to represent a value in the problem domain, which allows situations like the one described above. Representing postal codes as numbers, telephone numbers as strings, or N s and lbf s values as double-precision numbers are all examples of it.

It would be much safer to define a simple Ns type:

    struct Ns {
        double value;
    };

    bool operator<(const Ns& a, const Ns& b) {
        return a.value < b.value;
    }

Similarly, you can define a simple lbfs type:

    struct lbfs {
        double value;
    };

    bool operator<(const lbfs& a, const lbfs& b) {
        return a.value < b.value;
    }

Now you can implement a type-safe version of trajectory_correction:

    // Does not disintegrate as long as momentum is >= 2 N s
    void trajectory_correction(Ns momentum) {
        if (momentum < Ns{ 2 }) {
            disintegrate();
        }
        /* ... */
    }

If you call it with lbfs, as in the example above, the code simply does not compile because the types are incompatible:

    int main() {
        trajectory_correction(lbfs{ 1.5 });  // error: no conversion from lbfs to Ns
    }

Notice how the information about units, which would otherwise live in comments (2 /* N s */, /* lbf s */), has now been pulled into the type system and is expressed in code (Ns{ 2 }, lbfs{ 1.5 }).

Of course, you can provide a conversion from lbfs to Ns as an explicit operator:

    struct lbfs {
        double value;

        // 1 lbf s is about 4.448222 N s
        explicit operator Ns() const {
            return Ns{ value * 4.448222 };
        }
    };

Armed with this technique, you can call trajectory_correction with a static cast:

    int main() {
        trajectory_correction(static_cast<Ns>(lbfs{ 1.5 }));
    }

Here the code is correct because the cast multiplies the value by the conversion factor. The conversion can also be made implicit (in C++, by omitting the explicit keyword), in which case it is applied automatically. As a rule of thumb, you can follow one of the aphorisms from the Zen of Python here:

Explicit is better than implicit

The moral of this story is that, although we have very sophisticated type checkers today, they still need to be given enough information to catch errors of this kind. That information enters the program when we declare types that reflect the specifics of our problem domain.
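To put the pieces of this section together, here is a minimal self-contained sketch. The threshold and the conversion factor come from the text above; is_safe_momentum is a hypothetical stand-in for trajectory_correction, introduced only so the outcome can be inspected instead of calling disintegrate().

```cpp
// Sketch: unit types that the compiler will not mix up.
struct Ns { double value; };   // newton-seconds

bool operator<(const Ns& a, const Ns& b) { return a.value < b.value; }

struct lbfs {                  // pound-force seconds
    double value;

    // Explicit conversion: 1 lbf s is about 4.448222 N s.
    explicit operator Ns() const { return Ns{ value * 4.448222 }; }
};

// Hypothetical stand-in for trajectory_correction: reports whether the
// momentum meets the 2 N s threshold instead of calling disintegrate().
bool is_safe_momentum(Ns momentum) {
    return !(momentum < Ns{ 2 });
}
```

With these definitions, is_safe_momentum(static_cast<Ns>(lbfs{ 1.5 })) is true, because 1.5 lbf s is about 6.67 N s, while passing lbfs{ 1.5 } without the cast is rejected by the compiler.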

State space

Bad things happen when a program ends up in a bad state. Types help narrow the room for this to occur. Let's try to treat a type as a set of possible values. For example, bool is the set {true, false}, and a variable of this type can take one of these two values. Similarly, uint32_t is the set {0 ... 4294967295}. Viewing types this way, we can define the state space of our program as the product of the types of all variables that are live at a given point in time.

If we have a variable of type bool and a variable of type uint32_t, then our state space is {true, false} X {0 ... 4294967295}. This simply means that each variable can be in any of its possible states, and since we have two variables, the program can be in any combination of those states.
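The size of such a product can be checked at compile time. The sketch below is only an illustration: value_count and state_space_size are hypothetical names, and the helper is meant for bool and unsigned integer types narrower than 64 bits.

```cpp
#include <cstdint>
#include <limits>

// Number of distinct values of an unsigned integer type T (or bool),
// computed in a wider type so the count itself cannot overflow.
template <typename T>
constexpr std::uint64_t value_count() {
    return static_cast<std::uint64_t>(std::numeric_limits<T>::max()) -
           static_cast<std::uint64_t>(std::numeric_limits<T>::min()) + 1;
}

// The state space of one bool and one uint32_t is the product of the two sets.
constexpr std::uint64_t state_space_size =
    value_count<bool>() * value_count<std::uint32_t>();

static_assert(value_count<bool>() == 2, "bool has two values");
static_assert(state_space_size == 8589934592ULL, "2 * 2^32 combined states");
```

Two variables already give 2 * 2^32 = 8589934592 combined states, which is why shrinking the state space through types pays off quickly.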

Everything becomes much more interesting if we consider the functions that initialize the values:

    bool get_momentum(Ns& momentum) {
        if (!some_condition()) return false;
        momentum = Ns{ 3 };
        return true;
    }

In the above example, we take an Ns by reference and initialize it if some condition is met. The function returns true if the value has been initialized correctly. If the function cannot set the value for any reason, it returns false.

Looking at this situation in terms of state space, we can say that the state space is the product bool X Ns. If the function returns true, the momentum has been set and is one of the possible values of Ns. The problem is this: if the function returns false, the momentum has not been set. It still holds some value from the set of possible Ns values, but not a valid one. Bugs often arise in which such an invalid state accidentally starts to propagate:

    void example() {
        Ns momentum;
        get_momentum(momentum);  // return value ignored: momentum may be unset
        trajectory_correction(momentum);
    }

Instead, we simply have to do this:

    void example() {
        Ns momentum;
        if (get_momentum(momentum)) {
            trajectory_correction(momentum);
        }
    }

However, there is a better way, one that enforces the check:

    #include <optional>

    std::optional<Ns> get_momentum() {
        if (!some_condition()) return std::nullopt;
        return std::make_optional(Ns{ 3 });
    }

If you use optional, the state space of this function shrinks significantly: instead of bool X Ns we get Ns + 1, that is, every value of Ns plus one extra state. The function returns either a valid Ns value or nullopt to indicate the absence of a value. Now an invalid Ns simply cannot exist and propagate through the system. It is also much harder to forget to check the return value, since optional cannot be implicitly converted to Ns: we have to unwrap it explicitly:

    void example() {
        auto maybeMomentum = get_momentum();
        if (maybeMomentum) {
            trajectory_correction(*maybeMomentum);
        }
    }
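A self-contained version of the optional approach might look like the sketch below (it needs C++17). Since some_condition() is not shown in the article, the condition is passed in as a parameter here, and momentum_is_sufficient is a hypothetical caller added for illustration.

```cpp
#include <optional>

struct Ns { double value; };

// Hypothetical: the condition is a parameter so both outcomes can be
// exercised; the article's some_condition() is not shown.
std::optional<Ns> get_momentum(bool condition_met) {
    if (!condition_met) return std::nullopt;   // no value, and no bogus Ns
    return Ns{ 3 };
}

// Returns true only when a momentum was available and is >= 2 N s.
bool momentum_is_sufficient(bool condition_met) {
    std::optional<Ns> maybeMomentum = get_momentum(condition_met);
    if (!maybeMomentum) return false;          // absence must be handled here
    return maybeMomentum->value >= 2;
}
```

The caller can only reach the Ns through the optional, so the "no value" case is visible at the point of use instead of hiding in an ignored bool.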

In general, we want our functions to return a result or an error, not a result and an error. This rules out states in which we have an error alongside an invalid result that could then leak into further calculations.

From this point of view, throwing exceptions is normal, since it corresponds to the principle described above: the function either returns the result or throws an exception.
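As a rough illustration of the exception-based variant, the sketch below uses a hypothetical get_momentum_or_throw; the name, the condition parameter, and the error message are all assumptions made for the example.

```cpp
#include <stdexcept>

struct Ns { double value; };

// Hypothetical variant of get_momentum: the caller gets either a valid Ns
// or an exception, never an unset value paired with a status flag.
Ns get_momentum_or_throw(bool condition_met) {
    if (!condition_met) {
        throw std::runtime_error("momentum is not available");
    }
    return Ns{ 3 };
}
```

Code that receives the returned Ns can assume it is valid, because on failure control never reaches the statement that would have used it.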

RAII

RAII stands for Resource Acquisition Is Initialization, although the principle has more to do with releasing resources. The name first appeared in C++, but the pattern can be implemented in any language (see, for example, IDisposable in .NET). RAII provides automatic cleanup of resources.

What are resources? Here are some examples: dynamic memory, database connections, OS handles. In general, a resource is something taken from the outside world that must be returned once we no longer need it. We return a resource using the appropriate operation: freeing it, deleting it, closing it, and so on.

Since these resources are external, they are not explicitly expressed in our type system. For example, if we allocate a block of dynamic memory, we get a pointer on which we must later call delete:

    struct Foo {};

    void example() {
        Foo* foo = new Foo();
        /* use foo */
        delete foo;
    }

But what happens if we forget to do this, or something prevents us from calling delete?

    #include <exception>

    void example() {
        Foo* foo = new Foo();
        throw std::exception();
        delete foo;  // never reached
    }

In this case, delete is never reached and we get a resource leak. In general, such manual cleanup of resources is undesirable. For dynamic memory, we have unique_ptr to help us manage it:

    #include <exception>
    #include <memory>

    void example() {
        auto foo = std::make_unique<Foo>();
        throw std::exception();
    }

Our unique_ptr is a stack object, so when it goes out of scope (when the function returns, or when the stack is unwound because an exception was thrown), its destructor is called. That destructor is what calls delete. As a result, we no longer have to manage the memory resource ourselves: we hand this work over to the wrapper, which owns the resource and is responsible for releasing it.

Similar wrappers exist (or they can be created) for any other resources (for example, OS HANDLE from Windows can be wrapped into a type, in which case its destructor will call CloseHandle ).
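As a portable illustration of such a wrapper, the sketch below wraps a C standard library std::FILE* instead of a Windows HANDLE. The File class and its member names are hypothetical; only std::fopen and std::fclose are real library calls.

```cpp
#include <cstdio>

// Sketch of an RAII wrapper: owns a std::FILE* and closes it in the destructor.
class File {
public:
    File(const char* path, const char* mode) : handle_(std::fopen(path, mode)) {}

    // Runs on normal return and during stack unwinding alike.
    ~File() {
        if (handle_) std::fclose(handle_);
    }

    // Non-copyable: exactly one owner is responsible for the release.
    File(const File&) = delete;
    File& operator=(const File&) = delete;

    bool is_open() const { return handle_ != nullptr; }
    std::FILE* get() const { return handle_; }

private:
    std::FILE* handle_;
};
```

A function can declare a File on the stack, use get() with the usual stdio calls, and return or throw at any point; the file is closed either way. Deleting the copy operations keeps ownership unique, much as unique_ptr does for memory.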

The main conclusion here is never to clean up resources manually: either we use an existing wrapper or, if there is no suitable wrapper for our particular scenario, we implement one ourselves.

Conclusion

We started this article with a well-known example demonstrating the importance of typing, and then looked at three important aspects of using types to help write safer code:

The declaration and use of stronger types (as opposed to obsession with primitives).
Reducing the state space by returning a result or an error, not a result and an error.
RAII and automatic resource management.

So types are a great help in making code safer and easier to reuse.

Source: https://habr.com/ru/post/460149/
