Perils of Constructors

Hello, Habr! This is a translation of the article "Perils of Constructors" by Aleksey Kladov.

One of my favorite Rust blog posts is "Things Rust Shipped Without" by Graydon Hoare. For me, the absence of a feature that lets you shoot yourself in the foot usually matters more than expressiveness. In this slightly philosophical essay, I want to talk about my favorite feature missing from Rust: constructors.

What is a constructor?

Constructors are commonly used in OO languages. The task of the constructor is to fully initialize the object before the rest of the world sees it. At first glance, this seems like a really good idea:

You establish the invariants in the constructor.
Each method takes care to preserve the invariants.
Together, these two properties mean that you can think about objects in terms of invariants rather than specific internal states.

The constructor plays the role of the induction base here, being the only way to create a new object.

Unfortunately, there is a hole in this reasoning: the constructor itself observes the object in an unfinished state, which creates many problems.

The value of this

When the constructor initializes the object, it starts with some empty state. But how do you define this empty state for an arbitrary object?

The easiest way to do this is to set all fields to their default values: false for bool, 0 for numbers, null for all references. But this approach requires every type to have a default value, and it introduces the infamous null into the language. This is the path Java took: when object creation starts, all fields are 0 or null.

With this approach, it is very hard to get rid of null afterwards. Kotlin is a good example to study. Kotlin uses non-nullable types by default, but it has to work with the pre-existing JVM semantics. The language design hides this fact well and works well in practice, but it is unsound. In other words, constructors make it possible to bypass Kotlin's null checks.

Kotlin's main trick is to encourage so-called "primary constructors", which declare a field and assign a value to it at the same time, before any user code runs:

    class Person(
        val firstName: String,
        val lastName: String
    ) { ... }

Another option: if a field is not declared in the constructor, the programmer must initialize it immediately:

    class Person(val firstName: String, val lastName: String) {
        val fullName: String = "$firstName $lastName"
    }

An attempt to use a field before it is initialized is rejected statically:

    class Person(val firstName: String, val lastName: String) {
        val fullName: String

        init {
            println(fullName) // error: variable must be initialized
            fullName = "$firstName $lastName"
        }
    }

But with a bit of creativity, anyone can get around these checks. For example, a method call will do:

    class A {
        val x: Any

        init {
            observeNull()
            x = 92
        }

        fun observeNull() = println(x) // prints null
    }

    fun main() {
        A()
    }

Capturing this in a lambda (written { args -> body } in Kotlin) works as well:

    class B {
        val x: Any = { y }()
        val y: Any = x
    }

    fun main() {
        println(B().x) // prints null
    }

Examples like these look contrived (and they are), but I have found similar errors in real code (Kolmogorov's zero-one law of software engineering: in a sufficiently large code base, any piece of code is almost guaranteed to exist unless the compiler forbids it statically, in which case it almost certainly does not exist).

The reason Kotlin can live with this unsoundness is the same as with covariant arrays in Java: the checks still happen at run time. In the end, I would not want to complicate Kotlin's type system just to reject the cases above at compile time: given the existing constraints (JVM semantics), the cost/benefit ratio of run-time checks is much better than that of static ones.

But what if the language does not have a reasonable default value for every type? For example, in C++, where user-defined types are not necessarily references, you cannot just assign null to every field and call it a day! Instead, C++ uses special syntax to set the initial values of fields, the initializer list:

    #include <string>
    #include <utility>

    class person {
    public:
        person(std::string first_name, std::string last_name)
            : first_name(std::move(first_name))
            , last_name(std::move(last_name))
        {}

    private:
        std::string first_name;
        std::string last_name;
    };

Since this is special syntax, the rest of the language does not work smoothly with it. For example, it is hard to fit arbitrary operations into initializer lists, because C++ is not an expression-oriented language (which is fine in itself). To handle exceptions thrown in initializer lists, you have to use yet another obscure language feature (the function-try-block).

Calling methods from the constructor

As the Kotlin examples hint, everything falls apart as soon as we try to call a method from the constructor. Methods generally expect the object accessible through this to be fully constructed and valid (consistent with its invariants). But in Kotlin or Java, nothing prevents you from calling methods from the constructor, so we can accidentally operate on a half-constructed object. The constructor promises to establish the invariants, yet it is also the easiest place to violate them.

Particularly strange things happen when the base class constructor calls a method overridden in a derived class:

    abstract class Base {
        init {
            initialize()
        }

        abstract fun initialize()
    }

    class Derived : Base() {
        val x: Any = 92

        override fun initialize() = println(x) // prints null!
    }

Just think about it: code of an arbitrary class runs before its constructor is called! Similar C++ code leads to even more interesting results. Instead of the derived class's function, the base class's function is called. This makes some sense, because the derived class has not been initialized yet (remember, we cannot just say that all fields are null). However, if the function in the base class is pure virtual, calling it is undefined behavior.

Constructor signature

Violating invariants is not the only problem with constructors. Their signature has a fixed name (empty) and a fixed return type (the class itself). This makes constructor overloads hard for people to understand.

Pop quiz: what does std::vector<int> xs(92, 2) correspond to?

a. A vector of twos of length 92

b. [92, 92]

c. [92, 2]
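For comparison, here is a minimal Rust sketch of the same two readings. In C++, std::vector<int> xs(92, 2) builds 92 elements that are all equal to 2; Rust spells "repeat a value" and "list the elements" with visibly different syntax. The helper function names are hypothetical and exist only for illustration.

```rust
/// 92 elements, each equal to 2: the Rust spelling of what
/// `std::vector<int> xs(92, 2)` builds in C++.
fn ninety_two_twos() -> Vec<i32> {
    vec![2; 92]
}

/// Exactly two elements, 92 and 2.
fn ninety_two_and_two() -> Vec<i32> {
    vec![92, 2]
}

fn main() {
    let repeated = ninety_two_twos();
    let listed = ninety_two_and_two();
    println!("repeated has {} elements", repeated.len());
    println!("listed is {:?}", listed);
}
```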

Problems with the return type usually arise when the object cannot be created. You cannot just return Result<MyClass, io::Error> or null from a constructor!

This is often used as an argument that C++ is hard to use without exceptions, and that using constructors forces you to use exceptions too. However, I do not think this argument is correct: factory methods solve both problems, because they can have arbitrary names and return arbitrary types. I believe the following pattern can sometimes be useful in OO languages:

Create one private constructor that takes the values of all fields as arguments and simply assigns them. Such a constructor works like a struct literal in Rust. It may also check invariants, but it should not do anything else with the arguments or fields.
Provide public factory methods with appropriate names and return types for the public API.
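The pattern above can be sketched in Rust terms as follows. This is only an illustration: the Percent type, the PercentError enum and all method names are hypothetical, and the private from_raw function stands in for the "private constructor" of an OO language.

```rust
/// Hypothetical type with the invariant `value <= 100`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Percent {
    value: u8, // private field: only this module can build a `Percent`
}

/// Hypothetical error type for the factories below.
#[derive(Debug, PartialEq)]
pub enum PercentError {
    OutOfRange(u32),
    NotANumber,
}

impl Percent {
    /// The "private constructor": assigns the field and checks the
    /// invariant, and does nothing else.
    fn from_raw(value: u8) -> Percent {
        assert!(value <= 100, "invariant violated: {} > 100", value);
        Percent { value }
    }

    /// Public factory with a descriptive name and a fallible return type.
    pub fn from_int(n: u32) -> Result<Percent, PercentError> {
        if n > 100 {
            return Err(PercentError::OutOfRange(n));
        }
        Ok(Percent::from_raw(n as u8))
    }

    /// A second public factory; it differs from the first by name,
    /// not only by argument types.
    pub fn parse(text: &str) -> Result<Percent, PercentError> {
        let n: u32 = text.trim().parse().map_err(|_| PercentError::NotANumber)?;
        Percent::from_int(n)
    }

    pub fn value(self) -> u8 {
        self.value
    }
}

fn main() {
    println!("{:?}", Percent::parse("42"));
    println!("{:?}", Percent::from_int(150));
}
```

The failure cases are ordinary return values here, so no exceptions are needed to report them.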

A related problem with constructors is that they are special, so you cannot abstract over them. In C++, "has a default constructor" or "has a copy constructor" cannot be expressed any more simply than "a certain syntax works". Compare this to Rust, where these concepts have proper signatures:

    trait Default {
        fn default() -> Self;
    }

    trait Clone {
        fn clone(&self) -> Self;
    }

Life without constructors
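Because these are ordinary traits, generic code can require them by name. The sketch below uses the real std Default and Clone traits; the default_pair function and the Settings struct are hypothetical names made up for this example.

```rust
/// Works for any type that has a "default constructor" and a
/// "copy constructor", expressed as plain trait bounds.
fn default_pair<T: Default + Clone>() -> (T, T) {
    let first = T::default();
    let second = first.clone();
    (first, second)
}

/// Hypothetical user type; both traits are derived.
#[derive(Debug, Default, Clone, PartialEq)]
struct Settings {
    retries: u32,
    name: String,
}

fn main() {
    let (a, b) = default_pair::<Settings>();
    println!("{:?} {:?}", a, b);
    let (x, y) = default_pair::<i32>();
    println!("{} {}", x, y);
}
```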

Rust has only one way to create a struct: provide values for all of its fields. Factory functions, such as the conventional new, play the role of constructors, but, crucially, they do not let you call any methods until you have at least a more or less valid instance of the struct.
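As a minimal sketch, here is a Rust counterpart of the Kotlin Person class from earlier. The method names full_name and sort_key are hypothetical. Inside new there is no self at all, only local variables, so a half-initialized Person cannot be observed.

```rust
#[derive(Debug)]
struct Person {
    first_name: String,
    last_name: String,
    full_name: String,
}

impl Person {
    fn new(first_name: &str, last_name: &str) -> Person {
        // Only locals exist at this point; no method of `Person`
        // that takes `self` can be called yet.
        let full_name = format!("{} {}", first_name, last_name);
        // The struct comes into existence with every field already set.
        Person {
            first_name: first_name.to_string(),
            last_name: last_name.to_string(),
            full_name,
        }
    }

    fn full_name(&self) -> &str {
        &self.full_name
    }

    fn sort_key(&self) -> String {
        format!("{}, {}", self.last_name, self.first_name)
    }
}

fn main() {
    let person = Person::new("Ada", "Lovelace");
    println!("{} / {}", person.full_name(), person.sort_key());
}
```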

The downside of this approach is that any code can create a struct, so there is no single place, such as a constructor, to maintain invariants. In practice, privacy solves this easily: if the fields of a struct are private, the struct can only be created inside the same module. Within a single module, it is not hard to stick to the convention "every way of creating the struct must go through new". You can even imagine a language extension that lets you mark some functions with a #[constructor] attribute, so that struct literal syntax is available only in the marked functions. But, again, additional language machinery seems redundant to me: following local conventions takes little effort.
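A minimal sketch of the privacy approach, with a hypothetical Even type whose invariant is that the wrapped number is even. Outside of mod even the struct literal is not available, so new is the only entry point.

```rust
mod even {
    /// Hypothetical type. Invariant: `value` is even.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Even {
        value: u64, // private: `Even { value: 3 }` is rejected outside this module
    }

    impl Even {
        /// By local convention, the only function that uses the struct literal.
        pub fn new(value: u64) -> Option<Even> {
            if value % 2 == 0 {
                Some(Even { value })
            } else {
                None
            }
        }

        pub fn get(self) -> u64 {
            self.value
        }

        /// Creates the result through `new`, following the module convention.
        /// (Overflow handling is left out of this sketch.)
        pub fn plus(self, other: Even) -> Even {
            Even::new(self.value + other.value).expect("sum of two even numbers is even")
        }
    }
}

use even::Even;

fn main() {
    let sum = Even::new(2).unwrap().plus(Even::new(6).unwrap());
    println!("{:?} {:?}", sum, Even::new(3));
}
```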

Personally, I believe this trade-off looks exactly the same for contract programming in general. Contracts like "not null" or "positive value" are best encoded in types. For complex invariants, writing assert!(self.validate()) in each method is not that hard. Between these two patterns, there is little room for #[pre] and #[post] conditions implemented at the language level or with macros.
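Here is a sketch of the assert!(self.validate()) pattern for a more complex invariant. The SortedSet type and its methods are hypothetical; the invariant is that items is strictly increasing, which a type alone does not easily express.

```rust
/// Hypothetical type. Invariant: `items` is strictly increasing
/// (sorted, no duplicates).
#[derive(Debug, Default)]
struct SortedSet {
    items: Vec<i32>,
}

impl SortedSet {
    /// Checks the whole invariant in one place.
    fn validate(&self) -> bool {
        self.items.windows(2).all(|pair| pair[0] < pair[1])
    }

    /// Returns true if `x` was not present before.
    fn insert(&mut self, x: i32) -> bool {
        assert!(self.validate());
        let inserted = match self.items.binary_search(&x) {
            Ok(_) => false,
            Err(position) => {
                self.items.insert(position, x);
                true
            }
        };
        assert!(self.validate());
        inserted
    }

    fn contains(&self, x: i32) -> bool {
        assert!(self.validate());
        self.items.binary_search(&x).is_ok()
    }
}

fn main() {
    let mut set = SortedSet::default();
    set.insert(5);
    set.insert(1);
    set.insert(5);
    println!("{:?}, contains 1: {}", set, set.contains(1));
}
```

If the checks turn out to be too expensive, debug_assert! is a drop-in replacement that is compiled out of release builds by default.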

What about Swift?

Swift is another interesting language whose construction mechanisms are worth a look. Like Kotlin, Swift is a null-safe language. Unlike Kotlin, Swift's null checks are stricter, so the language uses interesting tricks to mitigate the damage caused by constructors.

First, Swift uses named arguments, which helps a little with "all constructors have the same name". In particular, two constructors with the same parameter types are not a problem:

    Celsius(fromFahrenheit: 212.0)
    Celsius(fromKelvin: 273.15)
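For comparison, a Rust sketch of the same idea using named factory functions instead of argument labels. The Celsius struct and both function names are hypothetical and only mirror the Swift snippet above.

```rust
/// Hypothetical temperature type mirroring the Swift example.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Celsius {
    degrees: f64,
}

impl Celsius {
    /// Same parameter type as `from_kelvin`; the function name
    /// carries the meaning instead of an argument label.
    fn from_fahrenheit(fahrenheit: f64) -> Celsius {
        Celsius { degrees: (fahrenheit - 32.0) * 5.0 / 9.0 }
    }

    fn from_kelvin(kelvin: f64) -> Celsius {
        Celsius { degrees: kelvin - 273.15 }
    }
}

fn main() {
    let boiling = Celsius::from_fahrenheit(212.0);
    let freezing = Celsius::from_kelvin(273.15);
    println!("{:?} {:?}", boiling, freezing);
}
```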

Secondly, to solve the problem "the constructor calls a virtual method of an object that has not been fully created yet", Swift uses a well-thought-out two-phase initialization protocol. Although there is no special syntax for initializer lists, the compiler statically checks that the constructor body has a correct and safe form. For example, calling methods is allowed only after all fields of the class and its descendants have been initialized.

Thirdly, the language supports constructors that can fail. A constructor can be marked as nullable, which makes the result of constructing the class an optional. A constructor can also have a throws modifier, which works better with Swift's two-phase initialization semantics than with the initializer list syntax in C++.

Swift manages to close all the holes I complained about in the constructors. This, however, comes at a price: the initialization chapter is one of the largest in the Swift book.

When constructors are really needed

Despite all of the above, I can come up with at least two reasons why constructors cannot simply be replaced with struct literals like the ones in Rust.

First, inheritance more or less forces a language to have constructors. You can imagine extending struct syntax with support for base classes:

    struct Base { ... }

    struct Derived: Base { foo: i32 }

    impl Derived {
        fn new() -> Derived {
            Derived {
                Base::new().., foo: 92,
            }
        }
    }

But this will not work with the typical object layout of an OO language with single inheritance! Typically, an object begins with a header, followed by the class fields, from the base class to the most derived one. Thus, a prefix of a derived-class object is a valid base-class object. For this layout to work, however, the constructor needs to allocate memory for the entire object at once. It cannot allocate memory for the base class only and then append the derived fields. But such piecewise allocation is exactly what would be needed if we wanted to use struct-creation syntax in which a value for the base class can be specified.

Secondly, unlike struct literal syntax, constructors have a placement-friendly ABI, one that works well with constructing subobjects directly in memory. The constructor works with the this pointer, which points to the memory region the new object should occupy. Most importantly, a constructor can easily pass a pointer on to subobject constructors, which allows complex value trees to be created "in place". In Rust, by contrast, constructing structs semantically involves quite a few copies, and we can only hope for the mercy of the optimizer. It is no coincidence that Rust does not yet have an accepted, working placement proposal!
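The "prefix" layout can be approximated in today's Rust with composition, as in the sketch below. This is an assumption-laden illustration, not the hypothetical inheritance syntax from above: there is no object header, the Base and Derived fields are made up, and #[repr(C)] is used so that the first field is guaranteed to sit at offset 0.

```rust
#[repr(C)] // C layout: fields are placed in declaration order
#[derive(Debug)]
struct Base {
    id: u32,
}

#[repr(C)]
#[derive(Debug)]
struct Derived {
    base: Base, // first field, so the "base part" is a prefix of the object
    foo: i32,
}

impl Base {
    fn new() -> Base {
        Base { id: 1 }
    }
}

impl Derived {
    fn new() -> Derived {
        // Memory for the whole `Derived` is reserved at once; semantically,
        // the `Base` value is created first and then moved into place.
        Derived { base: Base::new(), foo: 92 }
    }
}

/// True if the base part starts at the same address as the whole object.
fn base_is_prefix(derived: &Derived) -> bool {
    let whole = derived as *const Derived as usize;
    let base = &derived.base as *const Base as usize;
    whole == base
}

fn main() {
    let derived = Derived::new();
    println!(
        "id = {}, foo = {}, base is prefix: {}",
        derived.base.id,
        derived.foo,
        base_is_prefix(&derived)
    );
}
```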

Upd 1: fixed a typo. Replaced the "write literal" with "structure literal".

Source: https://habr.com/ru/post/460831/
