What is wrong with references in C++

Disclaimer: I do not yet have enough experience with C++11, so everything below should be read in the context of C++03 only. I will be happy to discuss in the comments how the C++11 additions interact with the problems covered in this article.

References appeared in C++ to satisfy the syntactic needs of operator overloading. Plain C has no reference types; instead it has the concept of an lvalue, loosely described as "something that can stand to the left of the assignment operator".

    // Plain C
    int a;
    int foo(int);

    a = 7;        // OK: a is a variable of type int
    5 = 7;        // error: 5 is a literal of type int
    foo(42) = 7;  // error: foo(42) is an expression of type int

In this small example, three expressions (the variable a, the literal 5 and the function call foo(42)) have the same type, int, but only the variable is an lvalue and can stand to the left of the assignment operator.

From a C programmer's point of view, the expression "foo(42) = 7;" makes no sense and should not compile, but once operator overloading appeared, exactly such expressions became necessary.

In C++, accessing an array element is treated as a call to the member function operator[](size_t n), and that call must return something that can stand to the left of the assignment operator. A type was needed to describe such a result. That is how references came about.
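As a minimal sketch of why such a type is needed, here is a toy array class whose operator[] returns int&, so the result of the call can appear on the left of an assignment. IntArray and storeAndLoad are hypothetical names invented for this illustration.

```cpp
#include <cstddef>

// Hypothetical toy container, used only for illustration.
class IntArray {
public:
    enum { kSize = 4 };

    IntArray() {
        for (std::size_t i = 0; i < kSize; ++i) data_[i] = 0;
    }

    // Returning int& makes the call expression an lvalue.
    int& operator[](std::size_t n) { return data_[n]; }

    // Read-only access for const objects.
    int operator[](std::size_t n) const { return data_[n]; }

private:
    int data_[kSize];
};

// Writes through the reference returned by operator[] and reads it back.
inline int storeAndLoad(int value) {
    IntArray a;
    a[2] = value;   // same as a.operator[](2) = value;
    return a[2];
}
```

If operator[] returned a plain int instead, the assignment a[2] = value would not compile, because the call would no longer be an lvalue.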

A reference, like a pointer, stores the address of an object in memory, but syntactically it behaves like an already dereferenced pointer. This solves the problem described above but creates new ones.

The syntax of the language does not let you distinguish the target object from the reference itself: every operation on the reference is in fact an operation on the object. As a consequence:
1. A reference cannot be reseated to another object.
2. You cannot compare the address held by a reference with the address of another object or with NULL.

Other restrictions follow from these properties in turn:
3. A reference must be initialized when it is created (since there is no way to initialize it later).
4. A reference cannot hold a null address (since there is no way to check for it and handle it).
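Properties 1 and 2 can be checked with a small sketch. The two function names below are hypothetical and exist only for this illustration.

```cpp
// "Assigning to a reference" writes to the target object;
// it does not make the reference refer to something else.
inline bool assignmentWritesThrough() {
    int a = 1;
    int b = 2;
    int& r = a;   // must be initialized right here (property 3)
    r = b;        // copies the value of b into a; r still refers to a
    b = 10;       // does not affect a or r
    // &r yields the address of a, not of the reference itself (property 2).
    return a == 2 && r == 2 && &r == &a;
}

// A pointer, by contrast, can be reseated and compared.
inline bool pointerCanBeReseated() {
    int a = 1;
    int b = 2;
    int* p = &a;
    p = &b;       // p now points to b; a is untouched
    return a == 1 && *p == 2 && p == &b;
}
```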

The last two properties are significant advantages of references. I often see recommendations to abandon pointers in favor of references for the sake of these two properties (for example, in this coding guide, in another one, in a discussion on StackOverflow and, alas, in the coding guide of the project yours truly is currently working on).

However, there is also the opposite opinion (for example, Google and Trolltech engineers prefer pointers), because the contradiction between the syntax and the semantics of references creates many problems.

When references are used for output arguments of a function, the fact that an argument is an output becomes very hard to see at the call site:

    color.getHsv(&h, &s, &v); // it is obvious that getHsv() modifies h, s, v
    color.getHsv(h, s, v);    // it is not obvious that h, s, v are modified
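A self-contained variant of the same contrast, as a rough sketch. splitByPointer and splitByReference are hypothetical helpers that split a number of seconds into minutes and seconds.

```cpp
// Output arguments passed by pointer: the caller has to write '&'.
inline void splitByPointer(int total, int* minutes, int* seconds) {
    *minutes = total / 60;
    *seconds = total % 60;
}

// Output arguments passed by reference: the call looks like pass by value.
inline void splitByReference(int total, int& minutes, int& seconds) {
    minutes = total / 60;
    seconds = total % 60;
}
```

Both do the same work, but only splitByPointer(125, &m, &s) tells the reader at the call site that m and s may change; splitByReference(125, m, s) looks exactly like a call that only reads them.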

Constant references have become the de facto standard for optimized passing of objects by value. When I see "const SomeClass& arg", the last thing I think is that a reference to a particular instance of SomeClass is being passed without the right to modify it, and that the function needs to work with exactly this instance. I think that a value of type SomeClass is being passed. And since a value is passed, I assume I can give this function any object of the class that holds this value.
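One place where the difference between "a value" and "this particular instance" shows up is aliasing. A minimal sketch, with hypothetical function names:

```cpp
// delta looks like a value at the call site, but it is an alias:
// if it refers to the same object as total, it changes under our feet.
inline void addTwiceByRef(int& total, const int& delta) {
    total += delta;
    total += delta;   // delta may already have a new value here
}

// With a real by-value parameter the function works on its own copy.
inline void addTwiceByValue(int& total, int delta) {
    total += delta;
    total += delta;
}
```

Starting from a = 5, the call addTwiceByRef(a, a) leaves a equal to 20, while addTwiceByValue(a, a) leaves it equal to 15, although both signatures read as "add a value twice".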

References cause difficulties in metaprogramming and give rise to workarounds such as Boost.Ref.

References cannot be elements of STL containers. A class that has a reference member cannot get a proper assignment operator (without resorting to dirty hacks), so objects of such classes cannot be elements of containers either.
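A short sketch of the usual way out: store a pointer instead of a reference. Config, PtrHolder and demoLevels are hypothetical names for this illustration.

```cpp
#include <vector>

struct Config { int level; };

// A class with a reference member has no usable operator=,
// because the reference cannot be reseated:
//
//   struct RefHolder { Config& cfg; };
//
// Storing a pointer instead keeps the class assignable,
// so it can live in a standard container.
class PtrHolder {
public:
    explicit PtrHolder(Config* cfg) : cfg_(cfg) {}
    int level() const { return cfg_->level; }
private:
    Config* cfg_;
};

inline int demoLevels() {
    Config a = {1};
    Config b = {2};
    std::vector<PtrHolder> v;
    v.push_back(PtrHolder(&a));
    v.push_back(PtrHolder(&b));
    v[0] = v[1];                          // assignment reseats the stored pointer
    return v[0].level() + v[1].level();   // both elements now refer to b
}
```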

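An example based on a bug I caught recently:

    template<class T>
    T foo(T x) { ... }

    template<class T>
    class Bar {
    public:
        static T baz(T x) { return foo(x); }
    };

    std::string str = Bar<std::string>::baz(getTitle());                       // OK
    ColorDescriptor& desc = Bar<ColorDescriptor&>::baz(getColorDescriptor());  // problem!

Inside baz, the call foo(x) deduces T as ColorDescriptor rather than ColorDescriptor&, so foo works on a copy and returns a temporary, which baz then has to return as a reference. A strict compiler rejects this, and a permissive one may hand back a dangling reference.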

And here is another interesting example:

    template<class T>
    class SizeOfTest {
    public:
        static bool sizeOfIsOK() { return sizeof(SizeOfTest<T>) >= sizeof(T); }
    private:
        T m_data;
    };

    struct BigData { char d[1000]; };

    assert(SizeOfTest<int>::sizeOfIsOK());       // OK
    assert(SizeOfTest<BigData>::sizeOfIsOK());   // OK
    assert(SizeOfTest<BigData&>::sizeOfIsOK());  // fails!
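The reason the last assertion fails can be isolated in a few lines. This is a sketch with a hypothetical RefHolder struct; the second check relies on the assumption that a reference member is stored like a pointer, which holds on typical implementations but is not guaranteed by the language.

```cpp
struct BigData { char d[1000]; };

// Hypothetical holder with a reference member, as in SizeOfTest<BigData&>.
struct RefHolder { BigData& ref; };

// sizeof applied to a reference type yields the size of the referenced type.
inline bool sizeofLooksThroughReference() {
    return sizeof(BigData&) == sizeof(BigData);
}

// The object holding the reference, however, is typically pointer-sized.
inline bool holderIsSmallerThanTarget() {
    return sizeof(RefHolder) < sizeof(BigData);
}
```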

So references cannot serve as a full replacement for pointers in C++. That is not what they were created for.

On the other hand, there is a demand for "clean" pointers: pointers for which the type system guarantees that they are initialized and not NULL. And, most interestingly, properties 3 and 4 do not by their nature conflict with pointer semantics. The problem comes only from the limited choice of tools available in C++.

Let's dream a little and break free of backward compatibility.

If I had my way, I would make properties 3 and 4 properties of pointers themselves, while preserving pointer semantics. That is:

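    int a = 5, b = 5;
    int* p1;              // error: not initialized
    int* p2 = null;       // error: null is not allowed
    int* p3 = &a;
    int* p4 = &b;
    assert(p3 != p4);
    assert(*p3 == *p4);
    p3 = &b;
    assert(p3 == p4);
    int* p5 = std::min(p3, p4);
    int* p6 = new int(5); // new returns a valid pointer or throws an exception
    if (p5) { ... }       // error: conversion to bool is meaningless for such pointers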
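Within today's C++ something close to this can be approximated with a small wrapper class. The sketch below is one possible shape, not a standard facility; NotNull and demoNotNull are hypothetical names. It has no default constructor (property 3), can only be built from an existing object (property 4), and can still be reseated and compared like a pointer.

```cpp
// Hypothetical non-null pointer wrapper, for illustration only.
template <class T>
class NotNull {
public:
    // Built from an existing object, so it never holds a null address.
    explicit NotNull(T& target) : ptr_(&target) {}

    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }
    T* get() const { return ptr_; }

    // Comparison compares addresses, as with raw pointers.
    friend bool operator==(NotNull a, NotNull b) { return a.ptr_ == b.ptr_; }
    friend bool operator!=(NotNull a, NotNull b) { return a.ptr_ != b.ptr_; }

private:
    T* ptr_;   // the implicit copy assignment reseats this pointer
};

inline bool demoNotNull() {
    int a = 5, b = 5;
    NotNull<int> p3(a);
    NotNull<int> p4(b);
    bool ok = (p3 != p4) && (*p3 == *p4);
    p3 = p4;                 // reseat: p3 now points to b
    ok = ok && (p3 == p4);
    *p3 = 7;                 // writes to b
    return ok && a == 5 && b == 7;
}
```

Ironically, the only convenient way to build such a wrapper in the existing language is to take a reference in its constructor.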

But what about NULL? After all, sometimes the semantics of optionality is still needed. Instead of going back to nullable pointers, we can do better and make optionality orthogonal to pointer semantics:

    int a = 5;
    int? b = 5;       // optional int
    int? c = null;    // optional int
    assert(a == b);
    assert(b != c);

    int* p0 = &a;
    int*? p1 = &a;
    int*? p2 = null;
    int*? p3 = &b;    // error
    int?* p4 = &b;
    int?*? p5 = null;
    p5 = p4;
    p4 = p5;          // error

    *p0 = 7;
    *p1 = 7;          // error: p1 is an optional pointer
    if (p1 != null) {
        *?p1 = 7;
    }
    p0 = ?p1;

Is it possible to do without references at all? Let's try.

We will start with passing arguments by constant reference. This way of passing is an optimized variant of passing an argument by value. For some types the optimization makes sense, for others it does not.

To make the right decision about this optimization, you need to consider:

The cost of allocating memory for a copy of the object
The cost of the copy constructor and destructor
The cost of dereferencing the reference inside the function
The calling convention of the specific function: the answer may differ depending on whether arguments go in registers or on the stack
The possible gains from optimizations that the compiler can apply when it knows that no function arguments alias each other

A programmer cannot analyze all these factors in detail for every parameter of every function: it takes far too much time. Besides, the result will differ between target hardware platforms. So it makes sense to entrust this decision to the compiler and to move it from the time the code is written to the time it is compiled.
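Even in C++03 a crude version of "decide at compile time" can be expressed with templates. The sketch below picks the parameter type from the size of the type alone, which is an assumption made for brevity: it ignores copy constructor cost, calling conventions and aliasing, all of which a compiler could take into account. SelectParam, Param, IsSame, Big, twice and firstByte are hypothetical names.

```cpp
// Chooses between T and const T& based on a compile-time flag.
template <bool PassByValue, class T>
struct SelectParam { typedef const T& type; };

template <class T>
struct SelectParam<true, T> { typedef T type; };

// Crude heuristic (an assumption of this sketch):
// anything no larger than a pointer is passed by value.
template <class T>
struct Param {
    typedef typename SelectParam<(sizeof(T) <= sizeof(void*)), T>::type type;
};

// Minimal type comparison helper, since C++03 has no std::is_same.
template <class A, class B> struct IsSame { enum { value = 0 }; };
template <class A> struct IsSame<A, A> { enum { value = 1 }; };

struct Big { char bytes[256]; };

// The function author writes Param<T>::type and does not choose by hand.
inline int twice(Param<int>::type x) { return 2 * x; }
inline int firstByte(Param<Big>::type b) { return b.bytes[0]; }
```

With this in place, twice takes its argument as a plain int, while firstByte takes a const Big&, and the choice is made again on every platform the code is compiled for.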

This approach has its drawbacks: to support separate compilation and function pointers, the compiler has to decide without knowing the factors hidden in the implementation of the function. Still, I think that even with these limitations the compiler would optimize no worse than a human does by hand.

And what about copy constructors? For an ordinary function the semantics "the copy constructor may be called for this argument" is fine, but for the argument of the copy constructor itself it is unacceptable, because it allows infinite recursion. This problem can be solved in at least two ways:

Explicitly add an exception for the copy constructor: the compiler always passes its argument by reference.
```
class MyClass {
public:
    MyClass(MyClass src) // always treated as const MyClass& src
    { ... }
};
```

Pass the argument to the copy constructor by pointer and mark the constructor with a special tag:

    class MyClass {
    public:
        MyClass(const MyClass* src, std::copy_ctor_tag)
        { ... }
    };
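The second variant can be tried out in existing C++ as an ordinary tagged constructor. This is only a sketch: copy_ctor_tag is declared here as a hypothetical user type (it is not part of the standard library), and the compiler still generates the usual reference-based copy constructor alongside it.

```cpp
// Hypothetical tag type; not part of the standard library.
struct copy_ctor_tag {};

class MyClass {
public:
    explicit MyClass(int value) : value_(value) {}

    // "Copy constructor" that receives its source by pointer.
    // The tag keeps it distinct from other one-pointer constructors.
    MyClass(const MyClass* src, copy_ctor_tag) : value_(src->value_) {}

    int value() const { return value_; }

private:
    int value_;
};

// Makes a copy through the tagged constructor.
inline int copiedValue(const MyClass* src) {
    MyClass copy(src, copy_ctor_tag());
    return copy.value();
}
```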

Now back to operator overloading.

In plain C, only a limited set of operators can yield lvalues: array element access, the various forms of assignment, prefix increment and decrement, and dereferencing itself. That is all. For these operators, the way they map to functions can be changed so that the functions return a pointer:

    a[i] = b;       // becomes  *a.operator[](i) = b;
    (++i) = x;      // becomes  *i.operator++() = x;
    (x = y) = z;    // becomes  *x.operator=(y) = z;
    *p = d;         // becomes  *p.operator->() = d;
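In existing C++ the right-hand column can be written out by hand, which gives a feel for the proposed mapping. A minimal sketch, with hypothetical names IntBuffer and writeThroughPointer; the dereference that the proposal would insert automatically has to be spelled explicitly here.

```cpp
#include <cstddef>

// Hypothetical toy container whose operator[] returns a pointer.
class IntBuffer {
public:
    IntBuffer() {
        for (std::size_t i = 0; i < 4; ++i) data_[i] = 0;
    }

    // Returns the address of the element instead of a reference to it.
    int* operator[](std::size_t n) { return &data_[n]; }

private:
    int data_[4];
};

inline int writeThroughPointer(int value) {
    IntBuffer a;
    *a[1] = value;   // i.e. *a.operator[](1) = value;
    return *a[1];
}
```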

In this scheme the dereference operator can no longer be overloaded: operator-> does all the work in its place.

In all other cases, allowing an lvalue result contradicts the principle of least surprise. I hope I never have to debug code in which the expression "a + b" modifies one of its operands, or work out during a review what "foo(42) = 7;" is supposed to mean.

The exception that proves the rule is I/O streams. You cannot pass the stream itself as an argument to operator<<, because it would be passed by value. So you need to pass something that refers to the stream object and can itself be safely passed by value. That can be a pointer to the stream or, better, a special wrapper object:

    int main()
    {
        std::fstream filestr("test.txt", std::fstream::out);
        std::outref(&filestr) << "foo = " << foo << ", bar = " << bar << std::endl;
        return 0;
    }

    std::outref operator<<(std::outref ref, MyClass obj)
    {
        ref << obj.x;
        ref << obj.y;
        ref << obj.z;
        return ref;
    }
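Such a wrapper can be prototyped on top of the existing stream library. The sketch below is one possible shape under stated assumptions: OutRef, Point and render are hypothetical names, and the manipulator overload still has to mention std::ostream& because that is how std::endl is declared today.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical wrapper: refers to a stream, is itself cheap to copy.
class OutRef {
public:
    explicit OutRef(std::ostream* os) : os_(os) {}
    std::ostream* stream() const { return os_; }
private:
    std::ostream* os_;
};

// Forwards anything the underlying stream already knows how to print.
template <class T>
OutRef operator<<(OutRef ref, T value) {
    *ref.stream() << value;
    return ref;
}

// Manipulators such as std::endl are functions, so they need their own overload.
inline OutRef operator<<(OutRef ref, std::ostream& (*manip)(std::ostream&)) {
    manip(*ref.stream());
    return ref;
}

struct Point { int x; int y; };

// User-defined output: both the wrapper and the object are passed by value.
inline OutRef operator<<(OutRef ref, Point p) {
    ref << p.x << ", " << p.y;
    return ref;
}

inline std::string render(Point p) {
    std::ostringstream oss;
    OutRef out(&oss);
    out << "p = (" << p << ")" << std::endl;
    return oss.str();
}
```

Because OutRef holds only a pointer, returning it from every operator<< costs about as much as returning the stream reference does now.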

Unless I have missed something, it looks quite possible to do without references in C++.

Summary

Today there is a tendency to use references to satisfy the need for safe pointers. Because of the syntactic properties of references, they satisfy this need very poorly. Constant references are used to optimize passing arguments by value, although the responsibility for this optimization could be shifted to the compiler. The original problems that references were meant to solve can be solved in other ways. References are a very dubious acquisition for C++; safe pointers with pointer semantics would have been a far more valuable feature.

Source: https://habr.com/ru/post/151444/
