Understanding lvalues and rvalues in C and C++

Hi, Habr! Here is my translation of Eli Bendersky's article "Understanding lvalues and rvalues in C and C++".

From the translator: this is a translation of an interesting article about lvalues and rvalues in C and C++. The topic is not new, but it is never too late to learn these concepts. The article is aimed at beginners and at programmers moving to C++ from C or other languages, so be prepared for very detailed explanations. If that sounds interesting, read on.

The terms lvalue and rvalue are not something one runs into often in C/C++ programming, and when one does, it is not immediately clear what they mean. The most common place to encounter them is in compiler messages. For example, compiling the following code with gcc:

    int foo() { return 2; }

    int main()
    {
        foo() = 2;

        return 0;
    }

You get the following:

    test.c: In function 'main':
    test.c:8:5: error: lvalue required as left operand of assignment

Granted, this code is somewhat contrived and not something you would normally write, but the error message mentions lvalue, a term one rarely finds in C/C++ tutorials. Another example is compiling the following code with g++:

 int& foo() { return 2; }

You will see the following error:

    testcpp.cpp: In function 'int& foo()':
    testcpp.cpp:5:12: error: invalid initialization of non-const reference of type 'int&' from an rvalue of type 'int'

Here again, the error message mentions a mysterious rvalue. So what do lvalue and rvalue mean in C and C++? That is what this article explores.

Simple definition

To begin with, we deliberately give simplified definitions of lvalue and rvalue. The rest of the article examines these concepts more closely.

An lvalue (locator value) represents an object that occupies an identifiable location in memory (that is, it has an address).

Rvalues are defined by exclusion: every expression is either an lvalue or an rvalue. Therefore, from the definition of lvalue above, an rvalue is an expression that does not represent an object occupying an identifiable location in memory.

Elementary examples

The terms as defined above may seem vague, so it helps to look at a few simple examples right away. Suppose we have an integer variable that is defined and then assigned to:

    int var;
    var = 4;

An assignment expects an lvalue as its left operand, and var is an lvalue because it is an object with an identifiable memory location. On the other hand, the following statements are invalid:

    4 = var;        // ERROR!
    (var + 1) = 4;  // ERROR!

Neither the constant 4 nor the expression var + 1 is an lvalue (which makes them rvalues). They are not lvalues because both are temporary results of expressions that have no identifiable memory location (they may simply reside in a temporary register for the duration of the computation). Assigning to them therefore makes no semantic sense: there is nowhere to assign to.

It should now be clear what the error message in the first code snippet means. foo returns a temporary value, which is an rvalue, and attempting to assign to it is an error. So when the compiler sees foo() = 2;, it complains that it expected an lvalue on the left-hand side of the assignment.
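The classification of these small expressions can also be checked mechanically. The sketch below is not from the original article: it assumes a C++11 compiler and introduces a hypothetical helper macro, IS_LVALUE, built on the fact that decltype((e)) is a reference type T& when e is an lvalue and a plain T when e is a temporary value.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper (not from the article): true when expression `e`
// is an lvalue. decltype((e)) is T& for an lvalue and a plain T for a
// temporary value such as a literal or an arithmetic result.
#define IS_LVALUE(e) (std::is_lvalue_reference<decltype((e))>::value)

int foo() { return 2; }  // returns by value, as in the first example

// Checks the classifications at compile time, then performs the one
// assignment that is legal and returns the stored value.
int classify_examples() {
    int var = 0;
    static_assert(IS_LVALUE(var), "a named variable is an lvalue");
    static_assert(!IS_LVALUE(4), "a literal is an rvalue");
    static_assert(!IS_LVALUE(var + 1), "an arithmetic result is an rvalue");
    static_assert(!IS_LVALUE(foo()), "a by-value call result is an rvalue");

    var = 4;  // fine: the left operand is an lvalue
    assert(var == 4);
    return var;
}
```

If any of the static_assert lines were wrong, the snippet would fail to compile, which is the same kind of diagnosis the compiler performs for foo() = 2;.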

Not all assignments to the result of a function call are invalid, however. For example, C++ references make this possible:

    int globalvar = 20;

    int& foo()
    {
        return globalvar;
    }

    int main()
    {
        foo() = 10;
        return 0;
    }

Here foo returns a reference, which is an lvalue, so it can be assigned to. In fact, the ability of C++ to return lvalues from functions is important for implementing some overloaded operators. One common example is overloading the subscript operator [] in classes that implement some kind of lookup access, as std::map does:

    std::map<int, float> mymap;
    mymap[10] = 5.6;

The assignment mymap[10] = 5.6 works because std::map::operator[] returns a reference to the stored value, and that reference can be assigned to.
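The same mechanism is easy to reproduce in a user-defined class. The following is a minimal sketch, assuming a made-up fixed-size Table class (it is not part of the article or of the standard library): the non-const operator[] returns int&, so the call can appear on the left of an assignment, while the const overload returns a copy and cannot.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size table, used only for illustration.
class Table {
public:
    // Non-const overload: returns a reference, so the call is an lvalue.
    int& operator[](std::size_t i) { return m_cells[i]; }

    // Const overload: returns a copy, so the call is an rvalue.
    int operator[](std::size_t i) const { return m_cells[i]; }

private:
    int m_cells[4] = {0, 0, 0, 0};
};

// Writes through the reference returned by operator[], then reads the
// value back through a const view of the same object.
int write_then_read() {
    Table t;
    t[2] = 42;  // assigns through the returned int&

    const Table& view = t;
    // view[2] = 7;  // would not compile: the left operand is an rvalue
    assert(view[2] == 42);
    return view[2];
}
```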

Modifiable lvalues

Initially, when lvalues were defined for C, the term literally meant "values suitable for the left-hand side of an assignment". Later, however, when ISO C added the const keyword, this definition had to be refined. After all:

    const int a = 10;  // 'a' is an lvalue
    a = 10;            // but it can't be assigned!

So not all lvalues can be assigned to. Those that can are called modifiable lvalues. Formally, the C99 standard defines modifiable lvalues as:

[...] an lvalue that does not have array type, does not have an incomplete type, does not have a const-qualified type, and if it is a structure or union, does not have any member (including, recursively, any member or element of all contained aggregates or unions) with a const-qualified type.
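The quoted rule is from C99, but C++ enforces closely analogous restrictions, and those can be probed with type traits. The sketch below is an illustration under that assumption, not a restatement of the C rule; the Plain and WithConstField structs are made up for the example.

```cpp
#include <cassert>
#include <type_traits>

struct Plain { int x; };                 // no const members
struct WithConstField { const int x; };  // one const member

// A plain int lvalue can be assigned to; a const int lvalue cannot.
static_assert(std::is_assignable<int&, int>::value, "int is modifiable");
static_assert(!std::is_assignable<const int&, int>::value,
              "const int is not modifiable");

// Arrays are lvalues, yet they cannot be assigned as a whole.
static_assert(!std::is_assignable<int (&)[3], int (&)[3]>::value,
              "arrays are not assignable");

// A const member makes the whole struct non-assignable.
static_assert(std::is_copy_assignable<Plain>::value, "Plain is assignable");
static_assert(!std::is_copy_assignable<WithConstField>::value,
              "a const field blocks assignment");

// Runtime counterpart: assigning to a modifiable lvalue changes the object.
int assign_plain() {
    Plain p{1};
    p = Plain{2};
    assert(p.x == 2);
    return p.x;
}
```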

Conversions between lvalue and rvalue

Generally speaking, language constructs that operate on object values require rvalues as arguments. For example, the binary addition operator '+' takes two rvalues as arguments and returns an rvalue:

    int a = 1;      // a is an lvalue
    int b = 2;      // b is an lvalue
    int c = a + b;  // + needs rvalues, so a and b are converted to rvalues
                    // and an rvalue is returned

As we have seen, a and b are both lvalues. Therefore, in the third line, they undergo an implicit lvalue-to-rvalue conversion. All lvalues that are not arrays, functions, or of incomplete types can be converted to rvalues this way.

What about the other direction? Can rvalues be converted to lvalues? Of course not! That would violate the very nature of an lvalue as defined above. (The lack of an implicit conversion means that rvalues cannot be used in places where lvalues are expected.)

This does not mean that lvalues cannot be produced from rvalues by more explicit means. For example, the unary '*' (dereference) operator takes an rvalue argument but produces an lvalue as a result. Consider this valid code:

    int arr[] = {1, 2};
    int* p = &arr[0];
    *(p + 1) = 10;  // OK: p + 1 is an rvalue, but *(p + 1) is an lvalue

Conversely, the unary address-of operator '&' takes an lvalue argument and produces an rvalue:

    int var = 10;
    int* bad_addr = &(var + 1);  // ERROR: lvalue required as unary '&' operand
    int* addr = &var;            // OK: var is an lvalue
    &var = 40;                   // ERROR: lvalue required as left operand
                                 // of assignment
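Both directions can be verified at compile time. This is a small sketch assuming C++11; the IS_LVALUE macro is a hypothetical helper defined here, relying on decltype((e)) being a reference type T& exactly when e is an lvalue.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper (not from the article): true when `e` is an lvalue.
#define IS_LVALUE(e) (std::is_lvalue_reference<decltype((e))>::value)

// Checks the value category of each step, then writes through the
// dereferenced pointer and returns the modified array element.
int poke_second_element() {
    int arr[] = {1, 2};
    int* p = &arr[0];

    static_assert(IS_LVALUE(arr[0]), "an array element is an lvalue");
    static_assert(!IS_LVALUE(&arr[0]), "address-of produces an rvalue");
    static_assert(!IS_LVALUE(p + 1), "pointer arithmetic produces an rvalue");
    static_assert(IS_LVALUE(*(p + 1)), "dereference produces an lvalue");

    *(p + 1) = 10;  // legal because *(p + 1) is an lvalue
    assert(arr[1] == 10);
    return arr[1];
}
```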

The ampersand plays another role in C++: it lets you define reference types. These are called "lvalue references". Non-const lvalue references cannot be bound to rvalues, since that would require an invalid rvalue-to-lvalue conversion:

    std::string& sref = std::string();  // ERROR: invalid initialization of
                                        // non-const reference of type
                                        // 'std::string&' from an rvalue of
                                        // type 'std::string'

Const lvalue references can be bound to rvalues. Since they are const, the value cannot be modified through the reference, so the problem of modifying an rvalue does not arise. This makes possible the very common C++ idiom of accepting function arguments by const reference, which avoids unnecessary copying and construction of temporary objects.
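As a short illustration of that idiom, here is a sketch with a hypothetical length_of function (the name is made up for this example). It accepts both a named string and a temporary, because a const reference can bind to either.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: takes its argument by const reference, so it
// accepts lvalues and rvalues alike without copying the string.
std::size_t length_of(const std::string& s) { return s.size(); }

// Calls length_of with an lvalue and with a temporary, and also binds a
// temporary directly to a const reference.
std::size_t demo_const_ref() {
    std::string name = "lvalue";                      // 6 characters
    const std::string& tmp = std::string("rvalue!");  // OK: const reference
                                                      // binds to a temporary
    // std::string& bad = std::string("x");           // would not compile

    std::size_t total = length_of(name)                // lvalue argument
                      + length_of(std::string("abc"))  // rvalue argument
                      + tmp.size();
    assert(total == 16);
    return total;
}
```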

CV-qualified rvalues

If we read carefully the portion of the C++ standard that discusses lvalue-to-rvalue conversions (section 4.1 in the C++11 standard draft), we notice that it says:

An lvalue (3.10) of a non-function, non-array type T can be converted to an rvalue. [...] If T is a non-class type, the type of the rvalue is the cv-unqualified version of T. Otherwise, the type of the rvalue is T.

So what does "cv-unqualified" mean? A cv-qualifier is a term used to describe the const and volatile type qualifiers.

From section 3.9.3:

Each type which is a cv-unqualified complete or incomplete object type or is void (3.9) has three corresponding cv-qualified versions of its type: a const-qualified version, a volatile-qualified version, and a const-volatile-qualified version. [...] The cv-qualified or cv-unqualified versions of a type are distinct types; however, they shall have the same representation and alignment requirements (3.9).

But what does all this have to do with rvalues? In C, rvalues never have cv-qualified types; only lvalues do. In C++, on the other hand, class rvalues can have cv-qualified types, but rvalues of built-in types (like int) cannot. Consider this example:

    #include <iostream>

    class A
    {
    public:
        void foo() const { std::cout << "A::foo() const\n"; }
        void foo() { std::cout << "A::foo()\n"; }
    };

    A bar() { return A(); }
    const A cbar() { return A(); }

    int main()
    {
        bar().foo();   // calls foo
        cbar().foo();  // calls foo const
    }

The second call in main actually calls the foo() const method of A, because the type returned by cbar is const A, which is distinct from A. This is exactly what is meant by the last sentence of the quote from section 4.1 above. Note also that the return value of cbar is an rvalue. So this is an example of a cv-qualified rvalue in action.
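The same observation can be turned into something checkable. The sketch below is a variation on the example above, not the author's code: foo returns a std::string instead of printing, and decltype is used to confirm that the expression cbar() really has type const A.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

class A {
public:
    std::string foo() const { return "const"; }
    std::string foo() { return "non-const"; }
};

A bar() { return A(); }
const A cbar() { return A(); }

// The call expressions are rvalues, and for a class type the const
// qualifier on the return type is preserved.
static_assert(std::is_same<decltype(bar()), A>::value,
              "bar() is an rvalue of type A");
static_assert(std::is_same<decltype(cbar()), const A>::value,
              "cbar() is an rvalue of type const A");

// Reports which overload each temporary selects.
std::string overloads_selected() {
    std::string result = bar().foo() + "/" + cbar().foo();
    assert(result == "non-const/const");
    return result;
}
```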

Rvalue references (C++11)

Rvalue references and the related concept of move semantics are among the most powerful new features that C++11 adds to the language. A full discussion of the feature is far beyond the scope of this modest article (you can find plenty of material simply by searching for "rvalue references", and the original article links to several resources I find useful), but I still want to give a simple example, because I think this is a good place to demonstrate how an understanding of lvalues and rvalues helps us reason about non-trivial language concepts.

A good part of this article was spent explaining that one of the main differences between lvalues and rvalues is that lvalues can be modified and rvalues cannot. Well, C++11 adds a crucial twist to this distinction by allowing us to have references to rvalues and thus modify them in some special circumstances.
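Before the larger example, here is the smallest illustration of that twist that I could think of. It is a sketch assuming C++11, with a made-up make_greeting function: an rvalue reference binds to the temporary returned by the function and is then used to modify it.

```cpp
#include <cassert>
#include <string>

// Hypothetical function returning a temporary string by value.
std::string make_greeting() { return std::string("hello"); }

// Binds an rvalue reference to the temporary and modifies it in place.
std::string extend_temporary() {
    // std::string& ref = make_greeting();  // would not compile: a non-const
    //                                      // lvalue reference cannot bind
    //                                      // to an rvalue
    std::string&& ref = make_greeting();    // OK: rvalue reference binds to
                                            // the temporary
    ref += ", world";                       // and lets us modify it
    assert(ref == "hello, world");
    return ref;
}
```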

As an example, consider a simplistic implementation of a dynamic "integer vector". Only the methods relevant to this section are shown:

    // Assumes <iostream> and <utility> are included and that
    // `using namespace std;` is in effect (cout is used unqualified).
    class Intvec
    {
    public:
        explicit Intvec(size_t num = 0)
            : m_size(num), m_data(new int[m_size])
        {
            log("constructor");
        }

        ~Intvec()
        {
            log("destructor");
            if (m_data) {
                delete[] m_data;
                m_data = 0;
            }
        }

        Intvec(const Intvec& other)
            : m_size(other.m_size), m_data(new int[m_size])
        {
            log("copy constructor");
            for (size_t i = 0; i < m_size; ++i)
                m_data[i] = other.m_data[i];
        }

        Intvec& operator=(const Intvec& other)
        {
            log("copy assignment operator");
            Intvec tmp(other);
            std::swap(m_size, tmp.m_size);
            std::swap(m_data, tmp.m_data);
            return *this;
        }

    private:
        void log(const char* msg)
        {
            cout << "[" << this << "] " << msg << "\n";
        }

        size_t m_size;
        int* m_data;
    };

So we have the usual constructor, destructor, copy constructor, and copy assignment operator (this is a canonical implementation of a copy assignment operator from the point of view of exception safety: by using the copy constructor and then the non-throwing std::swap, it makes sure that no intermediate state with uninitialized memory can arise if an exception is thrown). All of them use a logging function so that we can see when they are actually called.

Let's run some simple code that copies the contents of v1 into v2:

    Intvec v1(20);
    Intvec v2;

    cout << "assigning lvalue...\n";
    v2 = v1;
    cout << "ended assigning lvalue...\n";

This prints:

    assigning lvalue...
    [0x28fef8] copy assignment operator
    [0x28fec8] copy constructor
    [0x28fec8] destructor
    ended assigning lvalue...

This makes sense: it faithfully reflects what goes on inside the assignment operator. But suppose that we want to assign some rvalue to v2:

    cout << "assigning rvalue...\n";
    v2 = Intvec(33);
    cout << "ended assigning rvalue...\n";

Although here I just assign a freshly constructed vector, this demonstrates the more general case in which some temporary rvalue is built and then assigned to v2 (this can happen, for example, when a function returns a vector). Here is what gets printed now:

    assigning rvalue...
    [0x28ff08] constructor
    [0x28fef8] copy assignment operator
    [0x28fec8] copy constructor
    [0x28fec8] destructor
    [0x28ff08] destructor
    ended assigning rvalue...

Ouch, this looks like a lot of work. In particular, it takes one extra pair of constructor and destructor calls to create and then destroy the temporary object. That is a shame, because inside the copy assignment operator another temporary copy is created and destroyed. It is extra work for nothing.

Well, no more. C++11 gives us rvalue references, with which we can implement "move semantics", and in particular a "move assignment operator" (this explains why I kept calling our operator= the "copy assignment operator": in C++11 the distinction becomes important). Let's add another operator= to Intvec:

    Intvec& operator=(Intvec&& other)
    {
        log("move assignment operator");
        std::swap(m_size, other.m_size);
        std::swap(m_data, other.m_data);
        return *this;
    }

The double ampersand (&&) denotes an rvalue reference. It does exactly what its name says: it gives us a reference to an rvalue, which is going to be destroyed after the call. We can use this fact to simply "steal" the internals of the rvalue, since it will not need them anyway. This prints:

    assigning rvalue...
    [0x28ff08] constructor
    [0x28fef8] move assignment operator
    [0x28ff08] destructor
    ended assigning rvalue...

What happens here is that the new move assignment operator is invoked, since an rvalue is assigned to v2. The constructor and destructor calls are still needed for the temporary object created by Intvec(33), but another temporary inside the assignment operator is no longer needed. The operator simply swaps the rvalue's internal buffer with its own, so the rvalue's destructor releases our object's old buffer, which is no longer used. Neat.
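For readers who want to reproduce both traces, here is a self-contained sketch of the class with the two assignment operators. It differs from the snippets above in a few labeled ways: events are appended to a hypothetical global g_events vector instead of being printed with addresses, the buffer is zero-initialized so that copying it is well defined, and two helper functions (also made up for this sketch) return the recorded sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event log that stands in for the article's cout logging.
std::vector<std::string> g_events;

class Intvec {
public:
    explicit Intvec(std::size_t num = 0)
        : m_size(num), m_data(new int[m_size]()) {  // zero-initialized
        g_events.push_back("constructor");
    }

    ~Intvec() {
        g_events.push_back("destructor");
        delete[] m_data;
    }

    Intvec(const Intvec& other)
        : m_size(other.m_size), m_data(new int[m_size]) {
        g_events.push_back("copy constructor");
        for (std::size_t i = 0; i < m_size; ++i) m_data[i] = other.m_data[i];
    }

    // Copy assignment: copy-and-swap, as in the article.
    Intvec& operator=(const Intvec& other) {
        g_events.push_back("copy assignment operator");
        Intvec tmp(other);
        std::swap(m_size, tmp.m_size);
        std::swap(m_data, tmp.m_data);
        return *this;
    }

    // Move assignment: swaps buffers with the expiring object.
    Intvec& operator=(Intvec&& other) {
        g_events.push_back("move assignment operator");
        std::swap(m_size, other.m_size);
        std::swap(m_data, other.m_data);
        return *this;
    }

private:
    std::size_t m_size;
    int* m_data;
};

// Events recorded while assigning a named object (an lvalue).
std::vector<std::string> events_for_lvalue_assignment() {
    Intvec v1(20);
    Intvec v2;
    g_events.clear();
    v2 = v1;
    return g_events;  // copied before v1 and v2 are destroyed
}

// Events recorded while assigning a temporary (an rvalue).
std::vector<std::string> events_for_rvalue_assignment() {
    Intvec v2;
    g_events.clear();
    v2 = Intvec(33);
    assert(g_events.size() == 3);
    return g_events;  // copied before v2 is destroyed
}
```

With the move assignment operator present, the rvalue case records only the temporary's constructor, the move assignment, and the temporary's destructor, which matches the shorter trace shown above.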

I will just mention once again that this example is only the tip of the iceberg of move semantics and rvalue references. As you can probably guess, it is a complex subject with many special cases and gotchas. My point here was only to demonstrate a very interesting application of the difference between lvalues and rvalues in C++. The compiler obviously knows when an expression is an rvalue and can arrange to call the correct constructor or assignment operator at compile time.
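That compile-time choice can be observed directly with a pair of overloads. The sketch below assumes C++11 and uses a hypothetical which_overload function: a named object selects the const reference overload, while a temporary or the result of std::move selects the rvalue reference overload.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Two hypothetical overloads that report which one was chosen.
std::string which_overload(const std::string&) { return "lvalue overload"; }
std::string which_overload(std::string&&) { return "rvalue overload"; }

// A named object is an lvalue.
std::string pick_for_named() {
    std::string s = "named";
    return which_overload(s);
}

// A freshly constructed temporary is an rvalue.
std::string pick_for_temporary() {
    return which_overload(std::string("temp"));
}

// std::move casts a named object so that it is treated as an rvalue.
std::string pick_for_moved() {
    std::string s = "named";
    std::string chosen = which_overload(std::move(s));
    assert(chosen == "rvalue overload");
    return chosen;
}
```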

Conclusion

One can write a lot of C++ code without being concerned with the distinction between rvalues and lvalues, dismissing the terms as obscure compiler jargon in certain error messages. However, as this article aimed to show, a better grasp of the topic gives a deeper understanding of certain C++ constructs and makes parts of the C++ standard and discussions between language experts more intelligible.

With the C++11 standard the topic becomes even more important, because C++11 introduces rvalue references and move semantics. To really understand these new language features, a solid understanding of what rvalues and lvalues are becomes crucial.

Source: https://habr.com/ru/post/348198/
