Hi, Habr! Here is a translation of Eli Bendersky's article "Understanding lvalues and rvalues in C and C++".
From the translator: I would like to bring to your attention a translation of an interesting article about lvalues and rvalues in C and C++. The topic is not new, but it is never too late to learn these concepts. The article is aimed at beginners and at programmers moving from C (or other languages) to C++, so be prepared for very detailed explanations. If you are interested, read on.
The terms lvalue and rvalue are not something one runs into often in C/C++ programming, and when one does, it is usually not immediately clear what they mean. The most common place to meet them is in compiler error messages. For example, compiling the following code with gcc:
int foo() { return 2; }

int main()
{
    foo() = 2;

    return 0;
}
You get the following:
test.c: In function 'main':
test.c:8:5: error: lvalue required as left operand of assignment
True, this code is somewhat contrived and not something you would actually write, but the error message mentions lvalue, a term you rarely see in C/C++ tutorials. Another example is compiling the following code with g++:
int& foo() { return 2; }
You will see the following error:
testcpp.cpp: In function 'int& foo()':
testcpp.cpp:5:12: error: invalid initialization of non-const reference of type 'int&' from an rvalue of type 'int'
Here again, the error message mentions a mysterious rvalue. So what do lvalue and rvalue mean in C and C++? That is what this article explores.
Simple definition
To begin with, here are deliberately simplified definitions of lvalue and rvalue. The rest of the article examines these concepts more closely.
An lvalue (locator value) represents an object that occupies an identifiable location in memory (that is, it has an address).
Rvalues are defined by exclusion: every expression is either an lvalue or an rvalue. Therefore, from the definition of lvalue above, an rvalue is an expression that does not represent an object occupying an identifiable location in memory.
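The definitions above can be checked at compile time. The following is a minimal sketch, not part of the original article: the macro name IS_LVALUE and the helper get_two are made up for illustration. It relies on the fact that decltype((expr)) yields a reference type T& exactly when expr is an lvalue. In the two-way split used in this article, everything else counts as an rvalue.

```cpp
#include <type_traits>

// Hypothetical helper macro: true when expr is an lvalue expression.
// The double parentheses make decltype inspect the expression itself
// rather than the declared type of a name.
#define IS_LVALUE(expr) (std::is_lvalue_reference<decltype((expr))>::value)

int var = 0;                 // a named object with an address
int get_two() { return 2; }  // hypothetical function returning a temporary

static_assert(IS_LVALUE(var), "a named variable is an lvalue");
static_assert(!IS_LVALUE(4), "an integer literal is an rvalue");
static_assert(!IS_LVALUE(var + 1), "an arithmetic result is an rvalue");
static_assert(!IS_LVALUE(get_two()), "a by-value function result is an rvalue");
```

If any of the assertions were wrong, the file would fail to compile, so a successful compilation is the whole result.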
Elementary examples
The terms as defined above may seem vague, so it is worth looking at a few simple examples right away. Suppose we have an integer variable that is defined and then assigned to:
int var; var = 4;
The assignment operator expects an lvalue as its left operand, and var is an lvalue because it is an object with an identifiable memory location. The following statements, on the other hand, are invalid:
4 = var;       // ERROR!
(var + 1) = 4; // ERROR!
Neither the constant 4 nor the expression var + 1 is an lvalue (which makes them rvalues). They are not lvalues because both are temporary results of expressions that have no identifiable memory location (they may simply live in a temporary register for the duration of the computation). Assigning to them therefore makes no semantic sense: there is nowhere to assign to.
It should now be clear what the error message in the first code snippet means. foo returns a temporary value, which is an rvalue. Attempting to assign to it is an error, so when the compiler sees foo() = 2; it complains that it expected an lvalue on the left side of the assignment operator.
However, not all assignments to the result of a function call are invalid. For example, C++ references make this possible:
int globalvar = 20;

int& foo()
{
    return globalvar;
}

int main()
{
    foo() = 10;
    return 0;
}
Here foo returns a reference, which is an lvalue, so it can be assigned to. In C++, the ability to return lvalues from functions is important for implementing some overloaded operators. One common example is overloading the subscript operator [] in classes that implement some kind of lookup access. std::map does this:
std::map<int, float> mymap; mymap[10] = 5.6;
The assignment to mymap[10] works because the non-const overload of std::map::operator[] returns a reference that can be assigned to.
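The same pattern can be reproduced in a user-defined class. Below is a minimal sketch with a made-up class SmallTable and helper store_and_load (neither appears in the original article). The non-const operator[] returns int&, so the result of the call is a modifiable lvalue, while the const overload returns const int& for read-only access.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical fixed-size table, used only for illustration.
class SmallTable {
public:
    // Non-const overload: returns an lvalue reference, so the result of
    // the call may appear on the left side of an assignment.
    int& operator[](std::size_t i) {
        assert(i < 4);
        return m_slots[i];
    }

    // Const overload: read-only access for const objects.
    const int& operator[](std::size_t i) const {
        assert(i < 4);
        return m_slots[i];
    }

private:
    int m_slots[4] = {0, 0, 0, 0};
};

// Writes through the reference returned by operator[] and reads it back.
int store_and_load(int value) {
    SmallTable t;
    t[2] = value;  // assignment to the result of a function call
    return t[2];
}

static_assert(std::is_same<decltype(std::declval<SmallTable&>()[0]), int&>::value,
              "non-const operator[] yields a modifiable lvalue");
static_assert(std::is_same<decltype(std::declval<const SmallTable&>()[0]), const int&>::value,
              "const operator[] yields a non-modifiable lvalue");
```

Had operator[] returned plain int, the statement t[2] = value would be rejected for the same reason as foo() = 2 in the first example.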
Modifiable lvalues
Initially, when lvalues were defined for C, the term literally meant "a value suitable for the left-hand side of an assignment". Later, however, when ISO C added the const keyword, this definition had to be refined. After all:
const int a = 10; // 'a' is an lvalue
a = 10;           // ERROR: but it cannot be assigned to
So not all lvalues can be assigned to. Those that can are called modifiable lvalues. Formally, the C99 standard defines a modifiable lvalue as:
[...] an lvalue that does not have array type, does not have an incomplete type, does not have a const-qualified type, and if it is a structure or union, does not have any member (including, recursively, any member or element of all contained aggregates or unions) with a const-qualified type.
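As a rough C++ analogue of this C99 rule, the standard type trait std::is_assignable can show which lvalues accept assignment. This is only a sketch under a few assumptions: the struct names Plain and HasConstMember are made up, and in C++ the struct case is rejected because the implicit copy assignment operator is deleted, not because of the C rule quoted above.

```cpp
#include <type_traits>

struct Plain { int x; };                  // hypothetical: no const members
struct HasConstMember { const int x; };   // hypothetical: one const member

// A plain int lvalue is modifiable.
static_assert(std::is_assignable<int&, int>::value,
              "int lvalue can be assigned to");

// A const-qualified lvalue is not.
static_assert(!std::is_assignable<const int&, int>::value,
              "const int lvalue cannot be assigned to");

// An lvalue of array type is not.
static_assert(!std::is_assignable<int(&)[3], int(&)[3]>::value,
              "array lvalue cannot be assigned to");

// A struct is assignable unless it contains a const member.
static_assert(std::is_assignable<Plain&, Plain>::value,
              "struct without const members can be assigned to");
static_assert(!std::is_assignable<HasConstMember&, HasConstMember>::value,
              "struct with a const member cannot be assigned to");
```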
Conversions between lvalues and rvalues
Generally speaking, language constructs that operate on object values require rvalues as arguments. For example, the binary addition operator '+' takes two rvalues as arguments and returns an rvalue:
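int a = 1;     // a is an lvalue
int b = 2;     // b is an lvalue
int c = a + b; // + needs rvalues, so a and b are converted to rvalues
               // and the result is an rvalue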
As we saw earlier, a and b are both lvalues. Therefore, in the third line, they undergo an implicit lvalue-to-rvalue conversion. All lvalues that are not arrays, functions, or of incomplete types can be converted to rvalues this way.
What about the other direction? Can rvalues be converted to lvalues? Of course not! That would violate the very nature of an lvalue according to its definition (the lack of an implicit conversion means that rvalues cannot be used in places where lvalues are expected).
This does not mean that lvalues cannot be produced from rvalues by more explicit means. For example, the unary '*' (dereference) operator takes an rvalue as its argument but produces an lvalue as its result. Consider the following valid code:
int arr[] = {1, 2};
int* p = &arr[0];
*(p + 1) = 10;   // OK: p + 1 is an rvalue, but *(p + 1) is an lvalue
Conversely, the unary '&' (address) operator takes an lvalue as an argument and produces an rvalue:
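int var = 10;
int* bad_addr = &(var + 1); // ERROR: var + 1 is an rvalue, but '&' needs an lvalue
int* addr = &var;           // OK: var is an lvalue
&var = 40;                  // ERROR: &var is an rvalue, so it cannot be assigned to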
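Both directions can be verified by the compiler. The sketch below is an illustration added here, not code from the original article. It uses decltype((expr)), which produces T& when expr is an lvalue and a plain T when it is a temporary result.

```cpp
#include <type_traits>

int arr[] = {1, 2};
int* p = &arr[0];
int var = 10;

// p + 1 is a temporary pointer value: an rvalue of type int*.
static_assert(std::is_same<decltype((p + 1)), int*>::value,
              "p + 1 is an rvalue");

// Dereferencing that rvalue produces an lvalue of type int.
static_assert(std::is_same<decltype((*(p + 1))), int&>::value,
              "*(p + 1) is an lvalue");

// var is an lvalue, but taking its address produces an rvalue of type int*.
static_assert(std::is_same<decltype((var)), int&>::value,
              "var is an lvalue");
static_assert(std::is_same<decltype((&var)), int*>::value,
              "&var is an rvalue");
```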
In C++, the '&' character plays another role as well: it is used to define reference types, called "lvalue references". Non-const lvalue references cannot be bound to rvalues, since that would require an invalid rvalue-to-lvalue conversion:
std::string& sref = std::string(); // ERROR: a non-const lvalue reference cannot bind to an rvalue
Const lvalue references, on the other hand, can be bound to rvalues. Since they are const, the value cannot be modified through the reference, so the problem of modifying an rvalue does not arise. This makes possible a very common C++ idiom: accepting values by const reference in function parameters, which avoids unnecessary copying and construction of temporary objects.
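Here is a small sketch of that idiom. The function count_spaces and the usage function are hypothetical names introduced for this example. The parameter is a const lvalue reference, so the same function accepts named strings and temporaries without copying them. The two static_assert lines use the standard trait std::is_constructible to restate the binding rules.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical helper: the const reference parameter binds to lvalues
// and rvalues alike, and no copy of the string is made.
std::size_t count_spaces(const std::string& text) {
    std::size_t n = 0;
    for (char c : text) {
        if (c == ' ') ++n;
    }
    return n;
}

// A const lvalue reference can bind to a temporary, a non-const one cannot.
static_assert(std::is_constructible<const std::string&, std::string>::value,
              "const std::string& binds to an rvalue");
static_assert(!std::is_constructible<std::string&, std::string>::value,
              "std::string& does not bind to an rvalue");

// Usage sketch: each call passes a different kind of argument.
void usage() {
    std::string name = "a b";
    assert(count_spaces(name) == 1);                  // lvalue
    assert(count_spaces(std::string("a b c")) == 2);  // temporary (rvalue)
    assert(count_spaces(name + " c d") == 3);         // result of operator+ (rvalue)
}
```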
CV-qualified rvalues
If we carefully read the part of the C++ standard that discusses lvalue-to-rvalue conversions (section 4.1 in the C++11 standard draft), we notice that it says:
An lvalue (3.10) of a non-function, non-array type T can be converted to an rvalue. [...] If T is a non-class type, the type of the rvalue is the cv-unqualified version of T. Otherwise, the type of the rvalue is T.
So what does "cv-unqualified" mean? CV-qualifier is a term used to describe the const and volatile type qualifiers.
From section 3.9.3:
Each type which is a cv-unqualified complete or incomplete object type or is void (3.9) has three corresponding cv-qualified versions of its type: a const-qualified version, a volatile-qualified version, and a const-volatile-qualified version. [...] The cv-qualified or cv-unqualified versions of a type are distinct types; however, they shall have the same representation and alignment requirements (3.9).
So how does this relate to rvalues? In C, rvalues never have cv-qualified types; only lvalues do. In C++, on the other hand, class rvalues can have cv-qualified types, while rvalues of built-in types (like int) cannot. Consider this example:
#include <iostream>

class A {
public:
    void foo() const { std::cout << "A::foo() const\n"; }
    void foo() { std::cout << "A::foo()\n"; }
};

A bar() { return A(); }
const A cbar() { return A(); }

int main()
{
    bar().foo();  // calls foo
    cbar().foo(); // calls foo (const)
}
The second line in main actually calls the foo() const method of A, because the type returned by cbar is const A, which is distinct from A. This is exactly what the last sentence of the quote above means. Note also that the return value of cbar is an rvalue, so this is an example of a cv-qualified rvalue in action.
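The difference between class and non-class rvalues can also be observed through types. The following sketch uses made-up names (Widget, make, make_const, usage) and is an illustration, not the article's code. decltype of a call to a function returning const Widget keeps the const, whereas the rvalue computed from a const int has plain type int.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical class whose two overloads report which one was chosen.
struct Widget {
    std::string which() const { return "const"; }
    std::string which() { return "non-const"; }
};

Widget make() { return Widget(); }
const Widget make_const() { return Widget(); }

// Class rvalues keep their cv-qualification.
static_assert(std::is_same<decltype(make()), Widget>::value,
              "make() yields an rvalue of type Widget");
static_assert(std::is_same<decltype(make_const()), const Widget>::value,
              "make_const() yields an rvalue of type const Widget");

// The rvalue computed from a const int is a plain int.
const int ci = 1;
static_assert(std::is_same<decltype(+ci), int>::value,
              "non-class rvalues are cv-unqualified");

// Usage sketch: overload resolution follows the constness of the rvalue.
void usage() {
    assert(make().which() == "non-const");
    assert(make_const().which() == "const");
}
```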
Rvalue references (C++11)
Rvalue references and the related concept of move semantics are among the most powerful features that C++11 adds to the language. A full discussion is beyond the scope of this modest article (you can find plenty of material by searching for "rvalue references"; some resources I found useful are this, this and especially this one), but I would still like to give a simple example, because this is a good place to show how an understanding of lvalues and rvalues helps us reason about non-trivial language concepts.
A good half of this article was spent explaining that one of the main differences between lvalues and rvalues is that lvalues can be modified and rvalues cannot. C++11 adds a crucial twist to this distinction by allowing us to have references to rvalues, and thus to modify them, in some special circumstances.
As an example, consider the simplest implementation of a dynamic array of integers. Let's look only at the methods related to the topic of this chapter:
#include <cstddef>
#include <iostream>
#include <utility>

class Intvec
{
public:
    explicit Intvec(std::size_t num = 0)
        : m_size(num), m_data(new int[m_size])
    {
        log("constructor");
    }

    ~Intvec()
    {
        log("destructor");
        if (m_data) {
            delete[] m_data;
            m_data = 0;
        }
    }

    Intvec(const Intvec& other)
        : m_size(other.m_size), m_data(new int[m_size])
    {
        log("copy constructor");
        for (std::size_t i = 0; i < m_size; ++i)
            m_data[i] = other.m_data[i];
    }

    Intvec& operator=(const Intvec& other)
    {
        log("copy assignment operator");
        Intvec tmp(other);               // copy first; this may throw
        std::swap(m_size, tmp.m_size);   // then swap, which cannot throw
        std::swap(m_data, tmp.m_data);
        return *this;
    }

private:
    // Prints the address of the object together with the message.
    void log(const char* msg)
    {
        std::cout << "[" << this << "] " << msg << "\n";
    }

    std::size_t m_size;
    int* m_data;
};
So we have the usual constructor, destructor, copy constructor and copy assignment operator (this is the canonical implementation of a copy assignment operator from the point of view of exception safety: by using the copy constructor followed by the non-throwing std::swap, it makes sure that no intermediate state with uninitialized memory can arise if an exception is thrown). All of them use a logging function so that we can see when they are actually called.
Let's run some simple code that copies the contents of v1 into v2:
Intvec v1(20);
Intvec v2;

std::cout << "assigning lvalue...\n";
v2 = v1;
std::cout << "ended assigning lvalue...\n";
And here is what we will see:
assigning lvalue...
[0x28fef8] copy assignment operator
[0x28fec8] copy constructor
[0x28fec8] destructor
ended assigning lvalue...
This makes sense, since it faithfully reflects what happens inside operator=. But suppose we want to assign some rvalue to v2:
std::cout << "assigning rvalue...\n";
v2 = Intvec(33);
std::cout << "ended assigning rvalue...\n";
Although here I just assign a freshly constructed vector, this stands for the more general case where some temporary rvalue is built and then assigned to v2 (this can happen, for example, when a function returns a vector). This is what gets printed:
assigning rvalue...
[0x28ff08] constructor
[0x28fef8] copy assignment operator
[0x28fec8] copy constructor
[0x28fec8] destructor
[0x28ff08] destructor
ended assigning rvalue...
Ouch, this looks like a lot of work. In particular, there is one extra pair of constructor and destructor calls to create and then destroy the temporary object. That is a shame, because inside the copy assignment operator another temporary copy is created and destroyed. All that extra work is for nothing.
Well, no more. C++11 gives us rvalue references, with which we can implement "move semantics", and in particular a "move assignment operator" (this explains why I kept calling our operator= the "copy assignment operator": in C++11 the distinction becomes important). Let's add another operator= to Intvec:
Intvec& operator=(Intvec&& other)
{
    log("move assignment operator");
    std::swap(m_size, other.m_size);
    std::swap(m_data, other.m_data);
    return *this;
}
The && syntax denotes the new rvalue reference. It does exactly what its name suggests: it gives us a reference to an rvalue, which is going to be destroyed after the call. We can use this fact to simply "steal" the internals of the rvalue; it won't need them anyway. This prints:
assigning rvalue...
[0x28ff08] constructor
[0x28fef8] move assignment operator
[0x28ff08] destructor
ended assigning rvalue...
What happens here is that the new move assignment operator is invoked, since an rvalue is assigned to v2. The constructor and destructor calls are still needed for the temporary object created by Intvec(33), but the second temporary inside the assignment operator is no longer needed. The operator simply swaps the rvalue's internal buffer with its own, so the rvalue's destructor releases our object's old buffer, which is no longer used. Neat.
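How the compiler chooses between the two operator= overloads can be shown in isolation. The sketch below uses a made-up pair of overloads named category and a usage function, both hypothetical. An lvalue argument selects the const reference overload and an rvalue argument selects the && overload. std::move, from the standard header <utility>, does not move anything by itself: it only casts an lvalue so that it is treated as an rvalue.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overload pair mirroring copy assignment and move assignment.
std::string category(const std::string&) { return "lvalue"; }
std::string category(std::string&&) { return "rvalue"; }

// Usage sketch: the argument's value category selects the overload.
void usage() {
    std::string s = "text";
    assert(category(s) == "lvalue");                   // named object
    assert(category(std::string("tmp")) == "rvalue");  // temporary
    assert(category(s + "!") == "rvalue");             // result of operator+
    assert(category(std::move(s)) == "rvalue");        // lvalue cast to rvalue
}
```

If only the const reference overload existed, all four calls would still compile and select it, which is why the original Intvec worked, just less efficiently.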
I would like to stress once more that this example is only the tip of the iceberg of move semantics and rvalue references. As you can guess, it is a complex subject with many special cases and pitfalls. My aim here was only to demonstrate a very interesting application of the difference between lvalues and rvalues in C++. The compiler can obviously tell them apart and takes care of invoking the correct constructor or operator at compile time.
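The same idea applies to construction. The sketch below shows a move constructor on a made-up class Buffer, a stripped-down stand-in for Intvec. The class, its accessors and the usage function are assumptions for illustration only. Instead of swapping, it takes over the source's storage and leaves the source empty, so the source's destructor has nothing to free.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical minimal buffer, standing in for Intvec.
class Buffer {
public:
    explicit Buffer(std::size_t n) : m_size(n), m_data(new int[n]()) {}
    ~Buffer() { delete[] m_data; }

    // Move constructor: adopt the source's storage and reset the source.
    Buffer(Buffer&& other) noexcept
        : m_size(other.m_size), m_data(other.m_data)
    {
        other.m_size = 0;
        other.m_data = nullptr;
    }

    // Copying is disabled to keep the sketch short.
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    std::size_t size() const { return m_size; }
    const int* data() const { return m_data; }

private:
    std::size_t m_size;
    int* m_data;
};

// Usage sketch: no elements are copied, only the pointer changes owner.
void usage() {
    Buffer a(8);
    const int* storage = a.data();
    Buffer b(std::move(a));  // std::move casts the lvalue a to an rvalue
    assert(b.size() == 8 && b.data() == storage);  // b owns the same storage
    assert(a.size() == 0 && a.data() == nullptr);  // a was left empty
}
```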
Conclusion
You can write a lot of C++ code without worrying about the difference between rvalues and lvalues, dismissing them as obscure compiler jargon in error messages. However, as this article has tried to show, a better grasp of the topic gives a deeper understanding of certain C++ constructs and makes parts of the C++ standard, and discussions between language experts, more intelligible.
In C++11 the topic matters even more, because C++11 introduces rvalue references and move semantics. To really understand these new language features, a solid understanding of rvalues and lvalues is essential.