
Pointers, references, and arrays in C and C++: dotting the i's

In this post I will try to get to the bottom, once and for all, of such subtle concepts in C and C++ as pointers, references, and arrays. In particular, I will answer the question of whether C arrays are pointers or not.

Legend and Assumptions




Pointers and references


Pointers. I will not explain what pointers are. :) We will assume that you know this. I will only remind you of the following (all code examples are assumed to be inside some function, for example, main):
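 int x;
 int *y = &x; // The address of any variable can be taken with the address-of operation "&". It returns a pointer
 int z = *y; // A pointer can be dereferenced with the dereference operation "*". It returns the object the pointer points to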


I will also recall the following: char is always exactly one byte, and in all C and C++ standards sizeof (char) == 1 (but the standards do not guarantee that a byte contains exactly 8 bits :)). Further, if we add a number to a pointer to some type T, the actual numerical value of the pointer increases by that number multiplied by sizeof (T). That is, if p is of type T *TYPE (in this notation TYPE stands in for the name being declared, so T *TYPE is the type "pointer to T"), then p + 3 is equivalent to (T *)((char *)p + 3 * sizeof (T)). Similar considerations apply to subtraction.
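
The scaling rule above can be checked directly. The following is a minimal sketch, not part of the original article; the helper advance_by_bytes and the other function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: move p forward by n elements using byte arithmetic only.
template <typename T>
T *advance_by_bytes(T *p, std::size_t n)
{
    return (T *)((char *)p + n * sizeof(T));
}

// Checks that built-in pointer arithmetic agrees with the byte-level formula.
void pointer_arithmetic_demo()
{
    int ints[8] = {0};
    int *p = ints;
    assert(p + 3 == advance_by_bytes(p, 3));

    // The same holds for subtraction.
    double doubles[8] = {0};
    double *q = doubles + 5;
    assert(q - 2 == (double *)((char *)q - 2 * sizeof(double)));
}

// Number of bytes between p and p + 3 for a pointer to double.
std::size_t bytes_in_three_steps()
{
    double doubles[4] = {0};
    double *p = doubles;
    return (std::size_t)((char *)(p + 3) - (char *)p);
}
```

Calling pointer_arithmetic_demo() from main should pass silently, and bytes_in_three_steps() should return 3 * sizeof (double).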

References. Now about references. References are the same thing as pointers, but with a different syntax and some other important differences, which are discussed below. The following code is no different from the previous one, except that it uses references instead of pointers:
 int x; int &y = x; int z = y; 


If a reference stands to the left of the assignment sign, there is no way to tell whether we want to assign to the reference itself or to the object it refers to. Therefore such an assignment always assigns to the object, not to the reference. This does not apply to the initialization of a reference: there, of course, the reference itself is initialized. Consequently, once a reference has been initialized it cannot itself be changed, i.e. a reference is always constant (but its object is not).
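
A small sketch of the difference between initializing a reference and assigning through it. The function name is hypothetical and chosen only for this example.

```cpp
#include <cassert>

int assign_through_reference()
{
    int a = 1;
    int b = 2;
    int &r = a;       // initialization: r is bound to a, once and for all
    r = b;            // assignment: copies the value of b into a
    b = 5;            // a is unaffected, because r never started referring to b
    assert(&r == &a); // r still designates a
    return a;
}
```

Here assign_through_reference() returns 2: the assignment r = b changed the object a, not the binding of r.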

Lvalue. Expressions that can be assigned to are called lvalues in C, C++, and many other languages (this is short for “left value”, i.e. to the left of the equals sign). The remaining expressions are called rvalues. Variable names are obviously lvalues, but they are not the only ones. The expressions a[i + 2], some_struct.some_field, *ptr, *(ptr + 3) are lvalues too.

The amazing fact is that references and lvalues are, in a sense, the same thing. Let's reason it through. What is an lvalue? It is something that can be assigned to. That is, a fixed place in memory where something can be put. That is, an address. That is, a pointer or a reference (as we already know, pointers and references are two syntactically different ways in C++ to express the concept of an address). And a reference rather than a pointer, since a reference can be placed to the left of the equals sign, and that means assignment to the object it refers to. So an lvalue is a reference.

And what is a reference? It is one of the syntaxes for an address, i.e., again, a place where something can be put. And a reference can be placed to the left of the equals sign. So a reference is an lvalue.

Okay, but (almost any) variable can also stand to the left of the equals sign. So is (such a) variable a reference? Almost. The expression that denotes the variable is a reference.

In other words, suppose we declare int x. Now x is a variable of type int TYPE and nothing else. It is an int and that's it. But if I now write x + 2 or x = 3, then in these expressions the subexpression x has type int &TYPE. Otherwise this x would be no different from, say, 10, and (like 10) nothing could be assigned to it.

This principle (“an expression that is a variable is a reference”) is my own invention. That is, I have not seen it in any textbook, standard, etc. Nevertheless, it simplifies a lot and is convenient to treat as correct. If I were implementing a compiler, I would simply treat variables in expressions as references, and quite possibly that is exactly what real compilers do.

Moreover, it is convenient to assume that a special data type for lvalues (i.e. references) exists even in C. That is what we will assume from here on. It is just that the concept of a reference cannot be expressed syntactically in C: a reference cannot be declared.

The principle “any lvalue is a reference” is also my invention. But the principle “any reference is an lvalue” is a perfectly legitimate, generally accepted one (of course, the reference must refer to a modifiable object, and that object must allow assignment).

Now, with these conventions in place, let us state the rules for working with references precisely. If, say, int x is declared, then the expression x has type int &TYPE. If this expression (or any other expression of reference type) stands to the left of the equals sign, it is used as a reference. In almost all other cases (for example, in x + 2), x is automatically converted to type int TYPE (another operation next to which a reference is not converted to its object is &, as we will see later). Only a reference can stand to the left of the equals sign. Only a reference can initialize a (non-const) reference.
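
As a side note that is not in the original article: C++11's decltype happens to report types in a way that matches this convention. Applied to a parenthesized lvalue expression it yields a reference type, while for a non-lvalue such as x + 2 it yields a plain type. The standard itself words this in terms of value categories rather than reference-typed expressions, so treat the sketch below only as an illustration of the convention. The function name is hypothetical.

```cpp
#include <type_traits>

bool lvalue_expressions_behave_like_references()
{
    int x = 0;
    int *p = &x;
    int a[3] = {0, 0, 0};

    // decltype(name) reports the declared type of the variable...
    static_assert(std::is_same<decltype(x), int>::value, "declared type of x is int");
    // ...while decltype((expression)) reports T& when the expression is an lvalue.
    static_assert(std::is_same<decltype((x)), int &>::value, "the expression x is an lvalue");
    static_assert(std::is_same<decltype(*p), int &>::value, "*p is an lvalue");
    static_assert(std::is_same<decltype(a[1]), int &>::value, "a[1] is an lvalue");
    // Not an lvalue: the result of x + 2 cannot be assigned to.
    static_assert(std::is_same<decltype(x + 2), int>::value, "x + 2 is not an lvalue");
    // & applied to an lvalue gives a pointer of the same type.
    static_assert(std::is_same<decltype(&x), int *>::value, "& turns the lvalue into a pointer");

    (void)p; // p and a are only used inside decltype
    (void)a;
    return std::is_same<decltype((x)), int &>::value;
}
```

All checks here are resolved at compile time; the function body exists only so that the local declarations have somewhere to live.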

Operations * and &. Our conventions let us take a fresh look at the operations * and &. The following now becomes clear: the * operation can be applied only to a pointer (that much was always known), and it returns a reference to the same type. & is always applied to a reference and returns a pointer of the same type. Thus * and & turn pointers and references into each other. That is, in essence they do nothing at all and only replace an entity of one syntax with an entity of the other! So it is not quite correct to call & the address-of operation: it can only be applied to an already existing address, and it merely changes the syntactic form of that address.

Note that pointers and references are declared as int *x and int &x. Thus the principle “declaration mirrors use” is confirmed once again: the declaration of a pointer is a reminder of how to turn it into a reference, and the declaration of a reference is a reminder of the opposite.

Also note that &*EXPR (here EXPR is an arbitrary expression, not necessarily a single identifier) is equivalent to EXPR whenever it makes sense (that is, whenever EXPR is a pointer), and *&EXPR is likewise equivalent to EXPR whenever it makes sense (i.e., whenever EXPR is a reference).
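
The two cancellation rules can be tried out with a short sketch. The function name is hypothetical and used only for this illustration.

```cpp
bool star_and_ampersand_cancel_out()
{
    int x = 42;
    int *p = &x;

    bool pointer_round_trip = (&*p == p); // &*EXPR, where EXPR is a pointer
    *&x = 7;                              // *&EXPR, where EXPR is an lvalue; same as x = 7

    return pointer_round_trip && x == 7 && *p == 7;
}
```

The function should return true: &*p is the same pointer as p, and assigning through *&x is the same as assigning to x.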

Arrays


So, there is such a data type - an array. Arrays are defined, for example, as follows:
 int x[5]; 

In C89 and C++98 the expression in square brackets must be a compile-time constant. Moreover, the brackets must contain a number; in a definition like this one, with no initializer, empty square brackets are not allowed.

Just as all local variables (recall that we assume all code examples are inside functions) live on the stack, arrays live on the stack too. That is, the code above allocates, right on the stack, one whole memory block of size 5 * sizeof (int) that holds our entire array. Do not think that this code declares a pointer to memory located somewhere far away, on the heap. No, we declared an array, a real one. Right here on the stack.

What is sizeof (x)? Of course, it equals the size of our array, i.e. 5 * sizeof (int). If we write
 struct foo { int a[5]; int b; }; 

then, again, the space for the array is allocated entirely inside the structure, and sizeof of this structure will confirm it.

You can take the address of an array (&x), and it will be a real pointer to the place where the array is located. The type of the expression &x is, as is easy to see, int (*TYPE)[5]. The zero element of the array is located at its very beginning, so the address of the array itself and the address of its zero element are numerically equal. That is, &x and &(x[0]) are numerically equal (I wrote the expression &(x[0]) here rather casually; it is actually not that simple, and we will return to it). But these expressions have different types, int (*TYPE)[5] and int *TYPE, so comparing them with == will not compile. You can, however, use the trick with void *: the following expression is true: (void *)&x == (void *)&(x[0]).
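
The size and address claims above can be verified with a short sketch. The struct foo mirrors the one in the text; the function name is hypothetical.

```cpp
struct foo
{
    int a[5];
    int b;
};

// The array is stored inside the struct itself, not behind a pointer.
static_assert(sizeof(foo) >= 6 * sizeof(int), "foo holds five ints plus one more");

bool array_address_matches_zero_element()
{
    int x[5] = {0};
    static_assert(sizeof(x) == 5 * sizeof(int), "sizeof reports the whole array");

    int (*whole)[5] = &x; // pointer to the entire array
    int *zero = &(x[0]);  // pointer to the zero element

    // whole == zero would not compile (different pointer types),
    // so both sides are converted to void * first.
    return (void *)whole == (void *)zero;
}
```

The static_assert lines are checked by the compiler, and the function should return true at run time.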

Well, let's assume I have convinced you that an array is an array and not something else. Where, then, does all the confusion between pointers and arrays come from? The point is that in almost any operation the name of an array is converted to a pointer to its zero element.

So, we declared int x[5]. If we now write x + 0, our x (which had type int TYPE[5], or, more precisely, int (&TYPE)[5]) is converted to &(x[0]), i.e. to a pointer to the zero element of the array x. After the conversion, x has type int *TYPE.

Converting an array name to void * or applying == to it also first converts the name to a pointer to the zero element, therefore:
 &x == x // compile error, different types: int (*TYPE)[5] and int *TYPE
 (void *)&x == (void *)x // true
 x == x + 0 // true
 x == &(x[0]) // true


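Operation []. The expression a[b] is always equivalent to *(a + b) (recall that we do not consider overloaded operator[] and other overloaded operations). Thus, x[2] means the following: x[2] is equivalent to *(x + 2). The addition x + 2 is one of the operations in which an array name is converted to a pointer to its zero element, so that conversion happens first. Then, as explained earlier, x + 2 is equivalent to (int *)((char *)x + 2 * sizeof (int)), i.e. the pointer moves forward by two ints. Finally, * dereferences the shifted pointer and yields the object located there.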


The types of expressions involved are as follows:
 x // int (&TYPE)[5], after conversion: int *TYPE
 x + 2 // int *TYPE
 *(x + 2) // int &TYPE
 x[2] // int &TYPE


Also note that what stands to the left of the square brackets does not have to be an array; it can be any pointer. For example, you can write (x + 2)[3], and this is equivalent to x[5]. I also note that *a and a[0] are always equivalent, both when a is an array and when a is a pointer.
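
A minimal sketch of these equivalences. It assumes an array of six elements (not five, as elsewhere in the text) so that index 5 stays inside the array; the function name is hypothetical.

```cpp
bool subscript_is_pointer_arithmetic()
{
    int x[6] = {10, 11, 12, 13, 14, 15};
    int *p = x + 2; // an ordinary pointer into the array

    return x[2] == *(x + 2)    // a[b] is *(a + b)
        && &x[2] == x + 2      // and both name the same address
        && (x + 2)[3] == x[5]  // [] works on any pointer, not only on arrays
        && *p == p[0]          // *a and a[0] for a pointer
        && *x == x[0];         // *a and a[0] for an array
}
```

The function should return true for any contents of the array.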

Now, as promised, I return to &(x[0]). It is now clear that in this expression x is first converted to a pointer, then [0] is applied to that pointer according to the algorithm above, yielding a value of type int &TYPE, and finally & converts it to type int *TYPE. So using this complex expression (inside which an array is already converted to a pointer) to explain the somewhat simpler notion of converting an array to a pointer was a bit of a cheat.

And now a trick question: what is &x + 1? Well, &x is a pointer to the entire array, so + 1 steps over the whole array. That is, &x + 1 is (int (*)[5])((char *)&x + sizeof (int [5])), i.e. (int (*)[5])((char *)&x + 5 * sizeof (int)) (here int (*)[5] is int (*TYPE)[5]). So &x + 1 is numerically equal to x + 5, not to x + 1 as one might think. Yes, as a result we point to memory outside the array (immediately after its last element), but who cares? C does not check array bounds anyway. Also note that the expression *(&x + 1) == x + 5 is true. It can also be written as (&x)[1] == x + 5. The expression *((&x)[1]) == x[5] is true as well, or, equivalently, (&x)[1][0] == x[5] (unless, of course, we catch a segmentation fault for reaching beyond our memory; formally, reading x[5] is undefined behavior :)).
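
The step size of &x + 1 can be measured without reading anything past the end of the array. This is a minimal sketch with a hypothetical function name; it only forms and compares pointers and never dereferences the one-past-the-end address.

```cpp
#include <cstddef>

bool address_of_array_steps_over_whole_array()
{
    int x[5] = {0};
    int (*next)[5] = &x + 1; // one whole array past x

    // Distance in bytes between &x and &x + 1.
    std::size_t step = (std::size_t)((char *)next - (char *)&x);

    return step == 5 * sizeof(int)            // a step of sizeof (int [5])
        && (void *)next == (void *)(x + 5);   // numerically equal to x + 5
}
```

The function should return true: the pointer moves by the size of the whole array, not by the size of one int.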

An array cannot be passed to a function as an argument. If you write int x[2] or int x[] in a function's parameter list, it is equivalent to int *x, and a pointer is always what gets passed (sizeof of the parameter is the same as for a pointer). The array size given in the parameter list is ignored. You can happily declare int x[2] there and pass in an array of length 3.
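
A sketch of this adjustment, checked with type traits rather than with sizeof on the parameter (some compilers warn about the latter). The function names are hypothetical. Some compilers may also warn about the index 2 in third_element, which is exactly the point being illustrated: the declared size is not enforced.

```cpp
#include <cassert>
#include <type_traits>

// An array parameter is adjusted to a pointer and its size is dropped,
// so all of these spell the same function type.
static_assert(std::is_same<void (int[2]), void (int *)>::value, "int x[2] is int *x");
static_assert(std::is_same<void (int[]), void (int *)>::value, "int x[] is int *x");

// Declared with size 2, but the size is ignored.
int third_element(int x[2])
{
    static_assert(std::is_same<decltype(x), int *>::value, "x is a pointer here");
    return x[2];
}

int pass_longer_array()
{
    int a[3] = {10, 20, 30};
    assert(sizeof(a) == 3 * sizeof(int)); // in the caller, a is still a real array
    return third_element(a);              // a is converted to &(a[0])
}
```

With these definitions pass_longer_array() returns 30, the element at index 2 of the three-element array.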

However, in C++ there is a way to pass a function a reference to an array:
 void f (int (&x)[5])
 {
   // sizeof (x) here is 5 * sizeof (int)
 }

 int main (void)
 {
   int x[5];
   f (x); // OK
   f (x + 0); // Not allowed: a pointer is not an array
   int y[7];
   f (y); // Not allowed: wrong size
 }

With this kind of passing you still pass only a reference, not the array, i.e. the array is not copied. But there are several differences compared with ordinary pointer passing. A reference to an array is passed; a pointer cannot be passed instead. The array passed must have exactly the specified size. Inside the function the reference to the array behaves exactly like a reference to an array; for example, its sizeof is that of the array.

And, most interesting of all, this kind of passing can be used like this:
 #include <cstddef> // size_t

 // Computes the length of an array
 template <typename t, size_t n> size_t len (t (&a)[n]) { return n; }

The std::end function for arrays is implemented in a similar way in C++11.
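
A usage sketch for this technique. The template restates len from the text (with the unused parameter name dropped), and the two wrapper functions are hypothetical names used only for this example. std::begin and std::end are the standard C++11 functions from the iterator header.

```cpp
#include <cstddef>
#include <iterator>

// Restated from the text: the array size n is deduced from the argument type.
template <typename t, std::size_t n>
std::size_t len(t (&)[n])
{
    return n;
}

std::size_t length_of_seven_ints()
{
    int y[7] = {0};
    return len(y); // n is deduced as 7
}

bool std_end_points_past_last_element()
{
    int y[7] = {0};
    return std::end(y) == y + 7
        && std::end(y) - std::begin(y) == 7;
}
```

Passing a pointer such as y + 0 to len would not compile, because no array size can be deduced from a pointer type.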

“Pointer to an array”. Strictly speaking, a “pointer to an array” is exactly a pointer to an array and nothing else. In other words:
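 int (*a)[2]; // This is a pointer to an array. A real one. It has type int (*TYPE)[2]
 int b[2];
 int *c = b; // This is not a pointer to an array. It is just a pointer, to the first element of some array
 int *d = new int[4]; // This is not a pointer to an array either. It is just a pointer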

However, sometimes the phrase “pointer to an array” is used informally to mean a pointer to the memory area where an array is located, even if the type of that pointer does not fit. In this informal sense, c and d (and b + 0) are pointers to arrays.

Multidimensional arrays. If int x[5][7] is declared, then x is not an array of length 5 of pointers that point somewhere far away. No, x is a single monolithic 5 x 7 block placed on the stack. sizeof (x) is 5 * 7 * sizeof (int). The elements are laid out in memory as follows: x[0][0], x[0][1], x[0][2], x[0][3], x[0][4], x[0][5], x[0][6], x[1][0], and so on. When we write x[0][0], the following happens:
 x // int (&TYPE)[5][7], after conversion: int (*TYPE)[7]
 x[0] // int (&TYPE)[7], after conversion: int *TYPE
 x[0][0] // int &TYPE

The same applies to **x. I note that in expressions such as x[0][0] + 3 and **x + 3, memory is actually read only once (despite the two asterisks), at the moment the final reference of type int &TYPE is converted to plain int TYPE. That is, if we looked at the assembly generated for **x + 3, we would see only a single load from memory. **x + 3 can also be written as *(int *)x + 3.

Now look at this situation:
 int **y = new int *[5]; for (int i = 0; i != 5; ++i) { y[i] = new int[7]; } 


What is y now? y is a pointer to an array (in the informal sense!) of pointers to arrays (again, in the informal sense). There is no single 5 x 7 block anywhere; there are 5 blocks of size 7 * sizeof (int), which may be far from each other. What is y[0][0]?
 y // int **&TYPE
 y[0] // int *&TYPE
 y[0][0] // int &TYPE

Now, when we write y[0][0] + 3, memory is read twice: first from the array y and then from the array y[0], which may be far from y. The reason is that here there is no conversion of an array name to a pointer to its first element, unlike in the example with the multidimensional array x. Therefore **y + 3 is not equivalent to *(int *)y + 3 here.

Let me explain once more. x[2][3] is equivalent to *(*(x + 2) + 3). And y[2][3] is equivalent to *(*(y + 2) + 3). But in the first case our task is to find the “third element in the second row” of a single 5 x 7 block (elements are of course numbered from zero, so this third element is in a sense the fourth :)). The compiler computes that the required element is in fact at position 2 * 7 + 3 in this block and fetches it. That is, x[2][3] is equivalent to ((int *)x)[2 * 7 + 3], or, equivalently, *((int *)x + 2 * 7 + 3). In the second case, element 2 of the array y is fetched first, and then element 3 of the resulting array.

In the first case, when we compute x + 2, we immediately shift by 2 * sizeof (int [7]), i.e. by 2 * 7 * sizeof (int). In the second case, y + 2 is a shift by 2 * sizeof (int *).

In the first case (void *)x, (void *)*x (and (void *)&x!) are all the same pointer; in the second case they are not.
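
The contrast between the two layouts can be sketched as follows. The function names are hypothetical. To stay on the safe side the offsets are compared as byte addresses instead of indexing through (int *)x, and the second function frees everything it allocates.

```cpp
#include <cstddef>

// One monolithic block: row 2, column 3 sits 2 * 7 + 3 ints from the start.
bool monolithic_block_checks()
{
    int x[5][7] = {};
    x[2][3] = 42;

    bool same_start = (void *)x == (void *)*x && (void *)x == (void *)&x;
    bool flat_offset = (char *)&x[2][3] == (char *)x + (2 * 7 + 3) * sizeof(int);
    bool row_step = (char *)(x + 2) - (char *)x == (std::ptrdiff_t)(2 * 7 * sizeof(int));

    return same_start && flat_offset && row_step && *(*(x + 2) + 3) == 42;
}

// Five separately allocated rows reached through an array of pointers.
bool separate_rows_checks()
{
    int **y = new int *[5];
    for (int i = 0; i != 5; ++i) {
        y[i] = new int[7](); // zero-initialized row
    }
    y[2][3] = 42;

    bool different_start = (void *)y != (void *)*y; // y and y[0] are different blocks
    bool row_step = (char *)(y + 2) - (char *)y == (std::ptrdiff_t)(2 * sizeof(int *));
    bool element_ok = *(*(y + 2) + 3) == 42;

    for (int i = 0; i != 5; ++i) {
        delete[] y[i];
    }
    delete[] y;
    return different_start && row_step && element_ok;
}
```

Both functions should return true: the same expression *(*(p + 2) + 3) works for x and for y, but the step sizes and the starting addresses behave differently.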

Source: https://habr.com/ru/post/251091/

