
On forums, people often claim that 64-bit versions of programs consume more memory and more stack. The usual argument is that the data has become twice as large. That claim is unfounded, since most types in C/C++ (char, short, int, float) keep the same size on 64-bit systems. Pointers have grown, of course, but a program's data does not consist of pointers alone. The real reasons for the growth in memory and stack consumption are more complex, so I decided to look into the matter in more detail.
In this article I will talk about the stack; in future articles I plan to discuss memory allocation and the size of the binary code. I should also note right away that the article covers C/C++ and the Visual Studio development environment.
Until recently, I thought that 64-bit code could consume stack at most twice as fast as 32-bit code. Based on this assumption, I recommended in my articles doubling the program's stack just in case. Now I have discovered an unpleasant fact: stack consumption can grow by considerably more than a factor of two. This surprised me, because I had regarded a twofold increase as the most pessimistic scenario. Why my hopes were unfounded will become clear below. First, let us look at how parameters are passed when functions are called in a 64-bit program.
When the calling conventions for the x86-64 architecture were designed, it was decided to put an end to the variety of ways to call a function. Win32 had a number of calling conventions: stdcall, cdecl, fastcall, thiscall, and so on. Win64 has only one native calling convention, and modifiers like __cdecl are ignored by the compiler. I think everyone will agree that such a sharp reduction in the number of conventions is a noble thing.
The x86-64 calling convention resembles the fastcall convention in x86. In the x64 convention, the first four integer arguments (from left to right) are passed in 64-bit registers chosen specifically for this purpose:
RCX: 1st integer argument
RDX: 2nd integer argument
R8: 3rd integer argument
R9: 4th integer argument
The remaining integer arguments are passed on the stack. The “this” pointer counts as an integer argument, so it is always placed in the RCX register. If floating-point values are passed, the first four of them go in the XMM0-XMM3 registers and the subsequent ones go on the stack. Note that the register is chosen by the argument's position: a floating-point value passed as the second argument goes in XMM1, not XMM0.
From this I had previously concluded that a 64-bit program can in many cases save stack memory compared with a 32-bit one. After all, if parameters are passed in registers, the function is short, and there is no need to store the arguments in memory (on the stack), then the amount of stack used should decrease. But it does not.
Although arguments can be passed in registers, the compiler still reserves space for them on the stack by decreasing the value of the RSP register (the stack pointer). At a minimum, every function that makes a call must reserve 32 bytes on the stack (four 64-bit values corresponding to the registers RCX, RDX, R8, and R9). This space makes it easy to save the contents of the registers passed to the function. The called function is not required to spill the register parameters to the stack, but the reserved space lets it do so when necessary. If more than four integer parameters are passed, the corresponding additional space must be reserved on the stack as well.
Consider an example. A function passes two integer parameters to a child function. The compiler places the argument values in the RCX and RDX registers and at the same time subtracts 32 bytes from RSP. The called function can access the parameters through RCX and RDX. If its code needs these registers for some other purpose, it can copy their contents into the reserved 32 bytes of stack space.
This feature significantly increases the rate of stack consumption. Even if a function has no parameters, 32 bytes are still “bitten off” the stack and then never used. I have not grasped the point of such an inefficient mechanism. Something is said about uniformity and simpler debugging, but rather vaguely.
Let us note one more thing. The RSP stack pointer must be aligned on a 16-byte boundary before another function is called. Thus, the total amount of stack used when calling a function without parameters in 64-bit code is 8 (return address) + 8 (alignment) + 32 (reserve for arguments) = 48 bytes!
Consider what this can lead to in practice. Here and below I use Visual Studio 2010 for the experiments. Let us write a recursive function of the following form:
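#include <cstddef>
#include <iostream>
using namespace std;

// Recurses until the stack overflows; the last number printed is the depth reached.
void StackUse(size_t *depth)
{
  volatile size_t *ptr = 0;
  if (depth != NULL)
    ptr = depth;
  cout << *ptr << endl;
  (*ptr)++;
  StackUse(depth);
  (*ptr)--;
}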
The function is a bit convoluted so that the optimizer does not turn it into “nothing”. The key point is this: the function has one pointer argument and one local variable, also a pointer. Let's see how much stack the function consumes in the 32-bit and 64-bit versions, and how many times it can call itself recursively with a 1-megabyte stack (the default size).
Release 32-bit: the last number displayed (the stack depth) is 51331.
The compiled code uses 20 bytes of stack per call of this function.
Release 64-bit: the last number displayed is 21288.
The compiled code uses 48 bytes of stack per call of this function.
Thus, the 64-bit version of the StackUse function is more than twice as voracious as the 32-bit one.
Note that the changed data alignment rules can also affect the amount of stack consumed. Suppose a function takes a structure as an argument:
struct S
{
  char a;
  size_t b;
  char c;
};
void StackUse(S s) {...}
Because of the changed alignment rules and the larger 'b' member, the size of the 'S' structure grows from 12 to 24 bytes when the code is recompiled in 64-bit mode. The structure is passed to the function by value, so its copy on the stack also takes twice as much memory.
Is it all that bad? No. We should not forget that the 64-bit compiler has more registers at its disposal. Let's complicate the code of the experimental function:
void StackUse(size_t *depth, char a, int b)
{
  volatile size_t *ptr = 0;
  int c = 1;
  int d = -1;
  // Extra arithmetic that gives the compiler more live values to keep track of.
  for (int i = 0; i < b; i++)
    for (char j = 0; j < a; j++)
      for (char k = 0; k < 5; k++)
        if (*depth > 10 && k > 2)
        {
          c += j * k - i;
          d -= (i - j) * c;
        }
  if (depth != NULL)
    ptr = depth;
  cout << c << " " << d << " " << *ptr << endl;
  (*ptr)++;
  StackUse(depth, a, b);
  (*ptr)--;
}
Results of the run:
Release 32-bit: the last number displayed is 16060.
The compiled code uses 64 bytes of stack per call of this function.
Release 64-bit: the last number displayed is 21310.
The compiled code still uses 48 bytes of stack per call of this function.
In this example, the 64-bit compiler was able to use the additional registers and generate more efficient code, which reduced the amount of stack memory used!
Conclusions
- It is impossible to predict how much stack memory the 64-bit version of a program will use compared with the 32-bit one. It may be smaller (which is unlikely) or much larger.
- For a 64-bit program, it is worth increasing the reserved stack size by a factor of 2-3 just in case; 3 is better for peace of mind. The project settings have a Stack Reserve Size parameter for this (the /STACK:reserve linker switch). The default stack size is 1 megabyte.
- You should not worry that a 64-bit program consumes more stack memory. 64-bit systems have much more physical memory. A 2-megabyte stack on a 64-bit system with 8 gigabytes of memory takes up a smaller share of memory than a 1-megabyte stack on a 32-bit system with 2 gigabytes.
Additional links
- Raymond Chen. The history of calling conventions, part 5: amd64. http://www.viva64.com/go.php?url=325
- Kevin Frei. x64 ABI vs. x86 ABI (aka Calling Conventions for AMD64 & EM64T). http://www.viva64.com/go.php?url=326
- MSDN. x64 Software Conventions. http://www.viva64.com/go.php?url=327
- Wikipedia. x86 calling conventions. http://www.viva64.com/go.php?url=328