Some of you have probably heard of a freebie called the x32 ABI.
Briefly about the x32 ABI
In short, it is a way to get the advantages of a 64-bit architecture while keeping 32-bit pointers. An application can potentially use less memory, although it will not be able to address more than 4 GiB of it.
An example. In your code you define an array of integers and fill it with values. How much memory does it take? Very roughly, it works out like this:
- 32 bits: pointer + element count + N integers = N + 2 32-bit numbers
- 64 bits: pointer + element count + N integers = N + 2 64-bit numbers = 2N + 4 32-bit numbers
So the engineers thought: what if we try using 32-bit pointers on a 64-bit architecture? The x86-64 architecture has a CISC instruction set and allows this. In that case, the array above consumes 2N + 3 words of memory instead of 2N + 4. The saving is of course insignificant here, but in modern code the number of pointers of various kinds in a structure often reaches ten, and using short pointers can potentially save up to 50% of memory (in the ideal case).
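To make the rough accounting above a bit more concrete, here is a minimal sketch (the structure and field names are invented purely for illustration) of a pointer-heavy structure whose size is dominated by the pointer width of the data model:

```c
/* Minimal illustrative sketch: a pointer-heavy structure (names invented).
 * Its size is dominated by the pointer width of the data model. */
#include <stdint.h>

struct node {
    struct node *next;      /* 4 bytes on x86/x32, 8 bytes on x86-64 */
    struct node *prev;
    char        *name;
    void        *payload;
    uint32_t     flags;     /* fixed 32 bits on every data model */
};

/* Roughly: 4*4 + 4 = 20 bytes on x86 and x32,
 *          4*8 + 4 = 36 bytes on x86-64, padded to 40.
 * The more pointers a structure carries, the closer the saving gets to 50%. */
```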
For those who want more precise calculations:
How big are arrays (and values) in PHP? (Hint: VERY BIG)
How much memory do objects consume in PHP, and is it worth using the 64-bit version?
But, as it turned out, there is no freebie after all.
Translation of the article Debunking x32 myths
There were many comments on my previous article about the x32 ABI. Some of them are interesting, while others simply show that their authors do not understand what they are writing about. I got the impression that something like a cargo cult has formed around this topic. People think: "They are doing it for some reason, so I can use it too," without the technical literacy to evaluate that supposed benefit.
So, in the same spirit in which I walked through ccache almost four years ago (wow, my blog has existed for that many years; have I really done nothing since?), I will try to debunk the myths and misconceptions around the x32 ABI.
x32 ABI code is faster
Not quite. At the moment we only have a handful of benchmark results, published by the very people who created the ABI. Of course you would expect that those who spent the time setting up the system found it interesting and faster, but, frankly, I have doubts about the results, for reasons that will become clear from the next few sentences.
It is also interesting to note that, even though the overall measurements came out faster, the difference is not fundamental. Even Intel's presentation shows big differences only in comparison with plain x86, which is already known to be worse than x86-64. Also, these results were obtained with synthetic benchmarks rather than from real use of the system, and, as you surely know, such results can be wildly misleading.
x32 ABI code is smaller
The new ABI generates smaller code, which means more instructions fit into the processor cache and the files are smaller. This is absolutely wrong. The generated code is generally the same as for x86-64, because the instruction set does not change; only the so-called data model changes, meaning the size of long (and related types) and the size of pointers (but with that, the size of the available address space changes too).
It is theoretically true that if you use smaller data structures, more of them fit into the data cache (though not into the instruction cache, rest assured (translator's note: the CISC front end will immediately expand the short instructions into long ones)), but is that the right approach? In my experience, if your code is cache-hungry, it is better to focus on laying your data out cache-friendly in the first place. You can use the dev-util/dwarves utilities from Arnaldo (acme): pahole, for example, will show you how your data structures are laid out in memory.
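As a hedged illustration of that point, here is the kind of padding hole pahole reports when run on a binary built with debug info; the structures are invented for the example, and simply reordering the fields recovers the space without any ABI change:

```c
/* Invented example of the padding holes pahole reports on LP64 (x86-64). */
#include <stdint.h>

struct wasteful {
    uint8_t  kind;        /* 1 byte, followed by 7 bytes of padding */
    void    *data;        /* 8 bytes */
    uint8_t  flags;       /* 1 byte, followed by 7 bytes of tail padding */
};                        /* total: 24 bytes */

struct reordered {
    void    *data;        /* 8 bytes */
    uint8_t  kind;        /* 1 byte */
    uint8_t  flags;       /* 1 byte, followed by 6 bytes of tail padding */
};                        /* total: 16 bytes -- saved without a new ABI */
```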
Also remember that, for compatibility, the system calls are kept the same as on x86-64, which means that all the kernel code and system data structures you use will be the same as for x86-64. So a large number of structures will not change their size under the new ABI (translator's note: application binary interface).
Finally, if you turn to the presentation again, you can see on slide 24 that x32 ABI code can even be longer than the original x86 code. It would have been nice if they had also included the x86-64 code for comparison (since I am not fluent in VCISC (translator's note: meaning the group of 64-bit CISC instructions)), but I suspect it would be about the same code.
Out of interest, let's compare the size of libc.so.6. Here is the output of the rbelf-size utility from my Ruby-Elf suite:
       exec   data  rodata  relro    bss  overhead  allocated  filename
    1239436   7456  341974  13056  17784     94924    1714630  /lib/libc.so.6
    1259721   4560  316187   6896  12884     87782    1688030  x32/libc.so.6
There is even more executable code in the x32 version. The big change is of course in the data sections (data, rodata, relro and bss), since pointers are now shorter. Honestly, I am even a little concerned: how can a C library have so many pointers in its own structures? But that is off topic. Even with the shorter pointers, the difference is not that big. In total you save something like 30 KiB, which is unlikely to change the memory-mapping picture.
Reducing data size is useful
Well, yes, this is the main question. Of course data structures are smaller with x32; that is what it was made for, after all. But the real question is: does it matter that much? I do not think so. Even in the example above with the C library, where the difference is noticeable, it is only about 20% of the occupied space. And that is the C library, which you would expect to have much smaller interfaces than a typical application.
Now, if you add up all the other libraries, you might save a couple of megabytes of data, sure, but you also have to weigh in all the porting problems I am going to discuss shortly. Yes, it is true that C++ and most languages with a virtual machine will benefit more, especially when copying objects, thanks to the shortened pointers, but for now that is quite a stretch. Especially since most of your data buffers need to be aligned to at least 8 bytes (64 bits) to use the new instructions, and you already align them to 16 bytes (128 bits) in order to use some SIMD instruction sets.
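A hedged sketch of that alignment point (the structure is invented for illustration): once a buffer is aligned to 16 bytes for SIMD, the padding swallows most of what the shorter pointer would have saved:

```c
/* Invented example: a 16-byte-aligned SIMD buffer inside a structure. */
#include <stdint.h>
#include <stdalign.h>

struct sample_block {
    char              *label;         /* 8 bytes on x86-64, 4 on x32 */
    uint32_t           length;
    alignas(16) float  samples[64];   /* 256 bytes, must start at a 16-byte boundary */
};

/* x86-64: 8 + 4 + 4 bytes of padding + 256 = 272 bytes
 * x32:    4 + 4 + 8 bytes of padding + 256 = 272 bytes
 * The alignment padding eats the saving from the shorter pointer. */
```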
And for those who think that x32 will save disk space: remember that you cannot have a "pure" x32 system; what you get is a mixture of three approaches: x86, x86-64 and x32.
It just is not meant for applications that use more than 4 GiB of memory
Yes, of course, that may be true. But seriously, are you really that worried about the size of your pointers? If you want to make sure an application does not use more than a certain amount of memory, use the system limits! They are certainly less "heavy" than creating a whole new ABI.
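For illustration, a minimal sketch of what "use the system limits" could look like on Linux; the 2 GiB cap is an arbitrary value chosen for the example:

```c
/* Minimal sketch: cap the address space of the current process with
 * setrlimit() instead of relying on 32-bit pointers; 2 GiB is arbitrary. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit lim = {
        .rlim_cur = 2UL * 1024 * 1024 * 1024,  /* soft limit: 2 GiB */
        .rlim_max = 2UL * 1024 * 1024 * 1024,  /* hard limit: 2 GiB */
    };

    if (setrlimit(RLIMIT_AS, &lim) != 0) {
        perror("setrlimit");
        return 1;
    }

    /* From here on, allocations fail once the process exceeds the cap. */
    return 0;
}
```

The same cap can be set from a shell wrapper with ulimit -v, without touching the program at all.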
Interestingly, there are two different, opposite approaches for applications that use less than 4 GiB of memory within a full 64-bit address space:
- ASLR (Address Space Layout Randomization), which deliberately loads the various objects of an application across a wide range of addresses (translator's note: i.e. scatters them around memory),
- and Prelink, which makes every unique object in the system always load at the same address, which is really the exact opposite of what ASLR does.
Applications use long, but they do not need a 64-bit address space
(Translator's note: the author means a 64-bit long.)
And, of course, the solution is to create a new ABI for this, according to some people.
I am not going to deny that a lot of applications still use long without thinking about why. They may only need small ranges of numbers and yet use wide types such as long, perhaps because they learned programming on systems where long is a synonym for int, or even on systems where long is 32-bit and int is 16-bit (hello, MS-DOS!).
The solution to this problem is simple: use the standard types provided by stdint.h, such as uint32_t and int16_t. That way you always get exactly the data size you expect. It also works on more systems than you might think, and plays well with FFI and other techniques.
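A minimal sketch of that advice (the structure is invented for illustration): with the fixed-width types from stdint.h the field sizes stop depending on which data model you compile for:

```c
/* Invented example: fixed-width types keep sizes stable across data models. */
#include <stdint.h>
#include <stdio.h>

struct record {
    uint32_t id;        /* always 32 bits: on x86, x86-64 and x32 */
    int16_t  delta;     /* always 16 bits */
    uint16_t flags;     /* always 16 bits */
};

int main(void)
{
    /* sizeof(long) is 4 on x86/x32 and 8 on x86-64; sizeof(struct record)
     * stays 8 bytes everywhere. */
    printf("long: %zu, record: %zu\n", sizeof(long), sizeof(struct record));
    return 0;
}
```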
There is not that much inline assembly anyway
Several people told me this after my previous post, where I complained that with the new ABI we would lose most of the inline assembly. The statement may sound plausible, but in fact there is not as little of it as you think. Even if we set aside all the multimedia software, there is cryptographic software that makes heavy use of SIMD through inline assembly (and not through compiler optimization).
There is also a problem with inline assembly in things like Ruby: Ruby 1.9 does not compile on x32. With Ruby 1.8 the situation is more interesting, because it compiles but segfaults at startup. Doesn't that remind you of anything? In addition, the C library itself ships with a large amount of assembly. And the only reason not much of it needs porting is simple: H.J. Lu, who maintains most of it, is one of the authors of the new ABI, which means that code has already been ported.
x32 ABI will be compatible with x86, if not now then in the future
Well, I did not mention this one before, but it is one of the misconceptions that got thrown at me. Fortunately, the presentation helps here: slide 22 makes it clear that the new ABI is not compatible. Among other things, you may notice that the ABI at least fixes some actual mistakes of x86, including the use of 32-bit data types for off_t and friends. Again, I touched on this topic a little two years ago.
This is the future of 64-bit processors.
No. Again, look at the slides, especially slide 10. This is clearly aimed at specialized, proprietary systems rather than at being a replacement for x86-64! How do you feel now?
Porting is trivial, you only need to change a few lines of inline assembly and the size of the pointers
This is not the case. Porting requires solving a number of other issues, and inline assembly is just the tip of the iceberg. Breaking the assumption that pointers on x86-64 are 64-bit is a big task in itself, though not as big as you might think at first glance (the same goes for Windows), compared to the FFI implementations of C bindings. Remember I said that this is not an easy answer?
The processor executes 32-bit instructions better than 64-bit ones
Interestingly, the only processor that Intel's presentation claims performs better with 32-bit instructions is the Atom. I quote: "The latency of 64-bit IMUL operations is twice that of 32-bit operations on Atom."
So what is IMUL? It is a signed multiplication instruction. Do you multiply pointers? That would be pointless. Besides, pointers are not signed. So you are telling me that you are worried about one platform (the Atom) that has higher latencies when people use 64-bit data where 32-bit data would do? And your solution to this problem is to create a whole new ABI in which it is hard to use 64-bit types at all, instead of simply fixing whatever causes these problems in the program?
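A small sketch of that last point about fixing the program instead (the function names are invented): on x86-64 the width of the multiply follows the width of the operands, so using a fixed 32-bit type where the value range allows already avoids the slow 64-bit IMUL, with no new ABI involved:

```c
/* Invented example: operand width, not the ABI, decides which IMUL is used. */
#include <stdint.h>

int64_t mul_wide(int64_t a, int64_t b)
{
    return a * b;    /* compiled as a 64-bit imul */
}

int32_t mul_narrow(int32_t a, int32_t b)
{
    return a * b;    /* compiled as a 32-bit imul, even on x86-64 */
}
```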
I should probably stop here, since this last remark about Atom and IMUL will appeal to many people who understand the new interface only superficially.
UPD: I just tried building PHP on my virtual machine with the Gentoo x32 ABI RC. Like Ruby, it does not compile.