Type-Punning Functions in C

C has a reputation as an inflexible language. But did you know that you can change the order of a function's arguments in C if you don't like it?

    #include <math.h>
    #include <stdio.h>

    double DoubleToTheInt(double base, int power) {
        return pow(base, power);
    }

    int main() {
        // cast to a function pointer type with the argument order reversed
        double (*IntPowerOfDouble)(int, double) =
            (double (*)(int, double))&DoubleToTheInt;

        printf("(0.99)^100: %lf \n", DoubleToTheInt(0.99, 100));
        printf("(0.99)^100: %lf \n", IntPowerOfDouble(100, 0.99));
    }

This code never actually defines a function called IntPowerOfDouble, because no such function exists. IntPowerOfDouble is a variable that points to DoubleToTheInt, but with a type that says it wants the int argument to come before the double argument.

You might expect IntPowerOfDouble to take its arguments in the same order as DoubleToTheInt but convert them to the other types, or something like that. But that is not what happens.

Try it - you will see the same result in both lines.

    emiller@gibbon ~> clang something.c
    emiller@gibbon ~> ./a.out
    (0.99)^100: 0.366032
    (0.99)^100: 0.366032

Now try changing every int to float, and you will see that FloatPowerOfDouble does something even stranger. That is,

    #include <math.h>
    #include <stdio.h>

    double DoubleToTheFloat(double base, float power) {
        return pow(base, power);
    }

    int main() {
        double (*FloatPowerOfDouble)(float, double) =
            (double (*)(float, double))&DoubleToTheFloat;

        printf("(0.99)^100: %lf \n", DoubleToTheFloat(0.99, 100)); // OK
        printf("(0.99)^100: %lf \n", FloatPowerOfDouble(100, 0.99)); // ...
    }

gives:

    (0.99)^100: 0.366032
    (0.99)^100: 0.000000

The value on the second line is “not even wrong”: if the problem were merely that the arguments had been swapped, we would expect the answer to be 100^0.99 = 95.5, not 0. What is going on?

The code examples above demonstrate type punning of functions, a dangerous form of "assembly without assembly" that should never be used at work, near heavy machinery, or in combination with prescription drugs. The examples make perfect sense to anyone who understands the code at the assembly level, but they will most likely confuse everyone else.

I cheated a little: I assumed that you would run the code on a 64-bit x86 computer. On another architecture the trick may not work. Although C is said to have an infinite number of dark corners, the behavior with int and double arguments is definitely not part of the C standard (calling a function through a pointer of an incompatible type is undefined behavior). It is a result of how functions are called on modern x86 machines, and it can be used for some elegant programming tricks.
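For contrast, the reversed-argument call can also be written in a way that stays inside the C standard, using a thin wrapper. This is only a sketch: the name `IntPowerOfDoublePortable` is made up for the illustration, and a plain multiplication loop stands in for `pow` (an assumption, so that the snippet has no library dependencies and only handles non-negative powers).

```c
/* Portable counterpart to the pointer-cast trick: no ABI assumptions. */

/* Raises base to a non-negative integer power by repeated multiplication.
   Assumption: this loop stands in for pow() to keep the sketch
   self-contained. */
double DoubleToTheInt(double base, int power) {
    double result = 1.0;
    for (int i = 0; i < power; i++) {
        result *= base;
    }
    return result;
}

/* Hypothetical wrapper: the same computation with the arguments in the
   opposite order. */
double IntPowerOfDoublePortable(int power, double base) {
    return DoubleToTheInt(base, power);
}
```

Calling `IntPowerOfDoublePortable(100, 0.99)` returns the same value as `DoubleToTheInt(0.99, 100)` on any conforming compiler, at the cost of one extra call that an optimizer will usually inline.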

This is not my signature

If you studied C at university, you may remember that the arguments are passed to functions on the stack. The caller puts the arguments on the stack in reverse order, and the function reads the arguments from the stack.

At least, that is how it was explained to me, but most computers today pass the first few arguments directly in CPU registers. That way the function does not need to read them from the stack, which is much slower than reading registers.

The number and location of the registers used for function arguments depend on the calling convention. Windows has one convention: four registers for floating-point values and four registers for pointers and integers, with the caveat that a register is chosen by the argument's position, so the Nth argument goes into the Nth register of whichever set matches its type. Unix has another convention, called the System V convention. It uses eight registers for floating-point arguments and six more for pointers and integers. (Arguments that do not fit in the registers are passed on the stack the old-fashioned way.)

In C, header files exist only to tell the compiler where to put the function arguments, often in some combination of registers and stack. Each calling convention has its own algorithm for placing arguments in registers and on the stack. Unix, for example, is very aggressive about breaking up structures and trying to fit all their fields into registers, while Windows is a bit lazier and simply passes a pointer to a large structure parameter.

On Unix, the basic algorithm works like this (Windows is similar, except that it assigns registers by argument position, so the reordering tricks in this article depend on the System V behavior):

The floating-point arguments are placed, in order, in the SSE registers, labeled XMM0, XMM1, etc.
Integers and pointers are placed, in order, in the general-purpose registers, labeled RDI, RSI, etc.
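The two rules above can be sketched as a toy allocator. This is a simplified model under stated assumptions, not the real ABI algorithm: it handles only scalar arguments, follows the System V x86-64 register order (RDI, RSI, RDX, RCX, R8, R9 and XMM0 to XMM7), and uses a made-up one-letter encoding for argument types. The function name `assign_registers` is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy model of System V x86-64 register assignment for scalar arguments.
   kinds holds one letter per argument: 'i' = integer or pointer,
   'd' = double, 'f' = float (an encoding invented for this sketch).
   Writes one register name per argument into out, separated by spaces.
   Unknown letters and arguments that do not fit in a register are
   reported as "stack". Returns out. */
char *assign_registers(const char *kinds, char *out, size_t out_size) {
    static const char *const int_regs[] = {
        "rdi", "rsi", "rdx", "rcx", "r8", "r9"
    };
    size_t next_int = 0;  /* next free general-purpose register */
    size_t next_xmm = 0;  /* next free SSE register */

    if (out_size == 0) {
        return out;
    }
    out[0] = '\0';
    for (const char *k = kinds; *k != '\0'; k++) {
        char reg[32];
        if (*k == 'i' && next_int < 6) {
            snprintf(reg, sizeof reg, "%s", int_regs[next_int++]);
        } else if ((*k == 'd' || *k == 'f') && next_xmm < 8) {
            snprintf(reg, sizeof reg, "xmm%zu", next_xmm++);
        } else {
            snprintf(reg, sizeof reg, "stack");
        }
        size_t used = strlen(out);
        if (used + strlen(reg) + 2 > out_size) {
            break;  /* no room left in out */
        }
        if (used > 0) {
            out[used++] = ' ';
        }
        strcpy(out + used, reg);
    }
    return out;
}
```

With this model, the signature `"di"` (a double followed by an int) yields `xmm0 rdi`, and `"id"` yields `rdi xmm0`: the order of the names changes, but each value ends up in the same register either way.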

Let's see how the arguments to the DoubleToTheInt function are passed.

The signature of the function is as follows:

  double DoubleToTheInt(double base, int power);

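When the compiler encounters DoubleToTheInt(0.99, 100), it arranges the registers as follows:

RDI	RSI	RDX	RCX
100	???	???	???
XMM0	XMM1	XMM2	XMM3
0.99	???	???	???

(For brevity, I show only the first four registers of each kind. Under the System V convention, integer arguments go to RDI, RSI, RDX, RCX, R8 and R9, in that order.) If instead there were a function like this: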

  double DoubleToTheDouble(double base, double power);

the arguments would be arranged like this:

RDI	RSI	RDX	RCX
???	???	???	???
XMM0	XMM1	XMM2	XMM3
0.99	100.0	???	???

Now you may have guessed why the little trick from the beginning of the article works. Consider the following function signature:

  double IntPowerOfDouble(int y, double x);

For the call IntPowerOfDouble(100, 0.99), the compiler will arrange the registers like this:

RDI	RSI	RDX	RCX
100	???	???	???
XMM0	XMM1	XMM2	XMM3
0.99	???	???	???

In other words, exactly as for DoubleToTheInt(0.99, 100)!
Because the compiled function has no idea how it was called, only where in registers and on the stack to expect its arguments, we can call a function with its arguments in a different order by casting a pointer to the function to the wrong (but ABI-compatible) function signature.

In fact, as long as the integer arguments and the floating-point arguments each keep their relative order, we can interleave them however we like, and the register layout will be the same. That is,

double functionA(double a, double b, float c, int x, int y, int z);

will have the same register layout as:

double functionB(int x, double a, int y, double b, int z, float c);

and the same as:

double functionC(int x, int y, int z, double a, double b, float c);

In all three cases the registers will hold:

RDI	RSI	RDX	RCX
`int x`	`int y`	`int z`	???
XMM0	XMM1	XMM2	XMM3
`double a`	`double b`	`float c`	???
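The claim that functionA, functionB and functionC share a register layout can be checked with a small model. This is a sketch under assumptions: signatures are written in a made-up one-letter encoding ('i' = int, 'd' = double, 'f' = float), every argument is assumed to fit in a register, and the helper names `register_layout` and `same_register_layout` are hypothetical. The model compares the exact type letters, so a float slot and a double slot count as different.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Builds a canonical layout string for a scalar-only signature: the
   integer-class letters in order, then '|', then the floating-point
   letters in order. This mirrors the two register files being filled
   independently of each other. */
static void register_layout(const char *sig, char *out, size_t out_size) {
    size_t n = 0;
    if (out_size == 0) {
        return;
    }
    for (const char *p = sig; *p != '\0' && n + 1 < out_size; p++) {
        if (*p == 'i') {
            out[n++] = *p;
        }
    }
    if (n + 1 < out_size) {
        out[n++] = '|';
    }
    for (const char *p = sig; *p != '\0' && n + 1 < out_size; p++) {
        if (*p == 'd' || *p == 'f') {
            out[n++] = *p;
        }
    }
    out[n] = '\0';
}

/* True when both signatures place the same types in the same registers,
   assuming all arguments are passed in registers. */
bool same_register_layout(const char *a, const char *b) {
    char layout_a[64];
    char layout_b[64];
    register_layout(a, layout_a, sizeof layout_a);
    register_layout(b, layout_b, sizeof layout_b);
    return strcmp(layout_a, layout_b) == 0;
}
```

In this encoding functionA is `"ddfiii"`, functionB is `"ididif"` and functionC is `"iiiddf"`. All three reduce to the layout `iii|ddf`, while `"df"` and `"fd"` do not match each other.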

Note that both double- and single-precision arguments occupy XMM registers, but they are not ABI-compatible with each other. So, going back to the second code example at the very beginning, the reason FloatPowerOfDouble returned zero (not 95.5) is that the compiler put the single-precision (32-bit) value 100.0 in XMM0 and the double-precision (64-bit) value 0.99 in XMM1, while the called function expected a double-precision number in XMM0 and a single-precision number in XMM1. As a result, exponent bits were read as mantissa bits, mantissa bits were cut off or read as exponent bits, and the function ended up raising a Very Small Number to the power of a Very Large Number, producing zero. Mystery solved.

Pay attention to the ??? entries in the tables above. The values of these registers are undefined: they can hold anything left over from previous calculations. The called function does not care what is in them, and it is free to overwrite them during execution.

This creates an interesting possibility: in addition to calling a function with its arguments in a different order, you can also call a function with a different number of arguments. There are several reasons why you might want to do something so crazy.
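The bit-level mix-up described above can be reproduced in well-defined C by copying the raw bits around with `memcpy`. This is a sketch that assumes IEEE 754 `float` and `double` formats, and it further assumes that the upper 32 bits of the register are zero after a 32-bit float is loaded, which is common but not guaranteed. The function names are made up for the illustration.

```c
#include <stdint.h>
#include <string.h>

/* What a callee sees when it reads a double from a register that was
   loaded with a 32-bit float. Assumption: the upper 32 bits are zero. */
double float_bits_read_as_double(float f) {
    uint32_t low;
    uint64_t wide;
    double d;
    memcpy(&low, &f, sizeof low);
    wide = low;                /* zero-extend the float's bits to 64 bits */
    memcpy(&d, &wide, sizeof d);
    return d;
}

/* What a callee sees when it reads a float from a register that was
   loaded with a double: only the low 32 bits are used. */
float double_bits_read_as_float(double d) {
    uint64_t wide;
    uint32_t low;
    float f;
    memcpy(&wide, &d, sizeof wide);
    low = (uint32_t)wide;      /* keep only the low 32 bits */
    memcpy(&f, &low, sizeof f);
    return f;
}
```

Under these assumptions, `float_bits_read_as_double(100.0f)` is a positive subnormal number far below 1e-300 and `double_bits_read_as_float(0.99)` is larger than 1e30, so passing the pair to `pow` underflows to zero.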

Dial 1-800-I-Really-Enjoy-Type-Punning

Try it:

    #include <math.h>
    #include <stdio.h>

    double DoubleToTheInt(double x, int y) {
        return pow(x, y);
    }

    int main() {
        double (*DoubleToTheIntVerbose)(
            double, double, double, double, int, int, int, int) =
            (double (*)(double, double, double, double, int, int, int, int))&DoubleToTheInt;

        printf("(0.99)^100: %lf \n", DoubleToTheIntVerbose(
            0.99, 0.0, 0.0, 0.0, 100, 0, 0, 0));
        printf("(0.99)^100: %lf \n", DoubleToTheInt(0.99, 100));
    }

Unsurprisingly, both lines give the same result: all the arguments fit in registers, and the register layout is the same.

Now the fun begins. We can define a new "verbose" function type that can call many different kinds of functions, provided that the arguments fit in registers and the functions return the same type.

    #include <math.h>
    #include <stdio.h>

    typedef double (*verbose_func_t)(double, double, double, double, int, int, int, int);

    int main() {
        verbose_func_t verboseSin = (verbose_func_t)&sin;
        verbose_func_t verboseCos = (verbose_func_t)&cos;
        verbose_func_t verbosePow = (verbose_func_t)&pow;
        verbose_func_t verboseLDExp = (verbose_func_t)&ldexp;

        printf("Sin(0.5) = %lf\n",
            verboseSin(0.5, 0.0, 0.0, 0.0, 0, 0, 0, 0));
        printf("Cos(0.5) = %lf\n",
            verboseCos(0.5, 0.0, 0.0, 0.0, 0, 0, 0, 0));
        printf("Pow(0.99, 100) = %lf\n",
            verbosePow(0.99, 100.0, 0.0, 0.0, 0, 0, 0, 0));
        printf("0.99 * 2^12 = %lf\n",
            verboseLDExp(0.99, 0.0, 0.0, 0.0, 12, 0, 0, 0));
    }

This type compatibility is convenient because we can, for example, create a simple calculator that dispatches to any function that accepts and returns double-precision numbers:

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef double (*four_arg_func_t)(double, double, double, double);

    int main(int argc, char **argv) {
        four_arg_func_t verboseFunction = NULL;
        if (argc < 2) {
            return 1;  // no function name given
        }
        if (strcmp(argv[1], "sin") == 0) {
            verboseFunction = (four_arg_func_t)&sin;
        } else if (strcmp(argv[1], "cos") == 0) {
            verboseFunction = (four_arg_func_t)&cos;
        } else if (strcmp(argv[1], "pow") == 0) {
            verboseFunction = (four_arg_func_t)&pow;
        } else {
            return 1;
        }

        // unused slots stay zero; at most four arguments are read
        double xmm[4] = {0.0, 0.0, 0.0, 0.0};
        int i;
        for (i = 2; i < argc && i < 6; i++) {
            xmm[i-2] = strtod(argv[i], NULL);
        }
        printf("%lf\n", verboseFunction(xmm[0], xmm[1], xmm[2], xmm[3]));
        return 0;
    }

Checking:

    emiller@gibbon ~> clang calc.c
    emiller@gibbon ~> ./a.out pow 0.99 100
    0.366032
    emiller@gibbon ~> ./a.out sin 0.5
    0.479426
    emiller@gibbon ~> ./a.out cos 0.5
    0.877583

It is not quite a competitor to Mathematica, but you can imagine a more elaborate version with a table of function names and corresponding function pointers: to add a new function, you would only need to update the table rather than explicitly call the new function in code.

Other uses include JIT compilers. If you have ever worked through an LLVM tutorial, you may have unexpectedly run into this message:
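A table-driven version along those lines might look like the sketch below. It relies on the same register-passing behavior as the calculator above, so it is not portable C. The functions `square` and `add` are stand-ins invented for the illustration (any double-returning function whose arguments are all doubles, such as `sin` or `pow`, could be listed instead), and `calc_table` and `calc_lookup` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Every entry is stored as a four-double function pointer. Calling a
   one- or two-argument function through this type depends on the
   calling convention; it is outside the C standard. */
typedef double (*four_arg_func_t)(double, double, double, double);

/* Example operations (hypothetical stand-ins for library functions). */
static double square(double x) { return x * x; }
static double add(double x, double y) { return x + y; }

struct calc_entry {
    const char *name;
    four_arg_func_t func;
};

/* Adding a new operation means adding one line to this table. */
static const struct calc_entry calc_table[] = {
    {"square", (four_arg_func_t)&square},
    {"add",    (four_arg_func_t)&add},
};

/* Returns the function registered under name, or NULL if there is none. */
four_arg_func_t calc_lookup(const char *name) {
    size_t count = sizeof calc_table / sizeof calc_table[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(calc_table[i].name, name) == 0) {
            return calc_table[i].func;
        }
    }
    return NULL;
}
```

A caller would look up the name from the command line, check for NULL, and then invoke the result with four doubles, passing zero for the slots the operation does not use.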

"Full-featured argument passing not supported yet!"

LLVM is adept at turning code into machine code and loading that machine code into memory, but it is not very flexible when it comes to calling a function that has been loaded into memory. With LLVMRunFunction, you can call main()-like functions (integer argument, pointer argument, pointer argument, returns an integer), but not much else. Most tutorials recommend wrapping your compiled function in a function that looks like main(), stashing all your arguments behind the pointer argument, and using the wrapper to pull the arguments out of the pointer and call the real function.

But with our new knowledge of x86 registers, we can simplify the ceremony and get rid of the wrapper function in many cases. Instead of checking that a function belongs to a restricted list of C-callable signatures (int main(), int main(int), int main(int, void *), etc.), we can create a function pointer whose signature fills all of the parameter registers, which makes it compatible with every function that passes its arguments only in registers, and call the function through it, passing zero (or anything else) for the unused arguments. We only need to define a separate type for each return type, rather than for each possible function signature, and we can then call functions more flexibly using a technique that would otherwise require assembly.
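One way to picture the "one pointer type per return type" idea is the sketch below. It assumes the System V x86-64 convention (six integer registers and eight SSE registers) and is not portable C. All names here are invented for the illustration, and `scale` and `sum3` merely stand in for functions that a JIT would have produced.

```c
/* One "fill every argument register" pointer type per return type. */
typedef double (*all_regs_double_fn)(long, long, long, long, long, long,
                                     double, double, double, double,
                                     double, double, double, double);
typedef long (*all_regs_long_fn)(long, long, long, long, long, long,
                                 double, double, double, double,
                                 double, double, double, double);

/* Generic function pointer for an address whose real signature is only
   known at run time. */
typedef void (*any_fn)(void);

/* Calls fn as a double-returning function. ints[0..5] and dbls[0..7]
   hold the integer and floating-point arguments in order; unused slots
   may hold anything. */
double call_returning_double(any_fn fn, const long ints[6],
                             const double dbls[8]) {
    all_regs_double_fn f = (all_regs_double_fn)fn;
    return f(ints[0], ints[1], ints[2], ints[3], ints[4], ints[5],
             dbls[0], dbls[1], dbls[2], dbls[3],
             dbls[4], dbls[5], dbls[6], dbls[7]);
}

/* The same idea for functions that return an integer. */
long call_returning_long(any_fn fn, const long ints[6],
                         const double dbls[8]) {
    all_regs_long_fn f = (all_regs_long_fn)fn;
    return f(ints[0], ints[1], ints[2], ints[3], ints[4], ints[5],
             dbls[0], dbls[1], dbls[2], dbls[3],
             dbls[4], dbls[5], dbls[6], dbls[7]);
}

/* Stand-ins for JIT-compiled functions (hypothetical). */
double scale(double x, long k) { return x * (double)k; }
long sum3(long a, long b, long c) { return a + b + c; }
```

With `ints = {3, 4, 5, 0, 0, 0}` and `dbls[0] = 1.5`, `call_returning_double((any_fn)&scale, ints, dbls)` reaches `scale(1.5, 3)` and `call_returning_long((any_fn)&sum3, ints, dbls)` reaches `sum3(3, 4, 5)`, without a wrapper for either signature.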

I'll show you one last trick before closing up shop. Try to figure out how this code works:

    #include <math.h>
    #include <stdio.h>

    double NoOp(double a) {
        return a;
    }

    int main() {
        double (*ReturnLastReturnValue)() = (double (*)())&NoOp;

        double value = pow(0.99, 100.0);
        double other_value = ReturnLastReturnValue();

        printf("Value: %lf Other value: %lf\n", value, other_value);
    }

(You may want to read up on your calling convention first ...)

Translator's theory

A function returns its floating-point result in XMM0. Nothing happens between the two calls, so XMM0 still holds the result of the previous function, which NoOp picks up as its argument and returns.

Some assembly language required

If you ever ask about assembly language on a programming forum, the usual answer is: you don't need assembly, leave it to the brilliant PhDs who write compilers. And please keep your hands where we can see them.

Compiler writers are smart people, but I think it is a mistake to assume that everyone else should studiously avoid assembly. In this short foray into type punning, we saw how register allocation and calling conventions, supposedly the exclusive concern of compiler writers, surface in C from time to time, and how that knowledge can be used to do things that ordinary C programmers would consider impossible.

But this is only the very tip of the assembly-programming iceberg, deliberately presented without a single line of assembly code, and I advise anyone who has the time to dive deeper into the topic. Assembly is the key to understanding how the CPU executes instructions (what the program counter is, what the frame pointer is, what the stack pointer is, what the registers do), and it lets you see programs in a different (clearer) light. Even basic knowledge can help you come up with solutions that would otherwise never occur to you, and to understand what is going on when you slip past the prison guards of your favorite high-level language and squint into the harsh, beautiful sun.

Source: https://habr.com/ru/post/307706/
