
C++: a session of spontaneous archaeology, and why you should not use C-style variadic functions

It all began, as usual, with an error. I was working with the Java Native Interface for the first time, and on the C++ side I wrapped a function that created a Java object. This function, CallVoidMethod, is variadic: in addition to a pointer to the JNI environment, a pointer to the type of the object being created and the identifier of the method being called (in this case, the constructor), it takes an arbitrary number of other arguments. Which makes sense, because these other arguments are passed to the called method on the Java side, and methods differ: they can take different numbers of arguments of any type.

Accordingly, I made my wrapper variadic as well. To pass its arbitrary number of arguments on to CallVoidMethod, I used va_list, because there is no other way. And I passed that va_list to CallVoidMethod. Which crashed the JVM with a commonplace segmentation fault.

Over the next two hours I tried several JVM versions, from 8 to 11: first, this was my first experience with the JVM, and on this matter I trusted StackOverflow more than myself; second, someone on StackOverflow advised using OracleJDK instead of OpenJDK in this situation, and version 10 instead of 8. Only then did I finally notice that besides the variadic CallVoidMethod there is CallVoidMethodV, which takes its arbitrary number of arguments as a va_list.
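The convention that tripped me up is common in C APIs: a "..." function usually has a twin that takes a va_list (printf/vprintf, CallVoidMethod/CallVoidMethodV). Here is a minimal sketch of that pairing, using hypothetical format_message/vformat_message helpers built on the standard std::vsnprintf rather than on JNI. The "..." function only starts the va_list and forwards it to the V variant, never to another "..." function.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// V variant (hypothetical name): receives an already started va_list,
// just like std::vsnprintf or JNI's CallVoidMethodV.
std::string vformat_message(const char* fmt, va_list args)
{
    va_list copy;
    va_copy(copy, args);  // each traversal consumes the list, and we need two
    const int len = std::vsnprintf(nullptr, 0, fmt, copy);
    va_end(copy);
    if (len < 0)
        return std::string();

    std::string out(static_cast<std::size_t>(len) + 1, '\0');
    std::vsnprintf(&out[0], out.size(), fmt, args);
    out.resize(static_cast<std::size_t>(len));  // drop the terminating '\0'
    return out;
}

// "..." variant (hypothetical name): packs its unnamed arguments into a
// va_list and hands them to the V variant. Passing `args` to a "..."
// function such as std::snprintf instead would repeat my JNI bug.
std::string format_message(const char* fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    std::string out = vformat_message(fmt, args);
    va_end(args);
    return out;
}
```

For example, format_message("%d-%s", 42, "x") yields "42-x".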
What I disliked most about this story is that I did not immediately notice the difference between the ellipsis and va_list. And once I noticed it, I could not explain to myself what the fundamental difference was. That meant I had to figure out the ellipsis, va_list and (since this is C++ after all) variadic templates.

What the Standard says about the ellipsis and va_list


The C++ Standard describes only how its requirements differ from those of Standard C. I will get to those differences later; for now, let me briefly retell what Standard C says (starting with C89).


Why? Because!


C does not have many types. So why is va_list declared in the Standard while nothing is said about its internal structure?

And why do we need the ellipsis at all, if an arbitrary number of arguments can be passed to a function via va_list? Today one could say "as syntactic sugar", but 40 years ago, I am sure, nobody had time for sugar.

Phillip James Plauger, in his book The Standard C Library (1992), says that C was originally created exclusively for PDP-11 computers, where all of a function's arguments could be traversed with simple pointer arithmetic. The problem appeared as C grew in popularity and compilers were ported to other architectures. The first edition of The C Programming Language by Brian Kernighan and Dennis Ritchie (1978) states explicitly:
By the way, there is no acceptable way to write a portable function of an arbitrary number of arguments, since there is no portable way for the called function to find out how many arguments were passed to it in a given call. ... printf, the most typical C function of an arbitrary number of arguments, ... is not portable and must be implemented for each system.
The book describes printf, but not yet vprintf, and does not mention the va_* type and macros. They appear in the second edition of The C Programming Language (1988), and this is the merit of the committee that developed the first C Standard (C89, aka ANSI C). The committee added the <stdarg.h> header to the Standard, basing it on <varargs.h>, which Andrew Koenig created to improve the portability of the UNIX OS. It was decided to keep the va_* macros as macros, so that existing compilers could support the new Standard more easily.

So, with the advent of C89 and the va_* family, it became possible to write portable variadic functions. And although the internal structure of this family is still not described in any way and no requirements are placed on it, it is now clear why.

Out of pure curiosity, you can look at implementations of <stdarg.h>. For example, The Standard C Library gives one for Borland Turbo C++:

<stdarg.h> from Borland Turbo C++
    #ifndef _STADARG
    #define _STADARG
    #define _AUPBND 1
    #define _ADNBND 1
    typedef char* va_list;
    #define va_arg(ap, T) \
        (*(T*)(((ap) += _Bnd(T, _AUPBND)) - _Bnd(T, _ADNBND)))
    #define va_end(ap) \
        (void)0
    #define va_start(ap, A) \
        (void)((ap) = (char*)&(A) + _Bnd(A, _AUPBND))
    #define _Bnd(X, bnd) \
        (sizeof(X) + (bnd) & ~(bnd))
    #endif
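To make the arithmetic in _Bnd easier to follow, here is a sketch that re-expresses the macro as a constexpr function of the argument size (bnd_slot is a hypothetical name). Because + binds tighter than &, the macro computes (sizeof(X) + bnd) & ~bnd, which for bnd == 1 rounds the size up to an even number of bytes. So va_start points ap just past the slot of the last named argument, and va_arg advances ap by one slot and then steps back to the start of the value it has just skipped.

```cpp
#include <cstddef>

// Hypothetical re-expression of Turbo C's _Bnd(X, bnd), with size = sizeof(X).
// The macro body (sizeof(X) + (bnd) & ~(bnd)) parses as
// ((sizeof(X) + bnd) & ~bnd) because '+' binds tighter than '&'.
constexpr std::size_t bnd_slot(std::size_t size, std::size_t bnd)
{
    return (size + bnd) & ~bnd;
}

// With _AUPBND == _ADNBND == 1, every argument occupies an even number of bytes.
static_assert(bnd_slot(1, 1) == 2, "a 1-byte object takes a 2-byte slot");
static_assert(bnd_slot(2, 1) == 2, "a 2-byte object fits exactly");
static_assert(bnd_slot(3, 1) == 4, "odd sizes are rounded up");
static_assert(bnd_slot(4, 1) == 4, "a 4-byte object fits exactly");
static_assert(bnd_slot(8, 1) == 8, "an 8-byte object fits exactly");
```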


The much newer System V ABI for AMD64 uses this type for va_list:

va_list in the System V AMD64 ABI
    typedef struct {
        unsigned int gp_offset;
        unsigned int fp_offset;
        void *overflow_arg_area;
        void *reg_save_area;
    } va_list[1];


In general, one can say that the va_* type and macros provide a standard interface for traversing the arguments of a variadic function, and that, for historical reasons, their implementation depends on the compiler, the target platform and the architecture. Moreover, the ellipsis (i.e., variadic functions in general) appeared in C earlier than va_list (i.e., the <stdarg.h> header). And va_list was created not to replace the ellipsis, but to let developers write their own portable variadic functions.
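In practice, that standard interface boils down to a handful of macros. A minimal sketch, with a hypothetical sum_twice function whose named argument says how many int values follow: va_start begins right after the last named parameter, va_arg fetches the next value of the stated type, va_copy (added in C99 and C++11) duplicates the traversal state, and every va_start or va_copy needs a matching va_end. Nothing checks that the caller really passed count ints; that is the price of the interface.

```cpp
#include <cstdarg>

// Hypothetical example: returns twice the sum of `count` int arguments,
// walking the unnamed arguments once directly and once through a copy.
long sum_twice(int count, ...)
{
    va_list args;
    va_start(args, count);  // traversal starts after the last named parameter

    va_list again;
    va_copy(again, args);   // va_list may be an array or a struct: copy with va_copy, not '='

    long total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);
    for (int i = 0; i < count; ++i)
        total += va_arg(again, int);

    va_end(again);          // each va_start / va_copy needs its own va_end
    va_end(args);
    return total;
}
```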

C++ largely maintains backward compatibility with C, so all of the above applies to it as well. But there are some peculiarities.

Variadic functions in C++


The C++ Standard was developed by the WG21 working group. Back in 1989, the freshly adopted C89 Standard was taken as a basis and gradually changed to describe C++ itself. In 1995, John Micco submitted proposal N0695, in which he proposed changing the restrictions on the va_* macros:

I did not even translate the last point, so that you can share my pain. First, the term "default argument promotions" has survived in the C++ Standard to this day [C++17 8.2.2/9]. Second, I puzzled over the meaning of this phrase for a long time, compared with Standard C, where everything is clear. Only after reading N0695 did I finally understand that it means the same thing.

In any case, all three changes were accepted [C++98 18.7/3]. In C++, the requirement that a variadic function have at least one named parameter even disappeared (in that case the remaining arguments cannot be accessed, but more on that later), and the list of valid types of unnamed arguments was extended with pointers to class members and POD types.

The C++03 Standard brought no changes to variadic functions. C++11 started converting an unnamed argument of type std::nullptr_t to void* and allowed compilers to support types with non-trivial constructors and destructors at their discretion [C++11 5.2.2/7]. C++14 allowed functions and arrays as the last named parameter [C++14 18.10/3], and C++17 forbade using a pack expansion or a variable captured by a lambda as that parameter [C++17 21.10.1/1].
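The std::nullptr_t rule is easy to check. A sketch with a hypothetical is_null_arg function: since C++11, nullptr passed through "..." arrives as a void*, so it has to be read back with va_arg(va, void*). Other pointers are cast to void* explicitly at the call site, because reading an int* as a void* would be a type mismatch.

```cpp
#include <cstdarg>

// Hypothetical example: reads exactly one unnamed argument as void*.
// A std::nullptr_t argument is converted to void* when passed through "...".
bool is_null_arg(int n, ...)
{
    va_list va;
    va_start(va, n);
    void* p = va_arg(va, void*);
    va_end(va);
    return p == nullptr;
}
```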

As a result, C++ added its own pitfalls to variadic functions. The conditionally supported, implementation-defined handling of types with non-trivial constructors/destructors alone is worth something. Below, I will try to gather all the non-obvious features of variadic functions into one list and back it up with concrete examples.

How easy it is to use variadic functions wrong


  1. It is wrong to declare the last named parameter with a type that undergoes promotion, i.e. char, signed char, unsigned char, signed short, unsigned short or float. According to the Standard, the result is undefined behavior.

    Wrong code
        void foo(float n, ...)
        {
            va_list va;
            va_start(va, n);
            std::cout << va_arg(va, int) << std::endl;
            va_end(va);
        }


    Of all the compilers that I had on hand (gcc, clang, MSVC), only clang issued a warning.

    Clang warning
        ./test.cpp:7:18: warning: passing an object that undergoes default argument promotion to 'va_start' has undefined behavior [-Wvarargs]
            va_start(va, n);
                         ^

    And although in all cases the compiled code behaved correctly, you should not count on it.

    This is correct:
        void foo(double n, ...)
        {
            va_list va;
            va_start(va, n);
            std::cout << va_arg(va, int) << std::endl;
            va_end(va);
        }

  2. It is wrong to declare the last named parameter as a reference. Any reference. Here, too, the Standard promises undefined behavior.

    Wrong code
        void foo(int& n, ...)
        {
            va_list va;
            va_start(va, n);
            std::cout << va_arg(va, int) << std::endl;
            va_end(va);
        }

    gcc 7.3.0 compiled this code without a single complaint. clang 6.0.0 issued a warning but compiled it nevertheless.

    Clang warning
        ./test.cpp:7:18: warning: passing an object of reference type to 'va_start' has undefined behavior [-Wvarargs]
            va_start(va, n);
                         ^

    In both cases the program worked correctly (pure luck; you cannot rely on it). But MSVC 19.15.26730 stood out: it refused to compile the code, because the argument of va_start must not be a reference.

    MSVC Error
     c:\program files (x86)\microsoft visual studio\2017\community\vc\tools\msvc\14.15.26726\include\vadefs.h(151): error C2338: va_start argument must not have reference type and must not be parenthesized 

    The correct version looks, for example, like this:
        void foo(int* n, ...)
        {
            va_list va;
            va_start(va, n);
            std::cout << va_arg(va, int) << std::endl;
            va_end(va);
        }

  3. It is wrong to request a type that undergoes promotion from va_arg: char, short or float.

    Wrong code
        #include <cstdarg>
        #include <iostream>

        void foo(int n, ...)
        {
            va_list va;
            va_start(va, n);
            std::cout << va_arg(va, int) << std::endl;
            std::cout << va_arg(va, float) << std::endl;
            std::cout << va_arg(va, int) << std::endl;
            va_end(va);
        }

        int main()
        {
            foo(0, 1, 2.0f, 3);
            return 0;
        }

    This is where it gets interesting. When compiling, gcc warns that double should be used instead of float and that if this code is executed anyway, the program will abort.

    gcc warning
        ./test.cpp:9:15: warning: 'float' is promoted to 'double' when passed through '...'
            std::cout << va_arg(va, float) << std::endl;
                         ^~~~~~
        ./test.cpp:9:15: note: (so you should pass 'double' not 'float' to 'va_arg')
        ./test.cpp:9:15: note: if this code is reached, the program will abort

    And indeed, the program crashes with a complaint about an illegal instruction.
    Analysis of the core dump shows that the program received a SIGILL signal. The dump also shows the structure of va_list. For 32 bits it is

     va = 0xfffc6918 "" 

    i.e. va_list is just a char*. For 64 bits:

     va = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0x7ffef147e7e0, reg_save_area = 0x7ffef147e720}} 

    i.e. exactly what the System V AMD64 ABI describes.

    When compiling, clang warns about undefined behavior and also suggests replacing float with double.

    Clang warning
        ./test.cpp:9:26: warning: second argument to 'va_arg' is of promotable type 'float'; this va_arg has undefined behavior because arguments will be promoted to 'double' [-Wvarargs]
            std::cout << va_arg(va, float) << std::endl;
                                    ^~~~~

    But the program does not crash; the 32-bit version prints:

        1
        0
        1073741824

    64-bit:

        1
        0
        3

    MSVC produces exactly the same results, but without any warning, even with /Wall.

    One could assume that the difference between 32 and 64 bits comes from the ABI: in the first case all arguments are passed to the called function on the stack, while in the second the first four (Windows) or six (Linux) arguments go through processor registers and the rest through the stack [wiki]. But no: if you call foo with 19 arguments instead of 4 and print them all, the result is the same: complete garbage in the 32-bit version and zeros for every float in the 64-bit one. So the ABI is indeed the cause, but not the use of registers for passing arguments.

    The correct way, of course, is this:
        void foo(int n, ...)
        {
            va_list va;
            va_start(va, n);
            std::cout << va_arg(va, int) << std::endl;
            std::cout << va_arg(va, double) << std::endl;
            std::cout << va_arg(va, int) << std::endl;
            va_end(va);
        }

  4. It is wrong to pass an instance of a class with a non-trivial constructor or destructor as an unnamed argument. That is, if the fate of this code worries you at least a little more than "compile and run here and now".

    Wrong code
        #include <cstdarg>
        #include <iostream>

        struct Bar
        {
            Bar() { std::cout << "Bar default ctor" << std::endl; }
            Bar(const Bar&) { std::cout << "Bar copy ctor" << std::endl; }
            ~Bar() { std::cout << "Bar dtor" << std::endl; }
        };

        struct Cafe
        {
            Cafe() { std::cout << "Cafe default ctor" << std::endl; }
            Cafe(const Cafe&) { std::cout << "Cafe copy ctor" << std::endl; }
            ~Cafe() { std::cout << "Cafe dtor" << std::endl; }
        };

        void foo(int n, ...)
        {
            va_list va;
            va_start(va, n);
            std::cout << "Before va_arg" << std::endl;
            const auto b = va_arg(va, Bar);
            va_end(va);
        }

        int main()
        {
            Bar b;
            Cafe c;
            foo(1, b, c);
            return 0;
        }

    Once again, clang is the strictest. It simply refuses to compile this code because the second argument of va_arg is not a POD type, and warns that the call will abort at runtime.

    Clang errors
        ./test.cpp:23:31: error: second argument to 'va_arg' is of non-POD type 'Bar' [-Wnon-pod-varargs]
            const auto b = va_arg(va, Bar);
                                      ^~~
        ./test.cpp:31:12: error: cannot pass object of non-trivial type 'Bar' through variadic function; call will abort at runtime [-Wnon-pod-varargs]
            foo(1, b, c);
                   ^

    And that is exactly what happens if you compile it anyway with the -Wno-non-pod-varargs flag.

    MSVC warns that using types with non-trivial constructors in this case is non-portable.

    MSVC warning
        d:\my documents\visual studio 2017\projects\test\test\main.cpp(31): warning C4840: non-portable use of class 'Bar' as an argument to a variadic function

    But the code compiles and runs correctly. The console shows the following:

    Output
        Bar default ctor
        Cafe default ctor
        Before va_arg
        Bar copy ctor
        Bar dtor
        Cafe dtor
        Bar dtor

    So a copy is created only when va_arg is called, and the argument itself, it turns out, is passed by reference. Not exactly obvious, but the Standard allows it.

    gcc 6.3.0 compiles it without a single complaint. The output is the same:

    Output
        Bar default ctor
        Cafe default ctor
        Before va_arg
        Bar copy ctor
        Bar dtor
        Cafe dtor
        Bar dtor

    gcc 7.3.0 doesn't warn about anything either, but the behavior changes:

    Output
        Bar default ctor
        Cafe default ctor
        Cafe copy ctor
        Bar copy ctor
        Before va_arg
        Bar copy ctor
        Bar dtor
        Bar dtor
        Cafe dtor
        Cafe dtor
        Bar dtor

    So this compiler version passes the arguments by value, and va_arg makes yet another copy. It would be fun to hunt for this difference when moving from gcc 6 to gcc 7 if the constructors/destructors had side effects.

    By the way, if you explicitly pass and request a reference to the class:

    Another wrong code
        void foo(int n, ...)
        {
            va_list va;
            va_start(va, n);
            std::cout << "Before va_arg" << std::endl;
            const auto& b = va_arg(va, Bar&);
            va_end(va);
        }

        int main()
        {
            Bar b;
            Cafe c;
            foo(1, std::ref(b), c);
            return 0;
        }

    then all compilers report an error, as the Standard requires.

    In general, if you really have to, it is better to pass such arguments by pointer.

    Like this
        void foo(int n, ...)
        {
            va_list va;
            va_start(va, n);
            std::cout << "Before va_arg" << std::endl;
            const auto* b = va_arg(va, Bar*);
            va_end(va);
        }

        int main()
        {
            Bar b;
            Cafe c;
            foo(1, &b, &c);
            return 0;
        }
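Putting the four rules together, here is a sketch of a C-style variadic function that stays within defined behavior. The names describe and Widget, and the one-letter type codes, are hypothetical: the last named parameter is a plain pointer (rules 1 and 2), char and float arguments are read back as int and double (rule 3), and the class with a non-trivial member travels by pointer (rule 4).

```cpp
#include <cstdarg>
#include <string>

// Hypothetical class: the std::string member makes it non-trivial,
// so it is never passed through "..." by value.
struct Widget
{
    std::string name;
};

// `kinds` lists the types of the unnamed arguments, one letter each:
// 'c' = char, 'f' = float, 'w' = Widget*. Unknown letters read nothing.
std::string describe(const char* kinds, ...)
{
    va_list va;
    va_start(va, kinds);  // last named parameter: not promotable, not a reference
    std::string out;
    for (const char* k = kinds; *k != '\0'; ++k) {
        switch (*k) {
        case 'c':  // a char argument was promoted to int
            out += static_cast<char>(va_arg(va, int));
            break;
        case 'f':  // a float argument was promoted to double
            out += std::to_string(va_arg(va, double));
            break;
        case 'w':  // class objects are passed by pointer
            out += va_arg(va, Widget*)->name;
            break;
        default:
            break;
        }
        out += ';';
    }
    va_end(va);
    return out;
}
```

With Widget w{"gear"}, the call describe("cfw", 'x', 1.5f, &w) returns "x;1.500000;gear;".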


Overload Resolution and Variadic Functions


On the one hand, everything is simple: matching an argument against an ellipsis loses to matching it against an ordinary named parameter, even when that requires a standard or user-defined conversion.

Overload example
    #include <iostream>

    void foo(...)
    {
        std::cout << "C variadic function" << std::endl;
    }

    void foo(int)
    {
        std::cout << "Ordinary function" << std::endl;
    }

    int main()
    {
        foo(1);
        foo(1ul);
        foo();
        return 0;
    }


Output
    $ ./test
    Ordinary function
    Ordinary function
    C variadic function
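The user-defined half of that claim can be checked the same way. In this sketch, Meters and which are hypothetical names, and the functions return strings instead of printing so the outcome is easy to test: converting 5 to Meters takes a user-defined conversion, yet that still ranks above the ellipsis.

```cpp
#include <string>

// Hypothetical type with a converting (user-defined) constructor from int.
struct Meters
{
    Meters(int v) : value(v) {}
    int value;
};

std::string which(...)    { return "C variadic function"; }
std::string which(Meters) { return "user-defined conversion"; }
```

which(5) picks the Meters overload; which() and which("text") can only go to the ellipsis.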

But this works only until you consider a call to foo without arguments separately.

Call foo with no arguments
    #include <iostream>

    void foo(...)
    {
        std::cout << "C variadic function" << std::endl;
    }

    void foo()
    {
        std::cout << "Ordinary function without arguments" << std::endl;
    }

    int main()
    {
        foo(1);
        foo();
        return 0;
    }

Compiler output
    ./test.cpp:16:9: error: call of overloaded 'foo()' is ambiguous
        foo();
            ^
    ./test.cpp:3:6: note: candidate: void foo(...)
    void foo(...)
         ^~~
    ./test.cpp:8:6: note: candidate: void foo()
    void foo()
         ^~~

Everything is according to the Standard: with no arguments there is nothing to match against the ellipsis, so during overload resolution the variadic function is no worse than the ordinary one.

When it is still worth using variadic functions


So, variadic functions sometimes behave in non-obvious ways and, in the context of C++, can easily turn out to be poorly portable. The Internet is full of advice like "Do not create or use C variadic functions", yet nobody is going to remove their support from the C++ Standard. So, is there any benefit to these functions? Well, there is.
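One place where "..." is still commonly used is compile-time detection with SFINAE. Because an ellipsis conversion ranks below every other conversion, a "..." overload makes a natural last-resort candidate; such functions are only named inside decltype and never called, so none of the va_* pitfalls apply. A minimal sketch, assuming C++14 and hypothetical names has_size_impl and has_size:

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Preferred candidate: viable only if T has a member function size().
// For the argument 0 it is an exact match (int).
template <class T>
auto has_size_impl(int) -> decltype(void(std::declval<T&>().size()), std::true_type{});

// Fallback: an ellipsis conversion ranks below every other conversion,
// so this overload wins only when SFINAE removes the one above.
template <class T>
std::false_type has_size_impl(...);

template <class T>
constexpr bool has_size = decltype(has_size_impl<T>(0))::value;

static_assert(has_size<std::vector<int>>, "std::vector has size()");
static_assert(!has_size<int>, "int does not");
```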


Variadic templates, or how to create functions of an arbitrary number of arguments in modern C++


The idea of variadic templates was proposed by Douglas Gregor, Jaakko Järvi and Gary Powell back in 2004, i.e. 7 years before the adoption of the C++11 Standard, which officially supported them. It was the third revision of their proposal, N2080, that made it into the Standard.

From the very beginning, variadic templates were meant to let programmers create type-safe (and portable!) functions of an arbitrary number of arguments. Another goal was to simplify support for class templates with a variable number of parameters, but here we are talking only about variadic functions.

Variadic templates brought three new concepts to C++ [C++17 17.5.3]:


Example
    template <class ... Args>
    void foo(const std::string& format, Args ... args)
    {
        printf(format.c_str(), args...);
    }

Here class ... Args is a template parameter pack, Args ... args is a function parameter pack, and args... is a pack expansion.
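The three concepts can be seen in a small sketch (count_args and sum_all are hypothetical names). sizeof... counts the elements of a pack at compile time, and before C++17 fold expressions a common way to do something with every argument was to expand the pack inside a braced initializer list, whose elements are evaluated left to right.

```cpp
#include <cstddef>

// `class ... Args` is a template parameter pack,
// `Args ... args` is a function parameter pack.
template <class ... Args>
std::size_t count_args(Args ... args)
{
    return sizeof...(args);  // number of pack elements, known at compile time
}

template <class ... Args>
long sum_all(Args ... args)
{
    long total = 0;
    // `(total += args, 0)...` is a pack expansion: the pattern is repeated
    // once per argument. The leading 0 keeps the array non-empty when the
    // pack is empty.
    const int unused[] = {0, (total += args, 0)...};
    (void)unused;
    return total;
}
```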

A complete list of where and how parameter packs can be expanded can be found in the Standard itself [C++17 17.5.3/4]. In the context of variadic functions, suffice it to say that:


An explicit ellipsis in a pack expansion is needed to support different expansion patterns and to avoid ambiguity.

For example
    template <class ... Args>
    void foo()
    {
        using OneTuple = std::tuple<std::tuple<Args>...>;
        using NestTuple = std::tuple<std::tuple<Args...>>;
    }

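For Args = int, double, OneTuple is a tuple of single-element tuples (std::tuple<std::tuple<int>, std::tuple<double>>), while NestTuple is a tuple with a single element that is itself a tuple (std::tuple<std::tuple<int, double>>).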
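The difference can be checked at compile time. This sketch lifts the two aliases from the example into a hypothetical class template Expansions, so they can be named from outside, and compares them with the expected types.

```cpp
#include <tuple>
#include <type_traits>

// The position of "..." decides which pattern gets repeated.
template <class ... Args>
struct Expansions
{
    using OneTuple  = std::tuple<std::tuple<Args>...>;  // pattern: std::tuple<Args>
    using NestTuple = std::tuple<std::tuple<Args...>>;  // pattern: Args
};

using E = Expansions<int, double>;
static_assert(std::is_same<E::OneTuple,
                           std::tuple<std::tuple<int>, std::tuple<double>>>::value,
              "one single-element tuple per type");
static_assert(std::is_same<E::NestTuple,
                           std::tuple<std::tuple<int, double>>>::value,
              "one tuple holding all the types");
```

With an empty pack, OneTuple becomes std::tuple<> and NestTuple becomes std::tuple<std::tuple<>>.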

Example of implementing printf with variadic templates


As I already mentioned, variadic templates were created, among other things, as a direct replacement for C variadic functions. Their authors themselves offered their own very simple but type-safe version of printf, one of the first variadic functions in C.

printf with templates
    void printf(const char* s)
    {
        while (*s) {
            if (*s == '%' && *++s != '%')
                throw std::runtime_error("invalid format string: missing arguments");
            std::cout << *s++;
        }
    }

    template <typename T, typename ... Args>
    void printf(const char* s, T value, Args ... args)
    {
        while (*s) {
            if (*s == '%' && *++s != '%') {
                std::cout << value;
                return printf(++s, args...);
            }
            std::cout << *s++;
        }
        throw std::runtime_error("extra arguments provided to printf");
    }

I suspect that this is where the now common pattern of iterating over variadic arguments through recursive calls to overloaded functions comes from. But I still prefer the non-recursive option.

printf with templates and without recursion
    template <typename ... Args>
    void printf(const std::string& fmt, const Args& ... args)
    {
        size_t fmtIndex = 0;
        size_t placeHolders = 0;
        auto printFmt = [&fmt, &fmtIndex, &placeHolders]() {
            for (; fmtIndex < fmt.size(); ++fmtIndex) {
                if (fmt[fmtIndex] != '%')
                    std::cout << fmt[fmtIndex];
                else if (++fmtIndex < fmt.size()) {
                    if (fmt[fmtIndex] == '%')
                        std::cout << '%';
                    else {
                        ++fmtIndex;
                        ++placeHolders;
                        break;
                    }
                }
            }
        };
        ((printFmt(), std::cout << args), ..., (printFmt()));
        if (placeHolders < sizeof...(args))
            throw std::runtime_error("extra arguments provided to printf");
        if (placeHolders > sizeof...(args))
            throw std::runtime_error("invalid format string: missing arguments");
    }

Overload Resolution and Variadic Template Functions


During overload resolution, these variadic functions are considered after all the others, both as templates and as the least specialized. But a call without arguments causes no problems.

Overload example
    #include <iostream>

    void foo(int)
    {
        std::cout << "Ordinary function" << std::endl;
    }

    void foo()
    {
        std::cout << "Ordinary function without arguments" << std::endl;
    }

    template <class T>
    void foo(T)
    {
        std::cout << "Template function" << std::endl;
    }

    template <class ... Args>
    void foo(Args ...)
    {
        std::cout << "Template variadic function" << std::endl;
    }

    int main()
    {
        foo(1);
        foo();
        foo(2.0);
        foo(1, 2);
        return 0;
    }

Output
    $ ./test
    Ordinary function
    Ordinary function without arguments
    Template function
    Template variadic function

In overload resolution, a variadic template function can beat only a C variadic function (though why mix them?). Except, of course, for a call without arguments.

Call without arguments
    #include <iostream>

    void foo(...)
    {
        std::cout << "C variadic function" << std::endl;
    }

    template <class ... Args>
    void foo(Args ...)
    {
        std::cout << "Template variadic function" << std::endl;
    }

    int main()
    {
        foo(1);
        foo();
        return 0;
    }

Output
    $ ./test
    Template variadic function
    C variadic function

If there is an argument to match against the ellipsis, the corresponding function loses; if there is none, the template function loses to the non-template one.

A quick note on the speed of variadic template functions


In 2008, Loïc Joly submitted his proposal N2772 to the C++ Standardization Committee, in which he showed in practice that variadic template functions are slower than similar functions taking an initializer list (std::initializer_list). And although this contradicted his own theoretical reasoning, Joly proposed implementing std::min, std::max and std::minmax with initializer lists rather than variadic templates.

But as early as 2009 a rebuttal appeared: a "serious error" was found in Joly's tests (apparently by Joly himself). New tests (see here and here) showed that variadic template functions are faster after all, sometimes significantly. This is not surprising, since an initializer list copies its elements, while with variadic templates much more can be computed at compile time.

Nevertheless, in C++11 and later Standards, std::min, std::max and std::minmax are ordinary template functions, and an arbitrary number of arguments is passed to them through an initializer list.
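In use, the initializer-list overloads look like this. For comparison, the sketch also includes a hypothetical variadic max_of built on a C++17 fold expression; it only illustrates the alternative that was discussed, not what the Standard library does.

```cpp
#include <algorithm>
#include <initializer_list>
#include <utility>

// Hypothetical variadic alternative: every argument is converted to T,
// and the fold over the comma operator performs one comparison per argument.
template <class T, class ... Rest>
T max_of(T first, Rest ... rest)
{
    T result = first;
    ((result = std::max<T>(result, rest)), ...);
    return result;
}

// The standard overload taking a std::initializer_list<T> (C++11).
inline std::pair<int, int> standard_extremes()
{
    return std::minmax({3, 1, 4, 1, 5});  // {smallest, largest}
}
```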

Short summary and conclusion


So, C-style variadic functions:


The only acceptable use of variadic functions is interacting with a C API from C++ code. For everything else, including SFINAE, there are variadic template functions, which:


Variadic template functions can be more verbose than their C-style counterparts and sometimes even require an overloaded non-template version (for recursive argument traversal). They are harder to read and write. But all this is more than compensated for by the absence of the disadvantages and the presence of the advantages listed above.

So the conclusion is simple: C-style variadic functions remain in C++ only for backward compatibility, and they offer a wide choice of ways to shoot yourself in the foot. In modern C++ it is highly desirable not to write new ones and, where possible, not to use existing C variadic functions. Variadic template functions belong to the world of modern C++ and are much safer. Use them.

Literature and sources



PS


Electronic versions of the books mentioned are easy to find and download online. But I am not sure that would be legal, so I do not give links.

Source: https://habr.com/ru/post/430064/

