
Passing smart pointers by const reference: a dissection

Smart pointers are often passed to other functions by const reference. C++ experts Andrei Alexandrescu, Scott Meyers, and Herb Sutter discussed this issue at the C++ and Beyond 2011 conference (watch "On shared_ptr performance and correctness" from [04:34]).

In fact, a smart pointer passed by const reference already lives in the current scope, somewhere in the calling code. If it is stored in a class member, that member may be reset while the callee is still running, leaving the reference dangling. But this is not a problem of passing by reference; it is a problem of architecture and ownership policy.
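
To make that hazard concrete, here is a minimal sketch (with hypothetical names, and using std::shared_ptr rather than the pointer class from this article) of how a const reference to a member smart pointer can dangle when the owner resets it during the call:

 #include <memory>

 struct Widget
 {
     void Do() {}
 };

 struct Owner
 {
     std::shared_ptr<Widget> Member = std::make_shared<Widget>();

     void Reset() { Member.reset(); }
 };

 // P aliases O.Member; nothing inside this function keeps the Widget alive.
 void Process( const std::shared_ptr<Widget>& P, Owner& O )
 {
     O.Reset();   // the last owner lets go, the Widget is destroyed
     P->Do();     // dangling access: undefined behavior
 }

 // Process( O.Member, O ) triggers the problem; passing the shared_ptr
 // by value would have kept the Widget alive for the duration of the call.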

But this post is not about correctness. Here we look at the performance we can gain by switching to const references. At first glance it may seem that the only benefit is the absence of atomic increments/decrements of the reference counter in the copy constructor and destructor. Let's write some code and take a closer look at what happens under the hood.


This is a translation of the article: blog.linderdaum.com/2014/07/03/smart-pointers-passed-by-const-reference


For starters, a few auxiliary functions:

 // headers needed for clock() and printf()
 #include <cstdio>
 #include <ctime>

 const size_t NUM_CALLS = 10000000;

 double GetSeconds()
 {
     return ( double )clock() / CLOCKS_PER_SEC;
 }

 void PrintElapsedTime( double ElapsedTime )
 {
     printf( "%fs/Mcalls\n", float( ElapsedTime / double( NUM_CALLS / 10000000 ) ) );
 }
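
As a side note, GetSeconds() is built on clock(), which measures processor time with an implementation-defined resolution. A minimal sketch (not from the original article; the name GetSecondsChrono is made up here) of a wall-clock alternative based on <chrono> could look like this:

 #include <chrono>

 // A possible wall-clock replacement for GetSeconds(), using a monotonic clock.
 double GetSecondsChrono()
 {
     using namespace std::chrono;

     return duration_cast<duration<double>>(
                steady_clock::now().time_since_epoch() ).count();
 }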


An intrusive reference counter:

 #include <atomic>

 class iIntrusiveCounter
 {
 public:
     iIntrusiveCounter(): FRefCounter( 0 ) {};
     virtual ~iIntrusiveCounter() {}

     void IncRefCount()
     {
         FRefCounter++;
     }
     void DecRefCount()
     {
         if ( --FRefCounter == 0 )
         {
             delete this;
         }
     }
 private:
     std::atomic<int> FRefCounter;
 };


Ad hoc smart pointer:

 template <class T>
 class clPtr
 {
 public:
     clPtr(): FObject( 0 ) {}
     clPtr( const clPtr& Ptr ): FObject( Ptr.FObject )
     {
         FObject->IncRefCount();
     }
     clPtr( T* const Object ): FObject( Object )
     {
         FObject->IncRefCount();
     }
     ~clPtr()
     {
         FObject->DecRefCount();
     }
     clPtr& operator = ( const clPtr& Ptr )
     {
         T* Temp = FObject;
         FObject = Ptr.FObject;
         Ptr.FObject->IncRefCount();
         Temp->DecRefCount();
         return *this;
     }
     inline T* operator -> () const
     {
         return FObject;
     }
 private:
     T* FObject;
 };


Simple enough so far, right?
Let's declare a simple class whose instance we will pass to a function, first by value and then by const reference:

 class clTestObject: public iIntrusiveCounter
 {
 public:
     clTestObject(): FPayload( 32167 ) {}

     // do some useful work on the payload
     void Do()
     {
         FPayload++;
     }
 private:
     int FPayload;
 };


Now we can write the benchmark itself:

 void ProcessByValue( clPtr<clTestObject> O )
 {
     O->Do();
 }

 void ProcessByConstRef( const clPtr<clTestObject>& O )
 {
     O->Do();
 }

 int main()
 {
     clPtr<clTestObject> Obj = new clTestObject;

     for ( size_t j = 0; j != 3; j++ )
     {
         double StartTime = GetSeconds();

         for ( size_t i = 0; i != NUM_CALLS; i++ )
         {
             ProcessByValue( Obj );
         }

         PrintElapsedTime( GetSeconds() - StartTime );
     }

     for ( size_t j = 0; j != 3; j++ )
     {
         double StartTime = GetSeconds();

         for ( size_t i = 0; i != NUM_CALLS; i++ )
         {
             ProcessByConstRef( Obj );
         }

         PrintElapsedTime( GetSeconds() - StartTime );
     }

     return 0;
 }


Let's build it and see what happens. First, a non-optimized version (I use gcc.EXE (GCC) 4.10.0 20140420 (experimental)):

 gcc -O0 main.cpp -lstdc++ -std=c++11 


The timings are 0.375 s/MCalls for the by-value version versus 0.124 s/MCalls for the const-reference version: a convincing 3x difference in a debug build. Not bad. Let's look at the assembly listing. The by-value version:

 L25:
     leal    -60(%ebp), %eax
     leal    -64(%ebp), %edx
     movl    %edx, (%esp)
     movl    %eax, %ecx
     call    __ZN5clPtrI12clTestObjectEC1ERKS1_           // call the copy constructor
     subl    $4, %esp
     leal    -60(%ebp), %eax
     movl    %eax, (%esp)
     call    __Z14ProcessByValue5clPtrI12clTestObjectE
     leal    -60(%ebp), %eax
     movl    %eax, %ecx
     call    __ZN5clPtrI12clTestObjectED1Ev               // call the destructor
     addl    $1, -32(%ebp)
 L24:
     cmpl    $10000000, -32(%ebp)
     jne     L25


Version of the "constant link". Pay attention to how much everything has become cleaner even in the debug build:

 L29:
     leal    -64(%ebp), %eax
     movl    %eax, (%esp)
     call    __Z17ProcessByConstRefRK5clPtrI12clTestObjectE    // just the call, nothing else
     addl    $1, -40(%ebp)
 L28:
     cmpl    $10000000, -40(%ebp)
     jne     L29


All the calls are still in place; all we saved is two rather expensive atomic operations. But debug builds are not what we care about, right? Let's turn on optimizations and see what happens:

 gcc -O3 main.cpp -lstdc++ -std=c++11 


The by-value version now runs at 0.168 s/MCalls. The execution time of the const-reference version has literally dropped to zero. This is not a mistake: no matter how many iterations we run, the execution time of this simple test stays zero. Let's look at the assembly to check that we haven't made an error somewhere. Here is the optimized by-value version:

 L25:
     call    _clock
     movl    %eax, 36(%esp)
     fildl   36(%esp)
     movl    $10000000, 36(%esp)
     fdivs   LC0
     fstpl   24(%esp)
     .p2align 4,,10
 L24:
     movl    32(%esp), %eax
     lock addl $1, (%eax)        // inlined IncRefCount()...
     movl    40(%esp), %ecx
     addl    $1, 8(%ecx)         // ProcessByValue() and Do() folded into a couple of instructions
     lock subl $1, (%eax)        // ...and inlined DecRefCount()
     jne     L23
     movl    (%ecx), %eax
     call    *4(%eax)
 L23:
     subl    $1, 36(%esp)
     jne     L24
     call    _clock


But what could the const-reference version possibly do to run so fast that we cannot even measure it? Here it is:

     call    _clock
     movl    %eax, 36(%esp)
     movl    40(%esp), %eax
     addl    $10000000, 8(%eax)  // the entire loop, folded into a single add
     call    _clock
     movl    %eax, 32(%esp)
     movl    $20, 4(%esp)
     fildl   32(%esp)
     movl    $LC2, (%esp)
     movl    $1, 48(%esp)
     flds    LC0
     fdivr   %st, %st(1)
     fildl   36(%esp)
     fdivp   %st, %st(1)
     fsubrp  %st, %st(1)
     fstpl   8(%esp)
     call    _printf


Wow! The entire benchmark fits into this listing. The absence of atomic operations allowed the optimizer to dig into this code and fold the loop into a single precomputed value. This example is trivial, of course. Still, it clearly demonstrates the two benefits of passing smart pointers by const reference, which make it not a premature optimization but a serious means of improving performance:

1) removing the atomic operations is a significant win by itself;
2) removing the atomic operations lets the optimizer restructure the surrounding code much more aggressively (see the sketch after this list).
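
The same reasoning carries over to the standard smart pointers. As a rough guideline (my summary, not part of the original article, in the spirit of Sutter's advice mentioned in the P.S. below): pass std::shared_ptr by value only when the callee is meant to share or take ownership, and by const reference (or as a plain pointer/reference to the object itself) when it only needs to use the object. A minimal sketch:

 #include <memory>

 struct Widget
 {
     void Do() {}
 };

 // Only uses the object: no reference-count traffic at all.
 void UseOnly( const std::shared_ptr<Widget>& W )
 {
     W->Do();
 }

 // Shares ownership: the single atomic increment happens at the call site,
 // where it is actually needed.
 std::shared_ptr<Widget> GKeepAlive;

 void TakeShare( std::shared_ptr<Widget> W )
 {
     GKeepAlive = std::move( W );
 }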

The full source is here.

On your compiler, the result may differ :)

P.S. Herb Sutter has a very detailed essay on this topic, which covers the language side of passing smart pointers by reference in C++ in great depth.

Source: https://habr.com/ru/post/228687/
