shared_ptr performance and C++11: why I don't trust libraries

Hello!

I was once optimizing a critical section of code, and it used boost::shared_ptr... And I realized: I don't trust libraries, even though they are written by clever people.

Details under the cut.
So, I was optimizing the code, and it contained a fragment like this:
auto pRes = boost::static_pointer_cast<TBase>(boost::allocate_shared<TDerived>(TAllocator()));
// ... do something with pRes
return std::move(pRes);

The optimization was coming to an end, so I built the release configuration and decided to look in the disassembler at what my beloved Visual Studio had compiled, expecting to see something beautiful and fast. What I saw shocked me:
; ---------------------------------------------------------------------------------------------
; Line 76: auto pRes = boost::static_pointer_cast<CBase>(boost::make_shared<CDerived>());

; ... nothing interesting - preparing the parameters
call boost::make_shared<CDerived> (0D211D0h)
; ... again nothing interesting - preparing the parameters
call boost::static_pointer_cast<CBase,CDerived> (0D212F0h)
; ... nothing interesting again - receiving the result of the call

; equivalent to an if (pRes) check; the details don't matter. What matters is that the je is NOT TAKEN
test eax, eax
je `anonymous namespace`::f+7Ah (0D210CAh) ; -> no jump, since pRes != 0
; ... nothing interesting

; Epic fail #1 - an interlocked decrement (the lock xadd below)
; This block destroys the temporary shared_ptr returned by the
; make_shared call: the reference count is decremented, followed by a conditional jump
; that is taken if the reference count is still non-zero (which is obviously our case,
; because we have just created the pointer).
lock xadd dword ptr [eax], ecx
jne `anonymous namespace`::f+7Ah (0D210CAh) ; -> jump to the next line of the C++ code

; ... next comes the potential deletion of the object, but that is dead code here

; ---------------------------------------------------------------------------------------------
; Line 78: return std::move(pRes);

; I'll skip the assembly here.
; This block first commits Epic Fail #2 - an interlocked increment, because pRes is copied
; into the return value. Then comes Epic Fail #3 - an interlocked decrement as a result
; of destroying pRes (no memory is released, of course)

I will add that I kept quiet about 3 more interlocked instructions inside the make_shared and static_pointer_cast calls... I looked at all this and felt sick. So what is going on? I deliberately invoke move constructors here, and they shuttle the reference count back and forth?

* Lyrical digression: why this is so bad.
I think everyone knows that the smart pointer shared_ptr holds a pointer to a counter of how many shared pointers refer to the same stored object. When a shared_ptr is copied, this counter is incremented; when one is destroyed, it is decremented. When the last shared pointer is destroyed, the reference count drops to zero and the stored object is deleted along with it. For all of this to work correctly in a multi-threaded environment, the reference count has to be changed with atomic operations, the very ones that carry the lock prefix in assembly: the prefix guarantees that the processor performs the whole read-modify-write as one indivisible step, and no caches get in our way. The prefix is good, only slow, very slow. It can slow an instruction down by roughly two orders of magnitude in the worst case, since the core must first gain exclusive ownership of the cache line, which means it should be used as little as possible.
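To make the counting described above concrete, here is a minimal sketch using std::shared_ptr. The function names are hypothetical and exist only for this illustration; whether each counter update actually compiles to a lock-prefixed instruction depends on the library and platform.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper: returns the reference count seen while a second owner is alive.
long count_with_copy_alive() {
    auto first = std::make_shared<int>(7);
    assert(first.use_count() == 1);          // a single owner so far
    std::shared_ptr<int> second = first;     // copy: the shared counter is incremented
    return first.use_count();
}

// Hypothetical helper: returns the reference count after the second owner is destroyed.
long count_after_copy_destroyed() {
    auto first = std::make_shared<int>(7);
    {
        std::shared_ptr<int> second = first; // counter goes up to 2
    }                                        // second destroyed: counter goes back down
    return first.use_count();
}

// Hypothetical helper: reports whether the stored object is gone
// once the last owner has been destroyed.
bool object_deleted_with_last_owner() {
    std::weak_ptr<int> observer;
    {
        auto owner = std::make_shared<int>(7);
        observer = owner;                    // a weak_ptr does not count as an owner
    }                                        // last owner destroyed: count reaches zero
    return observer.expired();
}
```

The first function returns 2, the second returns 1, and the third returns true. Every one of those counter changes is a candidate for an atomic instruction in a thread-safe implementation.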

* Lyrical digression 2: how this came about and why there should be no atomic instructions here.
C++11 gave us a very tasty feature called move semantics. You can now define move constructors that transfer data from one object to another instead of copying it. For example, such a constructor moves the pointer to the internal string buffer from one std::string to another, so a string can be moved without reallocating memory. Similarly, you can (and should!) move the reference counter from one shared_ptr to another. In that case no atomic operations are needed, because the number of pointers does not change: we simply transfer all the internal data from one pointer to the other (and the pointer we took the data from no longer points anywhere).
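A short sketch of what moving a shared_ptr looks like from the outside, using std::shared_ptr. The types Base and Derived, the MoveResult struct, and the function names are made up for this example. It shows only the observable effect (the source becomes empty, the count stays at 1), not which instructions a particular compiler emits.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical types, for illustration only.
struct Base { virtual ~Base() = default; };
struct Derived : Base {};

// What we observe after a move: is the source empty, and what is the count?
struct MoveResult {
    bool source_empty;
    long target_count;
};

// Move between pointers of the same type.
MoveResult move_same_type() {
    auto src = std::make_shared<Derived>();
    assert(src.use_count() == 1);
    std::shared_ptr<Derived> dst = std::move(src);  // move constructor
    return { src == nullptr, dst.use_count() };
}

// Move from pointer-to-derived into pointer-to-base. For an upcast like this,
// the converting move constructor is enough; no static_pointer_cast is needed.
MoveResult move_to_base() {
    auto src = std::make_shared<Derived>();
    std::shared_ptr<Base> dst = std::move(src);     // converting move constructor
    return { src == nullptr, dst.use_count() };
}
```

In both cases the source ends up empty and the target reports a count of 1: ownership was transferred, so the number of owners never changed.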

So how did this come about... Probably an oversight. I wanted to write a tearful letter to the boost developers, and even started on it... But then I found something that finished me off completely. When a boost::shared_ptr is created, the get_deleter function compares types via typeid (oh gods!). I don't know how it is for them, but my compiler implements that comparison with strcmp (sad, isn't it?).

Then I decided to measure the speed of the standard library against boost. Two times! boost::make_shared is twice as slow as std::make_shared! Why, you ask? As far as I can tell, it's simple: boost allocates memory twice, once for the reference counter and once for the stored object itself, whereas the standard library allocates a single block that holds both. And memory allocation is slow. A verbal plus goes to Microsoft, and it earns another one because smart pointers in its standard library work as they should: the move constructor performs no atomic operations. Creating a pointer is lock-free... Well, almost. They did not quite manage static_pointer_cast: it copies the pointer even though it could move it. I solved this problem by patching the library; the patch is not portable to other platforms, but it conforms to the standard. You can download it here: pastebin.com/XZaE2cnW (works in MSVC2010).
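The difference between one allocation and two can be seen with a counting allocator. The sketch below uses only the standard library and says nothing about boost itself. CountingAllocator, g_allocations, Widget, and both function names are hypothetical, introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical global counter: number of allocate() calls made so far.
static int g_allocations = 0;

// Hypothetical minimal allocator that counts every allocation request.
template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

struct Widget { int value = 42; };

// The object is allocated first; the shared_ptr constructor then allocates
// a separate control block through the same allocator.
int allocations_two_step() {
    g_allocations = 0;
    CountingAllocator<Widget> alloc;
    Widget* raw = alloc.allocate(1);
    new (raw) Widget();
    auto deleter = [](Widget* p) {
        p->~Widget();
        CountingAllocator<Widget>().deallocate(p, 1);
    };
    std::shared_ptr<Widget> p(raw, deleter, alloc);
    assert(p->value == 42);
    return g_allocations;
}

// allocate_shared places the object and the control block in one block.
int allocations_allocate_shared() {
    g_allocations = 0;
    auto p = std::allocate_shared<Widget>(CountingAllocator<Widget>());
    assert(p->value == 42);
    return g_allocations;
}
```

On the mainstream standard library implementations the first function reports 2 allocations and the second reports 1. The standard only recommends the single allocation rather than requiring it, so treat the second number as typical rather than guaranteed.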

PS

So, today's winner is std from MSVC2010: it ends up with one plus in total.
Boost, however, was out of luck: -1.

Well, that's goodbye from me. I hope this information was useful to at least someone. Use std::shared_ptr, allocate memory via make_shared / allocate_shared, and be happy :)

Source: https://habr.com/ru/post/138658/

