An unpleasant std :: list feature that not everyone knows

A doubly linked list is a fundamental data structure that everyone knows and uses everywhere. Everyone knows why and in which cases it is more efficient than a vector, which operations have linear complexity, and which ones are constant ...

Wait a minute, do you know what the complexity of the size () function is?
“Of course I know - O (1)!”, Many of you will answer, “What could be simpler?”

size_type size() const { return _size; }

')
Trivial, effective and safe, right?
I would implement this function in this way, most of you would do the same.

However, those who wrote the GNU STL implementation have a different opinion on this.

Here is the implementation of this function in gcc 4.x:

 /** Returns the number of elements in the %list. */ size_type size() const { return std::distance(begin(), end()); }

If you don't believe it, go check on your distribution.

What do we see here? To perform such a trivial operation as getting the size, our list will perform a full pass with counting!

This means that if you jerk this function in a loop, passing through the list, then the complexity of your algorithm is automatically multiplied by N.

 for (std::list <int>::const_iterator I = my_list.begin (); I != my_list.end (); ++I) { if (*I > my_list.size ()) { std::cout << "Haha! "<< *I << std::endl; } }

Instead of the seemingly obvious, O (N), we get here O (N ^ 2). Well, if the list of 100 items. And what if 1,000,000?

This also means that the size () call is unsafe in a multi-threaded environment.

 std::list <int> my_list; // Thread 1 void thread1_proc () { while (!stop) { std::cout << "List size: " << my_list.size () << std::endl; } } // Thread 2 void thread2_proc () { while (!stop) { int n = rand () my_list.push_back (n); if (n % 2) my_list.pop_front (); } }

We have a serious risk of getting crash in the first thread.

It also means that

To compare two lists, we always have to do a full pass, instead of a trivial size comparison, to cut off 90% of cases.
Less efficient resize. Here we have two cases
1. Reduced size. If the size of the list is known to us (read is available for O (1)), we can decide whether we should start from the beginning or the end depending on whether we delete several elements at the end or, on the contrary, leave a few at the beginning. This is not possible in the GNU implementation.
2. Increase size. If the size of the list is known to us, we simply need to add the missing elements. In the GNU implementation, we will always have to do a full pass through the list.
Probably something else who knows, tell me.

So what made the GNU programmers implement the list like this?

It turns out that the absence of the need to maintain the _size internal variable allows splice to be implemented in O (1), otherwise it would be O (N) for the worst case.

splice is the operation of transferring consecutive elements from one list to another. Without having to count the new sizes of lists, it is enough for us to “switch” pointers from one node to another, i.e. for constant time.

Having the _size internal variable, we would have to calculate how many elements we transfer and update it in both lists accordingly.

This is the reason. By the way, how often do you use this operation? And all of the above?

In general, be careful. And keep in mind that in other implementations of STL, the _size variable is and size () works accordingly for O (1). So be careful if you build your project with different implementations of STL on different platforms.

I bow to this. Sorry for the botanical post on Friday.

UPD
The new C ++ 11 standard requires that size for all STL containers work in O (1), which is very good.
However, an attempt to make a change in the implementation of GNU STL has so far failed due to binary compatibility issues. Read more here gcc.gnu.org/bugzilla/show_bug.cgi?id=49561

Source: https://habr.com/ru/post/149384/

All Articles

An unpleasant std :: list feature that not everyone knows

This means that if you jerk this function in a loop, passing through the list, then the complexity of your algorithm is automatically multiplied by N.

This also means that the size () call is unsafe in a multi-threaded environment.

It also means that

So what made the GNU programmers implement the list like this?

More articles: