The interior of the boolinq for adults

This article is for those who are interested in the implementation of the boolinq library from my previous post . In this article I’ll dig into the source and show some interesting techniques that made the library lazy and extensible.

What is wrong with iterators?

Before we look at the library code, I would like to draw your attention to STL. This is a standard library of templates, contains containers, algorithms, etc. One of the most interesting elements of the library is the entity - iterator . Iterators were designed to be a layer between containers and algorithms. So that any algorithm can be applied to any container (well, almost any).
')
At first glance, everything is fine. Algorithms and containers are connected through an intermediate entity - an iterator. But this is only at first glance, if you look at the code:

int sum = 0; for (std::vector<int>::iterator it = candles.begin(); it != candles.end(); it++) sum += it->ClosePrice;

There is one drawback of STL-iterators, which is not noticeable at first glance, Java iterators are deprived of this minus.

 int sum = 0; Iterator it = al.iterator(); while (it.hasNext()) { it = it.next(); sum += it.ClosePrice; }

Yes, these are .begin() and .end() - these are two parts of the same entity. If these two parts were taken and combined into one entity ... It is said - done (the idea was borrowed from Alexandrescu from the “Iterators Must Go” presentation):

 template<typename TIter> class IterRange { public: typedef typename std::iterator_traits<TIter>::value_type value_type; IterRange(TIter b, TIter e) : b(b), e(e) { } bool empty() { return (b == e); } value_type popFront() { assert(!empty()); return *(b++); } value_type popBack() { assert(!empty()); return *(--e); } value_type front() { assert(!empty()); return *b; } value_type back() { assert(!empty()); TIter tmp = e; return *(--tmp); } private: TIter b; TIter e; };

Thus, we have one entity, we can ask it if there are more elements in the sequence, request an element and request to proceed to the next element. In fact, the back() and popBack() methods also do not interfere.

Well, then it is not difficult to guess how all the classes of the library look like - these are wrappers over such a Range. For example, WhereRange looks like this:

 template<typename R, typename F> class WhereRange { public: typedef typename R::value_type value_type; WhereRange(R r, F f) : r(r), f(f) , frontReady(false) , backReady(false) { } bool empty() { if (!frontReady) seekFront(); return r.empty(); } value_type popFront() { if (!frontReady) seekFront(); auto tmp = *this; r.popFront(); frontReady = false; return tmp.front(); } value_type popBack() { //  } value_type front() { if (!frontReady) seekFront(); return r.front(); } value_type back() { //  } private: void seekFront() { while(!r.empty() && !f(r.front())) r.popFront(); frontReady = true; } void seekBack() { //  } private: R r; F f; bool frontReady; bool backReady; };

In a nutshell: the class accepts a different Range in the constructor, from which it has to take elements, and a predicate that determines the belonging of each element to the resulting sample. There are 2 variables that determine whether the value "from the beginning" and "from the end" of the Range is found. The seekFront () and seekBack () functions directly search for the next front and next back.

Actually all the other Range-and look similar. The next problem was to reduce the use to the point notation ...

I want a dot notation

On the one hand, I wanted to make use of methods as they are used in .NET LINQ, but after all, C ++ does not have Extension Methods as in C #:

 int max = arr.Where(...).OrderBy(...).Select(...).Max();

On the other hand, I wanted to make the library extensible ... and this thought came)) The Linq class will be wrapped at the top of all Range, when calling the transforming sequence of methods, the Linq class nested in Linq will turn around, and the Linq class will remain on top of all.

I made each Range class for the following mixin class:

 template<template<typename> class TLinq, typename R> class WhereRange_mixin { public: template<typename F> TLinq<WhereRange<R,F> > where(F f) const { return boolinq::where(((TLinq<R>*)this)->r,f); } };

Then, I inherited the Linq class from all the necessary Mixins:

 template<typename R> class Linq : public SkipRange_mixin<Linq,R> , public TakeRange_mixin<Linq,R> , public WhereRange_mixin<Linq,R> , public SelectRange_mixin<Linq,R> , public ReverseRange_mixin<Linq,R> , public OrderByRange_mixin<Linq,R> // ...   ... { public: typedef typename R::value_type value_type; Linq(const R & r) : r(r) { } bool empty() { return r.empty(); } value_type popFront() { return r.popFront(); } value_type popBack() { return r.popBack(); } value_type front() { return r.front(); } value_type back() { return r.back(); } public: R r; };

Reversing the order of elements

I wanted to dwell separately on this feature. She is remarkable because I really wanted to nail the double reverse of the sequence. And not to beat a compilation error, but in an amicable way. Class code is pretty simple:

 template<typename R> class ReverseRange { public: typedef typename R::value_type value_type; ReverseRange(R r) : r(r) { } bool empty() { return r.empty(); } value_type popFront() { return r.popBack(); } value_type popBack() { return r.popFront(); } value_type front() { return r.back(); } value_type back() { return r.front(); } template<typename R2> friend R2 reverse(ReverseRange<R2> r); // smart needed private: R r; };

Everything in this code is exactly as you expected: the methods working with front and back are changed to the opposite, but a friendly function is lost among them. And it is Friendly, then, in order to crawl to a private field - a wrapped Range, this is the actual code for this function:

 template<typename R> ReverseRange<R> reverse(R r) { return r; } // Unwrap for double-reverse case template<typename R> R reverse(ReverseRange<R> r) { return rr; // smart }

Yes! Function is not one - there are two of them. The first one works as it should - wraps the Range with our ReaverseRange (here an implicit constructor call occurs). The second one, on the contrary, unfolds ReverseRange. It is important that this happens at the compilation level, and not at the execution stage. But this is not the most difficult thing - hell started when I tried to portray it in Mixin:

 template<template<typename> class TLinq, typename R> class ReverseRange_mixin { public: TLinq<ReverseRange<R> > reverse() const { return boolinq::reverse(((TLinq<R>*)this)->r); } }; // Unwrap for double-reverse case template<template<typename> class TLinq, typename T> class ReverseRange_mixin<TLinq,ReverseRange<T> > { public: TLinq<T> reverse() const { return boolinq::reverse(((TLinq<ReverseRange<T> >*)this)->r); } };

Again, the first Mixin does not do anything unusual, but the second at the compilation stage reveals the type Linq<ReverseRange<XxxRange<...>>> and expands it to Linq<XxxRange<...>> . Brain broke while got compiled code.

How can a user expand a library?

The idea was as follows, let him create his magic Range-class, then create a Mixin-class in the image and likeness of other Mixin-s. And after that, it creates its own class CustomLinq and uses it when creating the initial sequence (it’s impossible to inherit from Linq, because its Mixins will wrap everything not in CustomLinq, but in Linq):

 boolinq::from<CustomLinq>(arr)

instead:

 boolinq::from(arr)

Well, the user can do without all this and do not use the dot notation at all. After all, you can write code like this:

 using namespace boolinq; int sum = sum(select(where(from(arr), [](...){...}), [](...){...}));

Performance tests

I will carry out the test specifically for those interested: we will assume the variance of the random variable. First, we find the average value of all odd elements of the vector, then we calculate the standard deviation of odd elements from the average.

1. Generate 100 million pseudo-random elements:

 srand(0xDEADBEEF); std::vector<int> vec(100000000, 0); for (unsigned i = 0; i < vec.size(); i++) vec[i] = rand();

2. Write an algorithm in C ++

 //      double sum = 0; int count = 0; for (unsigned i = 0; i < vec.size(); i++) { if (vec[i] % 2 == 1) { sum += vec[i]; count++; } } double avgValue = sum / count; //       avgValue double disperSum = 0; for (unsigned i = 0; i < vec.size(); i++) if (vec[i] % 2 == 1) disperSum += (vec[i] - avgValue)*(vec[i] - avgValue); double disper = disperSum / count;

3. Let's write the algorithm on boolinq

 //      double avgValue = from(vec).where( [](int a){return a%2 == 1;}) .cast<double>() .avg(); //       avgValue double disper = from(vec).where( [](int a){return a%2 == 1;}) .select([=](int a){return (a-avgValue)*(a-avgValue);}) .cast<double>() .avg();

Do not see that C ++ code does not use iterators. I wrote the code with iterators, but let me not upload it - it is absolutely similar. Now let's compile in the release in MS Visual C ++ 2010 and run it on my machine ...

C ++ Code	1207 ms
C ++ code with iterators	1224 ms
Boolinq code	1564 ms

Boolinq, of course, loses a little (by a third) - but again it depends on the tasks. In theory, you need to test all methods separately. By the way, .NET LINQ loses analogue on cycles much, much stronger.

In the near future, it is planned to add the .begin() and .end() methods to the Linq class for backward compatibility with STL.

Source: https://habr.com/ru/post/142657/

All Articles