Introduction
After a long break I had to go back to C++ programming. Coming from C#, I missed the var keyword and the ability to build LINQ queries. As it turned out, however, progress does not stand still: during my absence the new C++11 standard had been released, bringing interesting new features and already implemented in most compilers. I was working on a cross-platform project, so I cared about GCC on Linux and about Visual Studio and MinGW in the Windows world. My attempts to find a LINQ-like library failed; everything I turned up was an unworkable hand-rolled hack. I had resigned myself and given up the search when, in April 2012, an encouraging LINQ to Objects article in C++ appeared, describing a library that suited me. Having tried it in practice and understood how it works, I was disappointed by its inefficiency, but I picked up a few ideas. The only thing left was to write the same, only with blackjack, which I did at github.com/drbasic/CppLinq, learning automatic type deduction (auto) and lambda expressions along the way.
The library is designed so that the user builds a transformation graph using fluent syntax and lambda expressions. These graphs can be copied, extended, and merged, i.e. they behave as close as possible to their prototype, LINQ to Objects from the C# world. I borrowed the functionality from C# without thinking twice, adding an explicit left join and a full join. An important constraint of the library is that pointers to the elements of the original sequence, not copies, travel along the transformation graph. This makes it efficient for collections of heavyweight elements, since there is no copying overhead, but for the same reason the original sequence must not be "virtual": by the time processing starts, every element of the source sequence must have a unique address, and elements must not move in memory while the LINQ transforms run. Plain arrays, Qt containers, and all standard containers except std::bitset satisfy this. The only difficulties arose with constant sequences, which I never finished, since I did not really need them. The library was tested and compiles successfully with Visual Studio 2010 and 2012, gcc 4.8, and MinGW 4.8. The Microsoft compiler was the easiest to satisfy; making gcc happy was considerably harder, and every compiler crashed with an internal error at some point, sometimes without any intelligible message.
To battle
So, as an example of using CppLinq, let's take a sequence of 10 elements, select those whose iVal field is >= 5, sort them by the sVal field, and put the result into a std::vector:
TestClass1 data[10];
auto t1 = Linq::from(data)
    .where([](const TestClass1 &a){ return a.iVal >= 5; })
    .orderBy([](const TestClass1 &a){ return a.sVal; })
    .toVector();
Here an adapter for the source data is created first (from), then a graph is built from the where filter and the orderBy sort operator. At this point no work on the data has happened yet; it is toVector that finally starts the processing pipeline, which runs as follows. toVector queries the last filter in the processing chain for elements one by one and builds a std::vector out of them. The last filter in the chain is orderBy, but since it needs all elements at once to order them properly, it requests all element pointers from the filter above it, where, puts them into a buffer and sorts them. The where filter can select the required elements on the fly, so for each request coming from below it asks its upstream filter, from, for elements until it finds one that satisfies the condition. The from adapter returns the address of the next element for every request from below, and reports failure when the elements run out.
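To make the pull model concrete, here is a minimal sketch (not the library's actual code) of what a terminal operation like toVector boils down to, assuming a filter object with the empty()/popFront() interface described further below; identifier names are illustrative only:
#include <vector>

// Pull loop behind a terminal operation: keep asking the last filter in the
// chain for the next element until the chain reports it is exhausted.
template<typename T, typename LastFilter>
std::vector<T> toVectorSketch(LastFilter &lastFilter)
{
    std::vector<T> result;
    while (!lastFilter.empty())                   // anything left upstream?
        result.push_back(lastFilter.popFront());  // copy out the element the filter points at
    return result;
}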
Graphs can be built dynamically, for example, by adding the necessary filters to the chain at runtime:
auto src = Linq::from(data)
    .where([](const TestClass1 &a){ return a.iVal >= 5; });
if (order == ascending)
    src = src.orderBy([](const TestClass1 &a){ return a.sVal; });
else
    src = src.orderByDesc([](const TestClass1 &a){ return a.sVal; });
auto result = src.toVector();
When the graph is built, an optimization is applied: several consecutive sorts are merged into a single sort operation, for which the element comparison functions are combined into one. For example, in the case below, even though two sorting filters are specified, only one sort will actually be performed.
auto t4 = Linq::from(src1)
    .orderByDesc([](const TestClass1 &a){ return a.sVal; })
    .thenBy([](const TestClass1 &a){ return a.iVal; })
    .toVector();
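The article does not show the combining itself, but the idea can be sketched roughly like this (a hypothetical comparator, assuming TestClass1 has the sVal and iVal fields used above; this is not CppLinq's internal code):
#include <algorithm>
#include <string>
#include <vector>

struct TestClass1Sketch { int iVal; std::string sVal; }; // stand-in for TestClass1

// The two key selectors (orderByDesc on sVal, thenBy on iVal) fold into one
// comparison function, so a single sorting pass suffices.
inline bool combinedLess(const TestClass1Sketch &a, const TestClass1Sketch &b)
{
    if (a.sVal != b.sVal)
        return a.sVal > b.sVal;  // primary key, descending
    return a.iVal < b.iVal;      // secondary key, ascending
}

inline void sortOnce(std::vector<TestClass1Sketch> &v)
{
    std::sort(v.begin(), v.end(), combinedLess);
}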
For purely academic purposes another optimization is implemented: two consecutive reversals of a sequence annihilate and do not add a filter to the chain. All filters are as lazy as possible: if extra elements do not need to be requested from the upstream sequence, the work is done on the fly. But if elements have already been obtained and will certainly be needed again, their pointers are stored in the filter's own cache to minimize accesses to the upstream filter, since the cost of obtaining an element from it is unknown. The balance is shifted towards using extra memory rather than making additional passes through the filter chain for the next element. Moreover, the buffers store pointers, not the elements themselves, so memory consumption stays reasonable for sequences of adequate size.
When implementing join, there was a dilemma: whether to compare sequence elements for equality, via operator==, or for equivalence, via operator<. Both options have pros and cons. With the less-than operator it would be possible to sort both input sequences and merge them more efficiently, since there would be no need to compare every element of the first sequence with every element of the second. However, LINQ to Objects conventionally compares for equality via operator==, so I did the same, ending up with algorithmic complexity O(n*m).
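For reference, the equality-based join is essentially the classic nested loop; a hedged sketch over plain vectors (not the library's code) shows where the O(n*m) comes from:
#include <vector>

// Nested-loop join: every element of the left sequence is compared with every
// element of the right one via operator== on the keys, hence O(n*m).
template<typename A, typename B, typename KeyA, typename KeyB, typename Make>
auto nestedLoopJoin(const std::vector<A> &left, const std::vector<B> &right,
                    KeyA keyA, KeyB keyB, Make makeResult)
    -> std::vector<decltype(makeResult(left[0], right[0]))>
{
    std::vector<decltype(makeResult(left[0], right[0]))> out;
    for (const A &a : left)
        for (const B &b : right)
            if (keyA(a) == keyB(b))
                out.push_back(makeResult(a, b));
    return out;
}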
How it works.
There is an interface class Linq that declares all the transformation methods. The methods fall into two groups: those that continue building the graph and return Linq<T>, for example where:
template <typename U> Linq<T> where(U f);
And "terminal", which start the execution of the graph, for example:
size_t count();
std::vector<T> toVector();
The transformation graph is always passed by value, but this is not a problem: thanks to move semantics, full copying is avoided in most cases. For example, in the first example no deep copying occurs at all.
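As an illustration only (identifiers here are assumptions, not the library's declarations), a wrapper that owns a pointer to its filter chain can be returned by value cheaply when it is movable, whereas copying it would need a deep clone:
// Sketch: the move constructor steals the internal pointer, while the copy
// constructor would have to deep-clone the whole filter chain.
struct NodeSketch
{
    virtual ~NodeSketch() {}
    virtual NodeSketch* clone() const = 0;
};

class GraphSketch
{
public:
    explicit GraphSketch(NodeSketch *n) : node_(n) {}
    GraphSketch(GraphSketch &&other) : node_(other.node_) { other.node_ = nullptr; }
    GraphSketch(const GraphSketch &other) : node_(other.node_ ? other.node_->clone() : nullptr) {}
    ~GraphSketch() { delete node_; }
private:
    NodeSketch *node_;
};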
The additional ordering operations thenBy and thenByDesc are implemented in an interesting way. Since they should be available only when the previous operation is orderBy or orderByDesc, a Linq descendant, the LinqOrd class, defines these operations, and the orderBy and orderByDesc methods are declared as returning LinqOrd:
LinqOrd<T> orderBy();
LinqOrd<T> orderByDesc();
Thus, after the orderBy operation an object of type LinqOrd is returned, and the additional ordering operations are available on it.
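A minimal sketch of this inheritance scheme (declarations only, with assumed parameter names, not the library's exact code):
template<typename T> class LinqOrd;  // forward declaration

template<typename T>
class Linq
{
public:
    template<typename U> LinqOrd<T> orderBy(U keySelector);
    template<typename U> LinqOrd<T> orderByDesc(U keySelector);
    // ... the rest of the interface ...
};

// The secondary-ordering operations live only in the descendant, so thenBy
// and thenByDesc are reachable only after orderBy or orderByDesc.
template<typename T>
class LinqOrd : public Linq<T>
{
public:
    template<typename U> LinqOrd<T> thenBy(U keySelector);
    template<typename U> LinqOrd<T> thenByDesc(U keySelector);
};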
Inside the Linq class there is a single data member: a pointer to the Range interface class. These Range objects are the gears of the transformation graph; they do all the actual work.
template<typename T>
class Range
{
public:
    Range(){}
    virtual ~Range(){}
    virtual bool empty() = 0;
    virtual T& popFront() = 0;
    virtual T& front() = 0;
    virtual void rewind() = 0;
    virtual Range* clone() = 0;
};
The interface is quite simple; it is, by and large, a unidirectional iterator. The empty() method reports whether there are more elements. The popFront() method retrieves the next element of the sequence, and front() returns the current one. The rewind() method moves the pointer back to the beginning of the sequence. The clone() method creates a new object and makes a deep copy of the entire internal filter chain.
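To make the interface concrete, here is a hedged sketch (not the library's actual from adapter) of a Range over a plain array that hands out references to the original elements:
#include <cstddef>

// Sketch: a source Range over a raw array; it returns references to the
// original elements, never copies, matching the library's pointer semantics.
template<typename T>
class ArrayRangeSketch : public Range<T>
{
public:
    ArrayRangeSketch(T *data, size_t size) : data_(data), size_(size), pos_(0) {}
    bool empty() override { return pos_ >= size_; }
    T& popFront() override { return data_[pos_++]; }  // current element, then advance
    T& front() override { return data_[pos_]; }
    void rewind() override { pos_ = 0; }
    Range<T>* clone() override
    {
        auto copy = new ArrayRangeSketch(data_, size_);
        copy->pos_ = pos_;
        return copy;
    }
private:
    T *data_;
    size_t size_;
    size_t pos_;
};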
Transformation operations implement the Range interface. For example, here is the code of the where operation: WhereRange<T, F>, where T is the element type and F is the type of the object used for filtering. F must be callable with T, or a reference to T, and return true if the element satisfies the filter condition.
template<typename T, typename F>
class WhereRange : public Range<T>
{
public:
    WhereRange(Range<T> *src, F f)
        : src_(src)
        , f_(f)
        , frontReady(false)
    {
    }

    ~WhereRange()
    {
        delete src_;
    }

    bool empty() override
    {
        if (!frontReady)
            seekFront();
        return src_->empty();
    }

    T& popFront() override
    {
        if (!frontReady)
            seekFront();
        auto &result = src_->front();
        frontReady = false;
        src_->popFront();
        seekFront();
        return result;
    }

    T& front() override
    {
        return src_->front();
    }

    void rewind() override
    {
        frontReady = false;
        src_->rewind();
    }

    Range<T>* clone() override
    {
        auto result = new WhereRange(CloneRange(src_), f_);
        return result;
    }

private:
    Range<T> *src_;
    F f_;
    bool frontReady;

    void seekFront()
    {
        while (!src_->empty() && !f_(src_->front()))
            src_->popFront();
        frontReady = true;
    }
};
To hand even more work to the compiler, the element type in lambda expressions can be deduced automatically using the decltype keyword. The first example can be rewritten as follows:
TestClass1 data[10];
auto src = Linq::from(data);
auto result = src
    .where([](decltype(src.const_reference()) a){ return a.iVal >= 5; })
    .orderBy([](decltype(src.const_reference()) a){ return a.sVal; })
    .toVector();
As you can see, nowhere does the type of the element TestClass1 appear explicitly. Sometimes it is convenient, and sometimes it is difficult to figure out what type we are dealing with.
In the sequence-joining example below, the intermediate types are also deduced by the compiler:
auto data1 = testData1();
auto data2 = testData2();
auto src1 = Linq::from(data1);
auto src2 = Linq::from(data2);
auto result = src1
    .join(
        src2,
        [](decltype(src1.const_reference()) a){ return a.iVal; },
        [](decltype(src2.const_reference()) b){ return b.iVal2; },
        [](decltype(src1.const_reference()) a, decltype(src2.const_reference()) b)
        {
            return std::make_pair(b.dVal, a.sVal);
        }
    )
    .toVector();
Here the first and second lambdas return the keys by which the sequences are joined, and the third lambda builds the result from the matching elements of both sequences. All resulting elements are then stored in a std::vector.
P.S. gcc 4.7 does not digest the override keyword, so to work around this you can #define override to nothing.
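One hedged way to do that without disturbing newer compilers is a version-guarded define (a sketch; adjust the check to your toolchain):
// Blank out 'override' only on gcc versions older than 4.8, per the note above.
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 8))
#  define override
#endif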