📜 ⬆️ ⬇️

Improving LINQ to work with IReadOnly-collections

As you know, when using the IEnumerable <> interface, where the collection is implied, problems can occur (see, for example, Problems using IEnumerable and LINQ against LSP ). Fortunately, in .NET v4.5 in 2012 (a little late, but better late than never), the interfaces IReadOnlyCollection <> , IReadOnlyList <> , IReadOnlyDictionary <> appeared (hereinafter I will generally call them IReadOnly- interfaces). Unlike IEnumerable <> , IReadOnly interfaces allow you to sufficiently designate the collection functionality without any unnecessary requirements, which allows them to be recommended for use instead of IEnumerable <> everywhere where reading of the collection is implied. But there is one difficulty. One of the important components that consume and create collections is LINQ, and especially its part “LINQ to objects”. Unfortunately, IReadOnly interfaces appeared 5 years after LINQ, and are not used in it. All input and output collections of LINQ operations have the basic IEnumerable <> type, based on the limited capabilities of which, many operations imply unnecessary costs: complete sequential enumeration or even creating intermediate copies of the input collections. Moreover, by returning the same IEnumerable <> from operations, LINQ requires that further use of the result again uses a full search and the creation of intermediate copies. In this regard, I have long had the thought to “make friends” LINQ with IReadOnly interfaces.

Found on the Internet development on this topic did not satisfy me. For example, the Linq to Collections library offers an effective replacement for only some LINQ operations for the IReadOnlyList <> interface only , is poorly optimized, and for some reason it forcibly adds a number of ambiguous extension methods for the basic types. In another found project on this topic, CountableSharp , a small number of optimized LINQ operations are offered only for the IReadOnlyCollection <> interface, in which a decorator is returned, delegating all calls to the base System.Linq.Enumerable and only the Count property is calculated in advance without requiring a full busting collection.

Next, I propose to use my Collection.LINQ library, which optimizes “LINQ to objects” by providing efficient implementation of most operations for collections that implement IReadOnly interfaces.

First of all, I will allow myself a small digression, allowing you to better understand and use the work I have done. As many of you have probably noticed, in the series of IReadOnly interfaces, there is clearly not enough interface for sets. Therefore, we will create it ourselves in this obvious form:
')
public interface IReadOnlyFiniteSet<T> : IReadOnlyCollection<T> { bool Contains (T item); } 

The only thing that is ambiguous here is the finiteness of the set associated with the inheritance of the IReadOnlyCollection <> . Alternatively, it was possible to create an intermediate interface of an infinite set of IReadOnlySet <> inherited from IEnumerable <> . However, there is no practical need for it, since infinite sets are only of academic interest and it is difficult to imagine where they can find application. All the sets existing in the base class library already implement the methods necessary for IReadOnlyFiniteSet <> , so there will be no problems with its implementation. Next, I will talk about optimizing LINQ operations, including for the IReadOnlyFiniteSet <> interface, hoping that it will someday be made part of the base library.

So, we will consider what operations are implemented in Collection.LINQ and how exactly they increase the efficiency of LINQ to objects.

We start by specifying those operations where optimization is not required. Aggregate Aggregate (), Average (), Min (), Max (), Sum () methods do not need any optimization because IEnumerable <> is a necessary and sufficient interface for them, but they return a value, not a collection. For the same reason, they do not need to optimize the overloading of the methods All (), Any (), Count (), LongCount (), First (), FirstOrDefault (), Last (), LastOrDefault (), Single (), SingleOrDefault (), taking parameter predicate filter. The presence of a predicate dictates the need for complete exhaustive search, for which IEnumerable <> is sufficient. Obviously, they also do not require optimization and the ToArray (), ToList (), ToDictionary (), ToLookup () methods, because they semantically imply brute-force and copy creation. Only the creation of an array in the ToArray () method can be slightly optimized by knowing in advance the number of elements in the collection.

The methods for creating the collections Empty (), Range () and Repeat () require one obvious optimization: they must not return a basic IEnumerable <> , but a specific IReadOnly interface.

Now about the main optimization. In operations that return a collection, the result is created in the form of a decorator, which delegates calls to members directly to the input collections, without prior iterations and creating copies of the elements. In order for such optimization to work when sequentially applying several operations, it is also important that the returned collections also be an IReadOnly type. In LINQ to objects, a similar internal optimization is already implemented: many methods first of all check the implementation of some interfaces by the input collection, and use them for more efficient execution of actions. But, of course, IReadOnly interfaces (which did not exist at the moment LINQ appeared) are not used, and the returned collection always has only the basic type IEnumerable <> . In my library, optimization in the form of direct decoration is used in the following LINQ operations.
for collections of type IReadOnlyCollection <>
Any (), Count (), LongCount ()immediately return the result using the Count property
DefaultIfEmpty ()returns IReadOnlyCollection <>, depending on the Count property, initial or one-element
Select (), Zip ()return IReadOnlyCollection <> - decorator
Skip (), Take ()return IReadOnlyCollection <>, depending on the Count property, original, empty or decorator
Concat ()returns IReadOnlyCollection <>, depending on the properties of the Count, one of the source or the decorator
Reverse ()returns an IReadOnlyCollection <>, depending on the Count property, either the original or the decorator, which creates a complete copy of the collection when the enumeration request
OrderBy (), OrderByDescending (), ThenBy (), ThenByDescending ()return an IReadOnlyCollection <>, depending on the Count property, either the original or the decorator, which creates an entire copy of the collection and a sorted index when an enumeration request is made
for IReadOnlyFiniteSet <> sets (in addition to what is available for IReadOnlyCollection <> collections)
Contains ()immediately returns the result using the Contains () method
Distinct ()returns the original IReadOnlyFiniteSet <> set
Except (), Intersect (), Union ()return IReadOnlyFiniteSet <>, depending on the Count property, one of the initial, empty or decorator, when creating a fully enumerating the smaller of the input sets
DefaultIfEmpty ()returns IReadOnlyFiniteSet <>, depending on the Count property, initial or one-element
Reverse ()returns IReadOnlyFiniteSet <>, depending on the Count property, the original or decorator that creates a complete copy of the set
for IReadOnlyList <> lists (in addition to what is available for IReadOnlyCollection <> collections)
ElementAt (), ElementAtOrDefault (), First (), FirstOrDefault (), Last (), LastOrDefault ()immediately return the result using the methods IReadOnlyList <>
DefaultIfEmpty ()returns IReadOnlyList <>, depending on the Count property, initial or one-element
Skip (), Take (), Select (), Concat (), Zip (), Reverse ()return IReadOnlyList <> - decorator
OrderBy (), OrderByDescending (), ThenBy (), ThenByDescending ()return IReadOnlyList <>, depending on the Count property, the source or decorator that creates a sorted index to delegate the receipt of items by position number and enumeration
Unfortunately, for some LINQ operations, the functionality of IReadOnly collections cannot add efficiency. These are the operations that result in a collection obtained by filtering the values ​​of the elements of the input collections. There is no way without complete brute force and copying. The filtering operations SkipWhile (), TakeWhile (), Where (), as well as the conditional grouping and join methods Join (), GroupJoin (), GroupBy (), SelectMany () are not optimized.

Additionally, I will mention a few optimizations in my Collection.Linq , not related to IReadOnly interfaces:
To make sure that I didn’t miss anything and did it more efficiently than the library System.Linq.Enumerable , I constantly spied on its sources .

Using the Collection.LINQ library is easy: just in your code by line
 using System.Linq; 

add line
 using BusinessClassLibrary.Collections.Linq; 

It doesn't matter how you use “LINQ to objects”, in the form of query syntax or in a fluent form. After connecting my library, your LINQ code will automatically use the most optimal methods, provided that collections that implement IReadOnly interfaces are input . The only exception is the methods for creating collections of the class System.Linq.Enumerable , which are not extension methods: Enumerable.Empty (), Enumerable.Range () and Enumerable.Repeat (). These methods will need to be manually replaced with ReadOnlyFiniteSet.Empty () / ReadOnlyList.Empty (), ReadOnlyFiniteSet.Range () / ReadOnlyList.Range () and ReadOnlyList.Repeat (), respectively.

The entire library is presented in a public project on github . At the request of readers, a nuget-package with a library was also created.

Source: https://habr.com/ru/post/237389/


All Articles