How IQueryable and LINQ Data Providers Work

LINQ lets .NET developers work in a uniform way with collections of objects in memory and with objects stored in a database or another remote source. For example, to query ten red apples from an in-memory list and from a database via Entity Framework, we can use absolutely identical code:

    List<Apple> appleList;
    DbSet<Apple> appleDbSet;

    var applesFromList = appleList.Where(apple => apple.Color == "red").Take(10);
    var applesFromDb = appleDbSet.Where(apple => apple.Color == "red").Take(10);

However, these queries are executed differently. In the first case, when the result is enumerated with foreach, the apples are filtered with the given predicate and the first 10 matches are taken. In the second case, the syntax tree with the query expression is passed to a special LINQ provider, which translates it into an SQL query, executes it against the database, then creates C# objects for the 10 records found and returns them. This behavior is made possible by the IQueryable<T> interface, which is intended for building LINQ providers for external data sources. Below we will try to understand how this interface is organized and used.

The IEnumerable<T> and IQueryable<T> interfaces

At first glance it might seem that LINQ is based on a set of extension methods for the IEnumerable<T> interface, such as Where(), Select(), First(), Count() and so on, and that this is what lets the developer write queries in a uniform way for in-memory objects (LINQ to Objects), databases (for example, LINQ to SQL or LINQ to Entities) and remote services (for example, LINQ to OData Services). But that is not the case. The extension methods for IEnumerable<T> already contain the implementation of the corresponding sequence operations. For example, in .NET Framework 4.5.2, whose source code is publicly available, the First<TSource>(Func<TSource, bool> predicate) method is implemented as follows:

    public static TSource First<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
    {
        if (source == null) throw Error.ArgumentNull("source");
        if (predicate == null) throw Error.ArgumentNull("predicate");
        foreach (TSource element in source)
        {
            if (predicate(element)) return element;
        }
        throw Error.NoMatch();
    }

Clearly, in the general case such a method cannot be executed against data that lives in a database or a service. To run it, we would first have to load the entire data set into the application, which for obvious reasons is unacceptable.
To implement LINQ providers for data external to the application, the IQueryable<T> interface (which inherits from IEnumerable<T>) is used, along with a set of extension methods almost identical to those written for IEnumerable<T>. It is precisely because List<T> implements IEnumerable<T>, while DbSet<T> from Entity Framework implements IQueryable<T>, that the apple queries at the beginning of the article are executed differently.

The distinguishing feature of the extension methods for IQueryable<T> is that they contain no data processing logic. Instead, they simply build a syntactic structure that describes the query, extending it with each new method call in the chain. When an aggregate method (Count() and the like) is called, or when the query is enumerated with foreach, the query description is passed to the provider encapsulated in the specific IQueryable<T> implementation, and the provider converts the query into the language of the data source and executes it. For Entity Framework this language is SQL; for the .NET driver for MongoDB it is a JSON search document, and so on.
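As a rough illustration of this (a minimal sketch, not code from the original article), the standard AsQueryable() method can wrap an in-memory array in an IQueryable<T>, which makes it possible to look at the recorded expression tree before anything is executed. The Apple class, the QueryInspection class and the BuildQuery() helper are hypothetical names introduced only for this example.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity used only for this illustration.
public class Apple
{
    public string Color { get; set; }
}

public static class QueryInspection
{
    // Builds a query over an in-memory array wrapped by AsQueryable().
    // Nothing is filtered here: Where() and Take() only extend the expression tree.
    public static IQueryable<Apple> BuildQuery()
    {
        var apples = new[]
        {
            new Apple { Color = "red" },
            new Apple { Color = "green" },
            new Apple { Color = "red" }
        };

        return apples.AsQueryable()
                     .Where(apple => apple.Color == "red")
                     .Take(10);
    }

    public static void Main()
    {
        IQueryable<Apple> query = BuildQuery();

        // The outermost node of the tree is the last method in the chain.
        var call = (MethodCallExpression)query.Expression;
        Console.WriteLine(call.Method.Name);        // Take

        // Textual form of the whole tree (the exact format depends on the runtime).
        Console.WriteLine(query.Expression);

        // Enumerating the query is what finally hands the tree to the provider.
        Console.WriteLine(query.ToList().Count);    // 2
    }
}
```

Here the provider behind AsQueryable() simply runs the query over the in-memory array, but the tree it receives has the same shape that a database provider would have to translate.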

Incidentally, some “interesting” characteristics of LINQ providers stem from this feature:

a query that one provider executes successfully may not be supported by another; moreover, we will find this out not when the query is constructed, but only when the provider tries to execute it;
the provider can modify a query before executing it; for example, a limit on the number of returned objects, additional filters and so on can be added to every query.

Do-it-yourself LINQ: ISimpleQueryable<T>

Before describing how the IQueryable<T> interface is designed, let's write our own simple analogue of it, the ISimpleQueryable<T> interface, together with a couple of LINQ-style extension methods for it. This will demonstrate the basic principles of working with IQueryable<T> without going into the nuances of its implementation.

    public interface ISimpleQueryable<TSource> : IEnumerable<TSource>
    {
        string QueryDescription { get; }
        ISimpleQueryable<TSource> CreateNewQueryable(string queryDescription);
        TResult Execute<TResult>();
    }

The interface has a QueryDescription property, which holds the description of the query, and an Execute<TResult>() method, which executes the query when needed. The method is generic because the result can be either a sequence or the value of an aggregate function such as Count(). The interface also has a CreateNewQueryable() method, which creates a new ISimpleQueryable<T> instance with an updated query description each time another LINQ method is added to the chain. Note that the query description here is a plain string, whereas real LINQ uses expression trees for this purpose.

Now let's turn to the extension methods:

    public static class SimpleQueryableExtentions
    {
        public static ISimpleQueryable<TSource> Where<TSource>(this ISimpleQueryable<TSource> queryable, Expression<Func<TSource, bool>> predicate)
        {
            string newQueryDescription = queryable.QueryDescription + ".Where(" + predicate.ToString() + ")";
            return queryable.CreateNewQueryable(newQueryDescription);
        }

        public static int Count<TSource>(this ISimpleQueryable<TSource> queryable)
        {
            string newQueryDescription = queryable.QueryDescription + ".Count()";
            ISimpleQueryable<TSource> newQueryable = queryable.CreateNewQueryable(newQueryDescription);
            return newQueryable.Execute<int>();
        }
    }

As we can see, these methods simply append information about themselves to the query description and create a new ISimpleQueryable<T> instance. In addition, unlike its counterpart for IEnumerable<T>, the Where() method does not accept the predicate Func<TSource, bool> itself, but an expression tree that describes it, Expression<Func<TSource, bool>>. In this example that merely lets us obtain a string with the predicate's code; in real LINQ it makes it possible to keep every detail of the query as an expression tree.

Finally, let's create a simple implementation of ISimpleQueryable<T> that contains everything needed to write LINQ queries except the code that actually executes them. To make it more realistic, we add a reference to the data source (_dataSource), which the Execute() method would use when running the query.

    public class FakeSimpleQueryable<TSource> : ISimpleQueryable<TSource>
    {
        private readonly object _dataSource;

        public string QueryDescription { get; private set; }

        public FakeSimpleQueryable(string queryDescription, object dataSource)
        {
            _dataSource = dataSource;
            QueryDescription = queryDescription;
        }

        public ISimpleQueryable<TSource> CreateNewQueryable(string queryDescription)
        {
            return new FakeSimpleQueryable<TSource>(queryDescription, _dataSource);
        }

        public TResult Execute<TResult>()
        {
            // This is where QueryDescription would be processed and the query applied to _dataSource
            throw new NotImplementedException();
        }

        public IEnumerator<TSource> GetEnumerator()
        {
            return Execute<IEnumerator<TSource>>();
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }

Now consider a simple query to FakeSimpleQueryable:

    var provider = new FakeSimpleQueryable<string>("", null);
    int result = provider.Where(s => s.Contains("substring")).Where(s => s != "some string").Count();

Let's try to figure out what will happen when executing the above code (see also the figure below):

first, the first Where() call takes the empty query description from the FakeSimpleQueryable instance created by the constructor, appends ".Where(s => s.Contains("substring"))" to it and creates a second FakeSimpleQueryable instance with the new description;
then the second Where() call takes the query description from the previously created FakeSimpleQueryable, appends ".Where(s => s != "some string")" to it and creates a new, third FakeSimpleQueryable instance with the query description ".Where(s => s.Contains("substring")).Where(s => s != "some string")";
finally, the Count() call takes the query description from the FakeSimpleQueryable instance created in the previous step, appends ".Count()" to it, creates a fourth FakeSimpleQueryable instance and then calls its Execute<int>() method, since the query cannot be built any further;
as a result, inside the Execute() method the QueryDescription value is ".Where(s => s.Contains("substring")).Where(s => s != "some string").Count()", which is what has to be processed further.
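The steps above can be traced with a runnable variant of the example. This is only a sketch: RecordingSimpleQueryable and Demo are hypothetical names that do not appear in the article, and Execute() here just stores the description it receives instead of querying anything. The exact text of each predicate comes from Expression.ToString(), so it may differ slightly from the strings quoted above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq.Expressions;

public interface ISimpleQueryable<TSource> : IEnumerable<TSource>
{
    string QueryDescription { get; }
    ISimpleQueryable<TSource> CreateNewQueryable(string queryDescription);
    TResult Execute<TResult>();
}

public static class SimpleQueryableExtentions
{
    public static ISimpleQueryable<TSource> Where<TSource>(
        this ISimpleQueryable<TSource> queryable,
        Expression<Func<TSource, bool>> predicate)
    {
        return queryable.CreateNewQueryable(
            queryable.QueryDescription + ".Where(" + predicate + ")");
    }

    public static int Count<TSource>(this ISimpleQueryable<TSource> queryable)
    {
        return queryable
            .CreateNewQueryable(queryable.QueryDescription + ".Count()")
            .Execute<int>();
    }
}

// Hypothetical variant of FakeSimpleQueryable: instead of throwing,
// Execute() remembers the description it was asked to run.
public class RecordingSimpleQueryable<TSource> : ISimpleQueryable<TSource>
{
    private readonly List<string> _log;   // shared by every instance in the chain

    public string QueryDescription { get; private set; }

    public RecordingSimpleQueryable(string queryDescription, List<string> log)
    {
        QueryDescription = queryDescription;
        _log = log;
    }

    public ISimpleQueryable<TSource> CreateNewQueryable(string queryDescription)
    {
        return new RecordingSimpleQueryable<TSource>(queryDescription, _log);
    }

    public TResult Execute<TResult>()
    {
        _log.Add(QueryDescription);
        return default(TResult);   // there is no real data source, so no real result
    }

    public IEnumerator<TSource> GetEnumerator()
    {
        Execute<object>();
        return new List<TSource>().GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class Demo
{
    // Runs the query from the article and returns the description seen by Execute().
    public static string Run()
    {
        var log = new List<string>();
        var source = new RecordingSimpleQueryable<string>("", log);

        source.Where(s => s.Contains("substring"))
              .Where(s => s != "some string")
              .Count();

        return log[0];
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

Execute() is reached exactly once, from Count(); the two Where() calls only create new instances with longer descriptions.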

The real IQueryable<T> ... and IQueryProvider

Now let's look at the IQueryable<T> interface as it is defined in .NET:

    public interface IQueryable : IEnumerable
    {
        Expression Expression { get; }
        Type ElementType { get; }
        IQueryProvider Provider { get; }
    }

    public interface IQueryable<out T> : IEnumerable<T>, IQueryable {}

    public interface IQueryProvider
    {
        IQueryable CreateQuery(Expression expression);
        IQueryable<TElement> CreateQuery<TElement>(Expression expression);
        object Execute(Expression expression);
        TResult Execute<TResult>(Expression expression);
    }

Note that:

.NET has both a generic and a non-generic version of IQueryable;
the tree that describes the LINQ query is stored in the Expression property (in our implementation we used the QueryDescription string);
the ElementType property holds the type of the elements returned by the query and is used in LINQ provider implementations for type checks;
the pair of methods for creating new IQueryable instances (CreateQuery() and CreateQuery<TElement>()) and the pair of methods for executing a query (Execute() and Execute<TResult>()) are moved to a separate interface, IQueryProvider; presumably this split separates the query itself, which is re-created on every extension method call, from the object that actually has access to the data source, does all the main work and may be too "heavy" to re-create constantly;
the IQueryable.Provider property points to the associated IQueryProvider instance.

Now let's look at how the extension methods for IQueryable<T> work, using the Where() method as an example:

    public static IQueryable<TSource> Where<TSource>(this IQueryable<TSource> source, Expression<Func<TSource, int, bool>> predicate)
    {
        if (source == null) throw Error.ArgumentNull("source");
        if (predicate == null) throw Error.ArgumentNull("predicate");
        return source.Provider.CreateQuery<TSource>(
            Expression.Call(
                null,
                ((MethodInfo)MethodBase.GetCurrentMethod()).MakeGenericMethod(typeof(TSource)),
                new Expression[] { source.Expression, Expression.Quote(predicate) }
            ));
    }

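We see that the method creates a new IQueryable<TSource> instance by passing CreateQuery<TSource>() an expression in which a call to the Where() method itself, with the predicate as its argument, is added on top of the original expression from source.Expression. Note that the overload shown here takes a predicate that also receives the element's index (Func<TSource, int, bool>); the overload with a plain Func<TSource, bool> predicate is built in the same way.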

Thus, despite some differences between the IQueryable<T> and IQueryProvider interfaces and the ISimpleQueryable<T> we created earlier, the principles of their use in LINQ are the same: each extension method added to the query extends the expression tree with information about itself and then creates a new IQueryable<T> instance through CreateQuery<TElement>(), while the aggregate methods additionally trigger query execution by calling Execute<TResult>().
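To see this division of work with the real interfaces, here is a minimal sketch of a provider that has no data source at all and only records the expression it is asked to execute. RecordingQueryProvider, RecordingQuery<T> and ProviderDemo are hypothetical names introduced for this illustration; a real provider would translate the expression instead of returning a default value.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical provider: it only remembers the last expression passed to Execute().
public class RecordingQueryProvider : IQueryProvider
{
    public Expression LastExecuted { get; private set; }

    public IQueryable<TElement> CreateQuery<TElement>(Expression expression)
    {
        // Called by Where(), Select() and so on: wrap the extended tree in a new query.
        return new RecordingQuery<TElement>(this, expression);
    }

    public IQueryable CreateQuery(Expression expression)
    {
        throw new NotSupportedException("Only the generic overload is sketched here.");
    }

    public TResult Execute<TResult>(Expression expression)
    {
        // Called by Count(), First() and so on: a real provider would translate and run the tree.
        LastExecuted = expression;
        return default(TResult);
    }

    public object Execute(Expression expression)
    {
        LastExecuted = expression;
        return null;
    }
}

public class RecordingQuery<T> : IQueryable<T>
{
    // Root query: it represents itself in the tree as a constant node.
    public RecordingQuery(IQueryProvider provider)
    {
        Provider = provider;
        Expression = Expression.Constant(this);
    }

    // Derived query: created by the provider with an already extended tree.
    public RecordingQuery(IQueryProvider provider, Expression expression)
    {
        Provider = provider;
        Expression = expression;
    }

    public Expression Expression { get; private set; }
    public Type ElementType { get { return typeof(T); } }
    public IQueryProvider Provider { get; private set; }

    public IEnumerator<T> GetEnumerator()
    {
        Provider.Execute(Expression);                   // enumeration also goes through the provider
        return Enumerable.Empty<T>().GetEnumerator();   // no data source, so nothing to return
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class ProviderDemo
{
    public static string Run()
    {
        var provider = new RecordingQueryProvider();
        IQueryable<string> root = new RecordingQuery<string>(provider);

        // Where() goes through CreateQuery<string>(), Count() through Execute<int>().
        int count = root.Where(s => s.Contains("substring")).Count();

        var call = (MethodCallExpression)provider.LastExecuted;
        return call.Method.Name + ":" + count;
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // Count:0
    }
}
```

The standard Queryable extension methods do all the tree building; the sketch only has to supply CreateQuery<TElement>() and Execute<TResult>().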

A few words about developing LINQ providers

Since the mechanism for constructing LINQ queries is already implemented for us in .NET, developing a LINQ provider mostly comes down to implementing the Execute() and Execute<TResult>() methods. This is where the incoming expression tree has to be parsed, converted into the language of the data source and executed, and where the results have to be wrapped in C# objects and returned. Unfortunately, this procedure involves handling a considerable number of nuances. Moreover, there is rather little information available on developing LINQ providers.
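As a small illustration of the first of these steps, parsing the expression tree, the sketch below uses the standard ExpressionVisitor class to list the LINQ methods that make up a query. MethodNameCollector and VisitorDemo are hypothetical names; a real provider would emit SQL or another query language at this point rather than collect names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Collects the names of the Queryable methods found in a query expression,
// in the order in which they were chained.
public class MethodNameCollector : ExpressionVisitor
{
    private readonly List<string> _names = new List<string>();

    public static IList<string> Collect(Expression expression)
    {
        var collector = new MethodNameCollector();
        collector.Visit(expression);
        return collector._names;
    }

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        // Visit the arguments first: the source query is the first argument,
        // so earlier calls in the chain are recorded before later ones.
        Expression result = base.VisitMethodCall(node);

        if (node.Method.DeclaringType == typeof(Queryable))
        {
            _names.Add(node.Method.Name);
        }

        return result;
    }
}

public static class VisitorDemo
{
    public static string Run()
    {
        IQueryable<int> query = new[] { 1, 2, 3 }.AsQueryable()
            .Where(n => n > 1)
            .OrderBy(n => n)
            .Take(10);

        return string.Join(" -> ", MethodNameCollector.Collect(query.Expression));
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // Where -> OrderBy -> Take
    }
}
```

The same visitor pattern is a common starting point for translation: each VisitMethodCall override decides how one LINQ operator maps to the target query language.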

I hope this article will be useful to anyone who wants to understand how LINQ providers for remote data sources work, or who has been thinking about writing such a provider but has not yet made up their mind.

Source: https://habr.com/ru/post/256821/
