Linq To Entities vs. Linq To Objects on the example of grouping

LINQ is a convenient, beautiful, but rather insidious abstraction. The most unexpected things usually occur at the junction of any implementation of LINQ and LINQ To Objects. Today, using one example, I’ll look at how LINQ To Entities (Entity Framework) and LINQ To Objects work together.

Take the repository method as a basis, which accepts a list of customer identifiers as input and returns a set of orders grouped by these identifiers (the Orders table contains the fields OrderId, OrderDate and CustomerId):

public IDictionary<long, List<Order>> GetOrdersByCustomersIds(IList<long> customersIds) { using (var ctx = new RepositoryContext()) { return ctx.Orders. Where(o => customersIds.Contains(o.Id)). GroupBy(o => o.CustomerId). ToDictionary(o => o.Key, o => o.ToList()); } }

')
Wait a minute! How does it work? After all, when executing a GROUP BY query, we can select only the fields by which the grouping takes place, as well as aggregated values. The standard solution to this problem is a JOIN of the table data and grouping results. Like that:

  SELECT o1.*, MinTotal FROM Orders as o1 INNER JOIN (SELECT o2.CustomerId, Min(o2.Total) as MinTotal FROM Orders o2 GROUP BY o2.CustomerId) as o3 ON o1.CustomerId = o3.CustomerId Where o1.CustomerId in (1, 2, 3, 4, 5)

Something in this spirit should generate an EF provider. Let's see this. I had a MySQL .NET Connector on hand (the official ADO.NET provider for MySQL), so I used it and received the following generated request (transferring a list of identifiers from 1 to 5 to the input):

 SELECT `Project2`.`C1`, `Project2`.`CustomerId`, `Project2`.`C2`, `Project2`.`CustomerId1`, `Project2`.`Id`, `Project2`.`OrderDate` FROM (SELECT `Distinct1`.`CustomerId`, 1 AS `C1`, `Extent2`.`CustomerId` AS `CustomerId1`, `Extent2`.`Id`, `Extent2`.`OrderDate`, CASE WHEN (`Extent2`.`CustomerId` IS NULL) THEN (NULL) ELSE (1) END AS `C2` FROM (SELECT DISTINCT `Extent1`.`CustomerId` FROM `orders` AS `Extent1` WHERE ((1 = `Extent1`.`Id`) OR (2 = `Extent1`.`Id`)) OR (((3 = `Extent1`.`Id`) OR (4 = `Extent1`.`Id`)) OR (5 = `Extent1`.`Id`))) AS `Distinct1` LEFT OUTER JOIN `orders` AS `Extent2` ON (((1 = `Extent2`.`Id`) OR (2 = `Extent2`.`Id`)) OR (((3 = `Extent2`.`Id`) OR (4 = `Extent2`.`Id`)) OR (5 = `Extent2`.`Id`))) AND (`Distinct1`.`CustomerId` = `Extent2`.`CustomerId`)) AS `Project2` ORDER BY `CustomerId` ASC, `C2` ASC

A bit worse than manual implementation, but on the whole a thought can be traced.

Stop! And why do we use grouping at the database level? Grouping is justified in the case of using aggregation functions (as in the above manual implementation of the query). In our case, grouping is only a convenient presentation of the data obtained. Let's slightly modify the repository method and move the grouping process to the LINQ To Objects level:

 public IDictionary<long, List<Order>> GetOrdersByCustomersIds(IList<long> customersIds) { using (var ctx = new RepositoryContext()) { return ctx.Orders. Where(o => customersIds.Contains(o.Id)). AsEnumerable(). GroupBy(o => o.CustomerId). ToDictionary(o => o.Key, o => o.ToList()); } }

For the sake of completeness, let's see which request will be generated by the EF provider:

 SELECT `Extent1`.`CustomerId`, `Extent1`.`Id`, `Extent1`.`OrderDate` FROM `orders` AS `Extent1` WHERE ((1 = `Extent1`.`Id`) OR (2 = `Extent1`.`Id`)) OR (((3 = `Extent1`.`Id`) OR (4 = `Extent1`.`Id`)) OR (5 = `Extent1`.`Id`))

This query is definitely more efficient than the previous one.

That's all. Nothing special - I just wanted to focus your attention on the cunning of the transition from LINQ To X to LINQ To Objects after you fell into this trap. Be carefull!

PS Despite the fact that I used the MySQL .NET Connector, I absolutely do not recommend using this provider in production: this is not a provider, but a co-centered bunch of bugs that have not been fixed for years.

Source: https://habr.com/ru/post/119624/

All Articles

Linq To Entities vs. Linq To Objects on the example of grouping

More articles: