When working with Entity Framework, as with any other ORM, performance issues come up regularly. Many developers, unaware of its nuances, make mistakes that lead to poor results. Then, while analyzing the problems and searching for solutions without digging deeply enough into the question, they conclude that the situation can only be improved by switching to another ORM or abandoning ORMs altogether. Although in some situations such a decision may be justified, things are often not so bad: you just need to know the nuances. In this article I have collected the pitfalls I have encountered most often in practice.
- Enabling change tracking when it is not needed
- Constant recompilation of certain queries
- A large number of Includes in one query
- Reading only the base entity's fields when using Table Per Type mapping
- Additional Information
Enabling change tracking when it is not needed
Suppose our project contains a code fragment like this, intended to read a list of entities and then pass them to the client:
using (var context = new EntityDataContext())
{
    found = context.Entities.Where(e => e.Name.StartsWith(filter)).ToList();
}
At first glance there are no problems, but unless special settings are hidden in the context constructor, this code wastes computational resources. Every query has a MergeOption property that specifies how the objects it reads are loaded into the context. By default it is AppendOnly, which means that entities not yet present in the context are added to it.
The question arises: why add objects to the context if nothing is going to change and save them later? The answer: there is no need, this is pure overhead, and a noticeable one. If the objects read through an Entity Framework context will not be modified and will not participate in modifications of other objects, call AsNoTracking() on the collection in the corresponding query:
using (var context = new EntityDataContext())
{
    found = context.Entities.AsNoTracking().Where(e => e.Name.StartsWith(filter)).ToList();
}
This method sets MergeOption to NoTracking, which skips adding the read objects to the context. You can see how significant the gain is here: https://msdn.microsoft.com/en-us/data/hh949853.aspx (point 5.2), or here: http://blog.staticvoid.co.nz/2012/4/2/entity_framework_and_asnotracking .
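To make it harder to forget AsNoTracking() on pure read paths, the call can be wrapped in a small extension method. This is a minimal sketch under hypothetical naming (QueryNoTracking is my name, not part of EF):

using System.Data.Entity;
using System.Linq;

public static class ReadOnlyQueries
{
    // Returns an untracked query over the entity set of the given type.
    public static IQueryable<T> QueryNoTracking<T>(this DbContext context)
        where T : class
    {
        return context.Set<T>().AsNoTracking();
    }
}

With it, the read above becomes context.QueryNoTracking<Entity>().Where(e => e.Name.StartsWith(filter)).ToList().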
Another potential problem associated with excessive change tracking mostly concerns scenarios that add and modify data. The context configuration has an AutoDetectChangesEnabled property that indicates whether the DetectChanges method should be called automatically before certain operations, such as adding an object to the context, saving changes to the database, or searching for objects via the Find method. DetectChanges is needed, in particular, to determine exactly what has been changed, deleted, or added, and to update relationships between objects. For more information about what this method is for, see http://blog.oneunicorn.com/2012/03/10/secrets-of-detectchanges-part-1-what-does-detectchanges-do/ .
Consider a typical scenario: adding a set of objects to the database:
using (var context = new EntitiesContext())
{
    for (int i = 0; i < 1000; i++)
    {
        context.PassengerCars.Add(new PassengerCar { Name = "RandomCar " + i.ToString() });
    }
    context.SaveChanges();
}
AutoDetectChangesEnabled defaults to true, which means that before each new PassengerCar object is added, DetectChanges is called and walks through all objects in the context checking for changes. In this case that work is completely unnecessary: the added entities are not modified by this code, they are saved exactly as they were added. And DetectChanges is expensive; see the measurements here: http://blog.staticvoid.co.nz/2012/5/7/entityframework_performance_and_autodetectchanges .
Thoughtlessly disabling AutoDetectChangesEnabled can lead to unpleasant consequences (lost changes, exceptions due to broken data integrity), so I would suggest the simplest rule: if your code does not modify the objects added to the context any further within the same session, the property can safely be disabled, as in the sketch below. This situation occurs quite often: a typical CRUD API usually receives an object from outside and either simply adds it, or determines what has changed since the object was read and updates its state in the context accordingly (for example, using GraphDiff, self-tracked entities, or any other similar solution). The object itself does not change.
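A minimal sketch of that rule applied to the bulk-add example above (EntitiesContext and PassengerCar are the example types from this section); the try/finally restores the default even if SaveChanges throws:

using (var context = new EntitiesContext())
{
    try
    {
        // Turn off the automatic DetectChanges calls for the duration of the bulk add.
        context.Configuration.AutoDetectChangesEnabled = false;

        for (int i = 0; i < 1000; i++)
        {
            context.PassengerCars.Add(new PassengerCar { Name = "RandomCar " + i });
        }

        // Entities passed to Add are tracked as Added right away,
        // so nothing is lost by skipping DetectChanges here.
        context.SaveChanges();
    }
    finally
    {
        // Restore the default so later work with this context behaves as usual.
        context.Configuration.AutoDetectChangesEnabled = true;
    }
}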
Constant recompilation of certain queries
Starting with Entity Framework 5, queries are automatically cached after compilation, which significantly speeds up their subsequent executions: the SQL text is taken from the cache, and only the parameter values need to be substituted. But there are several situations in which compilation happens on every execution.
Using Contains on an in-memory collection
In practice it is often necessary to add a condition similar to the SQL IN operator to a query: check whether a property value matches any element of a collection. For example:
List<int> channels = new List<int> { 1, 5, 9 };
dataContext.Entities
    .AsNoTracking()
    .Where(e => channels.Contains(e.Channel))
    .ToList();
This expression is eventually converted to the following SQL:
SELECT
    [Extent1].[Id] AS [Id],
    [Extent1].[Name] AS [Name],
    [Extent1].[Channel] AS [Channel]
FROM [dbo].[Entities] AS [Extent1]
WHERE [Extent1].[Channel] IN (1, 5, 9)
It turns out that the IN operator gets no parameters; the values themselves are inlined instead. Such a query cannot be cached, because for a collection with different contents the query text has to be regenerated. This hurts not only the Entity Framework itself but also the database server, which has to build and cache a new execution plan for every new list of values in the IN operator.
If the collection is not expected to hold many elements (say, no more than a hundred), the problem can be solved by dynamically generating conditions joined by the OR operator. This is easy to do, for example, with the LinqKit library:
List<int> channels = new List<int> { 1, 5, 9 };

var channelsCondition = PredicateBuilder.False<Entity>();
channelsCondition = channels.Aggregate(channelsCondition,
    (current, value) => current.Or(e => e.Channel == value).Expand());

var query = dataContext.Entities
    .AsNoTracking()
    .Where(channelsCondition);
As a result, we get a parameterized query:
SELECT
    [Extent1].[Id] AS [Id],
    [Extent1].[Name] AS [Name],
    [Extent1].[Channel] AS [Channel]
FROM [dbo].[Entities] AS [Extent1]
WHERE [Extent1].[Channel] IN (@p__linq__0,@p__linq__1,@p__linq__2)
Although dynamically building a query looks like extra work, in practice it takes relatively little CPU time. In one real task, building the query took more than a second per call, and replacing Contains with the equivalent dynamic expression reduced the processing time of every query after the first to tens of milliseconds.
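Why does this produce parameters? The sketch below illustrates the same trick without LinqKit, under the assumption (consistent with the parameterized SQL shown above) that EF turns field accesses on captured objects into SQL parameters while inlining bare constants. Entity and its Channel property are the example types from above; ValueHolder is my own helper:

using System;
using System.Collections.Generic;
using System.Linq.Expressions;

static class ChannelFilter
{
    // Holder whose field access EF can translate into a SQL parameter.
    private class ValueHolder { public int Value; }

    public static Expression<Func<Entity, bool>> Build(IEnumerable<int> channels)
    {
        var parameter = Expression.Parameter(typeof(Entity), "e");
        var property = Expression.Property(parameter, "Channel");

        Expression body = Expression.Constant(false);
        foreach (var value in channels)
        {
            // Builds e.Channel == holder.Value: a field access on a captured
            // object rather than a literal constant, so EF should emit a parameter.
            var holder = new ValueHolder { Value = value };
            var captured = Expression.Field(Expression.Constant(holder), "Value");
            body = Expression.OrElse(body, Expression.Equal(property, captured));
        }

        return Expression.Lambda<Func<Entity, bool>>(body, parameter);
    }
}

Usage is the same as with the LinqKit version: dataContext.Entities.AsNoTracking().Where(ChannelFilter.Build(channels)).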
Using Take and Skip
Many projects need to implement paging for search results. The obvious way to select the desired portion of records is the Take and Skip methods:
int pageSize = 10;
int startFrom = 10;

var query = dataContext.Entities
    .AsNoTracking()
    .OrderBy(e => e.Name)
    .Skip(startFrom)
    .Take(pageSize);
Let's see what SQL is generated in this case:
SELECT
    [Extent1].[Id] AS [Id],
    [Extent1].[Name] AS [Name],
    [Extent1].[Channel] AS [Channel]
FROM [dbo].[Entities] AS [Extent1]
ORDER BY [Extent1].[Name] ASC
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY
Both the page size and the offset appear in the query as constants rather than parameters, which again means the query text will not be cached. Fortunately, starting with Entity Framework 6 there is a simple workaround: pass lambda expressions to the Take and Skip overloads:
var query = dataContext.Entities
    .AsNoTracking()
    .OrderBy(e => e.Name)
    .Skip(() => startFrom)
    .Take(() => pageSize);
And the resulting query will contain parameters instead of constants:
SELECT
    [Extent1].[Id] AS [Id],
    [Extent1].[Name] AS [Name],
    [Extent1].[Channel] AS [Channel]
FROM [dbo].[Entities] AS [Extent1]
ORDER BY [Extent1].[Name] ASC
OFFSET @p__linq__0 ROWS FETCH NEXT @p__linq__1 ROWS ONLY
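To avoid repeating the lambda overloads at every call site, the pattern can be wrapped in a small extension method. A sketch with hypothetical naming (Page is my helper, not an EF method); the values are captured in local variables so that EF6 turns them into parameters:

using System.Data.Entity; // lambda-based Skip/Take overloads live here (QueryableExtensions)
using System.Linq;

public static class PagingExtensions
{
    public static IQueryable<T> Page<T>(this IOrderedQueryable<T> query,
                                        int pageIndex, int pageSize)
    {
        // The lambdas below capture these locals,
        // so EF6 emits @p__linq__ parameters instead of constants.
        int skip = pageIndex * pageSize;
        int take = pageSize;
        return query.Skip(() => skip).Take(() => take);
    }
}

The query above then becomes dataContext.Entities.AsNoTracking().OrderBy(e => e.Name).Page(1, 10).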
A large number of Includes in one query
Obviously, the simplest way to read data from the database together with child collections and other navigation properties is the Include() method. Regardless of the number of Include() calls in a LINQ query, a single SQL query is generated that returns all the requested data. It may seem that, within the Entity Framework, this approach to reading complex objects is optimal in every situation. But it is not.
To begin, consider the structure of the resulting SQL query. Suppose we have a LINQ query with two Includes for collections:
var query = c.GuidKeySequentialParentEntities
    .AsNoTracking()
    .Include(e => e.Children1)
    .Include(e => e.Children2)
    .Where(e => e.Id == sequentialGuidKey);
The corresponding SQL will contain UNION ALL:
SELECT
    [UnionAll1].[C2] AS [C1],
    [UnionAll1].[Id] AS [C2],
    [UnionAll1].[Name] AS [C3],
    [UnionAll1].[C1] AS [C4],
    [UnionAll1].[Id1] AS [C5],
    [UnionAll1].[Name1] AS [C6],
    [UnionAll1].[ParentId] AS [C7],
    [UnionAll1].[C3] AS [C8],
    [UnionAll1].[C4] AS [C9],
    [UnionAll1].[C5] AS [C10]
FROM (
    SELECT
        CASE WHEN ([Extent2].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1],
        1 AS [C2],
        [Extent1].[Id] AS [Id],
        [Extent1].[Name] AS [Name],
        [Extent2].[Id] AS [Id1],
        [Extent2].[Name] AS [Name1],
        [Extent2].[ParentId] AS [ParentId],
        CAST(NULL AS uniqueidentifier) AS [C3],
        CAST(NULL AS varchar(1)) AS [C4],
        CAST(NULL AS uniqueidentifier) AS [C5]
    FROM [dbo].[GuidKeySequentialParentEntities] AS [Extent1]
    LEFT OUTER JOIN [dbo].[GuidKeySequentialChildEntity1] AS [Extent2]
        ON [Extent1].[Id] = [Extent2].[ParentId]
    WHERE [Extent1].[Id] = @p__linq__0
    UNION ALL
    SELECT
        2 AS [C1],
        2 AS [C2],
        [Extent3].[Id] AS [Id],
        [Extent3].[Name] AS [Name],
        CAST(NULL AS uniqueidentifier) AS [C3],
        CAST(NULL AS varchar(1)) AS [C4],
        CAST(NULL AS uniqueidentifier) AS [C5],
        [Extent4].[Id] AS [Id1],
        [Extent4].[Name] AS [Name1],
        [Extent4].[ParentId] AS [ParentId]
    FROM [dbo].[GuidKeySequentialParentEntities] AS [Extent3]
    INNER JOIN [dbo].[GuidKeySequentialChildEntity2] AS [Extent4]
        ON [Extent3].[Id] = [Extent4].[ParentId]
    WHERE [Extent3].[Id] = @p__linq__0
) AS [UnionAll1]
ORDER BY [UnionAll1].[Id] ASC, [UnionAll1].[C1] ASC
It would be logical to assume that each Include() simply adds another JOIN to the query, but the Entity Framework behaves in a more complicated way. If the included navigation property is a single object rather than a collection, it really is just another JOIN. But for each included collection a separate subquery is formed, joining the parent table with the child one, and all such subqueries are merged with UNION ALL. (Obviously, if only one child collection is requested, no UNION ALL is needed.) Schematically this can be represented as:
SELECT ...
FROM (
    SELECT ... FROM ... LEFT OUTER JOIN ... WHERE ...
    UNION ALL
    SELECT ... FROM ... INNER JOIN ... WHERE ...
    UNION ALL
    SELECT ... FROM ... INNER JOIN ... WHERE ...
)
ORDER BY ...
This is done to combat row multiplication. Suppose an object has three child collections of 10 elements each. If all three are joined via OUTER JOIN directly to the main query, the result is 10 * 10 * 10 = 1000 rows. If, as the Entity Framework does, the three collections are gathered in one query via UNION ALL, we get 30 rows. The more collections and elements in them, the more obvious the benefit of the UNION approach.
The problem is that, with sufficiently complex entities and selection criteria, building and optimizing such a query is very laborious for the Entity Framework, as is executing it on the database server. So if profiling shows unsatisfactory performance of queries containing Include, and the database indexes are in order, it makes sense to consider alternative solutions.
The main idea of the alternatives is to read each collection with a separate query. The simplest option is available if the objects are added to the context during the read, i.e. without AsNoTracking():
var children1 = c.ChildEntities1
    .Where(e => e.Parent.CreatedAt >= DbFunctions.AddDays(DateTime.UtcNow, -1));
var children2 = c.ChildEntities2
    .Where(e => e.Parent.CreatedAt >= DbFunctions.AddDays(DateTime.UtcNow, -1));

children1.Load();
children2.Load();

var query = c.ParentEntities
    .Where(e => e.CreatedAt >= DbFunctions.AddDays(DateTime.UtcNow, -1))
    .ToList();
So for each child collection we read all objects related to the parent entities that match the query criteria; calling Load() adds them to the context. Then, while reading the parent entities, the Entity Framework finds all the children already present in the context and wires up the references accordingly.
The main disadvantage is that each query is a separate round trip to the database server. Fortunately, the EntityFramework.Extended library offers a way around this with "future" queries. The idea is that all queries on which the Future() extension method has been called are sent to the server in a single call when a terminal method is invoked on any of them:
var children1 = c.ChildEntities1
    .Where(e => e.Parent.CreatedAt >= DbFunctions.AddDays(DateTime.UtcNow, -1))
    .Future();
var children2 = c.ChildEntities2
    .Where(e => e.Parent.CreatedAt >= DbFunctions.AddDays(DateTime.UtcNow, -1))
    .Future();

var results = c.ParentEntities
    .Where(e => e.CreatedAt >= DbFunctions.AddDays(DateTime.UtcNow, -1))
    .Future()
    .ToList();
As a result, just as in the first example, the objects in the results collection will have correctly populated Children1 and Children2 collections, and all the data is retrieved in a single call to the server.
"Future" queries are useful in any situation where several separate queries need to be executed.
Reading only the base entity's fields when using Table Per Type mapping
Imagine a system in which a number of entities share a base class containing their common characteristics (name, creation date, owner, status, and so on). There is a requirement to implement a search over these characteristics and display a list of results, and the displayed list, again, uses only the base characteristics.
From the point of view of model flexibility, Table Per Type mapping suits this task well: a separate table is created for each type. Say we have the base class Vehicle and the derived classes PassengerCar, Truck and Motorcycle; four tables are then created in the database.
Let's write a query that reads search results by some criterion, for example, a creation date no earlier than 10 days ago:
var vehicles = context.Vehicles
    .AsNoTracking()
    .Where(v => v.CreatedAt >= DbFunctions.AddDays(DateTime.UtcNow, -10))
    .ToList();
And let's see what the Entity Framework transforms it to:
SELECT
    CASE
        WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project3].[C1] = 1) AND ([Project3].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN '0X'
        WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN '0X0X'
        WHEN (([Project3].[C1] = 1) AND ([Project3].[C1] IS NOT NULL)) THEN '0X1X'
        ELSE '0X2X'
    END AS [C1],
    [Extent1].[Id] AS [Id],
    [Extent1].[CreatedAt] AS [CreatedAt],
    [Extent1].[Name] AS [Name],
    CASE
        WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project3].[C1] = 1) AND ([Project3].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN CAST(NULL AS bit)
        WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN [Project1].[HasCycleCar]
        WHEN (([Project3].[C1] = 1) AND ([Project3].[C1] IS NOT NULL)) THEN CAST(NULL AS bit)
    END AS [C2],
    CASE
        WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project3].[C1] = 1) AND ([Project3].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN CAST(NULL AS int)
        WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN CAST(NULL AS int)
        WHEN (([Project3].[C1] = 1) AND ([Project3].[C1] IS NOT NULL)) THEN [Project3].[Seats]
    END AS [C3],
    CASE
        WHEN (( NOT (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL))) AND ( NOT (([Project3].[C1] = 1) AND ([Project3].[C1] IS NOT NULL))) AND ( NOT (([Project2].[C1] = 1) AND ([Project2].[C1] IS NOT NULL)))) THEN CAST(NULL AS int)
        WHEN (([Project1].[C1] = 1) AND ([Project1].[C1] IS NOT NULL)) THEN CAST(NULL AS int)
        WHEN (([Project3].[C1] = 1) AND ([Project3].[C1] IS NOT NULL)) THEN CAST(NULL AS int)
        ELSE [Project2].[Capacity]
    END AS [C4]
FROM [dbo].[Vehicles] AS [Extent1]
LEFT OUTER JOIN (
    SELECT [Extent2].[Id] AS [Id], [Extent2].[HasCycleCar] AS [HasCycleCar], cast(1 as bit) AS [C1]
    FROM [dbo].[Motorcycles] AS [Extent2]
) AS [Project1] ON [Extent1].[Id] = [Project1].[Id]
LEFT OUTER JOIN (
    SELECT [Extent3].[Id] AS [Id], [Extent3].[Capacity] AS [Capacity], cast(1 as bit) AS [C1]
    FROM [dbo].[Trucks] AS [Extent3]
) AS [Project2] ON [Extent1].[Id] = [Project2].[Id]
LEFT OUTER JOIN (
    SELECT [Extent4].[Id] AS [Id], [Extent4].[Seats] AS [Seats], cast(1 as bit) AS [C1]
    FROM [dbo].[PassengerCars] AS [Extent4]
) AS [Project3] ON [Extent1].[Id] = [Project3].[Id]
WHERE [Extent1].[CreatedAt] >= (DATEADD (day, -10, SysUtcDateTime()))
So we need only the base information, yet the Entity Framework reads everything, and with a rather cumbersome query at that. In fact, in this particular situation there is nothing wrong: even though we select from the collection of the base class, the framework must honor polymorphism and return each object as the type it was created with.
The main question is how to simplify the query so that it does not read too much. Fortunately, starting with Entity Framework 5, there is a way: use a projection. Simply construct an object of another (or anonymous) type, filling it only with properties of the base entity:
var vehicles = context.Vehicles
    .AsNoTracking()
    .Where(v => v.CreatedAt >= DbFunctions.AddDays(DateTime.UtcNow, -10))
    .Select(v => new
    {
        Id = v.Id,
        CreatedAt = v.CreatedAt,
        Name = v.Name
    })
    .ToList();
And everything becomes much simpler:
SELECT
    1 AS [C1],
    [Extent1].[Id] AS [Id],
    [Extent1].[CreatedAt] AS [CreatedAt],
    [Extent1].[Name] AS [Name]
FROM [dbo].[Vehicles] AS [Extent1]
WHERE [Extent1].[CreatedAt] >= (DATEADD (day, -10, SysUtcDateTime()))
But there is unpleasant news: if the base class has a collection and it needs to be read, the problem remains. Here is an example:
var vehicles = context.Vehicles
    .AsNoTracking()
    .Where(v => v.CreatedAt >= DbFunctions.AddDays(DateTime.UtcNow, -10))
    .Select(v => new
    {
        Id = v.Id,
        CreatedAt = v.CreatedAt,
        Name = v.Name,
        ServiceTickets = v.ServiceTickets
    })
    .ToList();
And the SQL generated for it:
SELECT
    [Project1].[Id1] AS [Id],
    [Project1].[Id2] AS [Id1],
    [Project1].[Id3] AS [Id2],
    [Project1].[Id] AS [Id3],
    [Project1].[C1] AS [C1],
    [Project1].[CreatedAt] AS [CreatedAt],
    [Project1].[Name] AS [Name],
    [Project1].[C2] AS [C2],
    [Project1].[Id4] AS [Id4],
    [Project1].[Comments] AS [Comments],
    [Project1].[Vehicle_Id] AS [Vehicle_Id]
FROM (
    SELECT
        [Extent1].[Id] AS [Id],
        [Extent1].[CreatedAt] AS [CreatedAt],
        [Extent1].[Name] AS [Name],
        [Extent2].[Id] AS [Id1],
        [Extent3].[Id] AS [Id2],
        [Extent4].[Id] AS [Id3],
        1 AS [C1],
        [Extent5].[Id] AS [Id4],
        [Extent5].[Comments] AS [Comments],
        [Extent5].[Vehicle_Id] AS [Vehicle_Id],
        CASE WHEN ([Extent5].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
    FROM [dbo].[Vehicles] AS [Extent1]
    LEFT OUTER JOIN [dbo].[Motorcycles] AS [Extent2] ON [Extent1].[Id] = [Extent2].[Id]
    LEFT OUTER JOIN [dbo].[Trucks] AS [Extent3] ON [Extent1].[Id] = [Extent3].[Id]
    LEFT OUTER JOIN [dbo].[PassengerCars] AS [Extent4] ON [Extent1].[Id] = [Extent4].[Id]
    LEFT OUTER JOIN [dbo].[ServiceTickets] AS [Extent5] ON [Extent1].[Id] = [Extent5].[Vehicle_Id]
    WHERE [Extent1].[CreatedAt] >= (DATEADD (day, -10, SysUtcDateTime()))
) AS [Project1]
ORDER BY [Project1].[Id1] ASC, [Project1].[Id2] ASC, [Project1].[Id3] ASC, [Project1].[Id] ASC, [Project1].[C2] ASC
I created a ticket for the Entity Framework team on this topic (https://entityframework.codeplex.com/workitem/2814), but they politely replied that, given the great complexity and the risk of breaking everything, they would not fix it.
In some cases, when the database and/or the number of derived objects is small, you can live with this. If such queries start to noticeably degrade performance, you need to look for a workaround, since the problem cannot be prevented at the level of the framework itself. The simplest option is to read the collections in separate queries, for example:
// The base query with the search criteria
var vehiclesQuery = context.Vehicles
    .AsNoTracking()
    .Where(v => v.CreatedAt >= DbFunctions.AddDays(DateTime.UtcNow, -10));

// Read the parent entities, projecting only the base properties
var vehicles = vehiclesQuery
    .Select(v => new VehicleDto
    {
        Id = v.Id,
        CreatedAt = v.CreatedAt,
        Name = v.Name
    })
    .ToList();

// Read the child collection for all matching parents in one query
var serviceTickets = context.ServiceTickets
    .AsNoTracking()
    .Where(s => vehiclesQuery.Any(v => v.Id == s.VehicleId))
    .ToList();

// Attach the children to their parents in memory
vehicles.ForEach(v => v.ServiceTickets
    .AddRange(serviceTickets.Where(s => s.VehicleId == v.Id)));
There is no universal recipe, and the above solution will not give a win in every case. For example, the base query may be quite complicated, and re-executing it for every collection will be expensive. You can try to get around this by first fetching the list of identifiers from the base query results and then using it in all further subqueries, as in the sketch below; but if there are many results, there may be no win here either. Besides, in this case you should remember what was said earlier about the Contains method, which is the obvious candidate for filtering by a list of identifiers.
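A sketch of that identifier-list variant, reusing vehiclesQuery and the hypothetical DTO types from the previous example:

// Fetch only the identifiers produced by the base query
var ids = vehiclesQuery.Select(v => v.Id).ToList();

// Filter each child collection by the in-memory identifier list.
// Note: this is Contains on an in-memory collection, with the
// recompilation caveats described earlier in this article.
var serviceTickets = context.ServiceTickets
    .AsNoTracking()
    .Where(s => ids.Contains(s.VehicleId))
    .ToList();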
I would formulate the general approach to this problem as follows: if you can avoid Table Per Type mapping, it is better to avoid it (see the configuration sketch below). In cases where it is hard to do without it, try the options described above and see whether they help.
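For reference, a sketch of what that choice looks like in EF6 model configuration, using the Vehicle hierarchy from this section; the table names are illustrative:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Table Per Type: each derived class explicitly mapped to its own table.
    modelBuilder.Entity<PassengerCar>().ToTable("PassengerCars");
    modelBuilder.Entity<Truck>().ToTable("Trucks");
    modelBuilder.Entity<Motorcycle>().ToTable("Motorcycles");

    // Without these ToTable calls, EF6 defaults to Table Per Hierarchy:
    // a single Vehicles table with a discriminator column, which avoids
    // the JOIN-per-subtype queries shown above.
}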
Additional Information
Performance nuances to keep in mind when working with the Entity Framework (including those described in this article) are briefly covered at this link: https://msdn.microsoft.com/en-us/data/hh949853.aspx . Unfortunately, alternative solutions are not given for every problem, but the information is still very useful. Note also that at least point 4.3 is not confirmed in practice for Entity Framework 6.1.3.