
Entity Framework and Performance

While working on a web portal project, I explored ways to improve performance and came across a short article about the Dapper micro-ORM, written by the authors of StackOverflow.com. Their project was originally written in Linq2Sql, and all the performance-critical places have since been rewritten using this solution.
The disadvantage of this and other similar solutions I have seen is that they do very little to ease development, providing essentially only materialization and hiding direct work with ADO.NET. SQL queries still need to be written by hand.

The Linq2Entities syntax also contributes to cleaner code, allowing both testing and reuse. In addition, when the database changes, immediately after updating the context the compiler flags every place where a removed or renamed field is used, and a changed relationship structure between tables highlights the places where the corresponding navigation properties are used.


But this article is not about how much EF speeds up development, nor about the fact that it is not great to have part of the queries written in LINQ and part directly in SQL. Here I will present a solution that combines EF entities and Linq2Entities queries on the one hand with the "pure performance" of ADO.NET on the other. But first, a little background.

Everyone who has worked on such projects has, I believe, run into the fact that row-by-row calls are very slow. And many have probably tried to optimize this by writing one huge query and squeezing everything into it. It works, but it looks terrifying: the method code is huge, hard to maintain, and impossible to test. The first stage of the solution I tried is to materialize all the needed entities, one query per entity. Joining and transforming them into the domain structure happens separately from materialization.
Let me explain with an example. We need to display a list of insurance policies; the primary query looks like this:
    int clientId = 42;
    var policies = context.Set<policy>()
        .Where(x => x.active_indicator)
        .Where(x => x.client_id == clientId);

Next, to display the necessary information, we need dependent, or as they might be called, "child" entities.
    var coverages = policies.SelectMany(x => x.coverages);
    var premiums = coverages.Select(x => x.premium)
        .Where(x => x.premium_type == SomeIntConstant);
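To illustrate the idea of separating materialization from the domain transformation, here is a minimal sketch of stitching the pieces together in memory. The key property names (`policy_id`, `coverage_id`) are assumptions for illustration and are not taken from the article:

```csharp
// Each query is materialized separately with one round trip,
// then the results are joined in memory into the domain shape.
var policyList   = policies.ToList();
var coverageList = coverages.ToList();
var premiumList  = premiums.ToList();

// Lookups give O(1) access by foreign key (key names are illustrative).
var premiumsByCoverage = premiumList.ToLookup(p => p.coverage_id);
var coveragesByPolicy  = coverageList.ToLookup(c => c.policy_id);

var model = policyList.Select(p => new
{
    Policy    = p,
    Coverages = coveragesByPolicy[p.policy_id].Select(c => new
    {
        Coverage = c,
        Premiums = premiumsByCoverage[c.coverage_id]
    })
});
```

Each query stays small and testable on its own, while the in-memory join replaces what would otherwise be one huge SQL statement.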

Entities connected via navigation properties can also be loaded with Include, but that has its own difficulties; it turned out to be simpler (and faster, more on that later) to do it as in the example above.
This change by itself did not yield much of a performance gain relative to the original comprehensive query, but it simplified the code, made it possible to break it into smaller methods, and gave the code a more pleasing and familiar look.
The performance gain came at the next step: running SQL Server Profiler, I found that two of the 30 queries took 10-15 times longer than the others. The first of these queries was:
    var tasks = workflows.SelectMany(x => x.task)
        .Where(x => types.Contains(x.task_type))
        .GroupBy(x => new { x.workflow_id, x.task_type })
        .Select(x => x.OrderByDescending(y => y.task_id).FirstOrDefault());

As it turned out, EF generates a very poor query here, and simply moving GroupBy from last place to first brought these queries up to the speed of the rest, giving roughly a 30-35% reduction in total execution time.
    var tasks = context.Set<task>()
        .GroupBy(x => new { x.workflow_id, x.task_type })
        .Select(x => x.OrderByDescending(y => y.task_id).FirstOrDefault())
        .Join(workflows,
              task => task.workflow_id,
              workflow => workflow.workflow_id,
              (task, workflow) => task)
        .Where(x => types.Contains(x.task_type));

Just in case, I note that the Join in this query is equivalent to the SelectMany in the previous one.
Finding and eliminating such a flaw in the depths of one huge query is problematic, bordering on impossible. And with Include this cannot be done at all.
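The equivalence of the two shapes can be seen side by side; this is a sketch, assuming `task.workflow_id` is the foreign key backing the `workflow.task` navigation property:

```csharp
// Navigating the collection property:
var viaSelectMany = workflows.SelectMany(w => w.task);

// Joining on the foreign key explicitly — the same set of rows:
var viaJoin = context.Set<task>()
    .Join(workflows,
          task => task.workflow_id,
          workflow => workflow.workflow_id,
          (task, workflow) => task);
```

The explicit Join form is what makes it possible to apply GroupBy before the filter, which is exactly the reordering that fixed the slow query.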

Returning to the beginning of the article and to the micro-ORMs, I should say right away that this approach may not be justified in every scenario. In ours, we needed to load a portion of data from the database, perform some transformations and calculations, and send it to the client's browser as JSON.
As a prototype of the solution, I tried implementing the materialization through PetaPoco and was very impressed with the test result: the difference in materialization time for the target group of queries was 4.6x (756 ms versus 3493 ms). Though it would be more accurate to say I was disappointed with the performance of EF.
Because of strict StyleCop settings, PetaPoco did not work in the project, and to adapt it to the task I had to dig into it and make changes, so the idea of writing my own solution ripened.
The solution relies on the fact that when EF generates a query, the field names in the result set correspond to the property names of the objects it generates for the context. Alternatively, you can rely on the order of these fields, which also works. To extract the SQL text and parameters from a query, the ToObjectQuery method is used, followed by the ToTraceString method and the Parameters property on the resulting object. Next comes a simple reading cycle, taken from MSDN; the "highlight" of the solution is the materializers. PetaPoco emits the materializer code at runtime, but I decided to generate their code with T4 templates. As a basis I took the file that generates entities for the context by reading the .edmx, kept all its auxiliary classes, and replaced only the code that does the generating.
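A minimal sketch of that reading cycle, assuming an `IMaterialize<T>` interface like the one the generated classes implement (the method names around it are illustrative, not the article's exact API):

```csharp
// The contract implemented by each generated materializer.
public interface IMaterialize<T>
{
    T Materialize(IDataRecord input);
}

public static class QueryRunner
{
    // Executes SQL text (as obtained via ToTraceString()) with its
    // parameters and materializes each row via the generated class.
    public static List<T> ExecuteQuery<T>(
        DbConnection connection,
        string sql,
        IEnumerable<DbParameter> parameters,
        IMaterialize<T> materializer)
    {
        var result = new List<T>();
        using (var command = connection.CreateCommand())
        {
            command.CommandText = sql;
            foreach (var parameter in parameters)
                command.Parameters.Add(parameter);

            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                    result.Add(materializer.Materialize(reader));
            }
        }
        return result;
    }
}
```

The loop itself is the plain ADO.NET pattern from MSDN; all the per-entity knowledge lives in the generated materializer.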
Example generated class:
    public class currencyMaterialize : IMaterialize<currency>, IEqualityComparer<currency>
    {
        public currency Materialize(IDataRecord input)
        {
            var output = new currency();
            output.currency_id = (int)input["currency_id"];
            output.currency_code = input["currency_code"] as string;
            output.currency_name = input["currency_name"] as string;
            return output;
        }

        public bool Equals(currency x, currency y)
        {
            return x.currency_id == y.currency_id;
        }

        public int GetHashCode(currency obj)
        {
            return obj.currency_id.GetHashCode();
        }
    }

The code that PetaPoco emits is essentially identical to this, which is also confirmed by the identical execution time.
As you can see, the class also implements the IEqualityComparer interface. From this it should already be clear that the usual reference comparison no longer works on objects materialized this way, unlike objects materialized by EF, and this addition was needed to perform Distinct in memory.
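In use it looks like the following sketch (the `materializedCurrencies` variable is assumed to hold entities produced by the materializer above):

```csharp
// Because the materializer bypasses EF's identity map, the same database
// row can yield several distinct object instances. Distinct must therefore
// compare by key, not by reference.
var comparer = new currencyMaterialize();

var distinctCurrencies = materializedCurrencies
    .Distinct(comparer)   // uses Equals/GetHashCode on currency_id
    .ToList();
```

Reusing the materializer class as the comparer keeps the key-equality logic in one generated place per entity.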

I packaged the result of this research as an Item Template and published it in the Visual Studio gallery. A brief description of how to use it is there. I would be glad if the solution interests someone.

Source: https://habr.com/ru/post/227637/

