7 myths about Linq to Database

Linq appeared in 2007, the first IQueryable provider also appeared - Linq2SQL, it worked only with MS SQL Server, it was rather slow and didn’t cover all the scenarios. Almost 7 years have passed, several Linq-providers have appeared that work with different DBMS, almost all the “childhood diseases” of technology have won and, for a couple of years, Linq to Database (a generic name for popular providers) is ready for industrial use.

Nevertheless, not everyone uses Linq to Database and explains this not only because the project is old and rather difficult to rewrite to linq, but also cites various myths as arguments. These myths wander from one company to another and often spread via the Internet.

In this post I collected the most popular myths and denials to them.

')

Myth number 1

The database is handled by a specially trained DBA, which makes all queries, and programmers write code, so Linq to Database is not needed.

Despite the attractiveness of the myth, this approach usually does not work. To make effective DBA queries, you need to understand very well what is happening in the program, what data is needed in each scenario.

If the DBA does not have this knowledge, then it usually comes down to the fact that the DBA makes a small set of CRUD stored on each entity + several stored for the most “thick” queries. And the rest is already done by programmers in the code. This most often works inefficiently, because on average, much more data is pulled than is needed for a particular scenario. And to optimize this is difficult.

If the DBA knows each script, then it has two options:
a) Make a lot of things (almost the same), each for a specific scenario, and then painfully maintain them.
b) Make a few universal storage with a bunch of parameters, inside which to glue the lines to form optimal queries. Moreover, adding an additional parameter to the request becomes an extremely complex process.

Both options for DBA are very complex, so the hybrid option with several very complex storage is often the result, and the rest is a banal CRUD. Linq allows you to do the same splice of lines much more efficiently, so you can generate optimal queries in your program code or close to optimal ones.

A DBA can create views and functions that will be used in queries from application code, as well as stored procedures for batch processing. But designing requests is better to leave on the side of the application.

Myth number 2

Linq generates inefficient SQL queries.

Very often repeated myth. But most of the inefficiencies of Linq queries are created by people.

The reasons for this are simple:
1) People do not understand how Linq differs from SQL. Linq works with ordered sequences, and SQL with unordered sets. Therefore, some Linq operations add extremely inefficient sort operators to SQL.
2) People do not understand the mechanisms of IQuryable-providers and how queries are executed in the database. Read more in a previous post - habrahabr.ru/post/230479

But there are also bugs in providers that lead to the generation of requests that are far from optimal.

For example, in the Entity Framework there is a bug when using navigation properties:

context.Orders .Where(o => o.Id == id) .SelectMany(o => o.OrderLines) .Select(l => l.Product) .ToList();

Such a query generates the following SQL:

Lot of code

  [Project1].[Id] AS [Id], [Project1].[OrderDate] AS [OrderDate], [Project1].[UserId] AS [UserId], [Project1].[C1] AS [C1], [Project1].[OrderId] AS [OrderId], [Project1].[ProductId] AS [ProductId], [Project1].[Id1] AS [Id1], [Project1].[Title] AS [Title] FROM ( SELECT [Extent1].[Id] AS [Id], [Extent1].[OrderDate] AS [OrderDate], [Extent1].[UserId] AS [UserId], [Join1].[OrderId] AS [OrderId], [Join1].[ProductId] AS [ProductId], [Join1].[Id] AS [Id1], [Join1].[Title] AS [Title], CASE WHEN ([Join1].[OrderId] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1] FROM [dbo].[Orders] AS [Extent1] LEFT OUTER JOIN (SELECT [Extent2].[OrderId] AS [OrderId], [Extent2].[ProductId] AS [ProductId], [Extent3].[Id] AS [Id], [Extent3].[Title] AS [Title] FROM [dbo].[OrderLines] AS [Extent2] INNER JOIN [dbo].[Products] AS [Extent3] ON [Extent2].[ProductId] = [Extent3].[Id] ) AS [Join1] ON [Extent1].[Id] = [Join1].[OrderId] WHERE [Extent1].[Id] = @p__linq__0 ) AS [Project1] ORDER BY [Project1].[Id] ASC, [Project1].[C1] ASC

In this query, the calculated field and sorting by it cannot be optimized by SQL Server and you have to perform a real sort.

But if you slightly rewrite Linq's request to use the join operator, then there will be no problem:

 var orders1 = from o in context.Orders where o.Id == id join ol in context.OrderLines on o.Id equals ol.OrderId into j from p in j.DefaultIfEmpty() select p.Product; orders1.ToArray();

SQL obtained:

 SELECT [Extent3].[Id] AS [Id], [Extent3].[Title] AS [Title] FROM [dbo].[Orders] AS [Extent1] LEFT OUTER JOIN [dbo].[OrderLines] AS [Extent2] ON [Extent1].[Id] = [Extent2].[OrderId] LEFT OUTER JOIN [dbo].[Products] AS [Extent3] ON [Extent2].[ProductId] = [Extent3].[Id] WHERE [Extent1].[Id] = @p__linq__0

It is well covered with indexes and optimized by SQL Server.

I also heard about ineffective requests from NHibernate, but did not work with it so actively to find such bugs.

Myth number 3

Slowly running mapping.

The conversion of the DataReader into a set of objects is performed in a fraction of a microsecond per object. And the linq2db provider manages to do it faster than the advertised Dapper.

But what can work slowly is attaching the received objects to the Change Tracking context. But this must be done only in the case when the objects will be modified and recorded in the database. In other cases, you can explicitly specify that the objects do not join the context or use projections.

Myth number 4

Slowly generated queries.

Indeed, to generate a SQL query from Linq requires a traversal of the tree, a lot of work with reflection and analysis of metadata. But in all providers, such analysis is performed once, and then the data is cached.

As a result, for simple requests, the generation of a request is performed on average over 0.4 ms. For complex ones, this can be up to several milliseconds.
This time is usually less than the statistical error of the total query execution time.

Myth number 5

Cannot use hints.

In SQL Server, there is a Plan Guide mechanism that allows you to add hints to any query. Similar mechanisms exist in other DBMSs.

Even so, hints are not much needed when using Linq. Linq generates fairly simple queries that the DBMS itself optimizes in the presence of statistics, indexes and constraints. Hints of locks should be replaced by setting the correct isolation levels and limiting the number of rows requested.

Myth number 6

Linq cannot use all SQL features.

This is partly true. But many SQL features can be wrapped in functions or views, and they can already be used in Linq queries.

Moreover, the Entity Framework allows you to perform any SQL queries, and results on objects, including those with Change Tracking.

Myth number 7

Stored procedures run faster than ad-hoc queries generated by Linq.

This was true in the mid-90s. Today, all DBMSs “compile” requests and cache plans, regardless of whether this is a procedure or an ad-hoc request.

Here is a brief set of myths that can be found. If you have more - add.

Source: https://habr.com/ru/post/230623/

All Articles