Oh, these query plans

The story is as old as the world. Two tables:

Cities - 100 unique cities.
People - 10 million people. For some people, the city may not be specified.

The distribution of people in cities is even.
Indices on the fields Cites.Id, Cites.Name, People .CityId - in stock.

You need to select the first 100 People records sorted by Cites.
')

Rolling up our sleeves, we cheerfully write:

select top 100 p.Name, c.Name as City from People p
left join Cities c on c.Id=p.CityId
order by c.Name

In this case, we get something like:

For ... 6 seconds. (MS SQL 2008 R2, i5 / 4Gb)

But how so! 6 seconds from where ?! We know that in the first 100 records there will be only Almaty! After all, there are 10 million records, and this means that the city accounts for 100 thousand each. Even if this is not the case, we can select the first city in the list and check whether it has at least 100 inhabitants.

Why the SQL server, having statistics, does not do so:

select * from People p
left join Cities c on c.Id=p.CityId
where p.CityId
in (select top 1 id from Cities order by Name)
order by c.[Name]

This query returns approximately 100k records in less than a second! We made sure that there are 100 required records and gave them very, very quickly.

However, MSSQL does everything according to plan. And he has a plan, "pure thermonuclear" (c).

Question to experts:
How do I need to correct the SQL query or do some actions on the server in order to get the result 10 times faster on the first query?

PS

CREATE TABLE [dbo].[People] (
[Id] uniqueidentifier NOT NULL,
[Name] nvarchar(50) NOT NULL,
[CityId] uniqueidentifier
)
ON [PRIMARY]
GO

CREATE TABLE [dbo].[Cities] (
[Id] uniqueidentifier NOT NULL,
[Name] nvarchar(50) NOT NULL,
)
ON [PRIMARY]
GO

Pps
Where the legs grow from:
The task is very real. There is a table with the main entity, many dimensions depart from it according to the “star” principle. The user needs to display it in the grid, providing sorting by fields.
Starting from a certain size of the main table, sorting boils down to the fact that a window is selected with the same (extreme) values (like "Almaty") but at the same time, the system begins to slow down terribly.
I would like to have ONE parameterized query that will work effectively with both the small size of the People table and the large one.

PPPS
Interestingly, if City would have been NotNull and used by InnerJoin, then the query is executed instantly.
Interestingly, EVEN IF the City field would have been NotNull but LeftJoin was used - that query slows down.

In the comments idea: First select all InnerJoin and then Union by Null values. Tomorrow I will check this and other crazy ideas)

PPPPS tried. It worked!

WITH Help AS
(
select top 100 p.Name, c.Name as City from People p
INNER join Cities c on c.Id = p.CityId
order by c.Name ASC
UNION
select top 100 p.Name, NULL as City from People p
WHERE p.CityId IS NULL
)
SELECT TOP 100 * FROM help

Gives 150 milliseconds under the same conditions! Thank you holem .

Source: https://habr.com/ru/post/98622/

All Articles

Oh, these query plans

More articles: