PIVOT

In modern information systems, the decision-making process is often based on consolidated information. In practice, when developing business logic operating with such information, it is very often necessary to convert rows into columns.

The T-SQL syntax has a separate PIVOT construct for performing this conversion. It is worth noting that in SQL Server 2000 there was no support for the PIVOT construct, so similar problems were solved through multiple CASE WHEN.

Actually, why did I mention CASE WHEN , if there is a PIVOT ? Indeed, by definition, the PIVOT is a more elegant design and, accordingly, should be more efficient.
')
Let's test it in practice ...

We will create a table that will contain information about the employee’s exit to the workplace.

IF OBJECT_ID('dbo.WorkOut', 'U') IS NOT NULL DROP TABLE dbo.WorkOut GO CREATE TABLE dbo.WorkOut ( DateOut DATETIME NOT NULL, EmployeeID INT NOT NULL, CONSTRAINT PK_WorkOut PRIMARY KEY CLUSTERED (DateOut, EmployeeID) ) GO

And fill it with test data:

 INSERT INTO dbo.WorkOut (EmployeeID, DateOut) SELECT TOP 1500000 ao.[object_id], ao1.modify_date FROM sys.all_objects ao CROSS JOIN sys.all_objects ao1

Next, we will write a PIVOT request that will return the number of exits for each employee in the context of days:

 SELECT * FROM ( SELECT EmployeeID , [WeekDay] = DATENAME(WEEKDAY, DateOut) FROM dbo.WorkOut ) t PIVOT ( COUNT([WeekDay]) FOR [WeekDay] IN ( Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday ) ) p

When executing the request, we get the following plan and execution time:

SQL Server Execution Times:
CPU time = 5662 ms, elapsed time = 8075 ms.

On the plan, you can see the operators Sort and Hash Match . Their efficient operation depends very much on the size of the incoming data and the available amount of physical memory in order to process this very data.

If it is impossible to allocate the required amount of memory, processing of the results will occur in the tempdb database (exclamation mark) - this can lead to a noticeable load on the disk subsystem and an increase in the query execution time:

SQL Server Execution Times:
CPU time = 6193 ms, elapsed time = 9571 ms.

Let's see how the construction of CASE WHEN conditions with similar functionality behaves:

 SELECT EmployeeID , Monday = COUNT(CASE WHEN [WeekDay] = 'Monday' THEN 1 END) , Tuesday = COUNT(CASE WHEN [WeekDay] = 'Tuesday' THEN 1 END) , Wednesday = COUNT(CASE WHEN [WeekDay] = 'Wednesday' THEN 1 END) , Thursday = COUNT(CASE WHEN [WeekDay] = 'Thursday' THEN 1 END) , Friday = COUNT(CASE WHEN [WeekDay] = 'Friday' THEN 1 END) , Saturday = COUNT(CASE WHEN [WeekDay] = 'Saturday' THEN 1 END) , Sunday = COUNT(CASE WHEN [WeekDay] = 'Sunday' THEN 1 END) FROM ( SELECT EmployeeID , [WeekDay] = DATENAME(WEEKDAY, DateOut) FROM dbo.WorkOut ) t GROUP BY EmployeeID

When performing we will get a simpler plan. At the same time, the execution time will not differ too much from PIVOT (of course, within the framework of the error):

SQL Server Execution Times:
CPU time = 5201 ms, elapsed time = 8400 ms.

In conditions of low memory, we get the following results:

SQL Server Execution Times:
CPU time = 6006 ms, elapsed time = 13883 ms.

From the obtained data, you can make a small observation - when aggregating data in a single column, there is a clear advantage over the PIVOT design. Even in a situation where there is a shortage of memory to handle the results.

Now let's see how these examples behave when increasing the number of columns by which the aggregation goes.

1. Grouping by section: employee + year:

 SELECT EmployeeID , [Year] , Monday = COUNT(CASE WHEN [WeekDay] = 'Monday' THEN 1 END) , ... , Sunday = COUNT(CASE WHEN [WeekDay] = 'Sunday' THEN 1 END) FROM ( SELECT EmployeeID , [Year] = YEAR(DateOut) , [WeekDay] = DATENAME(WEEKDAY, DateOut) FROM dbo.WorkOut ) t GROUP BY EmployeeID, [Year]

SQL Server Execution Times:
CPU time = 5569 ms, elapsed time = 9200 ms.

 SELECT * FROM ( SELECT EmployeeID , [Year] = YEAR(DateOut) , [WeekDay] = DATENAME(WEEKDAY, DateOut) FROM dbo.WorkOut ) t PIVOT ( COUNT([WeekDay]) FOR [WeekDay] IN ( Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday ) ) p

SQL Server Execution Times:
CPU time = 5454 ms, elapsed time = 8878 ms.

If you compare the plans, you can see that the Hash Match operation is more costly when using PIVOT , but the execution time suggests otherwise.

2. Grouping by section: employee + year + month

 SELECT EmployeeID , [Year] , [Month] , Monday = COUNT(CASE WHEN [WeekDay] = 'Monday' THEN 1 END) , ... , Sunday = COUNT(CASE WHEN [WeekDay] = 'Sunday' THEN 1 END) FROM ( SELECT EmployeeID , [Year] = YEAR(DateOut) , [Month] = MONTH(DateOut) , [WeekDay] = DATENAME(WEEKDAY, DateOut) FROM dbo.WorkOut ) t GROUP BY EmployeeID, [Year], [Month]

SQL Server Execution Times:
CPU time = 6365 ms, elapsed time = 9979 ms.

 SELECT * FROM ( SELECT EmployeeID , [Year] = YEAR(DateOut) , [Month] = MONTH(DateOut) , [WeekDay] = DATENAME(WEEKDAY, DateOut) FROM dbo.WorkOut ) t PIVOT ( COUNT([WeekDay]) FOR [WeekDay] IN ( Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday ) ) p

SQL Server Execution Times:
CPU time = 6193 ms, elapsed time = 9861 ms.

As a matter of fact, the situation repeats - SQL Server estimates the PIVOT construction as more costly.

But runtime again puts everything in its place.

Small conclusions can be drawn from this: in the overwhelming majority of situations, using the PIVOT construct, you can quickly convert columns to rows.

A small note is the following: with an increase in the number of columns by which aggregation is taking place, the difference in execution time between PIVOT and CASE WHEN will decrease and at some point will be within the framework of measurement error.

All experiments were conducted on SQL Server 2012 SP1 (11.00.3128) .

Source: https://habr.com/ru/post/206630/

All Articles

PIVOT

More articles: