📜 ⬆️ ⬇️

String aggregation in the SQL Server world

In practice, the task of combining strings into one come across quite often. It is very sad, but the T-SQL standard does not provide for the use of string data inside the SUM aggregate function:

Msg 8117, Level 16, State 1, Line 1
Operand data type char is invalid for sum operator.

Although to solve this kind of problem, the GROUP_CONCAT function was added to MySQL , and LISTAGG was added to Oracle . In turn, SQL Server does not yet have such built-in functionality.
')
However, this should not be considered as a disadvantage, since T-SQL capabilities allow string concatenation to be performed more flexibly and efficiently by using other constructs, which will be discussed later.

Suppose that we need to separate several lines through a comma using one of the following table data:

IF OBJECT_ID('dbo.Chars', 'U') IS NOT NULL DROP TABLE dbo.Chars GO CREATE TABLE dbo.Chars ([Char] CHAR(1) PRIMARY KEY) INSERT INTO dbo.Chars ([Char]) VALUES ('A'), ('B'), ('C'), ('F'), ('D') 

The most obvious solution to this problem is to use the cursor:

 DECLARE @Chars VARCHAR(100) , @Char CHAR(1) DECLARE cur CURSOR LOCAL READ_ONLY FAST_FORWARD FOR SELECT [Char] FROM dbo.Chars OPEN cur FETCH NEXT FROM cur INTO @Char WHILE @@FETCH_STATUS = 0 BEGIN SET @Chars = ISNULL(@Chars + ', ' + @Char, @Char) FETCH NEXT FROM cur INTO @Char END CLOSE cur DEALLOCATE cur SELECT @Chars 

However, its use reduces the efficiency of query execution and, at a minimum, does not look too elegant.

To get rid of it, you can concatenate strings by assigning variables:

 DECLARE @Chars VARCHAR(100) SELECT @Chars = ISNULL(@Chars + ', ' + [Char], [Char]) FROM dbo.Chars SELECT @Chars 

On the one hand, the design turned out very simple, on the other hand, its performance on a large sample leaves much to be desired.

For string aggregation, it is also possible to do it via XML using the following construction:

 SELECT Chars = STUFF(( SELECT ', ' + [Char] FROM dbo.Chars FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 2, '') 

If you look at the plan, you will notice the high cost of calling the value method:



To get rid of this operation, you can rewrite the query using the XQuery properties:

 SELECT Chars = STUFF(CAST(( SELECT [text()] = ', ' + [Char] FROM dbo.Chars FOR XML PATH(''), TYPE) AS VARCHAR(100)), 1, 2, '') 

As a result, we get a very simple and fast execution plan:



In principle, the concatenation of rows in one column does not cause much difficulty.

More interesting is the situation when you need to perform concatenation in several columns at once. For example, we have the following table:

 IF OBJECT_ID('dbo.EntityValues', 'U') IS NOT NULL DROP TABLE dbo.EntityValues GO CREATE TABLE dbo.EntityValues ( EntityID INT , Value1 CHAR(1) , Value2 CHAR(1) ) CREATE NONCLUSTERED INDEX IX_WorkOut_EntityID ON dbo.EntityValues (EntityID) GO INSERT INTO dbo.EntityValues (EntityID, Value1, Value2) VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (2, 'C', 'Z'), (2, 'F', 'H'), (1, 'D', 'R') 

in which it is necessary to group the data as follows:



Alternatively, you can copy calls to XML , but then we get duplicate reads that can significantly affect the efficiency of the query:

 SELECT ev.EntityID , Values1 = STUFF(CAST(( SELECT [text()] = ', ' + ev2.Value1 FROM dbo.EntityValues ev2 WHERE ev2.EntityID = ev.EntityID FOR XML PATH(''), TYPE) AS VARCHAR(100)), 1, 2, '') , Values2 = STUFF(CAST(( SELECT [text()] = ', ' + ev2.Value2 FROM dbo.EntityValues ev2 WHERE ev2.EntityID = ev.EntityID FOR XML PATH(''), TYPE) AS VARCHAR(100)), 1, 2, '') FROM ( SELECT DISTINCT EntityID FROM dbo.EntityValues ) ev 

This can be easily seen if you look at the implementation plan:



To reduce repeated readings, you can use a small XML hack:

 SELECT ev.EntityID , Values1 = STUFF(REPLACE( CAST([XML].query('for $a in /a return xs:string($a)') AS VARCHAR(100)), ' ,', ','), 1, 1, '') , Values2 = STUFF(REPLACE( CAST([XML].query('for $b in /b return xs:string($b)') AS VARCHAR(100)), ' ,', ','), 1, 1, '') FROM ( SELECT DISTINCT EntityID FROM dbo.EntityValues ) ev CROSS APPLY ( SELECT [XML] = CAST(( SELECT [a] = ', ' + ev2.Value1 , [b] = ', ' + ev2.Value2 FROM dbo.EntityValues ev2 WHERE ev2.EntityID = ev.EntityID FOR XML PATH('') ) AS XML) ) t 

But this query will also not be optimal due to repeated calling the query method.

You can use the cursor:

 IF OBJECT_ID('tempdb.dbo.#EntityValues') IS NOT NULL DROP TABLE #EntityValues GO SELECT DISTINCT EntityID , Values1 = CAST(NULL AS VARCHAR(100)) , Values2 = CAST(NULL AS VARCHAR(100)) INTO #EntityValues FROM dbo.EntityValues DECLARE @EntityID INT , @Value1 CHAR(1) , @Value2 CHAR(1) DECLARE cur CURSOR LOCAL READ_ONLY FAST_FORWARD FOR SELECT EntityID , Value1 , Value2 FROM dbo.EntityValues OPEN cur FETCH NEXT FROM cur INTO @EntityID , @Value1 , @Value2 WHILE @@FETCH_STATUS = 0 BEGIN UPDATE #EntityValues SET Values1 = ISNULL(Values1 + ', ' + @Value1, @Value1) , Values2 = ISNULL(Values2 + ', ' + @Value2, @Value2) WHERE EntityID = @EntityID FETCH NEXT FROM cur INTO @EntityID , @Value1 , @Value2 END CLOSE cur DEALLOCATE cur SELECT * FROM #EntityValues 

However, as practice has shown, when working with large ETL packages, the most productive solution is the possibility of assigning variables in the UPDATE construction:

 IF OBJECT_ID('tempdb.dbo.#EntityValues') IS NOT NULL DROP TABLE #EntityValues GO DECLARE @Values1 VARCHAR(100) , @Values2 VARCHAR(100) SELECT EntityID , Value1 , Value2 , RowNum = ROW_NUMBER() OVER (PARTITION BY EntityID ORDER BY 1/0) , Values1 = CAST(NULL AS VARCHAR(100)) , Values2 = CAST(NULL AS VARCHAR(100)) INTO #EntityValues FROM dbo.EntityValues UPDATE #EntityValues SET @Values1 = Values1 = CASE WHEN RowNum = 1 THEN Value1 ELSE @Values1 + ', ' + Value1 END , @Values2 = Values2 = CASE WHEN RowNum = 1 THEN Value2 ELSE @Values2 + ', ' + Value2 END SELECT EntityID , Values1 = MAX(Values1) , Values2 = MAX(Values2) FROM #EntityValues GROUP BY EntityID 

SQL Server does not have a built-in counterpart to the GROUP_CONCAT and LISTAGG functions . However, this does not prevent, depending on the situation, effectively performing string concatenation tasks. The purpose of this post is to show it clearly.

Source: https://habr.com/ru/post/200120/


All Articles