How to replace old indexes and not to break the system?

It is possible that many people came across a situation that historically developed over the years, before appearing on a project, when all possible indexes with all include'ami were created on the table. I saw the index on the inherited database, which contained all the fields in the table. At the same time, it is not always possible to quickly change the indices, as often we need a guarantee that the changes will not affect the performance of the system.

With the growth of the volume of the table, it becomes painfully painful for the aimlessly occupied place, but just because the index can not be killed, and usage statistics show that the index is used.
The described example of consolidation of indices on a high-load database operating 24/7. The application uses only stored procedures. Version of MS SQL Server 2012 SP3.

Source table with clustered index:
')

CREATE TABLE [dbo].[ClientFile]( [StorageId] [int] NOT NULL, [FolderId] [int] NOT NULL, [ClientFileInternalId] [int] IDENTITY(1,1) NOT NULL, [FileName] [nvarchar](900) NOT NULL, [FileExtension] [nvarchar](10) NOT NULL, [FileClientVersionId] [smallint] NOT NULL, [ClientFileVersionId] [int] NULL, [FileInternalId] [bigint] NOT NULL, [FileLength] [bigint] NOT NULL, [OrderId] [tinyint] NOT NULL, [FileFileExtensionId] [int] NULL, [FileStatus] [tinyint] NOT NULL, [DirectoryVersionId] [int] NOT NULL, [DateDeleted] [datetime] NULL, CONSTRAINT [PK_ClientFile] PRIMARY KEY CLUSTERED ( [StorageId] ASC, [ClientFileInternalId] ASC ) )

Noncluster indices:

 CREATE NONCLUSTERED INDEX [IX_ClientFile_StorageId_FileStatus] ON [dbo].[ClientFile] ( [StorageId] ASC, [FileStatus] ASC, [OrderId] DESC ) INCLUDE ( [ClientFileInternalId], [FolderId], [DirectoryVersionId], [FileInternalId], [FileClientVersionId], [FileLength]); CREATE NONCLUSTERED INDEX [IX_ClientFile_StorageId_FolderId_FileStatus_FileName] ON [dbo].[ClientFile] ( [StorageId] ASC, [FolderId] ASC, [FileStatus] ASC, [FileName] ASC ) INCLUDE ( [DateDeleted], [DirectoryVersionId], [FileExtension], [FileInternalId], [FileClientVersionId], [FileLength]);

The 1m index of 14 fields in the table contains 9, and in the second - 10.

These 2 indexes in total occupy up to 180 GB on each server, servers 12. This is frustrating and troubling, since indexes with matching fields, as well as 6 included fields in each. In addition, sometimes the server selects these indexes in cases where it would be more efficient to use a clustered index, which required recompilation of the procedure. After recompilation, the cluster index was already used and the load on the CPU decreased.

Step 1. Analysis of statistics on the use of indices

Servers collect information about the use of indexes on the ClientFile table.

Statistics use indexes on the table

 declare @dbid int select @dbid = db_id() select (cast((user_seeks + user_scans + user_lookups) as float) / case user_updates when 0 then 1.0 else cast(user_updates as float) end) * 100 as [%] , (user_seeks + user_scans + user_lookups) AS total_usage , objectname=object_name(s.object_id), s.object_id , indexname=i.name, i.index_id , user_seeks, user_scans, user_lookups, user_updates , last_user_seek, last_user_scan, last_user_update , last_system_seek, last_system_scan, last_system_update , 'DROP INDEX ' + i.name + ' ON ' + object_name(s.object_id) as [Command] from sys.dm_db_index_usage_stats s, sys.indexes i where database_id = @dbid and objectproperty(s.object_id,'IsUserTable') = 1 and i.object_id = s.object_id and i.index_id = s.index_id and i.name not like 'PK_%' and object_name(s.object_id) = 'ClientFile' order by [%] asc

In the described case, both indexes are used. If an index is not used, it makes the task easier and often it can be simply deleted. These statistics are cleared when restarting MS SQL Server, when deciding whether to remove an index based on it, you need to make sure that the database does not have any very important report that is counted once a month and uses this index.

Step 2. Select all the plans from the cache, which have the use of the described 2 indices.

To do this, use the spGetPlanUsingIndex procedure (the main query is taken from the Jonathan Kehayias articles www.sqlskills.com/blogs/jonathan ), which adds usage statistics to a table. Jobe is set up with the launch of the procedure for collecting statistics every 1 hour.

The important point is that not all plans will be cached, for example, procedures with the RECOMPILE hint. If such a hint is used, the procedures should be checked and formulated for them and included in the analysis.

Fetch plans from cache with specified index

 CREATE TABLE [dbo].[LogDataFileIndexUsage]( [PlanId] [int] IDENTITY(1,1) NOT NULL, [PlanDate] [datetime] NOT NULL CONSTRAINT [DF_LogDataFileIndexUsage_PlanDate] DEFAULT (getutcdate()), [DBname] [varchar](50) NULL, [SPname] [varchar](256) NULL, [ScanCount] [int] NULL, [SeekCount] [int] NULL, [UpdateCount] [int] NULL, [RefCount] [int] NULL, [UseCount] [int] NULL, [QueryPlan] [xml] NULL ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY] SET ANSI_PADDING OFF ALTER TABLE [dbo].[LogDataFileIndexUsage] ADD [IndexName] [varchar](256) NULL PRIMARY KEY CLUSTERED ( [PlanId] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO CREATE PROCEDURE [dbo].[spGetPlanUsingIndex] @indexName NVARCHAR(128) AS BEGIN SET NOCOUNT ON; SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED -- Make sure the name passed is appropriately quoted IF (LEFT(@IndexName, 1) <> '[' AND RIGHT(@IndexName, 1) <> ']') SET @IndexName = QUOTENAME(@IndexName); WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan') INSERT INTO [dbo].[LogDataFileIndexUsage] ( DBName, SPname, ScanCount, SeekCount, UpdateCount, RefCount, UseCount, QueryPlan, IndexName ) SELECT DB_NAME(E.dbid) AS [DBName], object_name(E.objectid, dbid) AS [ObjectName], E.query_plan.value('count(//RelOp[@LogicalOp = ''Index Scan'' or @LogicalOp = ''Clustered Index Scan'']/*/Object[@Index=sql:variable("@IndexName")])','int') AS [ScanCount], E.query_plan.value('count(//RelOp[@LogicalOp = ''Index Seek'' or @LogicalOp = ''Clustered Index Seek'']/*/Object[@Index=sql:variable("@IndexName")])','int') AS [SeekCount], E.query_plan.value('count(//Update/Object[@Index=sql:variable("@IndexName")])','int') AS [UpdateCount], P.refcounts AS [RefCounts], P.usecounts AS [UseCounts], E.query_plan AS [QueryPlan], @IndexName FROM sys.dm_exec_cached_plans P CROSS APPLY sys.dm_exec_query_plan(P.plan_handle) E WHERE E.query_plan.exist('//*[@Index=sql:variable("@IndexName")]') = 1 OPTION(MAXDOP 1, RECOMPILE); END GO

Step 3. Analysis of the collected data

In this case, the analysis showed that both indexes are used by the same procedures, that is, the procedure uses, then one index then the other, for the same query.
There are a total of 24 procedures that use these indexes. For each procedure, it is analyzed by what fields the data is filtered, the JOIN is made, and which fields are specified in the SELECT.

This was all done manually in the excel spreadsheet. Now I understand that it was possible to avoid such amount of manual labor by writing a request with a sample of this from the xml plan. Fields to select: seek predicate, predicate and which fields are selected from the table.

Step 4. Formation of a new index

Based on the data obtained in the analysis of the use of indices in the plans, a new index was formed.

 CREATE NONCLUSTERED INDEX [IX_ClientFile_StorageId_FolderId_FileStatus] ON [dbo].[ClientFile] ([StorageId], [FolderId], [FileStatus]);

Step 5. Testing

On the project, when making changes, load testing is necessarily done. The load test with the new index showed that it is necessary to add the Name field to the index in order to remove the Key lookup. The field is added because it is used in the WHERE clause.
Also after the test, the information in the missing index is checked.

Missing index

 SELECT mig.index_group_handle, mid.index_handle, CONVERT (decimal (28,1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) ) AS improvement_measure, 'CREATE INDEX missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle) + ' ON ' + mid.statement + ' (' + ISNULL (mid.equality_columns,'') + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END + ISNULL (mid.inequality_columns, '') + ')' + ISNULL (' INCLUDE (' + mid.included_columns + ')', '') + ' with (online=on)' AS create_index_statement, migs.*, mid.*, mid.database_id, mid.[object_id] FROM sys.dm_db_missing_index_groups mig INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle WHERE CONVERT (decimal (28,1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) > 10 --AND database_id = 12 AND mid.statement like ('%[ClientFile]') ORDER BY convert(varchar(10), last_user_seek, 120) desc, migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) /*last_user_seek*/ DESC

Total index

 CREATE NONCLUSTERED INDEX [IX_ClientFile_StorageId_FolderId_FileStatus_FileName] ON [dbo].[ClientFile] ([StorageId], [FolderId], [FileStatus], [FileName]);

Step 6. Apply the changes

The table has 700 million rows, so the new index was applied by creating a new table and migrating data to it.

Saving disk space 45% on each server. The server often uses a clustered index, which reduces the number of key lookup in the plans.

Source: https://habr.com/ru/post/344284/

All Articles