
Three aspects of optimization (DB and software)

Foreword


Quite often users, developers, and administrators of the MS SQL Server DBMS encounter performance problems with a database or with the DBMS as a whole.

This article gives general recommendations on optimizing both the database and the DBMS as a whole, as well as basic recommendations on the interaction between a .NET application and MS SQL Server. Examples are provided for most of the recommendations below.

This article does not cover the optimization of hardware resources, the OS itself, or the use of the various built-in features of the DBMS and the OS, as that would take a whole book.

Solution


In total, there are three blocks of recommendations:

1) direct optimization of the database and the DBMS as a whole
2) optimization of the interaction between the application and MS SQL Server (below we discuss the interaction between a .NET application and MS SQL Server)
3) optimization of the queries themselves

We first analyze the first block.

Within the first block, there are three main problems to address in the database and the DBMS as a whole:

1) obsolescence of the procedural cache
2) non-optimal indexes
3) non-optimal statistics

This block should be handled as routine maintenance, from once a day to once a week, depending on the needs of the information system. Keep in mind that while this block runs, the databases and the DBMS as a whole will consume significant resources, so the work should be carried out either during hours of minimum load, or on a standby server, or by spreading it throughout the day (in the latter case, item 1 is not applicable).

It is also important to note that this block should be performed after all mass data processing in the framework of other routine maintenance.

Usually item 2 is performed first (indexes are optimized), then item 1 (the procedural cache is cleared), and then item 3 (statistics are updated).

Let us analyze each item of the first block separately.

Item 1, the obsolescence of the procedural cache, is solved by simply clearing this cache with the following command (note that DBCC FLUSHPROCINDB takes the database ID rather than the database name):

DECLARE @db_id int = DB_ID(N'DBName');
DBCC FLUSHPROCINDB(@db_id);

However, it is important to note that such a solution does not always fit all databases and all DBMSs. This solution is suitable for databases with the following characteristics:

1) the database data size is up to 500 GB
2) a significant share of the data in the database changes every day, that is, new data not only appears, but substantial amounts are also updated and deleted

Examples of such databases are TFS, CRM, NAV and 1C.

If the database is larger than 500 GB, or data is mostly only added while updates and deletes are extremely rare (that is, the amount of changed data is so many times smaller than the total amount of unchanged data that it can be neglected), the solution must first be tested on a test environment that is as close to production as possible. In any case, after clearing the procedural cache for such databases and for the DBMS as a whole, the subsequent update of statistics will be a very long and resource-intensive operation. Therefore, for such databases and DBMS instances, item 1 on clearing the procedural cache should be dropped, at least until test results have been obtained on the test environment.

For databases to which item 1 can be applied, and for DBMS instances containing only such databases, you can implement the following stored procedure [srv].[AutoUpdateStatisticsCache] in the administration database for later use:

An example implementation of the stored procedure [srv].[AutoUpdateStatisticsCache]
USE [DBAdmin] -- the administration database (name is a placeholder)
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [srv].[AutoUpdateStatisticsCache]
    @DB_Name nvarchar(255) = null,
    @IsUpdateStatistics bit = 0
AS
BEGIN
    /* Clears the procedural cache and, optionally, updates statistics
       for the given database (or for all eligible databases) */
    SET NOCOUNT ON;

    declare @tbl table (Name nvarchar(255), [DB_ID] int);
    declare @db_id int;
    declare @name nvarchar(255);
    declare @str nvarchar(255);

    -- collect the databases to be processed
    insert into @tbl(Name, [DB_ID])
    select name, database_id
    from sys.databases
    where name not in ('master', 'tempdb', 'model', 'msdb', 'distribution')
      and is_read_only = 0
      and state = 0       -- ONLINE
      and user_access = 0 -- MULTI_USER
      and is_auto_close_on = 0
      and (name = @DB_Name or @DB_Name is null);

    while(exists(select top(1) 1 from @tbl))
    begin
        -- take the next database
        select top(1) @db_id = [DB_ID], @name = Name from @tbl;

        -- clear the procedural cache by database id
        DBCC FLUSHPROCINDB(@db_id);

        if(@IsUpdateStatistics = 1)
        begin
            -- update statistics of the whole database
            set @str = 'USE' + ' [' + @name + ']; exec sp_updatestats;'
            exec(@str);
        end

        delete from @tbl where [DB_ID] = @db_id;
    end
END
GO

Here, if the @IsUpdateStatistics parameter is set to 1, statistics are also updated for the database specified in the @DB_Name parameter. With @IsUpdateStatistics = 1, item 1 on clearing the procedural cache should be done after item 2 has been handled; in that case item 3 on non-optimal statistics is closed as well.
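
For example, a call of this stored procedure might look like this (the database name below is a placeholder):

-- clear the procedural cache and also refresh statistics for one database
EXEC [srv].[AutoUpdateStatisticsCache] @DB_Name = N'MyDatabase', @IsUpdateStatistics = 1;

-- process all eligible databases, cache clearing only
EXEC [srv].[AutoUpdateStatisticsCache];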

The current size of the entire plan cache and of the query plan cache can be viewed by implementing, for example, the following view in the administration database:

An example implementation of the view [inf].[vSizeCache]
 USE [  ] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE view [inf].[vSizeCache] as --         (https://club.directum.ru/post/1125) with tbl as ( select TotalCacheSize = SUM(CAST(size_in_bytes as bigint)) / 1048576, QueriesCacheSize = SUM(CAST((case when objtype in ('Adhoc', 'Prepared') then size_in_bytes else 0 end) as bigint)) / 1048576, QueriesUseMultiCountCacheSize = SUM(CAST((case when ((objtype in ('Adhoc', 'Prepared')) and (usecounts>1)) then size_in_bytes else 0 end) as bigint)) / 1048576, QueriesUseOneCountCacheSize = SUM(CAST((case when ((objtype in ('Adhoc', 'Prepared')) and (usecounts=1)) then size_in_bytes else 0 end) as bigint)) / 1048576 from sys.dm_exec_cached_plans ) select 'Queries' as 'Cache', (select top(1) QueriesCacheSize from tbl) as 'Cache Size (MB)', CAST((select top(1) QueriesCacheSize from tbl) * 100 / (select top(1) TotalCacheSize from tbl) as int) as 'Percent of Total/Queries' union all select 'Total' as 'Cache', (select top(1) TotalCacheSize from tbl) as 'Cache Size (MB)', 100 as 'Percent of Total/Queries' union all select 'Queries UseMultiCount' as 'Cache', (select top(1) QueriesUseMultiCountCacheSize from tbl) as 'Cache Size (MB)', CAST((select top(1) QueriesUseMultiCountCacheSize from tbl) * 100 / (select top(1) QueriesCacheSize from tbl) as int) as 'Percent of Queries/Queries' union all select 'Queries UseOneCount' as 'Cache', (select top(1) QueriesUseOneCountCacheSize from tbl) as 'Cache Size (MB)', CAST((select top(1) QueriesUseOneCountCacheSize from tbl) * 100 / (select top(1) QueriesCacheSize from tbl) as int) as 'Percent of Queries/Queries' --option(recompile) GO 

Now let us consider item 2 on non-optimal indexes.

By non-optimal indexes we mean the following four cases:

1) highly fragmented indexes
2) unused indexes
3) missing indexes
4) indexes whose maintenance costs more than the performance gain they provide

By a highly fragmented index we mean the following fragmentation levels (a query sketch applying these thresholds is given after the list):

1) more than 30% for indexes whose size is at least 20 pages
2) more than 20% for indexes whose size is at least 100 pages
3) more than 10% for indexes whose size is at least 500 pages
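
These thresholds can be checked directly, for example with the following sketch over sys.dm_db_index_physical_stats, run in the database of interest (the LIMITED mode keeps the scan relatively cheap):

-- highly fragmented indexes of the current database according to the thresholds above
SELECT OBJECT_SCHEMA_NAME(ps.[object_id]) AS [SchemaName],
       OBJECT_NAME(ps.[object_id]) AS [ObjectName],
       i.[name] AS [IndexName],
       ps.page_count,
       ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, N'LIMITED') AS ps
INNER JOIN sys.indexes AS i
        ON i.[object_id] = ps.[object_id] AND i.index_id = ps.index_id
WHERE ps.index_type_desc <> 'HEAP'
  AND (   (ps.page_count >= 500 AND ps.avg_fragmentation_in_percent > 10)
       OR (ps.page_count >= 100 AND ps.avg_fragmentation_in_percent > 20)
       OR (ps.page_count >= 20  AND ps.avg_fragmentation_in_percent > 30));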

The indexes from items 2 and 4 can be identified, for example, using the following view [inf].[vDelIndexOptimize] for a specific database:

An example implementation of the view [inf].[vDelIndexOptimize]
 USE [_] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE view [inf].[vDelIndexOptimize] as /*   ,          ,   .  master, model, msdb  tempdb   */ select DB_NAME(t.database_id) as [DBName] , SCHEMA_NAME(obj.schema_id) as [SchemaName] , OBJECT_NAME(t.object_id) as [ObjectName] , obj.Type as [ObjectType] , obj.Type_Desc as [ObjectTypeDesc] , ind.name as [IndexName] , ind.Type as IndexType , ind.Type_Desc as IndexTypeDesc , ind.Is_Unique as IndexIsUnique , ind.is_primary_key as IndexIsPK , ind.is_unique_constraint as IndexIsUniqueConstraint , (t.[USER_SEEKS]+t.[USER_SCANS]+t.[USER_LOOKUPS]+t.[SYSTEM_SEEKS]+t.[SYSTEM_SCANS]+t.[SYSTEM_LOOKUPS])-(t.[USER_UPDATES]+t.[System_Updates]) as [index_advantage] , t.[Database_ID] , t.[Object_ID] , t.[Index_ID] , t.USER_SEEKS , t.USER_SCANS , t.USER_LOOKUPS , t.USER_UPDATES , t.SYSTEM_SEEKS , t.SYSTEM_SCANS , t.SYSTEM_LOOKUPS , t.SYSTEM_UPDATES , t.Last_User_Seek , t.Last_User_Scan , t.Last_User_Lookup , t.Last_System_Seek , t.Last_System_Scan , t.Last_System_Lookup , ind.Filter_Definition, STUFF( ( SELECT N', [' + [name] +N'] '+case ic.[is_descending_key] when 0 then N'ASC' when 1 then N'DESC' end FROM sys.index_columns ic INNER JOIN sys.columns c on c.[object_id] = obj.[object_id] and ic.[column_id] = c.[column_id] WHERE ic.[object_id] = obj.[object_id] and ic.[index_id]=ind.[index_id] and ic.[is_included_column]=0 order by ic.[key_ordinal] asc FOR XML PATH(''),TYPE ).value('.','NVARCHAR(MAX)'),1,2,'' ) as [Columns], STUFF( ( SELECT N', [' + [name] +N']' FROM sys.index_columns ic INNER JOIN sys.columns c on c.[object_id] = obj.[object_id] and ic.[column_id] = c.[column_id] WHERE ic.[object_id] = obj.[object_id] and ic.[index_id]=ind.[index_id] and ic.[is_included_column]=1 order by ic.[key_ordinal] asc FOR XML PATH(''),TYPE ).value('.','NVARCHAR(MAX)'),1,2,'' ) as [IncludeColumns] from sys.dm_db_index_usage_stats as t inner join sys.objects as obj on t.[object_id]=obj.[object_id] inner join sys.indexes as ind on t.[object_id]=ind.[object_id] and t.index_id=ind.index_id where ((last_user_seek is null or last_user_seek <dateadd(year,-1,getdate())) and (last_user_scan is null or last_user_scan <dateadd(year,-1,getdate())) and (last_user_lookup is null or last_user_lookup <dateadd(year,-1,getdate())) and (last_system_seek is null or last_system_seek <dateadd(year,-1,getdate())) and (last_system_scan is null or last_system_scan <dateadd(year,-1,getdate())) and (last_system_lookup is null or last_system_lookup <dateadd(year,-1,getdate())) or (((t.[USER_UPDATES]+t.[System_Updates])>0) and (t.[SYSTEM_SEEKS]<=(t.[USER_UPDATES]+t.[System_Updates]-(t.[USER_SEEKS]+t.[USER_SCANS]+t.[USER_LOOKUPS]+t.[SYSTEM_SCANS]+t.[SYSTEM_LOOKUPS]))))) and t.database_id>4 and t.[object_id]>0 and ind.is_primary_key=0 --     and ind.is_unique_constraint=0 --    and t.database_id=DB_ID() GO 

It is also important to analyze index overlap. To do this, create the view [srv].[vDelIndexInclude] in the database in question:

An example implementation of the view [srv].[vDelIndexInclude]
 USE [_] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE view [srv].[vDelIndexInclude] as /*  ..  () .                 ,     ,        . http://www.sql.ru/blogs/andraptor/1218 */ WITH cte_index_info AS ( SELECT tSS.[name] AS [SchemaName] ,tSO.[name] AS [ObjectName] ,tSO.[type_desc] AS [ObjectType] ,tSO.[create_date] AS [ObjectCreateDate] ,tSI.[name] AS [IndexName] ,tSI.[is_primary_key] AS [IndexIsPrimaryKey] ,d.[index_type_desc] AS [IndexType] ,d.[avg_fragmentation_in_percent] AS [IndexFragmentation] ,d.[fragment_count] AS [IndexFragmentCount] ,d.[avg_fragment_size_in_pages] AS [IndexAvgFragmentSizeInPages] ,d.[page_count] AS [IndexPages] ,c.key_columns AS [IndexKeyColumns] ,COALESCE(ic.included_columns, '') AS [IndexIncludedColumns] ,tSI.is_unique_constraint FROM ( SELECT tSDDIPS.[object_id] AS [object_id] ,tSDDIPS.[index_id] AS [index_id] ,tSDDIPS.[index_type_desc] AS [index_type_desc] ,MAX(tSDDIPS.[avg_fragmentation_in_percent]) AS [avg_fragmentation_in_percent] ,MAX(tSDDIPS.[fragment_count]) AS [fragment_count] ,MAX(tSDDIPS.[avg_fragment_size_in_pages]) AS [avg_fragment_size_in_pages] ,MAX(tSDDIPS.[page_count]) AS [page_count] FROM [sys].[dm_db_index_physical_stats] (DB_ID(), NULL, NULL , NULL, N'LIMITED') tSDDIPS GROUP BY tSDDIPS.[object_id] ,tSDDIPS.[index_id] ,tSDDIPS.[index_type_desc] ) d INNER JOIN [sys].[indexes] tSI ON tSI.[object_id] = d.[object_id] AND tSI.[index_id] = d.[index_id] INNER JOIN [sys].[objects] tSO ON tSO.[object_id] = d.[object_id] INNER JOIN [sys].[schemas] tSS ON tSS.[schema_id] = tSO.[schema_id] CROSS APPLY ( SELECT STUFF(( SELECT ', ' + c.[name] + CASE ic.[is_descending_key] WHEN 1 THEN '(-)' ELSE '' END FROM [sys].[index_columns] ic INNER JOIN [sys].[columns] c ON c.[object_id] = ic.[object_id] and c.[column_id] = ic.[column_id] WHERE ic.[index_id] = tSI.[index_id] AND ic.[object_id] = tSI.[object_id] AND ic.[is_included_column] = 0 ORDER BY ic.[key_ordinal] FOR XML PATH('') ) ,1, 2, '' ) AS [key_columns] ) c CROSS APPLY ( SELECT STUFF(( SELECT ', ' + c.[name] FROM [sys].[index_columns] ic INNER JOIN [sys].[columns] c ON c.[object_id] = ic.[object_id] AND c.[column_id] = ic.[column_id] WHERE ic.[index_id] = tSI.[index_id] AND ic.[object_id] = tSI.[object_id] AND ic.[is_included_column] = 1 FOR XML PATH('') ) ,1, 2, '' ) AS [included_columns] ) ic WHERE tSO.[type_desc] IN ( N'USER_TABLE' ) AND OBJECTPROPERTY(tSO.[object_id], N'IsMSShipped') = 0 AND d.[index_type_desc] NOT IN ( 'HEAP' ) ) SELECT t1.[SchemaName] ,t1.[ObjectName] ,t1.[ObjectType] ,t1.[ObjectCreateDate] ,t1.[IndexName] as [DelIndexName] ,t1.[IndexIsPrimaryKey] ,t1.[IndexType] ,t1.[IndexFragmentation] ,t1.[IndexFragmentCount] ,t1.[IndexAvgFragmentSizeInPages] ,t1.[IndexPages] ,t1.[IndexKeyColumns] ,t1.[IndexIncludedColumns] ,t2.[IndexName] as [ActualIndexName] FROM cte_index_info t1 INNER JOIN cte_index_info t2 ON t2.[SchemaName] = t1.[SchemaName] AND t2.[ObjectName] = t1.[ObjectName] AND t2.[IndexName] <> t1.[IndexName] AND PATINDEX(REPLACE(t1.[IndexKeyColumns], '_', '[_]') + ',%', t2.[IndexKeyColumns] + ',') > 0 WHERE t1.[IndexIncludedColumns] = '' -- don't check indexes with INCLUDE columns AND t1.[IndexIsPrimaryKey] = 0 -- don't check primary keys AND t1.is_unique_constraint=0 -- don't check unique constraint AND t1.[IndexType] NOT IN ( N'CLUSTERED INDEX' ,N'UNIQUE CLUSTERED INDEX' ) -- don't check clustered indexes GO 

It is important to note here that even if an index falls under item 2 or item 4, there is no need to hurry to delete it. You must first make sure that the system really does not need it. To do this, run the necessary tests on a test environment that is as close as possible to production, first with the index in place and then with the index removed (take measurements and compare). A sketch of how to disable an index instead of dropping it outright follows below.
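
One way to hedge such an experiment is to disable the index first rather than drop it: a disabled index keeps its definition and can be brought back with a rebuild. A minimal sketch (the index and table names here are hypothetical):

-- disable a suspect index: its definition is kept, but it is no longer maintained or used
ALTER INDEX [IX_Personal_LastName] ON [dbo].[Personal] DISABLE;

-- if performance degrades, bring the index back
ALTER INDEX [IX_Personal_LastName] ON [dbo].[Personal] REBUILD;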

Missing indexes (item 3) can be identified, for example, using the following view [inf].[vRecomendateIndex]:

An example implementation of the view [inf].[vRecomendateIndex]
 USE [_] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE view [inf].[vRecomendateIndex] as --    DMV SELECT @@ServerName AS ServerName , DB_Name(ddmid.[database_id]) as [DBName] , t.name AS 'Affected_table' , ddmigs.user_seeks * ddmigs.avg_total_user_cost * (ddmigs.avg_user_impact * 0.01) AS index_advantage, ddmigs.group_handle, ddmigs.unique_compiles, ddmigs.last_user_seek, ddmigs.last_user_scan, ddmigs.avg_total_user_cost, ddmigs.avg_user_impact, ddmigs.system_seeks, ddmigs.last_system_scan, ddmigs.last_system_seek, ddmigs.avg_total_system_cost, ddmigs.avg_system_impact, ddmig.index_group_handle, ddmig.index_handle, ddmid.database_id, ddmid.[object_id], ddmid.equality_columns, -- = ddmid.inequality_columns, ddmid.[statement], ( LEN(ISNULL(ddmid.equality_columns, N'') + CASE WHEN ddmid.equality_columns IS NOT NULL AND ddmid.inequality_columns IS NOT NULL THEN ',' ELSE '' END) - LEN(REPLACE(ISNULL(ddmid.equality_columns, N'') + CASE WHEN ddmid.equality_columns IS NOT NULL AND ddmid.inequality_columns IS NOT NULL THEN ',' ELSE '' END, ',', '')) ) + 1 AS K , COALESCE(ddmid.equality_columns, '') + CASE WHEN ddmid.equality_columns IS NOT NULL AND ddmid.inequality_columns IS NOT NULL THEN ',' ELSE '' END + COALESCE(ddmid.inequality_columns, '') AS Keys , ddmid.included_columns AS [include] , 'Create NonClustered Index IX_' + t.name + '_missing_' + CAST(ddmid.index_handle AS VARCHAR(20)) + ' On ' + ddmid.[statement] COLLATE database_default + ' (' + ISNULL(ddmid.equality_columns, '') + CASE WHEN ddmid.equality_columns IS NOT NULL AND ddmid.inequality_columns IS NOT NULL THEN ',' ELSE '' END + ISNULL(ddmid.inequality_columns, '') + ')' + ISNULL(' Include (' + ddmid.included_columns + ');', ';') AS sql_statement , ddmigs.user_seeks , ddmigs.user_scans , CAST(( ddmigs.user_seeks + ddmigs.user_scans ) * ddmigs.avg_user_impact AS BIGINT) AS 'est_impact' , ( SELECT DATEDIFF(Second, create_date, GETDATE()) Seconds FROM sys.databases WHERE name = 'tempdb' ) SecondsUptime FROM sys.dm_db_missing_index_groups ddmig INNER JOIN sys.dm_db_missing_index_group_stats ddmigs ON ddmigs.group_handle = ddmig.index_group_handle INNER JOIN sys.dm_db_missing_index_details ddmid ON ddmig.index_handle = ddmid.index_handle INNER JOIN sys.tables t ON ddmid.OBJECT_ID = t.OBJECT_ID WHERE ddmid.database_id = DB_ID() --ORDER BY est_impact DESC; GO 

This will return a list of missing indexes for a specific database.
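
For example, the most promising suggestions can be read from this view as follows (the columns come from the view definition above):

-- top missing-index suggestions for the current database, most valuable first
SELECT TOP (20)
       [Affected_table],
       [Keys],
       [include],
       [sql_statement],
       index_advantage
FROM [inf].[vRecomendateIndex]
ORDER BY index_advantage DESC;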

If you need a list of missing indexes for all databases of the DBMS instance, you can obtain it by defining the following view [inf].[vNewIndexOptimize]:

An example implementation of the view [inf].[vNewIndexOptimize]
 USE [  ] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE view [inf].[vNewIndexOptimize] as /*     index_advantage: >50 000 -     >10 000 -   ,       <=10000 -     */ SELECT @@ServerName AS ServerName, DB_Name(ddmid.[database_id]) as [DBName], OBJECT_SCHEMA_NAME(ddmid.[object_id], ddmid.[database_id]) as [Schema], OBJECT_NAME(ddmid.[object_id], ddmid.[database_id]) as [Name], ddmigs.user_seeks * ddmigs.avg_total_user_cost * (ddmigs.avg_user_impact * 0.01) AS index_advantage, ddmigs.group_handle, ddmigs.unique_compiles, ddmigs.last_user_seek, ddmigs.last_user_scan, ddmigs.avg_total_user_cost, ddmigs.avg_user_impact, ddmigs.system_seeks, ddmigs.last_system_scan, ddmigs.last_system_seek, ddmigs.avg_total_system_cost, ddmigs.avg_system_impact, ddmig.index_group_handle, ddmig.index_handle, ddmid.database_id, ddmid.[object_id], ddmid.equality_columns, -- = ddmid.inequality_columns, ddmid.[statement], ( LEN(ISNULL(ddmid.equality_columns, N'') + CASE WHEN ddmid.equality_columns IS NOT NULL AND ddmid.inequality_columns IS NOT NULL THEN ',' ELSE '' END) - LEN(REPLACE(ISNULL(ddmid.equality_columns, N'') + CASE WHEN ddmid.equality_columns IS NOT NULL AND ddmid.inequality_columns IS NOT NULL THEN ',' ELSE '' END, ',', '')) ) + 1 AS K , COALESCE(ddmid.equality_columns, '') + CASE WHEN ddmid.equality_columns IS NOT NULL AND ddmid.inequality_columns IS NOT NULL THEN ',' ELSE '' END + COALESCE(ddmid.inequality_columns, '') AS Keys , ddmid.included_columns AS [include] , 'Create NonClustered Index [IX_' + OBJECT_NAME(ddmid.[object_id], ddmid.[database_id]) + '_missing_' + CAST(ddmid.index_handle AS VARCHAR(20)) + '] On ' + ddmid.[statement] COLLATE database_default + ' (' + ISNULL(ddmid.equality_columns, '') + CASE WHEN ddmid.equality_columns IS NOT NULL AND ddmid.inequality_columns IS NOT NULL THEN ',' ELSE '' END + ISNULL(ddmid.inequality_columns, '') + ')' + ISNULL(' Include (' + ddmid.included_columns + ');', ';') AS sql_statement , ddmigs.user_seeks , ddmigs.user_scans , CAST(( ddmigs.user_seeks + ddmigs.user_scans ) * ddmigs.avg_user_impact AS BIGINT) AS 'est_impact' , ( SELECT DATEDIFF(Second, create_date, GETDATE()) Seconds FROM sys.databases WHERE name = 'tempdb' ) SecondsUptime FROM sys.dm_db_missing_index_group_stats ddmigs INNER JOIN sys.dm_db_missing_index_groups AS ddmig ON ddmigs.group_handle = ddmig.index_group_handle INNER JOIN sys.dm_db_missing_index_details AS ddmid ON ddmig.index_handle = ddmid.index_handle --WHERE mid.database_id = DB_ID() --ORDER BY migs_adv.index_advantage GO 

Here (as with items 2 and 4) it is also important to note that even if an index falls under item 3, there is no need to rush to create it. You need to make sure the system really needs it. To do so, run the necessary tests on a test environment that is as close as possible to production, first without the new index and then with it (take measurements and compare). It is not uncommon for a new index from item 3 to later become an index from item 2 or item 4.

So how do we solve the problem of item 1, getting rid of heavy index fragmentation? The Internet is full of ready-made solutions for this; let us give one more example based on the recommendations from MSDN.

To do this, create the [inf].[vIndexDefrag] view in those databases where the level of index fragmentation needs to be monitored:

An example implementation of the view [inf].[vIndexDefrag]
USE [DBName] -- the target database (name is a placeholder)
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE view [inf].[vIndexDefrag]
as
with info as
(
    SELECT ps.[object_id],
           ps.database_id,
           ps.index_id,
           ps.index_type_desc,
           ps.index_level,
           ps.fragment_count,
           ps.avg_fragmentation_in_percent,
           ps.avg_fragment_size_in_pages,
           ps.page_count,
           ps.record_count,
           ps.ghost_record_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, N'LIMITED') as ps
    inner join sys.indexes as i on i.[object_id] = ps.[object_id] and i.[index_id] = ps.[index_id]
    where ps.index_level = 0
      and ps.avg_fragmentation_in_percent >= 10
      and ps.index_type_desc <> 'HEAP'
      and ps.page_count >= 8 -- at least 1 extent
      and i.is_disabled = 0
)
SELECT DB_NAME(i.database_id) as db,
       SCHEMA_NAME(t.[schema_id]) as shema,
       t.name as tb,
       i.index_id as idx,
       i.database_id,
       (select top(1) idx.[name]
        from [sys].[indexes] as idx
        where t.[object_id] = idx.[object_id] and idx.[index_id] = i.[index_id]) as index_name,
       i.index_type_desc,
       i.index_level as [level],
       i.[object_id],
       i.fragment_count as frag_num,
       round(i.avg_fragmentation_in_percent, 2) as frag,
       round(i.avg_fragment_size_in_pages, 2) as frag_page,
       i.page_count as [page],
       i.record_count as rec,
       i.ghost_record_count as ghost,
       round(i.avg_fragmentation_in_percent * i.page_count, 0) as func
FROM info as i
inner join [sys].[all_objects] as t on i.[object_id] = t.[object_id];
GO

This view returns a list of enabled indexes that are not heaps, occupy at least one extent (8 pages), and are fragmented by at least 10%.

We will also create two tables in the administration database: one to store the list of processed indexes and one to store the processing results. The first table is needed so that the same indexes are not processed twice before all indexes have been processed:

Table for saving a list of indexes that have been reorganized in one iteration
USE [DBAdmin] -- the administration database (name is a placeholder)
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [srv].[ListDefragIndex](
    [db] [nvarchar](100) NOT NULL,
    [shema] [nvarchar](100) NOT NULL,
    [table] [nvarchar](100) NOT NULL,
    [IndexName] [nvarchar](100) NOT NULL,
    [object_id] [int] NOT NULL,
    [idx] [int] NOT NULL,
    [db_id] [int] NOT NULL,
    [frag] [decimal](6, 2) NOT NULL,
    [InsertUTCDate] [datetime] NOT NULL,
    CONSTRAINT [PK_ListDefragIndex] PRIMARY KEY CLUSTERED
    (
        [object_id] ASC,
        [idx] ASC,
        [db_id] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [srv].[ListDefragIndex] ADD CONSTRAINT [DF_ListDefragIndex_InsertUTCDate] DEFAULT (getutcdate()) FOR [InsertUTCDate]
GO

Table for saving history about the reorganization of the indexes of all databases of an instance of MS SQL Server
USE [DBAdmin] -- the administration database (name is a placeholder)
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [srv].[Defrag](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [db] [nvarchar](100) NOT NULL,
    [shema] [nvarchar](100) NOT NULL,
    [table] [nvarchar](100) NOT NULL,
    [IndexName] [nvarchar](100) NOT NULL,
    [frag_num] [int] NOT NULL,
    [frag] [decimal](6, 2) NOT NULL,
    [page] [int] NOT NULL,
    [rec] [int] NULL,
    [ts] [datetime] NOT NULL,
    [tf] [datetime] NOT NULL,
    [frag_after] [decimal](6, 2) NOT NULL,
    [object_id] [int] NOT NULL,
    [idx] [int] NOT NULL,
    [InsertUTCDate] [datetime] NOT NULL,
    CONSTRAINT [PK_Defrag] PRIMARY KEY CLUSTERED
    (
        [ID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [srv].[Defrag] ADD CONSTRAINT [DF_Defrag_InsertUTCDate] DEFAULT (getutcdate()) FOR [InsertUTCDate]
GO

Next, create the [srv].[AutoDefragIndex] stored procedure itself for optimizing indexes in each desired database (including system databases) as follows:

An example implementation of the stored procedure [srv].[AutoDefragIndex]
 USE [_] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO ALTER PROCEDURE [srv].[AutoDefragIndex] @count int=null ---    ,@isrebuild bit=0 --    (   30%) AS BEGIN SET NOCOUNT ON; SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; --      ONLINE declare @isRebuildOnline bit=CASE WHEN (CAST (SERVERPROPERTY ('Edition') AS nvarchar (max)) LIKE '%Enterprise%' OR CAST (SERVERPROPERTY ('Edition') AS nvarchar (max)) LIKE '%Developer%' OR CAST (SERVERPROPERTY ('Edition') AS nvarchar (max)) LIKE '%Evaluation%') THEN 1 ELSE 0 END; declare @IndexName nvarchar(100) ,@db nvarchar(100) ,@db_id int ,@Shema nvarchar(100) ,@Table nvarchar(100) ,@SQL_Str nvarchar (max)=N'' ,@frag decimal(6,2) ,@frag_after decimal(6,2) ,@frag_num int ,@page int ,@ts datetime ,@tsg datetime ,@tf datetime ,@object_id int ,@idx int ,@rec int; --  declare @tbl table ( IndexName nvarchar(100) ,db nvarchar(100) ,[db_id] int ,Shema nvarchar(100) ,[Table] nvarchar(100) ,frag decimal(6,2) ,frag_num int ,[page] int ,[object_id] int ,idx int ,rec int ); --  declare @tbl_copy table ( IndexName nvarchar(100) ,db nvarchar(100) ,[db_id] int ,Shema nvarchar(100) ,[Table] nvarchar(100) ,frag decimal(6,2) ,frag_num int ,[page] int ,[object_id] int ,idx int ,rec int ); set @ts = getdate() set @tsg = @ts; -- ,    ,   10% --     if(@count is null) begin insert into @tbl ( IndexName ,db ,[db_id] ,Shema ,[Table] ,frag ,frag_num ,[page] ,[object_id] ,idx ,rec ) select ind.index_name, ind.db, ind.database_id, ind.shema, ind.tb, ind.frag, ind.frag_num, ind.[page], ind.[object_id], ind.idx , ind.rec from [inf].[vIndexDefrag] as ind where not exists( select top(1) 1 from [  ].[srv].[ListDefragIndex] as lind where lind.[db_id]=ind.database_id and lind.[idx]=ind.idx and lind.[object_id]=ind.[object_id] ) --order by ind.[page] desc, ind.[frag] desc end else begin insert into @tbl ( IndexName ,db ,[db_id] ,Shema ,[Table] ,frag ,frag_num ,[page] ,[object_id] ,idx ,rec ) select top (@count) ind.index_name, ind.db, ind.database_id, ind.shema, ind.tb, ind.frag, ind.frag_num, ind.[page], ind.[object_id], ind.idx , ind.rec from [inf].[vIndexDefrag] as ind where not exists( select top(1) 1 from [  ].[srv].[ListDefragIndex] as lind where lind.[db_id]=ind.database_id and lind.[idx]=ind.idx and lind.[object_id]=ind.[object_id] ) --order by ind.[page] desc, ind.[frag] desc end --    (   ) --     --   if(not exists(select top(1) 1 from @tbl)) begin delete from [  ].[srv].[ListDefragIndex] where [db_id]=DB_ID(); if(@count is null) begin insert into @tbl ( IndexName ,db ,[db_id] ,Shema ,[Table] ,frag ,frag_num ,[page] ,[object_id] ,idx ,rec ) select ind.index_name, ind.db, ind.database_id, ind.shema, ind.tb, ind.frag, ind.frag_num, ind.[page], ind.[object_id], ind.idx , ind.rec from [inf].[vIndexDefrag] as ind where not exists( select top(1) 1 from [  ].[srv].[ListDefragIndex] as lind where lind.[db_id]=ind.database_id and lind.[idx]=ind.idx and lind.[object_id]=ind.[object_id] ) --order by ind.[page] desc, ind.[frag] desc end else begin insert into @tbl ( IndexName ,db ,[db_id] ,Shema ,[Table] ,frag ,frag_num ,[page] ,[object_id] ,idx ,rec ) select top (@count) ind.index_name, ind.db, ind.database_id, ind.shema, ind.tb, ind.frag, ind.frag_num, ind.[page], ind.[object_id], ind.idx , ind.rec from [inf].[vIndexDefrag] as ind where not exists( select top(1) 1 from [  ].[srv].[ListDefragIndex] as lind where lind.[db_id]=ind.database_id and lind.[idx]=ind.idx and lind.[object_id]=ind.[object_id] ) --order by ind.[page] desc, ind.[frag] desc end end --    if(exists(select 
top(1) 1 from @tbl)) begin --   INSERT INTO [  ].[srv].[ListDefragIndex] ( [db] ,[shema] ,[table] ,[IndexName] ,[object_id] ,[idx] ,[db_id] ,[frag] ) select [db] ,[shema] ,[table] ,[IndexName] ,[object_id] ,[idx] ,[db_id] ,[frag] from @tbl; insert into @tbl_copy ( IndexName ,db ,[db_id] ,Shema ,[Table] ,frag ,frag_num ,[page] ,[object_id] ,idx ,rec ) select IndexName ,db ,[db_id] ,Shema ,[Table] ,frag ,frag_num ,[page] ,[object_id] ,idx ,rec from @tbl; --      (  -     ) while(exists(select top(1) 1 from @tbl)) begin select top(1) @IndexName=[IndexName], @Shema=[Shema], @Table=[Table], @frag=[frag] from @tbl; if(@frag>=30 and @isrebuild=1 and @isRebuildOnline=1) begin set @SQL_Str = @SQL_Str+'ALTER INDEX ['+@IndexName+'] on ['+@Shema+'].['+@Table+'] REBUILD WITH(ONLINE=ON);' end else begin set @SQL_Str = @SQL_Str+'ALTER INDEX ['+@IndexName+'] on ['+@Shema+'].['+@Table+'] REORGANIZE;' +'UPDATE STATISTICS ['+@Shema+'].['+@Table+'] ['+@IndexName+'];'; end delete from @tbl where [IndexName]=@IndexName and [Shema]=@Shema and [Table]=@Table; end --   execute sp_executesql @SQL_Str; --    insert into [  ].srv.Defrag( [db], [shema], [table], [IndexName], [frag_num], [frag], [page], ts, tf, frag_after, [object_id], idx, rec ) select [db], [shema], [table], [IndexName], [frag_num], [frag], [page], @ts, getdate(), (SELECT top(1) avg_fragmentation_in_percent FROM sys.dm_db_index_physical_stats (DB_ID([db]), [object_id], [idx], NULL , N'LIMITED') where index_level = 0) as frag_after, [object_id], [idx], [rec] from @tbl_copy; end END GO 

It is important to note that when an index is rebuilt, its statistics do not need to be updated separately. An index is rebuilt here only if it is fragmented by at least 30%, the edition of MS SQL Server allows the rebuild to run in ONLINE mode, and the @isrebuild input parameter of the stored procedure is set to 1.

The @count parameter is mainly needed to distribute the load throughout the day. If index optimization runs only once a day or less often at a fixed time, you can pass NULL to @count (the default). Example calls are shown below.
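
For example, the calls might look as follows (the parameter values are purely illustrative):

-- nightly run: process all qualifying indexes, allow ONLINE rebuild where supported
EXEC [srv].[AutoDefragIndex] @count = NULL, @isrebuild = 1;

-- spreading the load during the day: no more than 5 indexes per call, reorganize only
EXEC [srv].[AutoDefragIndex] @count = 5, @isrebuild = 0;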

Now we will create the stored procedure [srv].[AutoDefragIndexDB] in the administration database, for its subsequent scheduled call:

An example implementation of the stored procedure [srv].[AutoDefragIndexDB]
USE [DBAdmin] -- the administration database (name is a placeholder)
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [srv].[AutoDefragIndexDB]
    @DB nvarchar(255) = NULL,  -- database to process (NULL = all eligible databases)
    @count int = NULL,         -- maximum number of indexes to process per call
    @IsTempdb bit = 0          -- whether to include tempdb
AS
BEGIN
    /* Runs [srv].[AutoDefragIndex] in every database where it is defined */
    SET NOCOUNT ON;

    declare @db_name nvarchar(255);
    declare @sql nvarchar(max);
    declare @ParmDefinition nvarchar(255) = N'@count int';

    if(@DB is null)
    begin
        select [name]
        into #tbls
        from sys.databases
        where [is_read_only] = 0
          and [state] = 0       -- ONLINE
          and [user_access] = 0 -- MULTI_USER
          and (((@IsTempdb = 0 or @IsTempdb is null) and [name] <> N'tempdb') or (@IsTempdb = 1));

        while(exists(select top(1) 1 from #tbls))
        begin
            select top(1) @db_name = [name] from #tbls;

            set @sql = N'USE [' + @db_name + ']; ' +
                       N'IF(object_id(' + N'''' + N'[srv].[AutoDefragIndex]' + N'''' + N') is not null) EXEC [srv].[AutoDefragIndex] @count=@count;';

            exec sp_executesql @sql, @ParmDefinition, @count = @count;

            delete from #tbls where [name] = @db_name;
        end

        drop table #tbls;
    end
    else
    begin
        set @sql = N'USE [' + @DB + ']; ' +
                   N'IF(object_id(' + N'''' + N'[srv].[AutoDefragIndex]' + N'''' + N') is not null) EXEC [srv].[AutoDefragIndex] @count=@count;';

        exec sp_executesql @sql, @ParmDefinition, @count = @count;
    end
END
GO

Thus, to automate the process of optimizing indexes, you can take the following steps:

1) for each desired database, define the view [inf].[vIndexDefrag] and the stored procedure [srv].[AutoDefragIndex]
2) in the administration database, define the two tables [srv].[ListDefragIndex] and [srv].[Defrag] and the stored procedure [srv].[AutoDefragIndexDB]
3) create a SQL Server Agent job that periodically calls the stored procedure [srv].[AutoDefragIndexDB] in the administration database (a sketch of such a job follows this list)
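
A possible sketch of step 3, assuming the administration database is called DBAdmin and the job runs daily at 02:00 (the job name, database name and schedule are illustrative):

USE [msdb];
GO
-- create a job that calls the index maintenance procedure once a day
EXEC dbo.sp_add_job @job_name = N'AutoDefragIndexDB (daily)';

EXEC dbo.sp_add_jobstep
     @job_name      = N'AutoDefragIndexDB (daily)',
     @step_name     = N'Run [srv].[AutoDefragIndexDB]',
     @subsystem     = N'TSQL',
     @database_name = N'DBAdmin',
     @command       = N'EXEC [srv].[AutoDefragIndexDB];';

EXEC dbo.sp_add_jobschedule
     @job_name          = N'AutoDefragIndexDB (daily)',
     @name              = N'Daily at 02:00',
     @freq_type         = 4,        -- daily
     @freq_interval     = 1,
     @active_start_time = 020000;   -- HHMMSS

EXEC dbo.sp_add_jobserver
     @job_name    = N'AutoDefragIndexDB (daily)',
     @server_name = N'(local)';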

Now let us analyze item 3 on non-optimal statistics.

In most cases, when items 1-2 (or just item 2) of the direct database/DBMS optimization block are carried out and statistics maintenance is enabled in the database properties (automatic update of statistics, automatic creation of statistics, asynchronous automatic update of statistics, and so on), MS SQL Server copes well with keeping statistics optimal.

The main thing is not to forget to update statistics after index reorganization (unlike a rebuild, a reorganize does not update them), and also after executing item 1 on clearing the procedural cache.

But sometimes MS SQL Server does not cope with this task because of the specifics of the information system, or item 1 (clearing the procedural cache) cannot be used. In that case you can take from item 1 just the command that updates statistics for the entire database:

Updating statistics for the entire database
USE [DBName] -- the target database (name is a placeholder)
GO
exec sp_updatestats;
GO

However, if this is not enough or updating the statistics for the entire database takes a very long time, then it is necessary to consider a more flexible algorithm for updating the statistics.

It is worth noting right away that when a more flexible statistics-update algorithm is used, item 1 of the direct database/DBMS optimization block (clearing the procedural cache and updating all database statistics) is not applicable.

Here is an example implementation of item 3 for the case when statistics cannot be updated for the entire database and the built-in mechanisms are not sufficient.

To do this, create the following stored procedure [srv].[AutoUpdateStatistics] in the necessary databases:

An example implementation of the stored procedure [srv].[AutoUpdateStatistics]
 USE [_] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE PROCEDURE [srv].[AutoUpdateStatistics] --       @ObjectSizeMB numeric (16,3) = NULL, -- -    @row_count numeric (16,3) = NULL AS BEGIN /*    */ SET NOCOUNT ON; SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; declare @ObjectID int; declare @SchemaName nvarchar(255); declare @ObjectName nvarchar(255); declare @StatsID int; declare @StatName nvarchar(255); declare @SQL_Str nvarchar(max); ;with st AS( select DISTINCT obj.[object_id] , obj.[create_date] , OBJECT_SCHEMA_NAME(obj.[object_id]) as [SchemaName] , obj.[name] as [ObjectName] , CAST( ( --  ,    ( 8   1024 =  128) SELECT SUM(ps2.[reserved_page_count])/128. from sys.dm_db_partition_stats as ps2 where ps2.[object_id] = obj.[object_id] ) as numeric (38,2) ) as [ObjectSizeMB] --    , s.[stats_id] , s.[name] as [StatName] , sp.[last_updated] , i.[index_id] , i.[type_desc] , i.[name] as [IndexName] , ps.[row_count] , s.[has_filter] , s.[no_recompute] , sp.[rows] , sp.[rows_sampled] ---   : --  -           --   -                 , sp.[modification_counter]+ABS(ps.[row_count]-sp.[rows]) as [ModificationCounter] --%  ,    , --             , NULLIF(CAST( sp.[rows_sampled]*100./sp.[rows] as numeric(18,3)), 100.00) as [ProcSampled] --%  -           --      , CAST(sp.[modification_counter]*100./(case when (ps.[row_count]=0) then 1 else ps.[row_count] end) as numeric (18,3)) as [ProcModified] -- : --[ProcModified]*    -    , CAST(sp.[modification_counter]*100./(case when (ps.[row_count]=0) then 1 else ps.[row_count] end) as numeric (18,3)) * case when (ps.[row_count]<=10) THEN 1 ELSE LOG10 (ps.[row_count]) END as [Func] --  : --  ,    ,   --            , CASE WHEN sp.[rows_sampled]<>sp.[rows] THEN 0 ELSE 1 END as [IsScanned] , tbl.[name] as [ColumnType] , s.[auto_created] from sys.objects as obj inner join sys.stats as s on s.[object_id] = obj.[object_id] left outer join sys.indexes as i on i.[object_id] = obj.[object_id] and (i.[name] = s.[name] or i.[index_id] in (0,1) and not exists(select top(1) 1 from sys.indexes i2 where i2.[object_id] = obj.[object_id] and i2.[name] = s.[name])) left outer join sys.dm_db_partition_stats as ps on ps.[object_id] = obj.[object_id] and ps.[index_id] = i.[index_id] outer apply sys.dm_db_stats_properties (s.[object_id], s.[stats_id]) as sp left outer join sys.stats_columns as sc on s.[object_id] = sc.[object_id] and s.[stats_id] = sc.[stats_id] left outer join sys.columns as col on col.[object_id] = s.[object_id] and col.[column_id] = sc.[column_id] left outer join sys.types as tbl on col.[system_type_id] = tbl.[system_type_id] and col.[user_type_id] = tbl.[user_type_id] where obj.[type_desc] <> 'SYSTEM_TABLE' ) SELECT st.[object_id] , st.[SchemaName] , st.[ObjectName] , st.[stats_id] , st.[StatName] INTO #tbl FROM st WHERE NOT (st.[row_count] = 0 AND st.[last_updated] IS NULL)--       --   AND NOT (st.[row_count] = st.[rows] AND st.[row_count] = st.[rows_sampled] AND st.[ModificationCounter]=0) --    (   ) AND ((st.[ProcModified]>=10.0) OR (st.[Func]>=10.0) OR (st.[ProcSampled]<=50)) --,     AND ( ([ObjectSizeMB]<=@ObjectSizeMB OR @ObjectSizeMB IS NULL) AND (st.[row_count]<=@row_count OR @row_count IS NULL) ); WHILE (exists(select top(1) 1 from #tbl)) BEGIN select top(1) @ObjectID =[object_id] ,@SchemaName=[SchemaName] ,@ObjectName=[ObjectName] ,@StatsId =[stats_id] ,@StatName =[StatName] from #tbl; SET @SQL_Str = 'IF (EXISTS(SELECT TOP(1) 1 FROM sys.stats as s WHERE s.[object_id] = '+CAST(@ObjectID as nvarchar(32)) + ' AND s.[stats_id] = ' + 
CAST(@StatsId as nvarchar(32)) +')) UPDATE STATISTICS ' + QUOTENAME(@SchemaName) +'.' + QUOTENAME(@ObjectName) + ' ('+QUOTENAME(@StatName) + ') WITH FULLSCAN;'; execute sp_executesql @SQL_Str; delete from #tbl where [object_id]=@ObjectID and [stats_id]=@StatsId; END drop table #tbl; END GO 

Note that statistics are updated only for objects that have had significant changes. Pay attention, however, to the [IsScanned] column: if it differs from 1, the total number of rows in the table or indexed view at the last statistics update does not match the number of rows sampled for the calculation, which means the statistics may already be imprecise. And although the algorithm considers only significant data changes, it cannot be ruled out that at some point you will need to update all the statistics of a very large object that has had few changes. Therefore, the algorithm above cannot be universal for all databases, but for most it is a suitable starting point for further fine-tuning of statistics optimization.
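
Which statistics the procedure will treat as requiring an update can also be inspected manually, for example with a query like this one run in the target database (it shows the accumulated modifications and the sampling rate of the last update):

-- statistics of the current database with the number of modifications
-- accumulated since the last update and the sampling rate of that update
SELECT OBJECT_SCHEMA_NAME(s.[object_id]) AS [SchemaName],
       OBJECT_NAME(s.[object_id]) AS [ObjectName],
       s.[name] AS [StatName],
       sp.last_updated,
       sp.[rows],
       sp.rows_sampled,
       sp.modification_counter,
       CAST(sp.rows_sampled * 100. / NULLIF(sp.[rows], 0) AS numeric(18, 3)) AS [ProcSampled]
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) AS sp
WHERE OBJECTPROPERTY(s.[object_id], N'IsMSShipped') = 0
ORDER BY sp.modification_counter DESC;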

Next, in the administration database, we create the stored procedure [srv].[AutoUpdateStatisticsDB], which will then need to be run periodically on a schedule:

An example implementation of the stored procedure [srv].[AutoUpdateStatisticsDB]
USE [DBAdmin] -- the administration database (name is a placeholder)
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [srv].[AutoUpdateStatisticsDB]
    @DB nvarchar(255) = NULL,            -- database to process (NULL = all eligible databases)
    @ObjectSizeMB numeric (16,3) = NULL, -- maximum object size, MB
    @row_count numeric (16,3) = NULL,    -- maximum number of rows in the object
    @IsTempdb bit = 0                    -- whether to include tempdb
AS
BEGIN
    /* Runs [srv].[AutoUpdateStatistics] in every database where it is defined */
    SET NOCOUNT ON;

    declare @db_name nvarchar(255);
    declare @sql nvarchar(max);
    declare @ParmDefinition nvarchar(255) = N'@ObjectSizeMB numeric (16,3), @row_count numeric (16,3)';

    if(@DB is null)
    begin
        select [name]
        into #tbls
        from sys.databases
        where [is_read_only] = 0
          and [state] = 0       -- ONLINE
          and [user_access] = 0 -- MULTI_USER
          and (((@IsTempdb = 0 or @IsTempdb is null) and [name] <> N'tempdb') or (@IsTempdb = 1));

        while(exists(select top(1) 1 from #tbls))
        begin
            select top(1) @db_name = [name] from #tbls;

            set @sql = N'USE [' + @db_name + ']; ' +
                       N'IF(object_id(' + N'''' + N'[srv].[AutoUpdateStatistics]' + N'''' + N') is not null) EXEC [srv].[AutoUpdateStatistics] @ObjectSizeMB=@ObjectSizeMB, @row_count=@row_count;';

            exec sp_executesql @sql, @ParmDefinition, @ObjectSizeMB = @ObjectSizeMB, @row_count = @row_count;

            delete from #tbls where [name] = @db_name;
        end

        drop table #tbls;
    end
    else
    begin
        set @sql = N'USE [' + @DB + ']; ' +
                   N'IF(object_id(' + N'''' + N'[srv].[AutoUpdateStatistics]' + N'''' + N') is not null) EXEC [srv].[AutoUpdateStatistics] @ObjectSizeMB=@ObjectSizeMB, @row_count=@row_count;';

        exec sp_executesql @sql, @ParmDefinition, @ObjectSizeMB = @ObjectSizeMB, @row_count = @row_count;
    end
END
GO
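
A possible scheduled call of this procedure (the thresholds are purely illustrative) might look like this:

-- update statistics in all eligible databases, but only for objects
-- up to 8 GB in size and up to 50 million rows
EXEC [srv].[AutoUpdateStatisticsDB] @ObjectSizeMB = 8192, @row_count = 50000000;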

Usually, a more flexible algorithm for updating statistics is needed in such databases, where the hardware power is not capable of updating statistics for the entire database in a reasonable time, that is, databases with very large data sizes (significantly more than 1 TB).

Thus, all three items of the first block, the direct optimization of the database itself and the DBMS as a whole, have been covered.

In addition to this block, the following general recommendations should be added:

1) it is necessary to keep between 5 and 20% of free space in the database data files (if there is less than 5%, grow the file by setting a larger initial size in the database file properties; if there is more than 20%, shrink the file with the SHRINKFILE command; a sketch is given after this list)

2) it is necessary to maintain the indexes and statistics of the system databases as well (especially msdb)

3) it is necessary to clean up the msdb database history and logs, for example as follows:

An example of cleaning the msdb database logs
declare @dt datetime = DateAdd(day, -28, GetDate());

exec msdb.dbo.sp_delete_backuphistory @dt;
exec msdb.dbo.sp_purge_jobhistory @oldest_date = @dt;
exec msdb.dbo.sp_maintplan_delete_log null, null, @dt;
exec msdb.dbo.sp_cycle_errorlog;
exec msdb.dbo.sp_Cycle_Agent_ErrorLog;
...
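
For item 1 of the list above, shrinking an oversized data file can be done, for example, as follows (the logical file name and the target size are placeholders):

-- shrink a data file of the current database to a target size of 100 GB (value in MB)
USE [DBName];
GO
DBCC SHRINKFILE (N'DBName_Data', 102400);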

Now let us analyze the second block, the optimization of the interaction between the application and MS SQL Server (specifically, a .NET application and MS SQL Server).

Here we present mostly the main recommendations without detailed implementation examples, so that the article does not become too long.

So, here are the general recommendations for optimizing the interaction between the application and MS SQL Server:

1) Try to work with sets rather than individual rows when sending commands to the DBMS (a table-valued parameter sketch follows after this list)
2) Try to send requests to the DBMS asynchronously and not make the user wait for the application response
3) Try to send requests to the DBMS in batches rather than one at a time (especially relevant when changing data), that is, implement a mechanism for deferred execution and query accumulation
4) Use caching in all software components to reduce the number of calls to the DBMS
5) Identify application components in the database connection string via the Application Name parameter:

Application Name Usage Example
using System;
using System.Data.SqlClient;

namespace ConsoleApp2
{
    class Program
    {
        static void Main(string[] args)
        {
            OpenSqlConnection();
            Console.ReadKey();
        }

        private static void OpenSqlConnection()
        {
            string connectionString = GetConnectionString();
            using (SqlConnection connection = new SqlConnection())
            {
                connection.ConnectionString = connectionString;
                connection.Open();
                Console.WriteLine("State: {0}", connection.State);
                Console.WriteLine("ConnectionString: {0}", connection.ConnectionString);
            }
        }

        static private string GetConnectionString()
        {
            // the Application Name identifies this component in the DBMS
            SqlConnectionStringBuilder conn = new SqlConnectionStringBuilder();
            conn.ApplicationName = "MyProgram";
            conn.DataSource = "SQLServerName";
            conn.InitialCatalog = "DBName";
            conn.IntegratedSecurity = true;
            return conn.ToString();
        }
    }
}

6) Set the isolation level correctly when working with the database.

7) On the server side of the software, if necessary, implement a queue of calls to the DBMS that takes into account not only the specifics of the information system but also the hardware capabilities of the DBMS server or server group, if the latter are configured as AlwaysOn

8) Try to pre-filter as much data as possible, rather than requesting everything from the DBMS at once and then applying the filter (for example, use paged return if necessary with dynamic asynchronous data loading)

9) Try not to filter large data sets outside the DBMS (see the previous item)

10) Separate the application component logs from the database logs rather than writing everything into one table
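
To illustrate item 1 of the list above on the DBMS side, here is a minimal sketch of a table-valued parameter that lets the application send a whole set of rows in one call instead of calling the server once per row (all names here are hypothetical; on the .NET side such a parameter is passed as SqlDbType.Structured):

-- a table type for passing a set of keys in one round trip
CREATE TYPE [dbo].[IdList] AS TABLE ([ID] int NOT NULL PRIMARY KEY);
GO
-- a procedure that processes the whole set at once
CREATE PROCEDURE [dbo].[MarkPersonsProcessed]
    @Ids [dbo].[IdList] READONLY
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE p
    SET p.[Comment] = N'processed'
    FROM [dbo].[Personal] AS p
    INNER JOIN @Ids AS t ON t.[ID] = p.[ID];
END
GO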

Now let us look at the third block, the optimization of the queries themselves.

Again, we will not go into much detail here, which would take a whole book, but will describe only some key points.

So, the general recommendations for optimizing the queries themselves:

1) try to filter the data as much as possible before joining it with other tables
2) try to minimize the volume of data that has to be sorted in the result set
3) whenever possible avoid DISTINCT, LIKE '%...' (patterns with a leading wildcard) and OUTER JOIN, especially on large data sets
4) if the query needs only one field from a joined table, do not join that table; use a subquery in the select list instead (see the sketch after this list)
5) when filtering, aggregating and selecting, take the available indexes into account so that the optimizer can use them
6) try to return only the fields that are really needed, rather than all fields from all joined tables, as returning everything is a very bad approach, especially with large data
7) when updating or deleting data, if the filter or aggregation is not built solely on the clustered index (or, besides the clustered index, there are other conditions or aggregations), do not run the operation directly against the table; first select the rows to be deleted/modified into a temporary table (containing the columns of the clustered index plus any other fields required for the update), and then perform the delete/update by joining to it
8) do not overload the join conditions; move some of the conditions into the filter instead
9) use query hints sensibly.
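
As a small illustration of item 4 (the table and column names are hypothetical): when only one column is needed from the joined table, a subquery in the select list can replace the join:

-- instead of joining [Department] only to pull its name
SELECT p.[ID],
       p.[FirstName],
       (SELECT TOP (1) d.[Name]
        FROM [dbo].[Department] AS d
        WHERE d.[ID] = p.[DepartmentID]) AS [DepartmentName]
FROM [dbo].[Personal] AS p
WHERE p.[LastName] LIKE N'A%';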

For better understanding, here is an example of item 7.

For example, suppose we have a Personal table with a clustered index on the ID field; we want to delete all people whose first name contains the substring 'on' and to set the comment to the first name for those people whose last name ends with 'you'.

Here's how to implement this task:

Examples for item 7
-- delete all people whose first name contains the substring 'on'
select [ID]
into #tbl_del
from [Personal]
where [FirstName] like '%on%';

delete from p
from #tbl_del as t
inner join [Personal] as p on t.[ID] = p.[ID];

drop table #tbl_del;

-- set the comment to the first name for people whose last name ends with 'you'
select [ID], [FirstName]
into #tbl_upd
from [Personal]
where [LastName] like '%you';

update p
set p.[Comment] = t.[FirstName]
from #tbl_upd as t
inner join [Personal] as p on t.[ID] = p.[ID];

drop table #tbl_upd;

Now that the item 7 example has been shown on simple queries, let us give an example of working with so-called duplicates. The problem of removing duplicate records is quite common and can be solved as follows:

An example of removing duplicate records according to item 7
-- Duplicates in the Personal table are taken to be rows with the same
-- [FirstName] and [LastName]; of each group, only the most recently
-- inserted row (by [InsertDate]) is kept:
;with DelDups as
(
    select [ID],
           row_number() over(partition by [FirstName], [LastName] order by [InsertDate] desc) as [rn]
    from [Personal]
)
select [ID]
into #tbl_del
from DelDups
where rn > 1;

delete from p
from #tbl_del as t
inner join [Personal] as p on t.[ID] = p.[ID];

drop table #tbl_del;

An example of updating duplicate rows can be handled similarly.

Thus, all three blocks have been covered: optimization of the database itself and the DBMS as a whole, of the application's access to the DBMS, and of the queries themselves.

In addition, it is worth noting that as the information system grows, in particular the number of concurrent users and the size of the database itself, it is worth considering splitting the system into OLTP and OLAP parts, where background tasks (integration, data movement such as ETL, and so on) and reports run against the OLAP system, while real-time user tasks run against the OLTP system. This separation allows each of the systems to be tuned more precisely and greatly reduces the load on the OLTP system. Here you can follow a golden rule: in the OLTP system, the number of user requests should be many times greater than the number of all other requests, and the same should hold for the total volume of data processed over a sufficiently long period (a week, a month, and so on). In practice, however, it is not uncommon to encounter the opposite situation, when users mostly access the OLAP system, while all other tasks (accumulating new data with subsequent transfer to OLAP via ETL) go to the OLTP system.

Comment. In fact, to distinguish an OLAP system from an OLTP one, it is enough to check one fact: in the former, the number of data reads is many times greater than the number of data changes (changes are either rare or occur in volumes so small compared with the volumes being read that they can be neglected). An OLTP system is exactly the opposite. OLAP and OLTP systems are configured differently, and a hybrid solution will increasingly show its inadequacy as data volumes and the number of concurrent accesses to this data grow (this becomes especially acute in databases whose size significantly exceeds 1 TB). If the database is much smaller than 1 TB, the split into OLTP and OLAP may not be necessary, and such an optimization may not give a noticeable performance gain (users will not feel it).

Also try to use all the available features of both the DBMS itself and the OS intelligently, and configure the hardware resources properly for overall optimization.

Also, do not forget about resource limits (CPU, RAM, and the capacity and speed of the I/O subsystem).

After all the optimizations, it is also important to keep track of the future growth in requirements for the DBMS (in particular, the growth in the number of concurrent calls and in the volume of data being processed) and to predict problems and ways to solve them in advance, including where hardware needs to be upgraded or missing capacity purchased for the future.

Result


This article has reviewed the optimization of the database and of the DBMS as a whole, and has also provided general recommendations for optimizing applications that work with the DBMS and the database queries themselves.

The recommendations above make it possible to keep the entire DBMS well optimized over the long term, taking into account the further growth of the information system and of the number of concurrent accesses to the DBMS.

Sources:


» Dynamic administrative views of the system
» Index overlapping
» Reorganization and rebuilding of indices
» SHRINKFILE
» msdb
» Connection string
» Isolation level
» Automatic update of statistics
» Automatic defragmentation of indexes
» Current size of all plan cache and query plan cache

Source: https://habr.com/ru/post/349910/

