📜 ⬆️ ⬇️

SQL Server wait statistics or please tell me where it hurts

How many times have you experienced performance problems with SQL Server and decided exactly where to look?

One of the most rarely used SQL Server performance troubleshooting methodologies is called “Waits and Queues” (also known as “Waiting Statistics”). The basic premise of the methodology is that SQL Server constantly keeps track of which threads have to wait. You can ask SQL Server for this information in order to reduce the list of possible causes of performance problems. “Expectations” is what SQL Server is tracking. "Queues" are resources that streams can access. The system usually captures a huge amount of expectations, and they all mean waiting for access to various resources. For example, waiting for PAGEIOLATCH_EX means that the stream is waiting to read a page of data from disk to the buffer pool. Waiting for LCK_M_X means that the thread is waiting to be able to impose an exclusive lock on something.

The great news is that SQL Server knows exactly what the performance issues are, and all you need to do is ask him ... and then correctly interpret what he will say, which can be a little more difficult.

The following information is for people who are worried about each expectation and understand what causes it. Expectations always arise. This is how the SQL Server scheduling system works.
')
The thread uses the processor and has the status of “running” (RUNNING) until it is faced with the need to wait for access to the resource. In this case, it is placed in an unordered list of threads in the "suspended" state (SUSPENDED). At the same time, the next thread in the queue of threads waiting for access to the processor, organized according to the FIFO principle (the first to arrive - first dropped), and having the status of “ready for execution” (RUNNABLE) gets access to the processor and becomes “running”. If a thread in the "paused" state receives a notification that its resource is available, it becomes "ready for execution" and placed at the end of the queue of threads ready for execution. The flow continues its cyclical movement along the chain “is being executed” - “suspended” - “is ready for execution” until the task is completed. You can see the processes and their states using the Dynamic Management View (DMV) sys.dm_exec_requests.

SQL Server tracks the time elapsed between the exit of the thread from the “running” state and its return to this state, defining it as “wait time” and time spent in the “ready to run” state, defining it as “time signal wait time ”, i.e. how long it takes for the thread after receiving a signal about the availability of resources in order to gain access to the processor. We need to understand how much time the thread spends in the “suspended” state, called the “resource wait time”, subtracting the signal wait time from the total wait time.

An excellent source of information that I recommend reading about this is the new (2014) waiting statistics document "Tuning SQL Server Performance Using Expectancy Statistics: A Guide for a Newbie" (English), which I advise you to read. There is also a much older document ( "Adjusting performance using expectations and queues" (English) with a lot of useful information, but quite obsolete at the moment. The best manual on various types of expectations (and classes of short-term locks) - my comprehensive library of expectations (English) and short-term blocking (English).

You can query SQL Server for accumulated wait statistics using DMV sys.dm_os_wait_stats. Many people prefer to wrap a DMV call into some kind of summary code. Below is the latest version of my script as of 2016, which works with all versions and includes wait types for SQL Server 2016 (see the version of the script for use in Azure here ):

WITH [Waits] AS (SELECT [wait_type], [wait_time_ms] / 1000.0 AS [WaitS], ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS], [signal_wait_time_ms] / 1000.0 AS [SignalS], [waiting_tasks_count] AS [WaitCount], 100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage], ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum] FROM sys.dm_os_wait_stats WHERE [wait_type] NOT IN ( N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR', N'BROKER_TASK_STOP', N'BROKER_TO_FLUSH', N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE', N'CHKPT', N'CLR_AUTO_EVENT', N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE', -- Maybe uncomment these four if you have mirroring issues N'DBMIRROR_DBM_EVENT', N'DBMIRROR_EVENTS_QUEUE', N'DBMIRROR_WORKER_QUEUE', N'DBMIRRORING_CMD', N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE', N'EXECSYNC', N'FSAGENT', N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX', -- Maybe uncomment these six if you have AG issues N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION', N'HADR_LOGCAPTURE_WAIT', N'HADR_NOTIFICATION_DEQUEUE', N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE', N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP', N'LOGMGR_QUEUE', N'MEMORY_ALLOCATION_EXT', N'ONDEMAND_TASK_QUEUE', N'PREEMPTIVE_XE_GETTARGETSTATE', N'PWAIT_ALL_COMPONENTS_INITIALIZED', N'PWAIT_DIRECTLOGCONSUMER_GETNEXT', N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP', N'QDS_ASYNC_QUEUE', N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP', N'QDS_SHUTDOWN_QUEUE', N'REDO_THREAD_PENDING_WORK', N'REQUEST_FOR_DEADLOCK_SEARCH', N'RESOURCE_QUEUE', N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH', N'SLEEP_DBSTARTUP', N'SLEEP_DCOMSTARTUP', N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY', N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP', N'SLEEP_SYSTEMTASK', N'SLEEP_TASK', N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT', N'SP_SERVER_DIAGNOSTICS_SLEEP', N'SQLTRACE_BUFFER_FLUSH', N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', N'SQLTRACE_WAIT_ENTRIES', N'WAIT_FOR_RESULTS', N'WAITFOR', N'WAITFOR_TASKSHUTDOWN', N'WAIT_XTP_RECOVERY', N'WAIT_XTP_HOST_WAIT', N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG', N'WAIT_XTP_CKPT_CLOSE', N'XE_DISPATCHER_JOIN', N'XE_DISPATCHER_WAIT', N'XE_TIMER_EVENT') AND [waiting_tasks_count] > 0 ) SELECT MAX ([W1].[wait_type]) AS [WaitType], CAST (MAX ([W1].[WaitS]) AS DECIMAL (16,2)) AS [Wait_S], CAST (MAX ([W1].[ResourceS]) AS DECIMAL (16,2)) AS [Resource_S], CAST (MAX ([W1].[SignalS]) AS DECIMAL (16,2)) AS [Signal_S], MAX ([W1].[WaitCount]) AS [WaitCount], CAST (MAX ([W1].[Percentage]) AS DECIMAL (5,2)) AS [Percentage], CAST ((MAX ([W1].[WaitS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgWait_S], CAST ((MAX ([W1].[ResourceS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgRes_S], CAST ((MAX ([W1].[SignalS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgSig_S], CAST ('https://www.sqlskills.com/help/waits/' + MAX ([W1].[wait_type]) as XML) AS [Help/Info URL] FROM [Waits] AS [W1] INNER JOIN [Waits] AS [W2] ON [W2].[RowNum] <= [W1].[RowNum] GROUP BY [W1].[RowNum] HAVING SUM ([W2].[Percentage]) - MAX( [W1].[Percentage] ) < 95; -- percentage threshold GO 


The result of the query will show the expectations, grouped by percentage of all expectations in the system, in descending order. The expectations that are (potentially) worth paying attention to are at the top of the list and represent a large part of the expectations that SQL Server spends time. You see a large list of expectations that are removed from consideration - as I said earlier, they always arise and those listed above, we can, as a rule, ignore.

You can also reset the statistics accumulated by the server using this code:

 DBCC SQLPERF (N'sys.dm_os_wait_stats', CLEAR); GO 


And, of course, you can come up with storing results every few hours or every day and do some time analysis to find out the direction of the changes, or automatically track problems in case they start to appear.
You can also use Performance Dashboard to display the results graphically in SQL Server 2005 and Data Collector in SQL Server 2008. In SQL Server 2000, you can use DBCC SQLPERF (N'waitstats') .

After you get the results, you will begin to think how to interpret them and where to look. The document I referred to earlier has a wealth of information on most types of expectations (except those added in SQL Server 2008).

Now I would like to provide the results of a study that I published some time ago. I asked people to run the code presented above and report the results to me. I received a huge amount of results - from 1823 servers - thanks!

Below is a graphical representation of the results:



I am not at all surprised by the top 4 results, as I have seen them again and again on the systems of my clients.

In continuation of my article, I am going to list the most popular types of expectations provided by the survey respondents, in descending order, and comment in a few words what exactly they can mean if they are basic to your system. The format of the list shows the number of systems from the respondents in which the specified type of waiting is the main one.



Hope it was interesting! Let me know if you are interested in something specifically or that you have read this article and enjoyed it!

Source: https://habr.com/ru/post/216309/


All Articles