Foreword
There is an information system that I administer. The system consists of the following components:
1. MS SQL Server Database
2. Server application
3. Client applications
These information systems are installed at several sites. The information system is actively and around the clock used simultaneously from 2 to 20 users at each site. Therefore, you can not perform maintenance work all at once. T e have to defragment indexes smeared for the whole day, and not in one fell swoop to defragment all the necessary fragmented indexes. Similarly with other works.
')
Auto-update statistics set in the properties of the database itself. Statistics is also updated by defragmented index.
Problem
About a year ago I encountered the following problem:
Periodically, all requests were executed for a long time. Moreover, the braking time was random. This happened on every object on a random day. Moreover, when I began to analyze how often the brakes occur (by a profiler), I managed to find out that they occur every day at a random time. Simply, users do not always pay attention to this, but perceive it as the only random delay, and then the system again works quickly.
Path to salvation
Requests themselves that were executed for a long time were revised. But the most interesting thing is that all requests at random times were executed for a long time. Even the simplest types pull out the last entry from a table of several thousand lines.
Further, the following work was carried out:
- MS SQL Server and Windows Server logs are analyzed - the cause of braking could not be found
- The indices are analyzed (fragmentation and so on) - missing are added and unused
- Analyzed queries - improved some queries
- Analyzed jobs in SQL Agent - failed to bind problems to braking
- Tasks analyzed in Task Scheduler - the task could not be tied to the braking problem
- Profiler also gave the effect, not the cause of the braking.
- A deadlock check was performed - no long locks were detected at all
As a result, more than 3 months were spent on unsuccessful searches for the causes of periodic inhibitions. However, it was possible to reveal an interesting fact - all requests had an increase in the Elapsed waiting indicator, and not the Worker’s fulfillment indicator itself. What prompted the fact that something is possible with the disks. I checked them too - everything is fine.
Decision
It was surprising that it was randomly possible to establish that when the request in the application is executed slowly, then in the SSMS itself it is executed quickly. Then the
following article helped to solve it (she at least later came up with the idea).
From this article I will quote the following paragraph:
In practice, the most important SET option is ARITHABORT, because the default value for this option is different for applications and for SQL Server Management Studio. This explains why you can detect a slow query in your application, and then get a good speed by running it in SSMS. The application uses a plan that was built to set values that differ from the actual, correct values. Whereas if you run a query in SSMS, then most likely the cache does not yet have an execution plan for ARITHABORT ON, and therefore SQL Server will build a plan for your current values.
The difference in execution was in the
SET ARITHABORT parameter. This option is enabled for all requests executed in SSMS, and disabled for requests from outside (from applications). And it cannot be enabled even by a simple request for applications:
SET ARITHABORT ON;
This was followed by a crazy idea - at the time of the hang, clear the procedural cache:
click .
For the subsequent manual check before the request in SSMS you need to write:
SET ARITHABORT OFF;
Then the query will be executed as if it came from the application. When the query was executed for a long time, I cleared the procedural cache. And it always cured. T e before cleaning the procedural cache, the query could run for up to 20-30 seconds, and after - 0 seconds.
After that, another experiment was performed - cleaning the entire procedural cache for the entire database every hour via the SQL Agent:
After that, all requests were simply executed very quickly (less than 0.05 sec.), There were only a few outliers up to 5-10 seconds of execution, but users did not notice the hangs. Moreover, updating the statistics did not improve the results, so I removed the update statistics.
After a few more months of research, it was possible to establish that single freezes occur when either the server itself eats up the cache and nothing remains or there is no free memory, but less than 1 GB of RAM, or MS SQL Server eats up all the RAM allocated to it (via Task Manager). But the second occurred only 2 times for the entire study.
The fact is that everything literally is written to the cache, but the cache is not always released on time. The problem with the cache was solved using the program
EmptyStandbyList.exe .
This application is configured through Task Scheduler to run 1 time every hour. After the work done for more than six months there is no inhibition on requests for all objects.
The only thing that remains unclear is the rare cases when one request hangs for 5-10 seconds 1 time per month on a random day and at a random time. In total there were 4 such cases in half a year and then at two sites, and not at all. At the same time, the MS SQL Server service eats up for a short time all the RAM allocated to it.
I did the steps described
in the article , but this solution did not help.
In principle, it was possible and not to dig deeper, that is, users do not notice hang-ups and everything works, but if anyone has any thoughts, share, I would be grateful.
This article was written with the aim of helping those who stumble upon similar problems, I did not find a comprehensive answer on the Internet, and a lot of time was spent researching the problem and solving it.
Sources
»
One ⬝
Two ⬝
Three ⬝
Four ⬝
Five ⬝
Six ⬝
Seven ⬝
Eight