Analysis of MS SQL Server, for those who see it for the first time

Published continued: part 2

Recently I ran into a problem - I turned SVN on the ubuntu server. I myself program under windows and with linux “on you” ... Googled by mistake - to no avail. The error turned out to be the most typical (the server unexpectedly closed the connection) and not talking about anything concrete. Therefore, it is necessary to dive deeper and analyze the logs / settings / rights /, etc., and with this, just, I am on you.

As a result, of course, I figured out and found everything I needed, but time was spent a lot. Once again thinking how globally (yes, worldwide or at least ⅙ of land) to reduce useless hours - I decided to write an article that will help people quickly find their way around unfamiliar software.
')
I will not write about Linux - although I have solved the problem, I hardly became a professional. I will write about the more familiar to me MS SQL. Fortunately, I had to answer questions many times and the list of model ones is already ready.

For whom i'm writing

If you are an admin in Sberbank (or Yandex or <another top-100 company>), you can save the article as a favorite. Yes, it is useful! When newcomers come to you again, with the same questions, you will give them a link to it. It will save your time.

If no joke, this DBMS is often used in small companies. Often in conjunction with 1C or other software. Separate DB administrator such companies keep expensive - it will be necessary to get out the usual IT person. For such and write.

What problems will we consider?

If the server tells you that “E-disk space has run out”, a deep analysis is not needed. We will not consider errors, the solution of which is obvious from the text of the message. Also, we will not consider errors for which Google immediately gives a link to msdn with a solution.
Consider the problems for which it is not obvious what to google. Such as, for example, a sudden drop in performance or, for example, no connection. Consider the basic tools for customization. Consider the means of analysis. Let's look for where logs and other useful information lie. And in general, I will try in one article to collect the necessary information for a quick start.

The very first

Let's start with the leader of the list of frequent questions, so much he is ahead of everyone that we consider it separately. In addition, they write about this in all articles about the work of MS SQL - and I will not break the tradition.

If you suddenly, for no reason at all, began to work slowly, and you didn’t change anything (as they did, everything worked, nobody touched anything) - first of all, update the statistics and rebuild the indices. Just making sure that this is done - it makes sense to dig deeper. I emphasize once again - it is necessary to do this, the only question is how often.

The Internet is full of recipes for how to do this, provides examples of scripts. I suppose that all those methods for “pros” and beginners are incomprehensible. Well, I will describe the simplest method: to implement it, you only need the mouse control.

Abbreviations and Applications

SSMS - application “Microsoft SQL Server Management Studio”, located in “Start”. It is installed by a separate tick (Client management tools) from the server distribution kit. Starting with the 2016 version, is available for free on the MS website as a separate application. Older studio versions work fine with lower server versions. On the contrary, they also sometimes work (basic functions).
docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms “SSMS is free! It doesn’t require a license to install and use. ”
Profiler - “SQL Server Profiler” application, located in “Start”, installed with SSMS.
Performance Monitor (System Monitor) - snap-in control panel. Allows you to monitor performance counters, log and view measurement history.

Updating statistics using the “maintenance plan”:

we start SSMS;
connect to the right server;
expand the tree in the Object Inspector: Management \ Maintenance Plans (Maintenance Plans)
right click on the site, select “Maintenance Plan Wizard”
in the wizard with the mouse, we note the tasks we need:
- rebuild index (rebuild index)
- update statistics (update statistics)
you can mark both tasks at once, or make two maintenance plans for one task in each (see the “important notes” below);
next, we tick the necessary database (or several). We do this for each task (if you select two tasks, there will be two dialogues with the choice of database).
Next, Next, Finish

After these actions, you will create (rather than execute) a “maintenance plan”. Starting can be done manually - right click on it, select “Execute”. Or configure the launch via the “SQL Agent”.

Important notes:

Statistics update is a non-blocking operation. You can perform in working mode. Of course, it will create additional load, but after all, everything will be slowing down for you, there will be a little more - unnoticed.
Rebuilding the index is a blocking operation. Run only after hours. There is an exception - Enterprise edition of the server allows the execution of an “online rebuild”. This option is enabled by ticking the task settings. Please note that there is a tick in all editions, but it works only in Enterprise.
Of course, these tasks must be performed regularly. I propose a simple way to determine how often to do it:
- with the first problems, perform a maintenance plan;
- if it helps, wait until the problems start again (as a rule, until the next month closure / calculation of salary, etc., and mass operations);
- The resulting period of normal operation will guide you;
- for example, set up a maintenance plan twice as often.

Server is slow - what to do?

Server Used Resources

Like any other program, the server needs: processor time, data on the disk, RAM, and network bandwidth.

To estimate the shortage of one or another resource in the first approximation, you can use the Task Manager (Task Manager), as if in caps it would not sound.

CPU load

View the download in the manager can even a student. Here we just need to make sure that if the processor is loaded, then it is with the sqlserver.exe process.

If this is your case, then you need to go to the analysis of user activity in order to understand what exactly caused the download (we scroll below).

Disk loading

Many only look at the CPU load, but do not forget that the DBMS is a data warehouse. Data volumes are growing, processor performance is growing, and the HDD speed remains almost unchanged. With SSD, the situation is better, but terabytes on them are expensive to store.

It turns out that I often come across situations where the disk system becomes the bottleneck, and not the CPU.

For discs, the following indicators are important to us:

average queue length (I / O pending operations, pieces);
read / write speed (in MB / s).

The server version of the task manager, as a rule (depending on the version of the system), shows both. If not, launch the “Performance Monitor” control panel snap-in. We are interested in counters:

Physical (logical) disk / Average read (write) time
Physical (logical) disk / Average disk queue length
Physical (logical) disk / disk exchange rate

Expanded - you can read the manufacturer's manuals, for example, social.technet.microsoft.com/wiki/contents/articles/3214.monitoring-disk-usage.aspx . In short:

The queue is desirable so that it does not exceed 1. Short-term bursts are acceptable if they quickly fall down. Bursts may vary depending on your system. For a simple Reid mirror of two HDDs, there is a queue of more than 10-20 problems. For a cool library with super caching, I saw bursts of up to 600-800 that instantly resolved without causing delays.
The normal exchange rate also depends on the type of disk system. Normal (desktop) HDD “shakes” at 50-100 Mb / s. A good disk library of 500 MB / s and more. For small random operations, the speed is less. Approximately and be guided.
These parameters must be viewed in the complex. If your library is pumping 50Mb / s and at the same time there is a queue of 50 operations, there is clearly something wrong with the hardware. If the queue is built when pumping close to the maximum - then most likely the disks are not to blame - they simply can no longer - you need to look for a way to reduce the load.
It is necessary to look at the load separately by disks (if there are several of them) and compare with the placement of server files. Task Manager can show the most actively used files. It is convenient to use to make sure that the load comes exactly from the DBMS.

What can be caused by problems with the disk system:

iron problems
- cache crashed, performance dropped dramatically;
- disk system is used by something else;
Lack of RAM. Swap. Caching deteriorated, performance dropped (see the section on OP below).
Increased user load. It is necessary to evaluate the work of users (problem query / new functionality / increase in the number of users / increase in the amount of data /, etc.).
Fragmentation of database data (look at the rebuild indexes above), fragmentation of system files.
The disk system has reached its maximum capacity.

If you have the last option - do not rush to throw out the equipment. Sometimes you can squeeze a little more out of the system if you approach the problem with the mind. Check the placement of the system files for compliance with the recommended requirements:

Do not mix OS files with database data files. Place them on physically different media so that the system does not compete with the DBMS for I / O.
A database consists of two types of files: data (* .mdf, * .ndf) and logs (* .ldf). Data files are usually used more for reading. Logs - more to write (and write - consistent). From the understanding of this fact, it is recommended to place the logs and data on physically different media so that writing to the log does not interrupt reading the data (as a rule, the write operation takes precedence over reading).
MS SQL can use “temporary tables” for processing requests. They are stored in the tempdb system database. If you have a high load on the files of this database - then you can try to put it on physically separate media.

Summarizing the placement of files, use the principle of "divide and conquer." Evaluate which files are accessed and try to distribute them to different media. Also, use the features of RAID systems. For example, RAID-5 reads faster than it writes - which is well suited for data files.

In continuation:

We analyze the use of OP and network.
We look in detail at the work of users using SSMS, profiler and direct requests to system views.
query plan and statistics (consider several ways to get). live query statistics.
waits (standby). current information and statistics.
Problems connecting to the server. processes / ports / protocols

Source: https://habr.com/ru/post/336302/

All Articles

Analysis of MS SQL Server, for those who see it for the first time

Server is slow - what to do?

More articles: