
Routine maintenance of a 24x7 information system database in MS SQL Server

Foreword


This article covers the main routine maintenance tasks for a 24x7 information system database (that is, one with no downtime) and approaches to implementing them in MS SQL Server. Please note that this is a brief overview and not every task is described in full detail. However, the information given is enough to let you study any particular task in more depth if necessary.

I would be very grateful for any amendments and additions to this article in the comments.

Routine work


The main routine maintenance tasks for a database are the following:

1) Routine backup creation with subsequent verification without recovery
2) Planned restoration of previously created backups to fully verify that they are usable
3) Analysis of the storage media on which the system and all necessary databases are located
4) Routine checks that the necessary services are running
5) Planned optimization of system performance
6) Routine data integrity checks
7) Routine data validity checks
The first three points are the most critical, as they make it possible to recover the system after various failures. However, the last three points should also be scheduled so that users can work comfortably (all queries complete quickly or at least in a reasonable time) and so that the data in the various reporting systems stays correct.

To automate the routine work, parts of it can be set up as SQL Server Agent jobs or Windows Task Scheduler tasks.

Item 6 is based on the DBCC CHECKDB command.
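A minimal sketch of such a check, assuming a hypothetical database name, might look like this (run it during a low-load window or on the standby server):

DBCC CHECKDB (N'YourDatabase')        -- 'YourDatabase' is a placeholder name
    WITH NO_INFOMSGS, ALL_ERRORMSGS;  -- suppress informational messages, report every error found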

Item 7 is implemented with respect to the business domain of the particular information system.

Items 6 and 7 will not be discussed in detail.

Now we analyze the first five points in more detail.

Routine backup creation with subsequent verification without recovery


Enough articles have been written on this topic (see the sources, in particular An example of the implementation of an automated database backup and recovery process with built-in tools), so we will only note that this regular work should be performed constantly, preferably on a standby server rather than on the primary one. The standby server must hold current data (for example, maintained by replication). It is also necessary to back up all system databases (except tempdb) on every MS SQL Server instance.

If a backup fails (or a backup check reveals a problem), this information should be reported to the administrators, for example by e-mail (see Setting up email notifications in MS SQL Server).
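As a rough sketch, such a notification can be sent via Database Mail; the profile name and addresses below are hypothetical and assume Database Mail is already configured:

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = N'DBA',                     -- hypothetical mail profile
    @recipients   = N'dba-team@example.com',    -- hypothetical recipients
    @subject      = N'Backup failure on the production instance',
    @body         = N'The nightly full backup of YourDatabase did not complete successfully.';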

It is important to define a backup strategy that answers at least the following questions:

1) how often and when to take backups (full, differential, transaction log)
2) how long to keep backup copies and when to delete them.
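A minimal sketch of a full backup with verification without recovery, using a hypothetical database name and path:

BACKUP DATABASE [YourDatabase]
    TO DISK = N'X:\Backup\YourDatabase_full.bak'
    WITH CHECKSUM, COMPRESSION, INIT;   -- CHECKSUM lets the backup be validated later

RESTORE VERIFYONLY                       -- verification without actually restoring the data
    FROM DISK = N'X:\Backup\YourDatabase_full.bak'
    WITH CHECKSUM;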

Planned restoration of previously created backups to fully verify that they are usable


This procedure is also best performed on the standby server, using third-party utilities or the RESTORE command.

If a backup cannot be restored successfully, this must be reported to the administrators, for example by e-mail (see Setting up email notifications in MS SQL Server).

Backups of the system databases must also be restored as a check. To do this, it is enough to restore them as ordinary user databases under names different from the system database names.
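For example, a backup of msdb could be restored under a different name roughly as follows; the paths are hypothetical, and the logical file names should first be checked with RESTORE FILELISTONLY:

RESTORE DATABASE [msdb_restore_check]            -- restored as an ordinary user database
    FROM DISK = N'X:\Backup\msdb_full.bak'
    WITH MOVE N'MSDBData' TO N'X:\Data\msdb_restore_check.mdf',
         MOVE N'MSDBLog'  TO N'X:\Data\msdb_restore_check.ldf',
         RECOVERY, STATS = 10;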

Analysis of the storage media on which the system and all necessary databases are located


Here you need to analyze how much space each database occupies, how the file sizes change, and how the free space of each medium changes over time (part of this task can be automated using Auto-collection of data about database files and logical disks of the operating system in MS SQL Server).
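As a quick manual check (a sketch, not a replacement for the auto-collection mentioned above), the current file sizes of all databases can be read from sys.master_files:

SELECT DB_NAME(database_id)  AS database_name,
       name                  AS logical_file,
       type_desc,
       physical_name,
       size * 8 / 1024       AS size_mb          -- size is stored in 8 KB pages
FROM sys.master_files
ORDER BY database_name, type_desc;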

This check can be run every day and the results sent, for example, by e-mail (see Setting up email notifications in MS SQL Server).
It is also necessary to monitor the system databases; if this is neglected, you can end up, for example, with an msdb that has grown to 42 GB (see Story about msdb size of 42 GB in the sources).

It is also important to test the media themselves for wear and bad sectors. During such testing the medium must be taken out of service and its data transferred to another medium, because this kind of check puts a heavy load on the medium.

This task belongs purely to the system administrator, so we will not consider it in detail. For complete control, it is better to agree with the system administrator that the test report is automatically sent by e-mail.

This test is best done once a year.

Routine checks that the necessary services are running


Generally speaking, the services should never go down. That is what the standby server is for: if the primary server fails, the standby becomes the primary. But the logs still need to be reviewed periodically. You can also set up automatic collection of this information with subsequent notification of the administrators, for example by e-mail (see Setting up email notifications in MS SQL Server).

The SQL Server Agent jobs (or Windows Task Scheduler tasks) themselves must also be checked. For example, checking the Agent jobs can be automated using Auto-collection of data on completed tasks in MS SQL Server.
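A rough sketch of such a check: failed Agent job steps over roughly the last day can be pulled from msdb (run_status = 0 means failure):

SELECT j.name AS job_name,
       h.step_name,
       h.run_date,        -- int in the form yyyymmdd
       h.run_time,        -- int in the form hhmmss
       h.message
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs       AS j ON j.job_id = h.job_id
WHERE h.run_status = 0
  AND h.run_date >= CONVERT(int, CONVERT(char(8), DATEADD(DAY, -1, GETDATE()), 112))
ORDER BY h.run_date DESC, h.run_time DESC;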

System performance optimization


This includes the following components:

1) Automation of defragmentation of indexes in MS SQL Server database
2) Auto-collection of data about changes in database schemas in MS SQL Server (so that you can restore the necessary backup and compare the changes, for example, using dbForge )
3) Automatic removal of hung processes in MS SQL Server
4) Cleaning the procedural cache (it is important to determine when and what to clean; see the sketch after this list)
5) Implementing a performance indicator for queries, stored procedures and triggers
6) Development and modification of indices ( A Tale of a Clustered Index )
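On item 4, as a brief sketch: the whole procedural cache can be cleared, but a more careful option is to clear only the plan of a specific problematic query (the plan_handle placeholder below must be taken from sys.dm_exec_cached_plans or sys.dm_exec_query_stats):

DBCC FREEPROCCACHE;                       -- clears the entire plan cache of the instance; use with care
-- DBCC FREEPROCCACHE (0x06000500...);    -- placeholder plan_handle: clears only one cached plan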

We will not describe each item in detail, but will dwell on items 5 and 6.
For item 5, you need to build a system that shows how fast the database is working (one possible approach is referenced in the sources).

Also, in most cases I advise disabling the AUTO_CLOSE option in the database settings.
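Disabling it is a single statement; the database name is a placeholder:

ALTER DATABASE [YourDatabase] SET AUTO_CLOSE OFF;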

Let's talk a little more about optimizing the queries themselves.

Sometimes, for various (and sometimes unclear) reasons, the optimizer decides to parallelize a query, and it does not always do so optimally.

There is a general recommendation:

1) If the processing (selection, modification) touches a lot of data, leave parallelism enabled
2) If the processing (selection, modification) touches little data, parallelism in most cases produces a less optimal plan.

Two parameters in the settings of the MS SQL Server instance are responsible for parallelism:

1) The maximum degree of parallelism (max degree of parallelism): to disable parallelism, set it to 1, so that only one processor is ever involved in executing a query plan
2) The cost threshold for parallelism: in most cases it is better to leave the default value
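Both parameters can be viewed and changed at the instance level with sp_configure; the sketch below uses illustrative values only and assumes you really do want to limit parallelism:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism';        -- show the current value
EXEC sp_configure 'cost threshold for parallelism';   -- show the current value (usually left at the default)

EXEC sp_configure 'max degree of parallelism', 1;     -- 1 = only one processor per query plan
RECONFIGURE;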

There are two main queues:

1) Waiting for processor time (the QCPU queue): the query's process has already been started and is waiting for a processor to run on (for example, another process has preempted it)
2) Waiting for resources (the QR queue): the process is waiting for resources (disks, RAM, etc.) to become free so that it can start or continue execution

The entire execution time of the query T is determined by the following formula:

T = TP + TQR + TCPU + TQCPU, where:

1) TP - plan compilation time
2) TQR - time spent waiting for resources (time the process spends in the QR queue)
3) TQCPU - time spent waiting for the required processors to become free (time the process spends in the QCPU queue)
4) TCPU - query execution time (summed over all processors)

In sys.dm_exec_query_stats and similar system views:

1) total_worker_time = TP + TCPU + TQCPU
2) total_elapsed_time = TQR + TCPU

It follows that the built-in tools cannot tell exactly how long a query really ran. But in most cases total_elapsed_time gives a value close to the actual execution time. The execution time can be determined more precisely either with tracing, or from the information system itself, if the start and end of each query call are logged to a log table in the database. But any trace loads the system, so it is better to run it on the standby server as well and collect the data from the primary server; then the load falls only on the network.
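A sketch of how these counters are usually examined (times in sys.dm_exec_query_stats are in microseconds; the TOP value is arbitrary):

SELECT TOP (20)
       qs.execution_count,
       qs.total_worker_time  / qs.execution_count AS avg_worker_time_us,
       qs.total_elapsed_time / qs.execution_count AS avg_elapsed_time_us,
       st.text               AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_elapsed_time_us DESC;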

When a query is parallelized, n processes are allocated to it (in the Standard edition n <= 4), and each of them needs processor time to run (it is not always the case that each process gets its own core). The more processes there are, the higher the probability that one or more of them will be preempted by other processes, which in turn increases TQCPU.

A query may run slower with parallelism in the following cases:

1) if the disk subsystem has very low throughput, decomposing the query for parallel execution can take longer than executing it without parallelism;
2) data skew is possible, or data ranges may be locked for one of the parallel processes by another process of the same query that was started earlier, and so on;
3) if there is no index for the predicate, the table has to be scanned, and a parallel operation inside the query can hide the fact that the query would execute much faster with a serial execution plan and the correct index.

Recommendations:

Prohibit query parallelism on servers where there are no large selections (the total_worker_time indicator should decrease thanks to a possible reduction of TCPU and TQCPU, especially the latter, while the former may stay roughly unchanged, since all processes are merged into one sequential process). To do this, set the max degree of parallelism parameter to 1, so that only one processor is ever involved in executing a query plan.

You can also use other ready-made solutions to build a system that determines the speed of the database. It is important to clearly understand how these solutions work and correctly interpret the numbers obtained.

On item 6 there is also quite a lot of comprehensive information on the Internet. The main thing is to understand how an index is logically organized and how it works.

Let me only remind you that a primary key and a clustered index are not the same thing:

A primary key is a field or a set of fields that uniquely identifies a row in a table (it can be backed by either a unique clustered index or a unique nonclustered index). It is referenced from other tables as a foreign key to ensure data integrity.

A clustered index is essentially a B-tree (or a modification of it) whose leaves contain the data rows themselves and whose intermediate nodes contain the values of what was defined as the clustered index key (a field or group of fields). This description of what the leaves and nodes contain is rough, but it is enough to understand how searching and inserting into this tree work. A clustered index may be non-unique, but it is better to try to make it unique.

Let me remind you that a B-tree is a structure that keeps the data sorted by the clustered index key. Therefore, it is important that the fields chosen as the clustered key change monotonically (always increase or always decrease). Fields of integer type with IDENTITY are perfect for a clustered index, as are date and time fields (new rows are always appended to the end of the tree), whereas uniqueidentifier fields are a poor choice: they lead to constant rearrangements of the B-tree, which in turn increases the number of reads and writes on the storage medium where the database resides.
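A minimal illustration of this advice, with a hypothetical table: an IDENTITY key keeps inserts at the end of the B-tree, while a NEWID() clustered key would cause page splits:

CREATE TABLE dbo.Orders                       -- hypothetical table
(
    OrderID   int IDENTITY(1,1) NOT NULL,     -- monotonically increasing clustered key
    CreatedAt datetime2         NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)   -- new rows always go to the end of the tree
);
-- A uniqueidentifier column with a NEWID() default as the clustered key leads to constant page splits;
-- if a GUID key is unavoidable, a NEWSEQUENTIALID() default mitigates the problem.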

Unfortunately, even leading software developers sometimes forget this fact, and it is rarely stated explicitly in free resources; instead, the indexes themselves and their internal structure are analyzed, and from that one has to conclude on one's own what can be made a clustered index and what it is better not to.

You also need to verify that an index is actually being used, with the help of the sys.dm_db_index_usage_stats system view.
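A hedged sketch of such a check: indexes of the current database with no reads recorded since the last instance restart (the statistics in this view are reset on restart):

SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON  s.object_id   = i.object_id
       AND s.index_id    = i.index_id
       AND s.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND i.index_id > 0                                             -- skip heaps
  AND ISNULL(s.user_seeks + s.user_scans + s.user_lookups, 0) = 0;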

P.S.: You also need to constantly check that the data on the standby server is up to date, as well as the system that synchronizes that data (for example, replication), and to automate the collection of data on the growth of tables and files of all MS SQL Server databases. And most importantly, do not make the typical DBA errors.

See also the overview of using Zabbix to monitor MS SQL Server databases and the failover implementation in MS SQL Server 2017 Standard.

Sources:


» CHECKDB
» Setting up email notifications in MS SQL Server
» RESTORE
» Auto-collection of data about database files and logical disks of the operating system in MS SQL Server
» Database Performance Testing with tSQLt and SQLQueryStress
» AUTO_CLOSE
» Story about msdb size of 42 GB
» TOP (10) free SSMS plugins
» Daily service plan - Part 1: Automatic index defragmentation
» Daily service plan - Part 2: Automatic update of statistics
» Daily service plan - Part 3: Automatic backup creation
» Auto-collection of data on completed tasks in MS SQL Server
» Automating defragmentation of indexes in MS SQL Server database
» Auto-collection of data about changes in database schemas in MS SQL Server
» Useful features of dbForge for administering MS SQL Server databases
» Automatically remove hung processes in MS SQL Server
» Cleaning the procedural cache
» Implementing a performance indicator for queries, stored procedures and triggers in MS SQL Server. Autotrace
» Degrees of concurrency and uncertainty in Microsoft SQL Server
» Presentation of waiting
» Query execution scheme
» The degree of parallelism
» Investigate databases with T-SQL
» A Tale of Clustered Index
» Automation to collect data on the growth of tables and files of all MS SQL Server databases
» 13th article on typical DBA errors in MS SQL Server
» Failure implementation in MS SQL Server 2017 Standard
» Using Zabbix to monitor MS SQL Server database
» sys.dm_db_index_usage_stats
» An example of the implementation of an automated database backup and recovery process with built-in tools

Source: https://habr.com/ru/post/322312/

