
How to ensure the performance of Microsoft SQL Server databases hosted in the cloud




Hello! Today we want to talk about cloud databases, and more precisely about the proactive and one-time measures that directly ensure their performance.


Cloud databases long ago solved the problem of rapidly growing capacity and quickly launching new databases, and today, thanks to the transfer of administration and monitoring tasks to the provider, they have become a near necessity for companies of any size.


We did a lot of work launching and optimizing our new cloud database service on our own Technoserv Cloud platform and, of course, ran into a number of problems and developed our own approaches to solving them. Now that the service has been tested and is running, we want to share our experience with you - we are sure that by reading this material you will be able to avoid repeating other people's mistakes or discover something new.


Today, businesses have to deal with constantly growing volumes of data, so there is a need to somehow ease the task of managing all of this information. The solution was found in the adoption of cloud computing platforms and the creation of cloud databases.


According to estimates, the volume of both structured and unstructured data grows by an average of 60% per year. Until now, traditional databases served as the storage place for all this information, but that was no longer enough, and cloud technologies came to the rescue. They relieved users of the need to allocate their own computing power for the database, shifting this responsibility to cloud service providers. This approach has proven extremely productive in terms of improving the performance, availability and scalability of databases.


In this article we decided to give a short overview of the most effective settings and mechanisms for ensuring the performance of MS SQL Server databases. This list is by no means exhaustive, since the developers' recommendations and industry best practices are adjusted over time. Only a team of professional DBAs (database administrators) can track all these changes and implement them correctly. A cloud database service provider has exactly such a staff of specialists, which most customers, most likely, cannot boast of.


To ensure high availability and high performance of MS SQL Server databases hosted in the cloud, in accordance with Microsoft recommendations and best practices, we carry out the following measures.


Note: Although most of the recommendations are general, how they are applied on each specific server depends on many factors. Therefore, links to the relevant Microsoft documents with more detailed information are given below.



Optimize MS SQL Server settings


Install the latest Service Packs / Cumulative Updates / Security Updates on all MS SQL servers:


Since 2016, the update scheme for Microsoft SQL Server has been simplified - now updates are released regularly.


The general sequence for installing updates is given below (we first test all updates in the Test environment before installing them on Production servers):


• Install the latest Service Pack (SP).
• Install the latest Cumulative Update (CU) for that Service Pack.
• If Security Updates are available, install them as well.
• In case of problems, look for and apply the corresponding Critical On-Demand (COD) fix.


Note: Although Microsoft recommends installing the latest CUs as soon as they are released, most companies often install only the latest SP, and install a CU only if it includes a fix for a problem observed on the server. We coordinate this process with the customer in accordance with their internal policies, but always suggest considering Microsoft's official recommendations first.


We configure the optimal MS SQL Server memory parameters and the optimal MaxDOP value:


By default, MS SQL Server can dynamically change its memory requirements based on the availability of system resources. By default, min server memory is 0 and max server memory is 2,147,483,647 MB. You can read about choosing the optimal MS SQL Server memory parameters here.


If MS SQL Server runs on a multiprocessor computer, it determines the optimal degree of parallelism, that is, the number of processors used to execute a single statement, for each parallel execution plan. The max degree of parallelism option can be used to limit the number of processors used in parallel plan execution. You can read about choosing the optimal MaxDOP value here.
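As an illustration, a minimal sketch of setting both parameters via sp_configure; the specific values are hypothetical and depend on the server's RAM and core count:

```sql
-- Expose the advanced options that control memory and parallelism
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Example value only: leave roughly 4 GB to the OS on a 32 GB server
EXEC sp_configure 'max server memory (MB)', 28672;
RECONFIGURE;

-- Example value only: cap parallelism at 8 on a 16-core server
EXEC sp_configure 'max degree of parallelism', 8;
RECONFIGURE;
```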


Use trace flags if necessary:


Trace flags in MS SQL Server are a kind of “switch” that changes server behavior from the default to something else. Information on trace flags can be found here.
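For illustration, a small sketch of working with trace flags; trace flag 3226 (which suppresses successful-backup messages in the error log) is used here only as an example:

```sql
-- List the trace flags currently enabled globally
DBCC TRACESTATUS(-1);

-- Example only: enable trace flag 3226 globally for this instance
DBCC TRACEON(3226, -1);
```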


We optimize the settings of the “TempDB” database and other system databases:


MS SQL Server includes the following system databases:


• “master” - stores all system-level data for an instance of MS SQL Server;
• “msdb” - used by MS SQL Server Agent for scheduling alerts and jobs;
• “model” - used as a template for all databases created on an instance of MS SQL Server. Changing the size, collation, recovery model and other parameters of the model database changes the corresponding parameters of all databases created afterwards;
• “Resource” - a read-only database that contains the system objects shipped with MS SQL Server. System objects are physically stored in the Resource database but logically appear in the sys schema of every database;
• “TempDB” - a workspace for temporary objects and intermediate result sets.


Recommendations for tuning the “TempDB” database for optimal performance can be found here.
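One commonly cited recommendation is to use several equally sized TempDB data files (roughly one per core, up to eight) with fixed-size autogrowth. A minimal sketch, with hypothetical paths and sizes:

```sql
-- Resize the primary TempDB data file and switch to fixed-size growth
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 4096MB, FILEGROWTH = 512MB);

-- Add a second, equally sized data file (path is hypothetical)
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2,
          FILENAME = 'T:\TempDB\tempdev2.ndf',
          SIZE = 4096MB, FILEGROWTH = 512MB);
```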


Correctly configure the default locations of the data and log files:


When a new database is created in MS SQL Server without explicitly specifying locations for the data file(s) and the log file, MS SQL Server creates these files in the default location. This default location is configured when MS SQL Server is installed. You can read about setting the default locations of data and log files here.
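A quick way to check the current defaults on a given instance (a sketch; available on recent versions of MS SQL Server):

```sql
-- Default locations used for new data and log files
SELECT
    SERVERPROPERTY('InstanceDefaultDataPath') AS DefaultDataPath,
    SERVERPROPERTY('InstanceDefaultLogPath')  AS DefaultLogPath;
```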


Optimizing Windows Server Settings


We use optimal disk subsystem settings (fast SSD drives formatted with a 64 KB allocation unit size):


MS SQL Server has its own data storage characteristics. Preparing the disk subsystem at both the physical and logical level with these characteristics in mind therefore has a serious impact on performance. Read more about it here.


Configure “Instant File Initialization”:


In MS SQL Server, data files can be initialized instantly. Instant file initialization claims disk space without filling it with zeros; instead, the existing disk content is overwritten as new data is written to the files. Log files cannot be initialized instantly. Details are here.
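On recent builds (SQL Server 2016 SP1 and later) you can check whether instant file initialization is enabled for the service account; a minimal sketch:

```sql
-- instant_file_initialization_enabled = 'Y' means the feature is active
SELECT servicename, instant_file_initialization_enabled
FROM sys.dm_server_services;
```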


We use different network interfaces for “user” and “system” loads:


Our servers have several network interfaces, and each individual interface can be used for a dedicated task, for example, for periodic backup traffic. Such a configuration has its advantages: for example, it makes it possible to separate interface usage for specific tasks as strictly as possible.


Optimizing user database settings





Check that the “Auto Shrink” and “Auto Close” parameters are turned off:


“Auto Shrink” indicates that MS SQL Server will periodically shrink the database files (in more detail here).
“Auto Close” indicates that the database will be closed after all its resources are released and all users have disconnected (in more detail here).
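A minimal sketch of turning both options off and verifying the result; the database name is hypothetical:

```sql
-- Disable automatic shrink and automatic close for a user database
ALTER DATABASE [YourDatabase] SET AUTO_SHRINK OFF;
ALTER DATABASE [YourDatabase] SET AUTO_CLOSE OFF;

-- Verify the settings across all databases
SELECT name, is_auto_shrink_on, is_auto_close_on
FROM sys.databases;
```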


Check that the “Auto Create Statistics” and “Auto Update Statistics” options are enabled:


If the “Auto Create Statistics” option is enabled, the query optimizer creates statistics on individual columns in query predicates, as needed, to improve cardinality estimates for the query plan (in more detail here).


If the “Auto Update Statistics” option is enabled, the query optimizer determines when statistics might be out of date and updates them when they are used in a query (in more detail here).
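A minimal sketch of enabling both options and checking them; the database name is hypothetical:

```sql
-- Let the optimizer create and refresh statistics automatically
ALTER DATABASE [YourDatabase] SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS ON;

-- Verify the settings across all databases
SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases;
```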


Use “Read Committed Snapshot Isolation” if necessary:


The term “snapshot” reflects the fact that all queries in a transaction see the same version, or snapshot, of the database, corresponding to its state at the moment the transaction starts. A snapshot transaction does not take locks on the underlying rows or data pages, which allows another transaction to run without being blocked by a previous unfinished transaction. Transactions that modify data do not block transactions that read data, and transactions that read data do not block transactions that write data, as they normally would under the default “Read Committed” isolation level in MS SQL Server. This lock-avoiding approach significantly reduces the likelihood of deadlocks in complex transactions.


Enabling the “Read Committed Snapshot Isolation” option makes row versions available under the default “Read Committed” isolation level. If “Read Committed Snapshot Isolation” is set to OFF, then to gain access to row versions you have to explicitly set the snapshot isolation level for each session (in more detail here).
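A minimal sketch of enabling the option; the database name is hypothetical, and note that the command needs exclusive access to the database (WITH ROLLBACK IMMEDIATE forcibly disconnects other sessions):

```sql
-- Switch the default Read Committed behavior to row versioning
ALTER DATABASE [YourDatabase]
SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Verify the setting
SELECT name, is_read_committed_snapshot_on
FROM sys.databases;
```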


Check that the “Page Verify” parameter is set to “CHECKSUM”:


If the “Page Verify” database option is set to “CHECKSUM”, MS SQL Server calculates a checksum over the contents of the whole page and stores the value in the page header when the page is written to disk. When the page is read from disk, the checksum is recalculated and compared with the value in the header. This helps ensure a high level of data integrity in the files (in more detail here).
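A minimal sketch of setting and verifying the option; the database name is hypothetical:

```sql
-- Enable checksum-based page verification
ALTER DATABASE [YourDatabase] SET PAGE_VERIFY CHECKSUM;

-- Verify the current page verify option for all databases
SELECT name, page_verify_option_desc
FROM sys.databases;
```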


Optimally managing the database data file(s) and log file


The data file(s) and the database log file are placed on separate physical disks:


Placing data files and log files on the same device can lead to contention, which results in poor performance. Placing the files on different disks allows I/O operations for data files and log files to run in parallel (in more detail here).


Create only one database log file:


The log file is used by MS SQL Server sequentially, not in parallel, so there is no performance gain from having several log files (in more detail here).


We do not allow excessive “Virtual Log File (VLF)” fragmentation of the database log:


The database log file is internally divided into sections called Virtual Log Files (VLF), and the higher the fragmentation of the log file, the greater the number of VLFs. Once the number of VLFs in the log file exceeds 200, the performance of log-related operations such as reading the log (say, for transactional replication / rollback), backing up the log, and so on may deteriorate (in more detail here).
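A quick way to check the VLF count for the current database (a sketch; sys.dm_db_log_info is available from SQL Server 2016 SP2, older versions can use the undocumented DBCC LOGINFO):

```sql
-- Number of virtual log files in the current database's log
SELECT COUNT(*) AS vlf_count
FROM sys.dm_db_log_info(DB_ID());
```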


Choose the correct initial sizes for the data file(s) and the database log file:


When creating a database, the data files should be made as large as possible, in accordance with the largest estimated amount of data in the database. For example, if we know that we have 50 GB of data now and the volume will grow substantially over the next six months, the initial size of the data file should immediately be set to, say, 100 GB (in more detail here).


Select the correct “Auto-growth” parameters for the data file(s) and the database log file:


It is not recommended to set “Auto-growth” as a percentage: when the database files are large, the growth operation itself can cause a significant drop in performance, so it is preferable to grow the files by a fixed size in MB (in more detail here).
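A minimal sketch that ties the last three points together: explicit initial sizes, data and log files on different drives, and fixed-size growth in MB. All names, paths, and sizes are hypothetical:

```sql
CREATE DATABASE [YourDatabase]
ON PRIMARY
    (NAME = YourDatabase_data,
     FILENAME = 'D:\Data\YourDatabase.mdf',   -- data drive
     SIZE = 100GB, FILEGROWTH = 1024MB)
LOG ON
    (NAME = YourDatabase_log,
     FILENAME = 'L:\Log\YourDatabase.ldf',    -- separate log drive
     SIZE = 10GB, FILEGROWTH = 1024MB);
```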


We constantly monitor the sizes of the database data file(s) and log file and, if necessary, proactively increase them during periods of minimal database load:


In a production system, the “Auto-Growth” function should only serve as an emergency means of increasing file sizes; it is not recommended for day-to-day management of data file growth. Alerts or monitoring tools are usually used to track file sizes and grow them proactively. This makes it possible to avoid fragmentation and to shift these maintenance operations to the hours when the load is minimal (in more detail here).
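A simple query that can feed such monitoring: current size and growth settings of every database file on the instance (a sketch):

```sql
SELECT DB_NAME(database_id) AS database_name,
       name                 AS logical_name,
       type_desc,
       size * 8 / 1024      AS size_mb,          -- size is stored in 8 KB pages
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + ' %'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END                  AS growth_setting
FROM sys.master_files
ORDER BY database_name, type_desc;
```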


Optimally maintaining the database





We perform database integrity checks:


We check the logical and physical integrity of all objects in the database by performing the following operations (in more detail here; a minimal example follows the list):


• running DBCC CHECKALLOC for the database;
• running DBCC CHECKTABLE for every table and view in the database;
• running DBCC CHECKCATALOG for the database;
• checking the contents of every indexed view in the database;
• checking link-level consistency between file system files and directories and table metadata when varbinary(max) data is stored in the file system using FILESTREAM;
• checking the Service Broker data in the database.
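In practice all of these checks are covered by a single DBCC CHECKDB run; a minimal sketch with a hypothetical database name:

```sql
-- Full logical and physical integrity check of the database
DBCC CHECKDB ('YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```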


We rebuild / reorganize user database indexes depending on index fragmentation:


MS SQL Server automatically maintains indexes when insert, update, or delete operations are performed on the underlying data. Over time these changes can cause the data in an index to become scattered across the database (fragmented). Fragmentation occurs when an index contains pages whose logical order, based on the key value, does not match the physical order in the data file. Heavily fragmented indexes can seriously degrade query performance and slow down application response. Fragmentation can be eliminated by reorganizing or rebuilding the index (in more detail here).
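A sketch of the usual approach: measure fragmentation first, then reorganize lightly fragmented indexes and rebuild heavily fragmented ones. The thresholds are the commonly cited guideline, not a hard rule, and the table / index names are hypothetical:

```sql
-- Fragmentation of every index in the current database
SELECT OBJECT_NAME(ips.object_id)        AS table_name,
       i.name                            AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 5;

-- Roughly 5-30 % fragmentation: reorganize; above 30 %: rebuild
ALTER INDEX [IX_YourIndex] ON [dbo].[YourTable] REORGANIZE;
ALTER INDEX [IX_YourIndex] ON [dbo].[YourTable] REBUILD;
```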


Update statistics:


By default, the query optimizer updates statistics as needed to improve query plans. Updating statistics ensures that queries are compiled with up-to-date statistics. However, updating statistics causes queries to be recompiled. It is therefore recommended not to update statistics too often: there is a balance to strike between better query plans and the time spent recompiling queries. The right compromise depends on the application (in more detail here).
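A minimal sketch of a manual statistics update; the table name is hypothetical:

```sql
-- Update all statistics on one table
UPDATE STATISTICS [dbo].[YourTable];

-- Or update out-of-date statistics for every table in the current database
EXEC sp_updatestats;
```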


We do not use “bad” practices, such as regularly shrinking database files:


Data moved during a file shrink operation can be scattered to any available place in the file. This causes index fragmentation and can increase the execution time of queries that search within an index range (in more detail here).


If necessary, we organize regular cleanup of “old” data from the database:


Companies often have to store data for a certain period of time to comply with legislation and their own internal requirements. Once the data is no longer needed, it is usually recommended to delete it, which improves MS SQL Server performance and makes it easier to predict future growth of the server hardware requirements (in more detail here).
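A sketch of deleting old rows in small batches so that the transaction log and locking stay under control; the table, column, and retention period are hypothetical:

```sql
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    -- Remove data older than three years, 5000 rows at a time
    DELETE TOP (5000) FROM dbo.YourArchiveTable
    WHERE created_at < DATEADD(YEAR, -3, GETDATE());
    SET @rows = @@ROWCOUNT;
END;
```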


We perform optimal database backups





We determine the best database backup strategy in accordance with customer requirements for RTO / RPO and the best international practices:


MS SQL Server provides the necessary protection for the sensitive data stored in databases. To minimize the risk of irreversible data loss, it is necessary to regularly create backup copies of the databases so that changes to the data are preserved. A well-thought-out backup and recovery strategy protects the database from data loss caused by various kinds of failures (in more detail here).
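A minimal sketch of the building blocks of such a strategy - a full backup plus transaction log backups; paths and the database name are hypothetical, and the actual schedule follows the agreed RTO / RPO:

```sql
-- Full backup with checksums and compression
BACKUP DATABASE [YourDatabase]
TO DISK = 'B:\Backup\YourDatabase_full.bak'
WITH CHECKSUM, COMPRESSION, INIT;

-- Transaction log backup (requires the full recovery model)
BACKUP LOG [YourDatabase]
TO DISK = 'B:\Backup\YourDatabase_log.trn'
WITH CHECKSUM, COMPRESSION;
```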


We perform regular test restoration of database backups:


It can be said that there is no recovery strategy until the backups have been tested. It is very important to fully test the backup strategy for each database by restoring a copy of it to a test system, and to test restoring every type of backup you plan to use (in more detail here).
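A sketch of a test restore: first verify the backup media, then restore a copy under a different name on a test instance. All names and paths are hypothetical:

```sql
-- Check that the backup file is readable and the checksums are valid
RESTORE VERIFYONLY
FROM DISK = 'B:\Backup\YourDatabase_full.bak' WITH CHECKSUM;

-- Restore a copy of the database under a different name
RESTORE DATABASE [YourDatabase_Test]
FROM DISK = 'B:\Backup\YourDatabase_full.bak'
WITH MOVE 'YourDatabase_data' TO 'D:\Data\YourDatabase_Test.mdf',
     MOVE 'YourDatabase_log'  TO 'L:\Log\YourDatabase_Test.ldf',
     RECOVERY;
```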


We use the optimal database high availability technology (depending on requirements)


Always On Failover Cluster Instances:


Always On Failover Cluster Instances use Windows Server Failover Clustering (WSFC) functionality to provide high availability of local resources through redundancy at the server instance level - a failover cluster instance (FCI). An FCI is a single instance of MS SQL Server installed across the WSFC nodes and, possibly, across several subnets. On the network, a failover cluster instance looks like an instance of MS SQL Server running on a single computer, but it provides failover from one WSFC node to another if the current node becomes unavailable (in more detail here).


Always On availability groups:


Always On Availability Groups is a high availability and disaster recovery solution that serves as an alternative to database mirroring. An availability group maintains a fault-tolerant environment for a set of user databases, known as availability databases, which fail over together. An availability group maintains one set of primary read / write databases and from one to eight sets of corresponding secondary databases. In addition, secondary databases can be made available for read-only access and / or for some backup operations (in more detail here).


Database mirroring:


Database mirroring is a solution aimed at increasing the availability of an MS SQL Server database. Each database is mirrored separately, and mirroring works only with databases that use the full recovery model (in more detail here).


Log shipping:


MS SQL Server can automatically send transaction log backups from a source database on a primary server instance to one or more secondary databases on other, secondary server instances. The transaction log backups are applied to each secondary database individually (in more detail here).


We constantly monitor the server status (MS SQL + Windows)


Server monitoring is among the vital activities. Effective monitoring involves regularly taking snapshots of current performance to detect processes that cause problems, and continuously collecting data to track growth trends and performance changes.





Continuously assessing database performance helps achieve optimal results by minimizing response times and maximizing throughput. Network traffic, disk I/O, and CPU utilization are the key factors affecting performance. You should carefully analyze the application requirements, understand the logical and physical structure of the data, evaluate how the database is used, and find a compromise between such conflicting workloads as online transaction processing (OLTP) and decision support (in more detail here).
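One simple starting point for such analysis is the accumulated wait statistics, which show where the server spends its time (CPU, disk I/O, locking, and so on); a minimal sketch:

```sql
-- Top wait types accumulated since the instance last started
SELECT TOP (10)
       wait_type,
       wait_time_ms / 1000.0 AS wait_time_s,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
```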



Source: https://habr.com/ru/post/336070/

