Failure implementation in MS SQL Server 2017 Standard

Foreword

Quite often, you can meet the problem of ensuring the resiliency of MS SQL Server DBMS. It becomes even more interesting when there is no Enterprise license, but only Standard.

Immediately, we note that the Express license will not be considered, and there are significant limitations on this instance. Yes, some of them can be circumvented. For example, the maximum database size of 10 GB is easily solved by decomposing a large database into smaller ones (for example, creating a new database for some reason, and combining the samples from the same tables of different databases in the main database). But fault tolerance in Express will be achieved either by the system administrator, or by using the actually developed (or third-party) software.

In this article, we will briefly review all existing standard fault tolerance technologies for MS SQL Server 2017 and will consider an example of how to implement the failure of the most appropriate unified one in the Standard license.

Short review

AlwaysOn: load distribution among all participants; all participants should be as similar as possible to each other.
In synchronous mode, the maximum reliability of data transmission is ensured, however, the speed of work will be equal to the speed of the slowest participant. In asynchronous mode, maximum speed is ensured, however, data mismatch between the participants may occur, which leads to more complex support and the likelihood of losing the latest changes in case of a major participant failure.
')
The speed of switching in synchronous mode is almost instantaneous and does not require intervention by the system administrator and DBA, in asynchronous mode it depends on the current state of the duplicate database, but usually it takes up to 5 minutes on average (you can also automate switching by one DBA without involving the system administrator).

Recognized by Microsoft as the recommended technology for the database. Available with Enterprise license from 2012 version and above. Available with restrictions with the Standard license (for details, see Top 5 Questions about Basic Availability Groups ).
Clustering: despite the simplicity of configuration, this solution is unreliable in view of the bottleneck in the form of a single data warehouse for all. In case of failure of the data warehouse, recovery will take a rather long period of time — more than 1 hour.
Available with Standard license prior to 2008 version and above.
Replication: any replication implies the creation of system triggers on each member table, and the replication of snapshots will be quite hard for the main database. Therefore, the replication of snapshots can be done only during the minimum hours of database load (for example, at night), which is unacceptable, because a hot reserve is required. Merge replication is difficult to maintain for some systems (for example, CRM, NAV), it is also not suitable for 1C due to frequent changes in database structures.

Transactional replication is possible, but with an Enterprise license.
Available with Standard license (merge and database snapshots) and Enterprise (transactions) up to 2008 version and higher.
Mirroring: possible in any mode, however, as with AlwaysOn, synchronous mode provides maximum reliability and fast switching, and asynchronous mode gives maximum speed from the main database, but data mismatch between all participants is possible, and switching will not be instantaneous. Here, switching at the database level provides the witness server automatically (for example, with a CPU load of more than 50% on the main server) or by means of DBA. Connection to another server is provided by the system administrator. The backup database for any type of mirroring is in the permanent recovery mode, as a result of which it is impossible to access it.
DB recovery mode is full.

Recognized by Microsoft as obsolete technology for the database.
Available with Standard (in synchronous mode) and Enterprise (in asynchronous mode) up to 2008 version and higher.
Delivery of transaction logs: there are 2 modes — permanent recovery on a backup server or recovery with a delay.

The first mode switches the backup database (as in mirroring) to the permanent recovery mode and cannot be accessed.

The second mode transfers the backup database to the recovery mode periodically at the moment of updates rolling (between the update rolls, the backup database is available, but this is possible provided that the MS SQL Server instances are the same version).

The principle of operation is simple:
1. Periodically, a backup copy of the database transaction log at the source is made to the public folder of both the source and the backup whine (the path and schedule are configured, by default, every 15 minutes).
2. The backup skully periodically copies the resulting backup of the database transaction log to a local accessible folder (the path and schedule are configured, by default, every 15 minutes).
3. The backup script restores the transaction log from the copied backup of the transaction log (a schedule is configured, by default - every 15 minutes).
Switching can be automated at the DB level by DBA forces, and at the level of connections to the server at the level of the system administrator.

It is also worth noting that this method always works in asynchronous mode. You can configure multiple backup databases.

Database Recovery Mode - full or incomplete logging
Available with Standard license prior to 2008 version and above.

Conclusion

From the above options, the delivery of transaction logs in the Standard license is more suitable, and it is convenient to use it for a smooth transition from one server to another (for example, when updating the environment). Also, the delivery of transaction logs is sufficiently simple to maintain and easy to operate, and it also always works in asynchronous mode, which is not so much (as a synchronous mirroring mode) loads the database. In any case, mirroring is acceptable if it is possible to configure your own automatic switching, otherwise false switching may occur (for example, when the CPU of the main server is loaded by more than 50%).

For Enterprise - the technology AlwaysOn is definitely offered.

Configure failover on the delivery of transaction logs

Details on setting up the delivery of transaction logs are described here . This process can also be automated by writing your own utility for repeated reuse, as well as for returning to the main server after it has been repaired in the event of a failover.

Let us now analyze one of the possible options for failover during the delivery of transaction logs at the DBMS level.

It is important to note that this method is suitable only for the server, which is a backup for only one instance of MS SQL Server, that is, for several there is a problem in determining which tasks to perform and which not.

We first describe the sequence of actions:

Perform all tasks of copying the latest files from the source (with a well-thought-out architecture, the path must be accessible even if the main server falls)
Shut down all tasks for copying files from source
Perform all tasks to restore the database on the latest files from the source
Disable all tasks to restore the database on the latest files from the source
Make the database restored and the main log shipping, but without the recipient
Create full backups for the database
Create jobs to back up database transaction logs

Below is an example implementation of the above described sequence in the form of a stored procedure.

It is worth noting that it is important to set up a login (preferably a domain), under which the created tasks will be run to create backup copies of transaction logs.

Transaction Log Failover Implementation Example

CREATE PROCEDURE [srv].[RunLogShippingFailover] @isfailover bit=1, @login nvarchar(255)=N'', -- ( ),              @backup_directory nvarchar(255)=N''--          MS SQL Server (, 'D:\Shared') AS /*            @isfailover=1    @isfailover=0   -           ,                 . ,              */ BEGIN --     ,           if(@isfailover=1) begin select [job_id] into #jobs from [msdb].[dbo].[sysjobs] where [name] like 'LSCopy%'; declare @job_id uniqueidentifier; while(exists(select top(1) 1 from #jobs)) begin select top(1) @job_id=[job_id] from #jobs; begin try EXEC [msdb].dbo.sp_start_job @job_id=@job_id; end try begin catch end catch delete from #jobs where [job_id]=@job_id; end drop table #jobs; end --             --             update [msdb].[dbo].[sysjobs] set [enabled]=case when (@isfailover=1) then 0 else 1 end where [name] like 'LSCopy%'; --     ,             if(@isfailover=1) begin select [job_id] into #jobs2 from [msdb].[dbo].[sysjobs] where [name] like 'LSRestore%'; while(exists(select top(1) 1 from #jobs2)) begin select top(1) @job_id=[job_id] from #jobs2; begin try EXEC [msdb].dbo.sp_start_job @job_id=@job_id; EXEC [msdb].dbo.sp_start_job @job_id=@job_id; end try begin catch end catch delete from #jobs2 where [job_id]=@job_id; end drop table #jobs2; end --                --                update [msdb].[dbo].[sysjobs] set [enabled]=case when (@isfailover=1) then 0 else 1 end where [name] like 'LSRestore%'; --    ,        ,    if(@isfailover=1) begin select [secondary_database] as [name] into #dbs from msdb.dbo.log_shipping_monitor_secondary where [secondary_server]=@@SERVERNAME; declare @db nvarchar(255); while(exists(select top(1) 1 from #dbs)) begin select top(1) @db=[name] from #dbs; begin try RESTORE DATABASE @db WITH RECOVERY; end try begin catch end catch delete from #dbs where [name]=@db; end drop table #dbs; select [secondary_database] as [name] into #dbs2 from msdb.dbo.log_shipping_monitor_secondary where [secondary_server]=@@SERVERNAME; declare @jobId BINARY(16); declare @command nvarchar(max); declare @dt nvarchar(255)=cast(YEAR(GetDate()) as nvarchar(255)) +'_'+cast(MONTH(GetDate()) as nvarchar(255)) +'_'+cast(DAY(GetDate()) as nvarchar(255)) +'_'+cast(DatePart(hour,GetDate()) as nvarchar(255)) +'_'+cast(DatePart(minute,GetDate()) as nvarchar(255)) +'.trn'; declare @backup_job_name nvarchar(255); declare @schedule_name nvarchar(255); declare @disk nvarchar(255); declare @uid uniqueidentifier; while(exists(select top(1) 1 from #dbs2)) begin select top(1) @db=[name] from #dbs2; set @disk=@backup_directory+N'\'+@db+N'.bak'; set @backup_job_name=N'LSBackup_'+@db; set @schedule_name=N'LSBackupSchedule_'+@@SERVERNAME+N'_'+@db; set @command=N'declare @disk nvarchar(max)='+N''''+@backup_directory+N'\'+@db+'_'+@dt+N'''' +N'BACKUP LOG ['+@db+'] TO DISK = @disk WITH NOFORMAT, NOINIT, NAME = '+N''''+@db+N''''+N', SKIP, NOREWIND, NOUNLOAD, STATS = 10;'; set @uid=newid(); begin try BACKUP DATABASE @db TO DISK = @disk WITH NOFORMAT, NOINIT, NAME = @db, SKIP, NOREWIND, NOUNLOAD, STATS = 10; EXEC msdb.dbo.sp_add_job @job_name=@backup_job_name, @enabled=1, @notify_level_eventlog=0, @notify_level_email=0, @notify_level_netsend=0, @notify_level_page=0, @delete_level=0, @description=N'No description available.', @category_name=N'[Uncategorized (Local)]', @owner_login_name=@login, @job_id = @jobId OUTPUT; EXEC msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=@backup_job_name, @step_id=1, @cmdexec_success_code=0, @on_success_action=1, @on_success_step_id=0, @on_fail_action=2, @on_fail_step_id=0, @retry_attempts=0, @retry_interval=0, @os_run_priority=0, @subsystem=N'TSQL', @command=@command, @database_name=N'master', @flags=0; EXEC msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1; EXEC msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=@backup_job_name, @enabled=1, @freq_type=4, @freq_interval=1, @freq_subday_type=4, @freq_subday_interval=5, @freq_relative_interval=0, @freq_recurrence_factor=0, @active_start_date=20171009, @active_end_date=99991231, @active_start_time=0, @active_end_time=235959, @schedule_uid=@uid; EXEC msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'; end try begin catch end catch delete from #dbs2 where [name]=@db; end drop table #dbs2; end END

To return to the primary server, you need to reconfigure the delivery of transaction logs from the backup server to the primary server, and then do a failover (after that, the primary server will again become operational). After that, configure the delivery of transaction logs from the battle server to the backup server.

Configure autocorrection to monitor the delivery of transaction logs

To monitor the delivery of transaction logs, the monitor server uses the LSAlert_ <NAME_SCEMPLARE> task and report (Right-click on the monitor server instance, hereafter Reports \ Standard Report \ Transaction Log Delivery Status).

Quite often, over time, the server monitor (in case it is not combat) incorrectly takes the last time to create a backup copy of the database transaction log for the combat server. As a result, false warnings occur.

You can solve the problem with the following simple script:

An example of the implementation of the correction monitor the delivery of transaction logs

 CREATE PROCEDURE [srv].[AutoCorrectMonitorLogShipping] AS BEGIN /*       */ SET NOCOUNT ON; update t2 set t2.[last_backup_date]=t1.[BackupFinishDate] ,t2.[last_backup_date_utc]=DateAdd(hour,-DateDiff(hour,GetUTCDate(),GetDate()),t1.[BackupFinishDate]) ,t2.[last_backup_file]= RIGHT(t1.[PhysicalDeviceName], CHARINDEX('\',REVERSE(t1.[PhysicalDeviceName]),1)-1) from [__].[SRV].[inf].[vServerLastBackupDB] as t1 inner join [msdb].[dbo].[log_shipping_monitor_primary] as t2 on t1.[DBName] collate SQL_Latin1_General_CP1_CI_AS=t2.[primary_database] collate SQL_Latin1_General_CP1_CI_AS where t1.[BackupType]=N'log'; END

Calling a stored procedure can be automated in time. For example, every 5 minutes, creating the corresponding task in the Agent. Naturally, the combat server must be connected on the backup (Server Objects \ Linked Servers).

This uses the [inf]. [VServerLastBackupDB] view in the SRV database, which defines the latest database backup copies:

An example implementation of the vServerLastBackupDB view

 CREATE VIEW [inf].[vServerLastBackupDB] as with backup_cte as ( select bs.[database_name], backup_type = case bs.[type] when 'D' then 'database' when 'L' then 'log' when 'I' then 'differential' else 'other' end, bs.[first_lsn], bs.[last_lsn], bs.[backup_start_date], bs.[backup_finish_date], cast(bs.[backup_size] as decimal(18,3))/1024/1024 as BackupSizeMb, rownum = row_number() over ( partition by bs.[database_name], type order by bs.[backup_finish_date] desc ), LogicalDeviceName = bmf.[logical_device_name], PhysicalDeviceName = bmf.[physical_device_name], bs.[server_name], bs.[user_name] FROM msdb.dbo.backupset bs INNER JOIN msdb.dbo.backupmediafamily bmf ON [bs].[media_set_id] = [bmf].[media_set_id] ) select [server_name] as [ServerName], [database_name] as [DBName], [user_name] as [USerName], [backup_type] as [BackupType], [backup_start_date] as [BackupStartDate], [backup_finish_date] as [BackupFinishDate], [BackupSizeMb], --   [LogicalDeviceName], [PhysicalDeviceName], [first_lsn] as [FirstLSN], [last_lsn] as [LastLSN] from backup_cte where rownum = 1;

Result

This article reviewed a brief overview of all possible options for fault tolerance and fast availability in MS SQL Server 2017, as well as analyzed examples of the implementation of failover and autocorrection of monitoring the delivery of transaction logs.

Thanks

Thanks NoOne for constructive comments about AlwaysOn.

Sources:

" Msdb
» Comparison of editions of SQL Server 2017
» AlwaysOn
» Clustering
» Replication
» Mirroring
» Transaction log shipping
» Customize log shipping
SQL Server database was restored
» Top 5 Questions about Basic Availability Groups

Source: https://habr.com/ru/post/342248/

All Articles