Imagine a situation: our project, which uses Oracle as its DBMS, suddenly (or, with luck, expectedly) becomes critical for the business, and with that comes a willingness to allocate funds to ensure the system's reliability.
Up to this point we managed perfectly well with a daily or even weekly backup ("hot" or "cold" copies, or perhaps just a data export), and a system recovery time of about a day suited us (assume we have a couple of terabytes of data).
Now it turns out that we are given no more than an hour to restore the system, and we cannot lose any data.
Everything points to one conclusion: we will have to set up a standby server.
In principle, most of what this article says is described in
"Oracle Data Guard Concepts and Administration", as well as in plenty of places on the web, but for the most part those are instructions consisting of a sequence of commands, without much explanation of their meaning and, most importantly, without recommendations on what to do if something goes wrong.
I will try to describe the process of deploying a physical standby database in as much detail as possible, pointing out every rake I have ever stepped on. Reports of problems I have not happened to encounter, as well as any clarifications and additions, are greatly appreciated.
In the command and query examples below, I will use the following notation:
$ - the command is entered at the operating system command line as the oracle user.
SQL> - the command is entered in sqlplus. Throughout this article, unless explicitly stated otherwise, it is implied that sqlplus is running in administrative mode (sqlplus / as sysdba) and that the database instance is selected via the $ORACLE_SID variable.
RMAN> - the command is entered in rman. Likewise, unless explicitly stated otherwise, it is assumed that rman is started with the command rman target / and that the instance is selected via the $ORACLE_SID variable.
Before we begin, it is worth saying a little about the principles of Oracle database organization that we will need in order to understand the backup and recovery mechanism in Oracle DBMS.
An Oracle database consists of the following file types:
Control files - contain service information about the database itself. Without them the data files cannot be opened, and therefore the database contents cannot be accessed.
Data files - contain the database data itself.
Online redo logs - contain a record of all changes made to the database. This information can be used to restore the database state after a failure.
There are other files that are not formally part of the database but are important to its successful operation.
Parameter file - describes the starting configuration of an instance.
Password file - allows users to connect to the database remotely to perform administrative tasks.
Archived redo logs - contain the history of the online redo log files created by the instance (their offline copies). Together with database backups, these files allow you to restore the database and recover a lost data file.
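On a running database, each of these file sets can be listed through the standard dynamic performance views, for example:

```sql
SQL> select name from v$controlfile;    -- control files
SQL> select name from v$datafile;       -- data files
SQL> select member from v$logfile;      -- online redo log members
SQL> select name from v$archived_log;   -- archived logs (if any exist yet)
```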
The main idea behind a standby instance is to keep a backup copy of the database up to date by applying the transactions stored in the online or archived redo logs of the primary database (in Oracle this mechanism is called Data Guard).
Hence the first requirement for our primary database: it must run in archivelog mode.
The second requirement is the presence of a password file. This allows remote administrative connections to our database.
The third requirement is force logging mode. It forces transactions to be written to the redo logs even for operations performed with the NOLOGGING option. Without it, some data files on the standby may end up corrupted, because when the archived logs are applied there will be no way to obtain from them the transactions performed with the NOLOGGING option.
It should also be noted that with Oracle releases below 11g, the primary and standby servers must be on the same platform: if your primary database runs on a Linux server, the standby server cannot run Windows.
All the examples in this article are geared toward Unix systems; for Windows the differences mostly come down to how file paths are written.
Also remember that data exchange between the primary and standby servers takes place over SQL*Net, so connections to the corresponding port (as a rule, 1521/tcp) must be open in both directions.
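It is worth verifying the connectivity requirement before going any further. A minimal sketch, assuming the default port 1521 and the host name standbysrv used later in this article (run it in each direction):

```shell
#!/bin/bash
# Minimal TCP reachability probe using bash's /dev/tcp device.
# Host name and port are examples; adjust to your environment.
check_port() {            # usage: check_port <host> <port>
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}

check_port standbysrv 1521   # run on the primary; repeat on the standby toward the primary
```

Once the listener is configured, `tnsping teststan` performs a similar check at the SQL*Net level.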
We assume that our database is called test. We will configure the primary and standby databases so that at any moment we can swap their roles (switchover). We plan for our system to use the Data Guard protection mode called MAXIMUM PERFORMANCE.
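MAXIMUM PERFORMANCE is the default protection mode; which mode a database is currently in can be checked through v$database:

```sql
SQL> select protection_mode, protection_level from v$database;
```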
So let's go.
First, let's check that our database meets these requirements.
1. Check the mode in which our primary database is running:
SQL> select name, open_mode, log_mode from v$database;
NAME OPEN_MODE LOG_MODE
--------- ---------- ------------
TEST READ WRITE ARCHIVELOG
If you do not see the ARCHIVELOG value in the LOG_MODE field, execute the following commands:
SQL> shutdown immediate;
SQL> startup mount;
SQL> alter database archivelog;
SQL> alter database open;
2. Check the presence of the password file:
SQL> select * from v$pwfile_users;
USERNAME SYSDB SYSOP
------------------------------ ----- -----
SYS TRUE TRUE
If you do not see this result, create the necessary file:
$ orapwd file=$ORACLE_HOME/dbs/orapw$ORACLE_SID password=xxxxxxxx force=y
Instead of 'xxxxxxxx', you must insert the current SYS user password.
3. Enable force logging mode:
SQL> alter database force logging;
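Whether the mode is actually on can be confirmed with a one-line check against v$database:

```sql
SQL> select force_logging from v$database;
```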
Moving on to configuring our system. We begin with the necessary settings on the primary database. We will keep all working files in the /data/backup directory.
Create standby redo logs. They are used only on a standby database, to record the data arriving from the redo logs of the primary. We will need them on the primary when we switch it to the standby role and use real-time apply redo. The standby redo log files should be the same size as the online redo logs, which you can check with the command:
SQL> select bytes from v$log;
BYTES
----------
52428800
52428800
52428800
Check which redo log groups exist in our database:
SQL> select group# from v$logfile;
GROUP#
----------
1
2
3
Create standby redo logs:
SQL> alter database add standby logfile group 4 '/oradata/test/stnbylog01.log' size 50m;
Database altered.
SQL> alter database add standby logfile group 5 '/oradata/test/stnbylog02.log' size 50m;
Database altered.
SQL> alter database add standby logfile group 6 '/oradata/test/stnbylog03.log' size 50m;
Database altered.
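The result and the sizes can be confirmed right away through the standard v$standby_log view:

```sql
SQL> select group#, bytes, status from v$standby_log;
```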
Create a parameter file (pfile) for our instance. We take into account that our primary database may be switched to the standby role, and this requires setting parameters that will be used only in standby mode.
SQL> create pfile='/data/backup/pfilePROD.ora' from spfile;
We need to add some parameters to the resulting file, if they are not already there:
db_name='test' - the name of our database (the same for the primary and the standby instance).
db_unique_name='testprod' - a name unique to each instance; it does not change when roles switch between standby and primary.
log_archive_config='dg_config=(testprod,teststan)' - defines the names of the instances between which logs will be exchanged.
log_archive_dest_1='SERVICE=teststan LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) db_unique_name=teststan' - while the instance is the primary database (PRIMARY_ROLE), redo will be shipped to the standby server by the LGWR process. The ASYNC attribute indicates that the data generated by a transaction does not have to reach the standby before the transaction completes, so the primary database will not stall if the connection to the standby is lost.
log_archive_dest_2='LOCATION=/oradata/test/archive VALID_FOR=(ALL_LOGFILES,ALL_ROLES) db_unique_name=testprod' - here we specify the directory where archived logs are stored locally (for the primary database) or where logs arriving from the primary are stored (for the standby).
log_archive_dest_state_1=ENABLE - enables shipping of archived logs to the log_archive_dest_1 destination. Until the standby database has been created, this parameter can be set to DEFER if we do not want to see spurious messages in the alert log about the standby being unreachable.
log_archive_dest_state_2=ENABLE - enables writing of archived logs to the log_archive_dest_2 destination.
fal_client='testprod' - specifies that when the instance switches to the standby role it will act as the client for fetching archived logs (fetch archive log).
fal_server='teststan' - specifies the FAL (fetch archive log) server from which archived logs will be fetched. The fal_client and fal_server parameters take effect only while the database is running in the standby role.
standby_file_management='AUTO' - enables automatic file management in standby mode. With this value, any data file created or deleted on the primary database is automatically created or deleted on the standby as well.
If we nevertheless want to place our standby database in directories other than those of the primary, we will need additional parameters:
db_file_name_convert='/oradata_new/test','/oradata/test' - in the names of data files created on the standby database (that is, when our primary instance is operating in the standby role), the path '/oradata_new/test' is replaced with '/oradata/test'.
log_file_name_convert='/oradata_new/test/archive','/oradata/test/archive' - likewise, in the names of log files created on the standby database, the path '/oradata_new/test/archive' is replaced with '/oradata/test/archive'.
As a result, the parameter file for the primary database should, among other things, contain the following entries:
# PRIMARY STANDBY
db_name='test'
db_unique_name='testprod'
log_archive_config='dg_config=(testprod,teststan)'
log_archive_dest_1='SERVICE=teststan LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) db_unique_name=teststan'
log_archive_dest_2='LOCATION=/oradata/test/archive VALID_FOR=(ALL_LOGFILES,ALL_ROLES) db_unique_name=testprod'
log_archive_dest_state_1=ENABLE
log_archive_dest_state_2=ENABLE
# STANDBY
fal_client='testprod'
fal_server='teststan'
standby_file_management='AUTO'
If we have the opportunity, we restart the primary database with the new parameters and create a new spfile from the revised pfile:
SQL> shutdown immediate;
SQL> startup nomount pfile='/data/backup/pfilePROD.ora';
SQL> create spfile from pfile='/data/backup/pfilePROD.ora';
SQL> shutdown immediate;
SQL> startup;
If we cannot stop the primary database during these manipulations, we will have to make the changes to the running configuration with ALTER SYSTEM. Here we must bear in mind that db_unique_name cannot be changed on a running database, so we will have to use the current name in the configuration. You can view it with the command:
SQL> show parameter db_unique_name
NAME TYPE VALUE
------------------------------------ ----------- --------------------------
db_unique_name string test
Set the necessary parameters:
SQL> alter system set log_archive_config='dg_config=(test,teststan)';
System altered.
We set the destinations for archived logs. On a running database we cannot modify the log_archive_dest_1 parameter if it is already set, so we only add the shipping destination for the standby database:
SQL> alter system set log_archive_dest_2='SERVICE=teststan LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) db_unique_name=teststan';
System altered.
SQL> alter system set log_archive_dest_state_2=ENABLE;
System altered.
SQL> alter system set FAL_SERVER=teststan;
System altered.
SQL> alter system set FAL_CLIENT=test;
System altered.
SQL> alter system set standby_file_management='AUTO';
System altered.
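The result can be verified without a restart through the standard v$archive_dest view (the error column is handy when a destination refuses to work):

```sql
SQL> select dest_id, destination, status, error from v$archive_dest where dest_id in (1, 2);
```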
Add an entry for the standby database to tnsnames.ora:
TESTSTAN =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = standbysrv)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = teststan)
)
)
It is time to create a backup (if there is none). For this we will use the rman utility.
The location of the backup from which we will build the standby database must be exactly the same as the location where we saved it. That is, if we write the backup to the /data/backup directory, then when restoring the database on the standby server rman will look for the backup data in that very directory. There are two obvious ways to arrange this: copy the backup from the primary server into an identical directory created on the standby, or write the backup to a network share mounted at the same path on both servers.
Run rman on the primary server:
$ rman target /
An interesting point for the case when Oracle is installed on Linux. If you have the PolyglotMan (RosettaMan) package installed, then an attempt to run
$ rman target /
may produce the error:
rman: can't open target
This happens when the path to that package's rman executable (for example, /usr/X11R6/bin/rman) appears in the $PATH environment variable earlier than the path to Oracle's rman. In other words, we are launching rman from the PolyglotMan package and passing it 'target' as the file to open, which, naturally, it cannot do.
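The mechanics are easy to reproduce without Oracle at all. This sketch (throwaway scripts, hypothetical paths) shows that whichever directory comes first in $PATH wins:

```shell
#!/bin/bash
# Two same-named executables; $PATH order decides which one runs.
tmp=$(mktemp -d)
mkdir -p "$tmp/x11" "$tmp/oracle"
printf '#!/bin/sh\necho polyglotman\n' > "$tmp/x11/rman"
printf '#!/bin/sh\necho oracle-rman\n' > "$tmp/oracle/rman"
chmod +x "$tmp/x11/rman" "$tmp/oracle/rman"

# PolyglotMan's directory first: the wrong rman is found
out1=$(PATH="$tmp/x11:$tmp/oracle:$PATH" bash -c 'rman')
# Oracle's directory first: the right one
out2=$(PATH="$tmp/oracle:$tmp/x11:$PATH" bash -c 'rman')
echo "$out1"   # polyglotman
echo "$out2"   # oracle-rman
rm -rf "$tmp"
```

The fix is simply to put $ORACLE_HOME/bin ahead of the conflicting directory in $PATH, or to call Oracle's rman by its full path.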
Create a control file for the standby database:
RMAN> backup current controlfile for standby format '/data/backup/standbycontrol.ctl';
Create a backup of our primary database and its archived logs:
RMAN> run
2> {
3> allocate channel c1 device type disk format '/data/backup/%u';
4> backup database plus archivelog;
5> }
Here we may run into trouble if for some reason we do not have a complete set of archived logs (for example, some have been deleted). In that case rman reports the error:
RMAN-20242: Specification does not match any archivelog in the recovery catalog
To remedy the situation, we need to verify and update the status of the archived logs in the rman repository. To do this, run the following command:
RMAN> change archivelog all crosscheck;
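The crosscheck marks records whose log files are gone as EXPIRED. If you then want to clear them out of the repository entirely, the standard follow-up is:

```
RMAN> list expired archivelog all;
RMAN> delete expired archivelog all;
```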
If the backup was successful, copy the contents of the /data/backup directory to the standby server (unless we used a network share for the backup) and proceed to creating an instance on the standby server.
First we install Oracle on the standby server without creating a database instance. To make life easier later, the $ORACLE_HOME path on the standby server should be the same as on the primary. We also install all the patches that were installed on the primary server, so that the Oracle versions match exactly.
Next we configure the listener and the net service names.
Since we will use rman running on the production server to deploy a copy of the primary database onto the standby server, and the standby instance will be in nomount mode at the time, we need to register the service explicitly in listener.ora; otherwise all attempts by rman to connect to the future standby as an auxiliary instance will be rejected.
As a result, listener.ora should look something like this:
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(SID_NAME = PLSExtProc)
(ORACLE_HOME = /oracle)
(PROGRAM = extproc)
)
(SID_DESC =
(GLOBAL_DBNAME = teststan)
(ORACLE_HOME = /oracle)
(SID_NAME = test)
)
)
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = standbysrv)(PORT = 1521))
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC0))
)
)
Note that the SID_NAME parameter here is case-sensitive, since the listener will look for a password file named orapw$SID_NAME.
By the way, now is a good time to copy the password file ($ORACLE_HOME/dbs/orapw$ORACLE_SID) from the primary server to the standby.
We also register our primary and standby databases in tnsnames.ora:
TEST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = standbysrv)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = teststan)
)
)
TESTPROD =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = productionsrv)(PORT = 1521))
)
(CONNECT_DATA =
(SID = test)
)
)
Since it is understood that the applications "know" our database under the name test, we set the SID test for the standby entry.
Do not forget to restart the listener:
$ORACLE_HOME/bin/lsnrctl stop
$ORACLE_HOME/bin/lsnrctl start
Now we create the directory structure for our database. It is important not to forget to create all the directories for data files and logs, as well as the adump, bdump, cdump, dpdump, and udump directories, usually located under $ORACLE_HOME/admin/$ORACLE_SID.
If you are not preserving the primary's directory structure on the standby, create the directories according to the values of the db_file_name_convert and log_file_name_convert parameters.
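The dump directories can be created in one go. In this sketch the $ORACLE_HOME/admin/$ORACLE_SID layout and the defaults are placeholders to adjust for your server:

```shell
#!/bin/bash
# Create the admin dump directories expected by the standby instance.
# Defaults are placeholders so the script can be tried anywhere;
# on a real server ORACLE_HOME would be e.g. /oracle and ORACLE_SID=test.
ORACLE_HOME=${ORACLE_HOME:-$(mktemp -d)}
ORACLE_SID=${ORACLE_SID:-test}

for d in adump bdump cdump dpdump udump; do
  mkdir -p "$ORACLE_HOME/admin/$ORACLE_SID/$d"
done
ls "$ORACLE_HOME/admin/$ORACLE_SID"
```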
We also need to create a parameter file for the standby based on the primary's parameter file. To do this, copy the pfilePROD.ora file to the standby server, rename it to pfileSTAN.ora, and make the necessary corrections in the section we edited earlier:
# PRIMARY STANDBY
db_name='test'
db_unique_name='teststan'
log_archive_config='dg_config=(testprod,teststan)'
log_archive_dest_1='SERVICE=testprod LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) db_unique_name=testprod'
log_archive_dest_2='LOCATION=/oradata/test/archive VALID_FOR=(ALL_LOGFILES,ALL_ROLES) db_unique_name=teststan'
log_archive_dest_state_1=ENABLE
log_archive_dest_state_2=ENABLE
# STANDBY
fal_client='teststan'
fal_server='testprod'
standby_file_management='AUTO'
If the standby database is placed in other directories, we also add the conversion parameters:
db_file_name_convert='/oradata/test','/oradata_new/test'
log_file_name_convert='/oradata/test/archive','/oradata_new/test/archive'
It is time to start the standby copy of the database:
SQL> startup nomount pfile='/data/backup/pfileSTAN.ora';
SQL> create spfile from pfile='/data/backup/pfileSTAN.ora';
SQL> shutdown immediate;
SQL> startup nomount;
Now we restore the standby database from the backup. To do this, go to the primary server and start rman.
Connect to the future standby database and perform the duplication (recall that the backup and the standby control file are in the directory that is visible from both the primary server and the standby as /data/backup):
RMAN> connect auxiliary sys@teststan
RMAN> duplicate target database for standby nofilenamecheck dorecover;
The nofilenamecheck parameter is needed so that rman does not complain about duplicate file names (since we use the same directory structure on the primary and standby servers).
If everything went well, we switch the system to automatic application of transactions on the standby database.
Switch the log file and note the sequence number of the latest archived log on the primary database:
SQL> alter system switch logfile;
System altered.
SQL> select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
--------------
205
Now go to the standby server.
Check the status of the database:
SQL> select name,open_mode,log_mode from v$database;
NAME OPEN_MODE LOG_MODE
--------- ---------- ------------
TEST MOUNTED ARCHIVELOG
SQL> select recovery_mode from v$archive_dest_status;
RECOVERY_MODE
-----------------------
IDLE
IDLE
IDLE
IDLE
IDLE
IDLE
IDLE
IDLE
IDLE
IDLE
IDLE
11 rows selected.
SQL> select max(sequence#) from v$log_history;
MAX(SEQUENCE#)
--------------
202
We can see that the latest log applied on the standby lags behind the primary, and that log apply is not running (all destinations are IDLE).
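A convenient way to see what the standby's background processes are doing is the standard v$managed_standby view (the sketch below just picks a few columns):

```sql
SQL> select process, status, sequence# from v$managed_standby;
```

Once managed recovery is running, you would typically see an MRP0 process applying redo and RFS processes receiving it from the primary.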
Check for standby redo logs:
SQL> select * from v$standby_log;
If not, create:
SQL> alter database add standby logfile group 4 '/oradata/test/stnbylog01.log' size 50m;
Database altered.
SQL> alter database add standby logfile group 5 '/oradata/test/stnbylog02.log' size 50m;
Database altered.
SQL> alter database add standby logfile group 6 '/oradata/test/stnbylog03.log' size 50m;
Database altered.
We switch our standby database to real-time apply redo mode:
SQL> alter database recover managed standby database using current logfile disconnect;
Let's see what we have now:
SQL> select recovery_mode from v$archive_dest_status;
RECOVERY_MODE
-----------------------
MANAGED REAL TIME APPLY
MANAGED REAL TIME APPLY
MANAGED REAL TIME APPLY
MANAGED REAL TIME APPLY
MANAGED REAL TIME APPLY
MANAGED REAL TIME APPLY
MANAGED REAL TIME APPLY
MANAGED REAL TIME APPLY
MANAGED REAL TIME APPLY
MANAGED REAL TIME APPLY
MANAGED REAL TIME APPLY
11 rows selected.
SQL> select max(sequence#) from v$log_history;
MAX(SEQUENCE#)
--------------
205
As you can see, everything works.
If we do not want to use real-time apply redo, preferring instead to wait until the next archived log is completed on the primary and shipped to the standby before its transactions are applied, then we switch our standby database to ordinary redo apply mode with the command:
SQL> alter database recover managed standby database disconnect;
If something goes wrong, the first troubleshooting step is to stop the application of logs:
SQL> alter database recover managed standby database cancel;
It is possible that not all archived logs were shipped to the standby server during duplication. In that case they must be copied to the standby server manually (in our case, into the /oradata/test/archive directory) and applied manually:
SQL> recover standby database;
and then start real-time apply redo again:
SQL> alter database recover managed standby database using current logfile disconnect;
The processes of switching roles between the instances (switchover) and of promoting the standby to primary after a failure of the primary (failover) have plenty of pitfalls of their own, so they are a topic for a separate article.