Quite often such a nuisance happens that messages like Checkpoint not complete are added to the alert.log database. Standard advice in this case: “increase the number and / or size of redo logs”. And then the question is, who are these redo logs and what they eat with.
A bit of theory
So. When an application requests data, the database stores it in a buffer cache — the memory area in the SGA. When the data changes, the database does not make changes directly in the data file, but in the buffer cache. At the same time, a separate memory area — the redo log buffer — records information on which, if necessary, it will be possible to repeat the change that has occurred. When a change is committed, it, again, is not immediately dumped into the data file, but the information from the redo log buffer is reset to the online redo log — specifically for this intended file. Until the change is recorded in the data file, it is necessary to store information about it somewhere on the disk in case the base falls. If, for example, the server power is turned off, then, of course, all data stored in memory will be lost. In this case, the redo log is the only place where information about the change is stored. After restarting the Oracle database, it will actually repeat the past transaction, change the necessary blocks again and make a commit. Therefore, until the information from the redo log is reset to the data file, this redo log cannot be reused.
The DBWn special background database process frees the buffer cache as needed, and also executes a checkpoint event. A checkpoint is an event during which “dirty” (modified) blocks are written to data files. The checkpoint process (CKPT) is responsible for the checkpoint event, which writes information about the checkpoint in the control file (about what the control file is another time) and the headers of the data files.
A checkpoint event ensures data consistency and quick base recovery. Data recovery is accelerated, because all changes to the control point are written to the data files. This eliminates the need to apply redo logs generated to the checkpoint during recovery. The checkpoint ensures that all modified blocks in the cache are actually written to the appropriate data files.
')
There are several types of control points.
- Thread checkpoins. All changes that have occurred within a particular instance up to a certain point are written to the data file in succession. They happen in the following situations:
- full base stop;
- alter system checkpoint;
- switching online redo log;
- alter database begin backup.
- Checkpoints of data files and table spaces. Occurs when operations with tablespaces and data files occur (alter tablespace offline, alter tablespace read only, stinging a data file, etc.)
- Incremental checkpoints. Subtype of the control point of the instance, designed to avoid writing to the disk a huge number of blocks during the switching redo logs. The DBWn process checks at least once every three seconds whether new dirty blocks have appeared for writing to disk. If they appear, they are recorded in data files, the checkpoint label in redo shifts the log (so that you have to view fewer logs the next time), but the headers of the data files do not change.
Frequent checkpoint events provide faster recovery from failures, but can cause performance degradation. Depending on the number of data files in the system, a checkpoint event can be quite a resource-intensive operation, since at this moment all headers of all data files become inaccessible.
Parameters that affect the frequency of control point events and which can be adjusted if desired:
- FAST_START_MTTR_TARGET (how much time in seconds it will take to restore the database after a crash; if I understand something, then we are talking about the time it takes to use the online redo logs available).
- LOG_CHECKPOINT_INTERVAL (checkpoint event frequency — the allowable number of blocks of the online redo log file that were filled after the previous checkpoint; blocks are blocks in terms of the operating system, not the database).
- LOG_CHECKPOINT_TIMEOUT (maximum allowed number of seconds, between two checkpoint events).
- LOG_CHECKPOINTS_TO_ALERT (true / false; determines whether to reset the control point switch in alert.log; useful thing, it would be better to set it to true).
It makes sense to clarify that the parameters FAST_START_MTTR_TARGET and LOG_CHECKPOINT_INTERVAL, if you believe the documentation, are mutually exclusive.
Let’s look into our database now.
As already mentioned, a checkpoint event occurs when switching the online redo log. A good practice, according to metalink, is switching logs every twenty minutes. Too small online redo logs can increase the frequency of checkpoint events and reduce performance. Oracle recommends that the size of online redo log files be the same, and that there should be at least two log groups for each database instance.
To track the frequency of switching logs, you can look at the alert log.
An example of switching online redo logs.
Wed Nov 02 17:51:20 2011 Thread 1 advanced to log sequence 83 (LGWR switch) Current log# 2 seq# 83 mem# 0: D:\ORACLE\ORADATA\ORADB\REDO02.LOG Thread 1 advanced to log sequence 84 (LGWR switch) Current log# 3 seq# 84 mem# 0: D:\ORACLE\ORADATA\ORADB\REDO03.LOG
Sometimes the following errors can be found in alert.log.
Wed Nov 02 17:51:53 2011 Thread 1 cannot allocate new log, sequence 87 Checkpoint not complete Current log# 2 seq# 86 mem# 0: D:\ORACLE\ORADATA\ORADB\REDO02.LOG
This means that Oracle is going to reuse the online redo log, from which data has not yet been dumped into data files. In this case, all operations in the database are suspended (application performance deteriorates sharply), a checkpoint event is triggered, and “dirty” blocks are immediately flushed to disk. If such errors occur occasionally, then perhaps nothing catastrophic in this. However, if they become permanent, then it’s time to think about changing the size and number of redo logs.
From cookbook. How to change the size and / or amount of online redo logs
1. To start, just look at the state of the logs.
SQL> select group
Let's try to increase the size of the logs to 100M.
2. Let's look directly at the redo log files.
SQL> select group
3. Create three new log groups of 100M each.
SQL> alter database add logfile group 4 'D:\ORACLE\ORADATA\ORADB\REDO04.LOG' size 100M; Database altered. SQL> alter database add logfile group 5 'D:\ORACLE\ORADATA\ORADB\REDO05.LOG' size 100M; Database altered. SQL> alter database add logfile group 6 'D:\ORACLE\ORADATA\ORADB\REDO06.LOG' size 100M; Database altered.
Let's see what happened
SQL> select group
4. Now you can delete the old (too small) logs. To do this, you need to be active log from the newly created groups. If in the present situation it is not, then you can use the command
SQL> alter system switch logfile;
Now, with a clear conscience, remove the extra logs
SQL> alter database drop logfile group 1; Database altered. SQL> alter database drop logfile group 2; Database altered. SQL> alter database drop logfile group 3; alter database drop logfile group 3 * ERROR at line 1: ORA-01624: log 3 needed for crash recovery of instance oradb (thread 1) ORA-00312: online log 3 thread 1: 'D:\ORACLE\ORADATA\ORADB\REDO03.LOG'
When deleting the last log, there was a situation when it was impossible to delete the log, because data from it has not yet been dumped into data files. You can force it with the command
SQL> alter system checkpoint; System altered.
Then we boldly try again.
SQL> alter database drop logfile group 3; Database altered.
5. We will check if everything worked out for us.
SQL> select group
6. Now is the time to backup the database. So, just in case.
7. Now you can delete unnecessary files of the operating system.
D:\> del D:\oracle\oradata\oradb\REDO01.LOG D:\> del D:\oracle\oradata\oradb\REDO02.LOG D:\> del D:\oracle\oradata\oradb\REDO03.LOG
What to do if we have installed Oracle RAC.
In principle, everything is the same. Taking into account the fact that for each instance there is a separate set of redo logs and you will have to operate with them separately.
The ALTER SYSTEM CHECKPOINT LOCAL command only works with the instance that you are currently connected to. To trigger a checkpoint event for the entire base, you need to call ALTER SYSTEM CHECKPOINT or ALTER SYSTEM CHECKPOINT GLOBAL.
The ALTER SYSTEM SWITCH LOGFILE command only affects the instance that you are currently connected to. To switch online redo logs for the entire system, you can use the ALTER SYSTEM ARCHIVE LOG CURRENT command.
Create new online redo logs will have for each instance separately.
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 4 '+ASMGRP/ORADB/REDO04.LOG' SIZE 100M; ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 4 '+ASMGRP/ORADB/REDO04.LOG' SIZE 100M;
By the way, the file name can not be specified. The base itself will call it in accordance with its own ideas about the beautiful.
By the way. You can duplicate online redo log files.
ALTER DATABASE ADD LOGFILE GROUP 7 ('D:\ORACLE\ORADATA\ORADB\REDO07.LOG','C:\ORACLE\ORADATA\ORADB\REDO07.LOG') SIZE 100M;
Why multiply? Because if for some reason the online redo log file is damaged or lost, then, having its intact and non-lost copy on another disk, recovery is a matter of two minutes. But if there is no copy, you will have to tinker (but the process of restoring lost redo logs is another story).
___________________________
Literature
Oracle Database 11.2 official documentation.Oracle Support nodes 147468.1 and 1035935.6
Thomas Kyte, Expert Oracle Database Architecture Architecture: 9i and 10g Programming Techniques and Solutions