What Is A Checkpoint?
What Is A Checkpoint?
A Checkpoint is a database event which synchronizes the modified data blocks in memory with the datafiles on
disk.
It offers Oracle the means for ensuring the consistency of data modified by transactions.
The mechanism of writing modified blocks on disk in Oracle is NOT synchronized with the COMMIT of the
corresponding transactions.
Because all database changes up to the checkpoint have been recorded in the datafiles,
The checkpoint must ensure that all the modified buffers in the cache are really written to the corresponding
datafiles to avoid the loss of data
- A shadow process must scan more than 1/4 of the db_block_buffer parameter.
- The database writer (DBWR) writes all modified database blocks in the buffer cache back to datafiles,
- Checkpoint process (ckpt) updates the headers of all the datafiles to indicate when the last checkpoint occurred
(SCN)
-------------------------------------
-------------------------------------
Checkpoints present a tuning dilemma for the Database Administrator. Frequent checkpoints will enable faster
recovery, but can cause performance degradation. How then should the DBA address this?
Depending on the number of datafiles in a database, a checkpoint can be a highly resource intensive operation,
since all datafile headers are frozen during the checkpoint. There is a performance trade-off regarding frequency
of checkpoints. More frequent checkpoints enable faster database recovery after a crash. This is why some
customer sites which have a very low tolerance for unscheduled system downtime will often choose this option.
However, the performance degradation of frequent checkpoints may not justify this philosophy in many cases.
Let's assume the database is up and running 95% of the time, and unavailable 5% of the time from infrequent
instance crashes or hardware failures requiring database recovery. For most customer sites, it makes more sense
to tune for the 95% case rather than the rare 5% downtime.
Due to the performance reasons If your goal is to minimize the frequency of checkpoints through tuning.
- FAST_START_MTTR_TARGET
- LOG_CHECKPOINT_INTERVAL
- LOG_CHECKPOINT_TIMEOUT
- LOG_CHECKPOINTS_TO_ALERT
"checkpoint not complete" messages found in the alert log, which indicate a need to tune redo logs and
checkpoints.
3. Parameters related to incremental checkpointing
Note: Log file switches will always override checkpoints caused by following paarameters.
FAST_START_MTTR_TARGET
--------------------------------------------
- FAST_START_MTTR_TARGET enables you to specify the number of seconds the database takes to perform crash
recovery of a single instance. Based on internal statistics, incremental checkpoint automatically adjusts the
checkpoint target to meet the requirement of FAST_START_MTTR_TARGET.
V$INSTANCE_RECOVERY.TARGET_MTTR shows the effective MTTR target in seconds enforced by the system.
V$MTTR_TARGET_ADVICE shows the number of I/Os resulted by the current workload under the current MTTR
setting and the estimated number of I/Os that would be resulted by the current workload under other MTTR
settings. This view helps the user to assess the trade-off between runtime performance and setting
FAST_START_MTTR_TARGET to achieve better recovery time.
LOG_CHECKPOINT_INTERVAL
-------------------------------------------
LOG_CHECKPOINT_INTERVAL parameter specifies the maximum number of redo blocks the incremental
checkpoint target should lag the current log tail.
On most Unix systems the operating system block size is 512 bytes. This means that setting
LOG_CHECKPOINT_INTERVAL to a value of 10,000 would mean the incremental checkpoint target should not lag
the current log tail by more than 5,120,000 (5M) bytes. .
If the size of your redo log is 20M, you are taking 4 checkpoints for each log.
LOG_CHECKPOINT_INTERVAL influences when a checkpoint occurs, which means careful attention should be given
to the setting of this parameter, keeping it updated as the size of the redo log files is changed.
The checkpoint frequency is one of the factors which impacts the time required for the database to recover from
an unexpected failure.
Longer intervals between checkpoints mean that if the system crashes, more time will be needed for the database
to recover.
Shorter checkpoint intervals mean that the database will recover more quickly, at the expense of increased
resource utilization during the checkpoint operation.
This parameter also impacts the time required to complete a database recovery operation during the roll forward
phase of recovery. The actual recovery time is dependent upon this time, and other factors, such as the type of
failure (instance or system crash, media failure, etc.), and the number of archived redo logs which need to be
applied. setting and the estimated number of I/Os that would be resulted by the current workload under other
MTTR settings.
This view helps the user to assess the trade-off between runtime performance and setting
FAST_START_MTTR_TARGET to achieve better recovery time.
LOG_CHECKPOINT_TIMEOUT
-------------------------------------------
The LOG_CHECKPOINT_TIMEOUT parameter specifies the maximum number of seconds the incremental
checkpoint target should lag the current log tail.
In another word, it specifies how long a dirty buffer in buffer cache can remain dirty.
Checkpoint frequency impacts the time required for the database to recover from an unexpected failure.
Longer intervals between checkpoints mean that more time will be required during database recovery.
Oracle recommends using LOG_CHECKPOINT_INTERVAL to control the checkpoint interval rather than
LOG_CHECKPOINT_TIMEOUT, which will initiate a checkpoint
every "n" seconds, regardless of the transaction frequency. This can cause unnecessary checkpoints in cases
where transaction volumes vary.
It is a misconception that setting LOG_CHECKPOINT_TIMEOUT to a given value will initiate a log switch at that
interval, enabling a recovery window used for a stand-by database configuration.
Log switches cause a checkpoint,but a checkpoint does not cause a log switch.
The only way to cause a log switch is manually with ALTER SYSTEM SWITCH LOGFILE or resizing the redo logs to
cause more frequent switches.
This is controlled by operating system blocks, not a timed interval.
Sizing of the online redo logs is critical for performance and recovery.
LOG_CHECKPOINTS_TO_ALERT
------------------------------------------------
Doing so is useful for determining whether checkpoints are occurring at the desired frequency.
Oracle generally advises this be set to TRUE as the overhead is negligible but the information in the alert log may
be useful.
--------------------------------------
--------------------------------------
A checkpoint occurs at every log switch. If a previous checkpoint is already in progress, the checkpoint forced by
the log switch will override the current checkpoint.
This necessitates well-sized redo logs to avoid unnecessary checkpoints as a result of frequent log switches.
The lag between the incremental checkpoint target and the log tail is also limited by 90% of the smallest online log
file size. This makes sure that in most cases log switch would not need to wait for checkpoint. Because of this, log
file sizes should be configured large enough.
A good rule of thumb is to switch logs at most every twenty minutes. Having your log files too small can increase
checkpoint activity and reduce performance.
Oracle recommends the user to set all online log files to be the same size, and have at least two log groups per
thread.
The alert log is a valuabletool for monitoring the rate that log switches occur, and subsequently, checkpoints occur.
The following is an example of quick log switches from the alert log:
If redo logs switch every 3 minutes, you will see performance degradation.
Redo log switching should occur approximately every 20-30 minutes to get better database performance.
If they switch too frequently, check the current size of redo log files from V$LOG and based on how frequently it is
switching on an average, you will need to create redo log groups with bigger size and then switch the redo log a
few times with “ALTER SYSTEM SWITCH LOGFILE;” so that the CURRENT redo log is one of the bigger redo log file
and then drop the redo log groups with smaller size with ALTER DATABASE DROP LOGFILE … command.
Sometimes, you can see in your alert.log file, the following corresponding messages:
This message indicates that Oracle wants to reuse a redo log file, but the current checkpoint position is still in that
log. In this case, Oracle must wait until the checkpoint position passes that log. Because the incremental
checkpoint target never lags the current log tail by more than 90% of the smallest log file size, this situation may
be encountered if DBWR writes too slowly, or if a log switch happens before the log is completely full, or if log file
sizes are too small. When the database waits on checkpoints,redo generation is stopped until the log switch is
done.
Caution
============
In the case when one or more of the below wait events is repeated frequently with considerable values then you
need to take an action like adding More online redo log files or increasing their sizes and/or modifying
checkpointing parameters.
log switch/archive