0% found this document useful (0 votes)
51 views30 pages

Handling Oracle Block Corruptions (Doc ID 28814.1)

The document provides guidelines for handling block corruptions in Oracle databases across various versions. It outlines steps to assess the corruption extent, identify affected objects, and choose appropriate recovery options, including RMAN Block Media Recovery and data salvage methods. Additionally, it emphasizes the importance of documenting all actions and evidence related to the corruption for effective resolution.

Uploaded by

Nizar Kraiem
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
51 views30 pages

Handling Oracle Block Corruptions (Doc ID 28814.1)

The document provides guidelines for handling block corruptions in Oracle databases across various versions. It outlines steps to assess the corruption extent, identify affected objects, and choose appropriate recovery options, including RMAN Block Media Recovery and data salvage methods. Additionally, it emphasizes the importance of documenting all actions and evidence related to the corruption for effective resolution.

Uploaded by

Nizar Kraiem
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 30

Handling Oracle Block Corruptions (Doc ID 28814.

1) To Bottom

Applies to:
Oracle Database - Enterprise Edition - Version 7.0.16.0 to 12.1.0.2 [Release 7.0 to 12.1]
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A
and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Information in this document applies to any platform.

Purpose
Handling Oracle Block Corruptions in Oracle7/8/8i/9i/10g/11g/12c

Details
Contents

Introduction

Overview of Steps to handle a Corruption

Corruption due to NOLOGGING or UNRECOVERABLE

(1) Determine the Extent of the Corruption Problem

(2) Replace or Move Away from Suspect Hardware

(3) Information to record for each Corruption

(4) Which Objects are Affected ?

Options for various Segment Types:

CACHE CLUSTER INDEX PARTITION INDEX LOBINDEX


LOBSEGMENT ROLLBACK

TABLE PARTITION TABLE TEMPORARY IOT TYPE2 UNDO Other


Segment Types No Segment

(5) Choosing a Recovery Option

(5A) Complete Recovery


RMAN Block Media Recovery ,

Datafile Recovery ,

Database Recovery ,

After Complete Recovery

(5B) Recreating Indexes

(5C) Salvaging Data from Tables

Methods of extracting data from a corrupt table AROUND a corrupt block

Methods of extracting data from a table with a corrupt LOBSEGMENT


block

Extracting data from the corrupt block itself

(5D) Leaving the Corruption in Place

Warnings when Leaving a Corruption in Place

(5E) Last Options

Introduction

This article discusses how to handle one or more block corruptions on an Oracle datafile
and describes the main actions to take to deal with them. Please read the complete article
before taking any action.
This note does not cover memory corruption issues (typically ORA-600 [17xxx] type
errors).

Note: If the problem is an ORA-1578 on STARTUP then please contact Oracle Support
Services for advice.

This article can be referred from many places for many forms of error - it is important to
have the following information for each corrupt block:

 An absolute FILE NUMBER of the file containing the corrupt block.


Referred to as "&AFN" in this article.
 The file name of the file containing the corrupt block.
Referred to as "&FILENAME" in this article.
If the FILE NUMBER is known but not its name then V$DATAFILE can be used
to get the file name:

SELECT name FROM v$datafile WHERE file#=&AFN;

If the file number does not appear in V$DATAFILE in Oracle8i


AND &AFN is greater than the DB_FILES parameter value then it is probably a
TEMPFILE. In this case the filename can be found using:

SELECT name FROM v$tempfile WHERE file#=(&AFN - &DB_FILES_value);

 The BLOCK NUMBER of the corrupt block in that file.


Referred to as "&BL" in this article.
 The tablespace number and name containing the affected block.
Referred to as "&TSN" (tablespace number) and "&TABLESPACE_NAME" in
this article.
If they are not known then they can be found using:

SELECT ts# "TSN" FROM v$datafile WHERE file#=&AFN;


SELECT tablespace_name FROM dba_data_files WHERE file_id=&AFN;

 The block size of the tablespace where the corruption lies.


Referred to as "&TS_BLOCK_SIZE" in this article.

SELECT block_size FROM dba_tablespaces


WHERE tablespace_name =
(SELECT tablespace_name FROM dba_data_files WHERE file_id=&AFN);

For Oracle 7, 8.0 and 8.1 every tablespace in the database has the same block size.
For these versions, execute "SHOW PARAMETER DB_BLOCK_SIZE" and use
this value as the &TS_BLOCK_SIZE.

Example: For the ORA-1578 error:

ORA-01578: ORACLE data block corrupted (file # 7, block # 12698)


ORA-01110: data file 22: '/<path>/<datafilename>.dbf'
then:

&AFN is "22" (from the ORA-1110 portion of the error)


&RFN is "7" (from the "file #" in the ORA-1578)
&BL is "12698" (from the "block #" in the ORA-1578)
&FILENAME is '/<path>/<datafilename>.dbf'
&TSN etc.. should be determined from the above SQL

For other errors (ORA-600 , ORA-1498 etc...) the above values should either be given
by Oracle Support, or be given from the article which covers the relevant error.

Some errors, such as ORA-1410 "invalid ROWID" , ORA-12899 "value too large for
column" etc.., do not give details of the corrupt file / block. For such cases
Note:869305.1 may help in locating the corrupt row.

{go to content}
Overview of Steps to handle a Corruption

There are many possible causes of a block corruption including:

o
o Bad IO hardware / firmware
o OS problems
o Oracle problems
o Recovering through "UNRECOVERABLE" or "NOLOGGING" database
actions
(in which case ORA-1578 is expected behaviour - see below)

The point in time when an Oracle error is raised may be much later than when any
corruption initially occurred.

As the root cause is not usually known at the time the corruption is encountered, and as
in most cases the key requirement is to get up and running again, then the steps used
tackle corruption problems in this article are:

1. Determine the extent of the corruption problems and also determine if the
problems are permanent or transient.

If the problem is widespread or the errors move about then focus on


identifying the cause first (check hardware etc..). This is important as there is
no point recovering a system if the underlying hardware is faulty.

2. Replace or move away from any faulty or suspect hardware.


3. Determine which database objects are affected.
4. Choose the most appropriate database recovery / data salvage option.

For all steps above it is sensible to collect evidence and document exactly what
actions are being taken. The 'Evidence>>' tags in this article list the information
which should be collected to assist with identifying the root cause of the problem.

Corruption due to NOLOGGING or UNRECOVERABLE

If a NOLOGGING (or UNRECOVERABLE) operation is performed on an object and


the datafile containing that object is subsequently recovered then the data blocks affected
by the NOLOGGING operation are marked as corrupt and will signal an ORA-1578 error
when accessed.
In Oracle8i and greater an ORA-26040 is also signalled ("ORA-26040: Data block was
loaded using the NOLOGGING option" ) which makes the cause fairly obvious, but
earlier releases have no additional error message. If a block is corrupt due to recovery
through a NOLOGGING operation then use this article from Section 3 "Information to
Record for Each Corruption" onwards but note that:

a.
a. Recovery cannot retrieve the NOLOGGING data
b. No data is salvagable from inside the block
b.
c. Solution
d. For Solution reference Note:794505.1

(1) Determine the Extent of the Corruption Problem

Reference Note 836658.1 for identifying the corruption extension. It is a good idea to
scan affected files (and any important files) with RMAN or DBVERIFY to check for
other corruptions in order to determine the extent of the problem. For details of using
DBVERIFY see Note:35512.1

Whenever a corruption error occurs note down the FULL error message/s and look in the
instance's alert log and trace files for any associated errors. It is important to do this first
to assess whether this is a single block corruption, an error due to an
UNRECOVERABLE operation or a more severe issue.

Once that a list of corrupt file/block combinations has been determined then the steps
below can be used to help in what action can be taken.

Evidence:

 Record the original error in full, along with details of the application which
encountered the error.
 Save an extract from the alert log from a few hours before the FIRST recorded
problem up to the current point in time.
 Save any tracefiles mentioned in the alert log.
 Record any recent OS problems encountered.
 Note if using any special features - Example: ASYNC IO, fast write disk options
etc..
 Record the current BACKUP position (Dates, Type etc...)
 Note if the database is in ARCHIVELOG mode or not
g: Issue "ARCHIVE LOG LIST" in SQL*Plus (or Server Manager)

(2) Replace or Move Away from Suspect Hardware

The vast majority of corruption problems are caused by faulty hardware.


If there is a hardware fault or a suspect component then it is sensible to either repair the
problem, or make disk space available on a separate disk sub-system prior to proceeding
with a recovery option.

One option is to move the datafiles using the following steps:


o Make sure the file to be relocated is either OFFLINE or the instance is in the
MOUNT state (not open)
o Physically restore (or copy) the datafile to its new location
example: /newlocation/myfile.dbf
o Tell Oracle the new location of the file.
example: ALTER DATABASE RENAME FILE '/oldlocation/myfile.dbf' TO
'/newlocation/myfile.dbf';
(Note that a TEMPFILE cannot be renamed - TEMPFILEs should be
dropped and recreated at the new location)

IMPORTANT: If there are multiple errors (which are NOT due to NOLOGGING)
OR There are OS level errors against the affected file
OR The errors are transient and keep moving about
then there is little point proceeding until the underlying problem has been addressed or
space is available on alternative disks.
Get the hardware vendor to check the system over and contact Oracle Support with details
of all errors.
Please note: Whilst a failed hardware check is a good indication that there is a hardware
issue, a successful hardware check should not be taken as proof that there is no hardware
related issue - it is very common for hardware tests to report success when there really is
some underlying fault.

If using any special IO options such as direct IO, async IO or similar it may be worth
disabling them in order to eliminate such options as a potential source of problems.

(3) Information to record for each Corruption

It is best to determine which objects are affected BEFORE making any decisions about
how to recover - this is because the corruption/s may be on object/s which can easily be
re-created.
Example: For a corruption on a 5 row lookup table it may be far quicker to drop and
recreate the table than to perform a recovery.

For each corruption collect the information in the following table.


The steps to do this are explained below.

Information to record for each Corruption


Relate
Absolut Relativ Block Segment Recover
Origina Tablespac Segmen d
e FILE# e File# # Owner.Nam y
l Error e t Type Object
&AFN &RFN &BL e Options
s
The notes below will help to fill in this table for each corruption.

1. "Original Error"This is the error as initially reported. Eg: ORA-1578 / ORA-1110 ,


ORA-600 with all arguments etc..
2. "Absolute File#", "Relative File#" and "Block#"The File# and Block# should have
been given either by the error, by Oracle Support, or by the steps in an error article
which directed to this article.
In Oracle8/8i/9i/10g: The absolute and relative file numbers are often the same but
can differ (especially if the database has been migrated from Oracle7). It is important
to get the correct numbers for &AFN and &RFN to avoid salvaging the wrong object
An ORA-1578 reports the RELATIVE file number, with the ABSOLUTE file number
given in the accompanying ORA-1110 error. ORA-600 errors normally provide the
absolute file number.
The following query will show the absolute and relative file numbers for datafiles in
the database:

SELECT tablespace_name, file_id "AFN", relative_fno "RFN" FROM dba_data_files;

In Oracle8i/9i/10g:
In addition to the notes above about Oracle8, Oracle8i onwards can have
TEMPFILES. The following query will show the absolute and relative file numbers
for tempfiles in the database:

SELECT tablespace_name, file_id+value "AFN", relative_fno "RFN" FROM


dba_temp_files, v$parameter WHERE name='db_files';

In Oracle 7: Use the same file number for both the "Absolute File#" and the "Relative
File#"

3. "Segment Type", "Owner", "Name" and "Tablespace"


The following query provides the object TYPE , OWNER and NAME of a segment
given the absolute file number "&AFN" and block number "&BL" of the corrupt block
- the database must be open in order to use this query:

SELECT *
FROM dba_extents
WHERE file_id = &AFN
and &BL between block_id AND block_id + blocks - 1;

If the block is in a TEMPFILE the above query will return no data.


For TEMPFILES the "Segment Type" will be "TEMPORARY".

If the above query does not return rows, it can also be that the corrupted block is a
segment header in a Locally Managed Tablespace (LMT). When the corrupted block
is a segment header block in a LMT, the above query produces a corruption message
in the alert.log but the query does not not fail. In that case use this query:

SELECT owner, segment_name, segment_type, partition_name


FROM dba_segments
WHERE header_file = &AFN
and header_block = &BL;

{go to content}

(4) Which Object is affected and possible Recovery options:

The related objects and recovery options which can be used depend on the
SEGMENT_TYPE. The additional queries and possible recovery options are listed below
for each of the most common segment types
CACHE
CLUSTER
INDEX PARTITION
INDEX
LOBINDEX
LOBSEGMENT
ROLLBACK
TABLE PARTITION
TABLE
TEMPORARY
IOT
TYPE2 UNDO
Some other Segment Type
"no rows" from the query

CACHE

If the segment type is CACHE, recheck what have been entered and if the SQL /
parameters are correct.

Solution:

Repair the Block with RMAN Block Media Recovery

{Continue} {Back to Segment List}

CLUSTER

If the segment is a CLUSTER determine which tables it contains.

Example: SELECT owner, table_name FROM dba_tables WHERE owner='&OWNER'


AND cluster_name='&SEGMENT_NAME';

Solution:

Repair the Block with RMAN Block Media Recovery

OR

Salvage data from all tables in the CLUSTER then recreate the CLUSTER and all its
TABLES. If the OWNER is "SYS" then contact Oracle support with all details.

As the CLUSTER may contain a number of TABLES, it is best to collect information for
each table in the cluster before making a decision.
{Collect TABLE information} {Back to Segment List}

INDEX PARTITION

If the segment is an INDEX PARTITION note the PARTITION NAME and OWNER and
then determine which partition is affected, example:

SELECT *
FROM dba_extents
WHERE file_id = &AFN AND &BL BETWEEN block_id AND block_id + blocks - 1

Solution:

Repair the Block with RMAN Block Media Recovery

OR

Index partitions can be rebuilt using:


ALTER INDEX xxx REBUILD PARTITION ppp;
(take care with the REBUILD option as described in "Recreating Indexes" below)

{Continue} {Back to Segment List}

INDEX

Find out which table the INDEX is on:

SELECT table_owner, table_name


FROM dba_indexes
WHERE owner='&OWNER' AND index_name='&SEGMENT_NAME';

Possible values for CONSTRAINT_TYPE are:


- P The index supports a primary key constraint.
- U The index supports a unique constraint.
If the INDEX supports a PRIMARY KEY constraint (type "P") then check if the primary
key is referenced by any foreign key constraints:

SELECT owner, constraint_name, constraint_type, table_name


FROM dba_constraints
WHERE r_owner='&TABLE_OWNER' AND r_constraint_name='&INDEX_NAME';

Solution:
Repair the Block with RMAN Block Media Recovery

OR

Recreate the Index (with any associated constraint disables/enables). If the OWNER is
"SYS" then contact Oracle support with all details.

{Continue} {Back to Segment List}

ROLLBACK

If the segment is a ROLLBACK segment contact Oracle support as rollback segment


corruptions require special handling.

Solution:

Repair the Block with RMAN Block Media Recovery

{Continue} {Back to Segment List}

TYPE2 UNDO

TYPE2 UNDO is a system managed undo segment which is a special form of rollback
segment. Corruptions in these segments require special handling.

Solution:

Repair the Block with RMAN Block Media Recovery

{Continue} {Back to Segment List}

TABLE PARTITION

If the segment is a TABLE PARTITION note the NAME and OWNER and then determine
which partition is affected thus:

SELECT partition_name
FROM dba_extents
WHERE file_id = &AFN
AND &BL BETWEEN block_id AND block_id + blocks - 1;

then continue below as if the segment was a TABLE segment.

Solution:

Repair the Block with RMAN Block Media Recovery


OR

If all corruptions are in the same partition then one option at this point is to EXCHANGE
the corrupt partition with an empty TABLE - this can allow the application to continue
(without access to the data in the corrupt partition) whilst any good data can then be
extracted from the table.
For other options see the TABLE options below.

{Continue} {Back to Segment List}

TABLE

If the OWNER is "SYS" then contact Oracle support with all details.
The database is likely to require recovery.

For a non-dictionary TABLE or TABLE PARTITIONs find out which iNDEXES exist on the
TABLE:

Example: SELECT owner, index_name, index_type


FROM dba_indexes
WHERE table_owner='&OWNER'
AND table_name='&SEGMENT_NAME';

and determine if there is any PRIMARY key on the table:

Example: SELECT owner, constraint_name, constraint_type, table_name


FROM dba_constraints
WHERE owner='&OWNER'
AND table_name='&SEGMENT_NAME'
AND constraint_type='P';

If there is a primary key then check if this is referenced by any foreign key constraints:

Example: SELECT owner, constraint_name, constraint_type, table_name


FROM dba_constraints
WHERE r_owner='&OWNER'
AND r_constraint_name='&CONSTRAINT_NAME';

Solution:

Repair the Block with RMAN Block Media Recovery

OR

Salvage data from the TABLE (or PARTITION). If the OWNER is "SYS" then contact
Oracle support with all details.

THEN Recreate the table (or partition)


OR Leave the corruption in place
(example: Use DBMS_REPAIR to mark the problem blocks to be skipped)

{Continue} {Back to Segment List}

IOT (INDEX ORGANIZED TABLE)

The corruption in IOT table should be handled in the same way as in a heap or
partitioned table.
The only exception is if the PK (Primary Key) is corrupted.
PK of an IOT table is the table itself and cannot be dropped and recreated.

Solution:

Repair the Block with RMAN Block Media Recovery

OR

Salvage data from the TABLE (or PARTITION) and Methods of extracting data from a
corrupt table AROUND a corrupt block. If the OWNER is "SYS" then contact Oracle
support with all details.

THEN
Recreate the table (or partition)
OR
Leave the corruption in place

(DBMS_REPAIR cannot be used with IOTs)

{Continue} {Back to Segment List}

LOBINDEX

FInd out which table the LOB belongs to:

SELECT table_name, column_name


fROM dba_lobs
wHERE owner='&OWNER'
AND index_name='&SEGMENT_NAME';

It is not possible to rebuild LOB indexes; treat the problem as a corruption on the LOB
column of the affected table.
Get index and constraint information for the table which has the corrupt LOB index using the
SQL in the TABLE section, then return here.

Solution:
Repair the Block with RMAN Block Media Recovery

OR

MOVE the LOB Segment which rebuilds the LOB index:

alter table &table_owner.&table_with_lob move LOB (&&lob_column) store as (tablespace


&tablespace_name);

If the OWNER is "SYS" then contact Oracle support with all details.

{Continue} {Back to Segment List}

LOBSEGMENT

Find out which table the LOB belongs to:

Example:
SELECT table_name, column_name
FROM dba_lobs
WHERE owner='&OWNER'
AND segment_name='&SEGMENT_NAME';

For non-dictionary tables (Owner != SYS) ...

Get index and constraint information for the table which has the corrupt LOB data using
the SQL in the TABLE section, then return here to find details of the exact rows affected.

Finding the exact row which references the corrupt LOB block can be a challenge as the
errors reported do not show any detail about which table row owns the lob entry which is
corrupt.

Typically one can refer to application logs or any SQL_TRACE or 10046 trace of a
session hitting the error (if available) or see if having

event "1578 trace name errorstack level 3"

set in the session helps identify the current SQL/binds/row.

Eg:
ALTER SYSTEM SET EVENTS '1578 trace name errorstack level 3';

Then wait for the error to be hit by the application and find the trace file.

If there are no clues then there is the option to construct a PLSQL block to scan the
problem table row by row extracting the LOB column data which loops until it hits an
error. Such a technique may take a while but it should be possible to get a primary key or
rowid of any row which references a corrupt LOB block.

Example:

set serverout on
exec dbms_output.enable(100000);
declare
error_1578 exception;
pragma exception_init(error_1578,-1578);
n number;
cnt number:=0;
badcnt number:=0;
begin
for cursor_lob in
(select rowid r, &LOB_COLUMN_NAME L from &OWNER..&TABLE_NAME)
loop
begin
n:=dbms_lob.instr(cursor_lob.L,hextoraw('AA25889911'),1,999999) ;
exception
when error_1578 then
dbms_output.put_line('Got ORA-1578 reading LOB at '||cursor_lob.R);
badcnt:=badcnt+1;
end;
cnt:=cnt+1;
end loop;
dbms_output.put_line('Scanned '||cnt||' rows - saw '||badcnt||' errors');
end;
/

another script more generic:

set serverout on
exec dbms_output.enable(100000);
declare
pag number;
len number;
c varchar2(10);
charpp number := 8132/2;

begin
for r in (select rowid rid, dbms_lob.getlength (<your_clob_column>) len
from <your_table_with_clcob_column>) loop
if r.len is not null then
for page in 0..r.len/charpp loop
begin
select dbms_lob.substr (<your_clob_column>, 1, 1+ (page * charpp))
into c
from <your_table_with_clcob_column>
where rowid = r.rid;

exception
when others then
dbms_output.put_line ('Error on rowid ' ||R.rid||' page '||page);
dbms_output.put_line (sqlerrm);
end;
end loop;
end if;
end loop;
end;
/

Solution:

Repair the Block with RMAN Block Media Recovery

OR

Update the lob column with empty_clob/empty_blob to avoid ORA-1578; this is cleaning
the LOB column in the table - note that the LOB is already corrupt and cannot be
accessed:

SQL> set concat off


SQL> update &table_owner.&table_with_lob
set &lob_column = empty_blob()
where rowid in ('list the identified rowids from the table from the plsql above');

Setting the corrupt lob to empty lob will add the blocks formerly mapped to this lob
to the freelist. Eventually when PCTVERSION or RETENTION criteria causes the
space to be salvaged and reused for new data, error ORA-1578 can be seen again in
the same LOB blocks. In that case and after applying the above procedure the lob
segment can be moved to a new segment:

alter table &table_owner.&table_with_lob move LOB (&&lob_column) store as (tablespace


&tablespace_name);

OR Salvage data from the TABLE (and its LOB column/s)


THEN Recreate the table
OR Leave the corruption in place
(It is not possible to use DBMS_REPAIR on LOB segments)

{Continue} {Back to Segment List}

TEMPORARY
If the segment type is TEMPORARY then the corruption does not affect a permanent object.
Check if the tablespace where the problem occurred is being used as a TEMPORARY
tablespace thus:

SELECT count(*) FROM dba_users WHERE


temporary_tablespace='&TABLESPACE_NAME';

Solution:

Use Note 1332088.1

No restore is normally required, although if the disk is suspect and the tablespace
contains useful data then a database recovery of the affected file/s may be wise.

{Continue} {Back to Segment List}

Some Other SEGMENT_TYPE

If the segment type returned is not covered above then contact Oracle support for advice with
all information collected so far.

{Continue} {Back to Segment List}

"NO ROWS SELECTED"

If there appears to be no extent containing the corrupt block then first double check the
figures used in the query. If sure that the file and block are correct and do not appear as
belonging to an object in DBA_EXTENTS then:

 Double check if the file involved is a TEMPFILE.


Note that TEMPFILE file numbers depend on the init.ora parameter DB_FILES so
any changes to this parameter change the absolute file number reported in errors.
 DBA_EXTENTS does not include blocks which are used for local space management
in locally managed tablespaces.
 If the database being queried is from a different point in time to the datafile with the
error then the problem object may have been dropped and so queries against
DBA_EXTENTS may show no rows.
 If the error being investigated was reported by DBVERIFY then DBV checks all
blocks regardless of whether they belong to an object or not. It is possible for a corrupt
block to exist in the datafile but in a block not in use by any segment.

Options:

An error on an UNUSED Oracle block (listed in dba_freespace) can be ignored as Oracle


will format the block once it's reused by any DML operation .
Note that a simple allocation of extent does not format the block, it is also necessary for
the DML operation to modify/use the block.
If you want to manually format the block you can refer How to Format Corrupted Block
Not Part of Any Segment Document 336133.1

If suspected that the block may be a space management block then use
DBMS_SPACE_ADMIN to help check this by running:

exec DBMS_SPACE_ADMIN.TABLESPACE_VERIFY('&TABLESPACE_NAME');

This should write inconsistencies to the trace file but if it encounters a fatally corrupt
block it will report an error like:

ORA-03216: Tablespace/Segment Verification cannot proceed

{Continue} {Back to Segment List}

{go to content}

EVIDENCE

For each corrupt block it is also a good idea to collect the following physical evidence if there
is a need to try and identify the actual cause of the corruption:

1. An operating system HEX dump of the bad block and the block either side of it.

On UNIX:

dd if=&FILENAME bs=&TS_BLOCK_SIZE skip=&BL-1 count=3 of=BL.dd


^^^^^^^^ ^^^^^^^^^^^^^^ ^^^

Example: For BL=1224:

dd if=ts11.dbf bs=4k skip=1223 count=3 of=1223_1225.dd

On VMS:

DUMP/BLOCKS=(start:XXXX,end:YYYY)/out=dump.out &FILENAME

Where XXXX=Operating system block number (in 512 byte chunks)


To calculate this multiply the block number reported by '&TS_BLOCK_SIZE/512'

2. If the database is in ARCHIVELOG mode, make a safe copy of the archived log files
around the time of the error, and preferably for a few hours before the error was
reported. Also secure any backup/s of the problem datafile from before the errors as
the before image PLUS redo can help point towards a cause.
(DBV can often be used to check if the problem exists in a backup copy of a file). The
ideal scenario is to have a datafile backup image which does not have any corruption
and all the redo from that point in time up to and just past the time when the
corruption is first reported.

3. Obtain an Oracle dump of the problem block/s:

ALTER SYSTEM DUMP DATAFILE '&FILENAME' BLOCK &BL;

(The output will go to a tracefile in the USER_DUMP_DEST).

{Continue} {Back to Segment List}

{go to content}

(5) Choosing A Recovery Option.

The best recovery option now depends on the objects affected. The notes in Section (3) above
should have highlighted the main options available for each affected object. The actual
recovery method chosen may include a mix or one or more methods thus:

Recovery Required ?

If the error is in a TEMPORARY tablespace, or is in a block which is no longer part of any


database object then no action is required, although it may be wise to relocate the problem
tablespace to a different storage device.

See Warnings

Is Complete Recovery an option ?

In order for complete recovery to be an option the following must be true:

 The database is in ARCHIVELOG mode

(The "ARCHIVE LOG LIST" command shows Archivelog Mode)

 There is a good backup of the affected files. Note that in some cases, the corruption
may have been present, but undetected, for a long period of time. If the most recent
datafile backup still contains the corruption, try an earlier backup as long as all the
necessary ARCHIVELOGS are available.
(DBV with the START / END options can be used to check if specific block/s in a
restored copy of a backup file are corrupt)
 All ARCHIVELOGS are available from the time of the backup to the current point in
time
 The current online log/s are available and intact
 The errors are NOT due to recovery through a NOLOGGING operation

When the above criteria are satisfied then complete recovery is usually the preferred
option

*BUT NOTE*

a. If the rollback of a transaction has seen a corrupt block on an object other than the
rollback segment itself then UNDO may have been discarded. In this case the indexes
may need to be rebuilt; check data integrity AFTER the recovery completes.

b. If the files to be recovered contain data from NOLOGGING operations performed


since the last backup then those blocks will be marked corrupt if datafile or database
recovery is used. In some cases this can put the problem in a worse scenario than the
current position.

If database recovery has already been performed and the corruption is still there then
either all the backups contain the corruption, the underlying fault is still present or the
problem is replaying through redo. In these cases choose some other recovery option.

See "(4A) Complete Recovery" for complete recovery steps.

{go to content}

Can the object be Dropped or Re-created without needing to extract any data from the
object itself?

It may be possible to lose the object, or to recreate it from a script / recent export. Once an
object is dropped then blocks in that object are marked as "free" and will be re-formatted
when the block gets allocated to a new object. It is advisable to RENAME rather than DROP
a table unless absolutely sure that the data in it is not needed.

In the case of a table partition then only the affected partition needs to be dropped.

example: ALTER TABLE ... DROP PARTITION ...

If the corruption affects the partition segment header, or the file containing the partition
header is offline, then DROP PARTITION may fail. In this case it may still be possible
to drop the partition by first exchanging it with a table of the same definition.

example: ALTER TABLE .. EXCHANGE PARTITION .. WITH TABLE ..;

The most common object which can be re-created is an index.


Always address TABLE corruptions before INDEX problems on a table.
See "(4B) Recreating Indexes" for more details.

For any segment, a quick way to extract the DDL for an object, when the absolute file
number and block number of the corrupt block are known, is:
set long 64000
select dbms_metadata.get_ddl(segment_type, segment_name, owner)
FROM dba_extents
WHERE file_id=&AFN
AND &BL BETWEEN block_id AND block_id + blocks -1;

{go to content}

Is it required to salvage data before recreating the object ?

If the problem is on a critical application table which is regularly updated then it may be
required to salvage as much data from the table as possible, then recreate the table.

See "(5C) Salvaging Data from Tables" for more details.

{go to content}

Is it acceptable to leave the corruption in place for the moment?

In some cases the best immediate option may be to leave the corruption in place and isolate it
from application access.

See "(5D) Leaving the Corruption In Place" for more details.

{go to content}

Last Options

Are any of the following possible ?

Recovery to an old point-in-time (via point in time recovery)of either the database or
tablespace point in time recovery

OR Restore of a COLD backup from before the corruption

OR Use of an existing export file

See "(5E) Last Options" for more details.

{go to content}

(5A) Complete Recovery

If the database is in ARCHIVELOG mode and there is a good backup of the affected files
then recovery is usually the preferred option.
This is not GUARANTEED to clear a problem, but is effective for the majority of corruption
issues. If recovery re-introduces the problem then return to the list of options above and
choose another method.
If using Oracle9i (or higher) then it may be possible to perform block level recovery using the
RMAN BLOCKRECOVER command.
If using an earlier Oracle release then either perform a Datafile Recovery (which can be done
while the rest of the database is still up and running), or Database Recovery (which requires
the database to be taken down) .

If using Oracle 11g (or higher) then it may be possible to use Data Recovery Advisor.

RMAN Block Media Recovery

As of Oracle9i RMAN allows individual blocks to be recovered whilst the rest of the database
(including other blocks in the datafile) are available for normal access. Note that block level
recovery can only be used to recover a block fully to the current point in time.

Example:

Consider having an ORA-1578 on file #6 block #30 which is likely due to a media
corruption problem and there is a good cold backup image of that file which has been
restored to '.../RESTORE/filename.dbf'.
Provided all archivelogs exist (in the default location) then RMAN can be used to
perform a Block Media Recovery (BMR) using a command sequence like:

rman nocatalog
connect target
catalog datafilecopy '.../RESTORE/filename.dbf';
run {blockrecover datafile 6 block 30;}

This uses the registered datafile backup image and any required archivelogs to perform
block recovery of just the one problem block to current point in time.

See the documentation for full details of the RMAN BLOCKRECOVER command and
limitations. See Note 144911.1 for an example and Note 342972.1 to perform Block
Media Recovery when backups are not taken by RMAN.

{go to content}

Datafile Recovery

Datafile recovery of a file involves the following steps. If there are several files repeat the
steps for each file or see "Database Recovery" below. These steps can be used if the database
is either OPEN or MOUNTED.

OFFLINE the affected data file:

ALTER DATABASE DATAFILE '&datafile_name' OFFLINE;

Copy it to a safe location (in case the backup is bad)


Restore the latest backup of the file onto a GOOD disk

Check the restored file for obvious corruptions with DBVERIFY


For details of using DBVERIFY see Note:35512.1

Assuming the restored file is OK, then RENAME the datafile to the NEW location (if
different from the old location)

ALTER DATABASE RENAME FILE '&old_name' TO '&new_name';

Recover the datafile:

RECOVER DATAFILE '&datafile_name';

Online the file/s

ALTER DATABASE DATAFILE '&datafile_name' ONLINE;

{Continue}

{go to content}

Database Recovery

Database recovery generally involves the following steps:

Shutdown (Immediate or Abort)

Copy the current copy of all files to be recovered to a safe location

Restore the backup files to a GOOD disk location


DO NOT RESTORE THE CONTROL FILES or ONLINE REDO LOG FILES

Check restored files with DBVERIFY. For details of using DBVERIFY see
Note:35512.1

STARTUP MOUNT

Rename any relocated files:

ALTER DATABASE RENAME FILE '&old_name' TO '&new_name';

Ensure all required files are online:

ALTER DATABASE DATAFILE '&datafile_name' ONLINE;

Recover the database:

RECOVER DATABASE
Open the database:

ALTER DATABASE OPEN;

{go to content}

After a Complete Recovery

Once a complete recovery has been performed it is advisable to check the database before
allowing it to be used:

Run:

"ANALYZE &table_name VALIDATE STRUCTURE CASCADE"

against each problem object to check for table/index mis-matches.

If there has been any UNDO discarded this may show a mismatch requiring indexes to be re-
created.

Check the logical integrity of data in the table at the application level.

{go to content}

(5B) Recreating Indexes

If the corrupt object is a user INDEX, simply drop and re-create it PROVIDED the underlying
table is not also corrupt.
If the underlying table is also corrupt it is advisable to sort out the TABLE before recreating
any indexes.

If the information collected shows that the index has dependent FOREIGN KEY constraints
then something like this may be needed:

 for each foreign key

ALTER TABLE &table_name DISABLE CONSTRAINT &pk_constraint;

 Rebuild the primary key using

ALTER TABLE &table_name DISABLE CONSTRAINT &pk_constraint;


DROP INDEX &index_name;
CREATE INDEX &index_name .. with appropriate storage clause
ALTER TABLE &table_name ENABLE CONSTRAINT &pk_constraint;

 Enable the foreign key constraints

ALTER TABLE &child_table ENABLE CONSTRAINT &fk_constraint;


For an index partition:

ALTER INDEX ... REBUILD PARTITION ...;

Notes:

a. It is important not to REBUILD a non-partitioned corrupt index using an "ALTER


INDEX .. REBUILD" command as this will usually try to build the new index from
the existing index segment, which contains a corrupt block.

"ALTER INDEX ... REBUILD ONLINE" and "ALTER INDEX ... REBUILD
PARTITION ..."

do not build the new index from the old index segment and so can be used.

b. Create INDEX can use the data from an existing index if the new index is a sub-set of
the columns in the existing index. Hence if there are 2 corrupt indexes drop them
BOTH before re-creating them.
c. Be sure to use the correct storage details when recreating indexes.

{go to content}

(5C) Salvaging Data from Tables

If the corrupt object is a TABLE or CLUSTER or LOBSEGMENT then it must be understood


that the data within the corrupt block is lost.
Some of the data may be salvageable from a HEX dump of the block, or from columns
covered by indexes.

Important:
As it may be required to salvage data in the corrupt block from the indexes it is a good idea
NOT to drop any existing index until any required data has been extracted

There are many ways to get data out of a table which contains a corrupt block. Choose the
most appropriate method as detailed below. The aim of these methods is to extract as much
data as possible from the table blocks which can be accessed. It is usually a good idea to
RENAME the corrupt table so that the new object can be created with the correct name.

Example: RENAME <emp> TO <emp_corrupt>;

Methods of extracting data from a corrupt table AROUND a corrupt block

1. From Oracle 7.2 onwards, it is possible to SKIP over corrupt blocks in a table.
This is the simplest option to extract table data and is discussed in:

Extracting data using DBMS_REPAIR.SKIP_CORRUPT_BLOCKS or Event 10231


Note:33405.1
If the corruption is in an IOT overflow segment then the same method should be
followed, but using event 10233 together with a full index scan or in 10.2.0.4 /
10.2.0.5 use Event:43810; in 11g onward use parameter
_index_scan_check_skip_corrupt=TRUE. See also section "SKIP ORA-600 in IOT"
in Note:1527738.1

Note that this method can only be used if the block "wrapper" is marked corrupt.
Example: If the block reports ORA-1578.
If the problem is an ORA-600 or other error which does not report and ORA-1578
error then it is often possible to use DBMS_REPAIR to mark the problem blocks in a
table as "soft corrupt" such that they will then signal ORA-1578 when accessed which
then allows to use DBMS_REPAIR.SKIP_CORRUPT_BLOCKS

Note: Any blocks which are marked corrupt by the "FIX_CORRUPT_BLOCKS"


procedure will also be marked corrupt following any restore / recover operation
through the time of the FIX_CORRUPT_BLOCKS.

Full details of using DBMS_REPAIR for this can be found in the documentation but
in summary the steps are:

- Use DBMS_REPAIR.ADMIN_TABLES to create the admin tables


- Use DBMS_REPAIR.CHECK_OBJECT to find problem blocks
- Get any good data out of problem blocks before corrupting them.
- Use DBMS_REPAIR.FIX_CORRUPT_BLOCKS to mark the found problem
blocks as corrupt so that they will then signal ORA-1578
- If required use DBMS_REPAIR.SKIP_CORRUPT_BLOCKS to skip corrupt
blocks on the table.

2. From Oracle 7.1 onwards use a ROWID range scan.


The syntax for this is a little tricky but it is possible to select around a corrupt block
using a ROWID hint.
As the format of ROWIDs changed between Oracle7 and Oracle8 there are 2 articles
which discuss this:

Using ROWID Range Scans to extract data in Oracle8 and higher Note:61685.1
Using ROWID Range Scans to extract data in Oracle7 Note:34371.1

3. If there is a primary key select table data via this index.


It may also be possible to select some of data via any other index.
This can be slow and time consuming and is only normally needed for Oracle 7.0
releases. This method is described in Note:34371.1 (which also describes the ROWID
range scans)
4. There are various salvage programs / PLSQL scripts which can be used to salvage data
from a table. These can take longer to set up and use than the above methods but can
often cope with various kinds of corruption besides an ORA-1578.
As these methods typically require much hand-holding from support then some of
these articles may not be visible to customers.

These require Pro*C to be available and an understanding of how to build Pro*C


executables:
SALVAGE.PC for Oracle8.1 Note:97357.1

These requires manual interaction:

SALVAGE.SQL for Oracle7/8 Note:2064553.4


SALVAGE.SQL for Oracle7/8 Note:28308.1
The following is only possible in Oracle8i and 9i. The aim is to mark the block as
corrupt and then use the SKIP_CORRUPT table attribute to extract the table data:

Use DBMS_REPAIR to mark the block corrupt Note:68013.1


For corruption in a LONG column: Recreating a Table with a corruption in a LONG
Note:876493.1

{go to content}

Methods of extracting data from a table with a corrupt LOBSEGMENT block

It is not possible to used DBMS_REPAIR on LOB segments.

If the corrupt LOB block is NOT referenced by any row in the table then it should be possible
to CREATE TABLE as SELECT (CTAS) or export / drop / import the table as is.

If the corrupt LOB block is referenced by a row then it should be possible to select or export
with a WHERE predicate that excludes the problem row/s.

WARNING:
It is possible to update the LOB column value of a problem row to NULL (empty_blob() /
empty_clob) which will clear the ORA-1578 on SELECT operations *BUT* the corrupt
block will then be waiting to be reclaimed and will eventually signal an ORA-1578 on
attempts to get a new LOB for INSERT or UPDATE operations on any row which can be a
worse situation than having a corruption on a known row.
Hence the LOB column can be set to NULL only if intended to immediately recreate the
table.

Setting the corrupt lob to empty lob will add the blocks formerly mapped to this lob to the
freelist. Eventually when PCTVERSION or RETENTION criteria causes the space to be
salvaged and reused for new data, error ORA-1578 can be seen again in the same LOB
blocks. In that case and after applying the above procedure the lob segment can be moved to
a new segment:
alter table &table_owner.&table_with_lob move LOB (&&lob_column) store as
(tablespace &tablespace_name);

Extracting data from the corrupt block itself

As the corrupt block itself is "corrupt" then any data extracted from the block should be
treated as suspect. The main methods of getting the rows from the corrupt block itself are:

 Use any existing indexes on the table to extract data for columns covered by the index
where the ROWID falls inside the corrupt block. This is described towards the end of
the ROWID range scan articles mentioned above:
For Oracle8/8i see Note:61685.1
For Oracle7 see Note:34371.1
 It may be possible to use LogMiner on the redo stream to find the original
inserts/updates which loaded the data to the problem block. The main factor here is
WHEN the data was actually put in the block.
example: row 2 may have been inserted yesterday but row 1 may have been inserted 5
years ago.

{go to content}

(5D) Leaving A Corruption In Place

It is possible to leave a corruption in place and just accept the errors reported, or prevent
access to the problem rows at an application level.

Example: If the problem block / row is in a child table then it may be possible at application
level to prevent access via the parent row/s such that the child rows are never accessed. (Be
wary of cascade type constraints though)

This may not help with reports and other jobs which access data in bulk so it may also be
desirable to use the DBMS_REPAIR options shown in 4C above to prevent the block/s
erroring when accessed. Marking a corruption like this and leaving it around may give a short
term solution allowing full data salvage and/or recovery to be attempted at scheduled outage,
or allowing time to check other recovery options on a second (clone) database. Note though
that marking a block corrupt with DBMS_REPAIR.FIX_CORRUPT_BLOCKS will cause the
marked block/s to also be corrupt after recovery through the time that
FIX_CORRUPT_BLOCKS was executed.

Leaving a corruption may be sensible for data which rapidly ages and is subsequently purged
(example: In a date partitioned table where older partitions are dropped at some point).

Leaving Corruptions in LOB segments

At application level it can be possible to leave a corrupt LOB column in place until such time
as the table can be rebuilt.

One way to ensure to not hit the "WARNING" scenario above is to ensure that the table is
only ever accessed via a view which includes a WHERE predicate to prevent the problem
row/s from being seen.
example: Consider table MYTAB( a number primary key, b clob ) has one or more rows
pointing at corrupt LOB data.

ALTER TABLE MYTAB ADD ( BAD VARCHAR2(1) );


CREATE VIEW MYVIEW AS SELECT a,b FROM MYTAB WHERE BAD is null;

Set BAD='Y' for any problem row/s

If MYTAB is only accessed via MYVIEW, the row will never be visible and cannot be
updated; keeping the corrupt entry isolated until it can be dealt with.

Clearly this example is more of a design time solution but some applications may already
have similar mechanisms and may only access data via a view (or via an RLS policy) giving
some option/s to hide the problem rows.

{go to content}

Warnings when Leaving a Corruption in Place

Whilst it is possible to leave a corruption in place it should be noted that the corrupt blocks
will still show up in runs of DBVERIFY, in RMAN backup warnings / errors etc..

It is important to make a careful record of any corruption expected to be seen from these
tools, particularly any blocks expected to be skipped with RMAN (having MAX_CORRUPT
set) and be sure to remove any "acceptance" of the errors once the corruptions have been
cleared.

Example: Consider that a corrupt block has been handled by leaving the corruption in place
and avoiding the problem row/s at application level.
RMAN may be configured to allow the corruptions during backup.
The table is then recreated at a later date during some table reorganisation.
If RMAN is not updated to reflect that no errors should now be expected then RMAN may
ignore some other corruption which occurs at a later time.

It is also important to note that leaving corrupt blocks around in table segments can lead to
mismatched results from queries; different results can occur for tables with SKIP_CORRUPT
set depending on whether an index scan or table access occurs. Other reports may just error .

Note that leaving a corruption in place but marking the block with
DBMS_REPAIR.FIX_CORRUPT_BLOCKS writes redo to corrupt the block which may
limit subsequent recovery options.

{go to content}

(5E) Last Options

If there is a STANDBY setup (physical or logical) then check that first.


Whatever sort of block the problem occurred on, one possible option is to recover the
database, or problem tablespace, to a point in time BEFORE the corruption appeared. The
difficulty with this option is that it is not always possible to know when the problem first
appeared.

DBVERIFY can be often be used to check a restored file for corruptions.


For details of using DBVERIFY see Note:35512.1 . In particular the START/END DBV
options can be used to give a quick first test of whether the problem block itself is bad on a
restored backup image.
This section outlines some final options available for recovering.
If arrived here then one or more of the following have happened:

 A "vital" datafile has been lost (or have a corruption on it) and have no good
backup of the problem file/s (without the corruption)
 The database is either not in ARCHIVELOG mode OR do not have all
archivelogs since the file was first created
 Complete recovery keeps reintroducing the problem

Last chance:

Please note that if all copies of a datafile have been lost but DO still have the
ARCHIVE logs from when the file was first created it is still possible to recover the
file.

Example:
ALTER DATABASE CREATE DATAFILE '....' [as '...'] ;
RECOVER DATAFILE '....'
ALTER DATABASE DATAFILE '....' ONLINE;

If currently in this scenario try to recover the datafile using these steps before
proceeding below.

If reached this line there are no options left to recover to the current point in time. It
is advisable to shutdown the instance and take a BACKUP of the current database
NOW in order to provide a fall-back position if the chosen course of action fails.
(Example: if identified that the backup is bad).

Some outline options available are:

Revert to an old COLD backup


- example: If in NOARCHIVELOG mode

Set up a clone database from a COLD backup and extract (export) the problem
table/s or transport the problem tablespace

Point in time recovery to an older point in time that is consistent:

 requires a good backup and any necessary archive logs


 ALL files have to be restored and the whole DB rolled forward to a suitable point in
time.
 It may be possible to do the point in time recovery in a clone database and then
transport the problem tablespace to the problem database, or export / import the
problem table from the clone to the problem database .

Tablespace point in time recovery


- It may be possible to perform a point in time recovery of the affected tablespace only.
There are many notes describing tablespace point in time recovery such as
Note:223543.1.

Rebuild of DB from some logical export / copy


- Requires there to already be a good logical backup of the database
- NB: The database has to be RECREATED for this option.
- As with other options the rebuild could be in a clone database just to get a good image
of the problem table/s.

If a good backup is available then rolling forwards with DB_BLOCK_CHECKING=TRUE


can help find the first point in time where something started to go wrong.
It is not generally necessary to take the problem database down while investigating the
recovery options.

Example: Restore the system tablespace and problem tablespace datafiles only to a totally
different location and/or machine as a different instance to investigate how far the database
can roll forward etc..
As of Oracle9i the "Trial Recovery" option can be used to save having to keep restoring a
backup while looking into possible options.

{go to content}

References
NOTE:1088018.1 - Master Note for Handling Oracle Database Corruption Issues
NOTE:1543698.2 - This document has been replaced with more recent information on this
topic. Please refer to more recent documentation.

You might also like