A Troubleshooting Data Guard: Oracle® Data Guard Concepts and Administration 10g Release 2 (10.2)
Common Problems
Log File Destination Failures
Handling Logical Standby Database Failures
Problems Switching Over to a Standby Database
What to Do If SQL Apply Stops
Network Tuning for Redo Data Transmission
Slow Disk Performance on Standby Databases
Log Files Must Match to Avoid Primary Database Shutdown
Troubleshooting a Logical Standby Database
A.1 Common Problems
If you attempt to use any of these statements on the standby database, an error is
returned. For example:
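A hedged illustration, assuming the statement in question is an attempt to rename a datafile while the standby database's STANDBY_FILE_MANAGEMENT parameter is set to AUTO; the file names are hypothetical and the exact accompanying error text can vary:

SQL> ALTER DATABASE RENAME FILE '/disk1/oradata/payroll/tbs_x.dbf' TO '/disk1/oradata/payroll/tbs_y.dbf';
ERROR at line 1:
ORA-01511: error in renaming log/data files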
A.1.3 Standby Database Does Not Receive Redo Data from the Primary Database
If the standby site is not receiving redo data, query the V$ARCHIVE_DEST view and check
for error messages. For example, enter the following query:
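A minimal form of such a query:

SQL> SELECT DEST_ID, STATUS, DESTINATION, ERROR
     FROM V$ARCHIVE_DEST;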
If the output of the query does not help you, check the following list of possible issues.
If any of the following conditions exist, redo transport services will fail to transmit redo
data to the standby database:
- The service name for the standby instance is not configured correctly in the tnsnames.ora file for the primary database.
- The Oracle Net service name specified by the LOG_ARCHIVE_DEST_n parameter for the primary database is incorrect.
- The LOG_ARCHIVE_DEST_STATE_n parameter for the standby database is not set to the value ENABLE.
- The listener.ora file has not been configured correctly for the standby database.
- The listener is not started at the standby site.
- The standby instance is not started.
- You have added a standby archiving destination to the primary SPFILE or text initialization parameter file, but have not yet enabled the change.
- The databases in the Data Guard configuration are not all using a password file, or the SYS password contained in the password file is not identical on all systems.
- You used an invalid backup as the basis for the standby database (for example, you used a backup from the wrong database, or did not create the standby control file using the correct method).
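For the conditions involving the destination state or a destination that was added but not yet enabled, a minimal sketch of enabling it dynamically (the destination number 2 is hypothetical):

SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=ENABLE SCOPE=BOTH;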
You cannot mount the standby database if the standby control file was not created with the ALTER DATABASE CREATE [LOGICAL] STANDBY CONTROLFILE ... statement or RMAN command. You cannot use the following types of control file backups:
- An operating system-created backup
- A backup created using an ALTER DATABASE statement without the PHYSICAL or LOGICAL keyword
A.3 Handling Logical Standby Database Failures

An important tool for handling logical standby database failures is the DBMS_LOGSTDBY.SKIP_ERROR procedure. Depending on how critical a table is, you might want to ignore failures for that table or for specific DDL, or associate a stored procedure with a filter so that runtime determinations can be made. Taking one of these actions prevents SQL Apply from stopping. Later, you can query the DBA_LOGSTDBY_EVENTS view to find and correct any problems that exist. See Oracle Database PL/SQL Packages and Types Reference for more information about using the DBMS_LOGSTDBY package with PL/SQL callout procedures.
A.4 Problems Switching Over to a Standby Database

In most cases, following the steps described in Chapter 7 will result in a successful switchover. However, if the switchover is unsuccessful, the following sections may help you to resolve the problem:
If the switchover does not complete successfully, you can query the SEQUENCE# column in
the V$ARCHIVED_LOG view to see if the last redo data transmitted from the original primary
database was applied on the standby database. If the last redo data was not
transmitted to the standby database, you can manually copy the archived redo log file
containing the redo data from the original primary database to the old standby
database and register it with the SQL ALTER DATABASE REGISTER
LOGFILE file_specification statement. If you then start log apply services, the archived
redo log file will be applied automatically. Query the SWITCHOVER_STATUS column in
the V$DATABASE view. The TO PRIMARY value in the SWITCHOVER_STATUS column verifies that switchover to the primary role is now possible.
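A minimal sketch of these checks on the old standby database (the archived log file name is hypothetical):

SQL> SELECT SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG ORDER BY SEQUENCE#;
SQL> ALTER DATABASE REGISTER LOGFILE '/standby/arch/arc_0000000100.arc';
SQL> SELECT SWITCHOVER_STATUS FROM V$DATABASE;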
To continue with the switchover, follow the instructions in Section 7.2.1 for physical
standby databases or Section 7.3.1 for logical standby databases, and try again to
switch the target standby database to the primary role.
When sessions are active, an attempt to switch over fails with the following error message:
ORA-01093 "Alter database close only permitted with no sessions connected"
Action: Query the V$SESSION view to determine which processes are causing the error.
For example:
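A sketch of such a query, restricted to user sessions other than the one issuing it (the V$MYSTAT subquery excludes the current session):

SQL> SELECT SID, PROCESS, PROGRAM FROM V$SESSION
     WHERE TYPE = 'USER'
     AND SID <> (SELECT DISTINCT SID FROM V$MYSTAT);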
Do not modify the corresponding dynamic parameters (see Table A-1) in your initialization parameter file; change them dynamically only. After you shut down the instance and restart it once the switchover completes, the parameters revert to their original values. This applies to both primary and physical standby databases.
Table A-1 summarizes the common processes that prevent switchover and what
corrective action you need to take.
Type of Process: CJQ0
Process Description: Job Queue Scheduler Process
Corrective Action: Change the JOB_QUEUE_PROCESSES dynamic parameter to the value 0. The change will take effect immediately without having to restart the instance.

Type of Process: QMN0
Process Description: Advanced Queue Time Manager
Corrective Action: Change the AQ_TM_PROCESSES dynamic parameter to the value 0. The change will take effect immediately without having to restart the instance.

Type of Process: DBSNMP
Process Description: Oracle Enterprise Manager Management Agent
Corrective Action: Issue the emctl stop agent command from the operating system prompt.
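For reference, the first two corrective actions in Table A-1 can be issued dynamically, for example:

SQL> ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
SQL> ALTER SYSTEM SET AQ_TM_PROCESSES=0;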
If the switchover fails and returns the error ORA-01093 "Alter database close only permitted with no sessions connected", it is usually because the ALTER DATABASE COMMIT TO SWITCHOVER statement implicitly closed the database, and the close failed because other user sessions were still connected to the database.
If you receive this error, disconnect any user sessions that are still connected to the
database. To do this, query the V$SESSION fixed view to see which sessions are still
active as shown in the following example:
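A minimal form of such a query:

SQL> SELECT SID, PROCESS, PROGRAM FROM V$SESSION;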
In this example, the query returns nine rows. The first seven sessions are all Oracle Database background processes. Of the two SQL*Plus sessions, one is the current SQL*Plus session issuing the query, and the other is an extra session that should be disconnected before you re-attempt the switchover.
Suppose the standby database and the primary database reside on the same site. After
both the ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY and the ALTER DATABASE
COMMIT TO SWITCHOVER TO PRIMARY statements are successfully executed, shut down and
restart the physical standby database and the primary database.
Note:
It is not necessary to shut down and restart the physical standby database if it has not
been opened read-only since the instance was started.
However, the startup of the second database fails with an ORA-01102 error, "cannot mount database in EXCLUSIVE mode."
This could happen during the switchover if you did not set the DB_UNIQUE_NAME parameter
in the initialization parameter file that is used by the standby database (that is, the
original primary database). If the DB_UNIQUE_NAME parameter of the standby database is
not set, the standby and the primary databases both use the same mount lock and
cause the ORA-01102 error during the startup of the second database.
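A minimal sketch of the fix in the standby database's initialization parameter file, using a hypothetical unique name (the instance must be restarted for the change to take effect because DB_UNIQUE_NAME is a static parameter):

# Initialization parameter file of the standby database (the original primary)
DB_UNIQUE_NAME=BOSTON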
The archived redo log files are not applied to the new standby database after the
switchover.
This might happen because some environment or initialization parameters were not
properly set after the switchover.
Action: Verify the archive destination settings on the new primary database (for example, by querying the V$ARCHIVE_DEST view). If you do not see an entry corresponding to the standby site, you need to set the LOG_ARCHIVE_DEST_n and LOG_ARCHIVE_DEST_STATE_n initialization parameters.
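A hedged sketch of the check and fix on the new primary database; the destination number, service name, and VALID_FOR setting are illustrative:

SQL> SELECT DEST_ID, STATUS, DESTINATION FROM V$ARCHIVE_DEST;
SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='SERVICE=boston VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)' SCOPE=BOTH;
SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=ENABLE SCOPE=BOTH;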
For physical standby databases in situations where an error occurred and it is not
possible to continue with the switchover, it might still be possible to revert the new
physical standby database back to the primary role by using the following steps:
1. Connect to the new standby database (old primary), and issue the following statement to convert it back to the primary role:

   SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;

   If this statement is successful, then shut down (if necessary) and restart the database. Once restarted, the database will be running in the primary database role, and you do not need to perform any more steps.

2. When the switchover to change the role from primary to physical standby was initiated, a trace file was written in the log directory. This trace file contains the SQL statements required to re-create the original primary control file. Locate the trace file and extract the SQL statements into a temporary file. Execute the temporary file from SQL*Plus. This will revert the new standby database back to the primary role.

3. Shut down the original physical standby database.

4. Create a new standby control file. This is necessary to resynchronize the primary database and physical standby database. Copy the physical standby control file to the original physical standby system. Section 3.2.2 describes how to create a physical standby control file.

5. Restart the original physical standby instance.

If this procedure is successful and archive gap management is enabled, the FAL processes will start and re-archive any missing archived redo log files to the physical standby database. Force a log switch on the primary database and examine the alert logs on both the primary database and physical standby database to ensure the archived redo log file sequence numbers are correct.
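A sketch of step 4 (creating the standby control file on the new primary) and of forcing a log switch once the original standby instance is restarted; the control file name is hypothetical:

SQL> ALTER DATABASE CREATE STANDBY CONTROLFILE AS '/tmp/standby_control.ctl';
SQL> ALTER SYSTEM SWITCH LOGFILE;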
A.5 What to Do If SQL Apply Stops

Log apply services cannot apply unsupported DML statements, DDL statements, and Oracle supplied packages to a logical standby database running SQL Apply.
When an unsupported statement or package is encountered, SQL Apply stops. You can
take the actions described in Table A-2 to correct the situation and start SQL Apply on
the logical standby database again.
If you suspect an unsupported statement or Oracle supplied package was encountered, then:
Find the last statement in the DBA_LOGSTDBY_EVENTS view. This will indicate the statement and error that caused SQL Apply to fail. If an incorrect SQL statement caused SQL Apply to fail, transaction information, as well as the statement and error information, can be viewed. The transaction information can be used with LogMiner tools to understand the cause of the problem.

If an error requiring database management occurred (such as running out of space in a particular tablespace), then:
Fix the problem and resume SQL Apply using the ALTER DATABASE START LOGICAL STANDBY APPLY statement.

If an error occurred because a SQL statement was entered incorrectly (such as an incorrect standby database filename being entered in a tablespace statement), then:
Enter the correct SQL statement and use the DBMS_LOGSTDBY.SKIP_TRANSACTION procedure to ensure the incorrect statement is ignored the next time SQL Apply is run. Then, restart SQL Apply using the ALTER DATABASE START LOGICAL STANDBY APPLY statement.

If an error occurred because skip parameters were incorrectly set up (such as specifying that all DML for a given table be skipped, but CREATE, ALTER, and DROP TABLE statements were not specified to be skipped), then:
Issue the DBMS_LOGSTDBY.SKIP('TABLE','schema_name','table_name',null) procedure, then restart SQL Apply.
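For instance, for the last case in Table A-2, a minimal sketch using a hypothetical schema and table name:

SQL> EXECUTE DBMS_LOGSTDBY.SKIP('TABLE','SCOTT','MYTABLE',null);
SQL> ALTER DATABASE START LOGICAL STANDBY APPLY;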
A.6 Network Tuning for Redo Data Transmission

The following example shows a database initialization parameter file segment that defines a remote destination netserv:
LOG_ARCHIVE_DEST_3='SERVICE=netserv'
The following example shows the definition of that service name in the tnsnames.ora file:
netserv=(DESCRIPTION=(SDU=32768)(ADDRESS=(PROTOCOL=tcp)(HOST=host) (PORT=1521))
(CONNECT_DATA=(SERVICE_NAME=srvc)))
The following example shows the corresponding entries in the listener.ora file:
LISTENER=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)
(HOST=host)(PORT=1521))))
SID_LIST_LISTENER=(SID_LIST=(SID_DESC=(SDU=32768)(SID_NAME=sid)
(GLOBALDBNAME=srvc)(ORACLE_HOME=/oracle)))
If you archive to a remote site using a high-latency or high-bandwidth network link, you
can improve performance by using
the SQLNET.SEND_BUF_SIZE and SQLNET.RECV_BUF_SIZE Oracle Net profile parameters to
increase the size of the network send and receive I/O buffers.
See Oracle Database Net Services Administrator's Guide .
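For example, a hedged sketch of sqlnet.ora entries on the primary and standby systems; the 64 KB values are illustrative only:

SQLNET.SEND_BUF_SIZE=65536
SQLNET.RECV_BUF_SIZE=65536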
A.7 Slow Disk Performance on Standby Databases

If asynchronous I/O on the file system itself is showing performance problems, try mounting the file system using the Direct I/O option or setting the FILESYSTEMIO_OPTIONS=SETALL initialization parameter. The maximum I/O size setting is 1 MB.
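A minimal sketch of the parameter change; because FILESYSTEMIO_OPTIONS is a static parameter, set it in the SPFILE and restart the instance:

SQL> ALTER SYSTEM SET FILESYSTEMIO_OPTIONS=SETALL SCOPE=SPFILE;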
A.8 Log Files Must Match to Avoid Primary Database Shutdown

If you have configured a standby redo log on one or more standby databases in the configuration, ensure the size of the standby redo log files on each standby database exactly matches the size of the online redo log files on the primary database.
At log switch time, if there are no available standby redo log files that match the size of the new current online redo log file on the primary database, then one of the following occurs:
- The primary database will shut down if it is operating in maximum protection mode, or
- The RFS process on the standby database will create an archived redo log file on the standby database and write the following message in the alert log:
  No standby log files of size <#> blocks available.
For example, if the primary database uses two online redo log groups whose log files
are 100K, then the standby database should have 3 standby redo log groups with log
file sizes of 100K.
Also, whenever you add a redo log group to the primary database, you must add a
corresponding standby redo log group to the standby database. This reduces the
probability that the primary database will be adversely affected because a standby redo
log file of the required size is not available at log switch time.
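For instance, a sketch of adding a standby redo log group whose file size matches a 100K online redo log; the group number and file name are hypothetical:

SQL> ALTER DATABASE ADD STANDBY LOGFILE GROUP 4 ('/oracle/dbs/slog4.rdo') SIZE 100K;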
A.9.1 Recovering from Errors
Logical standby databases maintain user tables, sequences, and jobs. To maintain other
objects, you must reissue the DDL statements seen in the redo data stream.
DDL statements are executed the same way on the primary database and the logical
standby database. If the underlying file structure is the same on both databases, the
DDL will execute on the standby database as expected.
If an error was caused by a DDL transaction containing a file specification that did not
match in the logical standby database environment, perform the following steps to fix
the problem:
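A hedged sketch of a typical sequence, assuming the failing DDL added a datafile whose path does not exist on the standby; the tablespace name and file specification are hypothetical:

SQL> ALTER SESSION DISABLE GUARD;
SQL> ALTER TABLESPACE users ADD DATAFILE '/stdby/users02.dbf' SIZE 100M;
SQL> ALTER SESSION ENABLE GUARD;
SQL> ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE SKIP FAILED TRANSACTION;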
In some situations, the problem that caused the transaction to fail can be corrected and
SQL Apply restarted without skipping the transaction. An example of this might be when
available space is exhausted. (Do not let the primary and logical standby databases
diverge when skipping DDL transactions. If possible, you should manually execute a
compensating transaction in place of the skipped transaction.)
The following example shows SQL Apply stopping, the error being corrected, and then
restarting SQL Apply:
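A hedged sketch of the first part of that example, setting a readable date format and then querying the events view for the failure:

SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'DD-MON-YY HH24:MI:SS';
SQL> COLUMN STATUS FORMAT A60
SQL> SELECT EVENT_TIME, COMMIT_SCN, EVENT, STATUS
     FROM DBA_LOGSTDBY_EVENTS
     ORDER BY EVENT_TIME;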
In the original output, the event recorded at 22-OCT-03 15:47:58 carried a STATUS containing the ORA-01653 error that stopped SQL Apply.
In the example, the ORA-01653 message indicates that the tablespace was full and
unable to extend itself. To correct the problem, add a new datafile to the tablespace.
For example:
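A sketch, assuming the full tablespace is named MYTBS and using a hypothetical datafile path:

SQL> ALTER TABLESPACE mytbs ADD DATAFILE '/disk1/oracle/dbs/mytbs02.dbf' SIZE 100M;
SQL> ALTER DATABASE START LOGICAL STANDBY APPLY;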
When SQL Apply restarts, the transaction that failed will be reexecuted and applied to
the logical standby database.
Do not use the SKIP_TRANSACTION procedure to filter DML failures. Not only is the DML that is seen in the events table skipped, but so is all of the DML associated with the transaction. This can cause multiple tables to become out of synchronization with the primary database.
DML failures usually indicate a problem with a specific table. For example, assume the
failure is an out-of-storage error that you cannot resolve immediately. The following
steps demonstrate one way to respond to this problem.
1. Bypass the table, but not the transaction, by adding the table to the skip list:

   SQL> EXECUTE DBMS_LOGSTDBY.SKIP('DML','SCOTT','EMP');
   SQL> ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;

   From this point on, DML activity for the SCOTT.EMP table is not applied. After you correct the storage problem, you can fix the table, provided you set up a database link to the primary database that has administrator privileges to run procedures in the DBMS_LOGSTDBY package.

2. Using the database link to the primary database, drop the local SCOTT.EMP table, re-create it, and pull the data over to the standby database:

   SQL> ALTER DATABASE STOP LOGICAL STANDBY APPLY;
   SQL> EXECUTE DBMS_LOGSTDBY.INSTANTIATE_TABLE('SCOTT','EMP','PRIMARYDB');
   SQL> ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;

3. To ensure a consistent view across the newly instantiated table and the rest of the database, wait for SQL Apply to catch up with the primary database before querying this table. Refer to Section 9.4.6, "Adding or Re-Creating Tables On a Logical Standby Database" for a detailed example.
Oracle SQL*Loader provides a method of loading data from different sources into the Oracle Database. This section analyzes some of the features of the SQL*Loader utility as they pertain to SQL Apply.
Regardless of the method of data load chosen, the SQL*Loader control file contains an instruction, specified with the APPEND or REPLACE keyword, on how to handle the current contents of the Oracle table into which the new data is to be loaded. The following examples show how to use these keywords on a table named LOAD_STOK:
When using the APPEND keyword, the new data to be loaded is appended to the
contents of the LOAD_STOK table:
LOAD DATA
INTO TABLE LOAD_STOK APPEND
When using the REPLACE keyword, the contents of the LOAD_STOK table are deleted
prior to loading new data. Oracle SQL*Loader uses the DELETE statement to purge
the contents of the table, in a single transaction:
LOAD DATA
INTO TABLE LOAD_STOK REPLACE
Rather than relying on the REPLACE keyword, Oracle recommends issuing the SQL*Plus TRUNCATE TABLE command against the table on the primary database before loading the data. The SQL*Loader script may continue to contain the REPLACE keyword, but it will now attempt to DELETE zero rows from the object on the primary database. Because no rows were deleted from the primary database, there will be no redo recorded in the redo log files. Therefore, no DELETE statement will be issued against the logical standby database.
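A minimal sketch of this approach; the table name comes from the earlier examples, and the SQL*Loader control file name is hypothetical:

SQL> TRUNCATE TABLE LOAD_STOK;
$ sqlldr userid=scott control=load_stok.ctl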
Issuing the REPLACE keyword without first issuing the TRUNCATE TABLE DDL command poses the following potential problems for SQL Apply when the transaction is applied to the logical standby database:
If the table currently contains a significant number of rows, then these rows
need to be deleted from the standby database. Because SQL Apply is not able to
determine the original syntax of the statement, SQL Apply must issue
a DELETE statement for each row purged from the primary database.
For example, if the table on the primary database originally had 10,000 rows,
then Oracle SQL*Loader will issue a single DELETE statement to purge the 10,000
rows. On the standby database, SQL Apply does not know that all rows are to be
purged, and instead must issue 10,000 individual DELETE statements, with each
statement purging a single row.
If the table on the standby database does not contain an index that SQL Apply can use, then each DELETE statement will perform a full table scan to locate the row being purged.
Continuing with the previous example, because SQL Apply issues 10,000 individual DELETE statements, this could result in 10,000 full table scans against the standby database.
One of the primary causes of long-running transactions in a SQL Apply environment is full table scans. Additionally, long-running transactions can be the result of DDL operations being replicated to the standby database, such as when creating or rebuilding an index.
If SQL Apply is executing a single SQL statement for a long period of time, then a
warning message similar to the following is reported in the alert log of the SQL Apply
instance:
It may not be possible to determine the exact SQL statement being executed by the long-running transaction, but the following SQL statement may help in identifying the database objects on which SQL Apply is operating:
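A hedged, general-purpose sketch that lists the objects currently locked by active transactions, which usually includes the tables SQL Apply is working on:

SQL> SELECT lo.SESSION_ID, o.OWNER, o.OBJECT_NAME, o.OBJECT_TYPE
     FROM V$LOCKED_OBJECT lo, DBA_OBJECTS o
     WHERE lo.OBJECT_ID = o.OBJECT_ID;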
Additionally, you can issue the following SQL statement to identify the SQL statement
that has resulted in a large number of disk reads being issued per execution:
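A sketch of such a query against V$SQL, ordering statements by disk reads per execution:

SQL> SELECT SQL_TEXT, DISK_READS, EXECUTIONS,
            ROUND(DISK_READS / GREATEST(EXECUTIONS, 1)) AS READS_PER_EXEC
     FROM V$SQL
     ORDER BY READS_PER_EXEC DESC;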
Oracle recommends that all tables have primary key constraints defined, which automatically means that the constrained columns are defined as NOT NULL. For any table where a primary key constraint cannot be defined, define an index on an appropriate column that is declared NOT NULL. If a suitable column does not exist in the table, then the table should be reviewed and, if possible, skipped by SQL Apply. The following steps describe how to skip all DML statements issued against the FTS table in the SCOTT schema:
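A minimal sketch of those steps:

SQL> ALTER DATABASE STOP LOGICAL STANDBY APPLY;
SQL> EXECUTE DBMS_LOGSTDBY.SKIP('DML','SCOTT','FTS');
SQL> ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;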
Interested transaction list (ITL) pressure is reported in the alert log of the SQL Apply
instance. Example A-3 shows an example of the warning messages.
Real-Time Analysis
The messages shown in Example A-3 indicate that the SQL Apply process (slavid) #17
has not made any progress in the last 30 seconds. To determine the SQL statement
being issued by the Apply process, issue the following query:
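A hedged sketch, assuming the logical standby apply servers are exposed through the V$STREAMS_APPLY_SERVER view and using the server (slave) ID 17 reported in the alert log:

SQL> SELECT sa.SQL_TEXT
     FROM V$SQLAREA sa, V$SESSION s, V$STREAMS_APPLY_SERVER sas
     WHERE sas.SERVER_ID = 17
     AND s.SID = sas.SID
     AND sa.ADDRESS = s.SQL_ADDRESS;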
Post-Incident Review
Pressure on a segment's ITL is unlikely to last for an extended period of time. In addition, ITL pressure that lasts for less than 30 seconds is not reported in the standby database's alert log. Therefore, to determine which objects have been subjected to ITL pressure, issue the following statement:
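A sketch using the segment-level statistics, assuming the statistic name 'ITL waits':

SQL> SELECT OWNER, OBJECT_NAME, OBJECT_TYPE, VALUE
     FROM V$SEGMENT_STATISTICS
     WHERE STATISTIC_NAME = 'ITL waits'
     AND VALUE > 0
     ORDER BY VALUE DESC;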
This statement reports all database segments that have had ITL pressure at some time
since the instance was last started.
Note:
This SQL statement is not limited to logical standby databases in a Data Guard environment. It is applicable to any Oracle database.
See Also:
Oracle Database SQL Reference for more information about specifying the INITRANS integer, which is the initial number of concurrent transaction entries allocated within each data block allocated to the database object
Also, consider modifying the database object on the primary database so that, in the event of a switchover, the error does not occur on the new standby database.
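For instance, a hypothetical adjustment on the primary database (the table name and INITRANS value are illustrative; the new setting applies only to blocks formatted after the change):

SQL> ALTER TABLE scott.fts INITRANS 10;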
A.9.4 Troubleshooting ORA-1403 Errors with Flashback Transactions
If SQL Apply returns the ORA-1403: No Data Found error, then it may be possible to use
Flashback Transaction to reconstruct the missing data. This is reliant upon
the UNDO_RETENTION initialization parameter specified on the standby database instance.
When SQL Apply verification fails, the error message is reported in the alert log of the logical standby database and a record is inserted in the DBA_LOGSTDBY_EVENTS view. The information in the alert log is truncated, while the error is reported in its entirety in the database view. For example:
The Investigation
The first step is to analyze the historical data of the table that caused the error. This
can be achieved using the VERSIONS clause of the SELECT statement. For example, you
can issue the following query on the primary database:
SELECT VERSIONS_XID
, VERSIONS_STARTSCN
, VERSIONS_ENDSCN
, VERSIONS_OPERATION
, PK
, NAME
FROM SCOTT.MASTER
VERSIONS BETWEEN SCN MINVALUE AND MAXVALUE
WHERE PK = 1
ORDER BY NVL(VERSIONS_STARTSCN,0);
Depending upon the amount of undo the database is configured to retain (UNDO_RETENTION) and the activity on the table, the information returned might be extensive, and you may need to adjust the VERSIONS BETWEEN clause to restrict the amount of information returned. From the information returned, it can be seen that the record was first inserted at SCN 3492279 and then deleted at SCN 3492290 as part of transaction ID 02000D00E4070000. Using the transaction ID, query the database to determine the scope of the transaction. This is achieved by querying the FLASHBACK_TRANSACTION_QUERY view.
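For example, using the transaction ID identified above:

SQL> SELECT OPERATION, UNDO_SQL
     FROM FLASHBACK_TRANSACTION_QUERY
     WHERE XID = HEXTORAW('02000D00E4070000');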
Note that there is always one row returned representing the start of the transaction. In this transaction, only one row was deleted in the master table. Executing the statement in the UNDO_SQL column restores the original data to the table.
When you restart SQL Apply, the transaction will be applied to the standby database:
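A minimal form of the restart:

SQL> ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;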