Oracle Data Guard –

Fast Start Failover understood!

Dr. Martin Wunderli


http://www.trivadis.com

Principal Consultant
Partner

Basel · Baden · Bern · Lausanne · Zurich · Düsseldorf · Frankfurt/M. · Freiburg i. Br. · Hamburg · Munich · Stuttgart · Wien
Trivadis Facts & Figures

12 locations
D: Dusseldorf, Frankfurt, Freiburg, Hamburg, Munich, Stuttgart
A: Vienna
CH: Baden, Basle, Bern, Lausanne, Zurich
Consolidated income
CHF 85 million / EUR 53 million
Over 470 employees
Over 450 clients
Over 1'400 projects per year
Over 110 Service Level Agreements
About 4'000 training participants per year

Fast Start Failover understood 2 © 2006


FSFO understood!

Data Guard Concepts & History

The startup issue

Fast Start Failover

Conclusion
Data is always
part of the game.



Oracle Standby Databases and Data Guard – Overview

[Diagram: the Primary Site holds the Primary Database with its Online Log Files and Local Archiving; the Standby Site holds the Standby Database with its Standby Log Files and Archived Log Files]



Standby Databases: A short history

Oracle 7.3: Creating and mounting a standby database

Oracle 8i: Automated archived redo log transport and application, TAF, open read-only of standby database

Oracle 9i: Data Guard and Data Guard Broker with switchover, close log gap, delayed redo application, GUI and no-data-loss setups (sync transport)

Oracle 10g: Simplified syntax, RAC support, partial failover cluster support, reuse of old primary as new standby database, automatic standby activation


Why Data Guard (and not e.g. a Failover Cluster)?

In a disaster protection setup (data must be mirrored between at least two locations), bandwidth usage is smaller: even high-transaction systems typically need only approx. 70 MBit/s of bandwidth

No extra software layer or license is needed (if you have already licensed Oracle Enterprise Edition on the primary and standby server…)

No file system or instance recovery of the database is needed after a crash of the primary server (the standby is up to date in a No-Data-Loss setup with 10gR2)
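
The bandwidth figure quoted above depends on how much redo the primary actually generates, so it is worth measuring on the system in question. A minimal sketch, assuming archived log history is available in `v$archived_log` and that `dest_id = 1` is the local archive destination (an assumption; adjust it to your setup). It averages the redo volume per hour over the last day:

```sql
-- Run on the primary. Redo volume per hour, in MB and as an average MBit/s.
-- Assumption: dest_id = 1 is the local archive destination, so redo that is
-- also archived to the standby destination is not counted twice.
SELECT TRUNC(completion_time, 'HH24')                          AS log_hour,
       ROUND(SUM(blocks * block_size) / 1024 / 1024)           AS redo_mb,
       ROUND(SUM(blocks * block_size) * 8 / 3600 / 1000000, 1) AS avg_mbit_per_s
FROM   v$archived_log
WHERE  dest_id = 1
AND    completion_time > SYSDATE - 1
GROUP  BY TRUNC(completion_time, 'HH24')
ORDER  BY log_hour;
```

These are hourly averages. Peaks within an hour can be considerably higher, and with synchronous transport the link has to keep up with the peaks, not the average.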



Why a Failover Cluster (and not Data Guard)?

File system based mirroring is needed because of non-database files

IP address failover is needed, e.g. for an application server

DBA knowledge is not available

If instance recovery time and bandwidth between locations are also crucial, a combination of Failover Cluster and Data Guard between the same machines may be necessary



Failover Cluster vs. Data Guard

Remember…

Fast Start Failover for Data Guard is not a failover cluster with
  Two connections between nodes (network and disk), where the loss of one connection results in node shutdown
  A single location of data files (from the point of view of the Oracle RDBMS)

These two points have positive and negative impact
  Nodes stay up and in their role longer in case of partial inter-node connection loss
  Automatic failover may not be possible after partial inter-node connection loss



Physical Standby: Startup Behavior 10g versus 9i

[Diagram: startup stages of a physical standby configuration]

alter database open;                 # Primary
or
recover managed standby database;    # Standby

alter database mount;

Data Guard Broker

startup nomount



Physical Standby – Startup Issue

What is the biggest problem for data consistency in a cluster?
  Split Brain!

What is the biggest problem for data consistency in a Data Guard environment?
  More than one primary!

How can this happen?
  Primary startup (hardware fixed etc.) after standby activation


Primary Startup after Failover: Network connected

OK: STARTUP MOUNT of former primary database
  Data Guard Broker takes over and handles the startup process
  The Broker knows about the failover and the resulting change of the primary database
  The former primary database is not started
    DGMGRL> show configuration;
    Error: ORA-16795: database resource guard detects that database re-creation is required
    Configuration details cannot be determined by DGMGRL

BAD: STARTUP of former primary database
  Results in two primary databases, since sqlplus does not know of the Data Guard Broker configuration



Primary Startup after Failover: Network interrupted

BAD: STARTUP of former primary database
  Results in two primary databases, since sqlplus does not know of the Data Guard Broker configuration

BAD: STARTUP MOUNT of former primary database
  Data Guard Broker tries to verify the Data Guard configuration
  After 5 unsuccessful requests, Data Guard Broker opens the former primary database



Startup: Variants

1. Only mount Primary and Standby Database during system boot
2. Manual database startup after system boot
3. Adapt TNSNAMES or LDAP server so that old Primary is not found anymore. But local jobs…
4. Is there a better solution? Yes, see Fast Start Failover! TBP



Fast-Start Failover

Main criticism of Oracle standby databases: too much manual interaction

1. Manual interaction is required for a failover
  Administrative checks are needed beforehand to validate the status of the standby database, e.g. whether all redo has been applied
  More downtime

2. Manual interaction is required to recreate a new standby database
  No HA until the setup of the new standby is finished

3. Manual interaction is needed for startup if two primaries have to be avoided at all cost

Fast Start Failover addresses all three problems!



Fast Start Failover: Concept

1. Observed Data Guard environment
  Primary | Standby

2. Fast-Start Failover (automatic)
  Old primary | New primary (former standby)

3. Reinstate (automatic)
  Standby (reinstated old primary) | Primary



When is a Fast-Start Failover triggered?

Primary site failure
  Server crash or server shutdown (without database shutdown)

Primary database failure
  Instance failure (last running instance if RAC)
  Shutdown abort (but not shutdown normal or immediate)
  Data file is taken offline

Network failure (special case)
  The documentation on when automatic activation will and will not happen is extensive. Read and test carefully. We will show one case.



Network Failure (1)


Select fs_failover_status,fs_failover_observer_present
from v$database; -- on primary site
FS_FAILOVER_STATUS FS_FAILOVER_OBSERVER_PRESENT
-------------------- -----------------------------
SYNCHRONIZED NO



Network Failure (2)


Select fs_failover_status,fs_failover_observer_present
from v$database; -- on primary site
FS_FAILOVER_STATUS FS_FAILOVER_OBSERVER_PRESENT
-------------------- ----------------------------
STALLED NO



Network Failure (3)

Select fs_failover_status,fs_failover_observer_present
from v$database; -- on new primary site
FS_FAILOVER_STATUS FS_FAILOVER_OBSERVER_PRESENT
-------------------- ----------------------------
REINSTATE REQUIRED YES



Network Failure (4)


Select fs_failover_status,fs_failover_observer_present
from v$database; -- on primary site and standby site
FS_FAILOVER_STATUS FS_FAILOVER_OBSERVER_PRESENT
-------------------- -----------------------------
SYNCHRONIZED YES
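
The two columns queried in the four steps above belong to a larger group of fast-start failover columns in `v$database`. A minimal sketch of a fuller status query, using columns available as of 10gR2 (the column aliases are shortened for display and are not part of the original slides):

```sql
-- Run on the primary or on the standby. In addition to status and observer
-- presence, this shows the configured target, the threshold in seconds and
-- the host the observer is connected from.
SELECT fs_failover_status           AS fsfo_status,
       fs_failover_current_target   AS fsfo_target,
       fs_failover_threshold        AS fsfo_threshold_s,
       fs_failover_observer_present AS observer_present,
       fs_failover_observer_host    AS observer_host
FROM   v$database;
```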



Observer location?



Observer location …

Best is three locations:
  One for the primary database
  One for the standby database
  One for the observer

In many real-life situations (no three locations…)
  Observer on the primary site is the best choice if avoiding 'false' activations is most important
  Observer on the standby site is the best choice if protection from the loss of a data center is most important



Compromise to minimize false activations


Observer – Installation Requirements

Observer machine with Oracle Net configuration

Special entry in the Data Guard Broker configuration, which requires…

MaxAvailability protection mode for the Primary Database
  but: special startup behavior
  but: primary stalls in certain situations

Flashback Database activated on the Primary and Standby Database
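
Both database-side prerequisites can be verified from `v$database`. A minimal sketch, to be run on the primary and on the standby. For a setup as described here one would expect `MAXIMUM AVAILABILITY` as the protection mode and `YES` for Flashback Database:

```sql
-- PROTECTION_MODE is the configured mode; PROTECTION_LEVEL is the level
-- currently in effect and can differ from it.
-- FLASHBACK_ON shows whether Flashback Database is enabled.
SELECT db_unique_name,
       database_role,
       protection_mode,
       protection_level,
       flashback_on
FROM   v$database;
```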



Observer - Data Guard additional Configuration

Not much to configure


edit database 'PHYS_LUCERNE'
set property FastStartFailoverTarget = 'PHYS_TOKYO';
edit database 'PHYS_TOKYO'
set property FastStartFailoverTarget = 'PHYS_LUCERNE';
edit configuration
set property FastStartFailoverThreshold = 15;
enable fast_start failover;

Fast-Start Failover is a feature of Oracle Data Guard, and cannot run without a Data Guard Broker configuration!



FSFO – Does it work? (1)

Usually it works

An interruption (network, server crash etc.) during reinstate often results in problems
  FSFO configuration hangs
  The reinstating instance will not continue
  The observer cannot be stopped (with stop observer)
  How to solve the problem
    dgmgrl
    connect system@<new_primary>
    disable fast_start failover force;
    reinstate database '<old_primary>';
    enable fast_start failover;

If "disable fast_start failover force" also hangs, kill and restart the observer, then restart the new primary instance



FSFO – Does it work? (2)

In rare cases, the whole broker configuration is corrupted

Remove the configuration
  On both nodes / instances
    sql> shutdown immediate

    cd $ORACLE_BASE/admin/DG1/pfile/
    mv dr1DG1.dat dr1DG1.dat.bck
    mv dr2DG1.dat dr2DG1.dat.bck

    sql> startup mount

Recreate the configuration (it is good to have scripts)
  dgmgrl> create configuration 'DG1' ...
  dgmgrl> add database ...
  dgmgrl> edit database ... / edit configuration ...
  dgmgrl> enable configuration;
  dgmgrl> enable fast_start failover;
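
The directory used above for the broker configuration files is specific to that installation. A minimal sketch, assuming SQL*Plus access to the instance, that shows where the broker actually keeps its two configuration files and whether the broker is started, so the right files are moved aside:

```sql
-- dg_broker_config_file1/2 hold the full paths of the two broker
-- configuration files; dg_broker_start shows whether the broker is started.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('dg_broker_start',
                'dg_broker_config_file1',
                'dg_broker_config_file2')
ORDER  BY name;
```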



Fast Start Failover understood: Core Messages

FSFO addresses three major problems of 9i Data Guard

Observer location is not easy to decide

Things can become corrupt: Be prepared to recreate the Data Guard configuration