1 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
2 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Understanding Oracle RAC Internals – Part 2
for the Oracle RAC SIG
Markus Michalewicz (Markus.Michalewicz@oracle.com)
Senior Principal Product Manager Oracle RAC and Oracle RAC One Node
3 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features
or functionality described for Oracle’s products remains at the sole discretion of Oracle.
4 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Agenda
• Client Connectivity
• Node Membership
• The Interconnect
5 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Client Connectivity
Direct or indirect connect
Production
Email
BATCH
• Connect Time Load Balancing (CTLB)
• Connect Time Connection Failover (CTCF)
• Runtime Connection Load Balancing (RTLB)
• Runtime Connection Failover (RTCF)
Connection
Pool
SCAN
6 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Client Connectivity
Connect Time Connection Failover
jdbc:oracle:thin:@MySCAN:1521/Email
PMRAC =
(DESCRIPTION =
(FAILOVER=ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = MySCAN)(PORT = 1521))
(CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = Email)))
Production
Email
BATCH
Connection
Pool
MySCAN
7 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Client Connectivity
Runtime Connection Failover
Production
Email
BATCH
Connection
Pool
MySCAN
PMRAC =
(DESCRIPTION =
(FAILOVER=ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = MySCAN)(PORT = 1521))
(CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = Email)
...))
8 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Client Connectivity
Runtime Connection Failover
Production
Email
BATCH
Connection
Pool
MySCAN
PMRAC =
(DESCRIPTION =
(FAILOVER=ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = MySCAN)(PORT = 1521))
(CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = Email)
(FAILOVER_MODE= (TYPE=select)(METHOD=basic)(RETRIES=180)(DELAY=5))))
9 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Client Connectivity
More information
Production
Email
BATCH
Connection
Pool
MySCAN
• If problems occur, see:
• Note 975457.1 – How to Troubleshoot Connectivity Issues with 11gR2 SCAN Name
• For more advanced configurations, see:
• Note 1306927.1 – Using the TNS_ADMIN variable and changing the default port
number of all Listeners in an 11.2 RAC for an 11.2, 11.1, and 10.2 Database
10 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Client Connectivity
Two ways to protect the client
Production
Email
BATCH
Connection
Pool
MySCAN
1. Transparent Application Failover (TAF)
• Tries to make the client unaware of a failure
• Provides both CTCF and RTCF
• Allows pure ‘selects’ (reads) to continue
• Write transactions need to be re-issued
• The application needs to be TAF-aware
2. Fast Application Notification (FAN)
• FAN informs clients of a failure as soon as possible
• Clients can react to the failure immediately
• Expects clients to re-connect on failure (FCF)
• Sends messages about changes in the cluster
11 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Client Connectivity and Service Definition
Define settings on the server
Production
Email
BATCH
MySCAN
[GRID]> srvctl config service
-d ORCL -s MyService
Service name: MyService
...
DTP transaction: false
AQ HA notifications: false
Failover type: NONE
Failover method: NONE
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: BASIC
• HA (and LB) settings
can be defined per service
• Clients connecting to the service will
adhere to these settings, depending on
the client type used (see the sketch below).
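• A sketch only (ORCL and MyService are the example names from this slide; option values are illustrative): TAF and failover settings can be defined on the server with srvctl, e.g.
[GRID]> srvctl modify service -d ORCL -s MyService -P BASIC -e SELECT -m BASIC -z 180 -w 5
[GRID]> srvctl config service -d ORCL -s MyService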
12 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Client Connectivity
Use a FAN aware connection pool
Production
Email
BATCH
Connection Pool
MySCAN
• If a connection pool is used
• The clients (users) get a physical
connection to the connection pool
• The connection pool creates a physical
connection to the database
• The pool is thus a direct client of the database
• Internally, the pool maintains logical connections
1
13 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Client Connectivity
Use a FAN aware connection pool
Production
Email
BATCH
Connection Pool
MySCAN
• The connection pool
• Invalidates connections to the affected instance
• Re-establishes new logical connections
• May create new physical connections
• Prevents new clients from being misrouted
• The application needs to handle any
transaction failure that might have occurred.
2
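• As a hedged sketch (ORCL / Email are example names; applies to OCI and ODP.NET clients): FAN notifications via AQ can be enabled per service so that clients and pools are informed of failures, e.g.
[GRID]> srvctl modify service -d ORCL -s Email -q TRUE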
14 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Client Connectivity
The Load Balancing (LB) cases
• Connect Time Load Balancing (CTLB)
• Runtime Connection Load Balancing (RTLB)
• On the Client Side
• On the Server Side
Production
Email
BATCH
Connection
Pool
MySCAN
15 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Client Connectivity
Connect Time Load Balancing (CTLB) – on the client side
Production
Email
BATCH
Connection
Pool
MySCAN
PMRAC =
(DESCRIPTION =
(FAILOVER=ON)(LOAD_BALANCE=ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = MySCAN)(PORT = 1521))
(CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = Email)))
16 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Client Connectivity
Connect Time Load Balancing (CTLB) – on the server side
Production
Email
BATCH
Connection
Pool
MySCAN
• Traditionally, PMON dynamically registers the services with the specified listeners, providing:
• Service names for each running instance of the database and instance names for the DB
• The listener is updated with the load information for every instance and node as follows:
• 1-minute OS node load average (updated every 30 secs.)
• Number of connections to each instance
• Number of connections to each dispatcher
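• For illustration (example names; a sketch only): the connection load balancing goal the listeners use can be set per service, e.g.
[GRID]> srvctl modify service -d ORCL -s Email -j SHORT (or -j LONG for long-lived sessions)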
17 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Client Connectivity
Use FAN for the Load Balancing cases
• Connect Time Load Balancing (CTLB)
• Connect Time Connection Failover (CTCF)
• Runtime Connection Load Balancing (RTLB)
• Runtime Connection Failover (RTCF)
RAC
Database
Instance1
Instance2
Instance3
I’m busy
I’m very busy
I’m idle
30% connections
10% connections
60% connections
18 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Client Connectivity
Use FAN for the Load Balancing cases
• Connect Time Load Balancing (CTLB)
• Runtime Connection Load Balancing (RTLB)
• Also via AQ (Advanced Queuing) based notifications
• The basis is always the Load Balancing Advisory
• For more information, see:
• Oracle® Real Application
Clusters Administration and
Deployment Guide 11g Release 2:
5 Introduction to Automatic Workload Management
RAC
Database
Instance1
Instance2
Instance3
I’m busy
I’m very busy
I’m idle
30% connections
10% connections
60% connections
MySCAN
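• A sketch (example names only): the Load Balancing Advisory is driven by the runtime load balancing goal defined per service, e.g.
[GRID]> srvctl modify service -d ORCL -s Email -B SERVICE_TIME (or -B THROUGHPUT)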
19 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Node Membership
20 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
The Oracle RAC Architecture
Oracle Grid Infrastructure 11g Release 2 process overview
[Diagram: Oracle Grid Infrastructure runs on the OS of each node and provides Node Membership, an HA Framework, and an ASM Instance]
• My Oracle Support (MOS)
• Note 1053147.1 - 11gR2 Clusterware and Grid Home - What You Need to Know
• Note 1050908.1 - How to Troubleshoot Grid Infrastructure Startup Issues
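• For a quick illustration (a sketch; output depends on the system), the state of the Grid Infrastructure stack can be checked with:
[GRID]> crsctl check cluster -all
[GRID]> crsctl stat res -t -init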
21 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Oracle Clusterware Architecture
Node Membership Processes and Basics
[Diagram: cluster nodes running CSSD under Oracle Clusterware, connected via the public LAN, the private LAN / interconnect, and the SAN network to the Voting Disk]
Main processes involved:
• CSSD (ora.cssd)
• CSSDMONITOR
• was: oprocd
• now: ora.cssdmonitor
22 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Oracle Clusterware Architecture
What does CSSD do?
• Monitors nodes using 2 communication channels:
– Private Interconnect → Network Heartbeat
– Voting Disk based communication → Disk Heartbeat
• Evicts nodes (forcibly removes them from the cluster) based on heartbeat feedback (failures)
[Diagram: CSSD on each node exchanging network and disk heartbeat “pings”]
23 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Oracle Clusterware Architecture
Interconnect basics – network heartbeat
[Diagram: CSSD processes exchanging the network heartbeat (“ping”) over the private interconnect]
• Each node in the cluster is “pinged” every second
• Nodes must respond within css_misscount time (defaults to 30 secs.)
– Reducing the css_misscount time is generally not supported
• Network heartbeat failures will lead to node evictions
– CSSD-log:
[date / time] [CSSD][1111902528]
clssnmPollingThread: node mynodename
(5) at 75% heartbeat fatal, removal
in 6.770 seconds
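• For illustration (a sketch; requires the Grid Infrastructure environment), the current value can be queried with:
[GRID]> crsctl get css misscount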
24 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Oracle Clusterware Architecture
Voting Disk basics – disk heartbeat
[Diagram: CSSD processes “pinging” the Voting Disk(s)]
• Each node in the cluster “pings” (r/w) the Voting Disk(s) every second
• Nodes must receive a response within the (long / short) diskTimeout time
– IF I/O errors indicate clear accessibility problems → the timeout is irrelevant
• Disk heartbeat failures will lead to node evictions
– CSSD-log: …
[CSSD] [1115699552] >TRACE:
clssnmReadDskHeartbeat:
node(2) is down. rcfg(1) wrtcnt(1)
LATS(63436584) Disk lastSeqNo(1)
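• Similarly, as a sketch, the current disk timeout can be queried with:
[GRID]> crsctl get css disktimeout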
25 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Oracle Clusterware Architecture
Voting Disk basics – Structure
• Voting Disks contain dynamic and static data:
– Dynamic data: disk heartbeat logging
– Static data: information about the nodes in the cluster
• With 11.2.0.1 Voting Disks got an “identity”:
– E.g. Voting Disk serial number: [GRID]> crsctl query css votedisk
1. 2 1212f9d6e85c4ff7bf80cc9e3f533cc1 (/dev/sdd5) [DATA]
• Voting Disks must therefore not be copied using “dd” or “cp” anymore
[Diagram: Voting Disk structure – node information and disk heartbeat logging]
26 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Oracle Clusterware Architecture
Voting Disk basics – Simple Majority rule
[Diagram: CSSD processes “pinging” the configured Voting Disks]
• Oracle supports redundant Voting Disks for disk failure protection
• “Simple Majority Rule” applies:
– Each node must “see” the simple majority of configured Voting Disks
at all times in order not to be evicted (to remain in the cluster)
→ trunc(n/2+1) with n = number of voting disks configured and n >= 1
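• Example: with n=3 configured Voting Disks, trunc(3/2+1) = 2, so each node must “see” at least 2 of the 3 disks to remain in the cluster; with n=5, at least 3.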
27 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Oracle Clusterware Architecture
Simple Majority rule – in extended clusters
[Diagram: CSSD processes accessing geographically dispersed Voting Disks]
• Same principles apply
• Voting Disks are just geographically dispersed
• http://www.oracle.com/goto/rac
– Using standard NFS to support a third voting file for extended cluster configurations (PDF)
28 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Oracle Clusterware Architecture
Storing Voting Disks in Oracle ASM does not change their usage
• Oracle ASM auto-creates 1/3/5 Voting Files
– Voting Disks reside in one diskgroup only
– Based on Ext/Normal/High redundancy
and on Failure Groups in the Disk Group
– Per default there is one failure group per disk
– ASM will enforce the required number of disks
– New failure group type: Quorum Failgroup
[GRID]> crsctl query css votedisk
1. 2 1212f9d6e85c4ff7bf80cc9e3f533cc1 (/dev/sdd5) [DATA]
2. 2 aafab95f9ef84f03bf6e26adc2a3b0e8 (/dev/sde5) [DATA]
3. 2 28dd4128f4a74f73bf8653dabd88c737 (/dev/sdd6) [DATA]
Located 3 voting disk(s).
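• As a sketch (DATA is the example diskgroup shown above), Voting Disks can be moved into an ASM diskgroup with:
[GRID]> crsctl replace votedisk +DATA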
29 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Oracle Clusterware Architecture
Oracle Cluster Registry (OCR) placement in Oracle ASM
• The OCR is managed like a datafile in ASM (new type)
• It adheres completely to the redundancy settings for the diskgroup (DG)
• There can be more than one OCR location in more than one DG (DG:OCR → 1:1)
• Recommendation is 2 OCR locations, 1 in DATA, 1 in FRA for example
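• For illustration (a sketch; FRA is an example diskgroup name), the OCR locations can be verified and a second location added with:
[GRID]> ocrcheck
[GRID]> ocrconfig -add +FRA (run as root)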
30 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Oracle Clusterware Architecture
Backup of Clusterware files is fully automatic (11.2+)
• Managing the Clusterware files in ASM enables fully automatic backups:
• The Voting Disks are backed up into the OCR
• Any configuration change in the cluster (e.g. node
addition) triggers a new backup of the Voting Files.
• A single failed Voting Disk is restored by Clusterware automatically
within a Disk Group, if sufficient disks are used – no action required
• Note: Do not use “dd” to back up the Voting Disks anymore!
• The OCR is backed up automatically every 4 hours
• Manual backups can be taken as required
• ONLY IF all Voting Disks are corrupted or failed
AND (all copies of) the OCR are also corrupted or unavailable
THEN manual intervention would be required – the rest is automatic.
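• As a sketch, the automatic OCR backups can be listed and a manual backup taken with:
[GRID]> ocrconfig -showbackup
[GRID]> ocrconfig -manualbackup (run as root)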
31 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
• Evicting (fencing) nodes is a preventive measure (it’s a good thing)!
• Nodes are evicted to prevent consequences of a split brain:
– Shared data must not be written by independently operating nodes
– The easiest way to prevent this is to forcibly remove a node from the cluster
Fencing Basics
Why are nodes evicted?
32 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Fencing Basics
How are nodes evicted? – STONITH
• Once it is determined that a node needs to be evicted,
– A “kill request” is sent to the respective node(s)
– Using all (remaining) communication channels
• A node (CSSD) is requested to “kill itself” → “STONITH-like”
– “STONITH” foresees that a remote node kills the node to be evicted
33 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Fencing Basics
EXAMPLE: Network heartbeat failure
• The network heartbeat between nodes has failed
– It is determined which nodes can still talk to each other
– A “kill request” is sent to the node(s) to be evicted
→ Using all (remaining) communication channels → Voting Disk(s)
→ A node is requested to “kill itself”; executor: typically CSSD
34 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Fencing Basics
What happens if CSSD is stuck?
[Diagram: CSSDmonitor watching CSSD on the cluster nodes]
• A node is requested to “kill itself”
• BUT CSSD is “stuck” or “sick” (does not execute) – e.g.:
– CSSD failed for some reason
– CSSD is not scheduled within a certain margin
→ OCSSDMONITOR (was: oprocd) will take over and execute
• See also: MOS note
1050693.1 -
Troubleshooting 11.2
Clusterware Node
Evictions (Reboots)
35 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Fencing Basics
How can nodes be evicted?
• Oracle Clusterware 11.2.0.1 and later supports IPMI (optional)
– Intelligent Platform Management Interface (IPMI) drivers required
• IPMI allows remote-shutdown of nodes using additional hardware
– A Baseboard Management Controller (BMC) per cluster node is required
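• A hedged sketch (the IP address and user name are placeholders; IPMI/BMC must be configured on the hardware first): the IPMI settings can be stored for CSSD with:
[GRID]> crsctl set css ipmiaddr 192.168.10.45
[GRID]> crsctl set css ipmiadmin ipmiuser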
36 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Fencing Basics
EXAMPLE: IPMI based eviction on heartbeat failure
• The network heartbeat between the nodes has failed
– It is determined which nodes can still talk to each other
– IPMI is used to remotely shutdown the node to be evicted
37 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Fencing Basics
Which node gets evicted?
• Voting Disks and heartbeat communication are used to determine which node gets evicted
• In a 2-node cluster, the node with the lowest node number should survive
• In an n-node cluster, the biggest sub-cluster should survive (votes based)
38 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Fencing Basics
Cluster members can escalate a kill request
Oracle RAC
DB Inst. 1
Oracle RAC
DB Inst. 2
Oracle Clusterware
• Cluster members (e.g. Oracle RAC instances) can request
Oracle Clusterware to kill a specific member of the cluster
• Oracle Clusterware will then attempt to kill the requested member
Inst. 1:
kill inst. 2
39 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Fencing Basics
Cluster members can escalate a kill request
Oracle RAC
DB Inst. 1
Oracle RAC
DB Inst. 2
Oracle Clusterware
• Oracle Clusterware will then attempt to kill the requested member
• If the requested member kill is unsuccessful, a node eviction escalation can be issued,
which leads to the eviction of the node on which that member currently resides
Inst. 1:
kill inst. 2
40 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Fencing Basics
Cluster members can escalate a kill request
Oracle RAC
DB Inst. 1
Oracle RAC
DB Inst. 2
Oracle Clusterware
• Oracle Clusterware will then attempt to kill the requested member
• If the requested member kill is unsuccessful, a node eviction escalation can be issued,
which leads to the eviction of the node on which that member currently resides
Inst. 1:
kill inst. 2
41 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Fencing Basics
Cluster members can escalate a kill request
Oracle RAC
DB Inst. 1
Oracle Clusterware
• Oracle Clusterware will then attempt to kill the requested member
• If the requested member kill is unsuccessful, a node eviction escalation can be issued,
which leads to the eviction of the node on which that member currently resides
42 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Re-Bootless Node Fencing
From 11.2.0.2 onwards, fencing no longer necessarily means a re-boot
Oracle Clusterware
• Until Oracle Clusterware 11.2.0.2, fencing meant “re-boot”
• With Oracle Clusterware 11.2.0.2, re-boots will be seen less, because:
– Re-boots affect applications that might run on a node, but are not protected
– Customer requirement: prevent a reboot, just stop the cluster – implemented...
CSSD
App X App Y
RAC DB
Inst. 1
RAC DB
Inst. 2
43 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Re-Bootless Node Fencing
How it works…
• With Oracle Clusterware 11.2.0.2, re-boots will be seen less:
– Instead of fast re-booting the node, a graceful shutdown of the stack is attempted
• It starts with a failure – e.g. network heartbeat or interconnect failure
Oracle Clusterware CSSD
App X App Y
RAC DB
Inst. 1
RAC DB
Inst. 2
44 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Re-Bootless Node Fencing
How it works…
Oracle Clusterware CSSD
App X App Y
RAC DB
Inst. 1
• With Oracle Clusterware 11.2.0.2, re-boots will be seen less:
– Instead of fast re-booting the node, a graceful shutdown of the stack is attempted
• Then IO issuing processes are killed; it is made sure that no IO process remains
– For a RAC DB mainly the log writer and the database writer are of concern
45 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Re-Bootless Node Fencing
How it works…
Oracle Clusterware CSSD
App X App Y
RAC DB
Inst. 1
• With Oracle Clusterware 11.2.0.2, re-boots will be seen less:
– Instead of fast re-booting the node, a graceful shutdown of the stack is attempted
• Once all IO issuing processes are killed, remaining processes are stopped
– IF the check for a successful kill of the IO processes fails → reboot
46 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Re-Bootless Node Fencing
How it works…
Oracle Clusterware CSSD
App X App Y
RAC DB
Inst. 1
• With Oracle Clusterware 11.2.0.2, re-boots will be seen less:
– Instead of fast re-booting the node, a graceful shutdown of the stack is attempted
• Once all remaining processes are stopped, the stack stops itself with a “restart flag”
OHASD
47 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Re-Bootless Node Fencing
How it works…
Oracle Clusterware CSSD
App X App Y
RAC DB
Inst. 1
• With Oracle Clusterware 11.2.0.2, re-boots will be seen less:
– Instead of fast re-booting the node, a graceful shutdown of the stack is attempted
• OHASD will finally attempt to restart the stack after the graceful shutdown
OHASD
48 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Re-Bootless Node Fencing
EXCEPTIONS
• With Oracle Clusterware 11.2.0.2, re-boots will be seen less, unless…:
– IF the check for a successful kill of the IO processes fails → reboot
– IF CSSD gets killed during the operation → reboot
– IF cssdmonitor (oprocd replacement) is not scheduled → reboot
– IF the stack cannot be shutdown in “short_disk_timeout”-seconds → reboot
Oracle Clusterware CSSD
App X App Y
RAC DB
Inst. 1
RAC DB
Inst. 2
49 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
The Interconnect
50 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
The Interconnect
Heartbeat and “memory channel” between instances
Interconnect
with switch
Public Lan
SAN switch
Client
Network
Node 1 Node 2 … Node N-1 Node N
51 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
The Interconnect
Redundant Interconnect Usage
Node 1 Node 2
HAIP1
HAIP2
HAIP3
HAIP4
• Redundant Interconnect Usage can be used as a bonding alternative
– It works for “private networks” only; the nodeVIPs use a different approach
– It enables HA and Load Balancing for up to 4 NICs per server (on Linux / Unix)
– It can be used by Oracle Database 11.2.0.2 and Oracle Clusterware 11.2.0.2 onwards
– It uses so-called HAIPs (Highly Available IPs) that are assigned to the private networks on the server
– The HAIPs will be used by the database and ASM instances and processes
1
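• A minimal sketch (eth2 and 192.168.2.0 are example values): an additional private interface can be registered for Redundant Interconnect Usage with oifcfg, e.g.
[GRID]> oifcfg getif
[GRID]> oifcfg setif -global eth2/192.168.2.0:cluster_interconnect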
52 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
The Interconnect
Redundant Interconnect Usage
Node 1 Node 2
HAIP1
HAIP2
HAIP3
HAIP4
• A multiple listening endpoint approach is used
– The HAIPs are taken from the “link-local” (Linux / Unix) IP range (169.254.0.0)
– To find the communication partners, multicasting on the interconnect is required
– With 11.2.0.3, broadcast is a fallback alternative (bug 10411721)
– Multicasting is still required on the public LAN, for example for mDNS.
– Details in My Oracle Support (MOS) Note with Doc ID 1212703.1:
11.2.0.2 Grid Infrastructure Install or Upgrade may fail due to Multicasting
2
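• For illustration (a sketch; resource name as of 11.2.0.2), the HAIP resource can be inspected with:
[GRID]> crsctl stat res ora.cluster_interconnect.haip -init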
53 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
The Interconnect
Redundant Interconnect Usage and the HAIPs
Node 1 Node 2
HAIP1
HAIP2
HAIP3
HAIP4
• If a network interface fails, the assigned HAIP is failed over to a remaining one.
• Redundant Interconnect Usage allows the private networks to be in different subnets
• You can either have one subnet for all networks or a different one for each
• You can also use VLANs with the interconnect. For more information see:
• Note 1210883.1 - 11gR2 Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip
• Note 220970.1 - RAC: Frequently Asked Questions - How to use VLANs in Oracle RAC? AND
Are there any issues for the interconnect when sharing the same switch as the public network by using VLAN to separate the network?
HAIP1 HAIP3
54 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.