Oracle 11g Release 2 Clusterware Stack
Introduction
Before we start, let's refresh our memory and recall what Oracle Clusterware is.
Oracle Clusterware is the software that enables nodes to communicate with each other, allowing them to
form a cluster of nodes that behaves as a single logical server.
In Oracle 11g Release 2, the Clusterware software is divided into two separate stacks:
Oracle High Availability Services Stack - the lower stack, anchored by the Oracle High
Availability Services daemon (ohasd).
Cluster Ready Services Stack - the upper stack, anchored by the Cluster Ready Services (CRS)
daemon (crsd).
These two stacks have several processes that facilitate cluster operations.
Starting with Oracle 11g Release 2 Real Application Clusters (RAC), a new process called Oracle High
Availability Services (OHAS) was introduced.
At the operating system layer, Oracle High Availability Services is implemented as a new daemon process
called ohasd. As the 11g documentation notes, the Oracle High Availability Services daemon
(ohasd) anchors the lower part of the Oracle Clusterware stack, which consists of processes that
facilitate cluster operations in RAC databases. This includes the GPNPD, GIPC, MDNS and GNS
background processes on Linux and UNIX operating systems, or services on Windows. We will
discuss each of them in more detail later in this post.
In a cluster, the OHAS daemon runs as the root user. In an Oracle Restart environment it runs as
the oracle user.
The high availability stack is based on the Oracle High Availability Services daemon (ohasd.bin).
This daemon is responsible for starting all other Oracle Clusterware processes. It is also responsible
for managing and maintaining the Oracle Local Registry (OLR).
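As a quick illustration (the Grid home path and host prompt below are just examples), you can confirm that ohasd.bin is up and inspect the OLR it maintains:
[root@node1 ~]# ps -ef | grep ohasd.bin | grep -v grep
[root@node1 ~]# ocrcheck -local        # reports the status, size and location of the OLR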
There are no direct commands for Oracle High Availability Services - continue to use the
traditional crsctl commands for cluster management.
The background processes that are indirectly controlled by OHAS are the CRS daemon, the CSS
daemon, and the EVM daemon; we will discuss each of them in more detail later in this post. The
advantage of having the OHAS daemon in charge is that the administrator can now issue cluster-wide
commands. The OHAS daemon will start even if Grid Infrastructure has been explicitly disabled.
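For example (illustrative commands run from a typical 11.2 Grid installation), the familiar crsctl interface covers both the local stack and the new cluster-wide operations:
[root@node1 ~]# crsctl check has           # Oracle High Availability Services on this node
[root@node1 ~]# crsctl check crs           # OHAS, CRS, CSS and EVM on this node
[root@node1 ~]# crsctl check cluster -all  # cluster stack status on every node
[root@node1 ~]# crsctl stop cluster -all   # cluster-wide stop of the CRS stack (OHAS itself stays up)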
According to Oracle 11g Documentation, processes that comprise the Oracle High Availability
Services stack are:
Grid Plug and Play (GPNPD): GPNPD provides access to the Grid Plug and Play profile,
and coordinates updates to the profile among the nodes of the cluster to ensure that all of
the nodes have the most recent profile.
Grid Interprocess Communication (GIPC): A helper daemon for the communications
infrastructure. Currently has no functionality; to be activated in a later release.
Multicast Domain Name Service (mDNS): Allows DNS requests. The mDNS process is a
background process on Linux and UNIX, and a service on Windows.
Oracle Grid Naming Service (GNS): A gateway between the cluster mDNS and external
DNS servers. The gnsd process performs name resolution within the cluster.
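On Linux you can see which of these lower-stack daemons are running with a simple process listing (the prompt is an example; gnsd.bin appears only if GNS is configured):
[oracle@node1 ~]$ ps -ef | egrep 'gpnpd|gipcd|mdnsd|gnsd' | grep -v egrep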
To enable OHAS on each RAC node, you issue the crsctl enable crs command, which causes
OHAS to autostart when each node reboots. To verify that OHAS is running, check for the
CRS-4123 message in your Clusterware alert log:
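Here is a minimal sketch of that sequence (node name and Grid home path are examples; on 11.2 the Clusterware alert log normally lives under $GRID_HOME/log/<hostname>/):
[root@node1 ~]# crsctl enable crs
[root@node1 ~]# crsctl check has
CRS-4638: Oracle High Availability Services is online
[root@node1 ~]# grep CRS-4123 /u01/app/11.2.0/grid/log/node1/alertnode1.log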
ologgerd - This service runs on only two nodes in a cluster: the cluster logger service (ologgerd)
runs on one node, and another node is chosen by the cluster logger service to house the
standby for the master cluster logger service. If the master cluster logger service fails (because the
service is not able to come up after a fixed number of retries, or because the node where the master was
running is down), the node where the standby resides takes over as master and selects a new node for the
standby. The master manages the operating system metric database in the CHM (Cluster Health Monitor)
repository and interacts with the standby to manage a replica of the master operating system metrics
database.
osysmond.bin - There is one system monitor service on every node. The system monitor service
(osysmond) is the monitoring and operating system metric collection service that sends the data to
the cluster logger service. The cluster logger service receives the information from all the nodes and
persists it in the CHM repository.
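To see both CHM services and find out which node currently hosts the master cluster logger, you can combine a process listing with oclumon (the output shown is illustrative):
[oracle@node1 ~]$ ps -ef | egrep 'ologgerd|osysmond' | grep -v egrep
[oracle@node1 ~]$ oclumon manage -get MASTER
Master = node1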
mdnsd.bin - Used by Grid Plug and Play to locate profiles in the cluster, as well as by GNS to
perform name resolution.
gnsd.bin - Handles requests sent by external DNS servers, performing name resolution for names
defined by the cluster.
The Cluster Ready Services stack relies on two key components:
1. Oracle Cluster Registry (OCR) - which records and maintains the cluster and node
membership information;
2. Voting disk - which polls for consistent heartbeat information from all the nodes when the
cluster is running, and acts as a tie-breaker during communication failures.
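Both components can be checked from the command line (run ocrcheck as root so the logical integrity check is included):
[root@node1 ~]# ocrcheck                    # status, size and location of the OCR
[root@node1 ~]# crsctl query css votedisk   # lists the voting disks currently in use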
The CRS service itself has three main components, each handling a variety of functions:
1. Cluster Ready Services daemon (CRSd) - The primary program for managing high availability
operations in a cluster.
2. Cluster Synchronization Services Daemon (CSSd)
3. Event Management Daemon (EVMd)
Failure or death of the CRS daemon can cause node failure, which triggers automatic reboots of
the nodes to avoid the corruption of data (due to the possible failure of communication between the
nodes), also known as fencing.
The CRS daemon runs as "root" (super user) on UNIX platforms and runs as a service on Windows
platforms.
The following functions are provided by the Oracle Cluster Ready Services daemon (CRSd):
The crsd process is responsible for the start, stop, monitor, and failover operations of resources. It
maintains the OCR and also restarts resources when failures occur. This applies to RAC systems; for
Oracle Restart and ASM, ohasd is used instead.
The CRS daemon (crsd) manages cluster resources based on the configuration information that is
stored in OCR for each resource. This includes start, stop, monitor, and failover operations. The crsd
process generates events when the status of a resource changes. When you have Oracle RAC
installed, the crsd process monitors the Oracle database instance, listener, and so on, and
automatically restarts these components when a failure occurs.
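The resources crsd manages, together with their state and hosting node, can be listed with crsctl; the resource name in the second command is just a placeholder for one of your own:
[oracle@node1 ~]$ crsctl status resource -t              # tabular view of all registered resources
[oracle@node1 ~]$ crsctl status resource ora.orcl.db -f  # full attribute listing for a single resource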
Cluster Synchronization Services daemon (CSSd) provides basic ‘group services’ support. Group
Services is a distributed group membership system that allows the applications to coordinate activities
to achieve a common result. As such, it provides synchronization services between nodes, access to
the node membership information, and basic cluster services, including cluster group services and
cluster locking. CSSd also manages the cluster configuration by controlling which nodes
are members of the cluster and by notifying members when a node joins or leaves the cluster. If you
are using certified third-party clusterware, then CSS processes interface with your clusterware to
manage node membership information.
Failure of CSSd causes the machine to reboot to avoid a split-brain situation. This is also required in
a single instance configuration if Automatic Storage Management (ASM) is used. OCSSd runs as the
Grid Infrastructure owner (typically the oracle or grid user).
The following functions are provided by the Oracle Cluster Synchronization Services daemon
(OCSSd):
‘Lock Services’ provides the basic cluster-wide serialization locking functions, and uses a
FIFO mechanism to manage locking.
‘Node Services’ uses OCR to store state data and updates the information during
reconfiguration. It also manages the OCR data, which is otherwise static.
cssdmonitor - Monitors node hangs and CSSd process hangs, and monitors vendor clusterware. It is a
multithreaded process that runs with elevated priority.
Startup sequence: INIT => init.ohasd => ohasd => ohasd.bin => cssdmonitor
cssdagent - Spawned by the OHASD process, the cssdagent process monitors the cluster and provides
I/O fencing. This service was formerly provided by the Oracle Process Monitor daemon (oprocd in
10g), also known as OraFenceService on Windows. A cssdagent failure may result in Oracle
Clusterware restarting the node. It also stops, starts, and checks the status of the ocssd.bin daemon.
Startup sequence: INIT => init.ohasd => ohasd => ohasd.bin => cssdagent
ocssd.bin - Manages cluster node membership and runs as the grid user. Failure of this process results in
a node restart.
Startup sequence: INIT => init.ohasd => ohasd => ohasd.bin => cssdagent => ocssd => ocssd.bin
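A quick way to confirm these CSS-related processes and their owners on a running node (illustrative; cssdagent and cssdmonitor run as root, while ocssd.bin runs as the grid user):
[root@node1 ~]# ps -eo user,pid,cmd | egrep 'ocssd|cssdagent|cssdmonitor' | grep -v egrep
[root@node1 ~]# crsctl check css
CRS-4529: Cluster Synchronization Services is online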
Oracle ASM: Provides disk management for Oracle Clusterware and Oracle Database.
The third component of the CRS stack is the Event Management daemon (EVMd). EVMd spawns a
permanent child process called "evmlogger" and generates events. The evmlogger child process
spawns new child processes on demand and scans the callout directory to invoke
callouts. EVMd restarts automatically on failure, and death of the EVMd process does not halt the
instance. EVMd runs as the "oracle" user.
evmd.bin - Distributes and communicates some cluster events to all of the cluster members so that
they are aware of the cluster changes.
evmlogger.bin - Started by evmd.bin; it reads the configuration files, determines which events to
subscribe to from EVMD, and runs user-defined actions for those events.
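EVM can be checked with crsctl, and its event stream can be watched with the evmwatch utility shipped in the Grid home (the format string below follows the documented example; output is illustrative):
[oracle@node1 ~]$ crsctl check evm
CRS-4533: Event Manager is online
[oracle@node1 ~]$ evmwatch -A -t "@timestamp @@"   # prints cluster events as they are published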
orarootagent.bin - A specialized agent process that helps crsd manage resources owned by root, such as
the network and the grid virtual IP address. These agent processes are actually threads that appear as
separate processes, which is a Linux-specific implementation detail.
ONS (Oracle Notification Service) - A publish and subscribe service for communicating Fast Application Notification (FAN) events.
[oracle@node1 ~]$ ps -ef | grep ons | grep -v grep
oracle 4656 1 0 15:50 ? 00:00:00 /u01/app/11.2.0/grid/opmn/bin/ons -d
oracle 4657 4656 0 15:50 ? 00:00:00 /u01/app/11.2.0/grid/opmn/bin/ons -d
oraagent.bin (Oracle agent) - Extends clusterware to support Oracle-specific requirements and complex
resources. This process runs server callout scripts when FAN events occur. This process was known as
RACG in Oracle Clusterware 11g release 1 (11.1).
The Cluster Synchronization Service (CSS), Event Management (EVM), and Oracle Notification
Services (ONS) components communicate with other cluster component layers on other nodes in
the same cluster database environment. These components are also the main communication links
between Oracle Database, applications, and the Oracle Clusterware high availability components. In
addition, these background processes monitor and manage database operations.
The master diskmon process (diskmon.bin) can be seen running in all Grid Infrastructure installs,
but it is only on Exadata that it actually does any work. On every compute node there is one
master diskmon process and one DSKM slave diskmon process for every Oracle instance (including
ASM).
In Exadata, the diskmon is responsible for
Handling of storage cell failures and I/O fencing
Monitoring of Exadata Server state on all storage cells in the cluster (heartbeat)
Broadcasting intra database IORM (I/O Resource Manager) plans from databases to storage
cells
Monitoring of the control messages from database and ASM instances to storage cells
Communicating with other diskmons in the cluster
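Even on a non-Exadata cluster you can spot the idle master diskmon and the per-instance DSKM slaves with a process listing (illustrative):
[oracle@node1 ~]$ ps -ef | grep diskmon.bin | grep -v grep   # master diskmon process
[oracle@node1 ~]$ ps -ef | grep _dskm_ | grep -v grep        # DSKM slave inside each database/ASM instance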
References:
https://en.wikipedia.org/wiki/Oracle_Clusterware
Clusterware Administration and Deployment Guide
https://blogs.oracle.com/myoraclediary/entry/clusterware_processes_in_11g_rac
http://logicalread.solarwinds.com/oracle-crs-in-11g-mc04/#.Vz6nwOSUSec
https://docs.oracle.com/cd/E11882_01/rac.112/e41959/intro.htm#CWADD1111
http://www.dba-oracle.com/t_oracle_high_availability_services_ohas.htm