Oracle Cluster Ready Services 11g - Tips and Comments

Oracle Cluster Ready Services 11g - tips and comments by it knowledge Ltd. RAC as a part of Oracle Server SE included in the price. Single support resource no more ping-pong between hardware (and clusterware) vendor and Oracle.

Uploaded by

Din4ever
Copyright
© Attribution Non-Commercial (BY-NC)

Oracle Cluster Ready Services 11g –

tips and comments

Boris Gyurov
IT Knowledge Ltd.

1
Why Oracle Cluster Ready Services? -
Support for Linux


Appeared initially to support Oracle Parallel Server 8.1.7 on
Linux


Looked like an exotic configuration at that time


Benefit: one can install Parallel Server on Linux

2
Why Oracle Cluster Ready Services? -
Meant to Support RAC

Lower prices and higher speeds of communication
equipment gave Oracle's “share everything” architecture a huge
advantage – it started to scale well in 9i.


The customers were still afraid of the complicated setup
(vendor specific clusterware, raw devices to share the storage)
and the high price (option of Oracle EE)


Oracle's answer

Oracle Cluster Ready Services

OCFS and ASM to share the storage

RAC as a part of Oracle Server SE included in the price


The results

Tens of installations all around Bulgaria

Thousands of installations all around the world

RAC becomes a commodity

3
Why Oracle Cluster Ready Services? -
Generic Code


Generic code means generic bugs


Generic bugs are easier and faster to find – no matter on
which platform you run, you can hit it – that means more
testers


Generic bugs are easier to fix – one fix for all platforms


Generic code is cheaper to support – only one team vs.
many platform specific teams

4
Why Oracle Cluster Ready Services? -
Lower Price


No need to buy clusterware

5
Why Oracle Cluster Ready Services? -
Single Support Resource


No more ping-pong between the hardware (and clusterware)
vendor and Oracle


No more different experts to configure different parts


It all comes from Oracle

6
What is Oracle Clusterware?

Enables one system to be composed of many machines


Enables one Service to be provided by many nodes


Enables processes to be failed over to a surviving node in case
of failures


Enables network interfaces to be failed over to a surviving
node


Monitors all the resources and relocates them as needed


Notifies the cluster members, client applications and all the
subscribers of resource status changes


Creates a base for cluster-enabled applications (such as
RAC)

7
Oracle Clusterware Hardware Concepts


One or more (generally 2 or more) servers


Inter-node communication media (most often a high-speed
network)


Public network interface


Shared storage resources

8
Oracle Clusterware Software Concepts -
The Oracle Cluster Registry (OCR)

Contains the cluster configuration (the SYSTEM section)


Contains the Oracle Database and Services resource
definitions (the DATABASE section)


Contains the third-party resource definitions (the CRS
section)


Ocrdump utility – dumps the OCR in text or XML format and
lets us browse its structure and contents

9
Oracle Clusterware Software Concepts -
The Voting Disk

Why a Voting Disk is needed:
If the node interconnect (IC) fails, a node cannot tell
whether the other node is down or only the IC is down. Hence each can
decide that the other is down and try to recover the cluster.
The cluster would split into sub-clusters – the “split brain” problem


The Voting Disk – a file on the shared storage, shared between the
nodes, where each node writes its “heartbeat”


Provides a second communication path between the nodes,
to determine which one should go down and which will stay
and recover


Should be mirrored (at Oracle or OS level) to prevent
corruption. If the Voting Disk is inaccessible, the cluster goes
down

10
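When the Voting Disk is multiplexed, a node has to keep access to most of the copies to stay in the cluster. The sketch below only illustrates that arithmetic, assuming a strict-majority rule (more than half of the configured disks). The function names are hypothetical and are not part of any Oracle utility.

```shell
# Sketch only: assumes a node needs a strict majority of the voting disks.
# Function names are illustrative, not Oracle commands.

# Minimum number of voting disks a node must be able to access.
required_voting_disks() {
    total=$1
    echo $(( total / 2 + 1 ))
}

# Succeeds (exit status 0) if a node that sees $2 of $1 disks may stay up.
node_survives() {
    total=$1
    accessible=$2
    [ "$accessible" -ge "$(required_voting_disks "$total")" ]
}

required_voting_disks 3   # prints 2
```

With three copies the cluster tolerates the loss of one copy. With two copies it tolerates none, which is why an odd number of copies is the usual choice.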
Oracle Clusterware Processes on Linux
and UNIX Systems

crsd: Performs high availability recovery and management
operations such as maintaining the OCR and managing
application resources. This process runs as root. This
process restarts automatically upon failure.


evmd: Event manager daemon. This process also starts
the racgevt process to manage FAN server callouts.


ocssd: Manages cluster node membership and runs as the
oracle user. Uses the interconnect (IC) and the Voting Disk; failure of this process
results in a node restart.

11
Oracle Clusterware Processes on Linux
and UNIX Systems


oprocd—Process monitor for the cluster. Note that this
process only appears on platforms that do not use third-party
vendor clusterware with Oracle Clusterware.

12
Oracle Clusterware Processes on Linux-
Processes startup
From the Linux man pages
DESCRIPTION
The inittab file describes which processes are started at bootup and
during normal operation
......
An entry in the inittab file has the following format:

id:runlevels:action:process
.......
Valid actions for the action field are:
respawn
The process will be restarted whenever it terminates (e.g. getty).
.....

13
Oracle Clusterware Processes on Linux-
Processes startup
[oracle@class01 bin]$ cat /etc/inittab
......
# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
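
To see which entries init will restart automatically, the id:runlevels:action:process format can be split on colons. This is a minimal sketch: the function name list_respawn_entries is hypothetical, and the sample input is copied from the listing above.

```shell
# Sketch: list inittab entries that use the respawn action.
# Reads inittab-formatted text (id:runlevels:action:process) from stdin
# and prints id, runlevels and command separated by tabs.
list_respawn_entries() {
    awk -F: '
        /^[ \t]*#/ { next }                  # skip comment lines
        NF >= 4 && $3 == "respawn" {
            cmd = $4
            # the process field may itself contain colons; rejoin the rest
            for (i = 5; i <= NF; i++) cmd = cmd ":" $i
            printf "%s\t%s\t%s\n", $1, $2, cmd
        }'
}

# Sample input from the slide; on a real node: list_respawn_entries < /etc/inittab
list_respawn_entries <<'EOF'
# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
EOF
```

The three h1/h2/h3 lines are the Clusterware daemons; all of them run in runlevels 3 and 5 and are restarted by init whenever they exit.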

14
Oracle Clusterware Processes startup
on Windows
Oracle Process Manager Daemon (OPMD): OPMD is registered
with the Windows Service Control Manager (WSCM), and the
startup of all Oracle Clusterware services is dependent on OPMD.
On system startup, and after the default time period of 60 seconds
has elapsed, OPMD automatically starts all of the registered Oracle
Clusterware services. This startup delay enables other services to
start that are outside of the scope of Oracle control, such as
storage access, anti-virus, or firewall services.
You can set OPMD to start manually. However, this will delay the
startup of the rest of the affected Oracle Clusterware services.

15
The RACG Infrastructure


Takes care of the Oracle Specific Resources


One racgimon process is spawned for each database or
ASM instance to monitor its health

[oracle@class01 ~]$ ps -ef|grep racg
oracle 5822 1 0 11:31 ? 00:00:04 /u01/app/oracle/product/11.1/db_1/bin/racgimon startd racdb

16
The RACG Infrastructure


CRSD also spawns other child processes to perform
different actions (kill, start/stop resources, change
configurations etc.)

racgeut to kill timed-out actions
Usage racgeut [-e ...=...] <timeout> <prog_exe> <param_list>


racgmain to start/stop/check/manage resources
Usage racgmain [resource name] start|stop|check
racgmain startorp|failsrvsa dbname instname [srvname]
racgmain startorp|failsrvsa nodename
racgmain cond_resname cond_state func [args...]


racgvip (run as root) to check and relocate the VIP
17
The Virtual IP (VIP) Concept


The VIP is an IP address, controlled by the CRS


Should be from the public subnet


Should be resolvable through DNS or /etc/hosts


Used by the RAC database to avoid TCP/IP timeouts when
recognizing node or interface down events


Used by third-party applications so that they can still be reached at the
same IP after moving to the surviving node in case of
failover


Should be used instead of the static public IP
18
Using CRS with Third Party APPS
Overview

An application profile should be added to the OCR. The main
attributes are:

Action Program – an executable to start/stop/check the
application

Privileges – which user can start/stop the application

Resource – a resource name for your application
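
The slides do not show the action program itself, so here is a minimal sketch of what one could look like. It assumes the program is called with a single argument (start, stop or check) and reports success with exit code 0. The apachectl path, the pid file location and the function names are assumptions for illustration, not the author's script.

```shell
#!/bin/sh
# Hypothetical action program for an Apache resource (a sketch only).
# Called with one argument: start, stop or check.
# Exit code 0 reports success; a non-zero code reports failure.

# Assumed locations; adjust for the actual Apache installation.
APACHECTL=${APACHECTL:-/usr/sbin/apachectl}
PIDFILE=${PIDFILE:-/var/run/httpd.pid}

apache_is_running() {
    # Running means: the pid file exists and the recorded process is alive.
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

apache_action() {
    case "$1" in
        start) "$APACHECTL" start ;;
        stop)  "$APACHECTL" stop ;;
        check) apache_is_running ;;
        *)     echo "usage: $0 start|stop|check" >&2; return 1 ;;
    esac
}

# Dispatch only when an argument was given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    apache_action "$1"
    exit $?
fi
```

The check branch is the one the cluster calls repeatedly, so it should stay cheap and must not report success while the server is down.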

19
Using CRS with Third Party APPS
Creating the profile

[oracle@class01 ~]$ crs_profile -create apache_crs -t application \
-dir ./ -a /root/apache_crs.sh -r ora.class01.vip
[oracle@class01 ~]$ ll
total 164
-rw-r--r-- 1 oracle oinstall 760 Aug 19 17:24 apache_crs.cap
drwxr-xr-x 2 oracle oinstall 4096 Aug 13 18:58 Desktop
-rw-r--r-- 1 oracle oinstall 43387 Aug 14 12:47 ocr_bef.dmp
-rw-r--r-- 1 oracle oinstall 56929 Aug 15 16:19 OCRDUMPFILE
[oracle@class01 ~]$

20
Using CRS with Third Party APPS
Registering the Profile

[oracle@class01 ~]$ crs_register apache_crs -dir ./

[oracle@class01 ~]$ crs_stat |grep -A 5 apache


NAME=apache_crs
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.class01.LISTENER_CLASS01.lsnr
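
The NAME=/TYPE=/TARGET=/STATE= blocks are awkward to grep, which is why the command above needs -A 5. A small filter can flatten each block into one line instead. This is a sketch: crs_stat_oneline is a hypothetical helper name, and the sample input is the block shown above.

```shell
# Sketch: flatten crs_stat attribute blocks into one line per resource.
# Reads NAME=/TYPE=/TARGET=/STATE= lines from stdin.
crs_stat_oneline() {
    awk -F= '
        $1 == "NAME"   { name = $2 }
        $1 == "TYPE"   { type = $2 }
        $1 == "TARGET" { target = $2 }
        # STATE is the last attribute of a block, so print the record here
        $1 == "STATE"  { printf "%s %s %s %s\n", name, type, target, $2 }'
}

# Sample input from the slide; on a real node: crs_stat | crs_stat_oneline
crs_stat_oneline <<'EOF'
NAME=apache_crs
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
EOF
```

Once every resource is on one line, a plain grep apache is enough to find it.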

21
Using CRS with Third Party APPS
Setting the Permissions


Setting the owner
[root@class01 oracle]# /u01/app/oracle/product/11.1/crs11/bin/crs_setperm apache_crs -o root


Setting the rights
[root@class01 oracle]# /u01/app/oracle/product/11.1/crs11/bin/crs_setperm apache_crs -u user:oracle:r-x

22
Using CRS with Third Party APPS
Starting and Stopping the resource

Checking the state
[oracle@class01 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
apache_crs application OFFLINE OFFLINE


Starting the resource
[oracle@class01 ~]$ crs_start apache_crs
Attempting to start `apache_crs` on member `class01`
Start of `apache_crs` on member `class01` succeeded.

[oracle@class01 ~]$ crs_stat -t


Name Type Target State Host
------------------------------------------------------------
apache_crs application ONLINE ONLINE class01
23
Using CRS with Third Party APPS
Starting and Stopping the resource

Stopping the resource

[oracle@class01 ~]$ crs_stop apache_crs


Attempting to stop `apache_crs` on member `class01`
Stop of `apache_crs` on member `class01` succeeded.
[oracle@class01 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
apache_crs application OFFLINE OFFLINE

24
Using CRS with Third Party APPS
Failover

Step 1: Apache and the VIP running on node 1
[oracle@class02 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
apache_crs application ONLINE ONLINE class01
ora....01.lsnr application ONLINE ONLINE class01
ora....s01.gsd application ONLINE ONLINE class01
ora....s01.ons application ONLINE ONLINE class01
ora....s01.vip application ONLINE ONLINE class01

Here we pull the power supply cable from node 1

25
Using CRS with Third Party APPS
Failover

Step 2: Apache and the VIP go offline
[oracle@class02 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
apache_crs application ONLINE OFFLINE
ora....01.lsnr application ONLINE OFFLINE
ora....s01.gsd application ONLINE OFFLINE
ora....s01.ons application ONLINE OFFLINE
ora....s01.vip application ONLINE OFFLINE
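
In this intermediate state every resource still has Target ONLINE while its State is OFFLINE. A filter over the tabular output can pick out exactly those resources. This is a sketch that assumes the five-column layout shown above; pending_resources is a hypothetical helper name.

```shell
# Sketch: print resources whose Target is ONLINE but whose State is not.
# Reads `crs_stat -t` style output (Name Type Target State Host) from stdin.
pending_resources() {
    awk '
        $1 == "Name" || /^-+$/ { next }      # skip the header and the ruler
        NF >= 4 && $3 == "ONLINE" && $4 != "ONLINE" { print $1 }'
}

# Sample input from the slide; on a real node: crs_stat -t | pending_resources
pending_resources <<'EOF'
Name Type Target State Host
------------------------------------------------------------
apache_crs application ONLINE OFFLINE
ora....s01.vip application ONLINE OFFLINE
EOF
```

An empty result means every resource that should be running is running.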

26
Using CRS with Third Party APPS
Failover

Step 3: Apache and the VIP go online on node 2
[oracle@class02 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
apache_crs application ONLINE ONLINE class02
ora....01.lsnr application ONLINE ONLINE class01
ora....s01.gsd application ONLINE ONLINE class01
ora....s01.ons application ONLINE ONLINE class01
ora....s01.vip application ONLINE ONLINE class02

NOTE: The customer does not need to change the IP address requested in the
browser. Apache is still accessible at the VIP address

27
Using CRS with Third Party APPS
VIP Note

Oracle does not recommend using the same VIP for more than one application.
In our case we use the database VIP for Apache as
well.


To comply with that, we should create a new VIP dedicated to the
Apache server and use it instead of the database VIP. It would
operate exactly like the database VIP but would be a separate one

28
Using CRS with Third Party APPS
Failover

Step 4: Node 1 comes back. VIP goes back to node 1. Apache is
still present at node 2. Apache is not reachable at the VIP at that
moment
[oracle@class02 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
apache_crs application ONLINE ONLINE class02
ora....01.lsnr application ONLINE OFFLINE
ora....s01.gsd application ONLINE OFFLINE
ora....s01.ons application ONLINE OFFLINE
ora....s01.vip application ONLINE ONLINE class01

29
Using CRS with Third Party APPS
Failover

Step 5: Apache also goes back to Node 1 since it is declared to
be dependent on the node 1 VIP. It is reachable again
[oracle@class02 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
apache_crs application ONLINE ONLINE class01
ora....01.lsnr application ONLINE ONLINE class01
ora....s01.gsd application ONLINE ONLINE class01
ora....s01.ons application ONLINE ONLINE class01
ora....s01.vip application ONLINE ONLINE class01

30
Using CRS with Third Party APPS
Using its own VIP

Creating a new, application specific VIP

[oracle@class01 ~]$ crs_profile -create apache_vip -dir ./ -t application \
-a /u01/app/oracle/product/11.1/crs11/bin/usrvip \
-o oi=eth1,ov=192.168.16.110,on=255.255.255.0,ap=0

Setting ap (active placement) to 0 tells the system not to re-evaluate the
resource placement when a new node is added.
Our VIP is not tied to a particular node. It starts on any node at
startup, fails over to any surviving node in case of failure, and does not
return when the original node starts again


Setting permissions
[root@class01 ~]# ./crs_setperm apache_vip -o root
[root@class01 ~]# ./crs_setperm apache_vip -u user:oracle:r-x

31
Using CRS with Third Party APPS
Using its own VIP

Making apache_crs dependent on the new apache_vip.
apache_crs is currently dependent on ora.class01.vip.
To change that:
[root@class02 oracle]# ./crs_register apache_crs -update -r apache_vip


Now apache_crs will follow apache_vip to any node. When
apache_vip starts on a node, apache_crs will start on the same node


When apache_vip fails over to ANY surviving node, apache_crs will
fail over to the same node


When the failed node starts up again, apache_vip will not go
back to it (active placement is off) and neither will apache_crs

32
Using CRS with Third Party APPS
Using its own VIP – we got a Service

No particular node. We never know where the application runs, but
we always access it at the apache_vip


We need to share binaries


We need to share the configuration files


We need to share everything the application needs to operate, so
that each node can access it in the same directory tree

And OCFS is here to help


33
The clusters and the Oracle Universal
Installer

OUI supports cluster level installations – installing CRS and Oracle
Database on all the cluster nodes simultaneously


Scripts provided under install_directory/install:

runSSHSetup.sh – to set up user equivalence

addNode.sh – to add a node to an existing cluster – calls OUI

attachHome.sh/detachHome.sh – to attach/detach existing homes
from the Oracle Inventory


Under install_directory, runcluvfy.sh checks all the prerequisites

34
The clusters and the Oracle Universal
Installer
The Oracle Inventory now tracks which cluster members
contain particular home directories

35
[oracle@class01 ~]$ ls /u01/app/oraInventory/ContentsXML/
comps.xml inventory.xml libs.xml
[oracle@class01 ~]$ cat /u01/app/oraInventory/ContentsXML/inventory.xml
<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2006, Oracle. All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
<VERSION_INFO>
<SAVED_WITH>11.1.0.6.0</SAVED_WITH>
<MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
</VERSION_INFO>
<HOME_LIST>
<HOME NAME="OraCrs11g_home" LOC="/u01/app/oracle/product/11.1/crs11" TYPE="O"
IDX="1" CRS="true">
<NODE_LIST>
<NODE NAME="class01"/>
<NODE NAME="class02"/>
</NODE_LIST>
</HOME>
<HOME NAME="OraDb11g_home1" LOC="/u01/app/oracle/product/11.1/db_1" TYPE="O"
IDX="2">
<NODE_LIST>
<NODE NAME="class01"/>
<NODE NAME="class02"/>
</NODE_LIST>
</HOME>
</HOME_LIST>
</INVENTORY>

36
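
To check quickly which nodes each home is registered on, the HOME and NODE elements can be read with a line-oriented filter. This is a read-only sketch, not a real XML parser: it assumes one element per line as in the listing above, and homes_and_nodes is a hypothetical helper name.

```shell
# Sketch: print "home_name node_name" pairs from inventory.xml content.
# Line-oriented and read-only; assumes each HOME and NODE tag starts
# on its own line, as in the file shown on the slide.
homes_and_nodes() {
    awk '
        /<HOME NAME="/ {
            home = $0
            sub(/.*<HOME NAME="/, "", home)   # drop text before the name
            sub(/".*/, "", home)              # drop text after the name
        }
        /<NODE NAME="/ {
            node = $0
            sub(/.*<NODE NAME="/, "", node)
            sub(/".*/, "", node)
            print home, node
        }'
}

# Shortened sample; on a real node:
#   homes_and_nodes < /u01/app/oraInventory/ContentsXML/inventory.xml
homes_and_nodes <<'EOF'
<HOME NAME="OraCrs11g_home" LOC="/u01/app/oracle/product/11.1/crs11" TYPE="O"
IDX="1" CRS="true">
<NODE_LIST>
<NODE NAME="class01"/>
<NODE NAME="class02"/>
</NODE_LIST>
</HOME>
EOF
```

A home that lists fewer nodes than the cluster has is a hint that the node list needs updating through the installer, never by editing the file by hand.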
The clusters and the Oracle Universal
Installer – the command line options
[oracle@class01 bin]$ ./runInstaller -help
......
-clusterware oracle.crs,<crs version>
Version of Cluster ready services installed.

-addNode
For adding node(s) to the installation.
Wrapped by addNode.sh

-attachHome
For attaching homes to the OUI inventory.
Wrapped by attachHome.sh

-detachHome
For detaching homes from the OUI inventory without deleting
inventory directory inside Oracle home.

37
The clusters and the Oracle Universal
Installer – the command line options
-updateNodeList
For updating node list for this home in the OUI inventory.
Particularly useful when removing node from the cluster

-remoteshell <Path>
Unix specific option. Used only for cluster installs, specifies the path to
the remote shell program on the local cluster node.

And many more

38
The Bottom Line
(or what I like )

CRS looks good, reliable and mature since 10gR2


Now we have a complete set of tools to change almost everything in the
configuration


Now we can multiplex the OCR and the Voting Disk for better reliability


Now Oracle fully supports adding and removing nodes from the cluster,
along with the utilities for that

39
The Bottom Line
(or what I don't like )

There are many utilities for management,
often duplicating the functionality


It is still easy to mess it up (say mess up the
private and the public IPs)


Although possible, reconfiguration (say fixing
the problem with the messed up private and
public IP) is still quite a pain. Lots of
commands, often not very intuitive


Some of the tasks (for example managing the
inventory while adding and removing nodes )
have to be done by hand, typing commands,
which are sort of “black magic”


There is still work to be done in documenting
CRS.

40
Q&A

41
