
Brock Frank

CTO
DB Consult
Troubleshooting
Installation and Configuration
Oracle Clusterware
ASM and RDBMS
Cluster Verification Utility (CLUVFY)
Oracle Universal Installer (OUI)
Database Configuration Assistant (DBCA)
Introduced in Oracle 10.2

Checks cluster configuration
stages - verifies that all steps for the specified stage have been completed
components - verifies that the specified component has been correctly installed

Supplied with Oracle Clusterware
Can also be downloaded from OTN (Linux and Windows)

For earlier versions see Metalink Note 135714.1:
Script to Collect RAC Diagnostic Information (racdiag.sql)
On Red Hat and Oracle Enterprise Linux platforms, the
following additional RPM is required for CLUVFY:
cvuqdisk-1.0.1-1.rpm
This package is supplied in the clusterware/cluvfy/rpm
directory on the Clusterware CD-ROM
It can also be downloaded from OTN
On each node, as the root user, install the RPM using:
rpm -ivh cvuqdisk-1.0.1-1.rpm
-post hwos post-check for hardware and operating system
-pre cfs pre-check for CFS setup
-post cfs post-check for CFS setup
-pre crsinst pre-check for Oracle Clusterware installation
-post crsinst post-check for Oracle Clusterware installation
-pre dbinst pre-check for database installation
-pre dbcfg pre-check for database configuration
nodereach Checks reachability between nodes
nodecon Checks node connectivity
cfs Checks CFS integrity
ssa Checks shared storage accessibility
space Checks space availability
sys Checks minimum system requirements
clu Checks cluster integrity
clumgr Checks cluster manager integrity
ocr Checks OCR integrity
crs Checks Oracle Clusterware (CRS) integrity
nodeapp Checks node applications exist
admprv Checks administrative privileges
peer Compares properties with peers
For example, to check the configuration before
installing Oracle Clusterware on nodes testrac1 and
testrac2 use:
sh runcluvfy.sh stage -pre crsinst -n testrac1,testrac2 -verbose
Checks:
node reachability
user equivalence
administrative privileges
node connectivity
shared storage accessibility
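
Individual components can be checked in the same way. For example, to verify node connectivity on the same two nodes (a minimal sketch reusing the illustrative node names from above):

sh runcluvfy.sh comp nodecon -n testrac1,testrac2 -verbose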
To enable trace in CLUVFY use:
export SRVM_TRACE=true
Trace files are written to the $CV_HOME/cv/log directory
By default this directory is removed immediately after CLUVFY is
executed

To retain them, on Linux/Unix comment out the following line in runcluvfy.sh:

# $RM -rf $CV_HOME

The pathname of the CV_HOME directory is based on the operating system process ID, e.g.:
/tmp/18124
It can be useful to echo the value of CV_HOME in runcluvfy.sh:
echo CV_HOME=$CV_HOME
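
Putting this together, a minimal tracing session might look like the following (the process ID, and therefore the /tmp pathname, will differ on every run):

export SRVM_TRACE=true
sh runcluvfy.sh stage -pre crsinst -n testrac1,testrac2 -verbose
ls /tmp/<pid>/cv/log    # trace files remain only if the $RM line is commented out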
On Unix/Linux to launch the OUI with tracing
enabled use:
./runInstaller -J-DTRACING.ENABLED=true -J-DTRACING.LEVEL=2

Log files will be written to:
$ORACLE_BASE/oraInventory/logs
To trace root.sh execute it using:
sh -x root.sh

Note that it may be necessary to clean up the CRS installation before executing root.sh again (see the sketch below)
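
A hedged sketch of the cleanup for a failed 10.2 CRS installation on Linux (the exact scripts and steps are release-specific; consult Metalink Note 239998.1 before running these on a real cluster):

# As root, on each node where root.sh partially ran:
$ORA_CRS_HOME/install/rootdelete.sh
# Then, once, from the installing node:
$ORA_CRS_HOME/install/rootdeinstall.sh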
To enable trace for the DBCA in Oracle 9.0.1 and above,
edit $ORACLE_HOME/bin/dbca and change:

# Run DBCA
$JRE_DIR/bin/jre -DORACLE_HOME=$OH -DJDBC_PROTOCOL=thin \
  -mx64m -classpath $CLASSPATH oracle.sysman.assistants.dbca.Dbca $ARGUMENTS

to:

# Run DBCA
$JRE_DIR/bin/jre -DORACLE_HOME=$OH -DJDBC_PROTOCOL=thin \
  -mx64m -DTRACING.ENABLED=true -DTRACING.LEVEL=2 \
  -classpath $CLASSPATH oracle.sysman.assistants.dbca.Dbca $ARGUMENTS

Then redirect standard output to a file, e.g.:

$ dbca > dbca.out &
Clusterware Overview
Nodeapps
CRSCTL
Provides
Node membership services (CSS)
Resource management services (CRS)
Event management services (EVM)
In Oracle 10.1 and above resources include:
Node apps
ASM instances
Database instances
Services
Node applications include:
Virtual IP (VIP)
Listeners
Oracle Notification Service (ONS)
Global Services Daemon (GSD)
Node application introduced in Oracle 10.1
Allows a virtual IP address to be defined for each node
All applications connect using virtual IP addresses
If a node fails, its virtual IP address is automatically relocated to another node
This only applies to newly connecting sessions
[Diagram: Before failover, Node 1 hosts VIP1, Listener1 and Instance1, while Node 2 hosts VIP2, Listener2 and Instance2. After Node 1 fails, VIP1 relocates to Node 2, which then hosts both VIP1 and VIP2.]


On Linux during normal operation, each node
will have one VIP address. For example:
[root@server3]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:11:D8:58:05:99
inet addr:192.168.2.103 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: fe80::211:d8ff:fe58:599/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:6814 errors:0 dropped:0 overruns:0 frame:0
TX packets:10326 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:684579 (668.5 KiB) TX bytes:1449071 (1.3 MiB)
Interrupt:217 Base address:0x8800
eth0:1 Link encap:Ethernet HWaddr 00:11:D8:58:05:99
inet addr:192.168.2.203 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:217 Base address:0x8800
If Oracle Clusterware on server3 is shut down,
the VIP resource is transferred to another node
(in this case server11):
[root@server11]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:1D:7D:A3:0A:55
inet addr:192.168.2.111 Bcast:192.168.2.255 Mask:255.255.255.0
inet6 addr: fe80::21d:7dff:fea3:a55/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2792 errors:0 dropped:0 overruns:0 frame:0
TX packets:4097 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:329891 (322.1 KiB) TX bytes:593615 (579.7 KiB)
Interrupt:177 Base address:0x2000
eth0:1 Link encap:Ethernet HWaddr 00:1D:7D:A3:0A:55
inet addr:192.168.2.211 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:177 Base address:0x2000
eth0:2 Link encap:Ethernet HWaddr 00:1D:7D:A3:0A:55
inet addr:192.168.2.203 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:177 Base address:0x2000
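
During normal operation the same information can be obtained from SRVCTL, which queries Oracle Clusterware rather than the operating system. An illustrative check (output format varies slightly between releases):

$ srvctl status nodeapps -n server3
VIP is running on node: server3
GSD is running on node: server3
Listener is running on node: server3
ONS daemon is running on node: server3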
In Oracle 10.2, Oracle Clusterware log files are created in the $CRS_HOME/log directory
$CRS_HOME can be located on shared storage
The $CRS_HOME/log directory contains a subdirectory for each node, e.g. $CRS_HOME/log/server6
The $CRS_HOME/log/<node> directory contains:
the Oracle Clusterware alert log, e.g. alertserver6.log
client - log files for OCR applications including CLSCFG, CSS, OCRCHECK, OCRCONFIG, OCRDUMP and OIFCFG
crsd - log files for the CRS daemon including crsd.log
cssd - log files for the CSS daemon including ocssd.log
evmd - log files for the EVM daemon including evmd.log
racg - log files for node applications including VIP and ONS
Log file locations in $ORA_CRS_HOME:

$ORA_CRS_HOME/
  log/
    <nodename>/
      alert<nodename>.log
      client/
      crsd/
      cssd/
      evmd/
      racg/
        racgeut/
        racgimon/
        racgmain/

Log file locations in $ORACLE_HOME (RDBMS and ASM):

$ORACLE_HOME/
  log/
    <nodename>/
      client/
      racg/
        racgeut/
        racgimon/
        racgmain/
        racgmdb/

If the OCR or a voting disk is not available, error files may be created in /tmp, e.g. /tmp/crsctl.4038
For example, if the OCR cannot be found:

OCR initialization failed accessing OCR device: PROC-26: Error while
accessing the physical storage Operating System error [No such file or
directory] [2]

In this case the OCR is inaccessible, no CRS daemons will start, and no errors are written to the log files

If the voting disk has incorrect ownership:

clsscfg_vhinit: unable(1) to open disk (/dev/raw/raw2)
Internal Error Information:
Category: 1234
Operation: scls_block_open
Location: statfs
Other: statfs failed /dev/raw/raw2
Dep: 2
Failure 1 checking the Cluster Synchronization Services voting
disk '/dev/raw/raw2'.
Not able to read adequate number of voting disks
Script called on each node by SRVCTL to control resources
Copy of script in each Oracle home
$ORA_CRS_HOME/bin/racgwrap
$ORA_ASM_HOME/bin/racgwrap
$ORACLE_HOME/bin/racgwrap
Sets environment variables
Invokes racgmain executable
Generated from racgwrap.sbs
Differs in each home

Sets the $ORACLE_HOME and $ORACLE_BASE environment variables for racgmain
Also sets $LD_LIBRARY_PATH
Enable trace by setting _USR_ORA_DEBUG to 1
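
A minimal sketch of enabling the trace (the variable is defined near the top of the generated script; edit the racgwrap copy in whichever home owns the resource, on each node):

# In $ORACLE_HOME/bin/racgwrap (illustrative excerpt):
_USR_ORA_DEBUG=1
export _USR_ORA_DEBUG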
Process Monitor Daemon
Provides cluster I/O fencing
Implemented on Unix systems
Not required with third-party clusterware
Implemented on Linux in 10.2.0.4 and above
In 10.2.0.3 and below the hangcheck-timer kernel module is used instead
Provides hangcheck-timer functionality to maintain cluster integrity
Behaviour is similar to hangcheck-timer
Runs as root
Locked in memory
Failure causes a reboot of the system
See /etc/init.d/init.cssd for the operating system reboot commands
OPROCD takes two parameters
-t - Timeout value
Length of time between executions (milliseconds)
Normally defaults to 1000
-m - Margin
Acceptable margin before rebooting (milliseconds)
Normally defaults to 500
Parameters are specified in /etc/init.d/init.cssd
OPROCD_DEFAULT_TIMEOUT=1000
OPROCD_DEFAULT_MARGIN=500
Contact Oracle Support before changing these values
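
On a system where OPROCD is active, it can be seen in the process list. An illustrative check (the output line is a sketch; the exact arguments reflect the timeout and margin settings above):

# ps -ef | grep oprocd
root  5312  ...  oprocd run -t 1000 -m 500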
CSS maintains two heartbeats
Network heartbeat across interconnect
Disk heartbeat to voting device
Disk heartbeat has an internal I/O timeout (in seconds)
Varies between releases
In Oracle 10.2.0.2 and above the disk heartbeat timeout can be specified using the CSS disktimeout parameter
This is the maximum time allowed for a voting file I/O to complete
If this is exceeded, the voting file is marked offline
Defaults to 200 seconds
crsctl get css disktimeout
crsctl set css disktimeout <value>
CRSCTL can also be used to enable and disable
Oracle Clusterware
To enable Clusterware use:
# crsctl enable crs
To disable Clusterware use:
# crsctl disable crs
These commands update the following file:
/etc/oracle/scls_scr/<node>/root/crsstart
In Oracle 10.2, CRSCTL can be used to check the current state
of Oracle Clusterware daemons
To check the current state of all Oracle Clusterware daemons
# crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
To check the current state of individual Oracle Clusterware
daemons
# crsctl check cssd
CSS appears healthy
# crsctl check crsd
CRS appears healthy
# crsctl check evmd
EVM appears healthy
In Oracle 10.2 and above, Oracle Clusterware debugging can be enabled and disabled for:
CRS
CSS
EVM
Resources
Subcomponents
Debugging can be controlled:
statically, using environment variables
dynamically, using CRSCTL (see the example below)
Debug settings can be persisted in the OCR for use in subsequent restarts
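
As an example of dynamic control, module-level debugging in 10.2 uses syntax like the following (module names and levels vary by release; crsctl lsmodules lists the modules available for each daemon):

# crsctl debug log css "CSSD:3"
# crsctl debug log crs "CRSRTI:1,CRSCOMM:2"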
The olsnodes utility lists all nodes currently in the cluster
With no arguments, olsnodes simply lists the node names
In Oracle 10.2 and above, the -p argument also lists the private interconnect name for each node
In Oracle 10.2 and above, the -i argument also lists the VIP address for each node
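
An illustrative session using the node names from the cluster shown earlier (the -vip names are assumptions for the example):

$ olsnodes
server3
server4
server11
server12

$ olsnodes -i
server3   server3-vip
server4   server4-vip
server11  server11-vip
server12  server12-vip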
In Oracle 10.1 and above the OCRCONFIG utility performs
various administrative operations on the OCR including:
displaying backup history
configuring backup location
restoring OCR from backup
exporting OCR
importing OCR
upgrading OCR
downgrading OCR
In Oracle 10.2 and above OCRCONFIG can also
manage OCR mirrors
overwrite OCR files
repair OCR files
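
Some representative invocations, all run as root (these options are documented for 10.2; the backup pathname shown is an assumption based on the default $ORA_CRS_HOME/cdata/<cluster name> location):

# ocrconfig -showbackup
# ocrconfig -export /tmp/ocr.exp
# ocrconfig -restore $ORA_CRS_HOME/cdata/crs/backup00.ocr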
With the -t option, crs_stat lists resources together
with their state and the current node
Name Type Target State Host
------------------------------------------------------------
ora....T1.inst application ONLINE ONLINE server3
ora....T2.inst application ONLINE ONLINE server4
ora....T3.inst application ONLINE ONLINE server11
ora....T4.inst application ONLINE ONLINE server12
ora.TEST.db application ONLINE ONLINE server3
ora....SM3.asm application ONLINE ONLINE server11
ora....11.lsnr application ONLINE ONLINE server11
ora....r11.gsd application ONLINE ONLINE server11
ora....r11.ons application ONLINE ONLINE server11
ora....r11.vip application ONLINE ONLINE server11
ora....SM4.asm application ONLINE ONLINE server12
ora....12.lsnr application ONLINE ONLINE server12
ora....r12.gsd application ONLINE ONLINE server12
ora....r12.ons application ONLINE ONLINE server12
ora....r12.vip application ONLINE ONLINE server12
Oracle Cluster Registry
Vulnerable to corruption
Versions in which OCR corruption has been reported include:
10.1.0.3
10.2.0.2
10.2.0.3
11.1.0.6
Corruption has also been experienced by many Oracle employees
A typical symptom is a "placement error"
This may be related to the configuration of services
The corruption may have occurred at an earlier date
It may occur when a service is configured on a non-master node
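
The structural integrity of the OCR can be verified with the OCRCHECK utility, which reports the OCR version, space usage and device location and performs an integrity check; on a healthy cluster the run ends with "Cluster registry integrity check succeeded":

# ocrcheck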
