Aspiring to be a RAC DBA Part 2
Ashok Singh
Winona,MN-55987
[email protected]
Previous Session
What is Real Application
Cluster?
Cluster Manager(9i) ,Cluster
Ready Services(10g)
Cluster Files System
Cache Fusion and Interconnect
Routine Administration of RAC
Real Application Clusters
Objective
Cache Fusion
Instance Recovery
Backup and Recovery
Wait Events specific to RAC
Performance Tuning
Areas of Concern
Routine Administration
Conclusion
Cache Fusion Architecture
Revolutionary concept ,removes inefficiencies , enhances
performance and scalability
Use GC_FILES_TO_LOCKS for old cache fusion behavior
Cache Fusion provides a single buffer Cache through the
Interconnect .
Data Dictionary and Library Cache are synced
Cache Fusion model uses –
Global Cache Service (GCS)
Global Enqueue Service (GES)
Global Resource Directory (GRD)
Cache Fusion :Resource Modes
and Roles
lock mode : Null (N), Shared (S) , Exclusive (E) Lock Role : Local (L)
Global (G) Past Image (PI) :(1) PI exists , (0) No PI exists
The different combinations could be one of the following :
NL0 Null Local and No past Images
SL0 Shared Local with no past image
XL0 Exclusive Local with no past image
NG0 Null Global - Instance owns current block image
SG0 Global Shared Lock - Instance owns current image
XG0 Global Exclusive Lock - Instance own current image
NG1 Global Null – Another Instance Owns the Past image Block.
SG1 Shared Global – Another Instance owns past image
XG1 Global Exclusive Lock – Another Instance owns Past Image.
Cache Fusion Model
Global Resource Directory :
-in SGA ,maintained by GES and GCS
-used at instance recovery and cluster reconfigure.
-latest locations of a block(node)
-modes and roles of the block
Enqueues : serializes access to any resource
Past Image (PI) :Whenever a dirty block is sent to any
remote cache using CF, a copy is kept in the local cache
(in case of failures)
Dynamic Remastering of resources ,enhances scalability
_lm_dynamic_remastering (false)
Buffers States
Buffers can have many States (gv$bh):
- e.g. free, xcur, scur, cr, read, mrec, irec, wri, pi
• Only one current copy per dba but many in CR state
- limited by _db_block_max_cr_dba (6) and 10 cs
• Light_work_rule : This rule is applied when the cost of
creating a CR block is high .
• Fairness_down_converts : X Lock is converted to
null depending upon the _fairness_threshold (deflt 4)
• Chk values of light_works & fairness_down_converts
from gv$bsp OR gv$cr_block_server
Cache Fusion :Read From Disk
Cache Fusion :Read to Read
Cache Fusion :Read to Write
Cache Fusion :Write to Write
Checkpoint /Writing to Disk
At log switch/checkpoint the following action takes
place :
9 GCS will notify all nodes with PI of the block
9 DBWR of the most current PI will perform the write to
disk
9 BWR (block written record ) may not be the checkpoint
node
9 Resources modified i.e. Global to local
9 GRD is updated
Instance Recovery in RAC
Failure is detected by Cluster manager or CGS
SMON will complete the Instance Recovery
GRD is frozen
All resources on the lost are re-distributed
GES resources are reconfigured first
GCS recovery is done in two pass
- recovery set is prepared (list of blocks) and
locks requested on these blocks
-blocks recovered and made available
Alert log :Instance Recovery
ALERT LOG
List of nodes:
1
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Update rdomain variables
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Mon Sep 5 10:13:00 2005
LMS 1: 0 GCS shadows cancelled, 0 closed
Mon Sep 5 10:13:00 2005
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
v$resource_limit
ges_procs 30 37 251 251
ges_ress 7810 7821 6419 UNLIMITED
ges_locks 8298 8305 9518 UNLIMITED
ges_cache_ress 557 750 0 UNLIMITED
ges_reg_msgs 69 247 930 UNLIMITED
ges_big_msgs 20 62 930 UNLIMITED
ges_rsv_msgs 0 0 500 500
gcs_resources 4041 4041 9992 9992
gcs_shadows 2745 2745 9992 9992
Backup and Recovery
RAC and RMAN integrate well .
Conceptually ,same as single instance
Archived logs should be on CFS and
readable by all nodes .
RMAN will autolocate –data,control &
archieved log files
Channel Failover introduced in 10gR1
RMAN Configuration
Configuration is persistent across all nodes
Location (ie absolute path) must be available to all nodes
} Configure controlfile autobackup Format ‘ ‘
RMAN will perform backup for controlfile on the first allocated
channel when
} Configure controlfile autobackup ON
Channels can be allocated on all nodes of RAC
After this configuration
} Backup database
Archived Logs –
One common location *.log_archive_dest=‘+usr02/TEST/ARCH/’
Shared storage (CFS,ASM)
Read by all nodes during backup and recovery
Log_archive_format should have %t%s%r
Archived Log Location
1. Common Shared location for Archived Logs
2. This Shared Location can be on CFS/ASM
3. *. log_archive_dest_1=/arc_dest
4. *.log_archive_format=arch%s_%t_%r.log
RMAN
RMAN connects to only one of the nodes from the
command line .
} rman target sys/oracle@node2 catalog rman/cat@catdb
RMAN allocates channels on multiple nodes using
allocate channel or configure channel
RMAN detects Node Affinity and uses this info while
deciding which datafile should be backed by which node.
Any allocated channel can delete archived logs from any
thread (if using shared location )
Recovery Catalog should not be used as per the HA
recommendation ie
use controlfile
Catalog the backup after backups –resyn catalog
Rman:Backups
racnode2:oracle> rman target /
connected to target database: WMST (DBID=957552121)
RMAN> @bck.sh
select inst_id,module from gv$session where module like 'back%'
INST_ID MODULE
2 backup incr datafile: ORA_DISK_2
1 backup incr datafile: ORA_DISK_1
2 rows selected
The script :bck.sh
configure channel 1 device type disk format ‘…..’
connect 'sys/XXXX@wmst1' ;
configure channel 2 device type disk format ‘…..’
connect 'sys/XXXX@wmst2' ;
backup incremental level 0 database ;
Release channel 1,2 ;
Channel Failover in RAC
Any Backup is done using multiple Channels
These channels could be from multiple nodes
Nodes may fail during the duration of this Backup
RMAN is now capable of completing this work on
another allocated channel .
The error is also recorded in the v$rman_output
Currently not working as documented –Bug 4422034
Cloning a RAC Database
Duplicate to a Single Instance then Convert into a
Cluster DB
Create and Copy the Pfile to the desired host.
Make the required changes --
} *.cluster_database=FALSE
} db_file_name_convert=('+TBLA','+USR01')
} log_file_name_convert=('+TBLA','+USR01')
} control_files='+USR01/picky/controlfile/control01.ctl‘,….
} mkdir –p …udump ,.. bdump,….,cdump
} Add the auxiliary instance to tnsnames.ora
} Depending upon the config add in the listener.ora file
} Create a password file
Startup nomount using this new pfile –auxiliary instance is ready
Cloning a Database
From one of the RAC nodes :
} rman target / auxiliary sys/xxxx@yyy
} Duplicate target database to …..
} Starting 10gR2, from alert log “Re-creating tempfile +USR01 as
+USR01/picky/tempfile/temp.270.566600837”
alter database add logfile thread 2
group 3 ('+USR01/PICKY/ONLINELOG/redo3.log') size 25m,
group 4 ('+USR01/PICKY/ONLINELOG/redo4.log') size 25m;
Alter database enable public thread 2
Start the other instance from the other node
Register this database in the OCR
} racnode4:oracle> srvctl add database -d PICKY -o $ORACLE_HOME
} racnode4:oracle> srvctl config
PICKY
} racnode4:oracle> srvctl add instance -d PICKY -i PICKY1 -n racnode3
} racnode4:oracle> srvctl add instance -d PICKY -i PICKY2 -n racnode4
Recovering a Datafile
rm /usr05/oradata/WMST/WMST/test01.dbf
srvctl start database -d WMST -o mount
SQL> alter tablespace test offline;
SQL> alter database datafile '/usr05/oradata/WMST/WMST/test01.dbf'
offline;
SQL> alter database open ;
restore datafile '/usr05/oradata/WMST/WMST/test01.dbf' preview ;
List of Backup Sets
BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
1178 Incr 0 5M SBT_TAPE 00:01:48 04-SEP-05
BP Key: 1532 Status: AVAILABLE Compressed: NO Tag: TAG20050904T160811
Handle: LEVEL0WMST481120050904 Media: dd_01.RO
List of Datafiles in backup set 1178
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- --------- ----
8 0 Incr 41914078 04-SEP-05 /usr05/oradata/WMST/WMST/test01.dbf
restore datafile '/usr05/oradata/WMST/WMST/test01.dbf' ;
recover datafile '/usr05/oradata/WMST/WMST/test01.dbf' ;
sql 'alter tablespace test online' ;
Performance Tuning
All techniques used in single instance databases are also
applicable here
Global Cache is made up multiple local Cache.
Cache Fusion introduces additional waits
Waits -gc_cr_request,gc_cache_busy, enqueues
select * from v$sqlarea where cluster_wait_time > 0
Level 7–identify segments for interconnect traffic
Statspack report has cluster specific statistics
9 GCS workload Characteristics
9 GES summary statistics
9 GCS and GES messaging statistics
9 GES detail statistics
New Wait Events
global enqueue gets sync
global enqueue gets async
global enqueue get time
global enqueue releases
gcs messages sent
ges messages sent
global enqueue CPU used by this session
gc cr blocks served
gc cr block build time
gc cr block flush time
gc cr block send time
gc current blocks served
gc current block pin time
gc current block flush time
gc current block send time
gc cr blocks received
gc cr block receive time
gc current blocks received
gc current block receive time
gc blocks lost
gc claim blocks lost
gc blocks corrupt
gc CPU used by this session
global undo segment hints helped
global undo segment hints were stale
STATSPACK
Statspack can still be used
} Change to level 7 to collect stats related to block
contention and segment block waits
} execute statspack.snap(i_snap_level=>7,i_modify_parameter=>'true');
EM used statspack data (until 9i)
Starting 10g,EM uses data collected by AWR
} AWR collects and stores data internally .
Statspack needs to be executed on all nodes
Use sprepins for any specific instance
However ,AWR is the recommended method
STASPACK report
Global Cache Load Profile
~~~~~~~~~~~~~~~~~~~~~~~~~ Per Second Per Transaction
Global Cache blocks received: 0.08 0.41
Global Cache blocks served: 0.19 0.91
GCS/GES messages received: 1.86 9.17
GCS/GES messages sent: 3.45 17.02
DBWR Fusion writes: 0.04 0.17
Global Cache Efficiency Percentages (Target local+remote 100%)
Buffer access - local cache %: 99.91
Buffer access - remote cache %: 0.07
Buffer access - disk %: 0.02
Global Cache and Enqueue Services - Workload Characteristics
Avg global enqueue get time (ms): 0.1
Avg global cache cr block receive time (ms): 0.0
Avg global cache current block receive time (ms): 1.3
Avg global cache cr block build time (ms): 0.3
Global cache log flushes for cr blocks served %: 5.7
Global Cache and Enqueue Services - Messaging Statistics
% of direct sent messages: 30.58
% of indirect sent messages: 55.65
% of flow controlled messages: 13.77
Performance Tuning
Calculate and Monitor :
9 Current Block average Latency
9 Block Mode conversions
9 GCS Request latency
9 Avg Global Enqueue Time
9 Avg lock convert time
Populate ges_convert_local and remote by enabling
event 29700
Set fast_start_mttr_target to reduce PI and CR
If using 10g – Use ADDM Analysis
ADDM is proactive ,captures RAC related service Issues
AWR reports –use awrrpti.sql for RAC
Use GV$sysstat,gv$system_event,gv$...transfer
gv$bsp,gv$cr_block_server
Reverse Key Indexes
OPS introduced RKI to avoid contention
To preserve the B-Tree structure keys have to be
inserted into specific blocks.
Many rows are inserted in the same block
Many nodes may try to insert into the same block
All these will lead to contention which will reduce
scalability
Contention may lead to gcs and ges waits
This contention is more prominent if sequences are used
Reverse Key Indexes
RKI will –
} Distribute the load ie place the leaf blocks far away
} Reduces contention when nodes are competing
} NOT allow range scan
} NOT do efficient space management
Use dbms_utility.current_instance to avoid this
contention
} It will place leaf blocks away to avoid this contention
create table ioug (id number ,instance_id number );
alter table ioug add constraint ioug_pk primary key (instance_id,id) ;
insert into ioug values
(dbms_utility.current_instance,ioug_sequence.nextval) ;
Sequences in RAC
Many applications have to use them for unique number
generation .
Use alongwith dbms_utility.current_instance if it is a PK
Be prepared to have some gaps .
Options CACHE and ORDER cannot be used for RAC
If ORDER is used then Data Dictionary will be accessed
every time –overhead
GCS maintains the sequences across the instances.
System Change Numbers
Generated Globally,Unique across all instances
Written to redo logs for synchronization
Two schemes –Lamport / Broadcast_on_commit
Depends on max_commit_propogation_delay(mcpd)
Lamport (9i) / Latch_free SCN scheme 2 (10g)
- default ,mcpd = 700 centiseconds
- no overhead ,all messages contain scn
- not suitable for rapid and simultaneous update/query
Broadcast_on_commit(mcpd=0) :some overhead as on commit
LGWR will send a msg to other nodes
Parallelism in RAC
When dealing with large volume of data
When we have extra CPU and memory
Hint is different -- /*+ parallel (ps_item,2,2) */
QC will always be on the initiating node
Slaves can be on others nodes
Configured using
} instance_group –instances have to be member(one or more)
} parallel_instance_group –dynamically ,all instances(default)
Freelist Group vs ASSM
Primarily ,to avoid buffer_busy_waits
Freelist Group is especially useful for RAC
- to reduce contention on segment header
- space usage may not be very efficient
Starting 9i ,ASSM may be a good solution
- internal and automatic maintenance
- good when nodes are added
- not tunable, space ,slower for FTS
- Contradictory results
ASSM need not be manipulated if nodes are added or removed
Remember this while migrating single instances to RAC
} select tablespace_name from dba_tablespaces where segment_space_management =
'MANUAL‘
DBMS_SCHEDULER
Will replace dbms_jobs.
Also support shell scripts which is very useful for RAC
Scripts can now be located at a shared location
Any available node of RAC can execute these script
Instance_stickiness is very helpful for RAC .
instance_stickiness- BOOLEAN
If this option is set to TRUE, then for the first run of the job the
scheduler will choose the instance with the lightest load to run this job
on. Subsequent runs will use the same instance that the first run used
(unless this instance is down).
If this is FALSE then the scheduler will choose the first available
instance to schedule the job on on all runs.
SQLPLUS /OIFCFG
Alter system checkpoint local
Alter system checkpoint global
Alter system switch logfile –LOCAL
Alter system archive log current –GLOBAL
USE OIFCFG
} To define network interfaces
} To retrieve configuration information
} To change network configuration
oifcfg getif
} eth0 172.16.8.0 global public
} eth1 192.168.215.0 global cluster_interconnect
oifcfg iflist
} eth0 172.16.8.0
} eth1 192.168.215.0
The Universal installer uses oifcfg to identify and display
Other Features
Logminer – no continuous_mine
Glogin.sql –
set sqlprompt ‘_connect_identifier’
Dbms_jobs –instance is a parameter
Crontabs /dbms_scheduler
gv$Views—catclust.sql ,inst_id
External files ,utl_file
Question & Answers
Questions?