Oracle Cache Fusion - in Operation


Agenda
Cache Fusion
What is it?
Cache Coherency Vs. Cache Fusion
Key Components and terminology
Cache Fusion in operation
Lock Mastering & Resource Affinity
Types of Contention
Cache Fusion I
Cache Fusion II
Examples
Instance Crash Recovery in RAC
Key Components in an Instance Crash
I Pass recovery
II Pass recovery

Cache Fusion: What is it?

Oracle introduced a framework for sharing data blocks over the private interconnect between the nodes, which in previous versions was used only for messaging. This protocol is Cache Fusion. Data blocks are shipped across the interconnect much like messages, replacing disk I/O, the most expensive component of data transfer, with memory-to-memory data sharing.
According to the manual:
Global Cache Service (GCS): Process that implements Cache Fusion. It maintains the block mode for blocks in the global role and is responsible for block transfers between instances. The Global Cache Service employs various background processes such as the Global Cache Service Processes (LMSn) and the Global Enqueue Service Daemon (LMD).
Cache Fusion: A diskless cache coherency mechanism in Oracle Real Application Clusters that provides copies of blocks directly from a holding instance's memory cache to a requesting instance's memory cache.
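To make the "diskless" part concrete, here is a minimal Python sketch, assuming a toy cluster in which each instance's buffer cache is a dict. `Instance` and `get_block` are hypothetical names; the real GCS locates the holder through the resource master in the Global Resource Directory rather than by scanning every instance.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical toy instance: its buffer cache maps block id -> contents."""
    name: str
    buffer_cache: dict = field(default_factory=dict)


def get_block(block_id, requester, instances, disk):
    """Return (contents, source), preferring another instance's memory over disk."""
    if block_id in requester.buffer_cache:
        return requester.buffer_cache[block_id], "local cache"
    for holder in instances:
        if holder is not requester and block_id in holder.buffer_cache:
            # Cache Fusion: memory-to-memory copy over the interconnect, no disk I/O.
            requester.buffer_cache[block_id] = holder.buffer_cache[block_id]
            return requester.buffer_cache[block_id], f"interconnect from {holder.name}"
    # No instance caches the block, so it must be read from the shared disk.
    requester.buffer_cache[block_id] = disk[block_id]
    return requester.buffer_cache[block_id], "disk"


disk = {42: "ENG 199"}
a, b = Instance("A"), Instance("B")
print(get_block(42, a, [a, b], disk))  # ('ENG 199', 'disk')
print(get_block(42, b, [a, b], disk))  # ('ENG 199', 'interconnect from A')
```

Only the first request touches disk; once any instance caches the block, later requesters get it from memory.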

Cache Coherency
According to the manual:
The synchronization of data in multiple caches so that reading a memory location through any cache will return the most recent data written to that location through any other cache. Sometimes called cache consistency.
Can we say it is the mechanism that maintains the status of a resource (block)? If so, the following two services together provide it:
GCS (Global Cache Service)
GES (Global Enqueue Service)
Together they maintain the Global Resource Directory (GRD).

Cache Fusion and Cache Coherency Together


The GCS manages all types of data blocks. Cache coherency is maintained through the GCS by requiring that instances acquire a resource (a lock or enqueue on a block) cluster-wide before modifying or reading a database block. The GCS synchronizes global cache access, allowing only one instance to modify a block at any single point in time. Through the RAC-wide Global Resource Directory, the GCS ensures that the status of data blocks cached in any mode anywhere in the cluster is globally visible and maintained.
Oracle RAC has a multi-versioning architecture that distinguishes between current data blocks and one or more consistent read (CR) versions of a block. A current block contains changes for all committed and yet-to-be-committed transactions. A consistent read (CR) version of a block represents a consistent snapshot of the data at a previous point in time. A data block can reside in many buffer caches under the auspices of shared resources.
In Oracle9i RAC, applying rollback segment information to current blocks produces consistent read versions of a block. Both the current and consistent read blocks are managed by the GCS.
To transfer data blocks among database caches, buffers are shipped over the high-speed IPC interconnect. Disk writes are required only for cache replacement. If a block is dirty (modified), a past image (PI) of it is kept in memory before the block is sent. In the event of a failure, Oracle reconstructs the current version of the block by reading the PI blocks.
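The rule that only one instance may modify a block at a time amounts to a lock-compatibility check. A minimal sketch, assuming the three GCS modes N, S and X; `COMPATIBLE` and `can_grant` are hypothetical names, and the real GCS also handles conversion queues, roles and PIs, none of which this models.

```python
# Toy model; names are hypothetical, not Oracle internals.
# Which (requested, held-elsewhere) mode pairs can coexist across instances.
COMPATIBLE = {
    ("N", "N"): True, ("N", "S"): True, ("N", "X"): True,
    ("S", "N"): True, ("S", "S"): True, ("S", "X"): False,
    ("X", "N"): True, ("X", "S"): False, ("X", "X"): False,
}


def can_grant(requested, held_by_others):
    """True if `requested` is compatible with every mode other instances hold."""
    return all(COMPATIBLE[(requested, held)] for held in held_by_others)


print(can_grant("S", ["S", "N"]))  # True: many readers may share the block
print(can_grant("X", ["S"]))       # False: the reader must downgrade first
```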

Background Processes and Their Roles

LMSn Global Cache Service Process (GCS)
Primarily responsible for shipping blocks between buffer caches
Creates and provides a CR image whenever there is a cross-instance request for a dirty block
LMS must also check constantly with the LMD background process (the GES process) to get the lock requests placed by LMD.
Parameter: GCS_SERVER_PROCESSES (up to 36 as of 10.2; min. cpu_count/2)

LMON Lock Monitor Process (GES)
The LMON process manages global locks and resources.
LMON handles the reconfiguration of locks and resources when an instance joins or leaves the cluster, and generates trace files during reconfiguration.
LMON also provides cluster group services.

LMD Lock Manager Daemon
The LMD process (GES) performs global lock deadlock detection, both local and remote.
It also monitors for lock conversion timeouts.
It maintains the lock queues and traverses the GES structures.
LCK Lock Process
Manages instance resource requests and cross-instance calls for shared resources.
During instance recovery, it builds a list of invalid lock elements and validates lock elements.

DIAG Diagnostic Daemon
A new background process in Oracle 10g, part of the enhanced diagnosability framework.
Regularly monitors the health of the instance.
Also checks for instance hangs and deadlocks.

History of Cache Fusion

Oracle Release | Feature                                    | Description
Prior to 8.1.5 | OPS (Oracle Parallel Server)               | OPS used disk-based pings
8.1.5          | Cache Fusion I, or Consistent Read Server  | The consistent read version of the block is transferred over the interconnect
9i             | Cache Fusion II (write/write Cache Fusion) | The current version of the block is transferred over the interconnect
10g R1         | Oracle Cluster Ready Services (CRS)        | CRS eliminates the need for third-party clusterware, though third-party clusterware can still be used
10g R2         | Oracle CRS for High Availability           | CRS provides high availability for non-Oracle applications

Key Components in Cache Fusion

Ping
The transfer of a data block from one instance's buffer cache to another instance's buffer cache is known as a ping. Whenever an instance needs a block, it sends a request to the lock master to obtain a lock in the desired mode. If another instance holds a lock on the same block, the master asks the current holder to downgrade or release its lock; this request is known as a blocking asynchronous trap (BAST). When an instance receives a BAST, it downgrades the lock as soon as possible. However, before downgrading the lock, it might have to write the corresponding block to disk. This operation sequence is known as a disk ping or hard ping.
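The ping sequence can be traced with a small sketch, assuming a single current holder and the three modes N, S and X. `Holder` and `request_lock` are hypothetical names and the log strings are illustrative; the sketch only records the order of events the paragraph describes.

```python
from dataclasses import dataclass


@dataclass
class Holder:
    """Hypothetical toy record of the instance currently holding the block's lock."""
    instance: str
    mode: str    # "N", "S" or "X"
    dirty: bool  # True if the holder has modified the block


def request_lock(requester, wanted, holder, log):
    """Toy ping sequence: the master sends a BAST; a dirty holder writes to disk first."""
    compatible = holder is None or holder.mode == "N" or (wanted == "S" and holder.mode == "S")
    if not compatible:
        log.append(f"master sends BAST to {holder.instance}")
        if holder.dirty:
            # Disk (hard) ping: the block goes to disk before the lock is downgraded.
            log.append(f"{holder.instance} writes block to disk")
            holder.dirty = False
        holder.mode = "N" if wanted == "X" else "S"
        log.append(f"{holder.instance} downgrades to {holder.mode}")
    log.append(f"master grants {wanted} to {requester}")


log = []
request_lock("B", "X", Holder("A", "X", dirty=True), log)
print(log)
```

The disk write sitting between the BAST and the downgrade is exactly the cost that Cache Fusion later removes by shipping the block over the interconnect.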
CR Fabrication
Whenever another instance makes a consistent read request, the holding instance (LMS) has to create a consistent read image by applying undo information to the current block. CR fabrication is expensive, including in I/O, because it may require reading undo into the buffer cache and then applying the undo records.
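A simplified sketch of CR fabrication, assuming each undo record carries the SCN of the change it reverses, the before-image value, and a pointer to the next-older record. `UndoRecord` and `build_cr_value` are hypothetical names, and the values are illustrative. Real CR fabrication works on whole blocks, ITL entries and transaction status, not on a single integer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UndoRecord:
    """Hypothetical toy undo record."""
    scn: int                      # SCN of the change this record reverses
    before_image: int             # column value before that change
    prev: Optional["UndoRecord"]  # next-older undo record in the chain


def build_cr_value(current_value, newest_undo, snapshot_scn):
    """Roll a current value back until no change newer than snapshot_scn remains."""
    value, rec = current_value, newest_undo
    while rec is not None and rec.scn > snapshot_scn:
        value = rec.before_image  # apply one undo record
        rec = rec.prev
    return value


# Illustrative chain: 340 -> 344 (SCN 101) -> 350 (SCN 102) -> 352 (SCN 103).
u1 = UndoRecord(101, 340, None)
u2 = UndoRecord(102, 344, u1)
u3 = UndoRecord(103, 350, u2)
print(build_cr_value(352, u3, 100))  # 340
print(build_cr_value(352, u3, 102))  # 350
```

Each step of the loop may need another undo block, which is where the I/O cost of CR fabrication comes from.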
Past Image (PI) Blocks
PI blocks are copies of blocks in the local buffer cache. Whenever an instance has to send a block it has recently modified to another instance, it preserves a copy of that block, marking it as a PI. An instance is obliged to keep PIs until that block is written to disk by the current owner of the block. PIs are discarded after the latest version of the block is written to disk. When a block is written to disk and is known to have a global role, indicating the presence of PIs in other instances' buffer caches, the Global Cache Service (GCS) tells the instances holding the PIs to discard them. With Cache Fusion, a block is written to disk to satisfy checkpoint requests and similar needs, not to transfer the block from one instance to another via disk.
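The PI rules above can be followed for a single block with a small tracker. `PastImageTracker`, `ship_dirty` and `write_current` are hypothetical names. The sketch covers only the two rules stated in the text (keep a PI when shipping a dirty block; discard all PIs once the current owner writes). It ignores the case where a block returns to an instance that already holds a PI.

```python
class PastImageTracker:
    """Hypothetical toy tracker for one block: who owns the current copy, who holds PIs."""

    def __init__(self, owner):
        self.owner = owner       # instance holding the current version
        self.pi_holders = set()  # instances obliged to keep a PI

    def ship_dirty(self, to_instance):
        # The sender keeps a PI: its modified version is not on disk yet.
        self.pi_holders.add(self.owner)
        self.owner = to_instance

    def write_current(self):
        # The current owner writes the block; GCS tells the PI holders to discard.
        discarded, self.pi_holders = self.pi_holders, set()
        return discarded


blk = PastImageTracker("A")
blk.ship_dirty("B")
blk.ship_dirty("C")
print(sorted(blk.pi_holders))       # ['A', 'B']
print(sorted(blk.write_current()))  # ['A', 'B']
```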
Lock Mastering
The memory structure in which GCS keeps information about the usage of a data block (and other sharable resources) is known as the lock resource. The responsibility for tracking locks is distributed among all the instances, and the required memory comes from the participating instances' System Global Areas (SGAs). Because of this distributed ownership, a master node exists for each lock resource. The master node maintains complete information about the current users and requestors of the lock resource, as well as information about the PIs of the block.
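The text says only that every lock resource has exactly one master drawn from the participating instances. The sketch below assumes a simple hash of a hypothetical "file.block" resource name over the instance list. Oracle's actual mapping is internal and not described here. In this sketch, each instance can work out the master on its own, and every resource lands on exactly one instance.

```python
import zlib


def master_of(file_no, block_no, instances):
    """Hypothetical: map a block's lock resource to one master instance by hashing."""
    resource_name = f"{file_no}.{block_no}".encode()
    return instances[zlib.crc32(resource_name) % len(instances)]


instances = ["inst1", "inst2", "inst3"]
masters = {blk: master_of(5, blk, instances) for blk in range(40, 45)}
print(masters)  # each block maps to exactly one master; the spread depends on the hash
```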

Resource Affinity and Dynamic Remastering

At any given point in time, each block is mastered by exactly one of the instances.
The resource master can change based on how frequently other instances request the block.
If an instance requests a particular resource 50 times within a 10-minute period, the requesting instance becomes the master. This is called resource affinity.
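The affinity rule can be sketched as a sliding-window counter, using the slide's figures (50 requests within 10 minutes) as constants. `AffinityTracker` and `record_request` are hypothetical names. As the notes below show, the real thresholds and granularity (file or object) vary by release.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 10 * 60  # "a period of 10 Mins", from the slide
THRESHOLD = 50            # "50 times for a particular resource", from the slide


class AffinityTracker:
    """Hypothetical per-resource tracker that remasters to a frequent requester."""

    def __init__(self, master):
        self.master = master
        self.requests = defaultdict(deque)  # instance -> timestamps of its requests

    def record_request(self, instance, now):
        window = self.requests[instance]
        window.append(now)
        # Forget requests that fell out of the 10-minute window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        if instance != self.master and len(window) >= THRESHOLD:
            self.master = instance  # dynamic remastering
            self.requests.clear()   # start counting afresh for the new master
        return self.master


tracker = AffinityTracker(master="inst1")
for second in range(50):
    tracker.record_request("inst2", now=second)
print(tracker.master)  # inst2
```

Requests spread more thinly than 50 per 10 minutes never trigger a move, so the master stays put.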

Block Mastering
In Oracle 9.2
documentation describes dynamic remastering
not implemented in code
In Oracle 10.1
works at data file level
very high threshold, so difficult to test
does occur on some customer sites
may cause the LMON process to crash in 10.1.0.4
bug 3659289 - patch available
fixed in 10.1.0.5/10.2.0.1
In Oracle 10.2
works at object level
thresholds are relatively low
object remastering is recorded in V$GCSPFMASTER_INFO

Cache Fusion: Possible Types of Contention

Contention for a resource occurs when two or more instances want the same resource. If a resource such as a data block is being used by one instance and is needed by another instance at the same time, contention occurs. There are three types of contention for data blocks:
Read/Read contention: Read/read contention is never a problem because of the shared disk system. A block read by one instance can be read by other instances without the intervention of GCS.
Write/Read contention: Write/read contention was addressed in Oracle 8i by the consistent read server. The holding instance constructs the CR block and ships it to the requesting instance over the interconnect.
Write/Write contention: Write/write contention is addressed by the Cache Fusion technology. Since Oracle 9i, the cluster interconnect is used in some cases to ship data blocks among instances that need to modify the same data block simultaneously.
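The three contention types and the release that addressed each can be summarized in a small lookup. The version boundaries come from the history table (8.1.5 for the consistent read server, 9i for write/write transfer). `resolve_contention` is a hypothetical name, and "disk ping" stands for the pre-Cache Fusion path.

```python
# Toy classifier; the function name is hypothetical.
def resolve_contention(op_a, op_b, release):
    """Classify contention between two instances and name how it is handled.

    op_a, op_b: "read" or "write"; release: version tuple such as (8, 1, 5) or (9, 2).
    """
    kind = "/".join(sorted((op_a, op_b), reverse=True)).title()
    if kind == "Read/Read":
        return kind, "no GCS intervention; each instance reads the shared disk"
    if kind == "Write/Read":
        if release >= (8, 1, 5):
            return kind, "holder builds a CR block and ships it over the interconnect"
        return kind, "disk ping"
    # Write/Write
    if release >= (9,):
        return kind, "current block shipped over the interconnect"
    return kind, "disk ping"


print(resolve_contention("read", "write", (8, 1, 7)))
print(resolve_contention("write", "write", (8, 1, 7)))
```

On 8.1.7 the write/read case is already served by the consistent read server, while write/write still falls back to a disk ping until 9i.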

Prior to Cache Fusion (before 8.1.5)

Write/Read Contention before Cache Fusion

Cache Fusion I, aka Consistent Read Server

Write/Read Contention: CR Block Transfer in Cache Fusion

Oracle introduced a background process called BSP (Block Server Process), which performs CR fabrication in the holder's cache and ships the CR version of the block across the interconnect.

Write/write contention still needed to be addressed.

Write/Write Contention before Cache Fusion II (before 9i)

So now Cache Fusion II, or Write/Write Cache Fusion

Cache Fusion Current Block Transfer (from 9i R2)

Buffer States in Cache Fusion

Mode/Role    | Local | Global
Null: N      | NL    | NG
Shared: S    | SL    | SG
Exclusive: X | XL    | XG

SL: When an instance has a resource in SL form, it can serve a copy of the block to other instances, and it can read the block from disk. Since the block is not modified, there is no need to write it to disk.
XL: When an instance has a resource in XL form, it has sole ownership of and interest in that resource. It also has the exclusive right to modify the block. All changes to the block are in its local buffer cache, and it can write the block to disk. If another instance wants the block, it will contact the instance via GCS.
NL: The NL form is used to protect consistent read blocks. If a block is held in SL mode and another instance wants it in X mode, the current instance sends the block to the requesting instance and downgrades to NL.
SG: In SG form, a block is present in one or more instances. An instance can read the block from disk and serve it to other instances.
XG: In XG form, a block can have one or more PIs, indicating multiple copies of the block in several instances' buffer caches. The instance with the XG role has the latest copy of the block and is the most likely candidate to write the block to disk. GCS can ask the instance with the XG role to write the block to disk or to serve it to another instance.
NG: After discarding its PIs when instructed by GCS, the instance keeps the block in its buffer cache with the NG role. This serves only as the CR copy of the block.
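The six two-letter states combine a mode (N, S, X) with a role (L, G). The sketch below decodes them and encodes two rules stated in the descriptions: an SL holder that serves an X request drops to NL, and the instance holding the block in X mode has the latest copy and is the one GCS asks to write. All function names are hypothetical, and other transitions are deliberately left out.

```python
# Toy helpers for the buffer states described above; names are hypothetical.
MODES = {"N": "Null", "S": "Shared", "X": "Exclusive"}
ROLES = {"L": "Local", "G": "Global"}


def decode(state):
    """Split a two-letter buffer state such as 'XG' into (mode, role)."""
    return MODES[state[0]], ROLES[state[1]]


def serve_for_x_request(holder_state):
    """SL holder sends the block to an X requester and keeps only a CR copy (NL)."""
    if holder_state != "SL":
        raise ValueError(f"{holder_state}: only the SL case is modeled here")
    return "NL"


def write_candidate(states):
    """Return the instance holding the block in X mode, i.e. the latest copy."""
    for instance, state in states.items():
        if state[0] == "X":
            return instance
    return None


print(decode("XG"))                                        # ('Exclusive', 'Global')
print(write_candidate({"A": "NG", "B": "XG", "C": "NG"}))  # B
```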

Example 1: Reading a Block from Disk

Example 2: Reading a Block from the Cache

Example 3: Getting a (Cached) Clean Block for Update

Example 4: Getting a (Cached) Modified Block for Update and Commit

Example 5: Commit the Previously Modified Block and Select the Data

Example 6: Write the Dirty Buffers to Disk Due to Checkpoint

Example 7: Master Instance Crash

Example 7: What the Alert Log Says About Reconfiguration

List of nodes:
012
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
* domain 0 valid = 0 according to instance 0
Wed Jun 21 23:22:22 2006
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Wed Jun 21 23:22:22 2006
LMS 0: 0 GCS shadows cancelled, 0 closed
Wed Jun 21 23:22:22 2006
LMS 2: 0 GCS shadows cancelled, 0 closed
Wed Jun 21 23:22:22 2006
LMS 3: 0 GCS shadows cancelled, 0 closed
Wed Jun 21 23:22:22 2006
LMS 1: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Wed Jun 21 23:22:22 2006
LMS 0: 2189 GCS shadows traversed, 332 replayed
Wed Jun 21 23:22:22 2006
LMS 2: 2027 GCS shadows traversed, 364 replayed
Wed Jun 21 23:22:22 2006
LMS 3: 2098 GCS shadows traversed, 364 replayed
Wed Jun 21 23:22:22 2006
LMS 1: 2189 GCS shadows traversed, 343 replayed
Wed Jun 21 23:22:22 2006
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete

Crash Recovery Key Components


Redo Threads and Streams

Redo Records and Change Vectors


Checkpoints
Thread Checkpoint or Local Checkpoint
Database Checkpoint or Global Checkpoint
Incremental Checkpoint
Bounded Recovery
Block Written Record (BWR)
Past Image (PI)
Checkpoints and PI
I Pass Recovery
II Pass Recovery
Merge Threads

Cache Fusion - Instance Crash Recovery

The steps for GRD reconfiguration are as follows:
Instance death is detected by the cluster manager.
Requests for PCM locks are frozen.
Enqueues are reconfigured and made available.
DLM (Distributed Lock Manager) recovery is performed.
GCS (PCM lock) is remastered.
Pending writes and notifications are processed.
The steps for I Pass recovery are as follows:
The instance recovery (IR) lock is acquired by SMON.
The recovery set is prepared and built.
Memory space is allocated in the SMON Program Global Area (PGA).
SMON acquires locks on buffers that need recovery.
II Pass recovery steps are as follows:
II Pass is initiated. The database is partially available.
Blocks are made available as they are recovered.
The IR lock is released by SMON. Recovery is complete.
The system is available.
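The split into two passes can be sketched as "find what needs recovery, then fix it block by block". This is a heavily simplified toy. It assumes a Block Written Record (BWR) tells recovery the SCN up to which a block is already on disk, and it ignores thread merging, PIs, checkpoints and the IR lock. `first_pass` and `second_pass` are hypothetical names, and the values are illustrative.

```python
def first_pass(redo, bwr):
    """Pass I (toy): scan the failed thread's redo and build the recovery set.

    redo: list of (scn, block_id, new_value) change records.
    bwr:  {block_id: scn} from block written records; changes at or below that SCN
          are assumed to be on disk already and are skipped.
    """
    recovery_set = {}
    for scn, block, value in redo:
        if scn > bwr.get(block, -1):
            recovery_set.setdefault(block, []).append((scn, value))
    return recovery_set


def second_pass(recovery_set, disk):
    """Pass II (toy): apply changes block by block in SCN order; each block is
    available as soon as it is recovered."""
    available = []
    for block, changes in recovery_set.items():
        for _scn, value in sorted(changes):
            disk[block] = value
        available.append(block)
    return available


redo = [(10, 42, 200), (11, 42, 204), (12, 42, 205), (9, 7, 1)]
disk = {42: 199, 7: 1}
rs = first_pass(redo, bwr={7: 9})
print(rs)                     # {42: [(10, 200), (11, 204), (12, 205)]}
print(second_pass(rs, disk))  # [42]
print(disk)                   # {42: 205, 7: 1}
```

Block 7 never enters the recovery set because its BWR shows the change already on disk. This is why only part of the database is locked during pass II.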

Example 8: Select the Rows from Instance A

Just for a clear understanding

It's time to play

Cross Instance Consistent Read

[Figure: on Instance 1, Session 15 runs
SELECT runs, wickets
FROM score
WHERE team = 'ENG';
while on Instance 2, Session 27 has updated the ENG row of data block 42 without committing, through successive UPDATE score SET runs = runs + n WHERE team = 'ENG' statements (n = 6, 4 and 2 on the slide). LMS0 on the holding instance builds a read-consistent copy of block 42 by following the undo chain of transaction xid 0005.018.4E7 in undo block 800777 (undo header: segment 5, slot 18, wrap# 4E7). The ITL points to undo record 800777.530.14, which restores 350 and links to record .13 (344), which links to record .12 (340), rolling the ENG value back from the current 352. The AUS row (99) is unchanged.]

[Figures: four scenarios for data block 42 of table SCORE, which initially holds the rows ENG 199 and AUS 99. In each scenario, Session 2 on Instance 2 runs
UPDATE score SET runs = 200 WHERE team = 'ENG';
UPDATE score SET runs = 204 WHERE team = 'ENG';
UPDATE score SET runs = 205 WHERE team = 'ENG';
followed by COMMIT in the committed scenarios, so the undo block holds the before-images 199, 200 and 204. Meanwhile, Session 1 on Instance 1 runs
SELECT runs FROM score WHERE team = 'ENG';
with LMS0 serving the block.
Committed Block, Block on Disk
Committed Block, Block in Buffer Cache
Uncommitted Block, Block in Buffer Cache: the query returns ENG 199 and AUS 99
Uncommitted Block, Block on Disk
See the slide notes for additional information.]

Q&A

References:
K. Gopalakrishnan, Oracle Database 10g Real Application Clusters Handbook
Julian Dyke, RAC presentation
Oracle 10g Real Application Clusters Administrator's Guide
