Oracle Databases on EMC VMAX
Oracle Remote Replication and Disaster Restart Using Symmetrix Storage
Yaron Dar
Copyright 2008, 2009, 2010, 2011 EMC Corporation. All rights reserved.
EMC believes the information in this publication is accurate as of its publication date. The information is
subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO
REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS
PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable
software license.
For the most up-to-date regulatory document for your product line, go to the Technical Documentation and
Advisories section on EMC Powerlink.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.
All other trademarks used herein are the property of their respective owners.
H2603.3
Contents
Preface
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Data Protection
EMC Double Checksum overview ............................................... 340
Traditional methods of preventing data corruption............340
Data corruption between host and conventional storage ...341
Benefits of checking within Symmetrix arrays .....................341
Implementing EMC Double Checksum for Oracle .................... 342
Other checksum operations.....................................................342
Enabling checksum options.....................................................343
Verifying checksum is enabled ...............................................344
Validating for checksum operations ......................................344
Disabling checksum..................................................................345
Implementing Generic SafeWrite for generic applications ....... 346
Torn pages: Using Generic SafeWrite to protect
applications................................................................................346
Why generic? .............................................................................347
Where to enable Generic SafeWrite........................................347
Configuring Generic SafeWrite...............................................348
How to disable Generic SafeWrite .........................................350
Chapter 9
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Figures
Tables
Preface
Audience
Related documentation
Symmetrix operation
Oracle concepts and operation
Organization
IMPORTANT
An important notice contains information essential to operation of
the software or hardware.
Typographical conventions
EMC uses the following type style conventions in this document:
Normal
Bold
Italic
Courier: Used for system output (such as an error message or script), and for URLs, complete paths, filenames, prompts, and syntax when shown outside of running text.
Courier bold: Used for specific user input (such as commands).
Courier italic
<>
[]
{}
...
1
Oracle on Open Systems
Introduction ........................................................................................ 26
Oracle overview ................................................................................. 27
Storage management ......................................................................... 33
Cloning Oracle objects or environments ........................................ 34
Backup and recovery ......................................................................... 35
Oracle Real Application Clusters..................................................... 36
Optimizing Oracle layouts on EMC Symmetrix............................ 38
EMC and Oracle integration............................................................. 39
Introduction
The Oracle RDBMS on open systems first became available in 1979 and has steadily grown to become the market-share leader in enterprise database solutions. With a wide variety of features and functionality, Oracle provides a stable platform for handling concurrent, read-consistent access to a customer's application data.
Oracle Database 10g and 11g, the latest releases of the Oracle RDBMS, have introduced a variety of new and enhanced features over previous versions of the database. Among these are:
Oracle overview
The Oracle RDBMS can be configured in multiple ways. The
requirement for 24x7 operations, replication and disaster recovery,
and the capacity of the host(s) that will contain the Oracle instance(s)
will, in part, determine how the Oracle environment must be
architected.
Figure 1 Oracle instance architecture: SGA memory structures (shared pool, redo log buffers, DB block buffers, data dictionary), background processes (PMON, SMON, LGWR, ARCn, CKPT, DBWn), and the associated redo logs, archive logs, and data files
The System Global Area (SGA) contains the basic memory structures
that an Oracle database instance requires to function. The SGA
contains memory structures such as the Buffer Cache (shared area for
users to read or write Oracle data blocks), Redo Log Buffer (circular
buffer for the Oracle logs), Shared Pool (including user SQL and
PL/SQL code, data dictionary, and more), Large Pool, and others.
Table 1 Oracle background processes

DBWn (Database Writer): Writes modified (dirty) buffers from the database buffer cache to the datafiles.
LGWR (Log Writer): Manages the redo log buffer and transmits data from the buffer to the redo logs on disk. The log writer writes to the logs whenever one of these four scenarios occurs: a user commits a transaction; every three seconds; when the redo log buffer is one-third full; or when DBWn needs to write dirty blocks whose redo entries are still in the redo log buffer.
ARCn (Archiver): Copies the redo logs to one or more archive log directories when a log switch occurs. The ARCn process is only turned on if the database is in ARCHIVELOG mode and automatic archiving is enabled. Up to 10 archive processes can be started per Oracle instance, controlled by the init.ora parameter LOG_ARCHIVE_MAX_PROCESSES.
CKPT (Checkpoint): Signals checkpoints and updates the datafile headers and control files with checkpoint information.
SMON (System Monitor): Performs instance recovery at startup and cleans up unused temporary segments.
PMON (Process Monitor): Cleans up after failed user processes, releasing their locks and other resources.
Snnn (Server processes): Handle requests from connected user processes.
Figure 2 Oracle physical database structures: SYSTEM tablespace, redo logs (REDO1, REDO2), control files (CNTL 1, CNTL 2), archive logs (ARCH 14-16), and Oracle binaries
Oracle redo logs contain data and undo changes. All changes to the database are written to the redo logs, unless logging is explicitly disabled for database objects that allow it, such as user tables. Two or more redo logs are configured, and the logs are normally multiplexed to prevent data loss in the event that database recovery is required.
Archive logs are offloaded copies of the redo logs and are normally
required for recovering an Oracle database. Archive logs can be
multiplexed, both locally and remotely.
Oracle binaries are the executables and libraries used to initiate the
Oracle instance. Along with the binaries, Oracle uses many other
files to manage and monitor the database. These files include the
initialization parameter file (init<sid>.ora), server parameter file
(SPFILE), alert log, and trace files.
Logical data elements
Datafiles are the primary physical data element. Oracle tablespaces
are the logical element configured on top of the datafiles. Oracle
tablespaces are used as containers to hold the customer's information.
Each tablespace is built on one or more of the datafiles.
Tablespaces are the containers for the underlying Oracle logical data
elements. These logical elements include data blocks, extents, and
segments. Data blocks are the smallest logical elements configurable
at the database level. Data blocks are grouped into extents that are
then allocated to segments. Types of segments include data, index,
temporary and undo.
Figure 3 shows the relationship between the data blocks, extents, and
segments.
Figure 3 A 1920 KB segment composed of two 960 KB extents, which are in turn made up of data blocks
Storage management
Standard Oracle backup/restore, disaster recovery, and cloning
methods can be difficult to manage and time-consuming. EMC
Symmetrix provides many alternatives or solutions that make these
operations easy to manage, fast, and very scalable. In addition, EMC
developed many best practices that increase Oracle performance and
high availability when using Symmetrix storage arrays.
Figure 4 Oracle RAC configuration: two RAC nodes, each with its own SGA, binaries, redo, and undo files, sharing the SYSTEM, DATA, and INDEX tablespaces on shared storage
Install base
With more than 55,000 mutual customers, EMC and Oracle are
recognized as the leaders in automated networked storage and
enterprise software, respectively. The EMC Symmetrix VMAX and
DMX offer the highest levels of performance, scalability and
availability along with industry-leading software for successfully
managing and maintaining complex Oracle database environments.
In addition, EMC IT has one of the largest deployments of Oracle
Applications in the world, with over 35,000 named users and over
3,500 concurrent users at peak periods. Oracle IT, in turn, uses both CLARiiON and Symmetrix storage extensively.
Joint engineering
Engineers for EMC and Oracle continue to work together to develop
integrated solutions, document best practices, and ensure
interoperability for customers deploying Oracle databases in EMC
Symmetrix VMAX and DMX storage environments. Key EMC
technologies such as TimeFinder and SRDF have been certified
through Oracle's Storage Certification Program (OSCP). Although Oracle has since phased out the OSCP, based on the maturity of the technology, engineering efforts continue between the two companies to ensure successful integration of each company's products. With each major technology change or new product line, EMC briefs Oracle Engineering on the changes, and together the companies review best practices. EMC publishes many of these technology and deployment best practices as joint-logo papers; the presence of the Oracle logo reflects the strong communication and relationship between the companies.
2
EMC Foundation Products
Introduction ........................................................................................ 42
Symmetrix hardware and EMC Enginuity features...................... 45
EMC Solutions Enabler base management .................................... 49
EMC Change Tracker......................................................................... 52
EMC Symmetrix Remote Data Facility ........................................... 53
EMC TimeFinder ................................................................................ 68
EMC Storage Resource Management.............................................. 81
EMC Storage Viewer.......................................................................... 86
EMC PowerPath ................................................................................. 88
EMC Replication Manager................................................................ 97
EMC Open Replicator ....................................................................... 99
EMC Virtual Provisioning............................................................... 100
EMC Virtual LUN migration.......................................................... 103
EMC Fully Automated Storage Tiering (FAST)........................... 106
Introduction
EMC provides many hardware and software products that support
Oracle environments on Symmetrix systems. This chapter provides a
technical overview of the EMC products referenced in this document.
The following products, which are highlighted and discussed, were used and/or tested with Oracle environments deployed on EMC Symmetrix.
EMC offers an extensive product line of high-end storage solutions
targeted to meet the requirements of mission-critical databases and
applications. The Symmetrix product line includes the DMX Direct
Matrix Architecture series and the VMAX Virtual Matrix series.
EMC Symmetrix is a fully redundant, high-availability storage
processor, providing nondisruptive component replacements and
code upgrades. The Symmetrix system features high levels of
performance, data integrity, reliability, and availability.
EMC Enginuity Operating Environment: Enginuity enables
interoperation between the latest Symmetrix platforms and previous
generations of Symmetrix systems and enables them to connect to a
large number of server types, operating systems and storage software
products, and a broad selection of network connectivity elements and
other devices, ranging from HBAs and drivers to switches and tape
systems.
EMC Solutions Enabler: Solutions Enabler is a package that
contains the SYMAPI runtime libraries and the SYMCLI command
line interface. SYMAPI provides the interface to the EMC Enginuity
operating environment. SYMCLI is a set of commands that can be
invoked from the command line or within scripts. These commands
can be used to monitor device configuration and status, and to
perform control operations on devices and data objects within a
storage complex.
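For example, once Solutions Enabler is installed and licensed and the attached arrays have been discovered, a few basic SYMCLI commands (the Symmetrix ID and device number shown here are illustrative) can be used to inventory the environment:

symcfg discover
symcfg list
symdev list -sid 1234
symdev show 0123 -sid 1234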
EMC Symmetrix Remote Data Facility (SRDF): SRDF is a
business continuity software solution that replicates and maintains a
mirror image of data at the storage block level in a remote Symmetrix
system. The SRDF component extends the basic SYMCLI command
set of Solutions Enabler to include commands that specifically
manage SRDF.
Symmetrix hardware
Solutions Enabler
Symmetrix-based applications
Fault isolation
Figure 5
Table 2 SYMCLI device group, composite group, and BCV commands

symdg (create, delete, rename, release, list, show): Performs operations on a device group (dg).
symcg (create, add, remove, delete, rename, release, hold, unhold, list, show): Performs operations on a composite group (cg).
symld (add, list, remove, rename, show): Performs operations on a logical device within a device group.
symbcv (list, associate, disassociate, associate rdf, disassociate rdf): Performs support operations on BCV pairs.
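As an illustrative sketch (the group name, Symmetrix ID, and device numbers are assumptions, not taken from this document), these commands are typically combined to build a device group and associate a BCV with it:

symdg create MyDevGrp -type REGULAR
symld -g MyDevGrp -sid 1234 add dev 0123
symbcv -g MyDevGrp -sid 1234 associate dev 0456
symdg show MyDevGrp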
Figure 6 Basic SRDF configuration: a server attached to a source Symmetrix whose devices are remotely mirrored to a target Symmetrix over ESCON, Fibre Channel, or GigE links (< 200 km)
SRDF benefits
SRDF offers the following features and benefits:
High performance
Flexible configurations
Figure 7 SRDF consistency group operation: the host component and Symmetrix control facility suspend the R1/R2 relationship across the consistency group (X = DBMS data, Y = application data, Z = logs) so that a restartable DBMS copy is preserved on the R2 devices
SRDF terminology
This section describes various terms related to SRDF operations.
Suspend and resume operations
Practical uses of suspend and resume operations usually involve
unplanned situations in which an immediate suspension of I/O
between the R1 and R2 devices over the SRDF links is desired. In this
way, data propagation problems can be stopped. When suspend is
used with consistency groups, immediate backups can be performed
off the R2s without affecting I/O from the local host application. I/O
can then be resumed between the R1 and R2 and return to normal
operation.
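For illustration, assuming a device group named MyDevGrp, a suspend of the SRDF links and a later resume would look like the following:

symrdf -g MyDevGrp suspend -noprompt
symrdf -g MyDevGrp resume -noprompt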
Establish and split operations
The establish and split operations are normally used in planned
situations in which use of the R2 copy of the data is desired without
interfering with normal write operations to the R1 device. Splitting a
point-in-time copy of data allows access to the data on the R2 device
for various business continuity tasks. The ease of splitting SRDF pairs
to provide exact database copies makes it convenient to perform
scheduled backup operations, reporting operations, or new
application testing from the target Symmetrix data while normal
processing continues on the source Symmetrix system.
The R2 copy can also be used to test disaster recovery plans without
manually intensive recovery drills, complex procedures, and
application service interruptions. Upgrades to new versions can be
tested or changes to actual code can be made without affecting the
online production server. For example, modified server code can be
run on the R2 copy of the database until the upgraded code runs with
no errors before upgrading the production server.
In cases where an absolute real-time copy of the production data is
not essential, users may choose to split the SRDF pair periodically
and use the R2 copy for queries and report generation. The SRDF pair
can be re-established periodically to provide incremental updating of
data on the R2 device. The ability to refresh the R2 device periodically
provides the latest information for data processing and reporting.
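As a sketch (the device group name is an assumption), a split followed by an incremental re-establish of the SRDF pairs uses commands such as:

symrdf -g MyDevGrp split -noprompt
symrdf -g MyDevGrp establish -noprompt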
Failover and failback operations
Practical uses of failover and failback operations usually
involve the need to switch business operations from the production
site to a remote site (failover) or the opposite (failback). Once failover
Data Mobility
Data mobility is an SRDF configuration that restricts SRDF devices to
operating only in adaptive copy mode. This is a lower-cost licensing
option that is typically used for data migrations. It allows data to be
transferred in adaptive copy mode from source to target, and is not
designed as a solution for DR requirements unless used in
combination with TimeFinder.
Dynamic SRDF
Dynamic SRDF allows the creation of SRDF pairs from non-SRDF
devices while the Symmetrix system is in operation. Historically,
source and target SRDF device pairing has been static and changes
required assistance from EMC personnel. This feature provides
greater flexibility in deciding where to copy protected data.
Dynamic RA groups can be created in an SRDF switched fabric
environment. An RA group represents a logical connection between
two Symmetrix systems. Historically, RA groups were limited to
those static RA groups defined at configuration time. However, RA
groups can now be created, modified, and deleted while the
Symmetrix system is in operation. This provides greater flexibility in
forming SRDF-pair-associated links.
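As an example sketch (the Symmetrix ID, RA group number, and pairing file name are assumptions), dynamic SRDF pairs are typically created from a text file that lists local and remote device numbers, one pair per line:

symrdf createpair -sid 1234 -rdfg 10 -file device_pairs.txt -type RDF1 -establish -noprompt

Here device_pairs.txt would contain entries such as "0123 0456" (local device, remote device).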
Restore resynchronizes a data copy from the target (R2) side to the
source (R1) side. This operation can be a full or incremental
restore. Changes on the R1 volumes are discarded by this process.
Split stops mirroring for the SRDF pair(s) in a device group and
write-enables the R2 devices.
Failover switches data processing from the source (R1) side to the
target (R2) side. The source side volumes (R1), if still available,
are write-disabled.
Failback switches data processing from the target (R2) side to the
source (R1) side. The target side volumes (R2), if still available,
are write-disabled.
Figure 8 SRDF establish and restore operations for the database data and log devices between the production server (R1) and the DR server (R2)
The SRDF pair must already be in one of the following states for the split operation to succeed:
Synchronized
Suspended
R1 updated
SyncInProg (if the symforce option is specified for the split, resulting in a set of R2 devices that are not dependent-write consistent and are not usable)
Figure 9 SRDF failover and failback operations for the database data and log devices between the production server (R1) and the DR server (R2)
The SRDF pair must already be in one of the following states for the failover operation to succeed:
Synchronized
Suspended
R1 updated
Partitioned (when invoking this operation at the target site)
Failback
To resume normal operations on the R1 side, a failback (R1 device
takeover) operation is initiated. This means read/write operations on
the R2 device must be stopped, and read/write operations on the R1
device must be started. When the failback command is initiated,
the R2 becomes read-only to its host, while the R1 becomes
read/write-enabled to its host. The following command performs a
failback operation on all SRDF pairs in the device group named
MyDevGrp:
symrdf -g MyDevGrp failback -noprompt
The SRDF pair must already be in one of the following states for the
failback operation to succeed:
Failed over
Suspended and write-disabled at the source
Suspended and not ready at the source
R1 Updated
R1 UpdInProg
Figure 10 SRDF configuration with primary and secondary site nodes connected through an enterprise LAN/WAN, with hosts attached to their local R1 and R2 devices over Fibre Channel or SCSI
EMC TimeFinder
The SYMCLI TimeFinder component extends the basic SYMCLI
command set to include TimeFinder or business continuity
commands that allow control operations on device pairs within a
local replication environment. This section specifically describes the
functionality of:
Figure 11 TimeFinder/Mirror configuration: a server running SYMCLI controlling standard (STD) and BCV device pairs
Write pending tracks for the standard device that have not yet
been written out to the BCV are duplicated in cache to be written
to the BCV.
Regular split
A regular split is the type of split that has existed for
TimeFinder/Mirror since its inception. With a regular split (before
Enginuity version 5568), I/O activity from the production hosts to a
standard volume was not accepted until it was split from its BCV
pair. Therefore, applications attempting to access the standard or the
BCV would experience a short wait during a regular split. Once the
split was complete, no further overhead was incurred.
Beginning with Enginuity version 5568, any split operation is an
instant split. A regular split is still valid for earlier versions and
for current applications that perform regular split operations.
However, current applications that perform regular splits with
Enginuity version 5568 actually perform an instant split.
Specifying the -instant option on the command line performs an instant split with Enginuity versions 5x66 and 5x67. Since version 5568, the split itself is always performed as an instant split; however, it is still beneficial to supply the -instant flag with later Enginuity versions, because otherwise SYMCLI waits for the background split to complete before returning.
Instant split
An instant split shortens the wait period during a split by
dividing the process into a foreground split and a background
split. During an instant split, the system executes the foreground
split almost instantaneously and returns a successful status to the
host. This instantaneous execution allows minimal I/O disruptions to
the production volumes. Furthermore, the BCVs are accessible to the
hosts as soon as the foreground process is complete. The background
split continues to split the BCV pair until it is complete. When the
-instant option is included or defaulted, SYMCLI returns
immediately after the foreground split, allowing other operations
while the BCV pair is splitting in the background.
The following operation performs an instant split on all BCV pairs
in MyDevGrp, and allows SYMCLI to return to the server process
while the background split is in progress:
symmir -g MyDevGrp split -instant -noprompt
Figure 12 TimeFinder consistent split of the prodgrp device group: SYMAPI uses Enginuity Consistency Assist (ECA) to split the STD/BCV pairs consistently across database servers (Hosts A, B, and C) from a controlling host
Figure 13 Consistent split of application data, log, and other devices to BCVs, coordinated through SYMCLI/SYMAPI with PowerPath or ECA holding I/O
TimeFinder/Clone operations
Symmetrix TimeFinder/Clone operations using SYMCLI can create
up to 16 copies from a source device onto target devices. Unlike
TimeFinder/Mirror, TimeFinder/Clone does not require the
traditional standard-to-BCV device pairing. Instead,
TimeFinder/Clone allows any combination of source and target
devices. For example, a BCV can be used as the source device, while
another BCV can be used as the target device. Any combination of
source and target devices can be used. Additionally,
TimeFinder/Clone does not use the traditional mirror positions the
way that TimeFinder/Mirror does. Because of this,
TimeFinder/Clone is a useful option when more than three copies of
a source device are desired.
Normally, one of the three copies is used to protect the data against
hardware failure.
The source and target devices must be the same emulation type (FBA or CKD). The target device must be equal in size to the source device. Clone copies of striped or concatenated metavolumes can also be created, provided the source and target metavolumes are identical in configuration. Once activated, the target device can be accessed instantly by the target's host, even before the data is fully copied to the target device.
TimeFinder/Clone copies are appropriate in situations where multiple copies of production data are needed for testing, backups, or report generation. Clone copies can also be used to reduce disk contention and improve data access speed by assigning users to copies of data rather than accessing the one production copy. A single source device may maintain as many as 16 relationships that can be a combination of BCVs, clones, and snaps.
Clone copy sessions
TimeFinder/Clone functionality is controlled via copy sessions,
which pair the source and target devices. Sessions are maintained on
the Symmetrix system and can be queried to verify the current state
of the device pairs. A copy session must first be created to define and
set up the TimeFinder/Clone devices. The session is then activated,
enabling the target device to be accessed by its host. When the
information is no longer needed, the session can be terminated.
TimeFinder/Clone operations are controlled from the host by using
the symclone command to create, activate, and terminate
the copy sessions.
Figure 14 illustrates a copy session where the controlling host creates
a TimeFinder/Clone copy of standard device DEV001 on target
device DEV005, using the symclone command.
Figure 14 TimeFinder/Clone copy session: a server running SYMCLI creates a clone of standard device DEV001 on target device DEV005, which is then accessed by the target host
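As a sketch of the create step (the device group name is an assumption; the DEV001/DEV005 pairing follows Figure 14), a clone session is typically defined before activation with a command such as:

symclone -g MyDevGrp create DEV001 sym ld DEV005 -copy -noprompt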
The activation of a clone enables the copying of the data. The data
may start copying immediately if the copy keyword is used. If the
copy keyword is not used, tracks are only copied when they are
accessed from the target volume or when they are changed on the
source volume.
Activation of the clone session established in the previous create
command can be accomplished using the following command.
symclone -g MyDevGrp activate -noprompt
TimeFinder/Snap operations
Symmetrix arrays provide another technique to create copies of
application data. The functionality, called TimeFinder/Snap, allows
users to make pointer-based, space-saving copies of data
simultaneously on multiple target devices from a single source
device. The data is available for access instantly. TimeFinder/Snap
allows data to be copied from a single source device to as many as 128
target devices. A source device can be either a Symmetrix standard
device or a BCV device controlled by TimeFinder/Mirror, with the
exception being a BCV working in clone emulation mode. The target
device is a Symmetrix virtual device (VDEV) that consumes
negligible physical storage through the use of pointers to track
changed data.
The VDEV is a host-addressable Symmetrix device with special
attributes created when the Symmetrix system is configured.
However, unlike a BCV which contains a full volume of data, a VDEV
is a logical-image device that offers a space-saving way to create
instant, point-in-time copies of volumes. Any update to a source device after its activation with a virtual device causes the pre-update image of the changed tracks to be copied to a save device. The virtual device's indirect pointer is then updated to point to the original track data on the save device, preserving a point-in-time image of the volume. TimeFinder/Snap uses this copy-on-first-write technique to conserve disk space, since only changes to tracks on the source cause any incremental storage to be consumed.
The symsnap create and symsnap activate commands are used to create and activate a source/target snap pair.
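For example (a sketch only; the device group and logical device names are assumptions), a snap session pairing standard device DEV001 with virtual device VDEV005 could be created and then activated with commands along these lines:

symsnap -g MyDevGrp create DEV001 vdev ld VDEV005 -noprompt
symsnap -g MyDevGrp activate DEV001 vdev ld VDEV005 -noprompt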
Virtual device: A logical-image device that saves disk space through the use of pointers to track data that is immediately accessible after activation. Snapping data to a virtual device uses a copy-on-first-write technique.
Save device: A device that is not host-accessible and is accessed only through the virtual devices that point to it. Save devices provide a pool of physical space to store snap copy data to which virtual devices point.
BCV: A full-volume mirror that has valid data after fully synchronizing with its source device. It is accessible only when split from the source device that it is mirroring.
Figure 15 TimeFinder/Snap copy session: the virtual device (VDEV 005) holds pointers to the original data on DEV001, and pre-update images are copied to the save device on first write
Relational databases
File systems
Performance statistics
SRM allows users to examine the mapping of storage devices and the
characteristics of data files and objects. These commands allow the
examination of relationships between extents and data files or data
objects, and how they are mapped on storage devices. Frequently,
SRM commands are used with TimeFinder and SRDF to create
point-in-time copies for backup and restart.
Figure 16 on page 82 outlines the process of how SRM commands are
used with TimeFinder in a database environment.
Figure 16 Using SRM commands with TimeFinder: a consistent TimeFinder split of the data and log BCVs (DEV001-DEV004) coordinated through SYMCLI/SYMAPI with PowerPath or ECA
SRM commands
EMC Solutions Enabler with a valid license for TimeFinder and SRM
is installed on the host. In addition, the host must also have
PowerPath or use ECA, and must be utilized with a supported DBMS
system. As discussed in TimeFinder split operations on page 70,
when splitting a BCV, the system must perform housekeeping tasks
that may require a few seconds on a busy Symmetrix system. These
tasks involve a series of steps (shown in Figure 16 on page 82) that
result in the separation of the BCV from its paired standard:
1. Using the SRM base mapping commands, first query the
Symmetrix system to display the logical-to-physical mapping
information about any physical device, logical volume, file,
directory, and/or file system.
2. Using the database mapping command, query the Symmetrix to
display physical and logical database information.
3. Next, use the database mapping command to translate:
The devices of a specified database into a device group or a
consistency group, or
The devices of a specified table space into a device group or a
consistency group.
4. The BCV is split from the standard device.
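As a rough sketch of steps 2 through 4 (the database name and group name are assumptions, and exact symrdb options depend on the environment), the database-mapping command can translate an Oracle database into a device group that TimeFinder then splits:

symrdb list -type Oracle
symrdb -type Oracle -db PROD rdb2dg ora_dg
symmir -g ora_dg split -noprompt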
The SRM mapping commands and their arguments include:

symrslv (logical-to-physical mapping): pd, lv, file, dir, fs
symrdb (database mapping): list, show, rdb2dg, rdb2cg, tbs2cg, tbs2dg
symhostfs (file system mapping): list, show
Table 7 lists the SYMCLI commands that can be used to examine the logical volume mapping.

symvg: deport, import, list, rescan, show, vg2cg, vg2dg
symlv: list, show

Host performance statistics can be examined with:

symhost: show, stats
The global EMC Storage view. This view configures the global
settings for the Storage Viewer, including the Solutions Enabler
client/server settings, log settings, and version information.
Additionally, an arrays tab lists all of the storage arrays currently
being managed by Solutions Enabler, and allows for the
discovery of new arrays and the deletion of previously
discovered arrays.
The EMC Storage tab for hosts. This tab appears when an
ESX/ESXi host is selected. It provides insight into the storage
that is configured and allocated for a given ESX/ESXi host.
The SRDF SRA tab for hosts. This view also appears when an
ESX/ESXi host is selected on a vSphere Client running on
VMware Site Recovery Manager Server. It allows you to
configure device pair definitions for the EMC SRDF Storage
Replication Adapter (SRA), to use when testing VMware Site
Recovery Manager recovery plans, or when creating gold copies
before VMware Site Recovery Manager recovery plans are
executed.
The EMC Storage tab for virtual machines. This view appears
when a virtual machine is selected. It provides insight into the
storage that is allocated to a given virtual machine, including
both virtual disks and raw device mappings (RDM).
A typical view of the Storage Viewer for vSphere Client can be seen in
Figure 17.
Figure 17 Storage Viewer for vSphere Client
EMC PowerPath
EMC PowerPath is host-based software that works with networked
storage systems to intelligently manage I/O paths. PowerPath
manages multiple paths to a storage array. Supporting multiple paths
enables recovery from path failure because PowerPath automatically
detects path failures and redirects I/O to other available paths.
PowerPath also uses sophisticated algorithms to provide dynamic
load balancing for several kinds of path management policies that the
user can set. With the help of PowerPath, systems administrators are
able to ensure that applications on the host have highly available
access to storage and perform optimally at all times.
A key feature of path management in PowerPath is dynamic,
multipath load balancing. Without PowerPath, an administrator must
statically load balance paths to logical devices to improve
performance. For example, based on current usage, the administrator
might configure three heavily used logical devices on one path, seven
moderately used logical devices on a second path, and 20 lightly used
logical devices on a third path. As I/O patterns change, these
statically configured paths may become unbalanced, causing
performance to suffer. The administrator must then reconfigure the
paths, and continue to reconfigure them as I/O traffic between the
host and the storage system shifts in response to usage changes.
Designed to use all paths concurrently, PowerPath distributes I/O
requests to a logical device across all available paths, rather than
requiring a single path to bear the entire I/O burden. PowerPath can
distribute the I/O for all logical devices over all paths shared by
those logical devices, so that all paths are equally burdened.
PowerPath load balances I/O on a host-by-host basis, and maintains
statistics on all I/O for all paths. For each I/O request, PowerPath
intelligently chooses the least-burdened available path, depending on
the load-balancing and failover policy in effect. In addition to
improving I/O performance, dynamic load balancing reduces
management time and downtime because administrators no longer
need to manage paths across logical devices. With PowerPath,
configurations of paths and policies for an individual device can be
changed dynamically, taking effect immediately, without any
disruption to the applications.
88
PowerPath/VE
EMC PowerPath/VE delivers PowerPath Multipathing features to
optimize VMware vSphere virtual environments. With
PowerPath/VE, you can standardize path management across
heterogeneous physical and virtual environments. PowerPath/VE
enables you to automate optimal server, storage, and path utilization
in a dynamic virtual environment. With hyper-consolidation, a
virtual environment may have hundreds or even thousands of
independent virtual machines running, including virtual machines
with varying levels of I/O intensity. I/O-intensive applications can
disrupt I/O from other applications and before the availability of
PowerPath/VE, load balancing on an ESX host system had to be
manually configured to correct for this. Manual load-balancing
operations to ensure that all virtual machines receive their individual
required response times are time-consuming and logistically difficult
to effectively achieve.
PowerPath/VE works with VMware ESX and ESXi as a multipathing
plug-in (MPP) that provides enhanced path management capabilities
to ESX and ESXi hosts. PowerPath/VE is supported with vSphere
(ESX4) only. Previous versions of ESX do not have the PSA, which is
required by PowerPath/VE.
PowerPath/VE installs as a kernel module on the vSphere host. PowerPath/VE plugs in to the vSphere I/O stack framework to bring the advanced multipathing capabilities of PowerPath (dynamic load balancing and automatic failover) to the VMware vSphere platform (Figure 18 on page 91).
Figure 18
PowerPath/VE management
PowerPath/VE uses a command set, called rpowermt, to monitor,
manage, and configure PowerPath/VE for vSphere. The syntax,
arguments, and options are very similar to the traditional powermt
commands used on all the other PowerPath Multipathing supported
operating system platforms. There is one significant difference in
that rpowermt is a remote management tool.
Not all vSphere installations have a service console interface. In
order to manage an ESXi host, customers have the option to use
vCenter Server or vCLI (also referred to as VMware Remote Tools) on
a remote server. PowerPath/VE for vSphere uses the rpowermt
command line utility for both ESX and ESXi. PowerPath/VE for
vSphere cannot be managed on the ESX host itself. There is neither a
local nor remote GUI for PowerPath on ESX.
Administrators must designate a Guest OS or a physical machine to
manage one or multiple ESX hosts. rpowermt is supported on
Windows 2003 (32-bit) and Red Hat 5 Update 2 (64-bit).
When the vSphere host server is connected to the Symmetrix system,
the PowerPath/VE kernel module running on the vSphere host will
associate all paths to each device presented from the array and
associate a pseudo device name (as discussed earlier). An example of
this is shown in Figure 19, which shows the output of
rpowermt display host=x.x.x.x dev=emcpower0. Note in the output
that the device has four paths and displays the optimization mode
(SymmOpt = Symmetrix optimization).
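The following is a rough sketch of remote PowerPath/VE management with rpowermt (the host address and device name are placeholders, and the available options vary by release):

rpowermt display dev=all host=x.x.x.x
rpowermt display dev=emcpower0 host=x.x.x.x
rpowermt set policy=so dev=emcpower0 host=x.x.x.x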
Figure 19
Figure 20
No virtual machines are using physical media from their ESX host
system (that is, CD-ROMs, USB drives)
The remaining ESX hosts in the cluster will be able to handle the
additional load of the temporarily migrated virtual machines.
Oracle databases
HP StorageWorks arrays
Delivers space from the thin pool only when it is written to, that is, on demand. Overallocated application components use only the space that is actually written to, not the space requested.
Provides for thin-pool-wide striping and, for the most part, relieves the storage administrator of the burden of physical device/LUN configuration.
Thin device
A thin device is a host-accessible device that has no storage directly associated with it. Thin devices have preconfigured sizes and appear to the host to have that exact capacity. Storage is allocated in chunks when a block is written to for the first time. Zeros are returned to the host for reads from chunks that have not yet been allocated.
Data device
Data devices are specifically configured devices within the
Symmetrix that are containers for the written-to blocks of thin
devices. Any number of data devices may comprise a data device
pool. Blocks are allocated to the thin devices from the pool on a round-robin basis. The allocation block size is 768 KB.
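As an approximate sketch only (the pool name, device ranges, counts, and sizes are assumptions, and the exact symconfigure syntax varies by Solutions Enabler and Enginuity release), a thin pool is typically created, populated with data devices, and bound to thin devices through a symconfigure command file such as:

create pool OraclePool type=thin;
add dev 0A00:0A03 to pool OraclePool type=thin, member_state=ENABLE;
create dev count=4, size=4602, emulation=FBA, config=TDEV, binding to pool=OraclePool;

The file would then be applied with a command along the lines of symconfigure -sid 1234 -file thin_setup.cmd commit.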
Figure 21 on page 101 depicts the components of a Virtual
Provisioning configuration:
Figure 21 Virtual Provisioning components: thin devices bound to thin pools (Pool A and Pool B), each pool made up of data devices
Figure 22 Virtual LUN migration options across drive types (Flash, Fibre Channel, SATA) and protection types (unprotected, RAID 1, RAID 6)
transition from one protection type to another while servers and their
associated applications and Symmetrix software are accessing the
device.
The Virtual LUN feature offers customers the ability to effectively
utilize SATA storage - a much cheaper, yet reliable, form of high
capacity storage. It also facilitates fluid movement of data across the
various storage tiers present within the subsystem - the realization of
true "tiered storage in the box." Thus, Symmetrix VMAX becomes the
first enterprise storage subsystem to offer a comprehensive "tiered
storage in the box," ILM capability that complements the customer's
tiering initiatives. Customers can now achieve varied
cost/performance profiles by moving lower priority application data
to less expensive storage, or conversely, moving higher priority or
critical application data to higher performing storage as their needs
dictate.
Specific use cases for customer applications enable the moving of
data volumes transparently from tier to tier based on changing
performance (moving to faster or slower disks) or availability
requirements (changing RAID protection on the array). This
migration can be performed transparently without interrupting those
applications or host systems utilizing the array volumes and with
only a minimal impact to performance during the migration.
The following sample commands show how to move two LUNs of a
host environment from RAID 6 protection on Fibre Channel 15k rpm
drives to Enterprise Flash drives. The new symmigrate command,
which comes in EMC Solutions Enabler 7.0, is used to perform the
migrate operation. The source Symmetrix hypervolume numbers are
200 and 201, and the target Symmetrix hypervolumes on the
Enterprise Flash drives are A00 and A01.
1. A file (migrate.ctl) is created that contains the two LUNs to be
migrated. The file has the following content:
200 A00
201 A01
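The remaining steps would drive the migration with symmigrate using this file; the following is a sketch only (the Symmetrix ID and session name are assumptions):

symmigrate -sid 1234 -name fc_to_efd -file migrate.ctl validate -noprompt
symmigrate -sid 1234 -name fc_to_efd -file migrate.ctl establish -noprompt
symmigrate -sid 1234 -name fc_to_efd query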
The two host-accessible LUNs are migrated without impacting application or server availability.
3
Creating Oracle Database Clones
Overview
There are many choices when cloning databases with EMC
array-based replication software. Each software product has differing
characteristics that affect the final deployment. A thorough
understanding of the options available leads to an optimal replication
choice.
An Oracle database can be in one of three data states when it is being
copied:
Shutdown
Processing normally
Conditioned using hot backup mode
Figure 23 Oracle data, log, and archive standard devices and their BCVs
3. When the database is deactivated, split the BCV mirrors using the
following command:
symmir -g device_group split -noprompt
Figure 24 Oracle data, log, and archive standard devices and their BCVs
Figure 25 TimeFinder/Snap copy of the standard device: the VDEV holds pointers to the original data, and changed tracks are copied to the save device on first write
2. Once the create operation has completed, shut down the database
to make a cold TimeFinder/Snap of the DBMS. Execute the
following Oracle commands:
sqlplus "/ as sysdba"
SQL> shutdown immediate;
Figure 26 Oracle data, log, and archive standard devices and their BCVs
Figure 27 Oracle data, log, and archive standard devices and their BCVs
Figure 28 TimeFinder/Snap copy of the standard device: the VDEV holds pointers to the original data, and changed tracks are copied to the save device
as sysdba"
system archive log current;
tablespace DATA begin backup;
tablespace INDEXES begin backup;
125
Alternatively, with Oracle10g, the entire database can be put into hot
backup mode with:
sqlplus "/ as sysdba"
SQL> alter system archive log current;
SQL> alter database begin backup;
When these commands are issued, data blocks for the tablespaces are
flushed to disk and the datafile headers are updated with the last
SCN. Further updates of the SCN to the datafile headers are not
performed. When these files are copied, the nonupdated SCN in the
datafile headers signifies to the database that recovery is required.
as sysdba"
tablespace DATA end backup;
tablespace INDEXES end backup;
tablespace SYSTEM end backup;
system archive log current;
The log file switch command is used to ensure that the marker
indicating that the tablespaces have been taken out of hot backup
mode is found in an archive log.
Figure 29 Oracle data, log, and archive standard devices and their BCVs during a hot backup mode copy
5. After tablespaces are taken out of hot backup mode and a log
switch is performed, split the Log BCV devices from their source
volumes:
symmir -g log_group split -noprompt
Figure 30 Oracle data, log, and archive standard devices and their BCVs during a hot backup mode copy
5. After the tablespaces are taken out of hot backup mode and a log
switch is performed, activate the log clone devices:
symclone -g log_group activate -noprompt
Figure 31 TimeFinder/Snap of the log devices: the VDEV holds pointers to the original data, and changed tracks are copied to the save device
5. After the database is taken out of hot backup mode and a log
switch is performed, activate the Log snap devices:
symsnap -g log_group activate -noprompt
Figure 32 Numbered sequence of operations across the data, log, and archive standard devices and their BCVs
Host considerations
One of the primary considerations when starting a copy of an Oracle
database is whether to present it back to the same host or mount the
database on another host. While it is significantly simpler to restart a
database on a secondary host, it is still possible to restart a copy of the database on the same host with only a few extra steps. The extra steps required to mount a database copy on the same host (mounting the set of copied volumes back to the same host, changing the mount points, and relocating the datafiles) are described next.
MAXDATAFILES 30
MAXINSTANCES 2
MAXLOGHISTORY 224
LOGFILE
GROUP 1 (
'/oracle/oradata/test/oraredo1a.dbf',
'/oracle/oradata/test/oraredo2a.dbf'
) SIZE 10M,
GROUP 2 (
'/oracle/oradata/test/oraredo1b.dbf',
'/oracle/oradata/test/oraredo2b.dbf'
) SIZE 10M,
GROUP 3 (
'/oracle/oradata/test/oraredo1c.dbf',
'/oracle/oradata/test/oraredo2c.dbf'
) SIZE 10M
-- STANDBY LOGFILE
DATAFILE
'/oracle/oradata/test/orasys.dbf',
'/oracle/oradata/test/oraundo.dbf',
'/oracle/oradata/test/orausers.dbf'
CHARACTER SET US7ASCII
;
# Recovery is required if any of the datafiles are restored
# backups, or if the last shutdown was not normal or
# immediate.
RECOVER DATABASE
# Database can now be opened normally.
ALTER DATABASE OPEN;
# Commands to add tempfiles to temporary tablespaces.
# Online tempfiles have complete space information.
# Other tempfiles may require adjustment.
ALTER TABLESPACE TEMP_TS ADD TEMPFILE
'/oracle/oradata/test/oratest.dbf'
SIZE 524288000 REUSE AUTOEXTEND OFF;
# End of tempfile additions.
#
#
# Set #2. RESETLOGS case
#
# The following commands will create a new control file and
# use it to open the database. The contents of online logs
# will be lost and all backups will be invalidated. Use this
# only if online logs are damaged.
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "TEST" RESETLOGS
NOARCHIVELOG
-- SET STANDBY TO MAXIMIZE PERFORMANCE
MAXLOGFILES 16
MAXLOGMEMBERS 2
MAXDATAFILES 30
MAXINSTANCES 2
MAXLOGHISTORY 224
LOGFILE
GROUP 1 (
'/oracle/oradata/test/oraredo1a.dbf',
'/oracle/oradata/test/oraredo2a.dbf'
) SIZE 10M,
GROUP 2 (
'/oracle/oradata/test/oraredo1b.dbf',
'/oracle/oradata/test/oraredo2b.dbf'
) SIZE 10M,
GROUP 3 (
'/oracle/oradata/test/oraredo1c.dbf',
'/oracle/oradata/test/oraredo2c.dbf'
) SIZE 10M
-- STANDBY LOGFILE
DATAFILE
'/oracle/oradata/test/orasys.dbf',
'/oracle/oradata/test/oraundo.dbf',
'/oracle/oradata/test/orausers.dbf'
CHARACTER SET US7ASCII
;
# Recovery is required if any of the datafiles are restored
# backups, or if the last shutdown was not normal or
# immediate.
RECOVER DATABASE USING BACKUP CONTROLFILE
# Database can now be opened zeroing the online logs.
ALTER DATABASE OPEN RESETLOGS;
# Commands to add tempfiles to temporary tablespaces.
# Online tempfiles have complete space information.
# Other tempfiles may require adjustment.
ALTER TABLESPACE TEMP_TS ADD TEMPFILE
'/oracle/oradata/test/oratest.dbf'
SIZE 524288000 REUSE AUTOEXTEND OFF;
# End of tempfile additions.
#
After deciding whether to open the database with RESETLOGS and editing the file appropriately, the datafile locations can be changed. When the script is run, the instance will look for the Oracle datafiles in the new locations.
sqlplus "/ as sysdba"
SQL> @create_control
This creates the new control file and opens the database, relocating the datafiles to the newly specified locations.
2. After the appropriate devices are available to the host, make the
operating system aware of the devices. In addition, import the
volume or disk groups and mount any file systems. This is
operating-system dependent and is discussed in Appendix C,
Related Host Operation.
3. Since the database was shut down when the copy was made, no
special processing is required to restart the database. Start the
database as follows:
sqlplus "/ as sysdba"
SQL> startup;
SQL> startup;
2. After the appropriate devices are available to the host, make the
operating system aware of the devices. In addition, import the
volume or disk groups and mount any file systems. This is
operating-system dependent and is discussed in Appendix C,
Related Host Operation.
3. Since the database was shut down when the copy was made, no
special processing is required to restart the database. The
following is used to start the database:
sqlplus "/ as sysdba"
SQL> startup mount;
SQL> recover database;
WHERE
OWNER
---------------
DEV1
USER1

These owners (schemas) need to be verified on the target side:

SELECT username
FROM dba_users;

USERNAME
---------------
SYS
SYSTEM

In this case, the DEV1 user exists but the USER1 user does not. The USER1 user must be created with the command:

CREATE USER user1 IDENTIFIED BY user1;
In this case, both required datafiles are on the d:\ drive. This
volume will be identified and replicated using TimeFinder. Note
that careful database layout planning is critical when TimeFinder
is used for replication. First, create a device group for the
standard device used by the d:\ drive and a BCV that will be
used for the new e:\ drive. Appendix B, Sample SYMCLI Group
Creation Commands, provides examples of creating device
groups.
4. After creating the device group, establish the BCV to the standard
device:
symmir -g device_group establish -full -noprompt
symmir -g device_group verify -i 30
5. After the BCV is fully synchronized with the standard device, the devices can be split, since the tablespaces on the device are in read-only mode.
symmir -g device_group split -noprompt
file = d:\oracle\exp\meta1.dmp
tablespaces = (data1,index1)
tts_owners = (dev1,dev2)
Alternatively, with Data Pump in Oracle10g:
IMPDP system/manager
DUMPFILE = meta1.dmp
DIRECTORY = d:\oracle\exp\
TRANSPORT_DATAFILES =
e:\oracle\oradata\db1\data1.dbf,
e:\oracle\oradata\db1\index1.dbf
Overview
Cross-platform transportable tablespaces enable data from an Oracle
database running on one operating system to be cloned and
presented to another database running on a different platform. Differences in Oracle datafiles across operating systems are a function of the byte ordering, or "endianness," of the files. The endian format of the datafiles is classified as either "big endian" or "little endian" (in "big endian," the first byte is the most significant, while in "little endian," the first byte is the least significant). If two operating systems both use "big endian" byte ordering, the files can be transferred between operating systems and used successfully in an Oracle database (through a feature such as
transportable tablespaces). For source and target operating systems
with different byte ordering, a process to convert the datafiles from
one "endianness" to another is required.
Oracle uses an RMAN option to convert a data file from "big endian"
to "little endian" and vice versa. First, the "endianness" of the source
and target operating systems must be identified. If different, then the
datafiles are read and converted by RMAN. Upon completion, the
"endianness" of the datafiles is converted to the format needed in the
new environment. The process of converting the cloned datafiles
occurs either on the source database host before copying to the new
environment or once it is received on the target host. Other than this
conversion process, the steps for cross-platform transportable
tablespaces are the same as for normal transportable tablespaces.
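For illustration only (the tablespace name and staging path are assumptions, not from the original example), converting on the source Solaris host for the little-endian Linux target identified in the platform query that follows could use an RMAN command such as:

RMAN> CONVERT TABLESPACE data1 TO PLATFORM 'Linux IA (32-bit)' FORMAT '/stage/%U';

Alternatively, the conversion can be run on the target host with CONVERT DATAFILE ... FROM PLATFORM 'Solaris[tm] OE (32-bit)'.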
FROM v$transportable_platform a, v$database b
WHERE a.platform_name = b.platform_name;

On the Solaris host, the output from this SQL command is:

PLATFORM_NAME                  ENDIAN_FORMAT
------------------------------ -------------
Solaris[tm] OE (32-bit)        Big

On the Linux host, this command returns:

PLATFORM_NAME                  ENDIAN_FORMAT
------------------------------ -------------
Linux IA (32-bit)              Little
Table 10 Comparison of TimeFinder/Snap, TimeFinder/Clone, TimeFinder/Mirror, and Replication Manager

Maximum number of copies: TimeFinder/Snap: 15; TimeFinder/Clone: incremental 16, non-incremental unlimited; TimeFinder/Mirror: incremental 16, non-incremental unlimited; Replication Manager: incremental 16, non-incremental unlimited
Number of simultaneous copies: TimeFinder/Snap: 15; TimeFinder/Clone: 16
Production impact: TimeFinder/Snap: COFW; TimeFinder/Clone: none; TimeFinder/Mirror: none
Scripting: TimeFinder/Snap: required; TimeFinder/Clone: required; TimeFinder/Mirror: required; Replication Manager: automated
Database clone needed for a long time: TimeFinder/Snap: not recommended; TimeFinder/Clone: recommended; TimeFinder/Mirror: recommended; Replication Manager: recommended

The following are examples of some of the choices you might make for database cloning based on the information in Table 10.
System requirements
Replication choices

The application on the source volumes is very performance-sensitive and the slightest degradation will cause responsiveness of the system to miss SLAs.
TimeFinder/Mirror
TimeFinder/Snap
Replication Manager
TimeFinder/Clone

Multiple copies are being made, some with production mount. The copies are reused in a cycle, expiring the oldest one first.
Replication Manager
4
Backing Up Oracle Environments

Introduction ............................................................................................ 156
Comparing recoverable and restartable copies of databases........... 157
Database organization to facilitate recovery ...................................... 159
Oracle backup overview ....................................................................... 161
Using EMC replication in the Oracle backup process ...................... 166
Copying the database with Oracle shutdown ................................... 168
Copying a running database using EMC consistency technology . 175
Copying the database with Oracle in hot backup mode .................. 182
Backing up the database copy.............................................................. 190
Backups using EMC Replication Manager for Oracle backups ...... 191
Backups using Oracle Recovery Manager (RMAN) ......................... 193
Backups using TimeFinder and Oracle RMAN ................................. 195
Introduction
As a part of normal day-to-day operations, the DBA creates backup
procedures that run one or more times a day to protect the database
against errors. Errors can originate from many sources (such as
software, hardware, user, and so on) and it is the responsibility of the
DBA to provide error recovery strategies that can recover the
database to a point of consistency and also minimize the loss of
transactional data. Ideally, this backup process should be simple,
efficient, and fast.
Today, the DBA is challenged to design a backup (and recovery)
strategy to meet the ever-increasing demands for availability that can
also manage extremely large databases efficiently while minimizing
the burden on servers, backup systems, and operations staff.
This section describes how the DBA can leverage EMC technology in
a backup strategy to:
Figure 33 Database organization for backup: standard devices holding the redo logs, control files, archive logs, and the SYSTEM, SYSAUX, DATA, INDEX, UNDO, and TEMP tablespaces, with corresponding BCVs
Making a hot copy of the database is now the standard, but this
method has its own challenges. How can a consistent copy of the
database and supporting files be made when they are changing
throughout the duration of the backup? What exactly is the content of
the tape backup at completion? The reality is that the tape data is a
"fuzzy image" of the disk data, and considerable expertise is required
to restore the database back to a database point of consistency.
Online backups are made when the database is running in log
archival mode. While there are performance considerations for
running in archive log mode, the overhead associated with it is
generally small compared with the enhanced capabilities and
increased data protection afforded by running in it. Except in cases
such as large data warehouses where backups are unnecessary, or in
other relatively obscure cases, archive log mode is generally
considered a best practice for all Oracle database environments.
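For reference, the archiving mode can be checked and, if needed, enabled from SQL*Plus. The following is a minimal sketch; the database must be cleanly shut down and mounted before the mode can be changed:
sqlplus "/ as sysdba"
SQL> archive log list
SQL> shutdown immediate;
SQL> startup mount;
SQL> alter database archivelog;
SQL> alter database open;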
A copy of the database can be made with the database in one of three states: shut down, processing normally, or conditioned using hot backup mode. The procedure required to restore and recover the database depends on the state of the database at the time the copy was made.
Chapter 5, Restoring and Recovering Oracle Databases, covers the
restore of the database.
The following sections describe how to make a copy of the database
using three different EMC technologies with the database in the three
different states described in the prior paragraph.
The primary method of creating copies of an Oracle database is
through the use of the EMC local replication product TimeFinder.
TimeFinder is also used by Replication Manager to make database
copies. Replication Manager facilitates the automation and
management of database copies.
The TimeFinder family consists of two base products and several
component options. TimeFinder/Mirror, TimeFinder/Clone and
TimeFinder/Snap were discussed in general terms in Chapter 2,
EMC Foundation Products. In this chapter, they are used in a
database backup context.
Figure 34  Data, log, and archive standard devices and their BCVs
3. When the database is shut down, split the BCV mirrors using the
following command:
symmir -g device_group split -noprompt
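Only the split step is shown here; for context, a minimal sketch of the complete cold-copy sequence might look like the following (assuming a single device group named device_group that pairs the standard devices with their BCVs):
# 1. Synchronize the BCVs with the standard devices and wait for the Synchronized state
symmir -g device_group establish -noprompt
symmir -g device_group query
# 2. Shut down the database cleanly
sqlplus "/ as sysdba"
SQL> shutdown immediate;
SQL> exit
# 3. Split the BCV mirrors while the database is down
symmir -g device_group split -noprompt
# 4. Restart the production database
sqlplus "/ as sysdba"
SQL> startup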
Figure 35  Data, log, and archive standard devices and their BCV copies
Figure 36  TimeFinder/Snap: device pointers from the VDEV to the original data, with changed data copied to the save area (copy on write)
2. Once the create operation has completed, shut down the database
in order to make a cold TimeFinder/Snap of the DBMS. Execute
the following Oracle commands:
sqlplus "/ as sysdba"
SQL> shutdown immediate;
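The surrounding steps follow the same pattern; a minimal sketch of the full cold TimeFinder/Snap sequence (assuming a device group named device_group that pairs the standard devices with the VDEVs):
# 1. Create the snap sessions
symsnap -g device_group create -noprompt
# 2. Shut down the database cleanly
sqlplus "/ as sysdba"
SQL> shutdown immediate;
SQL> exit
# 3. Activate the point-in-time copy while the database is down
symsnap -g device_group activate -noprompt
# 4. Restart the production database
sqlplus "/ as sysdba"
SQL> startup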
Figure 37  Data, log, and archive standard devices and their BCVs
Figure 38  Data, log, and archive standard devices and their BCVs
Figure 39  TimeFinder/Snap: device pointers from the VDEV to the original data, with changed data copied to the save area (copy on write)
as sysdba"
system archive log current;
tablespace DATA begin backup;
tablespace INDEXES begin backup;
tablespace SYSTEM begin backup;
Alternatively, with Oracle10g, the entire database can be put into hot
backup mode with:
sqlplus "/ as sysdba"
SQL> alter system archive log current;
SQL> alter database begin backup;
When these commands are issued, data blocks for the tablespaces are
flushed to disk and the datafile headers are updated with the last
checkpoint SCN. Further updates of the checkpoint SCN to the data
file headers are not performed while in this mode. When these files
are copied, the nonupdated SCN in the datafile headers signifies to
the database that recovery is required.
as sysdba"
tablespace DATA end backup;
tablespace INDEXES end backup;
tablespace SYSTEM end backup;
system archive log current;
The log file switch command is used to ensure that the marker
indicating that the tablespaces are taken out of hot backup mode is
found in an archive log. Switching the log automatically ensures that
this record is found in a written archive log.
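When the whole database was placed in hot backup mode using the Oracle10g syntax shown earlier, it is taken out of hot backup mode in the same way, for example:
sqlplus "/ as sysdba"
SQL> alter database end backup;
SQL> alter system archive log current;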
Figure 40  Data, log, and archive standard devices and their BCVs during hot backup
5. After the tablespaces are taken out of hot backup mode and a log
switch is performed, split the Log BCV devices from their source
volumes:
symmir -g log_group split -noprompt
Figure 41  Data, log, and archive standard devices and their BCVs during hot backup
5. After the tablespaces are taken out of hot backup mode and a log
switch is performed, activate the Log clone devices:
symclone -g log_group activate -noprompt
Figure 42  TimeFinder/Snap: device pointers from the VDEV to the original data, with changed data copied to the save area (copy on write)
5. After the database is taken out of hot backup mode and a log
switch is performed, activate the log snap devices:
symsnap -g log_group activate -noprompt
Figure 43  Data, log, and archive devices in the backup process
5  Restoring and Recovering Oracle Databases
Introduction ...................................................................................... 198
Oracle recovery types ...................................................................... 199
Oracle recovery overview ............................................................... 203
Restoring a backup image using TimeFinder .............................. 205
Restoring a backup image using Replication Manager .............. 215
Oracle database recovery procedures ........................................... 217
Database recovery using Oracle RMAN ....................................... 223
Oracle Flashback .............................................................................. 224
Introduction
Recovery of a production database is an event that all DBAs hope is
never required. Nevertheless, DBAs must be prepared for unforeseen
events such as media failures or user errors requiring database
recovery operations. The keys to a successful database recovery
include the following:
Crash recovery
A critical component of all ACID-compliant (Atomicity Consistency
Isolation Durability) databases is the ability to perform crash
recovery to a consistent database state after a failure. Power failures to the host are a primary cause of databases going down inadvertently and requiring crash recovery. Other situations where
crash recovery procedures are needed include databases shut down
with the "abort" option and database images created using a
consistent split mechanism.
Crash recovery is an example of using the database restart process,
where the implicit application of database logs during normal
initialization occurs. Crash recovery is a database-driven recovery mechanism; it is not initiated by a DBA. Whenever the database is
started, Oracle verifies that the database is in a consistent state. It
does this by reading information out of the control file and verifying
the database was previously shut down cleanly. It also determines the
latest checkpoint system change number (SCN) in the control file and
verifies that each datafile is current by comparing the SCN in each
data file header. In the event that a crash occurred and recovery is
required, the database automatically determines which log
information needs to be applied. The latest redo logs are read and change information from them is applied to the database files, rolling forward any transactions that were committed but not applied to the database files. Then, any transaction information written to the datafiles, but not committed, is rolled back using the undo data.
Media recovery
Media recovery is another type of Oracle recovery mechanism.
Unlike crash recovery, however, media recovery is always user-invoked; either user-managed or RMAN-based recovery may be used. Media recovery rolls forward changes made to datafiles that were restored from disk or tape because of their loss or
corruption. Unlike crash recovery, which uses only the online redo
log files, media recovery uses both the online redo logs and the
archived log files during the recovery process.
The granularity of a media recovery depends on the requirements of
the DBA. It can be performed for an entire database, for a single
tablespace, or even for a single datafile. The process involves
restoring a copy of a valid backed up image of the required data
structure (database, tablespace, datafile) and using Oracle standard
recovery methods to roll forward the database to the point in time of
the failure by applying change information found in the archived and
online redo log files. Oracle uses SCNs to determine the last changes
applied to the data files involved. It then uses information in the
control files that specifies which SCNs are contained in each of the
archive logs to determine where to start the recovery process.
Changes are then applied to appropriate datafiles to roll them
forward to the point of the last transaction in the logs.
Media recovery is the predominant Oracle recovery mechanism.
Media recovery is also used as a part of replicating Oracle databases
for business continuity or disaster recovery purposes. Further details
of the media recovery process are in the following sections.
Complete recovery
Complete recovery is the primary method of recovering an Oracle
database. It is the process of recovering a database to the latest point
in time (just before the database failure) without the loss of
committed transactions. The complete recovery process involves
restoring a part or all of the database data files from a backup image
on tape or disk, and then reading and applying all transactions
subsequent to the completion of the database backup from the
archived and online log files. After restarting the database, crash
recovery is performed to make the database transactionally
consistent for continued user transactional processing.
The processes needed to perform complete recovery of the database
are detailed in the following sections.
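As an illustration, a user-managed complete recovery after the affected datafiles have been restored might look like the following sketch (database recovery using Oracle RMAN is covered later in this chapter):
sqlplus "/ as sysdba"
SQL> startup mount
SQL> recover database;
SQL> alter database open;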
Incomplete recovery
Oracle sometimes refers to incomplete database recovery as a
point-in-time recovery. Incomplete recovery is similar to complete
recovery in the process used to bring the database back to a
transactionally consistent state. However, instead of rolling the
database forward to the last available transaction, roll-forward
procedures are halted at a user-defined prior point. This is typically
done to recover a database prior to a point of user error such as the
deletion of a table, undesired deletion or modification of customer
data, or rollback of an unfinished batch update. In addition,
incomplete recovery is also performed when recovery is required, but
there are missing or corrupted archive logs. Incomplete recovery
always incurs some data loss.
Typically, incomplete recovery operations are performed on the entire
database since Oracle needs all database files to be consistent with
one another. However, an option called Tablespace Point-in-Time
Recovery (usually abbreviated TSPITR), which allows a single
tablespace to be only partially recovered, is also available. This
recovery method, in Oracle10g, uses the transportable tablespace
feature described in Section 3.8. The Oracle documentation Oracle
Database Backup and Recovery Advanced Users Guide provides
additional information on TSPITR.
Partial (tablespace or datafile) versus full database recovery
Oracle provides the ability to recover a single datafile or tablespace in addition to recovering the entire database. This option is useful, for example, if a single datafile becomes corrupted or data from a single tablespace is lost.
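For example, a single lost or corrupted datafile can be restored from a backup image and recovered while the rest of the database remains open; the following is a minimal sketch, and the datafile name is hypothetical:
sqlplus "/ as sysdba"
SQL> alter database datafile '/u02/oradata/PROD/data01.dbf' offline;
The datafile is then restored from the backup image (for example, from a split BCV or a tape backup), and recovery is completed:
SQL> recover datafile '/u02/oradata/PROD/data01.dbf';
SQL> alter database datafile '/u02/oradata/PROD/data01.dbf' online;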
Ensuring that the database is consistent and has not lost any
transactions.
Verifying the database after restore and recovery depends upon the
customer's specific applications, requirements, and environment. As
such, it is not discussed further in this document.
This first case is depicted in Figure 44, where both the volumes
containing the datafiles and the database recovery structures (archive
logs, redo logs, and control files) are restored.
Prior to any disk-based restore using EMC technology, the database
must be shut down, and file systems unmounted. The operating
system should have nothing in its memory that reflects the content of
the database file structures.
Figure 44  Restoring the data, log, and archive devices from the BCVs
Figure 45  Restoring only the data devices from the BCVs
In the example that follows, the data_group device group holds all
Symmetrix volumes containing Oracle tablespaces. The log_group
group has volumes containing the Oracle recovery structures (the
archive logs, redo logs, and control files). The following steps
describe the process needed to restore the database image from the
BCVs:
1. Verify the state of the BCVs. All volumes in the Symmetrix device
group should be in a split state. The following commands identify
the state of the BCVs for each of the device groups:
symmir -g data_group query
symmir -g log_group query
3. After the primary database has shut down, unmount the file
system (if used) to ensure that nothing remains in cache. This
action is operating-system dependent.
4. Once the primary database has shut down successfully and the
file system is un-mounted, initiate the BCV restore process. In this
example, both the data_group and log_group device groups are
restored, indicating a point-in-time recovery. If an incomplete or
complete recovery is required, only the data_group device group
would be restored. Execute the following TimeFinder/Mirror
SYMCLI commands:
symmir -g data_group restore -nop
symmir -g log_group restore -nop
symmir -g data_group query
symmir -g log_group query
5. Once the BCV restore process has been initiated, the production
database copy is ready for recovery operations. It is possible to
start the recovery process even though the data is still being
restored from the BCV to the production devices. Any tracks
needed, but not restored, will be pulled directly from the BCV
device. It is recommended however, that the restore operation
completes and the BCVs are split from the standard devices
before the source database is started and recovery (if required) is
initiated.
Note: It is important to understand that if the database is restarted before the
restore process completes, any changes to the source database volumes will
also be written to the BCVs. This means that the copy on the BCV will no
longer be a consistent database image. It is always recommended that the
restore process completes and the BCVs are split from the source volumes
before processing or recovery is initiated on the source devices.
6. After the restore process completes, split the BCVs from the
standard devices with the following commands:
symmir -g data_group split -noprompt
symmir -g log_group split -noprompt
symmir -g data_group query
symmir -g log_group query
Figure 46  Restoring the data, log, and archive devices from the clone copies
Figure 47  Restoring only the data devices from the clone copies
In the example that follows, the data_group device group holds all
Symmetrix volumes containing Oracle tablespaces. The log_group
group has volumes containing the Oracle recovery structures (the
archive logs, redo logs, and control files). Follow these steps to restore
the database image from the BCV clone devices:
1. Verify the state of the clone devices. Volumes in the Symmetrix
device group should be in an active state, although the
relationship between the source and target volumes may have
terminated. The following commands identify the state of the
clones for each of the device groups (the -multi flag is used to
show all relationships available):
symclone -g data_group query -multi
symclone -g log_group query -multi
3. After the primary database has shut down, unmount the file
system (if used) to ensure that nothing remains in server cache.
This action is operating-system dependent.
4. Initiate the clone restore process. In this example, both the
data_group and log_group device groups are restored, indicating a
point-in-time recovery. If an incomplete or complete recovery is
required, only the data_group device group would be restored.
Execute the following TimeFinder/Clone SYMCLI commands:
symclone -g data_group restore -nop
symclone -g log_group restore -nop
symclone -g data_group query
symclone -g log_group query
Figure 48  TimeFinder/Snap restore of the data, log, and archive devices: data copied to the save area is restored to the standard devices
Figure 49  TimeFinder/Snap restore of only the data devices: data copied to the save area is restored to the standard device
3. After the primary database shuts down, unmount the file system
(if used) to ensure that nothing remains in cache. This action is
operating-system dependent.
4. Once the file systems are unmounted, initiate the snap restore
process. In this example, both the data_group and log_group device
groups are restored, indicating a point-in-time recovery. If an
incomplete or complete recovery is required, only the data_group
device group would be restored. Execute the following
TimeFinder/Snap SYMCLI commands:
symsnap -g data_group restore -nop
symsnap -g log_group restore -nop
symsnap -g data_group query
symsnap -g log_group query
6. When the snap restore process is initiated, both the snap device
and the source are set to a Not Ready status (that is, they are
offline to host activity). Once the restore operation commences,
the source device is set to a Ready state. Upon completion of the
restore process, terminate the restore operations as follows:
symsnap -g data_group terminate -restored -noprompt
symsnap -g log_group terminate -restored -noprompt
Note: Terminating the restore session does not terminate the underlying
snap session.
Figure 50  Data, log, and archive devices in the restore and recovery process
or
SQL> recover database until time timestamp;
Oracle Flashback
Oracle Flashback is a technology that helps DBAs recover from user
errors to the database. Initial Flashback functionality was provided in
Oracle9i but was greatly enhanced in Oracle10g. Flashback retains
undo data in the form of flashback logs. Flashback logs containing
undo information are periodically written by the database in order
for the various types of Flashback to work.
Each type of Flashback relies on undo data being written to the flash
recovery area. The flash recovery area is a file system Oracle uses to
retain the flashback logs, archive logs, backups, and other recovery-related files.
Some of the ways Flashback helps DBAs recover from user errors are:
Flashback Query
Flashback Version Query
Flashback Transaction Query
Flashback Table
Flashback Drop
Flashback Database
Flashback configuration
Flashback is enabled in a database by creating a flash recovery area
for the Flashback logs to be retained, and by enabling Flashback
logging. Flashback allows the database to be flashed back to any
point in time. However, the Flashback logs represent discrete
database points in time, and as such, ARCHIVELOG mode must also
be enabled for the database. Archive log information is used in
conjunction with the flashback logs to re-create any given database
point-in-time state desired.
The default flashback recovery area is defined by the Oracle initialization parameter DB_RECOVERY_FILE_DEST. It is important to set this parameter with the location of a directory that can hold the flashback logs. The required size of this file system depends on how far back a user may want to flash back the database and on how much change activity occurs in the database during that period.
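A minimal configuration sketch follows; the directory, size, and retention values are illustrative (the retention target is specified in minutes), and the database must be mounted, not open, when Flashback logging is turned on:
sqlplus "/ as sysdba"
SQL> alter system set db_recovery_file_dest_size = 20G scope=both;
SQL> alter system set db_recovery_file_dest = '/u03/flash_recovery_area' scope=both;
SQL> alter system set db_flashback_retention_target = 1440 scope=both;
SQL> shutdown immediate;
SQL> startup mount;
SQL> alter database flashback on;
SQL> alter database open;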
Flashback Query
Flashback Query displays versions of queries run against a database
as they looked at a previous time. For example, if a user dropped a
selection of rows from a database erroneously, Flashback Query
allows that user to run queries against the table as if it were at that
time.
The following is an example of the Flashback Query functionality:
SELECT first_name, last_name
FROM emp
AS OF TIMESTAMP TO_TIMESTAMP('2005-11-25 11:00:00', 'YYYY-MM-DD HH:MI:SS')
WHERE salary = '100000';
Flashback Table
Flashback Table returns a table back into the state that it was at a
specified time. It is particularly useful in that this change can be made
while the database is up and running. The following is an example of
the Flashback Table functionality:
FLASHBACK TABLE emp
TO TIMESTAMP
TO_TIMESTAMP('2005-11-26 10:30:00', 'YYYY-MM-DD
HH:MI:SS');
An SCN can also be used:
FLASHBACK TABLE emp
TO SCN 54395;
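Note that flashing back a table to a timestamp or SCN requires row movement to be enabled on the table beforehand, for example:
SQL> alter table emp enable row movement;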
Flashback Drop
If tables in Oracle are dropped inadvertently using a DROP TABLE
command, Flashback Drop can reverse the process, re-enabling access to the dropped table. As long as space is available, the DROP TABLE
command does not delete data in the tablespace data files. Instead,
the table data is retained (in Oracle's "recycle bin") and the table is
renamed to an internally system-defined name. If the table is needed,
Oracle can bring back the table by renaming it with its old name.
The following shows an example of a table being dropped and then
brought back using the FLASHBACK TABLE command.
1. Determine the tables owned by the currently connected user:
SQL> SELECT * FROM tab;

TNAME                          TABTYPE    CLUSTERID
------------------------------ ---------- ----------
TEST                           TABLE
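The remaining steps follow the same pattern; a minimal sketch of dropping the table and bringing it back from the recycle bin (object names taken from the listing above):
SQL> drop table test;
SQL> select object_name, original_name from recyclebin;
SQL> flashback table test to before drop;
SQL> select * from tab;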
Flashback Database
Flashback Database logically recovers the entire database to a
previous point in time. A database can be rolled back in time to the
point before a user error such as a batch update or set of transactions
logically corrupted the database. The database can be rolled back to a
particular SCN, redo log sequence number, or timestamp. The
following is the syntax of the FLASHBACK DATABASE command:
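A minimal sketch of the common forms follows; the values are illustrative, and the database must be mounted (not open) when the command is issued:
SQL> flashback database to scn 54395;
SQL> flashback database to timestamp to_timestamp('2005-11-26 10:30:00', 'YYYY-MM-DD HH:MI:SS');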
4. Open the database for use. To make the database consistent, open
the database as follows:
SQL> alter database open resetlogs;
After opening the database with the resetlogs option,
immediately perform a full database backup.
6  Understanding Oracle Disaster Restart & Disaster Recovery
Introduction ............................................................................................ 230
Definitions ............................................................................................... 231
Design considerations for disaster restart and disaster recovery ... 233
Tape-based solutions ............................................................................. 239
Remote replication challenges .............................................................. 241
Array-based remote replication ........................................................... 246
Planning for array-based replication ................................................... 247
SRDF/S single Symmetrix array to single Symmetrix array ........... 250
SRDF/S and consistency groups ......................................................... 253
SRDF/A ................................................................................................... 260
SRDF/AR single hop ............................................................................. 266
SRDF/AR multihop ............................................................................... 269
Database log-shipping solutions .......................................................... 272
Running database solutions ................................................................. 286
Introduction
A critical part of managing a database is planning for unexpected loss
of data. The loss can occur from a disaster such as a fire or flood or it
can come from hardware or software failures. It can even come
through human error or malicious intent. In each instance, the
database must be restored to some usable point, before application
services can resume.
The effectiveness of any plan for restart or recovery involves
answering the following questions:
Distance
Propagation delay (latency)
Network infrastructure
Data loss
Definitions
In the following sections, the terms dependent-write consistency,
database restart, database recovery, and roll-forward recovery are used. A
clear definition of these terms is required to understand the context of
this section.
Dependent-write consistency
A dependent-write I/O is one that cannot be issued until a related
predecessor I/O has completed. Dependent-write consistency is a
data state where data integrity is guaranteed by dependent-write
I/Os embedded in application logic. Database management systems
are good examples of the practice of dependent-write consistency.
Database management systems must devise protection against
abnormal termination to successfully recover from one. The most
common technique used is to guarantee that a dependent-write
cannot be issued until a predecessor write has completed. Typically
the dependent-write is a data or index write while the predecessor
write is a write to the log. Because the write to the log must be
completed prior to issuing the dependent-write, the application
thread is synchronous to the log write (that is, it waits for that write to
complete prior to continuing). The result of this strategy is a
dependent-write consistent database.
Database restart
Database restart is the implicit application of database logs during
the database's normal initialization process to ensure a
transactionally consistent data state.
If a database is shut down normally, the process of getting to a point
of consistency during restart requires minimal work. If the database
abnormally terminates, then the restart process will take longer
depending on the number and size of in-flight transactions at the
time of termination. An image of the database created by using EMC
consistency technology while it is running, without conditioning the
database, will be in a dependent-write consistent data state, which is
similar to that created by a local power failure. This is also known as
a DBMS restartable image. The restart of this image transforms it to a transactionally consistent data state.
Database recovery
Database recovery is the process of rebuilding a database from a
backup image, and then explicitly applying subsequent logs to roll
forward the data state to a designated point of consistency. Database
recovery is only possible with databases configured with archive
logging.
A recoverable Oracle database copy can be taken in one of three
ways:
With the database in hot backup mode and copying the database
using external tools
Roll-forward recovery
With some databases, it may be possible to take a DBMS restartable
image of the database, and apply subsequent archive logs, to roll
forward the database to a point in time after the image was created.
This means that the image created can be used in a backup strategy in
combination with archive logs. At the time of printing, a DBMS
restartable image of Oracle cannot use subsequent logs to roll
forward transactions. In most cases, during a disaster, the storage
array image at the remote site will be an Oracle DBMS restartable
image and cannot have archive logs applied to it.
Operational complexity
The operational complexity of a DR solution may be the most critical
factor in determining the success or failure of a DR activity. The
complexity of a DR solution can be considered as three separate
phases.
1. Initial configuration and setup of the implementation
2. Maintenance and management of the running solution
3. Execution of the DR plan in the event of a disaster
While initial configuration complexity and running complexity can
be a demand on human resources, the third phase, execution of the
plan, is where automation and simplicity must be the focus. When a
disaster is declared, key personnel may be unavailable in addition to
the loss of servers, storage, networks, buildings, and so on. If the
complexity of the DR solution is such that skilled personnel with an intimate knowledge of the systems and procedures are required to execute it, the chances of a successful recovery are greatly reduced.
Production impact
Some DR solutions delay the host activity while taking actions to
propagate the changed data to another location. This action only
affects write activity and although the introduced delay may only be
of the order of a few milliseconds, it can impact response time in a
high-write environment. Synchronous solutions introduce delay into
write transactions at the source site; asynchronous solutions do not.
operational functions like power on and off. Ideally, this server could
have some usage such as running development or test databases and
applications. Some DR solutions require more target server activity
and some require none.
Bandwidth requirements
One of the largest costs for DR is in provisioning bandwidth for the
solution. Bandwidth costs are an operational expense; this makes
solutions that have reduced bandwidth requirements very attractive
to customers. It is important to recognize in advance the bandwidth
consumption of a given solution to be able to anticipate the running
costs. Incorrect provisioning of bandwidth for DR solutions can have
an adverse effect on production performance and can invalidate the
overall solution.
Federated consistency
Databases are rarely isolated islands of information with no
interaction or integration with other applications or databases. Most
commonly, databases are loosely and/or tightly coupled to other
databases using triggers, database links, and stored procedures. Some
databases provide information downstream for other databases using
information distribution middleware; other databases receive feeds
and inbound data from message queues and EDI transactions. The
result can be a complex interwoven architecture with multiple
interrelationships. This is referred to as a federated database
architecture.
With a federated database architecture, making a DR copy of a single
database without regard to other components invites consistency
issues and creates logical data integrity problems. All components in
a federated architecture need to be recovered or restarted to the same
dependent-write consistent point of time to avoid these problems.
It is possible then that point database solutions for DR, such as log
shipping, do not provide the required business point of consistency
in a federated database architecture. Federated consistency solutions
guarantee that all components, databases, applications, middleware,
flat files, and such are recovered or restarted to the same
dependent-write consistent point in time.
Cost
The cost of doing DR can be justified by comparing it to the cost of
not doing it. What does it cost the business when the database and
application systems are unavailable to users? For some companies,
this is easily measurable, and revenue loss can be calculated per hour
of downtime or per hour of data loss.
Whatever the business, the DR cost is going to be an extra expense
item and, in many cases, with little in return. The costs include, but
are not limited to:
Facility leasing/purchase
Utilities
Network infrastructure
Personnel
Tape-based solutions
This section discusses the following tape-based solutions:
Bandwidth requirements
Network infrastructure
Method of instantiation
Method of reinstantiation
Locality of reference
Failback operations
Propagation delay
Electronic signals propagate at the speed of light. The speed of light
in a vacuum is 186,000 miles per second. The speed of light through
glass (in the case of fiber-optic media) is less, approximately 115,000
miles per second. In other words, in an optical network, propagation delay grows with distance and becomes a significant component of write response time over long distances.
Bandwidth requirements
All remote replication solutions have some bandwidth requirements
because the changes from the source site must be propagated to the
target site. The more changes there are, the greater the bandwidth
that is needed. It is the change rate and replication methodology that
determine the bandwidth requirement, not necessarily the size of the
database.
Data compression can help reduce the quantity of data transmitted
and therefore the size of the "pipe" required. Certain network devices,
like switches and routers, provide native compression, some by
software and some by hardware. GigE directors provide native
compression in a DMX to DMX SRDF pairing. The amount of
compression achieved depends on the type of data being
compressed. Typical character and numeric database data
compresses at about a 2-to-1 ratio. A good way to estimate how the
data will compress is to assess how much tape space is required to
store the database during a full-backup process. Tape drives perform
hardware compression on the data prior to writing it. For instance, if
a 300 GB database takes 200 GB of space on tape, the compression
ratio is 1.5 to 1.
For most customers, a major consideration in the disaster recovery
design is cost. It is important to recognize that some components of
the end solution represent a capital expenditure and some an
operational expenditure. Bandwidth costs are operational expenses
and thus any reduction in this area, even at the cost of some capital
expense, is highly desirable.
Network infrastructure
The choice of channel extension equipment, network protocols,
switches, routers, and such, ultimately determines the operational
characteristics of the solution. EMC has a proprietary "BC Design
Tool" to assist customers in analysis of the source systems and to
determine the required network infrastructure to support a remote
replication solution.
Method of instantiation
In all remote replication solutions, a common requirement is for an
initial, consistent copy of the complete database to be replicated to
the remote site. The initial copy from source to target is called
instantiation of the database at the remote site. Following instantiation,
only the changes made at the source site are replicated. For large
databases, sending only the changes after the initial copy is the only
practical and cost-effective solution for remote database replication.
In some solutions, instantiation of the database at the remote site uses
a process similar to the one that replicates the changes. Some
solutions do not even provide for instantiation at the remote site (log
shipping for instance). In all cases it is critical to understand the pros
and cons of the complete solution.
Method of reinstantiation
Some methods of remote replication require periodic refreshing of the
remote system with a full copy of the database. This is called
reinstantiation. Technologies such as log shipping frequently require
this since not all activity on the production database may be
represented in the log. In these cases, the disaster recovery plan must
account for re-instantiation and also for the fact there may be a
disaster during the refresh. The business objectives of RPO and RTO
must likewise be met under those circumstances.
Locality of reference
Locality of reference is a factor that needs to be measured to
understand if there will be a reduction of bandwidth consumption
when any form of asynchronous transmission is used. Locality of
reference is a measurement of how much write activity on the source
is skewed. For instance, a high locality of reference application may
make many updates to a few tables in the database, whereas a low
locality of reference application rarely updates the same rows in the
same tables during a given time period. While the activity on the
tables may have a low locality of reference, the write activity into an
index might be clustered when inserted rows have the same or
similar index column values. This renders a high locality of reference
on the index components.
In some asynchronous replication solutions, updates are "batched"
into periods of time and sent to the remote site to be applied. In a
given batch, only the last image of a given row/block is replicated to
the remote site. So, for highly skewed application writes, this results
in bandwidth savings. Generally, the greater the time period of
batched updates, the greater the savings on bandwidth.
Log-shipping technologies do not consider locality of reference. For
example, a row updated 100 times, is transmitted 100 times to the
remote site, whether the solution is synchronous or asynchronous.
Failback operations
If there is the slightest chance that failover to the DR site may be
required, then there is a 100 percent chance that failback to the
primary site also will be required, unless the primary site is lost
permanently. The DR architecture should be designed to make
failback simple, efficient, and low risk. If failback is not planned for,
there may be no reasonable or acceptable way to move the processing
from the DR site, where the applications may be running on tier 2
servers and tier 2 networks, and so forth, back to the production site.
In a perfect world, the DR process should be tested once a quarter,
with database and application services fully failed over to the DR site.
The integrity of the application and database must be verified at the
remote site to ensure all required data copied successfully. Ideally,
production services are brought up at the DR site as the ultimate test.
This means production data is maintained on the DR site, requiring a
failback when the DR test completed. While this is not always
possible, it is the ultimate test of a DR solution. It not only validates
the DR process, but also trains the staff on managing the DR process
should a catastrophic failure occur. The downside for this approach is
that duplicate sets of servers and storage need to be present to make
an effective and meaningful test. This tends to be an expensive
proposition.
Figure 51  Oracle database on Symmetrix standard devices (redo logs, control files, archive logs, and the SYSTEM, SYSAUX, DATA, INDEX, UNDO, and TEMP tablespaces) and the corresponding BCVs
SQL> shutdown
SQL> exit
2. Move the redo log files using O/S commands from the old location to the new location:
mv /oracle/oldlogs/log1a.rdo /oracle/newlogs/log1a.rdo
mv /oracle/oldlogs/log1b.rdo /oracle/newlogs/log1b.rdo
Figure 52  SRDF replication of the Oracle database from the source Symmetrix array to the target Symmetrix array
4. Instruct the source Symmetrix array to send all the tracks on the
source site to the target site using the current mode:
symrdf -g device_group establish -full -noprompt
At this point, the host can issue the necessary commands to access the
disks. For instance, on a UNIX host, import the volume group,
activate the logical volumes, fsck the file systems and mount them.
Once the data is available to the host, the database can restart. The
database will perform an implicit recovery when restarted.
Transactions that were committed, but not completed, are rolled
forward and completed using the information in the redo logs.
Transactions that have updates applied to the database, but were not
committed, are rolled back. The result is a transactionally consistent
database.
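For illustration, on a Linux host using LVM the host-side steps might look like the following sketch; the volume group, logical volume, and mount point names are hypothetical:
vgscan                                      # rediscover the volume group
vgimport oradatavg                          # import the volume group (if it was exported)
vgchange -ay oradatavg                      # activate the logical volumes
fsck /dev/oradatavg/oradata                 # check the file system
mount /dev/oradatavg/oradata /u02/oradata   # mount it for the database
sqlplus "/ as sysdba"
SQL> startup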
Rolling disaster
Protection against a rolling disaster is required when the data for a
database resides on more than one Symmetrix array or multiple RA
groups. Figure 53 on page 254 depicts a dependent-write I/O
sequence where a predecessor log write is happening prior to a page
flush from a database buffer pool. The log device and data device are
on different Symmetrix arrays with different replication paths.
Figure 53 demonstrates how rolling disasters can affect this
dependent-write sequence.
Figure 53  Rolling disaster with data ahead of log: R1 and R2 devices for the application data (X), DBMS data (Y), and logs (Z) spread across multiple Symmetrix arrays
Figure 54  SRDF consistency group (ConGroup) protection: when the R1/R2 relationship is suspended, a DBMS restartable copy is preserved on the R2 devices (X = application data, Y = DBMS data, Z = logs)
Figure 55  Synchronous SRDF replication of the Oracle data files, redo logs, and archive logs from the source to the target
2. Add to the consistency group the R1 devices 121 and 12f from
Symmetrix with ID 111, and R1 devices 135 and 136 from
Symmetrix with ID 222:
symcg -cg device_group add dev 121 -sid 111
symcg -cg device_group add dev 12f -sid 111
symcg -cg device_group add dev 135 -sid 222
symcg -cg device_group add dev 136 -sid 222
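The consistency group itself must exist before devices can be added to it, and RDF consistency protection must be enabled once the group is populated. A minimal sketch, using the same group name as above (the RDF1 group type is an assumption):
symcg create device_group -type rdf1
symcg -cg device_group enable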
SRDF/A
SRDF/A, or asynchronous SRDF, is a method of replicating
production data changes from one Symmetrix array to another using
delta set technology. Delta sets are the collection of changed blocks
grouped together by a time interval configured at the source site. The
default time interval is 30 seconds. The delta sets are then transmitted
from the source site to the target site in the order created. SRDF/A
preserves dependent-write consistency of the database at all times at
the remote site.
The distance between the source and target Symmetrix arrays is
unlimited and there is no host impact. Writes are acknowledged
immediately when they hit the cache of the source Symmetrix array.
SRDF/A is only available on the DMX family of Symmetrix arrays.
Figure 56 shows the process.
Figure 56  SRDF/A delta sets: the capture (N), transmit (N-1), receive (N-1), and apply (N-2) cycles between the R1 and R2 Symmetrix arrays
2. Add to the device group the R1 devices 121 and 12f from the
Symmetrix array with ID 111, and R1 devices 135 and 136 from
the Symmetrix array with ID 222:
symld -g device_group add dev 121 -sid 111
symld -g device_group add dev 12f -sid 111
symld -g device_group add dev 135 -sid 222
symld -g device_group add dev 136 -sid 222
4. Instruct the source Symmetrix array to send all the tracks at the
source site to the target site using the current mode:
symrdf -g device_group establish -full -noprompt
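After the initial full establish completes, the device group can be switched to asynchronous mode; a minimal sketch using the same device group name:
symrdf -g device_group set mode async -noprompt
symrdf -g device_group query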
2. Add to the consistency group the R1 devices 121 and 12f from the Symmetrix array with ID 111, and R1 devices 135 and 136 from the Symmetrix array with ID 222:
symcg -cg device_group add dev 121 -sid 111
symcg -cg device_group add dev 12f -sid 111
symcg -cg device_group add dev 135 -sid 222
symcg -cg device_group add dev 136 -sid 222
4. Instruct the source Symmetrix array to send all the tracks at the
source site to the target site using the current mode:
symrdf -g device_group establish -full -noprompt
At this point, the host can issue the necessary commands to access the
disks. For instance, on a UNIX host, import the volume group,
activate the logical volumes, fsck the file systems, and mount them.
Once the data is available to the host, the database can be restarted.
The database will perform crash recovery when restarted.
Transactions committed, but not completed, are rolled forward and
completed using the information in the redo logs. Transactions with
updates applied to the database, but not committed, are rolled back.
The result is a transactionally consistent database.
Figure 57  SRDF/AR single hop: standard devices and BCV/R1 devices at the source, replicated to R2 and BCV devices at the target
SRDF/AR multihop
SRDF/AR multihop is an architecture that allows long-distance
replication with zero seconds of data loss through use of a bunker
Symmetrix array. Production data is replicated synchronously to the
bunker Symmetrix array, which is within 200 km of the production
Symmetrix array allowing synchronous replication, but also far
enough away that potential disasters at the primary site may not
affect it. Typically, the bunker Symmetrix array is placed in a
hardened computing facility.
BCVs in the bunker frame are periodically synchronized to the R2s
and consistent split in the bunker frame to provide a dependent-write
consistent point-in-time image of the data. These bunker BCVs also
have an R1 personality, which means that SRDF in adaptive copy
mode can be used to replicate the data from the bunker array to the
target site. Since the BCVs are not changing, the replication can be
completed in a finite length of time. The replication time depends on
the size of the "pipe" between the bunker location and the DR
location, the distance between the two locations, the quantity of
changed data, and the locality of reference of the changed data. On
the remote Symmetrix array, another BCV copy of the data is made
using the R2s. This is because the next SRDF/AR iteration replaces
the R2 image, in a nonordered fashion, and if a disaster were to occur
while the R2s were synchronizing, there would not be a valid copy of
the data at the DR site. The BCV copy of the data in the remote
Symmetrix array is commonly called the "gold" copy of the data. The
whole process then repeats.
Figure 58  SRDF/AR multihop: synchronous replication from the production R1 devices to the bunker Symmetrix array over a short distance, and periodic replication from the bunker BCV/R1 devices to the R2 and BCV devices at the DR site over a long distance
Log-shipping considerations
When considering a log shipping strategy it is important to
understand:
Log-shipping limitations
Log shipping transfers only the changes happening to the database
that are written into the redo logs and then copied to an archive log.
Consequently, operations happening in the database not written to
the redo logs do not get shipped to the remote site. To ensure that all
transactions are written to the redo logs, run the following command:
alter database force logging;
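Whether force logging is in effect can be confirmed from the data dictionary, for example:
sqlplus "/ as sysdba"
SQL> select force_logging from v$database;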
Log shipping is a database-centric strategy. It is completely agnostic
and does not address changes that occur outside of the database.
Changes include, but are not limited to the following:
If FTP is being used to ship the log files, what kind of monitoring is
needed to guarantee success?
Figure 59  Log shipping: archive logs generated from the redo logs at the source are copied to the target and applied to the standby Oracle database
While replicating the logs to the remote site, the receiving volumes
(R2s) cannot be used by the host, as they are read-only. If the business
requirements are such that the standby database should be
continually applying logs, BCVs can be used periodically to
synchronize against the R2s and split. Then, the BCVs can be accessed
by a host at the remote site and the archive logs retrieved and applied
to the standby database.
Figure 60  Data Guard processes: LGWR and ARCn on the primary database ship redo and archive logs to the standby site, where RFS receives the log data and MRP or LSP applies it to the standby database
Table 11  Data Guard processes

LGWR (Log Writer): Sends redo log information from the primary host to the standby host via Oracle Net. LGWR can be configured to send data to standby redo logs on the standby host for synchronous operations.

ARCn (Archiver): Sends primary database archive logs to the standby host. This process is used primarily in configurations that do not use standby redo logs and are configured for asynchronous operations.

RFS (Remote File Server): Receives log data, either from the primary LGWR or ARCn processes, and writes it on the standby site to either the standby redo logs or archive logs. This process is configured on the standby host when Data Guard is implemented.

FAL (Fetch Archive Log): Manages the retrieval of corrupted or missing archive logs from the primary to the standby host.

MRP (Managed Recovery): Used by a physical standby database to apply logs retrieved from either the standby redo logs or from local copies of archive logs.

LSP (Logical Standby): Used by a logical standby database to apply logs retrieved from either the standby redo logs or from local copies of the archive logs.

LNS (Network Server): Enables asynchronous writes to the standby site using the LGWR process and standby redo logs.
Data Guard only ships change records from the redo logs to the
target; this information can be significantly smaller than the
actual block-level changes in the database.
Figure 61  Log shipping: archive logs generated from the redo logs at the source are copied to the target and applied to the standby Oracle database
Overview
Running database solutions attempt to use DR solutions in an active
fashion. Instead of having the database and server sitting idly waiting
for a disaster to occur, the idea of having the database running and
serving a useful purpose at the DR site is an attractive one. Also,
active databases at the target site minimize the recovery time
required to have an application available in the event of a failure of
the primary. The problem is that hardware, server, and database
replication-level solutions typically require exclusive access to the
database, not allowing users to access the target database. The
solutions presented in this section perform replication at the
application layer and therefore allow user access even when the
database is being updated by the replication process.
In addition to an Oracle Data Guard logical standby database, which
can function as a running database while log information is being
applied to it, Oracle has two other methods of synchronizing data
between disparate running databases. These running database
solutions are Oracle's Advanced Replication and Oracle Streams,
which are described at a high level in the following sections.
Advanced Replication
Advanced Replication is one method of replicating objects between
Oracle databases. Advanced Replication is similar to Oracle's
previous Snapshot technology, where changes to the underlying
tables were tracked internally within Oracle and used to provide a list
of necessary rows to be sent to a remote location when a refresh of the
remote object was requested. Instead of snapshots, Oracle now uses
materialized views to track and replicate changes. Materialized views
are a complete or partial copy of a target table from a single point in
time.
Oracle Streams
Streams is Oracle's distributed transaction solution for propagating
table, schema, or entire database changes to one or many other Oracle
databases. Streams uses the concept of change records from the
source database, which are used to asynchronously distribute
changes to one or more target databases. Both DML and DDL
changes can be propagated between the source and target databases.
Queues on the source and target databases are used to manage
change propagation between the databases.
7  Oracle Database Layouts on EMC Symmetrix DMX
Introduction ...................................................................................... 290
The performance stack .................................................................... 291
Traditional Oracle layout recommendations ............................... 294
Symmetrix DMX performance guidelines .................................... 297
RAID considerations ....................................................................... 311
Host- versus array-based striping ................................................. 318
Data placement considerations ...................................................... 322
Other layout considerations ........................................................... 328
Oracle database-specific configuration settings .......................... 331
The database layout process ........................................................... 333
Introduction
Monitoring and managing database performance should be a
continuous process in all Oracle environments. Establishing baselines
and collecting database performance statistics for comparison against
them are important to monitor performance trends and maintain a
smoothly running system. The following section discusses the
performance stack and how database performance should be
managed in general. Subsequent sections discuss Symmetrix DMX
layout and configuration issues to help ensure the database meets the
required performance levels.
Figure 62  The performance stack: application, SQL statements (SQL logic errors, missing indexes), database engine (database resource contention), operating system, and storage system
Front-end connectivity
Optimizing front-end connectivity requires an understanding of the
number and size of I/Os, both reads and writes, which will be sent
between the hosts and the Symmetrix DMX array. There are
limitations to the amount of I/O that each front-end director port,
each front-end director processor, and each front-end director board
can handle. Additionally, SAN fan-out counts (that is, the number of
hosts that can be attached through a Fibre Channel switch to a single
front-end port) need to be carefully managed.
A key concern when optimizing front-end performance is
determining which of the following I/O characteristics is more
important in the customer's environment:
Figure 63  I/Os per second and throughput (MB/s) as a percentage of maximum for block sizes from 512 bytes to 65536 bytes
Configuring the host to send larger I/O sizes for DSS applications can increase the
overall throughput (MB/s) from the front-end directors on the DMX.
Database block sizes are generally larger (16 KB or even 32 KB) for
DSS applications. Sizing the host I/O size as a power of two multiple
of the DB_BLOCK_SIZE and tuning DB_FILE_MULTIBLOCK_READ_COUNT appropriately is important for
maximizing performance in a customer's Oracle environment.
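For reference, the current settings can be checked from SQL*Plus, for example:
sqlplus "/ as sysdba"
SQL> show parameter db_block_size
SQL> show parameter db_file_multiblock_read_count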
Currently, each Fibre Channel port on the Symmetrix DMX is
theoretically capable of 200 MB/s of throughput. In practice however,
the throughput available per port is significantly less and depends on
the I/O size and on the shared utilization of the port and processor
on the director. Increasing the size of the I/O from the host
perspective decreases the number of IOPS that can be performed, but
increases the overall throughput (MB/s) of the port. As such,
increasing the I/O block size on the host is beneficial for overall
performance in a DSS environment. Limiting total throughput to a
fraction of the theoretical maximum (100 to 120 MB/s is a good "rule
of thumb") will ensure that enough bandwidth is available for
connectivity between the Symmetrix DMX and the host.
Symmetrix cache
The Symmetrix cache plays a key role in improving I/O performance
in the storage subsystem. The cache improves performance by
allowing write acknowledgements to be returned to a host when data
is received in solid-state cache, rather than being fully destaged to the
physical disk drives. Additionally, reads benefit from cache when
sequential requests from the host allow follow-on reads to be
prestaged in cache. The following briefly describes how the
Symmetrix cache is used for writes and reads, and then discusses
performance considerations for it.
Write operations and the Symmetrix cache
All write operations on a Symmetrix array are serviced by cache.
When a write is received by the front-end director, a cache slot must
be found to service the write operation. Since cache slots are a
representation of the underlying hypervolume, if a prior read or
write operation caused the required data to already be loaded into
cache, the existing cache slot may be used to store the write I/O. If a
cache slot representing the storage area is not found, a call is made to
locate a free cache slot for the write. The write operation is moved to
the cache slot and the slot is then marked write pending. At a later
point, Enginuity will destage the write to physical disk. The decision
of when to destage is based on overall system load, physical disk
activity, read operations to the physical disk, and availability of
cache.
Cache is used to service the write operation to optimize the
performance of the host system. As write operations to cache are
significantly faster than physical writes to disk media, the write is
reported as complete to the host operating system much earlier.
Battery backup and priority destage functions within the Symmetrix
ensure that no data loss occurs in the event of system power failure.
If the write operation to a given disk is delayed due to higher priority
operations (read activity is one such operation), the write-pending
slot remains in cache for longer time periods. Cache slots are
allocated as needed to a volume for this purpose. Enginuity
calculates thresholds for allocations to limit the saturation of cache by
a single hypervolume. These limits are referred to as write-pending
limits.
Cache allocations are made on a per-hypervolume basis. As
write-pending thresholds are reached, additional allocations may
occur, as well as reprioritization of write activity. As a result, write
operations to the physical disks may increase in priority to ensure
that excessive cache allocations do not occur. This is discussed in
more detail in the next section.
Thus, the cache enables buffering of writes and allows for a steady
stream of write activity to service the destaging of write operations
from a host. In a "bursty" write environment, this serves to even out
the write activity. Should the write activity constantly exceed the low
write priority to the physical disk, Enginuity will raise the priority of
write operations to attempt to meet the write demand. Ultimately,
should write load from the host exceed the physical disk ability to
write, the volume maximum write-pending limit may be reached. In
this condition, new cache slots will only be allocated for writes to a
particular volume once a currently allocated slot is freed by destaging
it to disk. This condition, if reached, may severely impact write
operations to a single hypervolume.
The following sample output, taken at 13:09:52 for Symmetrix devices 035A and 0430 through 0444, shows per-device statistics including the write-pending count for each device; devices 0434 and 0435 show a write-pending count of 14,157.
From this, we see that devices 434 and 435 have reached the device
write-pending limit of 14,157. The cause of the excessive writes
should be analyzed further, and methods of alleviating this
performance bottleneck on these devices should be identified.
Alternatively, Performance Manager may be used to determine the
device write-pending limit and whether device limits are being
reached. Figure 64 on page 305 is a Performance Manager view
displaying both the device write-pending limit and the device
write-pending count for a given device, in this example Symmetrix
device 055. For the Symmetrix in this example, the base write-pending
allocation per device was 9,776 slots, and thus the maximum
write-pending limit was 29,328 slots (3 x 9,776). In general, a distinct
flat line in such graphs indicates that a limit has been reached.
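As a quick sanity check, the relationship between the base per-device write-pending allocation and the maximum limit can be expressed directly. The following is a minimal sketch using the values from this example; the three-times rule applies to pre-DMX-3 arrays, as noted later in this section.

# Sketch only: derive the maximum device write-pending limit from the
# base per-device allocation (example values from the text above).
BASE_WP_SLOTS=9776                    # base write-pending slots per device
MAX_WP_LIMIT=$((BASE_WP_SLOTS * 3))   # pre-DMX-3 rule: three times the base
echo "Maximum device write-pending limit: ${MAX_WP_LIMIT} slots"   # prints 29328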
Figure 64 Device 055 write-pending count versus the maximum write-pending threshold (Performance Manager view)
Figure 66 Metavolume versus hypervolume read and write IOPS and transactions per second
Note that the number of cache boards also has a minor effect on
performance. When comparing Symmetrix DMX arrays with the
same amount of cache, increasing the number of boards (for example,
four cache boards with 16 GB each as opposed to two cache boards
with 32 GB each) has a small positive effect on performance in DSS
applications. This is due to the increased number of paths between
front-end directors and cache, and has the effect of improving overall
throughput. However, configuring additional boards is only helpful
in high-throughput environments such as DSS applications. For
OLTP workloads, where IOPS are more critical, additional cache
directors provide no added performance benefit. This is because the
number of IOPS per port or director is limited by the processing
power of the CPUs on each board.
Note: In the DMX-3, the write-pending limit for individual volumes is
modified. Instead of allowing writes up to three times the initial
write-pending limit, up to approximately 1/20 of the cache can be used by
any individual hypervolume.
Back-end considerations
Back-end considerations are typically the most important part of
optimizing performance on the Symmetrix DMX. Advances in disk
technologies have not kept up with performance increases in other
parts of the storage array such as director and bandwidth (that is,
Direct Matrix versus Bus) performance. Disk-access speeds have
increased by a factor of three to seven in the last decade while other
components have easily increased one to three orders of magnitude.
As such, most performance bottlenecks in the Symmetrix DMX are
attributable to physical spindle limitations.
An important consideration for back-end performance is the number
of physical spindles available to handle the anticipated I/O load.
Each disk is capable of a limited number of operations. Algorithms in
the Symmetrix DMX Enginuity operating environment optimize
I/Os to the disks. Although this helps to reduce the number of reads
and writes to disk, access to disk, particularly for random reads, is
still a requirement. If an insufficient number of physical disks is
available to handle the anticipated I/O workload, performance will
suffer. It is critical to determine the number of spindles required for
an Oracle database implementation based on I/O performance
requirements, and not solely on physical space considerations.
To reduce or eliminate back-end performance issues on the
Symmetrix DMX, carefully spread access to the disks across as many
back-end directors and physical spindles as possible. EMC has long
recommended that application data placement "go wide before going
deep." This means that performance is improved by spreading data
across the back-end directors and disks, rather than allocating
individual applications to specific physical spindles.
Significant attention should be given to balancing the I/O on the
physical spindles. Understanding the I/O characteristics of each
datafile and placing data files with high I/O rates on separate
physical disks will minimize contention and improve performance.
Implementing Symmetrix Optimizer may also help to reduce I/O
contention between hypervolumes on a physical spindle. Symmetrix
Optimizer identifies I/O contention on individual hypervolumes and
nondisruptively moves one of the hypers to a new location on
another disk. Symmetrix Optimizer is an invaluable tool in helping to
reduce contention on physical spindles should workload
requirements change in an environment.
Configuration recommendations
Key recommendations for configuring the Symmetrix DMX for
optimal performance include the following:
Understand the I/O - It is critical to understand the characteristics
of the database I/O, including the number, type (read or write), size,
location (that is, data files, logs), and sequentiality of the I/Os.
Empirical data or estimates are needed to assist in planning.
Spread out the I/O - Both reads and writes should be spread
across the physical resources (front-end and back-end ports,
physical spindles, hypervolumes) of the DMX. This helps to
prevent bottlenecks such as hitting port or spindle I/O limits, or
reaching write-pending limits on a hypervolume.
RAID considerations
For years, Oracle has recommended that all database storage be
mirrored; its philosophy of stripe and mirror everything (SAME)
is well known in the Oracle technical community. While laying out
databases using SAME may provide optimal performance in most
circumstances, in some situations acceptable performance (IOPS
or throughput) can be achieved by implementing more economical
RAID configurations such as RAID 5. Before discussing RAID
recommendations for Oracle, a definition of each RAID type available
in the Symmetrix DMX is required.
Types of RAID
The following RAID configurations are available on the Symmetrix
DMX:
RAID 1 - These are mirrored devices and are the most common
RAID type in a Symmetrix DMX. Mirrored devices require writes
to both physical spindles. However, intelligent algorithms in the
Enginuity operating environment can use both copies of the data
to satisfy read requests not in the cache of the Symmetrix DMX.
RAID 1 offers optimal availability and performance, but at an
increased cost over other RAID protection options.
Figure 67 RAID 5 (3+1) data and parity layout with a stripe size of four tracks:

            Disk 1        Disk 2        Disk 3        Disk 4
Stripe 1    Parity 1-12   Data 1-4      Data 5-8      Data 9-12
Stripe 2    Data 13-16    Parity 13-24  Data 17-20    Data 21-24
Stripe 3    Data 25-28    Data 29-32    Parity 25-36  Data 33-36
Stripe 4    Data 37-40    Data 41-44    Data 45-48    Parity 37-48
Figure 68 RAID 5 write to an existing stripe: the new host data is XORed with the old data and the old parity in cache, and the updated data and parity slots are then written to disk
Figure 69 RAID 5 write of a full stripe: the new host data is written to the data slots and the new parity is computed by XOR in cache before the parity slot is written
RAID recommendations
Oracle has long recommended RAID 1 over RAID 5 for database
implementations. This was largely attributable to RAID 5's
historically poor performance relative to RAID 1 (due to
software-implemented RAID schemes) and also to high disk drive
failure rates that caused RAID 5 performance degradation after
failures and during rebuilds. However, disk drives and RAID 5 in
general have seen significant optimizations and improvements since
Oracle initially recommended avoiding RAID 5. In the Symmetrix
DMX, Oracle databases can be deployed on RAID 5 protected disks
for all but the most I/O-intensive applications. Databases used for
test, development, QA, or reporting are likely candidates for
RAID 5 protected volumes.
Another potential candidate for deployment on RAID 5 storage is
DSS applications. In many DSS environments, read performance
greatly outweighs the need for rapid writes. This is because data
warehouses typically perform loads off-hours or infrequently (once a
week or month); read performance in the form of database user
queries is significantly more important. Since there is no RAID
penalty for RAID 5 read performance, only write performance, these
types of applications are generally good candidates for RAID 5
storage deployments. Conversely, production OLTP applications
typically require small random writes to the database, and as such,
are generally more suited to RAID 1 storage.
Symmetrix metavolumes
Individual Symmetrix hypervolumes of the same RAID type (RAID
1, RAID 5) may be combined together to form a virtualized device
called a Symmetrix metavolume. Metavolumes are created for a
number of reasons, including the need to present fewer, larger LUNs
to the host and the ability to stripe data across multiple
hypervolumes.
Host-based striping
Host-based striping is configured through the Logical Volume
Manager used on most open-systems hosts. For example, in an
HP-UX environment, striping is configured when logical volumes are
created in a volume group as shown below:
lvcreate -i 4 -I 64KB -L 1024 -n stripevol activevg
In this case, the striped volume is called stripevol (using the -n flag),
is created on the volume group activevg, is of volume size 1 GB (-L
1024), uses a stripe size of 64 KB (-I 64KB), and is striped across four
physical volumes (-i 4). The specifics of striping data at the host level
are operating-system-dependent.
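For comparison, on a Linux host using LVM2, a roughly equivalent striped logical volume could be created as shown below; the volume group name activevg and the four underlying physical volumes are assumptions carried over from the HP-UX example.

# Sketch only: Linux LVM2 equivalent of the HP-UX example above.
# Assumes a volume group named activevg with at least four physical volumes.
lvcreate -i 4 -I 64 -L 1G -n stripevol activevg   # 4 stripes, 64 KB stripe size, 1 GB volume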
Two important things to consider when creating host-based striping
are the number of disks to configure in a stripe set and an appropriate
stripe size. While no definitive answer can be given that optimizes
these settings for any given configuration, the following are general
guidelines to use when creating host-based stripes:
Ensure that the stripe size used is a power-of-two multiple of the
track size configured on the Symmetrix DMX (that is, a multiple
of 32 KB on DMX-2 and 64 KB on DMX-3) and that it aligns with the
database block size and host I/O size; a quick alignment check is
sketched after this list. Alignment of database blocks, Symmetrix
tracks, host I/O size, and the stripe size can have a considerable
impact on database performance. Typical stripe sizes are 64 KB to
256 KB, although the stripe size can be as high as 512 KB or even 1 MB.
Ensure that volumes used in the same stripe set are located on
different physical spindles. Using volumes from the same
physicals reduces the performance benefits of using striping. An
exception to this rule is when RAID 5 devices are used in DSS
environments.
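The following is a minimal sketch of the alignment check mentioned above; the 64 KB track size and the candidate 256 KB stripe size are example values, not recommendations.

# Sketch only: verify a candidate stripe size is a power-of-two multiple
# of the Symmetrix track size (example values; adjust for the array).
TRACK_KB=64        # DMX-3 track size; use 32 for DMX-2
STRIPE_KB=256      # candidate host stripe size
MULT=$((STRIPE_KB / TRACK_KB))
if [ $((STRIPE_KB % TRACK_KB)) -eq 0 ] && [ $((MULT & (MULT - 1))) -eq 0 ]; then
    echo "Stripe size ${STRIPE_KB} KB aligns with ${TRACK_KB} KB tracks"
else
    echo "Stripe size ${STRIPE_KB} KB does not align with ${TRACK_KB} KB tracks"
fi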
Striping recommendations
Determining the appropriate striping method depends on many
factors. In general, striping is a tradeoff between manageability and
performance. With host-based striping, CPU cycles are used to
manage the stripes; Symmetrix metavolumes require no host cycles
to stripe the data. This small performance decrease in host-based
striping is offset by the fact that each device in a striped volume
group maintains an I/O queue, thereby increasing performance over
a Symmetrix metavolume, which only has a single I/O queue on the
host.
Recent tests show that striping at the host level provides somewhat
better performance than comparable Symmetrix-based striping, and it
is generally recommended if performance is paramount. Host-based
striping is also recommended in environments using synchronous
SRDF, since stripe sizes in the host can be tuned to smaller increments
than are currently available with Symmetrix metavolumes, thereby
increasing performance.
Management considerations generally favor Symmetrix-based
metavolumes over host-based stripes. In many environments,
customers have achieved high-performance back-end layouts on the
Symmetrix system by allocating all of the storage as four-way striped
metavolumes. The advantage of this is that any volume selected for
host data is always striped, with reduced chances for contention on
any given physical spindle. Additional storage added to any host
volume group is also striped, since it is configured as a metavolume.
Managing storage added to an existing volume group that uses
host-based striping may be significantly more difficult, in some cases
requiring a full backup, reconfiguration of the volume group, and a
restore of the data to successfully expand the stripe.
An alternative in Oracle environments gaining popularity recently is
the combined use of both host-based and array-based striping.
Known as double striping or a plaid, this configuration utilizes
striped metavolumes in the Symmetrix array, which are then
presented to a volume group and striped at the host level. This has
many advantages in database environments where read access is
small and highly random in nature. Since I/O patterns are
pseudo-random, access to data is spread across a large quantity of
physical spindles, thereby decreasing the probability of contention on any
spindles, thereby decreasing the probability of contention on any
given disk. Double striping, in some cases, can interfere with data
prefetching at the Symmetrix DMX level when large, sequential data
reads are predominant. This configuration may be inappropriate for
DSS workloads.
Another method of double striping the data is through the use of
Symmetrix metavolumes and RAID 5. A RAID 5 hypervolume stripes
data across either four or eight physical disks using a stripe size of
four tracks (128 KB for DMX-2 or 256 KB for DMX-3). Striped
metavolumes stripe data across two or more hypers using a stripe
size of two cylinders (960 KB in DMX-2 or 1920 KB in DMX-3). When
using striped metavolumes with RAID 5 devices, ensure that
members do not end up on the same physical spindles, as this will
adversely affect performance. In many cases however, double
striping using this method also may affect prefetching for long,
sequential reads. As such, using striped metavolumes is generally not
recommended in DSS environments. Instead, if metavolumes are
needed for LUN presentation reasons, concatenated metavolumes on
the same physical spindles are recommended.
The decision of whether to use host-based, array-based, or double
striping in a storage environment has elicited considerable fervor on
all sides of the argument. While each configuration has positive and
negative factors, the important thing is to ensure that some form of
striping is used for the storage layout. The appropriate layer for disk
striping can have a significant impact on the overall performance and
manageability of the database system. Deciding which form of
striping to use depends on the specific nature and requirements of the
database environment in which it is configured.
With the advent of RAID 5 data protection in the Symmetrix DMX, an
additional option of triple striping data using RAID 5, host-based
striping, and metavolumes combined is now available. However,
triple striping increases data layout complexity and in testing has
shown no performance benefit over other forms of striping. In fact, it
has been shown to be detrimental to performance and, as such, is not
recommended in any Symmetrix DMX configuration.
Rotational Speed - This is due to the need for the platter to rotate
underneath the head to correctly position the data to be accessed.
Rotational speeds for spindles in the Symmetrix DMX range from
7,200 to 15,000 rpm. The average rotational delay is the time it takes
for half of a revolution of the disk. In the case of a 15k rpm drive,
this is about 2 milliseconds.
Interface Speed - A measure of the transfer rate from the drive into
the Symmetrix cache. It is important to ensure that the transfer
rate between the drive and cache is greater than the drive's rate to
deliver data. The delay caused by this is typically a very small value,
on the order of a fraction of a millisecond.
Delay caused by the movement of the disk head across the platter
surface is called seek time. The time associated with a data track
rotating to the required location under the disk head is referred to as
rotational delay. The cache capacity on the drive, disk algorithms,
interface speed, and the areal density (or zoned bit recording)
combine to produce a disk transfer time. Therefore, the time taken to
complete an I/O (or disk latency) consists of three elements:
seek time, rotational delay, and transfer time.
Data transfer times are typically on the order of fractions of a
millisecond; as such, rotational delays and delays due to
repositioning the actuator heads are the primary sources of latency on
a physical spindle. Additionally, although rotational speeds of disk
drives have increased from top speeds of 7,200 rpm up to 15,000 rpm,
rotational delays still average on the order of a few milliseconds.
Seek time continues to be the largest source of latency in disk
assemblies when the entire disk is used.
Transfer delays are lengthened in the inner parts of the drive; more
data can be read per second from the outer parts of the drive than
from data located on the inner regions. Therefore, performance is
significantly better on the outer parts of the disk. In many cases,
performance improvements of more than 50 percent can be realized
on the outer cylinders of a physical spindle. This performance
differential typically leads customers to place high-I/O objects on the
outer portions of the drive.
While placing high-I/O objects such as redo logs on the outer edges
of the spindles has merit, performance differences across the drives
inside the Symmetrix DMX are significantly smaller than the
stand-alone disk characteristics would suggest, due in part to the
caching and optimization performed by the Enginuity operating
environment.
Figure 70 Disk drive performance factors: areal density, rotational speed, actuator positioning, cache and algorithms, and interface speed
Hypervolume contention
Disk drives can receive only a limited number of read or write I/Os
before performance degradation occurs. While disk improvements
and cache, both on the physical drives and in disk arrays, have
improved disk read and write performance, the physical devices can
still become a critical bottleneck in Oracle database environments.
Eliminating contention on the physical spindles is a key factor in
ensuring maximum Oracle performance on Symmetrix DMX arrays.
Contention can occur on a physical spindle when I/O (read or write)
to one or more hypervolumes exceeds the I/O capacity of the disk.
While contention on a physical spindle is undesirable, this type of
contention can be rectified by migrating high I/O data onto other
devices with lower utilization. This can be accomplished using a
number of methods.
One common question is whether BCVs should be spread across all of
the physical spindles in the Symmetrix system or isolated on
separate physical disks. There are pros and cons to each of the
solutions; the optimal solution generally depends on the anticipated
workload.
The primary benefit of spreading BCVs across all physical spindles is
performance. Spreading I/Os across more spindles reduces the risk
of bottlenecks on the physical disks. Workloads that use BCVs, such
as backups and reporting databases, may generate high I/O rates.
Spreading this workload across more physical spindles may
significantly improve performance in these environments.
The main drawback to spreading BCVs across all spindles in the
Symmetrix system is that workloads running against the BCVs, such
as backups and reporting, compete for the same physical spindles as
the production volumes and can therefore affect production
performance.
Table 12 Oracle initialization parameters related to I/O performance

DB_BLOCK_BUFFERS - Specifies the number of data "pages" available in host memory for data pulled from disk. Typically, the more block buffers available in memory, the better the potential performance of the database.
DB_BLOCK_SIZE - Determines the size of the data pages Oracle stores in memory and on disk. For DSS applications, using larger block sizes such as 16 KB (or 32 KB when available) improves data throughput, while for OLTP applications, a 4 KB or 8 KB block size may be more appropriate.
DB_FILE_MULTIBLOCK_READ_COUNT - Specifies the maximum number of blocks read in a single I/O during sequential multiblock reads, such as full table scans.
DB_WRITER_PROCESSES - Specifies the number of database writer (DBWn) processes used to write modified buffers to disk.
DBWR_IO_SLAVES - Configures multiple I/O server processes for the DBW0 process. This parameter is only used on single-CPU servers where only a single DBWR process is enabled. Configuring I/O slaves can improve write performance to disk through multiplexing the writes.
DISK_ASYNCH_IO - Controls whether asynchronous I/O is used for access to data files, control files, and log files.
FAST_START_MTTR_TARGET - Specifies the target mean time to recover from an instance crash, in seconds, which indirectly controls checkpoint frequency.
LOG_BUFFER - Specifies the size of the redo log buffer. Increasing the size of this buffer can decrease the frequency of required writes to disk.
LOG_CHECKPOINT_INTERVAL - Specifies the number of redo log blocks that can be written before a checkpoint must be performed. This affects performance since a checkpoint requires that data be written to disk to ensure consistency. Frequent checkpoints reduce the amount of recovery needed if a crash occurs but can also be detrimental to Oracle performance.
LOG_CHECKPOINT_TIMEOUT - Specifies the number of seconds that can elapse before a checkpoint must be performed. This affects performance since a checkpoint requires that data be written to disk to ensure consistency. Frequent checkpoints reduce the amount of recovery needed if a crash occurs, but can also be detrimental to Oracle performance.
SORT_AREA_SIZE - Specifies the amount of memory available to each session for sort operations; sorts that do not fit spill to temporary segments on disk.
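Several of these parameters can be adjusted without recreating the database. The following is a minimal sketch of setting a few of them from SQL*Plus; the values are illustrative only and should be sized to the actual workload.

sqlplus -s / as sysdba <<'EOF'
-- Example values only; size these to the workload, not to this sketch.
-- SCOPE = BOTH assumes the instance uses an spfile.
ALTER SYSTEM SET db_file_multiblock_read_count = 16 SCOPE = BOTH;
ALTER SYSTEM SET fast_start_mttr_target = 300 SCOPE = BOTH;
ALTER SYSTEM SET log_checkpoint_timeout = 1800 SCOPE = BOTH;
EOF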
Implementation
How many data files for the database will be created? Which have
the highest I/O activity?
How many data paths are required from the host to the storage
array? Will multipathing software be used?
How will the host connect to the storage array (direct attach,
SAN)?
8
Data Protection
The Oracle instance being checked must be started with the following
init.ora configuration parameter set to true:
db_block_checksum=true
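The current setting can be confirmed from SQL*Plus before enabling the Symmetrix-side checks; this is a generic Oracle query, not an EMC-specific step.

sqlplus -s / as sysdba <<'EOF'
-- Confirm that block checksumming is enabled for the instance being protected
SHOW PARAMETER db_block_checksum
EOF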
Straddle - Check that the write does not straddle known Oracle
areas.
With the addition of these new tests, the output when you list the
extents will look similar to the following:
[Extent listing with checksum enabled: devices /dev/sdi, /dev/sdj, /dev/sdk, and /dev/sdl (Symmetrix devices 047, 048, 049, and 04A) with 16, 16, 16, and 15 extents respectively, block size 32b, type Oracle, and the enabled checks, including the new tests, marked with an X.]
Disabling checksum
The symchksum disable command understands the Oracle
database structure. The feature can be disabled for tablespaces,
control files, redo logs, or the entire database.
The symchksum disable command can also be used on a per-device
basis. This capability is not normally used, but is provided in the
event the tablespace was dropped before EMC Double Checksum
was disabled for that object.
When the disable action is specified for a Symmetrix device, the
-force flag is required. Disabling extents in this way can cause a
mapped tablespace or database to be only partially protected;
therefore, use this option with caution. All the extents monitored for
checksum errors on the specified Symmetrix device will be disabled.
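For illustration, a disable invocation might look like the following; this sketch assumes the disable action mirrors the enable syntax shown later in this chapter, so verify the exact options against the Solutions Enabler documentation.

# Sketch only: assumes disable mirrors the enable syntax shown later.
symchksum disable -type Oracle -tbs SYSTEM    # disable checks for one tablespace
symchksum disable -type Oracle                # disable checks for the entire database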
Why generic?
Generic SafeWrite is deemed generic because the checks performed to
ensure complete data are application independent. For instance,
Generic SafeWrite will not perform any Oracle- or Exchange-specific
checksums to verify data integrity. It is important to note that for
Oracle, EMC Double Checksum for Oracle provides a rich set of
checks which can be natively performed by the Symmetrix array. For
more information on EMC Double Checksum for Oracle, consult
Implementing EMC Double Checksum for Oracle on page 342.
Examples of the files that Generic SafeWrite can protect include:
Microsoft Exchange - .edb files and .stm files
Microsoft SQL Server - .mdf files and .ndf files
Oracle - Data files and control files
Note: It is always a best practice to separate the database files and log files
for a given database onto unique devices. There are cases, however, where a
data file and a log file may share the same device. In this case it is still
possible to have GSW enabled; however, there will be a performance penalty
on the log writes that may affect application performance.
The devices to be enabled for Generic SafeWrite must be visible to
the host from which the symchksum command will be run.
Log - Indicates that errors will be sent to the Symmetrix error log.
These events should be visible via the symevent command.
Performance considerations
Performance testing was done with Microsoft Exchange, Microsoft
SQL Server, and Oracle on standard devices and, in the case of
Microsoft Exchange, also on SRDF/S and SRDF/A devices. For the
Microsoft SQL Server and Oracle performance tests, a TPC-C-like
OLTP workload was used.
The symchksum command supports the following actions: list, show, enable, disable, validate, and verify.
To enable Checksum on the extents of all the devices that define the
current database instance and then to phone home on error, enter:
symchksum enable -type Oracle -phone_home
To enable Checksum on the extents of all the devices that define the
tablespace and then to log on error, enter:
symchksum enable -type Oracle -tbs SYSTEM
9
Storage Tiering - Virtual LUN and FAST

Overview ........................................................................................... 356
Evolution of storage tiering ............................................................ 359
Symmetrix Virtual Provisioning .................................................... 361
Enhanced Virtual LUN migrations for Oracle databases ............ 372
Fully Automated Storage Tiering for Virtual Pools .................... 381
Fully Automated Storage Tiering .................................................. 404
Conclusion ........................................................................................ 419
Overview
The EMC Symmetrix VMAX series with Enginuity is the newest
addition to the Symmetrix product family. Built on the strategy of
simple, intelligent, modular storage, it incorporates a new scalable
Virtual Matrix interconnect that connects all shared resources across
all VMAX Engines, allowing the storage array to grow seamlessly
and cost-effectively from an entry-level configuration into the
world's largest storage system. The Symmetrix VMAX provides
improved performance and scalability for demanding enterprise
storage environments while maintaining support for EMC's broad
portfolio of platform software offerings.
EMC Symmetrix VMAX delivers enhanced capability and flexibility
for deploying Oracle databases throughout the entire range of
business applications, from mission-critical applications to test and
development. In order to support this wide range of performance
and reliability at minimum cost, Symmetrix VMAX arrays support
multiple drive technologies that include Enterprise Flash Drives
(EFDs), Fibre Channel (FC) drives, both 10k rpm and 15k rpm, and
7,200 rpm SATA drives. In addition, various RAID protection
mechanisms are allowed that affect the performance, availability, and
economic impact of a given Oracle system deployed on a Symmetrix
VMAX array.
As companies increase deployment of multiple drive and protection
types in their high-end storage arrays, storage and database
administrators are challenged to select the correct storage
configuration for each application. Often, a single storage tier is
selected for all data in a given database, effectively placing both
active and idle data portions on fast FC drives. This approach is
expensive and inefficient, because infrequently accessed data will
reside unnecessarily on high-performance drives.
Alternatively, making use of high-density low-cost SATA drives for
the less active data, FC drives for the medium active data, and EFDs
for the very active data enables efficient use of storage resources, and
reduces overall cost and the number of drives necessary. This, in turn,
also helps to reduce energy requirements and floor space, allowing
the business to grow more rapidly.
Database systems, due to the nature of the applications that they
service, tend to direct the most significant workloads to a relatively
small subset of the data stored within the database and the rest of the
database is less frequently accessed. This imbalance of I/O load,
often referred to as workload skewing, means that only a small
portion of the data truly benefits from the fastest drives. Placing that
active portion on a higher tier and the less active remainder on more
economical drives can reduce the overall cost and number of drives,
and improve the total cost of ownership (TCO) and ROI. FAST VP
enables users to achieve these objectives while
simplifying storage management.
This chapter describes Symmetrix Virtual Provisioning, a tiered
storage architecture approach for Oracle databases, and the ways in
which devices can be moved nondisruptively (using Virtual LUN
migration, FAST for traditional thick devices, or FAST VP for
virtually provisioned devices) in order to put the right data on the
right storage tier at the right time.
Figure 72 The evolution of storage tiering: traditional tiering, FAST, and FAST VP
Introduction
Symmetrix Virtual Provisioning, the Symmetrix implementation of
what is commonly known in the industry as thin provisioning,
enables users to simplify storage management and increase capacity
utilization by sharing storage among multiple applications and only
allocating storage as needed from a shared virtual pool of physical
disks.
Symmetrix thin devices are logical devices that can be used in many
of the same ways that Symmetrix standard devices have traditionally
been used. Unlike traditional Symmetrix devices, thin devices do not
need to have physical storage preallocated at the time the device is
created and presented to a host (although in many cases customers
interested only in wide striping and ease of management choose to
fully preallocate the thin devices). A thin device is not usable until it
has been bound to a shared storage pool known as a thin pool.
Multiple thin devices may be bound to any given thin pool. The thin
pool is comprised of devices called data devices that provide the
actual physical storage to support the thin device allocations.
When a write is performed to a part of any thin device for which
physical storage has not yet been allocated, the Symmetrix allocates
physical storage from the thin pool for that portion of the thin device
only. The Symmetrix operating environment, Enginuity, satisfies the
requirement by providing a block of storage from the thin pool called
a thin device extent. This approach reduces the amount of storage
that is actually consumed.
The minimum amount of physical storage that can be reserved at a
time for the dedicated use of a thin device is referred to as a data
device extent. The data device extent is allocated from any one of the
data devices in the associated thin pool, with allocations spread
across the data devices in the pool. As an example, Figure 73 shows
thin devices associated with thin Pool A and three thin devices
associated with thin Pool B; the data extents for the thin devices are
distributed on various data devices as shown in the figure.
Figure 73 Thin devices and the data devices backing them in thin pools A and B
The way thin extents are allocated across the data devices results in a
form of striping in the thin pool. The more data devices in the thin
pool (and the associated physical drives behind them), the wider the
striping will be, creating an even I/O distribution across the thin
pool. Wide striping simplifies storage management by reducing the
time required for planning and execution of data layout.
When Oracle creates new data files, it initializes them by writing
header information (metadata) to each initialized block. This causes
the thin pool to allocate the amount of space that is being initialized
by the database.
As database files are added, more space will be allocated in the pool.
Due to Oracle file initialization, and in order to get the most benefit
from a Virtual Provisioning infrastructure, a strategy for sizing files,
pools, and devices should be developed in accordance with
application and storage management needs. Some strategy options
are explained next.
Oversubscription
An oversubscription strategy is based on using thin devices with a
total capacity greater than the physical storage in the pools that they
are bound to. This can increase capacity utilization by sharing storage
among applications, thereby reducing the amount of allocated but
unused space. Each thin device appears to the application as a
full-size device, while in fact the thin pool cannot accommodate the
total capacity of all the LUNs bound to it. Since Oracle database files
initialize their space even while they are still empty, it is
recommended that, instead of creating very large data files that
remain largely empty for most of their lifetime, smaller data files be
created to accommodate near-term data growth. As they fill up over
time, their size can be increased, or more data files added, in
conjunction with capacity increases of the thin pool. The Oracle
auto-extend feature can be used for simplicity of management, or
DBAs may prefer manual file size management or file addition.
An oversubscription strategy is recommended for database
environments when database growth is controlled, and thin pools can
be actively monitored and their size increased when necessary in a
timely manner.
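For example, a modestly sized data file that grows automatically as the thin pool is extended could be added as follows; the tablespace name, disk group, sizes, and growth increment are illustrative values only.

sqlplus -s / as sysdba <<'EOF'
-- Example values only: a smaller data file that auto-extends in 1 GB steps
-- up to a cap, rather than a very large, mostly empty file.
ALTER TABLESPACE users
  ADD DATAFILE '+DATA' SIZE 10G
  AUTOEXTEND ON NEXT 1G MAXSIZE 64G;
EOF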
Undersubscription
An undersubscription strategy is based on using thin devices with a
total capacity smaller than the physical storage in the pools that they
are bound to. This approach does not necessarily improve storage
capacity utilization but still makes use of wide striping, thin pool
sharing, and other benefits of Virtual Provisioning. In this case the
data files can be sized to make immediate use of the full thin device
size, or alternatively, auto-extend or manual file management can be
used.
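A simple way to reason about the two strategies is the pool's subscription ratio, sketched below with made-up capacities; the real values should come from the array's pool and device reports.

# Sketch only: compare total bound thin device capacity to pool capacity.
POOL_GB=2048            # usable capacity of the thin pool (example)
THIN_DEV_TOTAL_GB=3072  # sum of all thin device sizes bound to the pool (example)
RATIO_PCT=$((THIN_DEV_TOTAL_GB * 100 / POOL_GB))
if [ "$RATIO_PCT" -gt 100 ]; then
    echo "Oversubscribed: ${RATIO_PCT}% of pool capacity is presented to hosts"
else
    echo "Undersubscribed: ${RATIO_PCT}% of pool capacity is presented to hosts"
fi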
For example, consider 32 drives configured as RAID 5 (3+1) data
devices and combined into one pool; the pool has eight RAID 5
devices of four drives
each. If one of the drives in this pool fails, you are not losing one
drive from a pool of 32 drives; rather, you are losing one drive from
one of the eight RAID-protected data devices and that RAID group
can continue to service read and write requests, in degraded mode,
without data loss. Also, as with any RAID group, with a failed drive
Enginuity will immediately invoke a hot sparing operation to restore
the RAID group to its normal state. While this RAID group is
rebuilding, any of the other RAID groups in the thin pool can have a
drive failure and there is still no loss of data. In this example, with
eight RAID groups in the pool there can be one failed drive in each
RAID group in the pool without data loss. In this manner data stored
in the thin pool is no more vulnerable to data loss than any other data
stored on similarly configured RAID devices. Therefore, a protection
of RAID 1 or RAID 5 for thin pools is acceptable for most applications
and RAID 6 is only required in situations where additional parity
protection is warranted.
The number of thin pools is affected by a few factors. The first is the
choice of drive type and RAID protection. Each thin pool is a group of
data devices sharing the same drive type and RAID protection. For
example, a thin pool that consists of multiple RAID 5 protected data
devices based on 15k rpm FC disks can host the Oracle data files,
offering a good balance of capacity and performance. However, the
redo logs, which take relatively little capacity, are very often best
protected using RAID 1, and therefore another thin pool containing
RAID 1 protected data devices can be used. In order to ensure
sufficient spindles behind the redo logs the same set of physical
drives that is used for the RAID 5 pool can also be used for the RAID
1 thin pool. Such sharing at the physical drive level, but separation at
the thin pool level, allows efficient use of drive capacity without
compromising on the RAID protection choice. Oracle Fast Recovery
Area (FRA), for example, can be placed in a RAID 6 protected SATA
drives thin pool.
Therefore the choice of the appropriate drive technology and RAID
protection is the first factor in determining the number of thin pools.
The other factor has to do with the business owners. When
applications share thin pools they are bound to the same set of data
devices and spindles, and they share the same overall thin pool
capacity and performance. If business owners require their own
control over thin pool management they will likely need a separate
set of thin pools based on their needs. In general, however, for ease of
management, sharing a smaller number of thin pools among
applications is preferred.
+GRID:
Starting with database 11gR2 Oracle has merged Cluster Ready
Services (CRS) and ASM and they are installed together as part of
Grid installation. Therefore when the clusterware is installed, the
first ASM disk group is also created to host the quorum and
cluster configuration devices. Since these devices contain local
environment information such as hostnames and subnet masks,
there is no reason to clone or replicate them. EMC best practice
starting with Oracle database 11.2 is to only create a very small
disk group during Grid installation for the sake of CRS devices
and not place any database components in it. When other ASM
disk groups containing database data are replicated with storage
technology, they can simply be mounted to a different +GRID
disk group at the target host or site, already with Oracle CRS
installed with all the local information relevant to that host and
site. Note that while external redundancy (RAID protection is
handled by the storage array) is recommended for all other ASM
disk groups, EMC recommends high redundancy only for the
+GRID disk group. The reason is that Oracle automates the
number of quorum devices based on redundancy level and it will
allow the creation of more quorum devices. Since the capacity
requirements of the +GRID ASM disk group are tiny, very small
devices can be provisioned (high redundancy implies three
copies/mirrors, and therefore a minimum of three devices is
required).
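A minimal sketch of creating such a small, high-redundancy +GRID disk group is shown below (equivalent to what the Grid installer does); the disk paths and the compatibility attribute are assumed values for illustration.

sqlplus -s / as sysasm <<'EOF'
-- Sketch only: three small devices satisfy the high-redundancy requirement.
-- Disk paths and attributes are assumed values; adjust for the environment.
CREATE DISKGROUP GRID HIGH REDUNDANCY
  DISK '/dev/oracleasm/disks/GRID01',
       '/dev/oracleasm/disks/GRID02',
       '/dev/oracleasm/disks/GRID03'
  ATTRIBUTE 'compatible.asm' = '11.2';
EOF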
+FRA: Fast Recovery Area typically hosts the archive logs and
sometimes flashback logs and backup sets. Since the I/O
operations to FRA are typically sequential writes, it is usually
sufficient to have it located on a lower tier such as SATA drives. It
is also an Oracle recommendation to have FRA as a separate disk
group from the rest of the database to avoid keeping the database
files and archive logs or backup sets (that protect them) together.
The target devices for the migration can be chosen from configured
space or new devices can be automatically configured by migrating
to unconfigured space.
Migration to configured space
Migration to configured space requires preconfigured LUNs of equal
or larger capacity, with the desired RAID protection, on the target
storage, and a mapping of source LUNs to target LUNs. The target
LUN contains a complete copy of the application data from the
source LUN at the end of the migration.
Figure 76 Virtual LUN migration to configured space: source device 790 and target device 1FD7 with their mirror positions (M1 through M4), including RDF and RAID 5 (SATA) mirrors
Steps:
1. Migrating device 790 from a RAID 1 (FC) to RAID 5 (3+1) on EFD
configured as 1FD7.
Figure 77 Virtual LUN migration to unconfigured space: primary and secondary mirror positions of device 790, including the remote RDF mirror and a RAID 5 (SATA) mirror
Steps:
1. Migrating device 790 from a RAID 1 (FC) to RAID 5 (EFD) pool.
2. Configuration lock is taken.
3. The RAID 5 mirror is created from unconfigured space and added
as the secondary mirror.
4. Configuration lock is released.
5. The secondary mirror is synchronized from the primary mirror.
6. Once synchronization is done, the configuration lock is taken
again.
7. Primary and secondary roles are switched and the original
primary mirror is detached from the source and moved to the
target device 1FD7.
8. The original primary mirror on RAID 1 (FC) is deleted.
9. Configuration lock is released.
Change the drive media on which the thin devices are stored
While VLUN VP has the ability to move all allocated thin device
extents from one pool to another, it also has the ability to move
specific thin device extents from one pool to another, and it is this
feature that is the basis for FAST VP.
FAST VP Elements
FAST VP has three main elements (storage tiers, storage groups, and
FAST policies), as shown in Figure 78.
Figure 78 FAST VP elements: a FAST policy associates Storage Group 1 with the EFD, FC, and SATA storage tiers, with upper limits of 10 percent, 40 percent, and 50 percent across the tiers in this example
FAST VP architecture
There are two components of FAST VP: Symmetrix microcode and
the FAST controller.
Figure 80 FAST VP components
Allocation compliance
The allocation compliance algorithm enforces the upper limits of
storage capacity that can be used in each tier by a given storage
group by also issuing data movement requests to the VLUN VP
data movement engine.
The file system will traditionally host multiple data files, each
containing database objects, some of which tend to be more active
than others, as discussed earlier, creating I/O access skewing at a
sub-LUN level.
Since such events are usually short term and only touch each dataset
once, it is unlikely (and not desirable) for FAST VP to migrate data at
that same time, and it is best to simply let the storage handle the
workload appropriately. If the event is expected to last a longer
period of time (such as hours or days), then FAST VP, being a reactive
mechanism, will actively optimize the storage allocation as it does
natively.
Changes in data placement initiated by the host (such as ASM rebalance)
Changes in data placement initiated by the host can be due to file
system defrag, volume manager restriping, or even simply a user
moving database objects. When Oracle ASM is used, the data is
automatically striped across the disk group. There are certain
operations that will cause ASM to restripe (rebalance) the data,
effectively moving existing allocated ASM extents to a new location,
which may cause the storage tiering optimized by FAST VP to
temporarily degrade until FAST VP re-optimizes the database layout.
ASM rebalance commonly takes place when devices are added or
dropped from the ASM disk group. These operations are normally
known in advance (although not always) and will take place during
maintenance or low-activity times. Typically new thin devices given
to the database (and ASM) will be bound to a medium- or
high-performance storage tier, such as FC or EFD. Therefore when
such devices are added, ASM will rebalance extents into them, and it
is unlikely that database performance will degrade much afterward
(since they are already on a relatively fast storage tier). If such activity
takes place during low-activity or maintenance time it may be
beneficial to disable FAST VP movement until it is complete and then
let FAST VP initiate a move plan based on the new layout. FAST VP
will respond to the changes and re-optimize the data layout. Of
course it is important that any new devices that are added to ASM
should be also added to the FAST VP controlled storage groups so
that FAST VP can operate on them together with the rest of the
database devices.
Which Oracle objects to place under FAST VP control
Very often storage technology is managed by a different group from
the database management team and coordination is based on need. In
these cases when devices are provisioned to the database they can be
placed under FAST VP control by the storage team without clear
knowledge on how the database team will be using them. Since FAST
VP analyzes the actual I/O workload based on the FAST Policy it will
actively optimize the storage tiering of all controlled devices.
The test configuration consisted of a Symmetrix VMAX array running Enginuity 5875 with a mix of drive types (including 8 x 400 GB EFDs plus FC and SATA drives) and a Dell R900 host running Linux with multipathing, hosting the Oracle 11gR2 databases.

Initial tier allocation for the test cases with a shared ASM disk group: the +DATA disk group, shared by FINDB and HRDB, used 12 x 100 GB thin devices in storage group DATA_SG, associated with RAID 5 thin pools on the FC, EFD, and SATA tiers (FC_Pool, EFD_Pool, SATA_Pool) and initially allocated 100 percent on FC (0 percent on EFD and SATA); the +REDO disk group used RAID 1 protected devices in FC_Pool.
One server was used for this test. Each of the Oracle databases was
identical in size (about 600 GB) and designed for an
industry-standard OLTP workload. However, during this test one
database had high activity whereas the other database remained idle
to provide a simple example of the behavior of FAST VP.
During the baseline run the database devices were 100 percent
allocated on the FC tier as shown in Table 16 on page 393. Per the
AWR report given in Table 17 on page 393, user I/O random read
activity (db file sequential read) is the main database wait event,
with an average I/O response time of 6 ms. For FC drives this is a
good response time that reflects a combination of 15k rpm drives
(typically 6 ms response time at best per I/O, regardless of storage
vendor) with efficient Symmetrix cache utilization.
Table 16 Baseline tier allocation for storage group DATA_SG (+DATA, 1.2 TB): 100 percent on FC, 0 percent on EFD and SATA.

Table 17 Top wait events from the baseline AWR report: db file sequential read dominated at 88.44 percent of DB time with an average wait of 6 ms, with the remainder split among other user I/O waits, DB CPU, and commit-class waits.
Figure 83 Capacity allocated on the EFD, FC, and SATA tiers over the test intervals
Table 18 Storage tier allocation changes during the FAST VP test for the shared +DATA disk group (FINDB and HRDB, 1.2 TB): initially 100 percent on FC; with FAST VP enabled, approximately 35 percent of the allocation moved to EFD and 12 percent to SATA, with about 50 percent remaining on FC.
As can be seen in Table 19, the average I/O response time at the end
of the run changed to 3 ms, which is a considerable improvement
over the initial test that utilized the FC tier for the entire ASM disk
group. This is the result of migration of active extents of the ASM
disk group to EFD tiers and allocation of 35 percent capacity on that
tier.
Table 19 Top wait events from the AWR report with FAST VP enabled: db file sequential read remained the leading event at 86.84 percent of DB time, with its average wait reduced to 3 ms.
Figure 84 Transaction rate over the 14-hour TPC-C run: initial transaction rate versus the transaction rate after FAST VP was enabled
Test Case 2: Oracle databases sharing the ASM disk group and FAST policy
Oracle ASM makes it easy to provision and share devices across
multiple databases. The databases, running different workloads, can
share the ASM disk group for ease of manageability and
provisioning. Multiple databases can share the Symmetrix thin pools
for ease of provisioning, wide striping, and manageability at the
storage level as well. This section describes the test case in which a
FAST VP policy is applied to the storage group associated with the
shared ASM disk group. At the end of the run we can see improved
transaction rates and response times of both databases, and very
efficient usage of the available tiers.
During the baseline run the database devices were 100 percent
allocated on the FC tier, as shown in the table that follows. Both
databases executed an OLTP-type workload (similar to the previous
use case), where FINDB had more processes executing the workload
than HRDB, and therefore FINDB had a higher workload profile than
HRDB.
Table 20 Baseline tier allocation for Test Case 2: the shared +DATA disk group (FINDB, 600 GB, and HRDB, 600 GB; 1.2 TB total) was 100 percent on FC, with 0 percent on EFD and SATA.

Table 21 Top wait events from the baseline AWR report for Test Case 2: db file sequential read again dominated, accounting for 88.44 percent of DB time.
As the ASM disk group and Symmetrix storage groups are identical
to the ones used in Test Case 1, the same FAST policy was used for
this use case.
Running the database workload after enabling the FAST VP policy
At the start of the test FAST VP was enabled and the workloads on
both databases started, with FINDB running a higher workload than
HRDB. After an initial analysis period (which was 2 hours by
default), FAST performed the movement to the available tiers.
Analyzing the performance improvements with FAST VP
Active extents from both databases were distributed to the EFD and
FC tiers, with the majority of active extents on EFDs, while inactive
extents migrated to the SATA tier. Figure 85 shows the performance
improvements for both databases resulting from FAST VP controlled
tier allocation.
Figure 85 Transaction rates for FINDB (high workload) and HRDB (low workload) after enabling FAST VP
Table 22 Transaction rates before and after enabling FAST VP

Database   Initial   FAST VP enabled   % Improvement
FINDB      1,144     2,497             118%
HRDB       652       1,222             87%
Test Case 3: Oracle databases on separate ASM disk groups and FAST policies
Not all the databases have the same I/O profile or SLA requirements
and may also warrant different data protection policies. By deploying
the databases with different profiles on separate ASM disk groups,
administrators can achieve the desired I/O performance and ease of
manageability. On the storage side these ASM disk groups will be on
separate storage groups to allow for definition of FAST VP policies
appropriate for the desired performance. This section describes a use
case with two Oracle databases with different I/O profiles on
separate ASM disk groups and independent FAST policies.
The hardware configuration of this test was the same as in the
previous two use cases. This test configuration had two Oracle
databases, CRMDB (CRM) and SUPCHDB (Supply Chain), on
separate ASM disk groups, storage groups, and FAST VP policies, as
shown in Table 23.
Table 23 Initial tier allocation for the test case with independent ASM disk groups: each database (CRMDB and SUPCHDB) had its own +DATA disk group of 6 x 100 GB thin devices (storage groups OraDevices_C1 and OraDevices_S1, RAID 5, initially 100 percent on the FC tier) and its own +REDO disk group of 2 x 6 GB devices (storage group OraRedo, REDO_Pool, RAID 1) on FC.
The Symmetrix VMAX array had a mix of storage tiers: EFD, FC, and
SATA. One server was used for this test. Each of the Oracle databases
was identical in size (about 600 GB) and designed for an
industry-standard OLTP workload.
The Oracle databases CRMDB and SUPCHDB used independent
ASM disk groups based on thin devices that were initially bound to
FC_Pool (the FC tier).
The CRMDB database in this configuration was part of a customer
relationship management system that was critical to the business. To
achieve higher performance, the FAST VP policy GoldPolicy was
defined to make use of all three available storage tiers, and the
storage group OraDevices_C1 was associated with the policy.
The SUPCHDB database was important to the business and had
adequate performance characteristics. The business would benefit if
that performance level could be maintained at lower cost. To meet
this goal, the FAST VP policy SilverPolicy was defined to make use of
only the FC and SATA tiers, and the storage group OraDevices_S1
was associated with the policy.
Test case execution
Objectives
For the baseline run, the AWR reports for both CRMDB and SUPCHDB showed user I/O (random reads) as the dominant wait class, accounting for roughly 93 percent of DB time in each database, with commit-class waits and DB CPU making up most of the remainder.
For CRMDB, our goal was to improve the performance. For FC-based
configurations, a response time of 8 ms is reasonable, but can
improve with better storage tiering. The FAST VP Gold policy was
defined to improve both the performance for this critical database as
well to tier it across SATA, HDD, and EFD thin pools. The Gold
policy allowed a maximum 40 percent allocation on the EFD tier and
100 percent allocations on both of the FC and SATA tiers. By setting
FC and SATA allocations to 100 percent in this policy, FAST VP has
the liberty to leave up to 100 percent of the data on any of these tiers
or move up to 40 percent of it to EFD, based on the actual workload.
For SUPCHDB, our goal was to lower the cost while maintaining or
improving the performance. The FAST VP Silver policy was defined
to allocate the extents across FC and SATA drives to achieve this goal.
The Silver policy allows a maximum of 50 percent allocation on the
FC tier and up to 100 percent allocation on the SATA tier.
Running the database workload after enabling the FAST VP policy
With the FAST VP policies enabled, the AWR reports for CRMDB and SUPCHDB again showed user I/O (random reads) as the dominant wait class, but with reduced average wait times, as summarized below.
Figure 86 Transaction rates for CRMDB (Gold policy, high workload) and SUPCHDB (Silver policy, low workload) after enabling the independent FAST VP policies
As shown in Table 26, CRMDB used the FAST Gold policy and FAST
VP migrated 40 percent of the CRMDB FC extents to the EFD tier and
10 percent to SATA. The rest of the extents remained on FC drives.
This resulted in improvement of response time from 8 ms to 5 ms and
a very decent improvement in transaction rate from 962 to 2,500,
which represents 160 percent growth in transaction rate without any
application change.
SUPCHDB used the FAST Silver policy, and therefore FAST VP
moved the less active extents to SATA drives. Even so, the response
time improved from 8 ms to 7 ms, and hence we achieved cost
savings while maintaining, and even slightly improving,
performance.
Table 26 Tier distribution and transaction rates with independent FAST VP policies

CRMDB (+DATA, 600 GB, Gold policy): transaction rate improved from 962 to 2,500 TPM (160 percent); final allocation approximately 240 GB (40 percent) on EFD, 300 GB (50 percent) on FC, and 60 GB (10 percent) on SATA.
SUPCHDB (+DATA, 600 GB, Silver policy): transaction rate improved from 682 to 826 TPM (21 percent); final allocation approximately 266 GB (44 percent) on FC and 334 GB (56 percent) on SATA.
Introduction
Businesses use multiple databases in environments that serve DSS
and OLTP application workloads. Even though multiple levels of
cache exist in the database I/O stack including host cache, database
server cache and Symmetrix cache, the disk response time is critical at
times for application performance. Selection of the correct storage
class for various database objects is a challenge. Also the storage
selection that works in one situation may not be optimal for other
cases. Jobs executed at periodic intervals or on an ad hoc basis, such
as quarter-end batch jobs, demand a high degree of performance and
availability and make disk selection and data placement even more
challenging. As the size and number of databases grow, analysis of
performance of various databases, identifying the bottlenecks, and
selection of the right storage tier for the multitude of databases turns
into a daunting task.
Introduced in the Enginuity 5874 Q4 2009 service release, EMC
Symmetrix VMAX Fully Automated Storage Tiering (FAST) is
Symmetrix software that utilizes intelligent algorithms to
continuously analyze device I/O activity and generate plans for
moving and swapping devices for the purposes of allocating or
re-allocating application data across different performance storage
tiers within a Symmetrix array. FAST proactively monitors workloads
at the Symmetrix device (LUN) level in order to identify busy
devices that would benefit from being moved to higher-performing
drives such as EFD. FAST will also identify less busy devices that
could be relocated to higher-capacity, more cost-effective storage such
as SATA drives without altering performance.
Time windows can be defined to specify when FAST should collect
performance statistics (upon which the analysis to determine the
appropriate storage tier for a device is based), and when FAST should
perform the configuration changes necessary to move devices
between storage tiers. Movement is based on user-defined storage
tiers and FAST Policies.
FAST configuration
FAST configuration involves three components:
Storage Groups
A Storage Group is a logical grouping of Symmetrix devices.
Storage Groups are shared between FAST and Auto-provisioning
Groups; however, a Symmetrix device may only belong to one
Storage Group that is under FAST control. A Symmetrix VMAX
storage array supports up to 8,192 Storage Groups associated with
FAST Policies.
Storage Tiers
Storage tiers are a combination of a drive technology (for
example, EFD, FC 15k rpm, or SATA) and a RAID protection type
(for example, RAID 1, RAID 5 (3+1), RAID 5 (7+1), or RAID 6 (6+2)).
There are two types of storage tiers: static and dynamic. A static
tier contains explicitly specified Symmetrix device groups, while a
dynamic tier automatically contains all Symmetrix disk groups that
match the specified drive technology and RAID protection type.
FAST Policies
FAST Policies associate a set of Storage Groups with up to three
storage tiers. A FAST Policy includes the maximum percentage that
Storage Group devices can occupy in each of the storage tiers. The
percentages specified for the tiers in the policy, when aggregated,
must total at least 100 percent and may total more than 100 percent.
For example, if the Storage Groups associated with the policy are
allowed 100 percent in any of the tiers, FAST can recommend that all
the storage devices be placed together on any one tier (the capacity
limit on the storage tier is not enforced). In another example, to force
the Storage Group onto one of the storage tiers, simply set the policy
to 100 percent on that tier and 0 percent for all other tiers. At the time
of association, a Storage Group may also be given a priority (between
1 and 3) with a policy. If a conflict arises between multiple active
FAST Policies, the FAST Policy priority helps determine which policy
gets precedence. The Symmetrix VMAX supports up to 256 FAST
Policies.
ASM disk group of each database that was moved between the
storage tiers. The +REDO and +TEMP disk groups remained on 15k
rpm drives, and FRA on SATA drives.
The first database, DB1, started on FC 15k rpm drives but was
designed to simulate a low I/O activity database that has very few
users, low importance to the business, and is a candidate to move to a
lower storage tier, or down-tier. The DB1 database could be one
that was once active but is now being replaced by a new application.
The second database, DB2, was designed to simulate a moderately
active database that was initially deployed on SATA drives, but its activity
level and importance to the business are increasing and it is a
candidate to be moved to a higher storage tier, or up-tier. The last
database, DB3, started on FC 15k rpm drives and was designed to
simulate the high I/O activity level of a mission-critical application
with many users and is a candidate to up-tier from FC 15k rpm to
EFD.
The test configuration details are provided in Table 27.
Table 27    Test configuration

Configuration aspect    Description
Storage Array
Enginuity
Oracle
EFD
HDD
SATA
Linux
Multipathing
Each of the three databases was using the ASM disk group
configuration as shown in Table 28.
Table 28    Storage and ASM configuration for each test database

ASM disk group    Number of LUNs    Size (GB)    Total (GB)    RAID
DATA              10                120          1,200         RAID 5 (3+1)
REDO              20                             100           RAID 1
TEMP                                120          600           RAID 5 (3+1)
FRA               40                120          4,800         RAID 5 (3+1)
Table 29 shows the initial storage drive types and count behind each
of the +DATA ASM disk groups at the beginning of the tests. It also
shows the OLTP workload and potential business goals for each
database.
Table 29    Database storage placement (initial) and workload profile

Database    Number of physical drives    Drive type    Workload    Business goal
DB1         40                           FC 15k        Very low    Down-tiering/cost saving
DB2         32                           SATA          Medium      Up-tiering/preserve SLA
DB3         40                           FC 15k        High        Up-tiering/improve SLA
Figure 77 on page 379 shows the logical FAST profile we used for
database 3, or DB3. In this case, while we have three drive types in
the Symmetrix VMAX (EFD, FC 15k rpm, and SATA drives), we do
not want DB3 to reside on SATA, so we could potentially not include a
SATA type. However, including it and setting the allowable
percentage to 0 percent has the same effect.
Figure 87    (FAST Policy DB3_FP; Storage Groups DB3_SG and DB2_SG; storage tiers: Type 1, 400 GB EFD RAID 5 (3+1); Type 2, 300 GB 15K FC RAID 5 (3+1); Type 3, 1 TB SATA RAID 5 (3+1))
Database    Waits         Time(s)    Avg wait (ms)    % DB time
DB1         684,271       3,367                       84.6
DB2         13,382        250        18               89.2
DB3         18,786,472    163,680                     76.2
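As a rough cross-check (a back-of-the-envelope derivation from the Waits and Time(s) values above, not a figure from the original text), the average wait per event works out to approximately:

3,367 s / 684,271 waits ≈ 4.9 ms (DB1)
250 s / 13,382 waits ≈ 18.7 ms (DB2)
163,680 s / 18,786,472 waits ≈ 8.7 ms (DB3)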
Based on these results we can see that DB1 is mainly busy waiting for
random read I/O (the db file sequential read Oracle event refers to
host random I/O). A wait time of 5 ms is very good; however, this
Figure 88
DB2 on SATA
Figure 89
FAST control can contain the same devices. In Figure 90 we can see
how the devices of ASM disk group +DATA, of database 3 (DB3), are
placed into a Storage Group that can later be assigned a FAST Policy.
As shown in Figure 90, FAST configuration parameters are specified.
The user approval mode is chosen.
Figure 90
Figure 91
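As a rough CLI counterpart to this wizard step (a sketch only, reusing the symaccess syntax shown in Appendix A; the device numbers are illustrative and the exact options may vary by Solutions Enabler version), the DB3_SG Storage Group could be built along these lines:

symaccess -name DB3_SG -type storage create
symaccess -name DB3_SG -type storage add devs 0190
symaccess -name DB3_SG -type storage add devs 0191

Because Storage Groups are shared between FAST and Auto-provisioning Groups, the same group can then be referenced both by a masking view and by a FAST Policy association.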
Figure 92 shows provisioning the target storage tier for the FAST
policies.
Figure 92
When creating FAST Policies, the Storage Groups prepared earlier for
FAST control are assigned the storage tiers they can be allocated on,
and the capacity percentage that each Storage Group is allowed on
each of those tiers.
The last screen in the wizard is a summary and approval of the
changes. Additional modifications to FAST configuration and
settings can be done using Solutions Enabler or SMC directly, without
accessing the wizard again. Solutions Enabler uses the symfast
command line syntax, and SMC uses the FAST tab.
The following example shows how FAST can be used to migrate data
for DB3 to the appropriate storage tier. The DB3 Storage Group
properties box has three tabs: General, Devices, and FAST
Compliance. The Devices tab shows the 10 Symmetrix devices that
belong to the +DATA ASM disk group, contain the DB3 data files,
and comprise the DB3_SG Storage Group. The FAST Compliance tab
shows which storage tiers this Storage Group may reside in. In this
case we have defined the FC storage tier as the place where the
devices are now, and the EFD storage tier as where FAST may choose
to move this ASM disk group. Note that there is no option for a SATA
storage tier for the DB3 Storage Group. This prohibits FAST from ever
recommending a down-tier of DB3 to SATA.
Figure 93
The final step of the process is to associate the Storage Group with the
FAST tiers and define a policy to manage FAST behavior. In our case
we have one Storage Group (DB3_SG), two FAST tiers (EFD and FC),
and one FAST Policy (Figure 94 on page 416). The FAST Policy allows
for up to 100 percent of the Storage Group to reside on the Flash
storage tier and allows for 100 percent of DB3 to reside on FC. Since
there is no SATA storage tier defined for DB3, a third storage tier
option does not exist. By allowing up to 100 percent of the DB3
Storage Group to reside on EFD, we expected that if FAST was going
to move any DB3 LUNs to EFD, it would move them all, because they
all have the same I/O profile and there is ample capacity available
on that storage tier to accommodate the full capacity of the ASM
disk group devices (that is, the FAST Storage Group).
Figure 94
Table 31

Database    Number of physical drives    Drive type    Avg. txn/min    % Change
DB1         40                           FC 15k        349.20          0.00%
DB2         32                           SATA          890.53          0.00%
DB3         40                           FC 15k        11736.03        0.00%
Figure 95
Table 32

Database    Number of physical drives    Drive type    Avg. txn/min    % Change
DB1         40                           FC 15k        358.12          2.55%
DB2         32                           SATA          897.27          0.76%
DB3                                      Flash         13334.98        13.62%
DB2 on SATA
DB3 on EFD
Figure 96
Conclusion
Symmetrix Virtual Provisioning offers great value to Oracle
environments with improved performance and ease of management
due to wide striping and higher capacity utilization. Oracle ASM and
Symmetrix Virtual Provisioning complement each other very well.
With a broad range of data protection mechanisms and tighter
integration between Symmetrix and Oracle now available even for
thin devices, adoption of Virtual Provisioning for Oracle
environments is very desirable.
With the Enginuity 5874 Q4 2009 service release enhancements made
to Virtual LUN migration and the introduction of FAST technology,
data center administrators are now able to dynamically manage data
placement in a Symmetrix array to maximize performance and
minimize costs. Introduced with Symmetrix Enginuity 5875 in Q1
2011, FAST VP in Oracle environments improves storage utilization
and optimizes the performance of databases by effectively making
use of multiple storage tiers at a lower overall cost of ownership
when using Symmetrix Thin Provisioning. In a multi-tiered Oracle
storage configuration, moving the most highly accessed volumes from
FC drives to EFDs can help administrators maintain or improve
performance and free up FC drives for other uses. Moving active
volumes from SATA to FC drives improves performance and allows for
increased application activity. Moving lightly accessed volumes from
FC to SATA helps utilization and drives down cost. This volume or
sub-LUN level movement can be done nondisruptively on a
Symmetrix VMAX using Virtual LUN, FAST, and FAST VP
capabilities.
A
Symmetrix VMAX with Enginuity
Figure 97    (RAC1_HBAs and RAC2_HBAs connected through the SAN to Storage ports 07E:1 and 10E:1)
4. Cascade the cluster nodes' initiator groups into a single one for
the entire cluster:
symaccess -name RAC_hbas -type initiator create
symaccess -name RAC_hbas -type initiator add -ig RAC1_hbas
symaccess -name RAC_hbas -type initiator add -ig RAC2_hbas
5. Create the view for the entire RAC cluster storage provisioning:
symaccess create view -name RAC_view -storgrp RAC_devs -portgrp RAC_ports -initgrp RAC_hbas
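To confirm the resulting provisioning, the masking view can be examined afterwards; a hedged example (output and exact options vary by Solutions Enabler version):

symaccess list view
symaccess show view RAC_view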
Figure 98
Virtual LUN can utilize configured or unconfigured disk space for the
target devices. Migration to unconfigured disk space means that
devices will move to occupy available free space in a target storage
diskgroup. After the migration, the original storage space of the
source devices will be unconfigured. In either case the source devices'
identity doesn't change, making the migration seamless to the host;
no changes to DR, backup, or high availability configuration aspects
are necessary. When specifying configured disk space for the
migration, in essence the source and target devices simply swap their
storage characteristics. However, after the data has been migrated to
the target devices, the original source storage space will be
reformatted to prevent exposure of the data that once resided on it.
With Enginuity 5874, migration of logical devices and metavolumes
is supported. (Only the metahead volume needs to be specified; the
metamembers are selected automatically.) Virtual LUN migration
does not support migration of thin devices (or thin pool devices),
virtual devices (or save pool devices), and internal Symmetrix
devices such as VCM, SFS, or Vault.
Migration to configured space
This option is useful when most of the space in the target diskgroup
is already configured (and therefore not enough free space is
available). It is also useful when it is expected that the migration is
temporary and a reverse migration will take place at a later time to
the same target devices. One example of this is migrating the SALES
ASM diskgroup to a Flash drive tier before the end-of-the-month
closing report. That way when the time comes to migrate back, the
source devices return to occupy their previous storage space. When
migrating to a configured space, both source and target devices are
specified. The target devices should match the source devices in size,
and they should at least be unmasked from any host and optionally
unmapped from any Symmetrix FA port. These requirements ensure
that the target devices of the migration do not contain currently
active customer data. Likewise, the target devices
can be created out of that clone for test, development, and reporting
instances. When SRDF/A is used any remote TimeFinder operation
should use the consistent split feature to coordinate the replica with
SRDF/A cycle switching. The use cases in this appendix illustrate
some of the basic Oracle business continuity operations that
TimeFinder and SRDF can perform together.
Figure 99
SRDF/Synchronous replication
Single Roundtrip and Concurrent Write SRDF performance
enhancements
Figure 100
SRDF/Asynchronous replication
SRDF/A Consistency Exempt
Figure 101
SRDF topologies
SRDF can be set in many topologies other than the single SRDF
source and target. Thus SRDF satisfies different needs for high
availability and disaster restart. It can use a single target or two
concurrent targets; it can provide a combination of synchronous and
asynchronous replications; it can provide a three-site solution that
allows no data loss over very long distances, and more. Some of the
basic topologies that can be used with SRDF are shown in the
following section.
Concurrent SRDF
SRDF allows simultaneous replication of single R1 source devices to
up to two target devices using multiple SRDF links. All SRDF links
can operate in either Synchronous or Asynchronous mode, or one or
more links can utilize Adaptive Copy mode for efficient utilization of the
available bandwidth on that link. This topology allows simultaneous
data protection over short and long distances as shown in Figure 102.
Figure 102
Concurrent SRDF
Cascaded SRDF
SRDF allows cascaded configurations in which data is propagated
from one Symmetrix to the next. This configuration requires
Synchronous mode for the first SRDF leg and Asynchronous or
Adaptive Copy modes for the next. As shown in Figure 103, this
topology provides remote replication over greater distances with
varying degrees of bandwidth utilization and no to limited data loss
(depending on the choice of SRDF modes and the type of disaster).
Figure 103
Cascaded SRDF
Figure 104
SRDF/Star
SRDF/Star is a two- or three-site protection topology where data is
replicated from source Site A to two other Symmetrix systems
simultaneously (Site B and Site C). The data remains protected even
in case one target site (B or C) goes down. If site A (the primary site)
goes down, the customer can choose where to come up (site B or C)
based on SRDF/Star information. If the storage data in the other
surviving site is more current, then changes will be incrementally sent
to the surviving site that will come up. For protection and
compliance, remote replications can start immediately to the new DR
site. For example, as shown in Figure 105, if database operations
resume in Site C, data will be sent first from Site B to create a
no-data-loss solution, and then Site B will become the new DR target.
SRDF/Star has a lot of flexibility and can change modes and topology
to achieve the best protection for each disaster scenario. For a full
description of the product, refer to the SRDF product guide.
Figure 105
SRDF/Star
ASM diskgroups    Database devices    Recovery Device Groups (DG)    Restart Device Groups (DG)    SRDF Consistency Group (CG)
+DATA             18 LUNs x 50 GB     DATA_DG                        DB_DG                         ALL_CG
+REDO             4 LUNs x 50 GB      REDO_DG                        DB_DG                         ALL_CG
+FRA              3 LUNs x 50 GB      FRA_DG                                                       ALL_CG
ASM was set with three diskgroups: +REDO (redo logs), +DATA
(data, control, temp files), and +FRA (archives, flashback logs).
Typically EMC recommends separating logs from data for
performance monitoring and backup offload reasons. When
SRDF is used, temp files can go to their own "+TEMP" diskgroup
if replication bandwidth is limited as temp is not required for
database restart or recovery. In these use cases, however, SRDF
FC bandwidth was not an issue and temp files were included in
the +DATA diskgroup. Finally, +FRA can typically use a
lower-cost storage tier like SATA drives and therefore requires
its own diskgroup.
Datafiles (together with control files), log files, and archive logs
each had their own DG, allowing the replica of each to take place at
slightly different times, as shown in the recovery use cases. For
example, if a valid backup replica of the datafiles should be restored
to production, and the production logs are intact, separating the
datafiles and logs into their own DGs and ASM diskgroups means
such a restore won't compromise the logs, and full database recovery
is possible. For a restart solution, a single DG was used that
includes all data (control) and log files, allowing them to be split
consistently, creating a restartable and consistent replica.
For the sake of simplicity, the use cases assume that GNS (Group
Naming Services) is used and replicated remotely. When remote
TimeFinder or SRDF operations are used, they are issued on the
target host. It is also possible to issue remote TimeFinder and SRDF
commands from the local management host using the -rdf flag;
however, it requires the SRDF links to be functional.
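For example, a remote clone of the data devices could be activated from the local management host with something along these lines (a sketch only; it simply combines the -rdf and -consistent options discussed in this appendix, and the exact argument order should be verified against the Solutions Enabler documentation):

symclone -g DATA_DG -rdf activate -consistent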
For SRDF replications, a single CG was used that included all the
database devices (data, control, and log files). As shown in Table 1,
it also included the FRA devices. SRDF on its own is a restart
solution, and since database crash recovery never uses archive
logs, there is no need to include FRA in the SRDF replications.
However, there are two reasons why they could be included. The
first is if Flashback Database functionality is required for the
target: replicating the FRA (and the flashback logs) in the same
consistency group with the rest of the database allows flashback
functionality to be used on the target. The second is that the archive
logs are required in order to offload backup images remotely (as
shown in Use Case 6: Remote database valid backup replicas on
page 456).
5. Create two backup control files and place them in the FRA
diskgroup for convenience (RMAN syntax is shown, although
SQL can be used as well). One will be used to mount the database
for RMAN backup; the other will be saved with the backup set.
RMAN>run {
allocate channel ctl_file type disk;
copy current controlfile to
'+FRA/control_file/control_start';
copy current controlfile to
'+FRA/control_file/control_bak';
release channel ctl_file;
}
9. Back up the database with RMAN from the backup host. The
control file copy that was not used to mount the instance
(control_bak) should be part of the backup set. The control_start file
should not be backed up because the SCN will be updated when
the database is mounted for backup.
RMAN>run {allocate channel t1 type disk;
backup format 'ctl%d%s%p%t' controlfilecopy '+FRA/control_file/control_bak';
backup full format 'db%d%s%p%t' database;
backup format 'al%d%s%p%t' archivelog all;
release channel t1;
}
Note: The format specifier %d expands to the database name, %t to a 4-byte
timestamp, %s to the backup set number, and %p to the backup piece number.
Detailed steps
On the production host
1. Shut down any production database and ASM instances (if still
running).
# export ORACLE_SID=RACDB1
# sqlplus "/ as sysdba"
SQL> shutdown abort
# export ORACLE_SID=+ASM1
# sqlplus "/ as sysdba"
SQL> shutdown abort
3. Start the ASM instance (follow the same activities as in Use Case
1, step 7).
4. Mount the database (follow the same activities as in Use Case 1,
step 8).
5. Recover and open the production database. Use resetlogs if
incomplete recovery was performed.
# export ORACLE_SID=RACDB1
# sqlplus "/ as sysdba"
SQL> startup mount
SQL> recover automatic database using backup
controlfile until cancel;
SQL> alter database open;
Note: Follow the same target host prerequisites as in Use Case 1 prior
to step 7.
# export ORACLE_SID=+ASM
# sqlplus "/ as sysdba"
SQL> startup
At this point the clone database is opened and available for user
connections.
4. Optionally, it is easy and fast to refresh the TimeFinder replica
from production as TimeFinder/Clone operations are incremental
as long as the clone session is not terminated. Once the clone
session is reactivated, the target devices are available
immediately for use, even if background copy is still taking place.
1. Shut down the clone database instance, since it needs to be
refreshed:
SQL> shutdown abort
2. Once the SRDF target is close enough to the source, change the
replication mode to SRDF/S or SRDF/A.
1. For SRDF/S, set protection mode to sync:
# symrdf -cg ALL_CG set mode sync
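For SRDF/A, the corresponding command would presumably set asynchronous mode on the same consistency group (a hedged example following the symrdf syntax used above):

# symrdf -cg ALL_CG set mode async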
2. Start the ASM instance. Follow the same activities as in Use Case
3 step 2.
3. Start the database instance. Follow the same activities as in Use
Case 3 step 3.
At this point the clone database is opened and available for user
connections.
4. Optionally, to refresh the database clone follow the same activities
as in Use Case 3 step 4.
High-level steps
1. Place the database in hot backup mode.
2. If using SRDF/A, perform SRDF checkpoint (no action required
for SRDF/S).
3. Activate a remote DATA_DG clone (with -consistent if SRDF/A
and/or ASM are used).
4. End hot backup mode.
5. Archive the current log.
6. Copy two backup control files to the FRA ASM diskgroup.
7. If using SRDF/A then perform SRDF checkpoint (no action
required for SRDF/S).
8. Activate the remote ARCHIVE_DG clone (with -consistent if
SRDF/A and/or ASM is used).
9. Optionally mount the remote clone devices on the backup host
and perform RMAN backup.
Device groups used
DATA_DG and ARCH_DG for TimeFinder operations, ALL_CG for
SRDF operations
Detailed steps
On the production host
1. Place production in hot backup mode. Follow the same activities
as in Use Case 1 step 1.
2. If SRDF/A is used, an SRDF checkpoint command will make
sure the SRDF target has the datafiles in backup mode as well.
# symrdf -cg ALL_CG checkpoint
High-level steps
1. Shut down production database and ASM instances.
2. Restore the remote DATA_DG clone (split afterwards). Restore
SRDF in parallel.
3. Start ASM.
4. Mount the database.
5. Perform database recovery (possibly while the TimeFinder and
SRDF restore are still taking place) and open the database.
Device groups used
DATA_DG; ALL_CG for SRDF operations
Detailed steps
On the production host
1. Shut down any production database and ASM instances (if still
running). Follow the same activities as in Use Case 2 step 1.
2. Restore the remote TimeFinder/Clone replica to the SRDF target
devices, then restore SRDF. If SRDF is still replicating from source
to target, stop the replication first. Then start the TimeFinder restore
and, once it has started, start the SRDF restore in parallel.
In some cases the distance is long, the bandwidth is limited, and
many changes have to be restored. In these cases it might make
more sense to change SRDF mode to Adaptive Copy first until the
differences are small before placing it again in SRDF/S or
SRDF/A mode.
# symrdf -cg ALL_CG split
# symclone -dg DATA_DG -tgt restore [-force]
# symrdf -cg ALL_CG restore
3. Start the ASM instance (follow the same activities as in Use Case 1,
step 7).
4. Mount the database (follow the same activities as in Use Case 1,
step 8).
5. Perform database recovery based on one of the following options.
Full (complete) database recovery
When all online redo logs and archive logs are available it is possible
to perform a full media recovery of the Oracle database to achieve a
no data loss of committed transactions.
SQL> recover automatic database;
SQL> alter database open;
Note: It might be necessary to point to the location of the online redo logs
or archive logs if the recovery process doesn't locate them automatically
(common in RAC implementations with multiple online or archive log
locations). The goal is to fully apply any necessary archive logs as well as
the online logs.
set serveroutput on
declare
  scn number(12) := 0;
  scnmax number(12) := 0;
begin
  for f in (select * from v$datafile) loop
    scn := dbms_backup_restore.scandatafile(f.file#);
    dbms_output.put_line('File ' || f.file# || ' absolute fuzzy scn = ' || scn);
    if scn > scnmax then scnmax := scn; end if;
  end loop;
  dbms_output.put_line('Minimum PITR SCN = ' || scnmax);
end;
/
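The SCN reported as Minimum PITR SCN can then be used as the stopping point for an incomplete recovery; a hedged illustration using standard SQL*Plus recovery syntax (the SCN value shown is made up):

SQL> recover automatic database until change 1234567 using backup controlfile;
SQL> alter database open resetlogs;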
Conclusion
Symmetrix VMAX is a new offering in the Symmetrix product line
with enhanced scalability, performance, availability, and security
features, allowing Oracle databases and applications to be deployed
rapidly and with ease.
With the introduction of Enterprise Flash Drives, and together with
Fibre Channel and SATA drives, Symmetrix provides a consolidation
platform covering performance, capacity, and cost requirements of
small and large databases. The correct use of storage tiers, together
with the ability to move data seamlessly between tiers, allows
customers to place their most active data on the fastest tiers, and their
less active data on high-density, low-cost media like SATA drives.
Features such as Auto-provisioning Groups ease storage provisioning
to Oracle databases, clusters, and physical or virtual server farms.
TimeFinder and SRDF technologies simplify high availability and
disaster protection of Oracle databases and applications, and provide
the required level of scalability from the smallest to the largest
databases. SRDF and TimeFinder are easy to deploy and very well
integrated with Oracle products like Automatic Storage Management
(ASM), RMAN, Grid Control, and more. The ability to offload
backups from production, rapidly restore backup images, or create
restartable database clones enhances the Oracle user experience and
data availability.
Oracle and EMC have been investing in an engineering partnership
to innovate and integrate both technologies since 1995. The
integrated solutions increase database availability, enhance disaster
recovery strategy, reduce backup impact on production, minimize
cost, and improve storage utilization across a single database instance
or RAC environments.
The production init.ora files for the ASM instance and the database
instance were copied to the target host and modified if required to fit
the target host environment.
Figure 106

Table 34    Test configuration

All RAC nodes share the same set of devices and have proper ownerships.
Symmetrix device groups are created for shared storage for RAC.

OS                Oracle version
Dell
Dell
Dell

Type              Enginuity version
Symmetrix VMAX    5874
Symmetrix VMAX    5874
B
Sample SYMCLI Group Creation Commands
-g device_group add dev 0CF
-g device_group add dev 0F9
-g device_group add dev 0FA
-g device_group add dev 0FB
-g device_group add dev 101

-g device_group associate dev 00C
-g device_group associate dev 00D
-g device_group associate dev 063
-g device_group associate dev 064
-g device_group associate dev 065
Composite group:
1. Create the composite group:
symcg create device_group -type regular
symcg -g device_group add dev 0F9 -sid 123
symcg -g device_group add dev 0FA -sid 456
symcg -g device_group add dev 0FB -sid 456
symcg -g device_group add dev 101 -sid 456
-cg device_group associate dev 00C -sid 123
-cg device_group associate dev 00D -sid 123
-cg device_group associate dev 063 -sid 456
-cg device_group associate dev 064 -sid 456
-cg device_group associate dev 065 -sid 456
This example shows how to build and populate a device group and a
composite group for TimeFinder/Clone usage:
Device group:
1. Create the device group device_group:
symdg create device_group -type regular
-g device_group add dev 0CF
-g device_group add dev 0F9
-g device_group add dev 0FA
-g device_group add dev 0FB
-g device_group add dev 101
3. Add the target clone devices to the group. The targets for the
clones can be standard devices or BCV devices. In this example,
BCV devices are used. The number of BCV devices should be the
same as the number of standard devices, and the same size or
larger than the paired standard device. The device serial numbers
of the BCVs used in the example are 00C, 00D, 063, 064, and 065.
symbcv -g device_group associate dev 00C
symbcv -g device_group associate dev 00D
symbcv -g device_group associate dev 063
symbcv -g device_group associate dev 064
symbcv -g device_group associate dev 065
Composite group:
1. Create the composite group device_group:
symcg create device_group -type regular
-g device_group add dev 0CF -sid 123
-g device_group add dev 0F9 -sid 123
-g device_group add dev 0FA -sid 456
-g device_group add dev 0FB -sid 456
-g device_group add dev 101 -sid 456
3. Add the targets for the clones to the composite group. In this example,
BCV devices are added to the composite group to simplify the
later symclone commands. The number of BCV devices should be
the same as the number of standard devices and the same size.
The device serial numbers of the BCVs used in the example are
00C, 00D, 063, 064, and 065.
symbcv -cg device_group associate dev 00C -sid 123
symbcv -cg device_group associate dev 00D -sid 123
symbcv -cg device_group associate dev 063 -sid 456
symbcv -cg device_group associate dev 064 -sid 456
symbcv -cg device_group associate dev 065 -sid 456
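The contents of either group can then be verified with the standard listing commands (a brief illustration; output formats vary by Solutions Enabler version):

symdg show device_group
symcg show device_group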
-g device_group add dev 0CF
-g device_group add dev 0F9
-g device_group add dev 0FA
-g device_group add dev 0FB
-g device_group add dev 101

-g device_group add dev 291 -vdev
-g device_group add dev 292 -vdev
-g device_group add dev 394 -vdev
-g device_group add dev 395 -vdev
-g device_group add dev 396 -vdev
Composite group:
1. Create the composite group device_group:
symcg create device_group -type regular
-g device_group add dev 0CF -sid 123
-g device_group add dev 0F9 -sid 123
-g device_group add dev 0FA -sid 456
-g device_group add dev 0FB -sid 456
-g device_group add dev 101 -sid 456

-cg device_group add dev 291 -sid 123 -vdev
-cg device_group add dev 292 -sid 123 -vdev
-cg device_group add dev 394 -sid 456 -vdev
-cg device_group add dev 395 -sid 456 -vdev
-cg device_group add dev 396 -sid 456 -vdev
C
Related Host Operation
Overview
Previous sections demonstrated methods of creating a database
copy using storage-based replication techniques. While in some
cases, customers create one or more storage-based database
copies of the database as "gold" copies (copies that are left in a
pristine state on the array), in most cases they want to present
copied devices to a host for backups, reporting, and other
business continuity processes. Mounting storage-replicated
copies of the database requires additional array-based,
SAN-based (if applicable), and host-based steps including LUN
presentation and masking, host device recognition, and
importing of the logical groupings of devices so that the
operating system and logical volume manager recognize the data
on the devices. Copies of the database can be presented to a new
host or presented back to the same host that sees the source
database. The following sections describe the host-specific
considerations for these processes.
Whether using SRDF, TimeFinder, or Replication Manager to
create a copy of the database, there are six essential requirements
for presenting the replicated devices and making the copies
available to a host. They include:
SAN considerations
Hosts can be attached to a Symmetrix DMX either by direct
connectivity (FC-AL, iSCSI, ESCON, or FICON), or through a
SAN using Fibre Channel (FC-SW). When using direct-connect,
all LUNs presented to a front-end port are presented to the host.
In the case of a SAN, additional steps must be considered. These
include zoning, which is a means of enabling security on the
switch, and LUN masking, which is used to restrict hosts to see
only the devices that they are meant to see. Also, there are
HBA-specific SAN issues that must be configured on the hosts.
SAN zoning is a means of restricting FC devices (for example,
HBAs and Symmetrix front-end FC director ports) from accessing
all other devices on the fabric. It prevents FC devices from
accessing unauthorized or unwanted LUNs. In essence, it
establishes relationships between HBAs and FC ports using
World Wide Names (WWNs). WWNs are unique hardware
identifiers for FC devices. In most configurations, a one-to-one
relationship (the zone) is established between an HBA and FC
port, restricting other HBAs (or FC ports) from accessing the
LUNs presented down the port. This simplifies configuration of
shared SAN access and provides protection against other hosts
gaining shared access to the LUNs.
In addition to zoning, LUN masking, which on the Symmetrix
array is called Volume Logix, can also be used to restrict hosts to
see only specified devices down a shared FC director port. SANs
are designed to increase connectivity to storage arrays such as the
Symmetrix. Without Volume Logix, all LUNs presented down a
FC port would be available to all hosts that are zoned to the
front-end port, potentially compromising both data integrity and
security.
The combination of zoning and Volume Logix, when configured
correctly for a customer's environment, ensures that each host
only sees the LUNs designated for it. They ensure data integrity
and security, and also simplify the management of the SAN
environment. There are many tools to configure zoning and LUN
AIX considerations
When presenting copies of devices from an AIX environment to a
different host from the one the production copy is running on, the
first step is to scan the SCSI bus, which allows AIX to recognize
the new devices. The following demonstrates the steps needed for
the host to discover and verify the disks, bring the new devices
under PowerPath control if necessary, import the volume groups,
and mount the file systems (if applicable).
1. Before presenting the new devices, it is useful to run the
following commands and save the information to compare against
after the devices are presented:
lsdev -Cc disk
lspv
syminq
4. The next step is for the target host to recognize the new devices.
The following command scans the SCSI buses and examines all
adapters and devices presented to the target system:
cfgmgr -v
Once the devices are discovered by AIX, the next step is to import
the volume groups. The key is to keep track of the PVIDs on the
source system. The PVID is the physical volume identifier that
uniquely identifies a volume across multiple AIX systems. When
the volume is first included in a volume group, the PVID is
assigned based on the host serial number and the timestamp. In
this way, no two volumes should ever get the same PVID.
However, array-based replicating technologies copy everything
on the disk including the PVID.
7. On the production host, use the lspv command to list the physical
volumes. Locate the PVID of any disk in the volume group being
replicated. On the secondary host, run lspv as well. Locate the
hdisk that corresponds to the PVID noted in the first step.
Suppose the disk has the designation hdisk33. The volume group
can now be imported using the command:
importvg -y vol_grp hdisk33
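If the import does not automatically activate the volume group, it can then be varied on and its file systems checked and mounted; a sketch only, assuming JFS file systems (the logical volume and mount point names are hypothetical):

varyonvg vol_grp
fsck -y /dev/replica_lv    (hypothetical logical volume name)
mount /replica_mnt    (hypothetical mount point)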
10. The first time this procedure is performed, create mount points
for the file systems if raw volumes are not used. The mount
points should be made the same as the mount points for the
production file systems.
AIX and BCV considerations
TimeFinder/Mirror uses BCVs, which are by default in the
"defined" state to AIX. To change these volumes to the "available"
state, execute the following command:
/usr/lpp/EMC/Symmetrix/bin/mkbcv -a
HP-UX considerations
When presenting clone devices in an HP-UX environment to a
host different from the one the production copy is running on,
initial planning and documentation of the source host
environment is first required. The following demonstrates the
steps needed for the target host to discover and verify the disks,
bring the new devices under PowerPath control if necessary,
import the volume groups and mount the file systems (if
applicable).
1. Before presenting the new devices, it is useful to run the
following commands on the target host and save the information
to compare to output taken after the devices are presented:
vgdisplay -v | grep "Name"    (List all volume groups)
syminq    (Find Symmetrix volume for each c#t#d#)
3. Create map files for each volume group to replicate. The Volume
Group Reserve Area (VGRA) on disk contains descriptor
information about all physical and logical volumes that make up
a volume group. This information is used when a volume group
is imported to another host. However, logical volume names are
not stored on disk. When a volume group is imported, the host
assigns a default logical volume name. To ensure that the logical
volume names are imported correctly, a map file generated on the
source is created for each volume group and used on the target
host when the group is imported.
vgexport -v -p -m /tmp/vol_grp.map vol_grp
7. Create device special files for the volumes presented to the host:
insf -e
12. Import the volume groups onto the target host. Volume group
information from the source host is stored in the Volume Group
Reserve Area (VGRA) on each volume presented to the target
host. Volume groups are imported by specifying a volume group
name, if the volume group names are not used on the target.
vgimport -v -m vg_map_file vol_grp /dev/rdsk/c#t#d#
[/dev/rdsk/c#t#d#]
14. Once the volume groups are activated, mount on the target any
file systems from the source host. These file systems may require
a file system check using fsck as well. Add an entry to /etc/fstab
for each file system.
Linux considerations
Enterprise releases of Linux from Red Hat and SuSE provide a
logical volume manager for grouping and managing storage.
However, it is not common to use the logical volume manager on
Linux. The technique deployed to present and use a copy of an
Oracle database on a different host depends on whether or not the
logical volume manager is used on the production host. To access
the copy of the database on a secondary host, follow these steps:
1. Create a mapping of the devices that contain the database to file
systems. This mapping information is used on the secondary
host. The mapping can be performed by using the information in
the /etc/fstab file and/or the output from the df command.
In addition, if the production host does not use the logical volume
manager, the output from the syminq and
symmir/symclone/symsnap commands is required to associate
the operating-system device names (/dev/sd<x>) with the
Symmetrix device numbers on the secondary host.
2. Unlike other UNIX operating systems, Linux does not have a
utility to rescan the SCSI bus. Any of the following methods allow
a user to discover changes to the storage environment:
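One such method, which this appendix also uses later for the same-host case, is to trigger a SCSI rescan through the /proc interface as root, for example:

echo "scsi scan-new-devices" > /proc/scsi/scsi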
vgchange -a y volume_group_name
The pvscan command displays all the devices that are initialized
but do not belong to a volume group. The command should
display all members of the volume groups that constitute the
copy of the database. The vgimport command imports the new
devices and creates the appropriate LVM structures needed to
access the data. If LVM is not used, this step can be skipped.
6. Once the volume groups, if any, are activated, mount on the target
any file systems from the source host. If logical volume manager
is not being used, execute syminq on the secondary host. The
output documents the relationship between the operating system
device names (/dev/sd<x>) and the Symmetrix device numbers
associated with the copy of the database. The output from step 1
can be then used to determine the devices and the file systems
that need to be mounted on the secondary host.
These file systems may require a file system check (using fsck)
before they can be mounted. If one does not already exist, add an
entry to /etc/fstab for each file system.
Solaris considerations
When presenting replicated devices in a Solaris environment to a
different host from the one production is running on, the first step
is to scan the SCSI bus which allows the secondary Solaris system
to recognize the new devices. The following steps cause the host
to discover and verify the disks, bring the new devices under
PowerPath control if necessary, import the disk groups, start the
logical volumes, and mount the file systems (if applicable). The
following commands assume that VERITAS Volume Manager
(VxVM) is used for logical volume management.
1. Before presenting the new devices, run the following commands
and save the information to compare to, after the devices are
presented:
vxdisk list
vxprint -ht
syminq
Oracle disk group, which physical devices are used, and the
relationships between the disks should be run on the host prior to
making any device changes. This is a precaution only, to
document the environment should it require a manual restore
later.
vxdg list    (List all the disk groups)
vxdisk list    (List all the disks and associated groups)
syminq    (Find Symmetrix volume numbers for each Oracle disk)
4. The next step is for the target host to recognize the new devices.
The following command scans the SCSI buses, examines all
adapters and devices presented to the target system, and builds
the information into the /dev directory for all LUNs found:
drvconfig;devlinks;disks
6. VERITAS needs to discover the new devices after the OS can see
them. To make VERITAS discover new devices, enter:
vxdctl enable
8. Once VERITAS has found the devices, import the disk groups.
The disk group name is stored in the private area of the disk. To
import the disk group, enter:
vxdg -C import diskgroup
Use the -C flag to override the host ownership flag on the disk.
The ownership flag on the disk indicates the disk group is online
to another host. When this ownership bit is not set, the vxdctl
enable command actually performs the import when it finds the
new disks.
9. Run the following command to verify that the disk group
imported correctly:
vxdg list
11. For every logical volume in the volume group, fsck must be run to
fix any incomplete file system units of work:
fsck -F vxfs /dev/vx/dsk/diskgroup/lvolname
12. Mount the file systems. If the UID and GIDs are not the same
between the two hosts, run the chown command to change the
ownerships of the logical volumes to the DBA user and group
that administers the server:
chown dbaadmin:dbagroup /dev/vx/dsk/diskgroup/lvolname
chown dbaadmin:dbagroup
/dev/vx/rdsk/diskgroup/lvolname
13. The first time this procedure is performed, create mount points
for the file systems, if raw volumes are not used. The mount
points should be made the same as the mount points for the
production file systems.
Windows considerations
To facilitate the management of volumes, especially those of a
transient nature such as BCVs, EMC provides the Symmetrix
Integration Utility (SIU). SIU provides the necessary functions to scan
for, register, mount, and unmount BCV devices.
Figure 107
This command will unmount the volume from the drive letter and
dismiss the Windows cache that relates to the volume. If any running
application maintains an open handle to the volume, SIU will fail and
report an error. The administrator should ensure that no applications
are using any data from the required volume; proceeding with an
unmount while processes have open handles is not recommended.
SIU can identify the processes that maintain open handles to the
specified drive using the following command:
symntctl openhandle -drive W:
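The unmount operation referred to above is presumably of the form shown below (an assumption; consult the SIU documentation for the exact syntax):

symntctl umount -drive W: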
Presenting database copies to the same host
AIX considerations
When presenting database copies back to the same host in an AIX
environment, one must deal with the fact that the OS now sees the
source disk and an identical copy of the source disk. This is because
the replication process copies not only the data part of the disk, but
also the system part, which is known as the Volume Group
Descriptor Area (VGDA). The VGDA contains the physical volume
identifier (PVID) of the disk, which must be unique on a given AIX
system.
The issue with duplicate PVIDs prevents a successful import of the
copied volume group and has the potential to corrupt the source
volume group. Fortunately, AIX provides a way to circumvent this
limitation. AIX 4.3.3 SP8 and later provides the recreatevg command
to rebuild the volume group from a supplied set of hdisks or
powerdisks. Use syminq to determine the hdisks or powerdisks that
belong to the volume group copy. Then, issue either of the two
commands:
recreatevg -y replicavg_name -l lvrename.cfg hdisk## hdisk## hdisk##
recreatevg -y replicavg_name -l lvrename.cfg hdiskpower## hdiskpower## hdiskpower##
where the ## represents the disk numbers of the disks in the volume
group. The recreatevg command gives each volume in the set of
volumes a new PVID, and also imports and activates the volume
group.
HP-UX considerations
Presenting database copies in an HP-UX environment to the same
host as the production copy is nearly identical to the process used for
presenting the copy to a different host. The primary differences are
the need to use a different name for the volume groups and the need
to change the volume group IDs on the disks.
1. Before presenting the new devices, it is useful to run the
following commands on the target host and save the information
to compare to outputs taken after the devices are presented:
vgdisplay -v | grep "Name"    (List all volume groups)
syminq    (Find Symmetrix volume for each c#t#d#)
7. Create device special files for the volumes presented to the host:
insf -e
10. Once the devices are found by HP-UX, identify them with their
associated volume groups from the source host so that they can
be imported successfully. When using the vgimport command,
specify all of the devices for the volume group to be imported.
Since the target and LUN designations for the target devices are
different from the source volumes, the exact devices must be
identified using the syminq and symmir output. Source volume
group devices can be associated with Symmetrix source devices
through syminq output. Then Symmetrix device pairings from
the source to target hosts are found from the symmir device
group output. And finally, Symmetrix target volume to target
host device pairings are made through the syminq output from
the target host.
11. Change the volume group identifiers (VGIDs) on each set of
devices making up each volume group. For each volume group,
change the VGID on each device using the following:
vgchgid /dev/rdsk/c#t#d# [/dev/rdsk/c#t#d#] . . .
12. After changing the VGIDs for the devices in each volume group,
create the volume group structures needed to successfully import
the volume groups onto the new host. A directory and group file
for each volume group must be created before the volume group
is imported. Ensure each volume group has a unique minor
number and is given a new name.
ls -l /dev/*/group    (Identify used minor numbers)
mkdir /dev/newvol_grp
mknod /dev/newvol_grp/group c 64 0xminor#0000
(minor# must be unique)
13. Import the volume groups onto the target host. Volume group
information from the source host is stored in the VGRA on each
volume presented to the target host. Volume groups are imported
by specifying a volume group name, if the volume group names
are not used on the target.
vgimport -v -m vg_map_file vol_grp /dev/rdsk/c#t#d#
[/dev/rdsk/c#t#d#]
15. Once the volume groups are activated, mount on the target any
file systems from the source host. These file systems may require
a file system check using fsck as well. An entry should be made
to /etc/fstab for each file system.
Linux considerations
Presenting database copies back to the same Linux host is possible
only if the production volumes are not under the control of the logical
volume manager. The Linux logical volume manager does not have a
utility such as vgchgid to modify the UUID (universally unique
identifier) written in the private area of the disk.
For an Oracle database not under LVM management, the procedure to
import and access a copy of the production data on the same host is
similar to the process for presenting the copy to a different host. The
following steps are required:
1. Execute syminq and symmir/symclone/symsnap to determine
the relationship between the Linux device name (/dev/sd<x>),
the Symmetrix device numbers that contain the production data,
and the Symmetrix device numbers that hold the copy of the
production data. In addition, note the mount points for the
production devices as listed in /etc/fstab and the output from the
command df.
2. Initiate the scan of SCSI bus by running the following command
as root:
echo "scsi scan-new-devices" > /proc/scsi/scsi
Solaris considerations
Presenting database copies to a Solaris host using VERITAS volume
manager where the host can see the individual volumes from the
source volume group is not supported other than with Replication
Manager. Replication Manager provides "production host" mount
capability for VERITAS.
The problem is that the VERITAS Private Area on both the source and
target volumes is identical. A vxdctl enable finds both volumes and
gets confused as to which are the source and target.
To get around this problem, the copied volume needs to be processed
with a vxdisk init command. This re-creates the private area. Then, a
vxmake using a map file from the source volume created with a
vxprint -hvmpsQq -g dggroup can be used to rebuild the volume
group structure after all the c#t#d# numbers are changed from the
source disks to the target disks. This process is risky and difficult to
script and maintain and is not recommended by EMC.
Windows considerations
The only difference for Windows when bringing back copies of
volumes to the same Windows server is that duplicate volumes or
volumes that appear to be duplicates are not supported in a cluster
configuration.
D
Sample Database Cloning Scripts
############################################################
# Get the tablespace names using sqlplus
############################################################
su - oracle -c ${SCR_DIR}/get_tablespaces_sub.ksh
RETURN=$?
if [ $RETURN != 0 ]; then
exit 2
fi
############################################################
# Put the tablespaces into hot backup mode
############################################################
su - oracle -c ${SCR_DIR}/begin_hot_backup_sub.ksh
############################################################
# Split the DATA_DG device group
############################################################
${SCR_DIR}/split_data.ksh
RETURN=$?
if [ $RETURN != 0 ]; then
exit 3
fi
############################################################
# Take the tablespaces out of hot backup mode
############################################################
su - oracle -c ${SCR_DIR}/end_hot_backup_sub.ksh
############################################################
# Split the LOG_DG device group
############################################################
${SCR_DIR}/split_log.ksh
RETURN=$?
if [ $RETURN != 0 ]; then
exit 4
fi
echo "Script appeared to work successfully"
exit 0
=================================================================
#!/bin/ksh
############################################################
# establish.ksh
#    This script initiates a BCV establish for the $DATA_DG
#    and $LOG_DG device groups on the Production Host.
############################################################
############################################################
# Define Variables
############################################################
CLI_BIN=/usr/symcli/bin
DATA_DG=data_dg
LOG_DG=log_dg
############################################################
# Establish the DATA_DG and LOG_DG device groups
############################################################
${CLI_BIN}/symmir -g ${DATA_DG} -noprompt establish
RETURN=$?
if [ $RETURN != 0 ]; then
ERROR_DATE=`date`
echo "Split failed for Device Group ${DATA_DG}!!!"
echo "Script Terminating."
echo
echo "establish: failed"
echo "$ERROR_DATE: establish: failed to establish ${DATA_DG}"
exit 1
fi
${CLI_BIN}/symmir -g ${LOG_DG} -noprompt establish
RETURN=$?
if [ $RETURN != 0 ]; then
ERROR_DATE=`date`
echo "Establish failed for Device Group ${LOG_DG}!!!"
echo "Script Terminating."
echo
echo "establish: failed"
echo "$ERROR_DATE: establish: failed to establish ${LOG_DG}"
exit 2
fi
############################################################
# Cycle ${CLI_BIN}/symmir query for status
############################################################
RETURN=0
while [ $RETURN = 0 ]; do
${CLI_BIN}/symmir -g ${LOG_DG} query | grep SyncInProg \
> /dev/null
RETURN=$?
REMAINING=`${CLI_BIN}/symmir -g ${LOG_DG} query | grep MB | \
awk '{print $3}'`
echo "$REMAINING MBs remain to be established."
echo
sleep 10
done
RETURN=0
while [ $RETURN = 0 ]; do
${CLI_BIN}/symmir -g ${DATA_DG} query | grep SyncInProg \
> /dev/null
RETURN=$?
REMAINING=`${CLI_BIN}/symmir -g ${DATA_DG} query | grep MB | \
awk '{print $3}'`
echo "$REMAINING MBs remain to be established."
echo
sleep 10
done
exit 0
=================================================================
#!/bin/ksh
############################################################
# get_tablespaces_sub.ksh
#    This script queries the Oracle database and returns with
#    a list of tablespaces which is then used to identify
#    which tablespaces need to be placed into hotbackup mode.
############################################################
############################################################
# Define Variables
############################################################
SCR_DIR=/opt/emc/scripts
############################################################
# Get the tablespace name using sqlplus
############################################################
sqlplus internal <<EOF > /dev/null
set echo off;
spool ${SCR_DIR}/tablespaces.tmp;
select tablespace_name from dba_tablespaces;
spool off;
exit
EOF
############################################################
# Remove extraneous text from spool file
############################################################
cat ${SCR_DIR}/tablespaces.tmp | grep -v "TABLESPACE_NAME" | \
grep -v "-" |grep -v "rows selected." \
> ${SCR_DIR}/tablespaces.txt
############################################################
# Verify the creation of the tablespace file
############################################################
if [ ! -s ${SCR_DIR}/tablespaces.txt ]; then
exit 1
fi
exit 0
=================================================================
#!/bin/ksh
############################################################
# begin_hot_backup_sub.ksh
#    This script places the oracle database into hot backup
#    mode.
############################################################
############################################################
# Define Variables
############################################################
SCR_DIR=/opt/emc/scripts
############################################################
# Do a log switch
############################################################
sqlplus internal <<EOF
alter system archive log current;
exit
EOF
############################################################
# Put all tablespaces into hot backup mode
############################################################
TABLESPACE_LIST=`cat ${SCR_DIR}/tablespaces.txt`
for TABLESPACE in $TABLESPACE_LIST; do
sqlplus internal <<EOF
alter tablespace ${TABLESPACE} begin backup;
exit
EOF
done
exit 0
=================================================================
#!/bin/ksh
############################################################
# split_data.ksh
#    This script initiates a Split for the $DATA_DG Device
#    group on the Production Host.
############################################################
############################################################
# Define Variables
############################################################
CLI_BIN=/usr/symcli/bin
DATA_DG=data_dg
############################################################
# Split the DATA_DG device group
############################################################
${CLI_BIN}/symmir -g ${DATA_DG} -noprompt -instant split
RETURN=$?
if [ $RETURN != 0 ]; then
ERROR_DATE=`date`
echo "Split failed for Device Group ${DATA_DG}!!!"
echo "It is not safe to continue..."
echo "Script Terminating."
echo
echo "split_data: failed"
echo "$ERROR_DATE: split_data: failed to split ${DATA_DG}"
exit 1
fi
############################################################
# Cycle ${CLI_BIN}/symmir query for status
############################################################
RETURN=0
while [ $RETURN = 0 ]; do
${CLI_BIN}/symmir -g ${DATA_DG} query | grep SplitInProg \
> /dev/null
RETURN=$?
REMAINING=`${CLI_BIN}/symmir -g ${DATA_DG} query | grep MB | \
awk '{print $3}'`
echo "$REMAINING MBs remain to be split."
echo
sleep 5
done
exit 0
=================================================================
#!/bin/ksh
############################################################
# end_hot_backup_sub.ksh
#    This script ends the hot backup mode for the oracle
#    database. The script is initiated by the end_hot_backup
#    scripts.
############################################################
############################################################
# Define Variables
############################################################
SCR_DIR=/opt/emc/scripts
###########################################################
# Take all tablespaces out of hotbackup mode
############################################################
TABLESPACE_LIST=`cat ${SCR_DIR}/tablespaces.txt`
for TABLESPACE in $TABLESPACE_LIST; do
sqlplus internal <<EOF
alter tablespace ${TABLESPACE} end backup;
exit
EOF
done
############################################################
# Do a log switch
############################################################
sqlplus internal <<EOF
alter system archive log current;
exit
EOF
exit 0
=================================================================
#!/bin/ksh
############################################################
# split_log.ksh
#    This script initiates a Split for the $LOG_DG Device
#    group on the Production Host.
############################################################
############################################################
# Define Variables
############################################################
CLI_BIN=/usr/symcli/bin
LOG_DG=log_dg
############################################################
# Split the LOG_DG device group
############################################################
${CLI_BIN}/symmir -g ${LOG_DG} -noprompt -instant split
RETURN=$?
if [ $RETURN != 0 ]; then
ERROR_DATE=`date`
echo "Split failed for Device Group ${LOG_DG}!!!"
echo "It is not safe to continue..."
echo "Script Terminating."
echo
echo "split_data: failed"
echo "$ERROR_DATE: split_data: failed to split ${LOG_DG}"
exit 1
fi
############################################################
# Cycle ${CLI_BIN}/symmir query for status
############################################################
RETURN=0
while [ $RETURN = 0 ]; do
${CLI_BIN}/symmir -g ${LOG_DG} query | grep SplitInProg \
> /dev/null
RETURN=$?
REMAINING=`${CLI_BIN}/symmir -g ${LOG_DG} query | grep MB | \
awk '{print $3}'`
echo "$REMAINING MBs remain to be split."
echo
sleep 5
done
exit 0
=================================================================
E
Solutions Enabler Command Line Interface (CLI) for FAST VP Operations and Monitoring
Overview
This appendix describes the Solutions Enabler command line
interface (CLI) commands that can be used to configure and monitor
FAST VP operations. All such operations can also be executed using
the SMC GUI. Although there are command line counterparts for the
majority of the SMC-based operations, the focus here is to show only
some basic tasks that operators may want to use the CLI for.
Enabling FAST
Operation: Enable or disable FAST operations.
Command:
symfast -sid <Symm ID> enable/disable
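Thin pool utilization, such as the bound thin device details shown in the sample output that follows, can presumably be displayed with a command of the following form (an assumption; exact options vary by Solutions Enabler version):

symcfg show -sid <Symm ID> -pool <pool name> -thin -detail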
Symmetrix ID: 000192601262

Pool Bound Thin Devices(20):    <== Number of Bound Thin Devices (TDEV) in the Thin Pool
{
-----------------------------------------------------------------------
                                            Pool        Pool       Total
Sym                            Total        Subs   Allocated     Written
Dev    Pool Name    Flags     Tracks         (%)  Tracks (%)   Tracks (%)  Status
-----------------------------------------------------------------------
0162                         1650000          5   1010940 61   1291842 78  Bound
0171   FC_Pool      FX       1650000          5   1499184 91   1505281 91  Bound
       EFD_Pool                                      3720
       SATA_Pool                                     2040  0
}

Legend:
  Flags: (E)mulation : A = AS400, F = FBA, 8 = CKD3380, 9 = CKD3390
         (M)ultipool : X = multi-pool allocations, . = single pool allocation
  Tier type          : S = Static, D = Dynamic