SG24-8485
Dino Quintero
Andrew Braid
Frederic Dubois
Alexander Hartmann
Octavian Lascu
Francois Martin
Wayne Martin
Stephan Navarro
Norbert Pistoor
Hubert Savio
Ralf Schmidt-Dannert
IBM Redbooks
November 2021
SG24-8485-00
Note: Before using this information and the product it supports, read the information in “Notices” on
page vii.
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Authors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Notices
This information was developed for products and services offered in the US. This material might be available
from IBM in other languages. However, you may be required to own a copy of the product or product version in
that language in order to access it.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user’s responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, MD-NC119, Armonk, NY 10504-1785, US
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes appropriate without
incurring any obligation to you.
The performance data and client examples cited are presented for illustrative purposes only. Actual
performance results may vary depending on specific configurations and operating conditions.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
Statements regarding IBM’s future direction or intent are subject to change or withdrawal without notice, and
represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to actual people or business enterprises is entirely
coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are
provided “AS IS”, without warranty of any kind. IBM shall not be liable for any damages arising out of your use
of the sample programs.
The following terms are trademarks or registered trademarks of International Business Machines Corporation,
and might also be trademarks or registered trademarks in other countries.
AIX®, FlashCopy®, IBM®, IBM Cloud®, IBM Cloud Pak®, IBM Garage™, IBM Spectrum®, IBM Z®, InfoSphere®,
Interconnect®, POWER®, POWER8®, POWER9™, PowerVM®, Redbooks®, and Redbooks (logo)®.
Itanium, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States and other countries.
The registered trademark Linux® is used pursuant to a sublicense from the Linux Foundation, the exclusive
licensee of Linus Torvalds, owner of the mark on a worldwide basis.
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States,
other countries, or both.
Ansible, OpenShift, and Red Hat are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in
the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
VMware and the VMware logo are registered trademarks or trademarks of VMware, Inc. or its subsidiaries in
the United States and/or other jurisdictions.
Other company, product, or service names may be trademarks or service marks of others.
This IBM® Redbooks® publication helps readers to understand how Oracle uses the
architectural capabilities of IBM Power Systems.
This book delivers a technical snapshot of Oracle on Power Systems by using generally
available and supported software and hardware resources to help guide your understanding
of the solution.
The goal of this publication is to show how Oracle and Power Systems features complement
each other, and to deliver general and typical content about deploying Oracle (RAC and
single instance) on Power Systems by combining theoretical knowledge, hands-on exercises,
and findings that are documented by way of sample scenarios.
This publication addresses topics for developers, IT specialists, systems architects, brand
specialists, sales teams, and anyone looking for a guide about how to implement the best
options for Oracle on Power Systems. Moreover, this book provides documentation to transfer
the how-to skills to the technical teams, and solution guidance to the sales teams.
This publication complements the documentation that is available in IBM Documentation,
and aligns with the educational materials that are provided by the IBM Garage for Systems
Technical Training.
Authors
This book was produced by a team of specialists from around the world working at IBM
Redbooks, Poughkeepsie Center.
Dino Quintero is a Power Systems Technical Specialist with Garage for Systems. He has 25
years of experience with IBM® Power Systems technologies and solutions. Dino shares his
technical computing passion and expertise by leading teams to develop technical content in
the areas of enterprise continuous availability, enterprise systems management,
high-performance computing (HPC), cloud computing, artificial intelligence (including
machine and deep learning), and cognitive solutions. He also is a Certified Open Group
Distinguished IT Specialist. Dino holds a Master of Computing Information Systems degree
and a Bachelor of Science degree in Computer Science from Marist College.
Frederic Dubois is a Global Competitive Sales Specialist at the IBM Garage™ for Systems
at IBM Global Markets in France. He delivers client value by way of his technical,
presentation, and writing skills, while supporting brand specific business strategies.
Alexander Hartmann is a Senior IT Specialist working for IBM Systems Lab Services in
Germany and is a member of the IBM Migration Factory. He holds a master’s degree in
Business Informatics from the University of Mannheim, Germany. With more than 25 years of
experience with relational databases, he has been working intensively for the last 16 years on
all aspects of Oracle Databases, with a focus on migration, and performance-, and
license-optimization. In addition to Oracle Database specialties, his areas of expertise include
AIX, Linux, scripting, and automation.
Octavian Lascu is an IBM Redbooks® Project Leader and a Senior IT Consultant for IBM
Romania with over 25 years of experience. He specializes in designing, implementing, and
supporting complex IT infrastructure environments (systems, storage, and networking),
including high availability and disaster recovery solutions and high-performance computing
deployments. He has developed materials for and taught over 50 workshops for technical
audiences around the world. He is the author of several IBM publications.
Francois Martin is a Global Competitive Sales Specialist who is responsible for developing
brand and product specific solutions that address customers’ business needs (industry and
business) and deliver client value while supporting brand-specific business strategies.
Francois has experience and skills with Power Systems sales competition: he understands
customer situations, sales, and technical sales to tackle competitive proposals, and he knows
competitors’ sales strategies, especially in competition against Oracle. He launches
worldwide sales plays, enablement sessions, workshops, and Webex sessions for sellers and
Business Partners. His skills and experience come from previous assignments in IBM
Education, cloud consolidation, teaching, technical sales enablement, performance
benchmarks, TCO, AIX, Power Systems, virtualization, and architecture design and technology.
Wayne Martin is the IBM Systems Solution Manager who is responsible for the technology
relationship between IBM and Oracle Corporation for all IBM server brands. He also is
responsible for developing the mutual understanding between IBM and Oracle about
technology innovations that generate benefits for mutual customers. Wayne has held various
technical and management roles at IBM that focused on driving enhancements of ISV
software that use IBM mainframe, workstation, and scalable parallel products.
Stephan Navarro is an Oracle for IBM Power Systems Architect at the IBM Garage for
Systems at IBM Global Markets in France.
Norbert Pistoor was a Senior Consultant at Systems Lab Services in Germany and a
Member of the Migration Factory until his retirement in June 2021. He has more than 20 years
of experience with Oracle Databases and more than 10 years with cross-platform database
migrations. He contributed several enhancements to standard Oracle migration methods and
incorporated them into the IBM Migration Factory Framework. He holds a PhD in physics from
University of Mainz, Germany.
Thanks to the following people for their contributions to this project:

Wade Wallace
IBM Redbooks, Poughkeepsie Center
Majidkhan Remtoula
IBM France
Reinaldo Katahira
IBM Brazil
Now you can become a published author, too!
Here’s an opportunity to spotlight your skills, grow your career, and become a published
author—all at the same time! Join an IBM Redbooks residency project and help write a book
in your area of expertise, while honing your experience using leading-edge technologies. Your
efforts will help to increase product acceptance and customer satisfaction, as you expand
your network of technical contacts and relationships. Residencies run from two to six weeks
in length, and you can participate either in person or as a remote resident working from your
home base.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
[email protected]
Mail your comments to:
IBM Corporation, IBM Redbooks
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
1.1.1 Reliability
From a server hardware perspective, reliability is a collection of technologies (such as
chipkill memory error detection and correction and dynamic configuration) that enhance
system reliability by identifying specific hardware errors and isolating the failing components.
Built-in system failure recovery methods enable cluster nodes to recover, without falling over
to a backup node, when problems are detected by a component within a node in the cluster.
Built-in system failure recovery is transparent and achieved without the loss or corruption of
data. It is also much faster compared to system or application failover recovery (failover to a
backup server and recover).
Because the workload does not shift from this node to another, no other node’s performance
or operation is affected. Built-in system recovery covers applications (monitoring and restart),
disks, disk adapters, LAN adapters, power supplies (battery backups) and fans.
From a software perspective, reliability is the capability of a program to perform its intended
functions under specified conditions for a defined period. Software reliability is achieved
mainly in two ways: infrequent failures (built-in software reliability), and extensive recovery
capabilities (self-healing - availability).
IBM’s fundamental focus on software quality is the primary driver of improvements in reducing
the rate of software failures. As for recovery features, IBM-developed operating systems
historically mandated recovery processing in the mainline program and in separate recovery
routines as part of basic program design.
As IBM Power Systems become larger, more customers expect mainframe levels of reliability.
For some customers, this expectation derives from their previous experience with mainframe
systems, which were “downsized” to UNIX servers. For others, this large server is a
consequence of having systems that support more users.
The cost that is associated with an outage grows every year; therefore, avoiding outages
becomes increasingly important. This issue leads to new design requirements for all
AIX-related software.
For all operating system or application errors, recovery must be attempted. When an error
occurs, it is not valid to give up and end processing. Instead, the operating system or
application must at least try to keep the component that is affected by the error up and
running. If that is not possible, the operating system or application makes every effort to
capture the error data and automate system restart as quickly as possible.
The amount of effort that is put into the recovery is proportional to the effect of a failure and
the reasonableness of “trying again”. If recovery is not feasible, the effect of the error is
reduced to the minimum suitable level.
Today, many customers require that recovery processing be subject to a time limit. They
concluded that rapid termination with quick restart or takeover by another application or
system is preferable to delayed success.
1.1.2 Availability
Today’s systems feature hot plug capabilities for many subcomponents, from processors to
input/output cards to memory. Also, clustering techniques, reconfigurable input and output
data paths, mirrored disks, and hot swappable hardware help to achieve a significant level of
system availability.
From a software perspective, availability is the capability of a program to perform its function
whenever it is needed. Availability is a basic customer requirement. Customers require a
stable degree of certainty, and require that schedules and user needs are met.
Availability evaluates the percentage of time a system or program can be used by the
customer for productive use. Availability is determined by the number of interruptions and the
duration of the interruptions. It also depends on characteristics and capabilities, including the
ability to perform the following tasks:
Change program or operating system parameters without rebuilding the kernel and
restarting the system.
Configure new devices without restarting the system.
Install software or update software without restarting the system.
Monitor system resources and programs and cleanup or recover resources when failures
occur.
Maintain data integrity in spite of errors.
The AIX operating system includes many availability characteristics and capabilities from
which your overall environment benefits.
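For example, the following AIX commands illustrate some of these capabilities. This is a minimal
sketch only; the tunable value and the /oradata file system name are arbitrary examples.

# Change a network tuning option dynamically, without a kernel rebuild or restart
no -o tcp_recvspace=262144
# Discover and configure newly added devices without restarting the system
cfgmgr
# Extend a mounted JFS2 file system online while applications keep running
chfs -a size=+1G /oradata
# Review the error log for resources that might need cleanup or recovery
errpt -a | more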
1.1.3 Serviceability
The focus on serviceability is shifting from providing customer support remotely through
conventional methods, such as phone and email, to automated system problem reporting and
correction, without user (or system administrator) intervention.
Hot swapping capabilities of some hardware components enhance the serviceability aspect.
A service processor with advanced diagnostic and administrative tools further enhances the
system serviceability. An IBM System p server’s service processor can call home and send a
service report, which provides detailed information for IBM Service to act upon. This
automation not only eases the burden that is placed on system administrators and IT support
staff, but it enables rapid and precise collection of problem data.
On the software side, serviceability is the ability to diagnose and correct or recover from an
error when it occurs. The most significant serviceability capabilities and enablers in AIX are
referred to as the software service aids. The primary software service aids are error logging,
system memory dump, and tracing.
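For example, the following AIX commands exercise these service aids. This is a brief illustration
only; the output file names are arbitrary.

# Display detailed entries from the AIX error log
errpt -a | more
# Show the current system dump device configuration
sysdumpdev -l
# Start a short system trace, stop it, and format the report
trace -a -o /tmp/trace.raw
sleep 10
trcstop
trcrpt -o /tmp/trace.rpt /tmp/trace.raw
# Collect problem data for IBM Support
snap -gc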
IBM made AIX robust regarding continuous availability characteristics. This robustness
makes IBM UNIX servers the best in the market. IBM AIX continuous availability strategy
features the following characteristics:
Reduce the frequency and severity of (planned and unplanned) AIX system outages.
Improve serviceability by enhancing AIX failure data capture tools.
Provide enhancements to debug and problem analysis tools.
Ensure that all necessary information that involves unplanned outages is provided to
correct the problem with minimal customer effort.
Use of mainframe hardware features for operating system continuous availability that is
brought to IBM System p hardware.
Provide key error detection capabilities through hardware-assist.
Use other IBM System p hardware aspects to continue transition to “stay-up” designs,
which are used for continuous availability.
Maintain operating system availability in the face of errors while minimizing application
effects.
Use of sophisticated and granular operating system error detection and recovery
capabilities.
Maintain a strong tie between serviceability and availability.
Provide problem diagnosis from data that is captured at first failure without the need for
further disruption.
Provide service aids that are nondisruptive to the customer environment.
Provide end-to-end and integrated continuous availability capabilities across the server
environment and beyond the base operating system.
Provide operating system enablement and application and storage use of the continuous
availability environment.
Other components that are not designed or manufactured by IBM are chosen and specified
by IBM to meet system requirements. These components are procured for use by IBM by
using a rigorous procurement process that is intended to deliver reliability and design quality
expectations.
The systems incorporate software layers (firmware) for error detection, fault isolation and
support, and virtualization in a multi-partitioned environment. These features include IBM
designed and developed service firmware. IBM PowerVM hypervisor also is designed and
supported by IBM.
In addition, IBM offers two operating systems that were developed by IBM: AIX and IBM i.
Both operating systems come from a code base with a rich history of design for reliable
operation.
These components are designed with application availability in mind, including the software
layers, which also can use hardware features, such as storage keys that enhance software
reliability.
Within the IBM POWER9 processor and memory subsystem, an investment was made in
error detection and fault tolerance. This investment includes checking for data in memory and
caches and validating that data was transferred correctly across busses.
It goes well beyond error detection to include techniques for checking state machine
transitions, residue checking for specific operations, and protocol checking to ensure that the
transmitted bits are correct, and that the data went when and where it was expected.
Although error detection seems like a well-understood and expected goal, it is not always the
goal of every possible subsystem design. For example, graphics processing units (GPUs)
whose primary purpose is rendering graphics in noncritical applications exist in which a single
dropped pixel on a window is of no significant importance, and a solid fault is an issue only if
it is noticed.
In general, I/O adapters also can have less hardware error detection capability where they
can rely on a software protocol to detect and recover from faults.
Another type of failure is what is broadly classified as a soft error. Soft errors are faults that
occur in a system and are occasional events that are inherent in the design or temporary
faults that are the result of an external cause.
For example, data cells in caches and memory can include a bit-value that is temporarily
upset by an external event, such as a cosmic ray-generated particle. Logic in processors
cores also can be subject to soft errors in which a latch can flip because of a particle strike or
similar event. Busses that transmit data can experience soft errors because of clock drift or
electronic noise.
The susceptibility to soft errors in a processor or memory subsystem depends on the design
and technology that is used in these devices. This technology choice is the first line of defense.
Methods for interleaving data so that two adjacent bits in array flipping do not cause
undetected multi-bit flips in a data word is another important design technique.
Ultimately, when data is critical, soft error events must be detected immediately and inline to
avoid relying on bad data, because periodic diagnostics are insufficient to catch an intermittent
problem before damage is done.
The simplest approach to detecting many soft error events can be the use of parity protection
on data that can detect a single bit flip. However, when such simple single bit error detection
is deployed, the effect of a single bit upset is bad data. Discovering bad data without being
able to correct it results in the termination of an application (or even a system) if data
correctness is important.
To prevent such a soft error from affecting a system, a bit flip must be detected and corrected.
This process requires more hardware than simple parity. It is common now to deploy a bit
correcting error correction code (ECC) in caches that can contain modified data.
However, because such flips can occur in more than caches, such ECC codes are widely
deployed in POWER9 processors in critical areas on busses, caches, and so on.
Protecting a processor from more than data errors requires more than ECC checking and
correction. CRC checking with a retry capability is used on a number of busses, for example.
However, if a solid fault is continually being corrected, the second fault that occurs typically
causes data that is not correctable. This issue results in the need to at least end whatever
uses the data.
In many system designs, when a solid fault occurs in something like a processor cache, the
management software on the system (the hypervisor or operating system) can be signaled to
migrate the failing hardware off the system.
This process is called predictive deallocation. Successful predictive deallocation allows for
the system to continue to operate without an outage. However, to restore full capacity to the
system, the failed component still must be replaced, which results in a service action.
Alternatively, self-healing features that use built-in spare capacity can avoid a repair action.
Examples include a spare data lane on a bus, a spare bit-line in a cache, caches that are split
up into multiple small sections that can be deallocated, or a spare DRAM module on a DIMM.
2.2.5 Redundancy
Redundancy is generally a means of continuing operation in the presence of specific faults by
providing more components or capacity than is needed for system operation, but where a
service action is to be taken to replace the failed component after a fault.
Sometimes, redundant components are not actively in use unless a failure occurs. For
example, a processor actively uses only one clock source at a time, even when redundant
clock sources are provided.
In contrast, if a system is said to have “n+1” fan redundancy, all “n+1” fans normally are active
in the system in the absence of a failure. If a fan failure occurs, the system runs with “n” fans.
In such a case, power and thermal management code compensates by increasing fan speed or
making adjustments according to operating conditions per the power management mode and policy.
Consider a Voltage Regulator Module (VRM) that includes a spare phase and a redundant phase.
On the first phase failure, the system continues to operate, no call out is made for repair, and
the first failing phase is considered the spare. After the spare is used, the VRM can experience
another phase failure with no outage because the component still maintains the required n+1
redundancy. If a second phase fails, a redundant phase is lost and a call-out for repair is made.
To a significant degree, this error handling is contained within the processor hardware.
However, service diagnostics firmware (depending on the error) can aid in the recovery.
When fully virtualized, tasks such as migrating off a predictively failed component can be
performed transparently to the operating system, without specific operating system involvement.
The PowerVM hypervisor can create logical partitions with virtualized processor and memory
resources. When these resources are virtualized by the hypervisor, the hypervisor has the
capability of deallocating fractional resources from each partition when necessary to remove
a component, such as a processor core or logical memory block (LMB).
When an I/O device is directly under the control of the operating system, the device error
handling is the device driver’s responsibility. However, I/O can be virtualized through the VIO
Server offering; that is, I/O redundancy can be achieved independently of the operating
system.
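For example, when disks are served by redundant VIO Servers, the MPIO paths can be verified from
the client AIX LPAR with the following quick check; the hdisk name is an example only.

# List all MPIO paths; each virtual disk should show one Enabled path per VIO Server
lspath
# Show the path status for one specific disk
lspath -l hdisk0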
2.2.8 Building system level RAS rather than just processor and memory RAS
IBM builds Power Systems with the understanding that every item that can fail in a system is a
potential source of outage.
Although building a strong base of availability for the computational elements, such as the
processors and memory is important, it is hardly sufficient to achieve application availability.
The failure of a fan, power supply, voltage regulator, or I/O adapter might be more likely than
the failure of a processor module that is designed and manufactured for reliability.
Scale-out servers maintain redundancy in the power and cooling subsystems to avoid system
outages that occur because of common failures in those areas. Concurrent repair of these
components also is provided.
This level of RAS investment extends beyond what is expected and often what is seen in
other server designs. For example, at the system level, selective sparing includes such
elements as a spare voltage phase within a voltage regulator module.
Further, the error detection and fault isolation capabilities are intended to enable retry and
other mechanisms to avoid outages that are caused by soft errors. It also allows for use of
self-healing features, which requires a detailed approach to error detection.
This approach is beneficial to systems as they are deployed by users. It also includes benefits
in the design, simulation, and manufacturing test of systems.
Including this level of RAS into the hardware cannot be an afterthought. It must be integral to
the design from the beginning, as part of an overall system architecture for managing errors.
Therefore, during the design process of a processor, IBM places considerable emphasis on
developing structures within it specifically for error detection and fault isolation.
Each subsystem in the processor hardware features registers that are devoted to collecting
and reporting fault information as they occur. The design for error checking is rigorous and
detailed. The value of data is checked generally wherever it is stored. This checking
mechanism is true for data that is used in computations, but also nearly any other data
structure, including arrays that only are used to store performance and debug data.
Error checkers are derived for logic structures by using various techniques. Such techniques
include checking the validity of state-machine transitions, defining and checking protocols for
generated commands, performing residue checks for specific computational instructions and
by other means. These checks are made to detect faults before the resulting effect
propagates beyond the detecting subsystem.
The exact number of checkers and type of mechanisms is not as important as is the fact that
the processor is designed for detailed error checking. That is, much more is required than just
reporting that a fault occurred at run time.
All of these errors feed a data reporting structure within the processor. Registers are used
that collect the error information. When an error occurs, that event typically results in
generating an interrupt.
The error detection and fault isolation capabilities maximize the ability to categorize errors by
severity and handle faults with the minimum possible effect.
Ideally, this code primarily handles recoverable errors, including orchestrating the
implementation of specific “self-healing” features, such as the use of spare DRAM modules in
memory, purging and deleting cache lines, and the use of spare processor fabric bus lanes.
Code within a hypervisor controls specific system virtualized functions, especially as it relates
to I/O including the PCIe controller and specific shared processor accelerators. Generally,
errors in these areas are signaled to the hypervisor.
In addition, a reporting mechanism still exists for what amounts to the more traditional
machine-check or checkstop handling.
In an IBM POWER7 generation system, the Processor Runtime Diagnostics (PRD) code was said to run
and manage most errors, whether the fault occurred at run time, at system IPL time, or after a
system checkstop, which is the descriptive term for entire system termination by the hardware
because of a detected error.
In IBM POWER8®, the processor module included a Self-Boot-Engine (SBE). This engine is
loaded code on the processors that is intended to bring the system up to the point where the
hypervisor can be started. Specific faults in early steps of that IPL process were managed by
this code and PRD ran as host code as part of the start process.
In POWER9™ processor-based systems, during normal operation the PRD code is run in a
special service partition in the system on the POWER9 processors by using the hypervisor to
manage the partition. This process has the advantage in systems with a single service
processor of allowing the PRD code to run during normal system operation, even if the
service processor is faulty.
In the rare event that a system outage resulted from a problem, the service processor
accessed the basic error information that identified the type of fault that occurred. It also
accessed considerable information about the state of the system hardware, including the
arrays and data structures that represent the state of each processing unit in the system, and
debug and trace arrays that can be used to further understand the root cause of faults.
Even if a severe fault caused system termination, this access provided the means for the
service processor to determine the root cause of the problem, deallocate the failed
components, and allow the system to restart with failed components removed from the
configuration.
POWER8 gained the Self-Boot-Engine, which allowed processors to run code and boot by
using the POWER8 processors to speed up the process and provide for parallelization across
multiple nodes in the high-end system. During the initial stages of the IPL process, the boot
engine code handled specific errors, and the PRD code ran as an application in later stages if
necessary.
If a fault occurs, the running code fails, and the hypervisor can restart the partition.
The system service processors also are still monitored at run time by the hypervisor code and
can report errors if the service processors are not communicating.
The PowerVM hypervisor uses a distributed model across the server’s processor and
memory resources. In this approach, some individual hypervisor code threads can be started
and ended as needed when a hypervisor resource is required. Ideally, when a partition must
access a hypervisor resource, a core that was running the partition then runs a hypervisor
code thread.
Specific faults that affect a PowerVM thread result in a system outage. The outage can occur by
PowerVM stopping itself, or by the hardware determining that the system must checkstop to
preserve PowerVM integrity.
Not designing a single system to have multiple physical partitions reflects the belief that the
best availability can be achieved if each physical partition runs in separate hardware.
Otherwise, a concern exists that when resources for separate physical partitions come
together in a system, even with IBM POWER9 processor-based system RAS and redundancy,
some common access point exists, along with the possibility of a “common mode” fault that
affects the entire system.
Note: Oracle features specific considerations regarding what they support as a Public
Cloud. For more information, see the following resources:
Licensing Oracle Software in the Cloud Computing Environment
Oracle Database Support for Non-Oracle Public Cloud Environments
(MyOracleSupport Doc ID 2688277.1)
The following motivators drive enterprises to switch from traditional IT to a cloud computing
model to run enterprise infrastructure more effectively and expand the business:
Lower IT costs: Offload some or most of the costs and effort of purchasing, installing,
configuring, and managing your own on-premises infrastructure.
Improve agility and time-to-value: Use enterprise applications in minutes, instead of
waiting weeks or months for IT to respond to a request, purchase and configure
supporting hardware, and install software.
Scale more easily and cost-effectively: Instead of purchasing excess capacity that sits
unused during slow periods, you can scale capacity up and down in response to spikes
and dips in traffic.
Many companies choose hybrid cloud to establish a mix of public and private cloud
resources. This mixture includes a level of orchestration between them that gives an
organization the flexibility to choose the optimal cloud for each application or workload.
Organizations also can move workloads freely between the two clouds as circumstances
change.
IBM Power Systems provides a cloud-ready platform that integrates with most of the needed
tools and software to enable customers to implement cloud and Hybrid Multicloud.
The following packaged solutions are available from IBM to address this requirement:
On-premises: IBM Private Cloud with Dynamic Capacity
Off-premises: IBM Power Systems Virtual Server
The base capacity that a customer must purchase is as low as 1 core and 256 GB. Users can
buy capacity credits for resource usage beyond the base capacity. They also can add multiple
systems to the pool, as shown in Figure 3-2.
When resource usage exceeds the aggregated base of the pool, capacity credits are debited
in real time, based on by-the-minute resource consumption. Clients no longer need to worry
about over-provisioning capacity to support growth.
For more information, see IBM Power Systems Private Cloud with Shared Utility Capacity:
Featuring Power Enterprise Pools 2.0, SG24-8478.
All LPARs that are running Oracle DB on Power Systems can be assigned to a capped pool of
processors to benefit from CPU sharing mechanisms that are provided by the hypervisor
(shared processor pool). LPARs that require more CPU resource can get more capacity from
other LPARs in that pool that cede idle CPU cycles to the pool. Overall CPU capacity is hard
limited by the shared CPU pool.
A VM/LPAR can be limited to the maximum number of cores or processors to which it has
access. This technology is consistent with Oracle’s hard partitioning guidelines for Oracle
licensing terms and conditions.
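For example, the capping of an LPAR can be verified from within AIX with the following quick
check; the exact field labels can vary slightly by AIX level.

# Confirm that the LPAR runs in capped shared mode and show its capacity limits
lparstat -i | egrep "Type|Mode|Entitled Capacity|Maximum Capacity|Shared Pool ID"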
Power Enterprise Pools 2.0 has no effect on shared processor pools or the LPAR capping
mechanism. It relies on real CPU consumption and does not interact with resources that are
assigned to a specific LPAR. That is, Power Enterprise Pools 2.0 is not a technology to reduce
software licensing costs, but to optimize hardware acquisition costs.
The IBM Power Virtual Server environment consists of Power Systems servers, PowerVM
Hypervisor, and AIX operating systems that are certified for Oracle DB 12c, 18c, and 19c and
other Oracle SW Products (Application, Middleware). This same stack is used by tens of
thousands of customers in their current IT environment.
Oracle publishes their certifications of the PowerVM hypervisor, its features, AIX 7.1 and 7.2,
and confirms support of these features at the following websites:
Certified Virtualization and Partitioning Technologies for Oracle Database and RAC
Product Releases
My Oracle Support
The environment uses LPARs and adheres to Oracle’s hard partitioning guidelines, if LPM is
not used with those LPARs running Oracle software. Oracle licensing terms and conditions
are always based on the contract between the customer and Oracle. For more information,
see the Oracle Partitioning Policy document.
The customer is responsible to comply with the terms of the licensing contract with Oracle,
regardless of the deployment option that is chosen: on-premises or IBM Power Systems
Virtual Server.
1 https://www.oracle.com/us/support/library/057419.pdf
This hybrid multicloud design requires a unified cloud management solution. Power Systems
is compatible with leading cloud orchestration technologies, including Red Hat Ansible, Chef,
Puppet, and VMware vRealize. Those single-pane-of-glass solutions help centralize
deployment and management of environments wherever they run.
This section addresses the steps to move from a traditional Power Systems IT landscape to a
hybrid (on-premises + off-premises) POWER infrastructure.
Combined with on-demand capacity, clients realize cloud-like agility to provision virtual
machines (VMs) based on templates that can incorporate software to accelerate
deployment and installation of the complete stack. AIX, IBM i, or Linux on POWER VMs can
easily be captured as new images and used by users to build new environments faster by way
of the PowerVC portal.
Within this transformation of traditional IT, organizations also are modernizing traditional
software to containerized applications to help improve operational efficiency, cloud integration
from multiple vendors, and to build a more unified cloud strategy.
Red Hat OpenShift is fully enabled and supported on IBM Power Systems to rapidly build,
deploy, and manage cloud-native applications.
IBM Cloud Paks are lightweight, enterprise-grade, modular cloud solutions that integrate a
container platform, containerized IBM middleware and open source components, and
common software services for development and management.
Thanks to Red Hat OpenShift, which is paired with IBM Cloud Paks, customers gain
enterprise-ready, containerized software solutions for an open, faster, and more secure way
to move core business applications to any cloud (starting with an on-premises private cloud).
For more information about Red Hat OpenShift and IBM Cloud Pak on IBM Power Systems,
see Red Hat OpenShift V4.X and IBM Cloud Pak on IBM Power Systems Volume 2,
SG24-8486.
The same technology stack as on-premises allows users to develop and customize those new
environments and return them to on-premises (or vice versa) without risks or massive effort.
Our example shows how to export a custom AIX image from on-premises Power Systems
infrastructure to IBM Power Systems Virtual Server by using PowerVC and IBM Cloud Object
Storage. PowerVC images that are built with custom content can be exported in OVA format
into an IBM Cloud Object Storage bucket to be imported into the boot image catalog of your
Power Virtual Server Instance.
This process allows you to quickly build environments that are based on your own customized
AIX image. The process can be easily reversed to migrate an AIX LPAR that was built and
customized in Power Virtual Server back to on-premises.
Clients can subscribe for the minimal virtual server configuration and expand the
configuration depending on the needs at the time of failover or workload needs. IBM
Infrastructure as a Service (IaaS) management expertise and experience is used to provide
the services that relate to the hardware and infrastructure. Applications, databases, and
operating systems layers remain under customer or out-sourced management control.
IBM Cloud Paks include solutions, such as IBM Multicloud Manager, that allow customers to
adopt VMs and containers in a hybrid multicloud environment while using their existing
infrastructure investments. This multicloud self-service management platform empowers developers and
administrators to meet business demands.
This platform allows you to efficiently manage and deliver services through end-to-end
automation. It also enables developers to build applications that are aligned with enterprise
policies and use open source Terraform to manage and deliver cloud infrastructure as code.
With Red Hat Ansible Tower, users can now define one master workflow that ties different
areas of IT together, which is designed to cover a hybrid infrastructure without being stopped
at specific technology silos. Red Hat Ansible ensures your cloud deployments work
seamlessly across off-premises, on-premises, or hybrid cloud as easily as you can build a
single system.
Figure 3-6 on page 23 shows the high-level hybrid multicloud reference architecture that is
inclusive of the major industry hardware platforms; IBM Power Systems, IBM Z®, and x86.
The architecture that is shown in Figure 3-6 applies to VMs (logical partitions) that are
running AIX, which allows you to automate the deployment and management of the Oracle
Database on AIX across the management stack.
Other orchestration tools are available, such as VMWare vRealize. IBM’s partnership with
VMware provides clients with the vRealize Suite to unify applications and infrastructure
management across IBM Power Systems, x86 servers, and IBM Z environments.
The following sections describe how to take advantage of key components of the process to
move to Hybrid Cloud on POWER for your Oracle Database. Examples of automated Oracle
Database environment deployments are illustrated.
For more information about the process to build an Oracle DB as a Service (DBaaS) offering
on AIX/Power Systems by developing a reference image with Oracle Database and Oracle
Grid Infrastructure installed, see this IBM Support web page.
A post-provisioning step then updates the Oracle configuration to obtain the IP address, host
name, and database hostname that are defined during the LPAR Creation process by way of
PowerVC.
Note: Implementation details and all developed scripts to implement Oracle Database as a
Service are included. You can reuse them as-is to start your private cloud journey and then
customize and enhance them as per your own constraints and requirements.
Users of the DBaaS service can select from a set of deployable images, such as Oracle
Database 12c or 19c, with JFS2 or with ASM. Then, customization parameters are provided,
such as a database name or required storage capacity by way of the PowerVC interface. The
DBaaS service administrator also can define constraints and any specific approval
requirements as needed before a request is fulfilled.
After the user request is approved, all later steps are fully automated.
IBM Cloud® PowerVC Manager sends the deployment request to PowerVC with the provided
customization parameters. IBM PowerVC then does the bulk of the work. It creates the LPAR
by way of the HMC, allocates and maps storage, and then, starts the LPAR.
The deployment is then completed by running the post-deployment scripts that were included
in the deployment image at capture time. Cloud-Init is the technology that takes user inputs
and configures the operating system and software on the deployed virtual machine.
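The following sketch shows what such a Cloud-Init post-deployment script might look like. It
assumes that the captured image contains an Oracle home under /u01 and a placeholder host name;
all paths, names, and the captured host name are illustrative and must match your own image.

#!/bin/sh
# Hypothetical post-deployment script that Cloud-Init runs on first boot:
# replace the host name that was captured in the reference image with the
# host name that PowerVC assigned at deployment time.
OLD_HOSTNAME=capturedhost                               # name embedded in the image
NEW_HOSTNAME=$(hostname)
ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1     # assumed Oracle home
for f in "$ORACLE_HOME"/network/admin/listener.ora \
         "$ORACLE_HOME"/network/admin/tnsnames.ora
do
    if [ -f "$f" ]; then
        # AIX sed has no -i option, so write to a temporary file and move it back
        sed "s/$OLD_HOSTNAME/$NEW_HOSTNAME/g" "$f" > "$f.new" && mv "$f.new" "$f"
    fi
done
# Restart the listener so that it registers under the new host name
su - oracle -c "lsnrctl stop; lsnrctl start"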
PowerVC relies on capturing and restoring images. This first step is part of the process to
convert traditional on-premises Power Systems infrastructure to a private cloud and offer
services and templates to users.
This approach requires PowerVC to maintain image sets and handle a large set of images to offer
wide flexibility and choice regarding the combination of operating system version and software
stack version. However, PowerVC can be convenient when large databases must be cloned,
for example. The capture and restore process saves time and avoids reinstalling the software
stack and exporting and importing the Oracle DB.
To increase agility and choice in the services that you offer to users, you add an orchestration
and a decomposition solution. This addition brings you to the next level of hybrid cloud because
such orchestration tools apply to on-premises and off-premises environments.
For more information about how to use IBM Terraform and Service Automation (instead of
PowerVC) to provide the control point and user interface for DBaaS for an Oracle Database,
while reusing existing PowerVC image, see this IBM Support web page.
This image approach for provisioning limits the operating system and database versions you
offer to users because it increases the number of images to build and maintain. Decomposing
the steps to build a deployed Oracle Database environment into several steps results in
longer deployment times, but allows the reuse of parts in other services and provides higher
flexibility and customization to the user of the DBaaS offering.
Terraform is an Open Source tool that was created by HashiCorp to handle provisioning of
cloud and infrastructure resources. These resources are described to Terraform by using a
high-level configuration language that is stored in configuration files or templates. From these
templates, Terraform generates an execution plan that describes its process and then runs it
to build the described infrastructure.
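For example, a typical Terraform run against such templates follows this sequence; the plan file
name is arbitrary.

# Initialize the working directory that contains the configuration files
terraform init
# Generate and review the execution plan that Terraform derives from the templates
terraform plan -out=dbaas.tfplan
# Run the plan to build the described infrastructure
terraform apply dbaas.tfplan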
Figure 3-10 shows the decomposition of an Oracle DBaaS on AIX service to deploy an Oracle
Database on-premises or in an IBM Power Systems Virtual Server with the capability to
customize each of the steps based on user input.
Figure 3-10 Dividing an Oracle DBaaS to deploy it on an IBM Power System Virtual Server
Important: You must decide in advance what you want to include in the image and what
you want to make customizable. For instance, we decided to include in the base AIX image
the Oracle user and group creation in addition to applying our AIX and Oracle best
practices settings to have a ready to use AIX image for Oracle. This operating system
configuration and customization alternatively can be implemented by way of a post-deploy
scripted template that is responsible to set AIX and Oracle prerequisites and best
practices.
The first part of this DBaaS provisioning workflow is creating the LPAR. Depending on what
the user selects, the corresponding template is called to create the LPAR on Power Systems
on-premises or in Power Virtual Server (off-premises) by using an image with a specific
version of AIX and all Oracle Prerequisites and best practices set.
The number and size of the extra volumes that host the Oracle DB were integrated into this LPAR
creation template so that the storage layout of the new environment can be customized to DBA
requirements.
The second part of this provisioning is about installing Oracle Grid Infrastructure and the
configuration of Oracle ASM with custom disk settings. Oracle Home Path can be set by
default or modified by the user in addition to Oracle ASM Instance Password and Oracle
software version to install. This example offers the installation of an Oracle 18c or 19c
version.
A similar template was developed to install the Oracle Database Engine and offer similar
customization and choice; for example, Oracle Home Path of the Oracle DB installation
directory and Oracle DB Version to install.
Last is the creation of an Oracle Database Standalone Instance. We defined the following set
of input and output parameters for each template:
Input parameters define the customization that you want to bring and offer regarding the
execution of each template. The list is not exhaustive and you can easily extend and
replace some parameters that we defined to customize the Oracle software component
installation.
Output parameters are the result of creation tasks during the execution of the template
and can be used by other templates that are run after it.
More parameters can be set for more customization and fit with your own requirements,
constraints, and needs.
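As an illustration only, input parameters can be passed on the Terraform command line and output
parameters read back for the next template. The variable and output names in this sketch are
invented and depend on how the templates are written.

# Pass user-supplied customization values as input variables
terraform apply -var="oracle_version=19c" \
                -var="oracle_home=/u01/app/oracle/product/19.0.0/dbhome_1" \
                -var="asm_sys_password=MyPassword1"
# Read an output parameter of this template so that a later template can use it
terraform output lpar_ip_address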
For more information about Terraform and IBM Power Systems Virtual Server code, see IBM
Documentation.
The example that is available at this web page shows a new Oracle Database that is created
on-premises or off-premises in IBM Power Systems Virtual Server. It also shows the
orchestration of those independent Terraform templates combined to create such DBaaS for
the user. It also addresses the deployment of an application tier followed by its database tier.
Red Hat Ansible is agentless and does not require agents to be installed on your target
servers. It connects through the secured SSH protocol to run its tasks.
Red Hat Ansible is open source software and is easy to install by way of Yum. It can be
supported by a subscription from Red Hat.
IBM created an extensive set of Red Hat Ansible modules for the Power Systems user
community, ranging from operating system management to Cloud Management and
everything in between. You can use those modules to codify key maintenance and operational
tasks for AIX and the software stack so that you can focus on business priorities (see
Figure 3-11).
A notable way to rapidly jump-start your Red Hat Ansible project is Red Hat Ansible Galaxy,
as shown in Figure 3-11. Galaxy is a no-cost site for finding, downloading, and sharing
community-developed roles.
For more information about the Community Ansible Collection for IBM Power Systems, see
this web page.
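For example, the community AIX content can be installed on the Red Hat Ansible control node from
Galaxy. The collection name ibm.power_aix is the AIX collection that IBM publishes; verify the
name on Galaxy for your environment.

# Install the IBM Power Systems AIX collection from Ansible Galaxy
ansible-galaxy collection install ibm.power_aix
# Verify which collections are installed on the control node
ansible-galaxy collection list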
You can download the Supported Red Hat Ansible Collection for IBM Power Systems from
Automation Hub (Red Hat Ansible subscription required). For more information, see the
following resources:
Ansible Automation Platform Certified Content
Red Hat Automation Hub (log in required)
The Red Hat Ansible experience is identical across POWER and x86 servers. The same
steps can be repeated in the IBM Power Systems Virtual Server, public cloud environments,
and on-premises. Some Red Hat Ansible modules or playbooks can be platform or operating
system specific. In that case, customers can develop their own playbooks to build a
platform-independent management solution.
In the following section, we describe an Oracle DBaaS on AIX and Power Systems that uses
Red Hat Ansible modules. The example relies on the similar workflow that is used with the
IBM Terraform and Service Automation example, as shown in Figure 3-12 on page 30.
This Red Hat Ansible playbook assumes that the AIX LPAR is created. It can be extended by
creating the AIX LPAR by way of another Red Hat Ansible module. Such a module uses the
OpenStack APIs with PowerVC, or the Power Virtual Server APIs, to create the AIX LPAR that
hosts the Oracle Database.
The deployment of an Oracle Database was decomposed into several Red Hat Ansible
modules to allow flexibility and re-use on those independent modules into other playbooks:
preconfig: This role performs AIX configuration tasks that are needed for Oracle
installation. It sets all AIX and Oracle best practices and prerequisites.
oracle_install: This role performs the Oracle binary installation.
oracle_createdb: This role creates the database instance by using the Oracle dbca utility
and custom parameters, such as DB password and DB SID.
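A run of a playbook that combines these roles might look like the following sketch; the playbook,
inventory, and variable names are illustrative and depend on how the roles are written.

# Run the preconfig, oracle_install, and oracle_createdb roles against the target
# AIX LPAR and pass the user-supplied customization as extra variables
ansible-playbook -i inventory oracle_dbaas.yml \
    -e "db_sid=ORCL db_password=MyPassword1 oracle_version=19c"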
Note: For more information about a playbook and roles project on development, see this
web page.
You can reuse them as-is to start the first Oracle Deployment on AIX by using Red Hat
Ansible and then, customize and enhance them per your own requirements.
The example Deploying Oracle DBaaS with vRealize Automation 7.2, vRealize Orchestrator, and IBM
PowerVC illustrates the deployment of an Oracle DBaaS with those tools. It uses most of the
concepts and deployment logic that were introduced with the PowerVC and IBM Terraform and
Service Automation examples.
This chapter describes different methods to complete a migration. It also presents other
issues that must be considered and describes the arguments for and against the different
methods.
During migration, you need downtime for the application. Depending on the size of the
database (that is, the amount of data that must be copied) and the available downtime
window, you must carefully plan the migration.
The migration process does not consist of only moving the database to the new platform.
It also includes at least one (but often more) of the following tasks:
Analyze the source environment
Evaluate and select a migration method
First test migration plus application test
Second test migration plus application test
Live migration
Plan for ample preparation time, especially if several databases and departments are
involved, because migration is not a single task; rather, it is a project.
Note: Key aspects also include rollback options and point of no return in the migration
process.
Endianness
Endianness refers to the order in which the processor stores and interprets data. Systems are
big-endian or little-endian. Big-endian systems store the most significant byte of multi-byte
data at the first memory address and the least significant byte at the last memory address.
Little-endian systems store data the opposite way (see Figure 4-1).
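The endian format that Oracle records for each platform can be checked from any Oracle database,
for example with the following query; run it as a privileged user.

sqlplus -s "/ as sysdba" <<'EOF'
-- List the endian format of the platforms that are involved in the migration
SELECT platform_name, endian_format
  FROM v$transportable_platform
 WHERE platform_name LIKE '%AIX%'
    OR platform_name LIKE 'Linux x86 64%';
EOF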
The choice is always a consideration between effort (preparation, planning, performing the
migration), tool cost (especially because logical replication methods are often costly), size,
and available downtime and network speed.
You must script all migration steps to make them reproducible (first and second test and live
migration), especially when migrating a complex environment, a large number of databases,
or critical systems.
For a hardware refresh while staying on the same platform, you also must plan your
approach.
Note: You must be aware that the use of Live Partition Mobility to move an LPAR to a new
Power System is technically feasible and simple. However, Oracle Licensing terms state
that when you use Live Partition Mobility, you must purchase Oracle licenses for all CPU
cores in the old and new server. Therefore, this method often cannot be used for migration.
If any issues are revealed during the first test migration (error in scripts that must be fixed,
errors in the migrated database during application testing, and so on) you fix the scripts and
perform the second test migration.
Depending on the migration method that is used, you might need downtime for your source
system, which also applies to all future test migrations.
If any issues are revealed, consider repeating this step (that is, a third test migration) to
avoid surprises during the live migration as much as possible.
Applications must be shut down. After the migration, you must (depending on the allotted
time) run basic or comprehensive functional application tests before you decide whether to
proceed. It is important to start your first full backup at this point.
If not stated otherwise, you need downtime for the entire process.
If small databases are being migrated, this migration method is easy and quick to set up. For
any reasonably sized database (that is, for medium to large size databases), this approach is
too slow.
The expdp/impdp command line tools connect from the client over the network to the
database. However, they start only the export/import process.
Data is written to a directory on the database server that is defined in the database. It also is
possible to run the impdp command in a way that the export is done directly over a database
link between the target and source database, which saves time and space to perform the
migration.
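As a minimal sketch of both variants (the connect strings, directory object, dump file names, and database link name are placeholders, and the parameters for your environment might differ), the commands resemble the following example:
# Variant 1: export into a directory object on the source, then import on the target
# (the dump files must be accessible to the target, for example, over an NFS mount)
expdp system/password@srcdb full=y directory=DATA_PUMP_DIR dumpfile=mig_%U.dmp logfile=mig_exp.log parallel=4
impdp system/password@tgtdb full=y directory=DATA_PUMP_DIR dumpfile=mig_%U.dmp logfile=mig_imp.log parallel=4
# Variant 2: import directly over a database link; no dump file is written
impdp system/password@tgtdb full=y network_link=SRCDB_LINK logfile=mig_netimp.log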
Also, the transfer speed is considerably higher compared to exp/imp, and built-in parallelism
is available. As with the traditional export/import method, only table data is transferred;
indexes are re-created.
An empty target database also must be created, including all needed table spaces.
SYS/SYSTEM level objects are not transferred.
With Oracle Data Pump, you cannot use UNIX named pipes (FIFO) to run import while export
is still running. Data is written to a file system on the database server (into several files when
parallelism is used).
However, you can write data to an NFS file system, which is exported from the target
database server. Therefore, no extra copying of the dump after export finishes is needed;
import can start immediately.
Oracle Data Pump can be used with manageable effort for medium-sized databases when
adequate downtime is available.
Specific data types (for example, LONG/LONG RAW) cannot be queried by using a database
link. If such data types exist for a table, you can combine them with an exp/imp or Oracle Data
Pump for those tables.
This approach also can be classified as a logical data transfer method.
Changes to the database are extracted from the log files and (depending on configuration)
transferred (modified or unmodified) to another database, or skipped. By first creating a
database copy as of time stamp x (which can be done, for example, by using a time
consistent data pump dump) and then transferring all changes, the target database can be
put (and kept) in sync while the source database is still in use.
This process results in a reduced downtime migration because only replication must be
stopped and the application must be configured to access the target database. However, the
tools are mostly expensive.
Note: Oracle Data Guard running as a Logical Standby cannot be used in this context
because to set it up, you first must create a Physical Standby that later can be converted
into a Logical Standby. Therefore, the constraints of setting up an Oracle Data Guard
mirror still apply.
For migration, only sync must be disabled, the mirror must be “activated”, and the application
must be pointed to the new database.
If the source and target platforms are the same or are one of the supported combinations,
Oracle Data Guard is a good method to use to perform the migration.
Another use of Oracle Data Guard is to generate a copy of the database that can be
used for test migrations. Because the source database must be down for migration, it can be
difficult to perform test migrations for databases that must be highly available.
However, if you first create an Oracle Data Guard mirror for migration, you can stop
replication, activate the copy database, and then, perform the migration steps.
For more information about this approach, see this Oracle white paper.
Note: The procedure that is used in this document creates a convert script
(convert_mydb.rman), which includes all data files, not only those files that include rollback
segments. Edit the script to remove any surplus data files.
Another optimization is to allocate more than one disk channel so that all data files are
converted in parallel. This process saves overall conversion time.
The following SQL query is used to identify the data files with rollback segments:
select distinct(file_name)
from dba_data_files a, dba_rollback_segs b
where a.tablespace_name=b.tablespace_name;
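The following sketch shows the idea of running the edited convert script with several disk channels; the channel count is an example, and the CONVERT commands themselves must be taken from the convert_mydb.rman file that RMAN generated for your database:
rman target / <<'EOF'
RUN {
  # Several disk channels so that the data files are converted in parallel
  ALLOCATE CHANNEL c1 DEVICE TYPE DISK;
  ALLOCATE CHANNEL c2 DEVICE TYPE DISK;
  ALLOCATE CHANNEL c3 DEVICE TYPE DISK;
  ALLOCATE CHANNEL c4 DEVICE TYPE DISK;
  # ... insert the (edited) CONVERT commands from convert_mydb.rman here ...
}
EOF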
For more information about this approach, see My Oracle Support, Document 732053.1.
One optimization that can be done when the target database uses a file system (not ASM) is
to create a Data Guard mirror in which the file system from the target server that finally
contains the data files is NFS-exported to the server that is running the Data Guard instance.
By using this approach, all data files are at their final location after the Data Guard mirror is in
sync. This result eliminates the time that is needed to copy the data files from source to
target.
The high-level necessary steps are similar to the steps that are described in chapter 4.3.7,
“Cross-Platform Transportable Tablespace”.
For more information about this approach, see this Oracle white paper.
Even when data files are on the target platform (by using a DataGuard mirror with NFS) and
several channels are used to convert data files in parallel, you still might not realize the
needed throughput because it is limited by CPU and storage performance.
Imagine having a 20 TB database and a downtime window of 2 hours. Even if all actions are
scripted, perhaps only a maximum of 80 minutes is available for the conversion itself.
Therefore, 20*1024/(60*80) = 4.3 GB must be read and 4.3 GB written every second, for a
total of 8.6 GBps on average over the entire conversion time.
The needed downtime mostly depends on the number of data changes that were made to the
source database since the last incremental backup. If fewer changes were done, the process
takes less time, especially if compared to a full conversion during downtime.
The time that is needed for the incremental backups can be further reduced by using Block
Change Tracking. By using this method, only the data blocks that changed since the previous
backup are read.
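As a minimal sketch (the tracking file location is an example, and the directory that holds it must exist), Block Change Tracking can be enabled as follows:
# Enable Block Change Tracking on the database where the incremental backups are taken
sqlplus -s "/ as sysdba" <<'EOF'
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING
  USING FILE '/u01/app/oracle/bct/block_change.bct';
EXIT
EOF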
For more information about this approach, see My Oracle Support, Doc ID 2471245.1.
1. Setup:
a. Set up the /stage area to hold the storage for the backup files and shared
configuration files.
b. On the source side, create the standby and wait for sync. Activate block change
tracking.
c. On the target, create an empty database and Data Guard mirror.
2. Prepare and rollforward:
a. On source (standby), run xttdriver.pl --backup (script from My Oracle Support).
b. On target (primary + standby), run xttdriver.pl --restore.
c. Repeat this process as many times as needed to keep the target as closely in sync
with the source as possible.
3. Transport (downtime is necessary if final migration, or stop sync to standby for test
migration):
– Source:
• Alter tablespaces to read only.
• xttdriver.pl --backup (last time).
• (FTEX): expdp system/xxx full=y transportable=always [version=12]
1 But we found a way to enhance them to make this scenario possible by creating a version that will run certain
commands against the target primary database instead but gets the critical data from the source standby database
by way of a database link. Contact the Systems Lab Services Migration Factory for more details about this
approach: https://www.ibm.com/blogs/systems/tag/ibm-systems-lab-services-migration-factory/
Table 4-2 lists some suggestions on which migration methods can be recommended for
different scenarios. However, some issues might need to be considered based on each case’s
specifics.
Scenario: Different platform / same endianness
Migration method: Transportable Database (TDB) (10.2+)
Setup: Source standby (on NFS from target if on file system); Incremental Backup; target standby
Downtime: Less than one hour with enhancements

Scenario: Different platform / different endianness
Migration method: Full Transportable Export/Import (FTEX) (12c+) / Cross Platform Transportable Tablespaces (XTTS) (10g+)
Setup: Source standby; Incremental Backup; target standby
Downtime: Less than one hour with enhancements

Scenario: Nothing else works or database is small (<1 TB)
Migration method: Data Pump (10g+)
Setup: N/A
Downtime: Data transfer time (parallel)
Often, we do not prefer methods that are not included in the table, but logical replication might
be suitable in cases where only a short downtime window is available (less than one hour) and
other methods do not work.
To deal with these challenges and to provide a structured approach to complex migration
projects, we developed the Migration Factory Framework.
We used the Migration Factory Framework (including some earlier versions) successfully for
several large migration projects. However, it is still a work in progress that we intend to
enhance and improve further over time.
This approach is based on Red Hat Ansible playbooks; therefore, it does not need any agent
programs to be installed on the source or target environments except for Python. It runs on a
Linux workstation (called the Control Workstation) with Red Hat Ansible and Python 3
installed, which can access the source and target machines by way of SSH as an
unprivileged user.
This user then needs access to run commands on the source or target machines as
oracle/grid through sudo with no password, per standard Red Hat Ansible practice.
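A quick way to validate this setup from the Control Workstation is sketched in the following example; the user name, host name, and inventory file name are placeholders:
# Confirm SSH access and password-less sudo to the oracle user on a target machine
ssh ansible@aixtarget01 "sudo -n -u oracle id"
# Confirm that Red Hat Ansible can reach all hosts in the inventory
ansible all -i inventory.ini -m ping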
Menu program
The Menu program is the front-end part of the Migration Factory Framework. It is used by the
migrators to perform the steps in the migration workflow as it is defined in the Menu
configuration files. It is a python3 program with a simple character-based user interface.
Each migration includes a line in a database list file that contains information, such as
database name, migration type (trial or live), which menu file to use (for example, which
workflow), and more. When starting the program, the user is prompted to choose a database
to be migrated. If the database is found in the database list file, the corresponding line is
locked exclusively for this user.
If another user attempts to work on the same database while the line is locked, the Menu
program rejects the attempt.
If a step is run successfully, the arrow advances to the next step in the workflow.
If a step fails, the failure is indicated by a crossed out arrow, and the workflow does not
advance. The problem must be fixed and then the failed step resubmitted (however, this is an
exception).
The steps in the workflow also can be grouped into larger units, which are called tasks. In the
database list file, each task can be associated with a specific date and time when this task
must be run.
Although this process does not occur automatically, when the user starts a new task in the
Menu program, the date and time are checked, and the user must explicitly confirm that the
task is to be run ahead of its scheduled time. This check is useful for the beginning of the
downtime, for example, or when the workflow must pause to allow for some manipulation
outside of the workflow.
The user also can decide to quit the menu after each step. In this case, the status of the
migration is stored in a status file and the migration is unlocked in the database list. Another
user (or the same user) later can start the Menu program and, after choosing the database,
automatically continue at the same step where the last user quit the menu.
Users also can work on different migrations at the same time by starting multiple copies of the
Menu program in different terminal windows and then, choosing different databases from the
database list file.
Menu configurations
The Menu program workflows are fully configurable. You can define the menu structure,
including submenus, tasks, and the individual steps, as shown in Example 4-2. For each step,
a corresponding Red Hat Ansible playbook is available that is started when the step is run.
We included sample workflows for some available migration methods and we plan to create a
more comprehensive collection.
:t:A:
:m:install "Setup source reference and target":
{
:a:a:xtt_setup "Setup scripts for xtt":
}
:t:B:
:m:prep_target "Prepare target database"
{
:a:a:ftex_drop_ts_target "Drop tablespaces in target database"
}
:t:C:
:m:increment "Incremental backup and restore"
{
:a:a:xtt_backup_reference "Take incremental backup from reference node"
:a:a:xtt_transfer "Transfer backup info to target"
:a:a:xtt_restore "Apply incremental backup to target"
}
:t:D:
:m:conv2snap "Convert source standby to snapshot standby"
:t:E:!:
:m:ftex "Perform FTEX migration
{
:aL:a:ftex_ts_ro_primary "Set tablespaces to read only on primary"
:aT:a:ftex_ts_ro_reference "Set tablespaces to read only on reference node"
:a:a:xtt_backup_reference "Take final backup from reference node"
:a:a:xtt_transfer "Transfer backup info to target"
:a:a:xtt_restore "Apply final incremental backup to target"
:a:a:ftex_drop_ts_target "Drop tablespaces in target database"
:a:a:ftex_apply "Perform FTEX migration"
}
:t:F:
:m:conv2phys "Convert source standby to physical standby"
{
:aT:a:dg_convert_physical "Convert source standby to physical standby"
}
Example 4-2 on page 45 shows the menu configuration file for the workflow of the “Combined
Method”, as discussed in 4.4, “Combined method for optimized migration” on page 40.
When a playbook is run through the menu program, all output is displayed in the window and
checked by the user for errors that occurred but were not caught by the logic of the playbook.
For later reference (and as a documentation aid), all output also is collected in log files in the
respective project directory.
Helper programs
Helper programs are available to help the preparation of the inventory file for the Red Hat
Ansible playbooks. When a Red Hat Ansible playbook runs, it needs an inventory file that
contains the host names of the machines that are used in the migration and many variables
that are needed by the playbooks.
The helper programs guide you through the process of creating the inventory file by asking
some basic questions about the migration and then, providing you with a template or sample
inventory file that might need some adaptation to your specific environment.
The simplest installation is a stand-alone Oracle installation with JFS2 file systems. ASM
requires the installation of Grid Infrastructure, which has the additional overhead of an ASM
instance. When correctly configured, the performance difference between JFS2 and ASM is
negligible.
Implementing Oracle is a team effort with tasks for the DBA and the Systems, Storage, and
Network administrators. Getting the prerequisites correct is key to a successful installation.
5.2 Firmware
Your firmware must be as up to date as possible. Consider the following points:
Check the current firmware level by using the prtconf | grep "Firmware" AIX command
from your LPAR.
Example output:
# prtconf | grep "Firmware"
Platform Firmware level: VL940_027
Firmware Version: IBM,FW940.00 (VL940_027)
The latest firmware level is available at this IBM Support Fix Central web page.
5.3 AIX
The 19c database software is certified against AIX 7.1 and 7.2. The following minimum levels
are required:
7.1 TL5 SP01
7.2 TL2 SP1
Note: Unlike older versions, Oracle 19c is not certified on AIX 6.1.
For best performance and reliability, it is recommended to install the latest server firmware
and AIX TL and SP levels.
The Oracle installation documentation covers most tasks that are required to install Grid
Infrastructure and the Oracle Database software.
The Oracle documentation does not cover all the best practices. Some steps are omitted that
we cover in this publication for completeness or to highlight their importance.
One commonly used tool is TightVNC server. TightVNC server is not installed on AIX by
default. However, you can find it as part of the AIX Toolbox for Linux Applications, which is
available at no cost to download at this web page.
You also must install the TightVNC viewer or another VNC viewer on your desktop to access
the session. Complete the following steps:
1. Connect as the oracle or grid user and start the VNC server. You are prompted for a
password if this is the first time that the VNC server is used.
2. Export the DISPLAY variable with the value that is indicated by the vncserver.
3. Run the xhost + command to allow external connections to the vncserver.
4. Connect to the vncserver by using the vncviewer you installed that uses the same host
and port that the vncserver is running on. Enter the password that you set.
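As a sketch, these steps resemble the following commands; the host name and the display number that vncserver reports (here :1) are examples:
vncserver                     # prompts for a password on first use
export DISPLAY=aixhost01:1    # use the display that vncserver reports
xhost +                       # allow external connections to the vncserver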
The unzip utility must be version 6. You cannot work around this requirement with an older
version of unzip because it cannot handle the file size. You also cannot use the JRE to perform
the extraction because the permissions on the extracted files are incorrect and the runInstaller
does not work.
Unzip also is included in the AIX Toolbox for Linux Applications. If you do not want to install
the full toolbox, you can download the rpm for unzip 6 from this web page.
Ulimit must be set for root, grid, and oracle users. Without the correct values, copying the
binaries onto the file system can fail because the zip file size exceeds the default ulimit
maximum file size. Setting the values to unlimited helps to avoid any issues, as shown in the
following example:
chuser threads='-1' nproc='-1' fsize='-1' data='-1' rss='-1' nofiles='-1' root
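The same values can be applied to the grid and oracle users, as shown in the following sketch:
chuser threads='-1' nproc='-1' fsize='-1' data='-1' rss='-1' nofiles='-1' grid
chuser threads='-1' nproc='-1' fsize='-1' data='-1' rss='-1' nofiles='-1' oracle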
By default, the input/output completion port (IOCP) is set to Defined. To enable IOCP, set it to
Available by running the following commands as root:
mkdev -l iocp0
chdev -l iocp0 -P -a autoconfig='available'
Verify the settings by using the lsdev command to confirm that the IOCP status is set to
Available:
# lsdev | grep iocp
iocp0 Available I/O Completion Ports
Note: If IOCP was not defined, an AIX restart is required. Without this setting, the Oracle
Database installer fails. The Grid Infrastructure installer does not check the status of IOCP;
however, your ASM disks are not available.
Note: You cannot apply updates that have a lower build sequence identifier.
If you update AIX, it is recommended to relink the Oracle Home binaries. This process is done
as the oracle user by using the relink all command.
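As a sketch (the Oracle Home path matches the layout that is used later in this chapter and might differ in your environment), the relink is run as follows:
# Run as the oracle user after the AIX update
export ORACLE_HOME=/u01/app/product/19c
export PATH=$ORACLE_HOME/bin:$PATH
relink all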
The latest AIX Technology Level (TL) or Service Pack (SP) can be downloaded from this IBM
Fix Central web page (search for “AIX Technology Levels” to display the available levels).
5.4.1 Configuring LPAR profile for Shared Processor LPAR performance (best
practice)
If you use shared processors, correctly set and adjust entitled capacity (EC) and virtual
processor (VP) settings (rule of thumb: up to 30% gap range between EC and VP settings) to
mitigate CPU folding activity.
These lines are an extract from the lparstat output, showing only the values that are
discussed here.
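As a sketch, the relevant values can be displayed from the LPAR as follows:
# Check the entitled capacity and virtual processor settings
lparstat -i | grep -E "Entitled Capacity|Online Virtual CPUs|Type|Mode"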
5.4.2 Checking that Power Saver Mode is set to Maximum Performance Mode
(best practice)
This process is done by using the HMC (from the Advanced System Management menu
[ASM] or the command-line interface [CLI]). This setting is the default on models from the
S924 to the E980.
To access the Advanced System Management menu from the HMC, complete the following
steps:
1. Select the server and then, Operations → Launch Advanced System Management
(ASM).
You are prompted for an administrator login and password.
2. Click ASM menu → System Configuration → Power Management → Power and
Performance Mode Setup (or Power Mode Setup for POWER8). Select the Enable
Maximum Performance mode option if it is not yet selected, as shown in Figure 5-1 on
page 52 (or, select Enable Fixed Maximum Frequency mode for POWER8).
The Speculative execution fully enabled option is described in the IBM Support
documentation1:
This optional mode is designed for systems where the hypervisor, operating system, and
applications can be fully trusted. Enabling this option can expose the system to
CVE-2017-5753, CVE-2017- 5715, and CVE-2017-5754. This includes any partitions that
are migrated (by using Live Partition Mobility) to this system. This option has the least
possible impact on the performance at the cost of possible exposure to both User
accessible data and System data.
If your entire environment is protected against Spectre and Meltdown vulnerabilities, you can
disable the security on the POWER9 frame by using the ASM interface from the HMC.
1 https://www.ibm.com/support/pages/how-disableenable-spectremeltdown-mitigaton-power9-systems
Note: This change is possible only when the server is powered off.
If at some time you must use live partition mobility to a POWER8 server, remain on
POWER8 mode.
The core and memory affinity map must be closely aligned as much as possible for optimal
performance. Check the affinity by using the lssrad -va command from the AIX instance that
is running on your LPAR. This command reports the logical map view.
As shown on the left side in Figure 5-2, the CPUs and memory are not aligned. In this case,
consider shutting down the frame and restarting the LPARs (starting with the largest) to align
the CPU and memory allocation. The result looks more like the example that is shown on the
right side of Figure 5-2.
Tip: With a shared processor system that is running RAC, it is suggested to set the
vpm_xvcpus parameter from schedo to 2 to avoid RAC node evictions under light database
workload conditions (schedo -p -o vpm_xvcpus=2).
5.5 Memory
The installation document suggests that the minimum RAM for a database installation is 1
GB; however, 2 GB is recommended unless you use Grid Infrastructure. In that instance, 8
GB is recommended.
From the testing that we performed, it is clear that the database software can be installed with
a small amount of memory; however, dbca fails with less than 8 GB of RAM (+swapspace).
Therefore, we recommend 8 GB of RAM as a minimum without Grid Infrastructure and 24 GB
with it.
In the environment we created to validate this document, we allocated 8 GB, but the key
factor in defining the memory for a partition is the memory that is required for the SGA and
PGA or memory target of the database instance for the workload that you are running.
Oracle recommends the following swap space allocations, based on the amount of RAM:
RAM of 1 GB - 2 GB: 1.5 times the size of the RAM
RAM of 2 GB - 16 GB: Equal to the size of the RAM
RAM of more than 16 GB: 16 GB
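The following sketch shows how to review the paging space and enlarge it if needed; hd6 and the number of logical partitions to add are examples:
lsps -a            # list paging spaces and their usage
chps -s 8 hd6      # add 8 logical partitions to the hd6 paging space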
For more information, see the LDR_CNTRL Settings in MOS Note, USE OF AIX
LDR_CNTRL ENVIRONMENTAL SETTINGS WITH ORACLE (Doc ID 2066837.1).
It is not sufficient to add the addresses to /etc/hosts. Oracle must detect them by using
nslookup. You must add the DNS details in the resolv.conf file on your server.
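A minimal sketch of the DNS client configuration and its verification follows; the domain, name server address, and host name are placeholders:
cat >/etc/resolv.conf <<'EOF'
domain example.com
nameserver 192.0.2.10
EOF
nslookup dbhost01.example.com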
Note: To use flow control as efficiently as possible, it must be enabled on all
network components (including the network switch).
Enable jumbo frames (jumbo_frame=yes) on a per-interface basis; you can check the current
setting by using the lsattr -El <interface name> -a jumbo_frame command.
Note: The network switch must be jumbo frame capable with jumbo frame support
enabled.
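As a sketch (ent0 is an example adapter, and -P defers the change until the device is reconfigured or the system is restarted), jumbo frames can be enabled and verified as follows:
chdev -l ent0 -a jumbo_frame=yes -P
lsattr -El ent0 -a jumbo_frame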
Virtual Ethernet
Enable largesend (mtu_bypass=on) on a per-interface basis; you can check the current setting
by using the lsattr -El <interface name> -a mtu_bypass command.
Set the following parameters to 4096 by using the chdev command on the Virtual Ethernet
adapter (inherited from SEA):
min_buff_tiny=max_buff_tiny=min_buff_small=max_buff_small=4096
The ntpd daemon maintains the system time of day in synchronism with internet-standard
time servers by exchanging messages with one or more configured servers at designated poll
intervals.
Under ordinary conditions, ntpd adjusts the clock in small steps so that the timescale is
effectively continuous. Under conditions of extreme network congestion, the round-trip delay
jitter can exceed three seconds and the synchronization distance (which is equal to one-half
the round-trip delay plus error budget terms) can become large.
As a result of this behavior, after the clock is set, it rarely strays more than 128 ms, even
under extreme cases of network path congestion. Sometimes, especially when ntpd is first
started, the error might exceed 128 ms. With RAC, this behavior is unacceptable. If the -x
option is included on the command line, the clock is never stepped and only slew corrections
are used.
Note: The -x option ensures that time is not set back, which is the key issue.
The -x option sets the threshold to 600 s, which is well within the accuracy window to set the
clock manually.
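A sketch of restarting the AIX xntpd subsystem with the -x option follows; to make the option persistent across restarts, also add -x to the xntpd start line in /etc/rc.tcpip:
stopsrc -s xntpd
startsrc -s xntpd -a "-x"
lssrc -ls xntpd        # verify that the daemon is active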
After the cluster is installed, you can ignore the following message in one of the alertORA.log
of the cluster:
[ctssd(2687544)]CRS-2409:The clock on host rac1 is not synchronous with the mean
cluster time. No action has been taken as the Cluster Time Synchronization Service
is running in observer mode.
Note: RAC also has its own time sync method, but this method is overridden by NTPD.
This parameter prevents CPU usage from skyrocketing in the case of memory page reclaims
under certain circumstances.
5.9 Storage
In this section, we discuss high-level storage issues. For more information, see Chapter 8,
“Oracle Database and storage” on page 111.
We recommend the use of eight (or a multiple of eight) LUNs to improve I/O flow. We also
recommend limiting the size of the LUNs to 512 GB.
Note: Normal practice is to use external redundancy on AIX because the redundancy is
provided by the storage.
5.9.2 Setting the queue depth to allow enough I/O throughput (best practice)
To check the current queue_depth value for an HDisk device, use the following command:
# lsattr -El hdisk1 -a queue_depth
To check the range of settings that is allowed by the driver, run the following command:
# lsattr -Rl hdisk1 -a queue_depth
1...256 (+1)
To set a new value with your own device and value, use the following command:
chdev -l hdisk1 -a queue_depth=256 -P
The value of max_transfer must be set to a minimum of 0x100000. Its value can be checked
by using the lsattr command. The value can be set by using the chdev command in the
same way.
The value of the algorithm attribute is set to shortest_queue, again by using the chdev and
lsattr commands.
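The following sketch shows these checks and changes for one LUN (hdisk1 is an example, and -P defers the change until the device is reconfigured or the system is restarted):
lsattr -El hdisk1 -a max_transfer
chdev -l hdisk1 -a max_transfer=0x100000 -P
lsattr -El hdisk1 -a algorithm
chdev -l hdisk1 -a algorithm=shortest_queue -P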
Example output:
# lsfs -q /u01
Name Nodename Mount Pt VFS Size Options Auto
Accounting
/dev/oralv -- /u01 jfs2 1002438656 noatime,rw yes no
(lv size: 1002438656, fs size: 1002438656, block size: 4096, sparse files: yes,
inline log: yes, inline log size: 512, EAformat: v1, Quota: no, DMAPI: no, VIX:
yes, EFS: no, ISNAPSHOT: no, MAXEXT: 0, MountGuard: no)
For the Oracle Database, set multiple I/O queues by configuring several LUNs to improve I/O
flow, particularly if the queue depth value is low.
When hosting the Oracle Database on JFS2, create a scalable Volume Group (VG) with
Physical Partition (PP) size set to 32 MB:
mkvg -S -y'<VG name>' -s '32' '-f' <space separated hdisk list>
Create separate JFS2 file systems for the following files:
Data files (4 K default)
Redolog files (512 bytes)
All data files are spread across the LUN set (-e'x' flag) by using the noatime option and inline
JFS2 logging:
For each Logical Volume (LV):
mklv -y'<LV name>' -t'jfs2' -e'x' <VG name> <LV size in PP unit>
Check the LV spreading by using the following command; the INTER-POLICY value must be
tagged as maximum:
lslv <LV name> | grep "INTER-POLICY"
Sample output:
INTER-POLICY: maximum RELOCATABLE: yes
For data files:
crfs -v jfs2 -d'<LV name>' -m'<JFS2 mount point>' -A'yes' -p'rw' -a
options='noatime' -a agblksize='4096' -a logname='INLINE' -a isnapshot='no'
For redolog and control files:
crfs -v jfs2 -d'<LV name>' -m'<JFS2 mount point>' -A'yes' -p'rw' -a
options='noatime' -a agblksize='512' -a logname='INLINE' -a isnapshot='no'
Example output:
# lsfs -q /oracle
Name Nodename Mount Pt VFS Size Options Auto
Accounting
/dev/oralv -- /oracle jfs2 1002438656 noatime,cio,rw
yes no
(lv size: 1002438656, fs size: 1002438656, block size: 4096, sparse files: yes,
inline log: yes, inline log size: 512, EAformat: v1, Quota: no, DMAPI: no, VIX:
yes, EFS: no, ISNAPSHOT: no, MAXEXT: 0, MountGuard: no)
For the Oracle Database, set multiple I/O queues by configuring several LUNs to improve I/O
flow, particularly if the queue depth value is low.
When hosting the Oracle Database on JFS2, create a scalable Volume Group (VG) with
Physical Partition (PP) size set to 32 MB:
mkvg -S -y'<VG name>' -s '32' '-f' <space separated hdisk list>
Create separate JFS2 file systems for the following files:
Data files (4 K default)
Redolog files (512 bytes)
Control files (512 bytes)
The hdisk2 is available to use for ASM. You see that the Physical Volume Identifier (PVID)
and volume group are both set to none. If not, ASM cannot see the disk.
If the PVID is not showing as none, reset it by using the following command:
chdev -l hdisk2 -a pv=clear
You can check the value of hdisk2 by using the following command:
lsattr -El hdisk2 -a reserve_policy
reserve_policy no_reserve Reserve Policy True+
The permissions on the rhdisks must be set to 660. These rhdisks are owned by the grid user
and the group asmadmin. If you must rename the devices, you can do so by using the rendev
command.
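The following sketch summarizes the preparation of one disk for ASM use (hdisk2 is an example):
chdev -l hdisk2 -a pv=clear                    # clear the PVID
chdev -l hdisk2 -a reserve_policy=no_reserve   # disable the SCSI reservation
chown grid:asmadmin /dev/rhdisk2               # character device that ASM uses
chmod 660 /dev/rhdisk2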
Installer (mandatory)
The grid infrastructure installer requests 79 GB of free space to install GI, but it needs less
than 5 GB at this stage because the binaries are already unpacked on the disk.
The runfixup.sh function fixes any minor configuration issues that are prerequisites, but it
does not apply the recommended best practices.
During the grid infrastructure installation process, you can ignore the space and swap space
requirements.
5.9.6 Required file sets for Oracle Database software installation (mandatory)
To check whether the necessary file sets are installed, run the following commands:
lslpp -l bos.adt.base bos.adt.lib bos.adt.libm bos.perf.perfstat
bos.perf.libperfstat bos.perf.proctools
lslpp -l|grep xlC
If you are running AIX 7.2 TL 2 SP1, this documentation also recommends installing APAR:
IJ06143.
The following package also is required for 19c but is not in the documentation:
xlfrte.aix61 15.1.0.9
If you do not install this package, you get the error PRVF-7532: Package "xlfrte.aix61" is
missing.
When you click What is Fix Central (FC)?, you are required to log in by using your IBM ID
credentials.
Transfer the package to the server, unpack and install it by running the following command:
smit install_latest
Run the smitty command to confirm that the following file sets were installed:
File:
I:xlfrte 15.1.0.0
S:xlfrte 15.1.0.10
I:xlfrte.aix61 15.1.0.0
S:xlfrte.aix61 15.1.0.10
Note: This document does not cover the requirements for NLS language settings.
In the documentation, the minimum is stated as approximately 11 GB. You need more space
for staging the zip and with the log files that are created in the Oracle Home, you need
approximately 15 GB as a minimum to have a usable environment without any patching. In
our test installation, the final size of the installation alone was 12.6 GB without the zip file or
any database logs or trace files created.
Setting the user and group IDs to be the same helps if you need to transfer files or data
between servers.
In this scenario, we create a file system /u01 for all the Oracle files:
– /u01/app/oraInventory was created for the inventory files.
– /u01/app/oracle was created for the Oracle Base.
– /u01/app/product was created for the ORACLE_HOME to be created as
/u01/app/product/19c.
We chose to use JFS2 file systems to store our data files.
5. Create the ORACLE_HOME directory and cd to it.
From this directory, start the unzip command for the zip file as the oracle user.
Oracle recommends using the -q option:
unzip -q AIX.PPC64_193000_db_home.zip
Note: Unlike older versions of Oracle, the Oracle Home is not created within the Oracle
Base; it must be in a separate directory.
For this example, we created the /oradata file system that is owned by the oracle user for the
data files and redo logs.
Note: This process does not follow best practices; it is included here for demonstration
purposes only.
Your redo logs are on a separate file system with a block size of 512 bytes. The data files
must be in a file system that is mounted with the cio option. Alternatively, you can set the
parameter filesystemio_options to setall (see “Related publications” on page 127).
Note: The SGA of all databases on a partition must not exceed 70% of the available
memory on the server or partition.
9. We do not recommend the use of Memory Target. Although it is a useful tool if you do not
know the required memory size, assigning SGA and PGA targets avoids memory resize
operations that can affect performance. Click Next.
10. By default, the database is configured to include Enterprise Manager Database Express.
Clear this option if you do not want it to be installed. Click Next.
After the creation process completes, you are presented with another opportunity to change
Password Management options or you can close the dbca utility.
For memory allocation, we recommend setting SGA and PGA targets. Using a memory target
instead can result in frequent memory resize operations, which degrade performance.
As a rule, the total memory that is taken by the SGA and PGA must not exceed 70% of the
memory that is allocated on the machine. This requirement allows enough memory for the
Oracle processes and operating system.
In this chapter, we describe the deployment of Oracle RAC 19c on a two node cluster that is
based on IBM Advanced Interactive eXecutive (AIX) and PowerVM on IBM POWER9 servers
that use IBM Spectrum Scale as shared storage infrastructure.
The choice of deploying Oracle RAC on IBM POWER9 with AIX and IBM Spectrum Scale
provides a strong combination of performance, scalability, and availability for your mission
critical databases and applications. The combination (Oracle RAC, Power, AIX, and Spectrum
Scale) goes back many years, starting with Oracle RAC 9i2 to the most recent announcement
for Oracle RAC 19c3.
For more information about IBM POWER9 systems, see this IBM Redbooks web page.
6.2.1 CPU
POWER9 provides advanced processor capabilities, including the following examples that we
consider the most relevant for our deployment:
High frequency super scalar Reduced Instruction Set Computing (RISC) architecture.
Parallel execution of multiple threads (8-way Simultaneous Multi-Threading - SMT).
Optimized execution pipeline and integrated accelerators.
Highly granular allocation capabilities (0.01 processor increments).
Dynamic CPU sparing.
For more information, see IBM POWER Virtualization Best Practices Guide.
6.2.2 Memory
Memory management for POWER9 servers includes the following features:
Dynamic memory allocation (add/remove): Also known as DLPAR memory, this is the
traditional memory management that is based on the PowerVM and Reliable Scalable
Cluster Technology (RSCT) features.
Active Memory Expansion (available as a priced feature): Provides in-memory data
compression, which provides expanded memory capacity for your system. This feature
relies on the on-chip compression feature of the Power processor.
Active Memory Sharing: Provides a physical pool of memory that is shared among a group
of partitions. This feature is implemented in PowerVM.
1
Availability is a combination of hardware and software working in synergy for providing the uptime required by your
applications’ service level agreement. Combined with the AIX operating system and management software, the
POWER9 platform can provide “five nines” uptime (99.999% availability). For more information, see:
https://www.ibm.com/it-infrastructure/us-en/resources/power/five-nines-power9.
2
For latest support matrix, see: https://www.oracle.com/database/technologies/tech-generic-unix-new.html
3 See http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10907
Availability for LPAR access to the external network can be provided by various techniques
and combinations thereof, including the following examples:
Redundant VIO Server configuration with SEA failover (with optional dual Ethernet
adapters and link aggregation in each VIO Server).
Dual Ethernet adapters (physical, virtual, SR-IOV, or vNIC) configuration in LPAR with
network interface backup configuration in AIX.
Redundant VIO Servers with vNIC failover.
4
Oracle Database binaries also can be deployed on shared storage; however, for availability during upgrades, we
decided not to use shared storage for database binaries.
The two-node cluster configuration starts by installing the base operating systems and
required packages for deploying Oracle RAC solution (Oracle Clusterware and Oracle
Database).
The AIX installation (AIX 7.2 TL4 SP2) procedure is beyond the scope of this document. Use
the installation method of your choice to deploy the operating system and other packages that
are required for your environment.
For more information about the required operating system (AIX operating system) and
packages, see the following Oracle documents, which are available at this Oracle Database
19c web page:
Grid Infrastructure Installation and Upgrade Guide - 19c for IBM AIX on POWER Systems
(64-Bit), E96274-03
Oracle Database Installation Guide - 19c for IBM AIX on POWER Systems (64-bit),
E96437-04
Although we can run all commands on each node, we can optionally use the distributed
systems management toolkit for AIX that allows us to run commands from a single node onto
multiple nodes (in our case, two nodes).
The packages that are used for this purpose are named dsm.core and dsm.dsh. These
packages provide the tools to run commands and copy files on multiple nodes by using a
single point of control (one of the nodes in the cluster).
Tip: The distributed systems management tools can be installed on only one of the nodes
in your cluster.
Example 6-2 shows the distributed systems management configuration. We use the dsh
command to run commands on both nodes in the cluster. Because the remote command
uses /usr/bin/ssh, the Secure Shell (ssh) must be configured to run commands on all nodes
without prompting the user for a password. Make sure that your nodes are configured for
password-less remote ssh command execution.
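A quick sketch of this verification follows; the host names are the ones that are used in our example environment:
ssh aop93cld24 date
ssh aop93cl093 date
dsh date              # runs the command on both nodes through dsh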
Important: Password-less remote commands execution with ssh also is required for
Spectrum Scale configuration and Oracle RAC installation and configuration.
Per Oracle Grid installation instructions, we also configured the SSH server by using the
following parameter:
LoginGraceTime 0
Example 6-5 shows the real memory that we configured on our nodes.
Tip: If the AIX Toolbox for Linux Applications packages are installed (or other open source
packages), we recommend that you set the AIX MANPATH variable (in /etc/environment)
to include the path to the man pages of the installed products (for example, add
/opt/freeware/man to your current MANPATH).
Example 6-8 shows the VMM5 tuning parameters on our cluster nodes.
Example 6-8 Virtual memory manager (VMM) parameters - same on both cluster nodes
# vmo -L minperm%
NAME CUR DEF BOOT MIN MAX UNIT TYPE
DEPENDENCIES
--------------------------------------------------------------------------------
minperm% 3 3 3 1 100 %% memory D
--------------------------------------------------------------------------------
# vmo -L maxperm%
NAME CUR DEF BOOT MIN MAX UNIT TYPE
DEPENDENCIES
--------------------------------------------------------------------------------
maxperm% 90 90 90 1 100 %% memory D
minperm%
maxclient%
--------------------------------------------------------------------------------
# vmo -L maxclient%
NAME CUR DEF BOOT MIN MAX UNIT TYPE
DEPENDENCIES
--------------------------------------------------------------------------------
maxclient% 90 90 90 1 100 %% memory D
maxperm%
minperm%
--------------------------------------------------------------------------------
# vmo -L strict_maxclient
NAME CUR DEF BOOT MIN MAX UNIT TYPE
DEPENDENCIES
--------------------------------------------------------------------------------
strict_maxclient 1 1 boolean d
strict_maxperm
--------------------------------------------------------------------------------
# vmo -L strict_maxperm
NAME CUR DEF BOOT MIN MAX UNIT TYPE
DEPENDENCIES
--------------------------------------------------------------------------------
strict_maxperm 0 0 boolean d
strict_maxclient
--------------------------------------------------------------------------------
Example 6-10 shows the maximum user processes and block size allocation (system
parameters) on our cluster nodes.
The network configuration for our test environment consists of the following parameters:
Network tuning options parameters, as shown in Example 6-11.
6 See https://www.ibm.com/support/pages/aix-virtual-processor-folding-misunderstood
Network interfaces, as shown in Example 6-12. For our test environment, we use ent0 for
Public LAN and ent2 for Private LAN (RAC Interconnect).
<node2> (aop93cl093)
-------------------------------------------------------------------------------
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
en0 1500 link#2 fa.f6.ea.66.14.20 12474675 0 11019992 0 0
en0 1500 129.xx.xx aop93cl093 12474675 0 11019992 0 0
en1 1500 link#3 fa.16.3e.23.22.6c 505 0 404 0 0
en1 1500 10.10.1 node2-ic 505 0 404 0 0
en2 1500 link#4 62.66.4a.82.88.7 30714349 0 28337844 0 0
en2 1500 10.20.1 node2-ic2 30714349 0 28337844 0 0
lo0 16896 link#1 3607774 0 3607774 0 0
lo0 16896 127 loopback 3607774 0 3607774 0 0
lo0 16896 loopback 3607774 0 3607774 0 0
The host names and basic network configuration, as shown in Example 6-14.
<node2> (aop93cl093)
-------------------------------------------------------------------------------
authm 65536 Authentication Methods True
bootup_option no Use BSD-style Network Configuration True
gateway Gateway True
hostname aop93cl093.pbm.ihost.com Host Name True
rout6 IPv6 Route True
route net,-hopcount,0,,0,129.xx.xx.xxx Route True
Network name resolution for our test environment is configured for the use of static IP
addresses that are resolved to IP labels locally (/etc/hosts), and the use of Domain
Name Server (DNS, which is configured in /etc/resolv.conf), in this specific order. The
network name resolution order is set in /etc/netsvc.conf.
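As a sketch, the resolution order that we used corresponds to the following entry in /etc/netsvc.conf (resolve locally first, then through DNS):
# /etc/netsvc.conf
hosts = local, bind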
Time synchronization
Generally, time synchronization between cluster nodes is required in any cluster environment.
In our test environment, we use Network Time Protocol (NTP) client on both nodes, which is
configured in /etc/ntp.conf, as shown in Example 6-15.
After the changes are made in your environment, you must restart the xntpd service to pick
up the changes (stopsrc -s xntpd; startsrc -s xntpd). To check the xntpd, you can use
the command that is shown in Example 6-16.
Check that both nodes in the cluster use the same NTP configuration by using the dsh date
command, as shown in Example 6-17.
The created users feature the properties and capabilities that are shown in Example 6-19.
# dsh lsuser -f -a capabilities fsize cpu data stack core rss nofiles grid|dshbak
-c
HOSTS -------------------------------------------------------------------------
aop93cl093, aop93cld24
-------------------------------------------------------------------------------
grid:
capabilities=CAP_NUMA_ATTACH,CAP_BYPASS_RAC_VMM,CAP_PROPAGATE
fsize=-1
cpu=-1
data=-1
stack=-1
core=2097151
rss=-1
nofiles=-1
We set the logon password for the grid and oracle users and created a file in each user's
home directory, as shown in Example 6-20.
## For node1:
# su - oracle
$ cat ~/.oraenv
export ORACLE_HOME=/u02/app/oracle/product/19.0.0/dbhome_1
export ORACLE_SID=itsodb1
export PATH=$PATH:${ORACLE_HOME}/bin
$ exit
##For node2:
# su - oracle
$ cat ~/.oraenv
export ORACLE_HOME=/u02/app/oracle/product/19.0.0/dbhome_1
export ORACLE_SID=itsodb2
export PATH=$PATH:${ORACLE_HOME}/bin
$ exit
Important: The storage configuration must be adjusted to fit your configuration needs. For
more information, see the following publications:
Chapter 7 of the Oracle manual Grid Infrastructure Installation and Upgrade Guide -
19c for IBM AIX on POWER Systems (64-Bit), E96274-03.
Chapter 1 of the Oracle manual Oracle Database Installation Guide - 19c for IBM AIX
on POWER Systems (64-bit), E96437-04.
The local storage configuration consists of two non-shared volumes (LUNs), each 100 GB, on
each cluster node. One disk volume is used for rootvg (AIX operating system) and the
second volume is used for Oracle binaries:
Local (non-shared) volumes are shown in Example 6-21.
Paging space also is configured per Oracle requirements, as shown in Example 6-22. Per
Oracle requirements, 16 GB of swap space is required as a minimum for systems with real
memory that is larger than 16 GB.
Note: You must correctly size the memory and paging space for your environment to
avoid the “out of paging space” error in AIX.
Local file systems that are used for our environment are shown in Example 6-23. The /u02
file system is used for Oracle binaries (typically, /u01 is used as the name of the file
system; however, this name is only a convention).
Tip: Example 6-24 shows that a LV is used for JFS2 logging operations. You also can
configure your JFS2 file systems with inline logging.
HOSTS -------------------------------------------------------------------------
aop93cl093
-------------------------------------------------------------------------------
Filesystem GB blocks Used Available Capacity Mounted on
/dev/fslv05 90.00 46.82 43.18 53% /u02
Directory structure that was prepared for Oracle installation is shown in Example 6-25.
Example 6-25 Directory structure for Oracle Grid and Oracle Database
##Oracle Grid
##Oracle Database
# dsh mkdir -p /u02/app/oracle
# dsh mkdir -p /u02/app/oraInventory
# dsh chown -R oracle:oinstall /u02/app/oracle
# dsh chown -R oracle:oinstall /u02/app/oraInventory
# dsh chmod -R 775 /u02/app
Note: Because we do not use ASM in our environment, we do not prepare a shared
storage for this purpose.
Note: The Spectrum Scale configuration that is shown in this section is provided for your
reference. Your configuration can differ, depending on your deployment requirements.
For more information about IBM Spectrum Scale installation and configuration, see this IBM
Documentation web page.
For more information about the latest Spectrum Scale Frequently Asked Questions, see
this web page.
It is beyond the scope of this document to describe configuring external devices, such as
PowerVM VIO Servers and NPIVa, Storage Area Network, and external storage
subsystems configuration.
a. N-Port ID Virtualization (Fibre Channel virtualization)
Configuration steps
We completed the following steps:
1. Spectrum Scale software was installed on our nodes, as shown in Example 6-26.
We also set the path variable ($PATH) to include the Spectrum Scale binaries path.
# echo $PATH
/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/usr/java7_64/jre/bin:/usr/jav
a7_64/bin:/usr/lpp/mmfs/bin:/u02/app/19.0.0/grid/bin
2. We use the network configuration that is shown in Example 6-27 for our Spectrum Scale
cluster.
For our cluster configuration, we used en0.
Because Spectrum Scale is used as back-end storage for Oracle RAC, we configured the
local name resolution (/etc/hosts) for Spectrum Scale.
3. We tested the secure shell password-less access between Spectrum Scale cluster nodes,
as shown in Example 6-28. For more information about the SSH configuration, see IBM
Spectrum Scale Version 5.0.3: Concepts, Planning, and Installation Guide, SC27-9567.
4. We use the disks for Spectrum Scale that are shown in Example 6-29 on page 90. We
configured these disks with SCSI-3 Persistent Reservation (only one of the disks is shown
as an example).
5. We created the nodes descriptor file that is shown in Example 6-30 for Spectrum Scale
Cluster definition.
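A minimal sketch of the cluster creation from that descriptor file follows; the descriptor file name and cluster name are examples:
mmcrcluster -N gpfs_nodes -C itsocluster -r /usr/bin/ssh -R /usr/bin/scp
mmlscluster        # verify the cluster definition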
6. We applied the Spectrum Scale license in our cluster, as shown in Example 6-32.
Summary information
---------------------
Number of nodes defined in the cluster: 2
Number of nodes with server license designation: 2
Number of nodes with FPO license designation: 0
Number of nodes with client license designation: 0
Number of nodes still requiring server license designation: 0
Number of nodes still requiring client license designation: 0
This node runs IBM Spectrum Scale Advanced Edition
7. We created the disks descriptor files that are shown in Example 6-33 for our cluster and
added the NSDs to the cluster configuration. We also created the following files:
– gpfs_tie_disks, which was used to define NSDs that were used for cluster tiebreaker
configuration.
– gpfs_data_disks_oradata, which was used to define NSDs that were used for Oracle
shared data file system.
# cat gpfs_data_disks_oradata
%nsd: device=/dev/hdisk9
nsd=data11
servers=aop93cld24,aop93cl093
usage=dataAndMetadata
failureGroup=1
pool=system
%nsd: device=/dev/hdisk10
nsd=data12
servers=aop93cld24,aop93cl093
usage=dataAndMetadata
failureGroup=1
pool=system
%nsd: device=/dev/hdisk11
nsd=data13
servers=aop93cld24,aop93cl093
usage=dataAndMetadata
failureGroup=1
pool=system
%nsd: device=/dev/hdisk12
nsd=data14
servers=aop93cld24,aop93cl093
usage=dataAndMetadata
failureGroup=1
pool=system
%nsd: device=/dev/hdisk13
nsd=data21
servers=aop93cld24,aop93cl093
usage=dataAndMetadata
failureGroup=2
pool=system
%nsd: device=/dev/hdisk14
nsd=data22
servers=aop93cld24,aop93cl093
usage=dataAndMetadata
failureGroup=2
pool=system
%nsd: device=/dev/hdisk15
nsd=data23
servers=aop93cld24,aop93cl093
usage=dataAndMetadata
failureGroup=2
pool=system
%nsd: device=/dev/hdisk16
nsd=data24
servers=aop93cld24,aop93cl093
usage=dataAndMetadata
failureGroup=2
Example 6-34 shows the NSD creation and verification in our cluster. Currently, the NSDs
are not assigned to any file system.
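As a sketch, the NSDs are created from the descriptor files and then listed as follows:
mmcrnsd -F gpfs_tie_disks
mmcrnsd -F gpfs_data_disks_oradata
mmlsnsd -X         # list the NSDs and the local devices that back them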
8. Example 6-35 shows the cluster started and configured with NSD tiebreaker. Note the (*)
after the Quorum value (1) in the last mmgetstate output command. This asterisk indicates
that the Spectrum Scale cluster used nodes with tiebreaker disks as the cluster quorum.
# mmgetstate -aL
Node number Node name Quorum Nodes up Total nodes GPFS state Remarks
----------------------------------------------------------------------------------
1 aop93cld24 2 2 2 active quorum node
2 aop93cl093 2 2 2 active quorum node
# mmchconfig tieBreakerDisks="tie1;tie2;tie3"
........
# mmlsconfig tieBreakerDisks
tiebreakerDisks tie1;tie2;tie3
# mmgetstate -aL
Node number Node name Quorum Nodes up Total nodes GPFS state Remarks
----------------------------------------------------------------------------------
1 aop93cld24 1* 2 2 active quorum node
2 aop93cl093 1* 2 2 active quorum node
Example 6-37 shows the file system creation and activation. For Oracle RAC
deployments, check the My Oracle Support Document Doc ID 2587696.1 for Spectrum
Scale file system parameters recommendations.
Tip: The file system we created contains a single pool that is named system. The
system pool contains disks for data and metadata. The data and metadata disks are
divided into two failure groups (1 and 2).
Also, data and metadata mirroring is configured for all files. The two copies of each data
and metadata block are stored in separate failure groups.
In addition to the two failure groups that are used for data and metadata, the system
pool also contains a disk that is used as a file system descriptor quorum (failure group
3).
We created the file system first and then, we added one disk to be used as file system
descriptor quorum. We used the following NSD descriptor file for this purpose:
# cat gpfs_tie_oradata_disk
%nsd: device=/dev/hdisk6
nsd=tie1
usage=descOnly
failureGroup=3
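A sketch of the file system creation and of adding the descriptor-quorum disk follows; the device name, mount point, and replication settings reflect the description in the tip, but the exact parameters for your environment might differ:
mmcrfs oradata -F gpfs_data_disks_oradata -T /oradata -m 2 -M 2 -r 2 -R 2 -A yes
mmadddisk oradata -F gpfs_tie_oradata_disk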
# mmmount all -a
# mmlsmount all -L
Tip: For more information about the parameters that are used when the file system is
created, see the man page of the mmcrfs command.
Note: The distribution of the three file system descriptors is one per failure group.
9. We configured the directory structure and permissions that are shown in Example 6-39 for
deploying Oracle Clusterware and one Oracle RAC database. The Oracle Cluster
Registry (OCR) and Vote files are deployed in the /oradata/crs_files2 directory; the
Oracle data files are deployed in the /oradata/itsodb directory.
Important: We do not provide step-by-step Oracle Grid and Oracle Database installation
instructions because this process is documented by Oracle and guided by the Oracle
Universal Installer (OUI). Instead, we focus on the configuration parameters we selected
for our deployment.
Although specific actions can be performed on a single node during the installation and
configuration of Oracle software, some tasks must be performed on both nodes.
Tip: In our test environment, we used an older version of VNC. You can use the graphics
server of your choice (for example, tightvnc).
The VNC configuration files can be found in each user’s home directory in the ~/.vnc folder.
Example 6-40 shows the VNC server that is configured for grid and oracle users.
Example 6-40 VNC started for grid and oracle users (node1)
# rpm -qa |grep vnc
vnc-3.3.3r2-6.ppc
# ps -aef |grep vnc |egrep "grid|oracle"
grid 9372104 1 0 Sep 29 - 0:00 Xvnc :2 -desktop X -httpd
/opt/freeware/vnc/classes -auth /home/grid/.Xauthority -geometry 1024x768 -depth 8 -rfbwait
120000 -rfbauth /home/grid/.vnc/passwd -rfbport 5902 -nolisten local -fp
/usr/lib/X11/fonts/,/usr/lib/X11/fonts/misc/,/usr/lib/X11/fonts/75dpi/,/usr/lib/X11/fonts/100dpi
/,/usr/lib/X11/fonts/ibm850/,/usr/lib/X11/fonts/Type1/
We log on as grid and extract the Oracle 19c grid_home.zip installation archive in the
directory that is shown in Example 6-41.
$ unzip /mnt1/db/19.3/AIX.PPC64_193000_grid_home.zip
.........
We log on as oracle and extract the Oracle 19c db_home.zip installation archive in the
directory that is shown in Example 6-42.
$ unzip /mnt1/db/19.3/AIX.PPC64_193000_db_home.zip
.........
Note: Before starting the installation process, check the My Oracle Support document
INS-06006 GI RunInstaller Fails If OpenSSH Is Upgraded to 8.x (Doc ID 2555697.1).
You can "pre-patch" the 19.3 binaries before running the installer with an Oracle Release
Update that contains the fix for this issue. In this case, you do not need the workaround that
is described in Doc ID 2555697.1.
The Patch ID that contains the fix for this issue is 32545008, which is part of the April 2021
Release Update (RU):
gridSetup.sh -applyRU /<Staging_Path>/grid/32545008
Where the <Staging_Path> is the staging directory where the RU from April 2021 was
extracted (in our test environment, this directory is /u02/stage).
At the end of the Oracle Grid installation process, we checked the results by using the
following procedure:
1. We logged on to node1 as root and checked the path variable:
# echo $PATH
/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/usr/java7_64/jre/bin:/usr/
java7_64/bin:/usr/lpp/mmfs/bin:/u02/app/19.0.0/grid/bin
2. From the VNC GUI as the grid user, we started the installation process:
$ /u02/app/19.0.0/grid/gridSetup.sh
We then followed the OUI menus and instructions. This installation does not use or create
an Oracle ASM configuration because ASM is not required for Oracle RAC with Spectrum
Scale deployments.
3. Example 6-43 shows the Oracle Grid infrastructure that was installed, configured, and
running. Because we used shared file system (Spectrum Scale) for Oracle Clusterware
Repository (OCR) and Vote files, ASM (although installed) is not configured and has no
ONLINE resources.
Note: In our test configuration, we used only one SCAN VIP. Oracle recommends that
three SCAN VIP addresses (configured in DNS) are used for standard deployments.
The Oracle Clusterware Repository and Vote files are shown in Example 6-44.
Example 6-44 OCR and Vote files (located in shared file system)
# ocrcheck -config
Oracle Cluster Registry configuration is :
Device/File Name : /oradata/crs_files2/ocr1
Device/File Name : /oradata/crs_files2/ocr2
Device/File Name : /oradata/crs_files2/ocr3
During the Oracle Database software installation process, we chose the software-only
installation option. The instance and database were created later by using the Oracle dbca utility.
4. After the database software was installed, we started the Database Configuration
Assistant from the VNC terminal window:
$ /u02/app/oracle/product/19.0.0/dbhome_1/dbca
5. We selected Create a Database, and then, followed the instructions. We chose to
configure a database named itsodb, with two instances (itsodb1/node1 and
itsodb2/node2). The shared file system location for this database is /oradata/itsodb.
6. For our installation, we selected Sample schemas and Oracle Enterprise Manager (EM)
database express.
7. After the installation completed, we checked the configuration by using the commands that
are shown in Example 6-46.
1 ONLINE ONLINE aop93cld24 STABLE
ora.asm(ora.asmgroup)
1 OFFLINE OFFLINE STABLE
2 OFFLINE OFFLINE STABLE
3 OFFLINE OFFLINE STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
1 OFFLINE OFFLINE STABLE
2 OFFLINE OFFLINE STABLE
3 OFFLINE OFFLINE STABLE
ora.cvu
1 ONLINE ONLINE aop93cl093 STABLE
ora.itsodb.db
1 ONLINE ONLINE aop93cld24 Open,HOME=/u02/app/o
racle/product/19.0.0
/dbhome_1,STABLE
2 ONLINE ONLINE aop93cl093 Open,HOME=/u02/app/o
racle/product/19.0.0
/dbhome_1,STABLE
ora.qosmserver
1 ONLINE ONLINE aop93cl093 STABLE
ora.scan1.vip
1 ONLINE ONLINE aop93cl093 STABLE
--------------------------------------------------------------------------------
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for IBM/AIX RISC System/6000: Version 19.0.0.0.0
- Production
Start Date 08-OCT-2020 09:13:29
Uptime 2 days 6 hr. 56 min. 11 sec
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=aop93cld24)(PORT=5500))(Security=(my_wa
llet_directory=/u02/app/oracle/product/19.0.0/dbhome_1/admin/itsodb/xdb_wallet))(P
resentation=HTTP)(Session=RAW))
Services Summary...
Service "8939c30fdc8c01b2e0530af11a191106" has 1 instance(s).
Instance "itsodb1", status READY, has 2 handler(s) for this service...
Service "b07f99e1d79401e2e05381285d5dd6c5" has 1 instance(s).
Instance "itsodb1", status READY, has 2 handler(s) for this service...
Service "itsodb" has 1 instance(s).
Instance "itsodb1", status READY, has 2 handler(s) for this service...
Service "pdb" has 1 instance(s).
Instance "itsodb1", status READY, has 2 handler(s) for this service...
For daily monitoring in a production environment, we recommend starting nmon with the
following options:
nmon -s60 -c1440 -f -d -V -^ -L -A -M
Note: The 60-second capture interval might be too short for continuous monitoring because
it can generate too much data for analysis. AIX runs topas_nmon collection by default; you
can disable it if nmon is used instead.
Where:
-s: The snapshot interval (in seconds), which defaults to 2. For a production system, 60 is
sufficient for daily performance monitoring. The reason for reducing the frequency is to
reduce the size of the output file. The effect on performance is not significant.
-c 1440: Gives 24 hours of capture.
-f: Specifies that the output is in spreadsheet format.
-d: Includes the Disk Service Time section in the view.
-V: Includes the disk volume group section.
-^: Includes the Fibre Channel (FC) sections.
-L: Includes the large page analysis section.
-A: Includes the Asynchronous I/O section in the view.
-M: Includes memory page size-specific data.
The command can be added to crontab to run at midnight each night to produce a daily nmon
monitoring file. These files can be useful as a reference if performance issues occur.
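For example, a root crontab entry that is similar to the following one starts a new 24-hour
capture at midnight every night. The output directory /nmon/data and the nmon path are
assumptions for this illustration; use a directory with sufficient free space on your system:
0 0 * * * cd /nmon/data && /usr/bin/nmon -s60 -c1440 -f -d -V -^ -L -A -M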
If you are investigating a specific issue that runs for a shorter period, you can decrease the
interval length by using the -s parameter to capture more detailed performance data. In
benchmarks, we often set the capture interval length to 10 seconds and run the nmon
command during the test workload that we want to monitor.
Other options are available that can be useful to activate if you are investigating a specific
issue; for example, if you are investigating a CPU spike, it is useful to add the -T option. This
option includes the top processes and their call parameters in the output.
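For example, a capture that is similar to the following one (the interval and snapshot count are
illustrative values) records one hour of data at a 10-second interval and includes the top
processes:
nmon -f -s10 -c360 -T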
For the full documentation of the nmon monitoring tool, see this IBM Documentation web
page.
We recommend that you collect nmon data on your AIX LPARs and on all involved Virtual I/O
Servers.
For more information about alternative post-processing tools for nmon data, see this web
page.
Note: Although Enterprise Manager Cloud Control allows the same data to be reviewed in
real time, this topic is not included here.
To create AWR reports, you must acquire the Oracle software license for the chargeable
Diagnostic and Tuning Pack add-on option. Then, you activate the Server Manageability Pack
option by running the following command in Oracle’s SQL*Plus tool:
ALTER SYSTEM SET control_management_pack_access='DIAGNOSTIC+TUNING' SCOPE=BOTH;
The AWR report provides operating system data for the server, but the numbers are not
always 100% accurate for the following reasons:
The operating system level CPU data is for the server or partition and not for the individual
database instance.
Oracle does not record DB CPU use correctly for AIX. You never see 90% CPU usage for
a database on AIX when SMT2, SMT4, or SMT8 are used because Oracle does not
include time that is spent waiting on the CPU from threads other than the first thread on
each core.
Note: Oracle includes system CPU statistics that show USR/SYS/WAIT as reported by
AIX. Where Oracle can be misleading is in reporting the CPU time that is used by the
DB. The higher the SMT level, the higher the discrepancy between reported CPU that is
used by Oracle and the actual CPU that is used.
The %DB time can add up to more than 100%. Despite this issue, knowing the proportion of
activity in relation to other events is still a valuable metric.
I/O statistics must be compared with the same statistics from the AIX hdisk layer and the
storage system. The observed values should closely match. If a significant discrepancy is
observed, bottlenecks exist somewhere in the I/O stack or SAN infrastructure that must be
investigated.
The AWR report is based on the delta between two metric snapshots in time. The longer the
time between those two snapshots, the more difficult it is to determine what occurred. For
example, the reported CPU utilization data is the same for a process that ran at 100% CPU
for 6 minutes or 10% CPU for 1 hour when the AWR report covers a 1-hour period.
If you are manually capturing the AWR snapshots for a specific issue, creating two snapshots
during the peak allows you to capture a clear image of what is occurring during the peak.
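As an illustration that uses the standard Oracle interfaces (not specific to our test environment),
two snapshots can be created manually from SQL*Plus around the peak window and the report
is then generated interactively:
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;
-- run or wait for the peak workload, then take the second snapshot
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;
@?/rdbms/admin/awrrpt.sql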
From Oracle 12c onwards, the AWR report contains the output from ADDM. This inclusion
can be useful for finding performance issues that were detected by the Oracle Database.
Often, a solution is proposed.
If you do not have the license for the Diagnostic and Tuning pack, STATSPACK is still available
but provides significantly less information and analysis.
7.4.1 CPU
A 100% CPU consumption rate is not necessarily an issue. The true indicator of a system that
is overloaded is the run queue. If the run queue is higher than the number of CPU threads,
the system is CPU bound. This information can be found in the PROC tab of the nmon report.
Running nmon from the command line allows you to see CPU use in real time and to review the
top CPU-consuming processes, which helps determine whether an Oracle process or some
other workload is using the CPU resources.
The AWR report can help to identify individual SQL requests that might be using excessive
CPU time.
7.4.2 Memory
Oracle asks for a page space (or swap space) that is twice the size of the memory available at
the time of running the installer. On AIX, we do not want to have any paging to paging space
at all. If the paging space is being used, a shortage of physical memory exists (or did exist)
that forced AIX to page memory pages to paging space.
You can see the paging space-related I/O activity in the PAGE tab of the nmon report. Current
usage of paging space can be determined by running lsps -a.
Note: Workloads exist in which not using CIO can be beneficial. The types of memory that
are allocated are shown in the MEMNEW tab in the nmon report.
If insufficient memory was reserved for the operating system and the database connections, a
high number of connections can result in memory swapping to paging space.
You can also find this information on the BBBP tab of the nmon report, which contains the
output of vmstat -v at the start of the capture and again at the end.
By comparing the two, you can determine whether the number of blocked pbufs increased
during the nmon capture. If so, this issue can be resolved by adding LUNs or by increasing
pv_pbuf_count by using the lvmo command.
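For example, assuming the data volume group is named datavg (an illustrative name), the
current LVM pbuf tunables can be displayed and the pbuf count per physical volume increased
as follows. The new value is an assumption; increase it in moderate steps and recheck the
blocked I/O counters:
# lvmo -v datavg -a
# lvmo -v datavg -o pv_pbuf_count=1024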
iostat -D shows whether queuing is occurring at the operating system level, as shown in
Figure 7-2.
Figure 7-2 shows that the avgserv of the I/O reads is 0.2 ms, which is good, but the avgtime of
2.2 ms that is spent in the queue is an issue. This issue can be resolved by increasing the
queue depth on the disk or by adding LUNs. Queue avgtime is also shown on the DISKWAIT
tab of nmon (by using the -d option).
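For example, the queue depth of an individual hdisk can be displayed and increased as follows.
The device name and the value are illustrative; check the maximum that is supported by your
storage vendor and multipath driver. The -P flag defers the change until the device is
reconfigured or the system is restarted:
# lsattr -El hdisk4 -a queue_depth
# chdev -l hdisk4 -a queue_depth=32 -P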
If queuing is occurring at the Fibre Channel (FC) adapter level, you can see it in the output of
the fcstat command (see Figure 7-3).
The num_cmd_elems setting is similar to queue_depth, but applies to the adapters. It is set by
using the chdev -l fcsX -a num_cmd_elems=YYY command, where fcsX is the name of the
Fibre Channel adapter. The value of YYY is the sum of the queue_depths divided by the
number of FC adapters. FC information also is shown on the nmon BBBF tab (with nmon option -^).
Note: The num_cmd_elems value in the VIO Servers typically is set to the maximum that is
supported by the SAN. It must not be lower than what is configured in the client LPARs.
Oracle provides a tool that is called orion, which can be used to test I/O bandwidth and
latency. This tool is used by the I/O calibrate function in the Oracle Database.
The difference between the two tools is that orion does not require a database to be created,
whereas I/O calibrate updates the statistics in the database for use by the Oracle optimizer.
Note: Both of these tools return a value with which you can compare the I/O capacity of
different environments. However, they must be used with caution, especially in an active
production environment because they attempt to saturate the I/O resources during their
test.
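As a hedged sketch of the database-side calibration, the DBMS_RESOURCE_MANAGER.CALIBRATE_IO
procedure can be run from SQL*Plus during a maintenance or test window (the disk count and
latency values that are shown here are assumptions for illustration only):
SET SERVEROUTPUT ON
DECLARE
  l_iops    PLS_INTEGER;
  l_mbps    PLS_INTEGER;
  l_latency PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => 8,   -- assumption: eight LUNs behind the database
    max_latency        => 20,  -- assumption: 20 ms acceptable latency
    max_iops           => l_iops,
    max_mbps           => l_mbps,
    actual_latency     => l_latency);
  DBMS_OUTPUT.PUT_LINE('max_iops=' || l_iops || ' max_mbps=' || l_mbps ||
                       ' actual_latency=' || l_latency);
END;
/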
This chapter highlights key components that are involved in a database I/O. Then, it
discusses a proven approach to use available server and storage resources in support of
database I/O.
Figure 8-1 shows in light blue where data caches and buffers are implemented. For Oracle
Databases, the most critical cache is the buffer cache in the Oracle SGA, which is available
regardless of the storage technology that is used on the lower layers.
Although the caches in the disk device driver and adapter device driver layers are
comparatively small, they must be sized correctly to support the maximum number of
concurrent active I/Os that the database drives against the storage subsystem at any time.
The AIX JFS2 file system is the simplest and best-performing option for deploying a single
database instance on a cooked file system. The depicted cache for the JFS2 file system is in
most cases not used because, for most Oracle workload types, it is recommended to use the
JFS2 Concurrent I/O (CIO) feature of AIX to minimize constraints on concurrent write access to
data files. CIO and a correct file system layout can provide I/O performance that is comparable
to raw devices or Oracle Automatic Storage Management (ASM), while still providing the
convenience of a cooked file system.
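As a minimal illustration (the mount point /oradata is an assumption), an existing JFS2 file
system that holds Oracle data files can be mounted with CIO, and the option can be made
persistent in /etc/filesystems:
# mount -o cio /oradata
# chfs -a options=rw,cio /oradata
CIO typically is not used for the file system that contains the Oracle binaries.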
For Oracle Real Application Cluster deployments, ASM or IBM Spectrum Scale (see
Chapter 6, “Oracle RAC for AIX with IBM Spectrum Scale” on page 69) typically are chosen
to provide the shared concurrent access to data from multiple servers.
Note: Starting with Oracle 12c, raw devices (disks or raw Logical Volumes [LV]) are
supported as devices for ASM only. For more information, see Oracle Doc ID 578455.1.
Figure 8-2 shows a typical deployment that is based on Fibre Channel (FC) attached external
storage, where the Fibre Channel Host Bus Adapters, physical or virtual (NPIV), are physically
allocated to the AIX LPAR. This configuration typically is used only for workloads with high I/O
requirements, where sharing FC adapters with other logical partitions (LPARs) in the physical
server is not feasible.
Workload consolidation that results in improved resource usage and reduced TCO typically
drives the sharing of FC adapters between multiple LPARs. Figure 8-3 shows a typical
deployment with dual Virtual I/O Servers (VIOS) where the physical FC adapters are
presented as N_Port ID virtualization (NPIV) adapters in the AIX LPAR.
The use of virtual SCSI (vscsi) devices to map storage space from VIOS to client LPAR for
use by an Oracle Database is discouraged because it introduces increased I/O latency and
CPU usage in the VIOS as compared to the NPIV technology.
To simplify the example that is shown in Figure 8-3, only two FC adapters or ports are shown
per VIOS. Those two ports connect to different SAN switches for highest reliability. The
number of connections and active paths into the storage subsystem are configured to support
the aggregated peak throughput requirements that are driven through the server FC adapters
to that storage.
Older adapters typically support a value up to 2048, but more current 16 Gbit adapters
support a value of 3200, or even 4096. Before setting or changing this value, verify with the
storage vendor whether the SAN can support the number of FC adapter ports multiplied by
num_cmd_elems concurrent pending I/O.
The max_xfer_size setting for a physical FC adapter has a dual meaning. On one side, it
specifies the maximum size of an I/O that is driven against the SAN. On the other side, it
influences how much DMA physical memory is allocated to the adapter. This parameter value
typically is never reduced, and it is increased only if IBM Support guides you to do so. Effective
maximum I/O sizes typically are configured on the AIX hdisk layer and must be lower than or
equal to the value of max_xfer_size.
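For example, the current values on a physical FC adapter can be checked, and num_cmd_elems
can be increased, as follows. The adapter name and value are illustrative; the -P flag applies
the change at the next device reconfiguration or restart:
# lsattr -El fcs0 -a num_cmd_elems -a max_xfer_size
# chdev -l fcs0 -a num_cmd_elems=2048 -P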
The NPIV adapter (virtual HBA) in the client LPAR also features a parameter num_cmd_elems,
which constrains the number of concurrent I/O that is driven by the LPAR by way of the
respective NPIV adapter.
Storage capacity is made available to the client LPAR in the form of one or more storage
volumes with assigned Logical Unit Numbers (LUNs), which can be accessed by way of a
defined set of NPIV HBAs in the client LPAR. The AIX multi-path device driver, or storage
vendor-specific alternative, automates the distribution of physical I/O to the available volumes
over all available paths.
The size restriction of the hdisk queue_depth drives the need to define and map more than
one volume for an Oracle Database to use for data or redo. A good starting point for storing
Oracle data, index, and temp files is eight volumes. For redo data, a separate set of volumes
works well: a minimum of four, but often eight.
For Oracle use, those volumes are less than 2 TB and typically less than 1 TB. If more than 8
TB of space is required, more volumes are mapped (again, in multiples of eight). Eight was
chosen based on typical characteristics of today’s SAN-attached storage solutions, where the
number of controllers in the SAN-attached storage is a multiple of two for redundancy and
I/O to a specific volume typically is routed by way of an “owning” controller. Only if that
controller becomes unavailable is I/O rerouted to an alternative path.
Even if all LUNs are spread over all physical storage space in the SAN-attached storage, you
still want to configure several volumes to not be I/O constrained by the queue depth of a
single or small number of hdisks.
Recent AIX releases and FC adapters added the support for multiple I/O queues to enable
significantly higher I/O rates. For more information, see this IBM Developer web page.
The initial approach was based on the idea of isolating database files based on function and
usage; for example, defining different pools of storage for data files and index files. This
approach included the following key observations:
Manually intensive planning and manual file-by-file data placement is time-consuming,
resource intensive, and iterative.
Best performance is achievable, but only with continuous maintenance.
It can lead to I/O hot spots over time that affect throughput capacity and performance.
The current approach, which also is the idea behind ASM, is based on stripe and mirror
everything (SAME). The “stripe” pertains to performance and load balancing and is relevant
for this discussion. The “mirror” provides redundancy and availability. In a SAN-attached
environment, mirroring is implemented in the storage subsystem and not in AIX, so it is not
discussed further here. Oracle Databases with ASM on SAN-attached storage typically use
EXTERNAL as the redundancy setting.
With ASM, devices (raw hdisks) are grouped into ASM disk groups. ASM automatically stripes
all data within a disk group over all underlying devices. Typically, the following disk groups are
used:
DATA: Data/index/temp
FRA: Redo
OCR: OCR/vote
In the context of a production database deployment in AIX, the DATA disk group has eight (or
a multiple of eight) devices. The FRA disk group has a minimum of four devices (eight is more
typical).
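As an illustration only (the device names are assumptions), a DATA disk group with eight
devices and external redundancy can be created from SQL*Plus on the ASM instance:
CREATE DISKGROUP DATA EXTERNAL REDUNDANCY
  DISK '/dev/rhdisk10', '/dev/rhdisk11', '/dev/rhdisk12', '/dev/rhdisk13',
       '/dev/rhdisk14', '/dev/rhdisk15', '/dev/rhdisk16', '/dev/rhdisk17';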
For more information about Oracle ASM, see this web page.
The remainder of this section discusses how data striping can be efficiently implemented with
the AIX logical volume manager for a single instance database that persists its data in AIX
JFS2 file systems.
As with ASM, AIX groups hdisks into volume groups. A minimum of three AIX volume groups
is recommended for the deployment of a production Oracle Database:
ORAVG: Oracle binaries: Can be a single volume or two volumes.
DATAVG: Oracle data/index/temp: Contains eight volumes or multiples of eight.
REDOVG: Oracle redo and flash recovery: A minimum of four volumes (eight is better).
To stripe data over all hdisk devices in a volume group, AIX supports the following
approaches, which are discussed in this section:
Striping that is based on logical volume striping
Striping by way of PP striping or PP spreading; for example, see the orabin LV in
Figure 8-4.
Note: The specified block size (agblksize) must be adjusted to 512 for the file system that
contains redo log files.
If striped logical volumes are used, consider and plan for the following points (see the example
at the end of these considerations):
All allocated volumes and hdisks in a VG must be of the same size.
The LV is striped over some number of hdisks (typically all hdisks in a VG) from one VG.
The LV space allocation can grow only in multiples of N times the PP size. For example,
with N=8 and a PP size of 1 GB, it grows in steps of 8 GB, 16 GB, 24 GB, and so on.
A file system (FS) on top of a striped LV always grows with the same increments (N * PP)
so that no space is wasted. You must create the striped LV first and then create the JFS2
file system on top of the striped LV.
If any of the N hdisks runs out of available PP, attempts to grow the LV fail. The following
options are available to resolve this issue:
– Grow the underlying volumes dynamically by way of SAN methods and discover the
new size dynamically by using the chvg -g <VG name> command. Minimum size
increase for each volume is PP size and increases are in full multiples of PP size.
This option is preferred because it requires no changes in the SAN configuration and
host mapping. Verify with the storage vendor whether any dynamic resize limitations
exist.
– Add hdisks (volumes) to the VG and expand the LV to those new volumes. This
technique is called a stripe column. The system administrator manually adds the
hdisks to the VG and then expands the LV.
This option requires significant SAN and storage subsystem changes and more work
by the AIX administrator. Adding volumes also affects FlashCopy configurations, for
example.
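The following commands show a minimal sketch of the striped LV approach. The volume
group, logical volume, mount point, stripe size, and number of logical partitions are
assumptions for illustration only:
# mklv -y oradatalv -t jfs2 -S 64K datavg 64 hdisk2 hdisk3 hdisk4 hdisk5 hdisk6 hdisk7 hdisk8 hdisk9
# crfs -v jfs2 -d oradatalv -m /oradata -A yes
# mount /oradata
For a file system that holds redo logs, add -a agblksize=512 to the crfs command, as noted
earlier.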
Note: The specified block size (agblksize) must be adjusted to 512 for the file system
containing redo log files.
If PP spreading is used to stripe data over all disks in a VG, consider and plan for the
following points (see the example at the end of these considerations):
All allocated volumes and hdisks in a VG are of the same size.
The PP size must be planned carefully because management of a VG with more than 60,000
PPs becomes slower.
The LV is PP-spread over some number of hdisks (typically, all hdisks in the VG and a multiple
of 8) from one VG.
The LV space allocation can grow in multiples of a single PP, but should grow in multiples of M
PPs for a balanced I/O distribution.
A file system (FS) on top of a PP-spread LV always grows with the same increments (one or,
ideally, M * PP) so that no space is wasted. You must create the LV first and then create the
JFS2 file system on top of the PP-spread LV.
If any of the M hdisks in the VG runs out of available PPs, AIX skips that hdisk in the
round-robin allocation from the eligible hdisks in the VG. If no eligible hdisk has an available
PP, further attempts to grow the LV fail.
The following options are available to resolve the out-of-space condition:
– (Preferred option) Grow the underlying volumes dynamically by way of SAN methods
and discover the new size dynamically by way of chvg -g <VG name>. Minimum size
increase per volume is PP size.
– Add another K hdisks (volumes) to the VG and then run reorgvg <VG name> to
redistribute the allocated space evenly over all hdisks in the VG (K >= 1).
This option requires significant SAN changes and more work by the AIX administrator.
Adding volumes also affects IBM FlashCopy® configurations, for example.
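The following commands show a minimal sketch of the PP-spreading approach. The names
and sizes are assumptions for illustration only; the -e x flag requests the maximum
inter-physical volume allocation policy (PP spreading), and -u 8 limits the allocation to eight
physical volumes:
# mklv -y oradatalv -t jfs2 -e x -u 8 datavg 64
# crfs -v jfs2 -d oradatalv -m /oradata -A yes
# mount /oradata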
For more information about the latest supported network technologies for Oracle RAC, see
this Oracle web page.
In this chapter, after the options overview, a sample configuration is presented that shows
how Shared Ethernet Adapter functions can be used to deploy two independent RAC clusters
onto the same two physical Power Systems servers.
For environments with stringent requirements on latency, bandwidth, or packet rates,
dedicated physical network adapters can be the best choice. The use of dedicated physical
communication adapters is more expensive than the use of virtualized communication
adapters, which can be shared across LPARs.
In addition to cost, another limitation is that a physical server has only a relatively small
number of adapter slots, which limits how many LPARs with physical adapters can be
configured on a specific server.
The use of dedicated network adapters can be the choice for the Oracle Real Application
Cluster Interconnect in a large environment.
SEA requires a minimum of one (typically two) VIO Servers. This configuration allows the
sharing of a physical Ethernet adapter between multiple client LPARs. With a dual VIO
Server configuration and Shared Ethernet Adapter Failover, redundant access to the
external network is provided.
In addition to sharing of physical network adapters between LPARs, SEA technology enables
Live Partition Mobility (LPM). LPM enables the migration of an active LPAR between two
separate physical Power servers without having to stop the applications or the operating
system. LPM is supported for LPARs that are running single instance Oracle Databases or
Oracle RAC.
For more information about supported configurations, see this Oracle web page.
LPM is fully supported for the client access network and the RAC interconnect. Figure 9-1
shows only a single network; for Oracle RAC, a minimum of two independent networks is
required.
Client access and backup and management networks also must be implemented. The client
access and backup and management networks can be implemented with any of the available
network technologies, with the most likely choice being SEA. Typically, it is recommended to
separate RAC interconnect traffic onto separate physical adapters and network ports.
For an aggregation of clusters with high packet rates on the RAC interconnect, it is better to
use dedicated physical adapters for the heaviest users because high packet rates drive
significant CPU usage in the respective VIO Servers.
Note: Single Root I/O Virtualization (SR-IOV) is a new technology that is available with the
latest AIX releases and IBM Power Systems servers. However, at the time of this writing, it is
not yet supported by Oracle.
When Oracle support for SR-IOV based technology becomes available, it is highly
recommended to use this new technology to share network adapters more efficiently
between multiple client LPARs.
The configuration that is shown in Figure 9-2 is the minimum recommended configuration for
an Oracle RAC cluster, which also provides protection against networking single points of
failure by using SEA failover.
Oracle High Availability IP (HAIP) can then aggregate those virtual network ports for the RAC
interconnect, providing availability and high bandwidth by distributing network traffic over all
specified interfaces.
The code is provided on an as-is basis and is available at this web page.
The publications that are listed in this section are considered particularly suitable for a more
detailed discussion of the topics that are covered in this book.
IBM Redbooks
The following IBM Redbooks publications provide more information about the topic in this
document. Note that some publications that are referenced in this list might be available in
softcopy only:
IBM Power Systems Private Cloud with Shared Utility Capacity: Featuring Power
Enterprise Pools 2.0, SG24-8478
IBM PowerVC Version 2.0 Introduction and Configuration, SG24-8477
Red Hat OpenShift V4.X and IBM Cloud Pak on IBM Power Systems Volume 2,
SG24-8486
You can search for, view, download, or order these documents and other Redbooks,
Redpapers, Web Docs, draft, and additional materials at the following website:
ibm.com/redbooks
Online resources
The following websites also are relevant as further information sources:
Licensing Oracle Software in the Cloud Computing Environment:
https://www.oracle.com/assets/cloud-licensing-070579.pdf
On-premises: IBM Private Cloud with Dynamic Capacity:
https://www-01.ibm.com/common/ssi/ShowDoc.wss?docURL=/common/ssi/rep_ca/1/897/ENUS120-041/index.html&lang=en&request_locale=en
Off-premises: IBM Power Systems Virtual Server:
https://www.ibm.com/cloud/power-virtual-server
SG24-8485-00
ISBN 0738460125
Printed in U.S.A.
ibm.com/redbooks