A Guide to Using
ACI Worldwide’s
BASE24-es on z/OS
Set up and use BASE24-es on z/OS
ibm.com/redbooks
International Technical Support Organization
August 2006
SG24-7268-00
Note: Before using this information and the product it supports, read the information in
“Notices” on page vii.
This edition applies to Release 1, Version 06.2 of ACI Worldwide’s BASE24-es product on z/OS.
Contents

Notices
Trademarks
Preface
The team that wrote this redbook
Become a published author
Comments welcome
9.2.1 CMAS failure after workload has been activated
9.2.2 TCP/IP client - TOR failure
9.2.3 TCP/IP server - TOR failure
9.2.4 AOR failure
9.2.5 TCP/IP server - TOR maintenance
9.2.6 AOR maintenance
9.2.7 Dynamically adding a TOR
9.2.8 Dynamically adding an AOR
9.3 Data environment
9.3.1 SMSVSAM failure
9.3.2 Lock structure full - IGWLOCK00
9.3.3 Read-only data table at initial time
9.3.4 Read-only flat file at initial time
9.4 BASE24-es application
9.4.1 Journal file full
9.4.2 IP client failure
9.4.3 IP Server failure
9.4.4 Integrated Server failure
9.4.5 Planned application upgrade
9.4.6 Remote program link to back-end authorization failure
9.4.7 Usage file full
9.5 Hardware
9.5.1 Central Processor (CP) failure
9.5.2 CPC failure/LPAR failure
9.5.3 Coupling Facility (CF) failure
9.6 z/OS and relevant subsystems
9.6.1 z/OS maintenance - best practices
9.6.2 CF maintenance - best practices
Glossary
Index
This information was developed for products and services offered in the U.S.A.
IBM® may not offer the products, services, or features discussed in this document in other countries.
Consult your local IBM representative for information on the products and services currently available in your
area. Any reference to an IBM product, program, or service is not intended to state or imply that only that
IBM product, program, or service may be used. Any functionally equivalent product, program, or service that
does not infringe any IBM intellectual property right may be used instead. However, it is the user's
responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document.
The furnishing of this document does not give you any license to these patents. You can send license
inquiries, in writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785, U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such provisions
are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES
THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer
of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may
make improvements and/or changes in the product(s) and/or the program(s) described in this publication at
any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm
the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on
the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the
sample programs are written. These examples have not been thoroughly tested under all conditions. IBM,
therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy,
modify, and distribute these sample programs in any form without payment to IBM for the purposes of
developing, using, marketing, or distributing application programs conforming to IBM's application
programming interfaces.
ACI, ACI Worldwide, and BASE24-es are trademarks or registered trademarks of ACI Worldwide Inc. or its
affiliates in the United States, other countries or both.
Java™, JSP™, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United
States, other countries, or both.
Windows®, and the Windows logo are trademarks of Microsoft® Corporation in the United States, other
countries, or both.
Intel®, Intel logo, Intel Inside®, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron®, Intel Xeon®,
Intel SpeedStep®, Itanium®, and Pentium® are trademarks or registered trademarks of Intel Corporation or
its subsidiaries in the United States and other countries.
UNIX® is a registered trademark of The Open Group in the United States and other countries.
Linux™ is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
In this IBM® Redbook we explain how to use the ACI BASE24-es product on
z/OS®. BASE24-es is a payment engine utilized by the financial payments
industry. The combination of BASE24-es and System z™ is a competitive and
attractive end-to-end retail payments solution for the finance sector.
Alexei Alaev is a Senior Engineer with ACI Worldwide, located in the USA.
Alexei has 28 years of experience in the IT industry. He holds degrees in
Computer Science from Moscow State University in Russia. His areas of
expertise include IBM mainframe application and system programming, including
the sysplex environment. For the past 8 years, Alexei has been working on the
development and implementation of the multi-platform BASE24-es application.
Dan Archer is a Senior Product Manager with ACI Worldwide Inc. located in
Omaha, NE. He has 11 years of online transaction processing experience with
ACI, and has provided technical support of BASE24-es since its inception.
Gordon Fleming is a Master Engineer with ACI Worldwide Inc., located in the
USA. He has 19 years of experience in online transaction processing, primarily in
BASE24 and BASE24-es development. He holds degrees from the University of
Chicago and the University of Nebraska at Omaha. His areas of expertise
include multi-platform infrastructure design and implementation.
Ron Schmidt is a Systems Engineer with ACI Worldwide, Inc., located in the
US. He has 15 years of experience in online transaction processing, primarily in
TRANS24-eft development/deployment and BASE24-es performance. He holds
a degree in Computer Science and Business Administration from Kearney State
College. His areas of expertise include development/deployment of online
transaction processing systems and the performance analysis of zSeries
applications.
Stephen Anania
IBM Poughkeepsie
Ron Beauchamp, Rick Doty, Ajit Godbole, Jeff Hansen, Karen Jarnecic, Kurt
Lawrence, Charlie Linberg, Catherine McCarthy, Calvin Robertson
ACI Worldwide
Become a published author
Join us for a two- to six-week residency program! Help write an IBM Redbook
dealing with specific products or solutions, while getting hands-on experience
with leading-edge technologies. You'll team with IBM technical professionals,
Business Partners and/or customers.
Your efforts will help increase product acceptance and customer satisfaction. As
a bonus, you'll develop a network of contacts in IBM development labs, and
increase your productivity and marketability.
Find out more about the residency program, browse the residency index, and
apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
1. Knittel, C., and V. Stango. 2004. "Compatibility and Pricing with Indirect Network Effects: Evidence from ATMs." NBER Working Paper 10774.
“Installation of ATMs and the proliferation of Retail POS has been particularly
rapid in recent years. ATM growth was 9.3 percent per year from 1983 to 1995
but accelerated to an annual pace of 15.5 percent from 1996 to 2002. Much of
the acceleration is due to placing ATMs in locations other than bank offices.
These off-premise ATMs accounted for only 26 percent of total U.S. ATMs in
1994, but now account for 60 percent. On the debit card side of the industry,
growth has been extremely rapid in point-of-sale (POS) debit card transactions.
With an annual growth rate of 32 percent from 1995 to 2002, POS debit is the
fastest growing type of payment in the United States. Today it accounts for nearly
12 percent of all retail non-cash payments, a fivefold increase in just five years.
Growth has been sharp in both online (PIN-based) and offline (signature-based)
debit. From 1995 to 2002, annual growth of online debit was 29 percent, while
offline debit grew at 36 percent.”2
Moreover, the brand name of the ATM has to be considered. Maestro™, Star, and others value their brand names and prefer not to become the target of poor-performance complaints from banking customers.
2. Knittel, C., and V. Stango. 2004. "Compatibility and Pricing with Indirect Network Effects: Evidence from ATMs." NBER Working Paper 10774.
There are several methods available for meeting the response time, availability, and data integrity requirements imposed on financial processing systems. These techniques go beyond simple fault tolerance; the system must be continuously available. Fault tolerance by itself does not guarantee high availability; it is simply one of the tools that contributes to it.
To cite a useful example of the levels of uptime that are typically maintained, a
major bank ATM processing system in The Netherlands has had zero downtime
over the past four years, including maintenance and upgrades. This type of
reliability is not unusual for the banking industry; even systems in developing
countries maintain availability targets higher than 96%.
National Bridge Transactions: The cardholder uses an ATM or POS at a bank not their own, and the two banks belong to different regional networks that do not have any agreement. Both banks must belong to the same national network. The transaction is handed from the ATM or POS regional network to the national network, and finally to the authorizing bank's regional network. In this case, there are three switches involved.
Transactions are routed from the originating ATM or POS through either the bank's own network or some combination of regional and national networks. Figure 1-1 demonstrates, at a high level, the possible paths that a transaction might follow to completion.

(Figure 1-1: "On Us" transactions complete within the bank-owned network against its databases of record; other transactions are switched through a regional network ("Network On Us"), and possibly across the national network ("National Bridge or Reciprocal"), to reach the bank/financial institution authorization systems, with each regional network keeping its own databases of record.)
Both the regional and national networks are switching systems that, to a fair extent, resemble the systems used within the financial institution. The switching systems drive transactions from initiation to destination. We use the word "switched" to describe the hand-off of a transaction from host to network to host.
An ATM or POS transaction is accepted by the host of the card-issuing
institution, and either processed locally “On Us”, or switched to one of the
regional networks “Network On Us”.
The regional networks either switch the transaction to another institution host
for processing “Network On Us”, or switch the transaction to a national
network for hand-off to another regional network “National Bridge, or
Reciprocal”.
Responses are switched from the owning host of an ATM or POS “On Us”, to
the ATM or POS, or switched to a regional network “Network On Us”.
Although there are other variations on this theme, transactions are switched from
host to host via network switches until the original transaction request is
completed by a transaction response, or it times out.
Note that BASE24-es does not have different paths for authentication and authorization; both functions are performed as part of a single task.
1.3.1 Availability
System z provides 24-hour a day, 7-day a week availability, including scheduled
maintenance. Continuous availability goes beyond just hardware fault tolerance;
it is achieved by a combination of hardware, application code, and good system
management practices.
The System z servers are the result of a long evolution beginning in 1964. The
sysplex configuration is a high availability solution that not only provides a
platform for continuous availability, but also for system maintenance and capacity
additions. The System z hardware platform is capable of providing up to five 9s of
availability.
On a server basis, System z systems are equipped with features that provide for
very high availability exclusive of clustering:
Redundant I/O interconnect
The Coupling Facilities, at the heart of the Parallel Sysplex cluster, enable high
speed, read/write data sharing and resource sharing among all the z/OS images
in a cluster. All images are also connected to a Sysplex Timer® to ensure all
events are properly sequenced in time.
(Figure: two System z servers in a Parallel Sysplex, connected to a Sysplex Timer and sharing data over ESCON/FICON links.)
When configured properly, a Parallel Sysplex cluster has no single point of failure
and can provide customers with near continuous application availability over
planned and unplanned outages. Events that otherwise would seriously impact
application availability (such as failures in hardware elements or critical operating
system components) have no, or reduced, impact in a Parallel Sysplex
environment.
1.3.3 Manageability
A wide array of tools, including the IBM Tivoli® product and other operational
facilities, contribute to continuous availability. IBM Autonomic Computing
facilities and tools provide for completely fault tolerant, manageable systems that
can be upgraded and maintained “on the fly” without downtime.
1.3.4 Security
On March 14, 2003, IBM eServer™ zSeries 900 was the first server to be
awarded EAL5 security certification. The System z architecture is designed to
prevent the flow of information among logical partitions on a system, thus helping
to ensure that confidential or sensitive data remains within the boundaries of a
single partition.
On February 15, 2005, IBM and Novell announced that SUSE Linux Enterprise
Server 9 successfully completed a Common Criteria (CC) evaluation to achieve a
new level of security certification (CAPP/EAL4+). IBM and Novell also achieved
US DoD Common Operating Environment (COE) compliance, a Defense
Information Systems Agency requirement for military computing products.
On March 2, 2006, z/OS V1.7 with the RACF® optional feature achieved EAL4+
for Controlled Access Protection Profile (CAPP) and Labeled Security Protection
Profile (LSPP). This prestigious certification assures customers that z/OS V1.7
has gone through an extensive and rigorous testing process and conforms to standards sanctioned by the International Organization for Standardization.
1.3.5 Expandability
The Parallel Sysplex environment can scale nearly linearly from 2 to 32 systems.
This can be a mix of any servers that support the Parallel Sysplex environment.
The aggregated capacity of this configuration meets every processing
requirement known today.
In addition, the new Enhanced Book Availability function also enables a memory
upgrade to an installed z9-109 book in a multibook server. This can provide
customers with the capacity for much needed dynamic growth in an
unpredictable ATM/EFT world.
In this chapter we describe the BASE24-es product and its functions and
features.
As the payment industry evolves and new instruments emerge (for example,
customer ID, mobile telephone numbers), the flexible nature of its architecture
enables BASE24-es to easily adapt to provide continued value.
ACI provides a set of sample scripts that cover the basic positive, negative or
usage-based authorization processes. This flexibility shortens the time needed to
develop new products and services or accommodate changes requested by the
business department of an organization. Separating the business logic (in
scripts) from the product source code also facilitates script compatibility with
future releases.
Through the use of XML- and ISO-based interface standards and other
industry-specific formats, transaction services can be exposed to any channel.
Thus, BASE24-es can provide a single point of access across an enterprise for
the service of consumer payments, eliminating the costs of maintaining multiple
service points.
Written in Java and C++ and using XML message formats, the ACI user interface provides a flexible operating environment that is used by multiple ACI applications. A security administrator configures access permissions through the ACI user interface, and the user security and user audit environments are shared by all ACI applications.
2.2 Architecture
ACI uses an object-oriented design and development style to implement the Enterprise Services architecture of BASE24-es. This architecture helps reduce the impacts associated with extending the core product code. The use of object-oriented programming languages, such as C++ and Java, enhances the extensibility of BASE24-es solutions and minimizes time-to-market for new products and services. By extending integration flexibility, BASE24-es allows access to more customer information.
Scripts are maintained and compiled through the user interface. Users can
display a script repository that shows all scripts available for use by BASE24-es.
A script editor allows the user to add, edit, delete and compile scripts through the
user interface. During the compile process, scripts are checked for syntax errors
and saved on the BASE24-es system. Rather than being compiled into machine code, scripts are translated into a list of serial instructions that the script engine interprets in real time during transaction processing.
Compiled scripts are loaded into memory where they can be retrieved for
execution during transaction processing. If a change must be made to script
logic, then the script can be updated, recompiled and placed back into use
without ever taking the affected programs out of service.
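As a sketch of this arrangement, a compiled script can be modeled as a serial instruction list held behind an atomic reference, so that a recompile swaps in the new version while transactions keep running. All class names here are hypothetical; the real engine is internal to BASE24-es.

import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: a script compiled to a list of serial instructions
// that the engine interprets at transaction time, swapped atomically on
// recompile so running programs never go out of service.
interface Instruction {
    void execute(TransactionContext ctx);
}

class TransactionContext { /* amount, card, response code, and so on */ }

class CompiledScript {
    private final List<Instruction> instructions;

    CompiledScript(List<Instruction> instructions) {
        this.instructions = instructions;
    }

    void run(TransactionContext ctx) {
        for (Instruction instruction : instructions) {
            instruction.execute(ctx);   // interpreted serially; no machine code
        }
    }
}

class ScriptEngine {
    private final AtomicReference<CompiledScript> current = new AtomicReference<>();

    // install() must be called once before the first authorize().
    void install(CompiledScript recompiled) { current.set(recompiled); } // hot swap

    void authorize(TransactionContext ctx) { current.get().run(ctx); }
}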
A single script can contain all of the tasks that BASE24-es must perform to
authorize a transaction. The tasks can also be split into multiple scripts that are
organized in a hierarchical structure.
With the system’s built-in user security feature, users are assigned roles that
grant them permission to specific functions and tasks associated with various
windows. Users are authenticated during the logon process, thereby minimizing
the risk associated with unauthorized users gaining access to functions they are
not permitted to perform.
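A minimal sketch of that role model, with hypothetical names: permissions attach to roles, and an authenticated user may perform a function only if one of the user's roles grants it.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of role-based access: permissions attach to roles,
// users hold roles, and each window function checks permission at the
// point of use.
enum Permission { VIEW_CARDS, EDIT_SCRIPTS, COMPILE_SCRIPTS, MAINTAIN_USERS }

class SecurityModel {
    private final Map<String, Set<Permission>> rolePermissions = new HashMap<>();
    private final Map<String, Set<String>> userRoles = new HashMap<>();

    void defineRole(String role, Set<Permission> permissions) {
        rolePermissions.put(role, permissions);
    }

    void assignRole(String user, String role) {
        userRoles.computeIfAbsent(user, u -> new HashSet<>()).add(role);
    }

    // An authenticated user may perform a function only if one of the
    // user's roles grants the corresponding permission.
    boolean isAllowed(String user, Permission permission) {
        for (String role : userRoles.getOrDefault(user, Set.of())) {
            if (rolePermissions.getOrDefault(role, Set.of()).contains(permission)) {
                return true;
            }
        }
        return false;
    }
}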
The user audit function is responsible for maintaining a secure audit database
where all file maintenance transactions and modifications to the user security
database are recorded. Before and after images of the affected record will be
logged wherever appropriate.
The user interface design incorporates the flexibility for users to alter the layout
and wording on the desktop to meet individual organization needs. All text and
positional information is maintained in configuration files, so adapting the user
interface without altering the product code is particularly easy. This structure also
incorporates multi-language capability.
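As an illustration of the configuration-file approach, standard Java resource bundles behave the same way: all displayed text is looked up by key from a per-locale file, so wording and language changes touch only configuration, never code. The bundle and key names below are hypothetical.

import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical sketch: window text comes from per-locale configuration
// files (for example, desktop_en.properties and desktop_de.properties on
// the classpath), so wording and language can change without touching
// product code.
public class DesktopText {
    public static void main(String[] args) {
        ResourceBundle english = ResourceBundle.getBundle("desktop", Locale.ENGLISH);
        ResourceBundle german  = ResourceBundle.getBundle("desktop", Locale.GERMAN);

        // Same key, different configuration file, different display text.
        System.out.println(english.getString("cardFile.title"));
        System.out.println(german.getString("cardFile.title"));
    }
}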
3.1.1 Background
For more than 40 years, IBM mainframes have successfully run the core IT
systems of many mid-size, large, and very large enterprises. Over this period,
IBM has consistently invested in the evolution of the mainframe’s unparalleled
technology. Mainframes have incorporated new technologies and computing
models, ranging from the centralized computing model to the Web model, and from assembler and COBOL to Java.
IBM mainframes have become the computing industry benchmark. They are
used by thousands of enterprises worldwide, with trillions of dollars invested in
applications and skills. The platform hosts a large portion of the total business
transactions and data in the world, providing 24x7 availability and other unique
qualities.
IBM mainframes run five operating systems: z/OS, z/VM®, z/VSE™, z/TPF, and Linux on System z. In the following section we focus on the z/OS operating system, the flagship mainframe operating system.
Centralized
The centralized computing model was less used in the late 1980s and early
1990s due to a perceived lack of flexibility and openness, and high cost of
acquisition. By the end of the 1990s, however, this computing model regained
favor following improvements in mainframe hardware and software. Also,
businesses had by then come to understand the serious manageability
complications and security exposures of the client/server model, and realized
that centralized computing offered some clear advantages.
Notably, z/OS also allows all members in a cluster to share all data, even up to
the record level. In contrast, other cluster implementations allow you to partition
data among the elements of the cluster, and each system can access just the
data attached to it.
Even when the Parallel Sysplex is physically spread over several different machines, the communication between them flows over high speed fiber optic connections managed by the Cross-System Coupling Facility (XCF), a dedicated protocol for those connections with transfer rates on the order of gigabytes per second. Within a physical machine, communication between z/OS images is accomplished memory-to-memory, and no network protocol is faster than that.
Security
z/OS provides deep security integration, from the application level down to the
operating system level and the hardware. In this section we highlight a number of
major functions in z/OS.
Accountability
Accountability in z/OS is achieved by a combination of user authentication and
the ability to propagate a user’s credentials throughout the application.
Cryptography
The System z hardware has two distinctly different cryptographic hardware engines that are supported under z/OS, both based on CMOS (Complementary Metal Oxide Semiconductor) technology.
Network security
Networking and communications security on z/OS is provided by the
Communications Server element of z/OS. The Communications Server provides
networking and communication services for accessing z/OS applications over
both SNA and TCP/IP networks.
Furthermore, the Firewall Technologies include packet filtering to limit the kinds of network requests that can reach a machine, proxy and SOCKS servers to control TCP/IP connectivity, Domain Name Services (DNS), and encryption of IP traffic (IP tunnels, or Virtual Private Networks (VPNs)) to allow private communication over the public network. Other security features provided by z/OS and the Communications Server include LDAP, PKI, Kerberos, and OpenSSH.
Manageability
The z/OS operating system has been designed to manage multiple workloads in
a single system image. This design has, from the beginning, put the
requirements on the operating system to provide the means to manage these
complex environments. As a consequence, over the years z/OS has become
equipped with an extensive set of management tools for managing and
controlling thousands of complex simultaneously running applications,
comprising batch and online components, databases, transaction managers and
so on.
The management tools included in z/OS have become a very mature set of
system management and automation software. Clients over the years have
based their sophisticated procedures and automation scenarios on this software.
The virtualization capabilities of IBM mainframes are diverse (see Figure 3-2 on
page 31).
(Figure 3-2: virtualization on the mainframe. Operating system images such as z/OS, z/VM, z/VSE, and Linux, running subsystems such as CICS, IMS, DB2, and WebSphere, together with IFL and zAAP specialty processors, all share the physical resources: CPUs, memory, I/O, and networks.)
All of these capabilities make the mainframe an ideal platform for running mixed
workloads of hundreds of different business-critical applications, thereby
achieving very high resource utilization. The mainframe manages these diverse
workloads while maintaining a consistent performance.
Reliability
System z and z/OS provide an advanced combination of availability and resilience, delivered through redundancy and virtualization.
Scalability
System z hardware can be scaled up vertically to a maximum of 54 CPUs at this
time. The z/OS Parallel Sysplex configuration provides additional horizontal
scalability. A sysplex can be a cluster of up to 64 z/OS images in different
partitions on different System z machines (possibly geographically dispersed),
with full data sharing and high availability and recovery.
Availability
Both the System z hardware and z/OS include many components that address availability.
Parallel Sysplex, the clustering solution for the z/OS environment, provides both
scalability and availability. With Parallel Sysplex, a failure of one image in the
cluster does not affect any other image. And any specific transaction running on
the failed image can be fully dispatched and recovered in any other image, thus
making use of the full data sharing architecture.
For higher availability and disaster recovery purposes, a Parallel Sysplex can be
configured in a Geographically Dispersed Parallel Sysplex™ (GDPS®) mode.
There are two GDPS modes:
GDPS/PPRC
GDPS/PPRC is a configuration in which a Parallel Sysplex is distributed across two sites, connected over distances of up to 100 km, with data synchronized and shared continuously.
One site (part of the sysplex) acts as a primary, and the second site acts as a
secondary, in stand-by mode. GDPS controls and automates a full swap to
the backup site in case of failures.
GDPS/XRC
GDPS/XRC is a configuration in which the distance between sites can be more than 100 km, theoretically without limitation. In GDPS/XRC, the sysplex is contained in the primary site, and data is replicated asynchronously to the recovery site.
Both modes use HyperSwap, which allows you to activate replicated data in the disaster recovery site without application outage.
For some organizations, the network traffic that traverses IBM SNA
communication controllers has declined to the point where it is in the business
interest to find functional alternatives for the remaining uses of these controllers
by consolidating and possibly eliminating the controllers from the networking
environments.
Estimates are that the current use of TCP/IP in the EFT, ATM, and POS
environments is more than 70% and rapidly increasing, but there is still a significant percentage of ATMs on the SNA protocol and some (less than 5%) on X.25 in the POS segment.
Support for SNA on z/OS is provided by the Communication Controller for Linux
(CCL). The 3745/3746 SNA communications controller has been withdrawn from
the market.
With CCL you can continue using the traditional SNA subarea (INN), NCP
boundary (BNN), SNA Network Interconnect (SNI), and X.25 connectivity that
are currently supported by the IBM 3745.
Transaction processing
Traditionally, the mainframe has offered extremely robust transaction processing,
providing thousands of concurrent online users with the ability to retrieve
information and make updates. The IBM transaction management solutions on
the mainframe have long been CICS and IMS on the z/OS platform, and TPF as a specialized operating system for airline and financial transactions.
WebSphere Application Server for z/OS was recently added to the family of
transaction managers on z/OS. Businesses have invested heavily in the
development of transactional applications, and despite efforts to mimic
transaction processing on non-mainframe platforms, these solutions have never
provided the same scalability and reliability as mainframe transaction processing.
As discussed, the z/OS operating system provides transactional services that are not limited to applications running in one of the transaction managers. Resource Recovery Services (RRS), for example, extends two-phase commit coordination to applications across the system.
Batch processing
Another outstanding capability of the mainframe architecture is the running of
batch workloads. In the z/OS environment, dedicated subsystems (JES2 or
JES3), in combination with special job scheduling software such as Tivoli
Workload Scheduler (TWS), manage batch processing. No other architecture
can support batch processing as well as z/OS.
The value of batch processing is often underrated. With the requirement for
integrating new applications with existing back-end systems, batch processing
becomes ever more important. The IBM mainframe is well-positioned to handle
batch processing. Offering the ability to run batch processes written in the Java language and making use of zAAP processors, and the ability to access back-end
data either on z/OS or any other server, z/OS is a very cost-effective platform for
running business-critical batch applications.
Openness
Virtualization has provided “openness” to the mainframe, enabling it to run not
only the traditional operating systems of z/OS, z/VM, z/TPF and z/VSE, but also
to support Linux. Once called “proprietary” technology, the mainframe has
adopted open standards technology as well and is as open as any other
UNIX-based platform or the Windows platform. There are no limitations on the
mainframe in using standards such as TCP/IP, HTTP, SOAP, and so on.
Today’s mainframe not only supports open standards, but enables and integrates
traditional technologies with the new ones. Core business applications, usually
running in IMS or CICS, can be integrated with new-style applications using XML
and SOAP, thereby bridging the technology landscape and maximizing
investments in skill and resources.
Cost-effectiveness
Cost is one of the most sensitive and complex aspects of any solution analysis. It
is sensitive because everyone responsible has a different opinion of what
constitutes a reasonable cost. It is complex, because determining the total cost
of a solution requires a critical analysis of many factors.
Companies often do not distinguish between total cost of ownership (TCO) and total cost of acquisition (TCA), because lack of knowledge or lack of data makes full analysis impossible. Various consulting companies that specialize in this kind of analysis have published very positive results for the System z platform, based on five-year analyses.
Economies of scale
Due to its centralized computing model and almost endless scalability, the
System z platform provides economies of scale, meaning that you can add new
physical resources (CPU, I/O devices, servers) without the need to proportionally
expand the existing infrastructure and the number of staff managing it.
People are often the most expensive resource in a business. In the distributed
world, for every additional set of servers or pool of storage, the amount of human
resource needed will increase. On the System z platform, however, the number
of people supporting it is practically flat and only increases when very large
chunks of additional infrastructure are added.
An “out-of-the-box” solution
z/OS is not just an operating system. It is a package that encompasses the
operating system itself and more than 30 built-in solutions. The full package
comes with everything that is required to build a robust IT software infrastructure.
Data proximity
As previously mentioned, data proximity delivers additional benefits. When
integration logic is deployed on z/OS close to the resources that are being
integrated, composition and integration with multiple z/OS resource managers
will give optimal performance because data access can be realized
cross-memory, requiring no network traffic, and the duration of held locks can be minimized. This also significantly increases availability because the configuration comprises fewer points of failure. Recovery is also much faster in rollback situations. The ability to run transactions using native z/OS services rather than a distributed two-phase commit protocol offers far better performance.
Security: SAF/RACF, cryptographic hardware, network-level security, MLS
Cost-effectiveness: TCO, manageability/manpower, scalability, integrated management solutions, charge model, and the zIIP, zAAP, and IFL specialty processors
(Figure: the BASE24-es runtime stack. External endpoints connect through the protocol and data communication stack and messaging middleware to multiple BASE24-es Integrated Server instances, which run under the transaction manager against the database.)
IBM System z platforms offer at least two subsystems that provide high
performance memory-based message queuing that meets the requirements of
BASE24-es.
WebSphere MQ is an industry-leading stand-alone message queuing product
that meets the requirements of BASE24-es.
CICS Transient Data Queues meet the requirements of BASE24-es.
VSAM with Record Level Sharing (RLS) meets the requirements of BASE24-es
for an efficient, scalable and reliable file system providing shared record-level
access to multiple concurrent instances of the BASE24-es Integrated Server. In
addition to enhanced performance, scalability, and reliability, the CICS API to
VSAM/RLS allows access to additional functionality relative to traditional CICS file-owning region (FOR) file sharing.
DB2 support is planned for a future release. Contact ACI for details.
CICS sockets provide efficient TCP/IP communications, while the CICS VTAM
API provides access to existing communications protocols. CICS file access in
conjunction with SMSVSAM provides efficient and scalable record-level shared
file access, while CICS itself is a world-class transaction manager. Although
WebSphere MQ is a supported alternative, CICS itself provides entirely
adequate message queuing and delivery capabilities. CICS Multiple Region
Operations (MRO) and sysplex provide scalability and help provide availability.
Most of the external endpoints with which BASE24-es communicates via TCP/IP
use long-lived sockets, and require a long-running task to maintain the socket. In
addition, the BASE24-es architecture is based on large C++ executables with
substantial initialization cost. It is far more efficient to process many units of work
in a single task instance. Accordingly, BASE24-es makes use of long-lived IP
handler and Integrated Server tasks.
Multiple instances of the long-lived Integrated Server task run in a single CICS
region to achieve the parallelism normally achieved with many short-lived CICS
tasks running in a single region.
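A minimal sketch of the long-lived task pattern, with a Java BlockingQueue standing in for the CICS TDQ and illustrative names throughout:

import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: the expensive initialization is paid once per task,
// after which the same task instance processes many units of work from its
// queue. Running several such tasks in one region gives the parallelism
// that many short-lived tasks would otherwise provide.
class IntegratedServerTask implements Runnable {
    private final BlockingQueue<byte[]> workQueue;

    IntegratedServerTask(BlockingQueue<byte[]> workQueue) {
        this.workQueue = workQueue;
    }

    @Override
    public void run() {
        Object state = expensiveInitialization();   // paid once, not per message
        while (!Thread.currentThread().isInterrupted()) {
            try {
                byte[] message = workQueue.take();  // long-lived: block for more work
                process(state, message);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // orderly shutdown
            }
        }
    }

    private Object expensiveInitialization() { return new Object(); }

    private void process(Object state, byte[] message) { /* one unit of work */ }
}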
(Figure: a BASE24-es IP Handler and a VTAM Reader/Writer in a CICS region feeding BASE24-es Integrated Servers through IP TDQs; in the multi-region arrangement, the IP Handler in the TOR routes through XDYR transactions to IS TDQs served by long-lived IS tasks in each AOR.)
The target region typically contains definitions for the BASE24-es Integrated
Server and associated queues and file definitions, along with associated
ancillary tasks. By convention these target regions are referred to as Application
Owning Regions (AORs).
(Figure: a TOR whose routing scope spans multiple AORs, each running Integrated Servers.)
All communications handlers that are in the TOR route messages to the AOR
using symbolic routing (see 4.2.4, “Message routing” on page 45), and can route
messages using most of the available routing methods. Typically routing will
result in a distributed start of the XDYR BASE24-es Message Delivery program
as previously described (see 4.2.8, “Workload management” on page 48).
Figure 4-6 on page 52 illustrates the components of the TOR associated with the
TCP/IP client.
(Figure 4-6: the BASE24-es IP Client in the TOR, with its SOCKRECS, TCPIPCFG, SDMF, and DDMF files, the BASE24-es Task Monitor, and the started XDYR transaction.)
Figure 4-7 illustrates the components of the TOR associated with the TCP/IP
server.
(Figure 4-7: the BASE24-es IP Server in the TOR, with the same supporting components: the SOCKRECS, TCPIPCFG, SDMF, and DDMF files, the BASE24-es Task Monitor, and the started XDYR transaction.)
SOCKRECS has one record for each configured remote endpoint that the IP
Server will associate with a unique symbolic name. Remote endpoints not
configured in SOCKRECS will take on the symbolic name of the TCP/IP
Server itself.
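A minimal sketch of that lookup rule, with hypothetical names:

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the SOCKRECS lookup: each configured remote
// endpoint maps to a unique symbolic name, and anything not configured
// falls back to the TCP/IP server's own symbolic name.
class SockRecs {
    private final Map<String, String> endpointToSymbolicName = new HashMap<>();
    private final String serverSymbolicName;

    SockRecs(String serverSymbolicName) {
        this.serverSymbolicName = serverSymbolicName;
    }

    void configure(String remoteEndpoint, String symbolicName) {
        endpointToSymbolicName.put(remoteEndpoint, symbolicName);
    }

    // Unconfigured endpoints take on the server's symbolic name.
    String symbolicNameFor(String remoteEndpoint) {
        return endpointToSymbolicName.getOrDefault(remoteEndpoint, serverSymbolicName);
    }
}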
VTAM considerations
BASE24-es makes use of CICS/VTAM facilities to support non-IP based
protocols on z/OS. VTAM reader/writer programs are developed as the need
arises. Current offerings include SNA LU.0 and SNA LU.2.
(Figure: a request from a VTAM terminal enters the VTAM Reader/Writer in the TOR, is routed via XDYR to the Integrated Server's TDQ in the AOR, and the response returns to the terminal along the reverse path.)
WebSphere MQ considerations
In some cases, it may make sense for an external endpoint to send requests to
BASE24-es using WebSphere MQ. This form of message delivery has no
components in the TOR, but is handled directly in the AOR (see “Operator
notification considerations” on page 55).
Static Destination Map File (SDMF): This SIS routing configuration file contains static information about an endpoint. It contains one record per CICS transaction in the BASE24-es system, plus one record per SOCKRECS file entry.

Synchronous Destination Map File (SYDMF): This SIS routing configuration file maps BASE24-es symbolic names to CICS facilities. It contains one record per routable endpoint in the BASE24-es system that supports synchronous requests. It is not generally required in the TOR.

Socket Records file (SOCKRECS): This BASE24-es communications configuration file maps remote endpoints to BASE24-es symbolic names. It contains one record per remote endpoint defined to the system. Endpoints not defined to the system are assigned a generic symbolic name.

Event Log (EVTLOGR): This SIS event file is a Relative Record Data Set that is a circular buffer of Operator Notification events.

DTR Control file (DTRCTL): This file is used by the BASE24-es Workload Management User Exit as a source of information for the dynamically created Applications Table and Routing Tables. The records in this file are used to create tables in the DTRCTLTS CICS TS queue.
Figure 4-9 on page 57 depicts the components associated with the long-running
Integrated Server tasks in the AOR.
(Figure 4-9: the long-running Integrated Server tasks (ISnn) in the AOR, with their timers, the MDS shared memory table, and the BASE24-es Task Monitor that starts them.)
The BASE24-es Integrated Server tasks are started by the BASE24-es Task
Monitor program during CICS region initialization. They in turn create a
configurable number of instances of one or more BASE24-es Integrated Server
classes. (It may be desirable to create more than one Integrated Server class to
handle specific workloads. A separate TDQ is configured for each server class.)
When the Integrated Server long-running tasks find no messages on their TDQ,
they post a timer and wait. When the AOR Message Delivery Program runs, it
writes the message to the TDQ associated with the correct server class. It then
examines the shared memory table and checks the state of class member tasks.
If it finds a task that is waiting on a timer, it cancels the timer so the task can
wake and process a message from the TDQ.
It is possible that another task has already processed the delivered message. If
so, the newly-awakened task simply re-posts its timer and waits; the associated
cost is low.
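The following sketch models the timer protocol with standard Java concurrency primitives: the queue stands in for the TDQ, the timed wait for the posted timer, and the signal for the timer cancel. The harmless already-processed case falls out of the loop structure.

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the timer-and-cancel protocol. A server task waits
// with a timeout; delivery enqueues a message and signals (the analog of
// cancelling the timer). A task woken after another task already consumed
// the message simply finds the queue empty and waits again -- the cheap
// "re-post the timer" case.
class ServerClass {
    private final Queue<String> tdq = new ConcurrentLinkedQueue<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition work = lock.newCondition();

    // Delivery side: write the message, then wake a waiting task.
    void deliver(String message) {
        tdq.add(message);
        lock.lock();
        try { work.signal(); } finally { lock.unlock(); }
    }

    // Server-task side: loop forever, waiting at most timerMillis per pass.
    void serve(long timerMillis) throws InterruptedException {
        while (true) {
            String message = tdq.poll();
            if (message != null) {
                process(message);
                continue;
            }
            lock.lock();
            try {
                work.await(timerMillis, TimeUnit.MILLISECONDS);
            } finally { lock.unlock(); }
            // If another task consumed the message first, the next poll()
            // returns null and this task simply waits again.
        }
    }

    private void process(String message) { /* authorize, route, and so on */ }
}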
(Figure: within a BASE24-es region, the application writes events to the EVTLOGR TDQ; the BASE24-es Event Distributor transaction logs them to the EVTLOGR VSAM RLS shared Relative Record Data Set and starts EINF, EWRN, EERR, or ECRT, which write to the CICS spooler (CEEOUT) or to the CICS operator.)
Then, based on the severity of the notification, the Event Distributor starts one of
four Alternate Distributor transactions: EINF (Informational Notification), EWRN
(Warning Notification), EERR (Error Notification) or ECRT (Critical Notification).
By default, these four transactions are associated with the same program,
SIADIST, which simply writes to the local CICS spooler. However, the real
flexibility of the notification architecture lies in the fact that the program
associated with these transactions is intended to be user-replaceable. The user
might, for example, associate EERR and ECRT with a program that also writes to
the z/OS console.
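A sketch of that dispatch, using the transaction names and default behavior from the text, with the bound program as the user-replaceable part (the Java types are illustrative):

import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of severity-based dispatch. The transaction names
// (EINF, EWRN, EERR, ECRT) and the default program (SIADIST) come from the
// text; everything else is illustrative.
enum Severity { INFORMATIONAL, WARNING, ERROR, CRITICAL }

class EventDistributor {
    private final Map<Severity, String> transactionFor = new EnumMap<>(Severity.class);
    private final Map<String, Consumer<String>> programFor = new HashMap<>();

    EventDistributor() {
        transactionFor.put(Severity.INFORMATIONAL, "EINF");
        transactionFor.put(Severity.WARNING, "EWRN");
        transactionFor.put(Severity.ERROR, "EERR");
        transactionFor.put(Severity.CRITICAL, "ECRT");
        // By default all four transactions run the same program, which
        // writes to the local CICS spooler (stand-in below).
        Consumer<String> siadist = msg -> System.out.println("SPOOL: " + msg);
        for (String txn : transactionFor.values()) {
            programFor.put(txn, siadist);
        }
    }

    // The user-replaceable part: rebind a transaction to a custom program,
    // for example one that also writes to the z/OS console.
    void replaceProgram(String transaction, Consumer<String> program) {
        programFor.put(transaction, program);
    }

    void distribute(Severity severity, String message) {
        programFor.get(transactionFor.get(severity)).accept(message);
    }
}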
WebSphere MQ considerations
A BASE24-es Integrated Server class or other BASE24-es Message Delivery
Service-based task can be configured to read a WebSphere MQ queue rather
than a CICS TDQ.
The Integrated Server processes the request and routes the response to the originating endpoint by sending the response to the symbolic name assigned by the Message Delivery Service.

The Message Delivery Service determines that the symbolic name is associated with a WebSphere MQ queue, and that messages to that destination should be placed on the queue without a BASE24-es internal header.
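A sketch of that delivery decision, with hypothetical types; the MQ and TDQ calls are stand-ins rather than real MQI or CICS APIs:

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the destination's configured transport determines
// whether the BASE24-es internal header is prepended before delivery.
class MessageDeliveryService {
    enum Transport { CICS_TDQ, WEBSPHERE_MQ }

    private final Map<String, Transport> transportFor = new HashMap<>();

    void configure(String symbolicName, Transport transport) {
        transportFor.put(symbolicName, transport);
    }

    void send(String symbolicName, byte[] internalHeader, byte[] body) {
        if (transportFor.get(symbolicName) == Transport.WEBSPHERE_MQ) {
            putOnMqQueue(symbolicName, body);               // no internal header
        } else {
            writeToTdq(symbolicName, concat(internalHeader, body));
        }
    }

    private static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    private void putOnMqQueue(String name, byte[] message) { /* MQPUT stand-in */ }

    private void writeToTdq(String name, byte[] message) { /* WRITEQ TD stand-in */ }
}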
(Figure: an Integrated Server class configured for MQ with no internal header exchanging messages with an external endpoint over WebSphere MQ.)
Supporting all the options above would require configuring four Integrated Server
classes; that is, four CICS transactions, each with one of the preceding
configurations, each with a specified number of instances of long-lived tasks, and
each associated with the BASE24-es Integrated Server program.
(Figure: the Synchronous IP Client in the TOR holds the long-lived socket to the HSM; it is fed through the IP Client TDQ and exchanges the request and response with the Integrated Server in the AOR through records 1 and 2 of a TSQ.)
The Synchronous IP Client reads its TDQ and sends the request to the HSM over
a long-lived socket connection. It then reads the response from the socket,
validates that the response is still related to the request specified in the first
record of the TSQ, places the response in the second record of the TSQ, and
cancels the timer set by the Integrated Server.
The Integrated Server wakes up and, finding a response in the second record of
the TSQ, processes the response from the TCP/IP-attached HSM.
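The two-record TSQ exchange can be sketched as follows, with a timed wait standing in for the Integrated Server's timer and a correlation check standing in for the client's response validation; all names are illustrative.

// Hypothetical sketch of the two-record TSQ pattern: the request goes in
// record 1, the response in record 2, and the correlation check mirrors
// the client's validation that the response still matches the request.
class TsqExchange {
    private final String[] records = new String[2]; // [0] request, [1] response

    synchronized void writeRequest(String correlationId, String request) {
        records[0] = correlationId + ":" + request;
        records[1] = null;
    }

    // Client side: validate correlation, store the response, cancel the timer.
    synchronized boolean writeResponse(String correlationId, String response) {
        if (records[0] == null || !records[0].startsWith(correlationId + ":")) {
            return false;                  // stale response; the request expired
        }
        records[1] = response;
        notifyAll();                       // the analog of cancelling the timer
        return true;
    }

    // Integrated Server side: wait up to timerMillis for the response.
    synchronized String awaitResponse(long timerMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timerMillis;
        while (records[1] == null) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null;               // timer popped: no response arrived
            }
            wait(remaining);
        }
        return records[1];
    }
}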
(Figure: the VTAM Writer/Reader in the TOR connects to the HSM and exchanges the request and response with the Integrated Server in the AOR through records 1 and 2 of a TSQ.)
The VTAM Writer/Reader is passed the HSM request when it is started. It sends
the request to the HSM and waits for a response. When a response is received,
the Writer/Reader validates that the response is still related to the request
specified in the first record of the TSQ, places the response in the second record
of the TSQ, cancels the timer set by the Integrated Server, and terminates.
The Integrated Server wakes up and, finding a response in the second record of the TSQ, processes the response from the VTAM-attached HSM.
This may be appropriate if the component to which BASE24-es will link is itself
small, with small CPU utilization and resource requirements; perhaps it is a
simple front-end for a more extensive system located in another AOR, and is
capable of handling the associated availability and workload distribution
concerns. In this case the communications between the BASE24-es AOR and
the external authorization system AOR is determined by the external
authorization system itself.
When sizing the system it is important to remember that since a single thread of
execution in BASE24-es is blocked waiting for a response from the host, it is
necessary to configure more threads of execution, that is, more Integrated
Server long-lived tasks.
At a minimum, (peak_arrival_rate)*(is_latency +
external_authorization_system_latency) instances of the Integrated Server
long-lived task should be configured. (This number is a general guideline only;
more specific sizing information should be obtained before finalizing any plans.)
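As a worked example of that guideline, using assumed numbers: at a peak arrival rate of 500 transactions per second, with 10 ms spent in the Integrated Server and 30 ms waiting on the back-end authorizer, at least 500 * 0.04 = 20 long-lived tasks are needed.

// Worked example of the sizing guideline above; all numbers are assumed.
public class TaskSizing {
    public static void main(String[] args) {
        double peakArrivalRate = 500.0;      // transactions per second (assumed)
        double isLatency = 0.010;            // seconds inside the Integrated Server (assumed)
        double externalAuthLatency = 0.030;  // seconds waiting on the back end (assumed)

        // At minimum: (peak_arrival_rate) * (is_latency + external_authorization_system_latency)
        double minTasks = peakArrivalRate * (isLatency + externalAuthLatency);
        System.out.printf("Configure at least %.0f Integrated Server tasks%n",
                          Math.ceil(minTasks));
    }
}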
(Figure: BASE24-es connected to an external authorization component in another region through CICS LINK.)
(Figure: a twelve-step flow between an acquiring external endpoint and an issuing external endpoint: the request enters IP Handler 1 in the TOR, is queued to the Integrated Server in the AOR through the IS Q, goes out through IP Handler 2 to the issuer, and the response retraces the same path.)
(Figure: two Integrated Server classes in the AOR: IS01, a TDQ/internal-header server class fed from the TOR, and IS02, an MQ/no-header server class exchanging messages with an EAS server over WebSphere MQ.)
Static Destination Map File (SDMF): This SIS routing configuration file contains static information about an endpoint. It contains one record per CICS transaction in the BASE24-es system, plus one record per SOCKRECS file entry.

Synchronous Destination Map File (SYDMF): This SIS routing configuration file maps BASE24-es symbolic names to CICS facilities. It contains one record per routable endpoint in the BASE24-es system that supports synchronous requests.

Event Log (EVTLOGR): This SIS event file is a Relative Record Data Set that is a circular buffer of Operator Notification events.

Card File (CARDD): This file contains one record per card known to the associated financial institution.

Context File (CTXD): This file stores context for a client component. If client context exceeds the limits of a single record, the context is stored in multiple records. A context record has an expiration time stamp after which a clean-up program will delete the record if the client has not already done so. The assigns for the context tables are defined in the Context Configuration table to allow a number of Context tables to be shared or kept separate by different clients.
A BASE24-es AOR also contains definitions for nearly two hundred files that are
optional or required for various options and configurations of the
platform-independent BASE24-es application logic. Most of these files are small
and static, and (if required) are read into CICS main storage at CICS region
initialization. They are never again read, and never updated by the Online
Transaction Processing (OLTP) logic. They can be updated only by operator
command and do not introduce any CICS region affinities.
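A minimal sketch of that load-at-initialization pattern, assuming a hypothetical key-value file layout:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a base file is read once at region initialization
// to build a read-only in-memory table; OLTP logic only performs lookups
// and never writes, so no region affinity is introduced.
class ReadOnlyOltpTable {
    private final Map<String, String> rows = new HashMap<>();

    // Called once at region initialization.
    static ReadOnlyOltpTable loadFrom(Path baseFile) throws IOException {
        ReadOnlyOltpTable table = new ReadOnlyOltpTable();
        for (String line : Files.readAllLines(baseFile)) {
            String[] fields = line.split("\\|", 2);   // key|value layout (assumed)
            if (fields.length == 2) {
                table.rows.put(fields[0], fields[1]);
            }
        }
        return table;
    }

    // Read-only lookups during online transaction processing.
    String lookup(String key) {
        return rows.get(key);
    }
}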
The platform-independent files are listed in Table 4-3. Files that have both a base name and an OLTP name (for example, Acquirer_Issuer_Relation and Acquirer_Issuer_Relation_OLTP) are of this type: the base file exists only as input to build the table in main storage.
Acquirer_Txn_Allowed The Acquirer Transaction Allowed table contains one row for
each acquirer transaction profile, message category code
and processing code in the system that are allowed.
This table defines the set of transactions that are valid for an
acquirer.
Active_Script_Statistics The Active Script Statistics Table contains one record for
each authorizer and message type which is being
monitored. The records contain the counts for the script and
whether the script is enabled or not. Once the number of
transactions processed with the script exceeds the minimum
transaction count, the percentage of approvals, denials and
referrals are compared to their corresponding threshold
limits to determine whether the script should continue to be
used or not. If not, the script is disabled. This table is
updated during online processing for scripts configured to be
monitored.
This table should not be audited.
Audit_Store_and_Forward The Audit Store and Forward (SAF) table contains all the
records which are stored to be forwarded to the EAE at a
later time.
Authorization_Script The Authorization Script Table contains one record for each
Authorization Script defined in the Script Configuration data
source. It defines whether the script is enabled or not and
whether to monitor the script to ensure thresholds are met
for the script.
Authorization_Script_OLTP The Authorization Script OLTP Table contains one record for
each Authorization Script defined in the Script Configuration
data source. It defines whether the script is enabled or not
and whether to monitor the script to ensure thresholds are
met for the script.
This table is populated from the Authorization Script table.
Banknet_Group_Timers The Banknet Group Timers Table contains a record for each
logon sent to keep track of duplicate MCI ids, and to handle
logon time-outs and responses.
Card The Card table contains one record for each card in the network.
Compiler_Message The Compiler Message table contains one record for each
message which is generated by the script compiler for errors
and warnings. The record contains the severity of the
message, the message text and message detail.
EMV_Security The EMV Security Data. It contains the keys required for
EMV security processing and the scheme used.
EMV_Security_OLTP The EMV Security Data. It contains the keys required for
EMV security processing and the scheme used.
Environment The Environment table contains one record for each attribute
name/value pair in the system which overrides the default
value of the attribute defined by the application. Any attribute
not defined in the Environment table will use an appropriate
default value as defined by an application.
EPP_Serial_Number The EPP Serial Number data source is used to verify that the
EPP serial number received in a message from a device is a
valid/known serial number.
Event The Event table contains one row for each event which may
be logged. Each event contains up to 20 tokens which can
be substituted into the message text. Event suppression is
also defined in the Event table.
Exception_Log The Exception Log table contains one record for each
exception condition encountered by an endpoint which
requires the external message to be logged.
Holiday The Holiday table contains a record for each holiday date
specified as a holiday belonging to a holiday profile.
Holiday_OLTP The Holiday OLTP table contains a record for each holiday
date specified as a holiday belonging to a holiday profile.
This table is loaded from the Holiday table.
Host_Public_Key_Security_Primary This contains the primary public key and secret key
information generated for the Host.
Host_Public_Key_Security_Second This contains the secondary public key and secret key
information generated for the Host.
Inbound_Station_OLTP The Inbound Station table contains one record for each
station an inbound message may be received on. The
outbound station is its station pair that a corresponding
message will be sent out on.
The inbound and outbound station names may be the same
value. The interface they belong to is identified.
This table is populated from the Interface Station table, and
is accessed as a read only table during online transaction
processing.
Institution The Institution table contains one record for each institution
or retailer in the system acting as a terminal owner or
card-issuing authorization entity.
Institution_ID_OLTP This is a read-only table built from the Institution table for use
in routing to acquirers.
Interchange_Prefix The Interchange Prefix table contains one record for each
unique prefix and pan length combination identifying an
issuer. The instrument type is optional and defaults to
spaces. If the instrument type is spaces, the instrument type
from the System Prefix table is used.
Interchange_Prefix_OLTP The Interchange Prefix OLTP table contains one record for
each unique prefix and pan length combination identifying
an issuer. The instrument type is optional and defaults to
spaces.
If the instrument type is spaces, the instrument type from the
System Prefix table is used.
This table is populated from the Interchange Prefix table,
and is accessed as a read only table during online
transaction processing.
Interface The Interface table contains one record for each interface in
the system.
Interface_Acquirer_OLTP The Interface Acquirer OLTP table contains one record for
each interface in the system.
This table is populated from the Interface table.
Interface_OLTP The Interface OLTP table contains one record for each
interface in the system.
This table is built from the Interface table.
Interface_Station Interface Station table contains one record per input station.
The station record stores status information for stations.
There can be multiple stations per interface.
ISO_Field The ISO Field table contains one record for each message
profile which defines the characteristics of the fields.
ISO_Field_OLTP The ISO Field OLTP table contains one record for each
message profile which defines the characteristics of the
fields.
This table is loaded from the ISO Field table.
ISO_Message The ISO Message table contains one record for each
message profile and message type combination in the
system.
ISO_Message_OLTP The ISO Message OLTP table contains one record for each
message profile and message type combination in the
system.
This table is loaded from the ISO Message table.
Issuer_Profile_Map The Issuer Profile Map table contains an entry for each
issuer in the system. An issuer may be an institution
identifier or an interface name. The table assigns a primary
and alternate journal profile for each entry.
Issuer_Profile_Map_OLTP The Issuer Profile Map OLTP table contains an entry for
each issuer in the system. An issuer may be an institution
identifier or an interface name. The table assigns a primary
and alternate journal profile for each entry.
This table is populated from the Issuer Profile Map table, and
is accessed as a read only table during online transaction
processing.
Issuer_Route_Profile The Issuer Route Profile table contains an entry for each
issuer route profile in the system.
Journal_Prfl_Group_Assign_OLTP The Journal Profile Group Assign OLTP table contains one
record for each journal profile and index entry. There should be, at a minimum, four assign names for each journal profile
(one for previous, one for current, one for future and one for
an empty file). The actual number of assigns is defined in the
Journal Profile Group table by adding the retention period
(previous), future count plus two for the current and empty
file. This table is populated from the Journal Profile Group
Assign table and is accessed as a read only table during
online transaction processing.
Journal_Profile_Group The Journal Profile Group table contains a row for each
journal profile in the system.
Journal_Profile_Group_Assign The Journal Profile Group Assign table contains one record
for each journal profile and index entry. There should be, at
a minimum, four assign names for each journal profile (one
for previous, one for current, one for future and one for an
empty file). The actual number of assigns is defined in the
Journal Profile Group table by adding the retention count
(previous) and the future count, plus two for the current and
empty files.
Journal_Profile_Group_OLTP The Journal Profile Group OLTP table contains a row for
each journal profile in the system.
This table is populated from the Journal Profile Group table,
and is accessed as a read-only table during online
transaction processing.
Journal_Query_Output_Map The Journal Query Output Map table contains an entry for
each journal query output profile in the system. The table
defines the ring of journal query profiles to use for a journal
query.
Limits This table contains limits that are associated with a payment
instrument (for example, card).
Limits_OLTP This table contains limits that are associated with a payment
instrument (for example, card).
Merchant_OLTP The Merchant OLTP table contains one record for each
merchant in the system. Each record contains information
for use in validating transactions and defining cutover times.
This table is populated from the Merchant table.
NCR_PIN_Verification This data source holds the keys used in NCR PIN
Verification. It is accessed using a PIN Verification Profile
and an NCR PIN Version Number.
NCR_PIN_Verification_OLTP This read-only data source holds the keys used in NCR PIN
Verification. It is built from the NCR_PIN_Verification table.
Outbound_Station_OLTP The Outbound Station OLTP table contains one record for
each station on which an outbound message may be sent.
Its paired inbound station is the station on which a
corresponding message will be received. The inbound and
outbound station names may be the same value. The
interface they belong to is identified.
This table is populated from the Interface Station table, and
is accessed as a read-only table during online transaction
processing.
Perusal_Script The Perusal Script table contains one record for each user
profile, query type, and query name in the system. The
executable script name is defined.
Perusal_Script_OLTP The Perusal Script OLTP table contains one record for each
user profile, query type and query name in the system. The
executable script name is defined.
This table is populated from the Perusal Script table, and is
accessed as a read-only table during online transaction
processing.
Prefix The Prefix Data Source defines each prefix that can be
processed. One record must exist in the Data Source for
each prefix to be processed. Note that if one prefix is used
with multiple account number lengths, multiple records must
exist in the Data Source—one for each account number
length.
Prefix_OLTP The Prefix Data Source defines each prefix that can be
processed. One record must exist in the Data Source for
each prefix to be processed. Note that if one prefix is used
with multiple account number lengths, multiple records must
exist in the Data Source—one for each account number
length.
Route The Route table contains one record for each issuer route
profile, route code, account 1 type, and account 2 type
defined in the system. Through configuration of Route
records, issuers define the routing and authorization
parameters for their transactions.
Issuers may share common routing and authorization
configurations through the specification of a common issuer
route profile. Processing codes share common routing and
authorization configurations through the specification of a
common route code. A wildcard value of asterisks is
supported for the issuer route profile, route code, account 1
type, and account 2 type.
Route_OLTP The Route table contains one record for each issuer route
profile, route code, account 1 type, and account 2 type
defined in the system. Through configuration of Route
records, issuers define the routing and authorization
parameters for their transactions.
Issuers may share common routing and authorization
configurations through the specification of a common issuer
route profile. Processing codes share common routing and
authorization configurations through the specification of a
common route code. A wildcard value of asterisks is
supported for the issuer route profile, route code, account 1
type, and account 2 type.
This table is populated from the Route table, and is
accessed as a read-only table during online transaction
processing.
Script_Configuration The Script Configuration table contains one record for each
executable script in the system. An executable script may be
a high level script or a subscript. The script compiler updates
the param types and return values.
Script_OLTP The Script OLTP table contains one record for each
compiled script statement.
This table is populated from the Script table, and is accessed
as a read-only table during online transaction processing.
Script_Source The Script Source table contains one record for each line of
a script. All lines together define the flow of a script. The
script source is compiled and the output is placed in the
Script table and updated in the Script Configuration table.
Store_and_Forward The Store and Forward (SAF) table contains all the records
which are stored to be forwarded to the interface at a later
time.
Store_and_Forward_Outstanding The Store and Forward (SAF) Outstanding table contains all
the records which have been stored and have been sent to
the interface. A response is pending for the request and thus
the SAF message is outstanding.
Stream The Stream table contains one record per line to be written
to a write-only data source. The line is composed of a single
variable length string.
System_Prefix The System Prefix table contains one record for each
unknown prefix to search based on Source Logical Network
and Acquirer Route Profile to determine the correct issuer.
The Acquirer Route Profile may be wildcarded.
System_Prefix_OLTP The System Prefix OLTP table contains one record for each
unknown prefix to search based on Source Logical Network
and Acquirer Route Profile to determine the correct issuer.
The Acquirer Route Profile may be wildcarded.
This table is populated from the System Prefix table, and is
accessed as a read-only table during online transaction
processing.
User_Perusal_Prfl_Relation The User Perusal Profile Relation table contains one record
for each alias which maps specific users to a user profile.
User_Perusal_Prfl_Relation_OLTP The User Perusal Profile Relation OLTP table contains one
record for each alias which maps specific users to a user
profile.
This table is populated from the User Perusal Profile
Relation table, and is accessed as a read-only table during
online transaction processing.
Visa_PVV_OLTP The Visa PVV OLTP table contains information specific to
the Visa PVV PIN verification method.
Authorization type
Authorization type impacts both the configuration of the AOR (the files required
by BASE24-es, the requirement or lack of requirement for a security module) and
the TOR (the requirement or lack of requirement for connectivity to an external
authorizer).
Switched transactions
Switched transactions are routed by BASE24-es to a local or national card
network for authorization. The card networks commonly prefer that the
BASE24-es system connect to them as a TCP/IP client (see “TCP/IP client”).
Offline authorization
Offline transactions are transactions authorized by BASE24-es on the basis of
data configured in its own database. They require no connectivity in the TOR to
an external authorizer.
Online authorization
Online transactions are transactions routed by BASE24-es to a back-end host
application for authorization. On other platforms, connectivity to the back-end
host application is generally accomplished by some variety of TCP/IP data
communications.
Online/Offline authorization
Online/Offline authorization is the authorization method in which BASE24-es
generally goes to a back-end host system for authorization, but is capable of
standing in and performing authorization when the back-end host application is
unavailable.
Considerations for communications with the back-end host application are the
same as for Online Authorization, described in “Online authorization” on
page 88. Considerations for the authorization database in the AOR are similar to
the considerations for Offline Authorization described in “Offline authorization” on
page 88, although a less robust authorization method (perhaps declining hot
cards rather than checking available balances) is often selected.
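To make the stand-in decision concrete, the following minimal sketch (in Python, purely illustrative; the function and table names are hypothetical and do not represent ACI's implementation) shows the shape of the Online/Offline logic:

def forward_to_host(txn):
    # Stub standing in for a request to the back-end authorization host.
    return "APPROVED"

def authorize(txn, host_up, hot_cards):
    """Go online when the back-end host is up; otherwise stand in
    with a deliberately less robust check (hot cards only)."""
    if host_up:
        return forward_to_host(txn)  # online authorization
    return "DECLINED" if txn["pan"] in hot_cards else "APPROVED"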
Acquirer type
The Acquirer type generally has little direct impact on the configuration of the
BASE24-es AOR, other than determining the specific module within BASE24-es
to which the transactions are delivered.
Device-acquired transactions
Device-acquired transactions include transactions that originate from automated
teller machine (ATM) and point of sale (POS) devices. They may include a mix of
on-us and not-on-us transactions, and may be authorized by any of the
authorization types.
Switch-acquired transactions
Switch-acquired transactions are transactions acquired from local or national
card networks and routed to BASE24-es for authorization. They generally
represent on-us transactions only and are authorized as one of the Online,
Offline, or Online/Offline authorization types.
Host-acquired transactions
Host-acquired transactions are usually transactions acquired by devices or card
networks connected to a pre-existing host application and routed to BASE24-es
for authorization or routing. Host-acquired transactions are most generally
not-on-us transactions (since on-us transactions are often processed by the
pre-existing host application itself), but may be a mix of switched and offline
transactions.
[Figure: BASE24-es runtime stack. External endpoints connect through the protocol and middleware layer (CICS Sockets/VTAM, CICS TDQ/WebSphere MQ, and memory-based queues) to replicated BASE24-es Integrated Server application instances, which use the CICS/SMSVSAM database manager over a VSAM/RLS database.]
The majority of file access is accomplished using the CSKL-based CICS file
server SIDC. SIDC uses both the TCP/IP stack and the BASE24-es file
definitions, and must be run in a region containing both sets of resources. This
may be an AOR with TCP/IP services configured, a TOR with file definitions
configured, or a CICS region dedicated to BASE24-es UI processing.
[Figure: SIDC UI server. The SIDC server runs in a UI CICS region with CICS RLS file definitions over the VSAM/RLS database.]
The CICS-based SIDC UI server cannot work with the BASE24-es TCP/IP
Communications handlers and asynchronous message queuing because it must
maintain a transactional session with the UI Server.
In Figure 4-16 on page 66, the BASE24-es UI Server opens a session with the
IBM CSKL listener. CSKL links to the BASE24-es Security Validation Program to
verify the UI Server’s credentials. The Security Validation Program validates the
UI Server, so CSKL starts SIDC. CSKL calls GIVESOCKET and SIDC calls
TAKESOCKET to take over the client connection.
The SIDC task persists until the client connection is lost. If the connection is lost
within the boundaries of a database transaction, SIDC rolls back any database
updates in progress before terminating.
XMLS UI server
The CICS-based XMLS UI Server is a special instance of the BASE24-es
Integrated Server used to route BASE24-es Process Control commands to the
Integrated Server, and to provide C++ facilities to access certain
specially-formatted data in some BASE24-es data files.
[Figure: XMLS UI server message flow. In the TOR, the BASE24-es IP server receives the request and passes it to the AOR, where XDYR places it on the XMLS TDQ for the XMLS Integrated Server; the response returns through the TDQs.]
The TCP/IP server starts the AOR Routing Program XDYR and XDYR places the
transaction on the XMLS TDQ.
The Integrated Server processes the request and places a response on the
XMLS TDQ (see “TCP/IP connected security module” on page 61 for details of
synchronous messages between BASE24-es components). XMLS reformats the
response and places it on the TCP/IP Communication Handler’s TDQ for delivery
to the client.
Replicating TORs
BASE24-es TCP/IP Communications Handlers may be replicated within a TOR,
and TORs may be replicated within a CICSPlex or other MRO environment.
Replicating AORs
BASE24-es AORs may be freely replicated. As BASE24-es will attempt to
distribute work evenly across available AORs, consideration should be given to
defining more AORs in LPARs with more available resources.
Replicating tasks
BASE24-es Integrated Server tasks can and should be replicated within an AOR.
The MAXSERVERS parameter in the BASE24-es System Interface Services
Message Delivery SDMF configuration file controls the number of tasks created
in each AOR.
Replicating LPARs
BASE24-es may be distributed across multiple Logical Partitions (LPARs) in a
Parallel Sysplex. The Parallel Sysplex may be distributed across multiple physical
processing units. The only requirement is that all the context-free servers in a
BASE24-es system must have a common database in which to hold their shared
context.
Benchmark observations
ACI Worldwide and IBM conducted a benchmark from November 7-18, 2005, at
the IBM Washington Systems Center to demonstrate the performance
characteristics of BASE24-es on the latest IBM System z9. The objective was to
achieve a processor consumption rate competitive with other solutions of this
type, and to maintain virtually constant MIPS per transaction rates as volumes
increased to a throughput of over 1000 transactions per second. The effort met
or exceeded these objectives.
Performance and scalability were close to linear, while the cost per
transaction remained low and constant.
The test achieved more than 1000 transactions per second on 8 CPUs
(limited only by I/O capacity). This represents twice the daily peak volume
requirements of the world’s largest transaction processors.
System z9 performance can satisfy any customer’s requirements with near
linear MIPS per transaction. TCO per transaction becomes lower with higher
transaction volumes as support costs are fairly constant as volume grows.
Results:
Test results are summarized in Figure 4-21 on page 97. Throughput
(transactions per second) and cost (MIPS per transaction) data was captured for
each of four LPAR configurations: 1, 2, 4 and 8 CPUs. MIPS per transaction
remained constant through approximately 1018 transactions per second, where
the I/O subsystem became saturated.
The solid upper line shows transactions per second, while the dotted lower line
shows that MIPS per transaction remained nearly constant as the transaction
rate increased. (Absolute CPU utilization and MIPS cost are not shown. For
accurate sizing information, contact ACI.)
[Figure 4-21: Throughput and CPU busy plotted against the number of CPUs (1, 2, 4, and 8); the transactions per second scale runs from 0 to 1200.]
Results:
Test results are summarized in Figure 4-22 on page 98. Two 4-CPU LPARs
processed approximately the same throughput as a single 8-CPU LPAR (again
limited by I/O capacity), but the MIPS per transaction was slightly higher and
non-linear (15% at 1072 TPS).
The solid upper line shows transactions per second, while the dotted lower line
shows that MIPS per transaction increased only slightly as the transaction rate
increased. (Absolute CPU utilization and MIPS cost are not shown. For accurate
sizing information, contact ACI.)
With each additional 9 of availability comes a higher degree of cost and
complexity. Customers must weigh the cost of downtime against the cost of
avoiding downtime to determine the high availability model that best fits their
business requirements.
Table 5-1 illustrates the relationship between the number of 9s and the actual
business service downtime.
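As a quick arithmetic illustration (a sketch in Python; the figures below are computed directly from the availability percentages, not taken from Table 5-1):

HOURS_PER_YEAR = 365 * 24  # 8760

for nines, availability in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
    downtime_minutes = (1 - availability) * HOURS_PER_YEAR * 60
    print(f"{nines} nines ({availability:.3%}): "
          f"{downtime_minutes:7.1f} minutes of downtime per year")

Three nines allow roughly 526 minutes of downtime per year, while five nines allow only about 5 minutes, which is why each additional 9 is so much more expensive to achieve.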
Virtualization
There are two virtualization strategies available when configuring BASE24-es.
The first takes advantage of an ATM or point of sale (POS) device’s capability to
have a primary and alternate destination configured. The second strategy is to
use Virtual IP Addressing (VIPA), a router, Sysplex Distributor, or a
combination of these.
Primary/Alternate configuration
Figure 5-1 on page 102 illustrates a primary/alternate configuration. In the figure,
note that half of the devices are configured with a primary destination of 1.2.3.4,
port 8000 and a secondary destination of 1.2.3.5, port 8000. The other half of the
devices are configured with primary destination of 1.2.3.5, port 9000 and a
secondary destination of 1.2.3.4, port 9000.
There is a separate TOR responsible for listening on each of the ports. The TORs
are spread across the two LPARs. The TORs are configured such that the
primary and secondary destination for all terminals will reside in different LPARs.
If one of the TORs fails, the devices connected to it before the failure would utilize
their secondary destination. This is illustrated in Figure 5-2 on page 103.
BASE24-es can utilize the z/OS Automatic Restart Manager (ARM) to detect the
TOR failure and restart it. After it returns to service, customers have two choices:
either allow the connections to remain on the alternate TOR, or issue a command
to terminate those connections.
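The device-side behavior can be sketched as follows (a minimal Python illustration using the addresses from the example above; real devices implement this logic in firmware):

import socket

PRIMARY = ("1.2.3.4", 8000)    # this device's primary TOR
ALTERNATE = ("1.2.3.5", 8000)  # its alternate, in the other LPAR

def connect_with_failover(timeout=5.0):
    """Try the primary destination first, then fall back to the alternate."""
    for destination in (PRIMARY, ALTERNATE):
        try:
            return socket.create_connection(destination, timeout=timeout)
        except OSError:
            continue  # destination unreachable: try the next one
    raise ConnectionError("neither primary nor alternate TOR is reachable")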
Virtual IP
In this model, the IBM Virtual IP Address (VIPA) or a router is utilized to provide
virtualization of the TORs. TCP/IP Client connections are distributed across the
available TORs by VIPA or the router.
As you can see in Figure 5-3 on page 104, balancing of the connections is done
dynamically. This is unlike the previous model, in which the balancing of
connections was done through static configuration at the device.
In the case of a TOR failure, shown in Figure 5-4 on page 105, the devices that
were connected to the TOR will reconnect to the same IP address and port. The
connection will be distributed to one of the surviving TORs, as depicted.
After the failed TOR returns to service, it is difficult to rebalance the connections
across all TORs and LPARs. However, this should not cause a problem because
the majority of the work is done in the AOR, and the TORs distribute the load
across all available AORs.
Over time, connections will rebalance as devices lose and reestablish their
connections. Customers can choose to expedite rebalancing by terminating all
connections and forcing all devices to reconnect. (However, this action should
be done during off-peak hours only.)
Maintenance - TOR
When maintenance is required on some aspect of the TOR, the IP Handlers
within the TOR can be shut down in a controlled manner. During a controlled
shutdown, the IP Handler will terminate all connections. Any messages sent to IP
Handler after the connections have been terminated will be failed back to the
AOR. Typically, this will result in the message being rerouted or reversed.
In addition, an IP Server will notify the application that the connections have been
terminated and the AOR will go into Network Management Mode and periodically
monitor the status of the connection.
AOR failure
In the event of an AOR failure, the TORs rely on the Dynamic Transaction
Routing Program in a CICSPlex environment, and the routing user exit in a
non-CICSPlex environment, to detect and route around the AOR failure. The few
transactions that were on the TDQ at the time of the failure will be lost.
BASE24-es relies on the end-to-end protocols built into the electronic payment
industry to deal with the transaction appropriately.
z/OS Automatic Restart Manager (ARM) can be used to quickly restart the AOR
after a failure. Refer to 4.2.10, “AOR architecture” on page 56 for an in-depth
discussion about starting the AOR.
Maintenance - AOR
Occasionally, maintenance will be required on the AOR. This could take two
forms, either maintenance to the Integrated Servers, or maintenance to the AOR
itself. This section describes how maintenance can be performed on the AOR
without disrupting service.
AOR maintenance
When AOR maintenance is necessary, the AOR can be shut down in a controlled
manner. When a shutdown is required, the AOR will notify the TOR that it should
no longer be considered in the AOR selection process. This will stop new work
from being added to the TDQ. After all messages on the TDQ have been
processed, the Integrated Servers will be stopped and the AOR will be
terminated.
After maintenance is complete, the AOR can be restarted. Refer to 4.2.10, “AOR
architecture” on page 56 for an in-depth discussion of starting the AOR.
If you have two or more z/OS system images in your sysplex, however, you may
have to implement specific levels of database or access method software to
enable all AORs to access the same data at the same time, no matter which
z/OS system the AOR is running in. DFSMS (SMSVSAM), in conjunction with
CICS TS, provides similar facilities for VSAM data sets.
Multiple CICS/ESA® applications in a CICSplex can share VSAM data sets with
integrity by using function shipping to a single FOR. This approach has
limitations, though. It does not solve the problem of the FOR being a single point
of unavailability. Other CICS applications outside the Parallel Sysplex can also
use the FOR by function shipping over ISC links. DFSMS provides RLS access
for VSAM files. CICS TS allows CICS users to exploit this function. RLS is
designed to allow VSAM data to be shared, with full update integrity, among
many applications running in many CICS regions in one or more z/OS system
images in a Parallel Sysplex.
The SMSVSAM server runs in its own address space and handles all VSAM
requests made in RLS mode. If this address space fails, it can be restarted
automatically up to six times, assuming RLSINIT(YES) is specified in the
IGDSMSxx member that the host z/OS image uses. The most likely reason for an
SMSVSAM server failure is loss of connectivity to the Coupling Facility.
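For illustration, the relevant IGDSMSxx entry might look like the following (the data set names are placeholders for your installation's values, and other keywords in the member are omitted):

SMS ACDS(SYS1.SMS.ACDS) COMMDS(SYS1.SMS.COMMDS)
    RLSINIT(YES)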
The SMSVSAM address space fails if it cannot rebuild a Coupling Facility lock
structure, if it loses connectivity to the Coupling Facility lock structure, or if the
Coupling Facility lock structure fails. The SMSVSAM address space also fails
when it loses the last active SHaring Control Data Set (SHCDS) and has no
spare available.
When the SMSVSAM address space fails, any I/O against a VSAM data set open
in RLS mode, or any attempt to open a VSAM data set in RLS mode, receives an
error indicating that the SMSVSAM is unavailable. CICS detects that SMSVSAM
address space is unavailable when one of the transactions attempts an RLS
access.
When this event occurs, CICS issues message DFHFC0153: File control RLS
access is being closed down. The effect that this has varies, depending on
where individual tasks are in their life cycle.
Coupling Facilities
A Coupling Facility (CF) is a special LPAR that provides high-speed caching, list
processing, and locking functions in a Parallel Sysplex. The LPAR can be
configured to run within a CPC that runs other operating systems (in this case it
is often referred to as an ICF), or it can be configured in a stand-alone
processor that runs only Coupling Facility Control Code (CFCC).
In order to be able to deliver acceptable response times, you must have enough
capacity in your CFs. In addition to CF CPU utilization, you must also consider
response times. It is possible for CF CPU utilization to be low, but with CF
response times so long that the cost of using the CF becomes unacceptable (this
is especially the case if there is a large disparity in speed between the CF CPC
and the CPCs that the operating systems are running on).
Failure isolation
One of the most important characteristics of a CF is its location in relation to the
systems that are connected to it. There are some advantages to it being in a
stand-alone processor. However, the most important question is whether a single
failure can impact both the CF and one or more systems connected to it.
A CF in a CPC where none of the other images in that CPC are connected to that
CF can provide nearly as good availability as one running in a stand-alone
processor. Many structures can be rebuilt even if there is a double failure.
After a failure or maintenance action, structures can be left in a CF other than
the first one in their preference list (PREFLIST). To avoid this situation, regularly
verify that all structures are actually in the first available CF in their PREFLIST.
One way to do this is by using the z/OS Health Checker, which has a check for
this condition. In addition, consider issuing a
SETXCF START,REBUILD,POPCF=cfname for each of your CFs on a regular
basis. This will ensure that all structures are in the first CF in their PREFLIST (all
other things being equal).
Structure duplexing
There are two flavors of structure duplexing: User-managed duplexing and
System-Managed CF Structure Duplexing.
User-managed duplexing is only used by DB2, and then only for its Group Buffer
Pool structures. System-Managed CF Structure Duplexing is supported for any
structure whose connectors support system-managed rebuild.
Assuming that the two copies of a duplexed structure are failure-isolated from
each other, the use of System-Managed CF Structure Duplexing for a structure
allows you to place a structure in the same failure domain as one or more of its
connectors, but still be protected from a single failure that would otherwise cause
a data sharing group restart. However, while System-Managed CF Structure
Duplexing provides substantial availability benefits, it can also have a significant
impact on the response time for update requests to the duplexed structures.
Note: Whatever number of CFs you have, make sure that all of them are
specified on the PREFLIST for all critical structures.
CF maintenance procedures
There are different approaches to emptying Coupling Facilities.
One method that some customers use is to maintain three sets of CFRM
policies: One that contains both CFs, one that contains just one CF, and one that
contains just the other. To empty a CF, they switch to the policy containing just
the CF that will remain in service, and then rebuild all the structures that now
have a POLICY CHANGE PENDING status.
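For illustration, with placeholder policy and CF names, the switch-and-rebuild sequence might look like this:

SETXCF START,POLICY,TYPE=CFRM,POLNAME=POLCF1ONLY
D XCF,STRUCTURE
SETXCF START,REBUILD,CFNAME=CF02,LOCATION=OTHER

The display command shows which structures are left in POLICY CHANGE PENDING status; the rebuild with LOCATION=OTHER then moves the structures out of the CF that is being emptied.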
After the work has been completed and the CF LPAR is activated and available to
all systems, always use the SETXCF START,RB,POPCF=cfname command to
repopulate the CF.
For example, the IBM DS8000™ series is designed to avoid single points of
failure and provide outstanding availability. With the additional advantages of
IBM FlashCopy®, data availability can be enhanced even further; for instance,
production workloads can continue execution concurrent with data backups.
Metro Mirror and Global Mirror business continuity solutions are designed to
provide the advanced functionality and flexibility needed to tailor a business
continuity environment for almost any recovery point or recovery time objective.
The addition of IBM solution integration packages spanning a variety of
heterogeneous operating environments offers even more cost-effective ways to
implement business continuity solutions.
The DS8000 supports a rich set of Copy Service functions and management
tools that can be used to build solutions to help meet business continuance
requirements. These include the IBM TotalStorage® Resiliency Family
Point-in-Time Copy functions.
FlashCopy can help reduce or eliminate planned outages for critical applications.
FlashCopy is designed to provide the same point-in-time copy capability for
logical volumes on the DS6000™ series and the DS8000 series as FlashCopy
V2 does for ESS, and it allows access to the source data and the copy almost
immediately.
If a connection is lost, the interface will mark the connection as down and will no
longer send messages to it. The interface will periodically check the connection
to see if it has been re-established. Once re-established, the interface will begin
sending messages to the connection.
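A minimal sketch of this recovery behavior (in Python, purely illustrative of the pattern; it is not the BASE24-es implementation):

import socket
import time

def send_with_recovery(host, port, messages, probe_interval=10.0):
    """Send each message, re-establishing the connection if it is lost."""
    conn = None
    for msg in messages:
        delivered = False
        while not delivered:
            if conn is None:
                try:
                    conn = socket.create_connection((host, port), timeout=5.0)
                except OSError:
                    time.sleep(probe_interval)  # still down: probe again later
                    continue
            try:
                conn.sendall(msg)
                delivered = True
            except OSError:
                conn.close()
                conn = None  # mark the connection as down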
Distributed links
In a CICSPlex environment, a customer can achieve high availability by taking
advantage of the CICSPlex Workload Manager (WLM) to distribute the CICS
LINK from BASE24-es across multiple Host Application target regions. In order to
do this, the BASE24-es Host Interface must be configured in Synchronous mode.
There should be one entry configured in the Station Table for the Host Interface.
That entry should correspond to an entry in the SYDMF that defines the CICS
LINK to the Host Application target region.
The BASE24-es Host Interface will utilize the stations in turn, which will provide a
load balancing effect. If a target region is unavailable, BASE24-es Alternate
Routing can be configured to re-route the request to a second or third target
region for authorization.
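The rotation can be pictured with a short sketch (Python; the station names are hypothetical):

STATIONS = ["HOST_STN_1", "HOST_STN_2", "HOST_STN_3"]  # hypothetical names
_next = 0

def select_station(available):
    """Round-robin over the stations, skipping unavailable target regions."""
    global _next
    for _ in range(len(STATIONS)):
        station = STATIONS[_next]
        _next = (_next + 1) % len(STATIONS)
        if station in available:
            return station
    raise RuntimeError("no Host Application target region is available")

# select_station({"HOST_STN_1", "HOST_STN_3"}) skips the region that is
# down, which corresponds to Alternate Routing re-routing the request.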
The number of both TORs and AORs can be expanded to provide greater
resilience and throughput. This option uses VIPA to provide virtualization of the
data communications.
[Figure: single-LPAR configuration. One LPAR contains the CICS routing region and the BASE24-es target regions, with VIPA providing virtualization of the data communications, message routing between the regions, CF structures, and a single copy of the VSAM data.]
Single CPC
The single CPC is the next point of failure. While a CPC failure is rare, planned
maintenance will cause a system outage for the duration of the maintenance.
Single LPAR
The first point of failure is the single LPAR. If there is a planned or unplanned
outage of the LPAR, the whole system becomes unavailable for the duration of
the outage.
As with the first option, there is only one copy of the BASE24-es data. This option
uses VIPA to provide virtualization of the data communications.
[Figure: two LPARs in a single CPC. Each LPAR contains a CICS routing region and a CICS target region running BASE24-es; VIPA virtualizes the data communications, the CF holds the shared structures, and there is a single copy of the VSAM data.]
Single CPC
The single CPC can represent a single point of failure. While a CPC failure is
rare, planned maintenance will cause a system outage for the duration of the
maintenance.
As with the first two options, there is only one copy of the BASE24-es data. This
option uses VIPA to provide virtualization of the data communications.
[Figure: two LPARs with a CF between them. Each LPAR contains a CICS routing region and a CICS target region running BASE24-es; VIPA virtualizes the data communications and there is a single copy of the VSAM data.]
This configuration provides the highest availability that can be achieved at one
site. It satisfies the requirements of many BASE24-es customers.
Figure 5-8 illustrates this configuration. Each site houses a system similar to the
one described in the previous option, with tools to replicate the data. The number
of TORs, AORs, LPARs, and CPCs can be expanded to provide greater
resilience and throughput.
[Figure 5-8: two-site configuration. Site A and Site B each house CPCs running BASE24-es, with replication of the data between the sites.]
CICSPlex System Manager (CPSM) and BASE24-es were used to provide the
dynamic routing capability from each TOR to all the AORs.
All the AORs were capable of accessing the VSAM files in Record Level Sharing
(RLS) mode through the SMSVSAM address space. SMSVSAM allocates the
cache and lock structure in the two Coupling Facilities available to the Parallel
Sysplex.
[Figure: ITSO test configuration. LPARs SC30, SC31, and SC32 run z/OS 1.6 across two CPCs, with CPSM, the TORs, and the shared VSAM data.]
6.2.3 Desktop
The user interface to manage the BASE24-es application is based on a
client/server architecture. The ACI desktop client is the user interface to
BASE24-es. Table 6-2 on page 127 summarizes the ACI desktop client
requirements. The ACI desktop client needs write access to the installation
location for file updates, logs, and configuration file storage.
Operating system Choose only one of the following. All current service
packs applied, where applicable:
- Microsoft Windows 2000 Professional
- Microsoft Windows XP Home or Professional
Additionally, the installer will require two TSO IDs with ISPF access.
Therefore, a 64-bit time library was introduced into the System Interface Services
layer of BASE24-es to circumvent this potential issue. Additionally, an
environment variable to accurately represent time zone offsets based on
POSIX 1.1 standards was introduced: the SIS_TZ environment variable. Some
activities are needed for setting the SIS_TZ environment variable in CICS.
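For example, a hypothetical POSIX-style value for a United States Central Time installation (verify the exact format required against the ACI documentation) is:

SIS_TZ=CST6CDT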
Note: The CICS EDSA space should be set to at least 600 MB for the ES
application. A sample System Initialization Table (SIT) is provided with the
parameters. The CICS region is started with START=COLD or START=INITIAL
to cause the new resource definitions to be installed.
You can use the provided user interface to verify that the BASE24-es installation
in a simple CICS region has been completed correctly. Figure 6-3 on page 132
displays a sample from the BASE24-es user interface.
The CICS System Definitions (CSDs) generated in the single region installation
were used as input to the CICSplex to provide resource definitions for the various
MASs. The CICS utility DFHCSDUP was used to extract specified groups from
the CSD. A CPSM-provided user exit (EYU9BCSD) formatted the extracted data
into CPSM resource definition records that can be used as input to the Batched
Repository Update Facility (BATCHREP).
CICSplex considerations
The starting point is a sample BASE24-es CICSplex using the CICSplex
System Management (CPSM) facility.
Note: ACI provides procedures to accomplish these tasks. For further details
on the CICSplex setup, refer to DTR BASE24-es IBM CICSplex Installation
Guide.
TCP/IP configuration
1. We needed to determine whether the BASE24-es TCP/IP process would be
acting as a client or a server.
2. We needed to reserve the TCP/IP address and the port number.
3. The BASE24-es TCP/IP Control File (TCPIPCFG) screens allowed us to set
these parameters for all processes:
– Service Name is the key to this file. It must be unique, and will be used
within BASE24-es as a reference point to a component.
Example: Service name Switch_Station_01. In the BASE24-es user
interface, you would select the Stations tab from the Switch Interface
Configuration window. The Service name is then entered as one of the
stations to be used to send and receive message traffic from a Switch.
– Port number that this BASE24-es TCP/IP process will communicate on.
The BASE24-es installer can build this data ahead of time. By working with the
customer beforehand, the installer can build the necessary data in a project
environment at ACI before bringing it to a customer site.
Performance data
Acquirers
Visa DPS interface
SPDH POS devices
Issuers
BASE24-es Scripting Component
ISO 93 Host Simulator
Journal files
Ten journal files, each with two alternate indexes.
Merchant
Eight thousand merchants, each supporting four terminals.
Thirty-two thousand terminals
Routing
Offline authorization using the BASE24-es Scripting component.
Online authorization using the ISO 93 Host Simulator.
Script
Offline authorization used a base script with a series of subscripts to
complete the authorization process, and accessed the Card file and the
Usage file in the process.
Online authorization used a base script with a series of subscripts to
complete pre-screening of the transactions before being sent to the issuer for
authorization, and accessed the Card file in the process.
Institution
One institution record was defined.
File partitions
Card
3.2 million card records defined.
Partition order is card number
Two partitions each with 1.6 million card records.
Usage
Partition order is card number.
Five partitions.
This starts out as an empty file.
Usage records are added at first use and updated after that.
Initialization
Each target region has an initialization program that CICS executes as part of
its PLT process.
Check the following items in each target region:
– TCP/IP task
• Use the CICS supplied CEMT command to verify that the expected
TCP/IP tasks are running.
• Use the NETSTAT GATE REPORT DSN 'data set name' command
from the ISPF command shell to verify that TCP/IP processes are in
the correct state. In our configuration, we issued:
NETSTAT REPORT DSN 'tsorwsa.netstat.cicsibm2’
The output data set ‘TSORWSA.NETSTAT.CICSIBM2’ should show the
port associated with the SPDH server process is listening for client
connections; see Example 7-1.
Example 7-1 Output data set showing port listening for client connections
EZZ2350I MVS TCP/IP NETSTAT CS V1R4 TCPIP Name: TCPIP 16:30:25
EZZ2585I User Id Conn Local Socket Foreign Socket State
EZZ2586I ------- ---- ------------ -------------- -----
EZZ2587I CI31IBM2 0033C4EA 0.0.0.0..7708 0.0.0.0..0 Listen
EZZ2587I CI31IBM2 0033C4E7 0.0.0.0..9829 0.0.0.0..0 Listen
EZZ2587I CI31IBM2 0033C4E9 0.0.0.0..9833 0.0.0.0..0 Listen
EZZ2587I CI31IBM2 0033C4E6 0.0.0.0..9826 0.0.0.0..0 Listen
EZZ2587I CI31IBM2 0033C4E5 0.0.0.0..9823 0.0.0.0..0 Listen
EZZ2587I CI31IBM2 0033D558 172.17.4.153..9823 172.17.1.177..3149 Establsh
EZZ2587I CI31IBM2 0033C4E8 0.0.0.0..9827 0.0.0.0..0 Listen
EZZ2587I CI31IBM3 0033EA4A 0.0.0.0..9850 0.0.0.0..0 Listen
EZZ2587I CI31IBM3 0033EA4B 0.0.0.0..9843 0.0.0.0..0 Listen
EZZ2587I CI31IBM3 0033EA49 0.0.0.0..9853 0.0.0.0..0 Listen
EZZ2587I CI31IBM3 0033EA47 0.0.0.0..9849 0.0.0.0..0 Listen
EZZ2587I CI31IBM3 0033EA48 0.0.0.0..9847 0.0.0.0..0 Listen
EZZ2587I CI31IBM3 0033EA4C 172.17.4.153..2934 172.21.200.37..1500 Establish
Transaction counts
From the workload driver, verify that the number of requests and the number of
responses match.
Approval codes
From the workload driver, verify that the responses have the proper approval
code.
Timeouts
From the workload driver, verify whether any timeouts have been recorded.
Cards
The routing method used for our project requires read access to the Card file for
every transaction. To avoid having the Card file become a bottleneck, we
performed these tasks:
We increased the number of records in the file. As a consequence, we
needed to increase the VSAM file allocated space by redefining the file.
This number will vary, depending on the workload goals. To address this, we
wrote a program to build these records in a sequential file. Then we used IBM
utility programs to load them into the VSAM files.
The card numbers built by this program will need to be kept in sync with those
delivered by the workload driver.
Journals
The BASE24-es Journal file can be configured with up to seven alternate
indexes. One alternate index (AIX®) is required for Online Transaction
Processing (OLTP). The other indexes may be optionally configured to provide
for online perusal of the Journal file according to various criteria.
We used the required AIX and one optional AIX, which is a typical
configuration.
Each financial transaction results in a record being written to the BASE24-es
Journal File. To avoid having this file become a bottleneck, use more than one
journal file.
BASE24-es allows you to configure many Journal files per institution. This
allows you to create many physical files that appear as one logical file.
– Use the BASE24-es User Interface Journal Configuration windows to
define the number of journal files necessary to achieve the workload
goals.
– BASE24-es uses a hashing algorithm to determine to which of the
configured Journal files to write a financial transaction record.
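The effect of the hashing step can be sketched as follows (Python; the modulo scheme and file names are illustrative assumptions, since ACI does not publish the algorithm):

from zlib import crc32

JOURNAL_FILES = [f"JRNL{i:02d}" for i in range(1, 11)]  # ten files, as in our test

def journal_for(key: str) -> str:
    """Spread financial transaction records across the configured files."""
    return JOURNAL_FILES[crc32(key.encode()) % len(JOURNAL_FILES)]

Because the hash distributes records roughly evenly, the set of physical files behaves as one logical journal without any single file becoming a hot spot.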
Usage
The routing method used for our project requires a read and update access to
the Usage file for every transaction. To avoid having this file become a
bottleneck, we performed these tasks:
We increased the number of records in the file. As a consequence, we
needed to increase the VSAM file allocated space by redefining the file.
VSAM define statements are needed for any new files added.
CICSPlex System Manager File Definitions and File in Group entries will be
needed for any new files.
Merchant processing
Merchant processing requires coordination between merchants and devices.
In our project, we identified the number of merchants necessary to meet the
workload goals and the number of terminals each merchant would support.
These numbers will vary depending on the workload goals. We addressed
this by writing a program to build these records in a sequential file. Then we
used IBM utility programs to load them into the VSAM files.
The terminal numbers built by this program will need to be kept in sync with
those delivered by the workload driver.
Card file
Access ranges from zero to two reads on every financial transaction,
depending on the authorization method configured.
BASE24-es concepts
– The BASE24-es File Partitioning ITSO configuration:
• Fields tab
Catalog name is Card
Field partition order number 1
Field name Personal Account Number (PAN)
• Catalog tab
Range one
Low key range value of “9876500000000001”
Assign name root of “CARD_01”
Assign name suffix of “A”
Range two
Low key range value of “9876500001600001”
Assign name root of “CARD_01”
Assign name suffix of “B”
– BASE24-es Metadata configuration CONFCSV file
• Logical Card file
Assign name of “CARD”
Table name of “file_part_catalog”
Physical name of CRDCAT
• Physical Card file 1
Assign name of “CARD_01A”
Table name of “card”
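Under the assumption that a record belongs to the highest range whose low key value is less than or equal to its PAN, the catalog above can be sketched like this (Python, illustrative only):

from bisect import bisect_right

# Restatement of the ITSO catalog: (low key value, assign name) pairs.
RANGES = [
    ("9876500000000001", "CARD_01A"),
    ("9876500001600001", "CARD_01B"),
]
LOW_KEYS = [low for low, _ in RANGES]

def partition_for(pan: str) -> str:
    """Return the assign name of the partition that holds this PAN."""
    i = bisect_right(LOW_KEYS, pan) - 1
    if i < 0:
        raise ValueError("PAN below the first partition low key")
    return RANGES[i][1]

print(partition_for("9876500000800000"))  # -> CARD_01A
print(partition_for("9876500002000000"))  # -> CARD_01B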
Journal file
Each financial transaction results in a record being written to the BASE24-es
Journal file. To avoid having this file become a bottleneck, use more than one
Journal file. Also examine the number of alternate indexes being used.
Each alternate index requires additional CPU to process. One way to
reduce CPU consumption in a production or performance environment is to
remove the alternate indexes that are not required to meet a customer’s business
needs. The BASE24-es Journal file comes with seven alternate indexes:
Account Key
Channel Key
Clerk Key
Clerk Key2
Merchant Key
On-line Key
PAN Key
In our project, we used the On-line Key and the PAN Key.
BASE24-es allows you to configure many Journal files per institution, and it is
possible to view a group of physical files as a logical entity. In our ITSO
environment, we used the BK02 institution for the scenarios.
When the Positive Balance file is used, the BASE24-es File Partitioning
described for the Card file or the Usage file also applies to it.
Usage file
Access is a read and update on every financial transaction, depending on the
authorization method configured.
The BASE24-es File Partitioning ITSO configuration:
– Fields Tab
• Catalog name is Usage
• Field partition order number 1
• Field name ID
– Catalog Tab
• Range one
When configured correctly, the trimming process has little or no effect on the
processing of transactions. When configured incorrectly, however, the trimming
process can cause additional overhead. That is, while the trimming process is
executing, additional CPU will be used and the customer will experience
increased response time. For more information about this topic, refer to the
presentation “Logger/CICS - Performance and Common Problems”, which you
can find at the following site:
http://www.ibm.com/support/techdocs/atsmastr.nsf/PubAllNum/PRS183
You can also use DFH0STAT, SMF type 88 records, and the IBM CICS
Performance Analyzer tools to investigate common CICS log stream problems.
When this value starts to get above 5%, observations have shown an increase in
CPU utilization coupled with an increase in response time.
Our test configuration was designed to represent the highest possible level of
availability attainable at a single production site. All components were replicated
to eliminate any single point of failure. Scenarios were developed to assure that
the OLTP processing system remained available through a wide cross-section of
potential hardware and software failures.
Note: For simplicity, some of the failover scenarios are illustrated by graphics
which show only two LPARs. However, in our configuration we used three LPARs.
Also, this is possible since there are no affinities within the BASE24-es
application. If there were affinity lifetimes of System or Permanent, the workload
would fail.
With the CMAS running on a different LPAR and the Routing Region (TOR)
routing to Target Regions (AOR) on the other LPAR, those Target Regions will
have an additional weight due to the CMAS being unavailable. Workload will still
be routed to the AORs, but CPSM will not be able to monitor, operate, analyze,
or otherwise manage these AORs.
Actual behavior
The workload continued to be processed without interruption.
The LPAR in which the CMAS was failed produced a series of messages in both
routing and target regions, as shown here:
10.53.44 STC03506 +EYUNL0902I CICSET01 LMAS LRT CMAS Requested termination initiated
10.53.49 STC03506 +EYUNL0999I CICSET01 LMAS LRT termination complete
10.53.49 STC03506 +EYUXL0011I CICSET01 LMAS shutdown in progress
10.53.49 STC03506 +EYUCL0005I CICSET01 ESSS Receive Link Task terminated
10.53.49 STC03506 +EYUXL0018I CICSET01 LMAS restart in progress
.
.
.
10.55.14 STC03506 +EYUCL0006I CICSET01 ESSS link to CMASSC30 established
10.55.14 STC03506 +EYUXL0007I CICSET01 LMAS Phase II initialization complete
10.55.14 STC03506 +EYUNL0099I CICSET01 LMAS LRT initialization complete
Solution
The expected behavior was observed.
Actual behavior
At 100 transactions per second (TPS), three routing regions were each
processing one-third of the workload. One of the routing regions was cancelled.
At this point the workload driver recognized one of its client connections was
down, and the workload was reduced by one-third. After the routing region
automatically recovered, the client connection was reestablished and the full
workload was resumed.
The routing region that was purged was restarted with a new started task number
by the Automatic Restart Manager. The routing region was restarted by ARM in
less than one second. The routing regions were fully initialized and the client
connection was established in 30 seconds, at which time the full workload was
resumed. During this period, the workload driver recorded 7 requests as timed
out.
Following are the system log messages showing when the cancel command was
issued and the ARM start command was issued:
0000000 SC30 2006143 10:32:02.27 ACIRWSA 00000290 C CICSET01,ARMRESTART
0000000 SC30 2006143 10:32:02.55 STC04052 00000090 IEF450I CICSET01 CICS -
ABEND=S222 U0000 REASON=00000000
.
.
.
Following is the BASE24-es event message showing the IP client process has
been started:
06-05-2310:32:32TCPSERVNACI.8000000.10000 I1000 ET01T710 0
TCP/IP Main, long term task starting*
Solution
The expected behavior was observed.
Expected behavior
1. Responses to messages in flight in the AOR will be reversed when the send
to the TOR fails.
2. Simulator will reconnect to virtual IP and be routed by Sysplex Distributor to
remaining IP handlers. Message traffic will resume.
3. The TOR will be restarted by CICS Automatic Restart Manager (ARM). ACI
Task Monitor handler will be restarted by first ACI handshake transaction from
an available AOR, and will restart IP handler.
4. If messages on outbound TDQ are still accessible, IP handler will read TDQ
and fail messages back to the IS process. The IS process will generate
reversal transactions. Verify queue empty with CECI and reversal transactions
with journal perusal. Subsequent connections will be distributed by VIPA to all
available TORs.
5. Messages actually being processed by the IP handler at the time it was
cancelled will be lost, as are messages on the outbound TDQ that could not
be recovered at region restart. An out-of-balance condition exists where the
database has been updated but the acquirer has not received the response.
Actual behavior
At 100 transactions per second (TPS), three routing regions were each running,
processing one-third of the workload. One of the routing regions was cancelled.
At this point the workload driver issues a new client connection request and the
connection was established to one of the remaining IP servers. The workload
driver showed no interruption to the workload.
The routing region that was purged was restarted with a new started task number
by the Automatic Restart Manager. The routing region was restarted by ARM in
less than one second. The routing regions were fully initialized, and the IP server
process was available for new client connections in 20 seconds.
Following are system log messages showing when the cancel command was
issued and the ARM start command was issued:
0000000 SC30 2006143 11:10:31.53 ACIRWSA 00000290 C CICSET01,ARMRESTART
0000000 SC30 2006143 11:10:31.78 STC04072 00000090 IEF450I CICSET01 CICS -
ABEND=S222 U0000 REASON=00000000
.
.
.
0000000 SC30 2006143 11:10:31.87 00000290 IXC813I JOBNAME CICSET01,
ELEMENT SYSCICS_CICSET01 349
349 00000290 WAS RESTARTED WITH THE FOLLOWING
START TEXT:
349 00000290 S CICSET01
349 00000290 THE RESTART METHOD USED WAS
DETERMINED BY THE ACTIVE POLICY.
Expected behavior
1. The TOR start of XDYR in the failed AOR will fail. The TORs will detect failure,
remove the failed AOR from their active list, and reroute the message to an
active AOR.
2. AOR will be restarted by CICS ARM.
3. A handshake program in AOR will register with the active TORs, and traffic
will be distributed across all active AORs.
4. Any message actually being processed by the IS processes in the AOR at the
time of the failure will be lost. Any associated changes will be backed out of
the database. No out-of-balance condition exists since the transaction has
been backed out, and the acquirer has not received an approval.
Though the simulator does not support this functionality, a typical real world
interface might generate a reversal on the time-out, then stand in and
generate a notification of the stand-in approval.
5. Messages on the IS inbound TDQ may or may not be lost.
6. Any messages that cannot be recovered from the IS inbound TDQ will be lost.
No out-of-balance condition exists since the transaction has been backed out,
and the acquirer has not received an approval.
Though the simulator does not support this functionality, a typical real world
interface might generate a reversal on the time-out, then stand in and
generate a notification of the stand-in approval.
7. Any messages that can be recovered from the TDQ will be processed. The
database will be updated, and responses will be sent to the acquirer. Most of
these responses will almost certainly be stale.
Though the simulator does not support this, in the real world the acquirer
would generate a late-response reversal to bring things back into balance.
Following are the system log messages showing when the cancel command was
issued and the ARM start command was issued:
0000000 SC30 2006143 11:37:02.18 ACIRWSA 00000290 C CICSEA03,ARMRESTART
0000000 SC30 2006143 11:37:03.21 STC04099 00000090 IEF450I CICSEA03 CICS -
ABEND=S222 U0000 REASON=00000000
.
.
.
0000000 SC30 2006143 11:37:03.32 00000290 IXC813I JOBNAME CICSEA03,
ELEMENT SYSCICS_CICSEA03 200
200 00000290 WAS RESTARTED WITH THE FOLLOWING
START TEXT:
200 00000290 S CICSEA03
200 00000290 THE RESTART METHOD USED WAS
DETERMINED BY THE ACTIVE POLICY.
Following is the BASE24-es Dynamic Routing AOR Status Display showing the
failed AOR EA03 back in alive status:
Expected behavior
1. The ACI IP handlers will get termination notifications.
2. The IP handlers will close their sockets.
3. Any messages on the outbound TDQs will be returned to the originators for
appropriate handling (for example, approved responses on the queue will be
backed out; financial advice will be placed in a store-and-forward file; and so
on).
4. After monitoring its associated TDQ for a few seconds, the IP handler will
terminate.
Actual behavior
Three routing regions were each running, processing one-third of the workload.
In one of the routing regions the IP server process was stopped. At this point the
workload driver issued a new client connection request, and the connection was
established to one of the remaining IP servers. The workload driver showed no
interruption to the workload.
Following are the tasks which were active before the command to stop the IP
process was issued. Task T701 is the BASE24-es IP server process:
Tas(0000026) Tra(CONL) Sus Tas Pri( 255 )
Sta(U ) Use(CICSUSER) Uow(BED98A8FA2977854)
Tas(0000028) Tra(COI0) Sus Tas Pri( 255 )
Sta(U ) Use(CICSUSER) Uow(BED98A911AFB4996) Hty(USERWAIT)
Tas(0000034) Tra(COIE) Sus Tas Pri( 255 )
Sta(U ) Use(CICSUSER) Uow(BED98A9C4A348494)
Tas(0000047) Tra(CSKL) Sus Tas Pri( 255 )
Sta(S ) Use(CICSUSER) Uow(BED98A9E80156856)
Tas(0000065) Tra(T701) Sus Tas Pri( 250 )
Sta(S ) Use(CICSUSER) Uow(BED98AA7CABAC8C4)
The following shows the number of items in the transient data queue that the IP
server process uses when sending responses back to connected clients. The
stop command was issued in CICS system CICSET01.
Notice that the column Number Items is showing zero for CICS System
CICSET01, indicating that messages on the queue were returned to the
application. The other CICS systems have numbers greater than zero under this
column, showing they are still processing.
CMD Queue CICS Enabled Accesses ATI ATI Trigger Number Recovery
--- ID--- System-- Status--- -------- Tran Term Level--- Items-- Status-----
T701 CICSET01 ENABLED 4257 1 0 NOTRECOVABL
T701 CICSET11 ENABLED 5758 1 2 NOTRECOVABL
T701 CICSET21 ENABLED 5161 1 1 NOTRECOVABL
Solution
The expected behavior was observed.
Expected behavior
1. The ACI WLM subsystem is invoked to unregister the AOR with the TORs.
New work will stop flowing from the TORs to the AOR.
2. The transaction will monitor the state of the ACI TDQs in the AOR. When all
queues are empty (which will probably take a couple of seconds or less under
normal circumstances), the transaction will notify the long-lived ACI
transactions to terminate.
Note: The alive status of region EA03 is NO, indicating it is not participating in
financial transaction processing. Also note that the number of requests sent to
region EA03 has halted.
Note: The alive status of EA03 is YES. Also note the number of requests sent
to region EA03, as it is again increasing.
Solution
The expected behavior was observed.
Expected behavior
The expectation is to see the existing AOR regions register with the new TOR as
soon as the region is initialized. Consequently, the workload should be spread
evenly across the three TOR regions.
TOR Scope: ET01 ET11 ____ ____ ____ ____ ____ ____ ____ ____ ____ ____
____ ____ ____ ____ ____ ____ ____ ____
AOR Scope: EA01 EA11 EA21 EA02 EA12 EA22 EA03 EA13 EA23 ____ ____ ____
TOR Scope: ET01 ET11 ET21 ____ ____ ____ ____ ____ ____ ____ ____ ____
____ ____ ____ ____ ____ ____ ____ ____
AOR Scope: EA01 EA11 EA21 EA02 EA12 EA22 EA03 EA13 EA23 ____ ____ ____
Following is the BASE24-es dynamic routing AOR status; this is from the third
routing region that was added. It shows how it has distributed financial
transactions across the active AORs.
Solution
The expected behavior was observed.
Expected behavior
The new AOR should automatically register with the TORs.
Actual behavior
Eight target regions were configured in the BASE24-es dynamic routing
configuration and were active. A ninth target region was added. The new target
region registered with the BASE24-es dynamic routing configuration and
became active to begin processing financial transactions. Workload was
uninterrupted during the process.
TOR Scope: ET01 ET11 ET21 ____ ____ ____ ____ ____ ____ ____ ____ ____
____ ____ ____ ____ ____ ____ ____ ____
AOR Scope: EA01 EA11 EA21 EA02 EA12 EA22 EA03 EA13 ____ ____ ____ ____
____ ____ ____ ____ ____ ____ ____ ____
TOR Scope: ET01 ET11 ET21 ____ ____ ____ ____ ____ ____ ____ ____ ____
____ ____ ____ ____ ____ ____ ____ ____
AOR Scope: EA01 EA11 EA21 EA02 EA12 EA22 EA03 EA13 EA23 ____ ____ ____
____ ____ ____ ____ ____ ____ ____ ____
Note: The AOR column includes region EA23, and transactions are being
successfully routed to it.
Solution
The expected behavior was observed.
Actual behavior
Three LPARs were running, processing an equally distributed workload. In one
of the LPARs, a force SMSVSAM command was issued. SMSVSAM was then
restarted automatically because of the RLSINIT(YES) definition in the IGDSMS
parmlib member.
While SMSVSAM was unavailable, all messages sent to this LPAR for processing
were queued. After SMSVSAM became available, the queued messages were
processed.
While SMSVSAM was unavailable, the IP client process was unavailable, which
was detected by the workload driver, and the workload was reduced by one third.
After SMSVSAM became available, the IP client process connected to the
workload server and the full workload was again processed.
The workload drivers reported that 103 transactions timed out during the time it
took to fully recover from this failure.
The following system log messages show SMSVSAM being closed and
restarting in one second, and the restart ending 10 seconds later. No BASE24-es
event messages were generated.
4020000 SC30 2006144 09:31:31.76 STC04531 00000090 +DFHFC0153 CICSET01 412
Solution
The expected behavior was observed. In order to automatically have SMSVSAM
restart after a failure, we specified the RLSINIT(YES) keyword in the IGDSMSxx
parmlib member.
A simple test driver program was written to retain locks within the Coupling
Facility lock structure. When the structure had only a few entries remaining, we
introduced workload to the system to use the remaining entries as BASE24-es
modified its recoverable RLS files.
Expected behavior
1. Syslog will show the following messages for the AOR:
DFHRM0205 An activity keypoint has been successfully taken.
ACTIVE STRUCTURE
----------------
ALLOCATION TIME: 11/14/2004 16:38:57
CFNAME : P1CF02
COUPLING FACILITY: 009672.IBM.51.000000061147
PARTITION: 0D CPCID: 00
ACTUAL SIZE : 8192 K
STORAGE INCREMENT SIZE: 256 K
ENTRIES: IN-USE: 23597 TOTAL: 23597, 100% FULL
LOCKS: TOTAL: 1048576
PHYSICAL VERSION: BC1EAD96 11947D49
LOGICAL VERSION: BC1EAD96 11947D49
Note: The CFRM POLICY for this lock structure has a maxsize of
8192 K, and the structure has reached this maxsize.
In our scenario, the lock structure was already at the maximum size allowed in
the Coupling Facility Resource Manager (CFRM) policy. In such a case, you
have to perform these additional steps:
a. Define a CFRM policy that has a larger IGWLOCK00 structure size.
b. Activate the new CFRM policy:
SETXCF START,POLICY,TYPE=CFRM,POLNAME=yourpol
c. Alter the structure to the larger size:
SETXCF START,ALTER,STRNAME=IGWLOCK00,SIZE=xxx
d. To reinitiate the backout process, issue the following command:
CEMT SET DSN(your.datasetname) RETRY
If you are already prepared for a larger lock structure through your CFRM
policy, simply issue the rebuild command.
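Assuming the active CFRM policy already allows the larger size, the rebuild
could take the following form (a sketch using the structure name from this
scenario):
SETXCF START,REBUILD,STRNAME=IGWLOCK00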
Actual behavior
Lock structure IGWLOCK00 was preloaded so that it was approximately 95% full.
Then workload was applied to increase the structure to 99% full.
At this point, each financial transaction produced an AFDH abend. Each of these
abends produces a dump, so watch the spool so that it does not become full. An
ALTER command was issued against the structure to give it more space. This
changed the structure from 99% full to 27% full. Financial transactions were then
processed normally, and the AFDH abends no longer occurred.
ACTIVE STRUCTURE
----------------
ALLOCATION TIME: 05/25/2006 14:17:33
CFNAME : CF37
COUPLING FACILITY: 002084.IBM.02.000000026A3A
PARTITION: 1D CPCID: 00
ACTUAL SIZE : 40192 K
STORAGE INCREMENT SIZE: 256 K
ENTRIES: IN-USE: 0 TOTAL: 73739, 0% FULL
LOCKS: TOTAL: 8388608
PHYSICAL VERSION: BEDB1266 5BAEB999
LOGICAL VERSION: BEDB1266 5BAEB999
SYSTEM-MANAGED PROCESS LEVEL: 8
XCF GRPNAME : IXCLO00B
DISPOSITION : KEEP
ACCESS TIME : NOLIMIT
NUMBER OF RECORD DATA LISTS PER CONNECTION: 16
MAX CONNECTIONS: 4
# CONNECTIONS : 3
Example 9-2 System log messages - DFHLOG has not been trimmed
DFHLG0760 05/25/2006 14:29:29 CICSEA01 Log stream SYSPROG.CICSEA01.DFHLOG not trimmed by
keypoint processing. Number of
keypoints since last trim occurred: 6. History point held by transaction: WMIL, task
number: 00299.
DFHRM0205 05/25/2006 14:29:38 CICSEA01 An activity keypoint has been successfully taken.
DFHLG0760 05/25/2006 14:29:38 CICSEA01 Log stream SYSPROG.CICSEA01.DFHLOG not trimmed by
keypoint processing. Number of
Example 9-3 System log - AFDH abend during financial transaction processing - lock structure full
DFHDU0203I 05/25/2006 14:32:55 CICSEA01 A transaction dump was taken for dumpcode: AFDH,
Dumpid: 10/0001.
DFHDU0203I 05/25/2006 14:32:55 CICSEA01 A transaction dump was taken for dumpcode: AFDH,
Dumpid: 10/0007.
DFHDU0203I 05/25/2006 14:32:55 CICSEA01 A transaction dump was taken for dumpcode: AFDH,
Dumpid: 10/0006.
DFHDU0203I 05/25/2006 14:32:55 CICSEA01 A transaction dump was taken for dumpcode: AFDH,
Dumpid: 10/0005.
DFHDU0203I 05/25/2006 14:32:55 CICSEA01 A transaction dump was taken for dumpcode: AFDH,
Dumpid: 10/0004.
DFHDU0203I 05/25/2006 14:32:55 CICSEA01 A transaction dump was taken for dumpcode: AFDH,
Dumpid: 10/0003.
DFHDU0203I 05/25/2006 14:32:55 CICSEA01 A transaction dump was taken for dumpcode: AFDH,
Dumpid: 10/0002.
Example 9-4 CICS log - AFDH abend during financial transaction processing - lock structure full
DFHAC2236 05/25/2006 14:32:55 CICSEA01 Transaction IS01 abend AFDH in program ISLO term ????.
Updates to local
recoverable resources will be backed out.
DFHAC2236 05/25/2006 14:32:55 CICSEA01 Transaction IS01 abend AFDH in program ISLO term ????.
Updates to local
recoverable resources will be backed out.
DFHAC2236 05/25/2006 14:32:55 CICSEA01 Transaction IS01 abend AFDH in program ISLO term ????.
Updates to local
recoverable resources will be backed out.
DFHAC2236 05/25/2006 14:32:55 CICSEA01 Transaction IS01 abend AFDH in program ISLO term ????.
Updates to local
ACTIVE STRUCTURE
----------------
ALLOCATION TIME: 05/25/2006 14:17:33
CFNAME : CF37
COUPLING FACILITY: 002084.IBM.02.000000026A3A
PARTITION: 1D CPCID: 00
ACTUAL SIZE : 40192 K
STORAGE INCREMENT SIZE: 256 K
ENTRIES: IN-USE: 73724 TOTAL: 73739, 99% FULL
LOCKS: TOTAL: 8388608
PHYSICAL VERSION: BEDB1266 5BAEB999
ACTIVE STRUCTURE
----------------
ALLOCATION TIME: 05/25/2006 14:17:33
CFNAME : CF37
COUPLING FACILITY: 002084.IBM.02.000000026A3A
PARTITION: 1D CPCID: 00
ACTUAL SIZE : 86016 K
STORAGE INCREMENT SIZE: 256 K
ENTRIES: IN-USE: 73739 TOTAL: 269207, 27% FULL
LOCKS: TOTAL: 8388608
PHYSICAL VERSION: BEDB1266 5BAEB999
LOGICAL VERSION: BEDB1266 5BAEB999
SYSTEM-MANAGED PROCESS LEVEL: 8
XCF GRPNAME : IXCLO00B
DISPOSITION : KEEP
ACCESS TIME : NOLIMIT
NUMBER OF RECORD DATA LISTS PER CONNECTION: 16
MAX CONNECTIONS: 4
# CONNECTIONS : 3
Solution
The expected behavior was observed. To avoid this issue, use monitoring tools
such as RMF to warn your installation before this threshold becomes critical.
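For an ad hoc check, the current entry usage of the lock structure can also be
displayed from the console; this is a standard display command, not output
captured from our test:
D XCF,STR,STRNAME=IGWLOCK00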
An error will be introduced reading the Journal Profile Group and Journal Profile
Group Assign data sources.
Expected behavior
An error will be generated during CICS region startup. The AOR will not be
functional and transactions routed to it will be declined. The AOR must be taken
out of service by the operator until the problem is resolved, at which time the
AOR can be restarted.
Actual behavior
During initialization, each routing region detected that the BASE24-es Journal
File configuration was not available. This was due to the VSAM files being either
not available or empty. All financial transactions were declined and returned to
the workload driver.
Solution
The expected behavior was observed.
An error will be introduced reading the CONFCSV and MDBCSV data sets, which
are mapped to corresponding Extrapartition TDQs during PLT processing.
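As a sketch of how such a mapping is typically defined, the following CEDA
definition maps an extrapartition TDQ to a DD name; the queue name and group
shown here are hypothetical, not the actual BASE24-es definitions:
CEDA DEFINE TDQUEUE(CONF) GROUP(B24ES) TYPE(EXTRA) DDNAME(CONFCSV)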
Expected behavior
This is a catastrophic error and the application will be unable to generate
operator events or perform any financial processing. The application will abend.
The region will be inoperable until the operator corrects the error and restarts the
CICS region.
Actual behavior
During initialization, each routing region detected that the BASE24-es metadata
configuration file was not available and produced an event message. All financial
transactions were declined and returned to the workload driver. Each financial
transaction attempted to start a BASE24-es Integrated Server, which abended
because it was uninitialized.
Following are the CICS CEEOUT messages indicating the failure during
initialization:
RSTR 20060525191115 SIRSTR: ES Region startup processing initiated.
RSTR 20060525191115 The DAL Bootstrap program for CICS has started.
RSTR 20060525191115 TDQ ACI1 Action = EXEC CICS SET TDQUEUE., Action Status =
failed, RESP = 17, RESP2 = 14. IOERR: An error
RSTR 20060525191115 occurred opening or closing the data set associated with
the queue.
RSTR 20060525191115 The DAL bootstrap program is halting due to errors.
RSTR 20060525191115 Subject: EMTBLD - DAL error: 48
RSTR 20060525191115 [Sub-system: DAL Subject: CICS Action: EXEC CICS READQ TS.
Action Status: failed Error 44: (QIDERR: The
RSTR 20060525191115 queue specified can not be found in auxiliary or main
storage.)]
Solution
The expected behavior was observed.
To avoid a lengthy wait while writing a very large number of messages to fill one
or more JRNL files, we created the failure by modifying the JCL that creates the
JRNL to create artificially small JRNL files. We configured the workload driver to
send VDPS messages to be authorized offline by BASE24-es, and we observed
the behavior as records were added that exceeded the maximum capacity of the
primary JRNL file.
Test configuration
In the initial configuration, transactions flowed from the workload driver. They
were authorized by BASE24-es and logged to the primary JRNL file, and
responses were returned to the workload driver. There are primary and alternate
JRNL files associated with each card issuer configured in the system.
[Figure: Journal file full test configuration. Each LPAR runs an IP client in the TOR and Integrated Servers in the AOR, which log to the primary JRNL file; an alternate JRNL file is also configured.]
Expected behavior
As each Primary JRNL file filled up, an error was returned to each Integrated
Server when it attempted to write to that file. The Integrated Server reports the
log in the Error Log File and in its associated CICS spooler, and the current and
subsequent financial transactions are logged by that Integrated Server to the
configured Alternate JRNL file associated with the failed primary. One event is
logged per BASE24-es Integrated Server per full JRNL file. Refer to Figure 9-2
on page 188 for an illustration of the journal file full expected behavior scenario.
A slightly longer response time may be experienced by the workload driver on the
request that actually encounters the failure; otherwise no impact to the workload
driver is expected.
Recovery occurs naturally in the course of the business cycle, as records are
extracted from the JRNL file and it is deleted and redefined before it is reused
some days later. In the intervening days the operator has the opportunity to
examine the error logs and modify the JCL to redefine a larger JRNL file if
necessary. (Recovery was beyond the scope of this test.)
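Such a redefinition is normally done with IDCAMS. The following is a minimal
sketch; the data set name, key, record size, and space values are all
hypothetical, because the actual JRNL file attributes are installation-specific:
//DEFJRNL  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  /* ALL NAMES AND SIZES BELOW ARE ILLUSTRATIVE ONLY */
  DEFINE CLUSTER (NAME(YOUR.JRNL.FILE) -
         INDEXED -
         KEYS(16 0) -
         RECORDSIZE(200 2000) -
         RECORDS(500000 50000))
/*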
[Figure 9-2: Journal file full expected behavior. Each LPAR runs an IP client in the TOR and Integrated Servers in the AOR; when the primary JRNL file fills, logging fails over to the alternate JRNL file.]
Actual behavior
When the primary Journal File full condition was encountered, the financial
transactions were then written to the Alternate Journal File. An event message
was produced to indicate this action occurred for each file. During this process,
the workload continued uninterrupted.
Following is the BASE24-es event message showing the failure to insert to the
Primary Journal File:
06-05-2517:06:34ISLO 1046 E10 EA03IS 1
INSERTER: Insert failed for Kernel Inserter Class. Error TABLE FULL: The requ
est failed, no more records could be added to the table. Data Source: JLF_BK
02_P00_0 .*
06-05-2517:06:36ISLO 1046 E10 EA13IS 1
INSERTER: Insert failed for Kernel Inserter Class. Error TABLE FULL: The requ
est failed, no more records could be added to the table. Data Source: JLF_BK
02_P05_5 .*
06-05-2517:06:36ISLO 1046 E10 EA13IS 1
INSERTER: Insert failed for Kernel Inserter Class. Error TABLE FULL: The requ
Solution
The expected behavior was observed.
A spontaneous failure was simulated by using CICS CEMT to force purge the IP
Client task.
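A sketch of the CEMT sequence for such a force purge follows; the task number
is hypothetical and would first be identified with an inquire:
CEMT INQUIRE TASK
CEMT SET TASK(0123) FORCEPURGE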
[Figure 9-3: IP client failure test configuration. Each LPAR runs a TOR containing an IP client, a BASE24-es Task Monitor, and a TDQ, routing work to the AORs.]
Test configuration
Initially the workload driver was connected with two BASE24-es IP Clients in two
Terminal Owning Regions. Each IP Client distributed requests across the
available Application Owning Regions. Each AOR routed responses back to the
CICS TDQ in the TOR that originated the request; see Figure 9-3.
Expected behavior
When one BASE24-es IP client is force purged, the connection to the workload
driver is lost. Responses to transactions in flight may be on the TDQ associated
with the IP client, and responses may continue to be placed on the TDQ while the
IP Client remains down.
Within a few seconds, the IP Client will be restarted by the BASE24-es Task
Monitor task. At that time, connections with the workload driver will be
reestablished and new requests will start to flow.
Responses that were on the TDQ at the time of the failure, or that were placed on
the TDQ during the failure, will be passed on to the workload driver. However,
since the workload driver is not a fully functional switch, real-world end-to-end
recoverability will not occur in this test scenario.
Actual behavior
The IP process that was purged was restarted with a new CICS transaction ID by
the BASE24-es monitor process. Recovery took a few seconds (in our case,
seven seconds). During that time, three requests were reported as timed out by
the workload simulator. There were messages on the IP client queue to be
processed when it was purged, as listed here:
IP client failure:
05/23/2006 09:15:03 CICSET01 Transaction T710 abend AEXY in program EZACIC01
term ??? Updates to local recoverable resources will be backed out.
IP client recovery:
06-05-2309:15:10TCPSERVNACI.8000000.10000 I1000 ET01T710 0
Solution
The expected behavior was observed.
A spontaneous failure was simulated by using CICS CEMT to force purge the IP
Server task.
[Figure 9-4: IP server failure test configuration. The workload driver connects through a network router to IP servers in the TORs; each TOR contains a BASE24-es Transaction Monitor and a TDQ, routing work to the AORs.]
Test configuration
Initially the workload driver was connected with two BASE24-es IP Servers in two
Terminal Owning Regions. Connections were distributed across the IP Servers
by a network router. Each IP Server distributed requests across the available
Application Owning Regions. Each AOR routed responses back to the CICS
TDQ in the TOR that originated the request; refer to Figure 9-4.
Expected behavior
At the time of the failure, connections between the workload driver and the IP
Server are lost. The workload driver will reestablish connections with the network
router, which will direct the connections to the remaining IP Server. Workflow will
resume.
Within a few seconds, the IP Server will be restarted by the BASE24-es Task
Monitor task. If restart occurs quickly enough, some of the workload driver's
connections may be assigned by the network router to the restarted IP Server
rather than to the surviving server in the other TOR.
Actual behavior
Three IP servers were running, each processing one-third of the workload. One
of the IP servers was force purged. At this point the workload driver issued a new
client connection request and the connection was established to one of the
remaining IP servers. The workload driver showed no interruption to the
workload.
The IP process that was purged was restarted with a new CICS transaction ID by
the BASE24-es monitor process, and was available to accept new connections
within a few seconds (in our case, eleven seconds).
IP server failure:
05/23/2006 09:43:38 CICSET01 Transaction T701 abend AEXY in program EZACIC01
term ??? Updates to local recoverable resources will be backed out.
IP server recovery:
06-05-2309:43:49TCPSERVNACI.8000000.10000 I1000 ET01T701 0
TCP/IP Main, long term task starting*
Solution
The expected behavior was observed.
Test configuration
At inception, IP clients in the TORs have connected to the workload driver, and
transactions are being distributed across the IS TDQs in the available AORs and
are being authorized. When one IS task is force-purged, the remaining IS tasks
in the AOR continue processing messages from the TDQ. (Sufficient IS tasks are
configured to assure the loss of any one task will not affect the throughput
capability of the AOR.)
Within a few seconds the BASE24-es Task Monitor will restart the failed IS task
and processing will return to normal.
Because the workload driver is not a fully functional switch, end-to-end recovery
will not occur. However, in the real world, end-to-end recovery would assure
financial integrity between the acquiring interface and BASE24-es.
[Figure: Integrated Server failure test configuration. Each LPAR runs a TOR and an AOR; each AOR contains multiple Integrated Server (IS) tasks, their TDQs, and a BASE24-es Task Monitor.]
Expected behavior
The expected behavior was as follows:
1. The workload will be taken up by the remaining IS tasks.
2. The IS task will be restarted by the ACI monitor program.
3. Any message actually being processed by the IS task at the time it was
cancelled will be lost. Any associated database changes will be backed out.
No out-of-balance condition exists, since the transaction has been backed out
and the acquirer has not received an approval.
Though the simulator does not support this functionality, a typical real world
interface might generate a reversal on the time-out, then stand in and
generate a notification of the stand-in approval.
Actual behavior
Workload continued to be processed uninterrupted.
The IS process that was purged was restarted with a new CICS transaction ID by
the BASE24-es monitor process.
IS Failure:
5/22/2006 16:22:09 CICSEA01 Transaction IS01 abend AEXY in program ISLO
term ??? Updates to local recoverable resources will be backed out.
Solution
The expected behavior was observed.
Notes:
If there are multiple transactions associated with the ISLO program, it may
be preferable to use the 'SIQM RESTART ALL' command to restart all the
long-running BASE24-es tasks in the AOR, rather than issuing the 'SIQM
RESTART tran' command for each transaction.
The RESTART command causes a rolling restart of the IS servers and
should have no impact on processing.
Figure 9-6 on page 199 illustrates the Planned application upgrade scenario.
[Figure 9-6: Planned application upgrade test configuration. The workload driver feeds the TORs; each AOR contains multiple Integrated Server (IS) tasks, their TDQs, and a BASE24-es Task Monitor.]
Expected behavior
Setting the ISLO program to PHASEIN specifies that a new copy of the program
is brought into CICS memory. Because the RESTART command causes the IS01
tasks to stop and restart, they are restarted with the new version of the ISLO
program.
The RESTART command causes a rolling restart so that instances of the task
are stopped and restarted sequentially outside of a financial transaction. Some
minor increases in response time will occur as a result of the RESTART
command. No impact to the financial integrity of the system is expected.
Actual behavior
Workload continued to be processed uninterrupted.
Restart commands:
CEMT SET PROG(ISLO) PHASEIN
SIQM RESTART IS01
Following are the event messages notifying that the IS stop/start process was
initiated by the commands:
IS01 20060522163011 SIS MDS: Task stopping, name = %EA02IS010000262
IS01 20060522163016 SIS MDS: Task stopping, name = %EA02IS010000263
IS01 20060522163021 SIS MDS: Task stopping, name = %EA02IS010000264
IS01 20060522163026 SIS MDS: Task stopping, name = %EA02IS010000265
Solution
The expected behavior was observed.
The assumption was that there would be at least two AORs running the customer's
back-end host system, and that BASE24-es would route online authorizations to
those systems via remote LINK. At least two BASE24-es stations will be
configured, each associated in the BASE24-es synchronous destination map file
(SYDMF) with a separate remote program definition in a separate authorizing
region. Three regions running a host simulator are available. The failure will be
injected by canceling one of the back-end AORs. Figure 9-7 on page 203
illustrates this scenario.
Note: We chose to test this scenario using the alternate routing capabilities of
the BASE24-es application, rather than using the equally valid approach of
configuring a CICSplex distributed link.
[Figure 9-7: Remote program link test configuration. Integrated Servers in the BASE24-es AORs route authorizations via CPSM to back-end authorizing regions; store-and-forward (SAF) TDQs are configured in the AORs.]
Expected behavior
BASE24-es would use round robin processing across the available stations. After
the AOR failure, messages to the associated station would experience an error
on the link. BASE24-es would stand in and authorize those messages, adding
financial notifications to the store-and-forward file.
After a configurable number of link errors, BASE24-es would mark the station as
down and continue processing using the remaining stations.
Actual behavior
BASE24-es continued to attempt to route transactions to the cancelled host
region and produced a BASE24-es event message. Alternate routing selected an
active host region. The transaction was then returned to the workload driver as
authorized. Workload continued to be processed uninterrupted throughout this
process.
Solution
The expected behavior was observed.
The Usage file contains one record per card per usage period. The error was
created by an extended high-volume run with random cards.
Test configuration
In the initial configuration, transactions flow from the workload driver. They are
authorized by BASE24-es, logged to the primary JRNL file, and responses are
returned to the workload driver. There are primary and alternate JRNL files
associated with each card issuer configured in the system. Figure 9-8 on
page 205 illustrates the Usage file full scenario.
[Figure 9-8: Usage file full test configuration. Each LPAR runs an IP client in the TOR and Integrated Servers in the AOR, which update the USGD file.]
Expected behavior
When a file full error is encountered on the Usage file, BASE24-es logs an
operator event indicating the error but continues to authorize transactions.
Updates to existing records continue. The fact that BASE24-es is unable to write
a new record to the USGD does not mean the initial transaction it is trying to
record should be retried.
Recovery will occur naturally as usage periods elapse and records are removed
from USGD.
Actual behavior
Workload continued to be processed uninterrupted.
After the Usage file was full, updates to existing Usage file records were
processed normally. Each attempt to add a new record resulted in the following
message:
EERR 20060517190454 Kernel Inserter Class. Error TABLE FULL: The request
failed, no more records could be added to the table.
Solution
If this error is encountered frequently, it may be necessary to take an outage to
resize the system, so it is good practice to assure that the usage file is
adequately sized initially.
9.5 Hardware
This section covers the following hardware failure scenarios:
1. Central Processor (CP) failure
2. CPC failure/LPAR failure
3. Coupling Facility (CF) failure
Because our LPARs are symmetrical, we can perform this test on any LPAR and
the behavior should be similar on all of them. In our case, we used SC31. We
initiate this test by removing a CP using the CF CP(0),OFFLINE command.
Configuration
When we issue the command D M=CPU on SC31, we can see in the response that
the LPAR is running on two processors (CPs). For this reason, taking one
processor offline should not affect the functionality of the application. If this were
the only processor in the LPAR, a failure of the processor would present in a way
similar to an LPAR failure or CPC failure.
IEE174I 13.52.53 DISPLAY M 101
PROCESSOR STATUS
ID CPU SERIAL
00 + 24991E2094
01 + 24991E2094
02 -
03 -
Expected result
We are expecting a message on the console informing us that CP0 is having a
problem, but the application should continue to run without failure. The CPU
sparing feature handles this type of failure in a way that will be transparent to the
application and operations.
Actual behavior
Three LPARs were running, each processing one-third of the workload. Each
LPAR had two online CPUs. In one of the LPARs, CPU(0) was taken offline.
Workload continued uninterrupted. CPU(0) was later brought back online.
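The offline and online actions correspond to the CONFIG command. The following
is a sketch consistent with the IEE504I CPU(0),ONLINE response shown below,
although the commands themselves were not captured in our log excerpts:
CF CPU(0),OFFLINE
CF CPU(0),ONLINE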
Following are the system log messages showing CPUs 00 and 01 online:
RESPONSE=SC31
IEE174I 11.23.52 DISPLAY M 161
PROCESSOR STATUS
ID CPU SERIAL
00 + 24991E2094
01 + 24991E2094
02 -
03 -
Following is the SDSF display showing LPAR is 16% busy across two CPUs:
SDSF DA SC31 SC31 PAG 0 CPU/L 16/ 16 LINE 1-4 (4)
COMMAND INPUT ===> SCROLL ===> CSR
NP JOBNAME ame ProcStep JobID Owner C Pos DP Real Paging SIO CPU%
CICSEA11 A11 CICS STC04571 SYSPROG NS FE 49T 0.00 325.17 3.05
Following is the display after CPU 00 was taken offline:
RESPONSE=SC31
IEE174I 11.24.36 DISPLAY M 177
PROCESSOR STATUS
ID CPU SERIAL
00 -
01 + 24991E2094
02 -
03 -
Following is the SDSF display showing LPAR is 26% busy across one CPU:
SDSF DA SC31 SC31 PAG 0 CPU/L 26/ 26 LINE 1-4 (4)
COMMAND INPUT ===> SCROLL ===> CSR
NP JOBNAME ame ProcStep JobID Owner C Pos DP Real Paging SIO CPU%
CICSEA11 A11 CICS STC04571 SYSPROG NS FE 64T 0.00 438.31 5.74
CICSEA12 A12 CICS STC04572 SYSPROG NS FE 64T 0.00 1116.9 7.26
CICSEA13 A13 CICS STC04570 SYSPROG NS FE 62T 0.00 670.67 5.44
CICSET11 T11 CICS STC04568 SYSPROG NS FE 14T 0.00 116.18 3.02
Following are the system log messages showing CPU 00 being brought back online:
RESPONSE=SC31 IEE504I CPU(0),ONLINE
RESPONSE=SC31 IEE712I CONFIG PROCESSING COMPLETE
RESPONSE=SC31
IEE174I 11.25.20 DISPLAY M 195
PROCESSOR STATUS
ID CPU SERIAL
00 + 24991E2094
01 + 24991E2094
02 -
03 -
Solution
The expected behavior was observed.
Because the behavior of this scenario is similar for either an LPAR or CPC loss,
we will perform the test by failing the LPAR. To initiate the test, we need to access
the HMC of the CPC and deactivate the partition, which was SC32 in our case.
For this test, we are also planning to have the ARM function enabled to
automatically restart the CICS regions on one of the surviving LPARs.
Configuration
The test took place while the workload was spread across the three LPARs:
SC30, SC31, and SC32. One of the LPARs, SC32 in our case, is going to be
deactivated to simulate a failure.
Expected result
As soon as the deactivation task is completed, the icon representing the LPAR on
the HMC should change from green to red. The CICS regions and the
SMSVSAM server will be down, as a consequence.
When CICS terminates, any active locks that remain are converted to retained
locks. Records protected by retained locks cannot be accessed by transactions in
other CICS regions; transactions wait only for active locks, while requests against
retained locks are rejected. For this reason, the CICS regions need to be brought
back on a running system in the sysplex to be able to resolve the potential
retained lock condition.
When the CICS region is brought back, CICS will analyze the logging data,
perform rollback, and release the locks. We expect to see the CICS regions
restarted automatically by ARM.
After the CICS regions (AORs) are brought back and initialized, they can be taken
down again unless they are required for the workload. The only reason for
restarting these regions is to be able to clear potential locks held on RLS records.
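From a restarted region, data sets with failed or shunted units of work (and hence
retained locks) can be checked with a standard CEMT inquiry; this is an
illustrative check, not output captured from our test:
CEMT INQUIRE UOWDSNFAIL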
From the workload driver perspective, when the connection with the TOR is lost,
the driver will reconnect automatically after 5 seconds using the same IP
address. If the network has routing capability, this connection will be routed and
established with the surviving TORs. In this case there will be no loss from the
client’s side.
If the network does not have routing capability, the driver will keep pinging the
same IP address until the TOR is back on the same LPAR.
Actual behavior
Three LPARs were running, each processing one-third of the workload. Then one
LPAR was stopped.
This caused the IP client processes and the CICS mirror transactions on the
surviving LPARs to remain in an IRLINK state for several seconds. During this
time, the workload simulator did not send new workload.
After the LPAR was removed from the sysplex by an active SFM policy, the
workload simulator again sent workload to the connected IP client processes.
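Sysplex membership after the partition removal can be confirmed with a standard
display command; this is an illustrative check, not output captured from our test:
D XCF,SYSPLEX,ALL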
One observed exception was that our workload simulator did not accept
connections from one of the IP client processes. This was resolved by recycling
the workload simulator.
Solution
The expected behavior was observed.
Note: This scenario makes sense only if the installation is planning to have at
least two Coupling Facilities in the configuration. With only one Coupling Facility,
some elements might not be able to recover, because the single Coupling Facility
becomes a single point of failure.
Configuration
Following is the configuration of the Coupling Facilities used for our scenarios:
To initiate the test, we need to access the HMC of the CEC and deactivate the
partition, CF39 in our case.
Expected result
Coupling Facility failure in a CICS RLS system will disable access to all data sets
being accessed in RLS mode and, more importantly, disable all CICS system
logging unless a suitably configured alternate Coupling Facility is available. With
a suitably configured system, the only impact of a Coupling Facility failure should
be the time to recognize the error and rebuild the structures into the alternate
Coupling Facility.
The Coupling Facility failure messages are issued to all systems in the Parallel
Sysplex at the time of the failure. The subsystems that own the structures will
then drive the rebuild of the structures. Within seconds, the structures should
recover successfully into the new Coupling Facility, and, in the case of log stream
data, CICS logging should resume automatically.
Actual behavior
We had two Coupling Facilities active. We halted one of the Coupling Facilities.
The structures that were active in the halted Coupling Facility were rebuilt in the
surviving Coupling Facility. During this time workload continued uninterrupted
and BASE24-es did not log any events.
Note: The following is the output of the D CF command issued against CF37
and CF39 before the failure, showing the CONNECTED SYSTEMS and STRUCTURES
for each CF.
CONNECTED SYSTEMS:
SC30 SC31 SC32
CONNECTED SYSTEMS:
SC30 SC31 SC32
STRUCTURES:
IXC_DEFAULT_2 IXC_DEFAULT_3 LOG_DFHSHUNT_001
RLSCACHE02 SYSIGGCAS_ECS SYSTEM_LOGREC
SYSZWLM_WORKUNIT SYSZWLM_991E2094
Note: The following is the output of the D CF command issued against CF37
and CF39 after the failure, showing the CONNECTED SYSTEMS and STRUCTURES for
each CF.
CFNAME: CF37
COUPLING FACILITY : 002084.IBM.02.000000026A3A
PARTITION: 1D CPCID: 00
SITE : N/A
CONNECTED SYSTEMS:
SC30 SC31 SC32
STRUCTURES:
IGWLOCK00 ISGLOCK ISTGENERIC
ISTMNPS IXC_DEFAULT_1 IXC_DEFAULT_2
IXC_DEFAULT_3 IXC_DEFAULT_4 LOG_DFHLOG_001
LOG_DFHSHUNT_001 RLSCACHE01 RLSCACHE02
SYSIGGCAS_ECS SYSTEM_LOGREC SYSTEM_OPERLOG
SYSZWLM_WORKUNIT SYSZWLM_6A3A2084 SYSZWLM_991E2094
Following is the corresponding post-failure output for CF39:
CFNAME: CF39
COUPLING FACILITY : 002094.IBM.02.00000002991E
PARTITION: 2F CPCID: 00
SITE : N/A
POLICY DUMP SPACE SIZE: 2048 K
ACTUAL DUMP SPACE SIZE: 2048 K
STORAGE INCREMENT SIZE: 256 K
STRUCTURES:
IXC_DEFAULT_2(PND) IXC_DEFAULT_3(PND) LOG_DFHSHUNT_001(PND)
RLSCACHE02(PND) SYSIGGCAS_ECS(PND) SYSTEM_LOGREC(PND)
SYSZWLM_WORKUNIT(PND) SYSZWLM_991E2094(PND)
Solution
The expected behavior was observed.
To initiate the test, we will perform an orderly shutdown of all the subsystems
running on one of the z/OS images (for example, SC32), by following a normal
shutdown procedure. After the subsystems have been closed, we can deactivate
the LPAR from the HMC (optional).
Configuration
The test will take place while the workload is running across the three systems:
SC30, SC31, and SC32. One system, SC32 in our case, is going to be shut down
to simulate a normal maintenance procedure.
Expected result
There should be no impact to end users.
When the connection with the TOR is closed, the workload simulator will
reconnect automatically after 5 seconds, using the same IP address. If the
network has routing capability, this connection will be routed and established with
the surviving TORs. In this case there will be no loss from the client’s side.
If the network does not have routing capability, the driver will keep pinging the
same IP address until the TOR is back on the same LPAR.
Actual behavior
The procedure to maintain a z/OS system has been tested within IBM and
documented in multiple sources. We do not consider this scenario to be specific
to the BASE24-es application, and it is implicitly covered in 9.5.1, “Central
Processor (CP) failure” on page 206.
When such a situation occurs, you must remove any structures contained in the
Coupling Facility before it is shut down and removed from the sysplex. We will
demonstrate that it is possible to move all the structures from one Coupling
Facility to the alternate one without interrupting the workload.
This scenario makes sense only if the installation is planning to have at least two
Coupling Facilities in the configuration. With one Coupling Facility, some of the
subsystems will not be able to operate.
Configuration
Following is the configuration of the Coupling Facilities used for our scenarios:
CF37 contains the following structures:
IGWLOCK00 ISGLOCK ISTGENERIC
ISTMNPS IXC_DEFAULT_1 IXC_DEFAULT_4
LOG_DFHLOG_001 RLSCACHE01 SYSTEM_OPERLOG
SYSZWLM_6A3A2084
Since the structures are equally distributed among the CFs, the procedure to
apply maintenance would be similar for either one of the CFs. In our scenario, we
use CF37.
5. At this point, the structures resident on CF37 need to be moved to CF39. All
of the structures currently allocated in CF37 support the rebuild function.
Rebuilding a structure allows the application to remain active during the
reconfiguration.
Rebuilding a structure is the preferred way to move a structure from a
Coupling Facility. If you are unsure of whether a structure supports rebuild,
you can attempt a rebuild of the structure. If rebuild is not supported, the
system will indicate that the rebuild operation cannot occur, and you must use
an alternative procedure for moving the structure.
The following commands will remove the structures from CF37:
SETXCF START,REBUILD,CFNAME=CF37,LOCATION=OTHER
Note: The XCF signalling structures must be rebuilt one at a time. After all
structures have been removed from CF37, it can be removed from the
configuration for maintenance.
Configure all CHPIDs offline to the Coupling Facility that you are removing.
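For example, the channel paths can be configured offline with the CONFIG
command; the CHPID number here is purely illustrative:
CF CHP(A0),OFFLINE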
6. Power off the Coupling Facility. In our case, we will simply deactivate the CF
LPAR. When the Coupling Facility is deactivated, any remaining structure
data is lost.
Expected result
We expect to see the rebuild messages for the structures allocated in CF37. After
the moves are completed, there should be no structures left in CF37. We expect
to see no variation, from a workload point of view. Within seconds, the structures
should be successfully moved into the new Coupling Facility, and in the case of
log stream data, CICS logging should resume automatically.
Actual behavior
The maintenance procedure for a Coupling Facility has been tested within IBM
and documented in multiple sources, for example, in Setting up a Sysplex. We do
not consider this scenario to be specific to the BASE24-es application and it is
implicitly covered in 9.5.3, “Coupling Facility (CF) failure” on page 211.
In this environment, continuous availability from end to end is not just a server
requirement, but a requirement for all components in the IT complex. The
weakest link determines availability as perceived by customers.
Maintaining availability for ATM and POS networks has been characterized by
predictable workloads for many years. Application changes have been few and
the business model has been stable. A stable environment also requires
comparatively few personnel to support it and only limited changes in their skill
sets.
In-house developed solutions account for more than half the systems in the total
payments environment. These legacy applications, often of older design, are
written in low-level programming languages that require specialist staff to
maintain. The design often consists of inflexible software that cannot be easily
modified to keep pace with market changes.
The BASE24-es program code stream is by and large generic for all platforms, so
that the differences in the implementations on these platforms can be attributed
largely to the Quality of Service (QoS) of the platforms themselves and their
underlying basic software infrastructure.
10.2.1 Conclusions
Our BASE24-es installation on z/OS and the tests performed in the scenarios
clearly demonstrate the ability to achieve the highest level of availability.
Minimum configuration
In this section we list a minimum setup for high availability of BASE24-es on
z/OS:
Two LPARs distributed in a Parallel Sysplex cluster, one LPAR per CPC. Be
sure your configuration has at least two Coupling Facilities.
Glossary
Central Processor (CP). The part of the computer that contains the sequencing and processing facilities for instruction execution, initial program load, and other machine operations.
CFRM policy. The allocation rules for a Coupling Facility structure that are declared by a z/OS administrator.
Coupling Facility (CF). A special LPAR that provides high-speed caching, list processing, and locking functions in Parallel Sysplex.
Static Destination Map File (SDMF). A configuration file used by the BASE24-es System Interface Services Message Delivery Service to define attributes of various endpoints.
Synchronous request. A request that is issued followed by an immediate wait for response. Generally acceptable only for lower-latency, highly-reliable servers.
Transient Data Queue (TDQ). A common
abbreviation for the CICS management module and
its associated functions for intra-partition and
extra-partition data queues.
Related publications
The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this redbook.
IBM Redbooks
For information about ordering these publications, see “How to get IBM
Redbooks” on page 227. Note that some of the documents referenced here may
be available in softcopy only.
IBM System z9 and zSeries Connectivity Handbook, SG24-5444
GDPS Family - An Introduction to Concepts and Capabilities, SG24-6374
Other publications
These publications are also relevant as further information sources:
DTR BASE24-es IBM CICSplex Installation Guide
Setting up a Sysplex, SA22-7625
Online resources
These Web sites and URLs are also relevant as further information sources:
z/OS UNIX ported tools
http://www.ibm.com/servers/eserver/zseries/zos/unix/bpxalty1.html
Back cover
A Guide to Using ACI Worldwide’s BASE24-es on z/OS
Set up and use BASE24-es on z/OS
High availability scenarios
Configure for highest availability
In this IBM Redbook we explain how to use the ACI BASE24-es product on z/OS. BASE24-es is a payment engine utilized by the financial payments industry. Failure in a financial payments environment is a high-visibility customer service issue, and outages at any level have debilitating effects on customer loyalty. The entire payments cycle must be conducted in near real-time. In such an environment, high availability is a mission-critical requirement. We demonstrate how you can achieve a high availability configuration for BASE24-es on z/OS. We begin by outlining the requirements of a payments system, and then introduce the structure and functionality offered by the BASE24-es product. We describe the strengths and abilities of System z and z/OS, and explain the technical and physical architecture of BASE24-es on z/OS. We guide you in designing a system layout and in installing, tailoring, and configuring your workload on z/OS. Finally, we detail the numerous failure scenarios that we tested in order to verify the robustness of the solution. These scenarios were carefully selected in areas such as data environment, CICS/CICSplex, the BASE24-es application, and hardware. Communication was handled by ATM and POS device simulators and a Visa network simulator. Note that the information in this redbook is specific to BASE24-es release 06.2 and is subject to change in subsequent releases of the product.
IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.
For more information:
ibm.com/redbooks