
High Avail Netweaver

This document discusses strategies for ensuring a successful, cost-effective high availability implementation of SAP NetWeaver systems. It covers high availability basics, technical details of an HA setup with SAP NetWeaver including architectural single points of failure and how to protect them. The document provides recommendations to help readers formulate their own high availability strategy.


Architecting a high availability SAP NetWeaver infrastructure
Strategies for ensuring a successful, cost-effective implementation
by Matt Kangas

This article originally appeared in the March/April 2007 issue of SAP Professional Journal and appears
here with the permission of the publisher, Wellesley Information Services. For information about SAP
Professional Journal and other WIS publications, visit www.WISpubs.com.

In today’s business world, automation and computing have become key differentiators that can increase process efficiency and productivity. This is especially the case with the rise of the Internet, wireless networking, radio frequency identification (RFID), and Web services. With increased automation levels comes the increased need for reliable availability of the systems that support these business processes, like SAP NetWeaver.

Providing high availability (HA) for an enterprise service-oriented architecture (enterprise SOA) like SAP NetWeaver is a challenge, however: typical setups include multiple integrated SAP systems that are required around the clock for continuous, one-step business scenarios. This article will help you assess the SAP NetWeaver architecture, its configuration, the procedures involved, and the implications of these elements for system availability; it offers recommendations for formulating an HA strategy for your own SAP NetWeaver-based systems.

In the first part of the article, I present a summary of HA basics to help project planners and managers understand the strategic importance and role of HA within the SAP system landscape. In the second part, I discuss the technical details of an HA setup with SAP NetWeaver, including topics such as architectural single points of failure (SPOFs) and ways to isolate and protect such failure points. Armed with the understanding of HA provided by this article, both system architects and administrators will be able to implement their own HA setups with confidence.

Matt Kangas
Product Manager, SAP NetWeaver
SAP Labs, LLC

Matt Kangas joined SAP Labs in 2003, where he has worked as an SAP NetWeaver Product Manager specializing in systems topics. His areas of coverage include software lifecycle management in both ABAP and Java, IT landscape and architecture, installations and upgrades, system management, high availability, platforms, and the Internet Transaction Server. Prior to working for SAP Labs, he was a systems consultant for SAP America for more than five years. Matt has a B.A. in economics from the University of California at Berkeley. You may reach him at [email protected].

Note!
This article applies to SAP NetWeaver ’04 and SAP NetWeaver 2004s, and applications based upon these releases, such as the mySAP Business Suite.

No portion of this publication may be reproduced without written consent. 31


SAP Professional Journal • March/April 2007

Availability description   Availability in %*   Downtime per year
Conventional               99.0                 3.7 days
Highly reliable            99.9                 8.8 hours
Highly available           99.99                52.6 minutes
Fault resilient            99.999               5.3 minutes
Fault tolerant             99.9999              32 seconds
Disaster tolerant          99.99999             3 seconds
* Percentage of system uptime during a given time period

Figure 1 System availability measurements (source: Harvard Research Group)

The fundamentals of HA

In the next sections, I briefly take you through fundamental HA concepts, including what constitutes HA and the causes of system downtime, the tradeoffs involved in increasing availability, and who is responsible for creating and maintaining an HA system. A good understanding of the considerations involved in each of these areas will provide you with a solid foundation for devising your own HA strategy.

What is HA?

HA addresses the critical business need to avoid unplanned downtime. Unplanned downtime occurs when an application crashes unexpectedly, that is, when a “failure” occurs. In many cases, the failure is caused by a hardware fault. Other frequent causes include a crash of the host operating system and human error. There are also situations that require a restart or temporary downtime of the application itself, such as updates, security fixes, patches, or tests. These situations are referred to as planned downtime. Since planned downtime by definition is expected, most HA efforts are centered on avoiding unplanned downtime, which is riskier: an unplanned downtime of an SAP system during working hours, for example, would probably have more of an impact on business operations than a planned downtime on Sunday. However, reducing planned downtime should be a part of any complete HA strategy.

In Service Level Agreements (SLAs) created by SAP hosting providers and computing centers for their customers and users,1 system availability is usually measured in percentage of uptime per year, as shown in Figure 1. In general, these SLAs define the term “high availability” as a system availability rate of at least 99.99% (52.6 minutes of downtime per year).

Note!
Figure 1 also helps to illustrate the difference between perception and reality, sometimes called “the myth of the nines.” Of course “Fault tolerant” and “Disaster tolerant” availability would be better than “Fault resilient”; so would 100% availability. But the reality is that those availability levels are not currently achievable, or are achievable only at an extremely unreasonable cost. And what could you even accomplish in any planned downtime at those availability levels? Can you perform maintenance on a system in 32 seconds of downtime per year, or 0.6 seconds of downtime per week? Such levels could be possible if a system were completely static, but the dynamic nature of business requires system changes, which in turn require some amount of downtime, restarts, and so on.

1 For more on SLAs, see the SAP Professional Journal article, “Defining Service Level Agreements: An IT Manager’s Survival Guide” (November/December 2000).
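The downtime column in Figure 1 is simple arithmetic: the unavailable fraction (100% minus the availability) multiplied by the length of the period. The short Python sketch below reproduces it, assuming a 365-day year (leap years ignored); the function and constant names are illustrative only. It also shows where the per-week figures used in this article come from: 99.9999% allows about 0.6 seconds of downtime per week, and 99% allows roughly 1.7 hours per week.

```python
# Convert an availability percentage into the downtime it permits per period,
# reproducing the arithmetic behind Figure 1.
# Assumption: a 365-day year; leap years are ignored.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def allowed_downtime(availability_percent: float, period_seconds: int) -> float:
    """Return the downtime, in seconds, permitted over one period."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_seconds


def describe(seconds: float) -> str:
    """Format a duration in the largest convenient unit, as Figure 1 does."""
    if seconds >= 86_400:
        return f"{seconds / 86_400:.1f} days"
    if seconds >= 3_600:
        return f"{seconds / 3_600:.1f} hours"
    if seconds >= 60:
        return f"{seconds / 60:.1f} minutes"
    return f"{seconds:.0f} seconds"


if __name__ == "__main__":
    for level in [99.0, 99.9, 99.99, 99.999, 99.9999, 99.99999]:
        per_year = allowed_downtime(level, SECONDS_PER_YEAR)
        per_week = allowed_downtime(level, SECONDS_PER_WEEK)
        print(f"{level:<9} {describe(per_year):>13} per year, "
              f"{per_week:.1f} s per week")
```

Keeping the period as a parameter makes it easy to evaluate an SLA stated per month or per week instead of per year.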

32 www.SAPpro.com ©2007 SAP Professional Journal. All rights reserved.


Architecting a high availability SAP NetWeaver infrastructure

Note!
Even though the numbers in Figure 1 may look pretty impressive, they are valid only for unplanned downtime. The calculations do not include any planned downtime estimates for offline backups, upgrades, updates, security fixes to applications and operating systems, and so on. Because these activities are planned by their very nature, the business can anticipate them and take actions to mitigate their impact; HA measurements, on the other hand, address events that occur unexpectedly and without planning. As previously mentioned, in a real-world scenario you will need to consider planned downtime as well.

Operator errors: 40%
Application failures: 40%
Hardware and operating system failures, disasters: 20%
Figure 2 Causes of unplanned downtime (source: Gartner Group)

A “Fault resilient” availability of 99.999% (also known as “the five nines”) is currently accepted in the industry as the best possible availability.

To achieve a scalable and flexible system landscape, professional application servers are usually installed in a software cluster, which means that there are various instances of the same service, available on different physical machines. These physical instances are usually reached via a load balancer that dispatches requests from the clients. In addition to dispatching the workload, this concept also provides a redundant infrastructure that enables resistance to most hardware and software faults.

Despite this redundancy, in many system landscapes there are still services (e.g., the load balancer, the central database, and so on) that are unique within the cluster environment and vital for the cluster operation. If one of these services crashes, the whole cluster might stop working. These central, non-redundant services are referred to as potential SPOFs. An SPOF is therefore defined as “a hardware or software service that, if failing, will cause the entire system to fail, leading to unplanned downtime.”

From a technical infrastructure point of view, architectural and technical SPOFs must be identified and secured in an appropriate manner to help ensure system availability. It is often advisable to use additional hardware and fault-tolerant software so the system can continue operating, uninterrupted, if any single component fails. For example, disk drives could be a hardware SPOF; disk mirroring reduces the likelihood of failure and can effectively eliminate disk drives as an SPOF.

Figure 2 shows the major causes of unplanned downtime. Hardware failures, operating system failures, environment failures, and disaster effects can be avoided by eliminating the SPOFs within systems and implementing disaster recovery scenarios to minimize the impact of possible disasters on the system infrastructure.2 “Human errors” (e.g., errors caused by faulty configuration, bad change control, etc., as opposed to errors caused by a failed hardware component, an earthquake, etc.) make up 80% of downtime causes, though,3 and these cannot be solved through redundancy or techniques involving switching to alternate resources. Human errors need to be addressed through ease of system management and with improved change and problem management processes. A good source of more information on useful system management software is available at http://service.sap.com/ha.4

2 For more on disaster recovery, see the SAP Professional Journal articles, “Is Your R/3 System Recovery Plan a Disaster? A Three-Step Approach for Designing Recovery and Availability Plans” (September/October 2001) and “The 15 Most Overlooked Items in Planning for High Availability and Disaster Recovery” (July/August 2002).
3 See www.gartner.com/webletter/ibmglobal/edition2/article5/article5.html.
4 From the Media Library, navigate to Documentation → HA Documentation, select the document “BC SAP High Availability NW 2004s,” and go to the section “System Automation Software for the SAP Environment” that begins on page 217.

As mentioned earlier, unplanned downtime poses more of a risk to your technical infrastructure than planned downtime due to its unpredictability, but that doesn’t mean that you can ignore the effects of planned downtime. In the end, HA is measured from the perspective of the end user. If a system is running, but a user cannot access the system, then the system cannot be considered available. In an Internet sales model, for example, the lack of availability can be critical, since the end user may go to a competitor to complete the transaction. For this reason, it is important to also reduce planned downtimes for tasks such as system and/or infrastructure maintenance, patching, upgrading, and so on through strategies that involve scalable components that enable rolling maintenance, efficient upgrade and patching processes, and proven software lifecycle management engines such as the SAP Transport Management System (TMS) for ABAP and the Change Management Service (CMS) for Java. I go into more detail on some of the features provided by SAP for minimizing planned downtime later in the article.

Understanding the end user perspective of availability will also help with setting and meeting appropriate SLA requirements, as well as with making informed decisions when weighing the costs of increasing availability, which I discuss next.

When do the costs of increasing availability outweigh the benefits?

The main ingredient for successfully increasing availability in any kind of technical system is to provide redundancy. If one component or subsystem fails, at least one other component or subsystem must be available to take over for the failed component. The process of switching from a failed component or subsystem to its redundant replacement is called “failover.”5

When designing a highly available system, however, you must consider the tradeoff between the costs of increasing your system availability and the costs of system downtime (see Figure 3). As you can see on the left side of the figure, the costs of downtime are not linear with respect to the duration of the downtime: with longer downtimes, the increase in costs is closer to exponential. For example, when supply chain processes are stuck for longer than three hours, the entire production process may become stuck, which then produces even higher downtime costs. Conversely, as shown on the right side of the figure, the measures taken to increase system availability (such as redundant components, a disaster recovery site, first-class system management tools, a skilled IT staff, system capacity planning, and proactive services) cost progressively more money to implement.

Because of this HA cost tradeoff, a business case must be made for the availability level of a system. Businesses need to determine their realistic needs for availability: measure the cost of system downtime, and then balance the desired availability level against the cost required to provide it. For example, development, sandbox, training, and production systems will all have different availability needs; a reporting system can tolerate more downtime than an operational system. There may well be cases where 99% availability (an outage of less than two hours per week) is good enough.

5 In this article, I differentiate between “failover” (the process of switching to an alternate system in the case of an unplanned outage caused by some failing system element) and “switchover” (the process of switching to an alternate system for a reason other than failure).

Who is responsible for ensuring system availability?

Creating highly available SAP NetWeaver systems is a joint responsibility of SAP, the platform/solution partner, and the customer.

SAP’s responsibility is to provide an HA-capable integration and application platform (i.e., SAP
NetWeaver) and application scenarios that run on top of SAP NetWeaver. SAP systems must be able to utilize the HA-capable computing infrastructure you have in place.6 SAP also works with partners regarding platform-specific procedures, and in the case of Microsoft Cluster Server (MSCS), provides specific HA procedures and documentation.

HA-capable computing infrastructure elements (hardware, operating system, database system, file system, etc.) are provided by SAP platform/solution partners, who must also provide their platform-specific HA procedures. This includes the HA configuration and switchover ability, and support of the implementation at the customer site.

Customers must define the suitable and required HA levels for their systems. They must also provide the proper IT management concepts and guidelines, and ensure the appropriate operating procedures and training of their IT staff.

6 The computing infrastructure, cluster software, etc., are provided by third-party partners.

[Figure 3 graphic: downtime costs plotted against the duration of downtime (left); availability costs plotted against availability levels of 95%, 98%, 99.5%, and 99.9% (right)]
Figure 3 Availability costs vs. system downtime costs (source: Gartner Group)

Now that you understand the fundamental principles and considerations involved with HA systems, let’s take a look at the possible points of failure that can exist in your systems and the different methods you can use to minimize downtime in your system environment, both unplanned and planned, and to improve availability.

Keys to minimizing system downtime and improving availability

Understanding the preventive options available to you for protecting common failure points, avoiding unplanned downtime, and reducing planned downtime will help set you up for success when assessing and planning your own HA implementation. In the next sections, I take you through these options and point out common strategies, recommendations, and best practices for a successful HA implementation.

Protecting common failure points

A system landscape consists of different components and infrastructures, each of which must be checked for potential SPOFs and protected with appropriate measures (which may rely on partner technology; more on this later when I discuss switchover solutions). Common points within the system landscape requiring protection and ways to protect them include the following:

• Network: redundant network components and topology; redundant provider links; protected network services (DNS, Mail, LDAP, etc.)
• Storage: RAID technologies (protecting disk availability through redundancy); SAN storage networks with some HA features (split mirrors, synchronous write)
• Server hardware: redundant components (power supplies, buses, coolers, boards); hot-pluggable components; ECC memory; server domains; manageability (remote management, automatic restart, etc.)
• Server operating system (hostnames, IP addresses, file services, applications, name servers, Windows domain controllers): cluster (several computers coupled to work together as one computer, with shared resources varying from none to all); when one computer fails, another takes over its resources and provides them transparently to the outside world (see the “Cluster types” sidebar for more on the different types of clusters)
• Database (vendor-specific): cluster, replication, shadow database, multi-runtime, etc.

Cluster types
In the technology world, the word “cluster” has different meanings. There are at least three kinds of clusters:
• Hardware: A hardware cluster provides HA by using redundant hardware for every piece of equipment that may fail. At the same time, additional hardware options are needed to control running units and to detect if such hardware has failed. Hardware clustering is the only way to protect against hardware SPOFs.
• Software: In this scenario, software runs in a distributed environment, mainly to provide scalability to the overall system. At the same time, this means of deployment implies a certain kind of HA, since failing parts may be replaced through the distributed nature of the system. Unfortunately, this technique can become quite complicated, due to system requirements for maintaining the integrity of distributed data.
• Database: Database vendors are using the same principles to provide their products with HA. One way is to run a database in a clustered environment. This means that the database is running on more than one machine, and in case of failure a failover will occur. There are different technologies available to achieve this, such as Microsoft Cluster Server (MSCS) and Oracle RAC.

Avoiding unplanned downtime

As stated previously, SAP shoulders the responsibility for leveraging an HA-capable computing infrastructure to protect against unplanned downtime. SAP NetWeaver achieves this using four strategies:

• You can install SAP NetWeaver into an HA environment that includes switchover software7 or clustered resources.
• You can operate SAP NetWeaver in an HA environment; that is, all SPOFs can be protected (either by SAP functionality and/or partner features) by redundant components and/or switchover techniques.
• Software maintenance tools can operate in an HA environment such as a switchover or cluster environment.
• You can operate SAP scenarios in redundant disaster recovery sites.

7 Software that automatically starts a failed service on a different physical host machine. Switchover software reduces unplanned downtime by enabling rapid resumption of a failed service on a substitute host. SAP system services that can be susceptible to failure because they cannot be configured on multiple hosts (such as the DBMS or enqueue and message services) can benefit from the extra resilience of switchover software. The substitute host must be sufficiently powerful to support the additional workload following switchover.

Reducing planned downtime

SAP also provides features that help minimize planned downtime, including:

• Kernel upgrades: Enable one-by-one “rolling kernel upgrades” between (SAP-defined) compatible kernels on application servers to eliminate full system shutdown8
• Profile parameter changes: Make changes to SAP configuration profile parameters online without restart (for a list of parameters that can be changed online, see SAP Note 102428)
• Operation mode changes: Dynamically change the work process type to adjust to different work profiles (day vs. night, dialog vs. batch)
• Object imports or transports: For standard SAP software maintenance activities (e.g., transports), provide predictability and planning (for system outage) through optimization of software maintenance tools, such as ABAP Transport Management System (TMS) or Java Change Management Service (CMS)
• Applying support packages: Perform a shadow import of support packages; that is, perform a parallel import of inactive new repository objects into the database, then activate them for the runtime environment through a “switch” procedure. The advantage of this technique is that you can import a large portion of the objects of a support package (reports) into the running system without changing the system’s runtime behavior, thereby reducing the subsequent downtime for the remaining import. Especially with extensive or multiple support packages, this reduces the system downtime significantly (see SAP Note 361735).
• Release upgrades: Reduce downtime for release upgrades through a “system switch” upgrade, where many of the upgrade activities (such as imports and activation) are performed in a shadow system in parallel to the running production system, reducing downtime of the production system.9 To further reduce downtime, SAP also offers the Customer-Based Upgrade (CBU), where customer-specific post-upgrade activities, such as modification adjustments, add-on installations, etc., are performed on a copy of the production system. A custom upgrade package that includes these changes is then created on this system and used for the production system upgrade. The overall user downtime for the upgrade of the production system is thereby reduced, since these activities no longer need to be performed post-upgrade.
• End of daylight saving time: Stretch time during the “double hour” that occurs when clocks are set back (kernel release 6.40 and higher; see SAP Note 7417)
• Database reorganization: Perform table/index defragmentation, etc., to improve performance10
• Offline backup: Enable offline backup, with or without split-mirror technology. Performing a backup on a split mirror has the advantage of less downtime, because the system may be started up on the single disk array while the offline backup is performed on the mirror, but the disadvantage is that the system is not mirrored during the backup (unless it is a triple mirror, that is!).
• Online backup: All SAP products allow consistent online backup.

8 This capability is currently in testing and has not yet been released.
9 For more on the system switch upgrade method, see the article “A Basis Administrator’s Step-by-Step Guide to Preparing for an Upgrade to SAP R/3 Enterprise” (SAP Professional Journal, July/August 2004).
10 For more on database reorganizations, see the article “Boost SAP R/3 Performance by Reorganizing Your Oracle Database: A Proven Reorganization Strategy” (SAP Professional Journal, July/August 2005).
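Footnote 7 describes switchover software as software that automatically starts a failed service on a different physical host, and it warns that the substitute host must be powerful enough for the extra workload. The Python sketch below models only that placement decision under stated assumptions: hosts have an abstract capacity, each service has an abstract load, and a host has already been marked as failed. The names Host, switch_over, and the service labels are hypothetical. Real switchover products also handle failure detection, virtual hostnames and IP addresses, and storage takeover, none of which is shown here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Host:
    """A physical machine in a hypothetical switchover cluster."""
    name: str
    capacity: int                      # abstract workload units the host can carry
    alive: bool = True
    services: dict[str, int] = field(default_factory=dict)  # service -> load

    @property
    def spare(self) -> int:
        """Capacity still free after the services already running here."""
        return self.capacity - sum(self.services.values())


def switch_over(hosts: list[Host]) -> list[str]:
    """Restart every service of a failed host on a surviving host.

    Returns a log of the moves. Raises RuntimeError when no surviving host
    has enough spare capacity for a service, mirroring the requirement that
    the substitute host support the additional workload.
    """
    log = []
    survivors = [h for h in hosts if h.alive]
    for failed in (h for h in hosts if not h.alive):
        # Place the heaviest services first, each on the roomiest survivor.
        for service, load in sorted(failed.services.items(),
                                    key=lambda item: item[1], reverse=True):
            candidates = [h for h in survivors if h.spare >= load]
            if not candidates:
                raise RuntimeError(f"no substitute host can take {service!r}")
            target = max(candidates, key=lambda h: h.spare)
            target.services[service] = load   # "start" the service on the target
            log.append(f"{service}: {failed.name} -> {target.name}")
        failed.services.clear()
    return log


if __name__ == "__main__":
    primary = Host("hostA", capacity=10, services={"enqueue": 3, "message": 2})
    standby = Host("hostB", capacity=8)
    primary.alive = False                 # simulate a hardware fault on hostA
    for line in switch_over([primary, standby]):
        print(line)
```

Raising an error instead of overloading the standby host reflects the footnote’s point: a substitute host that is too small only trades one outage for another.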


With the keys to minimizing system downtime in hand, we’re ready to take a test drive through implementing an HA landscape in preparation for devising the HA strategy best suited to your own environment.

Implementing an HA landscape

To implement an HA landscape with SAP NetWeaver, it is best to take a stepwise approach to ensure that you choose the most appropriate HA setup and achieve an optimal balance between cost and availability. The sidebar “Recommended HA design patterns” provides an overview of design patterns to keep in mind during the HA implementation process. Over the remainder of this article, I walk you through the following five HA implementation steps:

1. Understand the architecture.
2. Find the SPOFs that you need to isolate.
3. Understand ways to isolate the SPOFs.
4. Choose a setup that secures the SPOFs.
5. Implement the system landscape.

Note!
Before implementing an HA strategy at the software level with SAP NetWeaver, you must have an HA hardware infrastructure already in place. In my experience, some customers think that setting up their SAP software in a highly available manner is sufficient; this is not the case! You must consider your whole environment, including switches, routers, name servers or Windows domain controllers, and so on. In reality, if your hardware fails, it doesn’t matter how well you set up your software, so be sure to collect a list of potential infrastructure hotspots that you should protect at all costs and address them as needed.

1. Understand the architecture

The first, and most important, step in implementing an HA landscape is to understand the SAP NetWeaver system architecture, so you can choose the best setup to meet your particular business needs.

There are three installation options you can choose among for SAP NetWeaver systems:

• SAP NetWeaver AS ABAP: SAP NetWeaver Application Server11 with only the ABAP runtime installed
• SAP NetWeaver AS Java: SAP NetWeaver Application Server with only the Java runtime installed
• SAP NetWeaver AS ABAP+Java: SAP NetWeaver Application Server with both the ABAP and Java runtimes installed

Note!
SAP NetWeaver AS ABAP systems are based on proven technology from the ABAP-based SAP R/3 world with which most customers are familiar.

The SAP NetWeaver usage type drives the installation type. For example, SAP NetWeaver Portal requires Java, so an SAP NetWeaver AS Java or SAP NetWeaver AS ABAP+Java system must be installed. Process Integration (PI) with SAP Exchange Infrastructure (SAP NetWeaver XI) requires an SAP NetWeaver AS ABAP+Java system, and so on.

HA setups for SAP NetWeaver AS ABAP are well-known and established from the SAP R/3 world. The latter two installation options are new to HA, so let’s take a closer look at these.

11 Known as SAP Web Application Server (SAP Web AS) prior to SAP NetWeaver 2004s.
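The rule that the usage type drives the installation type can be expressed as a small lookup: collect the runtimes each usage type needs, then pick an installation option that provides all of them. The Python sketch below is hypothetical; it encodes only the two mappings stated above (Portal needs Java; PI needs both ABAP and Java), and the dictionary keys are illustrative labels rather than SAP identifiers. Any other usage type would have to be added from SAP’s installation documentation. The sketch returns the smallest option that covers the requirements, although, as noted above, an AS ABAP+Java system would also satisfy a Java-only usage type.

```python
from __future__ import annotations

from enum import Enum


class Installation(Enum):
    """The three installation options described in the article."""
    ABAP = "SAP NetWeaver AS ABAP"
    JAVA = "SAP NetWeaver AS Java"
    ABAP_JAVA = "SAP NetWeaver AS ABAP+Java"


# Runtimes required per usage type. Only these two entries come from the
# article; the keys are illustrative labels, not SAP identifiers.
REQUIRED_RUNTIMES: dict[str, set[str]] = {
    "Portal": {"Java"},
    "PI": {"ABAP", "Java"},
}


def installation_for(usage_types: list[str]) -> Installation:
    """Return the smallest installation option covering all usage types."""
    needed: set[str] = set()
    for usage in usage_types:
        if usage not in REQUIRED_RUNTIMES:
            raise KeyError(f"no runtime mapping for usage type {usage!r}")
        needed |= REQUIRED_RUNTIMES[usage]
    if not needed:
        raise ValueError("at least one usage type is required")
    if needed == {"ABAP"}:
        return Installation.ABAP
    if needed == {"Java"}:
        return Installation.JAVA
    return Installation.ABAP_JAVA


if __name__ == "__main__":
    print(installation_for(["Portal"]).value)
    print(installation_for(["Portal", "PI"]).value)
```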


Recommended HA design patterns

The following overview shows design patterns that are common to all successful HA solutions, both hardware- and software-based:

Virtualization & Redundancy
• Be able to add, change, or remove redundant units on the fly, e.g., enable rolling (downtime-free) upgrade of redundant units:
  - Services or instances (rolling kernel or core switch)
  - Enterprise services
  - Whole systems
• Use virtual naming to address redundant units and be able to switch between them without affecting the end user.

Load Balancing & Synchronization
• Use load balancing over redundant units to overcome unit failure.
• In case of singleton units, be able to have standby units that can take over in case of failure.
• If required, be able to perform state transitions or state synchronizations to redundant (standby) units to preserve context or session data.

Isolation
• Try to avoid single points of failure (SPOFs) wherever possible.
• Minimize dependencies on SPOFs.
• Enable fast restart of SPOFs.
• Isolate SPOFs from the remaining (redundant) infrastructure to be able to secure them independently, e.g., by third-party disaster protection solutions.

Robustness
• Enable redundant units to detect and survive SPOF outages, e.g., implement reconnect capabilities.
• Raise critical situations and react proactively, e.g., raise an alert or restart if the JVM is running out of heap.
• Minimize the impact of redundant unit outages.
• Enable reliable notification mechanisms (e.g., token-based activities) to activate a standby unit in case of active unit failure.

Each of the components of an SAP NetWeaver system can be classified according to these four design patterns. Some examples include:
• Virtualization & Redundancy: Application servers are redundant units, because more than one application server (dialog instance) can be installed on separate machines. This type of configuration eliminates application servers as SPOFs, and at least some users will be unaffected by a crash of a dialog instance.
• Load Balancing & Synchronization: The system’s message server can distribute and load balance user requests to the active (redundant) application servers.
• Robustness: ABAP-based systems have a “db reconnect” feature, so that if a work process loses its connection to the database (for example, through a crash or network traffic problem), it will attempt to reconnect to the database without aborting. This way, the user session is not lost and the transaction does not have to be rolled back if the work process can reconnect during the (configured) reconnect time period.
• Isolation: Since the database and the central services (the enqueue server and the message server) of the system architecturally appear only once within the landscape, they are SPOFs and must be isolated.
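The “db reconnect” behavior described under Robustness follows a common retry-until-deadline pattern: on a lost connection, keep trying to reconnect, and give up only once the configured reconnect period has elapsed. The Python sketch below illustrates that general pattern only; it is not SAP’s implementation, and the names with_reconnect, reconnect_period, and retry_interval are hypothetical stand-ins for whatever your database interface and configured reconnect period provide. The clock and sleep functions are injectable so the behavior can be tested without real waiting.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_reconnect(operation: Callable[[], T],
                   reconnect: Callable[[], None],
                   reconnect_period: float,
                   retry_interval: float = 1.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`; on ConnectionError, keep reconnecting and retrying
    until `reconnect_period` seconds have passed, then re-raise the error."""
    deadline = clock() + reconnect_period
    while True:
        try:
            return operation()
        except ConnectionError:
            if clock() >= deadline:
                raise                  # period exhausted: abort as before
            sleep(retry_interval)
            try:
                reconnect()
            except ConnectionError:
                pass                   # database still unreachable; retry later


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_query() -> str:
        # Hypothetical operation that fails twice, then succeeds.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConnectionError("database connection lost")
        return "committed"

    print(with_reconnect(flaky_query, reconnect=lambda: None,
                         reconnect_period=10, retry_interval=0.1))
```

Because the work is retried rather than aborted, the caller keeps its session context, which is the property the sidebar highlights; if the outage outlasts the period, the original error surfaces just as it would without reconnect support.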


SAP NetWeaver AS Java
The SAP NetWeaver AS Java architecture contains many features that customers may not be familiar with (see Figure 4). The proven approach from the ABAP stack has been transferred to the Java stack in SAP NetWeaver ’04. The Java Central Instance (Java CI) contains the Java dispatcher, which, similar to the ABAP dispatcher, receives client requests and forwards them to the appropriate server processes. It is the Java server process that actually processes the requests and holds the session data. Also included in the Java CI are the Internet Graphics Service (IGS), for rendering graphics, and the Software Deployment Manager (SDM), for managing an SAP Java development landscape.
The SAP Central Services (SCS) Instance contains the Java enqueue (ENQ) server and message (MSG) server, which are also similar to their ABAP counterparts. The Java enqueue server manages the logical locks in the system and ensures server synchronization. The Java message server is the central service for internal cluster communication (such as event notifications, broadcasts, or an exchange of cache content). It also provides state information to the SAP Web Dispatcher, which processes and routes Web requests (via HTTP/HTTPS) from external clients/applications to application servers in the SAP system and balances the load between application server instances.
The database (DB) instance contains a single Java schema.

SAP NetWeaver AS ABAP+Java
This installation option combines both the ABAP and Java stacks in a single SAP instance, the Add-In Central Instance, as shown in Figure 5. The Add-In Central Instance consists of the ABAP Central Instance (ABAP CI), containing the ABAP dispatcher, work processes, gateway, enqueue server, and message server; the Java Central Instance (Java CI), containing the Java dispatcher, server processes, and SDM; and the IGS. In this setup, an Internet Communication Manager (ICM) service manages communications between the application server and external clients/applications: it receives requests from the "outside world" (HTTP, SMTP, etc.) and forwards them to the appropriate stack for processing.
As in the SAP NetWeaver AS Java scenario, the SCS Instance contains the Java enqueue (ENQ) server and message (MSG) server. So, in an ABAP+Java installation, there is an enqueue server and a message server for each stack.
The database (DB) instance contains two separate schemas: one for the ABAP stack and one for the Java stack.
Once you understand the technical features of the SAP NetWeaver architecture, you next need to identify the failure points that could compromise the availability of your system.
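As a compact recap of where the central services live, the placement described above (and, for the ABAP-only case, in step 2) can be written down as data. This is a minimal, hypothetical model for illustration only. `LAYOUTS` and `central_services` are invented names, and only components mentioned in this article are listed.

```python
from typing import Dict, List, Tuple

# Hypothetical model: installation type -> instance -> components hosted there.
LAYOUTS: Dict[str, Dict[str, List[str]]] = {
    "AS ABAP": {
        "ABAP CI": ["ABAP dispatcher", "work processes", "gateway",
                    "ENQ server (ABAP)", "MSG server (ABAP)"],
        "DB": ["database schema"],
    },
    "AS Java": {
        "Java CI": ["Java dispatcher", "server processes", "IGS", "SDM"],
        "SCS": ["ENQ server (Java)", "MSG server (Java)"],
        "DB": ["Java schema"],
    },
    "AS ABAP+Java": {
        "Add-In CI": ["ICM", "ABAP dispatcher", "work processes", "gateway",
                      "ENQ server (ABAP)", "MSG server (ABAP)",
                      "Java dispatcher", "server processes", "SDM", "IGS"],
        "SCS": ["ENQ server (Java)", "MSG server (Java)"],
        "DB": ["ABAP schema", "Java schema"],
    },
}


def central_services(install_type: str) -> List[Tuple[str, str]]:
    """Return (instance, component) pairs for every enqueue and message server."""
    return [
        (instance, component)
        for instance, components in LAYOUTS[install_type].items()
        for component in components
        if component.startswith(("ENQ server", "MSG server"))
    ]
```

`central_services("AS ABAP+Java")` returns four entries, one enqueue/message server pair per stack. Each of the other two installation types returns a single pair.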

Figure 4 SAP NetWeaver AS Java architecture

[Figure: Add-In Central Instance (ICM; ABAP CI with ABAP dispatcher, work processes, gateway, ENQ and MSG servers for ABAP; Java CI with Java dispatcher, server processes, SDM; IGS), SCS Instance (ENQ and MSG servers for Java), DB Instance (ABAP schema, Java schema). © SAP AG]
Figure 5 SAP NetWeaver AS ABAP+Java architecture

2. Find the SPOFs that you need to isolate
While each SAP NetWeaver installation type has its own specific SPOFs, there are three main points that, if they fail, cause the entire system to fail, leading to unplanned downtime (see Figure 6 on page 42):

• In all three installation types (SAP NetWeaver AS ABAP, AS Java, and AS ABAP+Java), there are central services that, architecturally, appear only once in a system. These are the enqueue and message servers, which reside in the SCS Instance for Java and in the ABAP CI for ABAP. In an ABAP+Java installation, there is an enqueue server and a message server for the ABAP stack, residing in the Add-In Central Instance, as well as an enqueue server and a message server for the Java stack, residing in the SCS Instance. All of these must be secured. (Note that the SDM also appears only once in the architecture, but it is not considered an SPOF because it is not a runtime-critical component.) In step 3, I demonstrate some possibilities for isolating the central services to protect your system from downtime, with a particular focus on the enqueue server, since the consequences of its failure can be severe.
• The central database instance is an SPOF that must be secured. In step 3, I show you a useful way to secure the central database instance.
• Load balancers (such as SAP Web Dispatcher) and other Web infrastructure components (such as reverse proxies) are SPOFs from an end-user point of view.
- For the SAP Web Dispatcher, two instances can be run in a redundant setup, with the second able to take over if the first fails. For additional information, go to the SAP NetWeaver 2004s online help at http://help.sap.com, search for "Web Dispatcher," and navigate to Architecture and Functions of the SAP Web Dispatcher → High Availability of the SAP Web Dispatcher.
- Reverse proxies are a partner technology that you can secure as described in the server hardware and operating system bullet items in the section "Protecting common failure points" earlier in the article. Contact the individual vendor for additional information.
Because configuring SAP Web Dispatcher is straightforward, and reverse proxies are vendor specific, I do not go into more detail on these in step 3.

Note!
In addition to the three identified architectural SPOFs, the central file share (/sapmnt/...) also represents an SPOF from a technical (installation) point of view and needs to be secured at the file system level (for some recommendations, see the bullet item on storage in the section "Protecting common failure points" earlier in the article).

Figure 6 SAP NetWeaver AS architectural SPOFs
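The rule of thumb applied in this step can be expressed as a small filter: a component is an architectural SPOF if the system depends on it and only one copy of it runs. This is a minimal sketch under that assumption. The `Component` type, its fields, and the sample inventory are hypothetical, and the inventory lists only components discussed above, including the central file share from the Note.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Component:
    name: str
    copies: int     # how many instances run in the landscape
    critical: bool  # does the system (or its operation) fail without it?


def find_spofs(components: List[Component]) -> List[str]:
    """Return components that are critical and have no redundant copy."""
    return [c.name for c in components if c.critical and c.copies < 2]


# Hypothetical inventory of an AS ABAP+Java system; names follow this step.
inventory = [
    Component("dialog instance", copies=3, critical=True),
    Component("ENQ server (ABAP)", copies=1, critical=True),
    Component("MSG server (ABAP)", copies=1, critical=True),
    Component("ENQ server (Java)", copies=1, critical=True),
    Component("MSG server (Java)", copies=1, critical=True),
    Component("central database", copies=1, critical=True),
    Component("SAP Web Dispatcher", copies=2, critical=True),  # redundant pair
    Component("SDM", copies=1, critical=False),                # not runtime critical
    Component("/sapmnt file share", copies=1, critical=True),  # see the Note
]

print(find_spofs(inventory))
```

With this inventory the filter reports the four central services, the central database, and the /sapmnt share. The SDM is skipped because it is not runtime critical, and the Web Dispatcher is skipped because it runs as a redundant pair.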


[Figure: High Availability Enqueue Server. Host A runs the standalone enqueue server with the lock table; Host B runs the enqueue replication server with the replication table; HA software handles switchover between the hosts; the work processes (WP1 to WP3) under each dispatcher reach the enqueue server through an enqueue client. © SAP AG]
Figure 7 Using the standalone enqueue server with replication

3. Identify ways to isolate the SPOFs
There are different possibilities for securing the central services and central database instance SPOFs identified in step 2 within the SAP NetWeaver architecture. In the next sections, I first show you how to use a standalone enqueue server with replication to avoid session rollbacks. I then show you how to use a switchover solution to provide redundancy for the central services and the central database instance.

Use a standalone enqueue server with replication to avoid session rollbacks
The enqueue server is critical to any SAP system because it manages all of the SAP locks. Although the enqueue concept is similar in ABAP and Java, a failure of the enqueue server has different consequences within each stack.
For the ABAP stack, the enqueue table contains session-bound locks on data objects. When the enqueue service is lost, the enqueue table is also lost, which causes an automatic rollback of any affected sessions. However, only these sessions (work processes) are rolled back; you do not need to restart the software cluster.
For the Java stack, the consequences are much more severe. The enqueue table contains both locks on data objects and infrastructure locks, and these locks are system bound, not session bound. When the enqueue service is lost in the Java stack, the enqueue table is lost, causing a session rollback, just as in ABAP. But because the lock table also contains infrastructure locks, a restart of all J2EE instances is necessary (and enforced as of SAP NetWeaver ’04 SPS15)!
To achieve state preservation and thus provide an HA strategy for the enqueue service, SAP provides a standalone enqueue server with enqueue replication. Enqueue replication prevents session rollback due to enqueue server failure and, for the Java stack, must be implemented to ensure an HA strategy.
The HA enqueue server consists of the standalone enqueue server and a replicated enqueue server (see Figure 7). The replicated enqueue server runs on another host and contains a replica of the lock table (the replication table). The standalone enqueue server is no longer integrated into an SAP application server (central instance) and is instead provided by SAP as an independent program. The main advantage of using the standalone enqueue server is that you can replicate

the lock table on another host. This means that if the standalone enqueue server fails, you can very quickly start a new standalone enqueue server (on the alternative host) that will continue working seamlessly with the current status of the lock table. All clients and the replication server are connected to the standalone enqueue server.
In the case of SAP NetWeaver AS ABAP, each work process has a separate connection to the standalone enqueue server. With SAP NetWeaver AS Java, each server process in the J2EE cluster has a separate connection to the standalone enqueue server.
In step 4, when I take you through possible HA setups, I show an example configuration that incorporates a replicated enqueue server.

Using a "switchover solution" to provide redundancy
To fully safeguard the central services (enqueue server and message server) and the central database, all HA setups for SAP NetWeaver ’04 must include at least one "switchover solution." Switchover solutions provide the necessary redundancy for these SPOFs because, in the event of failure, services can be automatically switched from a failed host to a standby host, allowing SAP system operation to continue. Apart from the switchover solution, the installation is just a basic setup of SAP NetWeaver ’04. It doesn’t matter how the installation is distributed or how many dialog instances are installed! In each HA setup, third-party software is involved to execute the switchover.
HA solutions are heavily platform-dependent and rely on third-party switchover solutions such as:
• Microsoft Cluster Server (MSCS)
• HP Service Guard
• SUN Cluster
• Veritas Cluster Server
• Oracle Failsafe, Oracle RAC
• IBM HACMP
• SteelEye LifeKeeper
• EMC Autostart
The vendors of these solutions may also offer specific consulting and support. Contact the individual vendor for additional information.

[Figure: a switchover environment of two physical servers. Typical resources: a physical disk, an IP address, a network name, an SAP resource, a file share. Resource groups: a group for Oracle, a group for the SCS.]
Figure 8 Typical example of a switchover environment (MSCS)

In general, switchover products are capable of

monitoring and controlling different system resources, such as host machines, network adapters, and so on. In the event of failure, the service offered by the resource is automatically taken over by a standby resource. To perform and administer these capabilities, the software is configured with all of the elements of the environment. For example, MSCS typically creates the following setup (see Figure 8):¹²
• Switchover environment
- Some number of physical servers combined together as a (hardware) cluster. These servers each have their own physical hostname but present themselves to the other resources under a single virtual hostname, so the other resources are not "aware" of the physical host they are using.
- The servers within the switchover environment have the ability to share the clustered resources.
• Cluster resources
- Services offered by the cluster to the outside world that can be failed over to an alternate resource.
- Resources can be combined into resource groups (see below).
- Typical resource types:
+ IP addresses (e.g., virtual IP addresses)
+ Network names (e.g., virtual hostname)
+ Processes (e.g., msgserver processes)
+ UNC or NFS shares (e.g., \\<vSCShost>\sapmnt\...)
+ Others
• Resource group¹³
- Collection of resources bundled together. Actions are taken on a resource group as a whole, so the resources within the group should somehow belong together. For example, the central services and central database instance should be in different resource groups because they are logically different resources, and the database does not need to be taken offline when the central services are taken offline.
• Actions on a cluster resource or resource group
- Bring resource or resource group online.
- Switch resource group.
- Take resource or resource group offline.

¹² Because this example is an MSCS example, only MSCS terms are used here. Other vendor solutions use different terms for resources and groups. Nevertheless, the technology is basically the same.
¹³ A "resource group" is known as a "switchover group" in generic HA terminology.

4. Choose a setup that secures the SPOFs
Once you have a clear understanding of the architecture and technology associated with HA and how the potential SPOFs should be addressed, you can identify solid choices for HA setups. See the sidebar on the next page for key criteria and questions to ask when evaluating the possibilities.
SAP NetWeaver ’04 provides for the following potential HA setups:
• DB only in a switchover group
• CI, SCS, and DB in one switchover group
• CI and SCS in one switchover group, DB in another
• DB and SCS in one switchover group
• DB and SCS each in its own switchover group
Each of the possible setups has pros and cons. For example, fewer switchover groups may be simpler to configure and administer but may not meet the evaluation criteria. As described earlier, one of the key ingredients for an HA setup and implementation is failover time. For that reason, it is recommended that you keep the switchover groups as lightweight as possible so that when failures occur, the (designed) redundant components can start up quickly and cause the least amount of interruption to the end user. Manageability is a concern, too: you do not want the switchover groups to be so granular that there are (unnecessarily) too many of them to administer. For

example, although it is possible to have the CI, SCS, and DB in a single switchover group, this does not fully meet the evaluation criteria. In this case, the DB, which usually takes a long time to start up, would be switched over and restarted whenever the message server, which can start up on another machine very quickly, fails.
For these reasons, SAP provides the recommendations listed in Figure 9 for the HA setups possible for SAP NetWeaver ’04. Each HA setup should also be extended by the replicated enqueue server discussed previously.

Key HA evaluation criteria and questions
When evaluating possible HA setups, consider the following criteria and questions:
• Degree of HA functionality (remaining SPOFs)
- Are all SPOFs secured and thus eliminated?
- How many of them remain?
• Implementation effort
- How long does it take to implement the HA solution?
• Failover detection
- How does the switchover software detect that it needs to switch over?
• Failover time
- How long does it actually take to bring all resources online after a switchover?
• Number of necessary machines
- How many servers (machines) are necessary to implement the HA setup?
• Architectural sustainability
- Is the chosen HA setup future-proof?

Recommendations for SAP NetWeaver 2004s
With SAP NetWeaver 2004s, the setup of an ABAP-SCS (ASCS) is possible and recommended. In this case, the CI itself will no longer be an SPOF, nor will it be so "central," as it will then have the characteristics of any other application server (dialog instance).
The five HA setup possibilities for SAP NetWeaver ’04 in Figure 9 are, in principle, also possible for SAP NetWeaver 2004s. From a technical perspective, SAP recommends the following settings:
• Separate the SCS instances (ABAP and Java) from the CI
• Protect only SPOFs with the switchover cluster (to achieve fast switchover times, as well as for simplicity and handling reasons)
• Keep the switchover groups as simple and lightweight as possible (reduce complexity)
• Avoid dependencies between different protected SPOFs as much as possible (try to avoid a common switchover group for different SPOFs)
SAP provides the recommendations in Figure 10 for the HA setups possible with SAP NetWeaver

2004s. As you can see, SAP gives a clear recommendation for HA setup type 5 (again, each HA setup should be extended by the replicated enqueue server discussed previously).

HA setup type | SAP NetWeaver AS Java | SAP NetWeaver AS ABAP+Java | SAP NetWeaver AS ABAP*
1. DB only in switchover group | Possible | Possible | Possible
2. CI, SCS, and DB in one switchover group | Not recommended | Possible | Possible
3. CI and SCS in one switchover group, DB in another | Possible | Recommended | Recommended
4. DB and SCS in one switchover group | Not recommended | Not applicable | Standalone instance for MSG/ENQ server
5. DB and SCS, each in its own switchover group | Recommended | Not applicable | Standalone instance for MSG/ENQ server
* In an ABAP-only installation, there is no SCS instance.
Figure 9 HA setup recommendations for SAP NetWeaver ’04

HA setup type* | SAP NetWeaver AS Java | SAP NetWeaver AS ABAP+Java** | SAP NetWeaver AS ABAP**
1. DB only in switchover group | Possible | Possible | Possible
2. CI, ASCS/JSCS, and DB in one switchover group | Possible | Possible | Possible
3. CI and ASCS/JSCS in one switchover group, DB in another | Possible | Possible | Possible
4. DB and ASCS/JSCS in one switchover group | Possible | Possible | Possible
5. DB and ASCS/JSCS, each in its own switchover group | Recommended | Recommended | Recommended
* Separation of SCS for ABAP and Java is a prerequisite.
** In an ABAP-only or ABAP+Java installation, it is possible to separate an ASCS instance.
Figure 10 HA setup recommendations as of SAP NetWeaver 2004s

Figure 11 on the next page shows how the recommended HA setup type 5 summarized in Figure 10 is extended with the replicated enqueue server described earlier in step 3. As you can see, this example configuration illustrates the previously discussed technical HA recommendations:
• The ABAP and Java SCS are separate from the CI.
• Only SPOFs are within the switchover cluster.
• The switchover groups are simple.
• There are no dependencies between different protected SPOFs.
• The replicated enqueue server is implemented to provide SAP lock table protection.
• The smallest possible number of servers is used, to reduce cost.

Figure 11 HA setup with the replicated enqueue server
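The replicated enqueue server shown in Figure 11 can be illustrated with a toy, in-memory model. Each lock granted by the standalone enqueue server is also written to a replication table on another host. After a failure, a new enqueue server is started from that replica, so existing locks survive. This is a minimal sketch under those assumptions. The class and function names are hypothetical, and the sketch does not model SAP's actual replication protocol, networking, or switchover software.

```python
from typing import Dict, Optional, Tuple

LockKey = Tuple[str, str]  # (object type, key) of a locked data object


class ReplicationServer:
    """Keeps a copy of the lock table (the replication table) on another host."""

    def __init__(self) -> None:
        self.replica: Dict[LockKey, str] = {}


class EnqueueServer:
    """Toy standalone enqueue server that mirrors every change to a replica."""

    def __init__(self, replication: ReplicationServer,
                 initial: Optional[Dict[LockKey, str]] = None) -> None:
        self.lock_table: Dict[LockKey, str] = dict(initial or {})
        self.replication = replication

    def lock(self, key: LockKey, owner: str) -> bool:
        holder = self.lock_table.get(key)
        if holder is not None and holder != owner:
            return False  # conflict: another owner holds this lock
        self.lock_table[key] = owner
        self.replication.replica[key] = owner  # update replica before granting
        return True

    def unlock(self, key: LockKey, owner: str) -> None:
        if self.lock_table.get(key) == owner:
            del self.lock_table[key]
            self.replication.replica.pop(key, None)


def fail_over(old_replication: ReplicationServer) -> EnqueueServer:
    """Start a new enqueue server on the replica host, seeded from the replica.

    A fresh replication server (on yet another host) receives a full copy,
    so the lock table is protected again after the switchover.
    """
    new_replication = ReplicationServer()
    new_replication.replica = dict(old_replication.replica)
    return EnqueueServer(new_replication, initial=old_replication.replica)
```

For example, if one session holds a lock before host A fails, the server returned by `fail_over` still refuses that lock to a competing session. This is the state preservation that lets sessions continue instead of being rolled back.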

5. Implement your system landscape
Once you have chosen an HA landscape, it is time to perform the implementation! Fortunately, the SAP installation tool sapinst supports HA installations "out-of-the-box," especially through support of installation on virtual hostnames. As a summary, keep the following advice in mind when performing an HA landscape configuration:
• Isolate SPOFs as much as possible.
- Minimize the impact of a failure and streamline the time and effort needed for replacement.
• Provision for a virtual hostname is mandatory.
- A component that runs inside the switchover environment needs to run on a virtual host because the physical host may automatically change (on purpose).

• Automatic reconnection to all software components is mandatory.
- In case of a switchover, all dependent software components need to reconnect automatically.
- Reconnection parameters must be configured according to the expected duration of a failover, which depends on the operating system, database, switchover software, HA setup, etc.
• Enable local loading of executables and binaries.
- If the executables are loaded from a shared directory and that share switches over, the local machine is in serious trouble because its binaries cannot be reloaded (TCP connections are broken). SAP provides the tool SAPCPE to synchronize binaries and executables automatically.
You can now use these five steps on your own systems to design and implement a successful and cost-effective HA infrastructure with SAP NetWeaver. As a final review, let’s look at the lessons learned in each step:
1. Understand the architecture. A technical understanding of the system architecture is a critical first step toward understanding the failure points within the architecture. There are three different SAP NetWeaver installation types (SAP NetWeaver AS ABAP, AS Java, and AS ABAP+Java), and the business scenario being implemented determines which installation type is needed to support it.
2. Find the SPOFs that you need to isolate. Each SAP NetWeaver installation type has its own specific SPOFs, but there are general areas you should always check, including the central services (the message and enqueue servers), the central database, and any load balancers or other Web infrastructure.
3. Understand ways to isolate the SPOFs. Different SPOFs are isolated in different ways. I showed you how to address two key SPOFs: the central services and the central database instance. The standalone enqueue server should be implemented to avoid session rollbacks, and switchover solutions should be implemented to provide redundancy for the central services (the enqueue and message servers) and the central database instance.
4. Choose a setup that secures the SPOFs. There are different potential HA setups that you can implement to secure the SPOFs. A setup should not only secure the SPOFs, but also provide fast switchover times, be easy to administer, and be cost effective.
5. Implement the system landscape. With your homework complete, you are well prepared to go forward with a successful implementation. Fortunately, SAP tools support HA installations right "out of the box."

Conclusion
With the adoption of SAP NetWeaver and enterprise SOA, more and more business processes rely on IT. Mission-critical business functions such as sales and order entry, continuous manufacturing, and even the ability of consumers to make purchases all depend on the availability of IT services, making those services increasingly mission-critical themselves. All of the players in a business process (internal users, external partners, and customers) demand a minimal amount of unplanned downtime of systems and services. Depending on the type of business and service, an outage of one hour can cost millions of dollars. What really counts is the availability of mission-critical services from an end-user point of view, regardless of which system or combination of systems is needed to provide this availability.
The consequences of IT failing to meet the demands of business are increasingly costly. Therefore, an important goal for SAP is not only to provide high availability of systems such as SAP NetWeaver and mySAP ERP, but also to facilitate the high and near-continuous availability of cross-system business processes such as order management, production management, asset management, and more.
This article has examined the implementation of an HA infrastructure from both a strategic and an implementation view. Strategically, there is a tradeoff that needs to be balanced between the cost of availability and the

cost of downtime. Availability is implemented through redundancy, but at an associated cost, from redundant hardware components to the cost of failover techniques. The higher the degree of system and process availability, the more challenging and expensive that availability is to achieve. A realistic business need for availability must be established and balanced against the cost of providing that level of availability.
Implementing a high-availability infrastructure involves finding the SPOFs within the architecture (hardware or software services that, if they fail, cause the entire system to fail, leading to unplanned downtime), isolating them within the architecture, and securing them with some sort of redundancy. With SAP NetWeaver, it is recommended that you create simple switchover groups for the software services that are SPOFs and do not have mutual dependencies. In this way, the user impact of a failure can be minimized and the time and effort for replacement reduced. The links and SAP Notes in the resource listing available at www.SAPpro.com will aid in the technical implementation of the setup.
Continuous availability of business operations has become increasingly important for many customers. SAP has already strengthened its focus and investments in this area over the last few years, and will continue to increase its attention and efforts in the years to come.
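The tradeoff between the cost of availability and the cost of downtime can be made concrete with basic arithmetic. An availability target caps unplanned downtime per year, and multiplying that cap by an hourly outage cost bounds the yearly downtime cost. The sketch assumes a 365-day year. Its cost of 1,000,000 per hour is a hypothetical placeholder rather than a figure from this article, so substitute your own business numbers.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def max_downtime_hours(availability: float) -> float:
    """Maximum unplanned downtime per year allowed by an availability level."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


def downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Upper bound on yearly downtime cost at the given availability level."""
    return max_downtime_hours(availability) * cost_per_hour


HYPOTHETICAL_COST_PER_HOUR = 1_000_000  # placeholder, replace with your own

for target in (0.99, 0.999, 0.9999):
    hours = max_downtime_hours(target)
    cost = downtime_cost(target, HYPOTHETICAL_COST_PER_HOUR)
    print(f"{target:.2%} availability: <= {hours:.2f} h/year, cost <= {cost:,.0f}")
```

At 99.9 percent availability, for example, up to 8.76 hours of unplanned downtime per year are allowed. Weighing such figures against the cost of the redundancy needed to reach each level is the balance described above.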
