High Avail Netweaver
This article originally appeared in the March/April 2007 issue of SAP Professional Journal and appears
here with the permission of the publisher, Wellesley Information Services. For information about SAP
Professional Journal and other WIS publications, visit www.WISpubs.com.
Note!
Even though the numbers in Figure 1 may look pretty impressive, they are valid only for unplanned downtime. The calculations do not include any planned downtime estimations for offline backups, upgrades, updates, security fixes to applications and operating systems, and so on. By their very nature, the business anticipates these activities and can take actions to mitigate their impact; HA measurements, on the other hand, address events that have an unknown and unplanned occurrence. As previously mentioned, in a real-world scenario you will need to consider planned downtime as well.

Figure 2 Causes of unplanned downtime (source: Gartner Group): hardware and operating system failures, disasters 20%; application failures 40%; operator errors 40%
A “Fault resilient” availability of 99.999% (also known as “the five nines”) is currently accepted in the industry as the best possible availability.

To achieve a scalable and flexible system landscape, professional application servers are usually installed in a software cluster, which means that there are various instances of the same service, available on different physical machines. These physical instances are usually reached via a load balancer that dispatches requests from the clients. In addition to dispatching the workload, this concept also provides a redundant infrastructure that enables resistance to most hardware and software faults.

Despite this redundancy, in many system landscapes there are still services (e.g., the load balancer, central database, and so on) that are unique within the cluster environment and vital for the cluster operation. If one of these services crashes, the whole cluster might not work anymore. These central, non-redundant services are referred to as potential SPOFs. Therefore, an SPOF is defined as “a hardware or software service that, if failing, will cause the entire system to fail, leading to unplanned downtime.”

From a technical infrastructure point of view, architectural and technical SPOFs must be identified and secured in an appropriate manner to help ensure system availability. It is often advisable to use additional hardware and fault-tolerant software so the system can continue operating, uninterrupted, if any single component fails. For example, disk drives could be a hardware SPOF; disk mirroring reduces the likelihood of failure and can effectively eliminate disk drives as an SPOF.

Figure 2 shows the major causes of unplanned downtime. Hardware failures, operating system failures, environment failures, and disaster effects can be avoided by eliminating the SPOFs within systems and implementing disaster recovery scenarios to minimize the impact of possible disasters on the system infrastructure.[2] “Human errors” (e.g., errors caused by faulty configuration, bad change control, etc., as opposed to errors caused by a failed hardware component, earthquake, etc.) make up 80% of downtime causes, though,[3] and these cannot be solved through redundancy or techniques involving switching to alternate resources. Human errors need to be addressed through ease of system management and with improved change and problem management processes. A good source of more information on useful system management software is available at http://service.sap.com/ha.[4]

[2] For more on disaster recovery, see the SAP Professional Journal articles, “Is Your R/3 System Recovery Plan a Disaster? A Three-Step Approach for Designing Recovery and Availability Plans” (September/October 2001) and “The 15 Most Overlooked Items in Planning for High Availability and Disaster Recovery” (July/August 2002).
[3] See www.gartner.com/webletter/ibmglobal/edition2/article5/article5.html.

As mentioned earlier, unplanned downtime poses more of a risk to your technical infrastructure than planned downtime due to its unpredictability, but that doesn’t mean that you can ignore the effects of planned downtime. In the end, HA is measured from the perspective of the end user. If a system is running, but a user cannot access the system, then the system cannot be considered available. In an Internet sales model, for example, the lack of availability can be critical, since the end user may go to a competitor to complete the transaction. For this reason, it is important to also reduce planned downtimes for tasks such as system and/or infrastructure maintenance, patching, upgrading, and so on through strategies that involve scalable components that enable rolling maintenance, efficient upgrade and patching processes, and proven software lifecycle management engines such as the SAP Transport Management System (TMS) for ABAP and the Change Management Service (CMS) for Java. I go into more detail on some of the features provided by SAP for minimizing planned downtime later in the article.

Understanding the end user perspective of availability will also help with setting and meeting appropriate SLA requirements, as well as with making informed decisions when weighing the costs of increasing availability, which I discuss next.

The process of switching from a failed component or subsystem to its redundant replacement is called “failover.”[5]

When designing a highly available system, however, you must consider the tradeoff between the costs of increasing your system availability and the costs of system downtime (see Figure 3). As you can see on the left side of the figure, the costs of downtime are not linear with respect to the duration of the downtime: with longer downtimes, the increase in costs is closer to exponential. For example, when supply chain processes are stuck for longer than three hours, the entire production process may become stuck, which then produces even higher downtime costs. Conversely, as shown on the right side of the figure, the measures taken to increase system availability (such as redundant components, a disaster recovery site, first-class system management tools, a skilled IT staff, system capacity planning, and proactive services) cost progressively more money to implement.

Because of this HA cost tradeoff, a business case must be made for the availability level of a system. Businesses need to determine their realistic business needs for availability. Measure the cost of system downtime and then balance this availability level with the cost required to provide it. For example, development, sandbox, training, and production systems will all have different availability needs; a reporting system can have more system downtime than an operational system. There may well be cases where 99% availability (outage of less than two hours per week) is good enough.
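The availability percentages quoted in this discussion translate directly into permitted unplanned downtime. The following is a minimal sketch of that arithmetic; the function name and the chosen periods (a 168-hour week, a 365-day year) are assumptions made for illustration and do not come from the article.

```python
HOURS_PER_WEEK = 7 * 24
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability_pct: float, period_hours: float) -> float:
    """Hours of unplanned downtime permitted at a given availability level."""
    return (1 - availability_pct / 100) * period_hours


# Print the downtime budget for a few common availability levels.
for pct in (99.0, 99.9, 99.99, 99.999):
    per_week = allowed_downtime_hours(pct, HOURS_PER_WEEK)
    per_year = allowed_downtime_hours(pct, HOURS_PER_YEAR)
    print(f"{pct}% -> {per_week:.2f} h/week, {per_year * 60:.1f} min/year")
```

At 99% the weekly budget works out to 1.68 hours, which matches the "less than two hours per week" figure, while "the five nines" leaves a little over five minutes per year.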
Figure 3 Availability costs vs. system downtime costs (source: Gartner Group)
NetWeaver) and application scenarios that run on top of SAP NetWeaver. SAP systems must be able to utilize the HA-capable computing infrastructure you have in place.[6] SAP also works with partners regarding platform-specific procedures, and in the case of Microsoft Cluster Server (MSCS), provides specific HA procedures and documentation.

[6] The computing infrastructure, cluster software, etc., are provided by third-party partners.

HA-capable computing infrastructure elements (hardware, operating system, database system, file system, etc.) are provided by SAP platform/solution partners, who must also provide their platform-specific HA procedures. This includes the HA configuration and switchover ability, and support of the implementation at the customer site.

The customer must define the suitable and required HA levels for their systems. It must also provide the proper IT management concepts and guidelines, and ensure the appropriate operating procedures and training of their IT staff.

Now that you understand the fundamental principles and considerations involved with HA systems, let’s take a look at the possible points of failure that can exist in your systems and the different methods you can use to minimize downtime in your system environment, both unplanned and planned, and to improve availability.

Keys to minimizing system downtime and improving availability
Understanding the preventative options available to you for protecting common failure points, avoiding unplanned downtime, and reducing planned downtime will help set you up for success when assessing and planning your own HA implementation. In the next sections, I take you through these options and point out common strategies, recommendations, and best practices for a successful HA implementation.

Protecting common failure points
A system landscape consists of different components and infrastructures, each of which must be checked for potential SPOFs and protected with appropriate
Cluster types
In the technology world, the word “cluster” has different meanings. There are at least three kinds of clusters:
• Hardware: A hardware cluster provides HA by using redundant hardware for every piece of equipment that may fail. At the same time, additional hardware options are needed to control running units and to detect if such hardware has failed. Hardware clustering is the only way to protect against (hardware) SPOFs.
• Software: In this scenario, software runs in a distributed environment, mainly to provide scalability
to the overall system. At the same time, this means of deployment implies a certain kind of HA,
since failing parts may be replaced through the distributed nature of the system. Unfortunately, this
technique can become quite complicated, due to system requirements for maintaining the integrity of
distributed data.
• Database: Database vendors are using the same principles to provide their products with HA. One way is to run a database in a clustered environment. This means that the database is running on more than one machine and in case of failure a switchover will occur. There are different technologies available to achieve this, such as the Microsoft Cluster Server (MSCS) and Oracle RAC.
measures (which may rely on partner technology; more on this later when I discuss switchover solutions). Common points within the system landscape requiring protection and ways to protect them include the following:

• Network: redundant network components and topology; redundant provider links; protected network services (DNS, Mail, LDAP, etc.)
• Storage: RAID technologies (protecting disk availability through redundancy); SAN storage networks with some HA features (split mirrors, synchronous write)
• Server hardware: redundant components (power supplies, buses, coolers, boards); hot-pluggable components; ECC memory; server domains; manageability (remote management, automatic restart, etc.)
• Server operating system (hostnames, IP addresses, file services, applications, name servers, Windows domain controllers): cluster (several computers coupled to work together as one computer, with shared resources varying from none to all); when one computer fails another takes over its resources and provides them transparently to the outside world (see the sidebar above for more on the different types of clusters)
• Database (vendor-specific): cluster, replication, shadow database, multi-runtime, etc.

Avoiding unplanned downtime
As stated previously, SAP shoulders the responsibility for leveraging a computing infrastructure that is HA-capable to protect against unplanned downtime. SAP NetWeaver achieves this using four strategies:

• You can install SAP NetWeaver into an HA environment that includes switchover software[7] or clustered resources.
• You can operate SAP NetWeaver in an HA environment; that is, all SPOFs can be protected (either by SAP functionality and/or partner features) by redundant components and/or switchover techniques.
• Software maintenance tools can operate in an HA environment such as a switchover or cluster.
• You can operate SAP scenarios in redundant disaster recovery sites.

[7] Software that automatically starts a failed service on a different physical host machine. Switchover software reduces unplanned downtime by enabling rapid resumption of a failed service on a substitute host. SAP system services that can be susceptible to failure because they cannot be configured on multiple hosts (such as the DBMS or enqueue and message services) can benefit from the extra resilience of switchover software. The substitute host must be sufficiently powerful to support the additional workload following switchover.

Reducing planned downtime
SAP also provides features that help minimize planned downtime, including:

• Kernel upgrades: Enable one-by-one “rolling kernel upgrades” between (SAP-defined) compatible kernels on application servers to eliminate full system shutdown[8]
• Profile parameter changes: Make changes to SAP configuration profile parameters online without restart (for a list of parameters that can be changed online, see SAP Note 102428)
• Operation mode changes: Dynamically change the work process type to adjust to different work profiles (day vs. night, dialog vs. batch)
• Object imports or transports: For standard SAP software maintenance activities (e.g., transports), provide predictability and planning (for system outage) through optimization of software maintenance tools, such as ABAP Transport Management System (TMS) or Java Change Management Service (CMS)
• Applying support packages: Perform a shadow import of support packages; that is, perform a parallel import of inactive new repository objects into the database, then activate them for the runtime environment through a “switch” procedure. The advantage to this technique is that you can import a large portion of the objects of a support package (reports) into the running system without changing the system’s runtime behavior, thereby reducing the subsequent downtime for the remaining import. Especially in connection with extensive or several support packages, this reduces the system downtime significantly (see SAP Note 361735).
• Release upgrades: Reduce downtime for release upgrades through a “system switch” upgrade, where many of the upgrade activities (such as imports and activation) are performed in a shadow system in parallel to the running production system, reducing downtime of the production system.[9] To further reduce downtime, SAP also offers the Customer-Based Upgrade (CBU), where customer-specific post-upgrade activities, such as modification adjustments, add-on installations, etc., are performed on a copy of the production system. A custom upgrade package that includes these changes is then created on this system and used for the production system upgrade. The overall user downtime for the upgrade of the production system is thereby reduced, since these activities no longer need to be performed post-upgrade.
• End of daylight savings time: Stretch time in the “double hour” (kernel release 6.40 and higher, see SAP Note 7417)
• Database reorganization: Perform table/index defragmentation, etc., to improve performance[10]
• Offline backup: Enable offline backup, with or without split-mirror technology. Performing a backup on a split mirror has the advantage of less downtime because the system may be started up on the single disk array while the offline backup is performed on the mirror, but the disadvantage is that the system is not mirrored during the backup (unless it is a triple mirror, that is!).
• Online backup: All SAP products allow consistent online backup

[8] This capability is currently in testing and has not yet been released.
[9] For more on the system switch upgrade method, see the article, “A Basis Administrator’s Step-by-Step Guide to Preparing for an Upgrade to SAP R/3 Enterprise” (SAP Professional Journal, July/August 2004).
[10] For more on database reorganizations, see the article “Boost SAP R/3 Performance by Reorganizing Your Oracle Database: A Proven Reorganization Strategy” (SAP Professional Journal, July/August 2005).
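The idea behind a one-by-one rolling kernel upgrade can be illustrated with a small simulation. This is only a sketch of the general pattern, not SAP's procedure: the `AppServer` class, the `rolling_upgrade` function, and the kernel labels are hypothetical, and the check that two kernels are compatible is assumed to have happened beforehand.

```python
from dataclasses import dataclass


@dataclass
class AppServer:
    name: str
    kernel: str
    online: bool = True


def rolling_upgrade(servers, new_kernel):
    """Upgrade servers one at a time; return the lowest number that stayed online."""
    min_online = len(servers)
    for server in servers:
        server.online = False                 # take one instance out of rotation
        min_online = min(min_online, sum(s.online for s in servers))
        server.kernel = new_kernel            # stand-in for the actual kernel exchange
        server.online = True                  # hand the instance back to the load balancer
    return min_online


servers = [AppServer(f"app{i}", "old") for i in range(1, 4)]
lowest = rolling_upgrade(servers, "new")
print(lowest, [s.kernel for s in servers])
```

With three application servers, at least two remain available to users at every point of the upgrade, which is what removes the need for a full system shutdown.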
Isolation
• Try to avoid single points of failure (SPOFs) wherever possible.
• Minimize dependencies on SPOFs.
• Enable fast restart of SPOFs.
• Isolate SPOFs from the remaining (redundant) infrastructure to be able to secure them independently, e.g., by third-party disaster protection solutions.

Robustness
• Enable redundant units to detect and survive SPOF outages, e.g., implement reconnect capabilities.
• Raise critical situations and react proactively, e.g., raise alert or restart if JVM is running out of heap.
• Minimize impact of redundant unit outages.
• Enable reliable notification mechanisms (e.g., token-based activities) to activate standby unit in case of active unit failure.
Each of the components of an SAP NetWeaver system can be classified according to these four design
patterns. Some examples include:
• Virtualization & Redundancy: Application servers are redundant units, because more than one appli-
cation server (dialog instance) can be installed on separate machines. This type of configuration
eliminates application servers as SPOFs, and at least some users will be able to survive a crash of
a dialog instance.
• Load Balancing & Synchronization: The system’s message server can distribute and load balance
user requests to the active (redundant) application servers.
• Robustness: ABAP-based systems have a “db reconnect” feature, so that if a work process loses its
connection to the database (for example, through a crash or network traffic problem), it will attempt to
reconnect to the database without aborting. This way, the user session is not lost and the transaction
does not have to be rolled back if the work process can reconnect during the (configured) reconnect
time period.
• Isolation: Since the database and the central services (the enqueue server and the message server) of
the system architecturally appear only once within the landscape, they are SPOFs and must be isolated.
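The reconnect behavior described under Robustness can be sketched as a retry loop bounded by a configured time window. This is an illustrative sketch only and not the ABAP "db reconnect" implementation: `ConnectionLost`, `reconnect`, and the parameter names are assumptions introduced here.

```python
import time


class ConnectionLost(Exception):
    """Raised by the hypothetical connect() callable when the database is unreachable."""


def reconnect(connect, reconnect_window_s, retry_interval_s,
              sleep=time.sleep, clock=time.monotonic):
    """Retry connect() until it succeeds or the reconnect window expires."""
    deadline = clock() + reconnect_window_s
    while True:
        try:
            return connect()
        except ConnectionLost:
            if clock() + retry_interval_s > deadline:
                raise                      # window exhausted: only now is the session given up
            sleep(retry_interval_s)


attempts = {"n": 0}


def flaky_connect():
    """Simulated database that becomes reachable on the third attempt."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionLost("database not reachable")
    return "connection"


result = reconnect(flaky_connect, reconnect_window_s=5.0, retry_interval_s=0.01)
print(result, attempts["n"])
```

The window should be chosen to cover the expected duration of a failover; if it is shorter, the work process gives up before the standby resource is available.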
SAP NetWeaver AS Java
The SAP NetWeaver AS Java architecture contains many features that customers may not be familiar with (see Figure 4). The proven approach from the ABAP stack has been transferred to the Java stack in SAP NetWeaver ’04. The Java Central Instance (Java CI) contains the Java dispatcher, which, similar to the ABAP dispatcher, receives client requests and forwards them to the appropriate server processes. It is the Java server process that actually processes the requests and holds the session data. Also included in the Java CI is Internet Graphics Service (IGS) for rendering graphics and Software Deployment Manager (SDM) for managing an SAP Java development landscape.

The SAP Central Services (SCS) Instance contains the Java enqueue (ENQ) server and message (MSG) server, which are also similar to their ABAP counterparts. The Java enqueue server manages the logical locks in the system and ensures server synchronization. The Java message server is the central service for internal cluster communication (such as event notifications, broadcasts, or an exchange of cache content) and provides state information to the SAP Web Dispatcher, which processes and routes Web requests (via HTTP/HTTPS) from external clients/applications to application servers in the SAP system, and balances the load between application server instances.

The database (DB) instance contains a single Java schema.

SAP NetWeaver AS ABAP+Java
This installation option combines both the ABAP and Java stacks in a single SAP instance, the Add-In Central Instance, as shown in Figure 5. The Add-In Central Instance consists of the ABAP Central Instance (ABAP CI) containing the ABAP dispatcher, work processes, gateway, enqueue server, and message server; the Java Central Instance (Java CI) containing the Java dispatcher, server processes, and SDM; and the IGS. In this setup, an Internet Communication Manager (ICM) service manages communications between the application server and external clients/applications: it receives requests from the “outside world” (HTTP, SMTP, etc.) and forwards them to the appropriate stack for processing.
As in the SAP NetWeaver AS Java scenario, the
SCS Instance contains the Java enqueue (ENQ) server
and message (MSG) server. So, in an ABAP+Java
installation, there is an enqueue and message server
for each stack.
The database (DB) instance contains two separate
schemas — one for the ABAP stack and one for the
Java stack.
Once you understand the technical features of the
SAP NetWeaver architecture, you next need to iden-
tify the failure points that could compromise the
availability of your system.
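One way to see why components that appear only once deserve attention is to compare them with redundant ones numerically. The sketch below uses the standard series/parallel availability formulas and assumes independent failures; the landscape, the 99% per-unit availability, and the instance counts are hypothetical values chosen for illustration.

```python
from math import prod


def component_availability(single: float, instances: int) -> float:
    """Availability of `instances` redundant units when any one of them suffices."""
    return 1 - (1 - single) ** instances


def system_availability(components: dict) -> float:
    """Every component is required, so the component availabilities multiply."""
    return prod(component_availability(a, n) for a, n in components.values())


# Hypothetical landscape: name -> (availability of one unit, number of redundant units)
landscape = {
    "dialog instance": (0.99, 3),
    "enqueue server": (0.99, 1),
    "message server": (0.99, 1),
    "database": (0.99, 1),
}

# A component with a single instance is a candidate SPOF.
spofs = [name for name, (_, count) in landscape.items() if count == 1]
print(spofs)
print(round(system_availability(landscape), 4))
```

In this toy model the three redundant dialog instances contribute almost nothing to the overall unavailability; the result is dominated by the three services that exist only once.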
[Figure: ABAP CI (ABAP dispatcher, gateway, ABAP ENQ server, ABAP MSG server), Java CI (Java dispatcher, SDM), Java ENQ server, and DB instance with ABAP schema. © SAP AG]
• In all three installation types (SAP NetWeaver AS ABAP, AS Java, and AS ABAP+Java), there are the central services that, architecturally, appear only once in a system. These are the enqueue and message servers that for Java reside in the SCS Instance, and for ABAP reside in the ABAP CI. In an ABAP+Java installation there is an enqueue server and a message server for the ABAP stack that resides in the Add-In Central Instance as well as an enqueue server and a message server for the Java stack that resides in the SCS Instance; these must be secured. (Note that the SDM appears only once in the architecture, but it is not considered to be an SPOF because it is not a runtime-critical component.) I demonstrate some possibilities for isolating the central services to protect your system from downtime in step 3, with a particular focus on the enqueue server since the consequences of its failure can be severe.
• The central database instance is an SPOF that must be secured. In step 3, I show you a useful way to secure the central database instance.
• Load balancers (such as SAP Web Dispatcher) and other Web infrastructure components (such as reverse proxies) are SPOFs from an end-user point of view.
  - For the SAP Web Dispatcher, two may be run in a redundant setup, the second able to take over if the first fails. For additional information, go to the SAP NetWeaver 2004s online help at http://help.sap.com, search for “Web Dispatcher,” and navigate to Architecture and
[Figure: two dispatchers, an enqueue server, and a replication server. © SAP AG]
3. Identify ways to isolate the SPOFs
There are different possibilities for securing the central services and central database instance SPOFs identified in step 2 within the SAP NetWeaver architecture. In the next sections, I first show you how to use a standalone enqueue server with replication to avoid session rollbacks, and then I show you how to use a switchover solution to provide redundancy for the central services and the central database instance.

Use a standalone enqueue server with replication to avoid session rollbacks
The enqueue server is critical to any SAP system because it manages all of the SAP locks. Although similar in concept in both ABAP and Java, the failure of the enqueue has different consequences within each stack.

For the ABAP stack, the enqueue table contains session-bound locks on data objects. When the enqueue service is lost, the enqueue table also is lost, which causes an automatic rollback of any concerned sessions. However, only these sessions (work processes) are rolled back; you do not need to restart the software cluster.

For the Java stack, the consequences are much more severe. The enqueue table contains locks on data objects and infrastructure locks. In addition, these locks are system bound, not session bound. When the enqueue service is lost in the Java stack, the enqueue table is lost, causing a session rollback, just as in ABAP. But, because there are also infrastructure locks in the lock table, a restart of all J2EE instances is necessary (and enforced as of SAP NetWeaver ’04 SPS15)!

To achieve state preservation and thus provide an HA strategy for the enqueue service, SAP provides a standalone enqueue server with enqueue replication. Enqueue replication prevents session rollback due to enqueue server failure and, for the Java stack, must be implemented to ensure an HA strategy.

The HA enqueue server consists of the standalone enqueue server and a replicated enqueue server (see Figure 7). The replicated enqueue server runs on another host and contains a replica of the lock table (replication table). The standalone enqueue server is no longer integrated into an SAP application server (central instance) and is instead provided by SAP as an independent program. The main advantage of using the standalone enqueue server is that you can replicate the lock table on another host, which means that if the standalone enqueue server fails, you can very quickly start a new standalone enqueue server (on the alternative host) that will continue working seamlessly with the current status of the lock table. All clients and the replication server are connected to the standalone enqueue server.

In the case of SAP NetWeaver AS ABAP, each work process has a separate connection to the standalone enqueue server. With SAP NetWeaver AS Java, each server process in the J2EE cluster has a separate connection to the standalone enqueue server.

In step 4, when I take you through possible HA setups, I show an example configuration that incorporates a replicated enqueue server.

Using a “switchover solution” to provide redundancy
To fully safeguard the central services (enqueue server and message server) and the central database, all HA setups for SAP NetWeaver ’04 must include at least one “switchover solution.” Switchover solutions provide the necessary redundancy for these SPOFs because services can be automatically switched from a failed host to a standby host in the event of failure, allowing SAP system operation to continue. Outside of the switchover solution, it is just a basic setup of SAP NetWeaver ’04. It doesn’t matter how the installation is distributed and how many dialog instances are installed! In each HA setup, third-party software is involved to execute the switchover.

HA solutions are heavily platform-dependent and rely on third-party switchover solutions such as:

• Microsoft Cluster Server (MSCS)
• HP Service Guard
• SUN Cluster
• Veritas Cluster Server
• Oracle Failsafe, Oracle RAC
• IBM HACMP
• SteelEye LifeKeeper
• EMC Autostart

The vendors of these solutions may also offer specific consulting and support. Contact the individual vendor for additional information.

In general, switchover products are capable of monitoring and controlling different system resources such as host machines, network adapters, and so on. In the event of failure, the service offered by the resource is automatically taken over by a standby resource. To perform and administer these capabilities, the software is configured with all of the elements of the environment. For example, MSCS typically creates the following setup (see Figure 8):[12]

• Switchover environment
  - Some number of physical servers combined together as a (hardware) cluster. These servers each have their own physical hostname but present themselves to the other resources under a single virtual hostname, so the other resources are not “aware” of the physical host they are using.
  - The servers within the switchover environment have the ability to share the clustered resources.
• Cluster resources
  - Services offered by the cluster to the outside world that can be failed over to an alternate resource.
  - Resources can be combined into resource groups (see below).
  [...] somehow belong together. For example, the central services and central database instance should be in different resource groups because they are logically different resources and the database does not need to be taken offline when the central services are taken offline.
• Actions on a cluster resource or resource group
  - Bring resource or resource group online.
  - Switch resource group.
  - Take resource or resource group offline.

[Figure: typical resources (a physical disk, an IP address, a network name, an SAP resource, a file share) and resource groups (a group for Oracle, a group for the SCS)]

4. Choose a setup that secures the SPOFs
Once you have a clear understanding of the architecture and technology associated with HA and how the potential SPOFs should be addressed, you can identify solid choices for HA setups. See the sidebar on the next page for key criteria and questions to ask when evaluating the possibilities.

SAP NetWeaver ’04 provides for the following potential HA setups:

• DB only in a switchover group
example, although it is possible to have the CI, SCS, and DB in a single switchover group, this does not fully meet the evaluation criteria. In this case, the DB, which usually takes a long time to start up, would be part of a switchover and restart when the message server, which can start up on another machine very quickly, fails.

For these reasons, SAP provides the recommendations listed in Figure 9 for the HA setups possible for SAP NetWeaver ’04. Each HA setup also should be extended by the replicated enqueue server discussed previously.

HA setup type                                      SAP NetWeaver AS Java   SAP NetWeaver AS ABAP+Java   SAP NetWeaver AS ABAP*
1. DB only in switchover group                     Possible                Possible                     Possible
2. CI, SCS, and DB in one switchover group         Not recommended         Possible                     Possible
3. CI and SCS in one switchover group,             Possible                Recommended                  Recommended
   DB in another
4. DB and SCS in one switchover group              Not recommended         Not applicable               Standalone instance for MSG/ENQ server
5. DB and SCS, each in its own switchover group    Recommended             Not applicable               Standalone instance for MSG/ENQ server
* In an ABAP-only installation, there is no SCS instance.

Recommendations for SAP NetWeaver 2004s
With SAP NetWeaver 2004s, the setup of an ABAP-SCS (ASCS) is possible and recommended. In this case, the CI itself will no longer be an SPOF, nor will it be so “central,” as it will then have the characteristics of any other application server (dialog instance).

The five different HA setup possibilities for SAP NetWeaver ’04 in Figure 9 are, in principle, also possible for SAP NetWeaver 2004s. From a technical perspective, SAP recommends the following settings:

• Separate the SCS instances (ABAP and Java) from the CI
• Only SPOFs should be protected by the switchover cluster (to achieve fast switchover times, as well as for simplicity and handling reasons)
• Keep the switchover groups as simple and lightweight as possible (reduce complexity)
• Avoid dependencies between different protected SPOFs as much as possible (try to avoid a common switchover group for different SPOFs)

SAP provides the recommendations in Figure 10 for the HA setups possible with SAP NetWeaver 2004s. As you can see, SAP gives a clear recommendation for HA setup type 5 (again, each HA setup should be extended by the replicated enqueue server discussed previously).

Figure 11 on the next page shows how the recommended HA setup type 5 summarized in Figure 10 is extended with the replicated enqueue server described earlier in step 3. As you can see, this example configuration illustrates the previously discussed technical HA recommendations:

• The ABAP and Java SCS are separate from the CI.
• Only SPOFs are within the switchover cluster.
• The switchover groups are simple.
• There are no dependencies between different protected SPOFs.
• The replicated enqueue server is implemented to provide SAP lock table protection.
• The least number of servers are used to reduce cost.
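The lock table protection provided by a replicated enqueue server can be pictured with a toy model. This is a conceptual sketch only, not SAP's implementation or protocol: the classes `LockTable`, `ReplicationServer`, and `StandaloneEnqueueServer`, the lock key, and the session names are all hypothetical.

```python
class LockTable:
    """Maps a lock key to the session that owns it."""

    def __init__(self, locks=None):
        self.locks = dict(locks or {})


class ReplicationServer:
    """Holds a copy of the lock table on another host."""

    def __init__(self):
        self.replica = LockTable()

    def apply(self, key, owner):
        if owner is None:
            self.replica.locks.pop(key, None)   # lock released
        else:
            self.replica.locks[key] = owner     # lock granted


class StandaloneEnqueueServer:
    def __init__(self, replication, initial=None):
        # A successor starts from a copy of the replica instead of an empty table.
        self.table = LockTable(initial.locks if initial else None)
        self.replication = replication

    def enqueue(self, key, owner):
        if key in self.table.locks:
            return False                        # already locked by someone else
        self.table.locks[key] = owner
        self.replication.apply(key, owner)      # replicate before acknowledging
        return True

    def dequeue(self, key):
        self.table.locks.pop(key, None)
        self.replication.apply(key, None)


replication = ReplicationServer()
primary = StandaloneEnqueueServer(replication)
primary.enqueue("ORDER/4711", "session-A")

# The primary host fails; a new enqueue server is started from the replica.
successor = StandaloneEnqueueServer(replication, initial=replication.replica)
still_locked = not successor.enqueue("ORDER/4711", "session-B")
print(still_locked)
```

Because the successor starts with the replicated table, the lock held by the first session survives the failure, so that session does not need to be rolled back.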
• Automatic reconnection to all software components is mandatory.
- In case of a switchover, all dependent software components need to reconnect automatically.
- Reconnection parameters must be configured according to the expected duration of a failover, which depends on operating system, database, switchover software, HA setup, etc.
• Enable local loading of executables and binaries.
- If the executable switches over, the local machine is in serious trouble because its binaries cannot be reloaded (TCP connections are broken). SAP provides the tool SAPCPE to synchronize binaries and executables automatically.

You can now use these five steps on your own systems to design and implement a successful and cost-effective HA infrastructure with SAP NetWeaver. As a final review, let’s look at the lessons learned in each step:

1. Understand the architecture. A technical understanding of the system architecture is a critical first step toward understanding the failure points within the architecture. There are three different SAP NetWeaver installation types (SAP NetWeaver AS ABAP, AS Java, and AS ABAP+Java), and the business scenario being implemented will drive the necessary installation type to support it.
2. Find the SPOFs that you need to isolate. Each SAP NetWeaver installation type has its own specific SPOFs, but there are general areas you should always check, including the central services (the message and enqueue servers), the central database, and any load balancers or other Web infrastructure.
3. Understand ways to isolate the SPOFs. Different SPOFs are isolated in different ways. I showed you how to address two key SPOFs: the central services and the central database instance. The standalone enqueue server should be implemented to avoid session rollbacks, and switchover solutions should be implemented to provide redundancy for the central services (the enqueue and message servers) and the central database instance.
4. Choose a setup that secures the SPOFs. There are different potential HA setups that you can implement to secure the SPOFs. A setup should not only secure the SPOFs, but also provide fast switchover times, be easy to administer, and be cost effective.
5. Implement the system landscape. With your homework complete, you are well prepared to go forward with a successful implementation. Fortunately, SAP tools support HA installations right “out of the box.”

Conclusion

With the adoption of SAP NetWeaver and the enterprise SOA, more and more business processes rely on IT. Mission-critical business functions such as sales and order entry, continuous manufacturing, and even the ability of consumers to make purchases all depend on the availability of IT services, making those services increasingly mission-critical themselves. All of the players in a business process (internal, external partners, and customers) demand a minimal amount of unplanned downtime of systems and services. Depending on the type of business and service, an outage of one hour can cost millions of dollars. What really counts is the availability of mission-critical services from an end-user point of view, regardless of which system or combination of systems is needed to provide this availability.

The consequences of IT failing to meet the demands of business are increasingly costly. Therefore, an important goal for SAP is not only to provide high availability of systems such as SAP NetWeaver and mySAP ERP, but also to facilitate the high- and near-continuous availability of cross-system business processes such as order management, production management, asset management, and more.

This article has examined the implementation of an HA infrastructure from both a strategic and an implementation view. Strategically, there is a tradeoff that needs to be balanced between the cost of availability and the
cost of downtime. Availability is implemented through redundancy, but at an associated cost, from redundant hardware components to the cost of failover techniques. The higher the degree of system and process availability, the more challenging and expensive that availability is to achieve. A realistic business need for availability must be established and balanced against the cost of providing this availability level.

Implementing a high-availability infrastructure involves finding the SPOFs within the architecture (hardware or software services that, if they fail, cause the entire system to fail, leading to unplanned downtime), isolating them within the architecture, and securing them with some sort of redundancy. With SAP NetWeaver, it is recommended that you create simple switchover groups for the software services that are SPOFs and do not have mutual dependencies. In this way, the user impact of failure can be minimized and the time and effort for replacement minimized. The links and SAP Notes in the resource listing available at www.SAPpro.com will aid in the technical implementation of the setup.

Continuous availability of business operations has become increasingly important for many customers. SAP has already strengthened focus and investments in this area over the last few years, and will continue to increase attention and efforts in the years to come.
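
The tradeoff between the cost of availability and the cost of downtime described in this conclusion can be made concrete with a little arithmetic. The following is a rough sketch only: the function names, the cost per hour, and the annual HA cost are hypothetical inputs, and, as noted earlier in the article, the availability percentages cover unplanned downtime only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def annual_downtime_minutes(availability_percent):
    """Unplanned downtime per year implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def expected_downtime_cost(availability_percent, cost_per_hour):
    """Yearly cost of unplanned downtime at a given cost per outage hour."""
    return annual_downtime_minutes(availability_percent) / 60 * cost_per_hour


def ha_pays_off(current_percent, target_percent, cost_per_hour, annual_ha_cost):
    """True if the downtime cost avoided exceeds the yearly cost of the HA setup."""
    saved = (expected_downtime_cost(current_percent, cost_per_hour)
             - expected_downtime_cost(target_percent, cost_per_hour))
    return saved > annual_ha_cost


# "Five nines" corresponds to roughly 5.26 minutes of unplanned downtime a year.
print(round(annual_downtime_minutes(99.999), 2))

# Hypothetical figures: 1,000,000 per outage hour, 2,000,000 per year for HA.
print(ha_pays_off(99.9, 99.99, cost_per_hour=1_000_000, annual_ha_cost=2_000_000))
```

The same comparison with a much lower cost per outage hour would come out the other way, which is the point of establishing a realistic business need before choosing an availability level.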