Resource Management and Analysis Best Practices for WebSphere DataPower
John Rasmussen ([email protected])
Senior Software Developer
IBM
17 July 2013
Introduction
IBM WebSphere DataPower Appliances (hereafter called DataPower) are purpose-built for
the rapid deployment of system integration and security policies. Firmware and hardware
components are matched for optimized policy execution in a hardened and easily managed
platform. DataPower accelerates the time to value and lowers the total cost of ownership of these
complex infrastructures.
DataPower configurations often implement solutions through integration with other services. For
example, security policy decisions may be made by accessing a centralized directory (LDAP).
Enterprise policies may be obtained through registry and repository systems. Logging may be
performed by accessing a SYSLOG resource. Messages may be transformed or "enriched"
through database access. Functions such as these are typically performed before the message is
delivered to a downstream application for processing and there may be additional processing of
the application's response before the response is delivered back to the client.
This centralized topology demonstrates DataPower's agile connectivity capabilities. However,
each of these integrations may impose latencies or limitations on the success of a process flow.
While some interactions may be asynchronous or "fire and forget", others will be synchronous
and require completion before subsequent actions can begin. In both instances, transactions may
queue up and consume appliance resources waiting for events to complete. While DataPower
hardware platforms provide faster interfaces, extended memory, and faster CPUs, in extreme
situations these events may limit the ability to process transactions at an optimum rate.
There are best practices to manage these interactions. Monitors may be constructed to track the
rate of incoming transactions and the duration required for each to process. Transactions may
be "shaped" and processed at predetermined rates. Service level agreements and service level
monitors may be configured with complex mediation capabilities, which may be coupled with
enterprise governance resources such as WSDLs, WS-Mediation, and WebSphere Service Registry and
Repository (WSRR).
This article discusses and demonstrates techniques to make your DataPower configurations
more resilient to the variances of integration dependencies, providing a more robust DataPower
architecture. We'll describe some of the fundamentals of DataPower resources across platforms
and firmware revisions and review the basics of resource monitoring. We'll then look at some best
practices you can implement to optimize your DataPower services.
Memory

Table 1. Installed memory by appliance model

Model            Form factor    Installed memory
7198-32X XG45    1U             24 GB
7199-42X XI52    2U             96 GB
7199-62X XB62    2U             96 GB
The current generation of appliances provides substantially more installed memory than earlier models,
as well as supporting higher concurrency (see Tech note: Total amount of memory of DataPower
7198/9 devices). These latest models are required to fully utilize the extended memory capabilities
of firmware 5.X. You can use 5.X on the previous generation hardware, such as the XI50, XS40, and XA35
model 9235 appliances, and you can run firmware versions prior to 5.X on the XG45, XI52, XB62, and
XI50B, but you will only fully utilize the increased memory capabilities by running 5.X on the XG45,
XI52, XI50B, and XB62.
Memory categories
The memory categories shown in the Memory Usage report can be confusing at first glance.
The primary issues revolve around the difference between the total amount of physical memory on the
appliance (installed memory) and the amount available to the DataPower firmware (total memory). When
DataPower requests a block of memory from its operating system and has finished using it, the block
is typically returned to the hold queue, not to the operating system. It is only returned to the
operating system during periods of memory constraint, or when the system recycles. Therefore, the
requested memory typically stays even or grows over time. Table 2 shows the memory categories.
Table 2. Memory categories

Category            Amount (KB)    Calculation
Memory usage        (percentage)
Total memory        82333842       Installed - Reserved
Used memory         3230980        Total - Free
Free memory         79102862
Requested memory    3858884
Hold memory         627904
Reserved memory     16863558       Installed - Total
Installed memory    99197400
System usage
Another classification of DataPower resources is available through the "show load" command of
the Command Line Interface or the "System Usage" status provider (see Figure 2). The System
Usage report shows data for several tasks running on the appliance, not just the main DataPower task.
The values are displayed as percentages over an "interval", which may be modified through the
"load-interval" command. Other tasks include DB2, SSH, and other processes that run outside of
the main DataPower address space.
The system usage takes into account all the resources that have been allocated, regardless of
whether they are being actively used or simply held in reserve. These values are sometimes useful
when working with DataPower support on resource issues. However, the memory usage from the
show memory status provider is a more accurate measure to use for capacity planning because
the hold memory is available to DataPower for re-use.
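For automated capacity tracking, you can retrieve the same memory figures programmatically over the
appliance's XML management (SOMA) interface. The following request is a minimal sketch; it assumes
the XML management interface is enabled on its default port (5550) and queries the default domain:

<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Body>
    <dp:request domain="default"
                xmlns:dp="http://www.datapower.com/schemas/management">
      <!-- Returns the same categories reported by the show memory status provider -->
      <dp:get-status class="MemoryStatus"/>
    </dp:request>
  </env:Body>
</env:Envelope>

POST the envelope with an administrative credential to https://<appliance>:5550/service/mgmt/current;
the response contains one element for each of the categories in Table 2, which you can feed into your
capacity planning tooling.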
Since the publication of the previously mentioned developerWorks article (DataPower release
3.8.2), additional memory status information has been added. In particular, a new log category,
"memory-report", now produces detailed information about each individual action's memory
utilization within a processing rule. For example (as shown in Figure 5), a sample rule execution
demonstrates the ability to determine the memory used by the sign, verify, and transform actions
and by the transaction in total. This is particularly valuable for custom XSLT actions: XSLT that uses
inefficient XPath expressions or patterns can often be optimized to reduce its memory footprint. The
report shows memory information for the initial parsing and associated schema validation of incoming
messages and for each action within the rule. The sign action in this particular transaction uses more
memory than the simple identity transformations that precede it, as you would expect
given its complexity.
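As an illustration of the kind of optimization the memory report can motivate, the following fragment
is a sketch only: it avoids materializing a copy of a large part of the input document in a variable
and instead reads the required values directly from the parsed input. The Order and Item element names
are assumptions for the example:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Avoid patterns such as building a variable with xsl:copy-of select="//Item",
         which holds a second copy of every Item in memory for the life of the rule. -->
    <OrderSummary>
      <!-- Read directly from the parsed input instead of from a copied tree -->
      <ItemCount><xsl:value-of select="count(/Order/Items/Item)"/></ItemCount>
      <Total><xsl:value-of select="sum(/Order/Items/Item/Price)"/></Total>
    </OrderSummary>
  </xsl:template>

</xsl:stylesheet>

Running the memory-report category before and after a change like this is a convenient way to confirm
that the optimization actually reduces the action's footprint.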
Streaming
An alternative to full document parsing is the "streaming" of documents through a service policy.
In this scenario, the document passes through a policy rule section by section; although the
entire document is never accessible at once, this is often all that is required. In streaming mode,
memory requirements are greatly reduced. Streaming requires strict adherence to processing limitations,
including limits on the XSLT instructions that may be invoked. For example, an XPath expression cannot
address a section of the document outside of the current node, as that part of the document will not be
available.
While streaming lets you process extremely large documents, you must follow its requirements. You'll
need to create streaming rules and compile options policies, and check that your XSLT
conforms to the streaming limitations. For more information about streaming, see the Optimizing
through streaming topic in the DataPower Information Center.
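The fragment below sketches the style of XSLT that streaming favors: each repeating element is
processed as it arrives, using only the current node and its own content, with no absolute paths or
reverse axes. The Record structure and element names are assumptions for the example; always verify a
stylesheet against the streaming rules described in the Information Center:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Handle each Record as it streams by; nothing outside the current
       Record is referenced, so the full document never needs to be held
       in memory at once. -->
  <xsl:template match="Record">
    <Summary id="{@id}">
      <xsl:value-of select="Amount"/>
    </Summary>
  </xsl:template>

  <!-- Suppress text nodes that are not explicitly selected -->
  <xsl:template match="text()"/>

</xsl:stylesheet>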
On many occasions, you can use the special "PIPE" context to avoid creating intermediate contexts
between processing actions. The PIPE context does not require separate memory for each processing step
and has other performance advantages as well. While some actions require an intermediate context, and
in some cases you'll need noncontiguous processing patterns, you should review each processing policy
for optimum context usage.
Another important tool regarding context is the special "NULL" context. This "bit bucket"
is useful when an action does not produce meaningful output; perhaps all you need to do is log
some data or set a dynamic route. If you are not modifying the message, subsequent actions can
access the original input data, and you do not need to pass it along with XSLT copy statements
and the unnecessary creation of contexts.
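For instance, a Transform action whose output context is set to NULL might run a stylesheet like the
following sketch, which sets a dynamic route and writes a log message but produces no output document.
The CustomerType element and the backend URLs are assumptions for the example:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dp="http://www.datapower.com/extensions"
    extension-element-prefixes="dp">

  <xsl:template match="/">
    <!-- Choose a backend based on message content -->
    <xsl:choose>
      <xsl:when test="/*/CustomerType = 'gold'">
        <dp:set-variable name="'var://service/routing-url'"
                         value="'https://gold-backend.example.com:8443/process'"/>
      </xsl:when>
      <xsl:otherwise>
        <dp:set-variable name="'var://service/routing-url'"
                         value="'https://standard-backend.example.com:8443/process'"/>
      </xsl:otherwise>
    </xsl:choose>

    <!-- Log the routing decision; no output document is produced -->
    <xsl:message dp:priority="info">
      <xsl:text>Route selected for customer type </xsl:text>
      <xsl:value-of select="/*/CustomerType"/>
    </xsl:message>
  </xsl:template>

</xsl:stylesheet>

Because the action's output goes to NULL, the next action in the rule still sees the original,
unmodified input.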
When off-box services respond slowly and transactions cannot complete at a continuous rate, they begin
to queue up in an active, incomplete state. These transactions hold onto resources until they complete.
Backend timeout values are set at the service level. The default value is typically 180 seconds, and
it controls both the establishment of initial connections and the maintenance of connections between
transactions. User agent settings (identified from the service's XML Manager object) specify the
timeout values for "off-box" or inter-rule requests; the default value is 300 seconds. This is probably
too long, and more restrictive values should be used so that connections fail when they cannot be
established in a realistic time. Timeouts vary between endpoint types (HTTPS, ODBC, and so on) and may
be altered dynamically using extension functions. Consult the product
documentation for your specific service configuration.
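For example, when a stylesheet makes its own off-box call with the dp:url-open extension element, a
timeout can be set directly on that call rather than relying on the user agent default. The sketch
below posts the current message to a hypothetical enrichment service with a five second timeout; the
endpoint URL is an assumption for the example:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dp="http://www.datapower.com/extensions"
    extension-element-prefixes="dp">

  <xsl:template match="/">
    <!-- Call the side service with an explicit, short timeout (in seconds) -->
    <xsl:variable name="sideResponse">
      <dp:url-open target="http://enrichment.example.com:8080/lookup"
                   response="xml"
                   timeout="5">
        <xsl:copy-of select="/"/>
      </dp:url-open>
    </xsl:variable>

    <!-- Pass the side service's answer on to the next action -->
    <xsl:copy-of select="$sideResponse"/>
  </xsl:template>

</xsl:stylesheet>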
Timeouts may be identified by log messages and analyzed through the use of log targets,
which consume these events. Latencies are potentially more insidious: you may not be aware
of increases in latency unless you are monitoring these values. However, you can use
monitoring techniques such as SNMP queries of service rates, and you can configure duration monitors.
Some customers utilize latency calculators written in XSLT that create log messages,
which again can be consumed by log targets for dynamic configuration or analysis.
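A minimal sketch of such a latency check follows; it assumes the var://service/time-elapsed service
variable and a hypothetical 3000 ms threshold, and it could be placed in a Transform action near the
end of a response rule:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dp="http://www.datapower.com/extensions"
    extension-element-prefixes="dp">

  <xsl:template match="/">
    <!-- Elapsed transaction time so far, in milliseconds -->
    <xsl:variable name="elapsed"
                  select="number(dp:variable('var://service/time-elapsed'))"/>

    <!-- Emit an event that a log target can consume when latency is high -->
    <xsl:if test="$elapsed &gt; 3000">
      <xsl:message dp:priority="warn">
        <xsl:text>High transaction latency: </xsl:text>
        <xsl:value-of select="$elapsed"/>
        <xsl:text> ms</xsl:text>
      </xsl:message>
    </xsl:if>

    <!-- Pass the message through unchanged -->
    <xsl:copy-of select="/"/>
  </xsl:template>

</xsl:stylesheet>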
Throttling
DataPower constantly monitors system resources, including memory and CPU. To avoid an
excessive use of resources, throttle settings allow for a temporary hold on incoming transactions
until the constraint is relieved. Using these throttle settings allows in-flight transactions to
complete before additional transactions are accepted. In a typical high availability environment,
transactions are processed by other appliances in the HA peer group, relieving load on the saturated
appliance.
Figure 8 shows the default values for throttling. In this example, when free memory falls to 20%,
the firmware waits "Timeout" seconds and then reevaluates the memory. If the memory constraint has not
cleared, the firmware restarts. If at any time free memory falls below the "Terminate At" value, an
immediate restart occurs.
You can use the backlog queue to hold incoming transactions while waiting for resources to be freed.
Up to "Backlog Size" transactions (currently limited to 500) can be queued for a maximum of
"Backlog Timeout" seconds. If the backlog size is at its default of 0, transactions are immediately
rejected during throttling. As the throttle process evaluates memory and other resources, it can be
configured to produce detailed log messages. Setting the "Status Log" to "on" from its default of "off"
produces messages like those shown in Listing 1; Memory, Port, Free Space, and File System snapshots
are captured. As with all log messages, you can send these events to a process for further processing.
That process can, for example, be another DataPower service that executes XML management commands to
modify the configuration settings.
You now understand some of the resources available to you, and you have seen how memory data can be
used to drill down into domains, services, and individual actions to see where memory is being
consumed. Next, we're going to discuss some tools that are important in performing dynamic
analysis of transaction flows and that can be used to affect transaction rates. Later, we'll discuss
critical issues of transaction flow regarding interaction with services off the appliance, such as the
logging service and backend resources. You'll see how they can greatly affect transaction flow and
how you can use the service level monitoring tools to mitigate these issues.
Managing services
DataPower services provide an extremely efficient processing environment through purpose-built hardware and firmware. However, there are good reasons to control the rate at which
transactions are accepted and processed. We've discussed the interrelationship between
DataPower and external services. You do not want to accept transactions at a rate that exceeds
the external service's abilities to process them. You may also want to offer different classes of
services. Your "gold" customers may warrant a higher rate of processing than your "bronze"
customers.
In this section, we'll describe some of the fundamental configuration options that you can use
to accomplish these objectives. We'll describe count and duration monitors and service level
management policies, which extend the monitor capabilities with more options and the ability to
define multiple "rules" for complex service level monitoring.
DataPower service management options go far beyond these basic capabilities, including
integration with WSRR. WSRR provides organizational governance capabilities, including policy
definition and automatic service configuration. You can define the policies in WSRR and have
the DataPower configurations automatically created. You are encouraged to investigate these
capabilities and to refer to the Resources section of the article.
If, for example, a backend server is experiencing a period of high utilization and begins to show
increased latency, DataPower can limit the traffic to this server and the server may recover. If the
period is relatively brief, it may be preferable for DataPower to queue a few transactions in memory
rather than rejecting them. The transactions can be released when the backend latency has decreased.
This can have the benefit of reducing the number of errors seen by the clients.
The drawback to shaping is the increased memory used to hold (queue) the transactions. If the
spike lasts too long, resources may become constrained before the queued transactions can be released.
Once accepted into the shaping queue, a transaction cannot be cancelled, and the queue size is fixed
and cannot be configured. These are important considerations to take into account when choosing to
shape traffic.
One important factor in the duration monitor filter calculation is that duration monitors measure the
average time for transactions to complete: the algorithm considers only the average time of the last
several transactions, not an absolute limit on any single transaction. A common use case is to
configure a monitor that generates a logging event if the average total latency of the service climbs
above some threshold.
Figure 9 shows the definition of the filter action, which in this case is "Shape".
Filter rules can be applied selectively: you can use a variety of conditions to determine which
messages are included in the filter calculations. For example, the URL of the input message, HTTP
headers, and HTTP methods might be part of the conditional filtering. Figure 10 shows an example of
selecting only those messages whose HTTP method is POST.
Combining the message type and filter action produces the count or duration monitoring object. In
the example in Figure 11, POST messages are counted. When they exceed 100 TPS (over a 1000-millisecond
interval), the messages are "shaped", that is, placed into the temporary queue and executed at a
controlled rate. The threshold calculation includes a "Burst Limit" value. This value allows for an
uneven transaction rate by permitting "unused" counts from previous intervals to be carried into
successive intervals. The general best practice is to use an interval of at least 1000 milliseconds
and a burst limit of 2 times the rate limit.
Figure 11. Count monitor with message type and filter action
SLM is configurable as a processing action within the processing policies of Multi-Protocol
Gateways and Web Service Proxies. An SLM action specifies an SLM policy object, and each of
these objects is composed of an ordered sequence of one or more SLM statements. Each SLM
statement defines a separate set of acceptance and enforcement criteria as well as the action to
be taken. SLM policies provide the option to execute all statements, to execute statements until
an action is taken, or to execute statements until the policy rejects a message. Figure 12 shows a
configured SLM policy containing a single SLM statement.
Figure 12. SLM policy with a single policy statement
Figure 13 shows the SLM statement form. You can combine multiple SLM statements to provide
for sophisticated service level agreements and throttling procedures. By configuring a set of SLM
statements, each of which is tailored to handle a particular situation that can lead to memory or
resource exhaustion, the appliance as a whole can be better protected from the negative impacts
of anomalous situations within clients, side services, and back-end servers. The SLM design
strategy is to limit the flow of incoming messages when slow transaction processing could lead to a
build-up of in-process transactions, since each in-process message consumes resources on the appliance.
The following sections describe how you can configure each of the SLM options to handle these
types of situations.
When transactions are slow to complete, the appliance can have thousands of simultaneous transactions
active. SLM, when using algorithms such as greater-than or concurrent, considers this latency, so it
can protect against slow backends.
Note
A best practice is to use monitors as a gross admissions control algorithm while allowing
SLM to handle the finer-grained details of managing resources.
SLM is, in essence, a transform and uses resources in its processing. An attacker can flood a box
with simultaneous connections so that resources become constrained before the SLM calculations
have even started. Count monitors are very lightweight, so they can handle a connection flood.
This is an important point: other algorithms are potentially vulnerable to a slow backend, but the
concurrent-connections algorithm is not, because it is an instantaneous counter.
The façade service used in this example is a small local service that encapsulates an SLM policy in
the path to the logging service. Its front side handler (FSH) is a simple HTTP handler. In this case,
do not use persistent connections. There are two reasons for this:
First, over the loopback interface, there is no resource or performance penalty for not using
persistence.
Second, when using persistence, the appliance caches some memory after each transaction,
which can increase the overhead of the service. Therefore, as there is no benefit, do not use
persistence.
Figure 15 shows the simple façade service (or, alternatively, a called rule). Again, all we are doing
is encapsulating an SLM policy within the path to the logging service.
The SLM policy is shown in Figure 16, which presents the policy with one rule, the details of the
resource class (using concurrent connections), and the throttle action, which rejects messages. The
policy rule uses a fixed interval of one second with a "count all" threshold of 20; that is, it allows
up to 20 concurrent transactions and rejects those in excess of 20.
Figure 17 illustrates the configuration change to the main processing policy. In the original rule,
transactions went directly to the logging service; in the revised rule, they are sent to the façade
service on the loopback (127.0.0.1) interface.
Figure 17. Policy rule before and after using the façade service
It's as simple as that. We have now altered our configuration to monitor transactions to the logging
service; when they are excessively slow, we reject the entire transaction.
In summary, some of the best practices for using the façade service are:
Use a simple HTTP front side handler on the loopback (127.0.0.1) interface.
Do not use persistent connections, since they provide no benefit over the loopback interface.
Use an SLM policy based on concurrent connections to reject transactions once the threshold is
exceeded.
In the second test, we'll use Apache Bench again to process a series of transactions. In this example,
while the logging service is again slow to respond, the SLM policy rejects transactions, and the
effect within DataPower is a dramatic change in resource consumption: Figure 19 shows that free memory
never drops below 95%. The rejection of transactions is typically advertised through logging messages,
alerts, or other monitoring tools. Administrative staff, alerted to the delays in the "off box"
service, can respond to the latency issue. If the issues are systemic, for example, "Black Friday"
sale spikes, additional DataPower resources may also be allocated to handle these periodic traffic
spikes.
Conclusion
In this article, we described the central position that DataPower often takes in policy enforcement
and service integration. We've described how this architecture is affected by latencies within
interactions with "off box" services. Service latencies can, if unregulated, have a deleterious
effect on DataPower transaction processing. DataPower provides several methods for resource
monitoring, and we demonstrated the ability to analyze resource utilization at the system, domain,
and service levels, down to the specific actions within a processing policy.
We described some of the fundamental transaction management options, including count and
duration monitors and service level management, and how you can use them to regulate and
smooth transaction flow and to mitigate latencies exposed by services with which DataPower
interacts. We also mentioned the more advanced governance capabilities available through
WSRR.
Finally, we demonstrated how you can use these techniques in a specific use case, in which an
integration service (a logging service) becomes slow to respond, and how the façade service
technique provides a more resilient and effective DataPower implementation.
Acknowledgements
Many of our IBM colleagues assisted in the preparation of this article. The authors would like to
thank Barry Mosakowski, David Shute, Carol Miller, and Daniel Badt.
Resources
developerWorks articles and DataPower publications:
IBM WebSphere DataPower SOA Appliances product documentation
IBM Redbook: WebSphere DataPower SOA Appliance Handbook
IBM Redbook: Strategic overview of WebSphere Appliances
Monitoring WebSphere DataPower SOA Appliances
IBM Redbook: DataPower SOA Appliance Administration, Deployment, and Best Practices
WebSphere DataPower SOA Appliance performance tuning
End-of-service dates for the Hardware Generation Machine Types (M/T): 7993, 9235, 4195, 7199, and 7198
IBM WebSphere DataPower SOA Appliance Firmware Support Lifecycle
WebSphere DataPower Information Center: Optimizing through Streaming
Enforcing Service Level Agreements using WebSphere DataPower, Part 1: Applying the SLA Control File pattern
SOA governance using WebSphere DataPower and WebSphere Service Registry and Repository, Part 1: Leveraging WS-MediationPolicy capabilities
developerWorks WebSphere DataPower zone
Tech notes:
Why does memory % differ for "show memory" and "show load" when using DataPower?
Total amount of memory of DataPower 7198/9 devices
IBM WebSphere DataPower Appliances firmware V5.0 adds support for extended memory, OAuth 2.0, and enhanced governance and SLA management features
IBM WebSphere DataPower Service Gateway XG45 and WebSphere DataPower Integration Appliance XI52 virtual editions provide flexible deployment options
WebSphere DataPower spikes with high CPU when WebSphere MQ connection is unavailable
Gathering WebSphere DataPower CPU and memory information
How does the front side timeout work?
What IBM DataPower timeouts are used in specific configuration methods
Matthias D. Siebler
Matthias Siebler is the L3 Team Lead for IBM's DataPower Appliances Division.
He has worked with DataPower Corporation and IBM since 2002 as a Software
Developer and Support Specialist.
Copyright IBM Corporation 2013
(www.ibm.com/legal/copytrade.shtml)
Trademarks
(www.ibm.com/developerworks/ibm/trademarks/)