SETLabs Briefings Performance Engineering


VOL 7 NO 1

2009
PERFORMANCE
ENGINEERING AND
ENHANCEMENT
SETLabs Briefings
Advisory Board
Gaurav Rastogi
Associate Vice President,
Head - Learning Services

George Eby Mathew
Senior Principal,
Infosys Australia
Kochikar V P PhD
Associate Vice President,
Education & Research Unit
Raj Joshi
Managing Director,
Infosys Consulting Inc.
Rajiv Narvekar PhD
Manager,
R&D Strategy
Software Engineering &
Technology Labs
Ranganath M
Vice President &
Chief Risk Officer
Subu Goparaju
Vice President & Head,
Software Engineering &
Technology Labs

Perform or Perish
Businesses are going through tough times. Globally, companies are either
going bankrupt or are on the verge of it. Economies are tumbling down at
an alarming rate, giving the Keynesians a reason to rejoice. Consumers and
taxpayers are reliving their déjà vu moments. No, the idea in this piece is not
to paint a pessimistic picture but to assure that nothing is lost yet. As long as
one performs, one cannot perish. Be it an economy or a company or a function,
performance is a must to prevail. In the last one year, through our focused
themes we tried to address issues relating to cost reduction, maximizing
return on investments, speedier deployment and delivery, a good understanding
of customers' requirements, achieving alignment between IT and business and
ways to become smart enterprises. Given the turmoil that businesses are now
facing, we attempt to reflect on the same issues, albeit through a different prism:
performance engineering.
It would be clichéd to state that bad performance can result in lost revenues,
dwindling profitability and jeopardized relationships, thereby hastening an
enterprise's decay. Through this issue we try to drive home the
importance of performance engineering and enhancement in today's business
enterprise.
As an enterprise's IT infrastructure gets complex, performance risks grow
exponentially. There is a need, therefore, to first define and comprehend
performance engineering issues from a business perspective and then work out
ways to address and plug ambiguities, if any, to skirt performance perils. We
have three papers in the collection that deal with this idea.
Rapid business changes bring with them concomitant quality concerns.
Performance has to keep up with the pace of environmental changes even while
maintaining high quality. Sounds difficult? Not really, as our authors contend.
Tight assumptions, holistic planning and a proactive outlook can help mitigate
quality problems in today's swiftly changing business scenarios.
This collection is also made rich by two case studies. Our practitioners draw
from their vast experience of providing performance solutions to some Fortune
500 companies and lay bare practical ways to manage performance issues in
agile environments.
With the growing importance of data centers in today's enterprise IT space,
a pressing question that haunts IT managers is whether servers are being
utilized to their potential. Unfortunately, most data centers are plagued with
underutilization. In the spotlight is a paper that suggests consolidation of
servers through virtualization to wheedle better performance out of data centers.
Perform and triumph! Wishing you a very happy, prosperous and performance-
oriented new year!
Praveen B. Malla PhD
[email protected]
Editor

SETLabs Briefings
VOL 7 NO 1
2009
Insight: Is Performance Engineering Primarily a Business Issue?
By Bruno Calver
The entire rigmarole of performance engineering can be made more meaningful by
articulating business capacity requirements succinctly. Not only would this help the execution
team prioritize its business needs, it would also bring pecuniary benefits if properly
implemented, contends the author.
3
Opinion: Avoiding Performance Engineering Pitfalls
By Sathya Narayanan Nagarajan and Sundaravadivelu Vajravelu
How does one mitigate performance engineering risks especially when commercial grade
enterprise applications exact complex server infrastructure? The authors opine that a holistic
approach must be adopted to safely circumvent such risks.
9
Model: Addressing Challenges in Gathering Performance Requirements
By Rajib Das Sharma and Ashutosh Shinde
Quality concerns should not be considered as ad hoc plug-ins. They need to be in-built and
made an integral part of the software development lifecycle, feel the authors.
15
Practitioners Solution: How to Design High Performance Integration Solution?
By Arun Kumar
Making individual applications within an enterprise set-up sing in harmony can be a difficult
proposition given the frequent changes in high performance IT scenarios. The author draws
from his experience in suggesting that assumptions need to be very strongly defined, else
robust solutions that can weather all conditions will remain a far cry.
23
Viewpoint: Leveraging Enterprise Application Frameworks to address QoS Concerns
By Shyam Kumar Doddavula and Brijesh Deb
In a rapidly changing business scenario, pro-activeness in managing QoS concerns like
performance, scalability and availability is a must to sustain competition. In this paper, the
authors explain how the adoption of an enterprise-wide application framework can help
manage QoS concerns with élan.
31
Case Study: Performance Engineering in ETL: A Proactive Approach
By Hari Shankar Sudhakaran and Anaga Mahadevan
A few performance checkpoints are all that it takes to enhance and optimize ETL
processes. Every stage of the software lifecycle, if reviewed through the lens of statistical tools, can
reveal some not-so-apparent yet obvious issues, opine the authors.
37
Framework: Integrating Performance Engineering Services into Enterprise
Quality Framework
In this paper the authors discuss a comprehensive A to F approach that leaves no stone unturned
in a performance engineering process. They suggest that organizations need to look beyond the
functional design of an application and plan for high performance in the live production process.
45
Case Study: An Agile Approach to Performance Tuning
By Shailesh Bhate, Rajat Gupta, Manoj Macwan and Sandip Jaju
Drawing from their experience in delivering PE solutions in agile environments, the authors
showcase a sprint methodology that employs a stop, check and go approach to tune
performance.
53
Spotlight: Server Consolidation: Leveraging the Benefits of Virtualization
By Manogna Chebiyyam, Rashi Malviya, Sumit Kumar Bose PhD and Srikanth Sundarrajan
Data centers can be made to perform to their potential by keeping a close watch on
application usage patterns. A better way to enhance performance is to consolidate them in
the most intelligent way: virtualization. In this paper, the authors draw out infrastructure
optimization algorithms that help improve the performance of data centers.
65
The Last Word: Tide Tough Times with Performance-driven Development
By Rajinder Gandotra
75
Index 77
Quality of Service considerations need to be focused on
early in the SDLC, failing which the expenses of detecting
QoS issues, rebuilding and re-architecting the application
could shoot up meteorically.
Shyam Kumar Doddavula
Principal Architect
Infosys J2EE Center of Excellence
SETLabs, Infosys Technologies Limited



Underperforming applications have always been eyesores
to businesses. Capturing performance requirements
scientifically and succinctly is thus crucial to the execution
of performance-driven development.
Rajib Das Sharma
Principal Architect
Performance Engineering and Enhancement Practice
Infosys Technologies Limited



Is Performance Engineering
Primarily a Business Issue?
By Bruno Calver
Maximize your ROI by entwining business
perspective into performance engineering
Performance engineering can be a time-consuming, complex and costly activity.
Projects can stumble on functional requirements
besides the usual fret over non-functional
requirements. In addition, clients are increasingly
finding the cost of stress testing tools and live-like
test infrastructures prohibitive.
It is the joining up of business strategy,
priorities and operational practices at the outset
that will define the success of performance
engineering activities as much as the skill with
which they are executed. More importantly, there
are other ways to address performance challenges
than a purely technical approach, like demand
management. It is all about understanding the
full range of options to ensure maximum return
on investment.
This paper draws on industry best
practices such as the Information Technology
Infrastructure Library v.3 (ITIL) and research
from Forrester to help business clients navigate
their way through the various challenges that are
presented by service performance and business
expansion.
PERFORMANCE = CAPACITY
The first step in understanding performance
engineering is to look at the problem first from
a capacity, rather than a traditional performance
engineering, perspective. Assuming that a
service is built to specification, there is no
change in usage patterns and data is properly
archived, it is hard to imagine what performance
issues might arise in an operational context.
The point here is that it is the business that
changes and presents the challenge to service
performance.
Think of it like a bus service. If commuters
suddenly increase or change their scheduled
time of travel then the bus service might struggle
to accommodate all the passengers. This is a
capacity problem. The bus is just not big enough,
however fast it goes. Providing more buses at
peak times is disproportionately expensive as the
overall utilization of the fleet is likely to fall; the
same is true of IT systems.
This indicates that the first step in
performance engineering is to understand the
business requirements. The key to the ongoing
success of managing performance rests on how
well changes, or planned changes, in the business
are monitored, understood and communicated.
This operational bit is something that is often
missed and can present the greatest challenge.
This paper will focus on how business can
articulate its capacity requirements and
how it can manage its needs on an ongoing
basis in order to achieve the best return on
investment.
ITIL VERSION 3 AS A STARTING POINT
ITIL version 3 can help provide a foundation to
address the challenges already outlined. First,
the new lifecycle model introduced as part of
version 3 of the best practice guide addresses
both the initial gathering of requirements at a
strategic business level as well as the on-going
work required to monitor the service strategy.
Supporting the direction set by the business
is then delivered by the well established ITIL
process of capacity management, as was already
present in version 2, although updated and
fitted into the lifecycle model in version 3.
The goal of the capacity management
process is to ensure that cost-justifiable IT
capacity in all areas of IT always exists and is
matched to the current and future agreed needs
of the business, in a timely manner [1].
There are three sub-processes that are
worth mentioning that support the overall
capacity management process:

Business Capacity Management - Takes
agreed business requirements and translates
them into IT service requirements

Service Capacity Management -
Focuses on the management, control and
forecasting of the end-to-end performance
and capacity of the operational service

Component Capacity Management -
Looks at the performance and capacity of
individual technical components.
Figure 1 might help visualize where each
of these sub-processes fit into the overall picture
in a reasonably typical IT environment. Figure 1
shows that business capacity management sits at
the top of the hierarchy. If the direction taken by
the business is not properly defined, understood
and communicated then all of the other aspects
of capacity management are made that much
more difficult or indeed defunct. Given the fact
that capacity management is perceived to be an
extremely technical, complex and demanding
process [1], there is little room for making
mistakes in understanding the requirements, as
they are likely to be costly.
From an ITIL version 3 perspective,
business capacity management derives its
requirements from service strategy and service
portfolio management, both being part of the first
stage of the service management lifecycle. These
sections provide some excellent guidance in their
own right and will indeed generate many capacity
requirements. The question is how much to invest
in an ITIL based capacity management process?
THE BUSINESS FILTER
The first step is for the business to determine the
criticality of any planned or existing service. This
sounds simple, but ask any business manager
if their service is the most important and their
first response is yes. How do you make sense
of all the yes responses that exist within the
organization? Application scoring mechanisms
give CIOs a rating mechanism that can help them
reallocate maintenance dollars to the highest-
priority applications while starving commodity
applications [2]. Such an approach can be
customized to the capacity management process.
Application scoring is all about using an
objective method to prioritize and score services
in order to provide funding and resource to the
most critical and valuable IT services. There are
four dimensions to the scoring system:

Business perspective
Application perspective
IT perspective
Vendor perspective

Score based questions in each of these
categories in relation to capacity management
will help guide funding and effort levels allocated
to different services, particularly those that are
already in operation.

The business perspective should assess the
criticality of the service to the business in factual
terms with queries like: how much revenue is tied
to the service? Is it customer facing? How many
employees use the service? What is the impact
of delays or poor performance of the service on
business? A high score would mean that the service
is of critical importance to the business.
The application perspective looks at
some of the technical aspects of the service with
queries like how big is it? How easy is it to
scale? How complex is the service? How many
external dependencies does this application have
that affect performance or capacity? A high score
shows that the application or service is large,
technically complex and likely to be dependent
on a number of external services.
The IT perspective will capture concerns
like what is the availability of skills to work on
the service? Does the service align to the ongoing
IT strategy? Is the platform stable? How easily
are technical issues resolved? A high score would
indicate that there is little internal capability to
manage and develop the service.
Figure 1: Capacity Management Process Map in Relation
to IT Services
Source: Infosys Experience
Finally, the vendor perspective should
focus on the roadmap for the service from the
supplier's perspective, particularly in the context
of off-the-shelf products. The vendor perspective will
address queries like: is this a key service for
the vendor? Does the vendor provide a tried and
tested capacity roadmap? Do the future plans for
developing this product align with the business
needs? A high score would reflect poor vendor
support and roadmap alignment.
It is worth noting that low scores in the IT
and vendor dimensions not only suggest that additional
focus is required in terms of capacity management,
but also that the solution needs to be seriously reconsidered
for projects at the point of inception.
The scoring system should be kept simple
and should align to the business priorities and
key technical questions vis-à-vis performance
and scalability issues. Each of the questions can
be weighted in order to enable the business and
technical teams to rank the relative importance of the
various questions. A 3-2-1-0 scoring mechanism is
advisable as it is simple and does not have a neutral
score and therefore forces a particular judgment.


The outcome of such an exercise will result
in a score against each dimension. The scores are
then combined to enable a matrix to be created as
per Figure 2.
The next stage is to develop some bands
of service that can be matched to the position a
given service resides within the matrix.
Figure 2: Capacity Management Scoring Matrix
Source: Infosys Research
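As an illustration only, the sketch below shows one way such a weighted 3-2-1-0 scoring exercise could be rolled up into dimension scores and a matrix quadrant. The dimension names come from the paper; the questions, weights, axis split and quadrant thresholds are assumptions made for the example.

```python
# Minimal sketch of a weighted 3-2-1-0 application scoring exercise.
# Dimension names follow the paper; the questions, weights and the quadrant
# split below are illustrative assumptions, not a prescribed standard.

# Each answer is scored 3-2-1-0 (no neutral value) and carries a weight.
answers = {
    "business":    [(3, 0.5), (2, 0.3), (3, 0.2)],  # revenue tied, customer facing, impact of delays
    "application": [(2, 0.4), (1, 0.3), (2, 0.3)],  # size, ease of scaling, external dependencies
    "it":          [(1, 0.6), (2, 0.4)],            # skills availability, platform stability
    "vendor":      [(0, 0.5), (1, 0.5)],            # capacity roadmap, alignment with business needs
}

def dimension_score(scored_questions):
    """Weighted average of 3-2-1-0 answers, kept on the 0-3 scale."""
    total_weight = sum(weight for _, weight in scored_questions)
    return sum(score * weight for score, weight in scored_questions) / total_weight

scores = {dimension: dimension_score(questions) for dimension, questions in answers.items()}

# Collapse the four dimensions onto two assumed axes of the matrix:
# business criticality versus delivery risk (application, IT and vendor combined).
criticality = scores["business"]
delivery_risk = (scores["application"] + scores["it"] + scores["vendor"]) / 3

if criticality >= 1.5 and delivery_risk >= 1.5:
    quadrant = 1
elif criticality >= 1.5:
    quadrant = 2
elif delivery_risk >= 1.5:
    quadrant = 3
else:
    quadrant = 4

print(scores, "-> quadrant", quadrant)
```

A service landing in quadrant 1 would then attract the deepest capacity management treatment described in the next section.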
PROVIDING A BANDED CAPACITY
MANAGEMENT PROCESS
Here it is worth referring to a seminal paper by
George I. Thompson, Six Levels of Sophistication
for Capacity Management [3]. The author looks at
a variety of activities and approaches of capacity
management within an enterprise environment
and identifies different levels of maturity. This
paper also recognizes that business criteria are a key
factor in the most mature capacity management
processes.
For the purposes of this analysis it is best
to consider the application of just three of the six
levels proposed by George I. Thompson:

Level 1 - Formally measure, trend and
forecast peak period utilization and
plan resource capacity with an on-going
periodic review program.

Level 3 - Includes an automated workload
forecast system. Either looks at single
attributes or in an advanced form enables
forecasts by different categories.

Level 5 - Use business application criteria
with an application model to predict
service levels and forecast resource usage
requirements.
In line with Thompson's suggested levels it
is proposed that services that fall within quadrant
1 of the scoring matrix take the approach defined
by level 5 to capacity management. Quadrants 2
and 3 should be managed using level 3.
Finally, quadrant 4 services should be managed
on the basis of level 1.
These levels should also be expanded in
two ways. The first is to include the type of test
infrastructure required to support each of the
services, where the highest level will surely need
the most live-like test platform. The second should
be to define the levels in terms of the ITIL version
3 capacity management process.
THE DEMAND SIDE OF CAPACITY
MANAGEMENT
So, having looked predominantly at defining the
supply side of the capacity management equation,
it is time to turn our attention to the demand side.
The supply-demand split is becoming a significant
trend within the IT industry [4].
One aspect of demand management is to
understand the incoming business need. When
the business does decide that they want to grow
or extend a function they need to consider how
they might best do this. IT is not always the most
cost effective solution.
This is where incentives and penalties to
influence consumption can assist in providing
economic capacity solutions. Figure 3 puts this
into context in relation to the previous points.
Figure 3: The Service Demand Belt. Source: ITIL version 3 Service Strategy,
Crown copyright 2007. Reproduced under license from OGC.
By introducing a financial aspect to the
consumption of services that accurately reflects
the additional cost of capacity, it creates a market-
based system that makes sure that additional
capacity has equal or greater business value than
its cost. For example, the cost of capacity can be
varied according to the time of use, and lower off-
peak charging can assist in utilizing otherwise
unused capacity.
In addition, thresholds of capacity could
be determined so that capacity above a certain
level has a progressively higher marginal cost.
It is, however, critical that any such charging
model reflects the real cost of provision and is
transparent and fair, or else the costs and benefits
become distorted.
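To make the mechanics concrete, here is a minimal sketch of such a charging rule; the off-peak discount, threshold and marginal multiplier are illustrative assumptions rather than figures from the paper.

```python
# Illustrative capacity charging rule: time-of-use pricing plus a
# progressively higher marginal cost above an agreed capacity threshold.
# All rates and thresholds below are assumptions for the example.

BASE_RATE_PER_UNIT = 1.00      # cost of one unit of capacity at peak
OFF_PEAK_DISCOUNT = 0.40       # 40% cheaper outside peak hours
THRESHOLD_UNITS = 1000         # agreed capacity level
SURCHARGE_MULTIPLIER = 1.5     # marginal cost multiplier above the threshold

def capacity_charge(units_consumed: int, off_peak: bool) -> float:
    """Charge for a period's consumption under the tiered, time-of-use model."""
    rate = BASE_RATE_PER_UNIT * ((1 - OFF_PEAK_DISCOUNT) if off_peak else 1.0)
    within_threshold = min(units_consumed, THRESHOLD_UNITS)
    above_threshold = max(units_consumed - THRESHOLD_UNITS, 0)
    return within_threshold * rate + above_threshold * rate * SURCHARGE_MULTIPLIER

# A business unit consuming 1,200 units at peak pays proportionally more
# for the last 200 units, nudging demand towards off-peak windows.
print(capacity_charge(1200, off_peak=False))   # 1000*1.0 + 200*1.5 = 1300.0
print(capacity_charge(1200, off_peak=True))    # 1000*0.6 + 200*0.9 = 780.0
```

Whatever the actual rates, the point of the model is that the marginal price of capacity tracks its real marginal cost, so consumers see the trade-off directly.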
SOME WORDS OF CAUTION
It is important to understand the likely potholes
en route as you navigate your way along the capacity
management road, especially in terms of using
an application scoring approach as well as using
charging mechanisms to control demand. The
first and most significant issue is that an objective
approach to understanding and prioritizing
applications and services can challenge the status
quo within an organization. This can lead to
strong political challenges and make designing
and deploying the framework a tricky affair as
it might disrupt traditional priorities and budget
structures.
Next is the challenge of implementing an
effective charging system. A full-fledged financial
management and chargeback system needs to be
in place for demand management to be effective.
As mentioned earlier, any charging system
needs to be perceived as fair. This is especially
troublesome where measuring the use of a service
or application by different business units or users
is difficult or impossible.
Finally, capacity management will need to
accurately predict the additional cost of capacity
and build this into the charging mechanism.
This in itself can be a significant effort or yield
inaccurate results that will affect any cost/benefit
analysis.
CONCLUSION
Understanding and managing how capacity
is supplied and demanded is critical
to ensure the extraction of maximum value from
the process. This is not to detract from the highly
technical aspects of performance engineering,
but is to be seen as a crucial step to ensuring that
such activities are focused in the right areas and
for the right reasons.
Using ITIL version 3 as a foundation for
capacity management, and bringing together
other frameworks to determine how it is to be
applied, offers organizations a powerful lens
through which to view the landscape. This paper has
shown that performance engineering should
start as a business question and the subsequent
answers then lay the foundation for any technical
activity.
REFERENCES
1. ITIL version 3, Service Design, Office of Government Commerce (OGC), UK, 2007. Available at http://www.ogc.gov.uk/guidance_itil_4899.asp
2. Phil Murphy, Laurie M Orlov and Lauren Sessions, CIOs: Reduce Costs By Scoring Applications - Lower Maintenance Costs And Change IT Demand Governance, 2007. Available at www.forrester.com
3. George I Thompson, Six Levels of Sophistication for Capacity Management, Computer Measurement Group, 2000. Available at www.cmg.org
4. Christopher Koch, Why IT Executives Split Staffs to Create Supply, Handle Demand for Technology Services, CIO, 2007. Available at http://www.cio.com/article/print/121150
Avoiding Performance
Engineering Pitfalls
By Sathya Narayanan Nagarajan and Sundaravadivelu Vajravelu
Adopting proactive strategies to manage PE
helps enterprises improve time-to-market and also
minimizes cost overruns
Model driven performance engineering
(PE) is a cost-effective technique that
helps enterprises predict the performance and
scalability of applications beyond the capacity of
the current hardware [1]. Performance engineering
of a commercial grade enterprise application
demands complex server infrastructure. As this
type of complex hardware setup is available only
for a limited duration, it is imperative to manage
these challenges effectively to avoid time and cost
overruns. In performance engineering, it is quite
common to relate the source of performance issues
to the application being assessed; this need not
always be true.
This paper considers a holistic approach
to performance engineering from three aspects:
assessment of the software application, the load generation
software and the hardware infrastructure.
We delve into the performance engineering
pitfalls that an enterprise needs to be cognizant
of and how proactive strategies, when
followed, can lead to successful performance
engineering endeavors. The issues faced and the
strategies used to solve them are based on our
experience from performance engineering of
Java/J2EE applications with loads ranging from
300 concurrent users to one million concurrent
users.
Figure 1 overleaf depicts the different
dimensions of the issues faced during
performance engineering engagements.
CAPACITY ASSESSMENT
The first critical step in planning for a
performance engineering exercise or benchmark
is the assessment of hardware infrastructure
requirements.
In one of our benchmarking exercises,
hardware resources were assessed based on the
current production hardware infrastructure.
From the design perspective, the application
was not very database intensive (less than 30%
of the transactions involved database). Having
spent three months to assemble the required
hardware at the benchmark center and owing
to non-availability of storage area network
(SAN) infrastructure (equivalent to that of
production), it was consciously decided to
proceed with Automated Storage Management
(ASM) configuration for Oracle 10g with two
external storage Logical Unit Numbers (LUNs)
connected via fiber and load balanced for optimal
performance.
The benchmark hit a bottleneck at one
million concurrent user loads owing to very high
disk I/O (100%). Though the objective of the
benchmark to assess the scalability of the system was
met, our assumption on the database intensiveness of
the application was incorrect as it eventually turned
out to be a bottleneck in the scalability assessment.
Hence we deduced that a proper estimation of
hardware resources is needed based on CPU,
memory, I/O and network utilization.
Many a time, having the right hardware
for the application being tested gets more
emphasis than the hardware required for the load
generation. Benchmark exercises sometimes fail
or get delayed because of insufficient hardware
to generate the required load. The following
approaches can help to ensure a more realistic
hardware capacity for load generation software.

Scenario 1: If there are no historical data points
from previous benchmarks, perform a capacity
assessment for load generation hardware in the
planning phase.

Scenario 2: If there are one or more historical data
points, use them as a reference for extrapolating the
hardware required to generate the benchmark load,
as in the simple sketch below.
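A minimal sketch of Scenario 2, assuming the only data carried over from the earlier benchmark is the number of virtual users each injector machine sustained; the figures and the headroom factor are illustrative assumptions.

```python
import math

# Extrapolate load generation hardware from one historical benchmark data point.
# Inputs are assumptions for the example, not figures from any real engagement.
PREVIOUS_USERS_PER_INJECTOR = 2500   # virtual users one injector sustained last time
TARGET_CONCURRENT_USERS = 100000     # load planned for the new benchmark
HEADROOM = 0.8                       # plan to run injectors at only 80% of proven capacity

def injectors_needed(target_users: int) -> int:
    """Number of load injector machines to provision for the target load."""
    usable_per_injector = PREVIOUS_USERS_PER_INJECTOR * HEADROOM
    return math.ceil(target_users / usable_per_injector)

print(injectors_needed(TARGET_CONCURRENT_USERS))  # 50 injectors for 100,000 users
```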
Load generation capacity issues can
manifest in different forms and in a few cases
can prove to be very misleading too. For
instance, load test reports might indicate that
the application transaction response time is
higher than the acceptable limits. In such a
scenario, the responsible teams usually end up
spending more time and effort on diagnosing
the application software/hardware rather than
the load generator software/hardware. For
strange reasons, we seem to take it for granted
that load generation hardware/software will
not contribute to issues faced in performance
engineering. A change in perception is much
needed in such situations.
We recommend the following strategies
that should help to figure out whether the load
generation software/hardware is contributing
to performance engineering issues, viz., high
response times for transactions.
Figure 1: Dimensions of Performance Engineering Issues Source: Infosys Experience
Sampling Technique: Use two instances of
load generation software on the same hardware
or different hardware. From one instance, A,
run the same load in Transactions per Second (TPS) at
which you encountered the performance issue.
From another instance, B, just fire the one or two
transactions that are under scrutiny. Alternately,
one can also fire the second transaction manually
and measure the response time. Measure and
compare the response time of the transactions
in both cases. If the response time differs
significantly then one can infer that the load
generation software/hardware is the source of
the bottleneck. On the contrary, if the response
time is similar, one can infer that the issue is not
with the load generation hardware or software. The
issue might be with the application being tested,
the application server configuration or other factors
like network latencies.
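The comparison itself can be automated; below is a small sketch, assuming response-time samples (in milliseconds) have been collected from both instances and that a simple relative-difference threshold is an acceptable test.

```python
from statistics import median

def load_generator_suspect(loaded_samples, lightly_loaded_samples, tolerance=0.25):
    """Compare median response times reported by the heavily loaded instance (A)
    and the lightly loaded instance (B). A large relative gap suggests the load
    generation software/hardware, not the application, is the bottleneck.
    The 25% tolerance is an illustrative assumption."""
    loaded = median(loaded_samples)
    light = median(lightly_loaded_samples)
    return (loaded - light) / light > tolerance

# Response times (ms) for the transaction under scrutiny.
instance_a = [910, 950, 1020, 980, 1005]   # reported while firing the full benchmark TPS
instance_b = [410, 395, 430, 420, 415]     # fired singly / manually under the same system load

print(load_generator_suspect(instance_a, instance_b))  # True: investigate the injectors
```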
Using Application Logs: Custom logging can
be enabled to measure the response time of complex
transactions and the data can be used to validate
any discrepancies noted in the results from the
load test tool. This approach is very helpful in
isolating the source of the issue.
In one of the engagements, we had
taken the transaction response time from the
application transaction logs as there was a huge
difference between the load testing tool results
and the application log transaction times.
TUNING FOR OPTIMAL PERFORMANCE
The next critical step in planning for performance
engineering is to ensure that the environment
(application software/hardware and the load
generation software/hardware) is tuned for
optimal performance. Lack of insight into
tuning often leads to a change in focus and one
ends up resolving issues that are not related to
performance.
As in any model based performance
engineering exercise, accuracy of test data is
critical to be able to depict the actual system
behavior. All tests and measurements become
void if any environmental parameter changes.
This demands more and more re-tests to
ensure that we have accurate data to derive the
predictive model depicting the actual system.
It is a common practice to tune the
application gradually with load. Parameters tuned
at a particular load might not be optimal for the
maximum benchmark load. Tuning at each
incremental load might be time consuming,
involving a lot of re-tests of the prior test runs.
Hence it is more important that we tune the
application for the maximum load required
for the benchmark so that the system/Java Virtual
Machine (JVM)/database parameters do not vary
all through and we can minimize re-test effort.
PROFILING AND ANALYSIS
Profilers play an important role in identifying
performance bottlenecks like memory leak issues.
The conventional technique of using logs to
identify performance bottlenecks is naïve and
cumbersome when analyzing asynchronous,
multi-threaded, distributed applications.
Profiling with Appropriate JVM Options: While
the JVM provides many Virtual Machine (VM)
options, viz., -X options and -XX options, to optimize
garbage collection performance, these options
might not be supported by the profilers being
used. It is imperative to study the various VM options
supported by the profiler upfront and understand
the limitations we might face due to unsupported
options. During profiling, we need to run the VM
with profiler supported options only.
Profiling with Incremental Load: Profilers
have inherent limitations on the supported load
(TPS) as they have overheads associated with
tracking object lifecycle, memory profile, etc.,
at a very detailed level. If there is a need to
trend the application profile characteristics at
incremental load, we can do so by gradually
incrementing the load and monitoring the CPU
and memory utilization on the server where
the application is being profiled.
Memory Modeling: Bottlenecks to horizontal
scalability require a thorough understanding
of the existing architecture and its limitations.
In one of the performance engineering
engagements, though the application was
implemented with a distributed architecture,
there were bottlenecks to horizontal scalability
due to storage of the session data of logged-in users
irrespective of the servers they are connected
to. In order to figure out the maximum
concurrent users that can be supported without
re-architecting the application, a memory
model was derived. The memory usage at
various concurrent user loads, viz., 1,000,
2,000, 3,000 and so on till 10,000, was analyzed. The
memory usage trend was plotted on a graph
and a curve fitting technique was used to
derive a mathematical model depicting the
memory utilization behavior of the application.
This was a useful inference in assessing the
scalability of the application.
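A minimal sketch of such a memory model, assuming measured heap usage at a handful of user loads and a simple polynomial fit; the sample data, the choice of a quadratic and the heap ceiling are illustrative assumptions, not the engagement's actual figures.

```python
import numpy as np

# Observed memory usage (MB) at increasing concurrent user loads (illustrative data).
users = np.array([1000, 2000, 3000, 5000, 8000, 10000])
memory_mb = np.array([620, 910, 1230, 1850, 2900, 3650])

# Fit a quadratic memory model: memory(u) = a*u^2 + b*u + c
coefficients = np.polyfit(users, memory_mb, deg=2)
memory_model = np.poly1d(coefficients)

# Use the model to estimate how many users fit in the available heap,
# here assumed to be a 6 GB ceiling per server.
HEAP_CEILING_MB = 6144
for u in range(10000, 30001, 5000):
    predicted = memory_model(u)
    verdict = "within" if predicted <= HEAP_CEILING_MB else "exceeds"
    print(f"{u:6d} users -> ~{predicted:7.0f} MB ({verdict} the assumed ceiling)")
```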
IDENTIFYING PERFORMANCE
BOTTLENECKS
In complex heterogeneous systems, a single
application can comprise multiple modules/
services that are developed by various vendors
and are deployed on the same/different
servers. When the composite application fails
to meet the required performance levels, we
should be in a position to identify the source
of the bottleneck, be it the software being
developed, the hardware infrastructure, OS/
server parameters or network related issues like
network latencies. Isolating the source
is critical to channel efforts towards timely
resolution.
While profilers are a good way to identify
software bottlenecks, there are instances
where one might not be able to profile the
applications owing to environmental and
technological limitations. In such scenarios, a
good logging strategy can be a sound choice
for identifying software bottlenecks.
Response times of complex business
transactions can be measured using timestamp
data from custom logs enabled specifically
for analysis. Further, within each transaction,
the time taken by each block that performs
requests to third party services, database
requests or I/O operations can also be measured
using custom logs (see the sketch below). Thus a good logging
strategy enables us to identify the exact block
contributing to the software bottleneck for
further analysis.
Figure 2: Memory Model
Source: Infosys Experience
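A minimal sketch of such block-level custom logging, assuming a plain Python service; the block names, sleeps and logger configuration are illustrative.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("perf")

@contextmanager
def timed_block(transaction: str, block: str):
    """Log the elapsed time of one block (DB call, third-party request, I/O) within a transaction."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        log.info("txn=%s block=%s elapsed_ms=%.1f", transaction, block, elapsed_ms)

# Usage inside a business transaction: each block is timed separately, so the
# logs reveal which block contributes most to the overall response time.
def place_order(order):
    with timed_block("place_order", "inventory_service_call"):
        time.sleep(0.05)          # stand-in for a third-party service request
    with timed_block("place_order", "db_insert"):
        time.sleep(0.02)          # stand-in for a database operation

place_order({"id": 42})
```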
ARCHITECTURE ANALYSIS WITH BLACK-
BOX INPUTS
We might often need to evaluate a vendor product
for performance and scalability to help enterprises
make a buy decision. Owing to IPR restrictions,
vendors might not share all the required
technical documentation for architectural analysis
until the assessment results in a positive business
outcome.
In one such situation, we interpreted
the architecture based on discussions with the
vendor's architects/developers and derived a
representative software execution model. This
was later validated using profiler data and log
file data.
CHALLENGES WITH LOAD GENERATION
TOOLS
We often face a dilemma during the planning
stages of any performance benchmark while
choosing the best tool for the benchmark. As
applications are unique in their behavior, no
single tool can best fit all types of benchmark.
However, the following factors can help one
make a better decision on the choice of
tools.
Licensed vs Open Source Load Generation Tool:
At low volumes and user concurrency (non-server
grade applications), for load and stress testing, the
cost of licensing is not a major deciding factor.
Any standard load/stress testing tool can be a
good fit.
At high volumes and user concurrency
(server grade applications) the cost is definitely
a key deciding factor. It is highly important that
we determine the options available with open
source tools and the one that best suits our test
requirements.
Critical success factors to be considered
while selecting the right open source tool:

Ability to support the protocols and
applications being tested
Measurement accuracy
Reporting capabilities of the tool - SNMP
support for resource utilization metrics
Error handling capabilities
Ability to support isolation tests and
mixed load tests
Flexibility to be customized for project
requirements in case the default metrics
do not support the objective.
Choosing a load testing tool without
validating the aforementioned points will prove
to be detrimental to the performance engineering
engagement. In one of our engagements, we had
used an open source tool without validating its
reporting capability. We ended up spending
unplanned effort to generate metrics from the test
tool log file.
UNVEILING DATA DISCREPANCIES
BEFOREHAND
It is highly recommended that raw data from the
performance testing be translated into the required
form for trending and analysis once the specific
test run is complete. This will help in identifying
any data discrepancies and the need for re-tests
well in advance. It is a common practice to
postpone data analysis and reporting towards
the end of the performance engineering exercise;
if there are discrepancies in the test results
data, especially in a model driven performance
engineering exercise, this results in a disaster.
A few suggestions to overcome common
pitfalls can be listed with respect to the different
phases:
Planning Phase: Ensure that a proper estimation
of hardware, including load generation server
and software resources, is done. In a case where
one uses an open source or new load generation
tool, perform a feasibility study of the tool.
If profilers are used, it is good to understand the
profiler supported JVM options. Use a good
logging strategy to measure the response time
of database operations, third party services, I/O
operations and the overall response time. In case of
non-availability of design and code, understand
the architecture through discussions and validate
the understanding using the logs generated and
profiler analysis.
Warm up Phase: Validate load runner results
through manual/independent testing. Tune
environmental parameters to optimal values for
the maximum load planned.
Testing Phase: For profiling with various loads,
gradually increase the load while monitoring CPU,
memory and I/O utilization. For predicting the
size of objects, use a curve fitting methodology and
also ensure that the data received from testing is
translated into the required form for trending and
analysis.
CONCLUSION
Performance engineering of commercial grade
enterprise applications demands complex
infrastructure and efficient management of
risks to minimize failure. Enterprises need to be
cognizant of the performance engineering pitfalls
and be adequately equipped to mitigate them in
a timely fashion.
REFERENCES
1. Kingshuk Dasgupta, Engineering Performance in Enterprise IT Applications - An Infosys Perspective, 2006. Available at http://www.infosys.com/newsroom/events/2006/KingshukDasgupta-Webinar-Presentation-Performance-Engg.pdf
2. Vishy Narayan, Enterprise Application Performance Management: An End-to-End Perspective, SETLabs Briefings, Vol 4, No 2, 2006. Available at http://www.infosys.com/IT-services/architecture-services/white-papers/enterprise-application-performance-management.pdf
3. Deepak Goel, Software Infrastructure Bottlenecks in J2EE, 2005. Available at http://www.onjava.com/pub/a/onjava/2005/01/19/j2ee-bottlenecks.html
4. J D Meier, Srinath Vasireddy, Ashish Babbar and Alex Mackman, Improving .NET Application Performance and Scalability, Microsoft Corporation, 2004. Available at http://msdn.microsoft.com/en-us/library/ms998530.aspx
Addressing Challenges in Gathering
Performance Requirements
By Rajib Das Sharma and Ashutosh Shinde
Comprehensive performance requirements are
critical for building systems that can scale in line
with business growth
Performance requirements can
significantly alter the architecture,
design and development of applications
and eventually affect the Quality of Service
(QoS) perceived by the users. It has been
observed that applications that are built
on the basis of deficient performance
requirements cannot sustain the workload
that is exerted in the production environment,
which can lead to loss of credibility as well
as revenues for the business. In an industry
that continues to lose billions of dollars
due to underperforming applications,
deficient performance requirements are still
commonly observed.
The path to capturing comprehensive
performance requirements is laden with
multiple challenges and myths that impede
the activity. This paper explores these
challenges and myths, the risks faced by
businesses in the event of such challenges,
and proposes a model that aims to address
these challenges.
ELEMENTS OF PERFORMANCE
REQUIREMENTS
Performance requirements are usually
associated with response time and throughput
since these are the most tangible parameters
that are perceived by the end users. Thus, it is
not uncommon to come across performance
requirement documents that capture only the
volume and response time for some of the use
cases that are perceived to be critical from a
performance standpoint.
So, let us examine the elements that
must be addressed as a part of performance
requirements to ensure that the requirements
are comprehensive and include all the required
information for execution of subsequent
performance engineering activities.
Factors like database volume, workload
on the system, infrastructure hosting the
application, external systems involved in
servicing the request and the usage profile play
a role in defining the performance characteristics
of any application. So, while the performance
requirements should comprehensively
address each of the above mentioned factors,
requirements like response time, transaction
volume, resource utilization requirements, etc.,
should be defined within the constraints of these
elements. The importance of these elements in
influencing the performance of the system is
explored below:
Workload Model: The workload model represents
the different forces that act on the system and is
represented in terms of various parameters like
concurrency, session length and think (inter-
arrival) time that can affect the performance of
the system. Let us explore the parameters:

Concurrency - Concurrent use of shared
resources like CPU, network, thread
pools, connection pools, etc., can lead to
the formation of queues in getting access
to these resources if the throughput is
less than the arrival rate. Longer queues
can delay the response time.

Session Length - A longer session time
implies longer consumption of
server resources. Long active sessions
paired with high concurrency can stress
the system and affect performance.

Think Time - This denotes the time taken
by the user to think before initiating the
subsequent request. This parameter is
crucial in modeling real-life scenarios
during performance testing.

The workload model provides the
necessary information for performance
validation and modeling activities and
hence is a critical element of performance
requirements.
Usage Profile: The usage profile represents the
fluctuations in the load (volume) on the system
due to business seasonality, peak and off-peak
business hours, etc. Considering the nature of
transactions, viz., batch or online transaction
processing (OLTP), is also important as the
treatment for processing and the expectations
from a performance standpoint vary with the
transaction type. For example, batch transactions
are generally associated with higher throughput
while OLTP transactions are associated with
lower response time.
External Interfaces: The time taken by a system
to respond to a request is inclusive of the
individual response time of all systems and
components that are invoked while processing
the request synchronously. Thus, it is important
to know the service level agreements of these
external systems and components before the
response time of the entire system is known and
committed.

Infrastructure: The hardware, software and
network influence the performance of any
system as they provide the necessary computing
resources and communication channels. Queuing
network-based performance models aim to
identify the resource centers (like CPU, network,
thread pool, etc.) that are potential bottlenecks in
the system [1, 2]. Thus, details of these resource
centers need to be documented as a part
of the requirements for creating the performance
models.
It is frequently observed that the
performance test environment is a scaled down
version of the production environment. In such
cases, the infrastructure details of both these
environments are helpful for extrapolating the
results observed on the test environment to the
production environment.
Database: The volume of data in the database
impacts the performance of requests that rely
on database operations for completion. Hence,
it is important to capture the volume of data
that can be expected in the production system
so that the performance validation and analysis
activities can be conducted in accordance with
the expected volume.
CHALLENGES IN GATHERING
REQUIREMENTS
Multiple challenges and myths are encountered
while trying to capture the different elements
of the performance requirements as described
in the previous section. These challenges
and myths can be introduced because of low
awareness about the performance engineering
cycle, unavailability of data, low quality of data
and resistance to share data.
Low Awareness Leading to Myths: Low
awareness about the importance of the workload
modeling and performance engineering activities
can result in myths that can jeopardize the results
of the performance related activities.
For instance, stakeholders frequently
request that the average concurrency per second
be derived from the average concurrency known
per hour, by dividing the known hourly
value by 3,600 seconds. For a system with a
total peak hour load of 36,000 hits it would be
incorrect to assume the concurrency to be 10 hits/
second, unless it has a uniform distribution
[2]. The concurrency can be 17 hits/second
and 20 hits/second if the distribution is
Normal or Poisson respectively (assuming a
mean of 10 and a standard deviation of 2). The
concurrency is thus 1.7 times and 2 times that
of the assumed concurrency in these scenarios.
Hence, it is important to know the distribution
patterns before arriving at the concurrency, as
the architectural and implementation decisions
can significantly vary based on the expected
concurrency. This myth can lead to tests that
are not representative of the actual load on the
system.
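A small sketch of this check, assuming scipy is available and that a high percentile of the per-second arrival distribution is used as the planning peak; the exact peak depends on the percentile chosen, so the figures printed will not necessarily match the 17 and 20 hits/second quoted above.

```python
from scipy.stats import norm, poisson

PEAK_HOUR_HITS = 36000
naive_average = PEAK_HOUR_HITS / 3600          # 10 hits/second under a uniform assumption

mean, std_dev = 10, 2
PLANNING_PERCENTILE = 0.999                     # assumed design percentile

# Peak per-second concurrency to plan for under different distributions.
normal_peak = norm.ppf(PLANNING_PERCENTILE, loc=mean, scale=std_dev)
poisson_peak = poisson.ppf(PLANNING_PERCENTILE, mu=mean)

print(f"uniform assumption : plan for ~{naive_average:.0f} hits/s")
print(f"Normal(10, 2)      : plan for ~{normal_peak:.0f} hits/s")
print(f"Poisson(10)        : plan for ~{poisson_peak:.0f} hits/s")
```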
Unavailability of Data: Web server log files and/
or business activity data logged in the database
can provide valuable information about the
workload characteristics of an existing system
that has been in production for a sufficiently
long duration. Important information like
session length, inter-arrival time, concurrency,
distribution patterns and usage profile (most
used use-cases), etc., can be derived from the log
files/database.
Figure 1: Typical Responses Encountered while Gathering
Performance Requirements
Source: Infosys Research
Some projects do not maintain historic
log files/business activity logs due to the absence
of an archival strategy that mandates archival
of such artifacts. This results in a loss of data
sources that could potentially provide the
required information. Systems in the process of
new development also do not have the historical
data for such analysis.
Low Data Quality: Low quality of data can make
the available data unusable for performance
engineering activities. It is often observed that
performance benchmarks for external systems
are available for environments that do not match
the production environment and/or workload
of the system under analysis.
In both the cases mentioned above, the
data that is available is not usable in defining
the response time for those use-cases that
utilize the services of these external systems
and components for fulfillment.
Resistance to Share Data: Organizations are
reluctant to share infrastructure details
owing to perceived security threats. Details of the
test environment may be more easily obtained
compared to the production environment
as the perceived security threat is lower. In a
similar manner, vendors of external systems
are often resistant to sharing the performance
characteristics of their products with other
vendors.
The political dynamics within
organizations can also inhibit the requirements
gathering process and the ability to reach a consensus on the
requirements stated by different people/teams
in the organization.
CONVERSION OF CHALLENGES INTO
RISKS TO THE PROJECT
Deficiencies in each element of the performance
requirements can induce different risks in the
project and jeopardize its success. Incomplete
and incorrect performance validations, failure
in creating performance models, failure to
hypothesize the performance characteristics at
different workloads and constraints in defining
the service level agreements are some of the key
risks that the project may be exposed to. Risks
associated with each element are discussed
below.
Risks Associated with a Deficient Workload
Model: Performance validation plays a crucial
role in assessing the performance capabilities
of applications and forms an important
activity in the performance engineering cycle.
Performance validation may not be aligned
to the actual workload in the event of deficient
performance requirements and thus test results
may not be representative of the actual system
capabilities.
Measurements of system resources like
CPU utilization, network utilization, disk I/O,
etc., are important inputs to parameterize the
performance models. An incomplete workload
may lead to incorrect measurements that
may in turn result in highlighting incorrect
bottlenecks and saturation points in the
resources. Likewise, capacity plans may have
errors, and systems may get saturated
much earlier than expected and impact the
business. Incorrect workloads could also lead
to over-provisioning of capacities that may
result in revenue lock-in.
Risks Associated with Deficient Infrastructure
Details: Detailed information about the
infrastructure is necessary to identify the
various computing resources in the systems that
are represented by the performance models.
Deficiency in the infrastructure details can lead to
an incomplete performance model that can constrain
its usage in identifying bottlenecks in the
system. Performance models can also be utilized
for predicting the performance characteristics
of the system at a hypothetical workload. The
predictions are inaccurate if all elements of
the infrastructure are not represented in the
performance model. Likewise, extrapolation
of the performance characteristics from the test
environment to the target production environment
is constrained if the target environment is not
correctly and completely detailed.
Risks Associated with Deficient External
Systems Details: Defining the response time for
use-cases that rely on interactions with
external systems is constrained if the performance
characteristics of the external systems are not
known for the expected workload on the planned
hosting environment.
Risks Associated with Deficient Database
Details: One of the key components of any
successful performance testing activity is the
availability of an appropriate level of data in the
database. Results of performance validation may
be incorrect in the absence of the right amount
of data in the database. Thus, it is important
to capture the data volume and the expected
growth in the volume.
Risks Associated with a Deficient Usage Profile:
A deficient usage profile may cause the performance
validation activity to overlook some critical
scenarios under which the system is expected to
perform (for example, holiday season, peak time
of the month, peak time in the day, etc.) when the
load may be substantially higher. This can lead
to incomplete validation coverage and can cause
application failure in case the application is not
tuned or computing resources are insufficient for
such an event.
PROPOSED MODEL FOR GATHERING
PERFORMANCE REQUIREMENTS
The nature of the challenges described in the
previous section makes it imperative that
the model that is adopted for performance
requirement gathering should not only focus
on the technology in terms of the tools and
techniques but should necessarily address
the challenges associated with people and
processes. For instance, resistance to share
data is a challenge that can be associated
with people issues. Likewise, proactive risk
management to address data quality and
unavailability issues and increasing awareness
of the performance engineering methodology
are essentials that must be incorporated into
the process of capturing requirements. One
such model that incorporates all elements -
technology, people and process - is described
below. The model aims to break the activity
of performance requirement gathering into
multiple phases:

Presentation of the business drivers and
requirement gathering process
Data gathering and analysis
Risk analysis and intervention
Requirement rationalization and
presentation of the requirements to the
stakeholders.

The activities in each phase are detailed
below.
Presentation Phase: The purpose of this phase
is to increase awareness of the performance
engineering methodology among the
stakeholders and to highlight the importance of
the performance requirement gathering exercise. The phase
is also important for defining the scope of the
requirements gathering activity and identifying
the key go-to people for soliciting information.
The activities in this phase are:

Activity 1: Present the Business Drivers for
Specific Performance Needs - In this activity
the representative of the stakeholders
describes the business goals that are
motivating the need for the specific
performance requirements of the system.
These business goals form the primary
architectural drivers that define the
performance of the system.

Activity 2: Present the Performance
Requirement Elements - The key elements
of the performance requirements are
described to the stakeholders to make
them aware of the information that will
be solicited during the performance
requirement gathering activity. The
significance of the information to the
performance engineering and validation
activities is shared along with the risks
associated with deficiencies in each
element.
The key contact points for each
of the elements are identified, escalation
routes are discussed and the periodicity for
getting responses from the contact points
is agreed upon. The purpose of this
activity is to increase awareness about
the activity. Resolution mechanisms for
people issues (resistance to share
data) are also identified.

Activity 3: Select the Use-cases using the
Analytic Hierarchy Process (AHP) [3] - The
purpose of this activity is to objectively
identify the top few transactions on
which the performance related activities
can be focused.
Data Gathering and Analysis Phase: The
purpose of this phase is to solicit information
from the different sources identified in the earlier
phase and to analyze the gathered information.

Activity 4: Channelize the Data -
Questionnaires are prepared to capture
the details of the different elements
of the performance requirements.
Questionnaires enable the analyst to
comprehensively cover the requirements.
The questionnaires are shared with the
contact points identified in Activity 2.

Activity 5: Analyze the Data - Tools that
mine the log files or the information in the
database to determine key workload
parameters like session length,
concurrency, inter-arrival time, etc., are
used for creating the workload model
[4]; a minimal log-mining sketch follows below. Forecasting tools can also be used
in this activity to understand the usage
trends and predict the future load on the
system.
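As an illustration of Activity 5, the sketch below derives inter-arrival times and per-second arrival rates from a web server access log; the log path, timestamp format and the statistics chosen are assumptions about a typical combined-format log, not a prescribed tool.

```python
import re
from collections import Counter
from datetime import datetime

# Parse request timestamps from an Apache/nginx combined-format access log
# (hypothetical path); adapt the regex to the log format actually in use.
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})")

timestamps = []
with open("access.log") as log_file:
    for line in log_file:
        match = TIMESTAMP_RE.search(line)
        if match:
            timestamps.append(datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S"))

timestamps.sort()

# Inter-arrival times (seconds) between consecutive requests.
inter_arrival = [
    (later - earlier).total_seconds()
    for earlier, later in zip(timestamps, timestamps[1:])
]

# Requests per second as a crude arrival-rate / concurrency profile.
per_second = Counter(ts for ts in timestamps)
peak_second, peak_hits = max(per_second.items(), key=lambda item: item[1])

print(f"requests: {len(timestamps)}")
print(f"mean inter-arrival time: {sum(inter_arrival) / len(inter_arrival):.3f} s")
print(f"peak arrival rate: {peak_hits} hits/s at {peak_second}")
```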
Risk Analysis and Intervention Phase: This
phase is important for sharing the risks that are
identified during the requirement gathering
exercise and for highlighting the potential impact
of those risks. The activities in this phase are:

Activity 6: Prepare and Share the Data Risk
Map - On completion of the analysis
activity, the risks foreseen due to data
quality and/or data unavailability
are documented and shared with the
stakeholders. Interventions through the
escalation routes identified in Activity 2
can be used to address the risks wherever
applicable. Effective intervention
introduces recursion in Activities 4 to 6.
The phase is ideally concluded after all
the risks have been addressed.

Activity 7: Present the Performance
Requirements and the Risks - All elements
of the performance requirements are
presented to the stakeholders. Risks that
are not addressed in Activity 6 are also
highlighted in this activity.
Requirement Rationalization Phase: The
requirement gathering exercise must be extended
further to validate the requirements with the
help of analytical models and to rationalize the
requirements based on the output of the models.
Requirement validation, gap analysis and
recommending mechanisms to bridge the gap
are activities executed in this phase.

Activity 8: Validate Requirements -
This activity is critical to understand
the system's capability to meet the
stakeholders' performance requirements.
Performance models based on queuing
networks that use prototypes for
parameterization can be utilized for
modeling and understanding the
performance characteristics of the system.
The activity must be carried out on an
environment that closely resembles the
production environment.

Activity 9: Identify Gaps between
Expectations and Actual Behavior -
The information gathered in the
previous activity is used to compare the
expectations with the observed behavior
and gaps are identified.
Activity 10: Present Recommendations and Initiate Sign-off. The mechanism to bridge the gap and the cost associated with it are identified and shared with the stakeholders. Requirements are rationalized based on the cost-benefit analysis exercise. For example, queuing models can be used to understand the effect of adding a CPU on the response time and throughput. In case of gaps in either of these parameters vis-a-vis the requirements, the impact and cost of adding the CPU can be shared with the stakeholders to enable better decision making. Based on the analysis, either the requirements are tempered or additional computing resources are introduced in the system. Formalization of the requirements document is also initiated.
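To make the CPU what-if concrete, the hedged sketch below uses a textbook M/M/c (Erlang C) approximation to compare utilization and response time with four versus five CPUs; the arrival and service rates are assumed, and the paper itself does not mandate any particular queuing formula.

    public class CpuAdditionWhatIf {
        // Erlang C probability that an arriving request must wait in an M/M/c queue.
        static double erlangC(int c, double a) {             // a = lambda / mu (offered load)
            double sum = 0, term = 1;                         // term holds a^k / k!
            for (int k = 0; k < c; k++) { sum += term; term *= a / (k + 1); }
            double last = term * c / (c - a);                 // (a^c / c!) * c / (c - a)
            return last / (sum + last);
        }

        static double responseTime(int c, double lambda, double mu) {
            double a = lambda / mu;
            double wq = erlangC(c, a) / (c * mu - lambda);    // mean wait in queue
            return wq + 1 / mu;                               // plus mean service time
        }

        public static void main(String[] args) {
            double lambda = 30;   // assumed arrivals per second
            double mu = 10;       // assumed service rate per CPU per second
            for (int cpus = 4; cpus <= 5; cpus++) {
                System.out.printf("CPUs=%d  utilization=%.0f%%  response time=%.3f s%n",
                        cpus, 100 * lambda / (cpus * mu), responseTime(cpus, lambda, mu));
            }
        }
    }

Running the sketch shows how the drop in utilization when a CPU is added translates into a lower response time, which is exactly the kind of evidence that supports the cost-benefit discussion with stakeholders.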
CONCLUSION
Addressing performance issues is most often an afterthought triggered by QoS or scalability concerns raised by the end users. Proactive efforts must be expended across the entire software development lifecycle to ensure that performance issues are identified early and fixed before they put the business at risk. Performance requirements form the focal point around which all performance engineering activities are executed in performance-driven development. This implies that any deficiency in the performance requirements can significantly impact the performance engineering activities and diminish their efficacy. Challenges across different dimensions - people, technology and processes - can be encountered while defining the performance requirements. A model that weaves together all aspects of a performance requirements gathering exercise is thus a necessity.
REFERENCES
1. Edward D. Lazowska, Quantitative System Performance: Computer System Analysis Using Queuing Network Models, Prentice-Hall, Inc., 1984
2. Rajeshwari Ganesan and Shubhashis Sengupta, Workload Modeling for Web-based Systems, CMG Conference, February 2003
3. Nidhi Tiwari, Using the Analytic Hierarchy Process (AHP) to Identify Performance Scenarios for Enterprise Applications, CMG MeasureIT, March 2006. Available at http://www.cmg.org/measureit/issues/mit29/m_29_1.html
4. Suhas S., Pratik Kumar and Mitesh Vinodbhai Patel, Approach to Build Performance Model for a Web-Based System from its Application Server Logs, CMG 2006. Available at http://www.daschmelzer.com/cmg2006/PDFs/057.pdf
5. Rational Unified Process, IBM. Available at http://www-306.ibm.com/software/awdtools/rup/
How to Design a High Performance Integration Solution?
By Arun Kumar
Well-defined assumptions and tautly designed
architecture for high performance can contribute
to robust solutions
What does an organization expect from its IT department? The sales team wants faster IT systems to push its products, the finance team wants reliable and secure systems, management wants IT reports to show the ground realities, and the story goes on. To cut it short, everyone wants the IT system to be faster, reliable and scalable. A few major expectations that an organization has in this competitive world are from the IT systems, which have to keep up with the pace of evolving business needs and help the business beat the competition. To meet such high expectations, the IT department has to evolve to the next level and perform at the best possible levels.
There is another aspect to this age-old problem. Applications mostly perform well individually, but when it comes to performance at the enterprise level, the end-to-end implementation fails miserably. It is like an orchestra comprising extremely talented musicians who are unable to create a symphony unless directed by a trained conductor. In an enterprise scenario, most IT departments complain of an inability to scale up the integration layer that connects the different applications. This paper discusses the different aspects of getting the best performance from the integration layer.
WHAT DOES THE PROBLEM ACTUALLY
LOOK LIKE?
A basic parameter of performance is time to execute. Let us assume that an organization's requirement is to synchronize, on a daily basis, the sales data residing in applications located at different offices around the globe, and that the head office is located in the US. Based on this sales data, the head office sends instructions on the discount percentages that individual offices at different locations can offer to their customers. The issue here is that for the US office to decide on the discount percentage, the daily sales data should reach the head office.
This hypothetical situation also assumes that the organization has implemented an extract, transform and load (ETL) solution that runs in a batch window of two hours during the US office's night hours for the data to be picked up from the different locations. There is another batch run of interfaces for the reverse flow of data that updates all the regional applications one after the other, e.g., data is first updated in the database of the Hong Kong office, then the Brisbane office, then the Singapore office, and so on. The system should be updated in the night batch and be ready for business users the next morning. Let us assume a few possible scenarios:
The batch run fails to pick data from the first location, i.e., the Hong Kong office, and does not continue further. This results in data not being updated for any location.
Network latency delays the batch run and the systems are not ready for business users the next morning.
The database in the head office has become corrupt and needs to be recovered from the data archived before the batch run, resulting in a delay.
The data that was sent from Singapore has not been validated against the business rules, leading to a failure in the update. This requires manual intervention to correct the data and again results in a delay.
A new location has been added, resulting in a longer batch window and added load on the integration layer.
A week before Christmas, there is a massive sale in stores and huge sales data needs to be processed, overloading the integration layer and delaying the batch window.
These are all hypothetical situations and many more can be added, but to sum up, various unforeseen situations can crop up that put the performance of an IT solution to the test.
PRODUCT/TOOL SELECTION
According to a Gartner report, forty percent of
unplanned downtime is caused by application
issues [1]. There are multiple solutions available
in the market today to address a particular
performance requirement. But these solutions
address only a part of the problem. To get the
best performance from an implementation, the
solution should be able to meet performance benchmarks end to end.
If we look at the pattern in which organizations acquire software products, we will find that organizations go for the best-in-class product for each particular requirement. Most organizations end up with a product from a different vendor in each application space. For example, an organization may have purchased its hardware and operating system, network solution, database solution, inventory solution, point-of-sale solution, integration solution, etc., all from different vendors. This creates an IT landscape that looks nothing less than a spaghetti of products. On top of it, there would be yet another set of vendors to implement, support and maintain the solution.
Most decision makers are in a dilemma when it comes to deciding the right tool/suite to be implemented to meet the performance needs of the integration layer. They often have no clue how much effort and investment should be allocated to make these implementations self-sustaining and cost-effective in the long run. Most organizations keep some buffer funds just to purchase more hardware, in case the software implementation is not able to deliver the expected results.
In such a situation the best way is to shortlist a set of products based on the functional and technical requirements that help meet the integration reference model set for the organization. Evaluate each product on the basis of the following parameters and put the numbers together in a decision matrix (a small weighted-scoring sketch follows the parameter descriptions below):
Rating: Products can be rated on the basis of the percentage of functional and technical requirements that are addressed by the product.




Skill Level: This is based on the existing team's/vendor's capability to solve issues in the product. If a new product is implemented, then proper research needs to be done to check whether enough skilled resources are available.
Product Support: The nature of product support and the cost involved also need to be considered. Many organizations enter into multi-year contracts without correctly gauging the actual requirement.
Compatibility: Compatibility of the new acquisition with the existing infrastructure has to be checked. One can ask the product vendor to do a proof-of-concept (PoC) to evaluate the product against the existing infrastructure.
Hardware Sizing: Cost of hardware needed to
support the implementation should be assessed.
Debug/Patch Support: Debug support from the product vendor has to be considered. It is always better to ask the product vendor for its future roadmap, lest the vendor plans to discontinue the product or release a newer version that might be incompatible with the current implementation.
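The weighted-scoring sketch referred to above is shown here; the criteria weights, product names and scores are entirely hypothetical and only illustrate how the parameter ratings can be rolled up into a single comparable number.

    public class ProductDecisionMatrix {
        public static void main(String[] args) {
            // Hypothetical criteria weights (sum to 1.0) and vendor scores on a 1-5 scale.
            String[] criteria = {"Rating", "Skill level", "Support", "Compatibility",
                                 "Hardware sizing", "Debug/patch support"};
            double[] weights  = {0.30, 0.15, 0.15, 0.20, 0.10, 0.10};

            String[] products = {"Product A", "Product B"};
            double[][] scores = {
                {4, 3, 5, 4, 3, 4},   // Product A
                {5, 2, 3, 3, 4, 3}    // Product B
            };

            // Weighted sum per product; the highest total suggests the shortlist leader.
            for (int p = 0; p < products.length; p++) {
                double total = 0;
                for (int c = 0; c < criteria.length; c++) total += weights[c] * scores[p][c];
                System.out.printf("%s weighted score = %.2f%n", products[p], total);
            }
        }
    }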
We can examine an incident where
a product vendor decided to deprecate/
discontinue a version of the product and launch
a new version in the next year. The concern was
that besides regular issues of migrating to the
new version, the product itself had become
unstable. Finally, the client organization had
to dump the product and look for a new one
in the market. As mergers and acquisitions
of firms have almost become the order of the
day, more uncertainty is added if a competitor
acquires the product vendor. Such changes
wreak havoc on the IT budget and destabilize
the organization itself.
All these factors need to be considered
in the decision matrix and then the IT
department can have a discussion with all
the stakeholders and have an objective look
at the product. This evaluation will help
not only in evaluating the product but also in communicating the right expectations on both sides.
DATA VOLUME AND FORMATS
One of the critical factors affecting the performance of any IT solution is the amount of data flowing through the system. Most organizations have very little visibility into the growth of data accumulated over the years. To start with, when developing a new application, an organization needs to fix a time span for which the application will continue to run before being replaced or retired. For example, if an organization is implementing a CRM package, the organization should fix a time span for the implementation to be used. This can be done by evaluating the product vendor's support contract and the business needs that may drive the usage.
The IT department should bring all
the stakeholders together and come to an
agreement on the data volume that will be
supported by the system. One major hiccup that
the IT department often faces is that the initial data volume requirement expected for the first year often jumps up within a year and becomes a major bottleneck later.
An ideal situation would be to predict the data growth for the years to come based on the business model. If need be, domain analysts should be brought in to analyze the existing data and predict a number for the data volume growth expected till the end of the product's lifecycle. The IT team then needs to do performance testing with this data volume before the actual implementation.
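As a simple illustration of such a projection (the paper does not prescribe a model), the sketch below compounds an assumed yearly growth rate over an assumed product lifecycle; real figures would come from the domain analysts mentioned above.

    public class DataGrowthProjection {
        public static void main(String[] args) {
            double currentGb = 500;      // assumed data volume today, in GB
            double yearlyGrowth = 0.25;  // assumed 25% growth per year from business forecasts
            int lifecycleYears = 5;      // assumed product lifecycle

            // Compound growth: volume after n years = current * (1 + growth)^n.
            for (int year = 1; year <= lifecycleYears; year++) {
                double projected = currentGb * Math.pow(1 + yearlyGrowth, year);
                System.out.printf("Year %d: ~%.0f GB%n", year, projected);
            }
            // The final-year figure is the volume performance tests should be run against.
        }
    }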
DESIGN PATTERNS
Most organizations today have a well-developed IT wing. This often means that they have a set of applications that were added over a period of time to the IT landscape, catering to the immediate needs of the organization whenever and wherever needed. What most businesses lack is the determination to modify the existing applications or architecture, for fear of disturbing the existing ones. Organizations need to realize that fresh problems need fresh methodologies to solve them. The existing architecture has to be evaluated on the basis of the current set of requirements and should be made ready to face the winds of change.
One key factor while designing high performance solutions is the review process. The major difference between a good system and a bad system is categorizing the requirements well to determine what is required and what is good-to-have. This information becomes critical when faced with the challenge of fine-tuning a solution. Most of the good-to-have features can be removed/modified or allowed to be added
later in further releases of the solution.
Another critical factor while designing an integration solution involving business processes with human tasks is judging whether complexity should be added to the automated parts or to the human-driven tasks. The decision making process and the way organizations respond to their suppliers and customers have become complex. Complexity can exist in the process or in the roles assigned to the people at responsible positions. McKinsey's complexity report suggests that organizations should not evade this problem [2]. Instead they can benefit by re-evaluating their processes, focusing on the individual level and resolving issues by clearly defining accountability, removing duplication and streamlining processes.
WHERE DOES THE PERFORMANCE
PROBLEM ACTUALLY LIE?
In a large organization with multiple IT implementations and many applications being added on a regular basis, point-to-point integration is a common architecture paradigm. With business driving the IT implementation, changes come quite regularly. This impacts the IT landscape in a bizarre fashion and sometimes changes the IT architecture beyond recognition. The hitches show up when this point-to-point integration slows down the whole process. In many cases these implementations are tweaked to give good performance but no documentation is maintained to inform a new user about vital changes. As the number of applications increases, so does the complexity of these integration implementations. Such issues can be avoided if an end-to-end analysis is done for each integration implementation and documentation is provided for each change/improvement done. The biggest problem that can handicap the correction process is identifying the source or cause of the performance problem. A finely tuned system can also fail if proper analysis is not done at the inception phase. The best methodology is to enable logging and keep an alert mechanism available to monitor the IT implementation. The IT department can also keep track of the performance of a solution by monitoring it on a regular basis.
One way of doing this is to publish the parameters that will be used for gauging the performance of a solution. For example, a stock ticker application is supposed to provide updates within five seconds to its platinum customers. The IT team can average out the response time for the ticker and publish a report once every week. In case of any slippage the IT team can look for the bottlenecks of the application by looking at the logs. Sometimes the logging system itself becomes a bottleneck, so it is a good idea to acquire products that allow the option of switching off the logging system at runtime without affecting the
implementation. In case of a need to identify
bottlenecks, logging can be switched on.
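A minimal sketch of such a runtime logging switch is shown below; it is not a feature of any particular product, and the flag here could just as well be flipped through JMX or an admin console in a real implementation.

    import java.util.concurrent.atomic.AtomicBoolean;

    public class RuntimeLogSwitch {
        // Shared flag (AtomicBoolean) so a change is immediately visible to all worker threads.
        private static final AtomicBoolean DETAILED_LOGGING = new AtomicBoolean(false);

        // Called by an admin console, JMX operation, or support script at runtime.
        public static void setDetailedLogging(boolean on) {
            DETAILED_LOGGING.set(on);
        }

        // Cheap guard: the log line is emitted only when the flag is on.
        static void log(String message) {
            if (DETAILED_LOGGING.get()) {
                System.out.println(System.currentTimeMillis() + " " + message);
            }
        }

        public static void main(String[] args) {
            log("ticker update sent");            // suppressed: logging is off by default
            setDetailedLogging(true);             // switched on only to chase a bottleneck
            log("ticker update sent in 4200 ms"); // now emitted
        }
    }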
Another concern with the existing methodology is with organizations that acquire monitoring products to satisfy a particular need, viz., a network monitoring tool, a web server monitor, etc. These tools do not measure how the implementation behaves end to end in the production process or measure the performance from the user's point of view. In order to make the end-user experience better it is not enough to keep monitoring the individual components; organizations need to devise a framework to monitor end-to-end performance and measure performance at the end-user level as well.
PHYSICAL/HARDWARE ARCHITECTURE
DESIGN
Most organizations follow the quick-fix remedy of adding hardware for performance issues related to end-user expectations. The hardware capacity is often increased by increasing the number of servers, increasing the RAM capacity of each server, or increasing scalability by adding more instances. These remedies solve the issue only for a short span of time. This ad hoc issue management leads to multiple issues later. In many instances it has been found that an organization has acquired many servers but none of them are being used to their full capacity. This kind of over-budgeting can be disastrous in this era of cost saving. The need is to find the optimum number that will rule out the possibility of a blackout as well as provide maximum utilization of the underlying hardware when the applications show average usage.
LEARN AND IMPROVE
Most organizations purchase one product to meet integration needs across the organization. One of the easiest things that can be done is to accumulate the knowledge from all the implementations of the product and keep a best-practices knowledge base, e.g., a working document within the organization for future reference. In case of multiple vendors implementing the same product, it is a good idea to establish an Integration Competency Center (ICC). The ICC will require participation from all the vendors as well as architects from different teams. In the integration space, these learnings play a critical role in achieving the best performance from solutions. An ICC can provide the following key benefits to the organization:
Define an integration strategy to encompass people, process and technology dimensions
Define an enterprise reference architecture that will be implemented across different
departments/application suites
In moving towards a common architecture model like Service Oriented Architecture (SOA), the ICC will prove to be an ideal platform to create shared common services for all application suites
Define and enforce an integration development methodology
Define non-functional requirements and common frameworks
Define a common information model
Define the knowledge management framework, establish an integration repository and promote reuse
Establish a process to track the integration inventory and software licenses
Develop an enterprise-level funding model for integration projects and track ROI to demonstrate business benefit
Establish an integration governance model and roadmap.
To cite a real-life example, an organization had maintained consistency across all its integration implementations and audit reconciliation was used as an effective tool. The basic principle behind this methodology was that the number of records retrieved from the source must equal the number of records loaded to the target application. Such implementations can remove the possibility of data loss in the integration layer and make things easier for the IT support team. In case of any record missing in the target application, the IT support team just needs to check whether the audit has passed for the particular interface in the integration layer.
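The sketch below illustrates the source-versus-target count check described above; the interface names and record counts are made up, and in a real integration layer the counts would come from audit tables or COUNT(*) queries on either side of each interface.

    import java.util.Map;

    public class AuditReconciliation {
        // In a real implementation these counts would come from the source extract
        // and the target table for one batch run of a given interface.
        static boolean reconcile(String interfaceName, long sourceCount, long targetCount) {
            boolean passed = sourceCount == targetCount;
            System.out.printf("%s: source=%d target=%d audit %s%n",
                    interfaceName, sourceCount, targetCount, passed ? "PASSED" : "FAILED");
            return passed;
        }

        public static void main(String[] args) {
            // Hypothetical counts for two interfaces in one nightly batch.
            Map<String, long[]> counts = Map.of(
                    "SALES_HK",  new long[]{120_345, 120_345},
                    "SALES_SGP", new long[]{98_410, 98_377});
            counts.forEach((name, c) -> reconcile(name, c[0], c[1]));
        }
    }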
LOOKING FORWARD
With Rich Internet Applications (RIA) and SOA bringing in a new wave of change in the IT landscape, performance management of integration solutions has to get ready for the next paradigm shift. In the new paradigm, the end user will become the primary customer and improving the end-user experience will become the KPI for most organizations. In the SOA model, it will become imperative to track end-to-end transactions for better performance management, and solutions designed for troubleshooting collaboration will rule the market. With RIA becoming another key factor, collaboration will define the SLA for most of the integration solutions. One key factor in getting better performance in SOA integration is to analyze the runtime relationships and
fine-tune the collaboration between the various services.
CONCLUSION
Organizations must understand that performance needs will continue to grow with time. To achieve maximum performance and sustain it, performance engineering is the only alternative. The advantages that a performance-tuned implementation brings along with it help organizations create more value than their nearest competitors, align their business strategy with their IT processes, adopt new changes easily and get a better Return on Investment (ROI).

REFERENCES
1. Theresa Lanowitz, Software Quality in a Global Environment: Delivering Business Value, The Gartner Application Development Conference, Arizona, 2004
2. Suzanne Heywood, Jessica Spungin and David Turnbull, Cracking the Complexity Code, The McKinsey Quarterly, 2007, No. 2. Available at http://www.mckinseyquarterly.com/Organization/Strategic_Organization/Cracking_the_complexity_code_2001?gp=1
Leveraging Enterprise Application
Frameworks to address QoS Concerns
By Shyam Kumar Doddavula and Brijesh Deb
A well-engineered approach to QoS concerns in the scope of application performance can lessen many a hassle
IT systems are increasingly gaining strategic importance in enterprises and have become an integral part of business operations today. Thus it has become imperative that the IT systems deployed by enterprises not only fulfill their business functionalities but also cater to Quality
of Service (QoS) concerns including performance,
scalability and availability. Degradation in
any of these QoS aspects poses high risk to
enterprises in terms of productivity, revenue and
business continuity. With business transactions
increasingly going online, competition is only a
click away.
Challenges before IT managers who are to
ensure QoS of IT systems have become manifold.
IT systems have become very complex in terms
of interconnectivity, integration with external
systems, heterogeneous technology stack and
various emerging technology trends like Service
Oriented Architecture (SOA), Grid computing,
Web 2.0, virtualization, etc. Moreover, to keep
pace with todays dynamic business environment,
IT systems have to be exible and adaptive

enough to facilitate frequent releases with quick
turnaround time. Under the circumstances the
traditional reactive approach of addressing
QoS concerns in enterprise applications needs
a relook. In this paper we explore an approach
that ensures that applications are proactively
engineered to address QoS concerns.
A report estimates that poorly performing IT applications cost industrialized nations almost 45 billion annually [1]. One of the reasons why so many IT systems fail to meet customer expectations is that they are not proactively engineered to meet QoS concerns. Traditional software development processes focus more on the functional requirements of the application, while QoS concerns like performance, scalability and availability are mostly afterthoughts. At best, applications are tested for these QoS concerns along with functional requirements in the system testing phase, but by that time the system is already built. This lack of focus on QoS considerations in the Software Development Lifecycle (SDLC) leads to serious
flaws in application architecture, design and code. By the time QoS-related issues are detected, re-architecting and re-building the application becomes an expensive proposition. Moreover, knowledge gained in applications having similar non-functional requirements is not reused and as a result each application development team comes up with its own solution.
PROACTIVE QOS ENGINEERING
The first step in ensuring that only applications with the expected QoS go into production is to ensure that a robust gating mechanism is in place before applications are allowed to go live. Even though some of the QoS concerns are tested in the system testing phase before the application goes to production, it is still a reactive approach. As the system has already been built by that time, fixing any QoS issue will involve re-architecting and re-implementing the application, which is an expensive proposition. Thus we need an engineering approach which addresses QoS concerns proactively throughout the whole SDLC process, starting with the architecture and design phase.
PRESCRIPTIVE ARCHITECTURES
The architecture phase is a key stage in the SDLC that lays the foundation for the non-functional characteristics of the target application. Key architectural and design decisions including the application structure, the basic building blocks and their relationships, products and other infrastructure choices are made in this phase. Lack of QoS focus herein may lead to structural issues that prove to be very expensive to fix in subsequent phases of the SDLC.
A good mechanism to focus on QoS concerns early in the SDLC and address them through the development process is to identify the application patterns that are developed in an enterprise, identify the common technical and QoS concerns and define prescriptive architectures to address those concerns. Prescriptive Architectures (PA) ensure context-based usage of industry standards and best practices by defining solution strategies for typical application patterns, which can then be tailored to define specific application architectures.
Common enterprise application patterns include B2C, Enterprise Back Office (EBO), B2B, etc. In the case of enterprise back office applications, some of the typical QoS concerns include:
Availability: Restart and recovery of the batch processes from the point of execution in case there is a failure
Performance and Scalability: Ability to process huge amounts of data in a fixed time window
Manageability: Ability to monitor the status of batch jobs and receive alerts on failures, etc.
Now, instead of each application architect addressing these concerns individually every time a back office application needs to be developed, a better approach is to define a prescriptive architecture for this category of applications so that there is consistency and a lower risk of structural issues.
Figure 1 shows a snapshot of the logical diagram of a prescriptive architecture for back office applications developed using enterprise Java technologies (where there is a lot of complex processing logic requiring an O-O language).
The prescriptive architecture uses Staged Event Driven Architecture (SEDA) for architectural requirements like parallel processing, pipelining and asynchronous processing. Parallelization is achieved at two levels: jobs and tasks. The Job Scheduler component helps schedule batch jobs by providing a mechanism to define the events that trigger jobs and also inter-job dependencies. The job execution engine splits the input data into job instances using a parallelizer component and executes them in parallel. Each job can then have a sequence of tasks that are executed by the Task Execution Engine.
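The internals of the job execution engine are not published in the paper; the sketch below only illustrates the general idea of splitting input data into job instances and running them on a pooled executor, with the partition count and the per-slice work assumed.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelJobExecution {
        public static void main(String[] args) throws Exception {
            // Assumed input records; a real parallelizer would split a file or a query result.
            List<Integer> input = new ArrayList<>();
            for (int i = 0; i < 1000; i++) input.add(i);

            int partitions = 4;                                 // one job instance per partition
            int chunk = input.size() / partitions;
            ExecutorService pool = Executors.newFixedThreadPool(partitions);

            List<Future<Integer>> results = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                List<Integer> slice = input.subList(p * chunk, (p + 1) * chunk);
                // Each job instance runs its sequence of tasks over its own slice;
                // here the "task" is simply summing the slice.
                Callable<Integer> jobInstance = () -> slice.stream().mapToInt(Integer::intValue).sum();
                results.add(pool.submit(jobInstance));
            }

            int total = 0;
            for (Future<Integer> f : results) total += f.get(); // wait for all job instances
            pool.shutdown();
            System.out.println("Processed total = " + total);
        }
    }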
The batch management application programming interface (API) provides the mechanism for monitoring and managing the jobs. It helps to monitor the running job and task instances and also to manage them, for example, to stop a job instance. It uses JMX to enable the management and monitoring of the job processing using tools like Tivoli.
Some of the key architecture strategies applied to address the architectural concerns identified earlier include:
Availability: Persistent queues for job inputs are pooled and processed by a batch engine that maintains the batch status in a persistent data store. A checkpointing mechanism is employed for the job processors to maintain the state of processing.
Performance and Scalability: Parallelization of processing and a pooled worker-thread architecture can address these issues. Asynchronous processing with queued inputs and staged event-driven processing can also address the concerns around performance and scalability.
Figure 1: Prescriptive Architecture for EBO Application. Source: Infosys Research
Manageability: Instrumentation using JMX and a batch administration component providing an API and UI can address manageability.
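As an illustration of the JMX instrumentation mentioned above (the actual batch management API is not shown in the paper), the sketch below registers a hypothetical MBean exposing a running-instance count and a stop operation that tools like JConsole or Tivoli could invoke.

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public class BatchMonitoring {
        // Standard MBean: the management interface name must be <ClassName>MBean.
        public interface BatchJobMonitorMBean {
            int getRunningJobInstances();
            void stopJobInstance(String jobInstanceId);
        }

        public static class BatchJobMonitor implements BatchJobMonitorMBean {
            private volatile int running = 3;                   // stub value for the sketch
            public int getRunningJobInstances() { return running; }
            public void stopJobInstance(String jobInstanceId) {
                running--;                                      // a real engine would signal the job
                System.out.println("Stop requested for " + jobInstanceId);
            }
        }

        public static void main(String[] args) throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // Hypothetical object name; monitoring tools would browse to this MBean.
            server.registerMBean(new BatchJobMonitor(),
                    new ObjectName("ebo:type=BatchJobMonitor"));
            System.out.println("MBean registered; attach JConsole to inspect it.");
            Thread.sleep(60_000);                               // keep the JVM alive briefly
        }
    }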
Thus prescriptive architectures enable the
application architect to reuse proven architectures
and reduce the risks of not meeting QoS concerns.
CHALLENGES IN INSTITUTIONALIZING PRESCRIPTIVE ARCHITECTURES
A key challenge in addressing QoS concerns through a proactive engineering process is to ensure that the prescriptive architectures and design best practices are followed by the application architects and developers when an application is being developed. The traditional approach to address this is to provide a knowledge base on prescriptive architectures that application architects and development teams can refer to. Often this knowledge base is in the form of architecture documents. Subsequently, manual reviews are conducted in the architecture, design and build phases to ensure that the prescriptive architecture is being followed.
However, this is not a very effective approach to institutionalize prescriptive architectures. The application development team has to invest time and effort in understanding the prescriptive architecture and thereafter codify it. This leads to multiple code bases for the same prescriptive architecture in an enterprise, which are difficult to maintain. Moreover, the manual reviews required to check conformance are time-consuming and add to project cost. Any changes suggested during these reviews also lead to rework.
ENTERPRISE APPLICATION FRAMEWORKS
A more effective technique is to codify different
prescriptive architectures and best practices into
application frameworks and standardize them
across the enterprise so that applications built
using the frameworks automatically benet from
the best practices embedded in the framework
components.
Figure 2: Addressing QoS Concerns using Enterprise Application Framework. Source: Infosys Research
Figure 2 illustrates how prescriptive architectures and an enterprise application framework
can be leveraged to address QoS concerns in an application. Prescriptive architectures embody standards, best practices and design patterns to cater to QoS concerns specific to different architectural patterns. The architectural building blocks identified in prescriptive architectures can then be codified into enterprise application framework services and components. While specific application architectures can be derived from prescriptive architectures, framework components and services can help jump-start their realization while addressing the relevant QoS concerns.
Figure 3: RADIEN Framework. Source: Infosys Research
Figure 3 gives a snapshot of RADIEN,
an enterprise application framework for developing applications using enterprise Java technologies. The framework consists of a set of infrastructure services like logging, caching, notification, etc., that are commonly needed in any application development. It also consists of several sub-frameworks like presentation, persistence, business rules, batch processing, etc. These off-the-shelf framework services have been derived from prescriptive architectures. Thus the design patterns, standards and best practices recommended in the prescriptive architectures come built into the framework services and benefit all applications that use them.
The advantages of using the RADIEN framework in implementing enterprise applications are manifold. It helps jump-start application implementation and enables development teams to address QoS concerns with minimal risk. Framework services can be benchmarked for different QoS aspects, which minimizes the probability of application outages and the consequent cost implications. This also reduces the Cost of Quality (CoQ) as the same framework services are used across several applications.
CONCLUSION
The traditional software development process that takes a reactive approach towards QoS concerns is ill-equipped to meet the stringent non-functional requirements of today's enterprise applications. In this paper we have explored a proactive QoS engineering approach wherein an enterprise application framework is leveraged to address QoS concerns throughout the software development lifecycle, starting with the architecture and design phase.
REFERENCES
1. The Cost Benefit of Monitoring Applications, Butler Group, May 2005. Available at http://www.wilytech.com/solutions/resource/index.php
2. Shyam Kumar Doddavula and Sandeep Karamongikar, Designing Enterprise Framework for SOA, 2005. Available at http://today.java.net/pub/a/today/2005/04/28/soadesign.html
3. Matt Welsh, SEDA: An Architecture for Highly Concurrent Server Applications. Available at http://www.eecs.harvard.edu/~mdw/proj/seda/
Performance Engineering in ETL:
A Proactive Approach
By Hari Shankar Sudhakaran and Anaga Mahadevan
Performance checkpoints and statistical methods can help optimize the data management process
Real-time access to data is imperative for quick decisions as change is the order of the day. Decision making is now driven not only by historic projections but also by day-to-day trends. Adding to the existing concerns are expanding businesses that churn out volumes of data that need to be integrated and consolidated into a usable format.
The data integration systems in a data
warehouse environment integrate the data
from various sources in different formats
and provide a consolidated information
repository for easy and quick access for the
decision support systems. The primary goal
of the design of the data integration platform
is to ensure timely data availability and
resolve dependencies with other downstream
applications.
This paper discusses a few realistic
recommendations for optimizing the
performance of the data management process.
These recommendations will proactively
ensure that the system is designed for the required performance. The performance of new and existing extract, transform and load (ETL) processes can be enhanced by introducing performance checkpoint reviews for each of the life cycle stages and measuring the ETL performance through Statistical Process Control (SPC) mechanisms.
TYPICAL ETL PROCESS
The most complex and critical aspect of a data integration system is the ETL process. ETL processes the data that is used by the reporting tools, and the accuracy and dependability of the reporting systems depend on the quality of this process.
ETL processes comprise multiple steps that include retrieving the data from all operational and disparate sources; data cleansing and validation of data against business rules; integration; and transforming the data into meaningful information. In other words, the process sets the stage for reliable
reporting. The reports are used to make business decisions and track the performance of a company.
ETL CHALLENGES
The key challenge with an ETL solution is to achieve the performance expectation while delivering data from disparate systems to the business community for timely use. There can be external applications that use this processed data as a source for further processing and projections.
Since ETL processes are quite complex, ETL applications designed with inadequate focus on performance aspects have run into feasibility/scalability issues soon after their implementation. Also, increasing volumes of data may require the ETL application design to be scalable to accommodate more frequent data refresh cycles due to changes in business needs.
Apart from this, data quality issues and complex business rules to be applied on source data increase the processing load. There are stringent time window constraints to access the source systems as well. The time available to extract from source systems may change, which may mean the same amount of data may have to be processed in less time.
The robustness of the ETL design and its
performance scalability are critical to the success
of an ETL solution implementation.
PERFORMANCE OPTIMIZATION:
CASE STUDY
When ETL performance is not given the required significance during the ETL development life cycle, we end up taking a reactive approach to fix the ETL performance requirements. A reactive approach invites rework, schedule delays and dissatisfaction, as it is never planned for. Here we present a case study and propose recommendations that can help prevent/identify performance bottlenecks early in the project life cycle, and emphasize the need to proactively address performance in the ETL process.
This case study is of a project that implemented a DW/BI solution for a retail customer, where ETL was a major component of the solution. The development of the application was completed as planned, but when it was opened up for historic data processing and later for incremental data refreshes, the ETL performance deviated from expectations and the ETL batch runs were inconsistent.
A task force was formed to identify the performance issues and it started addressing the issues based on the severity of impact. Though reactive steps could resolve the issues, they impacted the project due to rework, which involved additional resources and, inevitably, an extension of the project schedule.
Performance Bottlenecks
The key performance bottlenecks faced during
the project are discussed in the following
section:
Elongated ETL Window and Inconsistent Process Time: On analyzing the reasons for the elongated ETL window and the inconsistent time, we identified huge volumes of data being updated as part of the business rules. There were no effective checkpoints available to figure out the exact performance bottleneck process. It is therefore important to capture logging at a low level, at least in the initial stages of an application, so that problem identification becomes easier.
Additionally, there was no data available
at the design stage to validate if the design could
deliver as per performance expectations. To
avoid such situations, access to past project data
is recommended so that there are parameters
to judge the performance of the application as
per the expectations. Environments also play
a very crucial role in performance as unstable
environments result in varied load times across environments. In the given situation, the initial load statistics did not prove reliable for extrapolating the complete load time.
Unexpected ETL Job Failures: There were some inherent product issues and database cluster locks that caused the ETL jobs to fail unpredictably and frequently. Failures due to insufficient table space sizes (especially temp sizes) were also encountered, as these were not adequately estimated.
Unpredictable ETL Process Time: The ETL process performance varied with changes in the data volume. The extrapolation rule failed when applied to the ETL process. There was no mechanism to predict the behavior of the processes for various volumes of data sets.
Reactive Approach
Performance issues were identified, analyzed and resolved through reactive steps to improve the performance. There are four key factors that influence the performance of an ETL solution. Reactive solutions were developed to address the performance issues, which would primarily fall under one of the categories shown in Figure 1.
Following are the reactive steps taken
against each of the core dimensions to improve
performance:
Data Model: The data model structure plays a very critical role in the load and query time in a data warehousing project. The data model has to be futuristic while also considering the appropriate data to be stored. Many a time, data modelers include too much information, not all of which is of importance to the organization.
In our project, we remodeled the FACT tables to reduce the updates and removed the self-referencing columns. The data model had been designed focusing mainly on the expected report response time, and this moved all the processing to the ETL. Every column that was loaded was validated for performance from both the ETL and the reports. Some columns that were easier to handle in the reports were moved to the reporting layer. The data model was optimized for both query and load performance.
Business Rules: The ETL process in any given situation includes the application of complex business rules to resolve data inconsistencies and categorize data. Thereon, business logic classifies the data to define the purposeful key performance indicators (KPIs) so that remedial action can be taken. The reports highlight the performing and non-performing categories based on the KPIs.
Figure 1: Factors that Influence ETL Performance. Source: Infosys Research
Since the business rules had to be applied to a huge set of transaction data, much time was consumed for every transaction. An analysis was done to check if the rules could be applied on a sample set and then propagated to the entire set. This ensured that the application
could be tuned to an optimum level.
A few of the inconsistencies and quality issues in the data were better arrested at the source by using data quality tools. This removed a few of the additional data validation rules at the ETL layer.
Data Volume and Characteristics: Staying competitive in today's business demands the capability to process increasing volumes of data at lightning speed. Data warehouse users always experience that access to processed data is slow. Apart from the availability of data, the accuracy of the data is also very crucial. But achieving accuracy and speed with large, diverse sets of data can be challenging.
Since a huge volume of data was getting updated as part of the ETL process, the design was revisited and altered to minimize the updates, thereby reducing the processing time. Intermediate commits were introduced and data was processed in smaller logical volumes. This cleared up the undo and temp table spaces at logical intervals, improving the database performance.
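A JDBC-style sketch of such intermediate commits is shown below; the connection URL, credentials, table and chunk size are placeholders, and the loop merely stands in for reading the actual source rows.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class ChunkedCommitLoader {
        public static void main(String[] args) throws SQLException {
            // Placeholder connection details; a real job would read them from configuration.
            try (Connection con = DriverManager.getConnection("jdbc:yourdb://host/dw", "etl", "secret")) {
                con.setAutoCommit(false);                      // we control the commit boundaries
                int chunkSize = 10_000;                        // assumed "smaller logical volume"
                int pending = 0;

                try (PreparedStatement ps = con.prepareStatement(
                        "INSERT INTO sales_fact (sale_id, amount) VALUES (?, ?)")) {
                    for (int id = 1; id <= 100_000; id++) {    // stand-in for the source rows
                        ps.setInt(1, id);
                        ps.setDouble(2, id * 1.5);
                        ps.addBatch();
                        if (++pending == chunkSize) {
                            ps.executeBatch();
                            con.commit();                      // frees undo/temp space at logical intervals
                            pending = 0;
                        }
                    }
                    ps.executeBatch();                         // flush the final partial chunk
                    con.commit();
                }
            }
        }
    }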
Environment: The infrastructure of a data
warehouse is built while taking into
consideration the total number of users accessing
the application, concurrent users and growth
rate expected on a yearly basis.
Apart from making sure that the code
of the ETL is tuned for performance, it is also important that the infrastructure supports it. The configurations of the infrastructure have a key role to play in the performance of any application and also need to be designed for performance. We changed the following parameters in the database to optimize it:
Parallelism
Partitioning
Indexes/Partitioned Indexing
Usage of the right data block sizes
Increased PGA memory
Estimation of the table spaces based on the data volume processed daily
The chart in Figure 2 captures the
change in run performance after taking reactive
measures to address the ETL performance
issues.
HOW TO MAKE IT PROACTIVE?
While we took a series of reactive steps to counter the performance issues, we were on the lookout for possible proactive measures that could arrest these issues at their inception.
Figure 2: Comparison of Daily Load Performance. Source: Infosys Research
We were able to devise a few strategies that could detect potential threats to ETL performance. Proactive mechanisms were aimed at detecting possible ETL performance issues at early stages and predicting the ETL performance on the target environment.
From the case study, we jotted down
a few strategies that can address future ETL
performance issues proactively:
Performance checkpoint reviews
Adopting SPC to evaluate ETL
performance
PERFORMANCE CHECKPOINT REVIEWS
Performance testing is conducted during the later stages of the development life cycle, traditionally towards its closure, and it is only then that project teams take corrective action to resolve performance issues. Though reviews happen on the functional/non-functional requirements, architecture and design of a solution, it is noted that the review team is often not equipped with sufficient parameters to detect early signs of performance issues. More often than not, review effectiveness wholly depends on the experience of the review team.
We recommend an approach where the reviewer verifies if the performance requirements are captured, evaluated and tracked over the different phases of the project. These reviews will be driven through a review checklist that will have suggested checkpoints based on past project information. The aim of this review will be to uncover all performance issues at the earliest stage possible to safeguard performance expectations. Review effectiveness can be improved when the review team has access to best practices and learnings. Figure 3 indicates the key checkpoints for the reviews depending on the ETL development phases.
Figure 3: Performance Engineering Checkpoints. Source: Infosys Research
It is recommended to supplement the
development checklists with an additional
performance checklist.
Below we discuss a few performance-specific checkpoints that are suggested to be validated.
Volumetric Analysis
During the discovery phase, volumetric information, viz., the data volume at the source, incremental data volume, expected data growth, etc., is to be collected. This has to be analyzed along with the data retention/archival requirements. This information will also be an input to the performance testing and will help in devising strategies to manage target data growth.




Performance Feasibility
ETL performance expectations are to be validated for feasibility vis-a-vis the environment and data. Past project information can be used to get a guideline on the expected ETL performance.
Establishing Performance Goals and Monitoring
Based on the performance feasibility study with the past project data, project-specific performance goals shall be derived from the organization's performance guidelines. Using statistical techniques, the performance of ETL in a project can be evaluated.
Load/Volume Testing
In the design phase, the load/volume test has to be planned as per the goals and metrics defined.
Data should be captured for various load and volume scenarios in the build environment. The load/volume test should also be conducted in the integration testing and acceptance test environments, and a comparison of the data will indicate probable performance issues that may be encountered during implementation. This will ensure that there are no surprises when the code is deployed in production.
PERFORMANCE BASELINES
Performance baselines can be established based on the data points from past projects, giving a good indication of the expected performance of an ETL application. Performance baselines can be created based on parameters like the ETL tool used, the OS platform, the complexity of the data transformation rules, the volume of data involved, etc.
The prime advantage of these baselines is to provide a fair performance expectation for a given ETL application based on the organization's project history. This guides the project team to set reasonable performance goals for the application. This is analogous to the process capability baselines that organizations might maintain.
When we have performance baselines,
we can use the SPC techniques to measure
and monitor the ETL performance and take
necessary action for further improvement,
wherever feasible.
ADOPTING SPC FOR ETL PERFORMANCE
EVALUATION
The SPC technique is an effective method to monitor a process through the use of control charts. Organizations use SPC to measure process stability. We can adopt statistical methods like moving averages to measure ETL performance stability. SPC uses the organization's performance baselines to define the acceptable limits within which we expect the ETL performance to fall.
ETL application development projects will consider the organizational baselines depending on the solution characteristics (ETL tool, platform, database, etc.). The ETL run performance is then plotted and analyzed to check if most of the data points fit into the baselines (organization or project-specific). Data points that fall beyond the limits call for investigation of the cause.
The baselines are to be revised as and when ETL project data is available. Such ETL performance baselines will further provide predictability in what can be achieved based on past project information.
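A small sketch of how such control limits might be computed and applied is shown below; the baseline and new run times are invented, and an organization would substitute its own baselines and possibly a moving-average variant of the chart.

    public class EtlRunControlChart {
        public static void main(String[] args) {
            // Baseline daily load times in minutes (assumed historical data).
            double[] baseline = {62, 58, 65, 60, 63, 59, 61, 64, 60, 62};

            // Mean and standard deviation of the baseline.
            double mean = 0;
            for (double t : baseline) mean += t;
            mean /= baseline.length;
            double var = 0;
            for (double t : baseline) var += (t - mean) * (t - mean);
            double sigma = Math.sqrt(var / baseline.length);

            double ucl = mean + 3 * sigma;   // upper control limit
            double lcl = mean - 3 * sigma;   // lower control limit
            System.out.printf("Mean=%.1f  UCL=%.1f  LCL=%.1f (minutes)%n", mean, ucl, lcl);

            // New runs are flagged when they fall outside the control limits.
            double[] newRuns = {61, 66, 79};
            for (double t : newRuns) {
                String verdict = (t > ucl || t < lcl) ? "INVESTIGATE" : "within limits";
                System.out.printf("Run of %.0f min: %s%n", t, verdict);
            }
        }
    }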
PERFORMANCE GOALS / METRICS
The organizational performance baselines discussed in the previous section will be used to derive the project-specific performance goals. These performance goals/metrics are defined based on the performance requirements of the ETL solution.
The organizational performance baselines and project goals will be treated as the control limits while plotting the control charts. Projects can define warning limits and plan to take action for data points falling beyond these limits. This will help projects detect performance issues before they go beyond the scope of the goals set.
CONCLUSION
Performance issues can create roadblocks in implementing ETL solutions. An effective way to manage performance issues is to proactively assess them and employ preventive measures. To maximize the benefits, performance engineering has to be part of the entire life cycle rather than just the implementation. It is also vital to validate whether the proactive controls are effective. For example, were the checkpoint reviews able to detect early signs of ETL performance issues? An ETL performance model can be developed for the ETL solution that in turn can provide predictability of the performance under scalability conditions.
Integrating Performance
Engineering Services into
Enterprise Quality Framework
By Ganesh Kumar M, Srikanth Chandrasekhar, Sundar Krishnamurthi and Yiqun Ge PhD
An end-to-end approach to integrating performance engineering services can match high performance needs
In an age where results are expected instantaneously, organizations must realize that there is no replacement for high performance, especially so when it is all about mission-critical business applications handling millions of transactions worth of data. High performance and positive end-user experiences are always the untold core needs and wants of this competitive era.
Many enterprises today, however, are laid-back and spend a significant part of their budget resolving critical production issues. Failure to scale today's mission-critical business applications in Banking, Financial Services and Insurance (BFSI) and the lack of an enterprise-wide mandate to institutionalize design for performance for all mission-critical applications are among the crux of the issues that organizations face today.
How can organizations adopt a proactive
and structured model that will ensure scalability
and high performance of their mission-critical
applications? How can enterprises contribute to
an enriching end-user experience when working
with such applications?
The A to F approach provides a robust
end-to-end model for enterprises to integrate
performance engineering services into their
enterprise quality framework. This model
can be viewed as an eye-opener that can take
organizations to the next level where high
performance, a positive end-user experience
and wise budget spend are the end-of-the-day takeaways.
This paper details the A to F approach and includes a case study that can be applied to organizations that desire to scale up in creating high-performance applications. It also showcases the tangible quantitative and qualitative benefits that can be achieved by institutionalizing this model.
Understanding As-is gaps, tools and
processes in testing business applications for
performance, understanding strong Business
need, Conceptualizing and strategizing to
integrate performance engineering services
into enterprise quality framework, Defining
core metrics and processes, Evangelizing
performance engineering service offerings
with stated business benefits and Focusing on
providing governance in enterprise architecture
constitute this A to F approach.
THE A - AS IS GAPS, TOOLS AND
PROCESSES
Organizations around the world have invested
a considerable amount of time and attention to
problems in their production applications that
cause outages. Metrics are readily available
on causes and impacts of data problems, code
and design bugs, network connectivity issues,
hardware failures and the like. However,
not much is done to quantify the impact that
performance issues and degradation cause on
the same systems.
The quality framework adopted in most
organizations is focused towards the correctness,
consistency, repeatability and improvement
of processes. Deciding on which parts of the
software development process need attention,
requires deep introspection. Despite the universal
acceptance that quicker-faster-better is the only
way to survive, organizations neglect investing in
specic tools and competencies that can engineer
applications for the quicker and faster aspects
that actually mean better in most cases. Fallouts
of neglect are poorly defined non-functional
requirements, lack of knowledge and expertise
in specic performance guidelines for coding
practices, non-existent to minimal stress, volume

and endurance testing and last but not the least
limited ability to troubleshoot and identify
performance bottlenecks in poorly performing
applications. Most of us are very familiar with
the graph that shows increasing cost-to-x with
advancement in the SDLC. The reality of the cost-
to-x extends as much to performance degradation
and outage as to any other manifestation.
Typical investment and effective usage of
the right proling, modeling and stress testing
tools also leaves much to be desired. Quality
frameworks for the most part do not dene the basis
on which the degree of performance engineering
required for an application can be judged. This
allows project teams ghting deadlines, budget
constraints and scope changes to put performance
at the bottom of their priority lists.
THE B - BUSINESS NEED
Performance management has always been essential for ensuring enterprise service delivery quality and availability. Today, meeting SLAs and end-user performance requirements is even more challenging due to the increased complexity of infrastructure and mission-critical applications. Very often, the industry faces various performance issues in the production environment that significantly impact service delivery availability and quality. Improved application performance cuts productivity losses by as much as 40% [1].
In order to qualitatively and quantitatively assess the business need for integrating performance engineering into the enterprise quality management system, we conducted a case study of a financial company on their production downtime incidents, root causes and permanent solutions for mission-critical applications. This study showed that ~42% of the total internal productivity time lost (or ~25% of downtime incidents) is caused by performance/capacity related issues.
This is mainly because, in the traditional SDLC,
performance engineering is either informal
and ad hoc, or is integrated only at the testing
phase. In other words, it is integrated after
design is completed and code construction
is done. There are often insufficient non-
functional requirements for performance
testing. When performance is identied as a
problem, developers often need to go back to x
design or code issues. Our cost analysis study
showed that, it will be very expensive to x
performance issues at the later stage of SDLC
or in production. Figure 1 shows detailed root
causes categorized by SDLC or different quality
management stages.
There is a need for an early integration
point beginning at requirement elicitation, so
that design is based on business use cases and
predicted usage scenarios. Our study also shows that the average productivity lost per incident caused by performance/capacity related issues is ~2.18 times that of non-performance/capacity related issues. On average, a performance/capacity problem also recurs far more often in production than a non-performance/capacity problem. The potential cost avoidance from resolving a unique performance/capacity related problem at an early stage is therefore much higher (~3.5 times) than from fixing other problems.
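As a rough illustration of how these ratios combine, the sketch below computes the relative cost avoidance of permanently fixing one performance/capacity problem early versus one other problem. The baseline cost and the recurrence counts are hypothetical; only the ~2.18x productivity ratio and the ~3.5x overall figure come from the study.

```python
# Illustrative only: hypothetical baseline figures, not data from the study.
def cost_avoidance(prod_loss_per_incident, recurrences):
    """Productivity cost avoided by permanently fixing one recurring problem."""
    return prod_loss_per_incident * recurrences

# Assume a non-performance problem costs 1.0 unit per incident and recurs twice.
other = cost_avoidance(prod_loss_per_incident=1.0, recurrences=2)

# A performance/capacity problem costs ~2.18x per incident (study figure) and,
# per the study, recurs more often in production -- say 3.2 times (assumed here
# so that the overall ratio lands near the reported ~3.5x).
perf = cost_avoidance(prod_loss_per_incident=2.18, recurrences=3.2)

print(f"relative cost avoidance: {perf / other:.1f}x")  # ~3.5x
```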
To sum up, the potential cost avoidance is significant if we resolve ~75% of performance issues before they reach production, or if we proactively resolve issues in the production environment through periodic capacity planning, configuration auditing and similar measures.
This calls for a paradigm shift from informal, ad hoc performance engineering toward application-centric, holistic approaches. The entire technology stack and platform need to be optimized for the best performance at the business transaction level in order to meet SLAs. Proactive enterprise performance engineering should span the entire life cycle of the application and is an ongoing process that reduces risk and improves cost-effectiveness and time-to-market for performance-critical systems. Integrating performance engineering services into the enterprise quality framework, as well as into the post-deployment phase, shifts the current software development paradigm to performance-driven development. Performance issues can then be resolved in the earlier quality management phases, avoiding major changes to software design and system deployment architecture.
THE C - CONCEPTUALIZE
Integrating performance engineering into the fabric of the SDLC requires it to be an integral part of the quality framework. Most organizations invest a great deal of time in gating the various stages, defining entry and exit criteria, formalizing the review, defect capture and rework processes and, finally, the release audit. With all these processes in place, it is far simpler to fold performance engineering aspects into existing processes than most organizations imagine.
Figure 1: Production Downtime Incidents (Source: Infosys Experience)
Figure 2 represents the basic footprint that performance engineering activities should have in the SDLC. The biggest battle to win is convincing the leadership of the impact that performance degradation and outages have on client and user experience, and of the benefits that a little diligence and a few performance engineering tools and utilities can provide.
THE D - DEFINE
Prior to updating the quality methodology to incorporate performance engineering processes, the organization needs to assess whether it has a sufficiently robust and flexible performance testing infrastructure. Since actual performance testing is the closest an application can get to behaving as it would in production, it is the most important link in the chain that the organization needs to set up.
It is recommended that an industry-standard tool that is scalable and capable of generating and sustaining large loads be brought in and implemented. The choice of tool should be based on parameters such as the organization's requirements for peak volumes, the typical platforms and protocols to be tested, the concurrency required for test sets, the geographical distribution of the infrastructure for the Applications Under Test (AUT) and the quality of support available from the product vendor.
The definition of the performance engineering activities and the relevant processes, artifacts and documents to support them can be formalized as in Figure 3. The required tweaks and updates would then be decided based on feedback from actual execution.
Figure 2: Basic Performance Engineering Activities (Source: Infosys Analysis)
Identifying an appropriate set of measures that provides quantitative evidence of the adoption and effectiveness of the performance engineering function is very important. The following metrics can be tracked to this end (a small computation sketch follows the list):
Performance Testing Utility Management: Number of customers, number of requests for service, total number of projects, actual percentage of projects engaging the performance engineering team, actual versus expected percentage of projects engaging the performance engineering team and ROI.
Performance Engineering Project Management: Effort variance, schedule variance and customer satisfaction rating.
Application Project Metrics: Transaction response time versus Service Level Agreement (SLA), transactions per second versus SLA, transaction error rate versus SLA, system utilization versus goals, business requirements coverage and high severity defects.
Productivity: Test scripts per person per day, validation and optimization runs per person per day, end-to-end performance validation/optimization projects per person per month.
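As a minimal sketch of how a few of these metrics might be computed against SLA targets, the snippet below uses entirely hypothetical sample data, field names and SLA values; it is an illustration, not a prescribed tool.

```python
# Minimal sketch with hypothetical data; sample values and SLA thresholds are assumptions.
samples = [  # (transaction response time in seconds, error flag)
    (0.8, False), (1.4, False), (2.3, True), (0.9, False), (1.1, False),
]
sla_response_time = 2.0   # seconds, assumed SLA
sla_error_rate = 0.02     # 2%, assumed SLA

avg_response = sum(t for t, _ in samples) / len(samples)
error_rate = sum(1 for _, err in samples if err) / len(samples)

print(f"avg response time {avg_response:.2f}s vs SLA {sla_response_time}s "
      f"-> {'OK' if avg_response <= sla_response_time else 'BREACH'}")
print(f"error rate {error_rate:.1%} vs SLA {sla_error_rate:.0%} "
      f"-> {'OK' if error_rate <= sla_error_rate else 'BREACH'}")

# Effort variance for the performance engineering project, as a percentage.
planned_effort, actual_effort = 120.0, 138.0  # person-hours, assumed
effort_variance = (actual_effort - planned_effort) / planned_effort * 100
print(f"effort variance {effort_variance:.1f}%")
```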
THE E - EVANGELIZE
It is imperative for organizations to understand the importance of integrating performance engineering services into the enterprise quality framework, and to evangelize the key tangible benefits that this change can deliver. Evangelization of the model and its acceptance, adoption and practice by the workforce are successive steps in the evolution of organizations that aspire to design and deploy high-performance applications.
Figure 3: Artifact Categorization vis-a-vis SDLC Stages (Source: Infosys Analysis)
Leadership can adopt a three-step evangelization agenda to showcase how integrated performance engineering service offerings add value. As a first step, organizational leadership can conduct periodic, recurring brown-bags, seminars and knowledge-sharing sessions at various levels of the organization hierarchy. It can then emphasize the importance of adopting an integrated performance engineering approach mandated by quality standards, using metrics from sample projects that have benefited from this model. The quantitative benefits can be showcased in terms of improved cost of quality and the resultant dollar savings.
Secondly, the quality assurance team can popularize the model by stressing its importance in every quality audit, flagging any non-compliance items it finds and enforcing their closure. The third step of evangelization is recognition of projects that adopted the model and thereby demonstrated cost benefits and high end-user satisfaction. Sponsors of such projects should announce cash rewards and publish recognition certificates for teams that complied and showed results.
THE F - FOCUS ON GOVERNANCE
Institutionalizing robust governance for performance is an essential mandate for organizations that care about high performance in their business applications. Governance facilitates periodic checkpoints at which organizations can study the effectiveness of adopting performance engineering service offerings, analyze production problems and related metrics, assess the quantitative and qualitative benefits realized, and tailor performance engineering services to the required level.
First, the governance body can collate data on the number of mission-critical business applications that demand high performance during a slated evaluation period. Applications can be categorized by adoption or non-adoption of performance engineering services, and related production metrics on performance issues, application downtime and associated cost impact can be captured. The consolidated cost avoidance achieved through performance engineering services can then be tabulated. What-if analysis can be performed for applications that fail to adopt performance engineering services and the resultant cost impact derived accordingly. Finally, the governance body can prepare a performance scorecard for all business applications and publish it to senior management.
Periodic review of performance scorecards, discussion on improving performance metrics, and trending of key parameters such as downtime, cost impact and the cost avoidance gained by institutionalizing performance engineering services can help organizations spend their technology budget wisely and effectively.
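A minimal sketch of such a scorecard and what-if calculation follows. The application names, downtime figures and the cost-per-hour rate are illustrative assumptions, not data from the study.

```python
# Hypothetical scorecard sketch; all figures below are illustrative assumptions.
COST_PER_DOWNTIME_HOUR = 10_000  # assumed blended business cost, in dollars

apps = [
    # (application, adopted performance engineering?, downtime hours this quarter)
    ("payments-gateway", True, 2.0),
    ("billing-batch", False, 14.5),
    ("customer-portal", True, 3.5),
]

print(f"{'application':<18}{'PE adopted':<12}{'downtime (h)':<14}{'cost impact ($)':<16}")
for name, adopted, downtime in apps:
    cost = downtime * COST_PER_DOWNTIME_HOUR
    print(f"{name:<18}{str(adopted):<12}{downtime:<14}{cost:<16,.0f}")

# Simple what-if: cost avoidable if non-adopters matched the adopters' average downtime.
adopter_avg = sum(d for _, a, d in apps if a) / sum(1 for _, a, _ in apps if a)
avoidable = sum((d - adopter_avg) * COST_PER_DOWNTIME_HOUR
                for _, a, d in apps if not a and d > adopter_avg)
print(f"potential cost avoidance for non-adopters: ${avoidable:,.0f}")
```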
CASE STUDY
To validate this model, we conducted several case studies on performance engineering projects and the activities involved at different phases of the SDLC and in production for a financial services company. We also observed how this company is integrating performance engineering services into its enterprise quality management framework in a phased manner, expanding from performance testing into the broader set of performance engineering activities. Non-functional requirements also became mandatory for these projects.
The as-is gap and production downtime cost analysis showed that many production downtime incidents could have been proactively prevented at various stages of the SDLC by this integration. Many performance testing, capacity planning and other performance engineering projects could also have been more effective with adequate, accurate requirements.
Current business needs mean that many applications in this environment are customized vendor products. Performance tuning and validation at the integration phase therefore become crucial to improving end-user experience, preventing performance degradation and outages, and identifying performance-related risks when integrating these applications into the enterprise infrastructure. More significantly, an application team freed from performance concerns could focus on the tasks at hand to ensure software quality.
The analysis showed that the majority of the issues were due to either incomplete or poor requirements or to poor integration practices. The team was thus able to conceptualize the services required for the problems the organization faced. These services also needed to be mandatorily enforced through the enterprise quality framework in order to be effective.
In general, the study demonstrated that with clearly defined processes and metrics, reusable methodology and knowledge artifacts, and a robust performance testing environment, an enterprise can avoid billions of dollars in productivity loss, increase service quality and availability, reduce Total Cost of Ownership (TCO), obtain high ROI from improving poorly performing applications, and achieve performance and scalability goals for complex, mission- and business-critical applications.
For a large financial services client, the team evangelized the services by quantifying the cost of performance outages and degradation and by presenting process improvement proposals to modify the existing quality methodology. This enabled performance engineering activities to be ingrained into the software development lifecycle, enforced through the enterprise quality framework.
In addition to incorporating performance engineering activities into the quality methodology, the process improvement proposals submitted also updated the various gating stages and audit checklists used to validate the quality and correctness of deliverables. What-if analysis and cost optimization calculations were performed, and the metrics were reported to senior management, emphasizing the focus on governance. It is worth noting that if the enterprise quality methodology already follows an Entry, Task, Validation and Exit (ETVX) model, the checkpoints are far easier to implement.
The process improvement proposals are being approved and incorporated into the next release of the quality framework. Meanwhile, for a critical financial planning tool suite project, the suggested approach of combining application profiling, performance and memory tuning, and stress testing has significantly improved overall user satisfaction through better performance, and has proactively prevented or reduced performance degradation and outages in production.
CONCLUSION
A strong impetus is necessary for organizations to look beyond functional design and plan effectively for high performance in applications. Performance engineering services mandated through quality standards, as part of the enterprise quality framework, are a proactive way for organizations to avoid the capacity issues and bottlenecks that arise from a lack of design for performance. This approach delivers tangible benefits across the organization. It also enables application development teams to engage the performance engineering services group periodically at various phases of the SDLC and to course-correct by comparing intermediate, phase-wise performance results against the stated performance baseline of the business application. Studies show that most performance issues are repetitive, and the potential cost avoidance from proactively preventing a unique performance issue is much higher than that from fixing such issues reactively.
In essence, organizations can benefit quantitatively, saving millions of dollars in potential cost avoidance, through phase-wise (analysis, design, build and testing) capture of capacity issues by performance engineering services mandated by a robust quality framework. Qualitatively, it enriches the end-user experience through issue-free, high-performance production systems.
REFERENCES
1. http://synergykraft.com/services
2. L G Williams, Making the Business Case for Software Performance Engineering, CMG, Dallas, 2003. Available at http://www.cmg.org/proceedings/2003/3194.pdf
3. A Framework for Software Product Line Practice, Version 5.0, SEI-CMU. Available at http://www.sei.cmu.edu/productlines/framework.html
4. Karen D Schwartz, ABC: An Introduction to IT Governance, 2007. Available at http://www.cio.com/article/111700/ABC_An_Introduction_to_IT_Governance
5. Ernie Nielsen, The Road to IT Governance Excellence, Serena Software Inc., 2007. Available at http://www.serena.com/Docs/Repository/products/Mariner/WP_CG_Excellence_BYU.pdf
6. Lloyd G Williams and Connie U Smith, The Economics of Software Performance Engineering, Software Engineering Research and Performance Engineering Services, 2002. Available at http://www.perfeng.com/papers/cmgpanel.pdf
An Agile Approach to
Performance Tuning
By Shailesh Bhate, Rajat Gupta, Manoj Macwan and Sandip Jaju
Performance tests in an agile environment can monitor every minute slip, thereby delivering a cost-effective and robust application
Enterprise applications today are more complex and have less time to market than ever. With the growing complexity of applications, it is imperative to define what we mean by performance engineering and to agree on the scope and boundaries of its activities.
Businesses around the world are well aware of the losses that result from poorly performing applications. Independent studies reveal that poorly performing IT applications cost industrialized nations as much as US $60 billion annually [1]. Improved application performance can be a savior here, slashing productivity losses by a large margin and thereby improving profitability. Performance engineering helps IT investments yield a higher return on investment (ROI) when properly managed with respect to relevant and critical situations.
NEED FOR AN IN-LIFE PERFORMANCE
TUNING METHODOLOGY
Let us examine the need for an in-life performance tuning methodology mapped against various business and technical needs. It is often the case that the success of an application becomes the cause of its downfall. For instance, an application that met with initial success may subsequently be over-utilized as new functionality is added to it relentlessly, well beyond its expected scope. Moreover, the heavy emphasis on faster time to market means development cycles are shorter than ever before, sometimes running into only a couple of weeks each. In such a scenario, the performance requirements of an application can fluctuate considerably, and performance engineering becomes an ongoing exercise rather than a one-off effort.
AGILE METHODOLOGY FOR
PERFORMANCE TUNING
Without an applied methodology, performance engineering is simply an exercise in trial and error. A sound methodology increases the performance engineering team's efficiency and effectiveness. The methodology discussed in this section addresses a typical scenario in which an application is a victim of its own success, the customer wants more requirements delivered faster, and the performance of the application needs to be maintained across these enhancements.
Faster time to market and ever-changing requirements have shaped a new delivery methodology for software development. The agile development methodology is finding its way into more and more organizations by the day. The key aspect of agile development is that software is simultaneously coded, tested and delivered in short iterations, or sprints.
Adequate performance of an IT application is of utmost significance to any business. So, in a nutshell, we have performance tuning objectives to satisfy on the one hand and, on the other, we need to develop and deploy new code ever faster onto the live production process.
How do we marry these two needs? Agile development holds the key. Tuning the performance of an application while still delivering software in an agile fashion is the essence of the methodology discussed below. Let us look at its basic structure:
Pre-sprint: This is a preparatory step for the actual sprint. In order to plan effectively, we need an accurate and precise picture of the current live situation, which means measuring certain parameters on the live system. These parameters are collected using various programs provided by the OS vendor and the database vendor and, of course, from the application logs (a minimal collection sketch follows the list below). The data points taken from the live system include, but are not limited to, the following:
System parameters like CPU, memory, disk and network utilization, etc.
Database parameters like connection leaks, redo log generation, tablespace usage, full table scans, etc.
Application parameters like memory leaks, thread pool usage, latency, etc.
Overall parameters like cycle time (CT), right first time (RFT), SLA adherence, etc.
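As an illustration of the system-parameter portion of this collection, the sketch below samples CPU, memory, disk and network utilization on a host using the third-party psutil library; database and application parameters would come from the respective vendor tools and application logs, which are not shown here. The sampling interval and output format are assumptions.

```python
import time
import psutil  # third-party library: pip install psutil

def sample_system_parameters(interval_seconds=60, samples=5):
    """Collect basic live-system data points for performance baselining."""
    rows = []
    for _ in range(samples):
        rows.append({
            "cpu_percent": psutil.cpu_percent(interval=1),      # averaged over 1s
            "memory_percent": psutil.virtual_memory().percent,
            "disk_percent": psutil.disk_usage("/").percent,
            "net_bytes_sent": psutil.net_io_counters().bytes_sent,
            "net_bytes_recv": psutil.net_io_counters().bytes_recv,
        })
        time.sleep(interval_seconds)
    return rows

if __name__ == "__main__":
    for row in sample_system_parameters(interval_seconds=1, samples=3):
        print(row)
```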
The measurements gathered from the existing live system serve as inputs for baselining the current performance of the system. These measurements, along with the functional work stack, form the basis for setting performance targets. To set the performance targets, various parameters are considered: the as-is behavior of the system, the expected load on the system in the near future, the expected user load over the long term (such as burst or peak load), and so on. Any change in previously set thresholds, such as the SLA, an increase in user load or interaction with a new system, is also taken into account.
With the current system behavior and the expected behavior derived as above, we are now in a position to identify solutions that enable the expected performance targets to be achieved. There may be more than one possible solution for a desired result, and all of them need to be identified in this step, which includes determining and prototyping the possible solutions that could be implemented to achieve the revised targets. The solutions can include, for example, the options listed below (a small throttling sketch follows the list):
stated here:
Changing application processes so that
non-user interaction processes can be
made asynchronous
Throttling of certain processes in times of
burst load
Isolating resource processes on a dedicated
node in case of a cluster
Segregating high priority processes on
dedicated nodes to ensure enough re
power is made available to them
In-memory processing to avoid disk I/O
when slow
Database table partitioning and query
tuning
Thread pooling and connection pooling.
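As a concrete illustration of two of these options, the sketch below combines a thread pool with a semaphore so that backend work runs asynchronously while concurrency is throttled during burst load. It is a minimal, generic sketch; the pool size, the throttle limit and the process_event function are assumptions, not details of the systems described in this paper.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 8          # assumed throttle limit for burst load
throttle = threading.Semaphore(MAX_IN_FLIGHT)
pool = ThreadPoolExecutor(max_workers=16)  # assumed thread pool size

def process_event(event):
    """Placeholder for a non-user-interaction backend step (e.g., feed creation)."""
    return f"processed {event}"

def submit_async(event):
    """Submit work asynchronously, but never allow more than MAX_IN_FLIGHT at once."""
    throttle.acquire()
    future = pool.submit(process_event, event)
    future.add_done_callback(lambda _f: throttle.release())
    return future

if __name__ == "__main__":
    futures = [submit_async(i) for i in range(100)]
    print(sum(1 for f in futures if f.result().startswith("processed")), "events done")
```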
This is followed by assigning a business value to each of the parameters that needs to be tuned. It is important to note that the business value is determined by the customer. Business values are assigned to each possible solution asset by considering parameters such as the criticality of the improvement desired, the impact on SLAs, funding and the trade-offs in the offing.
A sprint planning meeting is then held with the customer and all other stakeholders, where every parameter is assigned a unique priority depending on business value and application needs. Here it is the customer, albeit in concert with the development team, who assigns the priority to each solution. The development team accepts a certain set of solutions to be implemented in the sprint depending on the capacity available; the remaining solutions are kept in the sprint backlog to be taken up in the next sprint. The stage is now set for the sprint kick-off (a small prioritization sketch follows).
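The sketch below illustrates this prioritization step: candidate solutions carry a customer-assigned business value and an effort estimate, and the team greedily accepts the highest-value items that fit within sprint capacity, leaving the rest in the backlog. The solution names, values, efforts and capacity are all hypothetical.

```python
# Hypothetical candidate solutions: (name, business value, effort in person-days)
candidates = [
    ("asynchronize batch sub-processes", 90, 12),
    ("switch data load to SQL*Loader",   70,  5),
    ("in-memory processing via GTTs",    65,  8),
    ("partition hot OLTP tables",        55, 10),
    ("isolate heavy jobs on a node",     40,  4),
]
SPRINT_CAPACITY = 25  # person-days available, assumed

selected, backlog, used = [], [], 0
for name, value, effort in sorted(candidates, key=lambda c: c[1], reverse=True):
    if used + effort <= SPRINT_CAPACITY:
        selected.append(name)
        used += effort
    else:
        backlog.append(name)

print("sprint scope  :", selected)   # highest business value first, within capacity
print("sprint backlog:", backlog)    # carried to the next sprint
```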
Figure 1: Performance Engineering Methodology (Source: Infosys Research)
Sprint: A sprint is a small, time-boxed cycle in which activities like detailed design, implementation and testing are carried out on a daily, continuous basis. Some of the key phases of a sprint are as follows:
Implementation - For each performance goal identified for the cycle, a prototyped solution is selected, taking care that functionality is not forfeited in the process. The development team implements the solution for that performance goal and, while doing so, also generates the stubs and simulators used for performance testing in the next stage.
Individual Testing - Performance tests are then carried out to check whether the desired performance gain has been achieved. If the test result does not satisfy the goal, the solution is reworked or tuned, or the next prototyped solution is selected for implementation. All the identified parameters are tuned in parallel in this way.
Continuous Functional Testing - Agile methodology implies continuous integration and testing. Automated builds are therefore made every day in what are called nightly builds, and these builds are exercised by an automated test suite. Note that this test environment is separate from the performance test environment; these tests are primarily concerned with functionality, not performance. Test results are documented and analyzed, and any defects are fixed on priority by the development team before development proceeds further.
Continuous Performance Testing - In parallel, the overall performance of the modified system is tested in planned, scheduled tests carried out on designated performance test machines, which are typically scaled-down versions of the real system.
Performance Measurement and Analysis - The performance tests are used to check whether the desired performance goals have been achieved. The mutual impact of the solutions applied for different performance goals is analyzed; if solutions applied for different requirements conflict with each other, the prototyped solution is revisited and the parameter tuned again.
If scaled-down hardware is used instead of the real system, the performance data is extrapolated to predict live system performance. To do this, an empirical formula is derived by the performance experts in the team and continuously fine-tuned as more sprints are implemented and more data is collected.
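The paper does not give the empirical formula itself, so the sketch below shows one simple way such an extrapolation could be built: fit a curve to throughput measured on the scaled-down test cluster at different load levels and project it to the live node count. The linear-in-nodes model, the sample data and the numpy usage are all assumptions, not the team's actual formula.

```python
import numpy as np

# Hypothetical measurements from the scaled-down performance test cluster:
# (number of nodes used in the test, throughput achieved in transactions/sec)
nodes = np.array([1, 2, 2, 1, 2])
tps = np.array([55.0, 104.0, 99.0, 52.0, 101.0])

# Fit a simple linear model tps ~= a * nodes + b (an assumed relationship;
# the real empirical formula would be refined sprint after sprint).
a, b = np.polyfit(nodes, tps, deg=1)

live_nodes = 4  # the production cluster size described in the case study
predicted_live_tps = a * live_nodes + b
print(f"predicted live throughput: {predicted_live_tps:.0f} tps")
```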
Release - Once the system's functional and performance goals for that particular sprint are met, a release is made for user acceptance testing (UAT).
Post-sprint: Having developed the sprint deliverables, we now enter the post-sprint phase. This stage comprises end-to-end (E2E) testing, deployment and post-deployment activities.
End-to-end Testing/UAT - After the sprint is delivered, end-to-end testing is performed, primarily for functionality. The E2E testing team might not have the hardware and/or the expertise to test an individual system's performance, nor should it be concerned with it; its focus is E2E functionality and E2E cycle time.
Deployment - Once E2E/UAT testing is completed, sign-off is obtained from the test team. A suitable deployment date is finalized considering the needs of the customer, and deployment is carried out by rolling out the patches developed and tested in the earlier stages.
Fine Tuning and Post-deployment Monitoring - Since performance testing was carried out on a scaled-down version and/or with simulated test data, chances are that the system may not behave optimally once deployed to live. Patterns of live input data are very hard to predict and are often random. Hence, once the code goes live, the system is continuously monitored for the following:
Performance improvement on the targeted parameters
Memory leaks, especially under stress
Functionality defects, especially special cases that might have been missed during E2E testing
Congestion of database objects.
Further fine-tuning is performed based on this data. In the case of a cluster, resource-hungry processes are aligned to dedicated nodes. The timing of batch jobs, as well as batch job size, is optimized to improve the utilization of system resources. Sometimes massive data files have to be sent out of the system as output of the batch processing; this data is aggregated into optimal chunks before being sent out. Other system parameters, such as thread pool size and database connection pool size, are also fine-tuned.
Performance Data Collection - The data collection
process continues in the live environment.
The system, database, application and overall
performance parameters are collected in a
regular and planned manner. This data acts
as the feed for the next sprint planning and
performance goal setting.
Business Benefits of this Methodology - This methodology, if applied in spirit, provides a strong base for prioritized delivery of performance enhancements. In other words, if there are multiple performance parameters that need to be tuned, the parameters carrying the maximum business benefit are tuned first.
Performance Tuning of an Already Deployed Application - In most real-life scenarios, performance tuning is required on systems that are already in production. This methodology provides a way to tune such a system's performance.
Lower Cost, Higher ROI and Prioritized Delivery - This is, in effect, an application of Pareto's principle to the performance goals to be achieved in each sprint cycle. Since only a few goals are prioritized for performance tuning in each sprint, the cost spent improving those parameters yields a high ROI. Requirements that are not expected to produce large benefits are automatically descoped from the sprint during the prioritization step.
Faster Time to Market - Since only a few performance requirements are implemented in each sprint rather than all of them, the time required to deliver them live is far less. This ensures that the customer reaps the maximum benefit in the shortest possible time, which is preferable to targeting all performance parameters in one go and delivering the entire solution to live only after several months.
Continuous Improvement - Since performance improvement happens over a number of iterations, performance parameters can be continuously tuned to the changing needs of the application. Load conditions, load patterns and so on may change on live systems over time, and a previously well-tuned system can slowly become sluggish unless it is re-tuned. This methodology supports continuous performance improvement.
CASE STUDY: APPLIED SOLUTIONS
USING THE METHODOLOGY
The case study involved a system in which cycle time and RFT were two of the most important goals for measuring customer satisfaction. The system is a blend of an online transaction processing (OLTP) module and a batch processing module. The OLTP module processes around 7 million transactions per day and the batch processing module close to 800 million records per day. The batch processing module was hitting 24 hours for a full processing run, and the OLTP transaction rate of the system was 210 tps.
System Hardware - The system was deployed as an Oracle Real Application Clusters (RAC) setup with four Sun Solaris E3900 nodes and a large storage area network (SAN) of around 3 TB.
Desired Performance and Constraints - In order to deliver the system's benefits to its end users, the cycle time for the batch processing module had to be less than 20 hours. The OLTP rate also had to be increased to at least 250 tps, considering the growing user load.
Apart from this, it was found that the incoming data for batch processing was violating the agreed interface specifications with a third-party equipment vendor. A simple solution would have been to fix these interface issues on the equipment but, since thousands of these devices were installed nationally and were in production, any change to them was ruled out. Our system therefore had to tolerate these errors while still meeting its performance goals.
Pre-sprint Stage: The following steps contributed to finalizing the performance requirements and the alternatives that could be used for performance tuning. Some of the findings of the as-is data analysis were:
Memory Leak - As shown in Figure 2, a memory leak was found in the system during peak transaction hours. These figures were obtained by monitoring memory and garbage collector activity during those hours.
Cycle Time of Batch Processing - While monitoring the application against the cycle time SLA for batch processing, it was observed that the cycle time was on the threshold of exceeding the 20-hour SLA. Table 1 shows one part of the batch processing exceeding 15 hours; the cycle time was increasing gradually but surely over time. The rest of the processing took 6-8 hours.
Table 1: Cycle Time for Batch Processing (Source: Infosys Experience)
Figure 2: The Memory Leak Problem (Source: Infosys Experience)
Redo Log Generation - It was also observed that, owing to the huge transaction volume, redo log generation had increased drastically; a performance fix made earlier to speed up batch processing had only added to the rate at which these logs were generated. About 550 GB of logs were generated per day. The DBA policy recommended keeping at least two days' worth of logs on disk, which was well beyond the storage available for the cluster. Moreover, the user base was expected to grow rapidly in the near future, and the resulting increase in transactions would have directly increased the amount of redo logs generated.
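On the figures above, the recommended retention alone already amounts to a sizeable fraction of the 3 TB SAN quoted earlier, even before any application data is counted (the comparison with the SAN is an inference from the stated numbers, not a figure from the case study):

```latex
2~\text{days} \times 550~\tfrac{\text{GB}}{\text{day}} = 1.1~\text{TB of redo logs to retain on disk}
```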
Cycle Time for OLTP - The OLTP cycle time was the most critical issue observed during the analysis of live data. The SLA requiring a feed to be sent to a system within 5 minutes was being violated due to the huge volume of incoming events and the processing of those events at various levels. The bottlenecks in this process were identified, revealing issues with input event consumption and with creation of the corresponding feed.
Oracle Object-level Congestion - Observing the behavior of the database objects led to the conclusion that some of them were used excessively in transactions. This, coupled with other observations, revealed deadlocks in the database and very slow response for transactions on these objects.
Identification and Prototyping of Solution Assets: As explained above, this phase involves identifying performance targets and the set of solutions to satisfy each performance goal. Using the as-is analysis in the previous section, the performance targets identified were:
Maximum cycle time of 20 hours for batch processing
Maximum cycle time of 5 minutes for feed generation in OLTP
Reduction of redo log generation
Improvement in OLTP tps.
After carrying out some PoC activity, several solution assets were found that seemed feasible to implement. It is important to note that some of these solutions are mutually exclusive, while others are alternatives to one another. The possible, feasible solutions identified were:
Asynchronization in Batch Processing - It was observed that the increased cycle time in batch processing was due to the sequential, synchronous processing of its various sub-processes. The processes were also not running at their maximum capability.
Asynchronization and Throttling of OLTP through Multithreading - It was found that there was scope to introduce asynchronization in the backend part of OLTP transactions by multithreading the backend sub-processes involved in OLTP.
Isolating Resource-heavy Processes on Dedicated Nodes - Some of the heavy processes in the problem system were initiated and scheduled by DBMS jobs on the 4-node Oracle 9i RAC cluster. All the DBMS jobs were running on the default first node, where the maximum processing was already aligned. This led to the idea of isolating these resource-intensive processes on dedicated nodes.
In-memory Processing with Global Temporary Tables (GTTs) - As discussed above, a critical requirement to decrease redo logging had arisen in order to limit space utilization on the SAN. Session-specific and transaction-specific global temporary tables were experimented with, since they generate far less redo than static tables. Functionally, too, there was no important need for this data to be backed up.
Table Partitioning - Due to the huge number of parallel transactions on the same tables, but on different sets of data, some OLTP transactions were taking a long time to execute. These tables held more than 10 million rows at a time. To relieve some pressure on them, different categories of table partitioning were considered for performance tuning.
Alternative for Data Load - The batch processing involved loading huge amounts of data from flat files into Oracle tables. It used Oracle external tables to load the data, which made the load sequential and slow, because the file had to stay in a fixed location until the entire process completed. SQL*Loader was identified as a good option that did not have this constraint.
Business Value Mapping: Delivering any code to live is meaningless if there is no business benefit attached to it. We therefore quantitatively identified how much business benefit each solution would deliver. Each solution was then assigned a business value using the following guidelines:
Amount of Improvement - This defined the comparison between current parameters and the expected new parameters after the solution. The parameters included cycle time, latency, CPU utilization, memory utilization, data loss, etc.
Criticality of Improvement - Priority is weighted by the criticality of a performance fix, as a problem might invite a mandate in the future. If a fix is important to the system, it still needs to be taken up on priority even when the gain is not very high.
Criticality of Problem - SLAs also need to be given the highest priority, as the agreed SLA might already have been violated or be about to be violated in the near future.
Project Funding - Project funding also had to be considered when deciding among the performance tuning alternatives.
Possible Adverse Effects - The possible adverse effects of applying each solution need to be studied in detail, as part of the PoC done for each alternative performance solution derived.
Sensitivity of Module in Project - Some modules are functionally too critical to risk missing any aspect of a fix in them. These areas should not be touched unless there is a very critical performance requirement.
After assigning a business priority to
each of the above solutions, a sprint planning
meeting was held with the client to assign a
unique priority to each improvement. Depending
upon the capacity of the development team, a few
improvements were chosen to be implemented in
that iteration cycle.
Sprint Development: Sprint development was kicked off after all the pre-sprint activities had been carried out and the business priorities identified. A sprint begins with the implementation of the identified solutions.
Solution Implementation - The solutions implemented as part of the sprint were asynchronization and throttling of the processes used in batch processing, the use of SQL*Loader, and in-memory processing.
The remaining performance goals were kept in the release backlog to be implemented in the next iteration. Once the sprint was kicked off, these solutions were implemented and tested for performance improvement. Nightly builds and automated test suites were used to test functionality; since no functionality was supposed to change, the existing test suites were used as available.
After functional testing was complete, the code was tested for performance improvement as a whole. The results were then analyzed to check whether they satisfied the target performance improvement.
Analyzing Results - Analysis of the test results was the most important and most difficult part of the applied methodology. It helped identify the performance parameters that were being met, those that were not, and the reasons behind both. The record of the tests analyzed and the results of this analysis also helped in future sprints.
Redesigning Key Processes - Some of the implemented performance solutions affected the as-is system functionality. As part of the asynchronization of batch processing, key processes of the application were substantially redesigned. The earlier batch processing had different processes running sequentially and in synchronization with each other; the existing functionality was changed to introduce global temporary tables in place of external tables and thread pooling in place of sequential processing.
The redesigned functionality required approval from the system's functional experts, to make sure that no change in the functionality of the system was visible to the end user.
Post-sprint: After the sprint implementation, we moved on to UAT, delivery, deployment and post-deployment fine-tuning.
UAT and Deployment - UAT was carried out by an independent UAT team. A detailed knowledge transfer session was planned to brief support staff on the changed system operations; if this knowledge transfer is not carried out, a lot of productivity is lost, since support teams may feel the application is behaving strangely, sometimes leading to false alarms. After UAT sign-off was obtained, a deployment plan and a rollback plan were prepared, and the code was then deployed to live.
Fine Tuning and Post-deployment Monitoring - Testing for this system was done on a dedicated two-node cluster with much less processing power than the live version, and the results were extrapolated to the live servers. Hence, post-deployment, various configuration parameters like thread pool size, node alignment and batch size had to be fine-tuned. After fine-tuning, data was collected from the system to act as the feed for the next sprint cycle.
So what did we achieve? We looked at as-is data from live and identified performance bottlenecks; we identified solution assets and assigned a business value to each of them. This led us to the sprint, where we implemented the prioritized solutions and tested them. Post-sprint, we performed UAT, deployment and post-deployment fine-tuning.
CONCLUSION
Continuous performance engineering helps turn complicated systems into strategic business enablers that support the growing needs of the business. The methodology can also reduce the costs of maintaining and upgrading the application. It employs prioritization techniques, coupled with agile processes, to deliver quick benefits of high value to the business.
Table 2: Cycle Time for Batch Processing (Source: Infosys Experience)
The methodology implemented blends features such as optimized performance enhancement through continuous monitoring, development in sprint cycles and delivery of superior performance. This is achieved by following a business-centric approach, understanding complex applications and fundamental agile practices, and decreasing overhead cost while increasing ROI.
REFERENCES
1. Vaidyanatha Siva and Sridhar Sharma, Should Performance be an Integral Part of the SDLC?, SETLabs Briefings, Vol 4 No 2, 2006. Available at http://www.infosys.com/IT-services/architecture-services/white-papers/performance-an-integral-part-of-SDLC.pdf
2. Study Shows Severe Financial Impact Caused by IT/Business Disconnect, Compuware, 2008. Available at http://www.compuware.com/products/vantage/7127_ENG_HTML.htm
3. Scott Ambler, Agile Development and the Developer/DBA Connection, Oracle. Available at http://www.oracle.com/technology/pub/columns/ambler_agile.html
4. Bill Venners, Tuning Performance and Process: A Conversation with Martin Fowler, Part VI, Artima Developer, 2002. Available at http://www.artima.com/intv/tunableP.html
Server Consolidation: Leveraging the
Benefits of Virtualization
By Manogna Chebiyyam, Rashi Malviya, Sumit Kumar Bose, Srikanth Sundarrajan
Enhancing performance through Virtualization
realizes operational efficiencies and offers reliability
How do existing data centers operate in a high-capacity environment? Are they adequately equipped to handle peak-hour loads? Are they plagued by low average server utilization? If so, the issue may lie in provisioning applications for peak loads and in the practice of dedicating a server to each application. This results in what are termed server sprawls [1]. Virtualization technologies, however, provide mechanisms for enabling multi-tenancy within next-generation data centers. These technologies motivate organizations to undertake server consolidation exercises, which in turn help them gain operational efficiencies and simplify server management. Consequently, they help enterprises lower total cost of operation (TCO) while ensuring higher performance and reliability. Server consolidation should be performed intelligently, so that there is no negative impact on performance. This paper focuses on innovative infrastructure optimization algorithms that help organizations realize the benefits of server consolidation, and discusses how these algorithmic approaches enhance the performance of data centers.
MOTIVATION FOR CONSOLIDATION
The usual practice in data centers has been to host applications on dedicated commodity servers based on Wintel/Lintel. The need for hosting applications on dedicated resources is not hard to fathom. Firstly, the capacity of the servers should be sufficiently large that quality of service (QoS) guarantees can be met during peak-load hours. This ensures that service level agreements are adhered to and that performance does not degrade even at peak loads. For example, utilization of an application that computes salaries of employees and contractors peaks at the end of the month and is relatively low for the rest of the time. Secondly, isolation requirements (in terms of the environment in which the applications run) dictate that different applications be hosted on separate servers; most applications, such as those dealing with sensitive data, require a secure environment. For such reasons, enterprises end up with multiple servers that remain under-utilized most of the time, which is what we call server sprawl. Server sprawl in enterprise data centers is characterized by low server utilization (due to over-provisioning of hardware capacity for peak loads) and high maintenance and infrastructure management costs (due to the large number of support staff). Additionally, data center environments are typically static. To cope with ever-increasing workloads, data centers have to undergo a long cycle of procuring new and faster hardware, manually provisioning the new servers and installing patches and applications. This is not only time-consuming but also cumbersome. As enterprises scale out their existing infrastructure, system administrators find it increasingly difficult to manage, expand, extend and support it. All of this indicates that effective management of the infrastructure has become imperative. Grid and virtualization technologies can complement each other in ways that iron out the issues that exist in data centers today.
Enterprises and the e-science community have traditionally used grid technology to solve grand-challenge and computationally intensive problems. More recently, enterprises have realized the promise and potential that grid holds for virtualizing the infrastructure. A virtualized infrastructure in general provides a more optimized operating environment that is malleable enough to suit varying demands. Telecom and data center hosting providers who host applications and servers for their customers are at the threshold of using grid technology to build optimized multi-tenant hosting environments. Unlike conventional high performance computing (HPC) applications like BLAST, Monte-Carlo simulations, CFD, finite element analysis, protein folding, etc., provisioning and deployment of enterprise applications are much more complex. Besides the load balancing, monitoring, scheduling and resource allocation capabilities of grid solutions, there is a need for powerful provisioning, resource management, application-level monitoring and SLA conformance mechanisms. It is in light of these problems that one should view the recent research and advancements in server virtualization technology on commodity hardware. Server or system virtualization is a technology that enables partitioning of a physical machine into several equivalent virtual machines capable of hosting independent operating system instances [2]. Recent reports and case studies have shown that server virtualization is now mature and ready for mainstream production deployment, and there are innumerable cases of successful deployment [3, 4]. Several server virtualization capabilities, such as virtual resource management, automated provisioning and live migration, are worth noting in this context and are capable of addressing the problem outlined earlier in this section [5, 6]. Clearly, grid and server virtualization technologies can complement each other [7, 8] and come together to build an Enterprise Grid [9, 10] that goes beyond solving HPC problems. The next section discusses the ability of virtualization technologies to enable and automate server consolidation practices.
VIRTUALIZATION: SCALE-OUT SOLUTION
Virtualization technologies can provide scalable and cost-effective solutions to a number of infrastructure management problems that naturally occur in data centers. Virtualization is a technique through which the hardware resources of a system, such as processor, storage, I/O and network, can be multiplexed through hardware/software partitioning, time sharing and simulation/emulation into multiple execution environments, each of which can act as a complete system in itself. Virtualization technologies can be used to physically consolidate different applications onto a single physical machine. Each virtual machine hosts one application, and wrapping applications within virtual machines provides isolation guarantees to them. Multiple virtual machines act and operate as distinct, separate server environments: if one virtual machine crashes, it does not affect the remaining virtual machines (or the applications running on them) on that physical machine. Consolidating multiple applications onto one physical host requires that the applications be analyzed for their resource utilization (system, storage and network) over a sustained period of time. The benefits that accrue from adopting virtualization technologies within enterprise data centers are as follows:
Reduced Complexity: Fewer physical machines mean lower complexity in managing the infrastructure, as a set of physical machines can be managed from a single management interface. This reduces the cost of communication between the servers and, consequently, the data center is easier to manage and support.
Increased Server Utilization: Average server utilization in an enterprise environment currently ranges from 5 to 40 percent, leaving at least 60 percent of the available capacity unused [9]. Virtualization enables complementary workloads on multiple physical servers to be consolidated onto a single physical server, harnessing unused computing resources. This entails replacing several older servers dedicated to single applications with a single, more powerful server running multiple applications.
Rapid Application Deployment: As an
application is developed and scaled, deploying
it becomes complicated. Virtualization enables
speedy infrastructure provisioning due to
enhanced cooperation and improved access to
the infrastructure.
Business Resilience: Businesses operating
today have IT-focused business plans that
aim to preserve critical data and minimize
downtime through effective practices and system
redundancies. Virtualization can help IT managers
secure and isolate application workloads and data
within virtual servers and storage devices for
easier replication and reconstruction.
As mentioned in the previous section, one of the major catalysts for interest in server consolidation is the under-utilization of servers due to server sprawl and, consequently, the increased complexity of managing them. Most servers in a typical data center are under-utilized, meaning that they consume more power and cooling resources than can be justified by the average workloads running on them. This in turn adversely impacts organizations' IT spending. In most organizations, servers handle small or periodic workloads and run at only 10-20% of their capacity. Each server tends to run a single operating system instance and a single business application, which is a major contributor to low CPU utilization and server sprawl. Under-utilized servers mean that resources are wasted while the hardware continues to consume power and occupy real estate. Over-utilization of servers is less common, but may occur when workloads grow more rapidly than expected. Monitoring fewer servers, combating server sprawl and simplifying their management is the goal of server consolidation [10]. It optimizes the utilization of resources, namely servers, storage, network, support staff and real estate, by consolidating multiple server environments onto a single server. This allows enterprises to harness unused computing power from the same amount of resources. In addition to the benefits listed above, consolidation results in:
Improved Hardware: Consolidation directly impacts resource utilization and does away with the need to install more servers. Consolidation approaches reduce the load on servers at peak hours, which in turn improves performance, availability and scalability, leading to lower hardware maintenance costs.
Improved ROI: A reduction in the number of servers results in reduced real estate and cooling costs, which in turn leads to improved return on investment.
SERVER CONSOLIDATION TECHNIQUES
A few server consolidation techniques are elaborated below:
Physical Consolidation: This involves consolidating data centers and moving servers to fewer physical locations [11]. The rationale is that with servers in fewer physical locations, management consistency and economies of scale are achieved more easily than when the servers are dispersed. Physical consolidation may also reduce data center real estate costs and is generally considered to carry the lowest risk.
Logical Consolidation: This involves implementing standards and best practices across the server population [11]. It allows substantial gains in IT staff productivity, as they can manage the environment more efficiently and effectively, and often results in lower systems management costs and lower TCO. Logical consolidation is often implemented together with physical consolidation and rationalization.
Application Consolidation: This involves consolidating multiple applications and application instances to reduce the overall number of applications in an enterprise. It can be either homogeneous or heterogeneous: homogeneous consolidation combines several instances of the same application on a single server, while heterogeneous consolidation combines several different application types on the same server.
The discussion in the remainder of this paper focuses on physical consolidation.
ALGORITHMIC APPROACHES TO
CONSOLIDATION
The emphasis of algorithmic approaches lies in automating the consolidation process. Physical consolidation involves moving a large number of applications hosted on dedicated servers to a small number of high-performing target servers. The traditional approach to consolidating servers involves listing the servers along with their system, storage and network characteristics and mapping them manually. However, this manual process is time-consuming and intractable when the number of applications runs into the thousands; in a typical data center scenario, the total number of applications requiring consolidation may well exceed 5,000. The algorithmic perspective on the physical server consolidation problem is to model it as a vector packing problem (VPP). The vector packing problem is a fundamental problem in the operations research literature that deals with packing items of different sizes into the least number of bins. Here, the items to be packed are the applications being consolidated, the item sizes are their resource utilizations, and the bins are the high-performing destination servers. The static version of the server consolidation problem has been modeled by Ajiro and Tanaka as a vector packing problem in two dimensions (CPU and memory) [12]. In addition to CPU and memory, there can be multiple other dimensions such as disk I/O, network I/O and average disk queue length (the average disk queue length indicates the number of disk requests waiting to be serviced). In every dimension, the consolidated utilization of a server by the applications hosted on it should stay well within bounds, defined as a certain percentage of the server's capacity in that dimension. In other words, the sum of the utilizations of the applications hosted on a server should not exceed the threshold capacity defined for that server; resource utilization in excess of the threshold capacity results in performance degradation for the applications hosted on it.
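The paper does not prescribe a particular packing heuristic, so the sketch below uses a simple first-fit-decreasing pass over the two dimensions modeled by Ajiro and Tanaka (CPU and memory) purely as an illustration; the utilization figures, server capacities and the 80% threshold are assumptions. The incompatibility and cardinality constraints discussed in the next subsection could be added as extra checks inside fits().

```python
# Illustrative first-fit-decreasing sketch for 2-D vector packing (CPU, memory).
# Utilizations, capacities and the 80% threshold are assumed example values.
THRESHOLD = 0.80  # use at most 80% of a target server's capacity per dimension

apps = {  # application -> (cpu units, memory units), e.g., averaged utilization
    "app1": (30, 16), "app2": (25, 32), "app3": (10, 8),
    "app4": (45, 24), "app5": (20, 40), "app6": (15, 12),
}
server_capacity = (100, 64)  # identical target servers: (cpu units, memory units)

def fits(load, demand, capacity):
    """True if adding `demand` keeps every dimension within the threshold."""
    return all(l + d <= THRESHOLD * c for l, d, c in zip(load, demand, capacity))

servers = []  # each entry: {"load": [cpu, mem], "apps": [...]}
# Sort by the larger normalized dimension, largest first (first-fit decreasing).
for name, demand in sorted(apps.items(),
                           key=lambda kv: max(d / c for d, c in zip(kv[1], server_capacity)),
                           reverse=True):
    for srv in servers:
        if fits(srv["load"], demand, server_capacity):
            srv["load"] = [l + d for l, d in zip(srv["load"], demand)]
            srv["apps"].append(name)
            break
    else:  # no existing server can host it: open a new one
        servers.append({"load": list(demand), "apps": [name]})

for i, srv in enumerate(servers, 1):
    print(f"server {i}: {srv['apps']} load={srv['load']}")
```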
Metrics Collection Using SNMP Protocol
Before applying the algorithmic approaches, the CPU, memory and disk utilizations of the different servers requiring consolidation in the data center must be measured over a period of time. The relevant data (CPU and memory) can be collected from the different servers using the existing management software or monitoring system. Where such data does not exist, it is obtained by enabling the Simple Network Management Protocol (SNMP) services on each of the servers. SNMP is an important part of the internet protocol suite and is used in network management systems to monitor devices attached to the network. The data collected from all the servers requiring consolidation is communicated to a central server for compilation. In this context, the servers needing consolidation are the managed systems and the server collating the data and calculating the different metrics of interest is the managing system. Software (SNMP) agents run on each managed system and report CPU and memory utilization data to the managing system via SNMP. This raw data is then analyzed by the managing system, and metrics of interest such as the average utilization and the standard deviation of utilization in each dimension are determined. The managing system then invokes an algorithm to determine the best possible fitment/packing of the different servers. There are many intricacies involved in designing an algorithm for carrying out automated consolidation.
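A minimal sketch of the managing system's aggregation step is given below. The sample format and metric names are assumptions for the example; in a real deployment the raw counters would arrive via an SNMP library or the existing monitoring software rather than the in-memory list used here.

    # Sketch: computing per-server utilization metrics from collected samples.
    import statistics
    from collections import defaultdict

    # Assumed sample format: (server_name, dimension, utilization_percent)
    samples = [
        ("srv-01", "cpu", 22.0), ("srv-01", "cpu", 35.0),
        ("srv-01", "memory", 61.0), ("srv-02", "cpu", 8.0),
    ]

    def summarize(raw):
        """Return {server: {dimension: (mean, std_dev)}} for the collection period."""
        grouped = defaultdict(lambda: defaultdict(list))
        for server, dimension, value in raw:
            grouped[server][dimension].append(value)
        return {
            server: {
                dimension: (statistics.mean(values), statistics.pstdev(values))
                for dimension, values in dims.items()
            }
            for server, dims in grouped.items()
        }

    metrics = summarize(samples)
    # e.g., metrics["srv-01"]["cpu"] evaluates to (28.5, 6.5)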
Incompatibility Constraints
The algorithmic approaches need to take care of the different sets of constraints that naturally occur in any server consolidation exercise. Examples of such constraints are the co-location constraints (also called the item-item incompatibility constraints) and the bin-item incompatibility constraints. The item-item incompatibility constraints state that a certain sub-set of items cannot be co-located in the same bin. For instance, an item-item incompatibility constraint in the vector packing algorithm can be mapped to a real world constraint such as "Application A and Application B cannot be co-located/consolidated on the same host due to business or performance reasons." Similarly, the bin-item incompatibility constraint states that a certain item cannot be located in a certain bin. These constraints are useful when incompatibilities such as 64-bit applications being allocated to 32-bit machines need to be considered and avoided. Thus, any solution to the server consolidation problem should consider both the item-item and the bin-item constraints. Second, for effective resource utilization, there are cardinality constraints. Cardinality constraints restrict the total number of applications/virtual machines that can be hosted on a particular physical machine to a pre-defined value.

Third, the servers in a data center are of different make, type and genre. Comparison across this diverse and heterogeneous set becomes extremely difficult, which presents an additional set of challenges for applying algorithmic approaches to consolidating servers. It is difficult, for instance, to compare the processing ability of the latest physical machines available from different vendors, or to compare a multi-core or multi-processor machine with a single-core or single-processor machine. Besides, different generations of processors from a single vendor have different computing capabilities. There is a need, therefore, to factor in this heterogeneity when undertaking the consolidation exercise. Prior to applying the algorithms, there should be a way to express the processing (CPU) capacity of one type of machine in terms of another. This pre-processing step is called normalization. Normalization allows the user to identify one physical machine as a reference machine and express the CPU capacity of all other machines in the data center with respect to the CPU capacity of the reference machine. In other words, normalization enables homogenization of an otherwise heterogeneous environment. Albeit far from accurate, historical SPEC benchmarks on various processors can be used in this normalization process.
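The checks described above can be sketched as a single placement test, shown below. The data structures, the 0.75 threshold and the benchmark-derived cpu_factor fields are illustrative assumptions; they stand in for whatever constraint lists and normalization factors a real consolidation tool would maintain.

    # Sketch: constraint-aware placement check (all field names are assumptions).

    def normalized_cpu_demand(app_cpu_util, source_factor, target_factor):
        """Express an application's CPU demand, measured on its source host, in units
        of the target host, using scaling factors relative to a reference machine."""
        return app_cpu_util * (source_factor / target_factor)

    def can_place(app, server, placed_apps, forbidden_pairs, forbidden_hosts,
                  max_apps, threshold=0.75):
        """Return True only if every constraint discussed above is satisfied."""
        # Bin-item incompatibility, e.g. a 64-bit application on a 32-bit machine.
        if (app["name"], server["name"]) in forbidden_hosts:
            return False
        # Item-item (co-location) incompatibility with anything already on the server.
        for other in placed_apps:
            if (app["name"], other["name"]) in forbidden_pairs or \
               (other["name"], app["name"]) in forbidden_pairs:
                return False
        # Cardinality constraint.
        if len(placed_apps) + 1 > max_apps:
            return False
        # Capacity thresholds, with CPU demand normalized to the target server's units.
        cpu_demand = normalized_cpu_demand(app["cpu"], app["cpu_factor"], server["cpu_factor"])
        used_cpu = sum(normalized_cpu_demand(a["cpu"], a["cpu_factor"], server["cpu_factor"])
                       for a in placed_apps)
        if used_cpu + cpu_demand > threshold * server["cpu_capacity"]:
            return False
        used_memory = sum(a["memory"] for a in placed_apps)
        return used_memory + app["memory"] <= threshold * server["memory_capacity"]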
Heuristic Approach
A number of algorithms have been proposed in the bin packing context for consolidating items into the minimum number of bins, and the area is a much researched topic. However, as already stated above, the server consolidation problem is a variant of the bin packing problem called the vector packing problem. We devised new algorithms based on new measures of fitment and efficiency to solve the vector packing problem. Our heuristic is based on the observation that the growth in resource consumption of the bin (target server) at the time of packing should ideally be dictated by the direction of the vector formed from the dimensions of the i-th bin (cpu_i, memory_i, disk_i); any divergence from this direction signifies wasted resource. Figure 1 shows the underlying motivation behind this heuristic graphically.
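One plausible way to turn this observation into a fitment measure (not necessarily the authors' exact formula) is to score a candidate placement by how closely the server's resulting usage vector stays aligned with its capacity vector, so that divergence, which strands capacity in some dimension, is penalized:

    # Sketch: a direction-alignment fitment score for a candidate placement.
    import math

    def cosine(u, v):
        """Cosine of the angle between two non-zero vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    def fitment_score(current_usage, app_demand, capacity):
        """Higher is better: the new usage vector should grow along the direction of the
        bin's capacity vector (cpu_i, memory_i, disk_i); divergence means one dimension
        fills up while the others are left stranded."""
        new_usage = [u + d for u, d in zip(current_usage, app_demand)]
        return cosine(new_usage, capacity)

    # Example: a memory-heavy application aligns well with a memory-rich server.
    print(fitment_score([10, 20, 5], [5, 30, 2], [100, 400, 50]))  # close to 1.0

At each packing step, the application/server pair with the highest such score would be chosen; other fitment measures can be plugged into the same loop.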
To take care of the multiple item-item
constraints we modelled the server consolidation
problem as a graph colouring problem and
provided a solution for the same. We extended
the basic algorithm to take care of the bin-item
incompatibility constraints and modelled the
problem with bin-item incompatibility constraints
as a pre-coloured graph colouring problem.
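The graph-colouring formulation can be read roughly as follows: applications are vertices, an edge joins every pair that must not share a server, and a colouring groups mutually compatible applications. The greedy routine below is a generic illustration of this idea, not necessarily the algorithm we implemented; extending it so that some vertices start with fixed colours gives the pre-coloured variant used for bin-item constraints.

    # Sketch: co-location constraints as graph colouring (greedy, illustrative).
    from collections import defaultdict

    def greedy_colouring(apps, conflicts):
        """apps: list of application names.
        conflicts: set of frozenset({a, b}) pairs that must not be co-located.
        Returns {app: colour}, where each colour is a group of mutually compatible apps."""
        adjacency = defaultdict(set)
        for pair in conflicts:
            a, b = tuple(pair)
            adjacency[a].add(b)
            adjacency[b].add(a)
        colouring = {}
        # Colour the most constrained applications first.
        for app in sorted(apps, key=lambda a: len(adjacency[a]), reverse=True):
            taken = {colouring[n] for n in adjacency[app] if n in colouring}
            colour = 0
            while colour in taken:
                colour += 1
            colouring[app] = colour
        return colouring

    conflicts = {frozenset({"AppA", "AppB"})}  # AppA and AppB must not share a host
    print(greedy_colouring(["AppA", "AppB", "AppC"], conflicts))
    # -> {'AppA': 0, 'AppB': 1, 'AppC': 0}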
In addition to these heuristic algorithms, we
investigated different meta-heuristic techniques
like genetic algorithms and ant colony optimization
and applied the same to this problem. Genetic
algorithm is a stochastic search algorithm
inspired by techniques from evolutionary biology.
Similarly, ant colony optimization is a probabilistic
technique for solving hard discrete optimization
problems inspired by the behaviour of ants. These nature-inspired algorithms are often grouped together as evolutionary and swarm intelligence techniques. The genetic algorithm uses the principles of survival of the fittest and genetic mutation from evolutionary biology to search for a globally optimal solution. Survival of the fittest is enforced during the crossover process, while genetic mutation enables a species to overcome drastic changes in environmental conditions. The algorithm encodes these two ideas in two operators called recombination and mutation.
Mutation helps to retain the genetic diversity
from one generation of chromosomes to the next.
The genetic algorithm maintains a set of feasible solutions, called a population, in which each feasible solution is represented by a chromosome. It uses the two operators to combine the existing (parent) population into a child population and propagates only the best solutions.

Figure 1: Rate of Growth in Resource Consumption for a Problem in Two Dimensions (Source: Infosys Research)

Likewise,
ant colony optimization uses the underlying
behaviour exhibited by ants to locate the shortest
path between two given points through collective
learning. Ants communicate amongst themselves
by leaving traces of a chemical called pheromone.
This in turn helps the community of ants to
realize collective learning. Initially, different ants take different routes probabilistically. However, as more and more ants progress towards their destination, the shortest route accumulates the highest pheromone concentration, which in turn leads to more and more ants using that route. We use these principles to devise algorithms that can determine the least number of target servers required for consolidation. Calculating provably optimal solutions using exact methods for extremely large problem instances is computationally infeasible. For example, it would take an impractically long time to calculate the minimum number of target servers required to consolidate 15,000 applications using methods that guarantee optimal solutions (in the current context, an optimal solution is one that results in the least number of target servers/bins). Stochastic search algorithms like genetic algorithms and ant colony optimization, on the other hand, do not guarantee optimal solutions but terminate very quickly and return near-optimal solutions. More often than not, these stochastic search algorithms return optimal solutions when they terminate, but the optimality of the solutions is not proven or guaranteed. Solutions generated by the meta-heuristic and heuristic algorithms show a great improvement over the naive solutions calculated through a manual exercise.
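A compressed sketch of how a genetic algorithm could search for low-server-count assignments is given below. The chromosome encoding (one gene per application holding a target-server index), the fitness function and all parameter values are illustrative assumptions rather than the implementation described above; an ant colony variant would replace the recombination step with pheromone-guided construction but follow the same evaluate-and-iterate loop.

    # Sketch: a tiny genetic algorithm over the assignment encoding (illustrative).
    import random

    def fitness(chromosome, apps, capacity, threshold=0.75):
        """Fewer target servers is better; an over-threshold server disqualifies the solution."""
        usage = {}
        for demand, server in zip(apps, chromosome):
            u = usage.setdefault(server, [0.0] * len(capacity))
            for d, value in enumerate(demand):
                u[d] += value
        if any(u[d] > threshold * capacity[d]
               for u in usage.values() for d in range(len(capacity))):
            return 0.0
        return 1.0 / len(usage)

    def evolve(apps, capacity, n_servers, pop_size=40, generations=200, mutation_rate=0.05):
        """apps: list of per-application demand vectors; capacity: per-dimension server capacity."""
        population = [[random.randrange(n_servers) for _ in apps] for _ in range(pop_size)]
        for _ in range(generations):
            population.sort(key=lambda c: fitness(c, apps, capacity), reverse=True)
            parents = population[: pop_size // 2]           # survival of the fittest
            children = []
            while len(children) < pop_size - len(parents):
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, len(apps))        # one-point recombination
                child = a[:cut] + b[cut:]
                for i in range(len(child)):                 # mutation keeps diversity
                    if random.random() < mutation_rate:
                        child[i] = random.randrange(n_servers)
                children.append(child)
            population = parents + children
        return max(population, key=lambda c: fitness(c, apps, capacity))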
Automating the consolidation process using state-of-the-art algorithmic techniques can help data centers reduce the number of servers while improving the overall utilization of the servers that remain. Unused CPU power can be utilized more efficiently. Additionally, consolidating multiple under-utilized applications onto a smaller number of high performing servers provides effective solutions for real-time infrastructure management problems.
CONCLUSION
Performance of next generation data centers can be enhanced significantly by analyzing the historical CPU and memory usage patterns of the applications and consolidating them in an intelligent way. Automating the consolidation process involves (i) executing an agent on each of the physical machines that can dynamically capture the resource utilization of the servers and detect under-utilization or over-utilization and (ii) an algorithm that can identify candidates for migration/consolidation and relocate them dynamically such that fewer servers are switched on at any point of time. There are, however, certain risks involved in automating the consolidation process: in the case of file or email servers, for example, the utilization parameters can vary drastically, which invalidates the projected consolidation results. Also, the output of the algorithm should be validated against known consolidation lists before being used in business enterprises. In spite of the risks associated with the automated process, the benefits of employing algorithms that realize automated consolidation far outweigh the costs that would be incurred in their absence.
REFERENCES
1. http://searchdatacenter.techtarget.com/sDefinition/0,,sid80_gci1070280,00.html
2. Gerald J Popek and Robert P Goldberg, Formal Requirements for Virtualizable Third Generation Architectures, Communications of the ACM, Vol 17, No 7, 1974. Available at http://delivery.acm.org/10.1145/370000/361073/p412-popek.pdf
3. Denise Dubie, Server Virtualization goes Mainstream, NetworkWorld, 2007. Available at http://www.networkworld.com/news/2007/021207-server-virtualization.html
4. Virtualization Case Studies, Microsoft. Available at http://www.microsoft.com/Virtualization/case-studies.mspx
5. Christopher Clark, Keir Fraser, Steven H, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt and Andrew Warfield, Live Migration of Virtual Machines, In Proceedings of the 2nd ACM/USENIX Symposium on Networked Systems Design and Implementation (NSDI), Boston, USA, 2005. Available at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.6685
6. Constantine P Sapuntzakis, Ramesh Chandra, Ben Pfaff, Jim Chow, Monica S Lam and Mendel Rosenblum, Optimizing the Migration of Virtual Computers, In Proceedings of the 5th Symposium on Operating Systems Design and Implementation, Boston, USA, 2002. Available at http://suif.stanford.edu/collective/osdi02-optimize-migrate-computer.pdf
7. Renato J Figueiredo, Peter A Dinda and Jose Fortes, A Case for Grid Computing on Virtual Machines, In Proceedings of the 23rd International Conference on Distributed Computing Systems, 2003. Available at http://virtuoso.cs.northwestern.edu/icdcs03.pdf
8. Katarzyna Keahey, Ian Foster, Timothy Freeman, Xuehai Zhang and Daniel Galron, Virtual Workspaces in the Grid, In Proceedings of the 11th Euro-Par Conference, Lisbon, Portugal, 2005. Available at http://workspace.globus.org/VirtualWorkspaces_EuroPar.pdf
9. GGF OGSA Glossary. Available at http://forge.ggf.org/sf/docman/do/downloadDocument/projects.ogsa-wg/docman.root.published_documents.glossary/doc13958/1
10. Yaron Haviv, Open Grid Forums, 2005. Available at http://www.ggf.org/GGF18/materials/404/BOF%20Grid%20and%20Virtualization%20Yaron%20Haviv.pdf
11. Consolidation in the Data Center, Sun Microsystems, 2002. Available at http://www.informit.com/articles/article.aspx?p=30315
12. Yasuhiro Ajiro and Atsuhiro Tanaka, A Combinatorial Optimization Algorithm for Server Consolidation, The 21st Annual Conference of the Japanese Society for Artificial Intelligence, 2007. Available at http://www.ai-gakkai.or.jp/jsai/conf/2007/data/pdf/100089.pdf
THE LAST WORD
Tide Tough Times with
Performance-driven Development
Development cannot happen in silos, oblivious to the needs of the customer. Consulting Editor Rajinder Gandotra strongly feels it is time to be driven by performance to wring the best value out of IT systems.
The current economic downturn has
accelerated business consolidation and
stepped up the need to innovate. Cost reductions,
faster time-to-market, better products and
differentiated services are now the key themes.
These, coupled with the information explosion through social networking and the addition of new users -- for example, cell phone users, who add many times the current number of data users -- are adding to the complexity of, and the demand for, high performance and scalability from IT systems.
There is no gainsaying the fact that
performance issues lead to losses in millions of
dollars and in some instances client relationships
too. In today's economic scenario the purpose of performance engineering is even more critical in ensuring that IT systems rightly support business objectives. The current industry paradigm is reactive and disjointed, though. For instance, whenever there is an issue around performance or scalability in a production scenario, a reactive paradigm is typically adopted: a "call all the vendors" or "add more hardware" kind of reaction. This reaction fails to solve the issue, as adding hardware is akin to treating the symptoms without addressing the core underlying problems. The stop-gap nature of the solution more often than not leads the underlying issues to manifest again, causing huge business losses.
This is especially crucial for businesses that are
characterized by cyclical demands.
It is commonly observed that application
vendors claim responsibility for optimizing the
performance of the application while hardware
vendors focus only on hardware and network
vendors focus on network alone. This results
in a sub-optimal solution from the customer's point of view, because no one entity bears the responsibility of optimizing the entire technology stack in this scenario, and performance issues go unresolved for long. Undoubtedly, this disjointed approach leads to serious loss of productivity and business. The solution lies in adopting a holistic approach where vendors bear responsibility for the entire technology stack.
Drawing from the brief discussion above,
I would like to reiterate that the current software
engineering paradigm is reactive, which certainly
needs to move to a higher level of maturity.
The last change in the software engineering life cycle was the inclusion of performance testing. This was an important change, as performance testing helps validate application performance before it goes to production and thus avoids surprises.

However, if serious performance and scalability issues, such as bad design or bad coding practices resulting in a non-scalable architecture, are discovered at the fag end of development, this leads to go/no-go decisions for letting the application go live. A no-go decision leads to lost business opportunity and, most often, a serious loss of reputation.
The need is to move to the next paradigm of software engineering, which can be called performance-driven development. A performance focus through a performance engineering governance office is suggested, which would help govern performance right from the business requirements gathering stage till the go-live stage in production. This will typically involve mapping performance engineering activities to each SDLC stage. For example, during the architecture definition/review stage, techniques like performance modeling can be used to predict the performance of an application before getting into serious development. The move to performance-driven development will also involve education and a change in developers' mindset, as they need to appreciate the performance and cost implications of the code they write, in addition to the business impact of the functionality.
The suggested shift in the software
engineering approach will result in higher
predictability and risk reduction by ensuring that
the business deadlines are met. This will also help
in cost reduction and productivity improvement
by ensuring optimal infrastructure cost. Better
transaction response time will result in end users
spending more time with their customers rather
than waiting for the screen to appear. Thus a shift from a reactive to a proactive mode is more likely to lead to higher performance and scalability, as it would ensure that mission-critical and business-critical programs meet their stated objectives.
To sum it up, a move to performance-driven development is paramount to realizing better value from technology investments and ensuring that IT systems are aligned to the organization's business and scalability goals.
Author Profile
Rajinder Gandotra is an AVP and Heads the
Performance Engineering and Enhancement (PE2)
practice at Infosys Systems Integration unit. He has
over 19 years of IT experience spanning Consulting,
Management, Strategy and Execution. Prior to
starting the PE2 practice, he set up the Technology
Consulting practice at Finacle, Infosys' Universal Banking solution. He has a Bachelor's Degree in Engineering in the field of Electrical and Electronics.
He can be reached at [email protected].
Index
Analytical Hierarchy Process, also AHP 20, 22
Application Programming Interface,
also API 33-34
Applications Under Test, also AUT 48
Automated Storage Management, also ASM 10
Cost of Quality, also CoQ 36, 50
CRM 26
Cycle Time, also CT 54, 57-61, 63
Design 26, 28, 35-36
Hardware Architecture 28
Patterns 26, 35-36
Physical Architecture 28
Enterprise Back Office, also EBO 32-33
Extract, Transform and Load, also ETL 24, 37-43
Finite Element Analysis 66
Global Temporary Table, also GTT 61-62
Grid Computing 31, 74
High Performance Computing, also HPC 66-67
Information Technology Infrastructure Library,
also ITIL 3-4, 7-8
Integration Competency Center, also ICC 28-29
Key Performance Indicator, also KPI 29, 39
Lintel 65
Logical Unit Numbers, also LUN 10
Management 4, 14, 19, 29, 33, 37, 46, 49, 51,
65-67, 73
Batch 33
Business Capacity 4
Component Capacity 4
Data 37
Framework 29, 51
Infrastructure 66-67, 73
Knowledge 29
Performance 14, 29, 46
Resource 66-67
Risk 19
Server 65
Service Capacity 4
Space 67
Model 4, 6, 11-12, 16-19, 21-22, 26, 29, 39, 43, 52
Analytical 21
Application 6
Architecture 29
Business 26
Data 39
Entry, Transaction, Validation and Exit,
also ETVX 52
Information 29
Lifecycle 4
Mathematical 12
Memory 12
Performance 16, 18-19, 21-22, 43
Predictive 11
Queuing 21
Workload 16-18, 22
Monte-Carlo 66
Online Transaction Processing,
also OLTP 16, 59-61
Proof of Concept, also PoC 25, 60, 62
Quality of Service,
also QoS 15, 21, 31-32, 34-36, 65
RADIEN 35-36
Rich Internet Application, also RIA 29
Right First Time, also RFT 54, 58
Service Level Agreement,
also SLA 29, 46-47, 49, 54-55, 59-60, 62, 66
Service Oriented Architecture,
also SOA 29, 31, 36
Simple Network Management Protocol,
also SNMP 13, 70
Software Development Life Cycle,
also SDLC 31-32, 46-49, 51-52, 64
Staged Event Driven Architecture,
also SEDA 32, 36
Statistical Process Control, also SPC 37, 41, 43
Storage Area Network, also SAN 9-10, 59, 61
Test Per Second, also TPS 11-12, 59-60
Testing 3, 11, 13, 16, 19, 26, 41-42, 46-49,
51-52, 56- 57
Continuous Performance 57
End-to-End 57
Environment 51
Individual 57
Load 11, 13
Performance 13, 16, 19, 26, 41-42,
47-49, 51, 56-57
Stress 3, 13, 46, 52
Volume and Endurance 46
User Acceptance Test, also UAT 57, 63
Vector Packing Problem, also VPP 70, 72
Virtual Machine, also VM 11, 66-67, 71, 74
Web 2.0 31
Virtualization 31, 65-68, 74
Wintel 65
SETLabs Briefings
BUSINESS INNOVATION through TECHNOLOGY
Editor
Praveen B Malla PhD
Consulting Editor
Rajinder Gandotra
Copy Editor
Sudarshana Dhar
Graphics & Web Editor
Ravishankar SL
IP Manager
K. V. R. S. Sarma
Program Manager
Rajib Das Sharma
ITLS Manager
Ajay Kolhatkar PhD
Marketing Manager
Vijayaraghavan T S
Production Manager
Sudarshan Kumar V S
Distribution Managers
Santhosh Shenoy
Suresh Kumar V H
How to Reach Us:
Email:
[email protected]
Phone:
+91-080-41173871
Fax:
+91-080-28520740
Post:
SETLabs Briefings,
B-19, Infosys Technologies Ltd.
Electronics City, Hosur Road,
Bangalore 560100, India
Subscription:
[email protected]
Rights, Permission, Licensing
and Reprints:
[email protected]
Editorial Office: SETLabs Briefings, B-19, Infosys Technologies Ltd.
Electronics City, Hosur Road, Bangalore 560100, India
Email: [email protected] http://www.infosys.com/setlabs-briefings
SETLabs Briefings is a journal published by Infosys Software Engineering
& Technology Labs (SETLabs) with the objective of offering fresh
perspectives on boardroom business technology. The publication aims at
becoming the most sought after source for thought leading, strategic and
experiential insights on business technology management.
SETLabs is an important part of Infosys' commitment to leadership
in innovation using technology. SETLabs anticipates and assesses the
evolution of technology and its impact on businesses and enables Infosys
to constantly synthesize what it learns and catalyze technology enabled
business transformation and thus assume leadership in providing best of
breed solutions to clients across the globe. This is achieved through research
supported by state-of-the-art labs and collaboration with industry leaders.
Infosys Technologies Ltd (NASDAQ: INFY) defines, designs and delivers
IT-enabled business solutions that help Global 2000 companies win in a
flat world. These solutions focus on providing strategic differentiation
and operational superiority to clients. Infosys creates these solutions
for its clients by leveraging its domain and business expertise along
with a complete range of services. With Infosys, clients are assured of a
transparent business partner, world-class processes, speed of execution
and the power to stretch their IT budget by leveraging the Global Delivery
Model that Infosys pioneered. To find out how Infosys can help businesses
achieve competitive advantage, visit www.infosys.com or send an email to
[email protected]
© 2009, Infosys Technologies Limited
Infosys acknowledges the proprietary rights of the trademarks and product names of the other companies
mentioned in this issue. The information provided in this document is intended for the sole use of the recipient
and for educational purposes only. Infosys makes no express or implied warranties relating to the information
contained herein or to any derived results obtained by the recipient from the use of the information in this
document. Infosys further does not guarantee the sequence, timeliness, accuracy or completeness of the
information and will not be liable in any way to the recipient for any delays, inaccuracies, errors in, or omissions
of, any of the information or in the transmission thereof, or for any damages arising there from. Opinions and
forecasts constitute our judgment at the time of release and are subject to change without notice. This document
does not contain information provided to us in confidence by our clients.
Authors featured in this issue
ANAGA MAHADEVAN
Anaga Mahadevan is a Project Manager with Infosys Business Intelligence practice. She can be reached at
[email protected].
ARUN KUMAR
Arun Kumar is a Project Manager with Infosys BPM-EAI practice. He can be contacted at [email protected].
ASHUTOSH SHINDE
Ashutosh Shinde is a Senior Technical Architect with Infosys Performance Engineering and Enhancement practice.
He can be reached at [email protected].
BRIJESH DEB
Brijesh Deb is a Senior Technical Architect at Infosys J2EE Center of Excellence. He can be contacted at
[email protected].
BRUNO CALVER
Bruno Calver is a Consultant with Infosys Infrastructure Management Services, specifically part of the Process
Consulting Group. He can be reached at [email protected].
GANESH KUMAR
Ganesh Kumar is a Group Project Manager with Infosys Banking and Capital Markets Unit. He can be contacted at
[email protected].
HARI SHANKAR SUDHAKARAN
Hari Shankar Sudhakaran is a Project Manager with Infosys Business Intelligence practice. He can be reached at
[email protected].
MANOGNA CHEBIYYAM
Manogna Ramakrishna Chebiyyam is a Software Engineer at SETLabs, Infosys. She can be contacted at
[email protected].
MANOJ MACWAN
Manoj Macwan is a Software Engineer with the Infosys CMEX business unit. He can be reached at
[email protected].
RAJAT GUPTA
Rajat Gupta is a Software Engineer with Infosys CMEX practice. He can be contacted at [email protected].
RAJIB DAS SHARMA
Rajib Das Sharma is a Principal Architect with Infosys Performance Engineering and Enhancement practice. He can
be reached at [email protected].
RASHI MALVIYA
Rashi Malviya is a Software Engineer at SETLabs, Infosys. She can be contacted at [email protected].
SANDIP JAJU
Sandip Jaju is a Software Engineer with Infosys CMEX practice. He can be reached at [email protected].
SATHYA NARAYANAN NAGARAJAN
Sathya Narayanan Nagarajan is a Senior Technical Architect with Communications, Media and Entertainment unit of
Infosys. He can be contacted at [email protected].
SHAILESH BHATE
Shailesh Bhate is a Project Manager with Infosys CMEX practice. He can be reached at [email protected].
SHYAM KUMAR DODDAVULA
Shyam Kumar Doddavula is a Principal Architect at Infosys J2EE Center of Excellence. He can be contacted at
[email protected].
SRIKANTH CHANDRASEKHAR
Srikanth Chandrasekhar is a Project Manager with Infosys in the Banking and Capital Markets unit. Srikanth can be
reached at [email protected].
SUNDAR KRISHNAMURTHI
Sundar Krishnamurthi, PMP is a Project Manager with Infosys in the Banking and Capital Markets unit. He can be
contacted at [email protected].
SUNDARAVADIVELU VAJRAVELU
Sundaravadivelu Vajravelu is a Senior Project Manager at Communications, Media and Entertainment unit of Infosys.
He can be reached at [email protected].
SRIKANTH SUNDARRAJAN
Srikanth Sundarrajan is a Senior Architect with the Grid Computing team at SETLabs, Infosys. He can be contacted
at [email protected].
SUMIT KUMAR BOSE
Sumit Kumar Bose is a Research Associate with the Grid and Computing team at SETLabs, Infosys. He can be
reached at [email protected].
YIQUN GE
Yiqun Ge PhD is a Senior Technical Architect with Infosys Performance Engineering and Enhancement Group
practice. Yiqun can be contacted at [email protected].
At SETLabs, we constantly look for opportunities to leverage
technology while creating and implementing innovative business
solutions for our clients. As part of this quest, we develop engineering
methodologies that help Infosys implement these solutions right first
time and every time.
Subu Goparaju
Vice President
and Head of SETLabs
For information on obtaining additional copies, reprinting or translating articles, and all other correspondence,
please contact:
Telephone : 91-80-41173871
Email: [email protected]
SETLabs 2008, Infosys Technologies Limited.
Infosys acknowledges the proprietary rights of the trademarks and product names of the other
companies mentioned in this issue of SETLabs Briefings. The information provided in this document
is intended for the sole use of the recipient and for educational purposes only. Infosys makes no
express or implied warranties relating to the information contained in this document or to any
derived results obtained by the recipient from the use of the information in the document. Infosys
further does not guarantee the sequence, timeliness, accuracy or completeness of the information and
will not be liable in any way to the recipient for any delays, inaccuracies, errors in, or omissions of,
any of the information or in the transmission thereof, or for any damages arising there from. Opinions
and forecasts constitute our judgment at the time of release and are subject to change without notice.
This document does not contain information provided to us in confidence by our clients.