.NET-WebSphere/J2EE Comparison Report
http://www.MiddlewareRESEARCH.com
David Herst with William Edwards and Steve Wilkes
September 2004
[email protected]
1 Disclosures
1.1 Research Code of Conduct
Our research is credible. We publish only what we believe and can stand behind.
Our research is honest. To the greatest extent allowable by law we publish the
parameters, methodology and artifacts of a research endeavor. Where the research
adheres to a specification, we publish that specification. Where the research produces
source code, we publish the code for inspection. Where it produces quantitative results,
we fully explain how they were produced and calculated.
Our research is community-based. Where possible, we engage the community and
relevant experts for participation, feedback, and validation.
If the research is sponsored, we give the sponsor the opportunity to prevent publication if they
deem that publishing the results would harm them. This policy allows us to preserve our
research integrity, and simultaneously creates incentives for organizations to sponsor creative
experiments as opposed to scenarios they can win.
This Code of Conduct applies to all research conducted and authored by The Middleware
Company, and is reproduced in all our research reports. It does not apply to research products
conducted by other organizations that we may publish or mention because we consider them of
interest to the community.
1.2 Disclosure
This study was commissioned by Microsoft.
The Middleware Company has in the past done other business with both Microsoft and IBM.
Microsoft commissioned The Middleware Company to perform this study on the expectation
that we would remain vendor-neutral and therefore unbiased in the outcome. The Middleware
Company stands behind the results of this study and pledges its impartiality in conducting this
study.
1.3 Why are we doing this study? What is our agenda?
First, what our agenda is not: It is not to demonstrate that a particular company, product,
technology, or approach is better than others.
Simple words such as "better" or "faster" are gross and ultimately useless generalizations. Life,
especially when it involves critical enterprise applications, is more complicated. We do our best
to openly discuss the meaning (or lack of meaning) of our results and go to great lengths to
point out the several cases in which the result cannot and should not be generalized.
Our agenda is to provide useful, reliable, and profitable research and consulting services to our
clients and to the community at large.
To help our clients in the future, we believe we need to be experienced in and be proficient in a
number of platforms, tools, and technologies. We conduct serious experiments such as this one
because they are great learning experiences, and because we feel that every technology
consulting firm should conduct some learning experiments to provide their clients with the best
value.
If we go one step further and ask technology vendors to sponsor the studies (with both
expertise and expenses), if we involve the community and known experts, and if we document
and disclose what we're doing, then we can:
1.4 Does a sponsored study always produce results favorable to the sponsor?
No.
Our arrangement with sponsors is that we will write only what we believe, and only what we can
stand behind, but we allow them the option to prevent us from publishing the study if they feel it
would be harmful publicity. We refuse to be influenced by the sponsor in the writing of this
report. Sponsorship fees are not contingent upon the results. We make these constraints clear
to sponsors up front and urge them to consider the constraints carefully before they
commission us to perform a study.
2 TABLE OF CONTENTS

1 DISCLOSURES
  1.1 Research Code of Conduct
  1.2 Disclosure
  1.3 Why are we doing this study? What is our agenda?
  1.4 Does a sponsored study always produce results favorable to the sponsor?
2 TABLE OF CONTENTS
3 EXECUTIVE SUMMARY
  3.1 The Teams
  3.2 The System
  3.3 The Implementations
  3.4 Developer Productivity Results
  3.5 Configuration and Tuning Results
  3.6 Performance Results
  3.7 Reliability and Manageability Results
4 INTRODUCTION
  4.1 How this Report is Organized
  4.2 Goals of the Study
  4.3 The Approach
  4.4 The ITS System
  4.5 Development Environments Tested
  4.6 Application Platform Technologies Tested
  4.7 Application Code Availability
5 THE EVALUATION METHODOLOGY
  5.1 The Teams
    5.1.1 The IBM WebSphere Team
    5.1.2 The Microsoft .NET Team
  5.2 Controlling the Laboratory and Conducting the Analysis
  5.3 The Project Timeline
    5.3.1 Target Schedule
    5.3.2 Division of Lab Time Between the Teams
    5.3.3 Detailed Schedule
  5.4 Laboratory Rules and Conditions
    5.4.1 Overall Rules
    5.4.2 Development Phase
    5.4.3 Deployment and Tuning Phase
    5.4.4 Testing Phase
  5.5 The Evaluation Tests
6 THE ITS PHYSICAL ARCHITECTURE
  6.1 Details of the WebSphere Architecture
    6.1.1 IBM WebSphere
    6.1.2 IBM HTTP Server (Apache)
    6.1.3 IBM Edge Server
    6.1.4 IBM WebSphere MQ
  6.2 Details of the .NET Architecture
    6.2.1 Microsoft Internet Information Services (IIS)
    6.2.2 Microsoft Network Load Balancing (NLB)
    6.2.3 Microsoft Message Queue (MSMQ)
7 TOOLS CHOSEN
  7.1 Tools Used by the J2EE Team
    7.1.1 Development Tools
      7.1.1.1 Rational Rapid Developer Implementation
      7.1.1.2 WebSphere Studio Application Developer Implementation
    7.1.2 Analysis, Profiling and Tuning Tools
  7.2 Tools Used by the .NET Team
    7.2.1 Development Tools
    7.2.2 Analysis, Profiling and Tuning Tools
8 DEVELOPER PRODUCTIVITY RESULTS
  8.1 Quantitative Results
    8.1.1 The Basic Data
    8.1.2 .NET vs. RRD
    8.1.3 .NET vs. WSAD
  8.2 RRD Development Process
    8.2.1 Architecture Summary
      8.2.1.1 RRD Applications
      8.2.1.2 Database Access
      8.2.1.3 Overall Shape of the Code
      8.2.1.4 Distributed Transactions
    8.2.2 What Went Well
      8.2.2.1 Web Interfaces
      8.2.2.2 Web Service Integration
    8.2.3 Significant Technical Roadblocks
      8.2.3.1 Holding Data in Sessions
      8.2.3.2 Web Service Integration
      8.2.3.3 Configuring and Using WebSphere MQ
      8.2.3.4 Handling Null Strings in Oracle
      8.2.3.5 Building the Handheld Module
      8.2.3.6 Miscellaneous RRD Headaches
  8.3 WSAD Development Process
    8.3.1 Architecture Summary
      8.3.1.1 Overall Shape of the Code
      8.3.1.2 Distributed Transactions
      8.3.1.3 Organization of Applications in WSAD
    8.3.2 What Went Well
      8.3.2.1 Navigating the IDE
      8.3.2.2 Building for Deployment
      8.3.2.3 Testing in WebSphere
      8.3.2.4 Common Logic in JSPs
    8.3.3 Significant Technical Roadblocks
      8.3.3.1 XA Recovery Errors from Server
      8.3.3.2 Miscellaneous WSAD Headaches
  8.4 Microsoft .NET Development Process
    8.4.1 .NET Architecture Summary
      8.4.1.1 Organization of .NET Applications
      8.4.1.2 Database Access
      8.4.1.3 Distributed Transactions
      8.4.1.4 ASP.NET Session State
    8.4.2 What Went Well
    8.4.3 Significant Technical Roadblocks
      8.4.3.1 Transactional MSMQ Remote Read
    8.4.4 Miscellaneous .NET Headaches
      8.4.4.1 DataGrid Paging
      8.4.4.2 Web Services Returning DataSets
      8.4.4.3 The Mobile Application
      8.4.4.4 Model Object Class Creation
9 CONFIGURATION AND TUNING RESULTS
10 WEBSPHERE CONFIGURATION AND TUNING PROCESS SUMMARY
  10.1 RRD Round: Installing Software
    10.1.1 Starting Point
    10.1.2 Installing WebSphere Network Deployment
    10.1.3 Installing IBM HTTP Server
    10.1.4 Installing IBM Edge Server
  10.2 RRD Round: Configuring the System
    10.2.1 Configuring JNDI
    10.2.2 Configuring the WebSphere Web Server Plugin
  10.3 RRD Round: Resolving Code Bottlenecks
    10.3.1 Rogue Threads
    10.3.2 Optimizing Database Calls
    10.3.3 Optimizing the Web Service
    10.3.4 Paging Query Results
    10.3.5 Caching JNDI Objects
    10.3.6 Using DTOs for Work Tickets
    10.3.7 Handling Queues in Customer Service Application
  10.4 RRD Round: Tuning the System for Performance
    10.4.1 Tuning Strategy
    10.4.2 Performance Indicators
    10.4.3 Tuning the JVM
      10.4.3.1 Garbage Collection
      10.4.3.2 Heap Size
    10.4.4 Vertical Scaling
    10.4.5 Database Tuning
    10.4.6 Tuning JDBC Settings
    10.4.7 Web Container Tuning
      10.4.7.1 Web Thread Pool
      10.4.7.2 Maximum HTTP Sessions
    10.4.8 Web Server Tuning
    10.4.9 Session Persistence
  10.5 WSAD Round: Issues
    10.5.1 Use of External Libraries and Classloading in WebSphere
    10.5.2 Pooling Objects
    10.5.3 Streamlining the Web Service I/O
    10.5.4 Optimizing Queries
  10.6 Significant Technical Roadblocks
    10.6.1 Switching JVMs with WebSphere
    10.6.2 Configuring Linux for Edge Server, Act 1
    10.6.3 Configuring Linux for Edge Server, Act 2
    10.6.4 Configuring Linux for Edge Server, Act 3
    10.6.5 Configuring JNDI for WebSphere ND
    10.6.6 Edge Server's Erratic Behavior
    10.6.7 Session Persistence
      10.6.7.1 Persisting to a Database
      10.6.7.2 In-Memory Replication
      10.6.7.3 Tuning Session Persistence
    10.6.8 Hot Deploying Changes to an Application
    10.6.9 Configuring for Graceful Failover
      10.6.9.1 Failover Requirements
      10.6.9.2 Standard Topology
      10.6.9.3 Non-Standard Topology
      10.6.9.4 Modified Standard Topology
    10.6.10 Deploying the WSAD Web Service
    10.6.11 The Sudden, Bizarre Failure of the Work Order Application
    10.6.12 Using Mercury LoadRunner
11 .NET CONFIGURATION AND TUNING PROCESS SUMMARY
  11.1 Installing and Configuring Software
    11.1.1 Network Load Balancing (NLB)
    11.1.2 ASP.NET Session State Server
  11.2 Resolving Code Bottlenecks
  11.3 Base Tuning Process
    11.3.1 Tuning the Database
    11.3.2 Tuning the Web Applications
    11.3.3 Tuning the Servers
    11.3.4 Tuning the Session State Server
    11.3.5 Code Modifications
    11.3.6 Tuning Data Access Logic
    11.3.7 Tuning Message Processing
    11.3.8 Other Changes
    11.3.9 Changes to Machine.config
    11.3.10 Changes Not Pursued
  11.4 Significant Technical Roadblocks
    11.4.1 Performance Dips in Web Service
    11.4.2 Lost Session Server Connections
12 PERFORMANCE TESTING
  12.1 Performance Testing Overview
  12.2 Performance Test Results
    12.2.1 ITS Customer Service Application
    12.2.2 ITS Work Order Web Application
    12.2.3 Integrated Scenario
    12.2.4 Message Processing
  12.3 Conclusions from Performance Tests
13 MANAGEABILITY TESTING
  13.1 Manageability Testing Overview
  13.2 Manageability Test Results
    13.2.1 Change Request 1: Changing a Database Query
    13.2.2 Change Request 2: Adding a Web Page
    13.2.3 Change Request 3: Binding a Web Page Field to a Database
  13.3 Conclusions from Manageability Tests
14 RELIABILITY TESTING
  14.1 Reliability Testing Overview
  14.2 Reliability Test Results
    14.2.1 Controlled Shutdown Test
    14.2.2 Catastrophic Hardware Failure Test
    14.2.3 Loosely Coupled Test
    14.2.4 Long Duration Test
  14.3 Conclusions from Reliability Tests
15 OVERALL CONCLUSIONS
16 APPENDIX: RELATED DOCUMENTS
17 APPENDIX: SOURCES USED
  17.1 Sources Used by the IBM WebSphere Team
  17.2 Sources Used by the Microsoft .NET Team
18 APPENDIX: SOFTWARE PRICING DATA
  18.1 IBM Software
  18.2 Microsoft Software
3 EXECUTIVE SUMMARY
This study compares the productivity, performance, manageability and reliability of an IBM
WebSphere/J2EE system running on Linux to that of a Microsoft .NET system running on
Windows Server 2003.
[Summary rating tables: Development Productivity, Tuning Productivity, Performance and Manageability, each compared across .NET vs. RRD, .NET vs. WSAD, and RRD vs. WSAD; the individual ratings are not recoverable here.]
4 INTRODUCTION
Previous studies by The Middleware Company have compared tools or platforms on the basis
of one criterion or another, such as developer productivity, ease of maintenance or application
performance.
This study compares two enterprise application platforms, Microsoft .NET and IBM
WebSphere/ J2EE, across a full range of technical criteria: developer productivity, application
performance, application reliability, and application manageability.
Towards that end, TMC has published the methodology used and the source code for both the
.NET and J2EE application implementations for public download and scrutiny. Customers can
review and comment on the methodology, examine the code, and even repeat the tests in their
own testing environment.
Section 1 discloses the conditions under which The Middleware Company conducted this
study, including our research code of conduct and our policy regarding sponsored studies such
as this one.
Section 3 gives a brief, high-level summary of the study and its results.
Section 7 describes the tools that each team used during the different phases of the study. In
particular, since the J2EE team built two implementations of the system using two different
development tools, this section compares the two IDEs.
This study is the first of its kind to measure all of these criteria, using a novel evaluation
approach. While we expect the study to spark controversy, we also hope it will fulfill two
important goals:
- Provide valuable insight into the Microsoft .NET and IBM WebSphere/J2EE development platforms.
- Suggest a controlled, hands-on evaluation approach that organizations can use to structure their own comparisons and technical evaluations of competing vendor offerings.
4.3 The Approach
This study took the approach of simulating a corporate evaluation scenario. In it, a
development team is tasked with building, deploying and testing a pilot B2B integrated
application in a fixed amount of time, after which we evaluate the results the team was able to
achieve in this time period.
In the study we executed the scenario three times, once using Microsoft .NET 1.1 running on
the Windows 2003 platform, and twice using IBM WebSphere 5.1 running on the Red Hat
Enterprise Linux AS 2.1 platform. (The latter two cases differed in the development tool used;
more on this in Section 4.5.)
We assembled two teams, one for each platform, with comparable skill levels on their
respective platforms. Each team consisted of senior developers experienced in enterprise
application architecture, application development, and/or performance tuning. The rules limited
each team to no more than two members in the lab at any time, but did not require the same
two members for all phases of the exercise.
The IBM WebSphere/J2EE team consisted of three senior developers from The Middleware
Company with 16 years' combined experience in J2EE. Two of these developers built
both J2EE implementations, and all three participated at different times in the deployment,
tuning and testing phases. For the installation, deployment and initial tuning of the WebSphere
platform, the J2EE team also used two independent, WebSphere-certified consultants with a
total of 7 years' experience with the WebSphere platform.
The Microsoft .NET team consisted of three senior developers from Vertigo Software, a
California-based Microsoft Solution Provider, with a combined 10 years' experience building
software on Microsoft .NET.
The Middleware Company took pains to keep the study free of vendor influence.
It is important to note that this study represents what the development teams could achieve
using only publicly available technical materials and vendor support channels for their platform.
It does not represent what the vendors themselves might have achieved, nor what each team
might have achieved if given a longer development and tuning schedule or allowed direct
interaction with vendor consultants. Therefore, the resulting applications developed by the two
teams may not fully represent vendor best practices or vendor-approved architectures. Rather,
they reflect what customers themselves might achieve if tasked with independently building
their own custom application using publicly available development patterns, technical guidance
and vendor support channels.
4.4 The ITS System
The ITS system serves ITS-FMC, a facilities management company whose corporate clients
use a Web-based application to create and track work order requests for facilities management
on their corporate premises.
The ITS system comprises three core subsystems that operate together in both a loosely
coupled fashion (via messaging) and a tightly coupled fashion (via synchronous Web Service
requests):
The ITS Customer Service Application. ITS-FMC's corporate clients use this Web-based
application to create and track work order requests for facilities management at their
premises. The application automatically dispatches work order requests via messaging to
the central ITS system, which operates across the Internet on a separate ITS-FMC internal
network. The ITS Customer Service Application also allows customers to track the status
of their work orders via Web service calls to the ITS central system, as well as view/modify
customer and user information.
The ITS Central Work Order Processing Application. This application is operated by
ITS-FMC itself on a separate corporate network. The application receives incoming work
order requests (as messages) from the ITS Customer Service Application. It places the
requests into a database for further business processing, including assignment to a specific
on-site technician. The application hosts the Web service that returns work order status
and historical information to the ITS Customer Service Application. Additionally, this
application has a Web user interface that ITS-FMC's central dispatching clerks can use to
search, track and update work order requests, as well as query customer information and
query/modify technician data.
The Technician Work Order Mobile Device Application. This application operates on a
handheld device, allowing technicians to retrieve their newly assigned work items and
update work order status as they complete their work orders at the customer premises.
Technicians use this application for dispatching purposes, and to log the time spent
working on an issue so that customer billing can occur.
The following diagram illustrates these three subsystems and their interactions:
[Diagram: the ITS Customer Service Application, the ITS Work Order Message Queue Server, the ITS Work Order Processing Application, and the Technician Mobile Device Application, linked by B2B Internet connectivity.]
4.5 Development Environments Tested
While there are .NET development tools from third-party vendors such as Borland, the vast
majority of .NET development is done using Visual Studio .NET from Microsoft. This is the
development environment that the .NET team used to produce its implementation.
The J2EE world, on the other hand, offers many competing development tools with different
approaches and advantages. Even within IBM's domain, choices exist. To reflect this range of
offerings and enhance the study's usefulness, we had the J2EE team develop two different
implementations of the ITS system using two different IBM tools: Rational Rapid Developer
(RRD) and WebSphere Studio Application Developer (WSAD). Since both IDEs belong to IBM
and are designed to work well with WebSphere, they are both consistent with the study's focus
on the IBM WebSphere platform. But the two IDEs have important differences that ultimately
led to different results.
Details on these tools, how they compare, how they were used, and other development
software used with them can be found in Section 7.
Finally, customers and vendors can discuss the report, propose further testing, or offer
comments by emailing The Middleware Company at: [email protected].
5 THE EVALUATION METHODOLOGY
This study was designed to simulate two enterprise development teams given a fixed amount of
time to build and tune a working pilot application according to a set of business and technical
requirements. One team developed the application using IBM WebSphere running on Linux,
while the other team developed the application using Microsoft .NET running on Windows
2003.
Development took place in a controlled laboratory environment where the time taken to
complete the system was carefully measured. The two teams worked from a common
application specification derived from a set of business and technical requirements. Neither
team had access to the specification until development started in the controlled lab setting.
After developing an implementation, the team then tuned and configured it as part of a
measured deployment phase. Each implementation was then put through a set of basic
performance, manageability and reliability tests while running under load on the production
equipment. Hence this study not only compares the relative productivity achieved by each
development team, but also captures the base performance, manageability and reliability of
each application in a deployed production environment.
It is extremely important to note that the study allocated a fixed amount of time to each phase
of the project, and hence objectively documents what each team was able to achieve in this
fixed amount of time,[1] inclusive of detailed notes documenting technical roadblocks
encountered by each team and how these were resolved. As such, the study tells an
interesting story that will undoubtedly spark much debate, but also sheds valuable light on each
platform based on actual hands-on development and testing of a pilot business application.
Neither team included any representative from either IBM or Microsoft, and neither team was
allowed any direct interaction with vendor technicians from IBM or Microsoft other than the
standard online customer support channels available to any customer. In cases where a team
used a vendor support channel, support technicians were not told they were assisting a
research project conducted by The Middleware Company; so the team received only the
standard treatment afforded any developer on these channels.
To mirror the development process of a typical corporate development team, we allowed the
teams to consult with other members of their organizations outside the lab, to answer technical
questions and provide guidance as required. Such access to external resources was
monitored and logged, and we extended the rule prohibiting direct vendor interactions (other
than with standard customer support channels) to all resources contacted during the
development and testing phases of the project.
Here are details on the makeup and experience of the two teams.
[1] Note that under certain circumstances we allowed a team to go beyond that fixed time period. See Section 5.3.1 for details.
5.1.1 The IBM WebSphere Team
The WebSphere team consisted of three developers from The Middleware Company, described
in the following table. Members A and B developed both the RRD and WSAD implementations,
while all three members participated at different times in the tuning and testing phases.
Member | Development (years) | Java (years) | J2EE (years) | Other Relevant Experience
B | 15 | 8 | 6* | Experienced in RRD, modeling and design.
C | 23 | 8 | 6* | Extensive experience in tuning enterprise applications for performance.
[Row for Member A not recoverable.]
* Includes experience with the Java servlet API predating the introduction of J2EE in 1999.
Additionally, the J2EE team used two independent, IBM-certified WebSphere consultants at
different times during the deployment and tuning phase.
- One had three years' experience as a WebSphere administrator on various Unix platforms, including Linux.
- The other had over four years' experience installing, configuring and supporting IBM WebSphere on multiple platforms, including Linux.
5.1.2 The Microsoft .NET Team
Member | Development (years) | Microsoft Platform (years) | .NET (years) | Other Relevant Experience
A | 7 | 7 | 3 | Experienced in Web application development and design.
C | 7 | 5 | 3 | Experienced in development and performance tuning.
[Row for Member B not recoverable.]
5.2 Controlling the Laboratory and Conducting the Analysis
The Middleware Company subcontracted a third-party testing organization, CN2 Technology, to
write a specification for the ITS system, set up the lab environment, design the tests, monitor
and control the testing environment, and conduct the actual tests of the J2EE and .NET
implementations. CN2 strictly monitored the time spent by each development team on the
various phases of the project, and controlled the lab environment. CN2 also strictly monitored
Internet access and email access, including logging all such access from within the lab, to
ensure that neither team violated the rules of the lab.
5.3 The Project Timeline
5.3.1 Target Schedule
Phase 1 (Development): 10 days
Phase 2 (Deployment and Tuning): 10 days
Phase 3 (Testing): final week of the exercise
While we felt confident that the teams could complete Phases 1 and 3 in the allotted time, we
were less certain about Phase 2. If, after ten days of deployment and tuning, the
implementation did not perform up to even minimal standards, the results of formal testing in
Phase 3 would have little meaning.
So we added a requirement that each team continue their configuration and performance
tuning until satisfied that their implementation would perform well enough to actually undergo
the tests in the final week. This meant that each team was allowed to go beyond their allotted
ten days if they desired, with the understanding that all time spent would be monitored and
reported.
5.3.2 Division of Lab Time Between the Teams
To keep the two teams from communicating with each other, while at the same time preserving
the continuity of their work, we interleaved their time in the lab in the following sequence:
.NET 1: Development
RRD 1: Development
WSAD* 1: Development
5.3.3 Detailed Schedule

Desired Development and Testing Schedule (established prior to start of exercise)

Schedule Timeline | Task/Event | Description
Phase 1: Development
Day 1 (1 hour) | Overview of lab rules and hardware environment. | Team was introduced to the lab environment for the first time, lab rules were explained, and a walkthrough of the hardware was conducted.
[Intervening rows not recoverable.]
Day 11 | Review of base performance, manageability and reliability tests and requirements, including review of Mercury LoadRunner test scripts and test tool. | CN2 reviewed with the team the tests to be performed and technical requirements/goals for these tests. CN2 provided a walkthrough of the Mercury LoadRunner testing environment and base test scripts so the team could begin configuring and tuning.
Days 11-20+ | Application performance and configuration tuning. | Ten 8-hour days were initially allotted for tuning in preparation for evaluation tests. However, the team was allowed more time if required to ensure they felt ready to conduct the actual tests.
5.4 Laboratory Rules and Conditions
5.4.1 Overall Rules
- Team members could only use the provided machines for development work and Internet access. Personal laptops were barred from the lab.
- Each day was limited to 8 hours working time in the lab, with an additional hour for lunch.
- The team could seek technical support and guidance from other members of their organization outside the lab as required. They could communicate via telephone or email.
- Neither team members nor their offsite colleagues could have any interaction with vendor technicians from IBM or Microsoft, other than through standard online customer support channels.
- If they did use vendor support channels, team members could not reveal that they were participating in a study involving IBM and Microsoft software; they received only the standard treatment afforded any developer on these channels.
5.4.2 Development Phase
Note, however, that the WSAD implementation was developed after the RRD implementation,
and was developed offsite, not in the controlled lab environment.
Each team's lab environment initially included:
- A development machine for each developer, pre-configured with Windows XP and Internet access.
- Two machines with the two ITS databases pre-installed and pre-populated with data. The database server was Microsoft SQL Server for the .NET team, Oracle for the WebSphere team.
- Four application server machines pre-configured with the base OS installation only (Windows Server 2003 for the .NET team, Red Hat Enterprise Linux 2.1 for the WebSphere team).
As for augmenting or modifying this initial environment, both teams were under the same
restrictions:
- They had to install/configure their development environment (tools, source control, etc.) as part of the measured time to complete the application development phase.
- They had to install the application server software separately on each server as part of the measured development time.
- They could not make changes to the database schemas, other than adding functions, stored procedures, or indexes.
This rule applied specifically to coding of the RRD and .NET implementations:
- Team members were not allowed to work on code outside the lab. This meant they could not remove code from or bring code into the lab.
For all implementations (RRD, WSAD and .NET) this rule applied:
- We allowed use of publicly available sample code and publicly available pre-packaged libraries, since a typical corporate development team would also have access to such code.
5.4.4 Testing Phase
The rules for Phase 3 were the most restrictive, since this phase consisted of the formal
evaluation tests conducted by the CN2 auditor:
- Team members could not modify application code or system configurations except as needed during a test.
- After a load test was launched, the team would have to leave the lab until the test reached completion (typically 1-4 hours later).
Most of the tests were performed under load. As mentioned above, in this study Mercury
LoadRunner running on 40 client machines was used to simulate load. CN2 provided the
teams with a set of LoadRunner scripts for each implementation.
The three sets of scripts were carefully constructed to perform the same set of actions,
ensuring that they tested the exact same functionality for each implementation in a consistent
manner.[2]
Here is a summary of the tests performed; for more details and for test results see Sections 12 to 14.
- Performance capacity (stress test). How many users can the system handle before response times become unacceptable or errors occur at a significant rate?
- Performance reliability. Given a reasonable load (based on the results of the stress test), how reliably does the system perform over a sustained period (say, 12 hours)?
- Efficiency of message processing. How quickly can the Work Order module process a backlog of messages in the queue?
- Ease of implementing change requests. How quickly and easily can a developer implement a requested change to the specification?
- Ease and reliability of planned maintenance. How easily and seamlessly can system updates be deployed to the system while under load?
- Graceful failover. How well does the clustered Customer Service module respond when an instance goes down?
- Session sharing under load. If one of the clustered Customer Service instances fails under load, are the sessions that were handled by the failed instance seamlessly resumed by the other Customer Service instance?
[2] CN2 could not provide a single set of scripts for all three implementations because the three differed in certain low-level details, such as the URL of a given page, the names of fields in that page, and whether that page was to be invoked with GET or POST.
6 THE ITS PHYSICAL ARCHITECTURE
This section describes the hardware and software infrastructure each team used to run its
implementation of the ITS system.
The specification required that the teams deploy to identical hardware; in fact, they used the
same hardware. On the machines hosting the applications and the message server, each team
had its own removable hard drive that was swapped in. On the machines hosting databases,
the two teams' DBMSs shared the same drive, but were never run simultaneously. In this way
all three implementations used the very same processors, memory and network hardware.
On the software side, the teams started with the operating systems and database engines
already installed. They were responsible for installing the application server, message server,
load balancing software and handheld device software.
This table lists the hardware and software used by each team:
ITS Subsystem | Servers | Hardware | .NET Software | J2EE Software
[Table rows not recoverable.]
The following diagram shows the physical deployment of the ITS system to the network,
including all the machines listed above. It also shows the machine hosting the Mercury
LoadRunner controller and the 40 machines providing client load.
6.1 Details of the WebSphere Architecture
6.1.1 IBM WebSphere
Initially the team included three nodes in the WebSphere network: the two Customer Service
machines and the single Work Order machine. Later they included the Message Queue Server
machine as well, so that they could run a WebSphere instance there for sharing session state
in the Customer Service application.
In terms of WebSphere instances, the team started with one per node. Along the way they
experimented with multiple instances per node (for example, to run each Work Order module in
a dedicated instance), but found no improvement and returned to the original configuration.
6.1.2 IBM HTTP Server (Apache)
Using an external Web server necessitates the use of IBM's Web Server Plugin, an interface
between the Web server and the WebSphere HTTP transport. The plugin consists of a native
runtime library and an XML configuration file, plugin-cfg.xml. Applying the plugin consists of
these steps:
Note that along the way the team found reason to customize the plugin configuration in ways
not possible through the Deployment Manager. That meant they departed from the normal
plugin configuration update process described in Step 3. For details, see Section 10.6.9.4.
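To make the plugin's role concrete, below is a skeletal sketch of the general shape of a plugin-cfg.xml; the cluster, server, host and URI names are hypothetical, and a real file generated by WebSphere's Deployment Manager contains considerably more detail:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <Config>
      <!-- Virtual hosts the plugin routes for -->
      <VirtualHostGroup Name="default_host">
        <VirtualHost Name="*:80"/>
      </VirtualHostGroup>
      <!-- Cluster members the plugin can forward requests to -->
      <ServerCluster Name="CustomerServiceCluster" LoadBalance="Round Robin">
        <Server Name="csNode1_server1">
          <Transport Hostname="cs-node1" Port="9080" Protocol="http"/>
        </Server>
        <Server Name="csNode2_server1">
          <Transport Hostname="cs-node2" Port="9080" Protocol="http"/>
        </Server>
      </ServerCluster>
      <!-- URIs that belong to WebSphere rather than to the Web server itself -->
      <UriGroup Name="customer_service_URIs">
        <Uri Name="/customerservice/*"/>
      </UriGroup>
      <!-- Ties virtual hosts, URIs and cluster together -->
      <Route VirtualHostGroup="default_host"
             UriGroup="customer_service_URIs"
             ServerCluster="CustomerServiceCluster"/>
    </Config>

The Route element is what the native library consults on each request to decide whether to serve a URI locally or forward it to a cluster member.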
6.1.3 IBM Edge Server
To handle both load balancing and failover, the WebSphere team decided to use IBM's preferred
solution, Edge Server.
This component sits in front of the Web servers and balances load among them. But it also
monitors the health of the Web servers and channels traffic away from one that fails.
The team installed Edge Server on the MQ server host, because that machine was guaranteed
not to go down. Then they had to configure that host and the Customer Service hosts at the
operating system level for Edge Server to work properly. These configuration requirements led
to some of the most vexing problems faced by the WebSphere team, as discussed in Section
10.6.
6.1.4 IBM WebSphere MQ
The WebSphere team used IBM's WebSphere MQ Series for its message server. MQ was
installed on the host designated for that purpose. Using it also required that host to have an
instance of WebSphere, whose JMS server acts as a front end for MQ.
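Application code reaches MQ through the standard JMS API rather than MQ-specific calls. As a rough illustration only (the class and JNDI names are hypothetical, not taken from the study's source code), sending a work order message looks something like this:

    // Hypothetical sketch of a JMS send through WebSphere's JMS/MQ setup.
    // JNDI names are invented for illustration; error handling is minimal.
    import javax.jms.Queue;
    import javax.jms.QueueConnection;
    import javax.jms.QueueConnectionFactory;
    import javax.jms.QueueSender;
    import javax.jms.QueueSession;
    import javax.jms.Session;
    import javax.naming.InitialContext;

    public class WorkOrderSender {
        public static void send(String workOrderXml) throws Exception {
            InitialContext ctx = new InitialContext();
            // Administered objects configured in WebSphere and bound into JNDI
            QueueConnectionFactory factory =
                (QueueConnectionFactory) ctx.lookup("jms/WorkOrderQCF");
            Queue queue = (Queue) ctx.lookup("jms/WorkOrderQueue");

            QueueConnection connection = factory.createQueueConnection();
            try {
                QueueSession session =
                    connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
                QueueSender sender = session.createSender(queue);
                sender.send(session.createTextMessage(workOrderXml));
            } finally {
                connection.close(); // also closes the session and sender
            }
        }
    }

Because the code depends only on JMS interfaces, the same pattern would work against any JMS provider; here the administered objects resolve to MQ-backed resources.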
6.2 Details of the .NET Architecture
6.2.1 Microsoft Internet Information Services (IIS)
ASP.NET, the Web application engine for .NET applications, is integrated directly with IIS 6.0.
In addition, Visual Studio enables developers to deploy applications to production servers or
staging servers directly from their development machines, a feature that the .NET development
team utilized during development.
6.2.2 Microsoft Network Load Balancing (NLB)
The details on how the .NET team configured NLB for the ITS system are found in Section
11.1.1.
6.2.3 Microsoft Message Queue (MSMQ)
Like IIS and NLB, MSMQ also comes built into Microsoft Windows 2003. The .NET team had
to enable MSMQ and create and configure the queues for the application. .NET provides
classes for accessing and manipulating the queues. As per the specification, a separate,
dedicated queue server was used for message queuing, with the Customer Service application
writing to the remote queue on this server, and the Work Order application reading messages
from this remote queue for processing.
7 TOOLS CHOSEN
Each team had the freedom to choose any development, analysis, profiling and support tools
they wished to complete their work for their platform. This section describes the various tools
they chose.
7.1 Tools Used by the J2EE Team
7.1.1 Development Tools
RRD is a model-driven, visual tool that provides O/R mapping and data binding technology
and generates J2EE code from visual constructs. WSAD is a more mainstream J2EE
development tool dedicated to WebSphere.
These two IDEs have important differences that pertain to this study:
- RRD's approach emphasizes developer productivity. But the code it generates is not optimized for performance and does not lend itself to manual tuning.
- WSAD's approach requires the developer to write much more code manually, but gives the developer complete freedom to optimize that code.
- While both tools work well with WebSphere, WSAD integrates more tightly and provides a lightweight version of WebSphere for development testing.
Comparing Rational Rapid Developer (RRD) and WebSphere Studio Application Developer (WSAD) as Development Tools

Aspect of Development | RRD | WSAD
[Earlier rows of this table not recoverable.]

Approach to page development:
- RRD: Has you place controls in a page design space, then bind them to data objects from your class model. Each page is served by its own subset of classes and attributes from the model.
- WSAD: Again, more conventional: you write business logic code to be used in standard JSPs, then write the JSPs themselves. If desired you can use Struts.

Configuring WebSphere:
- RRD: Has platform settings for WebSphere that let you specify JDBC datasources, JMS message queues and other critical resources. But these settings affect the application only, not the target platform. You must still configure WebSphere directly.
- WSAD: Lets you configure your target platform, whether the WebSphere test environment or a real WebSphere instance, through the IDE. Conversely, you can also configure WSAD's test environment through a standard WebSphere admin console just as you would the real WebSphere.
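Whichever tool configures them, the resources themselves are reached the same way at runtime: the application retrieves a container-managed datasource from JNDI. A minimal, generic sketch of that lookup (the JNDI name is hypothetical, not taken from the study's code):

    // Generic J2EE datasource lookup; the JNDI name is illustrative only.
    import java.sql.Connection;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;

    public class ConnectionHelper {
        public static Connection getConnection() throws Exception {
            InitialContext ctx = new InitialContext();
            // Resolves to the connection pool WebSphere manages for this datasource
            DataSource ds = (DataSource) ctx.lookup("jdbc/CustomerServiceDS");
            return ds.getConnection();
        }
    }

This is why the RRD settings "affect the application only": they determine which JNDI names the generated code looks up, while the pools behind those names must still be defined in WebSphere itself.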
- They used RRD to build the two Web applications (Customer Service and Work Order), the Work Order message consumption module and the Work Order Web service, which answers work ticket queries from the Customer Service application.
- RRD was not suited for developing the handheld module, however. For that piece the team used Sun One Studio, Mobile Edition.
- During the tuning phase they developed a small library of custom classes to solve some performance bottlenecks. They used TextPad to write the classes and the Java Development Kit (JDK) to compile and package the library.
- For source control of the RRD code, the team used Microsoft Visual SourceSafe, which integrates nicely with RRD.
Although WSAD works with certain source control software, including CVS and Rational
ClearCase, the team did not use either for this implementation. Instead they simply divided the
work carefully and copied changed source files between their two development machines.
7.1.2 Analysis, Profiling and Tuning Tools
- WebSphere's tracing service. A crude runtime monitor built into WebSphere. From the admin console you can select all the different activities you want to monitor in WebSphere; the list covers everything the server does. You choose the categories, restart the server, and see the output in a log file.
- IBM Tivoli Performance Viewer (TPV). A profiler that integrates easily with WebSphere. It displays a wide range of performance information. TPV also has a performance advisor that recommends changes for better performance.
- VERITAS Indepth for J2EE. A sophisticated profiler that lets you measure the performance of code to almost any desired granularity.
- Borland Optimizeit. Another profiling tool; it gave the team important information about thread usage which Indepth could not provide.
- Oracle Enterprise Manager. The team used this tool to manage and tune the database, for example to adjust the size of Oracle's buffer cache. But Enterprise Manager also has a suite of analysis tools that the team used from time to time. By far the most useful was Top SQL, which gives valuable statistics on the SQL statements executed against the database.
- top and Windows Performance Monitor. The team used these simple tools to monitor CPU usage on the Linux and Windows machines respectively.
7.2 Tools Used by the .NET Team
7.2.1 Development Tools
For Web development, Visual Studio includes a feature that makes deployment fairly easy.
The Copy Project mechanism allows a developer to deploy a Web application to any machine
with IIS installed.
The .NET team also used Visual Studio to develop the handheld application, since they chose
to target a Microsoft Windows Mobile 2003-based Pocket PC, which includes the .NET
Compact Framework. To develop the application, the team used Visual Studio's Pocket PC
emulator; for testing and deployment, they used the real device. With Visual Studio, deploying
to a real device was straightforward.
To help them analyze database activity, the team used Microsoft SQL Server 2000 tools.
8 DEVELOPER PRODUCTIVITY RESULTS
The focus during the development phase of the project was on developer productivity: how
quickly and easily can a team of two developers build the ITS system to specification?
Section 8.1 presents the quantitative productivity results of the development phase. The rest of
Section 8 details the experiences of the two development teams: the architecture they chose
for their implementations, what went well for them during the development phase, and the
major roadblocks they encountered.
The WSAD implementation, on the other hand, was built later and under special circumstances (described below). For these reasons the auditor's report provides productivity results only for the .NET and RRD implementations, while issuing a disclaimer regarding the WSAD results.
Installing Products. This included time to install software on both development and
server machines. All equipment used by the two teams initially had only a core OS
installation, except for the two databases (Customer Service and Work Order) which were
already installed and pre-loaded with an initial data set.
Building the Customer Service Web Application. This included constructing the Web UI
and backend processing for the Customer Service application according to the provided
specification, as well as the functionality to send messages. It also included creating the
Web service request that provides the ticket search functionality in the Customer Service
application, and ensuring the application could be deployed to a cluster of two load-
balanced servers with centralized server-side state management for failover.
Building the Work Order Processing Application. This included building the Web UI
and backend processing for the Work Order application according to the provided
specification, as well as the message handling functionality. It also included creating the
Web service for handling ticket search requests from the Customer Service application.
Building the Technician Mobile Device (Handheld) Application. This development task
included building the complete mobile device application according to the provided
specification.
System-Wide Development Tasks. This category included working out general design
issues, writing shared code and general deployment and testing.
The following table shows the actual time spent building the .NET and RRD implementations, in developer hours. The data come from the auditor's report:

Development Task                   .NET    RRD
Customer Service Application         40     69
System-Wide Development Tasks         2     29
Subtotal                             83    157
Product Installs                      4     22
The WSAD implementation was created later by the same team that had previously created the RRD implementation. It was also created outside of the controlled lab setting. Hence, productivity data for this implementation cannot be directly compared to the other two, since the team benefited from having already built the same application once. In addition, the team did not reinstall the WebSphere software or redevelop the handheld application for the WSAD implementation.
Nevertheless, the following table shows the relative time spent developing the WSAD implementation of the ITS system. The data come from the developers' logs.

Development Task                   WSAD
Customer Service Application         13
System-Wide Development Tasks        33
Subtotal                             92
Given how easily two developers working closely together can move quickly among several
tasks, one should not read too much precision into the breakdown of these numbers by
development task. Nevertheless, some interesting conclusions emerge:
One of the greatest differences was for product installation. This is not surprising, since several key server-side .NET components were already present as part of the base installation of Windows Server 2003.
Another significant difference was in developing the Mobile Device piece, where the J2EE team
ran into some roadblocks. (See Section 8.2.3.5 for details.)
3. As noted elsewhere, the base Linux installation included an installation of the Apache Web server, but the team chose to use IBM's version instead.
Even within the core development (the Customer Service and Work Order applications), the .NET team was more productive. Much of the explanation may lie in the simple fact that Visual Studio .NET is the dominant .NET tool, and a developer who has worked in .NET for 3 years has probably worked in VS.NET most or all of that time. In the J2EE world, by contrast, RRD is one of many tools, and a comparatively new one at that. The .NET team was undoubtedly more experienced with their tool than the J2EE team was with theirs.
Another factor may be the differing approaches taken by the two tools. VS.NET is more
comparable to WSAD than to RRD: a development environment that connects you directly and
explicitly to the platform on which you are developing. RRD, on the other hand, is marketed as
a rapid development tool that accelerates the development process via its model-driven
approach. RRD distances the developer from J2EE, and the team found that it simplified some
tasks but complicated others where low-level code access would have been desirable. In a
wide-ranging development project like ITS, RRD's weaknesses may have outweighed its particular strengths.
Although the teams did not track the time spent performing different types of tasks (such as
designing a Web page vs. coding database access logic), some inferences are possible. Both
RRD and Visual Studio provide excellent GUI design tools and the ability to bind data objects to
fields in a page. It is likely that the two tools offered much more similar productivity in this area,
and that the greatest differences lay in other aspects of application development, such as
coding the Customer Service logic to create and manage new work tickets in memory.
The main reason for the higher total under common tasks is that the J2EE team developed frameworks for the Web, business logic and persistence tiers. For example, their custom-built base servlet class provided much of the functionality needed by all the servlets in the two Web applications. This design reduced the time spent developing individual use cases, while increasing the proportion of time spent on common tasks.
The Customer Service application, including the Customer Service Web interface and
message production.
The Work Order console application, which includes message consumption and the Web
service used by the Customer Service application. When RRD builds this application, the
Web service is packaged as a separate EAR file and must be installed separately.
The Work Order Web application, which stands by itself.
8.2.1.2 Database Access
On the back end, the team chose to forgo stored procedures and stick with explicit SQL through JDBC. The fact that RRD would generate JDBC logic automatically weighed against writing stored procedures. The team knew from prior experience that, for single database actions, a prepared statement invoked via JDBC performs at least as well as a stored procedure. So they expected that RRD's generated logic would suffice for basic CRUD operations (which covered most cases).
There were cases, however, where they needed to customize that logic. For example:
Work ticket searches. Both the Work Order Web application and the Web service used
by the Customer Service Web application allow ticket queries based on various
combinations of criteria. For example, the Work Order Web application allows queries by
any combination of customer ID, ticket creation date, work type, ticket status and technician
assignment. The developers had to write code that determined which search criteria were
used and constructed a custom SQL statement that used only those criteria.
Customer search. The Work Order Web application has a customer query function based
on partial match of customer name. For every customer found it returns the number of
tickets in each of three ticket status categories (created, in progress, completed). By
default, RRD's generated code would have separately counted tickets in each category for each customer, in other words three additional SQL actions per row of customer data returned. Developer B reduced that number to one action per customer by using a custom SQL statement with a GROUP BY clause to get all three counts in one action.
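To make the technique concrete, here is a minimal sketch, not the team's actual code, of issuing the query from footnote 4 as a single JDBC prepared statement. The class and method names are invented for illustration:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: one SQL action returns a ticket count per
// status value for the given customer.
public class TicketCounts {
    public static int[] countsForCustomer(Connection con, String customerId)
            throws SQLException {
        String sql = "SELECT count(ticketid) FROM worktickets "
                   + "WHERE customerid = ? GROUP BY ticketstatus "
                   + "ORDER BY ticketstatus";
        PreparedStatement ps = con.prepareStatement(sql);
        try {
            ps.setString(1, customerId);
            ResultSet rs = ps.executeQuery();
            int[] counts = new int[3]; // created, in progress, completed
            for (int i = 0; i < counts.length && rs.next(); i++) {
                counts[i] = rs.getInt(1);
            }
            // Note: a status with no tickets yields no row at all, so real
            // code would also need to key each count by its status value.
            return counts;
        } finally {
            ps.close();
        }
    }
}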
Later, during the tuning phase, the team discovered that some of the RRD-generated code
performed poorly. In response they added other JDBC optimizations. See Section 10.4.9 for
specifics.
Persistence tier. RRD lets you choose EJB entity beans or plain old Java objects (POJOs) for database operations. The team decided to avoid the overhead of entity beans
and went with POJOs.
Business tier. RRD offers session beans and POJOs. Again the team chose the latter because of their lower overhead. One exception was the code for message production in the
Customer Service application; RRD wraps that code in stateless session beans.
Web tier. For Web pages, RRD offers JSPs or ordinary servlets. Since neither would be edited directly, the choice was to a large degree arbitrary. The team chose straight servlets. Note also that RRD has its own Web application framework, so the use of an external MVC framework such as Struts was not considered.
Message consumption. EJB message-driven beans (MDBs) have long been the
accepted technique for consuming JMS messages within an application server. They are
simple and lightweight, and RRD generates them by default. The team did not deviate
from that choice.
4. Here is the exact SQL statement:
SELECT count(ticketid) FROM worktickets WHERE customerid = ? GROUP BY ticketstatus ORDER BY ticketstatus
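For readers unfamiliar with the message consumption pattern mentioned above, a generic EJB 2.x message-driven bean looks roughly like the following sketch; it is an illustration, not RRD's generated code, and the class name is invented:

import javax.ejb.MessageDrivenBean;
import javax.ejb.MessageDrivenContext;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// The container calls onMessage() for each JMS message arriving
// on the queue this bean is bound to.
public class WorkOrderMDB implements MessageDrivenBean, MessageListener {
    private MessageDrivenContext ctx;

    public void setMessageDrivenContext(MessageDrivenContext ctx) { this.ctx = ctx; }
    public void ejbCreate() { }
    public void ejbRemove() { }

    public void onMessage(Message msg) {
        try {
            if (msg instanceof TextMessage) {
                String body = ((TextMessage) msg).getText();
                // ... process the work order or customer update here ...
            }
        } catch (Exception e) {
            // Rolling back the transaction leaves the message on the queue.
            ctx.setRollbackOnly();
        }
    }
}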
8.2.1.4 Distributed Transactions
Those database actions stemming from JMS message processing required a distributed transaction to span the database update and the JMS action. Both sides of the system, Customer Service and Work Order, had such requirements.
Because distributed transactions require a two-phase commit, they are slower than one-phase
transactions within a single database. So the team did not want to use distributed transactions
for all database actions. Instead they set up two JDBC data sources to each database: one using an ordinary driver for simple transactions, the other using an XA-capable driver for distributed transactions.
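In code, the split might look like the following sketch; the JNDI names are invented, since the report does not give the actual ones:

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class DataSources {
    // Two data sources to the same database: a plain driver for one-phase
    // work, an XA-capable driver for distributed (two-phase) transactions.
    public static DataSource plain, xa;

    public static void init() throws NamingException {
        InitialContext ctx = new InitialContext();
        plain = (DataSource) ctx.lookup("jdbc/WorkOrderDS");   // hypothetical name
        xa    = (DataSource) ctx.lookup("jdbc/WorkOrderXADS"); // hypothetical name
    }
}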
This part went very smoothly. RRD's facilities for building Web pages and linking them to data structures are two of its strong points. By the end of Day 3 (where Day 1 was devoted mostly
to installing software), the team had completed much of the simple logic linking the Web
interfaces to database actions.
Other aspects of the Web service piece caused confusion and loss of time. See Section
8.2.3.2 below.
Here, however, Developer A discovered one of RRD's limitations. Because of the way it organizes its generated code, RRD does not lend itself to the standard solution. RRD organizes all the generated code for a given page in a page-specific package. This includes page-specific classes representing the data structures used by that page. In other words, if two
pages both use the WorkTicket class from the class model, RRD generates a different
WorkTicket Java class for each page, each in a different package. This means that if Page 1
creates an instance of its WorkTicket class and places it in a session, Page 2 cannot use it as
an instance of its WorkTicket class.
Developer A used RRD's preferred solution to this problem: store the data in session in XML form. He used RRD's features to define an XML data structure and map it to classes in the
model. The generated code uses a DOM API to parse the XML. This solution is tedious, and a bit rankling to the hard-core J2EE developer. Nevertheless, it did work.
(Fast forward to Phase 2, when the team found all this to-ing and fro-ing between XML and
objects a major performance bottleneck. They ripped out the XML code and replaced it with
logic that used a custom DTO class.)
In the past he had used another tool (WSAD) to generate the missing descriptor, so he did so
again. He created and built a simple Web service project simply to generate a descriptor.
When he included this descriptor in the RRD application, however, it did not deploy correctly.
It turned out that the initial failure was related to a bad state in RRD, possibly a source control
issue. One of the files was not open for writing, but RRD didn't tell him. So the Web service
implementation hadn't been saved properly and consequently didn't work.
Once he tracked down and fixed the file problem, RRD did successfully build and deploy the
Web service without the custom descriptor. At that point Developer A took yes for an answer
and moved on without investigating the anomaly. But the detour cost him several hours.
Later on, when the team set up its production environment, MQ again gave Developer A a headache, this time while he was setting it up in a remote fashion (one MQ installation that served three applications residing on different servers). The process was complicated by the undocumented fact that IBM WebSphere's MQ installation did not use MQ's default port of 1414. Rather, it used 5558, which was not at all obvious.
Many string attributes required non-empty values. This requirement was enforced in the
input forms.
For non-required attributes, the developer set the initial value to a space character.
5. Document Object Model, an API that treats an XML document as a tree of objects.
6. The developer discovered this by getting a complete process list using the ps command (ps -efl). He noticed an entry, strmqmlsr, that had a port setting, tried this number, and it worked.
For the handheld application to communicate with the Work Order application, they chose to use a Web service. Using RRD, Developer B added a Web service interface to the Work Order Web application, defining the five remote operations needed by the handheld application.
Meanwhile Developer A ran a stub generator in the J2ME wireless toolkit in Sun One Studio to
create the Web service client.
This should have been easy, but it wasn't. It turned out that the stub generator supported only document/literal Web services, whereas RRD supported only rpc/encoded. No amount of tweaking the WSDL would make the two talk to each other.
But they had all the logic ready to use in EJB methods. They needed a way to allow the PDA to
execute them. Luckily the wireless toolkit also had a wizard for converting a service (basically a
simple Java class with some methods in it) into a servlet and client piece using HTTP POST and simple data (not Web services). So they butchered the EJBs generated by RRD into POJOs, delegated to their methods from the service class, and ran the wizard. With that, they had the
two halves talking to each other.
The final step was to build the front-end MIDlet in J2ME. Luckily this step was very easy and
took only a couple of hours.
Inability to centralize common page logic. Because RRD does not let you work directly with
JSPs and because the ITS specification prohibited use of frames in the pages, the team could
not easily centralize page logic that was common to most or all pages. This logic included the
navigation bar (links to other pages) and page authentication logic to verify that the user is
logged in before displaying the page. In RRD this logic had to be copied to every page. A
tedious but finite process, it would have been much worse had the number of pages been
significantly greater.
False error when building an EAR file for WebSphere. Developer A discovered a glitch in RRD's build process for WebSphere. The build script that RRD creates makes an EAR file under <websphere home>/RationalRDApps/<application name>. If there are no JARs in the application (EJB JARs or custom libraries), the script returns an error code and RRD aborts, even when the application legitimately has no JARs. Developer A worked around it by dropping an arbitrary JAR (such as Oracle's classes12.jar) into the folder to avoid the false error code.
Placement of the GlobalObject class. For each application, RRD generates a GlobalObject class that contains any global functionality you define. Although the class is application-specific, RRD does not package it in the resulting application EAR file. Rather it is treated like an external library: It must be placed in the server's class path. This means the team had to bounce WebSphere whenever the class changed. Also, since they used WebSphere's admin
console rather than RRD to deploy to the production servers, they had to manually copy the
class to its proper destination. It cost them some time figuring out where the class belonged
and ensuring it was properly updated.
Date handling in a Web page. Developer B discovered an apparent bug in RRD's handling of date fields in a page: After constructing a page with date fields tied to date attributes of an object in the model, he entered a date using the correct format, then printed its value to standard
error. The printed value was one day behind the entered value. He wrote a simple global method to increment a date value and compensate for the discrepancy.
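The report does not show the team's method, but such a compensating helper could be as simple as this sketch (illustrative only):

import java.util.Calendar;
import java.util.Date;

public class DateFix {
    // Adds one day to compensate for the off-by-one behavior described above.
    public static Date addOneDay(Date d) {
        Calendar c = Calendar.getInstance();
        c.setTime(d);
        c.add(Calendar.DAY_OF_MONTH, 1);
        return c.getTime();
    }
}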
Inconsistencies regarding editing and source control. RRD works seamlessly with Visual SourceSafe; you can easily check files out of and into VSS from within RRD. And for the most part, RRD is smart about preventing you from editing files that you have not checked out.
But Developer B found gaps in this intelligence, leading to wasted time. For example, when he started to add a session attribute to a project, he was able to define the attribute, enter its name and set its initial value before realizing he had not checked out the source file where the attribute would be stored. But without checking out that file he couldn't save his work. And checking out the file caused him to lose his work.
Adding a static HTML page to a Web application. At one point Developer B wanted to add a static HTML page to the Customer Service Web application. This turns out not to be easy at all in RRD. RRD has no facility for directly adding an actual HTML file to the Web app. Instead it has a way to let you take "snapshots" of pages that change little or not at all. You designate the page as static in the page properties, then you have to go through a two-step construction process. Developer B created a dummy page and tried this, but quickly got bogged down in the details. So he gave up and instead simply dropped an HTML file in a folder that was included in the WAR build. That did the trick.
Database access. As with the RRD implementation, the team avoided putting database logic in the database itself via stored procedures, favoring explicit SQL statements in the Java
code. In this more standard J2EE environment, where they would have to write the database
logic either way, several factors led to this conclusion:
For a single database action, a prepared statement invoked via JDBC performs at least as
well as a stored procedure.
Prepared statements are easier to write than PL/SQL stored procedures.
Having the SQL in the application code rather than in the database simplifies code
maintenance.
Along the way the team discovered that this choice gave them even greater flexibility than first
thought: They could invoke complete PL/SQL statements as JDBC prepared statements, letting
them combine multiple database actions into one. More on this below under Section 10.5.4,
Optimizing Queries.
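For illustration, a hedged sketch of that technique follows; the table and column names are invented, not taken from the ITS schema:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PlsqlBatchExample {
    // Hypothetical: combine two database actions in one round trip by
    // sending an anonymous PL/SQL block as an ordinary prepared statement.
    public static void closeTicket(Connection con, long ticketId, int newStatus)
            throws SQLException {
        String block =
            "BEGIN "
          + "  UPDATE worktickets SET ticketstatus = ? WHERE ticketid = ?; "
          + "  DELETE FROM pendingtickets WHERE ticketid = ?; " // invented table
          + "END;";
        PreparedStatement ps = con.prepareStatement(block);
        try {
            ps.setInt(1, newStatus);
            ps.setLong(2, ticketId);
            ps.setLong(3, ticketId);
            ps.execute();
        } finally {
            ps.close();
        }
    }
}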
The persistence tier. Having chosen to keep persistence logic within the application itself, the team
then had to choose how to organize that logic, whether in EJB entity beans, another O/R
mapping layer such as Hibernate or JDO, or POJOs with straight JDBC. Again, from past
experience they decided that entity beans were too expensive. They considered Hibernate, but rejected it; it might provide a significant productivity gain only if the entity model were extensive, and the team wasn't certain it would scale. So instead they settled on POJOs.
To centralize all common aspects of JDBC operations, the team created a simple framework
based on the Command pattern. A central JdbcHelper class was responsible for getting a
connection, executing a statement, getting and returning the results of that statement (if the
SQL was a query), and handling errors. It obtained the SQL statement and the actual query
results from a case-specific callback object (the command object).
The callback classes themselves were organized as inner classes of entity-specific JDBC helper classes, such as CustomerJdbcHelper for customer activity. Each class had methods representing individual database actions, such as updateCustomer(). Each method would create or obtain the proper callback object and invoke the central JdbcHelper logic to execute the database action.
This framework proved very effective in minimizing repetitive code, reducing bugs and easing
maintenance.
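The report does not include the framework's source, so the following is a minimal sketch under the names given in the text (JdbcHelper, CustomerJdbcHelper, updateCustomer()); the callback interface, all signatures, and the SQL are assumptions:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// The case-specific callback (the "command" object).
interface JdbcCommand {
    String getSql();
    void setParameters(PreparedStatement ps) throws SQLException;
    Object handleResults(ResultSet rs) throws SQLException; // queries only
}

// Central helper: connection handling, execution, and cleanup.
class JdbcHelper {
    private DataSource ds;
    JdbcHelper(DataSource ds) { this.ds = ds; }

    Object execute(JdbcCommand cmd, boolean isQuery) throws SQLException {
        Connection con = ds.getConnection();
        try {
            PreparedStatement ps = con.prepareStatement(cmd.getSql());
            try {
                cmd.setParameters(ps);
                if (isQuery) {
                    return cmd.handleResults(ps.executeQuery());
                }
                ps.executeUpdate();
                return null;
            } finally {
                ps.close();
            }
        } finally {
            con.close();
        }
    }
}

// Entity-specific helper with callbacks as inner classes, per the text.
class CustomerJdbcHelper {
    private JdbcHelper helper;
    CustomerJdbcHelper(JdbcHelper helper) { this.helper = helper; }

    void updateCustomer(final String id, final String name) throws SQLException {
        helper.execute(new JdbcCommand() {
            public String getSql() {
                return "UPDATE customers SET companyname = ? WHERE customerid = ?";
            }
            public void setParameters(PreparedStatement ps) throws SQLException {
                ps.setString(1, name);
                ps.setString(2, id);
            }
            public Object handleResults(ResultSet rs) { return null; }
        }, false);
    }
}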
The business tier. For the business façade the team considered EJB session beans but rejected them as too expensive. Instead they created a simple framework based on the Façade pattern. Each application had a single façade class with stateless methods representing every action required by that application's front end. Most of these methods simply called a method of the appropriate JDBC helper class. Each façade class also created a single instance of itself and of each JDBC helper class it used, to eliminate unnecessary object creation.
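Building on the JdbcHelper sketch above, such a façade might look roughly like this; the class name and method are invented:

import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical facade: one per application, stateless methods only.
public class CustomerServiceFacade {
    private static CustomerServiceFacade instance;
    private final CustomerJdbcHelper customers;

    private CustomerServiceFacade(DataSource ds) {
        // Create the JDBC helpers once, to avoid per-request object creation.
        customers = new CustomerJdbcHelper(new JdbcHelper(ds));
    }

    public static synchronized CustomerServiceFacade init(DataSource ds) {
        if (instance == null) instance = new CustomerServiceFacade(ds);
        return instance;
    }

    // Most facade methods simply delegate to a JDBC helper method.
    public void updateCustomer(String id, String name) throws SQLException {
        customers.updateCustomer(id, name);
    }
}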
The Web tier. The team used JSPs for Web pages and servlets to tie those pages to business logic. They decided against using an MVC framework (Struts was the prime candidate) because it would add runtime overhead. Nevertheless, they did create a simple controller servlet that reproduced some of Struts' conveniences. This servlet became the ancestor class of all servlets in the two ITS Web applications.
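The report does not show this ancestor servlet, but its shape was probably something like the following sketch (all names assumed):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical ancestor servlet: centralizes dispatch and error handling,
// two of the conveniences an MVC framework would otherwise provide.
public abstract class BaseControllerServlet extends HttpServlet {
    protected void service(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        try {
            String view = handle(req, res);  // subclass runs the use case
            req.getRequestDispatcher(view).forward(req, res);
        } catch (Exception e) {
            req.setAttribute("error", e);
            req.getRequestDispatcher("/error.jsp").forward(req, res);
        }
    }

    // Each concrete servlet implements its use case and names the next view.
    protected abstract String handle(HttpServletRequest req, HttpServletResponse res)
            throws Exception;
}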
The team also did not use any custom tag libraries. This choice meant that their JSPs
contained significant amounts of Java code. For a comparatively short project such as this, the
additional code was acceptable; but in a longer, more enduring project the team would probably
have refactored their JSPs to use Struts or JSTL libraries.
Message consumption. As with the RRD implementation, the team saw no reason to deviate
from using EJB message-driven beans (MDBs).
7. Callback classes representing actions with no parameters (such as "get all customers") could be treated as singletons.
Project        Produced           Description
itsCustServ    itsCustServ.ear    Umbrella project for Customer Service Web app
It was a straightforward tool that facilitated access to J2EE rather than hiding it. This
appealed to the J2EE developers on the team.
It was dedicated to WebSphere.
every page that needed it. The team used this technique for two common aspects of their
pages:
The page header, including the navigation bar (links to other pages)
The page authentication logic to verify that the user is logged in before displaying the page
The only tricky part to using @include was the declaration and use of variables in scriptlets. A
variable used in any JSP had to be declared in the same JSP. This restriction complicated the
design a bit but not significantly.
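For instance, a page might pull in the shared fragments with the standard JSP @include directive; the fragment file names here are assumptions:

<%-- Translation-time includes; each page repeats these two lines. --%>
<%@ include file="header.jspf" %>  <%-- navigation bar --%>
<%@ include file="auth.jspf" %>    <%-- checks that the user is logged in --%>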
The error appeared shortly after a test of a distributed transaction had failed and the server was
bounced. It seemed the server was trying unsuccessfully to recover from the failure. The
problem was that the server tried again every minute or so, pumping output into the stdout log.
While not a show-stopper, this problem was annoying, especially because it later also
appeared in the live WebSphere instances. Bouncing WebSphere and Oracle had no effect,
and the team could not find any WebSphere configuration settings that helped. Moreover, the
problem had not appeared during the RRD round.
Eventually a Google search found the answer in an IBM developerWorks forum: The datasource user must have SELECT rights to the Oracle table PUBLIC.DBA_PENDING_TRANSACTIONS. In the RRD round the team had set up the datasources to log in as SYSTEM, which had that right. In the WSAD round, they configured the datasource to log in as ITSUSER (the ITS schema user), which didn't. When they granted that right to ITSUSER and restarted the servers, the problem disappeared.
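The fix, in other words, was a one-line grant run as a privileged user; the exact statement below is our reconstruction, not quoted from the report:

GRANT SELECT ON DBA_PENDING_TRANSACTIONS TO ITSUSER;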
TODOs in JSPs. WSAD has a nice facility for marking to-do tasks in your Java code. It displays any comment beginning with // TODO in a special To Do list and lets you easily jump to that comment from the list. However, this facility does not work for comments in JSPs.
Code completion inside JSPs. WSAD has a nice code completion tool, but it is very slow
for Java code inside JSP scriptlets.
Copying a servlet. One developer discovered that when he created a new servlet by copying an existing one, WSAD did not automatically add the new servlet to the Web deployment descriptor; because of that he got page-not-found errors when invoking that servlet. The Web descriptor editor has a Servlets page that should have allowed him to add existing servlets to the list, but it didn't work right, and the developer didn't see how to make it do so. So he had to manually edit the descriptor source.
Sharing source code. Because the team chose not to use source control software, they
shared code by zipping up their workspace. After a couple of false starts they learned
which files not to share.
8. Why the change? Having the application log in as SYSTEM meant that at least some SQL table references had to be qualified with the schema name. Logging in as ITSUSER eliminated that problem.
8.4 Microsoft .NET Development Process
The team spent approximately 50% of their development time writing new code, 25% modifying
code to correct misinterpretations of the specification, 10% creating the overall design, and
15% performing unit and system tests. Since the machines were speedy, build times were
insignificant.
The team used model classes to map objects to relational data, and also used the publicly
available Data Access Application Block (DAAB) to simplify their data access code. DAAB
provides pre-built libraries for ADO.NET and is available on MSDN for both SQL Server and
Oracle backends. More on DAAB below in Section 8.4.1.2.
This diagram shows the software architecture of the Work Order application. The architectures of the Customer Service and Technician Mobile Device applications were very similar, except that the former lacked message processing and the latter had neither message processing nor queue access.
The Customer Service project included the Customer Service Web application with its
associated business and data logic. This application exposed the Web UI and produced
customer update and work order messages.
The Work Order project included the Work Order Web application and the Message
Forwarder and Processor console applications, each with its associated business and data
logic. The Web application exposed the Web UI as well as the Web service consumed by
the Customer Service Web application. The Message Forwarder and Processor console
applications worked together to process the customer update and work order messages.
The Technician Mobile Device project included the Pocket PC Windows Forms
application with its associated business and data logic. This application provided the UI for
technicians to process work orders. It connected directly to the Work Order database via
wireless networking.
The .NET team chose to use stored procedures even though all of the database actions were
little more than CRUD operations. Doing so afforded a level of encapsulation and an interface
that allowed the underlying database operations to change if necessary without adversely
affecting the data access logic in the middle-tier code. From a manageability standpoint, this arrangement has certain advantages, since at least some query changes are possible without having to modify or redeploy any middle-tier logic.
At the same time, the .NET team did not limit themselves to stored procedures. Along the way
they found that, for some operations, putting the SQL statements in the data access logic
markedly improved performance. The most notable example was work ticket queries. Since
the specification required the queries to employ several independent but optional search
criteria, the team coded their middle tier to construct a SQL query for such criteria and send it
as a batch to the database.
Although each .NET application had a data access layer, the custom classes in those layers did
not directly invoke the .NET frameworks data access classes (those found in the
System.Data.SqlClient namespace). Instead, the .NET team chose to use a freely available
application block, the Data Access Application Block (DAAB), created by Microsoft and
publicly available for use with both SQL Server and Oracle. DAAB comes in the form of a
single code file containing a few utility classes that encapsulate the most common data access
operations.
The main DAAB utility class, SqlHelper, contains methods to return scalar values or result sets
in the form of a SqlDataReader or DataSet. It also contains methods to execute SQL batches
that have no return value, such as INSERT or UPDATE statements. All SqlHelper methods
have several overloads that take a variety of parameters, such as either a SqlConnection object
or a connection string. In most cases, the .NET team used the overloads that take a
connection string as well as the stored procedure name and a variable list of arguments in
varargs style. This approach reduced much of the data access code down to a single
statement. For instance, the Update method of the CustomerDAO class was very brief:
public static void Update(Customer customer) {
// Invoke the "UpdateCustomer" stored procedure through DAAB's SqlHelper.
SqlHelper.ExecuteNonQuery(connectionString, "UpdateCustomer",
customer.CustomerId, customer.CompanyName, customer.Address1,
customer.Address2, customer.City, customer.State,
customer.Zip, customer.ContactFirstName,
customer.ContactLastName, customer.ContactEmail,
customer.ContactPhone, customer.ContactFax,
customer.MasterAccountCode);
}
However, since the ServiceDomain class does not work on Windows XP (the OS of the
development machines), the team could not test transactional behavior locally. This limitation
did not turn out to be a problem since the code still compiled correctly; the .NET team protected
the transaction-specific code with preprocessor directives (e.g. #if DEBUG). Thus, they were
able to perform functional tests on their development machines and specification conformance
tests (including transactional behavior) on the target machines, which were running Windows Server 2003.
The .NET team experienced some configuration problems. One was with the Distributed
Transaction Coordinator that .NET uses to manage transactions. They suspected the problem
was due to re-naming the server. They solved this problem without affecting their development
time.
.NET offered the team two solutions: storing session state in a database or using ASP.NET
Session State Services. Since the specification did not require that state persist longer than
the session timeout for users (15 minutes), and believing it would perform better, the team
decided to use ASP.NET Session State Services.
ASP.NET Session State service is an out-of-process service that does not depend on any other processes, so that developers can begin using session state in server farm environments without worrying about each request's dependency on a particular server. Once installed and enabled on the reliable
server, this service satisfied the ITS specification for preserving session state in a clustered
environment.
Specifically, the application would start a transaction before reading and processing a message.
If the message processing succeeded, the transaction would be committed and the message
would be removed from the queue; if it failed, the transaction would be rolled back and the
message would remain on the queue.
This requirement led to the single most costly problem the .NET team experienced. The issue
stemmed from the way MSMQ handles distributed transactions. Currently, MSMQ supports
transactionally sending messages to, but not reading messages from, a remote queue. In other
words, the Work Order application residing on the Work Order host could not read a message
from the queue residing on the MQ host within a transaction.
The team discovered this issue about two-thirds of the way through the development phase
when they tried a system-level test involving work ticket processing. Why did they not see it
sooner? As described above in Section 8.4.1.3, the team had chosen to manage distributed
(DTC or Distributed Transaction Coordinator) transactions via the ServiceDomain class, which
is not available in Windows XP. So they could not test this functionality on the development
machines.
It took about half a day to diagnose the problem and a full day to implement a workaround. The
solution, a read request queue, is described in the MSDN article "Transactional Read-response Applications," found at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/msmq/msmq_about_transactions_05wz.asp. It lays out the following architecture:
Each receiving (message processing) application has a local queue to hold messages it will
process.
When a receiving application wants to process a request, it sends a read-request message
to a read-request queue colocated with the input queue (the queue to which the sending
applications send their original messages).
A separate application (the read-response application) monitors the input and read-request
queues, which are local to it. In a single transaction it reads a read-request message,
obtains from it the target receiving application, reads a message from the input queue and
forwards it to the target. When the forwarded message appears in the target's local queue, the target receiving application handles the request.
This solution gets around the MSMQ barrier the .NET team encountered because all messages
are read from queues local to the reader; there are no remote reads at all, let alone any that
must be transactional.
To implement this pattern, the team set up:
two read-request queues on the MQ host (where the original input queue resided), one each for customer updates and new tickets
two corresponding local queues on the Work Order host
a read-response application
The team also noted what it considered a non-standard requirement of the ITS specification. In
a production system, if a message consumer could not process a message properly, it would
probably remove the message from the main queue and place it into a secondary failed queue
for management reasons. However, the ITS specification did not allow this option; it required the failed message to remain in the main queue. One adverse consequence of this requirement was that a corrupt message would be re-processed over and over again.
However, the specification also called for the Previous and Next buttons to be implemented
as INPUT tags (to work with the Mercury LoadRunner scripts) and named cmdPrevious and
cmdNext, respectively, in the HTML. While these requirements prevented the .NET team from
using all of the DataGrid's paging features, they were still able to utilize the auto-paging feature
and write additional code to handle the buttons' events.
Since the specification also forbade client-side caching of query results, the .NET team
disabled view state on Web pages that contained a DataGrid. This change had the added
benefit of reducing the HTML page size, since the view state for a DataGrid can be quite large.
Without view state, however, the team needed another mechanism to hold other, necessary
page state information. They kept track of some of the DataGrid properties (such as the current
page index) in a cookie to get the correct paging behavior.
Had the team been allowed to use the full range of DataGrid features, this particular part of the development phase would have taken only a few minutes. Instead, with all of the custom code necessary to implement the ITS paging requirements, it consumed a few hours.
9. The reason was simplicity. The ITS specification was designed to include integration technologies such as messaging in the project yet still be simple enough that a team could develop the ITS system in four developer-weeks.
However, during the tuning phase the .NET team found they could increase performance by
switching to a custom object collection. This change required a more significant coding effort,
since they had to create custom classes not only on the Web service side but also on the client
side. Since the classes auto-generated by Visual Studio's Web Reference mechanism expose
only public member variables instead of the public properties required for DataGrid data
binding, the .NET team had to create wrapper classes to expose those fields as properties.
They also had to implement the IComparer interface to get the proper sorting in the DataGrid.
It turned out they were using version 1.0 of the DAAB in their initial mobile test project, but
version 2.0 in the Technician Mobile Device project. They did not want to use version 1.0 since
2.0 contains several improvements, but for some reason the Compact Framework build was
unable to find the IDisposable interface of the SqlDataAdapter class. They were able to
enhance version 2.0 of the DAAB to work successfully with the Compact Framework by
creating a custom SqlDataAdapter class to wrap the System.Data.SqlClient.SqlDataAdapter
class. Even with this issue, the total development time for the mobile application was only one
day.
About half of the classes were rather small, but others had as many as a dozen attributes.
Although the time taken to create each class was not very significant, it was still minutes
instead of seconds. Copying and pasting the parts of the code that were common across most of the classes mitigated some of the effort, but that process introduced the risk of errors.
Fortunately, Visual Studio parses the code while editing, effectively performing a syntax check
without compiling. Compilation was quick, in any case.
9 CONFIGURATION AND TUNING RESULTS
The development teams were initially allotted up to two weeks to tune and configure the system
in preparation for performance, manageability and reliability testing in the final week of the
project. However, each team was allowed additional time if they required it in order to
successfully prepare for the tests and ensure the application was performing properly. The
time spent configuring and tuning the production environment was tracked for comparative
purposes.
The teams used Mercury LoadRunner to simulate load for performance tuning. Each
development team was responsible for their own database indexing, database tuning, and
application server tuning. Changes to code were allowed during this phase if required to make
the application perform more efficiently under load.
The following table shows the amount of time taken by each team to tune each application to
perform as efficiently as possible prior to testing. The data are from the auditor's report:

Originally scheduled: 20
The J2EE team clearly went well beyond the two-week time frame when tuning the RRD
implementation. There are several reasons for this, detailed in Section 10.
In the WSAD round, the J2EE team built upon the solutions from the previous round, focusing
their efforts on tuning the WSAD code and improving failover. Again see Section 10 for details.
The .NET team, on the other hand, completed their tuning early. As the auditor's report states, they were "able to tune and configure their implementation in 8 days, less than the 10 days allotted. Vertigo Software could have used the entire 10 days allotted for this phase; they chose to consider this phase completed after 8 days."
Again, as noted above, .NET has many fewer knobs to turn than WebSphere. The team did not, for example, have an equivalent of tuning the JVM. See Section 11 for details on the .NET team's experience.
10 WEBSPHERE CONFIGURATION AND TUNING PROCESS SUMMARY
This section describes the process the J2EE team went through to configure and tune the basic
WebSphere infrastructure. It also describes the major bottlenecks encountered and resolved in
the two implementations.
Here is a high-level summary of the stages the team went through. Details follow in the
sections below.
1. Install the basic software: WebSphere Network Deployment, Edge Server, IBM HTTP
Server (IHS).
For the WSAD implementation, the team did not repeat Stage 1 and did very little Stage 4
tuning. Most of their work in the WSAD round focused on three issues:
session sharing
failover
optimizing database queries
Additionally, the team had set up session sharing between the Customer Service instances
using a long-standing, standard IBM technique: writing the session data to a database.
WebSphere makes this relatively easy to configure. The team used the Customer Service
database as the persistent store for sessions. (This technique worked for functional testing, but
later would prove unacceptably slow under load. The team would then replace it with in-
memory replication; more on this in Section 10.6.7.)
so, losing their resource configurations in the process. Realizing their mistake, they removed
the nodes and added them again in a way that preserved the configurations.
Another false start had to do with adding a node for the MQ server's WebSphere instance, running on the same host as the Deployment Manager. When the team did so, they discovered
two changes in the MQ situation:
How you start MQ changed. As mentioned earlier, you control MQ Series through WebSphere's own JMS server. If the WebSphere instance stands alone, its JMS server is embedded in the application server, and starting the latter starts the former (as well as MQ). But when you add the WebSphere instance to the Network Deployment cell, the JMS server is split out as a separate server and must be started separately.
The MQ configuration names (queue names, JNDI names, etc.) changed, and the
applications could no longer reach the server.
While the first change did not cause a problem, the second did. And since the team had no compelling reason to include the MQ host's WebSphere instance in the federation, they removed it and restored the status quo. (Ironically, much later the team decided they needed the session sharing server to run on the MQ host, forcing them to add that node to the federation and confront the MQ configuration change.)
Once the nodes were added, the team created a cluster for the two Customer Service servers.
To start IHS, the team ran the usual Apache command:

apachectl start

Apache did start, but it wasn't IHS. It took a bit of head-scratching to figure out that Linux had its own Apache server already installed and placed in the system path. The two versions of Apache were launched with the same command. Executing the command with a path qualifier pointing to the IHS bin folder solved the problem.
Getting it to run properly was a different matter. In fact, Edge Server was directly or indirectly
responsible for the most significant challenges the team faced during this phase. See Section
10.6 for the gory details.
10.2.2 Configuring the WebSphere Web Server Plugin
The Web Server Plugin is WebSphere's interface between Apache and the HTTP transport
embedded in the application server. The plugin consists of a native runtime library (already
installed with WebSphere) and an XML configuration file, plugin-cfg.xml. Generating that file
and putting it in place on the nodes can be done entirely through the Deployment Manager
console.
The WebSphere literature talks of installing the plugin, but it's really a matter of configuring Apache to use it. This configuration consists of modifying Apache's httpd.conf file to load the plugin library and point to the plugin's configuration file, plugin-cfg.xml. While it took the team a few tries to get everything right, the procedure is straightforward and well documented.
It took a long time to diagnose this problem, but finally the team did so using Borland Optimizeit's Thread Debugger. This tool shows you all thread activity in the JVM. It told the team that the server was spawning many new threads from instances of a log4j class called FileWatchDog. This class extends Thread and is used to check every now and then that a certain file has not changed.
What was causing this? The culprit turned out to be RRD's debug settings; the Work Order Web application had been deployed with debug settings turned on. The team redeployed it with all debug output suppressed.
The generated code often used plain statements instead of prepared statements.
For queries, the code performed a count(*) before doing the actual query, to see how many
rows would be returned. This, of course, doubled the number of actual database calls.
The team replaced RRD's code for all major queries with custom code that used prepared statements. It also eliminated the count(*) calls, as these were completely unnecessary. This coding work took considerable time but proved crucial to improving the application's performance.
Even after optimizing the query logic, the Web service still responded slowly. So the team
focused on the fact that RRD generated code to wrap the service in a stateless session bean.
Every service call had to acquire a bean instance.
The team tried replacing the session bean with POJOs. They did this by porting all the
generated RRD Java code into another tool (WSAD), then refactoring the bean class and the
logic that used it. This change improved performance on the entire test script by more than
10%.
Next they looked at the code that creates the XML response string from the query. The original
code created DTOs from the JDBC result set, then used the DOM API to construct XML from
the DTOs. The team refactored this code to create XML directly from the result set using a
simple StringBuffer. Eliminating DTOs and DOM (which is notoriously expensive) improved
overall performance by another 16%.
The team's initial RRD implementation did limit the overall query size, using Oracle's maxrows variable.
But when they switched to custom queries, the team had to add custom paging logic. There
were two important parts to this logic.
First, limiting the query size: In other words, for Page n the query should return the lesser
of 500 and n * 10 + 1.
Second, creating data transfer objects (DTOs) for only the ten rows actually displayed.
So if the application asked for Page 3 of a query that could return 100 rows, the query should return only 31 rows, and the application should create objects only for rows 21-30.
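A hedged sketch of that rule follows; the names are invented, and JDBC's setMaxRows is used to cap the result set:

import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PagingExample {
    // Page numbers start at 1; each page shows 10 rows; overall cap is 500.
    static void limitForPage(PreparedStatement ps, int page) throws SQLException {
        int maxRows = Math.min(500, page * 10 + 1); // the +1 row signals a next page
        ps.setMaxRows(maxRows);
        // After executing, skip (page - 1) * 10 rows, then build DTOs for
        // the next 10; a 31st row on page 3 means "Next" should be enabled.
    }
}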
Because RRD's query processing code is so tightly interwoven with its page-producing code, replacing the query code wasn't enough; the team also had to work around the code for displaying the page. This took the team deep into the realm of working against RRD's capabilities instead of with them.
The team created a simple ServiceLocator class that cached the InitialContext, JDBC data
sources and EJB homes. (Although the implementation used EJB minimally, it did use
stateless session beans in the Customer Service application to wrap the JMS message
producing code.) This class was packaged in a custom library that was dropped into WebSphere's AppServer/lib/ext folder.
The library worked well. The only inconvenience was that it meant bouncing WebSphere more often: when the library changed, and when the Customer Service application was rebuilt and redeployed (because the EJB home stubs became stale).
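A sketch of such a locator; the class name comes from the text, the rest is assumed:

import java.util.HashMap;
import java.util.Map;
import javax.naming.InitialContext;
import javax.naming.NamingException;

// Caches the InitialContext and everything looked up through it
// (JDBC DataSources, EJB home stubs, and so on).
public class ServiceLocator {
    private static ServiceLocator instance;
    private InitialContext ctx;
    private Map cache = new HashMap();

    private ServiceLocator() throws NamingException {
        ctx = new InitialContext();
    }

    public static synchronized ServiceLocator getInstance() throws NamingException {
        if (instance == null) instance = new ServiceLocator();
        return instance;
    }

    public synchronized Object lookup(String name) throws NamingException {
        Object o = cache.get(name);
        if (o == null) {
            o = ctx.lookup(name);
            cache.put(name, o);
        }
        return o;
    }
}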
10.3.6 Using DTOs for Work Tickets
As described above in the section on developing the RRD applications, the way RRD
generates code prohibits data objects from the class model from being directly shared across
pages. So for the Customer Service Web application to hold pending work tickets in session
state, it had to do so by converting ticket objects to XML and back again.
This process proved highly inefficient, especially because RRD used a DOM API to construct
the XML. (DOM is notoriously expensive.) After working through other code bottlenecks, the
team focused on this one. The solution was a custom WorkTicket DTO class to replace the
XML format, along with efficient code to go between the DTO class and the page-specific ticket
objects.
1. Work up and out. In other words, take a stab at the application server parameters,
including JVM parameters. Find the maximum load without errors. Get into the
ballpark, don't try for final precision.
2. Then go to one end of the system (the database or the Web tier) and work toward the
other.
3. At each point (for each tuning variable), try making a big change, such as doubling the
size of a pool. Run a quick test, see if it had any effect. Use binary chopping to zero in
on the optimum value.
4. Never change two things at once. Stick with the plan, resist taking shortcuts as time
runs out. One exception to this rule is where two variables are related, such as the
Web container thread pool and the database connection pool.
5. Realize that precise tuning requires several passes through the system.
Regarding the last point: because the team spent so much time optimizing code, they really
had time for only one pass through the system.
Page hits per second. LoadRunner provides average response times for each individual action in a test script. With some scripts comprising as many as two dozen actions, and when you have to run many tests in a long tuning process, it takes too long to record all the response times.
The total page hits/second statistic in LoadRunner provides a handy summary performance indicator. It tells you the peak load that the application can handle. As a load test ramps up the number of users, at some point hits/second reaches a plateau before response times climb significantly and errors accumulate. This level represents the peak user load.
users = ( hits/sec ) * ( total user think secs in script / total hits in script )
All the scripts used in this study had 5 seconds of think time per Web request. If a script had
15 requests with a total of 20 hits, then
users = ( hits/sec ) *
((5 user think secs / request ) * ( 15 actions / script ) / ( 20 hits / script ))
In other words, it would take approximately 2,250 users to generate 600 hits/second.
CPU usage. To see how hard a machine was working (CPU usage), the team used top on the
Linux servers and Performance Monitor on the Windows machines. The latter also provided
indicators of disk activity to tell them how hard the database was working.
Response times. For more specific issues the team looked at response times of individual
actions in a script.
The two settings are related: while a larger heap allows the JVM to work with more objects and possibly provide greater throughput, it also means that GC must work harder when the heap begins to fill.
Non-concurrent. The garbage collection thread sleeps most of the time, then periodically
wakes up to collect garbage. When it does, it pauses the JVM, leading to a backlog of
requests that under load can be overwhelming.
Concurrent. Concurrent mode spreads the performance cost of garbage collection out
over time. The garbage collector runs at a low level in the background most of the time.
This reduces throughput during the steady state. But when full GC kicks in, it takes less
time because the collector has been working continuously, so the backlog of requests
doesn't build to a critical level.
The team's first attempt at setting a concurrent GC parameter produced an "Unrecognized Parameter" error. The reason was that WebSphere 5.1 is installed with IBM's JVM 1.4.1, which does not recognize this parameter.
The team briefly tried having WebSphere use Sun's JVM instead, but quickly ran into other
errors. So they abandoned this effort and stayed with the original JVM installation. See
Section 10.6.1 for more details.
IBM's JVM has a parameter similar to Sun's: -Xgcpolicy toggles concurrent marking of objects:
-Xgcpolicy:optthruput turns concurrent mark off (to optimize throughput). This is the default setting.
-Xgcpolicy:optavgpause turns concurrent mark on (to optimize the pause due to full GC).
During tuning, the team experimented with the latter policy. When they reached a stable
configuration, however, they found that the system performed as well with concurrent mark
turned off, and left that setting in place.
The IBM literature on performance tuning suggests that the JVM should optimally be spending an average of about 15% of its time collecting garbage. If GC is less than this, the JVM may be wasting memory; if more, the JVM is working too hard, indicating that the heap may be too small
and/or objects are not being used efficiently.
The team used this guideline as it examined application server performance. It relied primarily
on Tivoli Performance Viewer to give statistics on garbage collection.
WebSpheres heap settings default to a maximum size of 256 MB and an initial size of 64 MB
(25% of maximum).
To find the optimum heap size, the IBM literature suggests the following procedure:
3. Use the -verbosegc JVM parameter, which prints output of GC and heap expansion activity.
4. Run the server under load. See where heap size stabilizes and where GC falls to an
acceptable level (around 15% of total CPU time).
The team followed this test procedure with different heap sizes, ranging from 128 to 768 MB.
They chose 768 MB as the upper end of the range because each machine had 1 GB total
RAM, and a rule of thumb derived from previous experience suggested devoting no more than
75% of total RAM to the application servers.
What they discovered was that WebSphere does not run better with significantly more memory.
Ultimately the team left the Customer Service servers at 256 MB but increased the Work Order
servers heap size to 384 MB.
10. The IBM Redbook, IBM WebSphere V5.1 Performance, Scalability, and High Availability, says: "The average time between garbage collection calls should be 5 to 6 times the average duration of a single garbage collection." This translates to a range of 14-17%.
Once the optimum heap size was determined, the team set the initial size equal to the maximum size, so that the server doesn't waste time adjusting heap size incrementally.
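In JVM-flag terms, the final Work Order setting would look something like the line below; this is our illustration of the effect, since the team actually set these values through the WebSphere admin console:

-Xms384m -Xmx384m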
On the Customer Service side, the team tested 2 and 3 instances per host. (At 256 MB heap
per instance, 3 instances was the most they could run.) On the Work Order side, they tested
two alternatives:
2 identical instances at 384 MB each, with all three applications deployed to both
3 instances at 256 MB apiece, each dedicated to one of the three applications
These alternatives did not improve performance. WebSphere on Linux apparently spawns
additional threads that act like processes. So, after much testing, the team found that the basic
configuration of one instance per machine was best. On the Work Order side, that one
instance hosted all three Work Order applications (the Web application, the message
consumption application and the Web service for queries from Customer Service).
Create indexes. The team found that creating an index on each individual field used in a
query was most efficient. So, for the WorkTickets table for example, they created separate
indexes on CreationDate, WorkStatus and other fields used in queries.
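As a sketch, the indexes on the WorkTickets table would look something like this (the index names are illustrative; the table and column names are those cited above):

CREATE INDEX idx_worktickets_creationdate ON WorkTickets (CreationDate);
CREATE INDEX idx_worktickets_workstatus ON WorkTickets (WorkStatus);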
Adjust buffer cache size. Oracle caches recent query results; up to a point, the more it can
cache, the faster it performs. The Oracle Enterprise Manager console shows the optimum
cache size. The team returned to this setting periodically to make sure it was adjusted
properly.
Connection pool size. The number of connections in the pool affects how long a thread in
the Web container must wait to carry out a JDBC action. The team set this pool size equal
to that of the Web container thread pool (since the applications did not use EJBs for
persistence). They also set the initial size = the maximum size (as they did with all pools)
to get the system initialized more quickly.
Prepared statement cache size. WebSphere caches prepared statements. The more
different SQL statements used in the application, the greater this number should be.
current session) immediately creates a new session when it redisplays the home page! The
ITS specification required sessions to last at least 15 minutes, and there was no guarantee the
load scripts would perform explicit logouts (not that they would help anyway). So sessions
would linger until they timed out.
After some experimentation the team settled on a maximum session count equal to twice the
peak number of users.
The team tuned three Apache directives:
MaxClients
ThreadsPerChild
ListenBacklog
MaxClients sets the limit on the number of simultaneous HTTP requests that will be
served.
Any connection attempts over the MaxClients limit go into the queue, whose length is
determined by Apache's ListenBacklog setting (as well as Linux's TCP backlog setting).
Apache creates multiple child processes, each of which has ThreadsPerChild threads. So
MaxClients is the maximum number of threads operating simultaneously.
MaxClients / ThreadsPerChild must be an integer and cannot exceed 16 (Apache creates
no more than 16 child processes).
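A sketch of the corresponding httpd.conf directives follows. The team's final values are not given in the report; the numbers below are illustrative and respect the 16-process constraint (512 / 32 = 16):

MaxClients        512
ThreadsPerChild   32
ListenBacklog     511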
In the case of Apache threads, the team found that more is not always better. There came a
point where reducing MaxClients improved performance. The reason: if the resources behind
Apache are choked, making users wait at the Web tier relieves that pressure and improves
overall performance.
The team also experimented with using multiple Apache instances, but found they didn't help.
This requires persisting sessions to some kind of store. Over the course of the RRD round, the
WebSphere team wrestled with several techniques for persisting session state. Additionally,
the team used other WebSphere settings to control the frequency of session persistence and
thereby tune for performance.
In the RRD round the use of these libraries had no effect on how the applications were
deployed. But in the WSAD round it did. The team suddenly got NoClassDefFoundErrors on
code that used classes from one of the libraries.
It took a few hours to find the remedy. WebSphere has an application deployment setting that
controls the order in which class loaders are invoked. The default is parent first (WebSphere's
class loader before the application's). When they changed the setting to parent last, the error
went away.
While this remedy made the error go away, it did not explain why the error had appeared with
the WSAD code but not the RRD code. Eventually the team identified the key difference: in
the RRD implementation, the application classes used classes in the library but did not extend
them. In the WSAD implementation, application classes extended library classes. It was the
loading of these dependent classes that caused the error.
Even then a small headache remained. If a team member had to uninstall and reinstall an
application, the classloader setting always reverted to its default. It took a few extra
steps and a few extra minutes to set it correctly.
Realizing how expensive the creation of DTOs can be, Developer B created a simple class to
manage a pool of objects.
The ObjectPool class was designed as a wrapper to a java.util.Stack. If the stack is empty
when a client requests an object, the pool creates a new one.
Each DTO type had its own subclass of ObjectPool.
To avoid duplication of pools, each subclass was designed to create a static singleton
instance of itself.
The pooling logic was very simple, using synchronized methods to keep it thread safe (see
the sketch below). But as long as the expense of calling those synchronized methods was less
than the expense of creating and garbage-collecting DTO instances, it would be worthwhile.
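A minimal sketch of the approach just described, in the Java 1.4 idiom of the period; the class and member names are illustrative, not taken from the project source, and WorkTicketDTO is a hypothetical DTO class:

import java.util.Stack;

public abstract class ObjectPool {
    private final Stack pool = new Stack();

    // Subclasses create instances of the pooled DTO type.
    protected abstract Object create();

    // Synchronized methods keep the pool thread safe.
    public synchronized Object acquire() {
        return pool.isEmpty() ? create() : pool.pop();
    }

    public synchronized void release(Object dto) {
        pool.push(dto);
    }
}

// Each DTO type gets its own singleton subclass (in its own source file):
class WorkTicketDTOPool extends ObjectPool {
    private static final WorkTicketDTOPool INSTANCE = new WorkTicketDTOPool();
    public static WorkTicketDTOPool getInstance() { return INSTANCE; }
    protected Object create() { return new WorkTicketDTO(); } // hypothetical DTO class
}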
Refactoring the code to use this class proved fairly easy, and subsequent testing showed that
object pooling improved performance by 5-10%.
After refactoring the code to use the new class, Developer B realized he could have built a
single pool class to cover all cases. It would have maintained a hash map of pool instances,
keyed to DTO class type. This change would have improved code simplicity but not
performance, so he didn't pursue it.
XML element names were very long. They were human readable names matching the
column names in the source table.
The Web service returned all columns in a WorkTicket row, even those that the client did
not use.
Developer B refactored the code to shorten the element names to one or two characters each,
and to eliminate the unneeded data elements. For a query returning a full page of data (11
rows), these changes shrank the size of the XML from about 7500 characters to about 2900.
That shrinkage improved performance noticeably.
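For illustration only (the actual element names and values are not given in the report), the refactoring amounted to something like:

Before: <CreationDate>2004-06-01</CreationDate><WorkStatus>OPEN</WorkStatus>
After:  <c>2004-06-01</c><w>OPEN</w>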
10.5.4 Optimizing Queries
While most database activity in the system was simple (meaning it involved only one action on
one or perhaps two tables), some actions were more complex. For example, the Work Order
application processed messages from the Customer Service Web application calling for
creation of a new ticket. To do so, the Work Order application had to execute three steps:
1. Use an Oracle sequence to get a ticket ID (primary key) for the new ticket.
2. Execute a SELECT to retrieve additional data needed for the new ticket.
3. Insert a new row into the WorkTickets table using the information gathered in Steps 1
and 2 plus the other data in the message.
As noted earlier, the team did not use stored procedures to handle complex database
operations like this. So as a first cut, this operation would require three database calls.
Nesting one call inside another can help. The SELECT statement invoked in Step 2 can be
nested inside the INSERT in Step 3, cutting the number of invocations to two. But the SELECT
used in Step 1 cannot be nested.
The team learned, however, that Oracle PL/SQL[11] logic can be passed explicitly as a JDBC
statement. So they constructed the following prepared statement to handle this operation in a
single database call:
DECLARE
tickID INT;
BEGIN
SELECT SEQ_TICKETID.NEXTVAL INTO tickID FROM DUAL;
-- the rest of the block is not reproduced in the source; it nested the
-- Step 2 SELECT inside an INSERT, along these lines:
INSERT INTO WorkTickets (TicketID, ...) VALUES (tickID, ...);
END;
Using this SQL in place of two separate database calls significantly improved performance on
processing messages for new work tickets.
11. PL/SQL is Oracle's native language for stored procedures.
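A sketch of issuing such a block as a single JDBC call. The bind parameter, the Description column, and the conn and description variables are illustrative assumptions, not the project's actual code:

String plsql =
    "DECLARE tickID INT; " +
    "BEGIN " +
    "  SELECT SEQ_TICKETID.NEXTVAL INTO tickID FROM DUAL; " +
    "  INSERT INTO WorkTickets (TicketID, Description) VALUES (tickID, ?); " +
    "END;";
CallableStatement cs = conn.prepareCall(plsql);
cs.setString(1, description); // data from the incoming message
cs.execute();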
10.6.1 Switching JVMs with WebSphere
When the team was ready to tune the WebSphere JVM for garbage collection (see Section
10.4.3.1), they tried using Sun's JVM parameter -XX:+UseConcMarkSweepGC to turn on
concurrent GC. However, this produced an error indicating that the JVM could not start. The
error occurred because the JVM installed with WebSphere 5.1, IBM's Java 1.4.1, does not
recognize this parameter.
Concerned that using IBM's JVM might deny them important options provided by Sun's JVM,
the team decided to try having WebSphere use Sun's JVM instead. They downloaded Java
1.4.1_04 for Linux from Sun and installed it on one of the Customer Service hosts. To point
WebSphere to this JVM, they used the WebSphere admin console to change the value of the
JAVA_HOME environment variable.
While the server started and appeared to be running, it did log an error indicating that it could
not find a particular IBM library. This was due to the changed Java home.
At this point, suspicious that the server had not started cleanly, and fearful of opening a
Pandora's box that would waste valuable time, the team abandoned this effort and reverted
to the original JVM installation.
Technical Background
IBM Edge Server performs load balancing using a technique called MAC forwarding. It
redirects a request at a low level by altering the MAC address (machine address) in the
packet.
The machines and IP addresses involved in this project are described below.
192.168.4.200 is the address of the Customer Service cluster. Requests using that address
must go to the Edge Server host, so that host is assigned the cluster address as an alias.
When a packet comes in addressed to the cluster, it contains both the IP address and the MAC
address of the destination machine, namely the Edge Server machine. Based on the load
balancing algorithm, Edge Server chooses a cluster member to handle the request. It changes
the MAC address to that of the cluster member, but leaves the IP address alone.
For MAC forwarding to work, every machine in the cluster must have the cluster IP address as
an alias. However, the clustered machines should not respond to ARP requests (broadcast
requests asking for the MAC address associated with an IP) on the cluster IP address. The
solution is to alias the machine's loopback device. Doing so requires a simple ifconfig
command.
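The exact command is not preserved in the report, but a representative loopback alias for the cluster address cited above looks like this:

/sbin/ifconfig lo:0 192.168.4.200 netmask 255.255.255.255 up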
Unfortunately, Linux has a defect such that aliases on the loopback device do respond to ARP
requests. This means when an ARP request goes out for the cluster IP, all the machines in the
cluster respond. That should not happen.
There's a Linux patch (the so-called hidden patch) that lets you hide those IPs from ARP
requests. This patch is documented in the Edge Server Network Dispatcher Admin guide. The
patch came from http://oss.software.ibm.com/developerworks/opensource/cvs/naslib
Applying the patch required rebuilding the kernel (since that's how things are done in Linux).
While the team had experience using Linux, they had never patched a kernel before.
Developer A slogged through this on ITSWEB01. These were the basic steps he
followed:
1. Locate the kernel source code on the installation CD and copy it to the machine.
2. Install certain additional Linux packages that hadn't been installed, in order to get the
kernel to recompile.
4. Follow directions from Redhat tech support to compile the new kernel.
When he rebooted ITSWEB01, he got errors regarding the ethernet devices. The machines
each had two network cards, a Linksys Gigabit card and a Compaq card; the first was assigned
to eth0, the second to eth1. On reboot, eth0 was not recognized at all, and eth1 gave an error:
"Dev eth1 has different MAC address than expected."
Next the team tried fiddling with the Network Configuration, but with no luck. By week's end
they were no closer to a solution.
The following Monday the team tried reinstalling the network card drivers. This took them on a
tour of a different circle of Linux hell. Some of the highlights:
The Linksys CD had two sets of drivers for the Gigabit card. But the install script for what
looked like the correct drivers said they required kernel 2.4.13; for earlier kernel versions it
suggested upgrading to a newer version.
The team tried using the Compaq ProLiant Support Pack for Red Hat Enterprise Linux 2.1,
from the Compaq/HP web site, in the hope that it might update the drivers properly. Getting it
to run meant overcoming a series of small obstacles, and ultimately it failed to help anyway.
Finally the team rebooted with the original kernel, but the boot failed because the root
partition had run out of disk space.
Patching the Linux Kernel, Take 2
At this point the team called a colleague offsite with greater Linux knowledge who helped them
free up disk space and remake the kernel. After rebooting, eth0 was active, eth1 (the Compaq
card) was inactive.
The team tried installing the bcm5700 driver for the card. With another series of acrobatic
maneuvers, including manually editing the file /etc/modules.conf, the team got eth0 and eth1 to
activate properly.
With the patch installed, the team applied the alias to the loopback device using the ifconfig command described earlier.
All that remained was to transfer the configuration to the second Customer Service machine.
This was also not a trivial process (is anything trivial in Linux?), but with help from the offsite
colleague the team got through it.
The team worked with Edge Server for some time, but found that it was not working properly.
After wrestling with its configuration settings, they at last delved into the network layer to see
whether MAC forwarding was working as expected. Using the arp utility, they learned that the
cluster IP address was being associated with the MAC addresses of the two Customer Service
hosts, but not the ES host. They suspected that the hidden patch had not taken hold. They
performed a definitive test as follows:
Result: the arp table had the cluster IP associated with the ES box, as it should.
Result: ITSWEB01 responded (it should not have). So the hidden patch had not worked.
Why the Failure?
Why had the previous attempt to apply the patch failed? In retracing their steps, the team
discovered an apparent discrepancy in how the patch (the diff file) was applied. The -p option
governs the number of leading path components that patch strips from file names. Was that
the explanation? The team tested both settings using patch's dry-run option: -p1 showed all
successes, whereas -p0 showed failures.
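A sketch of that verification (the patch file name here is hypothetical):

patch -p1 --dry-run < hidden-patch.diff
patch -p0 --dry-run < hidden-patch.diff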
Discovering this apparent discrepancy, the team retraced their original procedure for applying
the patch. But halfway through it they ran into more disk space problems on the root partition,
as well as intimidating errors like this one:
Mount: wrong fs type, bad option, bad superblock on /dev/loop2, or too many
mounted file systems (could this be the IDE device where you infact use ide-scsi so
that sr0 or sda or so is needed?)
Cant get a loopback device.
At this point the team called a local Linux expert to come onsite and clean up the mess. In
about 6 hours he freed up disk space, cleaned up the kernels and applied the hidden patch
successfully to both Customer Service machines.
After that, the hidden IP addresses were no longer an issue. The team could move on to other
problems with Edge Server.
javax.naming.NameNotFoundException:
Context: WASCell/nodes/ITSWEB01/servers/nodeagent, name:
ITSCustServ.CustomerUpdateMessageHome:
First component in name ITSCustServ.CustomerUpdateMessageHome not found.
The error occurred when the application tried to look up the JDBC datasource or the EJB home
for the session beans that produced messages. The frustrating thing was that this error didn't
occur when they ran the application on a standalone server.
Developer B first thought that the offending part of the name, ITSCustServ, was already
representing a target object, and therefore couldn't also be a context for this compound name.
So in the WebSphere console he changed the name to ITS.CustServ.[12] But this had no effect;
in fact the error message was unchanged, suggesting that his change to the JNDI name had
not been recognized at all.
Next he tried inspecting the JNDI tree. WebSphere has a utility, dumpNameSpace, that lets
you get the tree in text form. He verified that the entries from the console were there. But a
federated environment uses a federated JNDI setup. So the tree is full of links to other places.
In fact, if you use dumpNameSpace with the default JNDI port 2809, you don't see the entries;
you need to use port 9811 to reach the right location.
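For example, a sketch of the two invocations (dumpNameSpace is the standard WebSphere script; the ports are the values cited above):

dumpNameSpace.sh -port 2809   # node agent's view; the entries are not visible here
dumpNameSpace.sh -port 9811   # the right location in this federated cell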
Knowing the names were in the namespace, he then looked for the reason why they weren't
found. He examined the code RRD generated to do the session bean lookup.
So if the name was valid, then the code had to be looking for it in the wrong place. The
provider URL is specified in an RRD-generated file, ITSCustServ_RtConfig.properties. When
you set up a WebSphere deployment model in RRD, that URL defaults to
iiop://localhost:2809.
What you really want to use is the WebSphere bootstrap port. On a standalone server, it's
2809. But in a federated environment, 2809 is used by the node agent on the local machine.
The bootstrap port is some other value, determined at runtime. It points you to a location
service daemon, which is what you want instead. If you were coding straight J2EE, you'd
instantiate InitialContext with a default constructor and use the default value automatically.
But you can't do that with RRD, so you have to change the configuration.
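In straight J2EE, that lookup would look like this sketch (exception handling omitted; the JNDI name is the one from the error message above):

import javax.naming.InitialContext;

// With no provider URL supplied, the container's defaults apply.
InitialContext ctx = new InitialContext();
Object home = ctx.lookup("ITSCustServ.CustomerUpdateMessageHome");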
At first the developer thought of changing the provider URL setting in RRD and reconstructing
the application. After a couple of tests with no change in outcome, he realized that the
ITSCustServ_RtConfig.properties file on the target server (ITSWEB01) was not being
updated, because he was not using RRD to deploy the application from his development
machine. So he manually copied the file to ITSWEB01. That produced a new error earlier in
the process, on the lookup of the JDBC datasource.
The relevant part of the properties file is an XML element (the file is an XML file, despite its
misleading extension) with an attribute java.naming.provider.URL. Its value initially was
iiop://localhost:2809, from the default RRD setting. The developer tried manually changing it
to iiop://localhost, then iiop://localhost/, both without success. He also tried an empty string,
which gave a different error (because the empty string was treated as the explicit provider URL
value).
Finally, he decided to add some code to instantiate InitialContext with a default constructor,
verify that JNDI lookups on it would work, and see what setting it contained. He did, and the
lookups worked. The setting it used was corbaloc:rir://NameServiceServerRoot. When he put
that URL into the properties file, it worked.
After all that, it occurred to the developer to delete the java.naming.provider.URL attribute
from the properties file entirely. This worked too, and proved better than hard coding the URL.
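For illustration, the two working forms of the properties-file entry would amount to something like the following; the element name is hypothetical, since only the attribute itself is documented above:

<jndiConfig java.naming.provider.URL="corbaloc:rir://NameServiceServerRoot" />
<jndiConfig />   (attribute removed entirely; the better choice)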
12. In fact, RRD doesn't let you choose the JNDI name for the session bean; it creates a name of the form appName.messageNameHome.
You can assign fixed weights to the different targets that govern how load is balanced
among them. This mode is similar to DNS round robin; it balances load but does not
offer graceful failover.
You can configure a load manager that continually pings the targets to confirm their health,
and diverts load away from a target that cannot be reached. The manager pings the target
by invoking the URL of a page on port 80 of the target machine. The page does not have
to exist; as long as some reply (even an error) comes back from the target, the manager
is satisfied.
The team configured a manager to handle failover, but over the course of working with Edge
Server they found it behaving erratically. The manager would mark a target down for no
apparent reason. The team wrestled with the Edge Server configuration, looking for
configuration mistakes that might explain this behavior, but found none. After that they simply
switched to fixed-weight mode to avoid the inconvenience during the tuning phase. But during
the RRD Round Testing Phase this problem became critical.
Only during testing for the WSAD round did a possible explanation emerge. The team
discovered a bad network card on the Work Order host (see Section 10.6.11 for the full story).
Even though Edge Server had no communication with that host, the team wondered whether
the failing card had created noise on the network that interfered with Edge Servers pinging the
Customer Service targets and made it think one had failed. Although they could not prove or
disprove that hypothesis conclusively, after the bad card was replaced Edge Server behaved
flawlessly.
Additionally, the team used other WebSphere settings to tune for session persistence
performance.
This section describes in detail the session persistence techniques and the teams experience
with them.
The team chose the Customer Service database as the logical target and set up a separate,
dedicated datasource to that database for that purpose. To configure session persistence, all
you need to specify is the datasource and database login. WebSphere automatically creates
the necessary table.
While this technique satisfied the functional requirements of the RRD development phase, the
team discovered during tuning that it created a significant performance bottleneck. The team
tried reducing it by adjusting the settings for tuning session persistence (see Section 10.6.7.3
below), but could not get any improvement. At that point they looked for an alternative.
Peer to peer. In this topology, each clustered server in the replication domain holds not only
its own sessions but those of all the other clustered servers. This topology saves you from
configuring additional servers, but it has certain implications for performance and failover:
Lots of duplicate messages. If the cluster has ten servers, every session is replicated to
nine destinations. That requires nine messages in the domain.
Greater memory requirements. Every server must have sufficient memory to hold the
sessions for all ten servers, not just its own.
Client-server. In this topology, additional WebSphere servers act as repositories for replicated
sessions. The clustered servers send their sessions to these repositories, but do not
themselves replicate sessions from other servers in the cluster.
Figure 6. Client-server topology for in-memory session replication
The team briefly experimented with the peer-to-peer topology. But given the memory overhead
it imposed, and given the availability of a host guaranteed not to go down, they shifted to the
client-server topology. They created a WebSphere server on the MQ host for this purpose and
configured it as the session server.
How frequently to write session data. You can choose to write the data at fixed time
intervals or after each servlet service.
What session data to write. You can choose to write the entire session or only the
updated attributes.
These settings let you optimize for performance (infrequent writes, update only), optimize for
failover (write after every service, write all session data), or something in between.
The team experimented with different combinations. With database persistence, their focus
was on improving the poor performance. They chose to write updated attributes only rather
than the entire session, and found this choice improved performance marginally. As for
controlling the write frequency, they could not find an acceptable tradeoff between failover and
performance. Writing after every servlet service was just too slow. Writing at fixed intervals
improved performance only when the interval was too long to provide reliable failover. This
poor tradeoff led them to abandon database persistence.
With in-memory replication, the team found they could write the session after every servlet
service. They also tried writing updates only to improve performance. But at that point the
team encountered strange out-of-memory errors related to session replication. Despite
spending a great deal of time diagnosing this problem, including using WebSphere's diagnostic
trace service to examine the session replication behavior in detail, they could not explain it. But
when they switched from writing updates only to writing all session data, the problem
disappeared.
On the Customer Service side, the team could deploy to one server at a time, downing it
first if necessary. But deployment would have to be handled in a special way, because the
normal process would deploy to both clustered servers at once.
Since the Work Order application was running on only one machine, any changes to it
would have to be deployed while it was running. This would require hot deployment.
The IBM WebSphere literature[13] describes how to hot deploy components in a Network
Deployment environment. But the description is contradictory in a couple of places.
Then, later on, it says: "For changes to take effect, you might need to start, stop, or restart an
application." (This would, of course, defeat the purpose of hot deployment.)
Nevertheless, the document describes the various facets of WebSphere that go into the
process:
From this information, and with much experimentation, the WebSphere team put together
procedures for updating the applications. For the Customer Service application, they used this
procedure in the RRD Round:
2. Stop the first Customer Service server (see Section 10.6.9 for a discussion of how to
do this gracefully).
3. Copy the contents of the staging area to their respective locations in the WebSphere
installation.
13. Go to the WebSphere Information Center at http://publib.boulder.ibm.com/infocenter/ws51help/index.jsp and search on "hot deploy".
4. Restart the server.
2. Deploy the updated Customer Service application through the Deployment Managers
Admin console in the normal way, but do not synchronize when saving the changes.
This installs the updated application on the DM machine only.
4. In the Deployment Manager, synchronize with the first Customer Service node. This
copies the installed application into place.
1. Install the application with reload enabled and the reload interval set to a reasonable
value, such as 60 seconds.
3. Wait until the reload interval has passed to see the changes.
Note that while these procedures cover most situations, there are some circumstances they
cannot handle:
If you update a custom external library deployed to WebSphere's lib folder, you must
restart the server.
If you update an EJB whose stub is cached (as, say, in a service locator class), the stub
becomes stale.
Load balancing. It must balance client traffic evenly among available servers.
Seamless failover. If a server dies in the middle of a conversation with a client, another
server should pick up the conversation without the client knowing it.
To meet these requirements, the architecture must provide two additional capabilities:
While the WebSphere team was able to address load balancing effectively, seamless failover
proved a huge challenge. The team tried several topologies:
A major departure from the standard topology.
This section describes these topologies in turn and the teams experiences with them.
For the first situation, the team had to be able to redirect load away from a server before
downing it. If they stopped a WebSphere server or even simply stopped the application in the
server, errors would occur.
The only tool available for gracefully stopping load was Edge Servers quiesce function.
Quiescing a target server means reducing to zero the traffic to that server. If you use Edge
Server to provide server affinity, you must choose whether or not to quiesce immediately. If
you choose not to, Edge Server allows ongoing conversations to finish on the same target
server, which means that it may take much longer to shift load away from that server.
What's interesting is that Edge Server is used simply to balance load between the Apache
instances and provide failover if one of them dies. The Web server plugin does most of the
work: it provides both load balancing and server affinity (stickiness)[14] with respect to the two
WebSphere instances. It also provides failover capability if one of the WebSphere instances
goes down.
How the plugin distributes load among the target servers is governed by the plugin
configuration file, generated by WebSphere. By default the distribution is equal among the
targets.
The WebSphere team found that this topology worked well for load balancing, but performed
very poorly during failover tests. They could not gracefully bring down a WebSphere server. If
they used Edge Server to quiesce Apache #1 (the instance on Host #1), Apache #2 would
continue to direct load to WebSphere #1.
Figure 8. Standard system topology for load balancing and failover after quiescing Apache #1 from Edge Server
This configuration required manually editing the plugin configuration file to undo the clustering
and have each Apache instance serve one WebSphere instance only. Each host had its own
customized plugin configuration.
14. WebSphere provides stickiness by appending a unique server ID to the session ID that is passed back to the client via a cookie or URL
rewriting. When a new request in the same session comes in, the plugin detects the server ID and routes the request to that server, if possible.
Figure 9. Non-standard system topology for load balancing and failover
This topology also proved troublesome. It did solve the problem of stopping traffic to a
WebSphere instance: when the team used Edge Server to quiesce one of the Apache
instances, traffic to the corresponding WebSphere stopped as well. But this topology required
Edge Server to provide stickiness, and that was difficult to control. Quiescing immediately led
to many errors, while quiescing slowly took much too long. The team found it hard to shut off
Edge Server's traffic to a target server in a timely, seamless fashion.
The team found this alternative worse than the first and reverted to the standard topology.
Figure 10. Modified standard system topology for load balancing and failover
This topology allowed the team to take advantage of the plugins stickiness and failover
capabilities while at the same time channelling traffic to a specific WebSphere instance. When
the team wanted to bring down WebSphere #1, they would do the following:
1. Use Edge Server to quiesce Apache #1 immediately. With all traffic going to Apache #2,
which favored WebSphere #2 (apart from stickiness), eventually traffic would shift over
to WebSphere #2.
2. Monitor CPU activity on Host #1 and wait until it subsided. That would indicate that the
traffic had shifted.
Although failover still was not seamless, this was the most successful topology.
Developer B double-checked that both these references were using the explicit IP address of
the Web service host. Then he gathered other clues:
When he ran the Customer Service application on the WSAD test environment and tried to
use the Web service on the production machine, it worked.
When he manually invoked the direct URL of the Web service[15] from a browser on the
Customer Service host, it gave back the proper response page.
So the Web service was responding; the problem seemed to be with the client application.
Working together, the team noticed that the webservices.jar library in WebSphere's
AppServer/lib folder was older than the one in the WSAD test environment. They wondered
whether this discrepancy was causing the problem. When they substituted the library from
WSAD's runtimes/base_v51_stub folder, that solved the problem.
15. The URL was http://192.168.4.215:9080/itsWorkOrderConsoleWeb/services/SearchTicketsWS; it responded with a page that simply said "And
now some services."
The very next day, however, the application started behaving badly. As load ramped up,
hits/sec would climb erratically; response times were horrible. The team had no idea what had
thrown this particular monkey wrench, so they pursued all the usual suspects:
None of these actions solved the problem. Moreover, a load test on the Customer Service
application gave respectable response times on Web service queries. This clue narrowed the
focus to the Work Order application itself. So the team looked at it more closely.
After this round of effort, another test of the Work Order Web application cranked up to over
3500 users. That was nice to see, but it didn't explain what the original problem was, or whether
the team had solved it.
In fact, they hadnt. Despite successful load tests, the horrible results returned, proving again
the maxim, Things that go away by themselves can come back by themselves.
By the beginning of the testing phase the problem had worsened; the response time of the
Work Order application was orders of magnitude greater than that of the Customer Service
application. The team rechecked everything again in a vain search for the cause.
Then a new, tantalizing clue suddenly appeared: Ethernet 0 on the Work Order machine
started to fail, going off and back on intermittently. Could the network card be responsible for
the poor response times? There were two subnets connecting the servers, 192.168.4 and
192.168.5. The failing network card provided a .4 address, the address used by the load
testing scripts. But the Web service (which was performing well) was accessed through a .5
address provided by a second card. So the failing card became the prime suspect in this
mystery.
CN2 immediately replaced the failing card, after which the team reran the Work Order load test.
Lo and behold, the results were now on par with those of Customer Service. (More importantly,
those results would remain stable through the rest of the project!) Everyone involved
concluded that the failing card had been the culprit.
But discovering the failing card raised two new questions. First, did it also explain the tendency
of Edge Servers load manager to occasionally mark a target server down for no apparent
reason (described in Section 10.6.6)? The team wondered whether the failing card had created
noise on the network that would fool Edge Server into believing one of its targets had failed.
They would keep their eye on Edge Server. (In fact, it behaved flawlessly from that point on.)
Second, did the bad card compromise the results of the previous performance tests? Since
there was no way to answer that question with certainty, CN2 insisted on rerunning the tests for
both the .NET and RRD implementations, to ensure the validity of the results.
But like any complex, sophisticated piece of software[16], LoadRunner can behave strangely if
not configured and used exactly correctly. Over the course of tuning and testing their system,
the team learned some important lessons about LoadRunner that would apply to comparable
products:
Adjust the scripts in response to page changes. Any change to the URL of a request
(including changes to query parameters) affects a script that invokes that URL. The same is
true of the fields in a form; if you add, remove or rename a field, the script must be corrected.
Double-check the runtime settings. Twice the team got bizarre test results because a simple
LoadRunner runtime setting was wrong.
In one test, the Work Order Web application suddenly began overloading at a fraction of the
load it had handled the day before. The cause turned out to be the LoadRunner runtime setting
governing how think time was handled. The scripts had 5-second think times hard coded
before each Web request, but when running a script you can control whether and how that think
time is used. The correct setting was to use a random value between 50% and 150% of the
coded time.[17] But in this case, think time was accidentally turned off, so naturally the system
started hyperventilating very quickly.
On another occasion the opposite occurred. The team suddenly found it could ramp up to
loads much higher than those achieved earlier. These results were too good to be trusted.
Again, the culprit was an incorrect LoadRunner runtime setting, the one that controls how many
seconds a user waits between iterations of the script. The correct setting was zero, but in this
case it had been accidentally changed to 60 seconds.
Periodically refresh the clients. The team found it prudent to reboot the client machines
occasionally, to ensure they performed properly. They also had LoadRunner periodically
refresh the scripts on the client machines to guard against potential script corruption.
16. For some reason, application servers come to mind.
17. Randomization is important because it helps stagger the requests and more evenly distribute the load.
11 .NET CONFIGURATION AND TUNING PROCESS SUMMARY
This section describes the process the .NET team went through to configure and tune the .NET
and Windows infrastructure. It also describes the major bottlenecks encountered and resolved
in the implementation.
Here is a high-level summary of the stages the team went through. Details follow in the
sections below.
1. Install and configure the software: Network Load Balancing (NLB) and ASP.NET State
Server.
The diagram below shows the network topology relevant to the ITS Customer Services
application and how the .NET team configured it for NLB.
Figure 11. Network topology and Windows NLB configuration for load balancing and failover in the .NET implementation
Since each Web server has multiple network interface cards (NICs), the team decided to
configure the NLB in unicast mode for better performance. They also followed a best practice
for this mode: connecting the clustered network interfaces to a single hub that is up-linked to
the public switch. This practice prevents the NLB from flooding the switch (a condition known
as port flooding) and degrading the entire network's performance. The hub is used by the
servers in the cluster to communicate with each other via a heartbeat process.
There was an additional requirement for using this setup. For NLB to function properly with the
hub configuration, MaskSourceMac had to be disabled. To turn it off, the team set this registry
key to 0:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WLBS
\Parameters\MaskSourceMac
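As a sketch, the same change can be made from the command line with reg.exe (available on Windows Server 2003):

reg add HKLM\SYSTEM\CurrentControlSet\Services\WLBS\Parameters /v MaskSourceMac /t REG_DWORD /d 0 /f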
The hub and the clustered network interfaces would take care of the incoming traffic. With
performance in mind, the .NET team configured the second network interface card (gigabit) for
dedicated outgoing traffic back to the clients.
To simplify the configuration, the team used the interface metric setting for the outbound gigabit
NIC to ensure it was the default NIC used for outbound traffic. To do this, they simply used
TCP/IP properties to set the interface metric of the dedicated outbound NIC to a lower setting
than that of the cluster NIC, which NLB used to communicate over the hub via its heartbeat
process. In the diagram, these interface metrics are 1 and 2 respectively. This configuration is
documented in the help files for setting up network load balancing.
With this design, routing incoming and outgoing traffic through different dedicated channels, the
team could achieve better overall network performance.
Once the team had figured out the load balancing plan, setting up the clustered environment
was straightforward. Using Network Load Balancing Manager, they could configure the
clustered environment or an individual host from any server, making the maintenance task very
simple. Network Load Balancing has both graphical and command line interfaces in Windows.
Commands enable admins to stop, add and monitor servers in the cluster. One particularly
useful command is drainstop, which enables a server in the cluster to finish handling its
current requests, while ceasing to take new requests. Once all requests have been drained
from that server, it can be taken offline for possible maintenance.
It is important to note that the .NET team chose to hold session state in an ASP.NET session
state server (rather than in the Customer Service servers themselves), and that the session
server was placed on a machine outside of the cluster (the same machine handling the durable
MSMQ message queue). This topology made the Customer Service application cluster safe,
though it also carried some tradeoffs:
A possible performance cost because the Customer Service application obtained all
session data remotely rather than locally.
The session state server as a single point of failure. Because the ITS specification
guaranteed that the MQ host would always be available, the team did not concern itself
with this issue; had it been necessary, they could have used a clustered SQL Server for the
session state data store which would have removed the single point of failure.
1. Open the Services console on the state server machine.
2. In the details pane, right-click ASP.NET State Service, and then click Properties.
3. On the General tab, in the Startup type list box, click Automatic.
4. Under Service status, click Start, and then click OK. The state service starts
automatically when the Web server restarts.
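With the service running, each Web server's configuration points session state at the state server. A sketch of the Web.config entry (the host address is illustrative; 42424 is the state service's default port):

<sessionState mode="StateServer"
    stateConnectionString="tcpip=192.168.4.220:42424"
    cookieless="false" timeout="20" />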
To configure the clustered application to use a single session state server, there are a few
documented considerations to keep in mind.
Make sure all applications use the same machine key. The team added the same
<machineKey> entries to the Web.config files (see the sketch after this list).
Make sure all objects that are stored in session state can be serialized. It is easy to
implement this in .NET by adding the Serializable attribute to each class that needs
serializing (also shown below).
Make sure the Customer Service application has the identical Application Path on
both Web servers. This means ensuring all installations of the application have the same
URL.
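Sketches of the first two configuration points above; the key values are elided and the class name is illustrative:

<machineKey validationKey="..." decryptionKey="..." validation="SHA1" />

// C# sketch: mark session-stored classes serializable
[Serializable]
public class CustomerProfile
{
    // fields held in session state
}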
Initially, they had all ticket searches return the top 500 rows. This turned out to be a significant
amount of data with a significant impact on performance. After the team implemented paging
(which meant each query returned only enough data to satisfy the current page), the database
returned an average of 20 rows per search (based on the mix of requests in the test scripts),
with a commensurate improvement in performance.
Note: Many of the actions the team took are discussed in the MSDN article "Developing High-
Performance ASP.NET Applications" in the .NET Framework Developer's Guide.
See Section 17.2 for the URL.
Added CustomerID cookie in CustomerService App to reduce Session State during Web
service call
Disabled unused HttpModules (e.g. output caching)
11.3.9 Changes to Machine.config
The team made these changes to Machine.config:
<connectionManagement>
<add address="*" maxconnection="200" />
</connectionManagement>
<httpRuntime executionTimeout="90" maxRequestLength="4096"
useFullyQualifiedRedirectUrl="false"
minFreeThreads="8" minLocalRequestFreeThreads="4"
appRequestQueueLimit="3000" enableVersionHeader="true" />
<processModel enable="true" timeout="Infinite" idleTimeout="Infinite"
shutdownTimeout="0:00:05"
requestLimit="Infinite" requestQueueLimit="5000" restartQueueLimit="10"
memoryLimit="60" webGarden="false"
cpuMask="0xffffffff" userName="machine" password="AutoGenerate"
logLevel="Errors" clientConnectedCheck="0:00:05"
comAuthenticationLevel="Connect" comImpersonationLevel="Impersonate"
responseDeadlockInterval="00:03:00"
maxWorkerThreads="16" maxIoThreads="16" />
The team attempted to correct this condition, following documented Microsoft Knowledge Base
articles, by setting the session state network timeout values higher than the default. This did not
affect the condition.
The team suspected two possible factors contributing to this problem:
They configured too many network connections (200) for the .NET HTTP module.
Microsoft documents settings much lower than the 200 network connections used (the
default setting is 2, but does need to be adjusted upwards on the Web service client
machine).
They did not increase the number of I/O threads to handle the number of network
connections. Analysis of MSDN material indicated that such a high setting might also
require increasing the I/O thread pool.
Since this problem only occurred after server saturation and above the one-second
performance testing cutoff, the team chose not to spend more time diagnosing it.
12 PERFORMANCE TESTING
The first three were similar: Mercury LoadRunner was used to subject each application to load
in a variety of tests. Each test consisted of ramping up user load gradually over time to plot
system throughput curves, measure transaction response times, determine maximum user
loads supported, and track error rates under load. Identical test scripts were created to test
each of the three implementations in a consistent manner.
The following section details the tests and presents the summary findings from the auditor's
report.
The test scripts simulated simultaneous users accessing the ITS Customer Service Application
to:
Since the Customer Service application was integrated with the Work Order application via
messaging and a Web service, putting load on the Customer Service application also exercised
the Work Order application and message queue server.
Customer updates would cause the Customer Service application to not only update its
local database, but also send a message to the Work Order application to update its
database (which replicates customer data) as well.
The Customer Service application would submit new work orders by sending messages to
the Work Order application.
To perform ticket queries, the Customer Service application would invoke a Web service
provided by the Work Order application.
The following table summarizes the auditor's results for the Customer Service application
performance test. Please note that a transaction represents a complete business operation
invoked by a user, such as executing a ticket search and receiving the first results page. So
272 transactions per second, for example, is equivalent to 979,000 business operations per
hour or 23.5 million business operations per day.
Failed transactions as percentage of total: 0.00% / 0.02% / 0.00%
This graph shows the performance of the three implementations at each user load:
[Graph: average TPS vs. number of virtual users (500-4000) for the WSAD, RRD and .NET 1.1
implementations. Note: the last data point is the average TPS past the 1-second cut-off.]
The .NET implementation performed slightly better than the WSAD, reaching about 10% higher
peak throughput at the same peak user load. And both performed far better than the RRD
implementation.
The test scripts simulated simultaneous users performing these operations:
log in
search for customers
modify customer information
search for technicians
modify technician information
search for work tickets
The following table summarizes the auditor's results for the Work Order application
performance test. Please note that a transaction represents a complete business operation
invoked by a user, such as executing a ticket search and receiving the first results page. So
260 transactions per second, for example, is equivalent to 936,000 business operations per
hour or 22.5 million business operations per day.
Failed transactions as percentage of total: 0.00% / 0.11% / 0.00%
This graph shows the performance of the three implementations at each user load:
[Graph: average TPS vs. number of virtual users (500-5000) for the WSAD, RRD and .NET 1.1
implementations. Note: the last data point is the average TPS past the 1-second cut-off.]
In this test the WSAD implementation far surpassed the other two. It achieved 75% more
throughput at 50% higher peak user load. Again the RRD implementation fell far behind the
other two.
12.2.3 Integrated Scenario
This test stressed both systems at once, running the ITS Work Order test (25% of the load) and
the ITS Customer Service application test (75% of the load). In this case, LoadRunner clients
were accessing both Web applications, running the same scripts as in the two individual tests.
This test ramped up user load at a rate of 661 new users every 15 minutes: 500 on the
Customer Service application, 161 on the Work Order Web application.
The following table summarizes the auditors results for the integrated scenario performance
test, which included both the Customer Service and Work Order applications. Please note that
a transaction represents a complete business operation invoked by a user, such as executing
a ticket search and receiving the first results page. So 365 transactions per second, for
example, is equivalent to 1.3 million business operations per hour or 31.5 million business
operations per day.
Failed transactions as percentage of total: 0.00% / 0.00% / 0.00%
And the graph of performance at each user load:
[Graph: average TPS vs. number of virtual users (661-5449) for the WSAD, RRD and .NET 1.1
implementations. Note: the last data point is the average TPS past the 1-second cut-off.]
In this test of the complete system, again the .NET implementation was the clear winner. It
achieved 1/3 higher peak throughput at 50% higher peak user load. Interestingly, though, the
difference between the WSAD and RRD performance was much smaller for this test than for
the others.
The teams were asked to halt their Work Order message processing module and make sure
the work ticket message queue was empty. A script was run to load the queue with exactly
20,000 messages. Once the queue was loaded, the Work Order application performing
message processing was restarted, and the time it took to process the 20,000 messages was
measured.
The WSAD implementation was the clear winner over the .NET implementation in this test, by
more than a factor of two. The difference may be attributable to differences in code,
infrastructure, or both. On the .NET side, the team had to create an additional message
forwarding layer to compensate for the fact that one cannot do transactional reads from a
remote MSMQ queue (see Section 8.4.3.1). This layer, not necessary for the J2EE
implementations, undoubtedly added overhead. On the other hand, the .NET team's initial
throughput was much worse (2 messages per second); they improved it to this level by
reconfiguring the message processing application from single- to multi-threaded.
Equally interesting is the fact that the RRD implementation lagged so far behind the WSAD,
despite their using the same message server (WebSphere MQ) and the same basic
architecture (message-driven EJBs). The main reason is the generated RRD code that
acquires message-related resources. In particular, RRD code creates a new JNDI
InitialContext object whenever it wants to do a JNDI lookup. For the WSAD version, the team
used a ServiceLocator class that cached the InitialContext and the queue connection factory.
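A minimal sketch of that caching approach; the class and method names are illustrative, not the team's actual code, and the JNDI name is supplied by the caller:

import javax.jms.QueueConnectionFactory;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public final class ServiceLocator {
    private static InitialContext ctx;
    private static QueueConnectionFactory qcf;

    // Create the InitialContext once, then reuse it for every lookup.
    public static synchronized InitialContext getContext() throws NamingException {
        if (ctx == null) {
            ctx = new InitialContext();
        }
        return ctx;
    }

    // Look up and cache the queue connection factory on first use.
    public static synchronized QueueConnectionFactory getQueueConnectionFactory(String jndiName)
            throws NamingException {
        if (qcf == null) {
            qcf = (QueueConnectionFactory) getContext().lookup(jndiName);
        }
        return qcf;
    }
}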
The most obvious conclusion is that the RRD implementation fell short of the others by a wide
margin. The .NET and WSAD implementations outperformed it by a factor of two. The
explanation lies largely in the nature of RRD as a development tool (discussed in Section
7.1.1). Designed to speed development and distance the developer from J2EE coding, RRD
generates Java code that in many ways is not optimized for performance. The J2EE team
spent a great deal of time trying to compensate for those code limitations.
Comparing the WSAD and .NET implementations, we find a much closer match. .NET
performed significantly worse on the Work Order and messaging tests, slightly better on the
Customer Service test, and significantly better on the integrated scenario test (which is the one
most like the real world). Taking the total of the three load tests, the .NET and WSAD
implementations come out nearly even. On the message throughput test, the WSAD
implementation surpassed the .NET by more than double.
The ability of the WSAD implementation to roughly match its .NET counterpart in performance
suggests that the WebSphere/Linux platform performed on a par with the .NET/Windows
platform. Of course the study revealed other indicators where the two platforms differed
greatly. But strictly in terms of performance (as measured by these tests), the two platforms
are comparable.
13 MANAGEABILITY TESTING
1. In the Work Order application, change the ordering of results from a database query
2. In the Customer Service application, add a new Web page and modify an existing page
3. In one of the Customer Service applications Web pages, change a drop-down list
whose contents are hard-coded so that it instead binds to a database table.
Each test proceeded in two parts, development and deployment. For development, the team
made, tested and verified the change in a test area, while the auditor timed the process.
For deployment, the user load was ramped to 1,750 concurrent users (1,400 against the load
balanced Customer Service application, 350 against the Work Order application). Note that
even though Request #1 did not affect the Customer Service application, that application was
still running under load during the test. The team was then asked to deploy the change to the
appropriate server(s). The auditor measured the time taken to deploy, the number of errors
occurring during deployment, and whether the Customer Service application preserved session
state.
The following section details the tests and presents the summary findings from the auditor's
report.
Results generated from the Ticket Search Page need to be ordered in descending order by
date with most recent tickets displayed first.
Here are the summary results from the auditor's report:
Development
Deployment
Explosion of errors. The WSAD experience was somewhat bizarre. After the team had
successfully deployed the changes to the Work Order Web application, the Customer Service
side of the system (which was also running under load), began to fail. This occurred even
though the changes did not affect it directly; the only modified code was in the freestanding
Work Order Web application, not the Work Order module that processed messages or hosted a
Web service.
The team had no hypothesis to explain this failure. And under the time constraints they could
not properly address it. They are confident, however, that a solution exists and, given enough
time, they could have made this work.
Auditor's observations. Given the high number of errors that occurred for the two J2EE
implementations, the auditor included these observations in the report:
RRD: For this change request, the development team chose to modify the middle-tier
application logic, since this contained the database query to which the order-by clause
needed to be applied. To ensure continued query performance, the team also chose, in
tandem, to change an index in the database schema so that the order-by clause
could be completed efficiently. The 874 errors can be attributed to both the recompiling
of the application and the changing of the database index while under load.
WSAD: A portion of the errors could be attributed to both the recompiling of the application
and the changing of the database index while under load. However, in this exercise it was
also observed that while Apache had systematically crashed during the live deployment,
Edge Server was not part of that particular issue. Technically this test was stopped and
allowed to pass; however, the CS portion of the site should not have been affected by the
change to the WO server. Since synchronization was not a step performed by Middleware,
there is no real cause for the CS site to suddenly go offline. At the time noted for "System
Back Online," the user load was continued on the system for approximately 5 more
minutes after the test ended to verify that the bouncing of CS#1 worked and that
transactions were being passed successfully.
.NET: The five errors were time-out errors (120 seconds), most likely due to the application
recompiling and being reloaded into memory after being redeployed.
13.2.2 Change Request 2: Adding a Web Page
The second change request applied to the Customer Service application. It required changes
to the display of news items on the home page and the creation of a new page for adding new,
company-specific news items. The request stated:
Add a new page to the Customer Service application that will allow administrators for each
company to generate new news bulletins that are displayed for their company. This part
requires 2 changes: adding the Web form to allow news items to be submitted and stored in
the database, and adding a column to the database table allowing news items to be
tracked by unique customer ID. A CompanyID of zero specifies that a news item will
display for every company. Unauthenticated users will see only default news items.
However, once logged in, the news items for that company will also display, in addition to
the default news items.
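As a rough illustration of the filtering rule described above, here is a minimal JDBC sketch;
the NEWS_ITEM table and its columns are assumptions for the example, not the ITS schema:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    public class NewsItems {

        // Change Request 2: CompanyID 0 marks a default item shown to everyone;
        // an authenticated user also sees the items for his or her own company.
        public List loadNews(Connection conn, int companyId) throws SQLException {
            String sql = "SELECT HEADLINE FROM NEWS_ITEM "
                       + "WHERE COMPANY_ID = 0 OR COMPANY_ID = ? "
                       + "ORDER BY POSTED_DATE DESC";
            PreparedStatement stmt = conn.prepareStatement(sql);
            try {
                stmt.setInt(1, companyId);  // pass 0 for unauthenticated users
                ResultSet rs = stmt.executeQuery();
                List headlines = new ArrayList();
                while (rs.next()) {
                    headlines.add(rs.getString("HEADLINE"));
                }
                return headlines;
            } finally {
                stmt.close();
            }
        }
    }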
[Summary results table: Development and Deployment results for Change Request 2]
13.2.3 Change Request 3: Data-Binding a Drop-Down List
The third change request also applied to the Customer Service application. The request stated:
Databind the Work Status drop-down list to a table in the database instead of hard-coding
the values in the HTML.
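A minimal sketch of the data-access side of such a change, again with assumed table and
column names (WORK_STATUS, STATUS_NAME) rather than the actual ITS schema, might be:

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.ArrayList;
    import java.util.List;

    public class WorkStatusOptions {

        // Change Request 3: read the Work Status values from a table rather
        // than hard-coding <option> elements in the HTML.
        public List loadStatuses(Connection conn) throws SQLException {
            Statement stmt = conn.createStatement();
            try {
                ResultSet rs = stmt.executeQuery(
                    "SELECT STATUS_NAME FROM WORK_STATUS ORDER BY STATUS_ID");
                List statuses = new ArrayList();
                while (rs.next()) {
                    // each row becomes one entry in the rendered drop-down list
                    statuses.add(rs.getString("STATUS_NAME"));
                }
                return statuses;
            } finally {
                stmt.close();
            }
        }
    }

The page layer would then iterate over the returned list to emit the <option> elements that
were previously hard-coded.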
Here are the summary results from the auditor's report:
[Summary results table: Development and Deployment results for Change Request 3]
In summary, the .NET implementation held the advantage on two counts.
First, it had fewer errors during deployment under load in all three tests. For the changes to the
Customer Service application, the RRD and WSAD implementations both produced relatively
small numbers of errors, whereas the .NET implementation produced zero. The greatest
difference lay in deploying to the Work Order application, where (as noted above in Section
13.2.1) the WSAD implementation suffered an inexplicable catastrophic failure.
Second, the .NET implementation properly maintained session state in the two tests involving
the Customer Service application. The RRD implementation failed once on that count, the
WSAD implementation twice.
14 RELIABILITY TESTING
The reliability testing consisted of four tests:
1. Controlled failover: Gracefully shut down a Customer Service load-balanced server, then
bring it back online within the cluster
2. Catastrophic failover: Abruptly down a Customer Service load-balanced server (pull the
plug), then bring the failed server back online within the cluster
3. Loose coupling: Power off the Work Order application while running the Customer
Service application (to test the loosely coupled nature of the applications)
4. Long duration: Run the entire system at normal load for 12 hours
The following section details the tests and presents the summary findings from the auditor's
report.
Failover: could the system provide additional capacity during operations?
The WebSphere team did poorly with the RRD implementation, but much better with the WSAD
implementation. The difference is explained more by how WebSphere was configured than by
the implementations themselves. By the second round (WSAD), the team had worked out a
more reliable means of handling failover, which centered on skewing the load balancing that
each Apache instance performed to favor its colocated WebSphere server. See Section 10.6.9
for a complete discussion.
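The report does not reproduce the team's actual configuration. As a rough illustration only,
the WebSphere HTTP plugin that runs inside Apache reads its routing rules from
plugin-cfg.xml, where per-server weights can skew the distribution; the cluster and server
names below are invented for the example:

    <!-- Illustrative fragment only, not the team's actual plugin-cfg.xml.
         Each Apache instance weights its colocated WebSphere server far more
         heavily, so most requests stay on the local machine. -->
    <ServerCluster Name="CustomerServiceCluster" LoadBalance="Round Robin">
        <Server CloneID="local1" Name="LocalWebSphereServer" LoadBalanceWeight="20">
            <Transport Hostname="localhost" Port="9080" Protocol="http"/>
        </Server>
        <Server CloneID="remote1" Name="RemoteWebSphereServer" LoadBalanceWeight="2">
            <Transport Hostname="cs2.example.com" Port="9080" Protocol="http"/>
        </Server>
    </ServerCluster>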
Despite the progress they made in handling a controlled shutdown, the WebSphere team still
could not get the system to handle a catastrophic failure gracefully. The downed server did not
come back properly, leading to a huge number of errors. This was another case where the
team had no explanation for the problem and insufficient time to solve it.
The .NET implementation handled the failover well, with minimal errors.
On the remaining two tests (the loosely coupled test and the 12-hour sustained operation), all
three implementations performed with equal reliability. All three were able to sustain an
average response time of less than 1 second, and no implementation threw any errors at this
user load for the full 12 hours.
Again, we note that the manageability and reliability testing results show what the teams were
able to achieve given the time each took for configuration and tuning in preparation for the
tests. With additional time and/or involvement by vendors themselves, improved results might
have been achieved.
Two elements of Microsoft's approach shaped these results: a focus on the Windows platform
to provide tight integration between the OS and the development framework and tools, and
standardization on Visual Studio .NET as the primary development tool for .NET.
Developer productivity. Microsoft's tight-integration approach paid off in the development
phase, where VS.NET and the .NET platform proved more productive than either RRD or
WSAD with the WebSphere platform. Among the reasons:
The position of VS.NET as the premier .NET development tool all but guaranteed an
equivalence between VS.NET experience and .NET platform experience. In other words, a
developer with three years of .NET experience has most likely used VS.NET for three
years, whereas a developer with three years of J2EE experience may not have used RRD or
WSAD at all.
VS.NET shared some of the best features of both RRD (visual page design; data binding)
and WSAD (direct coding of business logic; tight integration with the target platform).
Installation and configuration of software. Tight integration paid off here, too, for the .NET
team. Most key elements of the .NET runtime infrastructure (basic application platform, Web
server, load balancer, session server, message server) were already in place with the basic
Windows Server 2003 installation. This fact saved the .NET team a great deal of time and
trouble.
The WebSphere team, by comparison, spent a great deal of time during the development
phase installing the software and configuring it for basic functional tests. They also spent
considerable time overcoming fundamental configuration obstacles, such as patching the Linux
kernel for Edge Server and configuring WebSphere for session replication. The .NET team did
not face such obstacles.
System tuning. The .NET team completed their tuning process much more quickly. One
obvious reason is that they had fewer knobs to turn. A J2EE system has many more moving
parts that interact in many combinations, making the tuning process all the more complex. The
WebSphere team took a methodical approach to tuning that certainly proved more
time-consuming.
Performance. In terms of sheer processing throughput, the .NET and WSAD implementations
performed comparably. [18] In one particular area, message processing, the .NET version fell
far short, but this is most likely explained by the more complex architecture the .NET
messaging implementation required.
[18] It would be interesting to know to what degree, if any, the different operating systems
contributed to the performance results. Unfortunately, the data from this study sheds no light
on that question.
Manageability & reliability. The .NET implementation consistently and reliably handled
service interruptions, both controlled and unexpected. It also allowed the team to deploy
application updates much more smoothly.
The WebSphere team, on the other hand, encountered catastrophic failures that they could not
diagnose or explain sufficiently to overcome. They also found session persistence less than
reliable. The team feels they could have solved these problems given more time;
unfortunately, time was a measured resource in this study.
Overall, by most indicators in this study, the .NET implementation running on Windows Server
2003 was better, in some cases significantly so, than either WebSphere/J2EE implementation
running on Linux. Are these results surprising?
It makes sense that using an integrated, out-of-the-box "operating system and application
server" framework such as Windows and .NET would have a much lower setup cost than
attempting to integrate multiple products (albeit from the same company) with a third-party OS.
Although IBM products have come a long way since 1998, they still have some way to go in
providing the seamless integration Microsoft can offer.
Nor should it surprise anyone that the development productivity results favor the .NET side;
productivity has always been one of Microsoft's strong points. Perhaps a more noteworthy
result is that WSAD came much closer to Visual Studio than it would have a couple of years
ago.
Regarding performance, the RRD results should not come as a shock. Any code-generation
scheme will always have a difficult time holding its own against tightly written, hand-crafted
code. What is worth noting, however, is how close the WSAD and .NET performance results
came. This outcome basically means that both IBM and Microsoft have done a good job getting
the most out of the hardware resources on which their platforms run. The only truly unexpected
result was that the WSAD message processing (using JMS and message-driven EJBs)
performed so much faster than the .NET implementation.
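For context, the J2EE messaging model the WSAD implementation used pairs a JMS queue
with a message-driven bean. The following minimal EJB 2.x sketch shows the shape of such a
bean; the class name and message handling are assumptions, not the team's actual code:

    import javax.ejb.MessageDrivenBean;
    import javax.ejb.MessageDrivenContext;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.TextMessage;

    public class WorkOrderMessageBean implements MessageDrivenBean, MessageListener {

        private MessageDrivenContext context;

        public void setMessageDrivenContext(MessageDrivenContext context) {
            this.context = context;
        }

        public void ejbCreate() { }

        public void ejbRemove() { }

        // The container pulls each message off the queue and delivers it
        // here, inside a container-managed transaction.
        public void onMessage(Message message) {
            try {
                if (message instanceof TextMessage) {
                    String body = ((TextMessage) message).getText();
                    // process the work order request (omitted)
                }
            } catch (Exception e) {
                // roll back so the container can redeliver the message
                context.setRollbackOnly();
            }
        }
    }

The deployment descriptor declares the bean and its destination; delivery, pooling, and
transactions are managed by the container rather than by application code.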
Given that the J2EE approach to enterprise software is very much about competition and
choices, we might well ask whether the most significant problems encountered by the
WebSphere team could have been helped or eliminated through different choices.
RRD vs. WSAD? RRD, chosen initially for its development productivity, did not deliver that
productivity in this study. Given that the specification included J2EE technologies beyond the
core, such as a handheld application and a Web service, and given that the WebSphere team
consisted of skilled J2EE developers comfortable with those technologies' APIs, WSAD was a
much better choice and, the team feels, would have compared favorably with VS.NET
strictly in terms of code production.
In terms of producing a high-performance implementation, WSAD was clearly the better choice
over RRD.
Linux? Given the choice of Edge Server for load balancing, the WebSphere team had to patch
and upgrade the Linux kernel to make it work. This process requires skills common to Linux
experts but not necessarily to the average J2EE developer or IT team. There is no question
that Linux added a layer of complexity to the configuration process.
Configuring session sharing and failover? Two of the most vexing problems for the
WebSphere team were their inability to get session replication working reliably and to configure
the system for robust failover. During the manageability and reliability tests they found the
system crashing for reasons they could not explain in sufficient depth to correct the problems.
But knowing that robust, successful WebSphere installations exist in the world, the team feels
certain they could have solved these problems, given enough time.
WebSphere? And finally, of course, one has a choice of J2EE platforms. This study examined
the use of one particular platform in a carefully constructed experiment. These results do not
speak to the qualities of others.
The ITS system specification was the basis for the two teams' development.
The Independent Auditor's report (the CN2 report of the study results) provided most of the
result data cited in this report.
WebSphere Edge Server for Multiplatforms, Network Dispatcher Administration Guide, Version
2.0
[Pricing table for the WebSphere/Linux configuration; columns: Item, Product, Price/unit,
Units, Extended Price]
Total: $253,996
The only version of Red Hat Linux supported on 4-CPU servers is Red Hat Enterprise Linux AS.
Support for this product is available from Red Hat in a Standard or Premium subscription, on a
per-system, per-year basis. The Standard subscription includes 9am-9pm telephone support
(US Eastern time), with a 4-hour response time. The Premium subscription offers 24/7 telephone
support and a 1-hour response time. This pricing configuration uses the Standard subscription.
WebSphere Application Server is available in multiple versions and editions. This configuration
required load balancing and failover, which are available in WebSphere Application Server ND.
WAS ND is licensed on a per-CPU basis, and the initial license includes 1 year of product
telephone support and maintenance. This pricing configuration uses prices from IBM's
Passport Advantage Express discount purchasing program, IBM's transaction-based licensing
program.
[Pricing table for the .NET/Windows configuration]
Total: $19,294.46
The version of Windows Server required for a 4-CPU server is the Enterprise Edition. In addition
to the base Windows Server license, customers enabling authenticated external connections to
Windows Server need to purchase the External Connector license. Typically the External
Connector is required for e-commerce applications. No External Connector is required in
this case, since the application did not authenticate incoming requests against Active Directory.
Also, no Client Access Licenses (CALs) are required in this case. The Windows
Server license was priced with Software Assurance (SA), through the Open Value Licensing
program, the transaction-based licensing program available through qualified Microsoft
resellers. The license and software assurance plan priced here provides maintenance and
updates, 24x7 Web support, and telephone support during business hours for these
products, for a period of 2 years.
Visual Studio:
http://msdn.microsoft.com/vstudio/howtobuy/pricing.aspx