Hardware Sizing For Software Application
Field Project
By
Ganesh Swaminathan
_____________________________
Tom Bowlin
Committee Chairperson
_____________________________
Herb Tuttle
Committee Member
_____________________________
Ray Dick
Committee Member
Date accepted:_____________________________
Acknowledgements
My sincere thanks to Dr. Tom Bowlin, Engineering Management (EMGT) Program advisor and
the chairperson of the advisory committee, for his guidance throughout the project.
I also want to thank Prof. Herb Tuttle and Ray Dick for their valuable time and feedback in
reviewing the project work and helping me with the organization of the material. The EMGT
study has been the best investment of my life. I have enriched my knowledge in so many
different areas. It has been a wonderful experience working with all the faculty members. I would
like to extend my gratitude to all of the EMGT faculty members for their efforts in making the
EMGT Program enjoyable and successful.
Executive Summary
Hardware sizing is the process of estimating the hardware resources a planned software
application will need. The most common method is to enter all the workload-related parameters
into a modeling tool that is built using the results of workload simulation on different hardware.
The hardware and software requirements are determined by the mathematical model underlying
the tool. Without performing a test on the actual hardware environment to be used, no sizing can
be 100% accurate. However, in real life there is a need to predict capacity when budgeting
hardware, assessing technical risk, validating technical architecture, sizing packaged
applications, predicting production system capacity requirements, and calculating the cost of the
project.
These scenarios call for a quick way to estimate the hardware requirements. When dealing with
prospects, there is a need to come up with credible and accurate sizing estimates without
spending a lot of time.
One of the challenges faced by Kronos is the amount of effort and time spent on hardware sizing
for prospective customers. Typically, a survey process collects the workload-related parameters
and feeds the sizing tool, which uses a performance model based on benchmark test results to
produce the hardware recommendations. Although this process works well for customers, it is a
time-consuming activity due to the collection and validation of the large number of independent
variables involved in the current sizing model.
This project explores alternative methods for producing quick sizing estimates. By combining
empirical data collected from various production systems with a simple statistical technique, a
relationship between sizing factors and CPU rating can be established. This relationship can be
used to create a simple model that produces quick, easy, and credible recommendations when
sizing new customers.
Table of Contents
Acknowledgements ....................................................................................................................... 2
List of Figures................................................................................................................................ 5
Chapter 1 -- Introduction.......................................................................................................... 7
What is Hardware Sizing?..................................................................................................... 7
Challenges of Sizing for a New Customer ............................................................................. 8
Problem Definition and Scope ............................................................................................... 9
Current Sizing Methodology ............................................................................................... 10
Flow of the current sizing process ....................................................................................... 14
Chapter 4 - Results...................................................................................................................... 26
References.................................................................................................................................... 38
List of Figures
List of Tables
Chapter 1 -- Introduction
Kronos provides an integrated suite of human resources, payroll, scheduling and time and
labor solutions to companies of all sizes, across a variety of industries. Workforce
Central helps organizations hire and manage people effectively by maximizing
productivity while minimizing the impact on IT. The integrated suite includes almost 15
different application modules that can be activated based on licensing. The
company has more than 40,000 customers world-wide. The diversity of customer
environment, business applications implemented, and the hardware alternatives available
to meet the demand, results in a significant variation in hardware requirements among the
customers. One of the key factors to successful implementation is configuring adequate
hardware capacity so that the system performs well. Sales, Service, Support and
Engineering are the four major divisions in the company. There are well defined
processes that guide a new project from sales to service and support.
One of the challenges identified was the significant amount of unfunded work performed
by the professional services organization in completing the hardware sizing for new
customers. The focus of this project is to examine the sizing methodology and the
elements considered when developing a hardware recommendation for a new customer
and suggest ways for improvement. The ultimate goal is to simplify the process, thereby
eliminating or significantly reducing the amount of unfunded work.
Figure 1: Sizing
The processing power is broken down for each hosted component, such as the Web Server,
Application Server, and the Database Server. Sizing attempts to answer questions such as the following:
• Will the system have good responsiveness, particularly during peak periods?
• Will there be enough capacity for expected growth and variations in use?
• What hardware alternatives are available to meet the expected demand?
• Will the hardware investment be utilized effectively?
• Will the application scale adequately for the business?
Almost all software vendors face these challenges when it comes to estimating hardware
for a new implementation. The biggest reason is the limited information that is available
during the early stage of the sales or the planning cycle. Most vendors including Kronos
have developed a sizing questionnaire. During the sales process, a questionnaire or
survey is completed by the prospective customer, which is then used to estimate the
hardware resources required for the project. The questionnaire aims at collecting
information on typical characteristics of the product usage by the customer. The
estimated hardware resource is used by the customer to arrive at the total cost for
implementing the product.
The problem with the survey is that many of the answers are either unknown or are very
rough estimates. Often these questions are not even answered by the right team. When you
start a sizing study, there are never enough solid facts, and the facts that are known are vague. This
presents a moving target as people change their mind or constantly review and modify the
facts. For example, questions like how many reports will be running during the busy
times or how many concurrent users will be on the system are important criteria that
affect the hardware sizing but these questions may not make much sense for a new
customer. Hence, there is a risk that information collected during the survey can be too
conservative or aggressive.
Inaccurate information, when fed into the sizing model, produces an inaccurate
estimate of hardware requirements. This poses a challenge later in the
implementation cycle when the estimates are modified. One way many vendors overcome
this challenge is by setting the right expectation about the sizing process upfront. Sizing is
generally positioned as an iterative process that will be refined and repeated several times
during the course of the implementation as more information becomes available. Though this
process works for many vendors, it does not flow smoothly. The cost of the project is
calculated upfront during the sales cycle, and the ROI (Return on Investment)
calculation includes the cost of the right hardware for the product. Any significant change
in the hardware estimates can make the project exceed the allocated budget. So the initial
sizing process has to make reasonable assumptions about the use of the product and
produce fairly accurate sizing requirements.
At Kronos, the sales team works closely with the professional service team to produce the
sizing recommendations. The information collected during the survey is analyzed by
technology professionals. There are several back and forth conversations before the
information is analyzed by the sizing model. The final hardware recommendations
document is provided to the sales team. In other words, there is a significant service
involvement and these hours are usually unfunded.
Following are the challenges in the current sizing methodology for new customers –
• The criteria collected during the initial survey from a new or prospective
customer are not very accurate
A simplified and yet accurate sizing process that can be executed by the sales team with
very little help from the service organization is needed to minimize the unfunded hours.
The primary goal of this project is to research the possibility of a simplified approach for
hardware sizing by studying the current sizing model and application usage
characteristics. The expected outcome is a model that is simple to understand and simple
to use, thereby reducing the effort in collecting numerous variables to determine the
processing requirements.
The scope of the project includes only the Time Management, Leave, and Attendance
modules within the suite of applications. The sizing process will focus on estimating one
type of resource – CPU power – needed for optimal performance.
This project requires an understanding of the science behind hardware sizing, current
sizing methodology, architecture and the usage characteristics of the application.
Although the problem is specific to Kronos, the methodology and the solution discussed
in this project can be applied to any hardware sizing scenario.
Application Architecture
Workforce Central is a 3-tier J2EE application. Users connect through a browser to a middle
tier, which is a web/application server. The application deployed on the middle tier connects
to the database server to retrieve the data. Except for data validation, most processing is done
on the middle tier.
At the heart of the architecture is the totaling engine, a multithreaded servlet deployed within
the J2EE framework. Multiple applets are deployed within the application to allow editing
and retrieval of data from a back-end database. The application can be deployed on multiple
application servers for high availability and load balancing.
Depending on the user load, the available hardware, and the failover requirements, a given
environment can have several application servers and one database server.
The first step in estimating the hardware required for a Workforce Central
implementation is to understand how a customer will use it. The goal of this exercise is to
quantify use cases in a way that can be tested and then modeled.
Use cases, stated simply, describe sequences of events that, taken together,
lead to a system doing something useful. They are a simple and powerful way to express
the functional requirements, or behaviors, of a system.
For new products and features, the Kronos product management group establishes use
patterns and response time requirements for all the product features. Product management
uses customer interviews, review boards, expert industry consultants, and data gathered
during on-site visits.
For existing products, customer data is collected and analyzed by product management
and performance engineering groups. Use-cases are then modified as more experience is
gained with those products across industries. Because the manner of use is generally the
primary driver in hardware sizing, understanding customer use-cases is a continuing
activity that results in improving accuracy of hardware estimation throughout a product’s
life cycle.
When establishing use-cases, it is necessary to determine the timing of actions each user
takes and then determine the concurrency across all predicted users over a given time.
Below is an example of Pay Period Use-Case - a typical activity that occurs during the
end of Pay Period.
Finally, each manager runs Exceptions and Time Sheet reports against the
previous pay period, with a 60-second wait between the completion of a report
and the submission of the next report (or logoff).
Once the use cases to be measured are known, their impact on system performance can be
identified. In some cases, the use-cases are non-overlapping (that is, they do not occur
during the same time in the course of a normal pay period), and overall scalability is
determined simply by evaluating the single case that places the greatest load on the
system.
In other cases, the overlap of two or more functions requires a test methodology that
takes concurrent load from those functions that overlap and models them to reflect their
impact on the system.
The next step in the sizing methodology is to define the performance characteristics of
the application. The application-specific measures of performance used are:
1. Response Time - the time duration between a request and a response; for
example, the time required to display a new screen after a “back” button is
pressed.
2. Multi-user tests - multi-user tests are used to measure the computing resources
used when increasing load is applied. The primary computing resources
measured are CPU utilization, memory consumption, hard drive input/output
rates, and network utilization. Response time for various activities is captured.
Multi-user tests also provide a measure of system throughput capabilities.
Both single-user and multi-user tests are generally automated so that results (metrics
collected) are repeatable. Many repetitions of tests on both similar and varied hardware
are run to collect data for the sizing models. The results of the benchmark tests are used
to develop a performance model.
The final step is to build an algorithm or model that computes the quantity of system
resources consumed when the product is used and configured in a particular way. The
way a product is used and configured is modeled by defining a set of independent
variables. Independent variables are quantifiable inputs or configuration parameters for a
system.
Analysis and independent variable testing, also called sensitivity testing, is performed to
determine the effect of various independent variables on the consumption of system
resources. Examples of system resources are CPU computing power, RAM memory, hard
drive memory, and network bandwidth. The consumption of these resources is known
from the results of the product performance and benchmark tests. A hardware sizing tool
is developed from the performance model and hardware choices are incorporated into the
tool. The tool is built to produce the hardware recommendations with a given set of
workload parameters based on the performance model. Appendix A lists most common
independent variables used by the sizing tool to produce hardware recommendation.
Flow of the current sizing process
Use-cases collected from the customers are based on a survey. The survey includes a
variety of questions about the customer’s environment, hardware purchase standards,
support requirements and the proposed product usage. The service organization
generates a “Site Survey” form that is used to collect data about prospective
implementations. This survey provides the assurance that a consistent methodology is
used to collect data necessary for a good sizing estimate. Usually, a Kronos technology
professional fills out a site survey during a series of interviews with a customer. From a
completed site survey, the consultant uses the hardware sizing tool to determine a
hardware sizing estimate and presents the results in writing to the customer. This process
can be completed in the range of a few days for a simple implementation to a number of
weeks for larger installations with complex requirements.
The sizing tool uses performance models established internally based on a variety of testing
with different use-cases; the model is constantly validated with customer data collected from
live sites as well as test results from benchmarks run at independent laboratories on a
variety of hardware platforms.
Chapter 2 – Literature Review
One of the problems is the confusion between the terms ‘Sizing' and ‘Capacity Planning’
(Larry Pedigo, 2004). Sizing is best described as the process of estimating the hardware
requirements of a planned application based on technical descriptions of the customer’s
needs. Capacity Planning is the process of measuring resource requirements from existing
applications, and projecting the amount and kind of hardware required to support larger
workloads.
According to Larry Pedigo (2004), most of the available literature on ‘Sizing’ assumes a
thorough knowledge of application. Most white papers and articles on ‘Sizing’ are
actually focused on capacity planning. In a brand new project, before the first server is
purchased, the customer has no idea what resources will be required by the new application.
Capacity planning guidelines will be of little use to the customer. The focus is more on
hardware sizing based on ‘rules of thumb’ and guidelines based on the customer’s
technical needs. According to the authors of the e-book on ‘IBM @Server pSeries Sizing
and Capacity Planning (2004)’, capacity planning is strongly related to sizing. Capacity
Planning is part of the resizing task that happens when the actual performance data from
the production system is used as input for sizing.
The article "The Ratio Modeling Technique (1997)" by David Cook, Ellen Dudar and
Shallahamer Craig introduces a new calculation based method for performing capacity
predictions. Author defines ratio modeling technique as set of steps to perform a quick
sizing prediction. This technique shows an alternate method to predict the production
system load based on relationship between process categories (e.g. Batch processes) and
system resource (e.g. CPU). This paper shows how a simple technique can be used for
real situations that demand predicting capacity requirements at a low precision level.
Author also quotes that this technique has been used and validated by many of Oracle’s
largest customer sites.
Shallahamer (2002) describes three different methods – Estimation, Summation, and Process – to produce capacity
predictions at varying confidence levels. For the Summation method, the author suggests
regression analysis on metrics gathered from production data to produce a fairly high
confidence model.
Available literature on sizing often falls short on tools needed to perform the sizing
estimates. Regression analysis is a simple method for investigating relationships among
variables. The book “Regression Analysis by Example (2006)” by Samprit Chatterjee and
Ali S. Hadi provides the essentials of regression analysis through practical applications. While
this book provides only a review of the basic principles of regression, it covers many
other topics, mainly problems occurring during a regression analysis. Diagnostic tools as
well as methods to overcome the problems are discussed. Throughout the book there are
plenty of examples to demonstrate the ideas presented. Examples are derived from a wide
range of disciplines and present real problems. In summary, this book contains a
thorough overview of diagnostic tools for regression models that are easily understood.
The White paper "Sun Server Scalability and Sizing Guide (2002)" from Sun
Microsystems, Inc. provides a simple formula or method of sizing Sun Enterprise server
based on benchmark testing results. This paper shows how benchmark results can be used
for predicting CPU, Memory and disk capacity configuration requirements.
The White paper "Windows Sever 2003 Terminal Server Capacity and Scaling (2003)"
from Microsoft contains analysis, results, sizing guidelines, and testing methodologies for
the Terminal Services component in the Windows 2003 Server family. This methodology
specifies profiles for Light, Medium and Heavy users, allowing the vendor to establish
performance characteristics for these three classes of users. A model is created based on
the user types to arrive at the server sizing requirements.
Larry Pedigo in his white paper “Sizing Oracle on Microsoft Windows and Dell
PowerEdge Servers (2004)” defines a systematic approach to sizing. The key steps
outlined in this paper are:
“How to Effectively Size Hardware for your Portal implementation (December 2004)” by
Jason Pepper, Jack Sun, and Biswajit Nayak is an Oracle White Paper article that
explains the overall capacity planning methodologies available for Oracle Portal
application. Though the calculations and the metrics obtained are more specific to the
portal application, the overall sizing and the estimation methodology used can be applied
to any application.
There are three primary approaches for sizing hardware for a software implementation.
• Algorithm based
• Example based
• Pilot based
Regardless of the approach, a model is needed for predicting the capacity. Modeling is an
integral part of planning capacity (Shallahamer Craig, 2002). Models can be broadly
classified into either simulation models or mathematical models. Mathematical models
require input from either a production system or a simulated production system. If
mathematical models are being used, unless the observation is from a production system,
some form of workload simulation must occur to feed the mathematical models.
Constructing simulation and mathematical modeling tools can be very difficult and time
consuming. Determining which model or models to use is dependent upon the method
chosen and the data available.
Algorithm Based
In this approach, an algorithm or process accepts input from the customer (e.g. user
counts, number of reports run, total page requests, number of transactions, etc.) and
attempts to deliver a processing requirement. This is probably the most commonly
accepted tool for delivering sizing estimations. Unfortunately, this approach is generally
the most inaccurate. When considering a logical n-tier enterprise-class implementation,
a realistic sizing calculation involves so many variables that the calculation
becomes complex and sensitive. A small variation
in the input variables can produce inaccurate results. The majority of vendors use some
form of algorithm-based approach to recommend hardware for a new implementation.
Benchmarking the application and modeling the data from the testing is probably the
most common approach in arriving at an algorithm. A benchmark is defined as a set of
programs that are run on different systems to give a measure of their performance. The
term performance becomes very crucial and needs to be defined before the testing. In
order to understand what needs to be measured during the testing, a good understanding
of application architecture and the usage characteristics is absolutely essential.
Performance metrics are defined and tools are developed to trace and record the metrics
during the testing. Appendix B lists some of the popular metrics used in various
application benchmarks.
There are two kinds of benchmark testing – standard and application specific. A standard
benchmark usually includes a fixed set of programs that are run on different systems to
produce a single figure called rating, which is then used to rank system performance. This
type of testing has little relevance when the characteristics of the standard programs are
different from the real application. Even in cases where the application characteristics
match the benchmark programs, many take a cynical view of the results.
Several standard institutes are in charge of the process of defining and distributing
standard benchmarks. Some of the well-known standards bodies are SPEC (Standard
Performance Evaluation Corporation) and TPC (Transaction Processing Performance
Council). SPEC defines a wide variety of standard benchmarks, ranging from high
performance computing to network file servers. Among those, the SPEC CPU benchmark
suite is probably the most widely used benchmark in computer literature. SPEC is a non-
profit corporation formed to establish, maintain and endorse a standardized set of relevant
benchmarks that can be applied to the newest generation of high-performance computers.
SPEC develops benchmark suites and also reviews and publishes submitted results from
their member organizations and other benchmark licensees.
SPEC publishes results for most widely used CPUs in the industry. Below is an example
of SPECInt 2006 results.
A rating like this helps to compare different systems. However, it should be noted that the
rating was developed using a standard set of programs. There are different suites of
programs used for testing. SPECInt is a rating based on the integer suite; there is
another called SPECfp, which is based on the floating-point suite. SPECMail is a benchmark for
mail server programs. Depending on the characteristics of the real application, a
particular rating method can become more relevant than other ratings.
The other type of benchmarking is more application specific and is used by most software
vendors to produce some form of hardware sizing recommendation. There are several
different approaches, methods and tools available to conduct this type of testing. This
also suggests that many believe that systems should be measured in the context of
applications in which end-users are interested. The basic idea is to separate
characterization of an application from that of the underlying platform and combine the
two characterizations to form a prediction of the application's performance. As the
application of interest is incorporated into the "benchmarking" process, the resulting
performance metrics reflect the expected behavior of the application on the given
platform. Whatever the approach or tool used, vendors all embrace the
following steps.
• Define Workload Characteristics
• Define Performance Targets
• Conduct Simulation
• Define & Collect Metrics
• Develop Performance Model
• Develop Sizing Tool
Specific performance targets are established during the benchmarking process. For
example, Order Entry response time must be under three seconds, or the maximum CPU
utilization allowed is 80%. Measuring against specific performance goals also helps to verify
whether the application meets the designed performance expectation. If the application
does not meet the designed goal, corrective action is usually taken. A series of tuning or
optimization steps are carried out, with a focus on identifying and eliminating bottlenecks
to improve performance. Sizing or performance factors emerge during this iterative
testing process. For example, if around 20,000 images of 20 KB each can be extracted
in three seconds, and the extraction time increases when the average image size grows to
40 KB, then the average image size becomes one of the sizing factors. Several
performance terms are used. Appendix B lists some of the most common terms.
The principle of benchmarking is to simulate the behavior of real users with "virtual"
users. A simulation tool is used to record the behavior of the application under load
and give information on the virtual users' experience on the given hardware.
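To make the idea concrete, below is a minimal sketch of such a simulation in Python, assuming a hypothetical logon URL and a simple fixed think time; commercial load-testing tools do the same thing at a much larger scale and with scripted, multi-step user sessions.

# A minimal sketch of simulating "virtual" users against a hypothetical endpoint.
# Each thread plays one user and records the response time of every request.
import threading
import time
import urllib.request
import statistics

TARGET_URL = "http://appserver.example.com/wfc/logon"  # hypothetical URL, not the real application
VIRTUAL_USERS = 50
REQUESTS_PER_USER = 5
THINK_TIME_SECONDS = 2          # pause between requests, like a real user

response_times = []
lock = threading.Lock()

def virtual_user():
    for _ in range(REQUESTS_PER_USER):
        start = time.time()
        try:
            urllib.request.urlopen(TARGET_URL, timeout=30).read()
        except Exception:
            pass                # a real test would count failures separately
        elapsed = time.time() - start
        with lock:
            response_times.append(elapsed)
        time.sleep(THINK_TIME_SECONDS)

threads = [threading.Thread(target=virtual_user) for _ in range(VIRTUAL_USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

response_times.sort()
p95 = response_times[int(0.95 * len(response_times)) - 1]
print("Requests issued:", len(response_times))
print("Mean response time: %.3f s" % statistics.mean(response_times))
print("95th percentile response time: %.3f s" % p95)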
The following table (Table 1) shows sample results from a multi-user test exercising a
use-case. These tests were run to determine scalability characteristics. The 50 concurrent
users test was run with a single web application server; the 150 concurrent users test was
run with three web application servers. As can be seen from the results below, the
application demonstrates excellent scalability. Response times were measured at the
client. All response times are shown in seconds.
                                                  50 Users            150 Users
                                              (1 web/app server)  (3 web/app servers)
Total Users Processed per Hour                       157                 466
Peak Concurrent Users                                 50                 150

95th Percentile Response Times by User Action (seconds)
User Session Time                                  1,148               1,159
Logon Action                                       1.141               1.828
Next Applet                                         0.27                0.36
Save Applet                                        0.761               0.969
Report for 5 employees                              0.03               0.032

System Performance Data
Database server CPU utilization                   14.03%              47.01%
Database server CPU seconds/user                    6.44                7.26
Database server network bytes/second             356,891             963,156
Web server CPU utilization (per Web server)       45.77%              46.49%
Web server CPU seconds/user                        21.02               21.55
Web server network bytes/second                  288,619             282,174

Table 1: Sample Benchmark Results
Sensitivity testing is performed to determine the effect of the factors on the consumption
of system resources. Examples of system resources are CPU computing power, RAM
memory, hard drive memory, and network bandwidth. Examples of the sizing factors are
number of concurrent users, number of concurrent page requests, number of documents
stored in the database, product features used etc. A performance model is built based on
the correlation between the sizing factors and the resource consumption.
Testing is performed on a finite number of hardware configurations with varied specifications
(like two, four, or eight CPUs) from different vendors or operating environments. Generalization rules
are applied for all other hardware and specifications. This is typically done by the sizing
tool. Here is an example of how the sizing tool might use the performance results to
arrive at a hardware requirement.
1000 users of a sales and distribution department and 1000 users of a finance
department plan to use the ERP system on 400 MHz systems. For performance
and security reasons (for instance, head room for batch jobs), a maximum CPU
utilization of 67% is requested by the customer.
First the tool maps the "Finance" users to "Sales and Distribution" users, so we
get 500 "Sales and Distribution" users instead of 1000 "Finance" users.
The scalability tests have shown that a 300 MHz CPU can handle the load of 40
"Sales and Distribution" users. Therefore, a 400 MHz CPU handles 50 users, and
1500 users need 30 CPUs. Using the customer requirement for 67% CPU
utilization, the tool configures a system containing 45 CPUs.
The software configuration consists of the database and the ERP system. Both
components could also run on separated machines. To configure them, the tool
uses the ratio of the CPU consumption of both the database and the ERP system
(33% to 66%) determined through the Scalability Tests, and adds one CPU on
each machine to handle the network traffic. As a result, we configure 16 CPUs for
the database and 31 CPUs for the ERP system.
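The quoted example can be reproduced with a short calculation. The sketch below uses only the made-up reference numbers from the example (40 users per 300 MHz CPU, a 2:1 mapping of Finance to Sales and Distribution users, and a 33%/66% database/ERP split); it illustrates the arithmetic, not the actual sizing tool.

# A minimal sketch of the worked example above, using the example's own reference numbers.
import math

REF_CPU_MHZ = 300
USERS_PER_REF_CPU = 40           # from the scalability tests in the example
FINANCE_TO_SD_RATIO = 0.5        # 1000 Finance users ~ 500 Sales and Distribution users
DB_SHARE, ERP_SHARE = 0.33, 0.66 # CPU consumption ratio from the scalability tests

def size_system(sd_users, finance_users, cpu_mhz, max_utilization):
    # Normalize all users to "Sales and Distribution" equivalents.
    effective_users = sd_users + finance_users * FINANCE_TO_SD_RATIO
    # Scale per-CPU capacity by clock speed; the example rounds 53 down to 50 users per 400 MHz CPU.
    users_per_cpu = math.floor(USERS_PER_REF_CPU * cpu_mhz / REF_CPU_MHZ / 10) * 10
    cpus_at_full_load = math.ceil(effective_users / users_per_cpu)
    # Apply the customer's utilization ceiling (head room for batch jobs).
    total_cpus = math.ceil(cpus_at_full_load / max_utilization)
    # Split between database and ERP tiers, plus one CPU per machine for network traffic.
    db_cpus = math.ceil(total_cpus * DB_SHARE) + 1
    erp_cpus = math.ceil(total_cpus * ERP_SHARE) + 1
    return total_cpus, db_cpus, erp_cpus

print(size_system(sd_users=1000, finance_users=1000, cpu_mhz=400, max_utilization=0.67))
# -> (45, 16, 31), matching the example above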
Example Based
An example-based approach requires a set of known samples to use as data points along
the spectrum of system sizes. The more examples available, the more accurate the
sizing. By using samples collected from the real world and from internal deployments,
customers can be assured that the configurations proposed have been implemented before
and will provide the performance required for the proposed implementation. The challenge
lies in maintaining an accurate database of real-world hardware configurations and
determining which configurations produce optimal performance.
Pilot Based
A proof of concept or pilot based approach offers the most accurate sizing data of all
three approaches. Performance metrics are defined and collected from the customer
environment during the pilot implementation. The number of users and their activities are
simulated on the hardware provided by the customer. Based on the results, the exact
processing requirements are derived. By far this is the most expensive and most
time-consuming method. Moreover, this approach requires the customer to have the manpower,
hardware, and time available to implement and validate the solution. Findings are
analyzed based on the test results. Because the environment is simulated as closely as
possible to the real solution, the results are highly accurate.
Chapter 3 - Procedure and Methodology
The objective is to minimize the labor effort involved in sizing for a prospective
customer while at the same time producing a reliable sizing recommendation using the known
information. The majority of the effort expended during the current sizing process is directed
towards collecting, validating and making reasonable assumptions about the independent
variables that affect the required processing power. While the current process is labor
intensive, it is a proven and well established methodology with very high credibility.
The approach taken in this project is calculation-based. However, instead of using a long
list of independent variables and complex calculation, the overall philosophy is to present
a simple to use and simple to understand algorithm based on empirical study. When all
the independent variables cannot be accurately determined, it is not possible to perform
high confidence capacity predictions. A quick and easy way to calculate hardware costs is
needed when dealing with prospects for project budgeting, assessing technical risks and
validating alternative solutions.
By analyzing existing customers’ data on the independent variables and the hardware
used, reasonable assumptions about some of the lesser known variables can be
established. Also, a simpler model can be developed by analyzing the correlation between
the independent variables and the processing power used. Since the model is developed
using the data from the real world environments, a high level of credibility can be
established in the results.
Independent Variables
The first step in this process is to determine the factors or the independent variables that
can affect the processing power (CPU) of the hardware. The factors identified in this step
will dictate which data to collect from the customers. This is the most crucial step because
without useful data, it is not possible to come up with a predictive model. Knowledge
of the hardware sizing process, application architecture and experience with the current
performance model are the essential ingredients for this step.
One of the problems with selecting independent variables for forecasting is the problem
of collinearity – that is, severely correlated independent variables. If independent
variables themselves are correlated then it becomes difficult to understand variation in
processing power in relation to independent variables. Usually, if the variables are
strongly correlated, whichever independent variable happens to be entered first typically
accounts for most of the explainable variation in the dependent variable (Hildebrand and
Ott, 1998). Also, the standard error of the estimate can be very high with correlated
variables. The objective is not to separate the predictive effects of every single variable
but rather to arrive at a prediction equation. The model should have high predictability in
explaining the variation with low standard error so it can be used with confidence.
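As an illustration of how collinearity among candidate variables can be checked before fitting, the sketch below computes a correlation matrix and variance inflation factors; the variable names and values are illustrative only and are not the data collected in this study.

# A minimal sketch of checking candidate independent variables for collinearity.
import numpy as np

# Each row is one (made-up) production site; columns are candidate independent variables.
variables = ["concurrent_users", "managers", "employees", "reports_per_user"]
X = np.array([
    [196, 120, 1800, 2.0],
    [ 24,  15,  100, 1.0],
    [310, 180, 2600, 4.5],
    [ 90,  60,  900, 1.5],
    [150, 100, 1500, 3.0],
    [ 60,  40,  650, 2.5],
    [250, 150, 2200, 1.0],
])

# Pairwise correlation matrix: values close to +1 or -1 flag correlated predictors.
corr = np.corrcoef(X, rowvar=False)
print("Correlation matrix:")
for name, row in zip(variables, corr):
    print("%-18s" % name, " ".join("%6.2f" % v for v in row))

# Variance inflation factor (VIF): regress each variable on the others;
# a VIF above roughly 5-10 is a common rule of thumb for troublesome collinearity.
def vif(X, j):
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2) if r2 < 1 else float("inf")

for j, name in enumerate(variables):
    print("VIF(%s) = %.1f" % (name, vif(X, j)))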
The independent variables below can be determined accurately during the sales process.
Customers usually know the products and features they are interested in and how many
employees and supervisors will use the product.
Experience with the application has shown that the number of concurrent users, the number of
reports run, and the duration of activity are some of the major factors affecting the
utilization of the system. Besides the known variables, it is absolutely critical to collect
data on the system usage to validate the workload characteristics and make reasonable
assumptions about the unknown variables. Usage of the system with respect to the above
variables and their correlation to resources used will be analyzed to understand the sizing
needs.
Data Collection
This step involves collecting the below from a sample of customer sites –
Data gathered is never perfect. If the data is used without close examination, model
predictions would be seriously flawed. Moreover, we cannot assume that the
hardware used by a customer produces optimal performance. In order to avoid misleading
predictions, the simple rules below are followed when collecting and scrubbing the data.
4. Logs should be analyzed to find out how the application is used. Making an
assumption that all licensed users are using the application can result in a flawed
model. Actual usage should reflect the typical workload characteristics. Using
data from an environment that is used in a unique way can also produce skewed
results.
Chapter 4 - Results
Logs from the production environments were analyzed. Application web logs contain the
URL (web site address) and the time of each activity. Based on the logon and logoff
activity, the number of concurrent users and the average session duration are calculated.
The system has a default inactivity timeout of 30 minutes, so any session missing a logoff
URL is counted as a 30-minute session. Going through the web logs to collect all this information is a
very tedious process, so a script is used to search through the logs for specific URLs and count
the occurrences. Analyzing the web logs from one of the production systems for the
entire duration of the pay cycle provides the following information on the workload
characteristics.
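The actual script is not reproduced here; the sketch below illustrates the same idea, assuming a hypothetical tab-separated log format and hypothetical logon/logoff URL fragments. It derives session durations from logon/logoff pairs, applies the 30-minute timeout to sessions with no logoff, and scans the observation window for the peak number of overlapping sessions.

# A minimal sketch of the log analysis, with an assumed log format of "timestamp<TAB>user<TAB>url".
import csv
from datetime import datetime, timedelta

LOGON_URL = "/wfc/logon"        # assumed URL fragments; the real application URLs differ
LOGOFF_URL = "/wfc/logoff"
TIMEOUT = timedelta(minutes=30) # default inactivity timeout for sessions missing a logoff

def parse_sessions(log_path):
    """Return a list of (logon_time, logoff_time) pairs, one per user session."""
    open_sessions = {}          # user -> logon time
    sessions = []
    with open(log_path) as f:
        for timestamp, user, url in csv.reader(f, delimiter="\t"):
            ts = datetime.fromisoformat(timestamp)
            if LOGON_URL in url:
                open_sessions[user] = ts
            elif LOGOFF_URL in url and user in open_sessions:
                sessions.append((open_sessions.pop(user), ts))
    # Sessions with no logoff are taken as 30-minute sessions.
    sessions.extend((start, start + TIMEOUT) for start in open_sessions.values())
    return sessions

def peak_concurrency(sessions, step=timedelta(minutes=1)):
    """Walk the observation window and return the maximum number of overlapping sessions."""
    if not sessions:
        return 0
    t, end = min(s for s, _ in sessions), max(e for _, e in sessions)
    peak = 0
    while t <= end:
        peak = max(peak, sum(1 for s, e in sessions if s <= t < e))
        t += step
    return peak

if __name__ == "__main__":
    sessions = parse_sessions("access_log.tsv")   # hypothetical file name
    durations = [(e - s).total_seconds() / 60 for s, e in sessions]
    print("Sessions:", len(sessions))
    print("Average session time (minutes): %.1f" % (sum(durations) / len(durations)))
    print("Peak concurrent users:", peak_concurrency(sessions))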
The system (application server) is most heavily utilized on Mondays, and the peak
utilization window is around four hours. In the graph below (Figure 3), there is very high
activity on 5/19, 5/26 and 6/2 for a period of four hours. This data helps to validate the
assumption about the typical workload characteristic of the application: the heaviest
activity happens on the payroll processing day. The graph confirms this typical workload.
The graph (Figure 4) below shows the maximum concurrent users on the system
(application server) during the busiest period. The environment is licensed for 1800 users
and about 196 concurrent users are on the server during the peak window.
Figure 4: Concurrent Users Analysis
Given a 4 hour window and 1800 users, if we assume a uniform arrival rate, a user would
arrive every eight seconds (4*60*60/1800). In any 21 minutes (average session time per
user), there will be 158 (21*60/8) users on the application. We assume 100% overlap
with the users who arrived in the previous 21 minutes. Therefore, the maximum
concurrent users on the application are at the most double the number of the users who
can arrive in a given 21 minute interval. In this environment, it is safe to expect up to 316
concurrent users. So if we know the total number of users and the average session time
per user, using this calculation we can predict the approximate maximum number of
concurrent sessions.
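The calculation above can be captured in a few lines. The sketch below assumes uniform arrivals and the 100% overlap rule described above; it is an approximation, not a queueing model.

# A minimal sketch of the concurrent-user estimate: uniform arrivals over the peak window,
# plus up to 100% overlap with the users from the previous session interval.
def max_concurrent_users(total_users, peak_window_hours, avg_session_minutes, overlap=1.0):
    seconds_between_arrivals = peak_window_hours * 3600 / total_users
    arrivals_per_session = avg_session_minutes * 60 / seconds_between_arrivals
    return round(arrivals_per_session * (1 + overlap))

# The environment above: 1800 licensed users, 4-hour peak window, 21-minute sessions.
print(max_concurrent_users(1800, 4, 21))   # -> 315 (the text rounds 157.5 up to 158 first, giving 316)
# The second, smaller environment discussed below: 100 users, 6-hour window, 40-minute sessions.
print(max_concurrent_users(100, 6, 40))    # -> 22 (the text rounds up to 12 arrivals per interval, giving 24)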
We can also calculate the number of concurrent users if we assume that the user arrival rate
follows a normal distribution. Given a four-hour busy window, the total number of users and
the average session time, we can arrive at the number of concurrent users during peak
usage.
In all the environments monitored, the number of concurrent users did not exceed 1.5 times
the number of users who can arrive in a window of time equivalent to the average session
time. The formula for calculating the concurrent users is simply an effort to make a reasonable
assumption given the licenses purchased. If a larger sample were used for the study, the
formula for arriving at maximum concurrent users could be established with more than a
95% confidence level.
In the above environment, the hardware used for the application is two servers with
dual-core Intel Xeon 2 GHz (1333 MHz) processors. The average CPU utilization during the peak
window was below 37%. We used our internal model to convert the CPU utilization into a
SpecInt2006 rating of 33. Our internal model computes demand with the goal of keeping
the average utilization below 50%. In other words, the SpecInt rating computed reflects
the CPU power needed in order to keep the average utilization around 50%.
We can also quantify the server demand by getting the SpecInt rating for the processor
from Spec.org. While this may appear tricky and even extremely difficult, it is one of the
reliable methods to convert different CPU models to a common scale for meaningful
comparison. Our internal model uses the published SpecInt2006 ratings. Whatever
method is used, we just need to be consistent in applying it to all the observations.
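The internal conversion model is not reproduced here. The sketch below shows one plausible normalization, assuming that demand scales linearly with utilization and that the combined published SpecInt2006 rating of the installed servers is known; the installed rating used in the example is a made-up value chosen only to illustrate the arithmetic.

# A minimal sketch of normalizing observed load to a common CPU scale. This is NOT the
# internal Kronos model; it assumes demand scales linearly with utilization.
def specint_demand(installed_rating, observed_utilization, target_utilization=0.5):
    """CPU rating needed to run the observed load at the target average utilization."""
    return installed_rating * observed_utilization / target_utilization

# Example: two application servers with an assumed combined SpecInt2006 rating of about 44,
# observed at 37% average utilization during the peak window.
print(round(specint_demand(installed_rating=44, observed_utilization=0.37)))  # -> 33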
In a second, smaller environment, if we assume uniform arrival within the six-hour busy window,
users arrive at the rate of one every 216 seconds (6*60*60/100). In any 40-minute (average
session time) period, we can expect about 12 (40*60/216) users to arrive. With 100% overlap
with the previous 40 minutes, we can expect at most 24 concurrent users. The application server
hardware used is a dual-CPU 2 GHz (400 MHz) machine with a SpecInt2006 rating of 3.97.
Logs from different production environments were analyzed. The following statistics are
gathered from each environment.
1. Peak Duration Window – Duration of time when the application was highly
active.
2. Number of actual concurrent users during the peak utilization
3. Average Session time
4. Reports run per user
5. CPU utilization on the application and the database server
6. Hardware used converted to SpecInt2006 rating.
7. Number of Licensed Employees
8. Number of Licensed Supervisors
9. Database Platform
10. Modules implemented
Table 2: Data collected
All the independent variables listed above affect the utilization of the application and the
database hardware. We are looking for a relationship between the independent variables
and the CPU demand so that the computing resources on the application and the database
server can be predicted. Benchmark testing has shown that performance varies linearly
with workload up to a certain limit (100,000 licenses); this holds for any application, and
beyond a particular point adding more resources does not improve performance.
Because the relationship is linear in nature and there is a strong correlation between user
activity and the resources deployed, the multiple regression technique is used in this study.
We use forward-selection stepwise regression to select independent variables one by one.
The first variable included is the one that has the highest R Square value for predicting the
application server demand. The second variable included is the one that, when combined
with the first, produces the highest adjusted R Square value. The third variable included
is the one that, combined with the first two, yields the highest adjusted R Square value.
This process continues until all variables that improve the ability to predict the server
demand are included. We use the adjusted R Square to see whether each added
independent variable improved the predictability of the model.
The graph (Figure 5) below shows the linear relationship between the application
computing resource and the concurrent users.
[Figure 5: Application server CPU rating (y-axis, CPU Ratings 0 to 90) plotted against Concurrent Users (x-axis, 0 to 350)]
Below (Table 3) is the R Square value from the single regression between the application
server demand (CPU rating) and the individual variable.
Variables                     R Square
Concurrent Users              0.94
Managers                      0.77
Database                      0.56
Leave                         0.2
Peak Duration                 0.05
#Reports                      0.04
Average Session Time          0.01
Employees                     0.0079
Schedule                      0.003
Attendance                    0.0005
Table 3: Application Server CPU and Variables Correlation
The above table (Table 3) clearly shows that there is a very high correlation between the
number of concurrent users and the CPU resource used on the application server.
Multiple regression is performed by adding the next independent variable (Managers) to
the Concurrent Users. This process is repeated, and the R Square (coefficient of
determination) and Adjusted R Square values are recorded.
The highest coefficient of determination and adjusted R Square were produced when all
the variables were included in the model. In regression analysis, R Square is a good
measure of the relationship between the independent and the dependent variables. But when
there are several independent variables and the analysis is performed on a sample of data,
Adjusted R Square is a much more realistic measure of the correlation.
Below (Table 4) is the output of the multiple regression using all independent variables
for predicting the application server demand. The model has very high R Square and
Adjusted R Square values, which leads to the conclusion that the variables included have very high
predictability of the dependent variable. Also, a Significance F value less than 0.05
indicates that the regression is significant at the 95% confidence level. The F test merely
indicates that there is good evidence of some degree of predictive value somewhere
among the independent variables (Hildebrand and Ott, 1998). It does not give any direct
indication of how strong the relation is, or any indication of which individual independent
variables are useful.
Table 4: Application Server Regression Output
Table 5: Application Server Regression Output 2
In the output shown in Table 4, there are several independent variables with p-values
greater than 0.05 or even 0.1. A p-value is a measure of how much evidence we have
against the null hypothesis that the individual independent variable has no additional
predictive value over and above that contributed by the other independent variables. In
Table 4, the independent variables ‘Scheduling’, ‘Oracle’, and ‘Av. Session Time’ have
high p-values (greater than 0.1). Adding these variables after including all other
independent variables would not improve the prediction. Table 5 is the output of the
regression after removing the statistically insignificant variables.
Below (Table 6) is the R Square value from the single regression between the Database
Server demand (CPU rating) and the individual variable.
Table 7 shows the output of the multiple regression for predicting the database server
utilization. Based on the sample collected, it is hard to reject the null hypothesis that the
independent variables have no predictive power. The Significance F and the p-values are
higher than 0.05 or even 0.1. The width of the confidence intervals for the variables is also
large; all the intervals (Lower & Upper 95%) include 0.
Table 7: Database Regression Output
Chapter 5 – Key Findings
At first glance, the model results may not seem like a discovery of any significance, but
looking at the equation more closely, we can see the wide applicability of this modeling
technique.
For predicting the application server demand, an equation like below can be used –
The coefficient values are based on information gathered from real-life production environments,
which makes them very realistic, arguably more so than in-house benchmark values. Gathering multiple data
points from a real-life production environment provides all the information we need to
derive the unknowns. By understanding the production workload and carefully analyzing
the logs, we can arrive at reasonable assumptions for the unknowns. In the above
equations, the only unknowns are -
We already know how to compute the approximate number of concurrent users using the
formula based on the average session time, the duration of the peak window, and the total
number of users.
Given normal workload characteristics, we know the value for the peak duration window is
in the range of four to six hours, the value for average session time is in the range of 17 to
45 minutes, and the value for reports per user is in the range of one to five. Depending on
whether we want a conservative, average, or aggressive estimate, we can use the minimum,
average, or maximum values for these variables in the above equations.
A simple spreadsheet calculator can be built which will take all the known variables and
the type of estimate (conservative, average or aggressive) to produce the demand for the
application and the database server.
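A minimal sketch of such a calculator is shown below. The regression coefficients in it are hypothetical placeholders, not the coefficients produced by the model in Chapter 4, and the assumed value ranges follow the figures quoted above.

# A minimal sketch of the proposed calculator, with made-up coefficients.
ASSUMPTIONS = {   # (peak window hours, avg session minutes, reports per user),
                  # following the minimum / average / highest values quoted above
    "conservative": (4, 17, 1),
    "average":      (5, 31, 3),
    "aggressive":   (6, 45, 5),
}

# Hypothetical coefficients of the form: rating = intercept + sum(coefficient * variable)
COEFFICIENTS = {"intercept": 2.0, "concurrent_users": 0.15, "managers": 0.02, "reports_per_user": 1.5}

def concurrent_users(total_users, peak_window_hours, avg_session_minutes, overlap=1.0):
    arrivals_per_second = total_users / (peak_window_hours * 3600)
    return round(avg_session_minutes * 60 * arrivals_per_second * (1 + overlap))

def app_server_rating(total_users, managers, estimate="average"):
    window_hours, session_minutes, reports = ASSUMPTIONS[estimate]
    users = concurrent_users(total_users, window_hours, session_minutes)
    c = COEFFICIENTS
    return (c["intercept"] + c["concurrent_users"] * users
            + c["managers"] * managers + c["reports_per_user"] * reports)

for estimate in ("conservative", "average", "aggressive"):
    rating = app_server_rating(total_users=1800, managers=120, estimate=estimate)
    print("%-12s -> required application server CPU rating: %.0f" % (estimate, rating))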
Chapter 6 -- Suggestions for Additional Work
The samples collected in this study did not target any particular industry or market. Also, the
samples included a wide range of products and users. Although the model uses
real-life data, it has some known limitations.
1. The number of samples used in the study is less than 20. We need at least 20 to 30
samples to establish a credible model.
2. There are three variables whose values have to be assumed based on the range of
data collected.
3. The database utilization could not be predicted with a high confidence level based
on the sample data collected.
The model can be improved by collecting 20 samples from each industry vertical (such as
education, government, manufacturing, healthcare, etc.). Within each vertical industry, we
need to collect samples based on customer size (1000, 5000, 10000 user licenses). This
will really help in understanding the workload characteristics within each industry and
customer size (small, mid-market, enterprise, etc.). More realistic assumptions can then be
made about the unknown variables. However, a study like this would require considerable
effort and time.
Another way to simplify sizing is to collect enough samples (20-30) in each range of
customer size. For example, collect 20 samples of hardware from customers with fewer
than 1000 employees. The maximum CPU rating in each category (like 1000, 3000, 5000
users, etc.) will serve as the hardware recommendation to be used for hardware
planning during the sales cycle for a new customer in the same category. This method
will not only be quick but will also avoid any kind of calculation. Moreover, because we
recommend the highest hardware resource used in each category, we can be assured that
there is enough room for growth or scalability in the solution.
The model established in this study is not conclusive. But it helps to provide enough
information to begin exploring the possibilities, usefulness and appropriateness of
theoretical models based on empirical or production data to arrive at quick sizing. Models
to predict CPU, Disk Space, I/O throughput, Memory requirements can be created by
carefully collecting metrics from production environments. Once the metrics are defined
and carefully collected, a simple estimation technique like regression analysis can be
performed to build a fairly high confidence model.
The log analysis technique (to collect the independent variables) used in this paper can be
applied in other scenarios (like performance troubleshooting) to validate the assumptions
made during sizing and also to understand the application usage characteristics.
References
Gibbs, G. Benton, Jerry M. Enriquez, and Nigel Griffiths, eds. (March 2004). IBM
eServer pSeries Sizing and Capacity Planning (IBM Redbooks).
http://www.redbooks.ibm.com/redbooks/pdfs/sg247071.pdf (accessed April 1, 2009)
Chatterjee, Samprit, and Ali S. Hadi. 2006. Regression Analysis by Example. San
Francisco: John Wiley & Sons.
Cook, David R., Ellen M. Dudar, and Craig A. Shallahamer. (1997). The Ratio Modeling
Technique. http://www.geocities.com/mtarrani/CapacityRatioModeling.pdf (accessed
March 20, 2009)
Microsoft Corporation (White Paper). (June 2003). Windows Server 2003 Terminal
Server Capacity and Scaling.
http://www.microsoft.com/windowsserver2003/techinfo/overview/tsscaling.mspx
(accessed April 2, 2009)
Hildebrand, David H., and Lyman R. Ott. 1998. Statistical Thinking for Managers. Belmont:
Duxbury Press.
Pedigo, Larry. (2004). "Sizing Oracle on Microsoft Windows and Dell PowerEdge
Servers (White Paper sponsored by Microsoft, Dell and Oracle)".
http://www.dell.com/downloads/global/solutions/Oracle%20on%20Windows%20Sizing.
pdf (accessed April 5, 2009)
Pepper, Jason, Jack Sun, and Biswajit Nayak. (2004). How to Effectively Size Hardware
for your Portal implementation.
http://www.oracle.com/technology/products/ias/portal/pdf/oow_10gr2_1337_pepper.pdf
(accessed March 30, 2009)
Sun Microsystems, Inc (White Paper). (2002). Sun Server Scalability and Sizing Guide.
http://www.sun.com/servers/white-papers/scalability-sizing-guide.pdf (accessed March
31, 2009)
Appendix A - Criteria for Hardware Sizing
Appendix B - Performance Terminology
Concurrency
The ability to handle multiple requests or users simultaneously. Threads and processes are
examples of concurrency mechanisms.
Contention
Competition for resources on the servers hosting the application and the database
Cluster
A group of machines that handle workload in a distributed manner, providing redundancy
and failover
Failover
A method of allowing one machine or set of machines to provide an alternative execution
arena for a task, should the original machine(s) fail.
Hit
A subsequent request for a snippet of content from the web application.
Latency
The time that one system component spends waiting for another component in order to
complete the entire task. Latency can be defined as wasted time. In networking contexts,
latency is defined as the travel time of a packet from source to destination.
Page request
The unique request for a page defined inside the application. A figure specifying page
requests per second is the measurement of the load expected for the architected
solution given a common element of web content.
Response time
The time between the submission of a request and the receipt of the response
Scalability
The ability of a system to provide throughput in proportion to, and limited only by,
available hardware resources. A scalable system is one that can handle increasing
numbers of requests without adversely affecting response time and throughput. A system
exhibits good scalability when the amount of system resources consumed increases at the
same rate as the load is increased without adversely impacting response times or
throughput.
Service time
The time between the receipt of a request and the completion of the response to the
request
Think time
The time the user is not engaged in actual use of the processor.
Stream time
The time taken to transmit the response to the requestor
Throughput
The number of requests processed per unit of time.
Wait time
The time between the submission of the request and initiation of the request