Perf-Eval-lecture Notes
1.0 Introduction
1.1 Goals of Performance Evaluation
2.0 Measurements
2.1 Measurement Techniques
2.1.1 On-chip Performance Monitoring Counters
2.1.2 Off-chip Hardware Measurement
2.1.3 SpeedTracer from AMD
2.1.4 Software Monitoring
2.1.5 Microcoded Instrumentation
2.2 Performance Metrics
2.2.1 Characteristics of Good Performance Metrics
2.2.2 Commonly Used Performance Metrics
3.0 Simulation Techniques
3.1 Simulation Methodology
3.1.1 Planning Phase
3.1.2 Modeling Phase
3.1.3 Verification and Validation (V&V)
3.1.4 Applications and Experimentation
4.0 Workload Characterization and Benchmarking
4.1 Workload Characterization
4.2 Benchmark
4.2.1 Micro-benchmarks
4.2.2 Macro-benchmarks
4.2.3 The Program Kernel
4.2.4 Application Benchmark
1.0. Introduction
To understand performance evaluation in computer systems, we first need to understand what performance is in this field. Performance in a computer system can be seen as high throughput (number of work units done per unit time), short response times, reliability, and so on. It can also be defined as the extent to which a computer system meets the expectations of those for whom it is meant, that is, how effectively resources are utilized in order to fulfil the set objectives.
Considering the complexity and rapid evolution of present-day computer systems, there is a need for new and effective tools that assist first in understanding the performance of existing systems, and then in predicting the performance of systems being designed. This in turn provides quantitative answers to questions that arise during the life cycle of a system.
The performance evaluation of an existing system is carried out using the measurement technique, since the required experiments and tests can be conducted on the system itself. However, in a situation where the system does not exist, or where conducting the measurements is not feasible, the other techniques (simulation and analytical modeling) are used instead.
The benefit of studying performance evaluation must be assessed against its cost and the cost of the system. In practice, detailed performance evaluations are done by product development units (system design). During system operation, detailed evaluation is usually not economical, except for very large systems.
1.1. Goals of Performance Evaluation
The goals in studying performance evaluation are generally either a comparison of design alternatives, i.e. quantifying the improvement brought by a design option, or system dimensioning, i.e. determining the size of all system components for a given planned utilization. These goals are itemized as follows:
1. Comparison of alternative system designs: The aim here is to compare the performance of different systems or component designs for a specific application. One example is deciding the best ATM switch for a specific application, or the type of buffering used in it. Other examples are choosing the optimum number of processors in a parallel processing system, the type of interconnection network, the size and number of disk drives, and the type of compiler or operating system. In this case, the objective of performance analysis is to determine quantitatively which alternative performs best.
2. Procurement: The goal in this instance is to find the most cost-effective system for a specific application. It is important to weigh the benefit of choosing an expensive system that provides only a small performance improvement against a less expensive one.
3. Capacity planning: This is a major interest for system administrators and managers of data processing installations. It is carried out to ensure that adequate resources will be available to meet future demand, for example by forecasting the performance of the system under different configurations and alternatives.
4. System tuning: Here, the focus is to find the set of parameter values that produces the best system performance. For example, disk and network buffer sizes can impact the overall performance. Finding the best settings for these resources is a challenging but worthwhile task.
5. Performance debugging: There are situations where the application or the control software of the system works correctly but is slow. It is therefore important to discover, through performance analysis, the reason why the program is not meeting the performance expectation. The problem is then rectified as soon as its cause is identified.
6. Set expectations: The purpose of this is to allow system users to set appropriate expectations for what a system can actually do. This is vital for the future planning of new systems.
7. Recognize relative performance: The objective in this situation is to quantify the change in performance relative to past experience and previous system generations.
2.0. Measurement
The measurement of a system is concerned with monitoring the real system. It can be broadly divided into three categories:
Hardware monitoring
Software monitoring
Hybrid monitoring
Hardware monitoring: This technique uses dedicated measurement hardware interfaced with the system being measured in a non-intrusive way. The main advantage of this technique is that the measurement does not interfere with the normal functioning of the monitored system, and fast events can be captured. However, it is expensive and has difficulty capturing software-related information.
Software monitoring: This technique uses measurement code either embedded in the existing software or provided as a separate set of routines. The main advantages of this technique are its simplicity and flexibility. The disadvantages are that it may seriously interfere with the normal functioning of the system and that it cannot be used to capture fast-occurring events. This technique is most appropriate for obtaining user-program and operating-system related information, such as the time spent executing a particular routine, page fault frequency, and so on.
Hybrid monitoring: This technique draws upon the advantages of both hardware and software monitoring. All relevant signals are collected under software control and sent to another machine for measurement and processing. The advantages are that it is flexible and that its domain of application overlaps those of both hardware and software monitoring. The disadvantages are that the synchronization requirements between the measuring and measured systems may cause some interference, and that it can be expensive and cumbersome to obtain the required measurements.
The purpose of performance measurement is to understand systems that are already built or prototyped, in order (i) to tune the system or future systems to be built, and (ii) to tune the application if the source code and algorithms can still be changed. Measurement also helps in understanding the applications that are running on the system and the match between the applications and the system.
Some Measurement Techniques are discussed below.
2.1.1. On-chip Performance Monitoring Counters
All state-of-the-art high-performance microprocessors, including Intel's Core, IBM's POWER series, AMD's Athlon, Compaq's Alpha, and Sun's UltraSPARC processors, incorporate on-chip performance monitoring counters. These counters make it possible to measure the performance of these microprocessors while they run complex, real-world workloads. This ability overcomes a serious limitation of simulators, which often cannot execute such complex workloads. Complex run-time systems involving multiple software applications can be evaluated and monitored very closely. All present-day microprocessor vendors release information on their performance monitoring counters, although the counters are not part of the architecture.
There are several tools available to measure performance using performance monitoring counters; they are listed in Table 1. However, extensive post-processing can sometimes make such tools somewhat invasive. PMON is counter-reading software written by Juan Rubio of the Laboratory for Computer Architecture at the University of Texas. It provides a mechanism to read specified counters with minimal or no perceivable overhead. All these tools measure both user and operating system activity. Since everything on the processor is counted, effort should be made to have minimal or no other undesired processes running during experimentation.
Table 1: Tools for performance measurement using counters
Tool   Platform   Source
PMON   IA-32      http://www.ece.itexas.edu/projects/ece/lca/pmon
DCPI   Alpha      http://www.research.compaql.com/SRC/dcpi/
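Counter-based measurement of the kind these tools provide can also be illustrated with freely available utilities. The following sketch is not one of the tools in Table 1; it assumes a Linux machine with the perf utility installed and uses Python only to wrap it, and the event names (cycles, instructions) are assumptions about what the local CPU exposes.

# Hedged sketch: reading hardware performance counters via Linux "perf stat".
# Assumes perf is installed and the "cycles" and "instructions" events exist
# on the local CPU; this is an illustration, not one of the tools in Table 1.
import subprocess

def count_events(command, events=("cycles", "instructions")):
    """Run `command` under perf stat and return perf's raw counter report."""
    result = subprocess.run(
        ["perf", "stat", "-e", ",".join(events), "--"] + list(command),
        capture_output=True, text=True,
    )
    # perf writes its counter summary to stderr, not stdout.
    return result.stderr

if __name__ == "__main__":
    print(count_events(["sleep", "1"]))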
2.1.2. Off-chip Hardware Measurement
Instrumentation can also be done by attaching off-chip measurement hardware; two such approaches are described below.
2.1.3. SpeedTracer from AMD
AMD developed this hardware tracing platform to aid in the design of its x86 microprocessors. When an application is being traced, the tracer interrupts the processor on each instruction boundary. The state of the CPU is captured on each interrupt and then transferred to a separate control machine where the trace is stored. The trace contains virtually all valuable pieces of information for each instruction that executes on the processor. Operating system activity can also be traced. However, tracing in this manner is invasive and may slow down the processor. Although the processor is running slower, external events such as disk and memory accesses still happen in real time, thus appearing very fast to the slowed-down processor.
2.1.3b. Logic Analyzers: Detailed logic analyzer traces are limited by restrictions on trace size and are typically used only for the most important sections of the program under analysis. Preliminary coarse-level analysis can be done with performance monitoring counters and software instrumentation.
2.1.4. Software Monitoring
Software monitoring used to be an important mode of performance evaluation before the advent of on-chip performance monitoring counters. Its main advantage is that it is flexible and easy to do. However, its disadvantages include the fact that the instrumentation can slow down the application: the overhead of servicing the exception, switching to a data collection process, and performing the necessary tracing can slow down a program by more than 1000 times. Another disadvantage is that software monitoring systems typically capture only user-level activity.
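To make the idea of embedded measurement code concrete, the sketch below shows one common way software monitoring is implemented: a wrapper records how often a routine is called and how much time is spent in it. The routine name sort_records is a made-up example, and the wrapper itself adds overhead, which is exactly the intrusiveness discussed above.

# A minimal sketch of software monitoring: measurement code wrapped around an
# application routine records call counts and time spent. The routine below
# (sort_records) is a hypothetical example.
import time
from collections import defaultdict

call_counts = defaultdict(int)
total_time = defaultdict(float)

def monitored(func):
    """Wrap a routine so that every call is counted and timed."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            call_counts[func.__name__] += 1
            total_time[func.__name__] += elapsed
    return wrapper

@monitored
def sort_records(records):            # hypothetical application routine
    return sorted(records)

if __name__ == "__main__":
    for _ in range(1000):
        sort_records([5, 3, 8, 1, 9])
    for name in call_counts:
        print(f"{name}: {call_counts[name]} calls, "
              f"{total_time[name] * 1e6:.1f} microseconds in total")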
2.1.5. Microcoded Instrumentation
This technique lies between trapping information on each instruction using hardware interrupts (traps) and using software traps. Compaq used microcoded instrumentation to obtain traces of the VAX and Alpha architectures. The ATUM tool, used extensively by Compaq in the late 1980s and early 1990s, relied on microcoded instrumentation. The tracing system essentially modified the VAX microcode to record all instruction and data references in a reserved portion of memory. Unlike software monitoring, ATUM could trace all processes, including the operating system. However, this kind of tracing is invasive and can slow down the system by a factor of about 10.
2.2. Performance Metrics
Performance metrics can be described in terms of the absolute number of times a service has been carried out, the time taken to perform a service, and the size of the resources required to perform a service. Metrics can be derived from these basic measures in various ways, for example:
Normalizing values to a common time basis to provide a speed metric (dividing a count by time)
Deriving probabilities
The goals and the costs of a performance study determine the choice of an appropriate performance metric.
2.2.1. Characteristics of Good Performance Metrics
Ease of measurement: If a metric is not easy to measure, it is unlikely that anyone will actually use it; if it is complicated, it is also much more likely to be measured incorrectly.
Consistency: The units of the metric and its precise definition should be the same across different configurations and different systems, although this is not true in many cases (e.g. MIPS and MFLOPS).
Independence: Commonly used metrics often drive decisions to select a system, so a good metric should be independent of vendors, who might otherwise influence the composition of the metric to their benefit.
2.2.2. Commonly Used Performance Metrics
i. Clock rate: The most prominent indication of performance is often the frequency of the processor's central clock. This metric completely ignores how much work is actually accomplished in each clock cycle.
iii. MIPS (Million Instructions Per Second): This is a rate metric (amount of computation performed per unit time). It is easy to measure, repeatable, and independent, but it is nonlinear, not reliable, and not consistent. One major problem is that the amount of computation performed by a single instruction varies widely across instruction sets. (A short numerical sketch of this and several other metrics follows the lists below.)
iv. FLOPS (Floating-Point Operations per Second; Mega-, Giga-, TeraFLOPS): This metric defines an arithmetic operation on two floating-point quantities as the basic unit of work and tries to correct the main shortcoming of the MIPS metric. It has no value for integer applications, and there is the difficulty of agreeing on exactly how to count the number of operations. It is the dominant metric in the HPC field. Its advantages are that it is repeatable and (now) easy to measure; its disadvantages are that it is nonlinear and inconsistent, and that vendors can game it.
vi. QUIPS (QUality Improvement Per Second): Traditionally, metrics quantify the effort needed to reach a certain result; QUIPS instead quantifies the quality of the solution achieved per unit of time.
ix. Response time: This is the time interval between a user's request and the system's response, also known as reaction time, turnaround time, etc. A small response time is good: the user waits less, and the system is free to do other things.
x. Throughput: This is the number of work units done per unit time, e.g. applications run or files transferred. High throughput is good in the sense that the system can serve many clients, but for an individual user it may imply worse service.
xi. Utilization: This is the percentage of time the system is busy serving clients. It is important for expensive shared systems but less important (if at all) for single-user and real-time systems. Utilization and response time are interrelated, in the sense that response time typically grows as utilization increases.
Other commonly used metrics include:
Mean Time Between Failures (MTBF)
Supportable load
Speedup
Scalability (weak/strong)
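As a simple illustration of how several of the metrics above are derived from raw measurements, the following sketch computes MIPS, MFLOPS, throughput, utilization, and speedup; all of the input numbers are invented for the example.

# Illustrative calculations for some of the metrics above; every input value
# here (counts, times) is an assumed measurement, not real data.
instructions = 4.2e9        # instructions retired during the run
flops = 1.6e9               # floating-point operations during the run
elapsed = 2.1               # elapsed (wall-clock) time in seconds
busy = 1.7                  # time the system was busy serving work (s)
jobs_completed = 350        # work units finished during the run
baseline_elapsed = 3.4      # elapsed time on an alternative (older) system

mips = instructions / elapsed / 1e6      # million instructions per second
mflops = flops / elapsed / 1e6           # million floating-point ops per second
throughput = jobs_completed / elapsed    # work units per unit time
utilization = busy / elapsed             # fraction of time the system is busy
speedup = baseline_elapsed / elapsed     # > 1 means the new system is faster

print(f"MIPS={mips:.0f}  MFLOPS={mflops:.0f}  throughput={throughput:.1f}/s")
print(f"utilization={utilization:.0%}  speedup={speedup:.2f}x")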
The dataset used for the development of the model consists of two sets of data: one contains the 21 feature-set variables identified as the input variables of the ANN model for intrusion detection (IDS), while the other contains the output variable with two classes, normal and attack, such that if normal is 0 then attack is 1 and vice versa (0 means No, 1 means Yes) (Figure 10). The datasets were imported into the MATLAB workspace to be used by the developed ANN model for further analysis. The next task is to carry out the performance evaluation of the developed model. The resulting confusion matrices are displayed in Table 1.1 and Table 1.2.
Table 1.1
                Predicted NO    Predicted YES
Actual NO           10,900                20
Actual YES           1,700            45,100

Table 1.2
                Predicted NO    Predicted YES
Actual NO           20,400                10
Actual YES             500            12,700
The formulae for the desired parameters are stated below, where A denotes NO cases predicted as NO, B denotes NO cases predicted as YES, C denotes YES cases predicted as NO, and D denotes YES cases predicted as YES:

a. Accuracy = (A + D) / (A + B + C + D)

b. True positive (TP) rate / sensitivity / recall:
   TP_NO = A / (A + B),   TP_YES = D / (C + D)

c. False alarm or false positive (FP) rate:
   FP_NO = C / (C + D),   FP_YES = B / (A + B)

d. Precision:
   Precision_NO = A / (A + C),   Precision_YES = D / (B + D)
1) You are required to use the above formulae to compute, for the data in Table 1.1 and Table 1.2: i) the correct classifications, ii) the incorrect classifications, iii) accuracy, iv) TP rate, v) FP rate, vi) precision, and vii) an interpretation of each result.
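A small sketch of the requested computation is given below for Table 1.1, assuming the cell labels A, B, C, D map onto the confusion matrix exactly as in the formulae above (A and B form the actual-NO row, C and D the actual-YES row). Replacing the four numbers with those of Table 1.2 repeats the exercise for the second matrix.

# Sketch of the exercise computations for Table 1.1. Cell meanings (assumed
# from the formulae above): A = NO predicted NO, B = NO predicted YES,
# C = YES predicted NO, D = YES predicted YES.
A, B = 10_900, 20          # actual NO row
C, D = 1_700, 45_100       # actual YES row

correct = A + D
incorrect = B + C
accuracy = (A + D) / (A + B + C + D)
tp_no, tp_yes = A / (A + B), D / (C + D)
fp_no, fp_yes = C / (C + D), B / (A + B)
prec_no, prec_yes = A / (A + C), D / (B + D)

print(f"correct={correct}  incorrect={incorrect}  accuracy={accuracy:.4f}")
print(f"TP rate:   NO={tp_no:.4f}  YES={tp_yes:.4f}")
print(f"FP rate:   NO={fp_no:.4f}  YES={fp_yes:.4f}")
print(f"precision: NO={prec_no:.4f}  YES={prec_yes:.4f}")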
Figure 16: Diagram of a Confusion Matrix
The true positive/negative and false positive/negative values recorded from the confusion matrix can then be used to evaluate the performance of the prediction model. The definitions and expressions of the metrics are presented as follows:
a. True Positive (TP) rate (sensitivity/recall) – proportion of positive cases correctly classified:

TP_poor = TP1 / (TP1 + FN12 + FN13)                                      (3.34)
TP_moderate = TN2 / (FP21 + TN2 + FN23)                                  (3.35)
TP_excellent = TN3 / (FP31 + FN32 + TN3)                                 (3.36)

b. False Positive (FP) rate:

FP_poor = (FP21 + FP31) / (FP21 + FP31 + TN2 + FN32 + FN23 + TN3)        (3.37)
FP_moderate = (FN12 + FN32) / (TP1 + FN12 + FN13 + FN32 + FP31 + TN3)    (3.38)
FP_excellent = (FN13 + FN23) / (FN13 + TP1 + FN12 + FP21 + TN2 + FN23)   (3.39)

c. Precision:

Precision_moderate = TN2 / (FN12 + TN2 + FN32)                           (3.41)
Precision_excellent = TN3 / (FN13 + FN23 + TN3)                          (3.42)
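The per-class TP rates and precisions in equations (3.34)-(3.36), (3.41), and (3.42) are simply the diagonal cell divided by its row total and by its column total, respectively. The sketch below applies this to a three-class confusion matrix whose numbers are invented; it is a generic illustration, not the data of the study.

# Generic sketch: per-class TP rate and precision for a 3-class confusion
# matrix (rows = actual class, columns = predicted class). Matrix values are
# made up for illustration.
classes = ["poor", "moderate", "excellent"]
cm = [
    [50,  4,  1],   # actual poor
    [ 3, 60,  2],   # actual moderate
    [ 1,  2, 77],   # actual excellent
]

for i, name in enumerate(classes):
    actual_total = sum(cm[i])                     # all cases of this class
    predicted_total = sum(row[i] for row in cm)   # all cases predicted as it
    tp = cm[i][i]
    tp_rate = tp / actual_total                   # cf. eqs. (3.34)-(3.36)
    precision = tp / predicted_total              # cf. eqs. (3.41)-(3.42)
    print(f"{name}: TP rate={tp_rate:.3f}  precision={precision:.3f}")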
The model was evaluated using 10-fold cross-validation of the data: while 9 parts are used for training, the remaining one is used for testing, and this process is repeated until each of the remaining parts has taken its turn as the test set.
                Predicted Fair   Predicted Good
Actual Fair           2020                2
Actual Good              2             2255
Figure 3: Confusion Matrix for the Result of C4.5 Decision Trees Classifier
From the information provided by the confusion matrix, it was discovered that out of the 2022 fair cases, 2020 were correctly classified and 2 were misclassified as good, while out of the 2257 good cases, 2255 were correctly classified and 2 were misclassified as fair. Table 2 shows the results of the evaluation of the performance of the C4.5 decision trees classifier using the metrics. Based on these results, the true positive (TP) rate of the model was the same for the fair and good cases, a value of 0.999, meaning 99.9% of the actual cases were correctly classified; the false positive (FP) rate of the model was the same for the fair and good cases, a value of 0.001, meaning 0.1% of the actual cases were misclassified; and for precision, the model performed equally in predicting the fair and good cases, a value of 0.999, meaning 99.9% of the predicted cases were correctly classified.
Table 2: Performance of the C4.5 decision trees classifier
Class      TP Rate   FP Rate   Precision
Fair        0.999     0.001      0.999
Good        0.999     0.001      0.999
Average     0.999     0.001      0.999
                Predicted Fair   Predicted Good
Actual Fair           2022                0
Actual Good              0             2057
Figure 4: Confusion Matrix for the Result of ID3 Decision Trees Classifier
Class      TP Rate   FP Rate   Precision
Average     1.000     0.000      1.000
4.3 Discussions
Table 5 gives a summary of the simulation results by presenting the average value of each performance metric used to evaluate the decision tree algorithms. The true positive rate (recall/sensitivity), false positive rate (false alarm/1 - specificity), precision, and accuracy were used. From the table, it was discovered that the ID3 decision trees algorithm showed better performance due to its ability to correctly predict all 4279 data records collected from the sites, unlike the C4.5 decision trees algorithm, which had 4 misclassifications. The accuracy, TP rate, and precision of the ID3 decision trees algorithm were higher than those of the C4.5 decision trees, while its FP rate was lower.
TECHNIQUES TO BE USED
The three techniques for performance evaluation are analytical modeling, simulation, and
measurement. There are a number of considerations that help decide the technique to be used.
These considerations are listed in the table below. The list is ordered from most to least
important.
The key consideration in deciding the evaluation technique is the life-cycle stage in which the
system is. Measurements are possible only if something similar to the proposed system already
exists, as when designing an improved version of a product. If it is a new concept, analytical
modeling and simulation are the only techniques from which to choose. Analytical modeling and
simulation can be used for situations where measurement is not possible, but in general it would
be more convincing to others if the analytical modeling or simulation is based on previous
measurement.
CRITERION    ANALYTICAL MODELING    SIMULATION    MEASUREMENT
1. Stage     Any                    Any           Post-prototype
Although simulation may be a bit less accurate than the measurement technique, it is not as expensive. Simulation is used largely before altering an existing system or before building a new system, in order to minimize the chances of failing to meet requirements and to remove unanticipated bottlenecks. It is also used to avoid under-use or over-use of resources and to increase system performance. Typical questions that simulation helps answer include:
For a new telecommunication network or computer system, what is the best possible design available?
Does a new design or topology provide better performance than existing ones?
If the traffic load rises by 50%, what will the performance of the system or network be?
Simulation is the technique of designing a model of a real system and conducting experiments with this model in order to understand its behavior or to evaluate various strategies and scenarios for its operation. Simulation can also be defined as the procedure of experimenting with a model of the system under study using computer programs. It measures a model of the system rather than the system itself.
Three types of entities are usually involved in the simulation of any system: the real system, the model, and the simulator. These entities are related to and dependent on each other in one way or another. Note that the real system is a source of raw data, the model is a set of instructions for data gathering, and the simulator is a tool for carrying out the model instructions. A simulation model needs to be validated, to ensure that the assumptions, distributions, inputs, outputs, results, and conclusions are correct. The simulator must also be properly verified and debugged to remove all programming errors, to ensure that the model assumptions have been implemented correctly by the one carrying out the simulation.
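To make the relationship between the real system, the model, and the simulator concrete, the following sketch simulates a single-server queue: the model is the set of rules inside the loop, and the simulator is the program that executes them. The arrival rate, service rate, and number of jobs are arbitrary assumptions chosen only for illustration.

# A minimal simulation model and simulator: a single-server, first-come
# first-served queue with random interarrival times and service demands.
# All parameter values are assumptions used only for illustration.
import random

def simulate_queue(n_jobs=10_000, arrival_rate=0.8, service_rate=1.0, seed=1):
    random.seed(seed)
    clock = 0.0               # arrival time of the current job
    server_free_at = 0.0      # time at which the server becomes free
    total_response = 0.0
    for _ in range(n_jobs):
        clock += random.expovariate(arrival_rate)    # next arrival
        service = random.expovariate(service_rate)   # its service demand
        start = max(clock, server_free_at)           # wait if server is busy
        server_free_at = start + service
        total_response += server_free_at - clock     # waiting + service time
    return total_response / n_jobs

if __name__ == "__main__":
    print(f"mean response time = {simulate_queue():.2f} time units")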
There are various reasons for using simulation analysis as a tool to evaluate the performance of systems, including the following:
4. Simulation can promote an innovative attitude towards trying new concepts or ideas. Various organizations have under-used resources and systems which, if fully used, can result in significant gains; simulation provides a means to communicate, experiment with, and assess such proposed solutions, scenarios, schemes, designs, or plans.
5. Simulation can forecast results for possible courses of action quickly.
6. Simulation can quantify the effect of variances occurring in a node, element, or system. It is essential to note that performance computations based mainly on mean values neglect the effect of such variances.
A systematic and effective simulation study and analysis involves some steps to be followed
strictly. These steps (also known as phases) are illustrated in Figure 3.1, and discussed below.
Fig 3.1. Simulation steps
3.1.1. Planning Phase
Problem formulation: The problem to be studied must be clearly stated, and it is important that policy makers comprehend and agree with the formulation. Consider that a problem well defined is half solved. It is important to establish the problem statement and the objectives of the study.
Resource assessment: An estimate of the resources required to gather data and analyze the system under study should be made. These resources, which include time, money, personnel, and equipment, must be well planned for. It is better to amend the objectives of the simulation study at an early phase than to fall short later because of a lack of vital resources.
System and data analysis: This step includes a thorough investigation of the literature on previous schemes, techniques, and algorithms for the same problem. The identification of factors, variables, initial conditions, and performance metrics is also carried out in this phase.
3.1.2. Modeling Phase
Here, the person carrying out the simulation constructs a system model, which is an imitation of the real system under study or a representation of some aspects of the system to be analyzed.
Model construction: This task consists of abstracting the system into mathematical and logical relationships.
Data acquisition: This task entails the identification, description, and gathering of data.
Model transformation: This task involves the preparation and troubleshooting of the model.
A model can be represented in several forms, including:
(b) physical models, such as those used for aircraft and buildings,
(d) flowcharts,
(f) computer pseudocode (an informal high-level description of the operating principle of a computer program or algorithm).
Building the model also involves:
(e) adjusting the top-down design, testing, and validating it for the required degree of granularity;
(g) iterating through the steps above until the required degree of granularity (level of detail) is reached.
i. Model scoping: This refers to determining which processes, entities, functions, devices, and so on within the system should be taken into account in the simulation model.
ii. Level of detail: This is established based on the element's effect on the steadiness of the analysis. The proper level of detail will differ depending on the modeling and simulation aims.
iii. Subsystem modeling: When the system to be evaluated is large, subsystem modeling is carried out, and all subsystem models are later tied together properly. To characterize subsystems, the following schemes can be used:
Flow scheme: This technique has been employed to study systems that are characterized by the flow of physical or information entities through the system, such as assembly lines.
Functional scheme: This scheme is valuable when there are no directly visible flowing entities in the system, such as manufacturing processes that do not use assembly lines.
iv. Variable and parameter assessment: This is normally performed by gathering data over some period of time and then working out a frequency distribution for the needed variables. This kind of analysis may help the modeler to identify a well-known distribution that fits the observed data.
The model can then be implemented using (a) a general-purpose programming language; (b) a simulation language such as SIMSCRIPT III, MODSIM III, CSIM, or JavaSim; or (c) a simulation package such as Opnet, NS2, NS3, Network III, Comnet III, QualNet, or GloMoSim.
In general, using a simulation package may save money and time; however, simulation packages may not be flexible or effective enough, as they may not contain the capabilities needed for the task, such as modules to simulate certain protocols or certain features of the system under study.
3.1.3. Verification and Validation (V&V)
Verification is the procedure of checking whether the model realizes the stated assumptions accurately or not. Others consider it basically the process of debugging the simulation program (simulator) that implements the model of the system under study. It is possible to have a verified simulator that represents an invalid model, and it is also possible to have a valid model implemented by an unverified simulator. The validation procedure refers to making sure that the assumptions considered in the model are realistic, in that, if properly realized, the model behaves like the real system under study.
Model validation is basically aimed at validating the assumptions, input parameters and distributions, and output values and conclusions. Validation can be carried out by, among other schemes:
(a) comparing the results of the simulation with results previously obtained from the real system operating under similar conditions.
3.1.4. Applications and Experimentation
Following the verification and validation of the model, the simulator has to be run under different operating conditions and environments to reveal the behavior of the system under study. Keep in mind that any simulation study that does not include experimentation with the simulation model is not useful. It is through testing and experimentation that the analyst can appreciate the system and make recommendations about its design and most favorable operational modes. The extent of the experiments relies mainly on the cost of approximating the correlations among the control variables. The translation of simulation results into practice is an essential task that is performed after testing and experimentation. Documentation is crucial and should contain a full record of the whole project activity, not just a user's manual.
Among the advantages of simulation analysis are the following:
1. It can be used when experiments cannot be conducted on the real physical system due to inconvenience, risk, or cost.
2. Speed: It permits time compression, so that the operation of a system over an extensive period of time can be studied and the results of experiments obtained much faster than in real time.
3. Simulation modeling allows sensitivity analysis by manipulating input variables, in order to find the design parameters that are critical to the operation of the system under study.
4. It is a good training tool: in any simulation study, the simulation group consists of experts in different areas, and taking part in the study is a valuable training opportunity.
6. It does not disturb the real system: simulation analysis can be performed without the need to disturb the physical system under study. This is critical, as running tests on the real system may be costly and even catastrophic; in addition, in some cases the physical system does not yet exist and is only a design on paper.
The limitations of simulation include the following:
2. In simulation modeling, we usually make assumptions about input variables, parameters, and distributions; if these assumptions are inaccurate, the obtained outcomes may not be credible.
3. It is not easy to select initial conditions, and choosing them poorly may affect the reliability of the model.
A simplified illustration of the simulation process is shown in Figure 3.2.
The nature of the process involved here is iterative. Simulation experiments and changes do not incur the cost of hardware parts, and any change can be made in the simulation model easily and risk-free; almost all simulation software tools share this efficient feature. To achieve success in simulation analysis, well-qualified problem formulators, simulation modelers, and analysts are needed.
Modeling is the process of producing a model. With the help of the model, the analyst can easily predict the impact of changes on the system. The model should always be a close representation of the actual system, and most of its significant features should be included; at the same time, the model should not be so complex that it is hard to understand and experiment with. Practitioners of simulation advocate increasing the complexity of the model iteratively. Model validity is an important issue in modeling. In most simulation studies, the models generally used are mathematical models. Mathematical models can be classified as deterministic, in which both the input and output variables are fixed, or stochastic, in which either the input or the output variables are probabilistic; they can also be classified as static or dynamic. Generally, simulation models of computer systems and networks are dynamic and stochastic.
Many stages are involved in the simulation modeling process, which include the following:
4. Performing experiments with the model and generating the entire documentation of the project.
This process is iterative and continues until the required level of granularity is reached. Scientific advances in related fields, such as databases, are exerting a major influence on the simulation modeling process (SMP), because progress in those fields can help to provide a credible model and simulator.
Fig. 3.3. The Simulation modelling process
Simulation modeling is appropriate, among other situations:
2. When experimenting with the real (actual) system is not possible, or when the system is under construction.
The modeler should always choose a combination of assumptions that is appropriate, realistic, and adequate. After devising a conceptual model, it should be converted into a digital model. The reliability of the digital model is directly affected by the accuracy of the verification and validation phases. After acquiring a precise and reliable digital model, the simulation modeler proceeds to the experimental stage, where statistical tests are designed to meet the goals of the study.
Now we take up the issue of randomness in simulation. Some simulations accept inputs only in the form of nonrandom, fixed values, which typically correspond to factors that describe the model and the particular alternative we are going to evaluate. If the system to be simulated is like this, then the resulting simulation model is deterministic.
The good thing about a deterministic simulation model is that, because the input has no randomness, there is no randomness in, for example, the sizes of and interarrival times between consecutively arriving batches of parts. Deterministic values for the input are represented by the big dots in Figure 3.4, and the big dots on the output side stand for the (deterministic) output performance, which is obtained by converting the input into output by means of the simulation's logic. Different runs are made to evaluate the different input-parameter combinations before dealing with the tentative output. Most systems, however, have some type of randomness or uncertainty in their input, so practical simulation models should also allow for such variable input; such models are called stochastic simulation models. If the randomness in the input is ignored, it may cause errors in the output of the simulation model. A random-in-random-out (RIRO) model results in this case.
Fig 3.4. DIDO Simulation
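The contrast between deterministic-in-deterministic-out (DIDO) and random-in-random-out (RIRO) behaviour can be seen in the small sketch below: the same total-demand calculation is repeated three times with fixed inputs and then with random inputs, and only the random case produces output that varies between runs. The demand values are arbitrary assumptions.

# Sketch contrasting DIDO and RIRO: fixed input gives the same output on every
# run, random input gives a different output on every run. Values are made up.
import random

def service_demand(randomize, mean=1.0):
    # DIDO: every job needs exactly `mean`; RIRO: demand is exponential.
    return random.expovariate(1.0 / mean) if randomize else mean

for randomize, label in [(False, "DIDO"), (True, "RIRO")]:
    random.seed(7)
    outputs = set()
    for run in range(3):
        total = sum(service_demand(randomize) for _ in range(1000))
        outputs.add(round(total, 3))
    print(f"{label}: total demand over 3 runs -> {sorted(outputs)}")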
4.0. Workload Characterization and Benchmarking
The workload of a system denotes the set of inputs generated by the environment in which the system is used, for instance the interarrival times and service demands of incoming jobs. These inputs are usually not under the control of the system designer/administrator. They can be used for driving the real system (as in measurement) or its simulation model, and they can also be used to derive distributions for analytic models. It is often not clear what aspects of the workload are important, in how much detail the workload should be recorded, and how the workload should be represented and used. Workload characterization only builds a model of the real workload, since not every aspect of the real workload may be captured or is relevant.
A workload model may be executable or non-executable. For example, recording the arrival instants and service durations of jobs creates an executable model, whereas recording only the distributions of interarrival times and service durations gives a non-executable model. An executable model need not always be a record of inputs; it can also be a program that generates the inputs. Executable workloads are useful in direct measurements and trace-driven simulations, whereas non-executable workloads are useful for analytic modeling and distribution-driven simulations.
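The sketch below illustrates the two kinds of workload model under stated assumptions: a synthetic trace of (arrival time, service demand) pairs stands in for an executable model, while the summary statistics kept at the end stand in for a non-executable one. The rates used to generate the trace are arbitrary.

# Sketch: an executable workload model (an explicit job trace) versus a
# non-executable one (summary parameters). All generation rates are assumed.
import random
import statistics

random.seed(42)

# Executable workload model: an explicit trace that could drive a
# measurement experiment or a trace-driven simulation.
trace = []
clock = 0.0
for _ in range(1000):
    clock += random.expovariate(2.0)      # interarrival time
    demand = random.expovariate(4.0)      # service demand
    trace.append((clock, demand))

# Non-executable workload model: only distribution parameters are retained,
# suitable for analytic or distribution-driven simulation models.
interarrivals = [b[0] - a[0] for a, b in zip(trace, trace[1:])]
model = {
    "mean_interarrival": statistics.mean(interarrivals),
    "mean_service_demand": statistics.mean(d for _, d in trace),
}
print(model)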
4.1. Workload Characterization
There is a need to provide input to a model or real system under study, irrespective of the performance evaluation technique used. Many new computer and network applications and programming paradigms are continually evolving, and understanding the characteristics of their workloads is essential for designing suitable architectures for them. It is vital to characterize web servers, database systems, transaction processing systems, multimedia, networks, ATM switches, and scientific workloads. It is also important to characterize operating systems, since understanding operating system behavior leads to improved architectures and designs. Analytical modeling of workloads is a challenge and needs to be carried out carefully, mainly because it must capture the increased complexity of the processor, the memory subsystem, and the workload domain.
Nevertheless, analytic models can capture the essential features of systems and workloads, which can be helpful in providing early predictions about a design. Quantitative and analytical characterization also helps in identifying the important workload features. An overall block diagram of the workload characterization process is shown in Figure 4.1. In this framework, there are two types of relevant inputs:
(a) parameters that can be controlled by the system designer, such as resource allocation policies, and
(b) inputs generated by the environment in which the system under study is used, such as interarrival times.
Such inputs are used to drive the real system if the measurement technique or the simulation
model is used. They also can be used to determine adequate distributions for the analytic and
simulation models. In the published literature, such inputs are often called workloads.
In workload characterization, the term "user" may or may not be a human being. In most related literature, the term "workload component" or "workload unit" is used instead of user. Examples of workload components include
(b) sites, such as several sites belonging to the same company, and
(c) user sessions, such as complete monitored sessions from user login to logout.
Measured quantities, requests, and resource demands used to characterize the workload include, among others, instructions.
In general, workload parameters are preferred over system parameters for the characterization of workloads. Parameters of significant impact are included, whereas those of minor impact are usually excluded. Among the techniques that can be used to specify a workload are
(a) averaging,
(b) single-parameter histograms,
(c) multiparameter histograms,
(d) Markov models, and
(e) clustering.
Averaging is the simplest scheme: it presents a single number that summarizes the parameter values, most commonly the arithmetic mean. The arithmetic mean may not be appropriate for certain applications; in such cases the median, mode, geometric mean, or harmonic mean is used. For example, in the case of addresses in a network, the mean or median is meaningless, and therefore the mode is often chosen.
In the single-parameter histogram scheme, histograms are used to show the relative frequencies of various values of the parameter under consideration. The drawback of this scheme is that individual-parameter histograms ignore the correlation among the various parameters. To avoid this problem, the multiparameter histogram scheme records the joint distribution of several workload parameters. The difficulty with this technique is that it is not easy to construct joint histograms for more than two parameters.
Markov models are used in cases when the next request is dependent only on the last request.
Generally, if the next state of the system under study depends only on the current state, then the
overall system’s behavior follows the Markov model. Markov models are often used in queuing
analysis.
The model can be illustrated by a transition matrix that gives the probabilities of the next state given the present state. Figure 4.2 shows the state transition diagram for the Markov model of a multiprocessor system.
Fig 4.2.: State transition diagram for the Markov model of the multiprocessor system
Any node in the system can be in one of three possible states:
(a) active state where the node (computer) is executing a program (code) using its own cache
memory,
(b) wait (queued) state where the node waits to access the main memory to read/write data, and
(c) access state where the node’s request to access the main memory has been granted.
The probabilities of going from one state to another make up what is called the transition matrix; a small sketch of simulating such a model is given below.
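Since the probability values of Figure 4.2 are not reproduced in the text, the sketch below uses an assumed transition matrix for the three states (active, wait, access) and simulates the resulting Markov model to estimate the fraction of time a node spends in each state.

# Sketch of the Markov model of a multiprocessor node. The transition
# probabilities are assumptions, since Figure 4.2's values are not given.
import random

states = ["active", "wait", "access"]
P = {                      # P[s][t] = probability of moving from state s to t
    "active": {"active": 0.80, "wait": 0.15, "access": 0.05},
    "wait":   {"active": 0.00, "wait": 0.60, "access": 0.40},
    "access": {"active": 0.70, "wait": 0.10, "access": 0.20},
}

def step(state):
    r, cumulative = random.random(), 0.0
    for nxt, p in P[state].items():
        cumulative += p
        if r < cumulative:
            return nxt
    return state

random.seed(0)
visits = {s: 0 for s in states}
state = "active"
for _ in range(100_000):
    state = step(state)
    visits[state] += 1

for s in states:
    print(f"fraction of time in {s}: {visits[s] / 100_000:.3f}")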
The clustering scheme is used when the measured workload is made up of a huge number of components. In that case, the components are categorized into a small number of clusters/tiers such that the components within one cluster are as similar to each other as possible. This is much like clustering in pattern recognition. One member may be selected from each cluster as its representative, and the needed study is then conducted on the representatives to characterize the clusters.
Figure 4.3 shows the number of cells delivered to node A and the numbers delivered to node B in
a computer network.
Fig. 4.3.: An example of 60 cells in 6 groups (clusters)
As shown in Figure 4.3, the cells can be classified into six groups (clusters) that represent the six different links on which they arrive. Therefore, instead of using 60 cells for each specific analysis, we can use only 6 cells. The use of a dispersion measure can give better information about the variability of the data, as the mean alone is insufficient when the variability in the data set is large. The variability can be quantified using the variance, the standard deviation, or the coefficient of variation (COV):
Variance = s^2 = (1 / (n - 1)) * sum_{i=1..n} (x_i - x')^2

and COV = s / x', where x' is the mean of a sample of size n. A high COV means high variance, in which case the mean alone is not sufficient. A zero COV means that the variance is zero, and in that case the mean value gives the same information as the complete data set. The components can also be grouped according to a weighted sum of their parameter values. If w_i is the weight for the ith parameter x_i, then the weighted sum W is as follows:
W = sum_{i=1..k} w_i * x_i
The last expression can be used to group the components into clusters such as low-, medium-, and high-demand classes. The weights to be used in such cases can be determined using principal component analysis, which finds the weights w_i such that the resulting W values provide the maximum discrimination among the components. The value of W is called the principal factor or principal component. In general, given a set of k parameters x_1, x_2, ..., x_k, principal component analysis produces a set of factors W_1, W_2, ..., W_k such that:
(b) the W's form an orthogonal set, which means that their inner product is zero: W_i . W_j = 0 for i != j; and
(c) the W's form an ordered set, so that W_1 explains the highest percentage of the variance in resource demands, W_2 a lower percentage, and so forth.
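The following sketch ties the dispersion and weighting ideas together: it computes the sample variance and COV for one workload parameter and then derives principal-component weights for a small parameter matrix with numpy. All of the numbers are invented, and the eigendecomposition of the correlation matrix is just one common way to obtain the weights.

# Sketch: variance, COV, and principal-component weights for made-up workload
# data. Requires numpy; the data values are assumptions for illustration.
import numpy as np

# One workload parameter, e.g. resource demands of measured components.
x = np.array([12.0, 15.0, 9.0, 22.0, 14.0, 30.0])
variance = x.var(ddof=1)          # s^2 = (1/(n-1)) * sum((x_i - mean)^2)
cov = x.std(ddof=1) / x.mean()    # COV = s / x'
print(f"variance={variance:.2f}  COV={cov:.2f}")

# Several parameters per component (rows = components, columns = parameters).
X = np.array([[12.0, 300.0,  4.0],
              [15.0, 340.0,  5.0],
              [ 9.0, 250.0,  3.0],
              [22.0, 500.0,  9.0],
              [14.0, 310.0,  5.0],
              [30.0, 700.0, 12.0]])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardize parameters
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
w = eigvecs[:, np.argmax(eigvals)]    # weights w_i of the first component
W = Z @ w                             # principal factor W for each component
print("weights:", np.round(w, 3))
print("W values:", np.round(W, 3))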
If the system under study is to be used for a specific application, such as airline reservation, online banking, or stock market trading, then representative application programs from these domains, or a representative subset of their functions, should be used during the performance evaluation study. Usually, such benchmark programs are described in terms of the functions to be performed, and they exercise all resources in the system, such as the CPU, memory, and peripherals.
4.2. Benchmark
A benchmark of a system refers to a set of published data about it. Benchmarks are primarily used to compare the performance of different systems.
Benchmark is the term often used to mean a workload or kernel. Benchmarks are usually run by vendors or third parties for typical configurations and workloads. This process should be done with care, as it may leave room for misinterpretation and misuse of the measures, so it is essential to perform this task accurately. A benchmark program is used as a standard reference for comparing performance results obtained using different hardware or different software tools.
Benchmarks are meant to measure and predict the performance of systems under study and to reveal their design weaknesses and strong points. A benchmark suite is basically a set of benchmark programs together with a set of specific rules that govern the test conditions and methods, such as the testbed platform environment, input data, output results, and evaluation metrics. Benchmarks can be classified based on the application domain, such as commercial applications, scientific computing, network services, signal processing, and image processing. They can also be classified as micro-benchmarks or macro-benchmarks.
4.2.1. Micro-benchmarks
Micro-benchmarks are used to characterize the performance of a specific part of a computer system, such as the CPU speed, memory speed, I/O speed, or interconnection network. A small program can be used, for example, to test only the processor-memory interface. In general, microbenchmarks are used to characterize the maximum possible performance that could be obtained if the overall system's performance were limited by that single component. Examples of microbenchmarks include:
i. A library of subroutines that analyzes and solves linear equations and linear least-squares problems.
ii. LMBENCH: This suite measures system calls and data movement operations. It is portable and is used to measure operating system overheads and the capability of data transfer among the processor, cache, main memory, network, and disk on various Unix platforms.
iii. STREAM: This simple synthetic benchmark measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels.
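As a rough illustration of a micro-benchmark, the sketch below times a STREAM-like triad kernel (a = b + q*c) with numpy and reports an approximate memory bandwidth. The array size and repeat count are arbitrary, and Python/numpy overheads mean this is only a sketch of the idea, not a calibrated STREAM run.

# Rough, STREAM-like triad micro-benchmark sketch. Array size and repetition
# count are arbitrary assumptions; results are approximate at best.
import time
import numpy as np

N = 10_000_000                    # elements per array (~80 MB per array)
b = np.random.rand(N)
c = np.random.rand(N)
a = np.empty(N)
q = 3.0

best = float("inf")
for _ in range(5):                # keep the best of several repetitions
    start = time.perf_counter()
    np.multiply(c, q, out=a)      # a = q * c
    a += b                        # a = b + q * c
    best = min(best, time.perf_counter() - start)

bytes_moved = 3 * N * 8           # read b and c, write a (8-byte doubles)
print(f"triad: {bytes_moved / best / 1e9:.1f} GB/s (approximate)")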
4.2.2. Macro-benchmarks
Macro-benchmarks measure the performance of different systems when running a specific application on them. This is of great interest to the system buyer. Keep in mind that this class of benchmarks does not reveal why a system performs well or badly. This class of benchmarks is often used for parallel computer systems.
NPB suite: The Numerical Aerodynamic Simulation (NAS) Parallel Benchmark (NPB) was developed by the NAS program at the National Aeronautics and Space Administration (NASA). Its kernels include the embarrassingly parallel kernel (EP), the multigrid method (MG), the conjugate gradient method (CG), a fast Fourier-transform-based method for solving a three-dimensional (3D) partial differential equation (FT), and integer sorting (IS), as well as the simulated applications block lower triangular and block upper triangular solvers.
PARKBENCH: This suite is named after the PARallel Kernels and BENCHmarks committee. The current benchmarks are for distributed-memory multicomputers, coded in Fortran 77 plus Parallel Virtual Machine (PVM) or Message Passing Interface (MPI) for message passing.
STAP: The Space-Time Adaptive Processing (STAP) benchmark suite is basically a set of real-time radar signal processing programs originally developed at MIT Lincoln Laboratory. Its performance results are reported in FLOPS.
TPC: This was developed by the Transaction Processing Performance Council. TPC has released five benchmarks: TPC-A, TPC-B, TPC-C, TPC-D, and TPC-E. The first two of these are simple, now-obsolete transaction processing benchmarks.
SPEC: This suite was developed by the Standard Performance Evaluation Corporation (SPEC), a nonprofit corporation made up of major vendors. It is becoming the most popular benchmark suite worldwide. SPEC started with benchmarks that measure CPU performance, but it now has suites that measure client-server systems, commercial applications, I/O subsystems, and so on. Among the suites are SPEC95, SPEChpc96, SPECweb96, SFS, SDM, GPC, SPEC HPC2002, and SPECviewperf 7.1. SPEC periodically publishes performance results for a wide range of systems measured with these suites.
4.2.3. The Program Kernel
Examples of an instruction mix include the Gibson mix, which was originally developed by Jack C. Gibson for the IBM 704 system. The program kernel is used to characterize the main portion of a specific type of application program. A kernel benchmark is usually a small program that has been extracted from a large application program. Because the kernel is small, often only a dozen or so lines of code, it is easy to port to many different systems. Evaluating the performance of different systems by running such a small kernel can provide insight into the relative performance of these systems. However, because kernels do not exercise the memory hierarchy, which is a major bottleneck in most systems, they are of limited value for making a conclusive overall comparison or prediction of system performance. Examples of kernels include Puzzle, Tree Searching, and Ackermann's Function.
4.2.4. Application Benchmark
Application benchmark programs are often used when the computer system under evaluation is meant to be used for a specific application, such as airline reservation or scientific computing. These benchmarks are usually described in terms of the functions to be performed and make use of almost all resources of the system. Application benchmarks are real, complete programs that produce useful results. Collections of such programs are often assembled with an emphasis on one application area. To reduce the time needed to run the entire set of programs, they usually use artificially small input data sets, which may limit their ability to model memory behavior and I/O requirements accurately; they are nevertheless considered effective in giving good results. Examples of such benchmark programs include the Debit-Credit benchmark, which is used to evaluate transaction processing systems.