CSC 301 Lecture VII
Performance Measurement and Analysis
Chapter 11 Objectives
11.1 Introduction
11.2 The Basic Computer Performance Equation
• The basic computer performance equation has
been useful in our discussions of RISC versus
CISC:
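In its standard form, the basic computer performance equation expresses CPU time as the product of three factors:

```latex
\frac{\text{seconds}}{\text{program}} =
\frac{\text{instructions}}{\text{program}} \times
\frac{\text{average cycles}}{\text{instruction}} \times
\frac{\text{seconds}}{\text{cycle}}
```

RISC designs aim to lower the middle factor (cycles per instruction), while CISC designs aim to lower the first (instructions per program), which is why this equation frames the RISC-versus-CISC discussion.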
• We have also learned that CPU efficiency is not the
sole factor in overall system performance. Memory
and I/O performance are also important.
• Amdahl’s Law tells us that the system performance
gain realized from the speedup of one component
depends not only on the speedup of the component
itself, but also on the fraction of work done by the
component:
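In its usual statement, Amdahl's Law gives the overall speedup S attainable when a fraction f of the work is sped up by a factor k:

```latex
S = \frac{1}{(1 - f) + \dfrac{f}{k}}
```

Note that as k grows without bound, S approaches 1/(1 - f): the fraction of work that is not improved places a hard ceiling on the overall gain.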
• In short, using Amdahl’s Law we know that we need
to make the common case fast.
• So if our system is CPU bound, we want to make
the CPU faster.
• A memory bound system calls for improvements in
memory management.
• The performance of an I/O bound system will
improve with an upgrade to the I/O system.
Of course, fixing a performance problem in one part of the
system can expose a weakness in another part of the system!
11.3 Mathematical Preliminaries
11.4 Benchmarking
11.5 CPU Performance Optimization
• CPU optimization includes many of the topics that
have been covered in preceding chapters.
– These include pipelining, parallel execution units,
and integrated floating-point units.
• We have not yet explored two important CPU
optimization topics: branch optimization and user
code optimization.
• Both of these can affect performance in dramatic
ways.
• We know that pipelines offer significant execution
speedup when the pipeline is kept full.
• Conditional branch instructions are a type of pipeline
hazard that can result in flushing the pipeline.
– Other hazards include resource conflicts, data
dependencies, and memory access delays.
• Delayed branching offers one way of dealing with
branch hazards.
• With delayed branching, one or more instructions
following a conditional branch are sent down the
pipeline regardless of the outcome of the branch.
• The responsibility for setting up delayed branching
most often rests with the compiler.
• It can choose the instruction to place in the delay slot
in a number of ways.
• The first choice is a useful instruction that executes
regardless of whether the branch occurs.
• Other possibilities include instructions that execute if
the branch occurs, but do no harm if the branch does
not occur.
• Delayed branching has the advantage of low
hardware cost.
• Branch prediction is another approach to minimizing
branch penalties.
• Branch prediction tries to avoid pipeline stalls by
guessing the next instruction in the instruction
stream.
– This is called speculative execution.
• Branch prediction techniques vary according to the
type of branching. If/then/else, loop control, and
subroutine branching all have different execution
profiles.
There are various ways in which a prediction can be
made:
• Fixed predictions do not change over time.
• True predictions result in the branch being always
taken or never taken.
• Dynamic prediction uses historical information
about the branch and its outcomes.
• Static prediction does not use any history.
• When fixed prediction assumes that a branch is not
taken, the normal sequential path of the program is
taken.
• However, processing is done in parallel in case the
branch occurs.
• If the prediction is correct, the preprocessing
information is deleted.
• If the prediction is incorrect, the speculative
processing is deleted and the preprocessing
information is used to continue on the correct path.
• When fixed prediction assumes that a branch is
always taken, state information is saved before the
speculative processing begins.
• If the prediction is correct, the saved information is
deleted.
• If the prediction is incorrect, the speculative
processing is deleted and the saved information is
restored, allowing execution to continue on the
correct path.
• Dynamic prediction employs a high-speed branch
prediction buffer to combine an instruction with its
history.
• The buffer is indexed by the lower portion of the
address of the branch instruction; each entry
contains extra bits indicating whether the branch
was recently taken.
– One-bit dynamic prediction uses a single bit to indicate
whether the last occurrence of the branch was taken.
– Two-bit branch prediction retains the history of the
previous two occurrences of the branch along with a
probability of the branch being taken.
• The earliest branch prediction implementations
used static branch prediction.
• Most newer processors (including the Pentium,
PowerPC, UltraSparc, and Motorola 68060) use
two-bit dynamic branch prediction.
• Some superscalar architectures include branch
prediction as a user option.
• Many systems implement branch prediction in
specialized circuits for maximum throughput.
• The best hardware and compilers will never
equal the abilities of a human being who has
mastered the science of effective algorithm and
coding design.
• People can see an algorithm in the context of the
machine it will run on.
– For example, a good programmer will access a stored
column-major array in column-major order.
• We end this section by offering some tips to help
you achieve optimal program performance.
• Operation counting can enhance program
performance.
• With this method, you count the number of each
instruction type executed in a loop, then determine
the number of machine cycles for each instruction.
• The idea is to provide the best mix of instruction
types for a particular architecture.
• Nested loops provide a number of interesting
optimization opportunities.
• Loop unrolling is the process of expanding a loop
so that each new iteration contains several of the
original operations, thus performing more
computations per loop iteration. For example:
for (i = 1; i <= 30; i++)
    a[i] = a[i] + b[i] * c;
becomes
for (i = 1; i <= 30; i += 3)
{ a[i]   = a[i]   + b[i]   * c;
  a[i+1] = a[i+1] + b[i+1] * c;
  a[i+2] = a[i+2] + b[i+2] * c; }
• Loop fusion combines loops that use the same data
elements, possibly improving cache performance.
For example:
for (i = 0; i < N; i++)
    C[i] = A[i] + B[i];
for (i = 0; i < N; i++)
    D[i] = E[i] + C[i];
becomes
for (i = 0; i < N; i++)
{ C[i] = A[i] + B[i];
  D[i] = E[i] + C[i]; }
• Loop fission splits large loops into smaller ones to
reduce data dependencies and resource conflicts.
• A loop fission technique known as loop peeling
removes iterations at the beginning and end of a
loop, eliminating the boundary tests from the loop
body. For example:
for (i = 1; i < N+1; i++)
{ if (i == 1)
    A[i] = 0;
  else if (i == N)
    A[i] = N;
  else
    A[i] = A[i] + 8; }
becomes
A[1] = 0;
for (i = 2; i < N; i++)
    A[i] = A[i] + 8;
A[N] = N;
• The text lists a number of rules of thumb for
getting the most out of program performance.
• Optimization efforts pay the biggest dividends
when they are applied to code segments that
are executed the most frequently.
• In short, try to make the common cases fast.
11.6 Disk Performance
Chapter 11 Conclusion
End of Chapter 11