IT401 Computer Organization and Architecture: Prasun Ghosal

Uploaded by Aveek Chatterjee

IT401 Computer Organization and Architecture
Prasun Ghosal
Department of Information Technology
Bengal Engineering and Science University, Shibpur

Outline
How to measure, report and summarize performance?
What are the major factors that determine the
performance of a computer?
Execution time is the only adequate measure of
performance
Benchmarks: what they are and how they are used to evaluate performance

Why study Performance?
Hardware performance is often key to the effectiveness of an
entire system of Hardware and Software
The goal is not just to assess performance but to understand what affects the performance of a machine
To improve the performance of software, understand how hardware affects system performance:
How well a program uses instructions of the machine?
How well underlying HW implements instructions?
How well memory and I/O systems perform?

How to define performance?
Airplanes example
Passenger capacity
Cruising range (miles)
Cruising speed (m.p.h)
Passenger throughput (passengers * m.p.h)
Which airplane has the best performance? It depends on the metric:
Speed: highest cruising speed
Longest range
Largest capacity
Highest throughput
Run a program on two different workstations: which is faster?
User: response time (execution time)
Computer center manager: throughput
(how many tasks were performed during a time interval)
Relationship between response time and throughput

Performance
Use response time or execution time: to maximize performance, minimize the execution time for some task.
$$\text{Performance}(X) = \frac{1}{\text{Execution Time}(X)}$$
What does it mean that Performance(X) is greater than Performance(Y)?
$$\text{Performance}(X) > \text{Performance}(Y) \iff \frac{1}{\text{Execution Time}(X)} > \frac{1}{\text{Execution Time}(Y)} \iff \text{Execution Time}(Y) > \text{Execution Time}(X)$$
X is n times faster than Y:
$$\frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution Time}(Y)}{\text{Execution Time}(X)} = n$$

Performance Example
Machine A runs a program in 10 seconds and machine B runs the same program in 15 seconds. How much faster is A than B?
$$\frac{\text{Performance}(A)}{\text{Performance}(B)} = \frac{\text{Execution Time}(B)}{\text{Execution Time}(A)} = \frac{15}{10} = 1.5$$
A is 1.5 times faster than B
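Because Performance = 1/Execution Time, "how much faster" is just a ratio of execution times. A minimal Python sketch of that calculation, using the A/B numbers from the example (the function name `relative_performance` is a hypothetical choice for illustration):

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that machine X is n times faster than machine Y.

    Since Performance = 1 / Execution Time,
    Performance(X) / Performance(Y) = Execution Time(Y) / Execution Time(X).
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Values from the example: A takes 10 s, B takes 15 s for the same program.
n = relative_performance(10.0, 15.0)
print(f"A is {n} times faster than B")  # A is 1.5 times faster than B
```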

Measuring Performance (1/4)
Time is the measure of computer performance (sec per program)
Response time or elapsed time
Total time to complete a task, including everything (disk access, memory access, operating system overhead, etc.)
CPU execution time (CPU time)
Time the CPU spends computing for this task; it does not include time spent waiting for I/O or running other programs (some computers are timeshared)
CPU execution time can be divided into
User CPU time: CPU time spent in the program
System CPU time: CPU time spent in operating system performing tasks
on behalf of the program

Measuring Performance (2/4)
Example of user CPU time and System CPU time
Output of the Unix time command:
90.7u 12.9s 2:39 65%
User CPU time = 90.7 sec
System CPU time = 12.9 sec
Elapsed time = 2:39 (2 minutes and 39 sec) = 159 sec
Percentage of elapsed time that is CPU time = (90.7 + 12.9)/159 = 65%
Then 100 - 65 = 35% of elapsed time was spent doing something else (waiting for I/O, running other programs, etc.)
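The arithmetic on this slide can be automated. The Python sketch below assumes exactly the four-field layout shown above (user seconds with a `u` suffix, system seconds with an `s` suffix, elapsed time as `m:ss`, then a percentage); other shells and `time` implementations print different formats, and `parse_time_summary` is a hypothetical helper name.

```python
def parse_time_summary(line: str) -> dict[str, float]:
    """Parse a four-field summary such as '90.7u 12.9s 2:39 65%'.

    Assumes exactly the layout shown above: user seconds (suffix 'u'),
    system seconds (suffix 's'), elapsed time as m:ss, and a percentage.
    """
    user_field, sys_field, elapsed_field, _percent = line.split()
    user = float(user_field.rstrip("u"))
    system = float(sys_field.rstrip("s"))
    minutes, seconds = elapsed_field.split(":")
    elapsed = int(minutes) * 60 + float(seconds)
    cpu_fraction = (user + system) / elapsed  # share of elapsed time spent on the CPU
    return {
        "user": user,
        "system": system,
        "elapsed": elapsed,
        "cpu_fraction": cpu_fraction,
        "other_fraction": 1.0 - cpu_fraction,  # I/O, other programs, etc.
    }


stats = parse_time_summary("90.7u 12.9s 2:39 65%")
print(f"elapsed = {stats['elapsed']:.0f} s, CPU share = {stats['cpu_fraction']:.0%}")
# elapsed = 159 s, CPU share = 65%
```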

Measuring Performance (3/4)
Express CPU execution time in terms of another metric that relates to how fast the HW can perform basic functions
Computers are governed by a clock that runs at a constant rate and determines when events happen in HW
The length of a clock period is the clock cycle time (measured in nanoseconds, $10^{-9}$ sec, or picoseconds, $10^{-12}$ sec)
Clock rate = 1/(clock cycle time) (measured in megahertz, MHz = $10^{6}$ Hz, or gigahertz, GHz = $10^{9}$ Hz)
1 Hertz is 1 cycle/sec
CPU execution time = CPU clock cycles for a program * clock cycle time
CPU execution time = CPU clock cycles for a program / clock rate
How to improve CPU execution time?
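The two equivalent forms of the equation suggest the answer: reduce the number of clock cycles a program needs, or raise the clock rate (shorten the cycle time). A small Python sketch under a hypothetical workload of 10 billion cycles on a 2 GHz clock:

```python
def cpu_time_from_rate(cycles: float, clock_rate_hz: float) -> float:
    """CPU execution time (s) = CPU clock cycles / clock rate."""
    return cycles / clock_rate_hz


def cpu_time_from_cycle_time(cycles: float, cycle_time_s: float) -> float:
    """CPU execution time (s) = CPU clock cycles * clock cycle time."""
    return cycles * cycle_time_s


# Hypothetical program needing 10 billion clock cycles.
cycles = 10e9
print(cpu_time_from_rate(cycles, 2e9))           # 2 GHz clock: 5.0 s
print(cpu_time_from_cycle_time(cycles, 0.5e-9))  # same clock expressed as a 0.5 ns cycle
print(cpu_time_from_rate(cycles / 2, 2e9))       # halve the cycle count: 2.5 s
print(cpu_time_from_rate(cycles, 4e9))           # double the clock rate: 2.5 s
```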

Measuring Performance (4/4)
Relating to Software
Express CPU clock cycles in terms of program instructions
CPU clock cycles = Instructions for a program * Average clock cycles per instruction
Clock cycles per instruction (the average number of cycles each instruction takes to execute) is abbreviated as CPI
CPI can be used to compare two implementations of the same instruction set architecture (since the instruction count for a program remains the same)

Measuring Performance (1/5)
CPU clock cycles = Instructions for a program * CPI
CPU time = CPU clock cycles * clock cycle time
CPU time = Instruction count * CPI * clock cycle time
CPU time = Instruction count * CPI / clock rate
$$\text{Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
Basic Performance Components
CPU execution time
Instruction count
CPI
Clock cycle time
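The three components combine multiplicatively, so no single one decides performance on its own. The sketch below compares two hypothetical implementations of the same ISA running the same program (so the instruction count is fixed); the CPI and clock-rate values are invented for illustration.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time (s) = Instruction count * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical: one program, two implementations of the same ISA,
# so the instruction count is identical and only CPI and clock rate differ.
instructions = 1e9
t_impl1 = cpu_time(instructions, cpi=2.0, clock_rate_hz=1e9)    # 1 GHz, CPI 2.0
t_impl2 = cpu_time(instructions, cpi=1.2, clock_rate_hz=500e6)  # 500 MHz, CPI 1.2
print(f"implementation 1: {t_impl1:.2f} s, implementation 2: {t_impl2:.2f} s")
print(f"implementation 1 is {t_impl2 / t_impl1:.2f} times faster")
```

Implementation 2 has the lower CPI but still loses, because its clock is half as fast.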

Measuring Performance (2/5)
How to determine values of performance components
CPU execution time: measured directly
Clock cycle time: published as part of the documentation for a machine
Instruction count:
Software tools that profile execution, or a simulator of the architecture
Hardware counters, if available, to measure the number of instructions executed
CPI: varies by application, as well as among implementations of the same instruction set; obtained through detailed simulation or by combining HW counters and simulation
CPI can be calculated if the different types of instructions and their individual clock cycle counts are known

Measuring Performance (3/5)
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times C_i\right)$$
$C_i$: number of instructions of class i executed
$\text{CPI}_i$: average number of cycles per instruction for that instruction class
n: number of instruction classes
Overall program CPI depends on
Number of cycles for each instruction type
Frequency of each instruction type in the program
execution
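The summation can be evaluated directly once the per-class counts and CPIs are known; dividing the resulting cycle count by the total instruction count gives the overall (frequency-weighted) CPI. A Python sketch with a hypothetical three-class instruction mix:

```python
def cpu_clock_cycles(mix: dict[str, tuple[float, float]]) -> float:
    """Sum of CPI_i * C_i over instruction classes.

    `mix` maps a class name to (C_i, CPI_i): the number of instructions of
    that class executed and the average cycles per instruction for the class.
    """
    return sum(count * cpi for count, cpi in mix.values())


# Hypothetical instruction mix with three classes.
mix = {"A": (2e6, 1.0), "B": (1e6, 2.0), "C": (2e6, 3.0)}
cycles = cpu_clock_cycles(mix)                               # 2e6 + 2e6 + 6e6 = 10e6
instruction_count = sum(count for count, _ in mix.values())  # 5e6
overall_cpi = cycles / instruction_count                     # frequency-weighted CPI
print(f"cycles = {cycles:.0f}, overall CPI = {overall_cpi:.2f}")
```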

Measuring Performance (4/5)
CPU clock cycles = Instructions for a program * CPI
$$\text{Performance} = \frac{1}{\text{Execution Time}}$$
CPU time = CPU clock cycles * clock cycle time
CPU time = CPU clock cycles for a program / clock rate
CPU time = Instruction count * CPI * clock cycle time
CPU time = Instruction count * CPI / clock rate
$$\text{CPU clock cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times C_i\right)$$

Benchmarks (1/3)
Concept of Workload
Informally, the set of programs that the user runs day in and day out
Benchmarks
Programs specifically chosen to measure performance
Form a workload that the user hopes will predict the performance of the actual
workload
The best benchmarks are real programs
Use of benchmarks whose performance depends on small code segments
encourages optimizations in either the architecture or compiler
A problem: Compilers with special-purpose optimizations targeted at specific
benchmarks. Will such optimizations produce good or correct code with a real
application?

Benchmarks (2/3)
COPYRIGHT 1998 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED
Matrix 300 in the SPEC suite in 1989
SPEC is the System Performance Evaluation Cooperative (now the Standard Performance Evaluation Corporation)
For matrix 300, the enhanced compiler improves performance by a factor of more than 9!
The improvement on the other benchmarks is much smaller.
SPEC benchmark web site: http://www.specbench.org

Benchmarks (3/3)
Why are real programs not used to measure performance?
Small benchmarks are easier to compile and simulate
Compilers might not be available for a new machine
Numerous published performance results are available for small benchmarks
Benchmarks are OK for the initial design phase, but a working computer system should be evaluated with real programs
Writing Performance reports
Reproducibility
Include everything needed to duplicate the experiment

Comparing and Summarizing Performance (1/4)
Benchmarks have been selected
We have agreed to use response time or throughput
How to summarize performance of a group of benchmarks?
Program   M/C A   M/C B
P1        1       10
P2        1000    100
Total     1001    110
A is 10 times faster than B for P1
B is 10 times faster than A for P2
What is the relative performance of A and B?
Use total execution time:
$$\frac{\text{Performance}(B)}{\text{Performance}(A)} = \frac{\text{Execution Time}(A)}{\text{Execution Time}(B)} = \frac{1001}{110} = 9.1$$

Comparing and Summarizing Performance (2/4)
B is 9.1 times faster than A for P1 and P2 together
This gives a single summary figure that is directly proportional to total execution time
If the workload consists of running P1 and P2 an equal number of times, this statement predicts the relative execution times of the workload on each machine
The average of execution times that is directly proportional to total execution time is the arithmetic mean (AM):
$$\text{AM} = \frac{1}{n} \sum_{i=1}^{n} \text{Time}(i)$$
Time(i): execution time for the i-th program
n: total number of programs in the workload
A smaller mean means a smaller average execution time and thus improved performance
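Since the AM is the total execution time divided by n, comparing means gives the same ratio as comparing totals. A short Python check using the P1/P2 table from the previous slide (it relies only on the standard-library `statistics.fmean`):

```python
from statistics import fmean

# Execution times of P1 and P2 from the table on the previous slide.
times = {"A": [1, 1000], "B": [10, 100]}

totals = {machine: sum(t) for machine, t in times.items()}   # A: 1001, B: 110
means = {machine: fmean(t) for machine, t in times.items()}  # A: 500.5, B: 55.0

# The AM is the total divided by n, so both give the same ratio.
ratio_total = totals["A"] / totals["B"]
ratio_mean = means["A"] / means["B"]
print(f"B is {ratio_mean:.1f} times faster than A")  # B is 9.1 times faster than A
```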

Comparing and Summarizing Performance (3/4)
The arithmetic mean is proportional to total execution time only if the programs in the workload are each run an equal number of times. What if that is not the case?
Assign a weighting factor w(i) to each program to indicate the relative frequency of that program in the workload
Weighted arithmetic mean:
$$\text{Weighted AM} = \sum_{i=1}^{n} w(i) \times \text{Time}(i)$$
The AM is the special case of the weighted AM in which all weights are equal ($w(i) = 1/n$)

Comparing and Summarizing Performance (4/4)
Program   M/C A   M/C B   M/C C
P1        1       10      20
P2        1000    100     20
The table shows the run times of P1 and P2 on three machines, A, B, and C
The workload consists of P1 and P2
P1 is run 10 times as often as P2
Find which machine is fastest for this workload, and by how much?
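The exercise is a direct application of the weighted arithmetic mean. In this sketch the run frequencies (10 for P1, 1 for P2) are normalised so the weights sum to 1; that normalisation is an assumption of the sketch, not stated on the slide, but it only rescales all three means and does not change which machine wins.

```python
def weighted_am(times: list[float], weights: list[float]) -> float:
    """Weighted arithmetic mean: sum of w(i) * Time(i).

    The raw weights are relative run frequencies; they are normalised here
    so that they sum to 1 (with equal weights this reduces to the plain AM).
    """
    total_weight = sum(weights)
    return sum(w / total_weight * t for w, t in zip(weights, times))


runtimes = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}  # P1, P2 from the table
weights = [10, 1]  # P1 is run 10 times as often as P2

wam = {machine: weighted_am(t, weights) for machine, t in runtimes.items()}
fastest = min(wam, key=wam.get)
for machine, value in sorted(wam.items(), key=lambda kv: kv[1]):
    print(f"{machine}: weighted AM = {value:.2f}, "
          f"{value / wam[fastest]:.2f} x the time of {fastest}")
```

With these weights, B comes out fastest; C takes about 1.1 times as long and A about 5.05 times as long.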

SPEC95 Benchmarks
CPU benchmark
Created by a set of computer companies in 1989
SPEC95 (8 integer and 10 floating point programs). Figure 2.6
SPEC95 web site (http://www.specbench.org/osg/cpu95/news/cpu95descr.html)
SPEC ratio for xxx.benchmark = xxx.benchmark reference time / xxx.benchmark run time
Normalized measure: higher results indicate faster performance
Reference machine is a Sun SPARCstation 10/40
The SPECint95 or SPECfp95 summary measurement is obtained by taking the geometric mean of the SPEC ratios:
$$\text{Geometric mean} = \sqrt[n]{\prod_{i=1}^{n} \text{SPECratio}_i}$$
where
$$\prod_{i=1}^{n} a_i = a_1 \times a_2 \times \cdots \times a_n$$
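A minimal Python sketch of the two definitions on this slide: a SPEC ratio per benchmark and the n-th root of their product. The reference and run times are hypothetical, chosen so the ratios come out to 2, 4 and 8.

```python
import math


def spec_ratio(reference_time: float, run_time: float) -> float:
    """SPEC ratio = reference time / run time (higher is faster)."""
    return reference_time / run_time


def geometric_mean(values: list[float]) -> float:
    """n-th root of the product a_1 * a_2 * ... * a_n."""
    return math.prod(values) ** (1 / len(values))


# Hypothetical reference and measured run times (seconds) for three benchmarks.
reference = [20.0, 40.0, 80.0]
measured = [10.0, 10.0, 10.0]
ratios = [spec_ratio(r, m) for r, m in zip(reference, measured)]
print(ratios, f"{geometric_mean(ratios):.3f}")  # [2.0, 4.0, 8.0] 4.000
```

For long lists of large ratios, `math.prod` can overflow; averaging logarithms and exponentiating the result is a common alternative that gives the same mean.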

SPEC95 Benchmark results for Pentium and Pentium Pro
At the same clock rate, the Pentium Pro is 1.4 to 1.5 times faster
When the clock rate is increased by a certain factor, processor performance increases by a smaller factor
Raising the Pentium clock rate from 100 to 200 MHz improves SPECint95 performance by only 1.7 (why?)

SPEC95 Benchmark results for Pentium and Pentium Pro
At the same clock rate, the Pentium Pro is 1.7 to 1.8 times faster
Raising the clock rate from 100 to 200 MHz improves SPECfp95 by only 1.4 (why?)
The bottleneck is the memory system: as processor speed increases, memory does not keep up, an effect that is more evident on the floating-point benchmarks because of their size.

Performance Summary Example (1/2)
Program   M/C A   M/C B
P1        1482    139
P2        2266    254
P3        6206    690
Which machine is faster according to total execution time? And by how much?
Total Execution Time (A) = 1482 + 2266 + 6206 = 9954
Total Execution Time (B) = 139 + 254 + 690 = 1083
Machine B is faster by 9954/1083 = 9.19 times

Performance Summary Example (2/2)
Program   M/C A   M/C B
P1        1482    139
P2        2266    254
P3        6206    690
Which machine is faster by the geometric mean measure?
Remember how SPEC reported performance?
Normalize in reference to one machine
Choose A as the reference machine
Obtain execution time ratios (ET Ratio)
ET Ratio(P1) = ET(A)/ET(B) = 1482/139 = 10.66
ET Ratio(P2) = 2266/254 = 8.92
ET Ratio(P3) = 6206/690 = 8.99
$$\text{Geometric Mean} = \left(\text{Ratio}(P1) \times \text{Ratio}(P2) \times \text{Ratio}(P3)\right)^{1/3} = 9.49$$
Machine B is 9.49 times faster than A according to the geometric mean measure
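Computing both summaries side by side from the three-program table shows that they do not agree: the total-time ratio and the geometric mean of the per-program ratios give different "times faster" figures for the same data. A short Python sketch:

```python
import math

# Execution times from the table above (P1, P2, P3).
time_a = [1482, 2266, 6206]
time_b = [139, 254, 690]

# Total execution time summary.
total_ratio = sum(time_a) / sum(time_b)  # 9954 / 1083

# Geometric mean of execution-time ratios, with A as the reference machine.
ratios = [a / b for a, b in zip(time_a, time_b)]
gm = math.prod(ratios) ** (1 / len(ratios))

print(f"total-time ratio: {total_ratio:.2f}")        # 9.19
print("ET ratios:", [round(r, 2) for r in ratios])  # [10.66, 8.92, 8.99]
print(f"geometric mean: {gm:.2f}")                   # 9.49
```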

Amdahl's Law (1/3)
Pitfall
Expecting the improvement of one aspect of a machine to increase performance
by an amount proportional to the size of the improvement
A program runs in 100 sec on a machine
Multiply operations are responsible for 80 sec of this time
How much do we need to improve the speed of multiplication if the program is to run 5 times faster?
$$\text{Execution time after improvement} = \frac{\text{Execution time affected by improvement}}{\text{Amount of improvement}} + \text{Execution time unaffected}$$
Execution time after improvement = 80/n + (100 - 80), and a 5 times speedup requires this to equal 100/5 = 20
20 = 80/n + 20, so 80/n = 0: no n can achieve the requested improvement
Make the common case fast
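Evaluating the formula for increasingly large multiply speedups n shows why the 20-second target is unreachable: the unaffected 20 s is a floor that the improved part can only approach. A Python sketch using the 80 s / 20 s split from the example (`time_after_improvement` is a hypothetical helper name):

```python
def time_after_improvement(affected: float, unaffected: float, n: float) -> float:
    """Amdahl's Law: affected time / amount of improvement + unaffected time."""
    return affected / n + unaffected


# Example above: 80 s of multiply time, 20 s of everything else.
target = 100 / 5  # a 5x speedup needs the program to finish in 20 s
for n in (2, 10, 100, 1_000_000):
    t = time_after_improvement(80, 20, n)
    print(f"multiply {n}x faster -> {t:.5f} s (target {target} s met: {t <= target})")
```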

Amdahl's Law (2/3)
Another form of Amdahl's Law (to yield speedup)
Speedup = Performance after improvement / Performance before improvement
Speedup = Execution time before / Execution time after improvement
Assume new hardware is added to a machine
f = fraction of all operations (by execution time) that use the new hardware
s = speedup of those operations using the new hardware
Execution time with new hardware is $T_{new}$
Execution time without new hardware is $T_{old}$
$$T_{new} = \frac{f \times T_{old}}{s} + (1 - f) \times T_{old}$$
Overall speedup:
$$S = \frac{T_{old}}{T_{new}} = \frac{s}{s - f(s - 1)}$$
Speedup for selected values of f and s:
f       s = 2      s = 5      s = 10
0.1     1.052632   1.086957   1.098901
0.5     1.333333   1.666667   1.818182
0.9     1.818182   3.571429   5.263158
0.99    1.980198   4.807692   9.174312
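The speedup table can be regenerated from the formula. In this Python sketch, f is treated as the fraction of the original execution time spent in the enhanced operations, which is how it enters the $T_{new}$ equation; `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup S = s / (s - f*(s - 1)).

    f: fraction of the original execution time that uses the new hardware
    s: speedup of that fraction
    Algebraically equal to 1 / ((1 - f) + f / s).
    """
    return s / (s - f * (s - 1))


# Regenerate the f-by-s speedup table.
for f in (0.1, 0.5, 0.9, 0.99):
    row = "  ".join(f"{amdahl_speedup(f, s):.6f}" for s in (2, 5, 10))
    print(f"f = {f:<5} {row}")
```

Even with f = 0.9, a tenfold speedup of the enhanced part yields only about 5.26 overall, which is the "make the common case fast" lesson in numbers.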

Amdahl's Law (3/3)
Example of memory versus processor speedup
A = B op C
Assume a memory access takes 4 cycles and a typical operation takes 2 cycles
Which of the following achieves the best increase in performance?
Increase memory speed by 50%
Double operation speed
First, count the memory accesses needed:
1 to get the instruction from memory
2 to get B and C from memory
1 to store the result (A) back in memory
That is a total of 4 memory accesses
Memory access time = 4 (accesses) * 4 (cycles/access) = 16 cycles
Operation time = 1 (operation) * 2 (cycles/operation) = 2 cycles
Total number of cycles = 16 + 2 = 18
Option 1: increase memory speed by 50%
s1 = 1.5 (how?)
f1 = memory access time / total time = 16/18 = 0.889
S1 = 1.42
Option 2: double operation speed
s2 = 2
f2 = operation time / total time = 2/18 = 0.111
S2 = 1.059
Increasing memory speed by 50% gives the larger overall speedup (1.42 versus 1.059), because memory accesses account for most of the cycles
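The two options can be checked with Amdahl's Law in its equivalent form $1/((1 - f) + f/s)$. A Python sketch using the cycle counts from the example:

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the time is sped up by a factor s."""
    return 1 / ((1 - f) + f / s)


memory_cycles = 4 * 4      # 4 accesses x 4 cycles/access
operation_cycles = 1 * 2   # 1 operation x 2 cycles/operation
total_cycles = memory_cycles + operation_cycles  # 18

# Option 1: memory 50% faster (s = 1.5) on the memory fraction of the time.
s1 = amdahl_speedup(memory_cycles / total_cycles, 1.5)
# Option 2: operations twice as fast (s = 2) on the operation fraction.
s2 = amdahl_speedup(operation_cycles / total_cycles, 2)
print(f"S1 = {s1:.3f}, S2 = {s2:.3f}")  # S1 = 1.421, S2 = 1.059
```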

MIPS as a Performance Metric
MIPS stands for million instructions per second
$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}}$$
Instruction execution rate (instructions/sec)
Faster machines have a higher MIPS rating
Problems with MIPS
Does not take into account the capabilities of instructions (cannot compare computers with different ISAs)
Varies between programs on the same computer (a machine cannot have a single MIPS rating for all programs)
Can vary inversely with performance
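The last point is easiest to see with numbers. This Python sketch uses an entirely hypothetical setup: a 500 MHz machine whose instruction classes A, B and C take 1, 2 and 3 cycles, running two different code sequences for the same task. The sequence that executes more cheap instructions earns the higher MIPS rating while taking longer.

```python
def mips(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time * 10**6)."""
    return instruction_count / (execution_time_s * 1e6)


def execution_time(mix: dict[str, tuple[float, float]], clock_rate_hz: float) -> float:
    """Time = (sum of count_i * CPI_i) / clock rate; mix maps class -> (count, CPI)."""
    return sum(count * cpi for count, cpi in mix.values()) / clock_rate_hz


# Hypothetical: two code sequences for the same task on a 500 MHz machine,
# with instruction classes A, B, C taking 1, 2 and 3 cycles respectively.
clock = 500e6
seq1 = {"A": (5e9, 1), "B": (1e9, 2), "C": (1e9, 3)}
seq2 = {"A": (10e9, 1), "B": (1e9, 2), "C": (1e9, 3)}

for name, seq in (("sequence 1", seq1), ("sequence 2", seq2)):
    count = sum(c for c, _ in seq.values())
    t = execution_time(seq, clock)
    print(f"{name}: {t:.0f} s, {mips(count, t):.0f} MIPS")
# Sequence 2 has the higher MIPS rating but takes longer to run.
```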
