
Multi-Core Computer Architecture

Lecture 1D
Performance Evaluation Methods

John Jose
Associate Professor
Department of Computer Science & Engineering
Indian Institute of Technology Guwahati
Measuring Performance
❖ When can we say one computer / architecture / design is
better than another?
❖ Desktop PC – execution time of a program
❖ Server – transactions per unit time

❖ When can we say X is n times faster than Y?

❖ Execution time_Y / Execution time_X = n
❖ Throughput_X / Throughput_Y = n
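The two ratio definitions above can be sketched as a tiny helper; the timings used here are hypothetical, chosen only to illustrate the arithmetic:

```python
def speedup(time_y, time_x):
    # "X is n times faster than Y" means ExecutionTime(Y) / ExecutionTime(X) = n.
    # For throughput the ratio flips: Throughput(X) / Throughput(Y) = n.
    return time_y / time_x

# Hypothetical timings: program takes 12 s on Y and 4 s on X,
# so X is 3 times faster than Y.
n = speedup(12.0, 4.0)
print(n)  # 3.0
```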
Measuring Performance
❖ Typical performance metrics:
❖Response time
❖Throughput
❖CPU time
❖Wall clock time
❖Speedup
❖ Benchmarks
❖Toy programs (e.g. sorting, matrix multiply)
❖Synthetic benchmarks (e.g. Dhrystone)
❖Benchmark suites (e.g. SPEC06, SPLASH)
Benchmark Suite
Benchmark Based Evaluation
SPEC Ratio
❖ SPECratio = execution time on the reference machine / execution time on the machine under test (higher is better).
❖ Reference machine for SPEC CPU2006: a Sun Ultra Enterprise 2 workstation with a 296-MHz UltraSPARC II processor.
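A minimal sketch of how per-benchmark SPECratios are computed and summarized with a geometric mean, as SPEC reporting does; the benchmark times below are hypothetical, not real SPEC results:

```python
from math import prod

def spec_ratio(ref_time, measured_time):
    # SPECratio: reference machine's execution time divided by the
    # measured execution time on the machine under test.
    return ref_time / measured_time

def geometric_mean(ratios):
    # SPEC summarizes the per-benchmark ratios with the geometric mean,
    # which is insensitive to which machine is used as the reference.
    return prod(ratios) ** (1.0 / len(ratios))

# Hypothetical times in seconds: (reference machine, machine under test).
ratios = [spec_ratio(9650, 500), spec_ratio(10000, 1000)]
print(geometric_mean(ratios))
```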
Amdahl's Law
❖ Amdahl’s Law defines the speedup that can be gained by improving
some portion of a computer.
❖ The performance improvement to be gained from using some faster
mode of execution is limited by the fraction of the time the faster mode
can be used.
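The law as stated above can be written as a one-line function, a sketch of the standard formula Speedup = 1 / ((1 − F) + F/S):

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    # Overall speedup = 1 / ((1 - F) + F / S), where F is the fraction of
    # execution time the enhancement applies to and S is its speedup factor.
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

# Even an effectively infinite speedup (S very large) on half the workload
# can at most double overall performance: 1 / (1 - 0.5) = 2.
print(amdahl_speedup(0.5, 1e12))
```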
Amdahl's Law – Illustration
Example: Suppose we want to enhance the floating-point
operations of a processor by introducing a new, advanced FPU.
Let the new FPU be 10 times faster on floating-point computations
than the original processor. Assuming 40% of a program's execution
time is spent in floating-point operations, what is the overall
speedup gained by incorporating the enhancement?

Solution:
Fraction enhanced = 0.4
Speedup enhanced = 10
Overall speedup = 1 / ((1 − 0.4) + 0.4/10) = 1 / 0.64 ≈ 1.56
Amdahl's Law for Parallel Processing
[Figure: a workload of 500 units of work, of which 200 units are parallelizable and 300 units are serial.]

Processors   Time   Speedup
1            500    1x
2            400    1.25x
4            350    ~1.43x
∞            300    ~1.67x

With infinitely many processors the parallel portion takes ≈ 0 time, yet the serial 300 units cap the overall speedup at 500/300 ≈ 1.67x. How much speedup can you achieve?
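The trend in the figure can be checked numerically; this is a sketch of Amdahl's Law with N processors, using the workload above (parallel fraction F = 200/500 = 0.4):

```python
def parallel_speedup(parallel_fraction, n_processors):
    # Amdahl's Law for N processors: the serial part is unchanged,
    # the parallel part's time is divided by N.
    return 1.0 / ((1.0 - parallel_fraction)
                  + parallel_fraction / n_processors)

# Workload from the figure: 500 units of work, 200 parallelizable (F = 0.4).
for n in (1, 2, 4):
    print(n, round(parallel_speedup(0.4, n), 2))
# As N grows without bound the speedup approaches 1 / (1 - 0.4) ≈ 1.67.
```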
Design Example
A common transformation required in graphics processors is square
root. Implementations of floating-point (FP) square root vary
significantly in performance, especially among processors designed
for graphics. Suppose FP square root (FPSQR) is responsible for
20% of the execution time of a critical graphics benchmark.
One proposal is to enhance the FPSQR hardware and
speed up this operation by a factor of 10. The other alternative is
just to try to make all FP instructions in the graphics processor run
faster by a factor of 1.6; FP instructions are responsible for half of
the execution time for the application. Compare these two design
alternatives using Amdahl's Law.
Design Example
Case A: FPSQR hardware optimization
Speedup_A = 1 / ((1 − 0.2) + 0.2/10) = 1 / 0.82 ≈ 1.22
Case B: FP instructions optimization
Speedup_B = 1 / ((1 − 0.5) + 0.5/1.6) = 1 / 0.8125 ≈ 1.23
Improving all FP instructions by the modest factor of 1.6 is slightly better, because it covers a larger fraction of the execution time.
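The two alternatives can be compared with a quick numerical check, a sketch applying Amdahl's Law to each proposal:

```python
def amdahl(fraction, factor):
    # Overall speedup when a `fraction` of execution time is sped up by `factor`.
    return 1.0 / ((1.0 - fraction) + fraction / factor)

speedup_fpsqr = amdahl(0.2, 10)   # Case A: FPSQR is 20% of time, 10x faster
speedup_fp    = amdahl(0.5, 1.6)  # Case B: all FP is 50% of time, 1.6x faster
print(round(speedup_fpsqr, 3), round(speedup_fp, 3))
```

Case B wins narrowly: the smaller per-operation speedup applies to a larger fraction of the execution time.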
Principles of Computer Design
❖ All processors are driven by a clock.
❖ Expressed as a clock rate in GHz or a clock period in ns.
❖ CPU Time = CPU clock cycles × clock cycle time
❖ CPU clock cycles = instruction count (IC) × average cycles per instruction (CPI), hence CPU Time = IC × CPI × clock cycle time
Principles of Computer Design
❖ Clock cycle time – determined by hardware technology
❖ CPI – determined by organization and ISA
❖ IC – determined by ISA and compiler technology
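The three factors above combine multiplicatively; a minimal sketch of the CPU time equation, using made-up example numbers:

```python
def cpu_time_ns(instruction_count, avg_cpi, clock_cycle_time_ns):
    # CPU Time = IC x CPI x clock cycle time.
    # Each factor is influenced by a different layer of the stack:
    # IC by ISA/compiler, CPI by organization/ISA, cycle time by technology.
    return instruction_count * avg_cpi * clock_cycle_time_ns

# Hypothetical program: 10000 instructions, average CPI 2.4, 1 GHz clock (1 ns cycle).
print(cpu_time_ns(10_000, 2.4, 1.0))
```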
Principles of Computer Design
❖ Different instruction types have different CPIs; the average CPI is the instruction-mix-weighted sum of the per-class CPIs.
Example: Basic Performance Analysis
Consider two programs A and B that solve a given problem. A is scheduled
to run on a processor P1 operating at 1 GHz, and B is scheduled to run on
processor P2 running at 1.4 GHz. A has 10000 instructions in total, of
which 20% are branch instructions, 40% are load-store instructions, and
the rest are ALU instructions. B is composed of 25% branch instructions.
The number of load-store instructions in B is twice the count of ALU
instructions. The total instruction count of B is 12000. In both P1 and P2,
branch instructions have an average CPI of 5 and ALU instructions have an
average CPI of 1.5. The two architectures differ in the CPI of load-store
instructions: 2 and 3 for P1 and P2, respectively. Which mapping (A on P1
or B on P2) solves the problem faster, and by how much?
Example: Basic Performance Analysis
A on P1 (1 GHz → CCT = 1 ns): IC = 10000
Fractions BR : L/S : ALU = 20 : 40 : 40; CPIs = 5 : 2 : 1.5
(a) CPI_A,P1 = 0.2×5 + 0.4×2 + 0.4×1.5 = 2.4
ExT = 2.4 × 10000 × 1 ns = 24000 ns

B on P2 (1.4 GHz → CCT ≈ 0.714 ns): IC = 12000
Fractions BR : L/S : ALU = 25 : 50 : 25; CPIs = 5 : 3 : 1.5
(b) CPI_B,P2 = 0.25×5 + 0.5×3 + 0.25×1.5 = 3.125
ExT = 3.125 × 12000 × 0.714 ns ≈ 26775 ns

Hence A on P1 is faster, by a factor of 26775 / 24000 ≈ 1.12.
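The worked solution above can be reproduced with a short script, a sketch of the weighted-CPI calculation:

```python
def weighted_cpi(fractions, cpis):
    # Average CPI = sum over instruction classes of (fraction x per-class CPI).
    return sum(f * c for f, c in zip(fractions, cpis))

# Instruction classes in order: branch, load-store, ALU.
cpi_a = weighted_cpi([0.20, 0.40, 0.40], [5, 2, 1.5])   # A on P1
cpi_b = weighted_cpi([0.25, 0.50, 0.25], [5, 3, 1.5])   # B on P2

ext_a = cpi_a * 10_000 * 1.0        # ns: 1 GHz -> 1 ns cycle time
ext_b = cpi_b * 12_000 * (1 / 1.4)  # ns: 1.4 GHz -> ~0.714 ns cycle time
print(cpi_a, cpi_b, ext_a, ext_b)
```

Using the exact cycle time 1/1.4 ns gives ≈ 26786 ns for B on P2, close to the 26775 ns obtained with the rounded 0.714 ns; either way A on P1 wins.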
Example: Amdahl's Law
A company is releasing two new versions (beta and gamma) of its basic
processor architecture, named alpha. Beta and gamma are designed by
modifying three major components (X, Y, and Z) of alpha. For a program A,
the fractions of the total execution time spent on these three components
X, Y, and Z are 40%, 30%, and 20%, respectively. Beta speeds up X and Z
by 2 times but slows down Y by 1.3 times, whereas gamma speeds up X, Y,
and Z by 1.2, 1.3, and 1.4 times, respectively.
(a) How much faster is gamma than alpha for running A?
(b) Is beta or gamma faster for running A? Find the speedup factor.
f_X = 0.4, f_Y = 0.3, f_Z = 0.2
Beta: N_X = 2, N_Y = 1/1.3, N_Z = 2
Gamma: N_X = 1.2, N_Y = 1.3, N_Z = 1.4
Example: Amdahl's Law

(a) Speedup_gamma = 1 / (0.4/1.2 + 0.3/1.3 + 0.2/1.4 + 0.1) ≈ 1.239, so gamma is 1.239 times faster than alpha.

(b) Speedup_beta = 1 / (0.4/2 + 0.3×1.3 + 0.2/2 + 0.1) = 1 / 0.79 ≈ 1.266, so beta is faster than gamma by 1.266 / 1.239 ≈ 1.022 times.
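The multi-component form of Amdahl's Law used here generalizes the single-enhancement formula; a sketch that reproduces both answers (a slowdown is expressed as a speedup factor below 1):

```python
def multi_amdahl(fractions, factors):
    # Speedup = 1 / (sum_i f_i / n_i + (1 - sum_i f_i)),
    # where f_i is the time fraction of component i and n_i its speedup.
    remaining = 1.0 - sum(fractions)
    return 1.0 / (remaining + sum(f / n for f, n in zip(fractions, factors)))

fracs = [0.4, 0.3, 0.2]                        # f_X, f_Y, f_Z
beta  = multi_amdahl(fracs, [2, 1 / 1.3, 2])   # Y slowed down 1.3x
gamma = multi_amdahl(fracs, [1.2, 1.3, 1.4])
print(round(beta, 3), round(gamma, 3), round(beta / gamma, 3))
```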
[email protected]
http://www.iitg.ac.in/johnjose/
