
Multi-Core Computer Architecture

Lecture 1D
Performance Evaluation Methods

John Jose
Associate Professor
Department of Computer Science & Engineering
Indian Institute of Technology Guwahati
Measuring Performance
❖ When can we say one computer / architecture / design is
better than another?
❖ Desktop PC – execution time of a program
❖ Server – transactions per unit time

❖ When can we say X is n times faster than Y?

❖ Execution time_Y / Execution time_X = n
❖ Throughput_X / Throughput_Y = n
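The two ratio definitions above can be sketched as a tiny helper; the timings used here are hypothetical, chosen only to illustrate the arithmetic:

```python
def speedup(time_y, time_x):
    # "X is n times faster than Y" means ExecutionTime(Y) / ExecutionTime(X) = n.
    # For throughput the ratio flips: Throughput(X) / Throughput(Y) = n.
    return time_y / time_x

# Hypothetical timings: program takes 12 s on Y and 4 s on X,
# so X is 3 times faster than Y.
n = speedup(12.0, 4.0)
print(n)  # 3.0
```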
Measuring Performance
❖ Typical performance metrics:
❖Response time
❖Throughput
❖CPU time
❖Wall clock time
❖Speedup
❖ Benchmarks
❖Toy programs (e.g. sorting, matrix multiply)
❖Synthetic benchmarks (e.g. Dhrystone)
❖Benchmark suites (e.g. SPEC06, SPLASH)
Benchmark Suite
Benchmark Based Evaluation
SPEC Ratio
❖ SPECratio = execution time on the reference machine / execution time on the machine under test (higher is better).
❖ Reference machine for SPEC CPU2006: a Sun Ultra Enterprise 2 workstation with a 296-MHz UltraSPARC II processor.
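A minimal sketch of how per-benchmark SPECratios are computed and summarized with a geometric mean, as SPEC reporting does; the benchmark times below are hypothetical, not real SPEC results:

```python
from math import prod

def spec_ratio(ref_time, measured_time):
    # SPECratio: reference machine's execution time divided by the
    # measured execution time on the machine under test.
    return ref_time / measured_time

def geometric_mean(ratios):
    # SPEC summarizes the per-benchmark ratios with the geometric mean,
    # which is insensitive to which machine is used as the reference.
    return prod(ratios) ** (1.0 / len(ratios))

# Hypothetical times in seconds: (reference machine, machine under test).
ratios = [spec_ratio(9650, 500), spec_ratio(10000, 1000)]
print(geometric_mean(ratios))
```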
Amdahl's Law
❖ Amdahl’s Law defines the speedup that can be gained by improving
some portion of a computer.
❖ The performance improvement to be gained from using some faster
mode of execution is limited by the fraction of the time the faster mode
can be used.
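The law as stated above can be written as a one-line function, a sketch of the standard formula Speedup = 1 / ((1 − F) + F/S):

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    # Overall speedup = 1 / ((1 - F) + F / S), where F is the fraction of
    # execution time the enhancement applies to and S is its speedup factor.
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

# Even an effectively infinite speedup (S very large) on half the workload
# can at most double overall performance: 1 / (1 - 0.5) = 2.
print(amdahl_speedup(0.5, 1e12))
```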
Amdahl's Law – Illustration
Example: Suppose we want to enhance the floating-point
operations of a processor by introducing a new, advanced FPU.
Let the new FPU be 10 times faster on floating-point computations
than the original processor. Assuming 40% of a program's execution
time is spent in floating-point operations, what is the overall
speedup gained by incorporating the enhancement?

Solution:
Fraction enhanced = 0.4
Speedup enhanced = 10
Overall speedup = 1 / ((1 − 0.4) + 0.4/10) = 1 / 0.64 ≈ 1.56
Amdahl's Law for Parallel Processing
[Figure: a workload of 500 units of work, of which 200 units are parallelizable and 300 units are serial.]

Processors   Time   Speedup
1            500    1x
2            400    1.25x
4            350    ~1.43x
∞            300    ~1.67x

With infinitely many processors the parallel portion takes ≈ 0 time, yet the serial 300 units cap the overall speedup at 500/300 ≈ 1.67x. How much speedup can you achieve?
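The trend in the figure can be checked numerically; this is a sketch of Amdahl's Law with N processors, using the workload above (parallel fraction F = 200/500 = 0.4):

```python
def parallel_speedup(parallel_fraction, n_processors):
    # Amdahl's Law for N processors: the serial part is unchanged,
    # the parallel part's time is divided by N.
    return 1.0 / ((1.0 - parallel_fraction)
                  + parallel_fraction / n_processors)

# Workload from the figure: 500 units of work, 200 parallelizable (F = 0.4).
for n in (1, 2, 4):
    print(n, round(parallel_speedup(0.4, n), 2))
# As N grows without bound the speedup approaches 1 / (1 - 0.4) ≈ 1.67.
```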
Design Example
A common transformation required in graphics processors is square
root. Implementations of floating-point (FP) square root vary
significantly in performance, especially among processors designed
for graphics. Suppose FP square root (FPSQR) is responsible for
20% of the execution time of a critical graphics benchmark.
One proposal is to enhance the FPSQR hardware and
speed up this operation by a factor of 10. The other alternative is
just to try to make all FP instructions in the graphics processor run
faster by a factor of 1.6; FP instructions are responsible for half of
the execution time for the application. Compare these two design
alternatives using Amdahl's Law.
Design Example
Case A: FPSQR hardware optimization
Speedup_A = 1 / ((1 − 0.2) + 0.2/10) = 1 / 0.82 ≈ 1.22
Case B: FP instructions optimization
Speedup_B = 1 / ((1 − 0.5) + 0.5/1.6) = 1 / 0.8125 ≈ 1.23
Improving all FP instructions by the modest factor of 1.6 is slightly better, because it covers a larger fraction of the execution time.
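The two alternatives can be compared with a quick numerical check, a sketch applying Amdahl's Law to each proposal:

```python
def amdahl(fraction, factor):
    # Overall speedup when a `fraction` of execution time is sped up by `factor`.
    return 1.0 / ((1.0 - fraction) + fraction / factor)

speedup_fpsqr = amdahl(0.2, 10)   # Case A: FPSQR is 20% of time, 10x faster
speedup_fp    = amdahl(0.5, 1.6)  # Case B: all FP is 50% of time, 1.6x faster
print(round(speedup_fpsqr, 3), round(speedup_fp, 3))
```

Case B wins narrowly: the smaller per-operation speedup applies to a larger fraction of the execution time.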
Principles of Computer Design
❖ All processors are driven by a clock.
❖ Expressed as a clock rate in GHz or a clock period in ns.
❖ CPU Time = CPU clock cycles × clock cycle time
❖ CPU clock cycles = instruction count (IC) × average cycles per instruction (CPI), hence CPU Time = IC × CPI × clock cycle time
Principles of Computer Design
❖ Clock cycle time – determined by hardware technology
❖ CPI – determined by organization and ISA
❖ IC – determined by ISA and compiler technology
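The three factors above combine multiplicatively; a minimal sketch of the CPU time equation, using made-up example numbers:

```python
def cpu_time_ns(instruction_count, avg_cpi, clock_cycle_time_ns):
    # CPU Time = IC x CPI x clock cycle time.
    # Each factor is influenced by a different layer of the stack:
    # IC by ISA/compiler, CPI by organization/ISA, cycle time by technology.
    return instruction_count * avg_cpi * clock_cycle_time_ns

# Hypothetical program: 10000 instructions, average CPI 2.4, 1 GHz clock (1 ns cycle).
print(cpu_time_ns(10_000, 2.4, 1.0))
```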
Principles of Computer Design
❖ Different instruction types have different CPIs; the average CPI is the instruction-mix-weighted sum of the per-class CPIs.
Example: Basic Performance Analysis
Consider two programs A and B that solve a given problem. A is scheduled
to run on a processor P1 operating at 1 GHz, and B is scheduled to run on
processor P2 running at 1.4 GHz. A has 10000 instructions in total, of
which 20% are branch instructions, 40% are load-store instructions, and
the rest are ALU instructions. B is composed of 25% branch instructions.
The number of load-store instructions in B is twice the count of ALU
instructions. The total instruction count of B is 12000. In both P1 and P2,
branch instructions have an average CPI of 5 and ALU instructions have an
average CPI of 1.5. The two architectures differ in the CPI of load-store
instructions: 2 and 3 for P1 and P2, respectively. Which mapping (A on P1
or B on P2) solves the problem faster, and by how much?
Example: Basic Performance Analysis
A on P1 (1 GHz → CCT = 1 ns): IC = 10000
Fractions BR : L/S : ALU = 20 : 40 : 40; CPIs = 5 : 2 : 1.5
(a) CPI_A,P1 = 0.2×5 + 0.4×2 + 0.4×1.5 = 2.4
ExT = 2.4 × 10000 × 1 ns = 24000 ns

B on P2 (1.4 GHz → CCT ≈ 0.714 ns): IC = 12000
Fractions BR : L/S : ALU = 25 : 50 : 25; CPIs = 5 : 3 : 1.5
(b) CPI_B,P2 = 0.25×5 + 0.5×3 + 0.25×1.5 = 3.125
ExT = 3.125 × 12000 × 0.714 ns ≈ 26775 ns

Hence A on P1 is faster, by a factor of 26775 / 24000 ≈ 1.12.
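The worked solution above can be reproduced with a short script, a sketch of the weighted-CPI calculation:

```python
def weighted_cpi(fractions, cpis):
    # Average CPI = sum over instruction classes of (fraction x per-class CPI).
    return sum(f * c for f, c in zip(fractions, cpis))

# Instruction classes in order: branch, load-store, ALU.
cpi_a = weighted_cpi([0.20, 0.40, 0.40], [5, 2, 1.5])   # A on P1
cpi_b = weighted_cpi([0.25, 0.50, 0.25], [5, 3, 1.5])   # B on P2

ext_a = cpi_a * 10_000 * 1.0        # ns: 1 GHz -> 1 ns cycle time
ext_b = cpi_b * 12_000 * (1 / 1.4)  # ns: 1.4 GHz -> ~0.714 ns cycle time
print(cpi_a, cpi_b, ext_a, ext_b)
```

Using the exact cycle time 1/1.4 ns gives ≈ 26786 ns for B on P2, close to the 26775 ns obtained with the rounded 0.714 ns; either way A on P1 wins.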
Example: Amdahl's Law
A company is releasing two new versions (beta and gamma) of its basic
processor architecture, named alpha. Beta and gamma are designed by
modifying three major components (X, Y, and Z) of alpha. For a program A,
the fractions of the total execution time spent on these three components
X, Y, and Z are 40%, 30%, and 20%, respectively. Beta speeds up X and Z
by 2 times but slows down Y by 1.3 times, whereas gamma speeds up X, Y,
and Z by 1.2, 1.3, and 1.4 times, respectively.
(a) How much faster is gamma than alpha for running A?
(b) Is beta or gamma faster for running A? Find the speedup factor.
f_X = 0.4, f_Y = 0.3, f_Z = 0.2
Beta: N_X = 2, N_Y = 1/1.3, N_Z = 2
Gamma: N_X = 1.2, N_Y = 1.3, N_Z = 1.4
Example: Amdahl's Law

(a) Speedup_gamma = 1 / (0.4/1.2 + 0.3/1.3 + 0.2/1.4 + 0.1) ≈ 1.239, so gamma is 1.239 times faster than alpha.

(b) Speedup_beta = 1 / (0.4/2 + 0.3×1.3 + 0.2/2 + 0.1) = 1 / 0.79 ≈ 1.266, so beta is faster than gamma by 1.266 / 1.239 ≈ 1.022 times.
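The multi-component form of Amdahl's Law used here generalizes the single-enhancement formula; a sketch that reproduces both answers (a slowdown is expressed as a speedup factor below 1):

```python
def multi_amdahl(fractions, factors):
    # Speedup = 1 / (sum_i f_i / n_i + (1 - sum_i f_i)),
    # where f_i is the time fraction of component i and n_i its speedup.
    remaining = 1.0 - sum(fractions)
    return 1.0 / (remaining + sum(f / n for f, n in zip(fractions, factors)))

fracs = [0.4, 0.3, 0.2]                        # f_X, f_Y, f_Z
beta  = multi_amdahl(fracs, [2, 1 / 1.3, 2])   # Y slowed down 1.3x
gamma = multi_amdahl(fracs, [1.2, 1.3, 1.4])
print(round(beta, 3), round(gamma, 3), round(beta / gamma, 3))
```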
[email protected]
http://www.iitg.ac.in/johnjose/
