This Unit: Multicycle Datapath, Clock vs. CPI, CPU Performance Equation, Performance Metrics, Benchmarking

This document discusses performance metrics for CPUs. It introduces the concepts of multicycle datapaths, clock speed versus cycles per instruction (CPI), and the CPU performance equation. It explains that overall performance depends on the number of instructions, CPI, and clock cycle time. Different types of datapaths trade off these factors differently. The document also discusses benchmarking performance using standard workloads like SPEC, and how average performance metrics like mean and harmonic mean are calculated.

Uploaded by

avssrinivasav

CIS 371 (Roth/Martin): Performance & Multicycle

CIS 371
Computer Organization and Design
Unit 4: Performance & Multicycle
This Unit
Multicycle datapath
Clock vs CPI
CPU performance equation
Performance metrics
Benchmarking
[Figure: system stack with applications (App, App, App) on top of system software on top of hardware (CPU, Mem, I/O)]
Readings
P&H
Chapter 1.4 (for performance discussion)
240 → 371
CIS 240: build something that works
CIS 371: build something that works well
"Well" means high-performance, but also cheap, low-power, etc.
Mostly high-performance
So, what is the performance of this?
What is performance?
[Figure: single-cycle datapath with PC, +4 adder, Insn Mem, Register File (s1, s2, d), and Data Mem]
CPU Performance Equation
Multiple aspects to performance: helps to isolate them
Latency = seconds / program =
(insns / program) * (cycles / insn) * (seconds / cycle)
Insns / program: dynamic insn count = f(program, compiler, ISA)
Cycles / insn: CPI = f(program, compiler, ISA, micro-arch)
Seconds / cycle: clock period = f(micro-arch, technology)
For low latency (better performance) minimize all three
Difficult: often pull against one another
Example we have seen: RISC vs. CISC ISAs
RISC: low CPI/clock period, high insn count
CISC: low insn count, high CPI/clock period
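As a rough illustration of how the three factors multiply, the sketch below evaluates the equation for two made-up machines. The function name and all numbers are hypothetical; they are chosen only to show that a lower insn count can still lose to a lower CPI and clock period.

```python
def latency_seconds(insn_count, cpi, clock_period_s):
    """seconds/program = (insns/program) * (cycles/insn) * (seconds/cycle)."""
    return insn_count * cpi * clock_period_s


# Hypothetical machines (numbers are illustrative, not measurements).
# RISC-like: more insns, but low CPI and short clock period.
risc_like = latency_seconds(insn_count=1_200_000, cpi=1.2, clock_period_s=1.0e-9)
# CISC-like: fewer insns, but higher CPI and longer clock period.
cisc_like = latency_seconds(insn_count=800_000, cpi=2.5, clock_period_s=1.5e-9)

print(f"RISC-like: {risc_like * 1e3:.2f} ms")
print(f"CISC-like: {cisc_like * 1e3:.2f} ms")
```

With different made-up numbers the comparison could go the other way, which is why none of the three terms can be judged in isolation.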
MIPS (performance metric, not the ISA)
Factor out dynamic insn count, CPU equation becomes
Latency: seconds / insn = (cycles / insn) * (seconds / cycle)
Throughput: insns / second = (insns / cycle) * (cycles / second)
MIPS (millions of insns per second): insns / second * $10^{-6}$
Cycles / second: clock frequency (in MHz)
Example: CPI = 2, clock = 500 MHz (2ns period), what is MIPS?
0.5 * 500 MHz * $10^{-6}$ = 250 MIPS
MIPS is okay for micro-architects
Typically work in one ISA/one compiler, treat insn count as fixed
Not okay for general public
Processors with different ISAs/compilers have incomparable MIPS
Wait, it gets worse
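The MIPS arithmetic above can be checked with a couple of lines. This is a minimal sketch; the function name `mips` is an assumption, and it takes the clock in MHz so the factor of 10^-6 is already folded in.

```python
def mips(cpi, clock_mhz):
    """Millions of insns per second = IPC * clock frequency in MHz."""
    ipc = 1.0 / cpi
    return ipc * clock_mhz


# The example from this slide: CPI = 2, clock = 500 MHz.
slide_example = mips(cpi=2, clock_mhz=500)
print(slide_example)
```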
MHz (megahertz) and GHz (gigahertz)
1 hertz = 1 cycle per second
1 GHz is 1 cycle per nanosecond, 1 GHz = 1000 MHz
Micro-architects often ignore instruction count
but general public (mostly) also ignores CPI
Equates clock frequency with performance!!
Which processor would you buy?
Processor A: CPI = 2, clock = 5 GHz
Processor B: CPI = 1, clock = 3 GHz
Probably A, but B is faster (assuming same ISA/compiler)
Classic example
800 MHz Pentium III faster than 1 GHz Pentium 4!
Same ISA and compiler!
Meta-point: danger of partial performance metrics!
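To see why B wins the comparison above, compare average time per insn rather than clock frequency. A small sketch, assuming the same ISA and compiler so that insn counts cancel; the helper name is hypothetical.

```python
def ns_per_insn(cpi, clock_ghz):
    """Average time per insn in ns: CPI * clock period, where period = 1 / GHz."""
    return cpi / clock_ghz


time_a = ns_per_insn(cpi=2, clock_ghz=5)  # Processor A
time_b = ns_per_insn(cpi=1, clock_ghz=3)  # Processor B

faster = "A" if time_a < time_b else "B"
speedup_b_over_a = time_a / time_b

print(f"A: {time_a:.3f} ns/insn, B: {time_b:.3f} ns/insn")
print(f"{faster} is faster, by a factor of {speedup_b_over_a:.2f}")
```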
System Performance
Clock frequency implies processor core clock frequency
Other system components have their own clocks (or not)
E.g., increasing processor clock doesn't accelerate memory
Example
Processor A: $\text{CPI}_\text{CORE}$ = 1, $\text{CPI}_\text{MEM}$ = 1, proc. clock = 500 MHz (2ns)
What is the speedup if we double processor clock frequency?
Base: CPI = 2 → IPC = 0.5 → MIPS = 250
New: CPI = 3 → IPC = 0.33 → MIPS = 333
Clock *= 2 → $\text{CPI}_\text{MEM}$ *= 2 (memory latency in ns is unchanged, so it costs twice as many of the shorter cycles)
Speedup = 333/250 = 1.33 << 2
What about an infinite clock frequency?
Only a 2X (factor of 2) speedup
Example of Amdahl's Law
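The example above can be reproduced by keeping memory time fixed in nanoseconds while the core time scales with the clock. This is a sketch under that assumption; the constant and function names are made up for illustration.

```python
import math

# Assumption: memory costs a fixed 2 ns per insn (CPI_MEM = 1 at the base
# 2 ns clock period) and does not speed up with the processor clock.
MEM_NS_PER_INSN = 2.0


def system_ns_per_insn(clock_mhz, cpi_core=1.0):
    """Core time scales with the clock period; memory time is fixed in ns."""
    period_ns = 1000.0 / clock_mhz
    return cpi_core * period_ns + MEM_NS_PER_INSN


base = system_ns_per_insn(500)         # 2 ns core + 2 ns memory
doubled = system_ns_per_insn(1000)     # 1 ns core + 2 ns memory
limit = system_ns_per_insn(math.inf)   # 0 ns core + 2 ns memory

speedup_doubled = base / doubled
speedup_limit = base / limit

print(f"2x clock speedup: {speedup_doubled:.2f}")
print(f"infinite clock speedup: {speedup_limit:.2f}")
```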
Amdahl's Law
Literally: total speedup limited by non-accelerated piece
Example: can optimize 50% of program A
Even magic optimization that makes this 50% disappear
only yields a 2X speedup
For consumers: buy a balanced system
For microarchitects: build a balanced system
MCCF (Make Common Case Fast)
Focus your efforts on things that matter
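Amdahl's Law is usually written as speedup = 1 / ((1 - f) + f / s), where f is the fraction of time that is accelerated and s is how much faster that fraction gets. The sketch below applies the formula to the 50% example; the function and variable names are assumptions.

```python
import math


def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# "Magic" optimization: the optimized 50% takes no time at all.
magic = amdahl_speedup(fraction=0.5, factor=math.inf)
# A more modest case: the same 50% runs twice as fast.
modest = amdahl_speedup(fraction=0.5, factor=2.0)

print(f"magic: {magic:.2f}x, modest: {modest:.2f}x")
```

The non-accelerated half sets the ceiling: no value of `factor` can push the result past 2X.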
Single-Cycle Datapath Performance
Goes against make common case fast (MCCF) principle
+ Low CPI: 1
- Long clock period: to accommodate slowest instruction
Especially if multiply/divide are included
[Figure: single-cycle datapath with PC, +4 adder, Insn Mem, Register File (s1, s2, d), sign extension (SX), << 2 shifter, and Data Mem (a, d)]
Multi-cycle Operations
Let's allow long-latency operations to take many cycles
Calculation assumptions:
Most instructions take 10 nanoseconds (ns)
But multiply instruction takes 40ns
Multiplies are 10% of all instructions
Single-cycle datapath: 40ns clock period, 1 CPI
40ns per instruction
1/40ns = 0.025 billion instructions per second = 25 MIPS
Multi-cycle datapath: 10ns clock period
Average CPI = (90% * 1) + (10% * 4) = 1.3
13ns per instruction
1/13ns = 0.077 billion instructions per second = 77 MIPS
Multi-cycle is about 3 times (or 200%) faster than single-cycle
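The calculation above is a weighted average of cycles per insn, multiplied by the clock period. A minimal sketch using the slide's assumptions; the helper names are hypothetical.

```python
def average_cpi(mix):
    """mix: iterable of (fraction of insns, cycles per insn) pairs."""
    return sum(fraction * cycles for fraction, cycles in mix)


def mips_from_ns(ns_per_insn):
    """1 insn per ns is 1000 MIPS."""
    return 1000.0 / ns_per_insn


# Single-cycle: 40 ns clock, CPI = 1.
single_ns = 40.0 * 1.0
# Multi-cycle: 10 ns clock, 90% of insns take 1 cycle, 10% (multiplies) take 4.
multi_ns = 10.0 * average_cpi([(0.9, 1), (0.1, 4)])

single_mips = mips_from_ns(single_ns)
multi_mips = mips_from_ns(multi_ns)
speedup = single_ns / multi_ns

print(f"single-cycle: {single_mips:.0f} MIPS")
print(f"multi-cycle: {multi_mips:.0f} MIPS, speedup {speedup:.2f}x")
```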
Fine-Grained Multi-Cycle Datapath
Multi-cycle datapath: attacks high clock period
Cut datapath into multiple stages (5 here), isolate using FFs
Finite state machine (FSM) control walks insns through
+ Insns can skip stages and exit early (memory ops vs alu ops)
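[Figure: multi-cycle datapath, the single-cycle datapath (PC, +4 adder, Insn Mem, Register File, SX, << 2, Data Mem) cut into stages by registers IR, A, B, O, and D, with FSM states s3, s4, s5 marked]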
Multi-Cycle Datapath Performance
Opposite performance split of single-cycle datapath
+ Short clock period
- High CPI
[Figure: multi-cycle datapath with PC, +4 adder, Insn Mem, Register File (s1, s2, d), SX, << 2, Data Mem (a, d), and stage registers IR, A, B, O, D]
Multicycle Performance
Assumptions
30% loads, 5ns
10% stores, 5ns
50% adds, 4ns
10% multiplies, 20ns
Single-cycle datapath: 20ns clock period, 1 CPI
20ns per instruction or 50 MIPS
Simple multi-cycle datapath: 5ns clock
CPI = (90% * 1) + (10% * 4) = 1.3
6.5ns per instruction or about 154 MIPS
Fine-grained multi-cycle datapath: 1ns clock
CPI = (30% * 5) + (10% * 5) + (50% * 4) + (10% * 20) = 1.5 + 0.5 + 2 + 2 = 6
6ns per instruction or about 167 MIPS
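All three designs above can be compared with one function if each insn is assumed to occupy ceil(latency / clock period) cycles. That rule is an assumption chosen because it reproduces the slide's CPI values; a real FSM might charge extra cycles for fetch and decode.

```python
import math

# (fraction of insns, latency in ns), taken from the assumptions above.
MIX = {
    "load": (0.30, 5),
    "store": (0.10, 5),
    "add": (0.50, 4),
    "multiply": (0.10, 20),
}


def average_cpi(clock_ns):
    """Assumes each insn occupies ceil(latency / clock period) cycles."""
    return sum(frac * math.ceil(lat / clock_ns) for frac, lat in MIX.values())


designs = [
    ("single-cycle", 20),
    ("simple multi-cycle", 5),
    ("fine-grained multi-cycle", 1),
]
for name, clock_ns in designs:
    cpi = average_cpi(clock_ns)
    ns_per_insn = cpi * clock_ns
    print(f"{name}: CPI = {cpi:.1f}, {ns_per_insn:.1f} ns/insn, "
          f"{1000 / ns_per_insn:.1f} MIPS")
```

The fine-grained design gains little over the simple one here because the add (50% of insns) already nearly fills a 5 ns cycle.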
Processor Performance and Workloads
Q: what does performance of a chip mean?
A: nothing, there must be some associated workload
Workload: set of tasks someone (you) cares about
Benchmarks: standard workloads
Used to compare performance across machines
Either are, or are highly representative of, actual programs people run
Micro-benchmarks: non-standard non-workloads
Tiny programs used to isolate certain aspects of performance
Not representative of complex behaviors of real applications
Examples: binary tree search, towers-of-hanoi, 8-queens, etc.
SPEC Benchmarks
SPEC: Standard Performance Evaluation Corporation
http://www.spec.org/
Consortium that collects, standardizes, and distributes benchmarks
Suites for CPU, Java, I/O, Web, Mail, OpenMP (multithreaded), etc.
Updated every few years: so companies don't target benchmarks
Post SPECmark results for different processors
1 number that represents performance for entire suite
CPU 2006: 29 CPU-intensive C/C++/Fortran programs
integer: bzip2, gcc, perl, hmmer (genomics), h264, etc.
floating-point: wrf (weather), povray, sphinx3 (speech), etc.
TPC: Transaction Processing Performance Council
Like SPEC, but for web/database server workloads
Much heavier on memory, I/O, network, than on CPU
Doesn't give you the source code, only a description
SPECmark
Reference machine: Sun SPARC 10
Latency SPECmark
For each benchmark
Take odd number of samples: on both machines
Choose median
Take latency ratio (Sun SPARC 10 / your machine)
Take geometric mean of ratios over all benchmarks
Throughput SPECmark
Run multiple benchmarks in parallel on multiple-processor system
Recent SPECmark latency leaders
SPECint: Intel 2.3 GHz Core2 Extreme (3119)
SPECfp: IBM 2.1 GHz Power5+ (4051)
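The latency SPECmark recipe above (median of an odd number of samples, ratio against the reference machine, geometric mean over benchmarks) can be sketched as follows. The benchmark names and timings are invented for illustration and are not SPEC data; the function name is also an assumption.

```python
import statistics


def specmark_style_score(ref_times, test_times):
    """Geometric mean of per-benchmark (reference median / test median) ratios."""
    ratios = []
    for name in ref_times:
        ref_median = statistics.median(ref_times[name])
        test_median = statistics.median(test_times[name])
        ratios.append(ref_median / test_median)
    return statistics.geometric_mean(ratios)


# Hypothetical latencies in seconds, three samples per benchmark per machine.
reference = {"bench_a": [100.0, 102.0, 98.0], "bench_b": [200.0, 210.0, 190.0]}
candidate = {"bench_a": [50.0, 49.0, 51.0], "bench_b": [50.0, 52.0, 48.0]}

score = specmark_style_score(reference, candidate)
print(f"score: {score:.2f}")
```

The median discards outlier runs, and the geometric mean keeps any single benchmark's ratio from dominating the summary number.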
Mean (Average) Performance Numbers
Arithmetic: $\frac{1}{N} \sum_{P=1}^{N} \text{Latency}(P)$
For units that are proportional to time (e.g., latency)
You can add latencies, but not throughputs
Latency(P1+P2,A) = Latency(P1,A) + Latency(P2,A)
Throughput(P1+P2,A) != Throughput(P1,A) + Throughput(P2,A)
1 mile @ 30 miles/hour + 1 mile @ 90 miles/hour
Average is not 60 miles/hour (it is 45 miles/hour: 2 miles in 1/30 + 1/90 hours)
Harmonic: $N / \sum_{P=1}^{N} \frac{1}{\text{Throughput}(P)}$
For units that are inversely proportional to time (e.g., throughput)
Geometric: $\sqrt[N]{\prod_{P=1}^{N} \text{Speedup}(P)}$
For unitless quantities (e.g., speedups)
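The three means can be written directly from their definitions. A short sketch; the function names are assumptions, and the driving example is the one from this slide.

```python
import math


def arithmetic_mean(latencies):
    """For quantities proportional to time, e.g. latency."""
    return sum(latencies) / len(latencies)


def harmonic_mean(throughputs):
    """For quantities inversely proportional to time, e.g. throughput."""
    return len(throughputs) / sum(1.0 / t for t in throughputs)


def geometric_mean(speedups):
    """For unitless ratios, e.g. speedups."""
    return math.prod(speedups) ** (1.0 / len(speedups))


# The driving example: 1 mile at 30 mph, then 1 mile at 90 mph.
average_speed = harmonic_mean([30.0, 90.0])
# Cross-check: total distance divided by total time.
distance_over_time = 2.0 / (1.0 / 30.0 + 1.0 / 90.0)

print(f"harmonic mean: {average_speed:.1f} mph")
print(f"distance / time: {distance_over_time:.1f} mph")
```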
How Can We Make Common Case Fast?
If we don't know what the common case (CC) is?
How is CPI actually measured?
Execution time: time (Unix): wall clock / CPU (user) + system
CPI = (CPU time * clock frequency) / dynamic insn count
How is dynamic insn count measured?
Hardware event counters
More useful is CPI breakdown ($\text{CPI}_\text{CPU}$, $\text{CPI}_\text{MEM}$, etc.)
So we know what performance problems are and what to fix
Hardware event counters:
+ Accurate
- Can't measure everything or evaluate modifications
Cycle-level micro-architecture simulation: e.g., SimpleScalar
+ Measure exactly what you want, evaluate potential fixes
- Burden of accuracy is on the simulator writer
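Since CPI is cycles per insn, and cycles are CPU time multiplied by clock frequency, a measured CPI can be computed as below. The measurement values are hypothetical; in practice the CPU time might come from `time` and the insn count from a hardware event counter.

```python
def measured_cpi(cpu_time_s, clock_hz, dynamic_insn_count):
    """CPI = cycles / insns, where cycles = CPU time * clock frequency."""
    cycles = cpu_time_s * clock_hz
    return cycles / dynamic_insn_count


# Hypothetical measurement: 2.0 s of CPU time at 3 GHz, 4 billion insns.
cpi = measured_cpi(cpu_time_s=2.0, clock_hz=3.0e9, dynamic_insn_count=4.0e9)
print(f"CPI = {cpi:.2f}")
```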
Summary
Multicycle datapath
Clock vs CPI
CPU performance equation
Performance metrics
Benchmarking
[Figure: system stack with applications (App, App, App) on top of system software on top of hardware (CPU, Mem, I/O)]