This Unit: Metrics

This document discusses performance metrics such as latency and throughput. Latency measures the time to complete a task, while throughput measures the number of tasks completed in a fixed time. Several benchmarks for measuring performance are introduced, including SPEC CPU2006 and various parallel benchmarks. Formulas for average performance numbers (mean latency, mean throughput, and speedup) are provided. Finally, a processor performance equation is presented that breaks program runtime down into instructions executed, cycles per instruction, and seconds per cycle.


CIS 501
Computer Architecture

Unit 2: Performance

Slides originally developed by Amir Roth with contributions by Milo Martin
at University of Pennsylvania with sources that included University of
Wisconsin slides by Mark Hill, Guri Sohi, Jim Smith, and David Wood.

This Unit

•  Metrics
•  Latency and throughput
•  Reporting performance
•  Benchmarking and averaging
•  CPU performance equation & performance trends

CIS 501 (Martin/Roth): Performance

Readings

•  H+P
•  Chapter 1: Section 1.8

Performance: Latency vs. Throughput

•  Latency (execution time): time to finish a fixed task
•  Throughput (bandwidth): number of tasks completed in a fixed time
•  Different: exploit parallelism for throughput, not latency (e.g., baking bread)
•  Often contradictory (latency vs. throughput)
•  Will see many examples of this
•  Choose the definition of performance that matches your goals
•  Scientific program: latency; web server: throughput

•  Example: move people 10 miles
•  Car: capacity = 5, speed = 60 miles/hour
•  Bus: capacity = 60, speed = 20 miles/hour
•  Latency: car = 10 min, bus = 30 min
•  Throughput: car = 15 PPH (counting the return trip), bus = 60 PPH


Comparing Performance

•  A is X times faster than B if
•  Latency(A) = Latency(B) / X
•  Throughput(A) = Throughput(B) * X
•  A is X% faster than B if
•  Latency(A) = Latency(B) / (1 + X/100)
•  Throughput(A) = Throughput(B) * (1 + X/100)

•  Car/bus example
•  Latency? Car is 3 times (and 200%) faster than the bus
•  Throughput? Bus is 4 times (and 300%) faster than the car

Processor Performance and Workloads

•  Q: what does latency(ChipA) or throughput(ChipA) mean?
•  A: nothing; there must be some associated workload
•  Workload: set of tasks someone (you) cares about

•  Benchmarks: standard workloads
•  Used to compare performance across machines
•  Either are, or are highly representative of, actual programs people run

•  Micro-benchmarks: non-standard non-workloads
•  Tiny programs used to isolate certain aspects of performance
•  Not representative of the complex behaviors of real applications
•  Examples: towers-of-hanoi, 8-queens, etc.
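The "X times faster" and "X% faster" definitions above can be checked directly with the car/bus numbers; this is a minimal sketch (variable names are illustrative, values come from the example):

```python
# Latency: car = 10 min, bus = 30 min; smaller is better, so ratio = bus/car
car_lat, bus_lat = 10, 30
x = bus_lat / car_lat
print(x, (x - 1) * 100)        # car is 3 times (200%) faster than the bus

# Throughput: car = 15 PPH, bus = 60 PPH; bigger is better, so ratio = bus/car
car_tput, bus_tput = 15, 60
x = bus_tput / car_tput
print(x, (x - 1) * 100)        # bus is 4 times (300%) faster than the car
```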

SPEC Benchmarks

•  SPEC (Standard Performance Evaluation Corporation)
•  http://www.spec.org/
•  Consortium that collects, standardizes, and distributes benchmarks
•  Posts SPECmark results for different processors
•  1 number that represents performance for the entire suite
•  Benchmark suites for CPU, Java, I/O, Web, Mail, etc.
•  Updated every few years: so companies don’t target benchmarks

•  SPEC CPU 2006
•  12 “integer”: bzip2, gcc, perl, hmmer (genomics), h264, etc.
•  17 “floating point”: wrf (weather), povray, sphynx3 (speech), etc.
•  Written in C/C++ and Fortran

Other Benchmarks

•  Parallel benchmarks
•  SPLASH2: Stanford Parallel Applications for Shared Memory
•  NAS: another parallel benchmark suite
•  SPECopenMP: parallelized versions of SPECfp 2000
•  SPECjbb: Java multithreaded database-like workload

•  Transaction Processing Council (TPC)
•  TPC-C: On-line transaction processing (OLTP)
•  TPC-H/R: Decision support systems (DSS)
•  TPC-W: E-commerce database backend workload
•  Have parallelism (intra-query and inter-query)
•  Heavy I/O and memory components


SPECmark 2006

•  Reference machine: Sun UltraSPARC II (@ 296 MHz)
•  Latency SPECmark
•  For each benchmark
•  Take an odd number of samples
•  Choose the median
•  Take the latency ratio (reference machine / your machine)
•  Take the “average” (geometric mean) of ratios over all benchmarks
•  Throughput SPECmark
•  Run multiple benchmarks in parallel on a multiple-processor system

•  Recent (latency) leaders
•  SPECint: Intel 3.3 GHz Xeon W5590 (34.2)
•  SPECfp: Intel 3.2 GHz Xeon W3570 (39.3)
•  (First time I’ve looked at this where the same chip was top of both)

Mean (Average) Performance Numbers

•  Arithmetic mean: (1/N) * Σ(P=1..N) Latency(P)
•  For units that are proportional to time (e.g., latency)
•  You can add latencies, but not throughputs
•  Latency(P1+P2, A) = Latency(P1, A) + Latency(P2, A)
•  Throughput(P1+P2, A) != Throughput(P1, A) + Throughput(P2, A)
•  1 mile @ 30 miles/hour + 1 mile @ 90 miles/hour
•  Average is not 60 miles/hour (it is 45, the harmonic mean)

•  Harmonic mean: N / Σ(P=1..N) (1/Throughput(P))
•  For units that are inversely proportional to time (e.g., throughput)

•  Geometric mean: (Π(P=1..N) Speedup(P))^(1/N)
•  For unitless quantities (e.g., speedup ratios)
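The three means map directly onto Python’s standard statistics module; a minimal sketch using the 30/90 miles-per-hour example (the latency and speedup sample values are made up for illustration):

```python
from statistics import fmean, harmonic_mean, geometric_mean

latencies = [2.0, 4.0, 6.0]     # proportional to time: arithmetic mean
throughputs = [30.0, 90.0]      # inversely proportional to time: harmonic mean
speedups = [2.0, 8.0]           # unitless ratios: geometric mean

print(fmean(latencies))            # 4.0
print(harmonic_mean(throughputs))  # 45, not 60: the 1 mile @ 30 + 1 mile @ 90 case
print(geometric_mean(speedups))    # ~4.0
```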

Processor Performance Equation


•  Multiple aspects to performance: helps to isolate them
•  Program runtime = “seconds per program” =
(instructions/program) * (cycles/instruction) * (seconds/cycle)
•  Instructions per program: “dynamic instruction count”
•  Runtime count of instructions executed by the program
•  Determined by program, compiler, instruction set architecture (ISA)
•  Cycles per instruction: “CPI” (typical range: 2 to 0.5)
•  On average, how many cycles does an instruction take to execute?
•  Determined by program, compiler, ISA, micro-architecture
•  Seconds per cycle: clock period, length of each cycle
•  Inverse metric: cycles per second (Hertz) or cycles per ns (GHz)
•  Determined by micro-architecture, technology parameters
•  For low latency (better performance) minimize all three
•  Difficult: often pull against one another
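The three-factor product above can be sketched as a one-liner; the numbers below are invented for illustration:

```python
def runtime_seconds(insn_count, cpi, clock_hz):
    """seconds/program = (insns/program) * (cycles/insn) * (seconds/cycle)."""
    return insn_count * cpi / clock_hz  # dividing by clock_hz applies seconds/cycle

# e.g., 1 billion dynamic instructions, CPI = 1.5, 2 GHz clock (made-up values)
print(runtime_seconds(1e9, 1.5, 2e9))   # 0.75 seconds
```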
Cycles per Instruction (CPI)

•  CPI: cycles per instruction, on average
•  IPC = 1/CPI
•  Used more frequently than CPI
•  Favored because “bigger is better”, but harder to compute with
•  Different instructions have different cycle costs
•  E.g., “add” typically takes 1 cycle, “divide” takes >10 cycles
•  Depends on relative instruction frequencies

•  CPI example
•  A program executes equal proportions of integer, floating point (FP), and memory ops
•  Cycles per instruction type: integer = 1, memory = 2, FP = 3
•  What is the CPI? (33% * 1) + (33% * 2) + (33% * 3) = 2
•  Caveat: this sort of calculation ignores many effects
•  Back-of-the-envelope arguments only

Another CPI Example

•  Assume a processor with these instruction frequencies and costs
•  Integer ALU: 50%, 1 cycle
•  Load: 20%, 5 cycles
•  Store: 10%, 1 cycle
•  Branch: 20%, 2 cycles
•  Which change would improve performance more?
•  A. “Branch prediction” to reduce branch cost to 1 cycle?
•  B. “Cache” to reduce load cost to 3 cycles?
•  Compute CPI
•  Base = 0.5*1 + 0.2*5 + 0.1*1 + 0.2*2 = 2
•  A = 0.5*1 + 0.2*5 + 0.1*1 + 0.2*1 = 1.8
•  B = 0.5*1 + 0.2*3 + 0.1*1 + 0.2*2 = 1.6 (winner)
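The weighted-sum CPI calculation from both examples can be sketched as follows; the frequencies and cycle costs are taken from the second example, and the helper name is illustrative:

```python
def cpi(mix):
    """mix: list of (frequency, cycles) pairs whose frequencies sum to 1.0."""
    return sum(freq * cycles for freq, cycles in mix)

base  = cpi([(0.5, 1), (0.2, 5), (0.1, 1), (0.2, 2)])  # 2.0
opt_a = cpi([(0.5, 1), (0.2, 5), (0.1, 1), (0.2, 1)])  # 1.8 (branch cost -> 1)
opt_b = cpi([(0.5, 1), (0.2, 3), (0.1, 1), (0.2, 2)])  # 1.6 (load cost -> 3)
print(base, opt_a, opt_b)   # B wins: lowest CPI
```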

MIPS (performance metric, not the ISA)

•  (Micro-)architects often ignore dynamic instruction count
•  Typically work in one ISA/one compiler → treat it as fixed
•  CPU performance equation becomes
•  Latency: seconds / insn = (cycles / insn) * (seconds / cycle)
•  Throughput: insn / second = (insn / cycle) * (cycles / second)

•  MIPS (millions of instructions per second)
•  Cycles / second: clock frequency (in MHz)
•  Example: CPI = 2, clock = 500 MHz → 0.5 IPC * 500 MHz = 250 MIPS

•  Pitfall: may vary inversely with actual performance
–  Compiler removes insns, program gets faster, MIPS goes down
–  Work per instruction varies (e.g., multiply vs. add, FP vs. integer)

MHz (MegaHertz) and GHz (GigaHertz)

•  1 Hertz = 1 cycle per second
•  1 GHz is 1 cycle per nanosecond; 1 GHz = 1000 MHz
•  (Micro-)architects often ignore dynamic instruction count…
•  … but the general public (mostly) also ignores CPI
•  Equates clock frequency with performance!
•  Which processor would you buy?
•  Processor A: CPI = 2, clock = 5 GHz
•  Processor B: CPI = 1, clock = 3 GHz
•  Probably A, but B is faster (assuming same ISA/compiler)
•  Classic example
•  800 MHz Pentium III faster than 1 GHz Pentium 4!
•  More recent example: Core i7 faster clock-per-clock than Core 2
•  Same ISA and compiler!
•  Meta-point: danger of partial performance metrics!
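The MIPS metric and the A-vs-B comparison above reduce to a single division; a minimal sketch (the helper name is illustrative):

```python
def mips(clock_mhz, cpi):
    """MIPS = (insns/cycle) * (cycles/second) / 10^6 = clock in MHz / CPI."""
    return clock_mhz / cpi

print(mips(500, 2))    # 250.0, the example above
print(mips(5000, 2))   # Processor A at 5 GHz: 2500 MIPS
print(mips(3000, 1))   # Processor B at 3 GHz: 3000 MIPS, faster despite lower clock
```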
Latency vs. Throughput Revisited

•  Latency and throughput: two views of performance…
•  … at the program level
•  … not at the instruction level

•  Single instruction latency
–  Doesn’t matter: programs comprise billions of instructions
–  Difficult to reduce anyway

•  As the number of dynamic instructions is large…
•  Instruction throughput → program latency or throughput
+  Can reduce using parallelism
•  Multiple cores (more units executing instructions)… more later
•  Inter-instruction parallelism example: pipelining

Inter-Insn Parallelism: Pipelining

[Figure: single-cycle datapath (PC, instruction memory, register file, ALU,
data memory) annotated with stage delays Tinsn-mem, Tregfile, TALU,
Tdata-mem, Tregfile, versus Tsinglecycle]

•  Pipelining: cut datapath into N stages (here 5)
•  Separate each stage of logic by latches
•  Clock period: maximum logic + wire delay of any stage =
   max(Tinsn-mem, Tregfile, TALU, Tdata-mem)
•  Base CPI = 1, but actual CPI > 1: pipeline must often stall
•  Individual insn latency increases (pipeline overhead), but that’s not the point

Pipelining: Clock Frequency vs. IPC

•  Increase number of pipeline stages (“pipeline depth”)
•  Keep cutting the datapath into finer pieces
+  Increases clock frequency (decreases clock period)
•  Latch overhead & unbalanced stages cause sub-linear scaling
•  Doubling the number of stages won’t quite double the frequency
–  Decreases IPC (increases CPI)
•  More pipeline “hazards”, higher branch penalty
•  Memory latency relatively higher (same absolute latency, more cycles)
–  Result: after some point, deeper pipelining can decrease performance
•  “Optimal” pipeline depth is program and technology specific

•  Classic example
•  Pentium III: 12-stage pipeline, 800 MHz
•  Pentium 4: 22-stage pipeline, 1 GHz (actually slower due to IPC)

CPI and Clock Frequency

•  Clock frequency implies the CPU clock
•  Other system components have their own clocks (or not)
•  E.g., increasing the processor clock doesn’t accelerate memory latency

•  Example: a 1 GHz processor with
•  80% non-memory instructions @ 1 cycle
•  20% memory instructions @ 6 nanoseconds (6 cycles)
•  Base: CPI is 2, frequency is 1 GHz → MIPS is 500
•  Impact of doubling the core clock frequency?
•  Without speeding up the memory
•  Non-memory instructions retain 1-cycle latency
•  Memory instructions now have 12-cycle latency
•  CPI = (80% * 1) + (20% * 12) = 3.2 CPI @ 2 GHz → MIPS is 625
•  Speedup = 625/500 = 1.25, which is << 2
•  What about an infinite clock frequency? (non-memory instructions free)
•  Only a factor of 1.66 speedup (example of Amdahl’s Law)
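The clock-doubling example can be checked numerically; `mips_at` is an illustrative helper, with the 80%/20% mix and the fixed 6 ns memory latency taken from the example above:

```python
def mips_at(clock_ghz, mem_ns=6.0, mem_frac=0.2):
    mem_cycles = mem_ns * clock_ghz              # same absolute latency, more cycles
    cpi = (1 - mem_frac) * 1 + mem_frac * mem_cycles
    return clock_ghz * 1000 / cpi                # MIPS

print(mips_at(1.0))                  # ~500 MIPS (CPI = 2.0)
print(mips_at(2.0))                  # ~625 MIPS (CPI = 3.2)
print(mips_at(2.0) / mips_at(1.0))   # ~1.25 speedup, far short of 2
```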
Measuring CPI

•  How are CPI and execution time actually measured?
•  Execution time? Stopwatch timer (Unix “time” command)
•  CPI = (CPU time * clock frequency) / dynamic insn count
•  How is dynamic instruction count measured?

•  More useful is a CPI breakdown (CPI_CPU, CPI_MEM, etc.)
•  So we know what the performance problems are and what to fix

•  Hardware event counters
•  Available in most processors today
•  One way to measure dynamic instruction count
•  Calculate CPI using counter frequencies / known event costs
•  Cycle-level micro-architecture simulation (e.g., SimpleScalar)
+  Measure exactly what you want … and the impact of potential fixes!
•  Method of choice for many micro-architects (and you)

Performance Trends

                  386    486    Pentium  PentiumII  Pentium4  Core2
Year              1985   1989   1993     1998       2001      2006
Tech node (nm)    1500   800    350      180        130       65
Transistors (M)   0.3    1.2    3.1      5.5        42        291
Clock (MHz)       16     25     66       200        1500      3000
Pipe stages       “1”    5      5        10         22 to 31  ~15
(Peak) IPC        0.4    1      2        3          3         “8”
(Peak) MIPS       6      25     132      600        4500      24000

•  Historically, clock provides 75%+ of performance gains…
•  Achieved via both faster transistors and deeper pipelines
•  … that’s changing: 1 GHz: ’99, 2 GHz: ’01, 3 GHz: ’02, 4 GHz?
•  Deep pipelining can be power inefficient
•  Physical scaling limits? (Intel’s 65nm process wasn’t great, 45nm is)
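Rearranging runtime = insns * CPI * clock period gives the measurement relation above; a minimal sketch with invented values:

```python
def measured_cpi(cpu_time_s, clock_hz, insn_count):
    """CPI = (CPU time * clock frequency) / dynamic instruction count."""
    return cpu_time_s * clock_hz / insn_count

# e.g., 0.75 s measured for 1 billion instructions on a 2 GHz clock
print(measured_cpi(0.75, 2e9, 1e9))   # 1.5
```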

Improving CPI: Caching and Parallelism

•  CIS501 is more about improving CPI than clock frequency
•  Techniques we will look at
•  Caching, speculation, multiple issue, out-of-order issue
•  Vectors, multiprocessing, more…
•  Moore’s Law can help CPI: “more transistors”
•  Best examples are caches (to improve the memory component of CPI)
•  Parallelism:
•  IPC > 1 implies instructions in parallel
•  And now multi-processors (multi-cores)
•  But also speculation, wide issue, out-of-order issue, vectors…
•  All roads lead to multi-core
•  Why multi-core over still bigger caches, yet wider issue?
•  Diminishing returns, limits to instruction-level parallelism (ILP)
•  Multi-core can provide linear performance with transistor count

Performance Rules of Thumb

•  Amdahl’s Law
•  Literally: total speedup limited by the non-accelerated piece
•  Example: can optimize 50% of program A
•  Even a “magic” optimization that makes this 50% disappear…
•  …only yields a 2X speedup
•  Corollary: build a balanced system
•  Don’t optimize 1% to the detriment of the other 99%
•  Don’t over-engineer capabilities that cannot be utilized
•  Design for actual performance, not peak performance
•  Peak performance: “Performance you are guaranteed not to exceed”
•  Greater than “actual” or “average” or “sustained” performance
•  Why? Cache misses, branch mispredictions, limited ILP, etc.
•  For actual performance X, machine capability must be > X
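Amdahl’s Law as stated above has a standard closed form; a minimal sketch, where frac is the fraction of runtime being accelerated and s the speedup of that piece:

```python
def amdahl_speedup(frac, s):
    """Overall speedup when a fraction `frac` of runtime is sped up by factor `s`."""
    return 1.0 / ((1 - frac) + frac / s)

# the example above: a “magic” optimization makes 50% of the program vanish
print(amdahl_speedup(0.5, float("inf")))   # 2.0: overall speedup capped at 2x
```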
Summary

•  Latency = seconds / program =
•  (instructions / program) * (cycles / instruction) * (seconds / cycle)
•  Instructions / program: dynamic instruction count
•  Function of program, compiler, instruction set architecture (ISA)
•  Cycles / instruction: CPI
•  Function of program, compiler, ISA, micro-architecture
•  Seconds / cycle: clock period
•  Function of micro-architecture, technology parameters

•  Optimize each component
•  CIS501 focuses mostly on CPI (caches, parallelism)
•  …but some on dynamic instruction count (compiler, ISA)
•  …and some on clock frequency (pipelining, technology)
