This Unit: - Metrics
• Metrics
• Latency and throughput
MIPS (performance metric, not the ISA)
• (Micro-)architects often ignore dynamic instruction count
  • Typically work in one ISA/one compiler → treat it as fixed
• CPU performance equation becomes
  • Latency: seconds / insn = (cycles / insn) * (seconds / cycle)
  • Throughput: insn / second = (insn / cycle) * (cycles / second)
• MIPS (millions of instructions per second)
  • Cycles / second: clock frequency (in MHz)
  • Example: CPI = 2, clock = 500 MHz → 0.5 * 500 MHz = 250 MIPS
• Pitfall: may vary inversely with actual performance
  – Compiler removes insns, program gets faster, MIPS goes down
  – Work per instruction varies (e.g., multiply vs. add, FP vs. integer)

MHz (MegaHertz) and GHz (GigaHertz)
• 1 Hertz = 1 cycle per second
  • 1 GHz is 1 cycle per nanosecond, 1 GHz = 1000 MHz
• (Micro-)architects often ignore dynamic instruction count…
• … but general public (mostly) also ignores CPI
  • Equates clock frequency with performance!
• Which processor would you buy?
  • Processor A: CPI = 2, clock = 5 GHz
  • Processor B: CPI = 1, clock = 3 GHz
  • Probably A, but B is faster (assuming same ISA/compiler)
• Classic example
  • 800 MHz Pentium III faster than 1 GHz Pentium 4!
  • More recent example: Core i7 faster clock-per-clock than Core 2
  • Same ISA and compiler!
• Meta-point: danger of partial performance metrics!
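The arithmetic behind the MIPS example, the Processor A vs. B comparison, and the MIPS pitfall can be checked with a short sketch. The function names, the 1-billion-instruction program, and the "before/after optimization" instruction counts and CPIs are all hypothetical values chosen for illustration; only CPI = 2 at 500 MHz and the A/B parameters come from the slides.

```python
def exec_time_seconds(insn_count, cpi, clock_hz):
    """Program latency: insns * (cycles / insn) * (seconds / cycle)."""
    return insn_count * cpi / clock_hz


def mips(cpi, clock_hz):
    """Throughput in millions of insns per second: (insn / cycle) * (cycles / second) / 1e6."""
    return clock_hz / cpi / 1e6


# Slide example: CPI = 2, clock = 500 MHz
print(mips(2, 500e6))  # 250.0

# Processor A vs. B on the same program (insn count is an assumption;
# any fixed count gives the same ranking when ISA and compiler are the same).
n = 1_000_000_000
time_a = exec_time_seconds(n, cpi=2, clock_hz=5e9)  # 2 cycles/insn at 5 GHz
time_b = exec_time_seconds(n, cpi=1, clock_hz=3e9)  # 1 cycle/insn at 3 GHz
print("B is faster:", time_b < time_a)  # B is faster: True

# MIPS pitfall with made-up numbers: the compiler removes 20% of the insns,
# but the removed ones were cheap, so the average CPI of what remains rises.
clock = 1e9
time_before = exec_time_seconds(1_000_000_000, cpi=1.0, clock_hz=clock)
time_after = exec_time_seconds(800_000_000, cpi=1.2, clock_hz=clock)
mips_before = mips(1.0, clock)
mips_after = mips(1.2, clock)
print("faster after:", time_after < time_before)      # faster after: True
print("lower MIPS after:", mips_after < mips_before)  # lower MIPS after: True
```

With these assumed numbers the optimized program finishes sooner while its MIPS rating drops, which is the sense in which MIPS can vary inversely with actual performance.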
CIS 501 (Martin/Roth): Performance 15 CIS 501 (Martin/Roth): Performance 16
Latency vs. Throughput Revisited
• Latency and throughput: two views of performance …
  • … at the program level
  • … not at the instruction level
• Single instruction latency
  – Doesn’t matter: programs comprised of billions (or more) of instructions
  – Difficult to reduce anyway
• As number of dynamic instructions is large…
  • Instruction throughput → program latency or throughput
  + Can reduce using parallelism
  • Multiple cores (more units executing instructions)… more later
  • Inter-instruction parallelism example: pipelining

Inter-Insn Parallelism: Pipelining
[Figure: single-cycle datapath (PC, +4, Insn Mem, Register File with s1/s2/d ports, Data Mem); stage delays Tinsn-mem, Tregfile, TALU, Tdata-mem, Tregfile together span Tsinglecycle]
• Pipelining: cut datapath into N stages (here 5)
  • Separate each stage of logic by latches
  • Clock period: maximum logic + wire delay of any stage = max(Tinsn-mem, Tregfile, TALU, Tdata-mem)
  • Base CPI = 1, but actual CPI > 1: pipeline must often stall
  • Individual insn latency increases (pipeline overhead), not the point
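As a rough illustration of the pipelining trade-off, the sketch below compares a single-cycle datapath with a 5-stage pipeline. The stage delays, the latch overhead, the stall-inflated CPI of 1.5, and the instruction count are all assumed numbers, not values from the slides; the only relationships taken from the slides are that the single-cycle period is the sum of the stage delays and the pipelined period is set by the slowest stage.

```python
# Hypothetical per-stage delays in picoseconds (assumptions for illustration).
stage_delays_ps = {
    "insn_mem": 400,
    "regfile_read": 200,
    "alu": 300,
    "data_mem": 400,
    "regfile_write": 200,
}
latch_overhead_ps = 20  # assumed per-stage latch cost (the "pipeline overhead")

# Single-cycle design: one long cycle covers every stage in sequence.
t_singlecycle_ps = sum(stage_delays_ps.values())

# Pipelined design: the clock period is set by the slowest stage (plus latch).
t_clock_ps = max(stage_delays_ps.values()) + latch_overhead_ps

# Latency of one instruction through all N stages of the pipeline.
n_stages = len(stage_delays_ps)
insn_latency_ps = n_stages * t_clock_ps


def program_time_ps(insn_count, cpi, period_ps):
    """Program latency: insns * (cycles / insn) * (time / cycle)."""
    return insn_count * cpi * period_ps


insn_count = 1_000_000  # assumed dynamic instruction count
singlecycle_time_ps = program_time_ps(insn_count, 1, t_singlecycle_ps)
pipelined_time_ps = program_time_ps(insn_count, 1.5, t_clock_ps)  # CPI > 1: stalls

print(t_singlecycle_ps, t_clock_ps, insn_latency_ps)  # 1500 420 2100
print("pipelined program is faster:", pipelined_time_ps < singlecycle_time_ps)
```

Under these assumptions a single instruction takes longer in the pipeline (2100 ps vs. 1500 ps), yet the whole program finishes sooner because instruction throughput, not single-instruction latency, determines program latency.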