
Midterm Recap: Performance Evaluation

The document discusses various performance evaluation metrics and processor architecture concepts such as instruction set architectures, pipelining, hazards, forwarding, and dynamic scheduling. It also covers the MIPS architecture including its instruction format and 5-stage pipeline, as well as more advanced techniques like register renaming, Tomasulo's algorithm, speculative execution, and the reorder buffer.

Uploaded by Shivam Agarwal
Copyright: © Attribution Non-Commercial (BY-NC)

Midterm Recap
Sections 1.1 ~ 1.4, 1.8 ~ 1.10, 2.1, 2.4 ~ 2.8; Appendices A & B

Performance Evaluation
• Basic concepts
  - Response time
  - Throughput
  - Speedup
  - CPI
  - IPC
• Amdahl's law
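The basic concepts above are tied together by a few standard formulas: CPU time = instruction count × CPI × clock cycle time, IPC = 1/CPI, speedup = old time / new time, and Amdahl's law, overall speedup = 1 / ((1 - F) + F/S) when a fraction F of the original execution time is made S times faster. The Python sketch below writes them out; the function names are chosen here for illustration.

```python
import math

def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = IC x CPI x clock cycle time (= IC x CPI / clock rate)."""
    return instruction_count * cpi / clock_rate_hz

def ipc(cpi):
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi

def speedup(time_old, time_new):
    """Speedup of the new design over the old one on the same workload."""
    return time_old / time_new

def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only `fraction_enhanced` of the original execution
    time benefits from an enhancement that is `speedup_enhanced` times faster."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Speeding up 40% of the run by 10x gives only about 1.56x overall...
print(amdahl_speedup(0.4, 10))
# ...and even an infinitely fast enhancement is capped at 1 / (1 - F).
print(amdahl_speedup(0.4, math.inf))
```

The second call shows the main lesson of Amdahl's law: the unenhanced fraction (1 - F) bounds the achievable speedup.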

How to Summarize Performance
• Arithmetic mean
• Weighted arithmetic mean
• Geometric mean
• Harmonic mean
  - n / (1/s1 + 1/s2 + … + 1/sn)

Instruction Set Architecture
• ISA types
  - Stack
  - Accumulator
  - General-purpose registers
    - Register-memory
    - Register-register
• Common memory addressing modes
  - Register, immediate, displacement, ……
• Byte ordering: big vs. little endian
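The four summary statistics listed above can be sketched in Python as follows. The slide does not say which mean suits which metric. The docstrings follow common textbook guidance: arithmetic for times, harmonic for rates, and geometric for ratios normalized to a reference machine. Treat that guidance as an assumption here.

```python
import math

def arithmetic_mean(times):
    """Plain average, e.g. of execution times."""
    return sum(times) / len(times)

def weighted_arithmetic_mean(times, weights):
    """Each program's time weighted by its share of the workload.
    Assumes the weights sum to 1."""
    return sum(w * t for t, w in zip(times, weights))

def geometric_mean(ratios):
    """n-th root of the product, computed in log space to avoid overflow;
    typically applied to execution-time ratios normalized to a reference machine."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

def harmonic_mean(rates):
    """n / (1/s1 + 1/s2 + ... + 1/sn), the formula on the slide; suited to rates."""
    return len(rates) / sum(1.0 / s for s in rates)

print(arithmetic_mean([1, 2, 3, 4]))
print(harmonic_mean([1, 4, 4]))
```

A useful property of the geometric mean is that the ratio of two machines' geometric means equals the geometric mean of their per-program ratios, so the choice of reference machine does not change the ranking.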

Basics of MIPS
• Instruction format
  - I-type, R-type, J-type
• Instruction types
  - ALU, load/store, control, FP
• 5-stage pipeline
  - IF, ID, EX, MEM, WB

Pipelined 5-Stage Data Path
[Figure: MIPS pipelined 5-stage data path]
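The three instruction formats differ only in how the 32-bit word is sliced into fields. The sketch below assumes the classic MIPS32 layout: a 6-bit opcode; then rs, rt, rd, and shamt (5 bits each) and a 6-bit funct for R-type; a 16-bit immediate for I-type; and a 26-bit target for J-type. It treats only opcodes 2 and 3 (j, jal) as J-type. FP/coprocessor instructions have their own formats and are simply lumped into I-type here. The example words are hand-encoded for illustration.

```python
def decode(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its R-, I- or J-type fields."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    if opcode == 0:  # R-type: register-register ALU ops, selected by funct
        return {"type": "R", "opcode": opcode, "rs": rs, "rt": rt,
                "rd": (word >> 11) & 0x1F, "shamt": (word >> 6) & 0x1F,
                "funct": word & 0x3F}
    if opcode in (2, 3):  # J-type: j and jal carry a 26-bit target
        return {"type": "J", "opcode": opcode, "target": word & 0x3FFFFFF}
    imm = word & 0xFFFF  # I-type: loads/stores, branches, ALU-immediate
    if imm & 0x8000:     # sign-extend the 16-bit immediate
        imm -= 0x10000
    return {"type": "I", "opcode": opcode, "rs": rs, "rt": rt, "imm": imm}

print(decode(0x012A4020))  # add $t0, $t1, $t2
print(decode(0x8D280004))  # lw  $t0, 4($t1)
print(decode(0x2129FFFC))  # addi $t1, $t1, -4
```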

MIPS FP Pipeline
[Figure: MIPS FP pipeline]

Dependences and Hazards
• Dependences → possible hazards
• Dependences
  - Data, name (anti, output), control
• Hazards
  - RAW, WAR, WAW, branch
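Whether a dependence actually becomes a hazard depends on the pipeline. The check below only reports which hazard a pair of instructions could cause, based on their register operands. Control dependences and memory operands are ignored. `Inst` and `possible_hazards` are illustrative names, and the three pairs are the examples from the "Dependences vs. Hazards" slide.

```python
from typing import NamedTuple, Optional, Tuple

class Inst(NamedTuple):
    text: str
    dest: Optional[str]        # register written (None for stores and branches)
    srcs: Tuple[str, ...]      # registers read

def possible_hazards(i: Inst, j: Inst) -> list:
    """Hazards that dependences between i and a later instruction j could cause."""
    found = []
    if i.dest is not None and i.dest in j.srcs:
        found.append("RAW")    # data dependence: j reads what i writes
    if j.dest is not None and j.dest in i.srcs:
        found.append("WAR")    # anti-dependence: j overwrites what i reads
    if i.dest is not None and i.dest == j.dest:
        found.append("WAW")    # output dependence: both write the same register
    return found

pairs = [
    (Inst("ADD R3, R1, R2", "R3", ("R1", "R2")), Inst("ADD R5, R3, R4", "R5", ("R3", "R4"))),
    (Inst("ADD R3, R2, R1", "R3", ("R2", "R1")), Inst("ADD R2, R4, R5", "R2", ("R4", "R5"))),
    (Inst("ADD R3, R2, R1", "R3", ("R2", "R1")), Inst("SUB R3, R4, R5", "R3", ("R4", "R5"))),
]
for i, j in pairs:
    print(f"{i.text} -> {j.text}: {possible_hazards(i, j)}")
```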

Dependences vs. Hazards
• Data
  - RAW (if j gets the “old” value of R3)
      i: ADD R3, R1, R2
      j: ADD R5, R3, R4
• Anti
  - WAR (if i gets the “new” value of R2)
      i: ADD R3, R2, R1
      j: ADD R2, R4, R5
• Output
  - WAW (if the final result in R3 is produced by i)
      i: ADD R3, R2, R1
      j: SUB R3, R4, R5

MIPS Five-Stage Pipeline With/Without Data Forwarding
• Without data forwarding
  - Results are exchanged via the register file
  - Producer: WB → consumer: ID
• With data forwarding
  - Result produced → result used
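The "producer → consumer" pairs above translate into stall counts for the classic 5-stage pipeline. The sketch below assumes the following:
• One instruction enters the pipeline per cycle, and there are no other stalls.
• The register file is written in the first half of a cycle and read in the second half.
• The consumer needs its operand at the start of EX (a store's data operand, which is needed later, is not modeled).

Under these assumptions, a back-to-back dependent instruction stalls 2 cycles without forwarding, 0 with forwarding from an ALU op, and 1 with forwarding from a load. `STAGE` and `stall_cycles` are illustrative names.

```python
# Stage numbers of the classic 5-stage pipeline (IF = 1 ... WB = 5).
STAGE = {"IF": 1, "ID": 2, "EX": 3, "MEM": 4, "WB": 5}

def stall_cycles(distance: int, producer_is_load: bool, forwarding: bool) -> int:
    """Stalls before a consumer `distance` instructions after its producer
    (distance=1: back-to-back) can use the producer's register result."""
    if not forwarding:
        # Producer writes in WB (first half), consumer reads in ID (second half),
        # so consumer ID may fall in the same cycle as producer WB.
        return max(0, STAGE["WB"] - STAGE["ID"] - distance)
    # With forwarding the value is usable in the cycle after it is produced:
    # end of EX for ALU results, end of MEM for loads.
    produced = STAGE["MEM"] if producer_is_load else STAGE["EX"]
    return max(0, produced + 1 - STAGE["EX"] - distance)

for fwd in (False, True):
    for load in (False, True):
        print("forwarding:", fwd, "load producer:", load,
              "back-to-back stalls:", stall_cycles(1, load, fwd))
```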

Register Renaming
• Eliminate WAR and WAW hazards by register renaming (S and T are new registers)

    Original            Renamed
    DIV.D F0,F2,F4      DIV.D F0,F2,F4
    ADD.D F6,F0,F8      ADD.D S,F0,F8
    S.D F6,0(R1)        S.D S,0(R1)
    SUB.D F8,F10,F14    SUB.D T,F10,F14
    MUL.D F6,F10,F8     MUL.D F6,F10,T

Dynamic Scheduling
• Split the ID stage into two
  1. Issue: decode the instruction, check for structural hazards
  2. Read operands: wait until there are no data hazards, then read operands
• In-order instruction issue
• Out-of-order execution
  - An instruction begins execution as soon as its data operands are available
• Out-of-order completion → complicates exception handling
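The slide renames only the registers involved in WAR/WAW conflicts (S and T). A common hardware-style alternative, sketched below, gives every destination a fresh name and makes each source read the latest name. The names P0, P1, ... are hypothetical physical registers. This removes all WAR and WAW hazards while keeping the true (RAW) dependences, so ADD.D still reads the DIV.D result. Unlike the slide, the final value of F6 then lives in a new name (P3), so a real design also keeps the architectural-to-physical map.

```python
from itertools import count

def rename(instrs):
    """Give every destination a fresh register name; sources read the latest name.

    instrs: list of (op, dest, srcs) with dest=None for stores.
    Returns the renamed list and the final architectural -> fresh-name map.
    """
    latest = {}                                   # architectural reg -> current name
    fresh = (f"P{n}" for n in count())            # hypothetical physical registers
    out = []
    for op, dest, srcs in instrs:
        new_srcs = [latest.get(r, r) for r in srcs]   # read sources before renaming dest
        new_dest = None
        if dest is not None:
            new_dest = next(fresh)
            latest[dest] = new_dest
        out.append((op, new_dest, new_srcs))
    return out, latest

code = [
    ("DIV.D", "F0", ["F2", "F4"]),
    ("ADD.D", "F6", ["F0", "F8"]),
    ("S.D",   None, ["F6", "R1"]),      # S.D F6,0(R1): stores F6 at address R1 + 0
    ("SUB.D", "F8", ["F10", "F14"]),
    ("MUL.D", "F6", ["F10", "F8"]),
]
renamed, final_map = rename(code)
for inst in renamed:
    print(inst)
print(final_map)
```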

Tomasulo Components
• RS entry
  - Op: operation to perform in the unit
  - Vj, Vk: values of the source operands
  - Qj, Qk: reservation stations producing the source registers
    - Qj, Qk = 0 → ready
  - Busy: indicates the reservation station or FU is busy
• Register result status
  - Indicates which RS will write each register
  - Blank: no pending instructions writing the register

Three Stages of Tomasulo Algorithm
1. Issue: get instruction from the instruction queue
   If a reservation station is free (no structural hazard), control issues the instruction and sends operands (renames registers).
2. Execution: operate on operands (EX)
   When both operands are ready, execute; if not ready, watch the Common Data Bus for the result.
3. Write result: finish execution (WB)
   Write on the Common Data Bus to all awaiting units; mark the reservation station available.
• No speculation
  - In-order issue, out-of-order execution, and out-of-order completion
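The three stages can be sketched around the RS fields listed above. This is a functional model only. It omits cycle timing, FU latencies, loads/stores, and speculation. The class and method names are hypothetical, and `None` stands in for the slide's "Qj, Qk = 0".

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

OPS = {"ADD.D": operator.add, "SUB.D": operator.sub,
       "MUL.D": operator.mul, "DIV.D": operator.truediv}

@dataclass
class RSEntry:
    """One reservation station with the fields listed on the slide."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # source values (valid once qj/qk are cleared)
    vk: Optional[float] = None
    qj: Optional[str] = None     # producing RS; None plays the role of "0 = ready"
    qk: Optional[str] = None

    def ready(self) -> bool:
        return self.busy and self.qj is None and self.qk is None

class Tomasulo:
    def __init__(self, regs: Dict[str, float], rs_names: List[str]):
        self.regs = dict(regs)                        # register file values
        self.reg_status: Dict[str, str] = {}          # reg -> RS that will write it
        self.rs = {n: RSEntry(n) for n in rs_names}

    def _operand(self, reg: str) -> Tuple[Optional[float], Optional[str]]:
        tag = self.reg_status.get(reg)
        return (None, tag) if tag is not None else (self.regs[reg], None)

    def issue(self, op: str, dest: str, src1: str, src2: str) -> Optional[str]:
        """Stage 1: needs a free RS; otherwise a structural hazard stalls issue (None)."""
        entry = next((e for e in self.rs.values() if not e.busy), None)
        if entry is None:
            return None
        entry.busy, entry.op = True, op
        entry.vj, entry.qj = self._operand(src1)      # read sources first...
        entry.vk, entry.qk = self._operand(src2)
        self.reg_status[dest] = entry.name            # ...then rename the destination
        return entry.name

    def execute(self, name: str) -> float:
        """Stage 2: only legal once both operands have arrived."""
        e = self.rs[name]
        if not e.ready():
            raise RuntimeError(f"{name} is still waiting on the CDB")
        return OPS[e.op](e.vj, e.vk)

    def write_result(self, name: str, value: float) -> None:
        """Stage 3: broadcast on the CDB, update waiters and registers, free the RS."""
        for e in self.rs.values():
            if e.qj == name:
                e.vj, e.qj = value, None
            if e.qk == name:
                e.vk, e.qk = value, None
        for reg, tag in list(self.reg_status.items()):
            if tag == name:                           # skipped if a later writer renamed reg
                self.regs[reg] = value
                del self.reg_status[reg]
        self.rs[name] = RSEntry(name)                 # mark the RS available

t = Tomasulo({"F0": 0.0, "F2": 6.0, "F4": 3.0, "F6": 0.0, "F8": 1.0}, ["RS1", "RS2"])
div = t.issue("DIV.D", "F0", "F2", "F4")
add = t.issue("ADD.D", "F6", "F0", "F8")    # F0 pending, so qj = "RS1"
print(t.rs[add])
t.write_result(div, t.execute(div))           # CDB delivers the new F0 to RS2
t.write_result(add, t.execute(add))
print(t.regs["F0"], t.regs["F6"])
```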

Dynamic Scheduling, Single-Issue

Iteration  Inst              Issue  Exe  Mem  Write CDB
1          LD R2, 0(R1)      1      2    3    4
1          ADD R2, R2, #1    2      5    -    6
1          SD R2, 0(R1)      3      4    7    -
1          ADD R1, R1, #4    4      6    -    7
1          BNE R2, R3, loop  5      7    -    -
2          LD R2, 0(R1)      6      8    9    10
2          ADD R2, R2, #1    7      11   -    12
2          SD R2, 0(R1)      8      9    13   -
2          ADD R1, R1, #4    9      10   -    11
2          BNE R2, R3, loop  10     13   -    -

Dynamic Scheduling, 2-Way Issue

Iteration  Inst              Issue  Exe  Mem  Write CDB
1          LD R2, 0(R1)      1      2    3    4
1          ADD R2, R2, #1    1      5    -    6
1          SD R2, 0(R1)      2      3    7    -
1          ADD R1, R1, #4    2      3    -    4
1          BNE R2, R3, loop  3      7    -    -
2          LD R2, 0(R1)      4      8    9    10
2          ADD R2, R2, #1    4      11   -    12
2          SD R2, 0(R1)      5      9    13   -
2          ADD R1, R1, #4    5      8    -    9
2          BNE R2, R3, loop  6      13   -    -

2-way issue (branches single-issue); separate integer FUs for address calculation, ALU operations, and branches; two CDBs
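When the two tables are read side by side, 2-way issue finishes issuing the window in cycle 6 instead of cycle 10. Yet both schedules complete their last operation in cycle 13. The reason is that, without speculation, no iteration-2 instruction executes before the first BNE resolves in cycle 7. The short script below only re-derives those numbers from the table entries; nothing is simulated. The encoding (`None` for a blank cell) and the `summarize` helper are illustrative.

```python
# (instruction, issue, exe, mem, write CDB) copied from the two tables; None = blank cell.
SINGLE_ISSUE = [
    ("LD", 1, 2, 3, 4), ("ADD R2", 2, 5, None, 6), ("SD", 3, 4, 7, None),
    ("ADD R1", 4, 6, None, 7), ("BNE", 5, 7, None, None),
    ("LD", 6, 8, 9, 10), ("ADD R2", 7, 11, None, 12), ("SD", 8, 9, 13, None),
    ("ADD R1", 9, 10, None, 11), ("BNE", 10, 13, None, None),
]
TWO_WAY = [
    ("LD", 1, 2, 3, 4), ("ADD R2", 1, 5, None, 6), ("SD", 2, 3, 7, None),
    ("ADD R1", 2, 3, None, 4), ("BNE", 3, 7, None, None),
    ("LD", 4, 8, 9, 10), ("ADD R2", 4, 11, None, 12), ("SD", 5, 9, 13, None),
    ("ADD R1", 5, 8, None, 9), ("BNE", 6, 13, None, None),
]

def summarize(rows, per_iteration=5):
    """A few numbers read off a schedule with `per_iteration` instructions per iteration."""
    return {
        "last_issue": max(r[1] for r in rows),
        "done_by": max(c for r in rows for c in r[1:] if c is not None),
        "branch_resolved": rows[per_iteration - 1][2],       # Exe cycle of the first BNE
        "iter2_first_exec": min(r[2] for r in rows[per_iteration:]),
    }

for label, rows in (("single-issue", SINGLE_ISSUE), ("2-way", TWO_WAY)):
    print(label, summarize(rows))
```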

Dynamic Scheduling vs. Speculative Execution
• Dynamic scheduling (w/o speculation)
  - A branch must be resolved before actually executing any instructions in the successor basic block (those instructions can be issued)
  - Issue, Exec, Memory (R/W), Write CDB
• Speculative execution (using dynamic scheduling)
  - Allows later instructions to execute before the branch is resolved (with the ability to undo the effects of an incorrectly speculated sequence)
  - Issue, Exec, Read memory, Write CDB, Commit (write memory)

Reorder Buffer
• Contains all in-flight instructions
• Reorders out-of-order instructions into program order at the time of writing registers/memory (commit)
• Buffers results and supplies operands between execution complete and commit
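A reorder buffer can be modeled as a FIFO in program order. A result is written into its entry when execution completes. Only the head entry may update architectural state, and a mispredicted branch discards every entry younger than itself. The sketch below is minimal. The class, its fields, and the `squash_after` name are illustrative. Stores, tag-based operand forwarding, and exceptions are left out.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class ROBEntry:
    name: str
    dest: Optional[str]          # architectural register; None for branches/stores
    value: Optional[int] = None
    done: bool = False           # set when the result is written (Write CDB)

class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # program order: leftmost entry is the oldest (head)

    def issue(self, name, dest=None):
        entry = ROBEntry(name, dest)
        self.entries.append(entry)
        return entry

    def complete(self, entry, value=None):
        entry.value, entry.done = value, True    # buffered, not yet architectural

    def lookup(self, reg):
        """Operand supply: newest in-flight entry writing `reg`, or None."""
        for e in reversed(self.entries):
            if e.dest == reg:
                return e
        return None

    def commit(self, regs):
        """Retire finished entries from the head, strictly in program order."""
        committed = []
        while self.entries and self.entries[0].done:
            e = self.entries.popleft()
            if e.dest is not None:
                regs[e.dest] = e.value
            committed.append(e.name)
        return committed

    def squash_after(self, branch):
        """Mis-speculation: discard every entry younger than `branch`."""
        while self.entries and self.entries[-1] is not branch:
            self.entries.pop()

regs = {"R1": 0, "R2": 0}
rob = ReorderBuffer()
ld = rob.issue("LD", "R2")
add = rob.issue("ADD", "R1")
rob.complete(add, 4)           # finishes out of order...
print(rob.commit(regs), regs)  # ...but cannot commit past the unfinished LD
rob.complete(ld, 5)
print(rob.commit(regs), regs)
```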

Example (with speculation)

Iteration  Inst              Issue  Exec  Read Mem  Write CDB  Commit
1          LD R2, 0(R1)      1      2     3         4          5
1          ADD R2, R2, #1    1      5     -         6          7
1          SD R2, 0(R1)      2      3     -         -          7
1          ADD R1, R1, #4    2      3     -         4          8
1          BNE R2, R3, loop  3      7     -         -          8
2          LD R2, 0(R1)      4      5     6         7          9
2          ADD R2, R2, #1    4      8     -         9          10
2          SD R2, 0(R1)      5      6     -         -          10
2          ADD R1, R1, #4    5      6     -         7          11
2          BNE R2, R3, loop  6      10    -         -          11

Architectural Simulator
• Measurement
  - Accurate
  - Only works on existing systems; not flexible
  - Constructing a hardware prototype is slow, expensive, and complicated
• Analytical models
  - Fast and with insights
  - Hard to model the complexity of today's processors
• Simulator
  - Fast, cheap, flexible, and relatively accurate
• What is an architectural simulator?
  - A tool that reproduces the behavior of a computing device
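The simplest kind of architectural simulator is a functional one. It reproduces what each instruction does to registers and memory without modeling cycles. The sketch below runs the loop used in the tables above. The tuple encoding, the dictionary used as memory, and the step limit are assumptions made for illustration, and `#imm` operands are treated as immediates. A timing simulator would add pipeline state (issue, execute, CDB, commit) on top of a core like this.

```python
# The loop from the tables, encoded as (op, a, b, c).
LOOP = [
    ("LD",  "R2", 0,    "R1"),   # LD  R2, 0(R1)
    ("ADD", "R2", "R2", 1),      # ADD R2, R2, #1
    ("SD",  "R2", 0,    "R1"),   # SD  R2, 0(R1)
    ("ADD", "R1", "R1", 4),      # ADD R1, R1, #4
    ("BNE", "R2", "R3", 0),      # BNE R2, R3, loop  (loop = instruction 0)
]

def run(program, regs, mem, max_steps=10_000):
    """Execute `program` functionally; returns the number of instructions executed."""
    pc, executed = 0, 0
    while 0 <= pc < len(program) and executed < max_steps:
        op, a, b, c = program[pc]
        pc += 1
        executed += 1
        if op == "LD":
            regs[a] = mem[regs[c] + b]     # a <- Mem[c + offset b]
        elif op == "SD":
            mem[regs[c] + b] = regs[a]     # Mem[c + offset b] <- a
        elif op == "ADD":
            regs[a] = regs[b] + c          # immediate form only, as in the loop
        elif op == "BNE":
            if regs[a] != regs[b]:
                pc = c                     # taken: jump back to the target index
        else:
            raise ValueError(f"unsupported op {op}")
    return executed

regs = {"R1": 0, "R2": 0, "R3": 8}
mem = {0: 5, 4: 7, 8: 9}
print(run(LOOP, regs, mem), regs, mem)
```

With these inputs, the loop increments mem[0] and mem[4], then exits when the incremented value equals R3. It executes two iterations, the same ten instructions shown in the tables.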
