Lecture13
Announcement
• Programming assignment 3 is out
• Details: https://www.cs.rochester.edu/courses/252/fall2024/labs/assignment3.html
• Due on Oct. 25th, 11:59 PM
• You (may still) have 3 slip days
Carnegie Mellon
Announcement
• Programming assignment 3 is in x86 assembly language. Seek
help from TAs.
• TAs are best positioned to answer your questions about
programming assignments!!!
• Programming assignments do NOT repeat the lecture materials.
They ask you to synthesize what you have learned from the
lectures and work out something new.
Single-Cycle Microarchitecture
[Figure: on each clock edge, the state elements (PC, register file, flags Z/S/O, memory) supply their current values to the combinational logic, which computes the new PC, new register values, new flag values, the memory address and data, and the write enables.]
Combinational Logic
Read current_states;
next_states = calculate_new_state(current_states);
When clock rises, current_states = next_states;
next_states has to be ready before the clock rises
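The update rule above can be sketched in Python. This is a toy model, not the actual hardware: the `State` class holds only a PC and one register, and `calculate_new_state` is a made-up stand-in for the combinational logic.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    pc: int = 0
    reg: int = 0  # a single register, purely illustrative


def calculate_new_state(cur: State) -> State:
    """Stand-in for combinational logic: a pure function of the current state."""
    return replace(cur, pc=cur.pc + 1, reg=cur.reg + cur.pc)


def run(cycles: int) -> State:
    current = State()
    for _ in range(cycles):
        # During the cycle: the combinational logic settles on the next state.
        next_state = calculate_new_state(current)
        # Rising clock edge: every state element latches its new value at once.
        current = next_state
    return current
```

Because `calculate_new_state` only reads `current`, every new value is derived from the old state, which mirrors the requirement that next_states be ready before the clock rises.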
Single-Cycle Microarchitecture
[Figure: datapath with the PC (oPC, nPC), instruction bits s0 … s19, the register file (two read register IDs, one write register ID, newData), an ALU with inputs RA and RB, flags Z/S/O with EnableF, memory (address, data to write, data read back, enable), a MUX with a select input, and control blocks Logic 1 through Logic 7.]
[Figure: state elements (CC = 100, register file with read and write ports, PC register); %rbx = 0x100]
① ② ③ ④
Cycle 1: 0x000: irmovq $0x100,%rbx # %rbx <-- 0x100
Cycle 2: 0x00a: irmovq $0x200,%rdx # %rdx <-- 0x200
Cycle 3: 0x014: addq %rdx,%rbx # %rbx <-- 0x300 CC <-- 000
Cycle 4: 0x016: je dest # Not taken
Cycle 5: 0x01f: rmmovq %rbx,0(%rdx) # M[0x200] <-- 0x300
① Start of cycle 3: state is set according to the second irmovq instruction (%rbx = 0x100, PC = 0x014).
② End of cycle 3: state is still set according to the second irmovq instruction (%rbx = 0x100, PC = 0x014); the combinational logic has generated the pending updates for addq (CC <-- 000, %rbx <-- 0x300, new PC = 0x016).
③ Start of cycle 4: state is set according to the addq instruction (CC = 000, %rbx = 0x300, PC = 0x016); the combinational logic is starting to react to the state changes.
④ End of cycle 4: state is still set according to the addq instruction (CC = 000, %rbx = 0x300, PC = 0x016); the combinational logic generates results for the je instruction (new PC = 0x01f).
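The five-cycle trace can be replayed with a small Python sketch. It is a simplification that models only the four instruction types in the example. The jump target `DEST` and the initial condition codes are assumptions, since the slides only say "dest" and do not show the reset state.

```python
MASK = (1 << 64) - 1
DEST = 0x040  # hypothetical jump target; the slide only says "dest"

# address -> (opcode, operand A, operand B, instruction length in bytes)
PROGRAM = {
    0x000: ("irmovq", 0x100, "%rbx", 10),
    0x00A: ("irmovq", 0x200, "%rdx", 10),
    0x014: ("addq", "%rdx", "%rbx", 2),
    0x016: ("je", DEST, None, 9),
    0x01F: ("rmmovq", "%rbx", "%rdx", 10),
}


def step(pc, regs, cc, mem):
    """Compute one cycle's updates, apply them, and return the new PC."""
    op, a, b, length = PROGRAM[pc]
    next_pc = pc + length
    if op == "irmovq":
        regs[b] = a
    elif op == "addq":
        x, y = regs[a], regs[b]
        r = (x + y) & MASK
        regs[b] = r
        cc["ZF"] = int(r == 0)
        cc["SF"] = r >> 63
        cc["OF"] = int((x >> 63) == (y >> 63) and (r >> 63) != (x >> 63))
    elif op == "je":
        if cc["ZF"]:
            next_pc = a
    elif op == "rmmovq":
        mem[regs[b]] = regs[a]  # displacement is 0 in this example
    return next_pc


def run_trace(cycles=5):
    regs = {"%rbx": 0, "%rdx": 0}
    cc = {"ZF": 1, "SF": 0, "OF": 0}  # assumed initial values
    mem = {}
    pc, pcs = 0x000, []
    for _ in range(cycles):
        pcs.append(pc)
        pc = step(pc, regs, cc, mem)
    return pcs, regs, cc, mem
```

Each call to `step` corresponds to one clock cycle: the values it computes from the old state become visible only to the next call, just as the new state is latched at the rising edge.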
Processor Microarchitecture
• Sequential, single-cycle microarchitecture implementation
• Basic idea
• Hardware implementation
• Pipelined microarchitecture implementation
• Basic Principles
• Difficulties: Control Dependency
• Difficulties: Data Dependency
Performance Model
Execution time of a program (in seconds) = # of Dynamic Instructions × CPI × clock cycle time (in seconds)
• CPI: average number of cycles per instruction
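As a quick numeric check of the performance model, here is a minimal Python sketch. It assumes the usual form of the model (dynamic instruction count × CPI × clock cycle time); the function name and the sample numbers are illustrative only.

```python
def execution_time_seconds(dynamic_instructions: float, cpi: float,
                           cycle_time_ps: float) -> float:
    """Execution time = dynamic instruction count x CPI x clock cycle time."""
    return dynamic_instructions * cpi * cycle_time_ps * 1e-12


# Illustrative numbers: one billion instructions, CPI of 1, 320 ps cycle.
example = execution_time_seconds(1e9, 1.0, 320.0)
```

Shrinking any one of the three factors while holding the other two fixed reduces the execution time proportionally.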
Improving Performance
Execution time of a program (in seconds) = # of Dynamic Instructions × CPI × clock cycle time (in seconds)
A Motivating Example
[Figure: unpipelined hardware: 300 ps of combinational logic followed by a 20 ps register, driven by the clock]
Pipeline Diagrams
• Time to finish 3 insts = 960 ps
• Each inst.’s latency is 320 ps
[Figure: timeline of OP1, OP2, and OP3 executing back to back, 320 ps each]
• Divide combinational logic into 3 stages of 100 ps each
• Insert registers between stages to store intermediate data. These are
called pipeline registers (ISA-invisible)
• Can begin a new instruction as soon as the previous one finishes
stage A and has stored the intermediate data.
• Begin new operation every 120 ps
• Cycle time can be reduced to 120 ps
3-Stage Pipelined
OP1  A  B  C
OP2     A  B  C
OP3        A  B  C
Time →
Comparison
Unpipelined
• Time to finish 3 insts = 960 ps
• Each inst.’s latency is 320 ps
[Timeline: OP1, OP2, and OP3 back to back, 320 ps each]
3-Stage Pipelined
• Time to finish 3 insts = 120 * 5 = 600 ps
• But each inst.’s latency increases: 120 * 3 = 360 ps
OP1  A  B  C
OP2     A  B  C
OP3        A  B  C
Time →
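The arithmetic in this comparison can be reproduced with a short Python sketch. The function names are illustrative, and the model assumes an ideal pipeline with no stalls, in which one instruction enters every cycle.

```python
def cycle_time_ps(stage_delays_ps, reg_delay_ps):
    """The clock must cover the slowest stage plus the register delay."""
    return max(stage_delays_ps) + reg_delay_ps


def latency_ps(stage_delays_ps, reg_delay_ps):
    """One instruction spends one full cycle in every stage."""
    return cycle_time_ps(stage_delays_ps, reg_delay_ps) * len(stage_delays_ps)


def total_time_ps(num_insts, stage_delays_ps, reg_delay_ps):
    """Time until the last of num_insts instructions leaves the pipeline."""
    stages = len(stage_delays_ps)
    # The first instruction needs `stages` cycles; each later one finishes
    # one cycle after its predecessor.
    return (stages + num_insts - 1) * cycle_time_ps(stage_delays_ps, reg_delay_ps)


unpipelined = total_time_ps(3, [300], 20)           # 960 ps
pipelined = total_time_ps(3, [100, 100, 100], 20)   # 600 ps
```

The unpipelined design is treated as a one-stage pipeline, so the same formula covers both cases.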
Benefits of Pipelining
• Time to finish 3 insts = 960 ps
• Each inst.’s latency is 320 ps
[Figure: timeline of OP1, OP2, and OP3 along the time axis]
Pipeline Trade-offs
• Pros: Decrease the total execution time (Increase the “throughput”).
• Cons: Increase the latency of each instruction as new registers are
needed between pipeline stages.
[Figure: the 3-stage pipeline (three 100 ps logic stages, each followed by a 20 ps register) shown against the unpipelined design (300 ps of combinational logic followed by a 20 ps register)]
Throughput
• The rate at which the processor can finish executing an
instruction (at the steady state).
Inst 1  A  B  C
Inst 2     A  B  C
Inst 3        A  B  C
Inst 4           A  B  C
Inst 5              A  B  C
Time →
• Throughput of this 3-stage processor is 1 instruction every 120 ps, or 8.3 Giga (billion) Instructions per Second (GIPS).
OP1  A  B  C
OP2     A  B  C
OP3        A  B  C
Time →
[Figure: 3-stage pipeline with unequal stage delays: comb. logic A (50 ps), B (150 ps), and C (100 ps), each followed by a 20 ps register, driven by the clock]
• Cycle time: 170 ps
• Delay: 510 ps
• Throughput: 5.9 GIPS
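The numbers for the unbalanced pipeline follow from the slowest stage. A minimal Python sketch, with illustrative function names and assuming one instruction completes per cycle at steady state:

```python
def cycle_time_ps(stage_delays_ps, reg_delay_ps):
    """The slowest stage plus the register delay sets the clock period."""
    return max(stage_delays_ps) + reg_delay_ps


def throughput_gips(cycle_ps):
    """One instruction per cycle: 1 / (cycle_ps * 1e-12 s), expressed in 1e9 inst/s."""
    return 1000.0 / cycle_ps


stages = [50, 150, 100]                 # comb. logic A, B, C
cycle = cycle_time_ps(stages, 20)       # 150 + 20 = 170 ps
delay = cycle * len(stages)             # 3 * 170 = 510 ps
rate = throughput_gips(cycle)           # about 5.9 GIPS
```

Stage B alone determines the cycle time, so stages A and C sit idle for part of every cycle; with balanced 100 ps stages the same logic would run at about 8.3 GIPS.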
[Figure: pipeline with stage delays 50 ps, 100 ps, and 50 ps, each followed by a 20 ps register. Comb. logic A feeds two copies of comb. logic B (Copy 1 and Copy 2), each with its own register; a MUX controlled by a select signal chooses which copy's output goes to comb. logic C. A block labeled "What Logic?" is left as an open question.]
[Figure: stage-by-stage hardware structure connecting the instruction memory, PC increment, register file, ALU, CC, and data memory, with signals icode, ifun, rA, rB, valC, valP, srcA, srcB, valE, valM, Cnd, and newPC]
Fetch
■ Read instruction from instruction memory
Decode
■ Read program registers
Execute
Memory
Write Back
■ Write program registers
PC
■ Update program counter
OPq rA, rB
Fetch
icode:ifun ← M1[PC]    Read instruction byte
rA:rB ← M1[PC+1]       Read register byte
Fetch        valC ← M8[PC+1]            Read destination address
             valP ← PC+9                Fall through address
Decode
Execute      Cnd ← Cond(CC,ifun)        Take branch?
Memory
Write back
PC update    PC ← Cnd ? valC : valP     Update PC
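The stage computations for the conditional jump can be sketched in Python. The sketch assumes the Y86-64 encoding from the textbook, which the slides do not spell out: the jump function code sits in the low 4 bits of the instruction byte (0 = jmp, 1 = jle, 2 = jl, 3 = je, 4 = jne, 5 = jge, 6 = jg), and the 8-byte destination is stored little-endian. The names `cond` and `jump_next_pc` are illustrative.

```python
def cond(cc: dict, ifun: int) -> bool:
    """Cond(CC, ifun): decide whether the jump is taken."""
    zf, sf, of = cc["ZF"], cc["SF"], cc["OF"]
    table = {
        0: True,                             # jmp
        1: bool((sf ^ of) | zf),             # jle
        2: bool(sf ^ of),                    # jl
        3: bool(zf),                         # je
        4: not zf,                           # jne
        5: not (sf ^ of),                    # jge
        6: (not (sf ^ of)) and (not zf),     # jg
    }
    return table[ifun]


def jump_next_pc(memory: bytes, pc: int, cc: dict) -> int:
    # Fetch
    ifun = memory[pc] & 0xF                                   # assumed encoding
    val_c = int.from_bytes(memory[pc + 1:pc + 9], "little")   # valC <- M8[PC+1]
    val_p = pc + 9                                            # valP <- PC+9
    # Execute
    cnd = cond(cc, ifun)                                      # Cnd <- Cond(CC, ifun)
    # PC update
    return val_c if cnd else val_p                            # PC <- Cnd ? valC : valP
```

With the condition codes all zero, a je at address 0x016 falls through to 0x016 + 9 = 0x01f, which matches the not-taken jump in the earlier five-cycle trace.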
Pipeline Stages
Fetch
• Select current PC
• Read instruction
• Compute incremented PC
Decode
• Read program registers
Execute
• Operate ALU
Memory
• Read or write data memory
Write Back
• Update register file
Idea
• Divide process into independent stages
• Move objects through stages in sequence
• At any given time, multiple objects are being processed
Pipeline Illustration
Fetch → Reg → Decode → Reg → Execute → Reg → Memory → Reg → Write back → Reg
[Animation: Inst0 through Inst4 enter at Fetch one after another and advance one stage per clock cycle.]
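Which instruction occupies which stage in a given cycle can be computed directly. The following Python sketch assumes an ideal pipeline with no stalls, in which instruction i (counting from 0) enters Fetch in cycle i + 1; the function name `occupancy` is illustrative.

```python
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Write back"]


def occupancy(num_insts: int, cycle: int) -> dict:
    """Map each stage to the index of the instruction in it during `cycle`
    (cycles are 1-based), or None if the stage is empty."""
    result = {}
    for position, name in enumerate(STAGES):
        inst = cycle - 1 - position  # instruction i enters Fetch in cycle i + 1
        result[name] = inst if 0 <= inst < num_insts else None
    return result


# Cycle 2 with five instructions: Inst1 is in Fetch, Inst0 is in Decode.
snapshot = occupancy(5, 2)
```

From cycle 5 onward all five stages are busy until the instruction stream runs out, which is the steady state that the throughput figures refer to.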
Another Illustration
[Figure: snapshots of the 3-stage pipeline at times 239, 241, 300, and 359, showing OP1, OP2, and OP3 in stages A, B, and C relative to the clock]
Control Dependency
• Definition: Outcome of instruction A determines whether or not
instruction B should be executed.
• Jump instruction example below:
• jne L1 determines whether irmovq $1, %rax should be
executed
• But jne doesn’t know its outcome until after its Execute stage
[Figure: pipeline diagram of the jump example over cycles 1 to 9]
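One way to see the cost of this dependency is to count the cycles lost if the pipeline simply waits for the jump to resolve. The Python sketch below assumes that stalling policy and the five-stage pipeline from the earlier slides; it is one possible handling, not necessarily the one the lecture adopts, and the function names are illustrative.

```python
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Write back"]


def next_fetch_cycle(jump_fetch_cycle: int, resolve_stage: str = "Execute") -> int:
    """Earliest cycle in which the instruction after the jump can be fetched
    if fetch waits until the jump has finished `resolve_stage`."""
    resolve_cycle = jump_fetch_cycle + STAGES.index(resolve_stage)
    return resolve_cycle + 1


def wasted_cycles(jump_fetch_cycle: int, resolve_stage: str = "Execute") -> int:
    # Without the dependency, the next instruction would be fetched
    # in the cycle right after the jump's own fetch.
    return next_fetch_cycle(jump_fetch_cycle, resolve_stage) - (jump_fetch_cycle + 1)
```

Under these assumptions, a jne fetched in cycle 1 reaches Execute in cycle 3, so the following instruction cannot be fetched before cycle 4 and two fetch slots go unused.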
Delay Slots
[Figure: pipeline diagram over cycles 1 to 9]
if (cond) {
    do_A();
} else {
    do_B();
}
do_C();
Have to make sure do_C doesn’t depend on do_A and do_B!!!
Delay Slots
[Figure: pipeline diagram over cycles 1 to 9]
A less obvious example
do_C();            add A, B
if (cond) {        or C, D
    do_A();        sub E, F
} else {           jle 0x200
    do_B();        add A, C
}