COA Module4
CSE 2009
In cases where an instruction occupies more than one word, steps 1 and 2 must
be repeated as many times as necessary to fetch the complete instruction.
These two steps are usually referred to as the fetch phase, and step 3 constitutes
the execution phase.
Monday, July 8, 2024
Processor Organization (Single BUS)
To study these operations in detail, let us see how processor (ALU) and all the
The data and address lines of the external memory bus are connected to the internal
processor bus via the memory data register, MDR, and the memory address register,
MAR, respectively.
Data may be loaded into MDR either from the memory bus or from the internal processor
bus.
The input of MAR is connected to the internal bus, and its output is connected to the
external bus.
Figure 1: Single-bus organization of the datapath inside the CPU
1. Transfer a word of data from one processor register to another or to the ALU. (REGISTER TRANSFERS)
2. Perform an arithmetic or logic operation and store the result in a processor register. (ARITHMETIC OR LOGIC OPERATION)
3. Fetch the contents of a given memory location and load them into a processor register. (FETCHING)
4. Store a word of data from a processor register into a given memory location. (STORING)
1. R1out, R4in
[Figure: input and output gating for register Ri (signals Riin and Riout), with Yin, Constant 4, Select, MUX, ALU inputs A and B, and Zin]
For each register, two control signals are used: one places the contents of that
register on the bus, and the other loads the data on the bus into the register (symbolically,
Riout and Riin).
The input and output of register Ri are connected to the bus via switches controlled by these signals.
When Riin is set to 1, the data on the bus are loaded into Ri.
Similarly, when Riout is set to 1, the contents of register Ri are placed on the bus.
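The gating described above can be sketched in a few lines of Python. This is a toy model, not the actual hardware: the class name `SingleBus`, the method `step`, and the register values are assumptions made for illustration. One call to `step` stands for one control step in which a single register drives the bus and any number of registers latch it.

```python
class SingleBus:
    """Toy model of one internal bus shared by gated registers."""

    def __init__(self, names):
        self.regs = {name: 0 for name in names}

    def step(self, out, ins):
        """One control step: `out` drives the bus, each register in `ins` latches it."""
        bus = self.regs[out]       # Riout = 1: contents of Ri are placed on the bus
        for name in ins:           # Riin = 1: data on the bus are loaded into Ri
            self.regs[name] = bus
        return bus


cpu = SingleBus(["R1", "R4"])
cpu.regs["R1"] = 42                # illustrative value
cpu.step(out="R1", ins=["R4"])     # control step: R1out, R4in
print(cpu.regs["R4"])              # 42
```

Only one `out` register is accepted per step because a single bus can carry only one value at a time.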
CONTROL SEQUENCE
1. R1out, Yin
3. Zout, R3in
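A minimal sketch of this sequence, assuming it adds R1 and R2 and places the sum in R3. Step 2 does not appear in the listing above, so the middle step in the code is an assumption (R2 driven onto the bus as ALU input B, Y supplying input A, and the sum latched into Z). The register values are made up.

```python
regs = {"R1": 5, "R2": 7, "R3": 0, "Y": 0, "Z": 0}   # illustrative values

# Step 1: R1out, Yin (the first operand is latched into Y)
regs["Y"] = regs["R1"]

# Step 2 (assumed, not shown in the listing): R2 drives the bus to ALU input B,
# Y feeds input A, the ALU adds, and the result is latched into Z
regs["Z"] = regs["Y"] + regs["R2"]

# Step 3: Zout, R3in (the result moves from Z into R3)
regs["R3"] = regs["Z"]

print(regs["R3"])   # 12
```

Registers Y and Z are needed because the single bus can deliver only one operand per step and cannot carry the ALU output in the same step as an ALU input.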
1. MAR ← [R1]
2. Start a Read operation on the memory bus
3. Wait for the MFC (Memory Function Completed) response from the memory
4. Load MDR from the memory bus
5. R2 ← [MDR]
2. MDRinE, WMFC
3. MDRout, R2in
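The five-step read above can be sketched as follows. The function `read_memory`, its `latency` parameter, and the memory contents are hypothetical; the loop only stands in for the processor waiting on MFC and does not model real bus timing.

```python
def read_memory(memory, address, latency=3):
    """Hypothetical memory model: returns the word and the cycles spent waiting for MFC."""
    cycles_waited = 0
    mfc = False
    while not mfc:                     # WMFC: hold the sequence until the memory responds
        cycles_waited += 1
        mfc = cycles_waited >= latency
    return memory[address], cycles_waited


memory = {0x100: 77}                   # illustrative contents
regs = {"R1": 0x100, "R2": 0, "MAR": 0, "MDR": 0}

regs["MAR"] = regs["R1"]                                 # 1. MAR <- [R1]
regs["MDR"], waited = read_memory(memory, regs["MAR"])   # 2 to 4. Read, wait for MFC, MDRinE
regs["R2"] = regs["MDR"]                                 # 5. R2 <- [MDR] (MDRout, R2in)

print(regs["R2"], waited)              # 77 3
```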
CONTROL SEQUENCE
1. R1out, MARin
3. MDRoutE, WMFC
FETCH PHASE
In step 1, the instruction fetch operation is initiated by loading the contents of the
PC into the MAR and sending a read request to the memory.
The Select signal is set to select the constant 4.
This value is added to the operand at input B, which is the contents of the PC,
and the result is stored in register Z.
The updated value is moved from register Z back into the PC during step 2,
while waiting for the memory to respond.
In step 3, the word fetched from the memory is loaded into the IR.
Steps 1 to 3 constitute the instruction fetch phase, which is the same for
all instructions.
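The three fetch steps can be traced with a small sketch. It assumes a memory modelled as a dictionary keyed by address and 4-byte instructions; the function name `fetch_phase`, the address, and the instruction word are made up for illustration.

```python
def fetch_phase(state, memory):
    """Three-step instruction fetch on the single-bus datapath (simplified)."""
    # Step 1: PC -> MAR and start the read; Select picks constant 4,
    # the ALU adds it to the PC value at input B, and the result goes to Z
    state["MAR"] = state["PC"]
    state["Z"] = state["PC"] + 4
    # Step 2: Z -> PC while waiting for the memory to respond
    state["PC"] = state["Z"]
    state["MDR"] = memory[state["MAR"]]   # memory has responded; MDR holds the fetched word
    # Step 3: MDR -> IR
    state["IR"] = state["MDR"]
    return state


memory = {0x1000: 0x12345678}   # illustrative instruction word at a made-up address
state = {"PC": 0x1000, "MAR": 0, "MDR": 0, "Z": 0, "IR": 0}
fetch_phase(state, memory)
print(hex(state["PC"]), hex(state["IR"]))   # 0x1004 0x12345678
```

The PC update overlaps with the memory access, which is why the fetch phase needs three steps rather than four.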
This discussion accounts for all control signals in Figure 5 except Yin in step 2. There is no
need to copy the updated contents of the PC into register Y when executing the Add instruction. However,
in branch instructions, the updated value of the PC is needed to compute the branch target
address. To speed up the execution of branch instructions, this value is copied into register Y in
step 2.
ideas.
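The resulting control sequences in Figures 5 and 6 are quite long because only one
data item can be transferred over the single bus in a clock cycle. The number of steps can be
reduced by providing multiple internal paths that enable several transfers to take place in parallel.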
If needed, the ALU may simply pass one of its two input operands unmodified
to bus C. We will call the ALU control signals for such an operation R=A or R=B.
Using the Incrementer eliminates the need to add 4 to the PC using the main
ALU.
The source for the constant 4 at the ALU input multiplexer is still useful. It
can be used to increment other addresses such as the memory addresses in load
Figure: Control sequence for Add R4, R5, R6 using the three-bus organization
Providing more paths for data transfer significantly reduces the number of clock
cycles needed to execute an instruction.
END of Module 4 Part A: BPU
• Laundry Example
• Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Sequential laundry takes 6 hours for 4 loads
• If they learned pipelining, how long would laundry take?
[Figure: timeline from 6 PM to midnight; loads A, B, C, D run one after another, each in stages of 30, 40, and 20 minutes]
Traditional Pipeline Concept
• Pipelined laundry takes 3.5 hours for 4 loads
[Figure: timeline from 6 PM to midnight; loads A, B, C, D in task order, overlapped, with intervals of 30, 40, 40, 40, 40, and 20 minutes]
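The two laundry totals can be checked with a short calculation. The stage times of 30, 40, and 20 minutes are read off the figure timeline; the function names are made up for this sketch, and it assumes every load passes through the same stages in the same order.

```python
def sequential_minutes(stage_times, loads):
    """Each load finishes all stages before the next load starts."""
    return loads * sum(stage_times)


def pipelined_minutes(stage_times, loads):
    """The first load passes through every stage; after that, one load
    completes per interval of the slowest stage."""
    return sum(stage_times) + (loads - 1) * max(stage_times)


stages = [30, 40, 20]   # wash, dry, fold (minutes), as read from the figure
print(sequential_minutes(stages, 4))   # 360 minutes = 6 hours
print(pipelined_minutes(stages, 4))    # 210 minutes = 3.5 hours
```

The pipelined total is set by the slowest stage (40 minutes), so speeding up the other stages would not shorten it much.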
Traditional Pipeline Concept
Fetch + Execution
(a) Sequential execution
Time:  F1 E1 F2 E2 F3 E3  (instructions I1, I2, I3)
Hardware: Instruction fetch unit, Interstage buffer B1, Execution unit
(c) Pipelined execution
Clock cycle   1    2    3    4
I1            F1   E1
I2                 F2   E2
I3                      F3   E3
Fetch + Decode + Execution + Write
Instruction
I1   F1   D1   E1   W1
I2        F2   D2   E2   W2
I3             F3   D3   E3   W3
I4                  F4   D4   E4   W4
F: Fetch instruction
D: Decode instruction and fetch operands
E: Execute operation
W: Write results
Interstage buffers: B1, B2, B3
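The overlap in the four-stage diagram above can be generated with a short sketch. It assumes an ideal pipeline in which every stage takes exactly one clock cycle and there are no stalls; the function name `ideal_schedule` is made up. Under that assumption, n instructions in a k-stage pipeline finish in k + n - 1 cycles.

```python
STAGES = ["F", "D", "E", "W"]


def ideal_schedule(num_instructions, stages=STAGES):
    """Map each clock cycle to the stage operations active in that cycle."""
    schedule = {}
    for i in range(1, num_instructions + 1):
        for s, stage in enumerate(stages):
            cycle = i + s   # instruction i occupies stage s in cycle i + s (1-based)
            schedule.setdefault(cycle, []).append(f"{stage}{i}")
    return schedule


sched = ideal_schedule(4)
for cycle in sorted(sched):
    print(cycle, " ".join(sched[cycle]))
print(max(sched))   # 7 cycles in total, i.e. 4 stages + 4 instructions - 1
```

In cycle 4 all four stages are busy at once (W1, E2, D3, F4), which is the steady state the pipeline aims for.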
Time
Clock cycle 1 2 3 4 5 6 7 8 9
Instruction
I1 F1 D1 E1 W1
I2 F2 D2 E2 W2
I3 F3 D3 E3 W3
I4 F4 D4 E4 W4
I5 F5 D5 E5
Figure 8.3. Effect of an execution operation taking more than one clock cycle.
Pipeline Performance
Instruction hazard
Clock cycle   1    2    3    4    5    6    7    8    9
Instruction
I1            F1   D1   E1   W1
I2                 F2   F2   F2   F2   D2   E2   W2
I3                                     F3   D3   E3   W3

Clock cycle   1    2    3    4    5    6    7    8    9
Stage
F: Fetch      F1   F2   F2   F2   F2   F3
D: Decode          D1   idle idle idle D2   D3
E: Execute              E1   idle idle idle E2   E3
Idle periods: stalls (bubbles)
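The stall pattern in the diagram above can be reproduced with a simplified scheduling sketch. It assumes an in-order pipeline in which a stage operation starts only after the same instruction has left the previous stage and the previous instruction has left this stage. The function name `schedule` is made up, and the 4-cycle fetch for I2 is read off the stage row "F2 F2 F2 F2".

```python
def schedule(durations):
    """durations[i][s] = cycles instruction i spends in stage s.
    Returns finish[i][s], the clock cycle in which that stage operation completes."""
    finish = []
    for i, row in enumerate(durations):
        done = []
        for s, d in enumerate(row):
            own_previous_stage = done[s - 1] if s > 0 else 0
            stage_freed = finish[i - 1][s] if i > 0 else 0
            done.append(max(own_previous_stage, stage_freed) + d)
        finish.append(done)
    return finish


# Stages F, D, E, W; the fetch of I2 takes 4 cycles instead of 1
stalled = schedule([[1, 1, 1, 1], [4, 1, 1, 1], [1, 1, 1, 1]])
ideal = schedule([[1, 1, 1, 1]] * 3)
print(stalled[-1][-1], ideal[-1][-1])   # 9 6
```

The three extra fetch cycles show up as three lost cycles overall, matching the three idle slots in the Decode and Execute rows.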
Structural hazard (Load X(R1), R2)
Time
Clock cycle 1 2 3 4 5 6 7
Instruction
I1 F1 D1 E1 W1
I2 (Load) F2 D2 E2 M2 W2
I3 F3 D3 E3 W3
I4 F4 D4 E4
I5 F5 D5