COA Module4

Uploaded by

muthukumar

PRESIDENCY UNIVERSITY, BENGALURU, School of Engineering

Computer Organization and Architecture

CSE 2009

Module-4: Part A - Basic Processing Unit (BPU)

Monday, July 8, 2024 1


Fundamental Concepts
• The processor fetches one instruction at a time and performs the operation specified.
• Instructions are fetched from successive memory locations until a branch or a jump instruction is encountered.
• The processor keeps track of the address of the memory location containing the next instruction to be fetched using the Program Counter (PC).
• After fetching an instruction, the contents of the PC are updated to point to the next instruction in the sequence.
• Another key register in the processor is the Instruction Register (IR).
• Suppose each instruction comprises 4 bytes and is stored in one memory word.
• To execute an instruction, the processor has to perform the following 3 steps:

Executing an Instruction
1. Fetch the contents of the memory location pointed to by the PC. The contents of this location are loaded into the IR (fetch phase).
IR ← [[PC]]
2. Assuming that the memory is byte addressable, increment the contents of the PC by 4 (fetch phase).
PC ← [PC] + 4
3. Carry out the actions specified by the instruction in the IR (execution phase).

In cases where an instruction occupies more than one word, steps 1 and 2 must be repeated as many times as necessary to fetch the complete instruction.

Steps 1 and 2 are usually referred to as the fetch phase, and step 3 constitutes the execution phase.
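The three steps can be sketched as a small loop. This is only an illustration: the memory is modeled as a Python dict keyed by byte address, and the two opcodes (`ADDI`, `HALT`) and the accumulator are made-up names, not part of the slides.

```python
# Minimal sketch of the fetch phase (steps 1 and 2) and execution phase (step 3).
# Assumptions: 4-byte instructions, byte-addressable memory modeled as a dict,
# and a hypothetical two-opcode instruction set used only for illustration.

WORD_SIZE = 4  # bytes per instruction word

def run(memory, pc=0):
    """Run until a HALT instruction; return the final PC and accumulator."""
    acc = 0
    while True:
        ir = memory[pc]          # step 1: IR <- [[PC]]
        pc = pc + WORD_SIZE      # step 2: PC <- [PC] + 4
        opcode, operand = ir     # step 3: carry out the action held in IR
        if opcode == "ADDI":
            acc += operand
        elif opcode == "HALT":
            return pc, acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")

program = {0: ("ADDI", 5), 4: ("ADDI", 7), 8: ("HALT", 0)}
final_pc, acc = run(program)
print(final_pc, acc)  # 12 12
```

Note that the PC already points past the HALT word when the loop stops, because the increment happens during the fetch phase, before the instruction is executed.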
Processor Organization (Single BUS)
To study these operations in detail, let us see how the ALU and all the registers of the processor are interconnected via a single common bus.

The data and address lines of the external memory bus are connected to the internal processor bus via the memory data register, MDR, and the memory address register, MAR, respectively.

• Register MDR has two inputs and two outputs.
• Data may be loaded into MDR either from the memory bus or from the internal processor bus.
• The data stored in MDR may be placed on either bus.
• The input of MAR is connected to the internal bus, and its output is connected to the external bus.
Figure 1: Single-bus organization of the datapath inside the CPU

• The control lines of the memory bus are connected to the instruction decoder and control logic.
• This unit is responsible for issuing the signals that control the operation of all the units inside the processor and for interacting with the memory bus.
• The number and use of the processor registers R0 through R(n-1) vary considerably from one processor to another.
• These registers may be general-purpose registers used by the programmer or special-purpose registers dedicated to a particular use by the processor, such as index registers or stack pointers.
• The three registers Y, Z, and TEMP in the figure are used by the processor for temporary storage during the execution of an instruction.
• The MUX selects either the output of register Y or the constant value 4 to be provided as input A of the ALU.
• The constant 4 is used to increment the contents of the program counter.
Executing an Instruction

With few exceptions, an instruction can be executed by performing one or more of the following operations:

1. Transfer a word of data from one processor register to another or to the ALU. (REGISTER TRANSFERS)
2. Perform an arithmetic or a logic operation and store the result in a processor register. (ARITHMETIC OR LOGIC OPERATION)
3. Fetch the contents of a given memory location and load them into a processor register. (FETCHING)
4. Store a word of data from a processor register into a given memory location. (STORING)
1. REGISTER TRANSFERS

Example: MOVE R1, R4

Suppose we wish to transfer the contents of register R1 to register R4. This can be accomplished as follows:
• Enable the output of register R1 by setting R1out to 1. This places the contents of R1 on the processor bus.
• Enable the input of register R4 by setting R4in to 1. This loads data from the processor bus into register R4.

CONTROL SEQUENCE

1. R1out, R4in
Figure: Input and output gating for the registers on the internal processor bus (signals Riin, Riout, Yin, Zin, Zout; constant 4 and Select at the MUX; ALU inputs A and B)

Instruction execution involves a sequence of steps in which data are transferred from one register to another.

• For each register, two control signals are used: one to place the contents of that register on the bus, and one to load the data on the bus into the register (represented symbolically in the figure above).
• The input and output of register Ri are connected to the bus via switches controlled by the signals Riin and Riout, respectively.
• When Riin is set to 1, the data on the bus are loaded into Ri.
• Similarly, when Riout is set to 1, the contents of register Ri are placed on the bus.
2. PERFORM AN ARITHMETIC OR LOGIC OPERATION

Example: ADD R1, R2, R3

• The ALU is a combinational circuit that has no internal storage.
• The ALU gets its two operands from the MUX (input A) and from the bus (input B). The result is temporarily stored in register Z.
• The sequence of operations to add the contents of register R1 to those of R2 and store the result in R3 is shown below:

CONTROL SEQUENCE

1. R1out, Yin

2. R2out, SelectY, Add, Zin

3. Zout, R3in
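The three control steps can be traced with a small model of the single bus. This is a sketch under simplifying assumptions: the class name `SingleBusDatapath` and its `step` method are illustrative, only one register may drive the bus per step, and the ALU supports only addition.

```python
# Sketch of a single internal bus: in each control step exactly one register
# drives the bus (its "out" signal) and any number of registers latch the bus
# value (their "in" signals). Class and method names are illustrative.

class SingleBusDatapath:
    def __init__(self, **initial):
        self.reg = {"R1": 0, "R2": 0, "R3": 0, "R4": 0, "Y": 0, "Z": 0}
        self.reg.update(initial)

    def step(self, out, ins=(), add=False):
        bus = self.reg[out]                      # e.g. R1out places R1 on the bus
        if add:                                  # SelectY, Add, Zin
            self.reg["Z"] = self.reg["Y"] + bus  # input A = Y (via MUX), input B = bus
        for name in ins:                         # e.g. Yin or R3in loads from the bus
            self.reg[name] = bus

dp = SingleBusDatapath(R1=10, R2=32)
dp.step(out="R1", ins=["Y"])    # 1. R1out, Yin
dp.step(out="R2", add=True)     # 2. R2out, SelectY, Add, Zin
dp.step(out="Z", ins=["R3"])    # 3. Zout, R3in
print(dp.reg["R3"])             # 42

dp.step(out="R1", ins=["R4"])   # MOVE R1, R4 is the single step R1out, R4in
print(dp.reg["R4"])             # 10
```

Three steps are needed for the addition because the bus can carry only one operand at a time: R1 must first be parked in Y before R2 can be placed on the bus.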
3. FETCHING A WORD FROM MEMORY
Example: Move (R1), R2
The actions needed to execute this instruction are:

1. MAR ← [R1]
2. Start a Read operation on the memory bus
3. Wait for the MFC (Memory Function Completed) response from the memory
4. Load MDR from the memory bus
5. R2 ← [MDR]
The connections and control signals for register MDR are shown below:

Figure: Connections for register MDR
CONTROL SEQUENCE

1. R1out, MARin, Read

2. MDRinE, WMFC

3. MDRout, R2in
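The effect of WMFC on timing can be sketched as follows. The fixed memory latency of three cycles is an assumption made only for this example; real memories signal MFC after a variable delay, and the function name `move_indirect` is hypothetical.

```python
# Sketch of Move (R1), R2 on the single-bus datapath. Memory latency is modeled
# as a fixed number of clock cycles (an assumption); WMFC makes the control
# unit wait in step 2 until the memory asserts MFC.

MEMORY_LATENCY = 3  # assumed cycles before MFC is asserted

def move_indirect(reg, memory):
    """Execute Move (R1), R2; return the number of clock cycles used."""
    cycles = 0
    # 1. R1out, MARin, Read
    reg["MAR"] = reg["R1"]
    cycles += 1
    # 2. MDRinE, WMFC: wait for the memory, then latch the data from the memory bus
    cycles += MEMORY_LATENCY
    reg["MDR"] = memory[reg["MAR"]]
    # 3. MDRout, R2in
    reg["R2"] = reg["MDR"]
    cycles += 1
    return cycles

reg = {"R1": 0x100, "R2": 0, "MAR": 0, "MDR": 0}
memory = {0x100: 99}
cycles = move_indirect(reg, memory)
print(reg["R2"], cycles)  # 99 5
```

The control sequence has three steps, but step 2 stretches over as many clock cycles as the memory needs, which is why the cycle count exceeds the step count.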
4. STORING A WORD INTO MEMORY
Example: Move R2, (R1)

• Writing a word into a memory location follows a similar procedure.
• The desired address is loaded into MAR.
• Then, the data to be written are loaded into MDR, and a write command is issued.

CONTROL SEQUENCE

1. R1out, MARin

2. R2out, MDRin, Write

3. MDRoutE, WMFC

Storing into memory
EXECUTION OF A COMPLETE INSTRUCTION

Let us now put together the sequence of elementary operations required to execute one instruction.
Consider the instruction
ADD (R3), R1

which adds the contents of the memory location pointed to by R3 to register R1.

Executing this instruction requires the following actions:
1. Fetch the instruction.
2. Fetch the first operand (the contents of the memory location pointed to by R3).
3. Perform the addition.
4. Load the result into R1.
Control Sequence
ADD (R3), R1

Instruction execution proceeds as follows:

FETCH PHASE
• In step 1, the instruction fetch operation is initiated by loading the contents of the PC into the MAR and sending a read request to the memory.
• The Select signal is set to select the constant 4.
• This value is added to the operand at input B, which is the contents of the PC, and the result is stored in register Z.
• The updated value is moved from register Z back into the PC during step 2, while waiting for the memory to respond.
• In step 3, the word fetched from the memory is loaded into the IR.
• Steps 1 to 3 constitute the instruction fetch phase, which is the same for all instructions.
EXECUTE PHASE
• The instruction is decoded, and the control circuitry activates the control signals for steps 4 through 7, which constitute the execution phase.
• The contents of register R3 are transferred to MAR in step 4, and a memory read operation is initiated.
• Then the contents of R1 are transferred to register Y in step 5, to prepare for the addition operation.
• When the read operation is completed, the memory operand is available in register MDR, and the addition operation is performed in step 6.
• The addition is performed by the ALU and the sum is stored in register Z, and then transferred to R1 in step 7.
• The END signal causes a new instruction fetch cycle to begin by returning to step 1.
NOTE

This discussion accounts for all control signals in figure 5 except Yin in step 2. There is no need to copy the updated contents of the PC into register Y when executing the Add instruction. In branch instructions, however, the updated value of the PC is needed to compute the branch target address. To speed up the execution of branch instructions, this value is copied into register Y in step 2.
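
The seven steps described above can be traced with a tiny interpreter. The signal lists in `SEQUENCE` are reconstructed from the step-by-step description (the control-sequence figure itself is not reproduced in this text), so they should be read as an assumption; the function name `execute` and the dict-based memory are illustrative.

```python
# Sketch: interpreter for single-bus control steps for ADD (R3), R1.
# The signal lists are reconstructed from the prose description, not copied
# from the original figure, so treat them as an assumption.

SEQUENCE = [
    "PCout MARin Read Select4 Add Zin",   # 1: start fetch, compute PC + 4
    "Zout PCin Yin WMFC",                 # 2: update PC (and Y), wait for memory
    "MDRout IRin",                        # 3: load instruction into IR
    "R3out MARin Read",                   # 4: start operand read
    "R1out Yin WMFC",                     # 5: R1 into Y, wait for memory
    "MDRout SelectY Add Zin",             # 6: Z <- Y + memory operand
    "Zout R1in End",                      # 7: result into R1
]

def execute(reg, memory, sequence=SEQUENCE):
    pending = None
    for step in sequence:
        signals = step.split()
        # Exactly one register drives the bus in each step.
        source = next(s[:-3] for s in signals if s.endswith("out"))
        bus = reg[source]
        if "Add" in signals:                      # Zin is handled with Add
            a = 4 if "Select4" in signals else reg["Y"]
            reg["Z"] = a + bus
        for s in signals:
            if s.endswith("in") and s != "Zin":
                reg[s[:-2]] = bus                 # e.g. MARin, PCin, Yin, IRin
        if "Read" in signals:
            pending = memory[bus]                 # address on the bus goes to MAR
        if "WMFC" in signals:
            reg["MDR"] = pending                  # data arrives while waiting for MFC
    return reg

reg = {"PC": 0, "MAR": 0, "MDR": 0, "IR": None, "Y": 0, "Z": 0, "R1": 17, "R3": 0x200}
memory = {0: "ADD (R3), R1", 0x200: 25}
execute(reg, memory)
print(reg["R1"], reg["PC"], reg["IR"])  # 42 4 ADD (R3), R1
```

In this trace Y briefly holds the updated PC after step 2 and is then overwritten with R1 in step 5, which matches the remark that the copy into Y is only useful for branch instructions.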

MULTIPLE-BUS ORGANIZATION

• We used the simple single-bus architecture of figure 1 to illustrate the basic ideas.
• The resulting control sequences in figures 5 and 6 are quite long because only one data item can be transferred over the bus in a clock cycle.
• To reduce the number of steps needed, most commercial processors provide multiple internal paths that enable several transfers to take place in parallel.
Figure: Three-bus organization of the data path
MULTIPLE-BUS ORGANIZATION
• The figure above depicts a three-bus structure used to connect the registers and ALU of a processor.
• All general-purpose registers are combined into a single block called the register file.
• The register file in figure 7 is said to have three ports. There are two outputs, allowing the contents of two different registers to be accessed simultaneously and placed on buses A and B. The third port allows the data on bus C to be loaded into a third register during the same clock cycle.
• Buses A and B are used to transfer the source operands to the A and B inputs of the ALU, where an arithmetic or logic operation may be performed.
• The result is transferred to the destination over bus C.
Contd…

• If needed, the ALU may simply pass one of its two input operands unmodified to bus C. We will call the ALU control signals for such an operation R=A or R=B.
• Another feature in figure 7 is the introduction of the Incrementer unit, which is used to increment the PC by 4.
• Using the Incrementer eliminates the need to add 4 to the PC using the main ALU.
• The source for the constant 4 at the ALU input multiplexer is still useful. It can be used to increment other addresses, such as the memory addresses in load multiple and store multiple instructions.
Control Sequence

// PC to MAR through input B of ALU

// Wait for memory function to complete

// MDR to IR through input B of ALU

Figure: Control sequence for Add R4, R5, R6 using the three-bus organization
Consider the three-operand instruction

ADD R4, R5, R6

• The control sequence for executing this instruction is given in figure 8.
• In step 1, the contents of the PC are passed through the ALU, using the R=B control signal, and loaded into the MAR to start a memory read operation. (PC to MAR)
• At the same time, the PC is incremented by 4.
• In step 2, the processor waits for MFC (Memory Function Completed) and loads the data received into MDR.
• In step 3, the data are transferred from MDR to IR.
• Finally, the execution phase of the instruction requires only one control step to complete, step 4.
• By providing more paths for data transfer, a significant reduction in the number of clock cycles needed to execute an instruction is achieved.
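The single-step execution phase is possible because the register file can be read twice and written once in the same cycle. The sketch below models that behavior; the class name `RegisterFile`, the `cycle` method, and the signal names in the comments (R4outA, R5outB, R6in) are assumptions for illustration and are not taken from the figure.

```python
# Sketch of a three-port register file: two read ports drive buses A and B,
# and one write port loads from bus C, all within the same clock cycle.
# Class, method, and signal names are illustrative assumptions.

class RegisterFile:
    def __init__(self, size=8):
        self.regs = [0] * size

    def cycle(self, src_a, src_b, dest, alu_op):
        bus_a = self.regs[src_a]        # e.g. R4outA
        bus_b = self.regs[src_b]        # e.g. R5outB
        bus_c = alu_op(bus_a, bus_b)    # ALU result is placed on bus C
        self.regs[dest] = bus_c         # e.g. R6in
        return bus_c

rf = RegisterFile()
rf.regs[4], rf.regs[5] = 30, 12

rf.cycle(4, 5, 6, lambda a, b: a + b)   # execution phase of the Add: one control step
print(rf.regs[6])                       # 42

pass_b = lambda a, b: b                 # the R=B pass-through operation
rf.cycle(0, 6, 7, pass_b)               # copy R6 to R7 through the ALU
print(rf.regs[7])                       # 42
```

On the single-bus datapath the same addition needed three control steps, because each operand and the result had to cross the one bus in turn.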
END of Module 4 Part A: BPU

Module-4: Part B - Pipelining


MODULE 4B. Pipelining

• Pipelining is widely used in modern processors.
• Pipelining improves system performance in terms of throughput.
• Pipelined organization requires sophisticated compilation techniques.
Basic Concepts

Making the Execution of Programs Faster

• Use faster circuit technology to build the processor and the main memory.
• Arrange the hardware so that more than one operation can be performed at the same time.
• In the latter way, the number of operations performed per second is increased even though the elapsed time needed to perform any one operation is not changed.
Traditional Pipeline Concept

• Laundry Example
• Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• “Folder” takes 20 minutes

Traditional Pipeline Concept

[Timing diagram, 6 PM to midnight: loads A, B, C, D processed one after another, each taking 30 + 40 + 20 minutes]
• Sequential laundry takes 6 hours for 4 loads
• If they learned pipelining, how long would laundry take?

Traditional Pipeline Concept

[Timing diagram, 6 PM onward: loads A, B, C, D in task order, overlapped; successive intervals of 30, 40, 40, 40, 40, 20 minutes]
• Pipelined laundry takes 3.5 hours for 4 loads
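
The two totals can be checked with a short calculation. The stage times and the number of loads come from the slide; the scheduling model (a load may wait between machines until the next machine is free) is a simplifying assumption, and the function names are illustrative.

```python
# Laundry example: 4 loads through washer (30 min), dryer (40 min), folder (20 min).
# Assumption: a finished load can wait between machines until the next one is free.

STAGES = [30, 40, 20]   # minutes per stage, from the slide
LOADS = 4

def sequential_time(stages, loads):
    """Each load finishes all stages before the next load starts."""
    return loads * sum(stages)

def pipelined_time(stages, loads):
    """Loads overlap; each machine handles one load at a time."""
    finish = [0] * len(stages)      # time at which each machine last became free
    for _ in range(loads):
        ready = 0                   # time at which this load leaves its previous stage
        for s, duration in enumerate(stages):
            start = max(ready, finish[s])   # wait for the load and for the machine
            finish[s] = start + duration
            ready = finish[s]
    return finish[-1]

print(sequential_time(STAGES, LOADS) / 60)   # 6.0 hours
print(pipelined_time(STAGES, LOADS) / 60)    # 3.5 hours
```

The pipelined total is 30 + 4 × 40 + 20 = 210 minutes: once the pipeline is full, one load completes every 40 minutes, the time of the slowest stage.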
Traditional Pipeline Concept

• Pipelining doesn’t help latency of single task, it helps throughput of entire workload
• Pipeline rate limited by slowest pipeline stage
• Multiple tasks operating simultaneously using different resources
• Potential speedup = Number pipe stages
• Unbalanced lengths of pipe stages reduces speedup
• Time to “fill” pipeline and time to “drain” it reduces speedup
• Stall for Dependences
Use the Idea of Pipelining in a Computer

Fetch + Execution

(a) Sequential execution: I1, I2, I3 are processed one after another as F1 E1, F2 E2, F3 E3

(b) Hardware organization: Instruction fetch unit → Interstage buffer B1 → Execution unit

(c) Pipelined execution

Clock cycle    1    2    3    4
I1             F1   E1
I2                  F2   E2
I3                       F3   E3

Figure 8.1. Basic idea of instruction pipelining.
Use the Idea of Pipelining in a Computer

Fetch + Decode + Execution + Write

Clock cycle    1    2    3    4    5    6    7
I1             F1   D1   E1   W1
I2                  F2   D2   E2   W2
I3                       F3   D3   E3   W3
I4                            F4   D4   E4   W4

(a) Instruction execution divided into four steps

F: Fetch instruction → B1 → D: Decode instruction and fetch operands → B2 → E: Execute operation → B3 → W: Write results
(B1, B2, B3 are interstage buffers)

(b) Hardware organization

Textbook page: 457

Figure 8.2. A 4-stage pipeline.
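
The timing in part (a) follows a simple rule: in an ideal pipeline, instruction i occupies stage s (counting from 0) during clock cycle i + s. The sketch below generates the chart from that rule; it assumes every stage takes exactly one cycle and there are no stalls, and the function names are illustrative.

```python
# Sketch: timing chart for an ideal 4-stage pipeline (one cycle per stage,
# no stalls). Function names are illustrative.

STAGES = ["F", "D", "E", "W"]

def pipeline_chart(n_instructions, stages=STAGES):
    """Return {instruction: {clock_cycle: label}} for an ideal pipeline."""
    chart = {}
    for i in range(1, n_instructions + 1):
        # Instruction i enters stage s (0-based) in clock cycle i + s.
        chart[f"I{i}"] = {i + s: f"{name}{i}" for s, name in enumerate(stages)}
    return chart

def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles to fill the pipeline plus one cycle per remaining instruction."""
    return n_stages + n_instructions - 1

chart = pipeline_chart(4)
for instr, row in chart.items():
    cells = [row.get(c, "  ") for c in range(1, total_cycles(4) + 1)]
    print(instr, " ".join(cells))
print(total_cycles(4))  # 7
```

Without pipelining the same four instructions would need 4 × 4 = 16 cycles, so the speedup here is 16/7, which approaches 4 (the number of stages) only as the number of instructions grows.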


Role of Cache Memory

• Each pipeline stage is expected to complete in one clock cycle.
• The clock period should be long enough to let the slowest pipeline stage complete.
• Faster stages can only wait for the slowest one to complete.
• Since main memory is very slow compared to execution, a pipeline would be almost useless if each instruction had to be fetched from main memory.
• Fortunately, we have cache.
Pipeline Performance

• The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.
• However, this increase would be achieved only if all pipeline stages require the same time to complete, and there is no interruption throughout program execution.
• Unfortunately, this is not true.
Pipeline Performance

Clock cycles 1 to 9, stages listed in order for each instruction:
I1: F1 D1 E1 W1
I2: F2 D2 E2 W2
I3: F3 D3 E3 W3
I4: F4 D4 E4 W4
I5: F5 D5 E5

Figure 8.3. Effect of an execution operation taking more than one clock cycle.
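
The effect of a slow stage can be explored with a small timing model. This is a sketch under stated assumptions: each stage handles one instruction at a time, an instruction may wait between stages for as long as necessary, and the Execute step of I2 is assumed to take three cycles (a value chosen for illustration, since the figure's column alignment is not recoverable here). The function name is hypothetical.

```python
# Sketch: completion times in an in-order pipeline when some stage takes
# more than one clock cycle. Assumes each stage serves one instruction at a
# time and instructions can wait between stages.

def completion_cycles(latencies):
    """latencies[i][s] = cycles instruction i spends in stage s.
    Returns the clock cycle in which each instruction leaves the last stage."""
    n_stages = len(latencies[0])
    stage_free = [0] * n_stages      # last cycle in which each stage was busy
    done = []
    for lat in latencies:
        ready = 0                    # last cycle of this instruction's previous stage
        for s in range(n_stages):
            start = max(ready, stage_free[s]) + 1
            ready = start + lat[s] - 1
            stage_free[s] = ready
        done.append(ready)
    return done

ideal = [[1, 1, 1, 1] for _ in range(4)]     # F, D, E, W take one cycle each
slow = [row[:] for row in ideal]
slow[1][2] = 3                               # assumed: Execute of I2 takes three cycles

print(completion_cycles(ideal))   # [4, 5, 6, 7]
print(completion_cycles(slow))    # [4, 7, 8, 9]
```

Under these assumptions, every instruction from I2 onward completes two cycles later than in the ideal case, so four instructions finish in cycle 9 instead of cycle 7.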
Pipeline Performance

• The previous pipeline is said to have been stalled for two clock cycles.
• Any condition that causes a pipeline to stall is called a hazard.
• Data hazard: any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. So some operation has to be delayed, and the pipeline stalls.
• Instruction (control) hazard: a delay in the availability of an instruction causes the pipeline to stall.
• Structural hazard: the situation when two instructions require the use of a given hardware resource at the same time.
Pipeline Performance

Instruction hazard

Clock cycle    1    2    3    4    5    6    7    8    9
I1             F1   D1   E1   W1
I2                  F2   F2   F2   F2   D2   E2   W2
I3                                      F3   D3   E3   W3

(a) Instruction execution steps in successive clock cycles

Clock cycle    1    2    3    4    5    6    7    8    9
F: Fetch       F1   F2   F2   F2   F2   F3
D: Decode           D1   idle idle idle D2   D3
E: Execute               E1   idle idle idle E2   E3
W: Write                      W1   idle idle idle W2   W3

Idle periods: stalls (bubbles)

(b) Function performed by each processor stage in successive clock cycles

Figure 8.4. Pipeline stall caused by a cache miss in F2.


Pipeline Performance

Structural hazard, example: Load X(R1), R2

Clock cycles 1 to 7, stages listed in order for each instruction:
I1: F1 D1 E1 W1
I2 (Load): F2 D2 E2 M2 W2
I3: F3 D3 E3 W3
I4: F4 D4 E4
I5: F5 D5

Figure 8.5. Effect of a Load instruction on pipeline timing.


Pipeline Performance

• Again, pipelining does not result in individual instructions being executed faster; rather, it is the throughput that increases.
• Throughput is measured by the rate at which instruction execution is completed.
• Pipeline stall causes degradation in pipeline performance.
• We need to identify all hazards that may cause the pipeline to stall and to find ways to minimize their impact.
Quiz

• Consider four instructions in which I2 takes two clock cycles for execution. Draw the timing diagram for a 4-stage pipeline, and determine the total number of cycles needed for the four instructions to complete.
END of Module 4 Part B: Pipelining
