Module-5_DDCO


Module-5

Basic Processing Unit


Pipelining
Syllabus
• Basic Processing Unit: Some Fundamental Concepts:
• Register Transfers,
• Performing ALU operations,
• Fetching a word from Memory,
• Storing a word in memory.
• Execution of a Complete Instruction.
• Pipelining: Basic concepts,
• Role of Cache memory,
• Pipeline Performance.
Overview

• The processing unit that executes machine instructions is called the Instruction Set Processor (ISP), or more commonly the Central Processing Unit (CPU).
• A typical computing task consists of a series of steps
specified by a sequence of machine instructions
that constitute a program.
• An instruction is executed by carrying out a
sequence of more rudimentary operations.
• The processor's task: fetching, decoding, and executing the instructions of a program.
Fundamental Concepts

• The processor fetches one instruction at a time and performs the operation specified.
• Instructions are fetched from successive memory
locations until a branch or a jump instruction is
encountered.
• The processor keeps track of the address of the memory location containing the next instruction to be fetched using the Program Counter (PC).
• Instruction Register (IR) – holds the instruction currently being executed. Each instruction comprises 4 bytes and is stored in one memory word.
Executing an Instruction
• Fetch the contents of the memory location pointed to
by the PC. The contents of this location are loaded into
the IR (fetch phase).
IR ← [[PC]]
• Assuming that the memory is byte addressable,
increment the contents of the PC by 4 (fetch phase).
PC ← [PC] + 4
• Carry out the actions specified by the instruction in the
IR (execution phase).
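
These three steps repeat for every instruction. A minimal sketch of the fetch/execute loop, with a byte-addressable memory modeled as a Python dictionary (all names and instruction values here are illustrative, not from any real processor):

memory = {0: 0x12345678, 4: 0x9ABCDEF0, 8: 0xDEADBEEF}   # word-aligned addresses -> instruction words
PC = 0                                                    # Program Counter

def execute(ir):
    """Placeholder for the execution phase of the instruction held in IR."""
    print(f"executing instruction {ir:#010x}")

while PC in memory:
    IR = memory[PC]        # fetch phase: IR <- [[PC]]
    PC = PC + 4            # fetch phase: PC <- [PC] + 4
    execute(IR)            # execution phase: carry out the actions specified in IR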
Processor Organization

Single-bus organization of the datapath inside a processor


Internal organization of the processor

• ALU
• Registers for temporary storage
• Various digital circuits for executing different micro-operations (gates, MUX, decoders, counters).
• Internal path for movement of data between ALU and
registers.
• Driver circuits for transmitting signals to external units.
• Receiver circuits for incoming signals from external units.
PC:
❖ Keeps track of execution of a program
❖ Contains the memory address of the next
instruction to be fetched and executed.
MAR:
❖ Holds the address of the memory location to be accessed.
❖ The input of MAR is connected to the internal processor bus, and its output is connected to the external memory bus.
MDR:
❖ Contains the data to be written into or read out of the addressed location.
❖ It has two inputs and two outputs.
❖ Data can be loaded into MDR either from the memory bus or from the internal processor bus.
❖ The data and address lines of the memory bus are connected to the internal bus via MDR and MAR, respectively.
Registers:
❖ The number and uses of the processor registers R0 to Rn-1 vary considerably from one processor to another.
❖ Some registers are provided for general-purpose use by the programmer.
❖ Special-purpose registers include index and stack registers.
❖ Registers Y, Z, and TEMP are temporary registers used by the processor during the execution of some instructions; they are not visible to the programmer.
Multiplexer:
❖ Selects either the output of register Y or the constant value 4 to be provided as input A of the ALU.
❖Constant 4 is used by the processor to increment the contents of
PC.
ALU:
❖ Used to perform arithmetic and logic operations.
Data Path:
❖The registers, ALU and interconnecting bus are collectively
referred to as the data path.
Input and output gating for the registers

Figure: Input and output gating for the registers of the single-bus datapath (internal processor bus; register Ri with Riin and Riout gates; register Y; constant 4; Select MUX; ALU inputs A and B; register Z with Zin and Zout gates).

Register Transfers
• The input and output gates for register Ri are controlled by the signals Riin and Riout.
• Riin set to 1 – the data available on the common bus are loaded into Ri.
• Riout set to 1 – the contents of register Ri are placed on the bus.
• Riout set to 0 – the bus can be used for transferring data from other registers.
• All data transfers are controlled by the processor clock; multiphase clocking may be used.
Data transfer between two registers:

EX: Transfer the contents of R1 to R4.
1. Enable the output of register R1 by setting R1out = 1. This places the contents of R1 on the processor bus.
2. Enable the input of register R4 by setting R4in = 1. This loads the data from the processor bus into register R4.
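
A tiny sketch of this two-step transfer, treating the register file as a Python dictionary and the shared processor bus as a single variable (names and values are illustrative only):

registers = {"R1": 25, "R4": 0}

# Step 1: R1out = 1 -- the contents of R1 are placed on the processor bus
bus = registers["R1"]

# Step 2: R4in = 1 -- the data on the bus are loaded into R4
registers["R4"] = bus

print(registers)   # {'R1': 25, 'R4': 25}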
Performing an Arithmetic or Logic Operation
• The ALU is a combinational circuit that has no internal
storage.
• The ALU gets its two operands from the MUX (input A) and from the bus (input B). The result is temporarily stored in register Z.
• What is the sequence of operations to add the contents of
register R1 to those of R2 and store the result in R3?
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in

Step 1: The output of register R1 and the input of register Y are enabled, causing the contents of R1 to be transferred over the bus to Y.
Step 2: The contents of R2 are placed on the bus (input B of the ALU); the multiplexer's Select signal is set to SelectY, gating the contents of register Y to input A of the ALU; the ALU performs the addition, and the result is loaded into register Z.
Step 3: The contents of Z are transferred to the destination register R3.
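
A short simulation of this three-step control sequence on the single-bus datapath. The register file, the Y and Z registers, the bus, and the MUX select are modeled with plain Python variables; the names are illustrative, and only the Add function of the ALU is sketched:

registers = {"R1": 10, "R2": 32, "R3": 0}
Y = Z = 0           # temporary datapath registers
bus = None

# Step 1: R1out, Yin -- contents of R1 travel over the bus into Y
bus = registers["R1"]
Y = bus

# Step 2: R2out, SelectY, Add, Zin -- R2 on the bus is input B,
# the MUX gates Y to input A, the ALU adds, and Z latches the result
bus = registers["R2"]
A, B = Y, bus       # SelectY chooses Y (not the constant 4) for input A
Z = A + B           # Add

# Step 3: Zout, R3in -- the result in Z is transferred to R3
bus = Z
registers["R3"] = bus

print(registers["R3"])   # 42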
Register Transfers
All operations and data transfers are controlled by the processor clock.

Input and output gating for one register bit.


Fetching a Word from Memory
• Address into MAR; issue Read operation; data into MDR.

Connection and control signals for register MDR.


Fetching a Word from Memory
• The response time of each memory access varies (cache
miss, memory-mapped I/O,…).
• To accommodate this, the processor waits until it
receives an indication that the requested operation has
been completed (Memory-Function-Completed, MFC).
• Move (R1), R2
➢MAR ← [R1]
➢Start a Read operation on the memory bus
➢Wait for the MFC response from the memory
➢Load MDR from the memory bus
➢R2 ← [MDR]
Timing
Assume the output of MAR is always available on the address lines of the memory bus.

⚫ Move (R1), R2
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in

Timing of a memory Read operation
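
A sketch of the same Move (R1), R2 read, with the memory modeled as a function that returns its data together with the MFC signal. The WMFC step simply waits for that signal; the names (memory_read, the MFC handling) are illustrative assumptions, not a real bus protocol:

memory = {100: 7777}
registers = {"R1": 100, "R2": 0}
MAR = MDR = 0

def memory_read(address):
    """Model of a memory access: returns (data, MFC). In real hardware the
    delay before MFC is asserted varies (cache miss, slow device, ...)."""
    return memory[address], True

# Step 1: R1out, MARin, Read -- address into MAR, start the Read operation
MAR = registers["R1"]

# Step 2: MDRinE, WMFC -- wait for MFC, then load MDR from the memory bus
data, mfc = memory_read(MAR)
while not mfc:                 # WMFC: wait for Memory-Function-Completed
    data, mfc = memory_read(MAR)
MDR = data

# Step 3: MDRout, R2in -- move the fetched word from MDR into R2
registers["R2"] = MDR
print(registers["R2"])         # 7777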


Storing a word in memory

• Address is loaded into MAR


• Data to be written loaded into MDR.
• Write command is issued.
• Example: Move R2,(R1)
R1out, MARin
R2out, MDRin, Write
MDRoutE, WMFC
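
The store is the mirror image of the read; a brief sketch under the same illustrative model as above:

memory = {}
registers = {"R1": 100, "R2": 55}

# R1out, MARin          -- address into MAR
MAR = registers["R1"]
# R2out, MDRin, Write   -- data into MDR, issue the Write command
MDR = registers["R2"]
# MDRoutE, WMFC         -- drive the memory bus, wait for MFC
memory[MAR] = MDR       # the memory completes the write and asserts MFC

print(memory)           # {100: 55}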
Execution of a Complete Instruction

• Add (R3), R1
• Fetch the instruction
• Fetch the first operand (the contents of the memory location
pointed to by R3)
• Perform the addition
• Load the result into R1
Execution of a Complete Instruction

Add (R3), R1

Control sequence for execution of the instruction Add (R3), R1
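
For the single-bus organization described above, the control sequence is the standard one: steps 1-3 fetch the instruction, step 4 reads the operand pointed to by R3, and steps 5-7 add it to R1 and store the result back in R1.

Step Action

1 PCout, MARin, Read, Select4, Add, Zin
2 Zout, PCin, Yin, WMFC
3 MDRout, IRin
4 R3out, MARin, Read
5 R1out, Yin, WMFC
6 MDRout, SelectY, Add, Zin
7 Zout, R1in, End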


Execution of Branch Instructions
• A branch instruction replaces the contents of PC with the
branch target address, which is usually obtained by adding an
offset X given in the branch instruction.
• The offset X is usually the difference between the branch target
address and the address immediately following the branch
instruction.
• Un-Conditional branch

Step Action

1 PCout, MARin, Read, Select4, Add, Zin
2 Zout, PCin, Yin, WMFC
3 MDRout, IRin
4 Offset-field-of-IRout, Add, Zin
5 Zout, PCin, End

Control sequence for an unconditional branch instruction
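
Note that by the time the offset is added (step 4), the PC has already been incremented (steps 1 and 2), so it holds the address of the instruction that follows the branch. For example, if a branch instruction at address 1000 targets address 1100 (with 4-byte instructions), the offset stored in the instruction is X = 1100 - (1000 + 4) = 96.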


Exercise

• What is the control sequence for execution of the instruction
Add R1, R2
including the instruction fetch phase?
(Assume single-bus architecture)
Making the Execution of Programs Faster

• Use faster circuit technology to build the processor and the main
memory.
• Arrange the hardware so that more than one operation can be
performed at the same time.
• In the latter approach, the number of operations performed per second is increased even though the elapsed time needed to perform any one operation is not changed.
• Concurrent activity – pipelining.
• Pipelining is commonly known as assembly-line operation.
• In a computer, the instructions of a program are fetched and executed one after the other.
Traditional Pipeline Concept

• Laundry Example
• Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold.
• Washer takes 30 minutes

• Dryer takes 40 minutes

• “Folder” takes 20 minutes


Traditional Pipeline Concept

Figure: Sequential laundry timeline from 6 PM to midnight; each load takes 30 minutes to wash, 40 to dry, and 20 to fold.

• Sequential laundry takes 6 hours for 4 loads.
• If they learned pipelining, how long would laundry take?
Traditional Pipeline Concept

Figure: Pipelined laundry timeline from 6 PM; stage times 30, 40, 40, 40, 40, 20 minutes, with tasks A–D overlapped in task order.

• Pipelined laundry takes 3.5 hours for 4 loads.
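
Worked out: sequentially, each load takes 30 + 40 + 20 = 90 minutes, so 4 loads take 4 × 90 = 360 minutes = 6 hours. Pipelined, a new load can enter the dryer (the slowest stage) every 40 minutes, so the 4 loads finish after 30 + 4 × 40 + 20 = 210 minutes = 3.5 hours.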
Traditional Pipeline Concept

• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload.
• The pipeline rate is limited by the slowest pipeline stage.
• Multiple tasks operate simultaneously using different resources.
• Potential speedup = number of pipe stages.
• Unbalanced lengths of pipe stages reduce the speedup.
• The time to “fill” the pipeline and the time to “drain” it reduce the speedup.
• Stalls occur because of dependences.
Use the Idea of Pipelining in a Computer
Fetch + Execution

Figure: Basic idea of instruction pipelining – (a) sequential execution of instructions I1, I2, I3; (b) hardware organization, with an interstage buffer B1 between the instruction fetch unit and the execution unit; (c) pipelined execution, in which the fetch of each instruction overlaps the execution of the previous one.
Use the Idea of Pipelining in a Computer

Fetch + Decode + Execution + Write

4-stage pipelining
Role of Cache Memory

• Each pipeline stage is expected to complete in one clock cycle.
• The clock period should be long enough for the slowest pipeline stage to complete.
• Faster stages can only wait for the slowest one to complete.
• Since main memory is very slow compared to instruction execution, if each instruction had to be fetched from main memory the pipeline would be almost useless.
• Fortunately, we have cache memory, which allows most instruction fetches to complete in one clock cycle.
Pipeline Performance

• The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.
• However, this increase would be achieved only if all pipeline
stages require the same time to complete, and there is no
interruption throughout program execution.
• Unfortunately, this is not true.
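
As a rough worked example: with a k-stage pipeline in which every stage takes one clock cycle, executing n instructions takes about k + (n - 1) cycles instead of n × k cycles. For a 4-stage pipeline and 100 instructions, that is 103 cycles versus 400, a speedup of about 3.9, approaching the ideal factor of 4 (the number of stages) as n grows.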
Pipeline Performance
• When one stage of an instruction (for example, its Execute step) takes three clock cycles instead of one, the instructions behind it must wait; such a pipeline is said to have been stalled for two clock cycles.
• Any condition that causes a pipeline to stall is called a hazard.
• Data hazard – any condition in which either the source or
the destination operands of an instruction are not available
at the time expected in the pipeline. So some operation has
to be delayed, and the pipeline stalls.
• Instruction (control) hazard – a delay in the availability of an
instruction causes the pipeline to stall.
• Structural hazard – the situation when two instructions
require the use of a given hardware resource at the same
time.
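
As an illustration of a data hazard (using three-operand instructions only for the example): if Mul R2, R3, R4 writes its result into R4 and the following Add R5, R4, R6 reads R4 as a source, the Add cannot obtain its operand at the expected time, and the pipeline must stall until the multiplication result is available.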
Pipeline Performance

Figure: Pipeline stall caused by an instruction hazard; the idle periods are the stalls (bubbles).
Pipeline Performance
Load X(R1), R2 – structural hazard

Figure: Effect of a load instruction on pipeline timing.


Pipeline Performance

• Again, pipelining does not result in individual


instructions being executed faster; rather, it is the
throughput that increases.
• Throughput is measured by the rate at which
instruction execution is completed.
• Pipeline stall causes degradation in pipeline
performance.
• We need to identify all hazards that may cause the
pipeline to stall and to find ways to minimize their
impact.
