International Journal of Scientific & Engineering Research, Volume 7, Issue 4, April-2016 1197

ISSN 2229-5518

Pipelining: Basic Concepts and Approaches


RICHA BAIJAL¹
¹Student, M.Tech, Computer Science and Engineering,
Career Point University, Alaniya, Jhalawar Road, Kota-325003 (Rajasthan)

Abstract- This paper is concerned with pipelining principles in processor design. The basics of the instruction pipeline are discussed, and an approach to minimizing pipeline stalls is explained with the help of an example. The main idea is to understand how a pipeline works in a processor. Various hazards that degrade pipeline performance are explained, and solutions to minimize them are discussed.

Index Terms— Data dependency, Hazards in pipeline, Instruction dependency, parallelism, Pipelining, Processor, Stall.

——————————  ——————————

1 INTRODUCTION

To understand what pipelining is, let us consider the assembly-line manufacturing of a car. If you have ever visited a machine workshop, you might have seen the different assemblies used for building the chassis, adding parts to the body, aligning the wheels and painting the parts. Together these produce your favourite car, with every assembly line contributing to the finished product. Now, if these units wait for resources, or in other words if the second stage is dependent on the first one, more time is consumed in building the car. So we divide the tasks in such a way that their dependency is relaxed. In pipelining a task on a computer, we either divide it in such a way that one task is independent of the other, so that hardware units can switch tasks between them using a clock, or we add hardware circuitry to speed up the tasks. The second approach adds cost, while the first results in the efficient utilization of available resources, which is what we call pipelining.[1]

2 PARALLELISM AND PIPELINING :

When we implement pipelining in processing a task on a computer, we create objects or stages. These stages work independently and each accomplishes a certain task. After a certain period of time, the output from one stage is given as input to the next stage. This is done by synchronizing a clock or simply by specifying a time period for each task. The next stage then works on the combined input for a period of time and produces the desired results. Meanwhile, the other stages also work on their inputs. To summarize the concept, pipelining is simply achieving maximum throughput from the computer by utilizing resources in a parallel manner and avoiding resource deadlocks.
A very good example from our daily lives is the following.
I want to paint my house. It has 4 rooms. One approach is to employ a painter with his paint bucket and wait until he paints one room. The second approach is to employ another worker with him who works in parallel in another room. Still, 2 rooms are idle. The rooms that I want to paint constitute my hardware. The painter and his skills are the objects, and the way I am using them refers to the stages. Now, it is quite possible that I limit my resources, i.e. I have just two buckets of paint at a time; therefore, I have to wait until these two stages give me an output. Although these are independent tasks, what I am limiting is the resources.
I hope that, with this concept in mind, the reader now knows what to do with a computer to achieve maximum utilization.[2]

3 DIFFERENCE BETWEEN SEQUENTIAL PROCESSING AND PIPELINING :

Below is an illustration of the basic difference between executing four subtasks of a given instruction (in this case fetching F, decoding D, execution E, and writing the results W) using pipelining and sequential processing.[3]

Pipelining refers to the technique in which a given task is divided into a number of subtasks that need to be performed in
sequence.
• Each subtask is performed by a given functional unit.
• The units are connected in a serial fashion and all of them operate simultaneously.
• The use of pipelining improves performance compared to the traditional sequential execution of tasks.

In the given figure, sequential processing handles three instructions I1, I2 and I3 one after another, in time slots 1-4, 5-8 and 9-12 respectively.
In parallel (pipelined) processing, on the other hand, instructions I1, I2 and I3 overlap in time, while the subtasks of each instruction are still processed in sequence. The subtasks are arranged such that no two decode operations and no two write operations are performed in the same time slot. With pipelining, the instruction queue is processed in 6 time slots. As such, the processing time is reduced and the efficiency of the processor is increased.

4 PERFORMANCE OF A PIPELINE :

In order to formulate some performance measures for the goodness of a pipeline in processing a series of tasks, a space-time chart (called a Gantt chart) is used. The chart shows the succession of the subtasks in the pipe with respect to time.[4]

5 PERFORMANCE ANALYSIS OF A PIPELINE :

In the following analysis, we provide three performance measures for the goodness of a pipeline. These are the Speed-up S(n), Throughput U(n), and Efficiency E(n). It should be noted that in this analysis we assume that the unit time T = t units.

1. Speed-up S(n): Consider the execution of m tasks (instructions) using an n-stage (n-unit) pipeline. As can be seen, n + m - 1 time units are required to complete m tasks.

$S(n) = \frac{\text{Time using sequential processing}}{\text{Time using parallel processing}} = \frac{m \times n \times t}{(n + m - 1) \times t} = \frac{m \times n}{m + n - 1}$

$\lim_{m \to \infty} S(n) = n$ (i.e., an n-fold increase in speed is theoretically possible)

For the given example: m = 10, n = 4, so n + m - 1 = 13.

$S(n) = \frac{10 \times 4}{10 + 4 - 1} = \frac{40}{13} \approx 3.08$, and the maximum speed-up that can be achieved is 4 times because n = 4.

2. Throughput U(n) = number of tasks executed per unit time

$U(n) = \frac{m}{(n + m - 1) \times t}$

$\lim_{m \to \infty} U(n) = 1$, assuming that t = 1 unit time.

With t = 1, $U(n) = \frac{m}{n + m - 1} = \frac{10}{4 + 10 - 1} = \frac{10}{13} \approx 0.77$

It means that in our example, the processor utilization using pipelined tasks is about 77%.

3. Efficiency E(n) = ratio of the actual speed-up to the maximum speed-up, i.e.

$E(n) = \frac{S(n)}{n} = \frac{m}{n + m - 1}$

$\lim_{m \to \infty} E(n) = 1$

which is the same as the throughput when t = 1.
Note: In practice, m and n are very large.

6 INSTRUCTION PIPELINE :

We discussed the performance of a pipeline in the above section. In practice, however, things are different. An instruction may be delayed in its execution in order to resolve a pipeline hazard. This situation is referred to as a stall or a bubble in pipeline execution. Let us first draw a pipeline with a bubble or stall:[5]
Let us understand this pipeline now:
Here, instruction I2 incurs a cache miss during instruction fetch, which requires 3 time units.

                   1       2       3       4       5       6       7
ADD R1,R2,R3       IF      ID      EX      MEM     WB
SUB R4,R5,R1               IF      IDsub   EX      MEM     WB
AND R6,R1,R7                       IF      IDand   EX      MEM     WB

A cache miss stalls all the instructions in the pipeline, both before and after the instruction causing the miss. This delay of three time units has worsened the pipeline performance.

6.1 Understanding pipeline “stall” :

1 Due to instruction dependency: An instruction is being executed in a pipeline. The result of its execution is an input to the next instruction. Therefore, the latter instruction cannot start its execution until it gets the input from the previous instruction. This kind of dependency is called instruction dependency.[1,5]

2 Due to data dependency: Data dependency in a pipeline occurs when a source operand of instruction Ii depends on the result of executing a preceding instruction Ij, i > j.

The hazards discussed here involve registers.
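The definition of data dependency given above can be made concrete with a small sketch. The Python below is only an illustration under simplifying assumptions: each instruction is reduced to one destination register and a tuple of source registers, and any earlier write to a register counts as a dependency even if another instruction overwrites that register in between. The names `Instr` and `data_dependencies` are hypothetical and introduced here for illustration; the instruction sequence is the ADD/SUB/AND example used in this paper.

```python
from collections import namedtuple

# Hypothetical record: instruction text, destination register, source registers.
Instr = namedtuple("Instr", ["text", "dest", "srcs"])


def data_dependencies(program):
    """Return (j, i, register) triples where the later instruction i reads
    a register that the earlier instruction j writes (j < i)."""
    deps = []
    for i, later in enumerate(program):
        for j in range(i):
            earlier = program[j]
            if earlier.dest is not None and earlier.dest in later.srcs:
                deps.append((j, i, earlier.dest))
    return deps


program = [
    Instr("ADD R1,R2,R3", "R1", ("R2", "R3")),
    Instr("SUB R4,R5,R1", "R4", ("R5", "R1")),
    Instr("AND R6,R1,R7", "R6", ("R1", "R7")),
]

deps = data_dependencies(program)
for j, i, reg in deps:
    print(f"'{program[i].text}' depends on '{program[j].text}' through {reg}")
```

Both SUB and AND are reported as depending on ADD through R1, which is why the order and timing of these three instructions matter in a pipeline.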
Types of Data Dependency with Example: [5]

                   1       2       3       4       5       6       7
ADD R1, R2, R3     IF      ID      EXadd   MEMadd  WB
SUB R4, R5, R1             IF      ID      EXsub   MEM     WB
AND R6, R1, R7                     IF      ID      EXand   MEM     WB

(i) RAW (read after write): j tries to read a source before i writes it, so j incorrectly gets the old value.

This is the most common type of hazard and can be overcome using forwarding.[7]

Example:
The key insight in forwarding is that the result is not really needed by SUB until after the ADD actually produces it. The only problem is to make it available for SUB when it needs it. If the result can be moved from where the ADD produces it (the EX/MEM register) to where the SUB needs it (the ALU input latch), then the need for a stall can be avoided.
Using this observation, forwarding works as follows:

- The ALU result from the EX/MEM register is always fed back to the ALU input latches.
- If the forwarding hardware detects that the previous ALU operation has written the register corresponding to the source for the current ALU operation, control logic selects the forwarded result as the ALU input rather than the value read from the register file.

Forwarding of results to the ALU requires the addition of three extra inputs on each ALU multiplexer and the addition of three paths to the new inputs.

The paths correspond to a forwarding of:

(a) the ALU output at the end of EX,
(b) the ALU output at the end of MEM, and
(c) the memory output at the end of MEM.

Without forwarding, our example will execute correctly with stalls:

                   1       2       3       4       5       6       7       8       9
ADD R1, R2, R3     IF      ID      EX      MEM     WB
SUB R4, R5, R1             IF      stall   stall   IDsub   EX      MEM     WB
AND R6, R1, R7                     stall   stall   IF      IDand   EX      MEM     WB

As our example shows, we need to forward results not only from the immediately previous instruction, but possibly from an instruction that started three cycles earlier. Forwarding can be arranged from the MEM/WB latch to the ALU input also. Using those forwarding paths, the code sequence shown at the start of this example can be executed without stalls.

The first forwarding is for the value of R1 from EXadd to EXsub.
The second forwarding is also for the value of R1, from MEMadd to EXand.
This code now can be executed without stalls.

Forwarding can be generalized to include passing the result directly to the functional unit that requires it: a result is forwarded from the output of one unit to the input of another, rather than just from the result of a unit to the input of the same unit. [5],[6]

(ii) WAW (write after write) - j tries to write an operand before it is written by i. The writes end up being performed in the wrong order, leaving the value written by i rather than the value written by j in the destination.

This hazard is present only in pipelines that write in more than one pipe stage or allow an instruction to proceed even when a previous instruction is stalled. The DLX integer pipeline writes a register only in WB and avoids this class of hazards.

WAW hazards would be possible if we made the following two changes to the DLX pipeline:

▪ move the write back for an ALU operation into the MEM stage, since the data value is available by then.
▪ suppose that the data memory access took two pipe stages.

Here is a sequence of two instructions showing the execution in this revised pipeline, highlighting the pipe stage that writes the result:

LW R1, 0(R2)       IF      ID      EX      MEM1    MEM2    WB
ADD R1, R2, R3             IF      ID      EX      WB

Unless this hazard is avoided, execution of this sequence on this revised pipeline will leave the result of the first write (the LW) in R1, rather than the result of the ADD.

Allowing writes in different pipe stages introduces other problems, since two instructions can try to write during the same clock cycle. The DLX FP pipeline, which has both writes in different stages and different pipeline lengths, will deal with both write conflicts and WAW hazards in detail.
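The WAW timing example in this subsection can be checked with a short sketch. The Python below is a rough illustration, not a model of the real DLX pipeline: it assumes one instruction is issued per clock cycle with no stalls, and the stage lists simply mirror the revised pipeline described above (ALU write-back moved up by one stage, memory access split into MEM1 and MEM2). The names `STAGES`, `write_cycle` and `waw_hazards` are hypothetical.

```python
# Stage sequences of the revised pipeline (assumption: taken from the example above).
STAGES = {
    "LW": ["IF", "ID", "EX", "MEM1", "MEM2", "WB"],
    "ADD": ["IF", "ID", "EX", "WB"],
}


def write_cycle(opcode, issue_cycle):
    """Clock cycle in which an instruction issued at issue_cycle reaches WB."""
    return issue_cycle + STAGES[opcode].index("WB")


def waw_hazards(program):
    """program is a list of (opcode, dest) in program order, issued in
    cycles 1, 2, 3, ... Return index pairs (earlier, later) that write the
    same register with the later instruction writing first or simultaneously."""
    hazards = []
    for a, (op_a, dest_a) in enumerate(program):
        for b in range(a + 1, len(program)):
            op_b, dest_b = program[b]
            same_dest = dest_a == dest_b
            out_of_order = write_cycle(op_b, b + 1) <= write_cycle(op_a, a + 1)
            if same_dest and out_of_order:
                hazards.append((a, b))
    return hazards


program = [("LW", "R1"), ("ADD", "R1")]
lw_wb = write_cycle("LW", 1)    # LW is issued in cycle 1
add_wb = write_cycle("ADD", 2)  # ADD is issued in cycle 2
hazards = waw_hazards(program)
print(lw_wb, add_wb, hazards)
```

Under these assumptions the ADD writes R1 in cycle 5 and the LW writes it in cycle 6, so the LW value is the one left in R1, which is the wrong outcome described above.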
(iii) WAR (write after read) - j tries to write a destination before it is read by i, so i incorrectly gets the new value.

This cannot happen in our example pipeline because all reads are early (in ID) and all writes are late (in WB). This hazard occurs when there are some instructions that write results early in the instruction pipeline, and other instructions that read a source late in the pipeline.

Because of the natural structure of a pipeline, which typically reads values before it writes results, such hazards are rare. Pipelines for complex instruction sets that support autoincrement addressing and require operands to be read late in the pipeline could create WAR hazards.

If we modified the DLX pipeline as in the above example and also read some operands late, such as the source value for a store instruction, a WAR hazard could occur. Here is the pipeline timing for such a potential hazard, highlighting the stage where the conflict occurs:

SW R1, 0(R2)       IF      ID      EX      MEM1    MEM2    WB
ADD R2, R3, R4             IF      ID      EX      WB

If the SW reads R2 during the second half of its MEM2 stage and the ADD writes R2 during the first half of its WB stage, the SW will incorrectly read and store the value produced by the ADD.

(iv) RAR (read after read) - this case is not considered a hazard.

6.2 When are the Stalls Required ?

Unfortunately, not all potential hazards can be handled by forwarding. Consider the following sequence of instructions: [5,6]

                   1       2       3       4       5       6       7       8
LW R1, 0(R1)       IF      ID      EX      MEM     WB
SUB R4, R1, R5             IF      ID      EXsub   MEM     WB
AND R6, R1, R7                     IF      ID      EXand   MEM     WB
OR R8, R1, R9                              IF      ID      EX      MEM     WB

The LW instruction does not have the data until the end of clock cycle 4 (MEM), while the SUB instruction needs to have the data by the beginning of that clock cycle (EXsub).

For the AND instruction, we can forward the result immediately to the ALU (EXand) from the MEM/WB register (MEM).

The OR instruction has no problem, since it receives the value through the register file (ID). In clock cycle 5, the WB of the LW instruction occurs "early" in the first half of the cycle and the register read of the OR instruction occurs "late" in the second half of the cycle.

For the SUB instruction, the forwarded result would arrive too late: at the end of a clock cycle, when it is needed at the beginning.

The load instruction has a delay or latency that cannot be eliminated by forwarding alone. Instead, we need to add hardware, called a pipeline interlock, to preserve the correct execution pattern. In general, a pipeline interlock detects a hazard and stalls the pipeline until the hazard is cleared.

The pipeline with a stall and the legal forwarding is:

                   1       2       3       4       5       6       7       8       9
LW R1, 0(R1)       IF      ID      EX      MEM     WB
SUB R4, R1, R5             IF      ID      stall   EXsub   MEM     WB
AND R6, R1, R7                     IF      stall   ID      EX      MEM     WB
OR R8, R1, R9                              stall   IF      ID      EX      MEM     WB

The only necessary forwarding is done for R1 from MEM to EXsub.
Notice that there is no need to forward R1 for the AND instruction, because now it is getting the value through the register file in ID (as OR above).

There are techniques to reduce the number of stalls even in this case, which we consider next.

6.3 Pipeline Scheduling :

Generate DLX code that avoids pipeline stalls for the following sequence of statements:[5]

a=b+c;
d=a-f;
e=g-h;

Assume that all variables are 32-bit integers. Wherever necessary, explicitly explain the actions that are needed to avoid pipeline stalls in your scheduled code.

Solution:
1. The DLX assembly code for the given sequence of statements is:


Instruction     1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16    17    18
LW Rb, b        IF    ID    EX    M     WB
LW Rc, c              IF    ID    EX    M     WB
Add Ra, Rb, Rc              IF    ID    stall EX    M     WB
SW Ra, a                          IF    stall ID    EX    M     WB
LW Rf, f                                stall IF    ID    EX    M     WB
Sub Rd, Ra, Rf                                      IF    ID    stall EX    M     WB
SW Rd, d                                                  IF    stall ID    EX    M     WB
LW Rg, g                                                        stall IF    ID    EX    M     WB
LW Rh, h                                                                    IF    ID    EX    M     WB
Sub Re, Rg, Rh                                                                    IF    ID    stall EX    M     WB
SW Re, e                                                                                IF    stall ID    EX    M     WB

Running this code segment will need some forwarding. But the instructions LW and ALU (Add or Sub), when put in sequence, generate hazards for the pipeline that cannot be resolved by forwarding, so the pipeline will stall. Observe that in time steps 4, 5, and 6, there are two forwards from the data memory unit to the ALU in the EX stage of the Add instruction. The same is the case in time steps 13, 14, and 15. The hardware to implement this forwarding will need two Load Memory Data registers to store the output of data memory. Note that for the SW instructions, the register value is needed at the input of data memory. A better solution with compiler assistance is given below.

Rather than just allowing the pipeline to stall, the compiler could try to schedule the pipeline to avoid these stalls by rearranging the code sequence to eliminate the hazards.[7]

2. A suggested version is (the problem actually has more than one solution):

Instruction     1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
LW Rb, b        IF    ID    EX    M     WB
LW Rc, c              IF    ID    EX    M     WB
LW Rf, f                    IF    ID    EX    M     WB
Add Ra, Rb, Rc                    IF    ID    EX    M     WB
SW Ra, a                                IF    ID    EX    M     WB
Sub Rd, Ra, Rf                                IF    ID    EX    M     WB
LW Rg, g                                            IF    ID    EX    M     WB
LW Rh, h                                                  IF    ID    EX    M     WB
SW Rd, d                                                        IF    ID    EX    M     WB
Sub Re, Rg, Rh                                                        IF    ID    EX    M     WB
SW Re, e                                                                    IF    ID    EX    M     WB

Explanation:
Add Ra, Rb, Rc: Rb read in second half of ID; Rc forwarded
SW Ra, a: Ra forwarded
Sub Rd, Ra, Rf: Rf read in second half of ID; Ra forwarded
SW Rd, d: Rd read in second half of ID
Sub Re, Rg, Rh: Rg read in second half of ID; Rh forwarded
SW Re, e: Re forwarded

In the original table, the same color is used to outline the source and destination of forwarding. The blue color is used to indicate the technique of performing the register file reads in the second half of a cycle, and the writes in the first half.

Note: Notice that the use of different registers for the first, second and third statements was critical for this schedule to be legal! In general, pipeline scheduling can increase the register count required.

7 CONCLUSION :

In this paper, the basic principles involved in designing pipeline architectures were considered. Our coverage started with a discussion on a number of metrics that can be used to assess the goodness of a pipeline. We then moved to present a general discussion on the main problems that need to be considered in designing a pipelined architecture.

In particular, two main problems are considered: instruction dependency and data dependency.

References :

[1] Book: Computer Organization by Hamacher.
[2] Parallelism and Pipelining by David G. Messerschmitt, University of California.
[3] Pipelining Design Techniques by Mostafa Abd-El-Barr & Hesham El-Rewini.
[4] Project Management Graphics by Edward Tufte (B.S. and M.S. in statistics, Stanford University, 1964; Ph.D. in political science, Yale University, 1968).
[5] Website: http://www.cs.iastate.edu/
[6] Website: https://www.cs.uaf.edu/2011/fall/cs441/lecture/09_20_pipelining.html ; lecture by Dr. Lawlor
[7] Website: https://courses.engr.illinois.edu/cs232/sp2010/lectures/L13.pdf

IJSER © 2016
http://www.ijser.org