Module 4 - Parallel & Pipeline Processing - Final
Page 1
Introduction to parallel processing concepts
What is Parallel Processing?
• Parallel processing is a large class of techniques used to perform data-processing tasks simultaneously, with the goal of increasing the computational speed of a computer system.
• Instead of processing each instruction sequentially, as in a conventional computer, a parallel processing system performs concurrent data processing to achieve a faster execution time.
• The system may have two or more ALUs and be able to execute two or more instructions at the same time.
• The system may have two or more processors operating concurrently.
Page 2
Purpose of Parallel Processing:
To speed up the computer's processing capability.
To increase its throughput.
Page 3
Flynn's Classification of Computers
• M. J. Flynn proposed a classification of computer organizations based on the number of instruction streams and data streams that are manipulated simultaneously.
• The sequence of instructions read from memory constitutes an instruction stream, and the operations performed on the data in the processor constitute a data stream.
• Parallel processing may occur in the instruction stream, in the data stream, or in both.
Flynn's classification divides computers into four major groups:
1. Single instruction stream, single data stream (SISD)
2. Single instruction stream, multiple data stream (SIMD)
3. Multiple instruction stream, single data stream (MISD)
4. Multiple instruction stream, multiple data stream (MIMD)
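The SISD/SIMD distinction can be sketched in plain Python. This is only an illustrative model, not real hardware: the "SIMD" version stands in for applying one instruction to a whole vector of data at once, while the "SISD" version processes one data item per step.

```python
# Illustrative sketch (not real hardware): contrast SISD-style scalar
# processing with SIMD-style whole-vector processing.
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# SISD: a single instruction stream handles one data item per step.
sisd_result = []
for x, y in zip(a, b):          # one add per iteration
    sisd_result.append(x + y)

# SIMD: one "add" instruction is applied to all data elements at once
# (modelled here as a single vectorised expression).
simd_result = [x + y for x, y in zip(a, b)]

print(sisd_result)  # [11, 22, 33, 44]
print(simd_result)  # [11, 22, 33, 44]
```

Both produce the same result; the difference lies in how many data items one instruction touches per step.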
Page 4
Single instruction stream, single data stream (SISD)
Page 5
Single instruction stream, multiple data stream (SIMD)
Page 6
Multiple instruction stream, single data stream (MISD)
Page 7
Multiple instruction stream, multiple data stream (MIMD)
Page 8
Instruction Level Parallelism
• Instruction Level Parallelism (ILP) refers to architectures in which multiple operations can be performed in parallel within a single process, which has its own set of resources (address space, registers, identifiers, state, program counter).
• It also refers to the compiler techniques and processor designs that execute operations, such as memory loads and stores, integer addition, and floating-point multiplication, in parallel to improve processor performance.
• Examples of architectures that exploit ILP are VLIW and superscalar architectures.
• A typical ILP processor allows multiple-cycle operations to be pipelined.
• ILP is a measure of how many of the operations in a computer program can be performed simultaneously.
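The last point, ILP as a measure, can be made concrete with a toy dependence graph: ILP is the number of operations divided by the length of the longest dependency chain. The three-operation program below is an assumed example for illustration.

```python
# Sketch: ILP = (number of operations) / (critical-path length).
# Assumed toy program: t1 = a+b; t2 = c+d; t3 = t1*t2
deps = {"t1": [], "t2": [], "t3": ["t1", "t2"]}

def level(op, d=deps):
    """Depth of an operation in the dependence graph (1 = no inputs)."""
    return 1 + max((level(p, d) for p in d[op]), default=0)

critical_path = max(level(op) for op in deps)  # t3 depends on t1, t2 -> 2
ilp = len(deps) / critical_path                # 3 operations / 2 levels
print(ilp)  # 1.5
```

Here t1 and t2 are independent and could execute simultaneously, so on average 1.5 operations per step are available.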
Page 9
Classification
• ILP architectures can be classified in the following ways:
• Sequential Architecture:
• The program is not expected to convey any explicit information regarding parallelism to the hardware.
• Dependence Architectures:
• The program explicitly conveys information regarding dependencies between operations, as in dataflow architectures.
• Independence Architecture:
• The program specifies which operations are independent of each other, so that they can be executed instead of 'nop's.
Page 10
Pipeline processing
Page 11
Pipeline processing
• Pipelining is the arrangement of the hardware elements of the CPU such that its overall performance is increased.
• Simultaneous execution of more than one instruction takes place in a pipelined processor.
• In pipelining, multiple instructions are overlapped in execution.
• General structure of n segment pipeline
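For an n-segment pipeline, the standard timing relations can be sketched as code: n tasks in a k-segment pipeline take k + (n - 1) clock cycles, versus n * k cycles without pipelining. The numbers below (5 segments, 100 tasks) are an assumed example.

```python
def pipeline_cycles(k, n):
    """Clock cycles to finish n tasks in a k-segment pipeline:
    k cycles for the first task, then 1 cycle per remaining task."""
    return k + (n - 1)

def speedup(k, n):
    """Speedup over non-pipelined execution, which needs n * k cycles."""
    return (n * k) / pipeline_cycles(k, n)

print(pipeline_cycles(5, 100))    # 104
print(round(speedup(5, 100), 2))  # 4.81, approaching k = 5 for large n
```

As n grows, the speedup approaches k, the number of pipeline segments.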
Page 12
Example
Page 13
Pipeline stages
Page 14
Pipeline stages- Three Stage Pipeline
Page 15
Pipeline stages
Page 16
• The first instruction completes in 5 clock cycles.
• After the completion of the first instruction, a new instruction completes its execution in every subsequent clock cycle.
• Observe that as soon as the instruction fetch of the first instruction is completed, the instruction fetch of the second instruction starts in the next clock cycle.
• This way the hardware never sits idle; it is always performing some operation.
• However, no two instructions can occupy the same stage in the same clock cycle.
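The schedule described above can be sketched as a small model of an ideal 5-stage pipeline (the stage names IF, ID, OF, EX, WB are assumed): instruction i occupies stage s in cycle i + s, so no two instructions ever share a stage in the same cycle.

```python
# Sketch: ideal 5-stage pipeline schedule; instruction i (0-based)
# occupies stage s in clock cycle 1 + i + s (cycles numbered from 1).
stages = ["IF", "ID", "OF", "EX", "WB"]

occupancy = {}  # (cycle, stage) -> instruction number
for i in range(3):                     # three instructions
    for s, name in enumerate(stages):
        cycle = 1 + i + s
        # No two instructions may share a stage in the same cycle.
        assert (cycle, name) not in occupancy
        occupancy[(cycle, name)] = i

first_done = max(c for (c, st), instr in occupancy.items() if instr == 0)
print(first_done)  # 5: the first instruction completes in 5 clock cycles
```

After cycle 5, one instruction completes in every subsequent cycle, matching the description above.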
Page 17
Advantages and Disadvantages of Pipelining
• Advantages of Pipelining
• Instruction throughput increases.
• Increase in the number of pipeline stages increases the number of instructions executed simultaneously.
• Faster ALU can be designed when pipelining is used.
• Pipelining increases the overall performance of the CPU.
• Disadvantages of Pipelining
• Designing of the pipelined processor is complex.
• The throughput of a pipelined processor is difficult to predict.
Page 18
Instruction pipelining
• Pipeline processing can occur not only in the data stream but in the instruction
stream as well.
• Most digital computers with complex instructions require an instruction
pipeline to carry out operations such as fetching, decoding, and executing instructions.
• In general, the computer needs to process each instruction with the following
sequence of steps.
1. Fetch instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
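The six steps above can be sketched as straight-line code. The instruction encoding and memory layout are assumed purely for illustration (a tuple of opcode and three register/address fields), and the effective address calculation is trivial here.

```python
# Toy sketch of the six-step instruction sequence (encoding assumed).
# Address 0 holds the instruction; addresses 1-3 hold data.
memory = {0: ("ADD", 1, 2, 3), 1: 10, 2: 32, 3: None}

pc = 0
instr = memory[pc]                       # 1. fetch instruction from memory
op, src1, src2, dst = instr              # 2. decode the instruction
# 3. calculate effective addresses (here, the literal operand fields)
a, b = memory[src1], memory[src2]        # 4. fetch the operands from memory
result = a + b if op == "ADD" else None  # 5. execute the instruction
memory[dst] = result                     # 6. store the result
print(memory[3])  # 42
```

A pipelined processor overlaps these steps across consecutive instructions rather than finishing all six for one instruction before starting the next.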
Page 19
Instruction pipelining
Page 20
Four-segment instruction pipeline
• Segment 1:
• The instruction fetch segment can be
implemented using a first-in, first-out
(FIFO) buffer.
• Segment 2:
• The instruction fetched from memory is
decoded in the second segment, and
eventually, the effective address is
calculated in a separate arithmetic circuit.
• Segment 3:
• An operand from memory is fetched in the
third segment.
• Segment 4:
• The instructions are finally executed in the
last segment of the pipeline organization.
Page 21
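The four segments above can be sketched as a loop over a FIFO fetch buffer. The instruction format and memory contents are assumed for illustration only.

```python
from collections import deque

# Minimal sketch of the four-segment organisation
# (instruction format and memory layout are assumed).
memory = {"A": 7, "B": 35}
program = [("ADD", "A", "B", "R1")]

fifo = deque(program)                  # Segment 1: FIFO instruction fetch buffer
registers = {}

while fifo:
    instr = fifo.popleft()             # next fetched instruction
    op, src1, src2, dst = instr        # Segment 2: decode (+ address calc)
    a, b = memory[src1], memory[src2]  # Segment 3: operand fetch
    registers[dst] = a + b             # Segment 4: execute
    
print(registers["R1"])  # 42
```

In hardware all four segments work concurrently on different instructions; this sequential loop only shows what each segment does.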
Types of Pipelining
• Arithmetic Pipelining
• Instruction Pipelining
• Processor Pipelining
• Unifunction vs. Multifunction Pipelining
• Static vs. Dynamic Pipelining
• Scalar vs. Vector Pipelining
Page 22
Advantages of Pipelining
• The cycle time of the processor is decreased, which improves instruction
throughput. Pipelining does not reduce the time it takes to execute a single
instruction; rather, it increases the number of instructions that can be
processed at once and reduces the delay between completed instructions
(the throughput).
• If pipelining is used, the CPU's arithmetic logic unit can be designed to run
faster, though it becomes more complex.
• Pipelining speeds up execution over an un-pipelined core by a factor of
roughly the number of stages, provided the clock frequency increases by a
similar factor and the code is well suited to pipelined execution.
• Pipelined CPUs frequently run at a higher clock frequency than the RAM
(as of 2008 technology, RAM operates at a low frequency compared with
CPU frequencies), increasing the computer's overall performance.
Page 23
Pipeline Hazards
In a pipelined system, certain situations prevent the next instruction from performing its planned task in a
particular clock cycle.
"Pipeline hazards are situations that prevent the next instruction from executing during its designated
clock cycle." These hazards introduce what are known as stall cycles.
Page 25
Structural Hazard/ Resource conflict
• This type of Hazard occurs when two different Inputs try to use the same resource simultaneously.
• These hazards are caused by access to memory by two instructions at the same time. These conflicts can
be slightly resolved by using separate instruction and data memories.
• Structural hazards occur when the processor's hardware is not capable of executing all the
instructions in the pipeline simultaneously.
• Structural hazards within a single pipeline are rare modern processors because the instruction set
architecture is designed to support pipelining.
Page 26
Structural Hazard/ Resource conflict continue
• During clock cycle 3, I1 is fetching its operand (OF), so no other instruction can access memory during that
cycle; the same applies to I2 in the following cycle.
• Instruction 3 (I3) is delayed by 2 cycles because it cannot be fetched while memory is being accessed by the
other instructions.
• Thus resource dependency can deteriorate the overall performance of pipelined execution.
• The above problem can be solved by using separate instruction and data memories.
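The conflict above can be sketched by listing the cycles in which each instruction touches a single shared memory (stage names and timing are assumed: IF in the first stage, OF in the third).

```python
def memory_cycles(i, stages=("IF", "ID", "OF", "EX", "WB")):
    """1-based cycles in which instruction i (0-based) touches a single
    shared memory: instruction fetch (IF) and operand fetch (OF)."""
    return {i + 1 + stages.index("IF"), i + 1 + stages.index("OF")}

# I1 is instruction 0, I3 is instruction 2.
conflict = memory_cycles(0) & memory_cycles(2)
print(sorted(conflict))  # [3]: I3's fetch collides with I1's operand fetch
```

With separate instruction and data memories, IF and OF no longer compete for the same port, and the conflict set is empty.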
Page 27
Data Hazard / Data Dependency
• Instruction 1 is processed through its full cycle: it is fetched, decoded, its operands are fetched, it is executed,
and its result is written back.
• When instruction i+1 is processed, it is fetched and decoded, but its operand cannot be fetched, because the
result of R2 and R3 is stored in R1, and that updated value is needed as an operand of the next instruction.
• So for instruction i+1 we cannot fetch the operand until the R1 value is updated. We therefore have to delay
the second instruction's operand fetch until the write-back of the first instruction is completed, and this
situation is called a hazard.
• The result in R1 is required as an input to the next instruction: the value of R1 in the second instruction
depends on the result of the first instruction. This is called a Data Dependency, and because of it the
pipeline incurs two stall cycles while executing the instructions.
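The two stall cycles can be derived from the stage positions (stage names and the assumption that the result is available only after WB are taken from the description above):

```python
# Sketch: count stall cycles for a RAW dependency in a 5-stage pipeline
# (IF, ID, OF, EX, WB), assuming the result is usable only after WB.
stages = ["IF", "ID", "OF", "EX", "WB"]

of_index = stages.index("OF")     # 2
wb_index = stages.index("WB")     # 4

# Without the hazard, instruction i+1 would do OF in cycle 1 + of_index
# (0-based); with the hazard it must wait until the first instruction's
# WB finishes in cycle wb_index.
ideal_of_cycle = 1 + of_index     # cycle 3
earliest_of_cycle = wb_index + 1  # cycle 5, after WB completes
stalls = earliest_of_cycle - ideal_of_cycle
print(stalls)  # 2 stall cycles, matching the text
```

Forwarding hardware, which passes the EX result straight to the next instruction, can reduce or eliminate these stalls; that refinement is not modelled here.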
Page 28
There are three situations in which a data hazard can occur:
1. Read after write (RAW), a true dependency
2. Write after read (WAR), an anti-dependency
3. Write after write (WAW), an output dependency
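The three cases can be classified mechanically from the destination and source registers of an instruction pair (the three-address tuple form `(dest, src1, src2)` is assumed):

```python
def hazards(first, second):
    """Classify data hazards between two instructions given as
    (dest, src1, src2), with `second` following `first` in program order."""
    d1, *s1 = first
    d2, *s2 = second
    found = []
    if d1 in s2:
        found.append("RAW")   # second reads what first writes: true dependency
    if d2 in s1:
        found.append("WAR")   # second writes what first reads: anti-dependency
    if d1 == d2:
        found.append("WAW")   # both write the same register: output dependency
    return found

print(hazards(("R1", "R2", "R3"), ("R4", "R1", "R5")))  # ['RAW']
print(hazards(("R1", "R2", "R3"), ("R2", "R4", "R5")))  # ['WAR']
print(hazards(("R1", "R2", "R3"), ("R1", "R4", "R5")))  # ['WAW']
```

Only RAW reflects a real flow of data; WAR and WAW are name conflicts that register renaming can remove.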
Page 29
Branch hazards
• Branch instructions, particularly conditional branches, create control dependencies between the branch
instruction and the instructions that follow it into the fetch stage of the pipeline.
• Since the branch instruction computes the address of the next instruction that the fetch stage should fetch,
this takes some time, and additional time is required to flush the pipeline and fetch instructions from the
target location.
• This wasted time is called the branch penalty.
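The cost of the branch penalty on overall throughput can be sketched with the usual effective-CPI relation, CPI = 1 + f_branch * penalty. The penalty of 2 cycles and branch frequency of 20% below are assumed example values, not figures from the text.

```python
# Sketch: effective CPI with a branch penalty (assumed example values).
penalty = 2      # cycles flushed per taken branch
f_branch = 0.2   # fraction of instructions that are taken branches
cpi = 1 + f_branch * penalty
print(cpi)  # 1.4 cycles per instruction instead of the ideal 1.0
```

Branch prediction and delayed branching aim to reduce the effective penalty toward zero.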
Page 30
Example:
MOV R0, 77H
MOV R1, 73H
ADD R0, R1
JC NEXT
Here JC (jump if carry) is the conditional branch that introduces the branch hazard: the next instruction to fetch is not known until the ADD has set or cleared the carry flag.
Page 31