
Parallel Processing

Agenda
• Introduction to parallel processing
• Flynn’s classification
• Pipelining
– General Pipeline
– Arithmetic pipeline
– Instruction pipeline
• Instruction level parallelism
Parallel Processing
• Execution of concurrent events in the computing process to achieve faster computational speed
• Levels of Parallel Processing
– Job or Program level
– Task or Procedure level
– Inter-Instruction level
– Intra-Instruction level
Parallel Computers
• Flynn's classification: Based on the multiplicity
of Instruction Streams and Data Streams
– Instruction Stream: Sequence of Instructions read
from memory
– Data Stream: Operations performed on the data in the processor

                                Number of Data Streams
                                Single      Multiple
  Number of         Single      SISD        SIMD
  Instruction
  Streams           Multiple    MISD        MIMD
SISD COMPUTER SYSTEMS
[Figure: Control Unit → Processor Unit → Memory; the control unit issues the instruction stream to the processor unit, which exchanges a data stream with memory]

• Characteristics
– Standard von Neumann machine
– Instructions and data are stored in memory
– One operation at a time

• Limitations
– Limitation on Memory Bandwidth
– Memory is shared by CPU and I/O
MISD COMPUTER SYSTEMS

[Figure: multiple control unit / processor pairs (CU, P), each receiving its own instruction stream from memory, all operating on a single shared data stream]

• There is no computer at present that can be classified as MISD
SIMD COMPUTER SYSTEMS
[Figure: a single Control Unit, connected to memory by a data bus, broadcasts one instruction stream to processor units P1 ... Pn; each processor handles its own data stream through an alignment network to memory modules M1 ... Mn]
• Characteristics
– Only one copy of the program exists
– A single controller executes one instruction at a time
MIMD COMPUTER SYSTEMS
[Figure: processor-memory pairs (P, M) connected through an interconnection network to a shared memory]
• Characteristics
– Multiple processing units
– Execution of multiple instructions on multiple data
• Types of MIMD computer systems
– Shared memory multiprocessors
– Message-passing multicomputers
Pipelining
• A technique of decomposing a sequential process into suboperations, with each subprocess being executed in a special dedicated segment that operates concurrently with all other segments.
Example: Ai * Bi + Ci for i = 1, 2, 3, ..., 7

[Figure: Segment 1 loads Ai and Bi from memory into R1 and R2; Segment 2 feeds R1 and R2 to a multiplier whose result goes to R3 while Ci is loaded into R4; Segment 3 feeds R3 and R4 to an adder whose result goes to R5]

R1 ← Ai, R2 ← Bi          Load Ai and Bi
R3 ← R1 * R2, R4 ← Ci     Multiply and load Ci
R5 ← R3 + R4              Add
OPERATIONS IN EACH PIPELINE STAGE

Clock        Segment 1        Segment 2               Segment 3
Pulse        R1      R2       R3           R4         R5
1            A1      B1
2            A2      B2       A1 * B1      C1
3            A3      B3       A2 * B2      C2         A1 * B1 + C1
4            A4      B4       A3 * B3      C3         A2 * B2 + C2
5            A5      B5       A4 * B4      C4         A3 * B3 + C3
6            A6      B6       A5 * B5      C5         A4 * B4 + C4
7            A7      B7       A6 * B6      C6         A5 * B5 + C5
8                             A7 * B7      C7         A6 * B6 + C6
9                                                     A7 * B7 + C7
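The same schedule can be generated mechanically. Below is a minimal Python sketch (the loop structure and variable names are mine, not from the slides) that walks the three segments clock pulse by clock pulse and prints the table above:

    # Task i enters segment 1 at pulse i and leaves segment 3 at pulse i + 2.
    n = 7  # tasks A_i * B_i + C_i for i = 1..7

    for pulse in range(1, n + 3):            # k + n - 1 = 3 + 7 - 1 = 9 pulses
        s1 = pulse if pulse <= n else None                 # segment 1: load A_i, B_i
        s2 = pulse - 1 if 1 <= pulse - 1 <= n else None    # segment 2: multiply, load C_i
        s3 = pulse - 2 if 1 <= pulse - 2 <= n else None    # segment 3: add
        print(f"pulse {pulse}:",
              f"A{s1},B{s1}" if s1 else "-",
              f"A{s2}*B{s2},C{s2}" if s2 else "-",
              f"A{s3}*B{s3}+C{s3}" if s3 else "-")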
GENERAL PIPELINE
• General Structure of a 4-Segment Pipeline
– Any operation that can be decomposed into a sequence of suboperations of about the same complexity can be implemented by a pipeline processor.
– A task is defined as the total operation performed in going through all the segments of the pipeline (Ti denotes task i).
General Pipeline
[Figure: Input → S1 → R1 → S2 → R2 → S3 → R3 → S4 → R4, with all segment registers driven by a common clock]

• Space-Time Diagram

Clock cycles:  1   2   3   4   5   6   7   8   9
Segment 1:     T1  T2  T3  T4  T5  T6
Segment 2:         T1  T2  T3  T4  T5  T6
Segment 3:             T1  T2  T3  T4  T5  T6
Segment 4:                 T1  T2  T3  T4  T5  T6
PIPELINE SPEEDUP
n: Number of tasks to be performed

• Conventional Machine (Non-Pipelined)
  – t: time to complete one task
  – t1: time required to complete the n tasks
  – t1 = n * t

• Pipelined Machine (k stages)
  – tp: clock cycle (time to complete each suboperation)
  – tk: time required to complete the n tasks
  – tk = (k + n - 1) * tp
PIPELINE SPEEDUP
• Speedup: the speedup of pipeline processing over an equivalent non-pipelined processing is defined by the ratio
  – Sk: Speedup

    Sk = (n * t) / ((k + n - 1) * tp)

• As the number of tasks increases, n becomes much larger than k - 1, and k + n - 1 approaches the value of n. Then:

    lim (n→∞) Sk = t / tp   ( = k, if t = k * tp )

• Thus, the theoretical maximum speedup of the pipeline is k, where k is the number of segments in the pipeline.
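These formulas are straightforward to check in code. A minimal Python sketch (the function name and test values are mine, not from the slides):

    def pipeline_speedup(n, k, t, tp):
        """Speedup Sk = (n * t) / ((k + n - 1) * tp) of a k-stage pipeline
        with clock cycle tp over a non-pipelined unit taking t per task."""
        return (n * t) / ((k + n - 1) * tp)

    # With t = k * tp, the speedup approaches k as n grows:
    print(pipeline_speedup(n=10**6, k=4, t=80, tp=20))  # -> 3.99998..., i.e. ~k = 4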
Hurdle to maximum speedup
• There are various reasons why the pipeline
cannot operate at its maximum theoretical rate.
• Different segments may take different times to complete their suboperations.
• The clock cycle must be chosen equal to the time delay of the segment with the maximum propagation time.
  – This causes all other segments to waste time while waiting for the next clock pulse.
PIPELINE SPEEDUP: Example
• Example
  – 4-stage pipeline, tp = 20 ns per suboperation
  – 100 tasks to be executed
  – time for 1 task in a non-pipelined system: 4 * 20 = 80 ns
• Pipelined system:
  tk = (k + n - 1) * tp = (4 + 100 - 1) * 20 = 2060 ns
• Non-pipelined system:
  t1 = n * t = 100 * 80 = 8000 ns
• Speedup:
  Sk = 8000 / 2060 = 3.88
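Plugging the same numbers into the pipeline_speedup sketch above reproduces the result:

    # Slide's example: k = 4 stages, tp = 20 ns, n = 100 tasks,
    # non-pipelined time per task t = k * tp = 80 ns.
    print(pipeline_speedup(n=100, k=4, t=80, tp=20))  # 8000 / 2060 = 3.883...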
Types of Pipeline
• There are two areas of computer design where the pipeline organization is applicable:
– An arithmetic pipeline divides an arithmetic
operation into sub-operations for execution in the
pipeline segments.
– An instruction pipeline operates on a stream of
instructions by overlapping the fetch, decode, and
execute phases of the instruction cycle.
ARITHMETIC PIPELINE
• Used to implement floating-point operations, multiplication of fixed-point numbers, and similar computations encountered in scientific problems.
• E.g., a floating-point adder-subtractor:
  – Let X and Y be two floating-point numbers:
    X = A x 2^a, Y = B x 2^b
  – Assume a 4-stage pipeline with the following segments is used:
    1. Compare the exponents
    2. Align the mantissas
    3. Add/subtract the mantissas
    4. Normalize the result
• Ex:
  X = 0.9504 x 10^3
  Y = 0.8200 x 10^2
  Z = X + Y = 0.10324 x 10^4

[Figure: exponents a, b and mantissas A, B enter through input registers; Segment 1 compares the exponents by subtraction to get the difference; Segment 2 chooses the larger exponent and aligns the mantissa of the smaller number; Segment 3 adds or subtracts the mantissas; Segment 4 adjusts the exponent and normalizes the result, with interface registers R between segments]

• Suppose that the time delays of the four segments are t1 = 60 ns, t2 = 70 ns, t3 = 100 ns, t4 = 80 ns, and the interface registers have a delay of tr = 10 ns.
• The clock cycle is chosen to be tp = t3 + tr = 110 ns.
• An equivalent non-pipelined floating-point adder-subtractor will have a delay time of tn = t1 + t2 + t3 + t4 + 4*tr = 350 ns.
• In this case the pipelined adder has a speedup of 350/110 ≈ 3.2 over the non-pipelined adder.
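The four segments are easy to trace on the slide's decimal example. A hedged Python sketch follows (my own decomposition; real hardware works on base-2 mantissas, but the slide's example uses base 10, which the sketch mirrors):

    def fp_add(a_mant, a_exp, b_mant, b_exp):
        """Add two base-10 floating-point numbers mant x 10^exp by walking
        the four pipeline segments from the slide."""
        # Segment 1: compare the exponents by subtraction.
        if a_exp < b_exp:                  # keep the larger exponent in a
            a_mant, a_exp, b_mant, b_exp = b_mant, b_exp, a_mant, a_exp
        diff = a_exp - b_exp
        # Segment 2: choose the larger exponent, align the other mantissa.
        b_mant /= 10 ** diff
        # Segment 3: add the mantissas.
        z_mant, z_exp = a_mant + b_mant, a_exp
        # Segment 4: normalize the result so that 0.1 <= mantissa < 1.
        while z_mant >= 1.0:
            z_mant /= 10
            z_exp += 1
        return z_mant, z_exp

    # X = 0.9504 x 10^3, Y = 0.8200 x 10^2 -> Z = 0.10324 x 10^4
    print(fp_add(0.9504, 3, 0.8200, 2))  # (0.10324..., 4), up to float rounding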
INSTRUCTION CYCLE
• Six Phases* in an Instruction Cycle
1. Fetch an instruction from memory
2. Decode the instruction
3. Calculate the effective address of the operand
4. Fetch the operands from memory
5. Execute the operation
6. Store the result in the proper place

* Some instructions skip some phases
* Effective address calculation can be done as part of the decoding phase
* Storage of the operation result in a register is done automatically in the execution phase
INSTRUCTION CYCLE
• 4-Stage Pipeline
1. FI: Fetch an instruction from memory
2. DA: Decode the instruction and calculate the
effective address of the operand
3. FO: Fetch the operand
4. EX: Execute the operation
It is assumed that the processor has separate instruction and data memories so that the operations in FI and FO can proceed at the same time.
Instruction Execution In a 4-stage Pipeline
[Flowchart: Segment 1 fetches the instruction from memory; Segment 2 decodes it and calculates the effective address; if the instruction is a branch, the pipe is emptied and the PC is updated before fetching resumes; otherwise Segment 3 fetches the operand from memory and Segment 4 executes the instruction; if an interrupt is pending, the pipe is emptied and interrupt handling runs; the PC is then updated and the cycle repeats]
Instruction Execution In a 4-stage Pipeline

Step:            1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction 1    FI  DA  FO  EX
            2        FI  DA  FO  EX
(Branch)    3            FI  DA  FO  EX
            4                FI  --  --  FI  DA  FO  EX
            5                                FI  DA  FO  EX
            6                                    FI  DA  FO  EX
            7                                        FI  DA  FO  EX
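A small Python sketch of this schedule (my own simplified model: a taken branch empties the pipe, so the next useful fetch happens the step after the branch's EX; the discarded fetch of instruction 4 at step 4 is not modeled):

    STAGES = ["FI", "DA", "FO", "EX"]

    def schedule(n, branch_at=None):
        """Map each instruction (1-based) to the step at which it occupies
        each stage, restarting fetch after a taken branch resolves in EX."""
        table, fetch_step = {}, 1
        for i in range(1, n + 1):
            table[i] = {stage: fetch_step + j for j, stage in enumerate(STAGES)}
            fetch_step += 1
            if i == branch_at:          # target known only after the EX stage
                fetch_step = table[i]["EX"] + 1
        return table

    for i, row in schedule(7, branch_at=3).items():
        print(i, row)   # instruction 4 now fetches at step 7, EX at step 10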
Major Hazards In Pipelined Execution
• 3 categories
1. Resource Hazard
2. Data Dependency Hazard
3. Branching Hazard
Resource Hazard
• Occurs when the hardware resources required by instructions in simultaneous overlapped execution cannot all be provided.
  E.g., fetching an instruction and fetching an operand for two different instructions from memory at the same time.
• A solution is to use two separate memory buses to fetch instructions and data respectively.
Data Dependency Hazard
• Occurs when the execution of an instruction depends on the result of a previous instruction
• E.g.  ADD R1, R2, R3
        SUB R4, R1, R5
• Data hazards can be dealt with by either hardware or software techniques.
Data Dependency Solution
Hardware Technique:
• Interlock
– hardware detects the data dependencies and delays
the scheduling of the dependent instruction by
stalling enough clock cycles
• Forwarding (bypassing, short-circuiting)
  – Accomplished by a data path that routes a value from a source (usually an ALU) to a user, bypassing a designated register. This allows the value being produced to be used at an earlier stage in the pipeline than would otherwise be possible.
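A toy Python model (the stage timings are my assumptions, not the slide's) of the ADD/SUB pair above shows why forwarding helps:

    # In FI-DA-FO-EX, ADD R1,R2,R3 produces R1 at the end of its EX (step 4),
    # while SUB R4,R1,R5, issued one cycle later, wants R1 in its FO (step 4).
    ADD_PRODUCES = 4   # step at which ADD's EX result exists
    SUB_NEEDS = 4      # step at which SUB's FO would read R1

    def stall_cycles(forwarding):
        # With forwarding, the ALU output is routed straight to the consumer
        # in the same step it is produced; an interlock instead waits one
        # extra step for the register write to complete.
        ready = ADD_PRODUCES if forwarding else ADD_PRODUCES + 1
        return max(0, ready - SUB_NEEDS)

    print(stall_cycles(forwarding=False))  # 1 stall under the interlock
    print(stall_cycles(forwarding=True))   # 0 stalls with forwarding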
Data Dependency Solution
Software Technique:
• Delayed Load:
– Gives the responsibility for solving data conflict problems to the compiler that translates the high-level programming language into a machine language program.
– The compiler for such computers is designed to detect a data conflict and reorder the instructions as necessary to delay the loading of the conflicting data, by inserting no-operation instructions.
– This method is referred to as delayed load.
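A minimal sketch of the compiler-side pass (the tuple encoding of instructions is invented for illustration, and a one-slot load delay is assumed):

    def insert_delayed_load_nops(program):
        """Insert a NOP after any LOAD whose destination register is read
        by the very next instruction."""
        out = []
        for ins in program:
            prev = out[-1] if out else None
            if prev and prev[0] == "LOAD" and prev[1] in ins[2:]:
                out.append(("NOP",))    # fill the load delay slot
            out.append(ins)
        return out

    prog = [("LOAD", "R1", "A"), ("LOAD", "R2", "B"),
            ("ADD", "R3", "R1", "R2"), ("STORE", "C", "R3")]
    for ins in insert_delayed_load_nops(prog):
        print(ins)                      # a NOP lands before the ADD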
Branching Hazards
• Branch Instructions
– Branch target address is not known until the branch
instruction is completed
Branch Instruction:  FI  DA  FO  EX
Next Instruction:        FI  DA  FO  EX
                         ^-- fetched before the target address is available
                             (the target is known only after the branch's EX)

• Dealing with Branching hazards
– Pre-fetch target instruction
– Branch Target Buffer / Loop Buffer
– Branch Prediction
– Delayed Branch
Control Hazards
• Pre-fetch target instruction:
  – Prefetch the target instruction in addition to the instruction following the branch. Both are saved until the branch is executed.
  – If the branch condition succeeds, the pipeline continues from the branch target instruction.
  – An extension of this procedure is to continue fetching instructions from both places until the branch decision is made. At that time, control chooses the instruction stream of the correct program flow.
• Branch Target Buffer (BTB; associative memory)
  – Entry: address of a previously executed branch, plus (the address of) the target instruction and the next few instructions.
  – When fetching an instruction, search the BTB.
  – If found, fetch the instruction stream from the BTB;
  – If not, fetch the new stream and update the BTB.
Control Hazards
• Loop Buffer (a small, very high-speed register file maintained by the instruction fetch segment)
  – When a program loop is detected, it is stored in the loop buffer in its entirety, including all branches.
  – The program loop can then be executed directly, without accessing memory, until the loop mode is removed by the final branch out of the loop.
• Branch Prediction
  – Guess the branch outcome and fetch an instruction stream based on the guess; a correct guess eliminates the branch penalty.
• Delayed Branch
  – The compiler detects the branch and rearranges the instruction sequence by inserting no-operation instructions that keep the pipeline busy in the presence of a branch instruction.
  – This causes the computer to fetch the target instruction during the execution of the no-operation instruction, allowing a continuous flow through the pipeline (see the sketch below).
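A hedged sketch of the compiler transformation (instruction tuples are invented; the slides describe inserting no-ops, while real delayed-branch compilers also try to fill the slot with a useful earlier instruction, as attempted here):

    def fill_delay_slot(program, branch_idx):
        """Fill one branch delay slot: hoist the instruction just before the
        branch if the branch condition does not read its destination register,
        otherwise fall back to inserting a NOP after the branch."""
        prog = list(program)
        branch = prog[branch_idx]
        prev = prog[branch_idx - 1] if branch_idx > 0 else None
        if prev and prev[0] != "NOP" and prev[1] not in branch[2:]:
            prog[branch_idx - 1:branch_idx + 1] = [branch, prev]  # swap
        else:
            prog.insert(branch_idx + 1, ("NOP",))
        return prog

    prog = [("LOAD", "R1", "A"), ("ADD", "R2", "R2", "1"),
            ("BRANCH", "LOOP", "R3")]          # branch condition reads R3
    print(fill_delay_slot(prog, branch_idx=2)) # ADD moves into the delay slot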
