
Unit-5 Pipeline and Multiprocessors

5.1 Parallel Processing


Parallel processing is a method of computing in which separate parts of an overall
complex task are broken up and run simultaneously on multiple CPUs, thereby
reducing the overall processing time.
Any system that has more than one CPU can perform parallel processing, as can the
multi-core processors commonly found in computers today. Multi-core processors
are IC chips that contain two or more processors, giving better performance,
reduced power consumption and more efficient processing of multiple tasks.
 Pipelining
Pipelining is the process of feeding instructions to the processor through a
pipeline. It allows instructions to be stored and executed in an orderly fashion,
and is also known as pipeline processing.
Pipelining is a technique in which the execution of multiple instructions is
overlapped. The pipeline is divided into stages, and these stages are connected
with one another to form a pipe-like structure. Instructions enter at one end and
exit at the other.

Fig: Pipelining System

In a pipeline system, each segment consists of an input register followed by a
combinational circuit. The register holds the data and the combinational circuit
performs operations on it. The output of the combinational circuit is applied to
the input register of the next segment.
There are two types of pipeline: the arithmetic pipeline and the instruction pipeline.
 Arithmetic Pipeline
An arithmetic pipeline divides an arithmetic problem into sub-problems that are
executed in different pipeline segments. It is used for floating-point operations,
multiplication and various other computations. The flowchart of an arithmetic
pipeline for floating-point addition is shown in the diagram.

Floating-point addition using the arithmetic pipeline (sub-operations)


1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result
First, the two exponents are compared and the larger of the two is chosen as the
exponent of the result. The difference between the exponents determines how many
places the mantissa of the number with the smaller exponent must be shifted to the
right. After this shift, the two mantissas are aligned. Finally, the mantissas are
added, and the result is normalized in the last segment.

Example: Consider the two numbers

X = 0.3214 * 10^3 and Y = 0.4500 * 10^2

Explanation: First, the two exponents are subtracted to give 3 - 2 = 1. Thus 3
becomes the exponent of the result, and the mantissa of the number with the
smaller exponent is shifted 1 place to the right to give

Y = 0.0450 * 10^3

Finally, the two numbers are added to produce

Z = 0.3664 * 10^3

As the result is already normalized, it remains unchanged.
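
The four sub-operations map naturally onto code. Below is a minimal Python sketch
(an illustration, not part of the original text) that runs the example above; the
(mantissa, exponent) pair representation and the name fp_add are assumptions of
this sketch.

def fp_add(x_man, x_exp, y_man, y_exp):
    # Segment 1: compare the exponents; the larger becomes the result exponent.
    diff = x_exp - y_exp
    exp = max(x_exp, y_exp)
    # Segment 2: align the mantissas by right-shifting the mantissa of the
    # number with the smaller exponent (one decimal place per shift).
    if diff > 0:
        y_man /= 10 ** diff
    else:
        x_man /= 10 ** (-diff)
    # Segment 3: add the aligned mantissas.
    man = x_man + y_man
    # Segment 4: normalize so the mantissa is a fraction less than 1.
    while abs(man) >= 1.0:
        man /= 10
        exp += 1
    return man, exp

print(fp_add(0.3214, 3, 0.4500, 2))  # approximately (0.3664, 3), i.e. Z = 0.3664 * 10^3

In a real arithmetic pipeline these four segments run concurrently on successive
operand pairs; the function above simply makes each segment's work explicit.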

 Instruction pipeline
Here a stream of instructions is executed by overlapping the fetch, decode and
execute phases of the instruction cycle. This technique is used to increase the
throughput of the computer system. An instruction pipeline reads instructions from
memory while previous instructions are being executed in other segments of the
pipeline, so multiple instructions can be in execution simultaneously. The pipeline
is more efficient if the instruction cycle is divided into segments of equal
duration.

In the most general case, the computer needs to process each instruction in the
following sequence of steps:

1) Fetch the instruction from memory (FI)
2) Decode the instruction (DA)
3) Calculate the effective address
4) Fetch the operands from memory (FO)
5) Execute the instruction (EX)
6) Store the result in the proper place

The flowchart for instruction pipeline is shown below.


Example:
The instruction is fetched in the first clock cycle in segment 1. It is decoded in
the next clock cycle, then the operands are fetched, and finally the instruction is
executed. The fetch and decode phases overlap due to pipelining: while the first
instruction is being decoded, the next instruction is fetched by the pipeline.

The third instruction in this example is a branch instruction. While it is being
decoded, the 4th instruction is fetched simultaneously. But since it is a branch,
it may transfer control to some other instruction once it is decoded. The fourth
instruction is therefore held until the branch instruction has executed; then the
fourth instruction resumes and the remaining phases continue as usual.
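
The overlap and the branch stall can be visualized with a short simulation. The
following Python sketch is illustrative only; it assumes each phase takes one clock
cycle and, as a simplification, holds the next instruction back until the branch's
execute phase finishes.

PHASES = ["FI", "DA", "FO", "EX"]  # fetch, decode, fetch operand, execute

def timing(n_instr, branch_at=None):
    # Compute the cycle in which each instruction enters the pipeline.
    start, rows = 1, []
    for i in range(1, n_instr + 1):
        rows.append((i, start))
        # After a branch, the next instruction waits until the branch executes.
        start += len(PHASES) if i == branch_at else 1
    width = max(s for _, s in rows) + len(PHASES) - 1
    for i, s in rows:
        row = ["--"] * width
        for k, phase in enumerate(PHASES):
            row[s - 1 + k] = phase
        print(f"instr {i}: " + " ".join(row))

timing(5, branch_at=3)  # instruction 3 is a branch; instruction 4 is held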
5.2 Pipeline Example
 Four Segment Instruction Pipeline

Fig: Four Segment Instruction Pipeline


In an instruction pipeline, each step is executed in a particular segment, and
different segments may take different times to operate on the incoming
information. Moreover, two or more segments may require memory access at the
same time, causing one segment to wait until another is finished with the memory.

The organization of an instruction pipeline is more efficient if the instruction
cycle is divided into segments of equal duration. One of the most common examples
of this type of organization is the four-segment instruction pipeline.

A four-segment instruction pipeline combines two or more of the steps listed above
into a single segment. For instance, the decoding of the instruction can be
combined with the calculation of the effective address into one segment.

The following block diagram shows a typical example of a four-segment instruction
pipeline. The instruction cycle is completed in four segments.

 Segment 1: The instruction fetch segment can be implemented using a
first-in, first-out (FIFO) buffer.
 Segment 2: The instruction fetched from memory is decoded in the
second segment, and the effective address is calculated in a separate
arithmetic circuit.
 Segment 3: An operand from memory is fetched in the third segment.
 Segment 4: The instructions are finally executed in the last segment of
the pipeline organization.
 Data Dependency
A data dependency is a situation in which an instruction depends on the result of
a sequentially earlier instruction and cannot complete its execution until that
result is available. In high-performance processors that use pipelining or
superscalar techniques, a data dependency can stall the flow of the processor
pipeline or prevent the parallel issue of instructions in a superscalar processor.

Consider two instructions ik and ii of the same program, where ik precedes ii. If
ik and ii have a common register or memory operand, they are data-dependent on
each other, except when the common operand is used in both instructions as a
source operand.

An example is when ii uses the result of ik as a source operand. In sequential
execution, data dependencies do not cause any problem, because instructions are
executed strictly in the stated sequence.
Data dependency can appear either in ‘straight-line code’ between subsequent
instructions or in a loop between instructions belonging to subsequent iterations
of a loop as shown in the figure.

Here, ‘straight-line code’ means any code sequence, even the instructions of a
loop body, that does not involve instructions from subsequent loop iterations.
Straight-line code can include three different types of dependencies, known as RAW
(Read After Write), WAR (Write After Read), and WAW (Write After Write)
dependencies.
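
The three dependency types can be stated compactly in code. Here is a minimal
Python sketch (the names and the instruction format are assumptions of this
illustration) that classifies the dependency of a later instruction i2 on an
earlier instruction i1, given each instruction's destination and source registers:

def classify(i1_dest, i1_srcs, i2_dest, i2_srcs):
    deps = []
    if i1_dest in i2_srcs:
        deps.append("RAW")  # i2 reads a value that i1 writes
    if i2_dest in i1_srcs:
        deps.append("WAR")  # i2 overwrites a value that i1 reads
    if i1_dest == i2_dest:
        deps.append("WAW")  # both instructions write the same register
    return deps or ["none"]  # shared source operands alone are harmless

# i1: ADD R3 <- R1, R2   followed by   i2: SUB R4 <- R3, R1
print(classify("R3", ["R1", "R2"], "R4", ["R3", "R1"]))  # ['RAW']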

5.3 RISC Pipeline


RISC stands for Reduced Instruction Set Computer. It was introduced to execute
instructions at a rate approaching one instruction per clock cycle. The RISC
pipeline helps to simplify the design of the computer architecture.
 Three-Segment Instruction Pipeline
The three-segment instruction pipeline consists of the following segments:
 I: Instruction fetch
 A: ALU Operation
 E: Execute Instruction
Consider now the operation of the following four instructions:

1. LOAD: Load M[address1] to R1
2. LOAD: Load M[address2] to R2
3. ADD: Add R1 and R2; the sum goes to R3
4. STORE: Store R3 to M[address3]

The pipeline timing with a data conflict is:

Clock cycle      1   2   3   4   5   6
1. Load R1       I   A   E
2. Load R2           I   A   E
3. Add R1+R2             I   A   E
4. Store R3                  I   A   E

(The ADD needs R2 in cycle 4, but the second LOAD only delivers R2 in its E phase
during that same cycle; this is the data conflict.)

The pipeline timing with delayed load is:

Clock cycle      1   2   3   4   5   6   7
1. Load R1       I   A   E
2. Load R2           I   A   E
3. No-operation          I   A   E
4. Add R1+R2                 I   A   E
5. Store R3                      I   A   E

(The inserted no-op delays the ADD by one cycle, so R2 has already been loaded by
the time the ADD uses it.)

 Delayed Load
A similar sort of tactic, called the delayed load, can be used on LOAD
instructions. On a LOAD instruction, the register that is to be the target of the
load is locked by the processor. The processor then continues executing the
instruction stream until it reaches an instruction that requires that register, at
which point it idles until the load is complete. Alternatively, the compiler can
rearrange instructions so that useful work is done while the load is in the
pipeline.
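
What a compiler (or a hardware interlock) does for a delayed load can be sketched
as a rewriting pass. The tuple format (opcode, destination, sources) and the
function name below are assumptions of this illustration, not part of the original
text:

def insert_delayed_load_nops(program):
    out = []
    for op, dest, srcs in program:
        if out:
            prev_op, prev_dest, _ = out[-1]
            # A use of the register targeted by the preceding LOAD falls
            # into the load delay slot, so pad with a no-op.
            if prev_op == "LOAD" and prev_dest in srcs:
                out.append(("NOP", None, []))
        out.append((op, dest, srcs))
    return out

prog = [("LOAD", "R1", []), ("LOAD", "R2", []),
        ("ADD", "R3", ["R1", "R2"]), ("STORE", None, ["R3"])]
for ins in insert_delayed_load_nops(prog):
    print(ins)  # a NOP appears between the second LOAD and the ADD

This reproduces the delayed-load timing table shown earlier: the no-op between the
second LOAD and the ADD removes the data conflict.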

 Delayed Branch
When branches are processed naively by a pipeline, at least one cycle remains
unutilized after each taken branch. This is a consequence of the assembly-line
nature of pipelining. Instruction slots following branches are known as branch
delay slots. Delay slots can also appear following load instructions; these are
called load delay slots. Branch delay slots are wasted during traditional
execution. However, when delayed branching is employed, these slots can be at
least partly used.

In the example below, the add instruction that originally preceded the branch is
moved into the branch delay slot. With delayed branching, the processor executes
the add instruction first, but the branch takes effect only afterwards. Thus, in
this example, delayed branching preserves the intended execution sequence:

add r1, r2, r3;
b anywhere;
anywhere: sub

This example uses an unconditional branch. Conditional branches cause the same or
longer delays in a simple pipelined execution, because of the additional operation
of checking the branch condition.

Accordingly, the instruction placed in the delay slot is executed whether or not
the branch is taken. Branching to the target instruction (sub) is completed with
one pipeline cycle of delay, and this cycle is used to execute the instruction in
the delay slot (add). Thus delayed branching results in the following execution
sequence:

1. add
2. b
3. sub
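
Filling the delay slot is a compile-time rewrite. The following Python sketch is a
toy illustration; its independence check is deliberately naive, simply assuming
the instruction just before the branch is safe to move:

def fill_branch_delay_slot(program):
    out = list(program)
    for i, ins in enumerate(out):
        # Move the independent instruction before a branch into its delay slot.
        if ins.startswith("b ") and i > 0 and not out[i - 1].startswith("b "):
            out[i - 1], out[i] = out[i], out[i - 1]
    return out

print(fill_branch_delay_slot(["add r1, r2, r3", "b anywhere"]))
# ['b anywhere', 'add r1, r2, r3'] (add executes in the branch delay slot)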
5.4 Multiprocessors

A multiprocessor is a computer system in which two or more central processing
units (CPUs) share full access to a common RAM. The main objective of using a
multiprocessor is to boost the system’s execution speed; other objectives are
fault tolerance and application matching.

 Characteristics of Multiprocessors
A multiprocessor is a single computer that has multiple processors. The processors
in a multiprocessor system can communicate and cooperate at various levels in
solving a given problem. They communicate either by sending messages from one
processor to another or by sharing a common memory.


 Parallel Computing: This involves the simultaneous use of multiple
processors. These processors are built around a single architecture to
execute a common task. In general, the processors are identical, and they work
together in such a way that each user has the impression of being the only
user of the system. In reality, however, many users may be accessing the
system at a given time.
 Distributed Computing: This involves the use of a network of processors.
Each processor in the network can be considered a computer in its own right,
with the capability to solve a problem. These processors are heterogeneous,
and generally one task is allocated to a single processor.
 Supercomputing: This involves the use of the fastest machines to solve
big and computationally complex problems. In the past, supercomputing
machines were vector computers; at present, most supercomputers combine
vector and parallel processing.
 Pipelining: This is a method wherein a specific task is divided into several
subtasks that must be performed in a sequence. The functional units help in
performing each subtask. The units are attached serially and all the units work
simultaneously.
 Vector Computing: It involves the usage of vector processors, wherein
operations such as ‘multiplication’ are divided into many steps and are then
applied to a stream of operands (“vectors”).
 Systolic: This is similar to pipelining, but the units are not arranged in a
linear order. The steps are normally smaller and more numerous, and they are
performed in lockstep. Systolic arrays are most frequently applied in special-
purpose hardware such as image or signal processors.

 Interconnection Structures:
In a multiprocessor system, the processors must be able to share a set of main
memory modules and I/O devices. This sharing capability is provided through
interconnection structures. The commonly used interconnection structures are as
follows:

1) Time-Shared Common Bus

A common-bus multiprocessor system consists of a number of processors connected
through a common path to a memory unit. The figure shows a time-shared common bus
for five processors. Only one processor can communicate with the memory or with
another processor at any given time. Transfer operations are conducted by the
processor that is currently in control of the bus. Any other processor wishing to
initiate a transfer must first determine the availability status of the bus, and
only after the bus becomes available can the processor address the destination
unit to initiate the transfer.
A single common-bus system is restricted to one transfer at a time. This means that
when one processor is communicating with the memory, all other processors are
either busy with internal operations or must be idle waiting for the bus. As a
consequence, the overall transfer rate within the system is limited by the speed
of the single path. The processors in the system can be kept busy more often through
the implementation of two or more independent buses to permit multiple
simultaneous bus transfers. However, this increases the system cost and complexity.
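
The one-transfer-at-a-time property can be modeled as mutual exclusion on a shared
resource. The following Python toy is an analogy, not a hardware description: it
serializes five "processors" on one bus lock.

import threading

bus = threading.Lock()   # the single time-shared path
memory = {}

def transfer(cpu_id, addr, value):
    with bus:            # any other processor must wait here for the bus
        memory[addr] = value
        print(f"CPU{cpu_id} wrote {value} to {addr}")

threads = [threading.Thread(target=transfer, args=(i, f"A{i}", i * 10))
           for i in range(1, 6)]  # five processors, as above
for t in threads: t.start()
for t in threads: t.join()

However many threads run, only one transfer proceeds at a time, which is exactly
why the single bus limits the overall transfer rate.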

2) Multiport Memory
A multiport memory system employs separate buses between each memory module
and each CPU. This is shown in the figure below for four CPUs and four memory
modules (MMs). Each processor bus is connected to each memory module. A
processor bus consists of the address, data, and control lines required to
communicate with memory. The memory module is said to have four ports and each
port accommodates one of the buses. The module must have internal control logic
to determine which port will have access to memory at any given time. Memory
access conflicts are resolved by assigning fixed priorities to each memory port. The
priority for memory access associated with each processor may be established by the
physical port position that its bus occupies in each module. Thus CPU1 will have
priority over CPU2, CPU2 will have priority over CPU3, and CPU4 will have the
lowest priority. The advantage of the multiport memory organization is the high
transfer rate that can be achieved because of the multiple paths between processors
and memory. The disadvantage is that it requires expensive memory control logic
and a large number of cables and connectors. As a consequence, this interconnection
structure is usually appropriate for systems with a small number of processors.
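
Fixed-priority resolution is simple enough to state directly. Below is a minimal
Python sketch (list order standing in for physical port position) of how a module
grants one of several simultaneous requests:

PRIORITY = ["CPU1", "CPU2", "CPU3", "CPU4"]  # port position fixes priority

def grant(requests):
    # Grant the highest-priority CPU among those requesting this module.
    for cpu in PRIORITY:
        if cpu in requests:
            return cpu

print(grant({"CPU3", "CPU2"}))  # CPU2 wins; CPU3 waits for the next cycle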
3) Crossbar Switch
The crossbar switch organization consists of a number of cross points that are placed
at intersections between processor buses and memory module paths. Figure below
shows a crossbar switch interconnection between four CPUs and four memory
modules. The small square in each cross point is a switch that determines the path
from a processor to a memory module. Each switch point has control logic to set up
the transfer path between a processor and memory. It examines the address that is
placed in the bus to determine whether its particular module is being addressed. It
also resolves multiple requests for access to the same memory module on a
predetermined priority basis.

4) Multistage Switching Network

 The 2×2 crossbar switch is used in the multistage network. It has 2 inputs
(A & B) and 2 outputs (0 & 1). The control inputs CA & CB establish the
connection between the input and output terminals.

 The input is connected to output 0 if its control input is 0, and to
output 1 if its control input is 1. The switch can arbitrate between
conflicting requests: if both A & B require the same output terminal, only
one of them is connected and the other is blocked or rejected.
 We can construct a multistage network using 2×2 switches in order to control
the communication between a number of sources & destinations. Creating a
binary tree of crossbar switches provides the connections that link an
input to one of the 8 possible destinations.
Fig: 2×2 Crossbar Switch

Fig: 1-to-8-way switch built from 2×2 switches

 In the above diagram, PA & PB are two processors connected through
switches to 8 memory modules, numbered in binary from 000 (0) to 111 (7).
There are three levels of switches between a source and a destination, and
one bit of the destination number selects the output at each level: the 1st
bit determines the switch output in the 1st level, the 2nd bit in the 2nd
level & the 3rd bit in the 3rd level.
 Example: If the source is PB & the destination is memory module 011 (as in
the figure), a path is formed from PB to output 0 in the 1st level, output 1
in the 2nd level & output 1 in the 3rd level (this path is traced by the
sketch after this list).
 Usually, in a tightly coupled system the processor acts as the source and a
memory module acts as the destination. In a loosely coupled system,
processing units act as both source and destination.
 Many patterns can be made using 2×2 switches, such as the Omega network,
the Butterfly network, etc.
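
The level-by-level bit steering can be traced with a few lines of Python. This
sketch is illustrative (the function name and printed path format are assumptions
of this note); it reproduces the PB-to-011 example above:

def route(source, destination, levels=3):
    # Each destination bit sets one switch: 0 -> upper output, 1 -> lower output.
    path = [source]
    for level, bit in enumerate(f"{destination:0{levels}b}", start=1):
        path.append(f"level {level}: output {bit}")
    path.append(f"memory module {destination:0{levels}b}")
    return path

for step in route("PB", 0b011):
    print(step)
# PB -> output 0 (level 1) -> output 1 (level 2) -> output 1 (level 3) -> module 011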

5) Hypercube Interconnection
The hypercube (or binary n-cube) multiprocessor structure is a loosely coupled
system made up of N = 2^n processors interconnected in an n-dimensional binary
cube. Each processor forms a node of the cube. Therefore, it is customary to refer
to each node as containing a processor; in effect it contains not only a CPU but
also local memory and an I/O interface. Each processor has direct communication
paths to n neighbor processors, and these paths correspond to the edges of the
cube.

There are 2^n distinct n-bit binary addresses that can be assigned to the
processors. Each processor’s address differs from that of each of its n neighbors
in exactly one bit position.

 Hypercube structures for n = 1, 2 and 3:

 A one-cube structure has n = 1 and 2^1 = 2 nodes.
 It has two processors interconnected by a single path.
 A two-cube structure has n = 2 and 2^2 = 4 nodes.
 It has four nodes interconnected as a square.
 An n-cube structure has 2^n nodes with a processor residing in each
node.

Each node is assigned a binary address in such a manner that the addresses of two
neighbors differ in exactly one bit position. For example, in a three-cube
structure the three neighbors of the node with address 100 are 000, 110, and 101.
Each of these binary numbers differs from address 100 in one bit.
Routing messages through an n-cube structure may take from one to n links from a
source node to a destination node.

Example:
In a three-cube structure, node 000 may communicate with node 011 either via 010
(000 to 010 to 011) or via 001 (000 to 001 to 011). To communicate from node 000
to node 111, at least three links must be crossed. A routing procedure can be
designed by computing the exclusive-OR of the source node address with the
destination node address. The resulting binary value has 1 bits in the positions
corresponding to the axes on which the two nodes differ. The message is then
transmitted along any one of those axes.

For example, a message at node 010 going to node 001 produces an exclusive-OR
of the two addresses equal to 011 in a three-cube structure. The message can be
transmitted along the second axis to node 000 and then through the third axis to node
001.
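
The whole routing procedure fits in a few lines. Below is a minimal Python sketch
(the function name is an assumption; higher axes are resolved first so the output
matches the example in the text):

def hypercube_route(src, dst, n=3):
    path = [src]
    diff = src ^ dst  # 1 bits mark the axes on which the two nodes differ
    for axis in reversed(range(n)):
        if diff & (1 << axis):
            src ^= (1 << axis)  # cross one cube edge along this axis
            path.append(src)
    return [f"{node:0{n}b}" for node in path]

print(hypercube_route(0b010, 0b001))  # ['010', '000', '001'], as in the text

Resolving the 1 bits in any other order gives an equally short route, since each
differing axis must be crossed exactly once.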
