
Unit 5: Pipeline and Vector Processing

Reference: Chapter 9 from Computer System Architecture by Morris Mano

© Ronak Patel, Computer Engineering Department , CSPIT, CHARUSAT


Parallel Processing
• Why?
• To increase computational speed.
• To achieve faster execution time.

• How is it achieved?
• Concurrent data processing.
• Multiprocessor systems.
• Parallel processing can be viewed at various levels of complexity:
• Lowest level: parallel versus serial operation, depending on the type of registers used.
• Higher level: multiple functional units.



Multifunctional Units
• Used to establish parallel processing.
• All units are independent of each other.
• A multifunctional organization is usually associated with a complex control unit to coordinate all the activities among the various components.



Flynn’s Classification
• Considers the organization of a computer system by the number of instructions and data items that are manipulated simultaneously.
• The sequence of instructions read from memory constitutes an instruction stream.
• The operations performed on the data in the processor constitute a data stream.

1. SISD (Single Instruction stream, Single Data stream)


2. SIMD (Single Instruction stream, Multiple Data stream)
3. MISD (Multiple Instruction stream, Single Data stream)
4. MIMD(Multiple Instruction stream, Multiple Data
stream)



Continue…
• SISD: instructions are executed sequentially.
• Parallel processing in this case can be achieved by multiple functional units or pipeline processing.

• SIMD: many processing units.
• All processors receive the same instruction from the control unit but operate on different data.

• MIMD: a computer system capable of processing several programs at the same time.



Continue…
• Flynn’s classification depends on the distinction between the control unit and the data-processing unit.

• One type of parallel processing that does not fit Flynn’s classification is pipelining.

• Here we consider parallel processing under:
1. Pipeline processing: arithmetic suboperations or instruction phases overlap.
2. Vector processing: computations on large vectors and matrices.
3. Array processing: computations on large arrays of data.
Pipelining
• A technique of decomposing a sequential process into suboperations, with each subprocess executed in a special dedicated segment that operates concurrently with all other segments.

• Like an industrial assembly line.

• Example: Ai * Bi + Ci for i = 1, 2, 3, …, 7

Suboperations in each segment of the pipeline:
R1 ← Ai , R2 ← Bi
R3 ← R1 * R2 , R4 ← Ci
R5 ← R3 + R4



Content of registers in the pipeline example

Clock Pulse    Segment 1        Segment 2              Segment 3
Number         R1      R2       R3           R4        R5
1              A1      B1       -            -         -
2              A2      B2       A1 * B1      C1        -
3              A3      B3       A2 * B2      C2        A1 * B1 + C1
4              A4      B4       A3 * B3      C3        A2 * B2 + C2
5              A5      B5       A4 * B4      C4        A3 * B3 + C3
6              A6      B6       A5 * B5      C5        A4 * B4 + C4
7              A7      B7       A6 * B6      C6        A5 * B5 + C5
8              -       -        A7 * B7      C7        A6 * B6 + C6
9              -       -        -            -         A7 * B7 + C7
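The table above can be reproduced by a short simulation of the three-segment pipeline (a sketch; the update order mimics clocked registers, with each segment reading the values latched on the previous pulse):

```python
def pipeline(a, b, c):
    """Simulate the 3-segment pipeline computing Ai * Bi + Ci."""
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    results = []
    for pulse in range(1, n + 3):          # n + (k - 1) pulses, k = 3 segments
        # Segment 3: R5 <- R3 + R4 (values latched on the previous pulse)
        r5 = r3 + r4 if r3 is not None else None
        # Segment 2: R3 <- R1 * R2, R4 <- Ci
        r3 = r1 * r2 if r1 is not None else None
        r4 = c[pulse - 2] if 2 <= pulse <= n + 1 else None
        # Segment 1: R1 <- Ai, R2 <- Bi
        if pulse <= n:
            r1, r2 = a[pulse - 1], b[pulse - 1]
        else:
            r1, r2 = None, None
        if r5 is not None:
            results.append(r5)
    return results

a = [1, 2, 3, 4, 5, 6, 7]
b = [2, 2, 2, 2, 2, 2, 2]
c = [1, 1, 1, 1, 1, 1, 1]
print(pipeline(a, b, c))   # [3, 5, 7, 9, 11, 13, 15]
```

Note that the first result emerges only on pulse k = 3, and a new result emerges on every pulse thereafter, exactly as in the table.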



Four-segment Pipeline



Space-time Diagram

Task: the total operation performed going through all the segments in the pipeline.



Speedup

• Pipeline unit:
• k-segment pipeline with clock cycle time tp, completing n tasks.
• Clock cycles required to complete n tasks = k + (n - 1)
• Time to complete n tasks = k*tp + (n - 1)*tp = (k + n - 1) * tp
• Non-pipeline unit:
• tn to complete each task.
• Time to complete n tasks = n * tn
• Speedup S = n*tn / ((k + n - 1) * tp)



Continue…
• As the number of tasks increases, n becomes much larger than k - 1, and k + n - 1 approaches the value of n.
• In that condition, S = tn / tp.
• If we take tn = k*tp, then S = k (the maximum speedup the pipeline can provide).

• Example: n = 100, tp = 20 ns, k = 4, tn = 80 ns
Speedup = n*tn / ((k + n - 1) * tp) = 8000 / 2060 = 3.88
• Reason for not reaching the maximum speedup of 4:
• Different segments may take different times to complete their suboperations.
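The slide's numbers can be checked with a small helper (the function and variable names are mine):

```python
def pipeline_speedup(n, k, tp, tn):
    """Speedup S = (n * tn) / ((k + n - 1) * tp)."""
    t_pipeline = (k + n - 1) * tp       # total pipelined time
    t_nonpipeline = n * tn              # total non-pipelined time
    return t_nonpipeline / t_pipeline

# Slide example: n = 100 tasks, k = 4 segments, tp = 20 ns, tn = 80 ns
s = pipeline_speedup(n=100, k=4, tp=20, tn=80)
print(round(s, 2))   # 3.88 -- below the theoretical maximum tn / tp = 4
```

As n grows, s approaches tn / tp = 4, matching the limit argument above.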



Multiple Functional Units in Parallel



Pipelining is applicable in:
• Arithmetic pipeline:
• Divides an arithmetic operation into suboperations.
• Instruction pipeline:
• Overlaps the phases of instruction execution.



Arithmetic Pipeline
• Usually found in very high speed computers.
• Used to implement floating-point operations, multiplication of fixed-point numbers, and similar computations encountered in scientific problems.
• Floating-point operations are easily decomposed into suboperations.
• Floating-point addition and subtraction can be decomposed into:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.



Example
• X = 0.9504 * 10^3, Y = 0.8200 * 10^2
1. Compare the exponents (T1 = 60 ns):
• 3 - 2 = 1
• The larger exponent, 3, is chosen as the exponent of the result.
2. Align the mantissas (T2 = 70 ns):
• Y = 0.0820 * 10^3
3. Add the mantissas (T3 = 100 ns):
• R = 1.0324 * 10^3
4. Normalize the result (T4 = 80 ns):
• R = 0.10324 * 10^4

Interface register delay TR = 10 ns
Speedup = tn / tp, where tn = 60 + 70 + 100 + 80 + 10 = 320 ns (non-pipelined)
and tp = slowest segment + register delay = 100 + 10 = 110 ns:
Speedup = 320 / 110
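The four suboperations can be sketched one function per pipeline segment, using decimal (mantissa, exponent) pairs as in the slide. The helper names are my own, and normalize handles only the mantissa-overflow case this example needs:

```python
def compare_exponents(x, y):
    # Segment 1: the larger exponent becomes the result exponent.
    return x, y, max(x[1], y[1])

def align_mantissas(x, y, e):
    # Segment 2: shift the mantissa of the smaller-exponent operand right.
    (mx, ex), (my, ey) = x, y
    return mx * 10.0 ** (ex - e), my * 10.0 ** (ey - e)

def add_mantissas(mx, my):
    # Segment 3: add the aligned mantissas.
    return mx + my

def normalize(m, e):
    # Segment 4: shift right until the mantissa is a fraction below 1
    # (only the overflow case, which is all this example requires).
    while abs(m) >= 1:
        m, e = m / 10, e + 1
    return round(m, 5), e

x, y = (0.9504, 3), (0.8200, 2)
x, y, e = compare_exponents(x, y)          # e = 3
mx, my = align_mantissas(x, y, e)          # 0.9504, 0.0820
m = add_mantissas(mx, my)                  # 1.0324
print(normalize(m, e))                     # (0.10324, 4)
```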
Instruction Pipeline
• An instruction pipeline reads consecutive instructions from memory while previous instructions are being executed in other segments.
• The instruction fetch segment can be implemented with a FIFO buffer.
• Whenever the execution unit is not using memory, the control increments the program counter and reads the next instruction.
• This reduces the average access time to memory for reading instructions.
• Instruction phases:
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
Reducing the Phases
• A register-mode instruction does not need an effective address calculation.

• Two or more segments may require memory access at the same time, causing one segment to wait until another is finished with the memory.

• Memory conflicts can be resolved by using two memory buses for accessing instructions and data in separate modules.



Four-segment Instruction Pipeline
1. FI is the segment that fetches an instruction.

2. DA is the segment that decodes the instruction and calculates the effective address.

3. FO is the segment that fetches the operand.

4. EX is the segment that executes the instruction.



Continue…



Timing of instruction Pipeline



Instruction Pipeline Conflicts
• Conflicts cause the instruction pipeline to deviate from its normal operation.
1. Resource conflicts:
• Caused by access to memory by two segments at the same time. Most of these conflicts can be resolved by using separate instruction and data memories.
2. Data dependency:
• Arises when an instruction depends on the result of a previous instruction, but this result is not yet available.
3. Branch difficulties:
• Arise from branch and other instructions that change the value of the PC.
Data Dependency
• Occurs when an instruction needs data that are not yet available.
• An instruction in the FO segment may need to fetch an operand that is being generated at the same time by the previous instruction in segment EX; the second instruction must therefore wait.
• An address dependency may occur when an operand address cannot yet be calculated.



Solutions to Data Dependency
1. Hardware interlocks:
• An interlock is a circuit that detects instructions whose source operands are destinations of instructions farther up in the pipeline.
• An instruction whose source is not available is delayed by enough clock cycles to resolve the conflict.



Solutions to Data Dependency
2. Operand forwarding:
• Detects a conflict and avoids it by routing the data through special paths between pipeline segments.
• Instead of transferring the ALU result only into the destination register, the hardware checks the destination operand, and if it is needed as a source in the next instruction, passes the result directly to the ALU input, bypassing the register file.
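The forwarding check itself is simple: compare the destination of one instruction with the sources of the next. A minimal sketch (the three-field instruction format is my own, not from the text):

```python
# Each instruction: (dest, src1, src2). Forwarding is needed whenever the
# destination of one instruction is a source of the one that follows it.
def needs_forwarding(prev, curr):
    dest, _, _ = prev
    _, s1, s2 = curr
    return dest in (s1, s2)

program = [
    ("R1", "R2", "R3"),   # R1 <- R2 op R3
    ("R4", "R1", "R5"),   # uses R1: forward the ALU result, bypass registers
    ("R6", "R7", "R8"),   # independent: no forwarding path needed
]
for prev, curr in zip(program, program[1:]):
    status = "forward" if needs_forwarding(prev, curr) else "no hazard"
    print(prev[0], "->", curr, ":", status)
```

When the check fires, real hardware steers a bypass multiplexer at the ALU input instead of waiting for the register-file write.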



Solutions to Data Dependency
3. Delayed load:
• The compiler for such computers is designed to detect a data conflict and reorder the instructions as necessary to delay the loading of the conflicting data, inserting no-operation instructions when needed.



Branch Difficulties
• A branch instruction can be conditional or unconditional.
• It breaks the normal sequence of the instruction stream, causing difficulties in the operation of the instruction pipeline.



Handling of Branch Instructions
1. Prefetch target instruction:
• Prefetch the target instruction in addition to the instruction following the branch. Both are saved until the branch is executed.
2. Branch target buffer (BTB):
• An associative memory included in the fetch segment.
• Each entry in the BTB consists of the address of a previously executed branch instruction and the target instruction for that branch.
• It also stores the next few instructions after the branch target instruction.



Handling of Branch Instructions
3. Loop buffer:
• A variation of the BTB.
• A small, very high speed register file.
• When a program loop is detected, it is stored in the loop buffer in its entirety, including all branches.
• Loop mode is exited by the final branch out of the loop.



Handling of Branch Instructions
4. Branch prediction:
• Uses additional logic to guess the outcome of a conditional branch instruction before it is executed.
• The pipeline then begins prefetching the instruction stream from the predicted path.
• A correct prediction eliminates the wasted time caused by branch penalties.



Handling of Branch Instructions
5. Delayed branch:
• Employed in most RISC processors.
• The compiler detects the branch instructions and rearranges the machine-language code sequence by inserting useful instructions that keep the pipeline operating without interruptions.



RISC Pipeline
• The simplicity of the instruction set can be utilized to implement an instruction pipeline using a small number of suboperations, each executed in one clock cycle.
• Decoding and register selection can occur at the same time, due to the fixed-length instruction format.
• There is no need to calculate an effective address or fetch operands from memory.
• The instruction pipeline can be implemented with 2 or 3 segments:
• Fetch the instruction.
• Execute the instruction in the ALU.
• Store the result in the destination register.



RISC Pipeline
• Data transfer instructions in RISC are limited to load and store instructions, which use register indirect addressing.
• To resolve memory conflicts between fetching an instruction and loading or storing an operand, most RISC machines use two separate buses with two memories.
• Advantages:
• The ability to execute instructions at the rate of one per clock cycle.
• Support from the compiler, which detects and minimizes the delays encountered due to data conflicts and branch penalties.



Example: Three-segment Instruction Pipeline
I: Instruction fetch
A: ALU operation
E: Execute instruction

Now consider the following four instructions:
1. LOAD: R1 <- M[ADD1]
2. LOAD: R2 <- M[ADD2]
3. ADD: R3 <- R1 + R2
4. STORE: M[ADD3] <- R3



Delayed Load

Advantage: Data dependency is taken care of by the compiler rather than the hardware.
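For the four-instruction sequence on the previous slide, the compiler's NOP insertion can be sketched as follows (the instruction encoding and a one-cycle load delay are my assumptions):

```python
# Insert a NOP whenever an instruction uses a register loaded by the
# instruction immediately before it (one-cycle load delay slot).
def delayed_load(program):
    out = []
    for instr in program:
        if out:
            op, dest, _ = out[-1]
            if op == "LOAD" and dest in instr[2]:
                out.append(("NOP", None, ()))
        out.append(instr)
    return out

# (opcode, destination, sources) for the slide's example
program = [
    ("LOAD", "R1", ("ADD1",)),
    ("LOAD", "R2", ("ADD2",)),
    ("ADD", "R3", ("R1", "R2")),     # needs R2 from the preceding LOAD
    ("STORE", "ADD3", ("R3",)),
]
for op, dest, srcs in delayed_load(program):
    print(op, dest, srcs)
# A NOP is inserted between the second LOAD and the ADD.
```

A smarter compiler would fill the delay slot with a useful independent instruction instead of a NOP, which is the point of the delayed-branch discussion that follows.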



Delayed Branch
• RISC processors rely on the compiler to redefine the branches so that they take effect at the proper time in the pipeline; this is referred to as delayed branch.
• The compiler analyzes the instructions before and after the branch and rearranges the program sequence by inserting useful instructions in the delay steps.



Delayed Branch



Vector Processing
• There is a class of computational problems that are beyond the capabilities of a conventional computer.
• Science and engineering applications (problems that can be formulated in terms of vectors and matrices):
• Long-range weather forecasting, petroleum exploration, seismic data analysis, medical diagnosis, aerodynamics and space flight simulations, artificial intelligence and expert systems, mapping the human genome, image processing.



Vector Operations
• Arithmetic operations on large arrays of numbers, typically floating-point numbers.
• V = [v1, v2, v3, …, vn]
• A conventional system is capable of processing only one operand pair at a time.



Continue…
• Vector processing eliminates the overhead associated with the time it takes to fetch and execute the instructions in a program loop.
• The operation can be specified with a single vector instruction of the form:
C(1:100) = A(1:100) + B(1:100)
• This includes: the initial addresses of the operands, the length of the vectors, and the operation to be performed.

• Matrix multiplication is one of the most computationally intensive operations performed in computers with vector processors.
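In Python terms (illustrative only; the text's example is Fortran-style), the single vector instruction replaces an explicit loop with per-iteration control overhead:

```python
# Scalar machine: one operand pair per loop iteration, plus loop-control
# overhead (increment, compare, branch) repeated every iteration.
def scalar_add(a, b):
    c = [0] * len(a)
    i = 0
    while i < len(a):
        c[i] = a[i] + b[i]
        i += 1
    return c

# Vector instruction: one instruction specifies the base addresses, the
# vector length, and the operation; the hardware streams all elements.
def vector_add(a, b):
    return [x + y for x, y in zip(a, b)]

a, b = list(range(100)), list(range(100, 200))
assert scalar_add(a, b) == vector_add(a, b)   # same result, one "instruction"
```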



Memory Interleaving
• Allows simultaneous access to memory from two or more sources.
• An arithmetic pipeline usually requires two or more operands to enter the pipeline at the same time.
• The memory can be partitioned into a number of modules instead of using separate memory buses.
• A memory module is one kind of memory array.
• In interleaved memory, different sets of addresses are assigned to different memory modules.
• With four modules, the two least significant bits of the address can be used to distinguish between them.
• The advantage is that this allows the use of a technique called interleaving.
• A vector processor that uses n-way interleaved memory can fetch n operands from n different modules.
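The module-selection rule from the slide (two least significant bits pick one of four modules) can be sketched as:

```python
# Four-way interleaving: the low-order address bits select the module,
# and the remaining bits select the word within that module.
N_MODULES = 4

def module_of(address):
    return address & (N_MODULES - 1)      # two least significant bits

def word_of(address):
    return address >> 2                   # word index within the module

# Consecutive addresses cycle through modules 0, 1, 2, 3, 0, 1, ...
print([module_of(a) for a in range(8)])   # [0, 1, 2, 3, 0, 1, 2, 3]

# So the pipeline can fetch 4 consecutive operands in one memory cycle,
# one from each module, instead of 4 sequential accesses to one module.
```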
Supercomputer
• Supercomputer = vector instructions + pipelined floating-point arithmetic operations
• Components are tightly coupled.
• Multiple functional units, each with its own pipeline configuration.
• Performance evaluation indexes:
• MIPS: millions of instructions per second
• FLOPS: floating-point operations per second
• megaflops: 10^6 FLOPS, gigaflops: 10^9 FLOPS



Cray-1
• Developed in 1976.
• 12 distinct functional units.
• Over 150 registers.
• 80 megaflops.
• Memory:
• capacity of 4 million 64-bit words
• divided into 16 banks
• transfer rate of 320 million words per second



Array Processor
• Performs computations on large arrays of data.
• Attached array processor: an auxiliary processor attached to a general-purpose computer.
• SIMD array processor: a computer with multiple processing units operating in parallel.

The objective is to provide vector manipulation capabilities to a conventional computer.
Any Questions?
