COA Unit V B

The document discusses data hazards in pipelined processors, outlining methods to resolve them such as forwarding, code re-ordering, and stall insertion. It also explains vector processors, their architectures, and instruction types, highlighting the differences between memory-to-memory and register-to-register architectures. Additionally, it covers array processors, their types, and advantages, emphasizing their efficiency in handling repetitive operations and the cache coherence problem in multi-processor systems.

Uploaded by

Rene Dev


Data Hazards

A data hazard occurs when the current instruction requires the result of a preceding instruction, but there are
insufficient segments in the pipeline to compute the result and write it back to the register file in time for the
current instruction to read that result from the register file.

We typically remedy this problem in one of three ways:

• Forwarding: To resolve a dependency, one adds special circuitry to the pipeline, consisting of wires and
switches, that forwards the desired value directly to the pipeline segment that needs it for computation.
Although this adds hardware and control circuitry, the method works because it takes far less time for the
required value(s) to travel through a wire than it does for a pipeline segment to compute its result.
• Code Re-Ordering: Here, the compiler reorders statements in the source code, or the assembler reorders
object code, to place one or more independent statements between the instruction that computes the
required operand and the current instruction that uses it. This requires an "intelligent" compiler or
assembler, which must have detailed information about the structure and timing of the pipeline on which
the data hazard would occur. We call this type of software a hardware-dependent compiler.
• Stall Insertion: It is possible to insert one or more stalls (no-op instructions) into the pipeline, which delays
the execution of the current instruction until the required operand is written to the register file. This
decreases pipeline efficiency and throughput, which is contrary to the goals of pipeline processor design.
Stalls are an expedient method of last resort that can be used when compiler action or forwarding fails, or
when neither is supported by the hardware or software design.
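
The effect of the three remedies can be illustrated with a small Python sketch. It is not taken from the notes: it assumes a classic five-stage pipeline (fetch, decode, execute, memory, write-back), a register file that is written in the first half of a cycle and read in the second half, and ALU-style instructions only. The names `schedule` and `total_stalls` are hypothetical.

```python
def schedule(instrs, forwarding):
    """Return the issue (fetch) cycle of each instruction.

    instrs is a list of (dest, srcs) pairs, e.g. ("r1", ("r2", "r3")).
    Assumed timing: an instruction issued in cycle c writes its result back
    in cycle c + 4, and a consumer reads its registers in its decode stage.
    Without forwarding the consumer must issue at least 3 cycles after the
    producer; with forwarding 1 cycle is enough.
    """
    gap = 1 if forwarding else 3
    issue = []
    last_writer = {}  # register name -> issue cycle of its latest producer
    for dest, srcs in instrs:
        # In-order issue: at best one cycle after the previous instruction.
        cycle = issue[-1] + 1 if issue else 0
        for reg in srcs:
            if reg in last_writer:
                cycle = max(cycle, last_writer[reg] + gap)
        issue.append(cycle)
        if dest is not None:
            last_writer[dest] = cycle
    return issue


def total_stalls(instrs, forwarding):
    """Number of stall cycles compared with an ideal, hazard-free pipeline."""
    issue = schedule(instrs, forwarding)
    return issue[-1] - (len(instrs) - 1) if issue else 0


# A chain of dependent instructions.
dependent = [("r1", ("r2", "r3")), ("r4", ("r1", "r5")), ("r6", ("r4", "r1"))]
# The same producer/consumer pair separated by two independent instructions.
reordered = [("r1", ("r2", "r3")), ("r7", ("r8", "r9")),
             ("r10", ("r8", "r9")), ("r4", ("r1", "r5"))]

print(total_stalls(dependent, forwarding=False))
print(total_stalls(dependent, forwarding=True))
print(total_stalls(reordered, forwarding=False))
```

Under these assumptions the dependent chain needs stalls only when forwarding is absent, and placing independent instructions between producer and consumer removes the stalls without any extra hardware.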

Vector Processors

There are two essentially different models of parallel computers: vector processors and multiprocessors. A vector
processor is simply a machine that has an instruction that can operate on a vector. A pipelined vector processor is a
vector processor that can issue a vector instruction that operates on all of the elements of the vector in parallel by
sending those elements through a highly pipelined functional unit with a fast clock. A processor array is a vector
processor that achieves parallelism by having a collection of identical, synchronized processing elements (PEs),
all controlled by a single control unit, each of which executes the same instruction on different data. Every
PE has a unique identifier, its processor id, which can be used during the computation. The control unit, which
might be a full-fledged CPU, broadcasts the instruction to be executed to the processing elements, which execute it
on data from a memory that is usually local to each, and can store the result in their local memories, or can return
global results back to the CPU. A global result line is usually a separate, parallel bus that allows each PE to
transmit values back to the CPU to be combined by a parallel, global operation, such as a logical-and or a
logical-or, depending upon the hardware support in the CPU.
Because all PEs execute the same instruction at the same time, this type of architecture is suited to problems with
data parallelism. Data parallelism is a type of parallelism that is characterized by the ability to perform the same
operation on different data simultaneously. For example, a loop of the form
for i = 0 to N-1
do a[i] = a[i] + 1;
has data parallelism because the updates to the distinct array elements a[i] are independent of each other and may
be performed in parallel, whereas the loop
for i = 1 to N-1
do a[i] = a[i-1] + 1;
has no data parallelism because the update to a[i] cannot be performed until the update to a[i-1] has completed. In
the first loop, if N is no larger than the number of processing elements, the entire loop takes the same amount of
time as a single processor takes to perform the increment on a scalar variable. If N is larger, then the work has to
be distributed to the PEs so that they each update the values of several array elements. This may be handled by the
hardware, by a runtime library, or by the programmer, depending on the particular architecture and software.
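
One way the distribution for the data-parallel loop might look is sketched below in Python. This is only an illustration: the block distribution, the function names `block_ranges` and `simd_increment`, and the sequential simulation of the PEs are assumptions, not something the notes prescribe.

```python
def block_ranges(n, num_pes):
    """Split indices 0..n-1 into num_pes contiguous blocks.

    Block sizes differ by at most one element; some blocks are empty
    when n is smaller than num_pes.
    """
    base, extra = divmod(n, num_pes)
    ranges = []
    start = 0
    for pe in range(num_pes):
        size = base + (1 if pe < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def simd_increment(a, num_pes):
    """Perform a[i] = a[i] + 1 for all i and return the number of lock-step steps.

    In each step every PE that still has work updates one element of its
    block. The inner loop stands in for what the hardware would do
    simultaneously.
    """
    ranges = block_ranges(len(a), num_pes)
    steps = max(len(r) for r in ranges)
    for step in range(steps):
        for r in ranges:  # conceptually simultaneous across PEs
            if step < len(r):
                a[r[step]] += 1
    return steps


data = list(range(10))
print(simd_increment(data, num_pes=4), data)
```

With 10 elements and 4 PEs the largest block holds 3 elements, so the loop finishes in 3 lock-step steps; with N no larger than the number of PEs it finishes in one.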

Vector processor classification

Pipelined vector computers are classified into two architectural configurations, according to where the operands
are retrieved from:
1. Memory to memory architecture –
In memory to memory architecture, source operands, intermediate results, and final results are read from and
written to the main memory directly. For memory to memory vector instructions, the base address, the offset,
the increment, and the vector length must be specified in order to enable streams of data transfers between the
main memory and the pipelines.
The main points about memory to memory architecture are:
• There is no limitation on vector size.
• Speed is comparatively slow in this architecture.
2. Register to register architecture –
In register to register architecture, operands and results are retrieved indirectly from the main memory
through the use of a large number of vector registers or scalar registers. Processors such as the Cray-1 and the
Fujitsu VP-200 use vector instructions in register to register formats. The main points about register to register
architecture are:
• Vector size is limited by the size of the registers.
• Speed is very high compared to the memory to memory architecture.
• The hardware cost is high in this architecture.
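
Because vector registers have a fixed size, a register to register machine typically processes a long vector in register-sized pieces. The Python sketch below illustrates that idea (often called strip mining, a term the notes do not use). The register length of 64 elements and the name `vadd_strip_mined` are assumptions made for the example.

```python
VLEN = 64  # assumed number of elements held by one vector register


def vadd_strip_mined(a, b, vlen=VLEN):
    """Add two equal-length vectors in strips of at most vlen elements.

    Each strip models: load two vector registers, add register to register,
    store the result register back to memory. Returns the result vector and
    the number of strips that were needed.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n = len(a)
    c = [0] * n
    strips = 0
    for start in range(0, n, vlen):
        end = min(start + vlen, n)
        va = a[start:end]                      # vector load
        vb = b[start:end]                      # vector load
        vc = [x + y for x, y in zip(va, vb)]   # register to register add
        c[start:end] = vc                      # vector store
        strips += 1
    return c, strips


result, strips = vadd_strip_mined(list(range(150)), [1] * 150)
print(strips, result[:5])
```

A memory to memory machine would instead stream all 150 elements through the pipeline in a single instruction, at the cost of slower operand access.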
A block diagram of a modern multiple pipeline vector computer is shown below:

Vector instruction types

A vector operand contains an ordered set of n elements, where n is called the length of the vector. All elements
of a vector are scalar quantities of the same type, which may be a floating-point number, an integer, a logical
value, or a character.

Four primitive types of vector instructions are:

f1 : V --> V
f2 : V --> S
f3 : V x V --> V
f4 : V x S --> V
where V and S denote a vector operand and a scalar operand, respectively. The instructions f1 and f2 are unary
operations, and f3 and f4 are binary operations. The VCOM (vector complement) instruction, which complements each
component of the vector, is an f1 operation. The pipelined implementation of the f1 operation is shown in
figure 1.

The VMAX (vector maximum) instruction, which finds the maximum scalar quantity among all the components of the
vector, is an f2 operation. The pipelined implementation of the f2 operation is shown in figure 2.

The VMPL (vector multiply) instruction, which multiplies the respective scalar components of two vector operands
and produces a product vector, is an f3 operation. The pipelined implementation of the f3 operation is shown in
figure 3.
The SVP (scalar vector product) instruction, which multiplies each component of the vector by one constant value,
is an f4 operation. The pipelined implementation of the f4 operation is shown in figure 4.
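
The four primitive types can be mirrored by plain Python functions, one per example instruction. This is a functional sketch only: it says nothing about the pipelined hardware, and the 8-bit word width used for the complement is an assumption.

```python
WORD_MASK = 0xFF  # assumed 8-bit components for the complement example


def vcom(v):
    """f1 : V -> V. Bitwise complement of each component."""
    return [(~x) & WORD_MASK for x in v]


def vmax(v):
    """f2 : V -> S. Largest component of the vector."""
    return max(v)


def vmpl(v1, v2):
    """f3 : V x V -> V. Component-wise product of two vectors."""
    if len(v1) != len(v2):
        raise ValueError("vector operands must have the same length")
    return [x * y for x, y in zip(v1, v2)]


def svp(s, v):
    """f4 : V x S -> V. Each component multiplied by the scalar s."""
    return [s * x for x in v]


print(vcom([0, 1, 5]))
print(vmax([3, 9, 2]))
print(vmpl([1, 2, 3], [4, 5, 6]))
print(svp(2, [1, 2, 3]))
```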

Vector Instruction Format in Vector Processors

Different instruction formats are used by different vector processors. Vector instructions are generally specified by
several fields. The main fields used in a vector instruction set are given below.
1. Operation Code (Opcode) –
The operation code must be specified to select the functional unit, or to reconfigure a multifunctional unit, to
perform the operation dictated by this field. Usually, microcode control is used to set up the required
resources.
For example:
Opcode – 0001 mnemonic – ADD operation
Opcode – 0010 mnemonic – SUB operation
Opcode – 1111 mnemonic – HLT operation
2. Base addresses –
For a memory reference instruction, the base addresses are needed for both the source operands and the result
vectors. If the operands and results are located in the vector register file (i.e., a collection of registers), the
designated vector registers must be specified in the instruction.
For example:
ADD R1, R2
Here, R1 and R2 designate the registers that hold the operands.
3. Offset (or Displacement) –
This field is required to get the effective memory address of the operand vector. The address offset relative to the
base address should be specified. Using the base address and the offset (positive or negative), the effective address
is calculated.
4. Address Increment –
The address increment between the scalar elements of the vector operand must be specified. In some computers, the
increment is always 1.
For example:
R1 <- 400
Auto-increment: the value of R1 is incremented by 1.
R1 = 401
5. Vector length –
The vector length (a positive integer) is needed to determine the termination of a vector instruction.
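
How the base address, offset, increment, and vector length fields combine into a stream of element addresses can be sketched in Python as follows. The function name `element_addresses` and the assumption that addresses and increments are counted in the same units are illustrative, not taken from any particular machine.

```python
def element_addresses(base, offset, increment, length):
    """Return the memory address of each element of a vector operand.

    The effective start address is base + offset; successive elements are
    `increment` address units apart; `length` gives the number of elements
    and therefore when the instruction terminates.
    """
    if length <= 0:
        raise ValueError("vector length must be a positive integer")
    start = base + offset
    return [start + i * increment for i in range(length)]


# Unit-stride operand starting 8 locations past the base address.
print(element_addresses(base=400, offset=8, increment=1, length=4))
# Negative offset with a stride of 4.
print(element_addresses(base=400, offset=-16, increment=4, length=3))
```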

Array processor

An array processor is a computer/processor that has an architecture especially designed for processing arrays (e.g.
matrices) of numbers. The architecture includes a number of processors (say 64 by 64) working simultaneously,
each handling one element of the array, so that a single operation can apply to all elements of the array in parallel.
To obtain the same effect in a conventional processor, the operation must be applied to each element of the array
sequentially, and consequently much more slowly. An array processor may be built as a self-contained unit
attached to a main computer via an I/O port or internal bus; alternatively, it may be a distributed array processor
where the processing elements are distributed throughout, and closely linked to, a section of the computer's
memory. Array processors are very powerful tools for handling problems with a high degree of parallelism. They
do, however, demand a modified approach to programming. The conversion of conventional (sequential) programs
to serve array processors is not a trivial task, and it is sometimes necessary to select different (parallel)
algorithms to suit the parallel approach.

Types of Array Processors

There are basically two types of array processors:
1. Attached Array Processors
2. SIMD Array Processors

Attached Array Processors

An attached array processor is a processor which is attached to a general purpose computer and its purpose is to
enhance and improve the performance of that computer in numerical computational tasks. It achieves high
performance by means of parallel processing with multiple functional units.

SIMD Array Processors

SIMD is the organization of a single computer containing multiple processors operating in parallel. The processing
units are made to operate under the control of a common control unit, thus providing a single instruction stream and
multiple data streams. A general block diagram of an array processor is shown below. It contains a set of identical
processing elements (PEs), each of which has a local memory M. Each processing element includes an ALU
and registers. The master control unit controls all the operations of the processing elements. It also decodes the
instructions and determines how each instruction is to be executed. The main memory is used for storing the
program. The control unit is responsible for fetching the instructions. Vector instructions are sent to all PEs
simultaneously, and results are returned to the memory. The best known SIMD array processor is the ILLIAC IV
computer, developed at the University of Illinois and manufactured by the Burroughs Corporation. SIMD processors
are highly specialized computers. They are suitable only for numerical problems that can be expressed in vector or
matrix form; they are not suitable for other types of computations.
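
The organization described above can be modelled in a few lines of Python. The sketch is a toy: the class names, the two-instruction repertoire ("ADD" and "MUL" with a scalar operand), and the sequential loop standing in for the simultaneous broadcast are all assumptions made for illustration.

```python
class ProcessingElement:
    """One PE: a processor id plus a local memory holding its share of the data."""

    def __init__(self, pe_id, local_memory):
        self.pe_id = pe_id
        self.memory = list(local_memory)

    def execute(self, op, operand):
        # Every PE applies the same operation to its own local data.
        if op == "ADD":
            self.memory = [x + operand for x in self.memory]
        elif op == "MUL":
            self.memory = [x * operand for x in self.memory]
        else:
            raise ValueError(f"unknown operation: {op}")


class ControlUnit:
    """Fetches instructions from the program and broadcasts each one to all PEs."""

    def __init__(self, pes):
        self.pes = pes

    def run(self, program):
        for op, operand in program:      # single instruction stream
            for pe in self.pes:          # conceptually simultaneous broadcast
                pe.execute(op, operand)

    def gather(self):
        return [pe.memory for pe in self.pes]


pes = [ProcessingElement(i, [i, i + 10]) for i in range(3)]
cu = ControlUnit(pes)
cu.run([("ADD", 1), ("MUL", 2)])
print(cu.gather())
```

The program contains one instruction stream, yet each PE ends up with different results because each operates on different local data.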

Summary about Array Processor

1) Array processors increase the overall instruction processing speed.

2) Because most array processors operate asynchronously from the host CPU, they improve the overall
capacity of the system.

3) An array processor has its own local memory, providing extra memory for systems with low memory.

4) The AP (array processor) is most efficient in doing repetitive operations such as computing FFTs and
multiplying large vectors. Its efficiency degrades for non-repetitive operations, or operations requiring a great
number of decisions based on the results of computations.

5) Since APs have their own program and data memory, the AP instructions and data must be transferred to the AP,
and the results transferred back from it. These I/O operations may cost more CPU time than the amount saved by
using the array processor.

6) As a general rule, use of the AP is more efficient than the CPU when multiple or complex operations (such as
FFTs), which are highly repetitious, are to be done on a relatively large amount of data (thousands of words or
more). In other cases, use of the AP will not help much and will keep other processes from using a valuable resource.
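
The trade-off in points 5 and 6 can be expressed as a simple cost comparison. The Python sketch below uses a hypothetical linear cost model: every parameter name and every number in the example is an assumption for illustration, not a measured value.

```python
def offload_pays_off(n_words, cpu_time_per_word, ap_time_per_word,
                     transfer_time_per_word, fixed_overhead):
    """Return True if using the array processor is faster than the CPU alone.

    Assumed model: the CPU alone costs n_words * cpu_time_per_word. The AP
    costs a fixed setup overhead (loading its program), plus a transfer in
    and a transfer out for every word, plus the AP's own compute time.
    """
    cpu_total = n_words * cpu_time_per_word
    ap_total = fixed_overhead + n_words * (2 * transfer_time_per_word
                                           + ap_time_per_word)
    return ap_total < cpu_total


# Small job: the overhead dominates, so the CPU alone is faster.
print(offload_pays_off(100, cpu_time_per_word=10, ap_time_per_word=1,
                       transfer_time_per_word=2, fixed_overhead=1000))
# Large job: the overhead is amortized over many words.
print(offload_pays_off(10_000, cpu_time_per_word=10, ap_time_per_word=1,
                       transfer_time_per_word=2, fixed_overhead=1000))
```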

Cache Coherence: In a single-CPU system, two copies of the same data, one in the cache and another in main
memory, may become different. This data inconsistency is called the cache coherence problem.
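
A minimal Python sketch of how the two copies can diverge is given below. It assumes a write-back policy, in which a write updates only the cached copy until the line is flushed; the notes do not name a write policy, and the class and method names are hypothetical.

```python
class WriteBackCache:
    """Toy cache in front of a main memory modelled as a dict (address -> value)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}     # cached copies
        self.dirty = set()  # addresses modified in the cache but not in memory

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]  # miss: fetch from memory
        return self.lines[addr]

    def write(self, addr, value):
        # Write-back (assumed): only the cached copy is updated for now.
        self.lines[addr] = value
        self.dirty.add(addr)

    def flush(self):
        # Copy every modified line back to main memory.
        for addr in self.dirty:
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()

    def is_coherent(self, addr):
        return addr not in self.lines or self.lines[addr] == self.memory[addr]


memory = {0x10: 5}
cache = WriteBackCache(memory)
cache.read(0x10)
cache.write(0x10, 7)
print(cache.is_coherent(0x10), memory[0x10])  # cache and memory now differ
cache.flush()
print(cache.is_coherent(0x10), memory[0x10])
```

Between the write and the flush, anything that reads main memory directly (an I/O device, for instance) would see the stale value.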
