
Chapter 8
Multivector and SIMD Computers

by Prajwala T R
Dept. of CSE, PESIT

Overview: Vector processors apply operations to whole vectors of data rather than to scalars, using vector-vector, vector-scalar, and vector-memory instructions. Vectorization improves performance by reducing software overhead, and memory is organized for concurrent access to sustain pipeline throughput. Vector supercomputers balance vector and scalar performance through design goals such as scalability and high I/O performance. SIMD computers apply the same instruction to many data elements using arrays of processing elements with local interconnects.
Vector processing principles
• Vector: an ordered collection of scalar items of the same type.
• Successive elements are addressed with a fixed increment called the stride.
• A vector processor is an ensemble of vector registers, functional pipelines, scalar registers, and a vectorizer.
• Vector processing: arithmetic and logical operations are applied to entire vectors.
• Vectorization: conversion of scalar loop code into equivalent vector instructions.
• Vector processing is faster and more efficient than scalar processing because it reduces software overhead (one vector instruction replaces many scalar instructions and their loop control).
Vector instruction types
• Vector-vector instructions: vector operands produce a vector result (e.g., V1 op V2 -> V3).
• Vector-scalar instructions: a scalar operand is combined with a vector operand (e.g., s op V1 -> V2).
• Vector-memory instructions: vector load and vector store between memory and vector registers.
• Gather and scatter instructions (using an index vector V0):
  – Gather: M -> V1 x V0 (indexed load from memory into V1)
  – Scatter: V1 x V0 -> M (indexed store from V1 back to memory)
• Masking instructions: use a mask vector to compress or expand a vector.
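A minimal C sketch of what these instruction types compute element by element, assuming illustrative array names (M is memory, V0 an index vector, V1 a vector register image):

#include <stddef.h>

/* Gather (M -> V1 x V0): load the elements of M indexed by V0 into V1. */
void gather(const double *M, const size_t *V0, double *V1, size_t n) {
    for (size_t i = 0; i < n; i++)
        V1[i] = M[V0[i]];
}

/* Scatter (V1 x V0 -> M): store the elements of V1 into M at positions V0. */
void scatter(double *M, const size_t *V0, const double *V1, size_t n) {
    for (size_t i = 0; i < n; i++)
        M[V0[i]] = V1[i];
}

/* Masking (compress): keep only the elements of V whose mask bit is set;
   returns the compressed length. */
size_t compress(const double *V, const int *mask, double *out, size_t n) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (mask[i])
            out[k++] = V[i];
    return k;
}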
Vector instructions in Cray-like computers
Vector-access memory schemes
• A vector operand is specified by:
  – Base address
  – Stride
  – Length
• The memory access rate should match the pipeline rate so the pipelines are kept busy.
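As a small illustration of the operand specification, element i of a vector operand is found at base + i x stride; the numbers below are only examples:

#include <stdio.h>

int main(void) {
    long base   = 0x1000;  /* base address in bytes (illustrative)              */
    long stride = 8;       /* address increment between elements (8-byte words) */
    long length = 4;       /* number of elements in the vector operand          */

    for (long i = 0; i < length; i++)
        printf("element %ld at address 0x%lx\n", i, base + i * stride);
    return 0;
}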
C-access (concurrent-access) memory organization
• An m-way low-order interleaved memory structure.
• With stride 1, successive elements fall in successive modules and their accesses can be started one minor cycle apart.
• With stride 2, successive accesses are separated by two minor cycles, so only half the modules are used.
• Maximum throughput is m words per memory cycle.
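A short C sketch of how low-order interleaving spreads accesses over the modules, and how the stride interacts with the module count (m = 8 and word addressing are assumptions for the example):

#include <stdio.h>

int main(void) {
    int m = 8;                        /* number of interleaved memory modules */
    int strides[] = {1, 2, 8};

    for (int s = 0; s < 3; s++) {
        printf("stride %d hits modules:", strides[s]);
        for (int i = 0; i < 8; i++)
            printf(" %d", (i * strides[s]) % m);  /* module = word address mod m */
        printf("\n");
    }
    return 0;
}

With stride 1 all eight modules are visited, so up to m words can be in flight per memory cycle; with stride 2 only the even modules are used, and with stride 8 every access collides on module 0.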
Low-order interleaving
S-access (simultaneous-access) memory organization
C/S-access memory organization
• n buses, each with m memory modules attached.
• The n buses operate in parallel (S-access).
• The m modules on each bus are interleaved to allow C-access.
• This is a widely used memory access scheme in vector computers.
NEC SX vector supercomputer
Relative vector/scalar performance
• Amdahl's law, restated for vector processing:
  P = 1 / ((1 - f) + f / r)
  where f is the fraction of code executed in vector mode and r is the vector/scalar speed ratio.
• P gives the speedup of mixed vector/scalar execution over purely scalar processing.
• The hardware speed ratio r is the designer's choice; the vectorization ratio f depends on the program and the compiler.
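A small numeric check of the formula in C, with illustrative values of f and r, showing how the remaining scalar fraction limits the achievable speedup:

#include <stdio.h>

/* P = 1 / ((1 - f) + f / r): f = fraction of work run in vector mode,
   r = vector/scalar speed ratio. */
double vector_speedup(double f, double r) {
    return 1.0 / ((1.0 - f) + f / r);
}

int main(void) {
    printf("f = 0.5, r = 10 -> P = %.2f\n", vector_speedup(0.5, 10.0)); /* about 1.82 */
    printf("f = 0.9, r = 10 -> P = %.2f\n", vector_speedup(0.9, 10.0)); /* about 5.26 */
    return 0;
}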
Performance-directed design goals
• Architectural design goals:
  – Maintaining a good balance between vector and scalar performance
  – Supporting scalability
  – Increasing memory system capacity and performance
  – Providing high-performance I/O and easy access to networks
Balancing the vector/scalar ratio
• Scalar processing is an indispensable part of a general-purpose architecture.
• Vector balance point: the percentage of vector code needed for the vector and scalar hardware to be kept equally busy.
• Vector performance example: with 9 Mflops in vector mode and 1 Mflops in scalar mode, equal time is spent in each mode when the code is 90% vector, i.e., a vector balance point of 0.9.
• I/O and networking performance
  – As supercomputer speeds increase, problem sizes grow, and so does the required I/O bandwidth.
  – Cray systems support I/O transfer rates of up to 100 GBps.
• Memory demand
  – Both latency and bandwidth matter; an effective memory hierarchy is needed.
  – Memory sizes available on chip are increasing rapidly, but the relative speed mismatch between processor and memory remains.
• Scalability
  – Shared memory must be supported as the number of processors and memory ports grows.
  – Constraints: latency and communication overhead.
Table of comparison
Cray Y-MP 816 system organization
Cray C-90 and clusters
Cray MPP systems
• Off-the-shelf components alone are not suitable.
• A balance of speed between processor, memory, and I/O is required.
• Commodity RISC processors lack efficient memory operations such as synchronization and communication.
• All of this led to the introduction of the Cray MPP (massively parallel processing) systems.
• T3D
  – 150 MHz clock; partitions can be configured dynamically to emulate SIMD or MIMD operation.
  – Distributed memory.
  – Mach-based microkernel operating system.
  – Program debugging and performance tools.
Development phases
Fujitsu VP2000
Fujitsu VPP500
Mainframe computers
LINPACK results
Compound vector processing
• CVF (compound vector function): a composite function of vector operations, obtained by converting a looping structure of linked scalar operations.
• Example (scalar loop):

    Do I = 1, N
      Load  R1, X(I)
      Load  R2, Y(I)
      Mul   R1, S
      Add   R2, R1
      Store Y(I), R2
    Continue

• After vectorization:

    M(x : x+N-1) -> V1          (vector load of X)
    M(y : y+N-1) -> V2          (vector load of Y)
    S x V1 -> V1                (vector-scalar multiply)
    V2 + V1 -> V2               (vector add)
    V2 -> M(y : y+N-1)          (vector store of Y)

• The CVF being computed is Y(I) = S x X(I) + Y(I).
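The same CVF written as a plain C loop (names are illustrative); a vectorizing compiler can turn this loop into the five vector instructions above, removing the per-iteration loop overhead:

/* Y(i) = S * X(i) + Y(i) for i = 0 .. n-1 */
void saxpy(long n, double s, const double *x, double *y) {
    for (long i = 0; i < n; i++)
        y[i] = s * x[i] + y[i];
}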
Compound vector functions
• Vector loops and chaining
  – The loop count is determined at compile time or at run time.
• Strip mining: used when a vector is longer than the vector registers.
  – The vector is processed in register-length segments, and the vector registers are not allocated to any other operation until all segments of the current vector have been handled (see the strip-mining sketch after this list).
• Functional-unit independence
  – Vector registers act as interfaces between pipeline stages.
  – Vector registers and functional units must be reserved before a vector chain can be established.
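A strip-mining sketch in C, assuming a maximum vector (register) length of 64 elements; each pass of the outer loop corresponds to one register-length vector operation:

#define MVL 64   /* assumed maximum vector register length */

void saxpy_strip_mined(long n, double s, const double *x, double *y) {
    for (long start = 0; start < n; start += MVL) {
        long len = (n - start < MVL) ? (n - start) : MVL;  /* last strip may be short */
        for (long i = start; i < start + len; i++)         /* one vector op of length len */
            y[i] = s * x[i] + y[i];
    }
}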
Example
Timing diagram
Chaining limitations
• The degree of chaining is limited by:
  – the number of vector operations in the CVF,
  – the number of functional pipeline units available,
  – the number of interfaces between adjacent pipeline stages,
  – how many unary and binary operators the CVF contains, and
  – how many scalar and vector operands are involved.
• Vector recurrence
  – A functional pipeline feeds its output back to its own source register.
  – Example: a component counter.
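A simplified timing model, assuming two pipelines (multiply, then add), one result per clock after a fixed start-up latency, and illustrative latency values, to show what chaining buys:

#include <stdio.h>

/* Without chaining, the add pipeline waits for the full multiply result vector. */
long time_unchained(long n, long s_mul, long s_add) {
    return (s_mul + n) + (s_add + n);
}

/* With chaining, the add pipeline starts as soon as the first product emerges. */
long time_chained(long n, long s_mul, long s_add) {
    return s_mul + s_add + n;
}

int main(void) {
    long n = 64, s_mul = 7, s_add = 6;
    printf("unchained: %ld cycles\n", time_unchained(n, s_mul, s_add)); /* 141 */
    printf("chained:   %ld cycles\n", time_chained(n, s_mul, s_add));   /* 77  */
    return 0;
}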
What is Systolic Computing?
• A set of simple processing elements with local connections, which take external inputs and process them in a predetermined manner, in a pipelined fashion.
Host Station in Systolic Architecture
• As a result of the local-communication scheme, a systolic network is easily extended without adding any burden to the I/O.
• [Figure: systolic array — a row of control units driving processing units connected by a local interconnection network]
• Systolic arrays usually pipe data in from an outside host and pipe the results back to the host.
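A small C simulation of one classical 1-D systolic arrangement, a 3-tap FIR filter in which the weights stay in the PEs while input samples and partial sums march from PE to PE in lockstep (the samples move at half speed through two registers per PE); all values are illustrative:

#include <stdio.h>
#include <string.h>

#define NPE 3   /* number of processing elements = number of filter taps */

int main(void) {
    double w[NPE] = {0.25, 0.5, 0.25};               /* weight held by each PE    */
    double x[] = {1, 2, 3, 4, 5, 0, 0, 0, 0, 0};     /* input stream, zero-padded */
    int nbeats = 10;

    double xr1[NPE] = {0}, xr2[NPE] = {0}, yreg[NPE] = {0};  /* per-PE registers  */

    for (int t = 0; t < nbeats; t++) {               /* one "beat" per time step  */
        double xr1_old[NPE], xr2_old[NPE], yreg_old[NPE];
        memcpy(xr1_old, xr1, sizeof xr1);            /* all PEs fire in lockstep, */
        memcpy(xr2_old, xr2, sizeof xr2);            /* so update from old values */
        memcpy(yreg_old, yreg, sizeof yreg);

        for (int p = 0; p < NPE; p++) {
            double x_in = (p == 0) ? x[t] : xr2_old[p - 1];  /* from left neighbour */
            double y_in = (p == 0) ? 0.0  : yreg_old[p - 1];
            yreg[p] = y_in + w[p] * x_in;            /* local multiply-accumulate */
            xr2[p]  = xr1_old[p];                    /* x passes through two regs */
            xr1[p]  = x_in;
        }
        printf("beat %d: y out = %.2f\n", t, yreg[NPE - 1]);  /* rightmost PE to host */
    }
    return 0;
}

After the array fills, the rightmost PE emits y[k] = 0.25*x[k] + 0.5*x[k-1] + 0.25*x[k-2], one result per beat, using only nearest-neighbour communication.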
Multipipeline networking
• A pipeline net is constructed by interconnecting multiple functional pipelines through a BCN.
• This gives a two-level pipelining architecture: pipelining within each functional unit and pipelining across the network of units.
Program graph transformation
• Rule 1: k delays can be added at any node of the graph, with k delays subtracted from all of that node's incoming edges.
• Rule 2: all edge delays can be multiplied by a scaling constant.
• A graph in which every edge carries a nonzero delay is called a systolic program graph.
SIMD computer organization
• Distributed-memory model
  – Each processing element has its own local memory.
  – Scalar and array (vector) control units issue the instructions.
  – All processing elements are interconnected by a data-routing network.
  – Masking logic selects which PEs participate in an instruction.
Shared-memory model
• An alignment network connects the PEs to the shared memory modules; it must be set properly to avoid access conflicts.
• SIMD instructions
  – All instructions use vector operands of equal length n.
  – Data-routing functions move operands among the PEs (see the masking sketch after this section).
• Host and I/O
  – Control memory
  – Mass storage and graphics display of results
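A minimal C sketch of how masking logic conditions a lockstep SIMD operation, assuming n = 8 PEs and illustrative data; every PE sees the same add instruction, but only PEs whose mask bit is set update their result element:

#include <stdio.h>

#define N 8   /* all SIMD operands have the same length n */

void masked_add(const double *a, const double *b, double *c, const int *mask) {
    for (int pe = 0; pe < N; pe++)   /* conceptually, all PEs execute at once */
        if (mask[pe])
            c[pe] = a[pe] + b[pe];   /* disabled PEs leave c unchanged */
}

int main(void) {
    double a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    double b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    double c[N] = {0};
    int mask[N] = {1, 0, 1, 0, 1, 0, 1, 0};   /* enable every other PE */

    masked_add(a, b, c, mask);
    for (int i = 0; i < N; i++)
        printf("%.0f ", c[i]);                /* prints: 9 0 9 0 9 0 9 0 */
    printf("\n");
    return 0;
}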
CM-2 architecture
• Front-end computer
• Sequencer
• Modes of communication
  – Broadcasting
  – Global combining
  – Scalar memory bus
• Processing nodes
  – 32 bit-slice processors
  – Floating-point accelerator
  – Bit-slice ALUs
• Hypercube routers
• Applications
MasPar MP-1 architecture
• Array control unit (ACU)
  – A scalar RISC processor; it fetches and decodes the instructions.
  – Uses demand paging.
• PE array
  – 1024 PEs, organized as 64 PE clusters with 16 PEs per cluster.
  – A multistage crossbar interconnection network connects the clusters.
• Parallel disk arrays
