
Zareen 6

Uploaded by Jehangir Vakil
Copyright © All Rights Reserved

Chapter 2

Data Level Parallelism


Book 1 – Computer Architecture: A Quantitative Approach, Hennessy and Patterson,
5th Edition, Morgan Kaufmann, 2012
Chapter 4 - Data-Level Parallelism in Vector, SIMD, and GPU Architectures
Parallelism

Classes of parallelism in applications

• Data Level Parallelism
  o Many data items can be operated on at the same time
• Task Level Parallelism
  o Different tasks are created that can operate independently and largely in parallel
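A minimal Python sketch contrasting the two classes. The function names (`scale_all`, `word_count`, `checksum`, `run_tasks`) are hypothetical illustrations, not from the source. CPython threads share one interpreter lock, so the thread pool only shows the *structure* of task-level parallelism, not a real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_all(values, factor):
    # Data-level parallelism: the SAME operation is applied to every data
    # item, and no item depends on another, so all could be done at once.
    return [v * factor for v in values]


def word_count(text):
    # One independent task.
    return len(text.split())


def checksum(data):
    # A different, unrelated task.
    return sum(data) % 256


def run_tasks():
    # Task-level parallelism: DIFFERENT tasks that do not depend on each
    # other are submitted to run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        words = pool.submit(word_count, "data level parallelism")
        check = pool.submit(checksum, bytes(range(10)))
        return words.result(), check.result()


print(scale_all([1, 2, 3], 2))  # [2, 4, 6]
print(run_tasks())              # (3, 45)
```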
Flynn’s Taxonomy
SIMD
True SIMD
One CPU (control unit) + multiple ALUs (processing elements, PEs),
each with its own memory (the memory can also be shared)

Pipelined SIMD
One CPU (control unit) + a pipelined ALU
The ALU stages work in a pipelined manner, not independently
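An idealized cycle-count sketch of the two SIMD organizations. It assumes one cycle per PE step and per pipeline stage, and ignores memory and control overhead; the PE and stage counts in the usage lines are hypothetical.

```python
import math


def true_simd_steps(n, num_pes):
    # True SIMD: num_pes ALUs work in lockstep under one control unit;
    # in each step, every PE processes one element.
    return math.ceil(n / num_pes)


def pipelined_simd_cycles(n, stages):
    # Pipelined SIMD: one ALU split into `stages` stages. The first result
    # appears after `stages` cycles, then one result per cycle.
    if n == 0:
        return 0
    return stages + n - 1


print(true_simd_steps(64, 8))        # 8
print(pipelined_simd_cycles(64, 4))  # 67
```

True SIMD gains by replicating ALUs; pipelined SIMD gains by overlapping the stages of a single ALU, so its time grows with n but its startup cost is paid only once.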
Data Level Parallelism: Single Instruction Stream,
Multiple Data Streams (SIMD)
Three variants

• Vector Architectures

• Multimedia SIMD Extensions

• GPUs, APUs
Vector Processing
• Vector – a set of scalar data elements, all of the same type, stored in memory
• Vector Processor – an ensemble of hardware resources, including vector
registers, functional pipelines, processing elements, and register counters for
performing vector operations
• Vector Processing occurs when arithmetic and logical operations are applied to
vectors
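A vector can be modeled in Python with `array.array`, which stores elements of a single C type contiguously, close to the definition above. The helper `vector_op` is a hypothetical name; it applies one scalar operation to every element pair, mimicking what a single vector instruction does.

```python
import operator
from array import array


def vector_op(op, a, b):
    # Model of one vector instruction: apply a scalar operation to every
    # pair of corresponding elements.
    if a.typecode != b.typecode:
        raise TypeError("all vector elements must have the same type")
    if len(a) != len(b):
        raise ValueError("vector lengths differ")
    return array(a.typecode, map(op, a, b))


x = array('d', [1.0, 2.0, 3.0])      # 'd': every element is a C double
y = array('d', [10.0, 20.0, 30.0])
print(vector_op(operator.add, x, y))  # array('d', [11.0, 22.0, 33.0])
print(vector_op(operator.and_, array('q', [12, 10]), array('q', [10, 6])))
```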
Properties of Vector Processors

• Vector operations: arithmetic (add, sub, mul, div), memory accesses,
effective address calculations
• Multiple vector instructions can be in progress at the same time =>
more parallelism
• Applications that benefit
• Large scientific and engineering applications (simulations, weather
forecasting, applications involving large matrix operations)
• Multimedia applications (video codecs, image processing, audio processing)
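An idealized sketch of why having several vector instructions in flight adds parallelism. It assumes the instructions are independent, each runs in its own fully pipelined FU, and one instruction is issued per cycle; an instruction on n elements in a pipeline of depth d takes d + n - 1 cycles. The depths in the usage lines are hypothetical.

```python
def sequential_cycles(n, depths):
    # Each vector instruction waits for the previous one to finish.
    return sum(d + n - 1 for d in depths)


def overlapped_cycles(n, depths):
    # Independent instructions go to separate FUs and are issued one per
    # cycle, so instruction i starts at cycle i and they overlap.
    if not depths:
        return 0
    return max(i + d + n - 1 for i, d in enumerate(depths))


# Hypothetical pipeline depths: 6 (FP add FU) and 7 (FP multiply FU)
print(sequential_cycles(64, [6, 7]))  # 139
print(overlapped_cycles(64, [6, 7]))  # 71
```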
Basic Vector Architectures
• Vector processor: an ordinary pipelined scalar unit + a vector unit
• Types of vector processors
• Memory-memory processors: all vector operations are memory-to-memory (CDC)
• Vector-register processors: all vector operations except loads and
stores operate on vector registers (CRAY-1, CRAY-2, X-MP, Y-MP)
➢VMIPS – a vector processor built as an extension of the 5-stage MIPS processor
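A rough count of memory traffic for the two processor types, using a hypothetical computation D = (A + B) * C on n-element vectors. It counts one memory access per element and ignores caches and address calculation.

```python
def memory_memory_traffic(n):
    # Every operation reads its operands from memory and writes its result back.
    # T = A + B : 2n reads, n writes
    # D = T * C : 2n reads, n writes
    reads = 2 * n + 2 * n
    writes = n + n
    return reads, writes


def vector_register_traffic(n):
    # Only loads and stores touch memory.
    # load A, B, C : 3n reads; the add and multiply stay in vector registers
    # store D      : n writes (the temporary T never leaves the registers)
    reads = 3 * n
    writes = n
    return reads, writes


print(memory_memory_traffic(64))    # (256, 128)
print(vector_register_traffic(64))  # (192, 64)
```

The saving comes from intermediate results (T here) staying in vector registers instead of making a round trip through memory.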
Components of VMIPS Processor
• Vector registers: each vector register is a fixed-length bank holding a single
vector
➢Each vector register has at least 2 read ports and 1 write port
➢Typically 8-32 vector registers, each holding 64-128 elements of 64 bits
➢VMIPS: 8 vector registers, each holding 64 elements of 64 bits (16 read ports
and 8 write ports in total)

• Vector Functional Units (FUs): fully pipelined; each can start a new operation
every clock cycle
• Typically 4 to 8 FUs: FP add, FP multiply, FP reciprocal, integer add, logical, shift
• There may be multiple units of the same type
• VMIPS: 5 FUs (FP add/subtract, FP multiply, FP divide, integer, logical)
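A small arithmetic check of the register-file numbers above. The helper names are hypothetical, and the port count assumes each register simply contributes its own 2 read ports and 1 write port.

```python
def register_file_bytes(num_regs, elems_per_reg, bits_per_elem=64):
    # Total storage held in the vector register file.
    return num_regs * elems_per_reg * bits_per_elem // 8


def min_ports(num_regs, reads_per_reg=2, writes_per_reg=1):
    # Total ports if every register has its own read and write ports.
    return num_regs * reads_per_reg, num_regs * writes_per_reg


print(register_file_bytes(8, 64))    # VMIPS: 4096 bytes (4 KiB)
print(register_file_bytes(32, 128))  # upper end of the typical range: 32768 bytes
print(min_ports(8))                  # VMIPS: (16, 8)
```

With 8 registers at 2 read ports and 1 write port each, the totals match the 16 read and 8 write ports quoted for VMIPS.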
Components of VMIPS Processor

• Vector Load-Store Units (LSUs)
➢Fully pipelined
➢May have multiple LSUs
➢VMIPS: 1 vector LSU; bandwidth is 1 word per clock cycle after an initial latency
• Scalar registers
➢Hold a single element: an FP scalar or an address
➢VMIPS: 32 GPRs and 32 FPRs; scalar values are read out and latched at one
input of the vector FUs
• Crossbar connecting the FUs, LSUs, and registers
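Putting the components together, the DAXPY loop (Y = a × X + Y) maps onto VMIPS-style vector instructions: load the scalar a into F0, then LV, MULVS.D, LV, ADDVV.D, SV. The Python model below is a hypothetical functional sketch, not a simulator of the real machine: it tracks register and memory contents only, ignores timing, pipelining and the crossbar, and assumes a flat byte-addressed memory, hypothetical addresses and register choices, and a vector length of exactly 64.

```python
MVL = 64        # VMIPS: 64 elements per vector register
NUM_VREGS = 8   # VMIPS: vector registers V0..V7
WORD = 8        # bytes per 64-bit element


class ToyVMIPS:
    """Functional (no timing) model of the VMIPS state used by DAXPY."""

    def __init__(self, memory):
        self.mem = memory                                  # address -> float
        self.v = [[0.0] * MVL for _ in range(NUM_VREGS)]   # vector registers
        self.f = [0.0] * 32                                # FP scalar registers
        self.r = [0] * 32                                  # GPRs (addresses)

    def lv(self, vd, rs):
        # LV: the load-store unit streams MVL words from memory into Vd.
        base = self.r[rs]
        self.v[vd] = [self.mem[base + WORD * i] for i in range(MVL)]

    def sv(self, vs, rd):
        # SV: the load-store unit streams Vs back to memory at address Rd.
        base = self.r[rd]
        for i, value in enumerate(self.v[vs]):
            self.mem[base + WORD * i] = value

    def mulvs(self, vd, va, fs):
        # MULVS.D: the scalar in Fs is read once and latched at one input
        # of the FP multiply unit; every element of Va is multiplied by it.
        scalar = self.f[fs]
        self.v[vd] = [a * scalar for a in self.v[va]]

    def addvv(self, vd, va, vb):
        # ADDVV.D: element-wise add in the FP add unit.
        self.v[vd] = [a + b for a, b in zip(self.v[va], self.v[vb])]


def daxpy(a, x, y):
    # Y = a * X + Y for exactly MVL elements, at hypothetical addresses.
    assert len(x) == len(y) == MVL
    x_base, y_base = 0, WORD * MVL
    mem = {}
    for i in range(MVL):
        mem[x_base + WORD * i] = x[i]
        mem[y_base + WORD * i] = y[i]
    m = ToyVMIPS(mem)
    m.f[0] = a          # stands in for loading a into F0
    m.r[1] = x_base     # Rx (hypothetical register choice)
    m.r[2] = y_base     # Ry (hypothetical register choice)
    m.lv(1, 1)          # LV      V1, Rx
    m.mulvs(2, 1, 0)    # MULVS.D V2, V1, F0
    m.lv(3, 2)          # LV      V3, Ry
    m.addvv(4, 2, 3)    # ADDVV.D V4, V2, V3
    m.sv(4, 2)          # SV: store V4 back to the address in Ry
    return [mem[y_base + WORD * i] for i in range(MVL)]


result = daxpy(2.0, [float(i) for i in range(MVL)], [1.0] * MVL)
print(result[:4])  # [1.0, 3.0, 5.0, 7.0]
```

Five vector instructions (plus the scalar load) cover all 64 elements, whereas a scalar loop would repeat its load, multiply, add and store sequence once per element.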
