CS6461 - Computer Architecture Fall 2016 - Vector Operations

The document discusses vector processors and their advantages over conventional computers for scientific computing workloads. Vector processors can improve performance by including high-level vector instructions that can operate on entire arrays or "vectors" of data with a single instruction, as opposed to scalar instructions that operate on a single data element at a time. This takes advantage of the fact that scientific programs often perform the same computation across large collections of data. The document provides examples of vector instructions and how they work at a high level in a pipelined manner to achieve much higher throughput than scalar processors for suitable workloads.

Uploaded by

闫麟阁

CS6461 Computer Architecture

Fall 2016
Adapted from Professor Stephen H. Kaisler's Slides

Lecture 9 Vector Operations

(Partially based on notes from David Patterson, UC Berkeley)

Anyone can build a fast CPU. The trick is to build a fast computer.
- Seymour Cray -

Improving Performance

Many scientific programs compute on collections of like numbers, either integer or floating point (e.g., vectors).
Performance can be improved if we structure hardware to deal efficiently with such collections.
Vector processors have high-level operations that work on linear arrays of numbers (vectors).
Vector instructions access memory with a known pattern.
No data caches are required.
A single vector instruction implies a lot of work.

CSCI 6461 Computer Architecture 2

Conventional Computer

    Initialize I = 1
20  Read B(I)
    Read C(I)
    Store A(I) = B(I) + C(I)
    Increment I = I + 1
    If I <= 100 Go to 20

1. B(1) is fetched from memory.
2. C(1) is fetched from memory.
3. A scalar add instruction operates on B(1) and C(1).
4. A(1) is stored back to memory.
Steps (1) to (4) are repeated 100 times, once per element.

General Purpose Computer

General purpose computer: A(i) = B(i) * C(i) ; i = 1, ... , N

Cycle:                   1          2          3          4      5      6          ...  N*5
Separate mant. / exp.    B(1),C(1)                                      B(2),C(2)  ...
Multiply mantissa                   B(1),C(1)                                      ...
Add exponents                                  B(1),C(1)                           ...
Normalize result                                          A(1)                     ...
Put sign                                                         A(1)              ...  A(N)

Vector Computer

A(1:100) = B(1:100) + C(1:100)

Fetch vectors of values B(I) and C(I) from memory into vector registers
Use a vector integer add instruction to operate on B(I), C(I) pairs
A stream of A(I) values will be stored back to memory, one value every clock cycle

Vector Computer

Vector pipeline (5 sub units / segments): A = B * C

Cycle:                   1          2          3          4          5          6          ...  N+4
Separate mant. / exp.    B(1),C(1)  B(2),C(2)  B(3),C(3)  B(4),C(4)  B(5),C(5)  B(6),C(6)  ...
Multiply mantissa                   B(1),C(1)  B(2),C(2)  B(3),C(3)  B(4),C(4)  B(5),C(5)  ...
Add exponents                                  B(1),C(1)  B(2),C(2)  B(3),C(3)  B(4),C(4)  ...
Normalize result                                          A(1)       A(2)       A(3)       ...
Put sign                                                             A(1)       A(2)       ...  A(N)

Basic Ideas

Vector registers: Each vector register is a fixed-length bank holding a single vector.
Scalar registers: Usually the normal general-purpose registers and floating-point registers.
They can provide data as input to the vector functional units, as well as compute addresses.
Vector functional units: Fully pipelined; each can start a new operation on every clock cycle.
Vector load-store unit: Loads or stores a vector to or from memory.
Vector length control: A vector-register processor has a natural vector length determined by the number of elements in each vector register.

Two Types of Vector Processors

Vector-Register Processors:
All vector operations (except load and store) occur in the vector registers.
Vector counterpart of a load-store architecture
All major vector computers (Cray machines, NEC SX/2 through SX/5, Fujitsu VP200, etc.)
Memory-Memory Processors:
All vector operations are memory to memory.
CDC vector computers (Cyber 203, Cyber 205) and the TI ASC
All are obsolete!

Properties of Vector Processors

Vector instructions access memory with a known pattern
Highly interleaved memory
Amortize memory latency over multiple elements
No (data) caches required! (Do use an instruction cache)
A single vector instruction implies lots of work (roughly a whole loop)
=> fewer instruction fetches

[Figure: block diagram of a vector processor showing memory, I/O, the control unit (CU), the scalar unit (SU, a RISC processor), the mask unit and MASK registers, the vector registers, and the vector pipelines (LOAD, STORE, ADD, MULT, DIV)]

Basic Vector-Register Processor Architecture

[Figure: main memory, vector load-store unit, vector registers, scalar registers, and functional units (FP add/subtract, FP multiply, FP divide, integer, logical)]
8 64-element vector registers
5 functional units; each unit is fully pipelined and can start a new operation on every clock cycle
Load/store unit: fully pipelined
Scalar registers

What's in a Vector Processor

A scalar processor
Scalar register file
Scalar functional units (arithmetic, load/store, etc.)
A vector register file (a 2D register array)
Each register is an array of elements, e.g., 32 registers with 32 64-bit elements per register
MVL = maximum vector length = max # of elements per register
A set of pipelined vector functional units: integer, FP, load/store, etc.
Sometimes vector and scalar units are combined (share ALUs)
Three types of addressing
Unit stride
Contiguous block of information in memory
Fastest: always possible to optimize this
Non-unit (constant) stride
Harder to optimize memory system for all possible strides
A prime number of data banks makes it easier to support different strides at full bandwidth
Indexed (gather-scatter)
Vector equivalent of register indirect
Good for sparse arrays of data
Increases number of programs that vectorize
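
The three addressing types listed above can be sketched in C as plain loops. This is only an illustration: the function names, the use of element (not byte) offsets, and the `double` element type are assumptions made for the sketch, not part of any real vector ISA.

```c
#include <assert.h>
#include <stddef.h>

/* Unit stride: elements are contiguous in memory. */
void load_unit(double *dst, const double *mem, size_t base, size_t vl) {
    for (size_t i = 0; i < vl; i++) {
        dst[i] = mem[base + i];
    }
}

/* Non-unit (constant) stride: consecutive elements are `stride` apart. */
void load_strided(double *dst, const double *mem, size_t base,
                  size_t stride, size_t vl) {
    assert(stride > 0);
    for (size_t i = 0; i < vl; i++) {
        dst[i] = mem[base + i * stride];
    }
}

/* Indexed (gather): element i comes from offset index[i], the vector
 * counterpart of register-indirect addressing. */
void load_indexed(double *dst, const double *mem, size_t base,
                  const size_t *index, size_t vl) {
    for (size_t i = 0; i < vl; i++) {
        dst[i] = mem[base + index[i]];
    }
}
```

In hardware each of these loops would be a single vector load; only the address pattern seen by the memory system differs.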

How a Vector Pipeline Works

Consider the steps involved in a floating-point addition on a vector machine with IEEE arithmetic hardware:
The exponents of the two floating-point numbers to be added are compared to find the number with the smaller magnitude.
The significand of the number with the smaller magnitude is shifted so that the exponents of the two numbers agree.
The significands are added.
The result of the addition is normalized.
Checks are made to see if any floating-point exceptions occurred during the addition, such as overflow.
Rounding occurs.
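
The steps above can be sketched on a toy number format. This is not IEEE 754: the `toy_fp` type, the 24-bit significand, the `EXP_MAX` threshold, positive-only values, and round-half-up are all simplifying assumptions chosen to keep the stages visible. In a vector pipeline each commented step would be its own stage, working on a different element every cycle.

```c
#include <assert.h>
#include <stdint.h>

#define SIG_BITS 24   /* toy significand width (assumption) */
#define EXP_MAX  127  /* toy overflow threshold (assumption) */

/* Toy positive-only format: value = sig * 2^exp.
 * A nonzero value is normalized when bit 23 of sig is set. */
typedef struct {
    uint32_t sig;
    int exp;
} toy_fp;

/* Adds two normalized toy values; *overflow is set to 1 on exponent overflow. */
toy_fp toy_add(toy_fp a, toy_fp b, int *overflow) {
    assert(overflow != 0);
    *overflow = 0;
    if (a.sig == 0) return b;
    if (b.sig == 0) return a;

    /* Compare exponents; make `a` the operand with the larger exponent. */
    if (a.exp < b.exp) { toy_fp t = a; a = b; b = t; }

    /* Shift the smaller operand's significand so the exponents agree. */
    int shift = a.exp - b.exp;
    uint64_t small = (shift >= 32) ? 0 : (uint64_t)(b.sig >> shift);

    /* Add the significands. */
    uint64_t sum = (uint64_t)a.sig + small;
    int exp = a.exp;

    /* Normalize; each right shift rounds half up. */
    while (sum >= ((uint64_t)1 << SIG_BITS)) {
        sum = (sum + 1) >> 1;
        exp++;
    }

    /* Check for exceptions (only overflow in this sketch). */
    if (exp > EXP_MAX) *overflow = 1;

    toy_fp r = { (uint32_t)sum, exp };
    return r;
}
```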

Cray-1 Vector Computer

Cray Processors

From bottom left: Cray-1, Cray X-MP, Cray-2, Cray T916

Cray Research built aesthetically pleasing supercomputers. For over two decades they were the fastest machines on earth.

Vector Instructions

Instruction  Operands   Operation                Comment

VADD.VV      V1,V2,V3   V1=V2+V3                 vector + vector
VADD.SV      V1,R0,V2   V1=R0+V2                 scalar + vector
VMUL.VV      V1,V2,V3   V1=V2*V3                 vector x vector
VMUL.SV      V1,R0,V2   V1=R0*V2                 scalar x vector
VLD          V1,R1      V1=M[R1...R1+63]         load, stride=1
VLDS         V1,R1,R2   V1=M[R1...R1+63*R2]      load, stride=R2
VLDX         V1,R1,V2   V1=M[R1+V2i,i=0..63]     indexed ("gather")
VST          V1,R1      M[R1...R1+63]=V1         store, stride=1
VSTS         V1,R1,R2   M[R1...R1+63*R2]=V1      store, stride=R2
VSTX         V1,R1,V2   M[R1+V2i,i=0..63]=V1     indexed ("scatter")

SAXPY: A Common Equation

SAXPY: S = aX + Y
X, Y are vectors (of the same length); a is a scalar.
One of the most common vector operations found in all arithmetic systems.
All transformations in linear algebra can be expressed in this basic triad.

32 element SAXPY: scalar
      LD      F0, a
      ADDI    R4, Rx, #256
Loop: LD      F2, 0(Rx)
      MUL.D   F2, F0, F2
      LD      F4, 0(Ry)
      ADD.D   F4, F2, F4
      SD      F4, 0(Ry)
      ADDI    Rx, Rx, 8
      ADDI    Ry, Ry, 8
      SUB     R20, R4, Rx
      BNZ     R20, Loop

Now, 32 element SAXPY: vector
      LD        F0, a        #load a
      VLD       V1, Rx       #load X[0:31]
      VMULD.SV  V2, F0, V1   #vector mult
      VLD       V3, Ry       #load Y[0:31]
      VADDD.VV  V4, V2, V3   #vector add
      VST       V4, Ry       #store Y[0:31]
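
For comparison, here is a hedged C rendering of the two SAXPY listings. `saxpy_scalar` mirrors the scalar loop; `saxpy_vector32` models each vector instruction as one loop over all 32 elements. The function names, the `float` element type, and the temporary arrays standing in for vector registers are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define VLEN 32  /* elements per vector in the slide's example */

/* Scalar version: one multiply-add per loop iteration. */
void saxpy_scalar(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Vector-style version: each loop stands for one vector instruction. */
void saxpy_vector32(float a, const float *x, float *y) {
    float v1[VLEN], v2[VLEN], v3[VLEN], v4[VLEN];
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < VLEN; i++) v1[i] = x[i];           /* VLD      */
    for (size_t i = 0; i < VLEN; i++) v2[i] = a * v1[i];      /* VMULD.SV */
    for (size_t i = 0; i < VLEN; i++) v3[i] = y[i];           /* VLD      */
    for (size_t i = 0; i < VLEN; i++) v4[i] = v2[i] + v3[i];  /* VADDD.VV */
    for (size_t i = 0; i < VLEN; i++) y[i] = v4[i];           /* VST      */
}
```

The scalar loop executes its loop-control instructions 32 times, while the vector form issues six instructions in total, which is the instruction-fetch saving the slides describe.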
Terminology

Vector start-up time: a measure of the latency in starting up the vector pipeline.
It is the number of clock cycles required prior to the generation of the first result.

The start-up time adds considerable overhead for small values of N.

The effect of start-up time is negligible for large values of N.

To maintain an initiation rate of one word fetched or stored per clock, the memory must be able to meet this rate.
This is usually done by interleaving memory in banks.

Issues

What to do when the application vector length is not exactly the maximum vector length (MVL)?
A vector-length (VL) register controls the length of any vector operation, including a vector load or store
Set it before performing any vector operation
VADD.VV with VL=10 is equivalent to
    for (i=0; i<10; i++)
        V1[i] = V2[i]+V3[i];
VL can be anything from 0 to MVL

Issues

Problem: Vector registers have finite length
Solution: Break loops into pieces that fit in registers (strip mining)
The application vector length modulo MVL is usually not 0!
So, do the short piece first, then do the rest in pieces of length MVL
Example: Suppose MVL = 64 and the vector has 264 elements; 264 mod 64 = 8.
So, process one piece of length 8, then four pieces of length 64.
Problem: All computations have some scalar (non-vectorizable) components
Solution: Separate scalar from vector computations (by hand, but maybe automatically)
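
A minimal C sketch of strip mining for the 264-element example, assuming MVL = 64. `vadd_vl` stands in for a VADD.VV executed with the VL register set to `vl`; both function names and the returned strip count are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define MVL 64  /* maximum vector length, as in the example above */

/* Models one VADD.VV issued with the vector-length register set to vl. */
static void vadd_vl(double *v1, const double *v2, const double *v3, size_t vl) {
    assert(vl <= MVL);
    for (size_t i = 0; i < vl; i++) {
        v1[i] = v2[i] + v3[i];
    }
}

/* Computes a[0..n-1] = b + c in strips: the odd-sized piece (n mod MVL)
 * first, then full strips of MVL. Returns the number of strips issued. */
size_t stripmined_add(double *a, const double *b, const double *c, size_t n) {
    size_t strips = 0;
    size_t low = 0;
    size_t vl = n % MVL;  /* short piece first */
    for (size_t j = 0; j <= n / MVL; j++) {
        if (vl > 0) {
            vadd_vl(a + low, b + low, c + low, vl);
            strips++;
        }
        low += vl;
        vl = MVL;  /* every later strip uses the full length */
    }
    return strips;
}
```

For n = 264 this issues one strip of 8 followed by four strips of 64, so VL changes only once after the first strip.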
Ex: Vector Code

Note: Fast processing rates do not always translate directly into fast processing of loops.

Assessing Performance

Pipeline length p: number of stages (segments) in the pipeline; N: number of vector elements
One result per cycle (if pipe is full)
Speed-up:
Serial computation: N * p cycles
Vector computation: N + p - 1 cycles
Speed-up: S = (N * p) / (N + p - 1)
For N >> p, S ~ p
Problems:
N ~ p (start-up dominates, so the speed-up is small)
Recursive references cannot be pipelined this way, e.g., A(i) = A(i-1) + C(i)
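
The speed-up formula is easy to evaluate for concrete values. The sketch below uses the same symbols, N elements and p pipeline stages; the function names are illustrative.

```c
#include <assert.h>

/* Serial execution: every element passes through all p stages in turn. */
unsigned long serial_cycles(unsigned long n, unsigned long p) {
    return n * p;
}

/* Pipelined execution: p cycles for the first result, then one per cycle. */
unsigned long vector_cycles(unsigned long n, unsigned long p) {
    assert(n > 0 && p > 0);
    return n + p - 1;
}

/* S = (N * p) / (N + p - 1) */
double speedup(unsigned long n, unsigned long p) {
    return (double)serial_cycles(n, p) / (double)vector_cycles(n, p);
}
```

With p = 5, a vector of N = 100 takes 104 cycles instead of 500 (S is about 4.8), while N = 1 gives S = 1, which is why short vectors gain little.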

Characteristics of Vectorizable Code - I

Vectorization can only be done within a DO/FOR loop; it must be the innermost loop.
It is crucial to ensure that there are sufficient iterations in the DO loop to offset the start-up time overhead.
Put as much work as possible into a vectorizable statement to provide more opportunities for concurrent operations.
There is a limit to vectorization because a compiler may not vectorize the code if it is too complicated.
Exercise: How do you vectorize a WHILE loop?

Characteristics of Vectorizable Code - II

The existence of certain operations in the DO loop may prevent the compiler from converting all or part of the DO loop for vector processing:
Vectorization inhibitors include subroutine calls, recursion, references to external functions, and any input/output statements (which are actually system calls)
These types of vector inhibitors can be removed by:
expanding the function
in-lining subroutines at the point of reference.

Vector Code Example

Vector Processing Example:

/* Multiply a[m][k] * b[k][n] to get c[m][n] */
for (i = 0; i < m; i++)
{
    for (j = 0; j < n; j++)
    {
        sum = 0;
        for (t = 0; t < k; t++)
        {
            sum = sum + a[i][t] * b[t][j];  /* loop-carried dependency on sum */
        }
        c[i][j] = sum;
    }
}

Optimized Vector Code
/* Multiply a[m][k] * b[k][n] to get c[m][n] */
for (i = 0; i < m; i++)
{
    for (j = 0; j < n; j += 32)  /* Step j by 32 at a time. */
    {
        sum[0:31] = 0;  /* Initialize a vector register to zeros. */
        for (t = 0; t < k; t++)
        {
            a_scalar = a[i][t];
            b_vector[0:31] = b[t][j:j+31];
            /* Do a vector-scalar multiply. */
            prod[0:31] = b_vector[0:31] * a_scalar;
            /* Vector-vector add into results. */
            sum[0:31] += prod[0:31];
        }
        /* Unit-stride store of vector of results. */
        c[i][j:j+31] = sum[0:31];
    }
}

Note: It's actually better to interchange the i and j loops, so that you only change the vector length once during the whole matrix multiply.
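
The slice notation in the listing is pseudocode. One way to write the same idea in compilable C is sketched below, with the inner loops over `e` standing in for 32-wide vector instructions. The flat row-major arrays, the name `matmul_strips`, and the handling of a final short strip through `vl` are assumptions added for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MVL 32  /* strip width used in the listing above */

/* c[m][n] = a[m][k] * b[k][n]; all matrices are row-major flat arrays. */
void matmul_strips(size_t m, size_t k, size_t n,
                   const double *a, const double *b, double *c) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j += MVL) {
            /* The last strip may be shorter than MVL. */
            size_t vl = (n - j < MVL) ? (n - j) : MVL;
            double sum[MVL] = {0};  /* models a zeroed vector register */
            for (size_t t = 0; t < k; t++) {
                double a_scalar = a[i * k + t];
                /* Unit-stride load of b[t][j..j+vl-1], vector-scalar
                 * multiply, and vector-vector add into sum. */
                for (size_t e = 0; e < vl; e++) {
                    sum[e] += a_scalar * b[t * n + j + e];
                }
            }
            /* Unit-stride store of the result strip. */
            for (size_t e = 0; e < vl; e++) {
                c[i * n + j + e] = sum[e];
            }
        }
    }
}
```

Each element of `sum` accumulates independently, so the dependency on a single scalar `sum` in the original loop nest no longer blocks vector execution.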

Vector Stride

Suppose adjacent elements of the vector are not sequential in memory:

      do 10 i = 1,100
      do 10 j = 1,100
      A(i,j) = 0.0
      do 10 k = 1,100
10    A(i,j) = A(i,j)+B(i,k)*C(k,j)

Either B or C accesses are not adjacent (800 bytes between)
Stride: distance separating elements that are to be merged into a single vector (caches do unit stride)
=> LVWS (load vector with stride) instruction
Strides can cause bank conflicts (e.g., stride = 32 and 16 banks)
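
The bank-conflict example can be checked with a small calculation. The sketch assumes word-interleaved memory, where word address w lives in bank w mod (number of banks), and a stride measured in words. Under that assumption a strided stream touches banks / gcd(stride, banks) distinct banks. The function names are illustrative.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
static unsigned gcd_u(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Number of distinct banks a strided access stream cycles through,
 * assuming bank = word_address % banks (an assumption of this sketch). */
unsigned banks_touched(unsigned stride, unsigned banks) {
    assert(stride > 0 && banks > 0);
    return banks / gcd_u(stride, banks);
}
```

With stride 32 and 16 banks every access lands in the same bank, so the stream runs at single-bank speed. With a prime bank count such as 17, the same stride spreads over all 17 banks.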

Vector Chaining

Suppose:
MULV V1,V2,V3
ADDV V4,V1,V5
Chaining: the vector register (V1) is treated not as a single entity but as a group of individual registers; then pipeline forwarding can work on individual elements of a vector
Flexible chaining: allow a vector to chain to any other active vector operation => more read/write ports, e.g., pass the result from one vector operation to another vector operation
As long as there is enough hardware, chaining increases convoy size
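
The benefit of chaining the MULV/ADDV pair can be estimated with a simple timing model. The start-up latencies passed in are hypothetical parameters, not figures from these slides, and the model ignores memory and issue limits.

```c
#include <assert.h>

/* Unchained: ADDV cannot start until MULV has produced all n elements. */
unsigned unchained_cycles(unsigned n, unsigned mul_startup, unsigned add_startup) {
    assert(n > 0);
    return (mul_startup + n) + (add_startup + n);
}

/* Chained: each MULV result is forwarded to ADDV as soon as it appears,
 * so only the two start-up latencies are paid before results stream out. */
unsigned chained_cycles(unsigned n, unsigned mul_startup, unsigned add_startup) {
    assert(n > 0);
    return mul_startup + add_startup + n;
}
```

For example, with 64 elements and assumed start-up latencies of 7 cycles (multiply) and 6 cycles (add), the model gives 141 cycles unchained versus 77 chained.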
Vector Register Bypassing

Vector Conditional Execution

Two Approaches

Vectors w/ Sparse Matrices

Suppose:
      do 100 i = 1,n
100   A(K(i)) = A(K(i)) + C(M(i))

A gather (LVI) operation takes an index vector and fetches data from each address in the index vector
This produces a dense vector in the vector registers
After these elements are operated on in dense form, the sparse vector can be stored in expanded form by a scatter store (SVI), using the same index vector
A compiler cannot vectorize this on its own, since it cannot know that the elements of K are distinct (no dependencies)
Use CVI to create the index vector 0, 1xm, 2xm, ..., 63xm
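
The gather, dense add, and scatter sequence for the loop above can be sketched in C as follows. The sketch uses 0-based indices, assumes n fits in one vector register (MVL = 64), and assumes the entries of K are distinct, which is exactly the property the compiler cannot prove. The function name and local arrays are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define MVL 64  /* assumed vector register length */

/* A[K[i]] = A[K[i]] + C[M[i]] for i in 0..n-1, with distinct K entries. */
void sparse_update(double *A, const double *C,
                   const size_t *K, const size_t *M, size_t n) {
    double va[MVL], vc[MVL];
    assert(n <= MVL);
    for (size_t i = 0; i < n; i++) va[i] = A[K[i]];  /* gather (LVI) via K */
    for (size_t i = 0; i < n; i++) vc[i] = C[M[i]];  /* gather (LVI) via M */
    for (size_t i = 0; i < n; i++) va[i] += vc[i];   /* dense vector add   */
    for (size_t i = 0; i < n; i++) A[K[i]] = va[i];  /* scatter (SVI) via K */
}
```

If K contained a repeated index, the gathered copies would be updated separately and one update would be lost at the scatter, unlike in the sequential loop.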

Gather Example

Vector Issues

Pitfall: Concentrating on peak performance and ignoring start-up overhead:
N_v (vector length at which vector code becomes faster than scalar) > 100!

Pitfall: Increasing vector performance without comparable increases in scalar performance (Amdahl's Law)
Problems of Cray competitor (ETA)

Pitfall: Good processor vector performance without providing good memory bandwidth
MMX?
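
The Amdahl's Law pitfall can be made concrete with a short calculation. Here `vector_fraction` is the share of the original run time that vectorizes and `vector_speedup` is how much faster that share runs; both names and the sample values in the note below are illustrative assumptions.

```c
#include <assert.h>

/* Amdahl's Law: overall speed-up when only part of the work is accelerated. */
double amdahl_speedup(double vector_fraction, double vector_speedup) {
    assert(vector_fraction >= 0.0 && vector_fraction <= 1.0);
    assert(vector_speedup > 0.0);
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_speedup);
}
```

If 90% of the run time vectorizes, even an arbitrarily fast vector unit cannot give more than a 10x overall speed-up, because the scalar 10% remains. That is why scalar performance has to improve alongside vector performance.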

Some Previous Vector Processors

Vector Memory-Memory vs Register Machines

Vector memory-memory instructions hold all vector operands in main memory
The first vector machines, CDC Star-100 ('73) and TI ASC ('71), were memory-memory machines
Cray-1 ('76) was the first vector register machine
Vector Memory-Memory vs Register Machines

Vector memory-memory architectures (VMMAs) require greater main memory bandwidth. Why?
All operands must be read from and written back to memory
VMMAs make it difficult to overlap execution of multiple vector operations. Why?
Must check dependencies on memory addresses
VMMAs incur greater startup latency
Scalar code was faster on CDC Star-100 for vectors < 100 elements
For Cray-1, vector/scalar breakeven point was around 2 elements
Apart from CDC follow-ons (Cyber-205, ETA-10), all major vector machines since Cray-1 have had vector register architectures

The Cell Processor

Observed clock speed: > 4 GHz

Peak performance (single precision): > 256 GFlops

Peak performance (double precision): >26 GFlops
Local storage size per SPU: 256KB
Total number of transistors: 234M
The Cell Processor

Sony PlayStation 3
Partnership between Sony, Toshiba, IBM
PowerPC-based main core (PPE)
Multiple SPEs
On-die memory controller
Inter-core transport bus
High-speed I/O
Clocked at 3-4 GHz
256 GFLOPS single precision @ 4 GHz
Offloads a large amount of work onto compiler / software.

Cell Processor Die Layout

Power Processing Element (PPE)

PowerPC instruction set with AltiVec VMX instructions

Slow, but power-efficient
Used for general-purpose computing and controlling SPEs
Simultaneous Multithreading
Separate 32 KB L1 Caches for instructions and data
Unified 512 KB L2 Cache
Two issue in-order instruction fetch
Conspicuous lack of instruction window
PPEs and SPEs use different instruction sets.

Synergistic Processing Element (SPE)

SPEs are vector processors:
Not efficient for general-purpose computation.
Meant to be used in parallel (7 on PS3 implementation)
Instructions based on VMX
In-order execution w/ dual issue
Modified for 128 registers
Instructions assumed to be 4x 32 bits
128 registers (each 128 bits wide)
Vector logic
8 single precision operations per cycle
Significant performance hit for double precision

SPE Local Storage

On chip local storage (256KB)

NOT a cache
Completely private to each SPE
Directly addressable by software
Software controlled DMA to and from main memory
Request queue handles 16 simultaneous requests
Up to 16 KB transfer each
Priority: DMA, L/S, Fetch
Fetch / execute parallelism

SPE Control Logic/Pipeline

Little ILP, and thus little control logic => faster execution
No hardware branch prediction
Software branch prediction
Loop unrolling
18 cycle penalty
Simple commit unit: no reorder buffer or other complexities
Same execution unit for FP/int
Instruction scheduling is a HUGE problem
Done primarily in software
IBM predicted 80-90% usage ideally
Modern Vector Supercomputer
65nm CMOS technology
Vector unit (3.2 GHz)
8 foreground VRegs + 64 background VRegs (256x64-bit elements/VReg)
64-bit functional units: 2 multiply, 2 add, 1 divide/sqrt, 1 logical, 1 mask unit
8 lanes (32+ FLOPS/cycle, 100+ GFLOPS peak per CPU)
1 load or store unit (8 x 8-byte accesses/cycle)
Scalar unit (1.6 GHz)
4-way superscalar with out-of-order and speculative execution
64KB I-cache and 64KB data cache

Memory system provides 256GB/s DRAM bandwidth per CPU

Up to 16 CPUs and up to 1TB DRAM form a shared-memory node, with a total of 4TB/s bandwidth to shared DRAM memory
Up to 512 nodes connected via 128GB/s network links (message passing between nodes)
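
The headline figures on this slide follow from simple products, which the sketch below reproduces. The function names are illustrative, and the inputs are the slide's own numbers (32 FLOPS/cycle at 3.2 GHz; 16 CPUs at 256 GB/s each).

```c
#include <assert.h>

/* Peak GFLOPS = floating-point operations per cycle * clock rate in GHz. */
double peak_gflops(double flops_per_cycle, double clock_ghz) {
    assert(flops_per_cycle > 0.0 && clock_ghz > 0.0);
    return flops_per_cycle * clock_ghz;
}

/* Aggregate node bandwidth in GB/s = CPUs * per-CPU DRAM bandwidth. */
double node_bandwidth_gbs(unsigned cpus, double per_cpu_gbs) {
    assert(cpus > 0);
    return (double)cpus * per_cpu_gbs;
}
```

32 FLOPS/cycle at 3.2 GHz gives about 102.4 GFLOPS, matching "100+ GFLOPS peak per CPU", and 16 CPUs at 256 GB/s give 4096 GB/s, matching the quoted 4 TB/s per node.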
Vector Advantages

Easy to get high performance: the N operations
are independent
use same functional unit
access disjoint registers
access registers in same order as previous instructions
access contiguous memory words or known pattern
can exploit large memory bandwidth
hide memory latency (and any other latency)
Scalable: (get higher performance by adding HW resources)
Compact: Describe N operations with 1 short instruction
Predictable: performance vs. statistical performance (cache)
Multimedia ready: N * 64b, 2N * 32b, 4N * 16b, 8N * 8b
Mature, developed compiler technology

Vector Disadvantages

Vector Disadvantage: Out of Fashion?

Hard to say. Many irregular loop structures seem to still
be hard to vectorize automatically.
Not as fast with scalar instructions
Complexity of the multi-ported Vector Register File
Difficulties implementing precise exceptions
High price of on-chip vector memory systems
Increased code complexity

The Last (Vector) Samurais
