
Vector & Array Processing

This document provides an overview of vector processing and GPU basics, highlighting the differences between CPU and GPU architectures, and introducing CUDA programming. It explains vector processors, their architectures, and the parallel processing capabilities of GPUs. Additionally, it outlines the CUDA architecture and its components, emphasizing its role in enhancing computing performance through parallel execution.

Uploaded by Ashmy Shams
© All Rights Reserved

Vector Processing & GPU Basics

MODULE III
Contents/Syllabus
Vector processing and array processing

CPU v/s GPU

GPU Architecture

Introduction to GPU programming – CUDA

Memory Hierarchy Design


Vector Processor
A vector processor is a central processing unit that can execute an operation on a complete vector of input data with a single instruction.

It is a complete unit of hardware resources that processes a sequential set of similar data items in memory using a single instruction.
Architecture & Working
Vectorized Code
Scalar Processing v/s Vector Processing

Scalar processing (loop of 10 iterations):
1. Read the ith instruction and decode
2. Fetch the A[i] element
3. Fetch the B[i] element
4. Add A[i] + B[i]
5. Store result in C[i]
6. Increment i till i = 10

Vector processing (single instruction):
1. Read instruction and decode
2. Fetch all 10 elements of A[]
3. Fetch all 10 elements of B[]
4. Add A[] + B[]
5. Store result in C[]
Classification of Vector Processors

Vector processor architectures:
1. Register to Register Architecture
2. Memory to Memory Architecture
Register to Register Architecture

Widely used in vector computers.

The fetching of operands or previous results takes place indirectly through the main memory by the use of registers.

The several vector pipelines present in the vector computer help in retrieving data from the registers and storing results in the desired register.
Register to Register Architecture

These vector registers are programmable by user instructions.

According to the register address present in the instruction, the data is fetched and stored in the desired register.
Memory to Memory Architecture

The operands or results are fetched directly from memory instead of through registers.

The address of the desired data to be accessed must be present in the vector instruction.
Memory to Memory Architecture

This architecture enables fetching data of size 512 bits from memory to the pipeline.

Due to high memory access time, the pipelines of the vector computer require a higher startup time, as more time is needed to initiate the vector instruction.
Graphics Processing Unit
Highlights

What is a GPU?
What is the Difference between a CPU and a GPU?
Why should you use a GPU?
GPU - Introduction
The GPU accelerates applications running on the CPU by offloading some of the compute-intensive and time-consuming portions of the code.

The rest of the application still runs on the CPU.

This is known as "heterogeneous" or "hybrid" computing.

The GPU provides massive parallel processing power.
GPU - Introduction
A CPU consists of two to eight cores, while a GPU consists of hundreds of smaller cores.

Together, they operate to crunch through the data in the application.

This massively parallel architecture is what gives the GPU its high compute performance.
CPU V/S GPU

CPU                                       GPU
Central Processing Unit                   Graphics Processing Unit
Several cores                             Many cores
Low latency                               High throughput
Good for serial processing                Good for parallel processing
Can do a handful of operations at once    Can do thousands of operations at once


Best GPU Manufacturers
CUDA Architecture
CUDA (an acronym for Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) model created by Nvidia.

It allows software developers and software engineers to use a CUDA-enabled graphics processing unit (GPU) for general purpose processing.
CUDA Architecture
The CUDA platform is designed to work with programming languages such as C, C++, and Fortran.

This accessibility makes it easier for specialists in parallel programming to use GPU resources.
CUDA - GPU PROCESS

1. Copy data from main memory to GPU memory
2. CPU initiates the GPU compute kernel
3. GPU's CUDA cores execute the kernel in parallel
4. Copy the resulting data from GPU memory to main memory
CUDA ARCHITECTURE
The CUDA Architecture consists of several components, shown in the green boxes below:
1. Parallel compute engines inside NVIDIA GPUs
2. OS kernel-level support for hardware initialization, configuration, etc.
3. User-mode driver, which provides a device-level API for developers
4. PTX instruction set architecture (ISA) for parallel computing kernels and functions