
Vector & Array Processing

This document provides an overview of vector processing and GPU basics, highlighting the differences between CPU and GPU architectures, and introducing CUDA programming. It explains vector processors, their architectures, and the parallel processing capabilities of GPUs. Additionally, it outlines the CUDA architecture and its components, emphasizing its role in enhancing computing performance through parallel execution.

Uploaded by Ashmy Shams
© All Rights Reserved

Vector Processing & GPU Basics

MODULE III
Contents/Syllabus
Vector processing and array processing

CPU v/s GPU

GPU Architecture

Introduction to GPU programming – CUDA

Memory Hierarchy Design


Vector Processor
A vector processor is a central processing unit that can execute an operation on a complete vector of input data with a single instruction.

It is a complete unit of hardware resources that processes a sequential set of similar data items in memory using a single instruction.
Architecture & Working
Vectorized Code
Scalar Processing v/s Vector Processing

Scalar processing (loop of 10 iterations):
1. Read the ith instruction and decode
2. Fetch the A[i] element
3. Fetch the B[i] element
4. Add A[i] + B[i]
5. Store result in C[i]
6. Increment i till i = 10

Vector processing (single instruction):
1. Read instruction and decode
2. Fetch all 10 elements of A[]
3. Fetch all 10 elements of B[]
4. Add A[] + B[]
5. Store result in C[]
Classification of Vector Processors

Vector processor architectures:
1. Register to Register Architecture
2. Memory to Memory Architecture
Register to Register Architecture

Widely used in vector computers.

The fetching of operands or previous results takes place indirectly through the main memory by the use of registers.

The several vector pipelines present in the vector computer help in retrieving data from the registers and storing results in the desired register.
Register to Register Architecture

These vector registers are programmable by user instructions.

According to the register address present in the instruction, the data is fetched and stored in the desired register.
Memory to Memory Architecture

The operands or results are fetched directly from memory instead of through registers.

The address of the desired data to be accessed must be present in the vector instruction.
Memory to Memory Architecture

This architecture enables fetching data of size 512 bits from memory to the pipeline.

Due to high memory access time, the pipelines of the vector computer require a higher startup time, as more time is needed to initiate the vector instruction.
Graphics Processing Unit
Highlights

What is a GPU?
What is the Difference between a CPU and a GPU?
Why should you use a GPU?
GPU - Introduction
The GPU accelerates applications running on the CPU by offloading some of the compute-intensive and time-consuming portions of the code.

The rest of the application still runs on the CPU.

This is known as "heterogeneous" or "hybrid" computing.

The GPU provides massive parallel processing power.
GPU - Introduction
A CPU consists of two to eight cores, while a GPU consists of hundreds of smaller cores.

Together, they operate to crunch through the data in the application.

This massively parallel architecture is what gives the GPU its high compute performance.
CPU V/S GPU

CPU                                       GPU
Central Processing Unit                   Graphics Processing Unit
Several cores                             Many cores
Low latency                               High throughput
Good for serial processing                Good for parallel processing
Can do a handful of operations at once    Can do thousands of operations at once


Best GPU Manufacturers
CUDA Architecture
CUDA (an acronym for Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) model created by Nvidia.

It allows software developers and software engineers to use a CUDA-enabled graphics processing unit (GPU) for general purpose processing.
CUDA Architecture
The CUDA platform is designed to work with programming languages such as C, C++, and Fortran.

This accessibility makes it easier for specialists in parallel programming to use GPU resources.
CUDA - GPU PROCESS

1. Copy data from main memory to GPU memory
2. CPU initiates the GPU compute kernel
3. GPU's CUDA cores execute the kernel in parallel
4. Copy the resulting data from GPU memory to main memory
CUDA ARCHITECTURE
The CUDA Architecture consists of several components, shown in the green boxes below:
1. Parallel compute engines inside NVIDIA GPUs
2. OS kernel-level support for hardware initialization, configuration, etc.
3. User-mode driver, which provides a device-level API for developers
4. PTX instruction set architecture (ISA) for parallel computing kernels and functions