The document provides an overview of GPUs, detailing their evolution from graphics rendering to applications in AI, machine learning, and data analytics. It discusses the architecture of GPUs, including CUDA and Tensor cores, memory hierarchy, and the concept of embarrassingly parallel operations. Additionally, it addresses current challenges faced by GPUs, including high prices, power consumption, and competition among major manufacturers like NVIDIA, AMD, and Intel.


GPU

PRESENTED BY:
RAGASUDHA.B
PAVITHA.P
INTRODUCTION TO GPU

❑ The GPU has been one of the most talked-about topics of this decade, especially
after the rise of AI and blockchain technology.
❑ It was originally created to accelerate graphics rendering, reduce the
computational load on the CPU, and deliver smoother gaming performance.
❑ Now it is also used for:
 Scientific simulation
 Machine learning
 Data analytics
 Cryptographic mining
HISTORY TO NOW

 How many calculations do you think your graphics card performs every single
second?
 Maybe 100 million? Well, about 100 million calculations per second is what it
takes to run Mario 64 from 1996.
 Maybe 100 billion? Then you would have a computer that could run Minecraft
back in 2011.
 To run the most realistic video games, such as God of War, you need a graphics
card that can perform about 36 trillion calculations per second.
 This reflects the rise of the GPU over the years.

INTRODUCTION TO CORES

CUDA Cores
 Each one can be thought of as a simple calculator with addition and
multiplication buttons and a few others.
 They do most of the work when running video games.

Tensor Cores
 They perform matrix multiplication and addition calculations.
 They are used for geometric transformations and for working with neural
networks and AI.

(A sketch contrasting the two kinds of cores follows below.)
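To make the contrast concrete, here is a minimal CUDA sketch of our own (the kernel names, tile size, and use of the wmma API are illustrative assumptions, not from the slides): in the first kernel each thread, running on CUDA cores, performs one scalar multiply-add; in the second, a warp uses Tensor Cores through the wmma API to multiply and accumulate a whole 16x16 tile in one call.

```cuda
// Sketch only: per-thread scalar math (CUDA cores) vs. tile-level matrix
// math (Tensor Cores via wmma). Assumes compute capability >= 7.0.
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// CUDA cores: each thread performs one scalar multiply-add.
__global__ void scalar_fma(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] * b[i] + out[i];
}

// Tensor Cores: one warp (32 threads) multiplies a 16x16 half-precision tile
// and accumulates into a 16x16 float tile with a single mma_sync call.
__global__ void tile_mma(const half* A, const half* B, float* C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);      // start with C = 0
    wmma::load_matrix_sync(a_frag, A, 16);  // leading dimension 16
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C = A * B + C
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}
```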
Overview of GPU Memory Hierarchy

 The GPU memory hierarchy includes various levels of memory, listed below (a usage sketch follows the list):


• Register File: Fast, small storage for each thread.
• Shared Memory: Allows threads within the same block to share data
quickly.
• L1/L2 Cache: Caches that store frequently accessed data to reduce
latency.
• Global Memory: Large, slower memory accessible by all threads.
• Constant Memory: Read-only memory for storing constant values.
• Texture Memory: Specialized memory for texture mapping and filtering.
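As a rough illustration of how these levels work together (a sketch of our own; the kernel name, block size, and reduction pattern are assumptions, not from the slides), the kernel below keeps per-thread values in registers, stages a tile in shared memory, reads a broadcast value from constant memory, and reads and writes global memory.

```cuda
#include <cuda_runtime.h>

#define BLOCK 256

// Read-only value broadcast to all threads from constant memory.
// The host sets it once with cudaMemcpyToSymbol(scale_factor, ...).
__constant__ float scale_factor;

__global__ void scaled_block_sum(const float* in, float* out, int n) {
    // Shared memory: one tile per thread block, visible to all its threads.
    __shared__ float tile[BLOCK];

    // Registers: per-thread locals such as idx and x.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float x = (idx < n) ? in[idx] : 0.0f;   // global memory read

    tile[threadIdx.x] = x * scale_factor;   // constant memory read
    __syncthreads();                        // wait for the whole block

    // Simple tree reduction within the block using shared memory.
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];          // global memory write
}
```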
Concept of Embarrassingly Parallel Operations

 Now let's explore the computational architecture and see how applications like
video game graphics and Bitcoin mining run what are called "embarrassingly
parallel" operations.
 An embarrassingly parallel problem is one where the sub-problems are
completely independent of each other, meaning they can be solved
simultaneously without any need for communication or synchronization
between them. This makes them ideal for parallel computing since each task
runs separately without waiting for others.
 GPUs solve embarrassingly parallel problems using the SIMD principle (single
instruction, multiple data), where the same instruction is repeated across
thousands to millions of different objects.
EXAMPLE:

 Let's consider an example of how SIMD is used to create a 3D video game environment.
• In a gaming environment, consider a cowboy hat kept on a table. It is composed of 28,000
triangles built from 14,000 vertices, each with X, Y, and Z coordinates.
• These vertex coordinates are defined in a coordinate system called "model space", with the
origin (0,0,0) at the centre of the hat.
• To build a 3D world, we place hundreds of objects, each with its own model space, into the world
environment. To determine the position of each object relative to the others, we have to
transform all the vertices from each separate model space into the shared world coordinate
system.
• For this we use a single instruction that adds the position of the hat's origin in world space to the
X, Y, and Z coordinates of a single vertex in model space.
• Next we apply this instruction to multiple data: the XYZ coordinates of all the thousands of other
vertices that make up the hat.
• Finally, we repeat the same process for every other object in the world, as sketched below.
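A minimal CUDA sketch of this model-space-to-world-space step (our own illustration; the struct, kernel name, and the use of a plain translation rather than a full transformation matrix are simplifying assumptions): the single "add the object's world-space origin" instruction is executed by thousands of threads, one vertex per thread.

```cuda
#include <cuda_runtime.h>

struct Vec3 { float x, y, z; };

// Single instruction, multiple data: every thread runs the same addition,
// each on a different vertex of the model (e.g., the 14,000 hat vertices).
__global__ void model_to_world(const Vec3* model, Vec3* world,
                               Vec3 origin, int num_vertices) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < num_vertices) {
        world[i].x = model[i].x + origin.x;
        world[i].y = model[i].y + origin.y;
        world[i].z = model[i].z + origin.z;
    }
}

// Host-side launch for one object; repeat with a different origin per object:
// model_to_world<<<(14000 + 255) / 256, 256>>>(d_model, d_world, hat_origin, 14000);
```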
CASE STUDY

 While studying the architecture of the GPU, we found one interesting question about NVIDIA's
different GPU versions, and we found the reason behind it. This is what we are going to discuss in our presentation.
➢ Our question arose from the fact that NVIDIA uses the same GA102 chip design to manufacture
several different GPU models.
➢ During the manufacturing process, defects sometimes arise from:
1. Patterning errors
2. Dust particles
3. Other manufacturing issues
These situations create defective areas of the circuit. Instead of throwing out the
entire chip because of a small defect, engineers find the defective region and permanently isolate and
deactivate the nearby circuitry.
Because the GPU has a highly repetitive design, a small defect in one core only disables that particular
streaming multiprocessor circuit and does not affect the other areas of the chip.
As a result, these chips are tested and categorized, or "binned", according to the number of defects.
ADVANCED TECHNOLOGY

SIMD
 Definition: SIMD allows a single instruction to be executed on multiple data
points simultaneously.
 Usage: Commonly used in CPUs and vector processors for tasks like multimedia
processing, scientific computing, and data parallelism.
 Example: Adding two arrays element-wise in a single instruction.

SIMT
 Definition: SIMT extends SIMD by allowing multiple threads to execute a single
instruction on multiple data points.
 Usage: Primarily used in GPUs, where each thread executes the same instruction
on different data; ideal for parallel computing tasks like rendering, simulations,
and machine learning.
 Example: Processing multiple pixels in parallel when rendering an image, with
each thread handling a different pixel (see the sketch below).
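For the SIMT example, here is a small sketch of our own (the kernel name, image layout, and brightness-scaling operation are assumed for illustration): every thread executes the same instruction stream, but each computes its own pixel coordinates from its block and thread indices.

```cuda
#include <cuda_runtime.h>

// SIMT: the same instruction stream is executed by many threads,
// and each thread picks a different pixel via its 2D indices.
__global__ void scale_brightness(const unsigned char* in, unsigned char* out,
                                 int width, int height, float gain) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        int idx = y * width + x;
        float v = in[idx] * gain;
        out[idx] = v > 255.0f ? 255 : (unsigned char)v;  // clamp to 8 bits
    }
}

// Launch with a 2D grid so the threads cover the whole image:
// dim3 block(16, 16);
// dim3 grid((width + 15) / 16, (height + 15) / 16);
// scale_brightness<<<grid, block>>>(d_in, d_out, width, height, 1.2f);
```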
CURRENT PROBLEMS

 GPUs (Graphics Processing Units) have become essential for gaming, AI, data science,
and other computational tasks, but they also face several challenges. Here are some of
the current problems with GPUs:
 1. High Prices & Supply Chain Issues
• GPUs remain expensive due to high demand in gaming, AI, and crypto mining.
• Supply chain disruptions and semiconductor shortages have worsened the situation.
• Scalpers and limited production capacity make new GPUs hard to obtain at reasonable
prices.
 2. Power Consumption & Heat Generation
• High-end GPUs consume significant power, sometimes exceeding 400W (e.g., RTX 4090).
• This leads to heat issues, requiring better cooling solutions, which increases costs.
• Energy efficiency is a major concern, especially for data centers and AI workloads.
 3. Driver & Software Optimization
• New GPU releases often suffer from unoptimized drivers, leading to crashes or performance drops.
• Compatibility issues arise with certain games, applications, or operating systems (especially Linux).
• Support for older hardware is often poor as companies focus on newer GPUs.
 4. Limited VRAM in Some Models
• Some mid-range GPUs (like the RTX 4060 Ti 8GB) lack sufficient VRAM for modern gaming and AI tasks.
• VRAM bottlenecks affect performance in high-resolution gaming and video editing.
 5. Scalability & Bottlenecks in AI Workloads
• Training large AI models requires multiple GPUs, but communication between them (NVLink, PCIe)
can be a bottleneck.
• Memory bandwidth and cache sizes sometimes limit performance for complex computations.
 6. Environmental Impact
• High energy consumption contributes to carbon emissions.
• Manufacturing GPUs requires rare earth materials, leading to environmental concerns.
 7. Software & Ecosystem Fragmentation
• Different GPU architectures (CUDA for NVIDIA, ROCm for AMD, SYCL for Intel) create compatibility
issues.
• AI and deep learning frameworks favor NVIDIA's CUDA, limiting AMD and Intel GPU adoption.
NVIDIA VS AMD VS INTEL

 1. NVIDIA vs. AMD vs. Intel - Performance & Market Competition


• NVIDIA leads in AI and gaming performance, thanks to CUDA, DLSS (Deep
Learning Super Sampling), and superior ray tracing.
• AMD offers better price-to-performance but struggles in ray tracing and AI
features.
• Intel Arc GPUs are new but have driver issues and limited software support
compared to NVIDIA and AMD.
THANK YOU
