The document provides an overview of GPUs, detailing their evolution from graphics rendering to applications in AI, machine learning, and data analytics. It discusses the architecture of GPUs, including CUDA and Tensor cores, memory hierarchy, and the concept of embarrassingly parallel operations. Additionally, it addresses current challenges faced by GPUs, including high prices, power consumption, and competition among major manufacturers like NVIDIA, AMD, and Intel.


GPU

PRESENTED BY:
RAGASUDHA.B
PAVITHA.P
INTRODUCTION TO GPU

❑ The GPU has been one of the most talked-about topics of this decade, especially
after the rise of AI and blockchain technology.
❑ It was originally created to accelerate graphics rendering, reduce the
computational load on the CPU, and deliver smoother gaming performance.
❑ Now it is also used for:
 Scientific simulation
 Machine learning
 Data analytics
 Cryptographic mining
HISTORY TO NOW

 How many calculations do you think your graphics card performs every single
second?
 Maybe 100 million? Well, about 100 million calculations per second is what it
takes to run Mario 64 from 1996.
 Maybe 100 billion? Then you would have a computer that could run Minecraft
back in 2011.
 To run the most realistic video games, such as God of War, you need a graphics
card that can perform about 36 trillion calculations per second.
 This reflects the rise of the GPU over the years.

INTRODUCTION TO CORES

CUDA Cores
 Each one can be thought of as a simple calculator with addition and
multiplication buttons and a few others.
 They do most of the work when running video games.

Tensor Cores
 They perform matrix multiplication and addition calculations.
 They are used for geometric transformations and for working with neural
networks and AI.

(A sketch contrasting the two kinds of cores follows below.)
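To make the contrast concrete, here is a minimal CUDA sketch of our own (the kernel names, tile size, and use of the wmma API are illustrative assumptions, not from the slides): in the first kernel each thread, running on CUDA cores, performs one scalar multiply-add; in the second, a warp uses Tensor Cores through the wmma API to multiply and accumulate a whole 16x16 tile in one call.

```cuda
// Sketch only: per-thread scalar math (CUDA cores) vs. tile-level matrix
// math (Tensor Cores via wmma). Assumes compute capability >= 7.0.
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// CUDA cores: each thread performs one scalar multiply-add.
__global__ void scalar_fma(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] * b[i] + out[i];
}

// Tensor Cores: one warp (32 threads) multiplies a 16x16 half-precision tile
// and accumulates into a 16x16 float tile with a single mma_sync call.
__global__ void tile_mma(const half* A, const half* B, float* C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);      // start with C = 0
    wmma::load_matrix_sync(a_frag, A, 16);  // leading dimension 16
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C = A * B + C
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}
```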
Overview of GPU Memory Hierarchy

 The GPU memory hierarchy includes various levels of memory, listed below (a usage sketch follows the list):


• Register File: Fast, small storage for each thread.
• Shared Memory: Allows threads within the same block to share data
quickly.
• L1/L2 Cache: Caches that store frequently accessed data to reduce
latency.
• Global Memory: Large, slower memory accessible by all threads.
• Constant Memory: Read-only memory for storing constant values.
• Texture Memory: Specialized memory for texture mapping and filtering.
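As a rough illustration of how these levels work together (a sketch of our own; the kernel name, block size, and reduction pattern are assumptions, not from the slides), the kernel below keeps per-thread values in registers, stages a tile in shared memory, reads a broadcast value from constant memory, and reads and writes global memory.

```cuda
#include <cuda_runtime.h>

#define BLOCK 256

// Read-only value broadcast to all threads from constant memory.
// The host sets it once with cudaMemcpyToSymbol(scale_factor, ...).
__constant__ float scale_factor;

__global__ void scaled_block_sum(const float* in, float* out, int n) {
    // Shared memory: one tile per thread block, visible to all its threads.
    __shared__ float tile[BLOCK];

    // Registers: per-thread locals such as idx and x.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float x = (idx < n) ? in[idx] : 0.0f;   // global memory read

    tile[threadIdx.x] = x * scale_factor;   // constant memory read
    __syncthreads();                        // wait for the whole block

    // Simple tree reduction within the block using shared memory.
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];          // global memory write
}
```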
Concept of Embarrassingly Parallel Operations

 Now let's explore the computational architecture and see how applications like
video game graphics and Bitcoin mining run what are called "embarrassingly
parallel" operations.
 An embarrassingly parallel problem is one where the sub-problems are
completely independent of each other, meaning they can be solved
simultaneously without any need for communication or synchronization
between them. This makes them ideal for parallel computing since each task
runs separately without waiting for others.
 GPUs solve embarrassingly parallel problems using the SIMD principle (single
instruction, multiple data), where the same instruction is repeated across
thousands to millions of different objects.
EXAMPLE:

 Let's consider an example of how SIMD is used to create a 3D video game environment.
• In a gaming environment, consider a cowboy hat kept on a table. It is composed of 28,000
triangles built from 14,000 vertices, each with X, Y, and Z coordinates.
• These vertex coordinates are defined in a coordinate system called "model space", with the
origin (0,0,0) at the centre of the hat.
• To build a 3D world, we place hundreds of objects, each with its own model space, into the world
environment. To determine the position of each object relative to the others, we have to
transform all the vertices from each separate model space into the shared world coordinate
system.
• For this we use a single instruction that adds the position of the hat's origin in world space to the
X, Y, and Z coordinates of a single vertex in model space.
• Next we apply this instruction to multiple data: the XYZ coordinates of all the thousands of other
vertices that make up the hat.
• Finally, we repeat the same process for every other object in the world, as sketched below.
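A minimal CUDA sketch of this model-space-to-world-space step (our own illustration; the struct, kernel name, and the use of a plain translation rather than a full transformation matrix are simplifying assumptions): the single "add the object's world-space origin" instruction is executed by thousands of threads, one vertex per thread.

```cuda
#include <cuda_runtime.h>

struct Vec3 { float x, y, z; };

// Single instruction, multiple data: every thread runs the same addition,
// each on a different vertex of the model (e.g., the 14,000 hat vertices).
__global__ void model_to_world(const Vec3* model, Vec3* world,
                               Vec3 origin, int num_vertices) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < num_vertices) {
        world[i].x = model[i].x + origin.x;
        world[i].y = model[i].y + origin.y;
        world[i].z = model[i].z + origin.z;
    }
}

// Host-side launch for one object; repeat with a different origin per object:
// model_to_world<<<(14000 + 255) / 256, 256>>>(d_model, d_world, hat_origin, 14000);
```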
CASE STUDY

 While studying the architecture of the GPU, we found one interesting question about NVIDIA's
different GPU versions, and we found the reason behind it. This is what we are going to discuss in our presentation.
➢ Our question arose from the fact that NVIDIA uses the same GA102 chip design to manufacture
several different GPU models.
➢ During the manufacturing process, defects sometimes arise from:
1. Patterning errors
2. Dust particles
3. Other manufacturing issues
These situations create defective areas of the circuit. Instead of throwing out the
entire chip because of a small defect, engineers find the defective region and permanently isolate and
deactivate the nearby circuitry.
Because the GPU has a highly repetitive design, a small defect in one core only disables that particular
streaming multiprocessor circuit and does not affect the other areas of the chip.
As a result, these chips are tested and categorized, or "binned", according to the number of defects.
ADVANCED TECHNOLOGY

SIMD
 Definition: SIMD allows a single instruction to be executed on multiple data
points simultaneously.
 Usage: Commonly used in CPUs and vector processors for tasks like multimedia
processing, scientific computing, and data parallelism.
 Example: Adding two arrays element-wise in a single instruction.

SIMT
 Definition: SIMT extends SIMD by allowing multiple threads to execute a single
instruction on multiple data points.
 Usage: Primarily used in GPUs, where each thread executes the same instruction
on different data; ideal for parallel computing tasks like rendering, simulations,
and machine learning.
 Example: Processing multiple pixels in parallel when rendering an image, with
each thread handling a different pixel (see the sketch below).
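For the SIMT example, here is a small sketch of our own (the kernel name, image layout, and brightness-scaling operation are assumed for illustration): every thread executes the same instruction stream, but each computes its own pixel coordinates from its block and thread indices.

```cuda
#include <cuda_runtime.h>

// SIMT: the same instruction stream is executed by many threads,
// and each thread picks a different pixel via its 2D indices.
__global__ void scale_brightness(const unsigned char* in, unsigned char* out,
                                 int width, int height, float gain) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        int idx = y * width + x;
        float v = in[idx] * gain;
        out[idx] = v > 255.0f ? 255 : (unsigned char)v;  // clamp to 8 bits
    }
}

// Launch with a 2D grid so the threads cover the whole image:
// dim3 block(16, 16);
// dim3 grid((width + 15) / 16, (height + 15) / 16);
// scale_brightness<<<grid, block>>>(d_in, d_out, width, height, 1.2f);
```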
CURRENT PROBLEMS

 GPUs (Graphics Processing Units) have become essential for gaming, AI, data science,
and other computational tasks, but they also face several challenges. Here are some of
the current problems with GPUs:
 1. High Prices & Supply Chain Issues
• GPUs remain expensive due to high demand in gaming, AI, and crypto mining.
• Supply chain disruptions and semiconductor shortages have worsened the situation.
• Scalpers and limited production capacity make new GPUs hard to obtain at reasonable
prices.
 2. Power Consumption & Heat Generation
• High-end GPUs consume significant power, sometimes exceeding 400W (e.g., RTX 4090).
• This leads to heat issues, requiring better cooling solutions, which increases costs.
• Energy efficiency is a major concern, especially for data centers and AI workloads.
 3. Driver & Software Optimization
• New GPU releases often suffer from unoptimized drivers, leading to crashes or performance drops.
• Compatibility issues arise with certain games, applications, or operating systems (especially Linux).
• Support for older hardware is often poor as companies focus on newer GPUs.
 4. Limited VRAM in Some Models
• Some mid-range GPUs (like the RTX 4060 Ti 8GB) lack sufficient VRAM for modern gaming and AI tasks.
• VRAM bottlenecks affect performance in high-resolution gaming and video editing.
 5. Scalability & Bottlenecks in AI Workloads
• Training large AI models requires multiple GPUs, but communication between them (NVLink, PCIe)
can be a bottleneck.
• Memory bandwidth and cache sizes sometimes limit performance for complex computations.
 6. Environmental Impact
• High energy consumption contributes to carbon emissions.
• Manufacturing GPUs requires rare earth materials, leading to environmental concerns.
 7. Software & Ecosystem Fragmentation
• Different GPU architectures (CUDA for NVIDIA, ROCm for AMD, SYCL for Intel) create compatibility
issues.
• AI and deep learning frameworks favor NVIDIA's CUDA, limiting AMD and Intel GPU adoption.
NVIDIA VS AMD VS INTEL

 1. NVIDIA vs. AMD vs. Intel - Performance & Market Competition


• NVIDIA leads in AI and gaming performance, thanks to CUDA, DLSS (Deep
Learning Super Sampling), and superior ray tracing.
• AMD offers better price-to-performance but struggles in ray tracing and AI
features.
• Intel Arc GPUs are new but have driver issues and limited software support
compared to NVIDIA and AMD.
THANK YOU
