Graphics Processing Unit: Shashwat Shriparv Infinitysoft

The document provides an overview of graphics processing units (GPUs). It defines a GPU as a dedicated processor for computer graphics that is specialized for parallel processing. The document compares GPUs to CPUs, noting GPUs have many parallel execution units while CPUs operate serially. It outlines the typical architecture of a GPU, including multiple processing units with hundreds of ALUs and fast dedicated memory. The document also describes how GPUs interact with CPUs via a command buffer and discusses synchronization issues that can arise.

Uploaded by: shashwat2010
Copyright: © Attribution Non-Commercial (BY-NC)

GRAPHICS PROCESSING UNIT

Shashwat Shriparv
[email protected]
InfinitySoft
12/07/2021 1
Presentation Overview
Definition
Comparison with CPU
Architecture
GPU-CPU Interaction
GPU Memory

Why GPU?
• To provide separate, dedicated graphics resources, including a graphics processor and memory.
• To relieve some of the burden on the main system resources (the central processing unit, main memory, and the system bus), which would otherwise be saturated with graphics operations and I/O requests.

Here comes the GPU

What is a GPU?
• A Graphics Processing Unit or GPU (also occasionally called a Visual Processing Unit or VPU) is a dedicated processor that is efficient at manipulating and displaying computer graphics.
• Like the CPU (Central Processing Unit), it is a single-chip processor.

However, the abstract goal of a GPU is to enable a representation of a 3D world that is as realistic as possible. GPUs are therefore designed to provide additional computational power that is customized specifically for these 3D tasks.

GPU vs CPU
• A GPU is tailored for highly parallel operation, while a CPU is designed to execute programs serially.
• For this reason, GPUs have many parallel execution units, while CPUs have few.
• GPUs have significantly faster and more advanced memory interfaces, as they need to move much more data than CPUs.
• GPUs have much deeper pipelines (several thousand stages vs 10-20 for CPUs).

BRIEF HISTORY
• First-Generation GPUs
– Up to 1998; Nvidia's TNT2, ATi's Rage, and 3dfx's Voodoo3; DX6 feature set.
• Second-Generation GPUs
– 1999-2000; Nvidia's GeForce256 and GeForce2, ATi's Radeon7500, and S3's Savage3D; T&L; OpenGL and DX7; configurable.
• Third-Generation GPUs
– 2001; GeForce3/4Ti, Radeon8500, MS's Xbox; OpenGL ARB, DX7/8; vertex programmability + ASM.
• Fourth-Generation GPUs
– 2002 onwards; GeForce FX family, Radeon 9700; OpenGL + extensions, DX9; vertex/pixel programmability + HLSL; 0.13μ process, 125M T/C, 200M T/S.
• Fifth-Generation GPUs
– GeForce 8X; DirectX10.

GPU Architecture
• How many processing units?
– Lots.
• How many ALUs?
– Hundreds.
• Do you need a cache?
– Sort of.
• What kind of memory?
– Very fast.

The difference
[Figure: side-by-side comparison, without GPU vs. with GPU]


The GPU pipeline
• The GPU receives geometry information from the CPU as input and provides a picture as output.
• Let's see how that happens…

host interface → vertex processing → triangle setup → pixel processing → memory interface

Details…

Host Interface
• The host interface is the communication bridge between the CPU and the GPU.
• It receives commands from the CPU and also pulls geometry information from system memory.
• It outputs a stream of vertices in object space with all their associated information (texture coordinates, per-vertex color, etc.).

Vertex Processing
• The vertex processing stage receives vertices from the host interface in object space and outputs them in screen space.
• This may be a simple linear transformation or a complex operation involving morphing effects.
• No new vertices are created in this stage, and no vertices are discarded (input and output have a 1:1 mapping).
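
The simple linear case can be sketched in a few lines of Python. This is only an illustration, not how any particular GPU implements the stage: the names `mat_vec`, `to_screen`, and `process_vertices` are hypothetical, the combined transform is assumed to be a single 4x4 matrix, and the screen origin is assumed to be the top-left corner.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def to_screen(vertex, mvp, width, height):
    """Transform one object-space vertex (x, y, z) to screen space."""
    x, y, z, w = mat_vec(mvp, [vertex[0], vertex[1], vertex[2], 1.0])
    # Perspective divide: clip space -> normalized device coordinates in [-1, 1].
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    # Viewport transform: NDC -> pixel coordinates (assumed: y grows downward).
    sx = (ndc_x + 1.0) * 0.5 * width
    sy = (1.0 - ndc_y) * 0.5 * height
    return (sx, sy, ndc_z)


def process_vertices(vertices, mvp, width, height):
    """Apply the same transform to every vertex: one output per input."""
    return [to_screen(v, mvp, width, height) for v in vertices]


# Hypothetical input: an identity transform and a single triangle.
IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
triangle = [(-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (0.0, 1.0, 0.0)]
screen = process_vertices(triangle, IDENTITY, 640, 480)
```

Because each vertex is transformed independently of the others, the list comprehension in `process_vertices` is exactly the kind of loop a GPU can spread across many execution units.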

Triangle Setup
• In this stage geometry information becomes raster information (screen-space geometry is the input, pixels are the output).
• Prior to rasterization, triangles that are backfacing or located outside the viewing frustum are rejected.
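
A rough Python sketch of the rejection tests follows. It makes several assumptions that are not in the slides: front faces are counter-clockwise with y growing upward, the frustum test is reduced to a 2D check against the screen rectangle (real hardware also clips against near and far planes), and all function names are hypothetical.

```python
def signed_area(a, b, c):
    """Twice the signed area of the screen-space triangle a, b, c."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def is_backfacing(a, b, c):
    # Assumed convention: counter-clockwise triangles (positive area) face the viewer.
    return signed_area(a, b, c) <= 0


def outside_viewport(tri, width, height):
    """True when all three vertices lie beyond the same edge of the screen."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return max(xs) < 0 or min(xs) > width or max(ys) < 0 or min(ys) > height


def keep_triangle(tri, width, height):
    """A triangle goes on to rasterization only if it passes both tests."""
    return not is_backfacing(*tri) and not outside_viewport(tri, width, height)
```

Rejecting triangles here is cheap compared with rasterizing them, which is why the tests run before any pixels are generated.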

Triangle Setup (cont.)
• A pixel is generated if and only if its center is inside the triangle.
• Every generated pixel has its attributes computed as the perspective-correct interpolation of the attributes of the three vertices that make up the triangle.
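
Both rules can be illustrated with a brute-force rasterizer in Python. This is a sketch under stated assumptions, not a description of real hardware: each vertex is assumed to be a tuple `(x, y, w, attribute)` carrying one scalar attribute, pixels whose center lies exactly on an edge are treated as inside (real GPUs apply a tie-breaking rule), and the names `edge` and `rasterize` are hypothetical.

```python
def edge(a, b, p):
    """Edge function: positive when p lies to the left of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Yield (x, y, attribute) for each pixel whose center is inside the triangle."""
    area = edge(v0, v1, v2)
    if area == 0:
        return  # degenerate triangle covers no pixels
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            # Barycentric weights, normalized by the signed area so that
            # they are non-negative inside the triangle for either winding.
            b0 = edge(v1, v2, p) / area
            b1 = edge(v2, v0, p) / area
            b2 = edge(v0, v1, p) / area
            if b0 < 0 or b1 < 0 or b2 < 0:
                continue  # center is outside the triangle
            # Perspective-correct: interpolate attribute/w and 1/w linearly,
            # then divide the first by the second.
            inv_w = b0 / v0[2] + b1 / v1[2] + b2 / v2[2]
            attr_over_w = (b0 * v0[3] / v0[2]
                           + b1 * v1[3] / v1[2]
                           + b2 * v2[3] / v2[2])
            yield (x, y, attr_over_w / inv_w)


# Hypothetical triangle on a 4x4 pixel grid; all w = 1, so the
# interpolation reduces to plain linear interpolation.
pixels = list(rasterize((0, 0, 1, 0.0), (4, 0, 1, 1.0), (0, 4, 1, 0.0), 4, 4))
```

When the three `w` values differ, dividing by the interpolated `1/w` is what keeps attributes such as texture coordinates from appearing to slide across surfaces viewed at an angle.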

Pixel Processing
• Each pixel provided by triangle setup is fed into pixel processing as a set of attributes, which are used to compute the final color for that pixel.
• The computations taking place here include texture mapping and math operations.
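
As a minimal sketch of what "texture mapping and math operations" can mean, the Python below looks up one texel and modulates it by an interpolated color. The attribute names (`u`, `v`, `color`), the nearest-neighbor lookup, and the function names are all assumptions chosen for illustration; real pixel processing typically uses filtered lookups and far richer math.

```python
def sample_nearest(texture, u, v):
    """Nearest-neighbor lookup in a texture stored as rows of (r, g, b) tuples.

    Assumes u and v are in [0, 1].
    """
    height, width = len(texture), len(texture[0])
    # Clamp so that u = 1.0 or v = 1.0 still maps to the last texel.
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


def shade_pixel(attrs, texture):
    """Compute a final color from a pixel's interpolated attributes (a dict)."""
    texel = sample_nearest(texture, attrs["u"], attrs["v"])
    # Math operation: modulate the texel by the interpolated vertex color.
    return tuple(t * c for t, c in zip(texel, attrs["color"]))


# Hypothetical 2x2 texture: red, green / blue, white.
texture = [
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)],
]
color = shade_pixel({"u": 0.75, "v": 0.75, "color": (0.5, 0.5, 0.5)}, texture)
```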

Memory Interface
• Pixel colors provided by the previous stage are written to the framebuffer.
• This stage used to be the biggest bottleneck, before pixel processing took over.
• Before the final write occurs, some pixels are rejected by the z-buffer. On modern GPUs, z is compressed to reduce framebuffer bandwidth (but not size).
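
The z-buffer rejection can be sketched as follows. This Python is only a model of the idea: it assumes that a smaller z means closer to the viewer, ignores z compression entirely, and uses a hypothetical fragment format `(x, y, z, color)`.

```python
def write_fragments(fragments, width, height, clear_color=(0, 0, 0)):
    """Resolve fragments (x, y, z, color) into a framebuffer using a z-buffer."""
    framebuffer = [[clear_color] * width for _ in range(height)]
    zbuffer = [[float("inf")] * width for _ in range(height)]
    for x, y, z, color in fragments:
        # Assumption: smaller z is closer to the viewer.
        if z < zbuffer[y][x]:
            zbuffer[y][x] = z
            framebuffer[y][x] = color
        # Otherwise the fragment is hidden and is rejected before the write.
    return framebuffer


# Hypothetical fragments: three land on pixel (0, 0), one on pixel (1, 0).
fragments = [
    (0, 0, 0.5, "red"),
    (0, 0, 0.9, "blue"),   # behind red: rejected
    (1, 0, 0.3, "green"),
    (0, 0, 0.2, "white"),  # in front of red: replaces it
]
framebuffer = write_fragments(fragments, 2, 1)
```

Every fragment costs a z read, and every accepted fragment costs a z write and a color write, which is why this stage is sensitive to memory bandwidth.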

Programmability in the GPU pipeline
• In current state-of-the-art GPUs, vertex and pixel processing are programmable.
• The programmer can write programs that are executed for every vertex as well as for every pixel.
• This allows fully customizable geometry and shading effects that go well beyond the generic look and feel of older 3D applications.

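GPU Pipelined Architecture (simplified view)
[Figure: the CPU sends a data stream (…110010100100…) to the GPU, which contains the stages Vertex Setup → Vertex Shader → Rasterizer → Pixel Shader → Frame buffer, plus a Texture Storage + Filtering block. The early stages operate on vertices, the later ones on pixels.]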

One unit can limit the speed of the whole pipeline…
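
A small Python sketch makes the point concrete. It assumes each stage's capacity has already been expressed in a common unit (frames per second the stage could sustain on its own); the stage rates below are invented for illustration and the function name is hypothetical.

```python
def pipeline_throughput(stage_rates):
    """Return (bottleneck stage, sustained rate) for a pipeline.

    stage_rates maps a stage name to the rate that stage could sustain alone.
    """
    name = min(stage_rates, key=stage_rates.get)
    return name, stage_rates[name]


# Hypothetical numbers, in frames per second.
rates = {
    "vertex shader": 400,
    "rasterizer": 900,
    "pixel shader": 250,
    "frame buffer": 600,
}
bottleneck, fps = pipeline_throughput(rates)
```

In this example, speeding up any stage other than the pixel shader would not change the frame rate at all.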

CPU/GPU interaction
• The CPU and GPU inside the PC work in parallel with each other.
• There are two “threads” running, one on the CPU and one on the GPU, which communicate through a command buffer.
[Figure: command buffer. The CPU writes commands at one end, the GPU reads commands from the other, and pending GPU commands sit in between.]

CPU/GPU interaction (cont.)
• If the command buffer is drained empty, we are CPU limited and the GPU will spin waiting for new input. All the GPU power in the universe isn’t going to make your application faster!
• If the command buffer fills up, the CPU will spin waiting for the GPU to consume it, and we are effectively GPU limited.
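
The two cases can be reproduced with a toy simulation in Python. It is a deliberately simplified model, not a driver implementation: time advances in discrete ticks, the CPU and GPU each handle a fixed number of commands per tick, and the name `simulate` and its parameters are hypothetical.

```python
from collections import deque


def simulate(cpu_rate, gpu_rate, capacity, ticks):
    """Count how many ticks each side spends stalled.

    cpu_rate: commands the CPU can write per tick.
    gpu_rate: commands the GPU can consume per tick.
    capacity: maximum number of pending commands in the buffer.
    """
    buffer = deque()
    cpu_stalls = gpu_stalls = 0
    for _ in range(ticks):
        # CPU side: write until done for this tick or the buffer is full.
        written = 0
        while written < cpu_rate and len(buffer) < capacity:
            buffer.append("cmd")
            written += 1
        if written < cpu_rate:
            cpu_stalls += 1  # buffer full: CPU waits on the GPU (GPU limited)
        # GPU side: consume until done for this tick or the buffer is empty.
        consumed = 0
        while consumed < gpu_rate and buffer:
            buffer.popleft()
            consumed += 1
        if consumed < gpu_rate:
            gpu_stalls += 1  # buffer empty: GPU waits on the CPU (CPU limited)
    return cpu_stalls, gpu_stalls


cpu_limited = simulate(cpu_rate=1, gpu_rate=4, capacity=8, ticks=10)
gpu_limited = simulate(cpu_rate=4, gpu_rate=1, capacity=8, ticks=10)
```

In the first run only the GPU ever stalls, and in the second only the CPU does, which matches the two situations described above.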

Synchronization issues
• The CPU must not overwrite a data block until the GPU has finished the command that references it. In the figure, this is the “yellow” data block and the “black” command.
[Figure: command buffer with the GPU read position and the CPU write position; one pending command references a separate data block.]
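
One common way to enforce this rule is a fence: a counter that records how far the GPU has progressed through the command buffer. The Python below is a hypothetical sketch of that idea (the slides do not name a mechanism); the classes `Fence` and `Buffer` and their methods are invented for illustration and do not correspond to any real graphics API.

```python
class Fence:
    """Tracks the index of the last command the GPU has finished."""

    def __init__(self):
        self.completed = -1  # no command finished yet

    def signal(self, command_index):
        """Called on behalf of the GPU when a command has finished."""
        self.completed = command_index

    def is_done(self, command_index):
        return self.completed >= command_index


class Buffer:
    """A data block that remembers the last command referencing it."""

    def __init__(self, data):
        self.data = data
        self.last_used_by = -1  # index of the last command that reads it

    def try_write(self, new_data, fence):
        # Safe only if the GPU has finished every command that reads this block.
        if not fence.is_done(self.last_used_by):
            return False  # caller must wait, or write to a different block
        self.data = new_data
        return True
```

A typical use is to set `last_used_by` when a command referencing the block is queued, and to call `try_write` before every CPU update; when it returns `False`, the CPU can either wait or switch to a second copy of the block.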


Inlining data
• One way to avoid these problems is to inline all data into the command buffer and avoid references to separate data.
[Figure: command buffer with the data stored inline between the GPU read position and the CPU write position.]
• However, this is also bad for performance, since we may need to copy several megabytes of data instead of merely passing around a pointer.

GPU readbacks
• The output of a GPU is a rendered image on the screen. What happens if the CPU tries to read it?
[Figure: command buffer. The CPU writes commands at one end, the GPU reads commands from the other, and pending GPU commands sit in between.]
• The GPU must be synchronized with the CPU, i.e. it must drain its entire command buffer, and the CPU must wait while this happens.
GPU readbacks (cont.)
• We lose all parallelism, since first the CPU waits for the GPU, then the GPU waits for the CPU (because the command buffer has been drained).
• Both CPU and GPU performance take a nosedive.
• Bottom line: the image the GPU produces is for your eyes, not for the CPU (treat the CPU -> GPU highway as a one-way street).
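
The cost of losing parallelism can be estimated with a very rough model, sketched here in Python. It assumes an idealized steady state in which the CPU and GPU overlap perfectly when no readback occurs and do not overlap at all when one occurs every frame; the function names and the millisecond figures are invented for illustration.

```python
def frame_time_pipelined(cpu_ms, gpu_ms):
    """CPU and GPU overlap: the slower of the two sets the frame time."""
    return max(cpu_ms, gpu_ms)


def frame_time_with_readback(cpu_ms, gpu_ms):
    """A readback every frame serializes the two: each waits for the other."""
    return cpu_ms + gpu_ms


# Hypothetical workload: 8 ms of CPU work and 10 ms of GPU work per frame.
overlapped = frame_time_pipelined(8, 10)
serialized = frame_time_with_readback(8, 10)
```

Under these assumptions the frame time goes from 10 ms to 18 ms even though neither processor has any more work to do than before.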

About GPU memory…

Memory Hierarchy
CPU and GPU memory hierarchy, from slowest and largest to fastest and smallest:
– Disk
– CPU main memory
– GPU video memory
– CPU caches, GPU caches
– CPU registers, GPU constant registers, GPU temporary registers
Where is GPU Data Stored?
– Vertex buffer
– Frame buffer
– Texture

[Figure: Vertex Buffer → Vertex Processor → Rasterizer → Fragment Processor → Frame Buffer(s), with a Texture block alongside the pipeline.]

CPU memory vs GPU memory

             CPU               GPU
Registers    Read/write        Read/write
Local mem    Read/write stack  None
Global mem   Read/write heap   Read-only during computation; write-only at end (to a pre-computed address)
Disk         Read/write disk   None
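
The global memory restriction in the table (read anywhere, write once to a location fixed in advance) is often called a gather-only model. A minimal Python sketch, with hypothetical names `run_kernel` and `blur`, shows what programs written under that restriction look like:

```python
def run_kernel(kernel, inputs, output_size):
    """Call kernel once per output index; each call produces only its own slot."""
    frozen = tuple(inputs)  # read-only during computation
    # The output address (index i) is fixed before the kernel runs;
    # the kernel cannot choose where its result is written.
    return [kernel(frozen, i) for i in range(output_size)]


def blur(data, i):
    """Gather: read any input elements, return one value for slot i."""
    left = data[max(i - 1, 0)]
    right = data[min(i + 1, len(data) - 1)]
    return (left + data[i] + right) / 3.0


# Hypothetical input data.
result = run_kernel(blur, [3.0, 6.0, 9.0], 3)
```

Because no call can write to another call's slot, the calls are independent and can run in any order or all at once, which is what makes this model a good fit for many parallel execution units.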

It looks like…

Some applications
• Computer-generated holography using a graphics processing unit
• Improving the performance of CAD tools
• Computer graphics in games

New…
• NVIDIA's new graphics processing unit, the GeForce 8X ULTRA, is said to represent the very latest in visual effects technologies.

THANK YOU
