HOW GPU COMPUTING WORKS

High Performance Computing for Astronomy and Astrophysics

IIT Kharagpur, India | SKA India | National Supercomputing Mission | Centre for Development of Advanced Computing
EXPANDING UNIVERSE OF HIGH PERFORMANCE COMPUTING

Supercomputing now spans: simulation, data analytics, AI, cloud, extreme IO, streaming, edge appliances, edge visualization, and the network.
HOW GPU ACCELERATION WORKS

Application code is split in two: the compute-intensive functions (often only ~5% of the code) run on the GPU, while the rest of the sequential code runs on the CPU.
ACCELERATED COMPUTING

CPU: optimized for serial tasks. GPU accelerator: optimized for parallel tasks.
SILICON BUDGET

The three components of any processor, and how CPUs and GPUs budget die area differently:

              CPU     GPU
  ALU         less    more
  Control     more    less
  Cache       more    less
This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0)
CPU IS A LATENCY-REDUCING ARCHITECTURE

CPU Strengths
• Very large main memory
• Very fast clock speeds
• Latency optimized via large caches
• Small number of threads can run very quickly

CPU Weaknesses
• Relatively low memory bandwidth
• Cache misses very costly
• Low performance/watt
GPU IS ALL ABOUT HIDING LATENCY

GPU Strengths
• High bandwidth main memory
• Significantly more compute resources
• Latency tolerant via parallelism
• High throughput
• High performance/watt

GPU Weaknesses
• Relatively low memory capacity
• Low per-thread performance
WHY

NOBODY CARES ABOUT FLOPs
ALMOST NOBODY REALLY CARES ABOUT FLOPs
Consider a CPU attached to DRAM:

• Compute: 2000 GFLOPs FP64
• Memory bandwidth: 200 GBytes/sec = 25 Giga-FP64/sec (because FP64 = 8 bytes)
THIS IS COMPUTE INTENSITY

How many operations must I do on some data to make it worth the cost of loading it?

Required Compute Intensity = FLOPs / Data Rate = 2000 GFLOPs / 25 Giga-FP64/sec = 80

So for every number I load from memory, I need to do 80 operations on it just to break even.
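To make the arithmetic concrete, here is a minimal C sketch (not from the original slides; it simply re-derives the CPU example above) that computes the required compute intensity:

    #include <stdio.h>

    /* Illustrative only: the break-even compute intensity is the chip's
     * peak FLOP rate divided by how many values its memory can deliver. */
    int main(void) {
        double peak_flops = 2000e9;   /* 2000 GFLOP/s FP64        */
        double mem_bw     = 200e9;    /* 200 GB/s DRAM bandwidth  */
        double word       = 8.0;      /* FP64 = 8 bytes           */

        double loads_per_sec = mem_bw / word;               /* 25 Giga-FP64/s */
        double required_ci   = peak_flops / loads_per_sec;  /* = 80           */

        printf("FP64 values loadable per second: %.0f G\n", loads_per_sec / 1e9);
        printf("Required compute intensity: %.0f FLOPs per value\n", required_ci);
        /* DAXPY (below) does 2 FLOPs per 2 loads: intensity 1, memory bound. */
        return 0;
    }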
The same calculation for real chips (a GPU fed by HBM vs. CPUs fed by DRAM):

                          NVIDIA A100   Intel Xeon 8280   AMD Rome 7742
  Peak FP64 GigaFLOPs     19,500        2,190             2,300
  Memory B/W (GB/sec)     1,555         131               204
  Compute Intensity       100           134               90
ALMOST NOBODY REALLY CARES ABOUT FLOPs

...BECAUSE WE SHOULD REALLY BE CARING ABOUT MEMORY BANDWIDTH

...AND, ABOVE ALL, ABOUT MEMORY LATENCY
DAXPY: aX + Y = Z

    void daxpy(int n, double alpha, double *x, double *y)
    {
        for (int i = 0; i < n; i++)
        {
            y[i] = alpha * x[i] + y[i];
        }
    }

Per element: 2 FLOPs (multiply & add), issued as a single FMA (fused multiply-add) instruction, and 2 memory loads (x[i] and y[i]).

Timeline for one element:

    load x[0] → load y[0] → [memory latency] → x[0] ready → y[0] ready → a*x → +y → result ready
SPEED OF LIGHT = 300,000,000 m/s
COMPUTER CLOCK = 3,000,000,000 Hz

So in 1 clock tick, light travels 100mm (~4 inches).

SPEED OF ELECTRICITY IN SILICON = 60,000,000 m/s

So in 1 clock tick, a signal on silicon travels only 20mm (~0.8 inches) - yet the L3 cache sits roughly 22-32mm from a core, and DRAM is 50-100mm away.
DAXPY: aX + Y = Z (Intel Xeon 8280)

Memory bandwidth: 131 GB/sec
Memory latency: 89 ns

11,659 bytes can be moved in 89 ns (131 GB/s × 89 ns), but daxpy moves only 16 bytes per 89 ns latency.

Memory efficiency = 16 / 11,659 = 0.14%: the memory bus is idle 99.86% of the time.
COMPARISON OF DAXPY* EFFICIENCY ON DIFFERENT CHIPS

                            NVIDIA A100   AMD Rome 7742   Intel Xeon 8280
  Memory B/W (GB/sec)       1,555         204             131
  DRAM Latency (ns)         404           122             89
  Peak bytes per latency    628,220       24,888          11,659
  Memory efficiency         0.0025%       0.064%          0.14%

*daxpy moves 16 bytes per latency
SO WHAT CAN WE DO ABOUT IT?

    void daxpy(int n, double alpha, double *x, double *y)
    {
        for (int i = 0; i < n; i++)
        {
            y[i] = alpha * x[i] + y[i];
        }
    }

11,659 bytes can be moved in 89 ns, but daxpy moves 16 bytes per latency: memory efficiency = 0.14%.

To keep the memory bus busy, we must run 11,659 / 16 ≈ 729 iterations at once.
LOOP UNROLLING

    void daxpy(int n, double alpha, double *x, double *y)
    {
        for (int i = 0; i < n; i += 8)
        {
            y[i+0] = alpha * x[i+0] + y[i+0];
            y[i+1] = alpha * x[i+1] + y[i+1];
            y[i+2] = alpha * x[i+2] + y[i+2];
            y[i+3] = alpha * x[i+3] + y[i+3];
            y[i+4] = alpha * x[i+4] + y[i+4];
            y[i+5] = alpha * x[i+5] + y[i+5];
            y[i+6] = alpha * x[i+6] + y[i+6];
            y[i+7] = alpha * x[i+7] + y[i+7];
        }
    }

But compilers rarely unroll a loop 729 times; this is still just one thread issuing all these instructions, and one thread cannot hold 729 outstanding loads.

To keep the memory bus busy, we must run 11,659 / 16 ≈ 729 iterations at once.
THE ONLY OPTION IS THREADS

    void daxpy(int n, double alpha, double *x, double *y)
    {
        parallel for (int i = 0; i < n; i++)   // pseudocode: one iteration per thread
        {
            y[i] = alpha * x[i] + y[i];
        }
    }

Each thread issues its load operations independently. Ideally this requires 729 threads in flight, limited only by the maximum thread count and the number of outstanding memory requests the hardware supports.

To keep the memory bus busy, we must run 11,659 / 16 ≈ 729 iterations at once.
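A minimal CUDA sketch of this "parallel for", assuming device pointers d_x and d_y have already been allocated and filled (this kernel is not from the slides, but matches the SAXPY kernels shown later in the deck):

    __global__ void daxpy(int n, double alpha, const double *x, double *y)
    {
        // One element per thread: thousands of threads issue their loads
        // at once, keeping the memory bus busy instead of idle.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = alpha * x[i] + y[i];   // one FMA per thread
    }

    // Launch with enough threads to cover n elements - far more than the
    // ~729 outstanding iterations the Xeon's memory bus would need:
    //   daxpy<<<(n + 255) / 256, 256>>>(n, alpha, d_x, d_y);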
COMPARISON OF DAXPY* EFFICIENCY ON DIFFERENT CHIPS

                            NVIDIA A100   AMD Rome 7742   Intel Xeon 8280
  Memory B/W (GB/sec)       1,555         204             131
  DRAM Latency (ns)         404           122             89
  Peak bytes per latency    628,220       24,888          11,659
  Memory efficiency         0.0025%       0.064%          0.14%
  Threads required          39,264        1,556           729
  Threads available         221,184       2,048           896
  Thread ratio              5.6x          1.3x            1.2x

*daxpy moves 16 bytes per latency
BANDWIDTH ACROSS THE AMPERE A100 MEMORY HIERARCHY

[Diagram: 108 SMs, each with 256KB of registers and 192KB of L1 cache; a shared 40MB L2 cache; 80GB of HBM. Moving outward, bandwidth falls roughly 13x : 3x : 1x while latency grows roughly 1x : 5x : 15x.]

  Data Location   Bandwidth (GB/sec)   Compute Intensity
  L1 Cache        19,400               8
  L2 Cache        4,000                39
  HBM             1,555                100
  NVLink          300                  520
  PCIe            25                   6,240
LATENCY ACROSS THE AMPERE A100 MEMORY HIERARCHY

  Data Location   Latency (ns)   Threads Required
  L1 Cache        27             32,738
  L2 Cache        150            37,500
  HBM             404            39,264
  NVLink          700            13,125
  PCIe            1,470          2,297
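Each "threads required" entry is just the bandwidth-delay product divided by the 16 bytes each daxpy iteration keeps in flight. A small C check (values are taken from the HBM row of the table above):

    #include <stdio.h>

    /* Sketch: bytes in flight = bandwidth x latency; divide by the
     * 16 bytes each DAXPY thread loads to get threads required. */
    int main(void) {
        double bw_bytes_per_ns = 1555.0;  /* 1555 GB/s == 1555 bytes/ns */
        double latency_ns      = 404.0;   /* HBM latency                */

        double bytes_in_flight = bw_bytes_per_ns * latency_ns;  /* ~628,220 */
        double threads         = bytes_in_flight / 16.0;        /* ~39,264  */

        printf("bytes in flight: %.0f, threads required: %.0f\n",
               bytes_in_flight, threads);
        return 0;
    }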
INSIDE AN A100 STREAMING MULTIPROCESSOR (SM)

Each of the A100's 108 SMs has:
• 64 warps per SM, with 4 warps executing concurrently (one per scheduler)
• 64k 4-byte registers (4 files of 16k, one per scheduler)
• 192KB of combined shared memory / L1 cache (configurable split)

The GPU runs threads in groups of 32 - each group is known as a warp.
THE GPU'S SECRET SAUCE: OVERSUBSCRIPTION

                    Per SM    On A100 (108 SMs)
  Total Threads     2,048     221,184
  Total Warps       64        6,912
  Active Warps      4         432
  Waiting Warps     60        6,480
  Active Threads    128       13,824
  Waiting Threads   1,920     207,360

The GPU can switch from one warp to the next in a single clock cycle, so the waiting warps hide memory latency for the active ones.
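As a hedged sketch of how you might inspect this oversubscription in practice, the CUDA runtime's occupancy API reports how many blocks of a given kernel can be resident per SM (the daxpy kernel and block size here are illustrative, reusing the sketch from earlier):

    #include <cstdio>

    __global__ void daxpy(int n, double a, const double *x, double *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        int blockSize = 256, blocksPerSM = 0;
        // How many 256-thread blocks of daxpy fit on one SM at once?
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSM, daxpy, blockSize, /*dynamicSMem=*/0);

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);

        int residentThreads = blocksPerSM * blockSize * prop.multiProcessorCount;
        printf("Resident threads across the GPU: %d\n", residentThreads);
        return 0;
    }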
THROUGHPUT VS. LATENCY

[Figure: two ways to get from San Francisco to NVIDIA. The train, stopping at Millbrae, Burlingame, Belmont, Redwood City, Palo Alto, Mountain View, and Sunnyvale, takes 73 minutes; driving takes 45 minutes. The car wins on latency, but the train - moving hundreds of people at once - wins on throughput.]
BUT NOT ALL THREADS WANT TO WORK INDEPENDENTLY

In fact, threads are very rarely completely independent:

• Element-wise: DAXPY
• Local: Convolution
• All-to-All: Fourier Transform
1. Overlay the data with a grid.

2. Operate on blocks within the grid: blocks execute independently, and the GPU is oversubscribed with blocks.

3. Many threads work together in each block, for local data sharing.
CUDA'S HIERARCHICAL EXECUTION MODEL

A grid represents all the work to be done.

The grid comprises many blocks, each with an equal number of threads.

Threads within a block run independently, but may synchronize to exchange data.
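A short CUDA sketch of this hierarchy (illustrative, not from the slides): blocks execute independently, while the threads inside one block cooperate through shared memory and __syncthreads(). It assumes a launch with 256 threads per block.

    __global__ void blockSum(const float *in, float *blockResults, int n)
    {
        __shared__ float tile[256];           // visible to one block only
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
        __syncthreads();                      // all threads in block have loaded

        // Tree reduction: threads exchange data through shared memory
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (threadIdx.x < stride)
                tile[threadIdx.x] += tile[threadIdx.x + stride];
            __syncthreads();
        }
        if (threadIdx.x == 0)
            blockResults[blockIdx.x] = tile[0];  // one partial sum per block
    }

    // Usage (hypothetical sizes): blockSum<<<(n + 255) / 256, 256>>>(d_in, d_out, n);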
CUDA
NVIDIA HPC SDK
Download at developer.nvidia.com/hpc-sdk

Develop for the NVIDIA HPC Platform: GPU, CPU and Interconnect
HPC Libraries | GPU Accelerated C++ and Fortran | Directives | CUDA
N-WAYS TO GPU PROGRAMMING
Math Libraries | Standard Languages | Directives | CUDA

GPU-accelerated C++ and Fortran (standard languages):

    std::transform(par, x, x+n, y, y,
        [=](float x, float y) { return y + a*x; });

    do concurrent (i = 1:n)
        y(i) = y(i) + a*x(i)
    enddo

Incremental performance optimization with directives (OpenACC):

    #pragma acc data copy(x,y)
    {
        ...
    }

Maximize GPU performance with CUDA C++/Fortran:

    __global__
    void saxpy(int n, float a, float *x, float *y) {
        int i = blockIdx.x*blockDim.x + threadIdx.x;
        if (i < n) y[i] += a*x[i];
    }

    int main(void) {
        cudaMallocManaged(&x, ...);
        cudaMallocManaged(&y, ...);
        ...
        saxpy<<<(N+255)/256,256>>>(..., x, y);
        cudaDeviceSynchronize();
        ...
    }

...plus GPU-accelerated math libraries.
SINGLE PRECISION ALPHA X PLUS Y (SAXPY)
GPU SAXPY in multiple languages and libraries

Part of the Basic Linear Algebra Subroutines (BLAS) library:

    z = αx + y,  where x, y, z are vectors and α is a scalar
SAXPY: OPENACC COMPILER DIRECTIVES

Parallel C Code:

    void saxpy(int n, float a, float *x, float *y)
    {
        #pragma acc kernels
        for (int i = 0; i < n; ++i)
            y[i] = a*x[i] + y[i];
    }

    ...
    // Perform SAXPY on 1M elements
    saxpy(1<<20, 2.0, x, y);
    ...

Parallel Fortran Code:

    subroutine saxpy(n, a, x, y)
        real :: x(:), y(:), a
        integer :: n, i
        !$acc kernels
        do i=1,n
            y(i) = a*x(i)+y(i)
        enddo
        !$acc end kernels
    end subroutine saxpy

    ...
    ! Perform SAXPY on 1M elements
    call saxpy(2**20, 2.0, x_d, y_d)
    ...

www.openacc.org
SAXPY: CUBLAS LIBRARY

Serial BLAS Code:

    int N = 1<<20;
    ...
    // Use your choice of blas library

    // Perform SAXPY on 1M elements
    blas_saxpy(N, 2.0, x, 1, y, 1);

Parallel cuBLAS Code:

    int N = 1<<20;
    cublasInit();
    cublasSetVector(N, sizeof(x[0]), x, 1, d_x, 1);
    cublasSetVector(N, sizeof(y[0]), y, 1, d_y, 1);

    // Perform SAXPY on 1M elements
    cublasSaxpy(N, 2.0, d_x, 1, d_y, 1);

    cublasGetVector(N, sizeof(y[0]), d_y, 1, y, 1);
    cublasShutdown();

You can also call cuBLAS from Fortran, C++, Python, and other languages:
http://developer.nvidia.com/cublas
SAXPY: CUDA C

Standard C:

    void saxpy(int n, float a, float *x, float *y)
    {
        for (int i = 0; i < n; ++i)
            y[i] = a*x[i] + y[i];
    }

    int N = 1<<20;

    // Perform SAXPY on 1M elements
    saxpy(N, 2.0, x, y);

Parallel C (CUDA):

    __global__
    void saxpy(int n, float a, float *x, float *y)
    {
        int i = blockIdx.x*blockDim.x + threadIdx.x;
        if (i < n) y[i] = a*x[i] + y[i];
    }

    int N = 1<<20;
    cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);

    // Perform SAXPY on 1M elements
    saxpy<<<4096,256>>>(N, 2.0, d_x, d_y);

    cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);

http://developer.nvidia.com/cuda-toolkit
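For reference, a self-contained version of the CUDA fragments above that compiles and runs as-is; the allocation, initialization, and cleanup steps the slide elides are assumptions filled in here:

    #include <cstdio>

    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        int N = 1 << 20;
        size_t bytes = N * sizeof(float);

        float *x = (float*)malloc(bytes), *y = (float*)malloc(bytes);
        for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        float *d_x, *d_y;
        cudaMalloc(&d_x, bytes);
        cudaMalloc(&d_y, bytes);
        cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

        saxpy<<<(N + 255) / 256, 256>>>(N, 2.0f, d_x, d_y);

        cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);
        printf("y[0] = %f\n", y[0]);   // expect 4.0 = 2.0*1.0 + 2.0

        cudaFree(d_x); cudaFree(d_y); free(x); free(y);
        return 0;
    }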
SAXPY: CUDA FORTRAN

Standard Fortran:

    module mymodule
    contains
        subroutine saxpy(n, a, x, y)
            real :: x(:), y(:), a
            integer :: n, i
            do i=1,n
                y(i) = a*x(i)+y(i)
            enddo
        end subroutine saxpy
    end module mymodule

    program main
        use mymodule
        real :: x(2**20), y(2**20)
        x = 1.0; y = 2.0

        ! Perform SAXPY on 1M elements
        call saxpy(2**20, 2.0, x, y)
    end program main

Parallel (CUDA) Fortran:

    module mymodule
    contains
        attributes(global) subroutine saxpy(n, a, x, y)
            real :: x(:), y(:), a
            integer :: n, i
            attributes(value) :: a, n
            i = threadIdx%x+(blockIdx%x-1)*blockDim%x
            if (i<=n) y(i) = a*x(i)+y(i)
        end subroutine saxpy
    end module mymodule

    program main
        use cudafor; use mymodule
        real, device :: x_d(2**20), y_d(2**20)
        x_d = 1.0; y_d = 2.0

        ! Perform SAXPY on 1M elements
        call saxpy<<<4096,256>>>(2**20, 2.0, x_d, y_d)
    end program main

http://developer.nvidia.com/cuda-fortran
SAXPY: PYTHON

Standard Python:

    import numpy as np

    def saxpy(a, x, y):
        return [a * xi + yi
                for xi, yi in zip(x, y)]

    x = np.arange(2**20, dtype=np.float32)
    y = np.arange(2**20, dtype=np.float32)

    cpu_result = saxpy(2.0, x, y)

Numba: Parallel Python:

    import numpy as np
    from numba import vectorize

    @vectorize(['float32(float32, float32, float32)'], target='cuda')
    def saxpy(a, x, y):
        return a * x + y

    N = 1048576

    # Initialize arrays
    A = np.ones(N, dtype=np.float32)
    B = np.ones(A.shape, dtype=A.dtype)

    # Add arrays on GPU
    C = saxpy(2.0, A, B)

http://numpy.scipy.org | https://numba.pydata.org
SAXPY: PSTL

Serial C++ Code (with STL and Boost):

    int N = 1<<20;
    std::vector<float> x(N), y(N);
    ...

    // Perform SAXPY on 1M elements
    std::transform(x.begin(), x.end(),
                   y.begin(), y.begin(),
                   2.0f * _1 + _2);

Parallel C++ Code:

    int N = 1<<20;
    std::vector<float> x(N), y(N);
    ...

    // Perform SAXPY on 1M elements
    std::transform(std::execution::par,
                   x.begin(), x.end(),
                   y.begin(), y.begin(),
                   2.0f * _1 + _2);

www.boost.org/libs/lambda
SAXPY: MATLAB

Parallel C Code (OpenACC):

    void saxpy(int n, float a, float *x, float *y)
    {
        #pragma acc kernels
        for (int i = 0; i < n; ++i)
            y[i] = a*x[i] + y[i];
    }

    ...
    // Perform SAXPY on 1M elements
    saxpy(1<<20, 2.0, x, y);
    ...

Parallel MATLAB Code:

    <<initialize>>
    p = parpool

    parfor i = 1:N
        y(i) = 2.0 * x(i) + y(i)
    end

    <<post process>>
    delete(p)
SAXPY: MATLAB GPU COMPUTING

500+ GPU-enabled MATLAB functions, plus additional GPU-enabled toolboxes: Neural Networks, Image Processing and Computer Vision, Communications, Signal Processing, Stats.

Transfer data to the GPU from computer memory:

    x = gpuArray(x);

Perform the calculation on the GPU:

    X = saxpy(N, 2.0, x, 0, 1, y, 0, 1);

Gather data or plot:

    y = gather(y)
CHALLENGES WITH COMPLEX SOFTWARE

Current DIY GPU-accelerated AI and HPC deployments can be complex and time-consuming to build, test, and maintain.

Development of software frameworks by the community is moving very fast.

Managing driver, library, and framework dependencies requires a high level of expertise.

[Stack: Open Source Frameworks → NVIDIA Libraries → NVIDIA Docker → NVIDIA Driver → NVIDIA GPU]
WHY CONTAINERS?

Benefits of containers:
• Simplify deployment of GPU-accelerated software, eliminating time-consuming software integration work
• Isolate individual deep learning frameworks and applications
• Share, collaborate, and test applications across different environments
VIRTUAL MACHINES VS. CONTAINERS: MOTIVATION

• Packaging mechanism for applications
• Consistent and reproducible deployment
• Lightweight, with faster startup than VMs
• Logical isolation from other applications (at the OS level)
COMPUTATIONAL SCIENCES

Inputs → [create a mathematical model from first principles, at some level of approximation] → Outputs

Inputs → [create an efficient implementation] → Outputs

Are there similarities to the shift from feature engineering to network engineering? Can neural networks serve as a porting strategy?
CAN THIS WORK ∀? ABSOLUTELY, YES!
Proof: the Universal Approximation Theorem

Take many non-linearities (one hidden layer is enough!), combine them to form peaks, and assemble your arbitrary function to arbitrary ε.

Problem: this is an essentially useless theorem for practical purposes.
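For reference, one standard (Cybenko-style) statement of the theorem the slide is gesturing at, with a single hidden layer of N units:

    % Universal approximation: for continuous f on a compact set K and a
    % sigmoidal nonlinearity \sigma, weights exist that approximate f to
    % within any \varepsilon:
    \forall \varepsilon > 0 \;\exists N,\ \alpha_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^d :
    \quad \sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma(w_i^{\top} x + b_i) \right| < \varepsilon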
WHAT MAKES AI * HPC SPECIAL?

The same pipeline as before - Inputs → mathematical model (some level of approximation) → Outputs - but the efficient implementation is now learned: training labels and a loss function drive backpropagation, with the open question of where a physical "prior" fits in.

• Note: we have more information about the ground truth in AI*HPC (it is often mathematically precise)
• This should actually be an advantage!
• Why does it sometimes feel like a disadvantage?
RECOGNITION/CLASSIFICATION → FILTER
De-noising gravitational waves

At the Laser Interferometer Gravitational-wave Observatory (LIGO), deep learning enables 5000x faster filtering for real-time multi-messenger astronomy.
IS AN ML MODEL USEFUL FOR SCIENCE?
IIT Kharagpur, India | SKA India | National Supercomputing Mission | Centre for Development of Advanced Computing

[email protected]

This material is released by NVIDIA Corporation under the Creative Commons Attribution 4.0 International (CC BY 4.0)