PDC Lecture 7-8 GPU Architectures
Load Balancing
Load Balancers
Flynn’s Taxonomy
Computation Models
SISD
SIMD
MISD
MIMD
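The SIMD category above (one instruction stream applied to many data elements) can be sketched as follows. This is an illustrative emulation in plain Python, not real vector hardware; the data values are made up:

```python
# SIMD idea: a single (logical) instruction -- "add 5" -- is applied
# to multiple data elements. Hardware does this in lock-step lanes;
# here we emulate the lock-step behavior with a comprehension.
data = [1, 2, 3, 4]               # multiple data
result = [x + 5 for x in data]    # single instruction, applied to all
print(result)                     # [6, 7, 9... no -- [6, 7, 8, 9]
```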
AMD Southern Islands Architecture
If 1 CU = 4 SIMDs
and 1 SIMD executes one 64-work-item wavefront,
then the total number of SIMDs and CUs needed to execute a total of 1000 instructions is:
No. of SIMDs = 1000/64 = 15.625 ~ 16 (ceiling function)
No. of CUs = 15.625/4 = 3.90 ~ 4 (ceiling function)
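The exercise's counting can be checked in a few lines. This is a sketch assuming the Southern Islands figures used on the slide (64 work-items per wavefront, 4 SIMDs per CU); the constant names are mine:

```python
import math

WAVEFRONT = 64      # work-items executed by one SIMD (one wavefront)
SIMDS_PER_CU = 4    # SIMD units per Compute Unit (Southern Islands)

instructions = 1000
simds = instructions / WAVEFRONT    # 15.625
cus = simds / SIMDS_PER_CU          # 3.90625

# Apply the ceiling function the slide mentions:
print(math.ceil(simds), math.ceil(cus))   # 16 4
```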
NVIDIA Fermi Architecture
If 1 SM = 16 CUDA processors
and 1 warp = 32 threads,
then the total number of CUDA processors and SMs needed to execute a total of 1000 instructions is:
No. of CUDA processors = 1000/32 = 31.25 ~ 32 (ceiling function)
No. of SMs = 31.25/16 = 1.95 ~ 2 (ceiling function)
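The same style of check works for the Fermi exercise. This sketch assumes the slide's premise of 16 CUDA processors per SM and the usual 32 threads per warp; the constant names are mine:

```python
import math

WARP = 32           # threads per warp
CORES_PER_SM = 16   # CUDA processors per SM, per the slide's premise

instructions = 1000
cores = instructions / WARP     # 31.25
sms = cores / CORES_PER_SM      # 1.953125

# Apply the ceiling function the slide mentions:
print(math.ceil(cores), math.ceil(sms))   # 32 2
```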
Cell Broadband Engine
IBM's Response to NVIDIA GPUs
Pros:
They make your code run faster.
Cons:
They're expensive (False).
They're hard to program.
Your code may not be cross-platform (False).
Applications
Traditional GPU applications: gaming, image processing (i.e., manipulating image pixels, oftentimes the same operation on each pixel)
Scientific and engineering problems: physical modeling, matrix algebra, sorting, etc.
Data parallel algorithms:
Large data arrays
Single Instruction, Multiple Data (SIMD) parallelism
Floating point computations
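The "same operation on each pixel" pattern above can be sketched as a tiny data-parallel example. The image values and the brightness operation are hypothetical, chosen only to show the pattern:

```python
# Data-parallel image operation: the identical brightness adjustment is
# applied independently to every pixel -- exactly the workload shape
# that maps well onto SIMD-style GPU hardware.
image = [[10, 20], [30, 40]]   # tiny grayscale "image"
brighter = [[min(p + 50, 255) for p in row] for row in image]
print(brighter)                # [[60, 70], [80, 90]]
```

Because every pixel is processed independently, the work partitions trivially across thousands of GPU threads.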
[Figure: CPU vs. GPU; the GPU is built from many streaming multiprocessors (SMs)]
GPU Architectures
Conventional CPU architecture
Modern GPGPU architectures
AMD Southern Islands GPU Architecture
Exercise for AMD GPGPU and AMD 7000 series GPU
NVIDIA Fermi GPU Architecture
Exercise for NVIDIA GPGPU and NVIDIA GTX series GPU
Cell Broadband Engine
Heterogeneity
Heterogeneous Concurrent Computing
Forms of Heterogeneity
Goals of Heterogeneous Concurrent Computing
Processing Elements
Parallel Virtual Machine (PVM)
The PVM System
Different reference materials were consulted to cover this topic.
Textbook: