Emergence of GPU systems for general purpose high performance computing

ITCS 4145/5145 © Barry Wilkinson GPUIntro.ppt Nov 4, 2013

Titan Supercomputer
Oak Ridge National Laboratory in Oak Ridge, Tennessee
World's fastest computer on the TOP500 list from Nov 2012 to May 2013; down to No. 2 in June 2013*

• 18,688 NVIDIA Tesla K20X GPUs (each having 2688 cores)
• 20 petaflops
• Upgraded from the Jaguar supercomputer: 10 times faster and 5 times more energy efficient than the 2.3-petaflops Jaguar system while occupying the same floor space.

http://nvidianews.nvidia.com/Releases/NVIDIA-Powers-Titan-World-s-Fastest-Supercomputer-For-Open-Scientific-Research-8a0.aspx#source=pr
* No. 1: Tianhe-2 (MilkyWay-2), 3,120,000 cores (Intel Xeon E5-2692 with Intel Xeon Phi coprocessors)

Tesla K20 GPU Computing modules
Kepler architecture, introduced November 2012
• K20: 2496 thread processors (cores)
• K20X: 2688 thread processors (cores)
• K40 (2013): 2880 thread processors

K20:
• 2496 FP32 cores, 832 FP64 cores
• Power: 225 watts
• GFLOPs: single precision 3519 - 4106; double precision 1173

UNC-C CUDA Teaching Center
2010: NVIDIA Corp. selected the UNC-Charlotte Department of Computer Science to be a CUDA Teaching Center, kindly providing GPU equipment and TA support. A donated C2050 is used in coit-grid06.
2011: NVIDIA kindly provided 50 GTX 480 GPU cards valued at $15,000 as continuing support for the CUDA Teaching Center.
2012: NVIDIA donated a K20, used in cci-grid08.
2013: NVIDIA Teaching Center status renewed.
Our course materials are posted on NVIDIA's corporate site next to those from Stanford and other top schools:
http://developer.nvidia.com/cuda-training

CPU-GPU architecture evolution
Co-processors: a very old idea that appeared in the 1970s and 1980s, when floating point co-processors were attached to microprocessors that did not then have floating point capability. The co-processors simply executed floating point instructions that were fetched from memory.
Graphics cards: around the same time came hardware support for displays, driven especially by the increasing use of graphics and PC games. This led to graphics processing units (GPUs) attached to the CPU to create the video display.
(Figure: early designs, 1970s to 1980s: a CPU with a co-processor and memory; a CPU with a graphics card driving a display.)
2013: The Xeon Phi processor with 60 cores is described as a co-processor, although it is connected through a PCIe interface in a similar fashion to recent GPU cards.

Pipelined programmable GPU
Dedicated pipeline (late 1990s to early 2000s)
By the late 1990s, graphics chips needed to support 3-D graphics, especially for games and for graphics APIs such as DirectX and OpenGL.
They generally had a pipeline structure, with individual stages performing specialized operations and finally loading the frame buffer for display.
Individual stages may have access to graphics memory for storing intermediate computed data.
(Figure: Input stage → Vertex shader stage → Geometry shader stage → Rasterizer stage → Pixel shading stage → Frame buffer, with the stages connected to graphics memory.)

Example -- GeForce 6 Series Architecture (2004-5)
(Figure from GPU Gems 2, Copyright 2005 by NVIDIA Corporation)

General-Purpose GPU designs
High performance pipelines call for high-speed (IEEE) floating point operations.
People tried to use GPU cards to speed up scientific computations. This became known as GPGPU (general-purpose computing on graphics processing units). It was difficult to do with specialized graphics pipelines, but possible.
By the mid 2000s, it was recognized that individual stages of the graphics pipeline could be implemented by a more general purpose processor core (although with a data-parallel paradigm).

Graphics Processing Units (GPUs): Brief History
(Figure: timeline, 1970 to 2010, including:)
• Atari 8-bit computer text/graphics chip
• IBM PC Professional Graphics Controller card
• S3 graphics cards: single chip 2D accelerator
• OpenGL graphics API
• PlayStation
• DirectX graphics API
• Hardware-accelerated 3D graphics
• GPUs with programmable shading: NVIDIA GeForce 3 (2001)
• General-purpose computing on graphics processing units (GPGPUs)
• GPU Computing
Source of information: http://en.wikipedia.org/wiki/Graphics_Processing_Unit

NVIDIA products
NVIDIA Corp. is a leader in GPUs for high performance computing:
• C2050 GPU has 448 thread processors
• Tesla Kepler K20 has 2496 thread processors
(Figure: NVIDIA product timeline, 1993 to 2010 and beyond, including:)
• 1993: established by Jen-Hsun Huang, Chris Malachowsky, Curtis Priem
• NV1, GeForce 1, GeForce 2 series, GeForce FX series
• GeForce 8 series: GeForce 8800 / G80, NVIDIA's first GPU with general purpose processors
• Quadro
• Tesla: C870, S870, C1060, S1070, C2050, …
• GeForce 200 series: GTX260/275/280/285/295
• GeForce 400 series: GTX460/465/470/475/480/485
• Architectures: Fermi, Kepler (2011), Maxwell (2013)

NVIDIA G80 chip/GeForce 8800 card (2006)
First GPU for high performance computing as well as graphics
• Unified processors that could perform vertex, geometry, pixel, and general computing operations
• Could now write programs in C rather than graphics APIs
• Single-instruction multiple-thread (SIMT) programming model

Evolving GPU design: NVIDIA Fermi architecture (announced Sept 2009)
• Data parallel single instruction multiple data operation ("stream" processing)
• Up to 512 cores ("stream processing engines", SPEs, organized as 16 streaming multiprocessors, each having 32 SPEs)
• 3 GB or 6 GB GDDR5 memory
• Many innovations including L1/L2 caches, unified device memory addressing, ECC memory, …
• First implementation: Tesla 20 series (single chip C2050/2070, 4 chip S2050/2070)
3 billion transistor chip. Number of cores limited by power considerations: the C2050 has 448 cores.
* Whitepaper: NVIDIA's Next Generation CUDA Compute Architecture: Fermi, NVIDIA, 2008

GPU performance gains over CPUs
(Figure: peak GFLOPs, 0 to 1400, from 9/22/2002 to 7/27/2009. NVIDIA GPUs: NV30, NV40, G70, G80, GT200, T12. Intel CPUs: 3GHz Dual Core P4, 3GHz Core2 Duo, 3GHz Xeon Quad, Westmere.)
Source © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007-2009; ECE 498AL Spring 2010, University of Illinois, Urbana-Champaign

NVIDIA Kepler architecture and GPUs (2012+)
(Figure: GK104 chip with 1536 cores)
A lot of major new features over the earlier Fermi architecture
• K10/GK104: 1536 cores
• K20/GK110: 2496 cores
• K40/GK180: 2880 cores
CUDA Compute Capability 3.0 for GK104, 3.5 for GK110 (see next)
http://www.tomshardware.com/news/Nvidia-Kepler-GK104-GeForce-GTX-670-680,14691.html

NVIDIA GPUs
Stream processing: term used to denote processing of a stream of instructions operating in a data parallel fashion.
Stream processors (SPs): the execution cores that will execute the stream. Each stream processor has compute resources such as a register file, instruction scheduler, …
Streaming multiprocessors (SMs): groups of streaming processors that share control logic and cache.

NVIDIA C2050 (as on coit-grid06.uncc.edu and cci-grid07)
• 14 streaming multiprocessors (SMs)
• Each streaming multiprocessor has 32 streaming processors (SPs)
• So 448 streaming processors (cores)
Apparently Fermi was originally intended to have 512 cores (16 SMs), but the design got too hot.

NVIDIA K20 (as on coit-grid08)
• 13 streaming multiprocessors (SMXs, "extreme")
• Each streaming multiprocessor has 192 streaming processors (SPs)
• So 2496 streaming processors (cores)
Actually 15 SMs (2880 cores) are fabricated on the chip to improve yield.

CUDA (Compute Unified Device Architecture)
• Architecture and programming model introduced by NVIDIA in 2007
• Enables GPUs to execute programs written in C.
• Within C programs, call SIMT "kernel" routines that are executed on the GPU.
• CUDA syntax extensions to C identify a routine as a kernel.
• Very easy to learn, although getting the highest possible execution performance requires understanding the hardware architecture.
• Version 3 introduced 2009
• Version 4 introduced 2011, with significant additions including "unified virtual addressing": a single address space across GPU and host.
• Most recent version 5.5 introduced July 2013
• We will go into CUDA in detail shortly and have programming

Questions
