GPU Architecture and Function: Michael Foster and Ian Frasch
Overview
● What is a GPU?
● How is a GPU different from a CPU?
● The graphics pipeline
● History of the GPU
● GPU architecture
● Optimizations
● GPU performance trends
● Current development
What is a GPU?
● Dedicated graphics chip that handles
all processing required for rendering
3D objects on the screen
● Typically placed on a video card,
which contains its own memory and
display interfaces (HDMI, DVI, VGA,
etc.)
● Primitive GPUs were developed in
the 1980s; the first “complete”
GPUs appeared in the mid-1990s
Systems level view
● Video card connected to the
motherboard through PCI Express
or AGP (Accelerated Graphics Port)
● Northbridge chip enables data
transfer between the CPU and
GPU
● Graphics memory on the video
card contains the pixel RGB
data for each frame
How is a GPU different from a CPU?
● Throughput is more important than latency
o High throughput is needed for the huge number of
computations required for graphics
o Latency matters less because the human visual
system operates on a much longer time scale
o About 16 ms (1/60 s) maximum latency at a 60 Hz refresh rate
● Long pipelines with many stages; a single instruction may
take thousands of cycles to get through the pipeline
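The 16 ms figure follows directly from the refresh rate: one frame period at 60 Hz is 1/60 of a second. A minimal sketch of that arithmetic is shown here; the function name `frame_budget_ms` is hypothetical and introduced only for illustration.

```python
def frame_budget_ms(refresh_hz: float) -> float:
    """Return the time available to produce one frame, in milliseconds."""
    return 1000.0 / refresh_hz


# At 60 Hz each frame must be ready within roughly 16.67 ms.
print(round(frame_budget_ms(60), 2))  # 16.67
```

As long as a frame completes within this budget, the depth of the pipeline is not visible to the viewer, which is why throughput is prioritized over latency.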
How is a GPU different from a CPU?
● Extremely parallel
o Different pixels and elements of the image can be
operated on independently
o Hundreds of cores executing at the same time to
take advantage of this fundamental parallelism
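The independence of per-pixel work can be illustrated with a small sketch. This is not how a GPU is programmed: Python threads stand in for GPU cores, and the `shade` function (a simple color inversion) is a hypothetical example. The point is only that each output depends on a single input pixel, so the work can be split across workers in any way without changing the result.

```python
from concurrent.futures import ThreadPoolExecutor


def shade(pixel):
    """Invert one RGB pixel; the result depends only on that pixel."""
    r, g, b = pixel
    return (255 - r, 255 - g, 255 - b)


pixels = [(0, 0, 0), (255, 128, 0), (10, 20, 30), (200, 200, 200)]

# Sequential reference result.
sequential = [shade(p) for p in pixels]

# Same work spread over worker threads; map() returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(shade, pixels))

assert parallel == sequential
```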
Inputs and Outputs
Inputs to GPU (from the CPU/memory):
● Vertices (3D coordinates) of objects
● Texture data
● Lighting data
Outputs from GPU:
● Frame buffer
o Placed in a specific section of graphics memory
o Contains RGB values for each pixel on the screen
o Data is sent directly to display
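One way to picture the frame buffer is as a flat block of memory holding one RGB triple per pixel. The sketch below assumes a row-major layout with 8 bits per channel and a tiny 4x3 screen; these are illustrative assumptions, since the actual layout depends on the hardware and display mode. The helper names `set_pixel` and `get_pixel` are hypothetical.

```python
WIDTH, HEIGHT = 4, 3      # tiny hypothetical screen
BYTES_PER_PIXEL = 3       # assumed: 8 bits each for R, G, B

# One contiguous block of memory holding every pixel of the frame.
framebuffer = bytearray(WIDTH * HEIGHT * BYTES_PER_PIXEL)


def pixel_offset(x, y):
    """Byte offset of pixel (x, y) in a row-major layout."""
    return (y * WIDTH + x) * BYTES_PER_PIXEL


def set_pixel(x, y, rgb):
    """Store an (r, g, b) triple at pixel (x, y)."""
    start = pixel_offset(x, y)
    framebuffer[start:start + BYTES_PER_PIXEL] = bytes(rgb)


def get_pixel(x, y):
    """Read back the (r, g, b) triple at pixel (x, y)."""
    start = pixel_offset(x, y)
    return tuple(framebuffer[start:start + BYTES_PER_PIXEL])


set_pixel(2, 1, (255, 0, 0))
print(len(framebuffer))   # 36
print(get_pixel(2, 1))    # (255, 0, 0)
```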
The Graphics Pipeline: A Visual
[Figure: diagram of the graphics pipeline; recoverable labels: 3D coordinates, Pixel Shader]
The Graphics Pipeline
● The GPU completes every
stage of this computational
pipeline
Transformations
● Camera transformation
o Convert vertices from 3D world
coordinates to 3D camera
coordinates, with the camera
(user view) as the origin
● Projection transformation
o Convert vertices from 3D camera
coordinates to 2D screen view
coordinates that the user will see
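The two transformations can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the exact math a GPU uses: the camera is assumed to have no rotation and to look along +z, the projection is a plain perspective divide onto the plane z = `focal_length`, and the function names are hypothetical. The result is in image-plane units; mapping to actual pixel coordinates is omitted.

```python
def world_to_camera(vertex, camera_pos):
    """Translate a world-space point so the camera sits at the origin.

    Assumes the camera is not rotated relative to the world axes.
    """
    return tuple(v - c for v, c in zip(vertex, camera_pos))


def project(vertex_cam, focal_length=1.0):
    """Perspective-project a camera-space point onto the plane z = focal_length."""
    x, y, z = vertex_cam
    if z <= 0:
        raise ValueError("point is at or behind the camera")
    # Points farther away (larger z) land closer to the image center.
    return (focal_length * x / z, focal_length * y / z)


camera = (0.0, 0.0, -5.0)
vertex_world = (1.0, 2.0, 5.0)

vertex_cam = world_to_camera(vertex_world, camera)
print(vertex_cam)            # (1.0, 2.0, 10.0)
print(project(vertex_cam))   # (0.1, 0.2)
```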
Illustration of 3D-2D Projection
(With overlapping vertices)
Depth values in the Z-buffer
determine which triangle
is visible when two
vertices map to the same
2D coordinate
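The depth test described above can be sketched as follows. The sketch assumes that a smaller z value means closer to the camera and uses a tiny 2x2 screen; the name `write_fragment` and the color strings are hypothetical, chosen only for illustration.

```python
import math

WIDTH, HEIGHT = 2, 2

# Every depth entry starts at "infinitely far away"; no color is stored yet.
depth = [[math.inf] * WIDTH for _ in range(HEIGHT)]
color = [[None] * WIDTH for _ in range(HEIGHT)]


def write_fragment(x, y, z, rgb):
    """Keep the fragment only if it is closer than what is already stored."""
    if z < depth[y][x]:
        depth[y][x] = z
        color[y][x] = rgb
        return True
    return False


write_fragment(0, 0, 5.0, "red")     # first fragment at this pixel: kept
write_fragment(0, 0, 2.0, "blue")    # closer: replaces red
write_fragment(0, 0, 9.0, "green")   # farther: rejected
print(color[0][0])  # blue
```

Because the test compares against the stored depth, the final image does not depend on the order in which the triangles are drawn.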
Transformations
● These transformations simply modify
vertices, so they are done by vertex shaders
● Transform computations are heavy on matrix
multiplication
● Each vertex can be transformed
independently
o Data Parallelism
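A minimal sketch of a vertex transform as a matrix multiplication, assuming the common convention of 4x4 matrices applied to homogeneous (x, y, z, 1) vertices. The `translate` matrix and the vertex list are made-up example data. Each call to `mat_vec` reads only its own vertex, which is what makes the work data parallel.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


# Example transform: translate by (+2, 0, -1).
translate = [
    [1, 0, 0, 2],
    [0, 1, 0, 0],
    [0, 0, 1, -1],
    [0, 0, 0, 1],
]

vertices = [(0, 0, 0, 1), (1, 1, 1, 1), (3, -2, 5, 1)]

# No vertex depends on any other, so these could run in any order or all at once.
transformed = [mat_vec(translate, v) for v in vertices]
print(transformed)  # [(2, 0, -1, 1), (3, 1, 0, 1), (5, -2, 4, 1)]
```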
Example renderings
[Figure: example renderings; recoverable label: (programmable) shader processors]