Volume Tiled Forward Shading
Forward Shading
JEREMIAH VAN OOSTEN – 3910539 - INFOMGMT
Abstract
Forward Rendering
Deferred Shading
Tiled Forward Shading
Clustered Forward Shading
Forward Rendering
Thread Dispatch
Coalesced Access to Global Memory
Avoid Bank Conflicts to Shared Memory
Thread Dispatch
Reduction
Scan
Parallel Reduction
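A minimal sketch of the access pattern behind a parallel reduction, written as plain C++ for illustration. The function name `treeReduce` is hypothetical. The inner loop stands in for the threads of one pass, and a GPU version would place a barrier between passes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums the input using the tree-shaped access pattern of a parallel reduction.
// Each pass halves the number of active elements; element i adds in element
// i + stride, so the active elements always stay contiguous.
int treeReduce(std::vector<int> data) {
    if (data.empty()) return 0;

    // Pad to the next power of two with the identity element (0 for addition).
    std::size_t n = 1;
    while (n < data.size()) n *= 2;
    data.resize(n, 0);

    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        // On a GPU, each iteration of this inner loop would be one thread,
        // with a barrier before the next pass starts.
        for (std::size_t i = 0; i < stride; ++i) {
            data[i] += data[i + stride];
        }
    }
    return data[0];
}
```

For eight input values the sum is complete after three passes (strides 4, 2, 1).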
Sorting
Radix Sort
Merge Sort
Radix Sort
BVH Basics
BVH Construction
BVH Traversal
BVH Basics
The term cell refers to the area that is being checked for overlap
Traversal uses a stack: the index of a child node of the BVH is pushed if the AABB of that node overlaps the AABB of the cell
The 32 threads in a warp each perform an AABB intersection test during traversal
If the node is a leaf node, the AABBs of its lights are checked against the AABB of the cell
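The traversal described above can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the presented implementation: the node layout (`BVHNode`, with `left`, `right`, `firstLight`, `lightCount`) is hypothetical, and a binary tree is used for brevity, whereas a GPU version would spread the overlap tests across the threads of a warp.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Axis-aligned bounding box.
struct AABB {
    float min[3];
    float max[3];
};

// True if the two boxes overlap on every axis.
inline bool overlaps(const AABB& a, const AABB& b) {
    for (int i = 0; i < 3; ++i) {
        if (a.max[i] < b.min[i] || a.min[i] > b.max[i]) return false;
    }
    return true;
}

// Hypothetical node layout: internal nodes store two child indices,
// leaf nodes (left < 0) store a range into the light AABB array.
struct BVHNode {
    AABB bounds;
    int32_t left;
    int32_t right;
    uint32_t firstLight;
    uint32_t lightCount;
};

// Returns the indices of all lights whose AABB overlaps the cell.
std::vector<uint32_t> lightsOverlappingCell(const std::vector<BVHNode>& nodes,
                                            const std::vector<AABB>& lightBounds,
                                            const AABB& cell) {
    std::vector<uint32_t> result;
    if (nodes.empty()) return result;

    std::vector<int32_t> stack;
    if (overlaps(nodes[0].bounds, cell)) stack.push_back(0);

    while (!stack.empty()) {
        const BVHNode& node = nodes[stack.back()];
        stack.pop_back();

        if (node.left < 0) {
            // Leaf: test each light's AABB against the cell.
            for (uint32_t i = 0; i < node.lightCount; ++i) {
                const uint32_t light = node.firstLight + i;
                if (overlaps(lightBounds[light], cell)) result.push_back(light);
            }
        } else {
            // Internal node: push only the children whose AABB overlaps the cell.
            if (overlaps(nodes[node.left].bounds, cell)) stack.push_back(node.left);
            if (overlaps(nodes[node.right].bounds, cell)) stack.push_back(node.right);
        }
    }
    return result;
}
```

Pushing a child only after its AABB passes the overlap test keeps the stack small, because subtrees that cannot contain an overlapping light are never visited.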
Volume Tiled Forward Shading
Initialize
Determine Grid Size
Compute AABBs for Volume Tiles
Update
Depth pre-pass
Mark tiles
Find unique tiles
Assign lights to tiles
Shade samples
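The "Mark tiles" and "Find unique tiles" steps of the update phase might look like the following sequential sketch. The function names are hypothetical, and the mapping from a visible sample to its volume tile index is assumed to have been computed already. On the GPU, marking would happen once per sample, and the compaction would be done in parallel.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mark tiles: one flag per volume tile, set if any visible sample falls in it.
std::vector<uint8_t> markActiveTiles(const std::vector<uint32_t>& sampleTileIndices,
                                     uint32_t tileCount) {
    std::vector<uint8_t> active(tileCount, 0);
    for (uint32_t tile : sampleTileIndices) {
        assert(tile < tileCount);
        active[tile] = 1;  // many samples may mark the same tile
    }
    return active;
}

// Find unique tiles: compact the flags into a list of active tile indices,
// so that light assignment only has to process tiles that contain samples.
std::vector<uint32_t> findUniqueTiles(const std::vector<uint8_t>& active) {
    std::vector<uint32_t> unique;
    for (uint32_t i = 0; i < active.size(); ++i) {
        if (active[i]) unique.push_back(i);
    }
    return unique;
}
```

With samples falling in tiles 5, 2, 5, 5, 9 and 2, only tiles 2, 5 and 9 reach the light assignment step.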
Determine Grid Size
Record all of the opaque scene objects into the depth buffer
Required to ensure only visible samples are drawn in the next pass…
Mark Active Tiles