0% found this document useful (0 votes)
83 views

Chapt 07

The document discusses the history and current state of parallelism in computing. It describes how parallelism has evolved from single processors to multi-core CPUs and many-core GPUs. It also provides examples of how parallel techniques are used for animation processing and geometry processing on the Cell Broadband Engine.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
83 views

Chapt 07

The document discusses the history and current state of parallelism in computing. It describes how parallelism has evolved from single processors to multi-core CPUs and many-core GPUs. It also provides examples of how parallel techniques are used for animation processing and geometry processing on the Cell Broadband Engine.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 208

Current Generation Parallelism

In Games

Jon Olick
id Software

Beyond Programmable Shading: In Action


Brief History of Parallelism
• 1 Processor
– The good old days.
– Why parallelize? Just wait a little and your programs
will get faster.

Beyond Programmable Shading: In Action


Brief History of Parallelism
• 2 to 3 Processors
– Logical splitting of game process into pipelined
pieces.
• Game
• Rendering
• Sound
• Loading/Decompression

Beyond Programmable Shading: In Action


Brief History of Parallelism
• About 6 to 8 Processors
– The transition to a job scheduling type architecture
– 1st order parallelism
• Game
• Rendering
• Sound
• Physics
• Collision
• Loading/Decompression
• Etc…

Beyond Programmable Shading: In Action


Brief History of Parallelism
• About 8 to 16 Processors
– End of CPU history.
– Enter 1998 in GPU history.
• Approx # of processors as average parallel scalar
operations.
– 2nd order parallelism
– Jobs which create and manage the resources of
other jobs.
• GPU Command Processor (DMA engine)

Beyond Programmable Shading: In Action


Brief History of Parallelism
• About 16+ processors
– 3rd order parallelism
– GPU Vertex Processors
– Jobs which create and manage the resources of
other jobs which create and manage the resources of
other jobs

Beyond Programmable Shading: In Action


Brief History of Parallelism
Riva 128 Riva TNT GeForce 256
NVidia
300

250

200
Approx # of CPUs

150
Nvidia

100

50

0
1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008

Beyond Programmable Shading: In Action


Current State of Parallelism
• Desktop Processors
– Intel Core 2 Quad, 4 processors, 3.2 ghz, 102 Gflops
• Soon to be 8 core?
• Multimedia Processors
– Cell Processor - 8 processors - 3.2 ghz - 192.0 Gflops
• 1 main, 7 co-processors
• Graphics Accelerators
– GTX 280 - 1.296 ghz – 0.933 Tflops
• 240 stream processors

Beyond Programmable Shading: In Action


CELL BROADBAND ENGINE™

Beyond Programmable Shading: In Action


PLAYSTATION®3 Processor Overview
• Game
• Animation
• Geometry Processing
• Post Processing
• Occlusion Rasterization
• Sorting
• Collision Detection
• Fourier Transform
• (De)Compression
• Not going to cover all of these…
Beyond Programmable Shading: In Action
PLAYSTATION®3 Processor Overview
• Parallelize ordinarily sequential CPU
processing
• Assist in what is typically considered GPU
processing

Beyond Programmable Shading: In Action


Primary Programming Challenges
• Fitting code and data in the 256k local co-
processor memory
– Best solutions are ones that don't treat the 256k
local store as a typical on demand caching
architecture
• Scattered reads/writes bad, sequential reads/writes
good
• Software Pipelining
• Only 16 byte aligned reads/writes
• Synchronization

Beyond Programmable Shading: In Action


MD6 ANIMATION PROCESSING

Beyond Programmable Shading: In Action


MD6 Animation Processing

Game
Logic

Beyond Programmable Shading: In Action


MD6 Animation Processing

Game
Logic

Blending Tree
Generation

Beyond Programmable Shading: In Action


MD6 Animation Processing

Game
Logic

Blending Tree
Generation

Low Level Operation


List Generation

Beyond Programmable Shading: In Action


MD6 Animation Processing

Game
Logic

Blending Tree
Generation

Low Level Operation


List Generation

Low Level Operation Execution

Beyond Programmable Shading: In Action


MD6 Animation Processing

Serial
Game
Logic

Blending Tree
Generation

Low Level Operation


List Generation
Parallel

Low Level Operation Execution

Beyond Programmable Shading: In Action


MD6 Animation Webs
• Separates Thinking from Representation
– Game Object says what it wants to look like.
– Animation Webs take care of the rest.
• Unstructured graph
– Each node has a blend tree
• Designed with simplicity in mind
– Animators should animate, not fiddle with nodes.
– Extract as much information as possible directly
from the animation data.

Beyond Programmable Shading: In Action


MD6 Animation Processing
• Additive Blending
• Subtractive Blending
• Animation Algebra
– Blend Equations
• Animation blending trees in the form of an equation.
• Example equation:
– (animA + animB) – animC

Beyond Programmable Shading: In Action


Partial Animation Blending
• Generalized play an animation only on the
face, torso, etc…
• One weight per joint per animation
• Compute alpha for slerp via following
equation:
– For each joint
• Let w0 = weight of joint in animation A
• Let w1 = weight of joint in animation B
• If(w1 > w0)
– Let alpha = (alpha * w1) / w0
• Else
– Let alpha = ((w1 – w0) + alpha * w0) / w1
Beyond Programmable Shading: In Action
Varying parameter treatment

Beyond Programmable Shading: In Action


Varying parameter treatment

Beyond Programmable Shading: In Action


Varying parameter treatment

16k 16k 16k


Beyond Programmable Shading: In Action
GEOMETRY PROCESSING

Beyond Programmable Shading: In Action


Two modes of usage
• Primary mode
– Use offline tools
– Partition into vertex sets
– Use indexed triangles
– All features of pipeline can be used

SPU

Beyond Programmable Shading: In Action


Two modes of usage (cont)
• Secondary mode
– Data generated by other tools
– Formats other than indexed triangles
– Non-partitioned objects
– Subset of pipeline features can be used

SPU

Beyond Programmable Shading: In Action


SPU Geometry Pipeline Stages

SPU Pipeline
Vertex Decompress

Index Decompress

Blend Shapes

Skinning

Progressive Mesh

Triangle Culling
Compression

Output

Beyond Programmable Shading: In Action


Vertex Decompression

SPU Pipeline
Vertex Decompress

Index Decompress

Blend Shapes

Skinning

Progressive Mesh

Triangle Culling
Compression

Output

Beyond Programmable Shading: In Action


Vertex Attributes

Unique Vertex
Array 0

Instance Vertex
Array 1

Beyond Programmable Shading: In Action


Vertex Decompression

Unique Vertex Float Tables


Array 0

Instance Vertex
Array 1

Beyond Programmable Shading: In Action


24bit Unit Vector

• Smallest 2 compression
– Two smallest components with 10 bits each
• Encoded from –sqrt(2)/2 to +sqrt(2)/2
– Largest component reconstructed via
• Largest = sqrt(1 – smallestA2 – smallestB2)
• One additional bit for sign of largest component.

Beyond Programmable Shading: In Action


24bit Unit Vector

• Smallest 2 compression
– Two smallest components with 10 bits each
• Encoded from –sqrt(2)/2 to +sqrt(2)/2
– Largest component reconstructed via
• Largest = sqrt(1 – smallestA2 – smallestB2)
• One additional bit for sign of largest component.
• One more bit to represent W as +1 or -1
– For constructing bi-normal from normal and tangent.

Beyond Programmable Shading: In Action


N-bit Fixed Point with integer offsets

• Simple n.x fixed point values


– Per-segment integer offset
• Bit count may vary from attribute to attribute

Beyond Programmable Shading: In Action


Index Decompression

SPU Pipeline
Vertex Decompress

Index Decompress

Blend Shapes

Skinning

Progressive Mesh

Triangle Culling
Compression

Output

Beyond Programmable Shading: In Action


Index Table Construction
• Index table is created by a vertex cache
optimizer
– Based on K-cache algorithm
• Number of vertex program outputs affects
Vertex Cache size.
• Four vertex mini cache most important
optimization factor

Beyond Programmable Shading: In Action


Index Cache Optimizer
• Our vertex cache optimizer produces very
regular index data

0 1 2
0 2 3
3 2 4
3 4 5
6 7 8
9 6 8
10 9 11
9 8 11

Beyond Programmable Shading: In Action


Index Decompression
• Our vertex cache optimizer produces very
regular index data
0 1 2
0 2 3
3 2 4
3 4 5
6 7 8
9 6 8
10 9 11
9 8 11

Beyond Programmable Shading: In Action


Index Decompression

2
1
Triangle Indexes
0 1 2
Beyond Programmable Shading: In Action
Index Decompression

1
0
Triangle Indexes
2 0 1
Beyond Programmable Shading: In Action
Index Decompression
• Before Rotation

0 1 2
0 2 3
3 2 4
3 4 5
6 7 8
9 6 8
10 9 11
9 8 11

Beyond Programmable Shading: In Action


Index Decompression
• After Rotation

0 1 2
0 2 3
3 2 4
3 4 5
6 7 8
6 8 9
9 11 10
11 9 8

Beyond Programmable Shading: In Action


Index Decompression

Beyond Programmable Shading: In Action


Index Decompression

85% compression
6.5 : 1

Beyond Programmable Shading: In Action


Blend Shapes

SPU Pipeline
Vertex Decompress

Index Decompress

Blend Shapes

Skinning

Progressive Mesh

Triangle Culling
Compression

Output

Beyond Programmable Shading: In Action


Blend Shapes in MLB® 08 The Show

Beyond Programmable Shading: In Action


Skinning

SPU Pipeline
Vertex Decompress

Index Decompress

Blend Shapes

Skinning

Progressive Mesh

Triangle Culling
Compression

Output

Beyond Programmable Shading: In Action


Skinning on SPUs

void SkinVs(float4 inPosition : ATTR0, float4 weights : ATTR3,


float4 matrixIndex : ATTR4,
out float4 position : POSITION,
uniform float4 joints[72], uniform float4x4 modelViewProj)
{
position = 0;
for (int i = 0; i < 4; i++)
{
float idx = matrixIndex[i];
float3x4 joint = float3x4(joints[idx+0], joints[idx+1],
joints[idx+2]);
position += weights[i] * mul(joint, inPosition);
}
position = mul(modelViewProj, position);
}

Beyond Programmable Shading: In Action


Skinning on SPUs

30% Performance Improvement

Beyond Programmable Shading: In Action


Skinning on SPUs

30% Performance Improvement

Shadow map generation.... 70%!

Beyond Programmable Shading: In Action


Discrete Progressive Mesh

SPU Pipeline • Smoothly reduces the


Vertex Decompress triangle count as a model
Index Decompress moves into the distance
Blend Shapes • With discrete progressive
Skinning mesh, the LOD calculation is
done once for an entire
Progressive Mesh
object
Triangle Culling

Compression

Output

Beyond Programmable Shading: In Action


At an LOD there are two types of
vertexes

Parent Vertex
Child Vertex

LOD = 0.0

Beyond Programmable Shading: In Action


As the LOD level decreases, the
children “slide” towards their parents

Parent Vertex
Child Vertex

LOD = 0.2

Beyond Programmable Shading: In Action


The children continue to move towards
their parents

Parent Vertex
Child Vertex

LOD = 0.7

Beyond Programmable Shading: In Action


At the next integral LOD, all child
vertexes disappear as do the triangles

Parent Vertex
Child Vertex

LOD = 1.0

Beyond Programmable Shading: In Action


Continuous Progressive Mesh

SPU Pipeline • Like discrete progressive


Vertex Decompress mesh, child vertexes move
Index Decompress smoothly toward their
Blend Shapes parents
Skinning • However, the LOD is
calculated for each vertex
Progressive Mesh
instead of just once for the
Triangle Culling object
Compression

Output

Beyond Programmable Shading: In Action


Vertex set about to undergo continuous
progressive mesh

Parent Vertex
Child Vertex, LOD 1
Child Vertex, LOD 0

Beyond Programmable Shading: In Action


A single vertex set can straddle several
LOD ranges

Parent Vertex
Child Vertex, LOD 1
Child Vertex, LOD 0

LOD = 1.0

LOD = 0.0

Beyond Programmable Shading: In Action


Vertexes move depending on their
distance

Parent Vertex
Child Vertex, LOD 1
Child Vertex, LOD 0

LOD = 1.0

LOD = 0.0

Beyond Programmable Shading: In Action


Triangle Culling

SPU Pipeline
Vertex Decompress

Index Decompress

Blend Shapes

Skinning

Progressive Mesh

Triangle Culling
Compression

Output

Beyond Programmable Shading: In Action


Up to 70% of triangles do not contribute to
final image.

Beyond Programmable Shading: In Action


Off Screen Triangles

Beyond Programmable Shading: In Action


Back Facing Triangles

Beyond Programmable Shading: In Action


Zero Area Triangles

Beyond Programmable Shading: In Action


Zero Area Triangles

Beyond Programmable Shading: In Action


No Pixel Triangles

Beyond Programmable Shading: In Action


Triangle Culling

Beyond Programmable Shading: In Action


Multisampling adds some complications…

Beyond Programmable Shading: In Action


Culled

Beyond Programmable Shading: In Action


Triangle Culling

10% to 20%
Performance Improvement

Beyond Programmable Shading: In Action


Compression for Output

SPU Pipeline
Vertex Decompress

Index Decompress

Blend Shapes

Skinning

Progressive Mesh

Triangle Culling
Compression

Output

Beyond Programmable Shading: In Action


Float Tables

Beyond Programmable Shading: In Action


When done, the vertex attributes are
compressed into one output stream

Float Tables

Output
Vertex Array

Beyond Programmable Shading: In Action


Output Buffering Schemes

SPU Pipeline
Vertex Decompress

Index Decompress

Blend Shapes
Skinning

Progressive Mesh

Triangle Culling

Compression

Output

Beyond Programmable Shading: In Action


Double Buffer

• Each buffer stores vertex


Vertex Vertex and index data for an entire
and and frame
Index Index
Data Data • SPUs atomically access a
for for mutex which is used to
Frame 0 Frame 1
allocate memory from a
buffer
• Uses lots of memory

Beyond Programmable Shading: In Action


It is possible to completely fill a buffer

Vertex
and
Index
Data

Data SPU

Beyond Programmable Shading: In Action


Double buffering adds a frame of lag
Standard Pipeline
Build Jobs Render on
Scan Out
on PPU RSX™
Build Jobs Render on
Scan Out
on PPU RSX™
Build Jobs Render on
Scan Out
on PPU RSX™

Build Jobs Process Jobs Render on


Scan Out
on PPU on SPU RSX™
Build Jobs Process Jobs Render on
Scan Out
on PPU on SPU RSX™
Build Jobs Process Jobs Render on
Scan Out
on PPU on SPU RSX™

Pipeline with Double Buffered SPU Processing

Beyond Programmable Shading: In Action


Single Buffering

• Uses only half the memory!


Vertex • Still possible to completely
and
Index fill the buffer
Data
for
Single
Frame

Beyond Programmable Shading: In Action


Single Buffering has a shorter pipeline

Build Jobs on SPU Processing/


Scan Out
PPU RSX™ Rendering
Build Jobs on SPU Processing/
Scan Out
PPU RSX™ Rendering
Build Jobs on SPU Processing/
Scan Out
PPU RSX™ Rendering

• Vertex and index data is created just-in-time


for the RSX™
• Draw commands are inserted into the
command buffer while the RSX™ is rendering
• Requires tight SPU↔RSX™ synchronization

Beyond Programmable Shading: In Action


SPU↔RSX™ Synchronization
Using Local Stalls

Command  Place local stalls in the command buffer


Buffer where necessary
Draw 17  RSX™ will stop processing at a local stall
Local Stall until it is overwritten by new commands
Local Stall  SPUs will generally stay ahead of the
RSX™, so stalls rarely occur
Local Stall

Local Stall

Local Stall

Local Stall

Local Stall

Other

Beyond Programmable Shading: In Action


SPU will overwrite local stalls when it
outputs a set of new commands

Command
Buffer
Draw 17
SPU
Local Stall

Local Stall

Local Stall
New
Commands
Local Stall

Local Stall

Local Stall

Local Stall

Put
Other
Pointer

Beyond Programmable Shading: In Action


Ring Buffers

Data
Start of • Small memory footprint
Free Area
• Will not run out of memory

Vertex End of
and Free Area
Index

Beyond Programmable Shading: In Action


RSX™ writes a semaphore once a
chunk of data has been consumed

• A command to write a semaphore needs to be


added to the command buffer after all
commands that use the data
– The value of the semaphore to be written is the new
end of free area pointer

Start of
Data 19 Free Area
Draw 5

Current
Semaphore New End
RSX™ of Free
Draw 6 Data 6
Execution Area
Semaphore

Draw 7 Data 14

Semaphore

Beyond Programmable Shading: In Action


Each SPU has its own buffer

Buffer 0 Buffer 1

Data 22
SPU 0
Data 8
Buffer 3
Buffer 2 Data 11 SPU 1 Data 13
Data 9 Data 17

Data 12 Data 16 Data 18


SPU 2 Data 23

Data 21
Data 19 SPU 3
Data 10
Data 7
SPU 4
Data 6 Data 15
SPU 5
Data 14 Data 20

Buffer 4 Buffer 5
Beyond Programmable Shading: In Action
Geometry Performance

Beyond Programmable Shading: In Action


Software Pipelined C with SPU Intrinsics

do
{
m1 = in1;
in1 = si_lqx(pIn1, offset);
m2 = in2;
in2 = si_lqx(pIn2, offset);
m3 = in3;
in3 = si_lqx(pIn3, offset);
temp2 = si_selb(m3, m1, mask_0X00);
si_stqx(out1, pOut1, offset);
temp3 = si_selb(m2, m1, mask_00X0);
si_stqx(out2, pOut2, offset);
temp1 = si_selb(m1, m2, mask_0X00);
si_stqx(out3, pOut3, offset);
offset = si_ai(offset, 0x30);
out2 = si_shufb(m2, temp2, qs_bCaD);
out1 = si_selb(temp1, m3, mask_00X0);
out3 = si_shufb(m3, temp3, qs_caBD);
} while(si_to_int(offset) != 0);
Beyond Programmable Shading: In Action
Software Pipelined C with SPU Intrinsics

do
{
m1 = in1;
in1 = si_lqx(pIn1, offset);
m2 = in2;
in2 = si_lqx(pIn2, offset);
m3 = in3;
in3 = si_lqx(pIn3, offset);
Up to 20x faster
than naive C/C++
temp2 = si_selb(m3, m1, mask_0X00);
si_stqx(out1, pOut1, offset);
temp3 = si_selb(m2, m1, mask_00X0);
si_stqx(out2, pOut2, offset);
temp1 = si_selb(m1, m2, mask_0X00);
si_stqx(out3, pOut3, offset);
offset = si_ai(offset, 0x30);
out2 = si_shufb(m2, temp2, qs_bCaD);
out1 = si_selb(temp1, m3, mask_00X0);
out3 = si_shufb(m3, temp3, qs_caBD);
} while(si_to_int(offset) != 0);
Beyond Programmable Shading: In Action
1 SPU

Beyond Programmable Shading: In Action


1 SPU

800,000+
Triangles Per Frame
at 60 Frames per Second

Beyond Programmable Shading: In Action


1 SPU

800,000+
Triangles Per Frame
at 60 Frames per Second

60% of which are culled!

Beyond Programmable Shading: In Action


Next Generation Parallelism
In Games

Jon Olick
id Software

Beyond Programmable Shading: In Action


GAME ENTITY PROCESSING

Beyond Programmable Shading: In Action


Game Entity Processing
• Current Generation
– Serial Processing of entities in a giant for loop.

for(int i = 0; i < numEntities; ++i) {


entity[i]->Think();
}

Beyond Programmable Shading: In Action


Game Entity Processing
• Current Generation
– Serial Processing of entities in a giant for loop.
• Next Generation
– Parallelism via Double Buffering
– Every entity runs in parallel with each other with no
dependency stalls.
– Each entity can only read from previous frame’s
results
– Each entity can only write to itself

Beyond Programmable Shading: In Action


Game Entity Processing
• Record the progress of the game and replay
to debug.
• Single thread and randomize processing of
entities to help find bugs.
• Can protect memory so that bad accesses
cause exceptions to enforce double
buffering rules.

Beyond Programmable Shading: In Action


Game Entity Processing
• What about entities which have dependant
entities?
• Bucketing and Synchronization Points

Beyond Programmable Shading: In Action 96


RAY
CASTING
THE NEXT GENERATION

Beyond Programmable Shading: In Action


Why Ray Casting?

Beyond Programmable Shading: In Action


Why Ray Casting?
• A good question…

Beyond Programmable Shading: In Action


Beyond Programmable Shading: In Action
• Back in Quake 1
– If you had to make a decision between an additional
CPU and a Graphics Card which would you choose?

Beyond Programmable Shading: In Action


• Back in Quake 1
– If you had to make a decision between an additional
CPU and a Graphics Card which would you choose?
– Why is this any different today?

Beyond Programmable Shading: In Action


• Back in Quake 1
– If you had to make a decision between an additional
CPU and a Graphics Card which would you choose?
– Why is this any different today?
– Its not any different.

Beyond Programmable Shading: In Action


Why Ray Casting?
• What value does it provide to developers?

Beyond Programmable Shading: In Action


Why Ray Casting?
• What value does it provide to developers?
– Shorter & Cheaper Development
– Higher Quality Games

Beyond Programmable Shading: In Action


Why Ray Casting?
• What value does it provide to end users?

Beyond Programmable Shading: In Action


Screenshot From E3 Rage Video

Beyond Programmable Shading: In Action


Screenshot From E3 Rage Video

Beyond Programmable Shading: In Action


Why Ray Casting?

Beyond Programmable Shading: In Action


Current State of Rasterization

Command Buffer
Vertex
Processing
Vertex Processing

Triangle Setup

Fragment Processing
Fragment
Processing

Beyond Programmable Shading: In Action


Future of Rasterization

Beyond Programmable Shading: In Action


Future of Rasterization

Beyond Programmable Shading: In Action


Future of Rasterization

Beyond Programmable Shading: In Action


Future of Rasterization

Beyond Programmable Shading: In Action


Future of Rasterization

Beyond Programmable Shading: In Action


Future of Rasterization

Beyond Programmable Shading: In Action


Future of Rasterization

Beyond Programmable Shading: In Action


Future of Rasterization

Multiple Cores
Command Buffer
Vertex Processing
Vertex Processing

Triangle Setup

Fragment Processing
Fragment Processing

Beyond Programmable Shading: In Action


Future of Rasterization
28 31 29 28 32 29 28 33 34 35 36 37 38 39 37 38 40 37 38 41

GPU Triangle Setup

Beyond Programmable Shading: In Action


Future of Rasterization

Command Buffer

Vertex
Vertex Processing
Processing

Triangle Sorting

Triangle Setup
Fragment
Processing
Fragment Processing

Beyond Programmable Shading: In Action


Future of Rasterization x2
0 1 2
0 2 3
3 2 4
3 4 5
6 7 8
9 6 8
10 9 11 GPU Triangle Setup
9 8 11
9 8 12
13 14 15
15 14 16
16 14 17
16 14 18
16 14 19

Beyond Programmable Shading: In Action 121


Future of Rasterization x2
0 1 2
0 2 3
3 2 4
3 4 5
6 7 8
9 6 8
10 9 11 GPU Triangle Setup
9 8 11
9 8 12
13 14 15
15 14 16
16 14 17
16 14 18
16 14 19

Beyond Programmable Shading: In Action 122


Future of Rasterization x2
0 1 2
0 2 3
3 2 4
3 4 5
6 7 8
9 6 8
10 9 11 GPU Triangle Setup
9 8 11
9 8 12
13 14 15
15 14 16
16 14 17
16 14 18
16 14 19

Beyond Programmable Shading: In Action 123


Future of Rasterization x2
0 1 2
0 2 3
3 2 4
3 4 5
6 7 8
9 6 8
10 9 11 GPU Triangle Setup
9 8 11
9 8 12
13 14 15
15 14 16
16 14 17
16 14 18
16 14 19

Beyond Programmable Shading: In Action 124


Future of Rasterization x2
0 1 2
0 2 3
3 2 4
3 4 5
6 7 8
9 6 8
10 9 11 GPU Triangle Setup
9 8 11
9 8 12
13 14 15
15 14 16
16 14 17
16 14 18
16 14 19

Beyond Programmable Shading: In Action 125


Why voxels, and not triangles?

Beyond Programmable Shading: In Action


Why voxels, and not triangles?
• Unique Texturing is possible with
rasterization
– Rage – idTech 5

Beyond Programmable Shading: In Action


Why voxels, and not triangles?
• Unique Texturing is possible with
rasterization
– Rage – idTech 5
• Unique Geometry is possible with
rasterization
– Progressive Mesh

Beyond Programmable Shading: In Action


Why voxels, and not triangles?
• Unique Texturing is possible with
rasterization
– Rage – idTech 5
• Unique Geometry is possible with
rasterization
– Progressive Mesh
• SVO Solves Two Problems in One
– Unique Texturing & Unique Geometry

Beyond Programmable Shading: In Action


Why is the control flow efficient?

Beyond Programmable Shading: In Action


Why is the control flow efficient?

Beyond Programmable Shading: In Action


Why is the control flow efficient?

Beyond Programmable Shading: In Action


Why is the control flow efficient?

Beyond Programmable Shading: In Action


Voxel Mip Mapping – Thin Walls

Beyond Programmable Shading: In Action


Voxel Mip Mapping – Thin Walls

Beyond Programmable Shading: In Action


Voxel Mip Mapping – Thin Walls

Beyond Programmable Shading: In Action


Voxel Mip Mapping – Thin Walls

Beyond Programmable Shading: In Action


Caveats of Ray-Tracing?
• “Primary rays cache, secondary rays thrash”™
– Importance sampling to the rescue!
• Ray Tracing != Ray Casting

Beyond Programmable Shading: In Action


Sparse Voxel Oct-trees
• Oct-trees as collection of maximal blocks.
– Related to run-length encoding.
– Variable splitting planes

Beyond Programmable Shading: In Action


Data Structure
• Disk Caching with Virtual and Physical Pages

Beyond Programmable Shading: In Action


Is Disk Caching a Valid Lever?

Beyond Programmable Shading: In Action 141


Hot Data Structure
• Disk Caching with Virtual and Physical Pages

Beyond Programmable Shading: In Action


Hot Data Structure
• Disk Caching with Virtual and Physical Pages
– Start out with a single virtual page.

Beyond Programmable Shading: In Action


Hot Data Structure
• Disk Caching with Virtual and Physical Pages
– Start out with a single virtual page.
– Render some voxels into the tree until page capacity
is reached.

Beyond Programmable Shading: In Action


Hot Data Structure
• Disk Caching with Virtual and Physical Pages
– Start out with a single virtual page.
– Render some voxels into the tree until page capacity
is reached.
– Split page into 8 sub-pages and attempt to add the
overflow voxel again.

Beyond Programmable Shading: In Action


Hot Data Structure
• Disk Caching with Virtual and Physical Pages
– Start out with a single virtual page.
– Render some voxels into the tree until page capacity
is reached.
– Split page into 8 sub-pages and attempt to add the
overflow voxel again.
– Store out virtual pages to disk.
– Load/Unload each page’s levels as necessary at
runtime.

Beyond Programmable Shading: In Action


Hot Data Structure
• Page capacity can be based on...
– CUDA's shared memory size
– SPU local store size
– Optimum disk streaming performance
– Minimum physical page memory

Beyond Programmable Shading: In Action


Virtual Page Fragmentation
• Traverse indexing oct-tree
– Write out pages according to optimal layout (breadth
first, depth first, etc...)

Beyond Programmable Shading: In Action


Physical Page Fragmentation
• Constantly loading / unloading data
fragments memory over time
• Bucket memory into sections and assign
each page to a section.

Beyond Programmable Shading: In Action 149


Page Optimization
• Execution time proportional to number of
blocks.

Beyond Programmable Shading: In Action


Page Optimization
• Execution time proportional to number of
blocks.
• Number of blocks can be reduced through
translation.

Beyond Programmable Shading: In Action


Page Optimization
• Execution time proportional to number of
blocks.
• Number of blocks can be reduced through
translation.
• Translating by 2n doesn’t affect any oct-tree
level smaller than 2n

Beyond Programmable Shading: In Action


Page Optimization
• Create scratch page with enlarged region
– 2n+1 x 2n+1 x 2n+1
• Apply successive translations of magnitude
power of 2 in the x, y, & z directions and
keep track of the number of nodes.
• Store off total translation for ray casting
adjustment.
• O(n * 22n)
– n is the number of levels in the oct-tree

Beyond Programmable Shading: In Action


Page Optimization

Beyond Programmable Shading: In Action 154


Page Optimization

Beyond Programmable Shading: In Action 155


Page Optimization
• Minimize outside nodes for faster casting

Beyond Programmable Shading: In Action


Page Optimization

Beyond Programmable Shading: In Action 157


Page Optimization

Beyond Programmable Shading: In Action 158


Page Optimization

Beyond Programmable Shading: In Action 159


Data Structure
• Different structures for editing, runtime and
storage.

Beyond Programmable Shading: In Action


Runtime Data Structure
• child offsets : 32
• diffuse rgb : 3
• specular scale/power : 1
• planes : 12
• normal xyz : 3
• pad : 1
• total : 52 bytes per node

Beyond Programmable Shading: In Action


Storage Data Structure
• children bit mask : 1
• diffuse rgb : 3
• specular scale/power : 1
• normal xyz : 3
• total : 8 bytes per node

Beyond Programmable Shading: In Action


Data Compression
• Compressing child bits
• Compressing Colors

Beyond Programmable Shading: In Action


Compressing Child Bits
0.06

0.05

0.04

0.03

0.02

0.01

0
0 50 100 150 200 250

Beyond Programmable Shading: In Action 164


Compressing Child Bits
0.06 0.08
0.06
0.04
0.04
0.02 Level 0 Level 4
0.02
0 0
0 50 100 150 200 250 0 50 100 150 200 250
0.05 0.08
0.04 0.06
0.03
0.04
0.02 Level 1 Level 5
0.01 0.02
0 0
0 50 100 150 200 250 0 50 100 150 200 250
0.06 0.05
0.05 0.04
0.04 0.03
0.03
Level 2 0.02 Level 6
0.02
0.01
0.01
0 0
1 51 101 151 201 251 0 50 100 150 200 250
0.05 0.08
0.07
0.04 0.06
0.05
0.03
0.04
0.02 Level 3 0.03 Level 7
0.02
0.01 0.01
0 0
0 50 100 150 200 250 0 50 100 150 200 250
Beyond Programmable Shading: In Action
Compressing Child Bits
• Split by oct-tree level.
• Entropy Encoding

Beyond Programmable Shading: In Action


Compressing Color Data

46

15 35 90 63 29

10 20 35 87 98 81 95 61 65 29 35 26 30 25

Beyond Programmable Shading: In Action


Compressing Color Data

46

-31 -11 44 17 -17

-5 +5 0 -3 8 -9 5 -2 2 0 6 3 1 -4

Beyond Programmable Shading: In Action


Compressing Color Data
• Split by oct-tree level.
• Quantization
• Entropy Encoding
• 8:1 expected compression ratio

Beyond Programmable Shading: In Action


Data Storage Size
• 1.15 bits of positional data per voxel
• Cost savings improves as triangle size
decreases.
• 160 bits per triangle in traditional format
– x,z,y,s,t all 32-bits

• 80 bits per triangle in compressed format


– x,y,z,s,t all 16-bits

• 72 bits equivalent per triangle in oct-tree


– (for next generation)
Beyond Programmable Shading: In Action
Generating the Data
• Every surface can enumerate into voxels.
– Triangles
• 3D Scan Conversion, Volume Projection, Subdivision

Beyond Programmable Shading: In Action


3D Scan Conversion
Z=0

Z=1

= Z=2

Z=3

Z=4
Beyond Programmable Shading: In Action
Volume Projection

Beyond Programmable Shading: In Action


Volume Projection

Beyond Programmable Shading: In Action


Subdivision

Beyond Programmable Shading: In Action


Subdivision

Beyond Programmable Shading: In Action


Subdivision

Beyond Programmable Shading: In Action


Generating the Data
• Every surface can enumerate into voxels.
– 3D Scan Conversion, Volume Projection, Subdivision
• Thick surfaces are unnecessary

Beyond Programmable Shading: In Action


Generating the Data
• Every surface can enumerate into voxels.
– 3D Scan Conversion, Volume Projection, Subdivision
• Thick surfaces are unnecessary
– Flood fill world and remove unnecessary voxels.

Beyond Programmable Shading: In Action


Generating the Data
• Every surface can enumerate into voxels.
– 3D Scan Conversion, Volume Projection, Subdivision
• Thick surfaces are unnecessary
– Flood fill world and remove unnecessary voxels.
• Generate geometry mip-maps

Beyond Programmable Shading: In Action


Generating the Data
• Every surface can enumerate into voxels.
– 3D Scan Conversion, Volume Projection, Subdivision
• Thick surfaces are unnecessary
– Flood fill world and remove unnecessary voxels.
• Generate geometry mip-maps
• Perform ray-tracing to light the voxels.

Beyond Programmable Shading: In Action


Using the Data
• For each pixel on the screen
– Shoot out a ray into the oct-tree and write out the
node number (and depth)

Beyond Programmable Shading: In Action


Oct-tree Ray Traversal
• Similar to KD-tree traversal. Clip the line
with the mid-planes only.
• Tree traversal with two lookup tables.
– One to find which nodes to intersect with a given ray
direction in a worst-case scenario.
– The other to determine the order of intersection.
• Faster than most stackless traversal
methods for CUDA.

Beyond Programmable Shading: In Action


LOD
• How to handle oct-tree mip-mapping and
when is it necessary to load additional detail
levels?

Beyond Programmable Shading: In Action 184


LOD - Stop Depth
• Stop Depth based on pixel and voxel size
[Wald07]

Beyond Programmable Shading: In Action 185


LOD - Stop Depth
• Stop Depth based on pixel and voxel size
[Wald07]
– Oblique surfaces have unnecessary extra detail
• Hurts casting performance by traversing detail that
you won’t see
• Hurts streaming performance by loading unnecessary
data

Beyond Programmable Shading: In Action 186


LOD - Post Process
• Ray casting outputs node indexes

Beyond Programmable Shading: In Action 187


LOD - Post Process
• Ray casting outputs node indexes
• A post process which looks at ratios of nodes
to pixels.
– Small feedback buffer (320x180) contains list of
pages which require additional detail.

Beyond Programmable Shading: In Action 188


LOD - Post Process
• Ray casting outputs node indexes
• A post process which looks at ratios of nodes
to pixels.
– Small feedback buffer (320x180) contains list of
pages which require additional detail.
• Up to 20% performance improvement

Beyond Programmable Shading: In Action 189


Post Process Blur
• Fixes the “Jam your head into a wall”
scenario.

Beyond Programmable Shading: In Action 190


Post Process Blur
• Fixes the “Jam your head into a wall”
scenario.
• Width of blur kernel related to size of voxel
on screen.

Beyond Programmable Shading: In Action 191


Rendering Dynamic Geometry
• With voxels
– Option 1
• Ray cast or rasterize a triangle mesh
• Transform to base pose
• Trace with local oct-tree
• Allows instancing of geometry

Beyond Programmable Shading: In Action


Rendering Dynamic Geometry
• With voxels
– Option 1
• Ray cast or rasterize a triangle mesh
• Transform to base pose
• Trace with local oct-tree
• Allows instancing of geometry
– Option 2
• Have two different oct-trees: static & dynamic
• Render both and merge results together with depth information

Beyond Programmable Shading: In Action


Rendering Dynamic Geometry
• With voxels
– Option 1
• Ray cast or rasterize a triangle mesh
• Transform to base pose
• Trace with local oct-tree
• Allows instancing of geometry
– Option 2
• Have two different oct-trees: static & dynamic
• Render both and merge results together with depth information
• With triangles
– Hybrid rendering via rasterization and deferred shading.

Beyond Programmable Shading: In Action


Depth Advance Optimization
• Render a coarse hull of the geometry into a
depth-buffer.
– Automatically calculate from voxel geometry.
• Start the ray casting at the depth-buffer
values.

Beyond Programmable Shading: In Action


Depth Advance Optimization
• Render a coarse hull of the geometry into a
depth-buffer.
– Automatically calculate from voxel geometry.
• Start the ray casting at the depth-buffer
values.
• Skips most of the traversal process.
– Up to 2x speed improvement

Beyond Programmable Shading: In Action


Depth Advance Optimization
• Render a coarse hull of the geometry into a
depth-buffer.
– Automatically calculate from voxel geometry.
• Start the ray casting at the depth-buffer
values.
• Skips most of the traversal process.
– Up to 2x speed improvement
– Less sensitive to scene complexity

Beyond Programmable Shading: In Action


Adaptive Sub-Sampling
• After rendering the scene, perform a Sobel
edge filter over the frame buffer to figure
out where additional rays would improve
the quality of the image.
• Cast additional rays.
• Repeat until 16 ms.

Beyond Programmable Shading: In Action


Adaptive Sub-Sampling Problems
• Inherently always sampling the most
divergent parts of the scene
• Can manage performance hit by sampling
highly aliased to less aliased in chunks

Beyond Programmable Shading: In Action


Infinite Surface Detail
• Oct-tree node's recursively point back in on
themselves to create an infinite amount of
detail
• Create detail octree sub-segments to
simulate rough, smooth, porous, sharp
edges, etc..
• Programatically simulate virtual detail
levels.

Beyond Programmable Shading: In Action


How much time to innovate?
• 1 year tools
• 3 months runtime

Beyond Programmable Shading: In Action


Expected Runtime Performance
• 33% of the time rendering characters / etc
• 66% of the time rendering world
• Ray-casting the world must complete in
~20ms for 30 FPS
• Theoretically possible on today's technology
at 720p and 30 fps (GeForce 8800 Series)

Beyond Programmable Shading: In Action


Expected Runtime Performance
• 33% of the time rendering characters / etc
• 66% of the time rendering world
• Ray-casting the world must complete in
~20ms for 30 FPS
• Theoretically possible on today's technology
at 720p and 30 fps (GeForce 8800 Series)
– Theory falls a little short of reality.

Beyond Programmable Shading: In Action


How would this affect a platform
launch?
• Generational skip in geometric complexity
• Next gen platforms 4 times better at least
• 60 FPS at 1080p with Anti-aliasing

Beyond Programmable Shading: In Action


Special Thanks
• Paul Debevec
– Light probes used with permission
• Dimitry Parkin
– www.parkparkin.com
• John Carmack
• Cass Everitt • Sony
• Mark Harris • Intel
• Nathaniel Duca • Nvidia
• Aaron Lefohn
• Mike Houston
• Tom Forsyth
Beyond Programmable Shading: In Action 205
Beyond Programmable Shading: In Action 206
Questions

Jon Olick ([email protected])


id Software
Beyond Programmable Shading: In Action
References
– http://www.sci.utah.edu/~wald/Publications/2007/
MROct/download/mroct.pdf
– [Wald07]

Beyond Programmable Shading: In Action 208

You might also like