Chapter 7
In Games
Jon Olick
id Software
[Figure: Approx # of CPUs (0 to 250) by year, 1997 to 2008; labeled series: Nvidia]
[Figure: frame timeline built up over several slides: Game Logic, then Blending Tree Generation, marked Serial, followed by two SPU blocks]
SPU Pipeline
Vertex Decompress
Index Decompress
Blend Shapes
Skinning
Progressive Mesh
Triangle Culling
Compression
Output
[Figure: Unique Vertex Array 0 alongside two blocks labeled Instance Vertex Array 1]
• Smallest 2 compression
– Two smallest components with 10 bits each
• Encoded from -sqrt(2)/2 to +sqrt(2)/2
– Largest component reconstructed via
• Largest = sqrt(1 - smallestA² - smallestB²)
• One additional bit for sign of largest component.
• One more bit to represent W as +1 or -1
– For constructing bi-normal from normal and tangent.
Triangle Indexes (one triangle per row)
0 1 2
0 2 3
3 2 4
3 4 5
6 7 8
9 6 8
10 9 11
9 8 11
[Figure: a triangle with corners labeled 0, 1 and 2; Triangle Indexes 0 1 2]
Beyond Programmable Shading: In Action
Index Decompression
[Figure: the triangle with Triangle Indexes 2 0 1]
• Before Rotation    • After Rotation
0 1 2                0 1 2
0 2 3                0 2 3
3 2 4                3 2 4
3 4 5                3 4 5
6 7 8                6 7 8
9 6 8                6 8 9
10 9 11              9 11 10
9 8 11               11 9 8
85% compression (6.5 : 1)
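As a small check on the two index lists above, the sketch below confirms that every rotated row is a cyclic rotation of its original row, so each triangle keeps its vertices and winding order. The slides do not state the rule that picks the rotation, and this sketch does not try to guess it; the function names are hypothetical.

```c
#include <stdint.h>

#define TRI_COUNT 8

/* Index lists copied from the tables above. */
static const uint16_t before_rotation[TRI_COUNT][3] = {
    {0, 1, 2}, {0, 2, 3}, {3, 2, 4}, {3, 4, 5},
    {6, 7, 8}, {9, 6, 8}, {10, 9, 11}, {9, 8, 11}
};
static const uint16_t after_rotation[TRI_COUNT][3] = {
    {0, 1, 2}, {0, 2, 3}, {3, 2, 4}, {3, 4, 5},
    {6, 7, 8}, {6, 8, 9}, {9, 11, 10}, {11, 9, 8}
};

/* Rotate a triangle's three indexes left by `steps` positions (steps >= 0). */
void rotate_triangle(const uint16_t in[3], int steps, uint16_t out[3])
{
    for (int i = 0; i < 3; ++i)
        out[i] = in[(i + steps) % 3];
}

/* Returns the left rotation (0, 1 or 2) that turns `a` into `b`,
   or -1 if `b` is not a cyclic rotation of `a`. */
int rotation_between(const uint16_t a[3], const uint16_t b[3])
{
    for (int steps = 0; steps < 3; ++steps) {
        uint16_t r[3];
        rotate_triangle(a, steps, r);
        if (r[0] == b[0] && r[1] == b[1] && r[2] == b[2])
            return steps;
    }
    return -1;
}
```

A swapped pair such as 0 2 1 against 0 1 2 returns -1, because it would flip the triangle's facing rather than rotate it.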
[Figure: Parent Vertex and Child Vertex shown at LOD = 0.0, LOD = 0.2, LOD = 0.7 and LOD = 1.0]
[Figure: Parent Vertex with the Child Vertex position at LOD 1 (LOD = 1.0) and at LOD 0 (LOD = 0.0)]
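One way to read the figures above is that a child vertex has one position at LOD 0 and another at LOD 1, and fractional LOD values such as 0.2 or 0.7 place it in between. The sketch below assumes plain linear interpolation between those two positions; the slides do not state the blend function, and `Vec3` and `child_position` are hypothetical names.

```c
typedef struct { float x, y, z; } Vec3;

/* Position of a child vertex for a fractional level of detail.
   ASSUMPTION: linear blend between its LOD 0 and LOD 1 positions. */
Vec3 child_position(Vec3 at_lod0, Vec3 at_lod1, float lod)
{
    /* Clamp so out-of-range LOD values cannot overshoot either end. */
    if (lod < 0.0f) lod = 0.0f;
    if (lod > 1.0f) lod = 1.0f;

    Vec3 p;
    p.x = at_lod0.x + (at_lod1.x - at_lod0.x) * lod;
    p.y = at_lod0.y + (at_lod1.y - at_lod0.y) * lod;
    p.z = at_lod0.z + (at_lod1.z - at_lod0.z) * lod;
    return p;
}
```

Because the position changes continuously with LOD, a vertex slides toward its other position instead of popping.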
10% to 20% Performance Improvement
[Figure: Float Tables and Output Vertex Array]
[Figure: SPU output. Labels: Vertex and Index Data, Data, SPU, Local Stall, Other, Command Buffer, Draw 17, New Commands, Put Pointer]
• Small memory footprint
• Will not run out of memory
[Figure: vertex and index data area with Start of Free Area and End of Free Area markers; a command list of Draw 5, Semaphore, Draw 6, Semaphore, Draw 7, Semaphore; Current RSX™ Execution; New End of Free Area]
[Figure: Buffer 0 to Buffer 5 holding Data blocks numbered between 6 and 22, with SPU 0, SPU 1, SPU 3, SPU 4 and SPU 5]
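The labels above (start and end of a free area, semaphores between draws, local stalls) suggest a fixed-size ring of output memory that producers fill and the consumer frees as it executes. The sketch below is one hypothetical reading of that scheme, not the scheme from the talk: `Ring`, `ring_alloc` and `ring_release` are invented names, and a block is allowed to wrap past the end of the ring, which a real allocator would likely avoid by padding.

```c
#include <assert.h>
#include <stddef.h>

#define RING_STALL ((size_t)-1)   /* returned when the producer must wait */

typedef struct {
    size_t size;        /* total bytes in the ring */
    size_t free_start;  /* start of the free area: next byte to write */
    size_t free_bytes;  /* distance from the start to the end of the free area */
} Ring;

void ring_init(Ring *r, size_t size)
{
    r->size = size;
    r->free_start = 0;
    r->free_bytes = size;
}

/* Reserve n bytes for output data. Returns the offset of the block,
   or RING_STALL if the free area is too small right now. The ring
   never grows, so memory use is bounded by `size`. */
size_t ring_alloc(Ring *r, size_t n)
{
    if (n > r->free_bytes)
        return RING_STALL;
    size_t offset = r->free_start;
    r->free_start = (r->free_start + n) % r->size;
    r->free_bytes -= n;
    return offset;
}

/* Called when the consumer has passed a semaphore: the oldest n bytes
   are no longer needed, so the end of the free area moves forward. */
void ring_release(Ring *r, size_t n)
{
    assert(n <= r->size - r->free_bytes);
    r->free_bytes += n;
}
```

In this reading a producer that receives `RING_STALL` simply retries later, which trades a short wait for a fixed memory budget.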
Geometry Performance
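do
{
    // Quadwords loaded on the previous pass become this pass's inputs,
    // and the loads for the next pass are issued right away.
    m1 = in1;
    in1 = si_lqx(pIn1, offset);
    m2 = in2;
    in2 = si_lqx(pIn2, offset);
    m3 = in3;
    in3 = si_lqx(pIn3, offset);

    // Stores of results computed on the previous pass are interleaved
    // with this pass's bit selects (si_selb).
    temp2 = si_selb(m3, m1, mask_0X00);
    si_stqx(out1, pOut1, offset);
    temp3 = si_selb(m2, m1, mask_00X0);
    si_stqx(out2, pOut2, offset);
    temp1 = si_selb(m1, m2, mask_0X00);
    si_stqx(out3, pOut3, offset);

    offset = si_ai(offset, 0x30);   // advance by 0x30 bytes (three quadwords)

    // Final select and byte shuffles produce the outputs stored on the next pass.
    out2 = si_shufb(m2, temp2, qs_bCaD);
    out1 = si_selb(temp1, m3, mask_00X0);
    out3 = si_shufb(m3, temp3, qs_caBD);
} while(si_to_int(offset) != 0);    // loop ends when offset reaches zero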
Software Pipelined C with SPU Intrinsics
Up to 20x faster than naive C/C++
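For readers unfamiliar with software pipelining, the sketch below shows the same loop shape in portable C on a deliberately trivial operation (scaling an array). It is not the kernel from the slide, and `scale_naive` and `scale_pipelined` are hypothetical names. Each steady-state pass stores a result from two elements back, computes the element loaded on the previous pass, and loads the next element, so the three stages of one pass do not depend on each other.

```c
#include <stddef.h>

/* Straightforward version: load, compute and store in the same pass. */
void scale_naive(const float *in, float *out, size_t n, float k)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * k;
}

/* Software-pipelined version of the same computation. */
void scale_pipelined(const float *in, float *out, size_t n, float k)
{
    if (n < 2) {                      /* too short to fill the pipeline */
        scale_naive(in, out, n, k);
        return;
    }

    /* Prologue: fill the pipeline. */
    float loaded = in[0];
    float computed = loaded * k;
    loaded = in[1];

    /* Steady state: three independent stages per pass. */
    for (size_t i = 2; i < n; ++i) {
        out[i - 2] = computed;        /* store stage, element i-2 */
        computed = loaded * k;        /* compute stage, element i-1 */
        loaded = in[i];               /* load stage, element i */
    }

    /* Epilogue: drain the last two elements. */
    out[n - 2] = computed;
    out[n - 1] = loaded * k;
}
```

On a scalar CPU an optimizing compiler may produce similar schedules on its own; the point of the sketch is only the prologue, steady-state and epilogue structure that the SPU loop above relies on.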
1 SPU: 800,000+ Triangles Per Frame at 60 Frames per Second
[Figure: pipeline stages: Command Buffer, Vertex Processing, Triangle Setup, Fragment Processing]
Multiple Cores
[Figure: Command Buffer, two Vertex Processing blocks, Triangle Setup, two Fragment Processing blocks]
[Figure: Command Buffer, two Vertex Processing blocks, Triangle Sorting, Triangle Setup, two Fragment Processing blocks]
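[Figure: tree of values, one level per row]
46
15 35 90 63 29
10 20 35 87 98 81 95 61 65 29 35 26 30 25
[Figure: the same tree with the bottom row shown as differences]
46
-5 +5 0 -3 8 -9 5 -2 2 0 6 3 1 -4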
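The last row above appears to store each bottom-level value as a difference from a value in the row above it. The sketch below reproduces that arithmetic. The grouping of children under parents (`parent_of`) is an assumption inferred from the numbers, not something the slide states, and the function names are hypothetical. Under that grouping, the twelfth difference comes out as -3 where the slide row shows 3, which may be a dropped sign in the slide or a sign that the assumed grouping is off.

```c
#define PARENT_COUNT 5
#define CHILD_COUNT 14

/* Values copied from the rows above. */
static const int parent_values[PARENT_COUNT] = {15, 35, 90, 63, 29};
static const int child_values[CHILD_COUNT] =
    {10, 20, 35, 87, 98, 81, 95, 61, 65, 29, 35, 26, 30, 25};

/* ASSUMPTION: index of the parent each child belongs to, inferred by
   matching the differences row; not given in the source. */
static const int parent_of[CHILD_COUNT] =
    {0, 0, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4};

/* Replace each child value with its difference from its parent. */
void encode_deltas(const int *children, const int *parents,
                   const int *parent_index, int count, int *deltas)
{
    for (int i = 0; i < count; ++i)
        deltas[i] = children[i] - parents[parent_index[i]];
}

/* Rebuild the child values from the parents and the stored differences. */
void decode_deltas(const int *deltas, const int *parents,
                   const int *parent_index, int count, int *children)
{
    for (int i = 0; i < count; ++i)
        children[i] = parents[parent_index[i]] + deltas[i];
}
```

The differences are much smaller in magnitude than the values they replace, which is what makes them cheaper to store.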
[Figure: volume slices labeled Z=1, Z=2, Z=3 and Z=4]
Volume Projection