Graphics Performance Optimization
Sébastien Dominé
Manager of Developer Technology Tools
Agenda
[Pipeline diagram: Vertices, Pixels, Texture Storage + Filtering]
Possible Pipeline Bottlenecks
[Flowchart: testing the CPU/Bus stage of the pipeline]
Vary vertex size / AGP rate
FPS varies?
Yes: transfer limited
No: CPU limited
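The decision in the flowchart above can be sketched as a small helper. This is only an illustration: the function name, the enum, and the 5% noise tolerance are assumptions, not part of the original slides. It assumes you have measured the frame rate once as a baseline and once after changing the vertex size (or the AGP rate).

```cpp
#include <cassert>
#include <cmath>

// Outcome of the CPU/Bus test described on the slide.
enum class Bottleneck { TransferLimited, CpuLimited };

// Hypothetical helper: if the frame rate moves when the vertex size (or AGP
// rate) changes, the application is transfer limited; otherwise it is CPU
// limited. The default tolerance is an assumed threshold for measurement noise.
inline Bottleneck classifyCpuBusBottleneck(double baselineFps,
                                           double variedFps,
                                           double tolerance = 0.05) {
    assert(baselineFps > 0.0);
    const double relativeChange =
        std::fabs(variedFps - baselineFps) / baselineFps;
    return relativeChange > tolerance ? Bottleneck::TransferLimited
                                      : Bottleneck::CpuLimited;
}
```

In practice the tolerance would need tuning per application, since frame rates fluctuate from run to run.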
CPU Bottlenecks
Dynamic geometry:
Create a dynamic vertex buffer
Lock with DISCARD at start of frame
Then append with NOOVERWRITE until full
Use NOOVERWRITE more often than DISCARD
Each DISCARD takes either more time or more memory
So NOOVERWRITE should be most common
Never lock a dynamic buffer without flags
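The locking policy above can be sketched as the bookkeeping below. It is a minimal, self-contained illustration rather than real Direct3D code: `LockFlag` stands in for the Direct3D 9 flags `D3DLOCK_DISCARD` and `D3DLOCK_NOOVERWRITE`, and `DynamicBufferCursor` and `LockRequest` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for D3DLOCK_DISCARD and D3DLOCK_NOOVERWRITE; a real application
// would pass those flags to IDirect3DVertexBuffer9::Lock.
enum class LockFlag { Discard, NoOverwrite };

struct LockRequest {
    LockFlag flag;       // flag to lock with
    std::size_t offset;  // byte offset at which to write the new vertices
};

// Hypothetical bookkeeping for one dynamic vertex buffer.
class DynamicBufferCursor {
public:
    explicit DynamicBufferCursor(std::size_t capacityBytes)
        : capacity_(capacityBytes), writeOffset_(0), discardNext_(true) {}

    // Call once at the start of every frame: the next lock uses DISCARD.
    void beginFrame() { discardNext_ = true; }

    // Decide how to lock when appending `bytes` of vertex data.
    LockRequest append(std::size_t bytes) {
        assert(bytes <= capacity_);
        const bool full = writeOffset_ + bytes > capacity_;
        if (discardNext_ || full) {
            // Start of frame or buffer full: discard and restart at offset 0.
            discardNext_ = false;
            writeOffset_ = bytes;
            return {LockFlag::Discard, 0};
        }
        // Common case: append behind data the GPU may still be reading.
        const std::size_t offset = writeOffset_;
        writeOffset_ += bytes;
        return {LockFlag::NoOverwrite, offset};
    }

private:
    std::size_t capacity_;
    std::size_t writeOffset_;
    bool discardNext_;
};
```

With a buffer sized for several batches, most calls to `append` take the NOOVERWRITE path, which matches the slide's advice that NOOVERWRITE should be the common case.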
Allow for CPU/GPU load balancing…
FIFO of commands
Don’t lock VBs
Don’t do pixel reads
Etc…
Geometry Transform Bottlenecks
Compress textures:
Collapse monochrome channels into alpha
Use 16-bit color depth when possible (environment maps and shadow maps)
Use DXT compression
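To make the savings concrete, the sketch below computes the size of a single mip level under the options listed above. The function names are illustrative assumptions. The block sizes are those of the DXT formats: 4x4 texel blocks stored in 8 bytes for DXT1 and 16 bytes for DXT3 and DXT5.

```cpp
#include <cassert>
#include <cstddef>

// Bytes for one uncompressed mip level at the given bits per pixel.
inline std::size_t uncompressedBytes(std::size_t width, std::size_t height,
                                     std::size_t bitsPerPixel) {
    assert(bitsPerPixel % 8 == 0);
    return width * height * (bitsPerPixel / 8);
}

// Bytes for one DXT-compressed mip level. Dimensions are rounded up to whole
// 4x4 blocks; bytesPerBlock is 8 for DXT1 and 16 for DXT3/DXT5.
inline std::size_t dxtBytes(std::size_t width, std::size_t height,
                            std::size_t bytesPerBlock) {
    const std::size_t blocksWide = (width + 3) / 4;
    const std::size_t blocksHigh = (height + 3) / 4;
    return blocksWide * blocksHigh * bytesPerBlock;
}
```

For a 512x512 top-level image this gives 1 MiB at 32 bits per pixel, 512 KiB at 16 bits per pixel, 256 KiB as DXT5, and 128 KiB as DXT1.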
Poor Texture Cache Utilization
Use the lowest pixel shader version you can until you start losing visual quality
Consider using shader levels of detail
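One way to read "shader levels of detail" is to pick a cheaper shader as an object gets farther from the camera. The sketch below assumes exactly that: the three tiers, the function name, and the distance thresholds are all hypothetical and would be tuned per application.

```cpp
#include <cassert>

// Hypothetical shader tiers, from most to least expensive.
enum class ShaderLod { Full, Reduced, Minimal };

// Pick a shader tier from the object's distance to the camera.
// nearDistance and farDistance are assumed, application-tuned thresholds.
inline ShaderLod selectShaderLod(float distance, float nearDistance,
                                 float farDistance) {
    assert(nearDistance <= farDistance);
    if (distance < nearDistance) return ShaderLod::Full;
    if (distance < farDistance) return ShaderLod::Reduced;
    return ShaderLod::Minimal;
}
```

Distance is only one possible metric; projected screen size would work the same way with the comparisons reversed.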
GeForceFX-specific Optimizations
Examples:
multi-pass rendering
rough visibility determination (lens flare, portals)
Caveats:
need time for query to process
can add fillrate overhead
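The examples and caveats above read like a description of occlusion queries, although the slide naming the technique did not survive. Assuming that reading, the sketch below shows one way to live with the first caveat for a lens flare: when the newest result is not ready, reuse the previous intensity rather than stalling. `QueryResult` and `flareIntensity` are hypothetical names. In Direct3D 9 the pixel count would come from a `D3DQUERYTYPE_OCCLUSION` query read back with `GetData`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of one visibility query, kept as plain data so the
// sketch stays self-contained.
struct QueryResult {
    bool ready;                   // false while the GPU is still processing
    std::uint32_t visiblePixels;  // pixels that passed the depth test
};

// Returns a lens flare intensity in [0, 1]. If the result is not ready yet,
// the previous intensity is reused instead of waiting for the GPU.
inline float flareIntensity(const QueryResult& result,
                            std::uint32_t testedPixels,
                            float previousIntensity) {
    assert(testedPixels > 0);
    if (!result.ready) return previousIntensity;
    const float ratio = static_cast<float>(result.visiblePixels) /
                        static_cast<float>(testedPixels);
    return ratio > 1.0f ? 1.0f : ratio;
}
```

The trade-off is that the flare can lag the true visibility by a frame or more, which is usually acceptable for a rough visibility test.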
Tools: NVPerfHUD
Drivers now support NVPerfHUD
Overlay that shows various vital statistics as the application runs
Top graph shows:
Number of API calls – Draw*Prim*, render states, texture states, shader states
Memory allocated – AGP and video
Bottom graph shows:
GPU Idle – Graphics HW not processing anything
Driver Time – Driver doing work (state and resource management, shader compilation)
Driver Idle – Driver waiting for GPU to finish
Frame Time – Milliseconds per frame
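Since the overlay reports frame time in milliseconds rather than frames per second, a quick conversion can help when comparing against an FPS target. The helper names below are illustrative and not part of NVPerfHUD.

```cpp
#include <cassert>

// Milliseconds spent on one frame at the given frame rate.
inline double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Frame rate that corresponds to the given frame time in milliseconds.
inline double framesPerSecond(double milliseconds) {
    assert(milliseconds > 0.0);
    return 1000.0 / milliseconds;
}
```

Frame time is the more useful unit for optimization work because it is additive: saving 2 ms is the same amount of work whether the frame takes 10 ms or 40 ms.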
NVPerfHUD - Screenshot
Tools: FX Composer