
Up Your Game-Understand GPU Architecture

Game Developers Conference

Up Your Game! Know Your Intel GPU Architecture
Pamela Harrison
Software Technical Consulting Engineer, Intel® GPA
Stanislav Volkov
Software Architect, Intel® GPA
Agenda for Today
Intel® Graphics Performance Analyzers (Intel® GPA) Overview

What’s New

Execution Unit (EU)

Intel® GPA Profiling

Summary

Resources
What is Intel® GPA?
Tool suite for analyzing games and other real-time graphics applications.
Locate graphics bottlenecks

System Analyzer
Graphics Trace Analyzer
Graphics Frame Analyzer
Intel® GPA Framework

software.intel.com/gpa
What’s New
Multi-GPU support
Graphics Trace Analyzer
• Sync highlighting and arrows: Signal, Render, Present packages
• Activity indicators (percent/time)
Graphics Frame Analyzer
• Multi-frame support (stream mode)
• Render Target Dependency View for Direct3D 11
Intel® GPA Plugin for Unreal Engine


EU Architecture
• Architecture Overview
• Latency vs. Stall
• Metrics
Execution Unit (EU) Overview
[Figure: GPU block diagram showing Shared Functions, Copy Engine, and Media Engine; four sub-slices, each with Thread Dispatch, a group of EUs, a Dataport, a Sampler, and SLM; Pixel Back-Ends and L3 Cache. Inset: one EU with a General Register File (GRF) holding thread slots 0 to 6, Thread Control, and the ALU0, ALU1, Branch, and SEND units.]
Execution Unit (EU) Overview
• 7 thread slots
• 128 registers x 32B = 4KB per slot
• Each slot can run a different shader
• More threads = more latency hiding
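The register file figures above can be checked with a few lines of arithmetic. This is a minimal sketch: the constants come from the slide, while the per-EU total and the lanes-per-register helper are derived here for illustration and are not stated in the deck.

```python
# Figures taken from the slide.
REGISTERS_PER_SLOT = 128
BYTES_PER_REGISTER = 32
THREAD_SLOTS_PER_EU = 7

# 128 registers x 32 B = 4096 B = 4 KB per thread slot.
grf_bytes_per_slot = REGISTERS_PER_SLOT * BYTES_PER_REGISTER

# Derived (not on the slide): total GRF across all seven slots of one EU.
grf_bytes_per_eu = grf_bytes_per_slot * THREAD_SLOTS_PER_EU


def lanes_per_register(element_bytes):
    """How many elements of the given size fit in one 32-byte register."""
    return BYTES_PER_REGISTER // element_bytes


print(grf_bytes_per_slot // 1024, "KB per slot")
print(grf_bytes_per_eu // 1024, "KB per EU")
print(lanes_per_register(4), "float32 values per register")
print(lanes_per_register(2), "float16 values per register")
```

One 32-byte register holds eight float32 values or sixteen float16 values, which lines up with the SIMD8 and SIMD16 widths the deck lists for ALU0.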
Execution Unit (EU) Overview
• ALU0 (FPU0):
  • float16, int8, int16 @ SIMD16
  • float32, int32 @ SIMD8
• ALU1 (FPU1/EM):
  • Transcendental math operations
  • float32 @ SIMD2
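To get a feel for what these lane widths imply, the sketch below estimates how many passes an instruction needs on each ALU. The lane widths are from the slide. The splitting rule (an instruction wider than the ALU is issued as several equal passes) is a simplifying assumption made here, not something the deck states, and the function name is hypothetical.

```python
import math

# Native lane widths per data type, as listed on the slide.
ALU0_LANES = {"float16": 16, "int8": 16, "int16": 16, "float32": 8, "int32": 8}
ALU1_LANES = {"float32": 2}  # transcendental math


def passes_needed(simd_width, native_lanes):
    """Assumed model: split a wide instruction into ceil(width / lanes) passes."""
    return math.ceil(simd_width / native_lanes)


# A SIMD16 float32 multiply-add on ALU0 under this model.
print(passes_needed(16, ALU0_LANES["float32"]))
# A SIMD8 float32 transcendental on ALU1 under this model.
print(passes_needed(8, ALU1_LANES["float32"]))
```

Under this assumed model, transcendental math is several times more expensive per lane than ordinary float32 arithmetic, which is one reason to keep such operations out of hot shader loops where possible.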
Execution Unit (EU) Overview
SEND unit processes send instructions:
• Read, write and service operations
• Cause high execution latency
Latency vs. Stall
[Timeline figure, built up over four slides. Thread 1 executes MAD, SEND, MAD; Thread 2 executes MAD, SEND. Regions of the timeline are labeled, left to right: Hidden Latency, Stall, Hidden Latency.]

EU Performance Observability
In Graphics Frame Analyzer, metrics are averaged across all EUs over the time measured:
1. EU Thread Occupancy, % - percent of thread slots in use
2. EU Active, % - ALU0 or ALU1 executing an instruction
3. EU Stall, % - at least one thread loaded, but no instruction executed
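The three definitions above can be written down as a small calculation. This is a minimal sketch assuming a hypothetical sample format: one `(slots_in_use, alu_executing)` pair per EU per sampled cycle. It illustrates the definitions only and is not how Intel® GPA collects or computes its counters.

```python
THREAD_SLOTS = 7  # thread slots per EU, from the slides


def eu_metrics(samples):
    """Average the three EU metrics over (slots_in_use, alu_executing) samples."""
    if not samples:
        raise ValueError("need at least one sample")
    total = len(samples)
    # Occupancy: fraction of all thread slots that held a thread.
    occupancy = sum(slots for slots, _ in samples) / (total * THREAD_SLOTS)
    # Active: fraction of samples in which an ALU was executing.
    active = sum(1 for _, busy in samples if busy) / total
    # Stall: a thread was loaded, but nothing was executing.
    stall = sum(1 for slots, busy in samples if slots > 0 and not busy) / total
    return {
        "occupancy_pct": 100 * occupancy,
        "active_pct": 100 * active,
        "stall_pct": 100 * stall,
    }


# Hypothetical samples: full and executing, full but waiting,
# partly full and executing, empty.
print(eu_metrics([(7, True), (7, False), (3, True), (0, False)]))
```

The second sample shows why high occupancy alone does not imply good utilization: all seven slots are in use, yet the cycle counts toward EU Stall because no instruction executes.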
Intel® GPA Profiling
• Hotspot Analysis
• Shader Profiler
• Thread Dispatch Problem
Automatic Hotspot Analysis
Hotspot Example: L3 Cache
Hotspot Example: Shader Execution
Hotspot Example: Thread Dispatch

SIMD8

SIMD16
Up Your Game!
• Understand Your Hardware
• Understand Your Profiler
• Help Increase Performance
Resources
Intel® Graphics Performance Analyzers

Product Site – Overview, features, what’s new…
software.intel.com/gpa

Training – Tutorials, Quick tips, Articles, Workshops
software.intel.com/gpa/training

Support – Connect with experts in community forums

Free Download: software.intel.com/gpa


Legal Notices and Disclaimers
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of
performance, course of dealing, or usage in trade.

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any
patent claim thereafter drafted which includes subject matter disclosed herein.

The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be
absolutely secure. Check with your system manufacturer or retailer or learn more at [intel.com].

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems,
components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated
purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.

Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may
affect your actual performance.

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates.

Your costs and results may vary.

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3
instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this
product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for
more information regarding the specific instruction sets covered by this notice.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
