
8 WiiProfiler V3 0

WiiProfiler is a free tool for profiling Wii game performance. Version 3.0 introduces sampling based on performance counters and instrumenting functions using counters. It can track CPU function time, instructions, branches, cache misses, and frame rate. Integration requires linking to a library and making a few function calls. The interface focuses on fast, easy operation through statistical graphs, call trees, and frame-based views of profiling data.

WiiProfiler v3.0

Steve Rabin
Principal Software Engineer
Software Development Support Group
Agenda
‹ WiiProfiler introduction
– What it provides
‹ WiiProfiler Methodology
– Game integration
– V2.0 features
‹ WiiProfiler v3.0 features
– Sampling based on performance counters
– Instrumenting using performance counters
– Tracking user data
– Code coverage
Introduction
‹ Measures CPU function performance
– How much time spent in each function
– Cycles, instructions, branches, cache misses
– Function call tree
– Function code coverage
– Frame rate performance

‹ Free tool created exclusively for Wii


– Version 1.0 (May 2007)
– Version 2.0 (April 2008)
– Version 3.0 (Open BETA now, Final Summer 2009)

‹ Requirements
– NDEV and minor programmer integration
WiiProfiler v1.0
WiiProfiler v2.0
WiiProfiler v3.0
WiiProfiler Design Methodology
‹ Extremely fast and easy to integrate
– Only a couple required function calls to library functions
(10 minute integration)

‹ Extremely fast and easy to operate


– Minimalist interface that just works
– Deep functionality with little cognitive overhead

‹ Effortless visual exploration of data


– Use graphs to maximize comprehension
– Frame-based graphs show problem frames
– Easy to compare and interpret
Methodology:
Fast and easy to integrate
Code Integration: Step 1

‹ Link against "wiiprofiler.a"


Code Integration: Step 2

Include the header file:


#include <revolution/wiiprofiler.h>
Code Integration: Step 3

WIIPROFILER_Init(void * bufferMEM2, u32 sizeInBytes,
                 BOOL doesGameWaitForRetrace);

‹ Call init function with a MEM2 buffer


– At least 8MB, as large as 100MB
– Larger buffer = longer profiling

‹ Answer the question:


– Does your main loop wait for the vertical retrace?
Code Integration: Step 4

while(true)
{   //Top of main loop
    WIIPROFILER_MarkFrameBegin();   //Add this

    //Game code, etc.
}
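Steps 1 to 4 put together might look like the sketch below. It is a host-side illustration only: the two `WIIPROFILER_` functions are hypothetical stand-in stubs so the flow compiles without the SDK (on hardware they come from `<revolution/wiiprofiler.h>` and `wiiprofiler.a`), the `u32`/`BOOL` typedefs imitate the SDK types, the static array stands in for a real MEM2 allocation, and `RunFrames` is an invented helper that bounds the main loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for SDK types; the real ones come from the Wii SDK headers. */
typedef uint32_t u32;
typedef int BOOL;
#define TRUE 1
#define FALSE 0

/* Hypothetical host-side stubs, NOT the real library. On hardware, remove
   these and use #include <revolution/wiiprofiler.h> plus wiiprofiler.a. */
static u32 g_stubFrames = 0;
static u32 g_stubBufferSize = 0;

void WIIPROFILER_Init(void *bufferMEM2, u32 sizeInBytes,
                      BOOL doesGameWaitForRetrace)
{
    assert(bufferMEM2 != NULL);
    (void)doesGameWaitForRetrace;
    g_stubBufferSize = sizeInBytes;
    g_stubFrames = 0;
}

void WIIPROFILER_MarkFrameBegin(void)
{
    g_stubFrames++;
}

/* 8MB is the minimum buffer size the slides mention. */
#define PROFILER_BUFFER_SIZE (8u * 1024u * 1024u)

/* Assumption: a static array stands in for a MEM2 allocation. */
static unsigned char g_profilerBuffer[PROFILER_BUFFER_SIZE];

/* Hypothetical helper: runs a bounded main loop so the sketch terminates. */
u32 RunFrames(u32 frameCount)
{
    /* Step 3: init once; TRUE assumes the loop waits for the retrace. */
    WIIPROFILER_Init(g_profilerBuffer, PROFILER_BUFFER_SIZE, TRUE);

    for (u32 frame = 0; frame < frameCount; frame++) {
        /* Step 4: mark the top of every frame. */
        WIIPROFILER_MarkFrameBegin();

        /* Game code, etc. */
    }
    return g_stubFrames;
}
```

A larger buffer only changes `PROFILER_BUFFER_SIZE`; the rest of the integration stays the same.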
Methodology:
Fast and easy to operate
Only Two Choices

‹ Connect to NDEV
‹ Open a profile
Demo:
Fast and Easy to Operate
‹ Statistical sampling
– Various rates available, Simple vs Full
– Accuracy vs Overhead/Size tradeoff

‹ Start and Stop

‹ Open and Save

‹ Settings and right click menus


Methodology:
Effortless Visualization
Demo:
Effortless Visualization
‹ Functions
– Sparklines
– Self vs Total
– Hide insignificant

‹ Call tree exploration


– Reverse call tree

‹ Statistical graph
– Click functions
– Zoom, scroll, choose frame
– Highlight Band
– Range and average
Demo:
Effortless Visualization
‹ Frame rate graph
– Examine frame rate spikes
– Events

‹ Resort functions (new in v3.0)


– Sort based on selected frame
– Sort based on average (default)
– Sort alphabetically
– Continuously resort
Performance Counter
Factoid Theater
‹ 4 CPU performance counters in Broadway CPU
– Reset, start, stop, and read in code

‹ Reset counters
– PPCMtpmc1(0); PPCMtpmc2(0); PPCMtpmc3(0); PPCMtpmc4(0);

‹ Start counters
– PPCMtmmcr0( <counter1> | <counter2> );
– PPCMtmmcr1( <counter3> | <counter4> );

‹ Stop counters
– PPCMtmmcr0( 0 );
– PPCMtmmcr1( 0 );

‹ Read counters
– PPCMfpmc1(); PPCMfpmc2(); PPCMfpmc3(); PPCMfpmc4();
Performance Counter
Factoid Theater
‹ Performance counter event examples (~60 total)
– PMC1_CYCLE # processor cycles
– PMC1_L2_HIT # of accesses that hit L2
– PMC1_L1_MISS # of accesses that miss L1
– PMC1_Bx_UNRESOLVED # of branches unresolved
– PMC1_Bx_STALL_CYCLE # of cycles stalled due to branches
– PMC2_CYCLE # processor cycles
– PMC2_INSTRUCTION # of instructions completed
– PMC2_IC_MISS # of L1 instruction cache misses
– PMC2_L1_CASTOUT # of L1 castouts to L2
– PMC2_Bx_FALL_THROUGH # of fall through branches

‹ Select one event each for PMC1, PMC2, PMC3, and PMC4 at a time


‹ Bracket code (Reset, Start, Stop) and measure results
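The reset, start, stop, and read sequence can be sketched as a small bracketing helper. The `PPCM*` names are the ones shown on the slides, but everything defined here is a hypothetical host-side stub so the example runs without the SDK: the event selector values are placeholders (the real bit patterns come from the SDK headers), `SimulateWork` fakes counter activity, and `MeasureCode`, `Measurement`, and `ExampleWorkload` are invented for illustration. Only PMC1 and PMC2 are used to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Placeholder event selectors: the real values come from the SDK headers. */
#define PMC1_CYCLE       0x1u
#define PMC2_INSTRUCTION 0x2u

/* Hypothetical host-side stubs imitating the SDK calls named on the slide. */
static u32 g_pmc1, g_pmc2, g_mmcr0;

void PPCMtpmc1(u32 value)  { g_pmc1 = value; }
void PPCMtpmc2(u32 value)  { g_pmc2 = value; }
void PPCMtmmcr0(u32 value) { g_mmcr0 = value; }
u32  PPCMfpmc1(void)       { return g_pmc1; }
u32  PPCMfpmc2(void)       { return g_pmc2; }

/* Stub only: pretend the CPU did some work. Counters advance only while
   their event is selected in the (fake) control register. */
static void SimulateWork(u32 cycles, u32 instructions)
{
    if (g_mmcr0 & PMC1_CYCLE)       g_pmc1 += cycles;
    if (g_mmcr0 & PMC2_INSTRUCTION) g_pmc2 += instructions;
}

typedef struct {
    u32 cycles;
    u32 instructions;
} Measurement;

/* Bracket a piece of code: reset, start, run, stop, read. */
Measurement MeasureCode(void (*code)(void))
{
    Measurement result;
    assert(code != NULL);

    PPCMtpmc1(0);                              /* reset */
    PPCMtpmc2(0);
    PPCMtmmcr0(PMC1_CYCLE | PMC2_INSTRUCTION); /* start */

    code();                                    /* code under test */

    PPCMtmmcr0(0);                             /* stop */
    result.cycles = PPCMfpmc1();               /* read */
    result.instructions = PPCMfpmc2();
    return result;
}

/* Hypothetical workload for the stub. */
void ExampleWorkload(void)
{
    SimulateWork(1200, 800);
}
```

Stopping before reading keeps the read calls themselves out of the measured counts.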
Performance Counters in
WiiProfiler v3.0

‹ Use performance counters to


– Statistically sample functions
– Instrument individual functions
Performance Counter
Statistical Sampling
‹ Sample by CPU
– Mispredicted branches
– Undecided branches
– Floating point instructions
– L1 or L2 instruction misses
– L1 or L2 data misses
– L1 writes to L2
– L2 writes to memory

[Diagram: memory hierarchy. Data L1 cache (32KB) and instruction L1 cache (32KB); combined L2 cache (256KB); main memory MEM1 (24MB) and MEM2 (64MB)]
Performance Counter
Statistical Sampling
‹ Choose a sampling rate
– Between once every 10 and once every 100K counted events
‹ Too often (every 10 to 100)
– Large overhead
– Can be less accurate (cache pollution)
– Fills up buffer fast
‹ Often (every 100 to 1K)
– Medium overhead
– Good accuracy
‹ Less often (every 1K to 100K)
– Least overhead
– Most accurate overall (less accurate per frame)
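The effect of the sampling rate on sample volume can be illustrated with a toy model. This is not how WiiProfiler is implemented; it assumes a hypothetical event stream in which each counted event (for example, a cache miss) is tagged with the index of the function that was running, and it takes one sample every `rate` events. `SampleEveryN` and `NUM_FUNCTIONS` are invented names, and the model does not capture overhead or cache pollution.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FUNCTIONS 3

/* Hypothetical model: eventOwner[i] is the index of the function that was
   running when the i-th counted event occurred. Takes one sample every
   `rate` events and returns the total number of samples taken. */
size_t SampleEveryN(const int *eventOwner, size_t eventCount,
                    size_t rate, size_t samplesPerFunction[NUM_FUNCTIONS])
{
    size_t samples = 0;
    size_t untilNextSample = rate;

    assert(rate > 0);
    for (int f = 0; f < NUM_FUNCTIONS; f++) {
        samplesPerFunction[f] = 0;
    }

    for (size_t i = 0; i < eventCount; i++) {
        if (--untilNextSample == 0) {
            /* The counter reached the chosen rate: record who was running. */
            assert(eventOwner[i] >= 0 && eventOwner[i] < NUM_FUNCTIONS);
            samplesPerFunction[eventOwner[i]]++;
            samples++;
            untilNextSample = rate;
        }
    }
    return samples;
}
```

With 1,000 events split 600/300/100 across three functions, sampling every 10 events yields 100 samples (60/30/10) and sampling every 100 events yields 10 samples (6/3/1): the same proportions from a tenth of the buffer space, but with far fewer samples landing in any single frame.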
Instrumenting Functions
‹ Choose a class of performance counters
– Cycles only
– Cycles and instructions
– Branch prediction performance
– Why branch prediction failed
– Cache and memory performance
– L1 cache performance
– L2 cache performance
– Outbound cache writes

‹ Explanation of selected in big gray box
‹ Decide:
– Whether or not to also statistically sample by time
Instrumenting Functions:
Branch Prediction Performance
‹ Performance counters selected
– PMC1_Bx_UNRESOLVED PMC3_Bx_TAKEN
– PMC2_Bx_FALL_THROUGH PMC4_Bx_MISSED

‹ Data teased out from these 4 counters


– % of correctly predicted branches
– % of incorrectly predicted branches
– Correctly predicted branches
– Incorrectly predicted branches
– Skipped branches based on prediction
– Taken branches based on prediction
– Branches predicted by hardware
– Branches unconditionally taken
– All branches
Instrumenting Functions:
L1 Cache Performance
‹ Performance counters selected
– PMC1_L1_MISS PMC3_DC_MISS
– PMC2_IC_MISS PMC4_CYCLE

‹ Data teased out from these 4 counters


– Cycles
– Cycles waiting for memory
– Instruction not found in L1
– Data not found in L1
– Memory not found in L1
– Average cycles waiting for memory
– % of time waiting for memory
Instrumenting Functions:
Selecting Functions
‹ Up to 10 functions profiled at a time

‹ 3 ways to select a function


– Choose a Self or Total function
– Drop down list of all game functions
– Choose a function from Code Coverage

‹ Data captured is similar to "Total"


– Function call and child calls
Instrumenting Functions:
Profile and Explore
‹ # function calls tracked
‹ # recursive calls tracked
‹ Performance counters
– Total count for performance counter
– Range per frame (max, ave, min)
– Raw call data (might graph slowly)

‹ Helpers
– Expand top level
– Auto-select similar
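One plausible reading of "range per frame" is that the raw per-call counter values are first summed within each frame and the max, average, and min are then taken across frames. The sketch below implements that reading; it is an assumption about the semantics, not the tool's documented behavior, and `CallRecord`, `FrameRange`, `SummarizePerFrame`, and `MAX_FRAMES` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t frame;   /* frame in which the call happened */
    uint32_t count;   /* counter value measured for that call */
} CallRecord;

typedef struct {
    uint64_t total;   /* sum over the whole profile */
    uint32_t max;     /* largest per-frame sum */
    uint32_t min;     /* smallest per-frame sum */
    double   average; /* mean per-frame sum */
} FrameRange;

#define MAX_FRAMES 64

/* Sums raw call data per frame, then reports the range across frames. */
FrameRange SummarizePerFrame(const CallRecord *calls, size_t callCount,
                             uint32_t frameCount)
{
    uint32_t perFrame[MAX_FRAMES] = {0};
    FrameRange range = {0, 0, UINT32_MAX, 0.0};

    assert(frameCount > 0 && frameCount <= MAX_FRAMES);

    for (size_t i = 0; i < callCount; i++) {
        assert(calls[i].frame < frameCount);
        perFrame[calls[i].frame] += calls[i].count;
    }
    for (uint32_t f = 0; f < frameCount; f++) {
        range.total += perFrame[f];
        if (perFrame[f] > range.max) range.max = perFrame[f];
        if (perFrame[f] < range.min) range.min = perFrame[f];
    }
    range.average = (double)range.total / (double)frameCount;
    return range;
}
```

Keeping only the per-frame sums is much smaller than the raw call data, which fits the slide's note that raw data might graph slowly.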
Tracking User Data
‹ Track any data you want in code
– Track floating point values

‹ WIIPROFILER_TrackValue(name, value);
– Will track multiple values per frame

‹ WIIPROFILER_TrackAccumulatedValue(name, value);
– Will track one accumulated value per frame

‹ WiiProfiler on PC
– Appears in Instrumented tab
– Graphs in Instrumented Graph tab
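The difference between the two tracking calls can be sketched as follows. The slides give only `(name, value)` for the arguments, so the parameter types (`const char *` and `f32`) are assumptions, and the functions below are hypothetical host-side stubs that merely count what a frame would record. `UpdateEnemies` and the tracked names are invented game code.

```c
#include <assert.h>
#include <stddef.h>

typedef float f32;

/* Hypothetical host-side stubs, NOT the real library. Parameter types are
   assumed; the slides only show (name, value). */
static int g_valuesThisFrame;
static f32 g_accumulatedThisFrame;

void WIIPROFILER_MarkFrameBegin(void)
{
    g_valuesThisFrame = 0;
    g_accumulatedThisFrame = 0.0f;
}

/* Each call records a separate value, so a frame can hold many. */
void WIIPROFILER_TrackValue(const char *name, f32 value)
{
    assert(name != NULL);
    (void)value;
    g_valuesThisFrame++;
}

/* Calls within a frame are summed into one value for that frame. */
void WIIPROFILER_TrackAccumulatedValue(const char *name, f32 value)
{
    assert(name != NULL);
    g_accumulatedThisFrame += value;
}

/* Hypothetical game code: one distance value per enemy, plus a running
   count of how many enemies were updated this frame. */
void UpdateEnemies(const f32 *distances, int enemyCount)
{
    for (int i = 0; i < enemyCount; i++) {
        WIIPROFILER_TrackValue("EnemyDistance", distances[i]);
        WIIPROFILER_TrackAccumulatedValue("EnemiesUpdated", 1.0f);
    }
}
```

In this sketch a frame with three enemies produces three "EnemyDistance" values and a single "EnemiesUpdated" value of 3.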
Code Coverage
‹ During a profile (or over multiple)
– Which functions get called
– Which functions don't get called

‹ Filter
– Exclude SDK and platform libraries
– Exclude functions with certain prefixes
– Include functions with certain prefixes

‹ Reset button
‹ Instrument button
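Prefix-based filtering of the coverage list can be sketched like this. The policy is an assumption, since the slides do not say how include and exclude prefixes interact: here exclusion wins, and an empty include list admits everything that is not excluded. `PassesCoverageFilter` and its helpers are hypothetical names, and the prefixes in the usage note are examples only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `name` starts with `prefix`. */
static bool HasPrefix(const char *name, const char *prefix)
{
    return strncmp(name, prefix, strlen(prefix)) == 0;
}

/* True if `name` starts with any of the `count` prefixes. */
static bool MatchesAny(const char *name,
                       const char *const *prefixes, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (HasPrefix(name, prefixes[i])) {
            return true;
        }
    }
    return false;
}

/* Assumed policy: exclusion wins; an empty include list means
   "everything that is not excluded". */
bool PassesCoverageFilter(const char *functionName,
                          const char *const *includePrefixes,
                          size_t includeCount,
                          const char *const *excludePrefixes,
                          size_t excludeCount)
{
    assert(functionName != NULL);

    if (MatchesAny(functionName, excludePrefixes, excludeCount)) {
        return false;
    }
    if (includeCount == 0) {
        return true;
    }
    return MatchesAny(functionName, includePrefixes, includeCount);
}
```

For example, excluding the prefixes "GX" and "OS" while including "Game" would keep `GameUpdate` in the list and drop both `GXDraw` and `AudioMix`.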
WiiProfiler v3.0 Release
‹ Open BETA for next 1-2 months
– Sign up and we'll send it to you:
https://www.warioworld.com/wii/wiiprofiler

‹ Final release v3.0 early Summer


– More robust communications layer
– Instrumenting functions
‹ Allow RSO and REL functions
‹ Remove interrupts from data
WiiProfiler Summary
‹ Statistical sampling profiler
– Time and performance counters

‹ Instrument functions
– Using performance counters

‹ Track and graph arbitrary data

‹ Function-based code coverage


Questions?

Ask me after the presentation


Or e-mail [email protected]
