Make and Gprof

Prof. David August


COS 217

Goals of Today's Lecture

Overview of two important programming tools
o Make: for compiling and linking multi-file programs
o Gprof: for profiling to identify slow parts of the code

Make
o Overview of compilation process
o Motivation for using Makefiles
o Example Makefile, refined in five steps

Gprof
o Timing, instrumenting, and profiling
o GNU Performance Profiler (Gprof)
o Running gprof and understanding the output

Example of a Three-File Program

Program divided into three files
o intmath.h: interface, included in intmath.c and testintmath.c
o intmath.c: implementation of math functions
o testintmath.c: implementation of tests of the math functions

Creating the testintmath binary executable
[Diagram: testintmath.c, intmath.h, and intmath.c are built into testintmath]
gcc -Wall -ansi -pedantic -o testintmath testintmath.c intmath.c
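As a rough illustration of how such a program might be split, the sketch below shows the three pieces in one listing. The function gcd and its behavior are assumptions made for this example; the lecture does not show the actual contents of intmath.h or intmath.c.

```c
/* ---- intmath.h (interface): hypothetical contents ---- */
#ifndef INTMATH_INCLUDED
#define INTMATH_INCLUDED

/* Return the greatest common divisor of a and b (both non-negative). */
int gcd(int a, int b);

#endif

/* ---- intmath.c (implementation): would begin with #include "intmath.h" ---- */
int gcd(int a, int b)
{
    /* Euclid's algorithm: replace (a, b) by (b, a mod b) until b is 0. */
    while (b != 0) {
        int remainder = a % b;
        a = b;
        b = remainder;
    }
    return a;
}

/* ---- testintmath.c (tests): would also #include "intmath.h" and
 *      define main(), calling gcd() and checking the results ---- */
```

Because both .c files include the same header, the compiler can check that the calls in testintmath.c match the definition in intmath.c.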

Many Steps, Under the Hood

Preprocessing (gcc -E intmath.c > intmath.i)
o Removes preprocessor directives
o Produces intmath.i and testintmath.i

Compiling (gcc -S intmath.i)
o Converts to assembly language
o Produces intmath.s and testintmath.s

Assembling (gcc -c intmath.s)
o Converts to machine language with unresolved references
o Produces the intmath.o and testintmath.o binaries

Linking (gcc -o testintmath testintmath.o intmath.o -lc)
o Creates machine language executable
o Produces the testintmath binary

Motivation for Makefiles

Typing at the command line gets tedious
o Long command with compiler, flags, and file names
o Easy to make a mistake

Compiling everything from scratch is time-consuming
o Repeating preprocessing, compiling, assembling, and linking
o Repeating these steps for every file, even if just one has changed

UNIX Makefile tool
o Makefile: file containing information necessary to build a program
    Lists the files as well as the dependencies
o Recompile or relink only as necessary
    When a file the target depends on has changed since the command was last run
    E.g., if intmath.c changes, recompile intmath.c but not testintmath.c
o Simply type make, or make -f <makefile_name>
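The "recompile only as necessary" decision can be pictured as a comparison of file modification times. The following is a minimal sketch of that idea using the POSIX stat() call; the function name needs_rebuild and its return convention are assumptions for illustration, not how make is actually implemented.

```c
#include <sys/stat.h>

/*
 * Returns 1 if target should be rebuilt, 0 if it is up to date,
 * and -1 if the dependency itself cannot be found.
 */
int needs_rebuild(const char *target, const char *dependency)
{
    struct stat target_info, dep_info;

    if (stat(dependency, &dep_info) != 0)
        return -1;                 /* nothing to build from */
    if (stat(target, &target_info) != 0)
        return 1;                  /* target does not exist yet */

    /* Rebuild only when the dependency was modified after the target. */
    return dep_info.st_mtime > target_info.st_mtime;
}
```

For example, needs_rebuild("intmath.o", "intmath.c") would report 1 right after intmath.c is edited and 0 once intmath.o has been regenerated.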

Main Ingredients of a Makefile

Group of lines
o Target: the file you want to create
o Dependencies: the files on which this file depends
o Command: what to execute to create the file (after a TAB)

Examples
testintmath: testintmath.o intmath.o
	gcc -o testintmath testintmath.o intmath.o

intmath.o: intmath.c intmath.h
	gcc -Wall -ansi -pedantic -c -o intmath.o intmath.c

Complete Makefile #1

Three groups
o testintmath: link testintmath.o and intmath.o
o testintmath.o: compile testintmath.c, which depends on intmath.h
o intmath.o: compile intmath.c, which depends on intmath.h

testintmath: testintmath.o intmath.o
	gcc -o testintmath testintmath.o intmath.o
testintmath.o: testintmath.c intmath.h
	gcc -Wall -ansi -pedantic -c -o testintmath.o testintmath.c
intmath.o: intmath.c intmath.h
	gcc -Wall -ansi -pedantic -c -o intmath.o intmath.c
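To see which targets these three rules force to be rebuilt after a given file is edited, the dependency lists can be modeled as a small table and walked recursively. This is only a toy model under assumed names (struct rule, is_stale); real make also consults timestamps and handles many cases ignored here.

```c
#include <string.h>

#define MAX_DEPS 3

/* One Makefile rule: a target and the files it depends on. */
struct rule {
    const char *target;
    const char *deps[MAX_DEPS];   /* NULL marks the end of the list */
};

/* The three rules of Complete Makefile #1. */
static const struct rule rules[] = {
    { "testintmath",   { "testintmath.o", "intmath.o", NULL } },
    { "testintmath.o", { "testintmath.c", "intmath.h", NULL } },
    { "intmath.o",     { "intmath.c",     "intmath.h", NULL } }
};
#define NUM_RULES (sizeof rules / sizeof rules[0])

/* Returns 1 if target must be rebuilt after the file changed was edited. */
int is_stale(const char *target, const char *changed)
{
    size_t i, j;

    for (i = 0; i < NUM_RULES; i++) {
        if (strcmp(rules[i].target, target) != 0)
            continue;
        for (j = 0; j < MAX_DEPS && rules[i].deps[j] != NULL; j++) {
            const char *dep = rules[i].deps[j];
            /* Stale if a dependency changed, or is itself stale. */
            if (strcmp(dep, changed) == 0 || is_stale(dep, changed))
                return 1;
        }
    }
    return 0;   /* no rule for target, or no dependency affected */
}
```

With this model, editing intmath.c makes intmath.o and testintmath stale but leaves testintmath.o alone, while editing intmath.h makes all three targets stale.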

Adding Non-File Targets

Adding useful shortcuts for the programmer
o make all: create the final binary
o make clobber: delete all temp files, core files, binaries, etc.
o make clean: delete all binaries

Commands in the example
o rm -f: remove files without querying the user
o Files ending in ~ and starting/ending in # are temporary files
o core is a file produced when a program dumps core

all: testintmath
clobber: clean
	rm -f *~ \#*\# core
clean:
	rm -f testintmath *.o

Complete Makefile #2

# Build rules for non-file targets
all: testintmath
clobber: clean
	rm -f *~ \#*\# core
clean:
	rm -f testintmath *.o

# Build rules for file targets
testintmath: testintmath.o intmath.o
	gcc -o testintmath testintmath.o intmath.o
testintmath.o: testintmath.c intmath.h
	gcc -Wall -ansi -pedantic -c -o testintmath.o testintmath.c
intmath.o: intmath.c intmath.h
	gcc -Wall -ansi -pedantic -c -o intmath.o intmath.c

Useful Abbreviations

Abbreviations
o Target file: $@
o First item in the dependency list: $<

Example
testintmath: testintmath.o intmath.o
	gcc -o testintmath testintmath.o intmath.o

becomes

testintmath: testintmath.o intmath.o
	gcc -o $@ $< intmath.o

Complete Makefile #3

# Build rules for non-file targets
all: testintmath
clobber: clean
	rm -f *~ \#*\# core
clean:
	rm -f testintmath *.o

# Build rules for file targets
testintmath: testintmath.o intmath.o
	gcc -o $@ $< intmath.o
testintmath.o: testintmath.c intmath.h
	gcc -Wall -ansi -pedantic -c -o $@ $<
intmath.o: intmath.c intmath.h
	gcc -Wall -ansi -pedantic -c -o $@ $<

Useful Pattern Rules: Wildcard %

Can define a default behavior
o Build rule: gcc -Wall -ansi -pedantic -c -o $@ $<
o Applied when target ends in .o and dependency in .c

%.o: %.c
	gcc -Wall -ansi -pedantic -c -o $@ $<

Can omit command clause in build rules (even some rules!)

testintmath: testintmath.o intmath.o
	gcc -o $@ $< intmath.o
testintmath.o: testintmath.c intmath.h
intmath.o: intmath.c intmath.h

Macros for Compiling and Linking

Make it easy to change which compiler is used
o Macro: CC = gcc
o Usage: $(CC) -o $@ $< intmath.o

Make it easy to change the compiler flags
o Macro: CFLAGS = -Wall -ansi -pedantic
o Usage: $(CC) $(CFLAGS) -c -o $@ $<

CC = gcc
# CC = gccmemstat
CFLAGS = -Wall -ansi -pedantic
# CFLAGS = -Wall -ansi -pedantic -g
# CFLAGS = -Wall -ansi -pedantic -DNDEBUG
# CFLAGS = -Wall -ansi -pedantic -DNDEBUG -O3

Sequence of Makefiles (see Web)

1. Initial Makefile with file targets
   testintmath, testintmath.o, intmath.o
2. Adding non-file targets
   all, clobber, and clean
3. Adding abbreviations
   $@ and $<
4. Adding pattern rules
   %.o: %.c
5. Adding macros
   CC and CFLAGS

References on Makefiles

Brief discussion in the King book
o Section 15.4 (pp. 320-322)

GNU make
o http://www.gnu.org/software/make/manual/html_mono/make.html

Cautionary notes
o Don't forget to use a TAB character, rather than blanks
o Be careful with how you use the rm -f command

Timing, Instrumenting, Profiling

How slow is the code?
o How long does it take for certain types of inputs?

Where is the code slow?
o Which code is being executed most?

Why is the code running out of memory?
o Where is the memory going?
o Are there leaks?

Why is the code slow?
o How imbalanced is my hash table or binary tree?

[Diagram: Input -> Program -> Output]

Timing

Most shells provide a tool to time program execution
o E.g., bash time command
bash> time sort < bigfile.txt > output.txt
real    0m12.977s
user    0m12.860s
sys     0m0.010s

Breakdown of time
o Real: elapsed time between invocation and termination
o User: time spent executing the program
o System: time spent within the OS on the program's behalf

But which parts of the code are the most time-consuming?
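The user/system split reported by time can also be read from inside a running program. The sketch below does this with the POSIX getrusage() call; the helper name cpu_times is an assumption for this example, and the numbers it reports cover only the calling process up to the moment of the call.

```c
#include <sys/time.h>
#include <sys/resource.h>

/*
 * Stores the user and system CPU time consumed so far by this process,
 * in seconds. Returns 0 on success and -1 on failure.
 */
int cpu_times(double *user_seconds, double *system_seconds)
{
    struct rusage usage;

    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;

    /* Each time is stored as whole seconds plus microseconds. */
    *user_seconds = usage.ru_utime.tv_sec + 1.0E-6 * usage.ru_utime.tv_usec;
    *system_seconds = usage.ru_stime.tv_sec + 1.0E-6 * usage.ru_stime.tv_usec;
    return 0;
}
```

Calling cpu_times() before and after a section of code and subtracting gives the CPU time charged to that section, as opposed to the elapsed (real) time.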

Instrumenting

Most operating systems provide a way to get the time
o E.g., UNIX gettimeofday function

#include <sys/time.h>
struct timeval start_time, end_time;
gettimeofday(&start_time, NULL);
/* <execute some code here> */
gettimeofday(&end_time, NULL);
/* Elapsed wall-clock time, in seconds */
float seconds = end_time.tv_sec - start_time.tv_sec +
    1.0E-6F * (end_time.tv_usec - start_time.tv_usec);
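The fragment above can be wrapped into reusable helpers. This is one possible arrangement, not the lecture's own code: the names elapsed_seconds and time_call are made up here, and double is used instead of float to keep more precision.

```c
#include <stddef.h>
#include <sys/time.h>

/* Seconds elapsed between two gettimeofday() readings. */
double elapsed_seconds(const struct timeval *start, const struct timeval *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + 1.0E-6 * (double)(end->tv_usec - start->tv_usec);
}

/* Times one call of work(); work is any function supplied by the caller. */
double time_call(void (*work)(void))
{
    struct timeval start_time, end_time;

    gettimeofday(&start_time, NULL);
    work();
    gettimeofday(&end_time, NULL);
    return elapsed_seconds(&start_time, &end_time);
}
```

Note that the microsecond difference may be negative (for example 250000 - 500000); adding it to the difference in whole seconds still yields the correct elapsed time.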

Profiling

Gather statistics about your program's execution
o e.g., how much time did execution of a function take?
o e.g., how many times was a particular function called?
o e.g., how many times was a particular line of code executed?
o e.g., which lines of code used the most time?

Most compilers come with profilers
o e.g., pixie and gprof

Gprof (GNU Performance Profiler)
o gcc -Wall -ansi -pedantic -pg -c -o intmath.o intmath.c

Profiler Basics

Profiler is just a tool
o Only as good as its user
o Can help find hotspots, but you must analyze them

Analysis includes
o Deciding to do nothing
o Changing algorithm
o Changing low-level details
o Knowing when to stop: Amdahl's law

Process
o Write code
o Make sure it's correct, verify correctness, test correctness
o Run profiler
o Possibly optimize code
o Make sure it's correct, verify correctness, test correctness

Gprof (GNU Performance Profiler)

Instrumenting the code
o gcc -Wall -ansi -pedantic -pg -c -o intmath.o intmath.c
o Pass -pg when linking the executable as well

Running the code (e.g., testintmath)
o Produces output file gmon.out containing statistics

Printing a human-readable report from gmon.out
o gprof testintmath > gprofreport

Two Main Outputs of Gprof

Call graph profile: detailed information per function
o Which functions called it, and how much time was consumed?
o Which functions it calls, how many times, and for how long?
o We won't look at this output in any detail

Flat profile: one line per function
o name: name of the function
o %time: percentage of time spent executing this function
o cumulative seconds: [skipping, as this isn't all that useful]
o self seconds: time spent executing this function
o calls: number of times function was called (excluding recursive)
o self ms/call: average time per execution (excluding descendants)
o total ms/call: average time per execution (including descendants)
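The %time and self ms/call columns follow directly from self seconds, the total run time, and the call count. A small sketch of that arithmetic is given here; the function names are illustrative and are not part of gprof.

```c
/* Percentage of the total run time spent in one function itself. */
double percent_time(double self_seconds, double total_seconds)
{
    return 100.0 * self_seconds / total_seconds;
}

/* Average milliseconds per call, excluding time spent in callees. */
double self_ms_per_call(double self_seconds, long calls)
{
    return 1000.0 * self_seconds / (double)calls;
}
```

Using figures from the flat profile shown later in the lecture: minimax has 0.27 self seconds over 6 calls, which gives 45 ms per call, and internal_mcount has 12.97 of the 22.71 total seconds, which gives about 57.1%.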

Call Graph Output

Complex format at the beginning; let's skip for now.

                                     called/total      parents
index %time    self descendents       called+self  name            index
                                     called/total      children

                                                       <spontaneous>
[1]    59.7   12.97        0.00                    internal_mcount [1]
               0.00        0.00               1/3      atexit [35]
-----------------------------------------------
                                                       <spontaneous>
[2]    40.3    0.00        8.75                    _start [2]
               0.00        8.75               1/1      main [3]
               0.00        0.00               2/3      atexit [35]
-----------------------------------------------
               0.00        8.75               1/1      _start [2]
[3]    40.3    0.00        8.75                 1  main [3]
               0.00        8.32               1/1      getBestMove [4]
               0.00        0.43               1/1      clock [20]
               0.00        0.00          1/747130      GameState_expandMove [6]
               0.00        0.00               1/1      exit [33]
               0.00        0.00               1/1      Move_read [36]
               0.00        0.00               1/1      GameState_new [37]
               0.00        0.00          6/747135      GameState_getStatus [31]
               0.00        0.00          1/747130      GameState_applyDeltas [25]
               0.00        0.00               1/1      GameState_write [44]
               0.00        0.00               1/1      sscanf [54]
               0.00        0.00         1/1698871      GameState_getPlayer [30]
               0.00        0.00               1/1      _fprintf [58]
               0.00        0.00               1/1      Move_write [59]
               0.00        0.00               3/3      GameState_playerToStr [63]
               0.00        0.00               1/2      strcmp [66]
               0.00        0.00               1/1      GameState_playerFromStr [68]
               0.00        0.00               1/1      Move_isValid [69]
               0.00        0.00               1/1      GameState_getSearchDepth [67]
-----------------------------------------------
               0.00        8.32               1/1      main [3]
[4]    38.3    0.00        8.32                 1  getBestMove [4]
               0.27        8.05               6/6      minimax [5]
               0.00        0.00          6/747130      GameState_expandMove [6]
               0.00        0.00        35/4755325      Delta_free [10]
               0.00        0.00          1/204617      GameState_genMoves [17]
               0.00        0.00          5/945027      Move_free [23]
               0.00        0.00          6/747130      GameState_applyDeltas [25]
               0.00        0.00          6/747129      GameState_unApplyDeltas [27]
               0.00        0.00         2/1698871      GameState_getPlayer [30]
-----------------------------------------------
                                           747123      minimax [5]
               0.27        8.05               6/6      getBestMove [4]
[5]    38.3    0.27        8.05          6+747123  minimax [5]
               0.63        3.56     747123/747130      GameState_expandMove [6]
               0.25        1.82   4755290/4755325      Delta_free [10]
               0.07        0.69     204616/204617      GameState_genMoves [17]
               0.02        0.36     945022/945027      Move_free [23]
               0.23        0.00     747123/747130      GameState_applyDeltas [25]
               0.22        0.00     747123/747129      GameState_unApplyDeltas [27]
               0.10        0.00   1698868/1698871      GameState_getPlayer [30]
               0.09        0.00     747129/747135      GameState_getStatus [31]
               0.01        0.00     542509/542509      GameState_getValue [32]
                                           747123      minimax [5]
-----------------------------------------------
               0.00        0.00          1/747130      main [3]
               0.00        0.00          6/747130      getBestMove [4]
               0.63        3.56     747123/747130      minimax [5]
[6]    19.3    0.63        3.56            747130  GameState_expandMove [6]
               0.47        2.99   4755331/5700361      calloc [7]
               0.11        0.00   2360787/2360787      .rem [28]
-----------------------------------------------
               0.00        0.00         1/5700361      Move_read [36]
               0.00        0.00         1/5700361      GameState_new [37]
               0.09        0.59    945028/5700361      GameState_genMoves [17]
               0.47        2.99   4755331/5700361      GameState_expandMove [6]
[7]    19.1    0.56        3.58           5700361  calloc [7]
               0.32        2.09   5700361/5700362      malloc [8]
               0.64        0.00   5700361/5700361      .umul [18]
               0.43        0.00   5700361/5700361      _memset [22]
               0.10        0.00   5700361/5700363      .udiv [29]
-----------------------------------------------
               0.00        0.00         1/5700362      _findbuf [41]
               0.32        2.09   5700361/5700362      calloc [7]
[8]    11.1    0.32        2.09           5700362  malloc [8]
               0.62        0.62   5700362/5700362      _malloc_unlocked <cycle 1> [13]
               0.23        0.20  5700362/11400732      _mutex_unlock [14]
               0.22        0.20  5700362/11400732      mutex_lock [15]

Flat Profile

Second part of profile looks like this; it's the simple (i.e., useful) part; corresponds to the prof tool

    % cumulative     self               self    total
 time    seconds  seconds     calls  ms/call  ms/call  name
 57.1      12.97    12.97                              internal_mcount [1]
  4.8      14.05     1.08   5700352     0.00     0.00  _free_unlocked [12]
  4.4      15.04     0.99                              _mcount (693)
  3.5      15.84     0.80  22801464     0.00     0.00  _return_zero [16]
  2.8      16.48     0.64   5700361     0.00     0.00  .umul [18]
  2.8      17.11     0.63    747130     0.00     0.01  GameState_expandMove [6]
  2.5      17.67     0.56   5700361     0.00     0.00  calloc [7]
  2.1      18.14     0.47  11400732     0.00     0.00  _mutex_unlock [14]
  1.9      18.58     0.44  11400732     0.00     0.00  mutex_lock [15]
  1.9      19.01     0.43   5700361     0.00     0.00  _memset [22]
  1.9      19.44     0.43         1   430.00   430.00  .div [21]
  1.8      19.85     0.41   5157853     0.00     0.00  cleanfree [19]
  1.4      20.17     0.32   5700366     0.00     0.00  _malloc_unlocked [13]
  1.4      20.49     0.32   5700362     0.00     0.00  malloc [8]
  1.3      20.79     0.30   5157847     0.00     0.00  _smalloc [24]
  1.2      21.06     0.27         6    45.00  1386.66  minimax [5]
  1.1      21.31     0.25   4755325     0.00     0.00  Delta_free [10]
  1.0      21.54     0.23   5700352     0.00     0.00  free [9]
  1.0      21.77     0.23    747130     0.00     0.00  GameState_applyDeltas [25]
  1.0      21.99     0.22   5157845     0.00     0.00  realfree [26]
  1.0      22.21     0.22    747129     0.00     0.00  GameState_unApplyDeltas [27]
  0.5      22.32     0.11   2360787     0.00     0.00  .rem [28]
  0.4      22.42     0.10   5700363     0.00     0.00  .udiv [29]
  0.4      22.52     0.10   1698871     0.00     0.00  GameState_getPlayer [30]
  0.4      22.61     0.09    747135     0.00     0.00  GameState_getStatus [31]
  0.3      22.68     0.07    204617     0.00     0.00  GameState_genMoves [17]
  0.1      22.70     0.02    945027     0.00     0.00  Move_free [23]
  0.0      22.71     0.01    542509     0.00     0.00  GameState_getValue [32]
  0.0      22.71     0.00       104     0.00     0.00  _ferror_unlocked [357]
  0.0      22.71     0.00        64     0.00     0.00  _realbufend [358]
  0.0      22.71     0.00        54     0.00     0.00  nvmatch [60]
  0.0      22.71     0.00        52     0.00     0.00  _doprnt [42]
  0.0      22.71     0.00        51     0.00     0.00  memchr [61]
  0.0      22.71     0.00        51     0.00     0.00  printf [43]
  0.0      22.71     0.00        13     0.00     0.00  _write [359]
  0.0      22.71     0.00        10     0.00     0.00  _xflsbuf [360]
  0.0      22.71     0.00         7     0.00     0.00  _memcpy [361]
  0.0      22.71     0.00         4     0.00     0.00  .mul [62]
  0.0      22.71     0.00         4     0.00     0.00  ___errno [362]
  0.0      22.71     0.00         4     0.00     0.00  _fflush_u [363]
  0.0      22.71     0.00         3     0.00     0.00  GameState_playerToStr [63]
  0.0      22.71     0.00         3     0.00     0.00  _findbuf [41]

Overhead of Profiling

[Same flat profile as on the Flat Profile slide. Its top entries include the profiler's own instrumentation routines: internal_mcount (57.1% of the time, 12.97 self seconds) and _mcount (4.4%, 0.99 self seconds).]

Malloc/calloc/free/...

[Same flat profile as on the Flat Profile slide. The title refers to its memory-management rows, e.g. calloc (2.5%), malloc (1.4%), and free (1.0%), along with their internal helper routines.]

expandMove

[Same flat profile as on the Flat Profile slide. GameState_expandMove [6]: 2.8% of the time, 0.63 self seconds, 747130 calls.]

May be worthwhile to optimize this routine

Don't Even Think of Optimizing These

[Flat profile as on the Flat Profile slide, with the same rows through GameState_getValue [32]. The listing then continues with functions that each account for 0.0% of the time:]

    % cumulative     self               self    total
 time    seconds  seconds     calls  ms/call  ms/call  name
  0.0      22.71     0.00       104     0.00     0.00  _ferror_unlocked [357]
  0.0      22.71     0.00         4     0.00     0.00  _thr_main [367]
  0.0      22.71     0.00         3     0.00     0.00  GameState_playerToStr [63]
  0.0      22.71     0.00         2     0.00     0.00  strcmp [66]
  0.0      22.71     0.00         1     0.00     0.00  GameState_getSearchDepth [67]
  0.0      22.71     0.00         1     0.00     0.00  GameState_new [37]
  0.0      22.71     0.00         1     0.00     0.00  GameState_playerFromStr [68]
  0.0      22.71     0.00         1     0.00     0.00  GameState_write [44]
  0.0      22.71     0.00         1     0.00     0.00  Move_isValid [69]
  0.0      22.71     0.00         1     0.00     0.00  Move_read [36]
  0.0      22.71     0.00         1     0.00     0.00  Move_write [59]
  0.0      22.71     0.00         1     0.00     0.00  check_nlspath_env [46]
  0.0      22.71     0.00         1     0.00   430.00  clock [20]
  0.0      22.71     0.00         1     0.00     0.00  exit [33]
  0.0      22.71     0.00         1     0.00  8319.99  getBestMove [4]
  0.0      22.71     0.00         1     0.00     0.00  getenv [47]
  0.0      22.71     0.00         1     0.00  8750.00  main [3]
  0.0      22.71     0.00         1     0.00     0.00  mem_init [70]
  0.0      22.71     0.00         1     0.00     0.00  number [71]
  0.0      22.71     0.00         1     0.00     0.00  scanf [53]

Using a Profiler

Test your code as you write it
o It is very hard to debug a lot of code all at once
o Isolate modules and test them independently
o Design your tests to cover boundary conditions

Instrument your code as you write it
o Include asserts and verify data structure sanity often
o Include debugging statements (e.g., #ifdef DEBUG and #endif)
o You'll be surprised what your program is really doing!!!

Time and profile your code only when you are done
o Don't optimize code unless you have to (you almost never will)
o Fixing your algorithm is almost always the solution
o Otherwise, running an optimizing compiler is usually enough
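The advice on asserts and debugging statements might look like the following in practice. This is a small sketch with a made-up function, sum_array; the assert disappears when compiling with -DNDEBUG, and the trace line appears only when compiling with -DDEBUG.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Sum the first n elements of values; the name and task are illustrative. */
long sum_array(const int *values, size_t n)
{
    long total = 0;
    size_t i;

    assert(values != NULL);        /* sanity check; compiled out by -DNDEBUG */

    for (i = 0; i < n; i++) {
        total += values[i];
#ifdef DEBUG
        /* Printed only when the file is compiled with -DDEBUG. */
        fprintf(stderr, "i=%lu total=%ld\n", (unsigned long)i, total);
#endif
    }
    return total;
}
```

This pairs naturally with the CFLAGS macro: one setting for development builds with checks enabled, another with -DNDEBUG once the code is known to be correct.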

Summary

Two valuable UNIX tools
o Make: building large programs in pieces
o Gprof: profiling a program to see where the time goes

Always use make, selectively use gprof
o A little thinking saves a lot of effort
o Extra performance not always achievable
o Understand concept of diminishing returns
    When is being lazy the right choice?

Travel Time and Time Travel

You plan to visit a friend in Turkey
Concorde to Paris + 737 to Istanbul = $3500
747 to Paris + 737 to Istanbul = $1200

Equipment    New York to Paris    Paris to Istanbul    Total
747 + 737    8 Hours              4 Hours              12 Hours
SST + 737    3 Hours              4 Hours              7 Hours

Taking the SST (which is 2.7 times faster) speeds up the overall trip by only a factor of 1.7!
Teleporter to Paris? (Teleporter is 10^6 times faster)
Time Machine to Paris?

Amdahl's Law

Fraction optimized limits overall speedup

Amdahl's Law:

$$\text{Speedup} = \frac{1}{(1 - f) + \frac{f}{s}}$$

where f is fraction optimized,
s is speedup of that fraction
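The formula translates directly into a few lines of C. The sketch below assumes the function name amdahl_speedup and uses asserts to document the valid ranges of f and s.

```c
#include <assert.h>

/*
 * Overall speedup when a fraction f of the running time (0 <= f <= 1)
 * is made s times faster (s > 0) and the rest is left unchanged.
 */
double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0);
    assert(s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```

For instance, doubling the speed of half the program (f = 0.5, s = 2) gives an overall speedup of only about 1.33, and with f = 0 no value of s helps at all.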

Amdahl's Law

Speed enhancement is limited by fraction optimized:

[Plot: Speedup Possible (0 to 10) versus Fraction Optimized (f) (0.0 to 1.0)]

$$\lim_{s \to \infty} \frac{1}{(1 - f) + \frac{f}{s}} = \frac{1}{1 - f}$$

where f is fraction optimized,
s is speedup of that fraction
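The formula and its limit can be checked against the travel example, using only the hours from that table. The New York to Paris leg is the part being optimized, so f = 8/12 = 2/3, and the SST shortens that leg from 8 hours to 3, so s = 8/3:

```latex
\text{Speedup}_{\text{SST}}
  = \frac{1}{\left(1 - \tfrac{2}{3}\right) + \dfrac{2/3}{8/3}}
  = \frac{1}{\tfrac{1}{3} + \tfrac{1}{4}}
  = \frac{12}{7} \approx 1.7
\qquad
\lim_{s \to \infty} \text{Speedup}
  = \frac{1}{1 - \tfrac{2}{3}} = 3
```

Even an instantaneous teleporter to Paris leaves the 4-hour flight to Istanbul, so the 12-hour trip can never be made more than 3 times faster.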

Example: Parallelism

Parallel Processing - throw more processors at problem
1024 parallel processors - LOTS OF MONEY!
90% of code is parallel (f = 0.9)
Parallel portion speeds up by 1024 (s = 1024)
Serial portion of code (1 - f) limits speedup

$$\lim_{s \to \infty} \frac{1}{(1 - f) + \frac{f}{s}} = \frac{1}{1 - f}$$

Serial portion limits to 10x speedup!
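As a worked check of this example, substituting the stated values f = 0.9 and s = 1024 into Amdahl's Law gives the finite-processor speedup alongside the limit:

```latex
\text{Speedup}
  = \frac{1}{(1 - 0.9) + \dfrac{0.9}{1024}}
  = \frac{1}{0.1 + 0.00088}
  \approx 9.91
\qquad
\lim_{s \to \infty} \text{Speedup} = \frac{1}{1 - 0.9} = 10
```

With 1024 processors the program is already within about 1% of the 10x ceiling, so adding more processors buys almost nothing.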
