Make Gprof
Make
o Overview of compilation process
o Motivation for using Makefiles
o Example Makefile, refined in five steps
Gprof
o Timing, instrumenting, and profiling
o GNU Performance Profiler (Gprof)
o Running gprof and understanding the output
Example program: testintmath
o intmath.c: implementation of math functions
o testintmath.c: implementation of tests of the math functions
o intmath.h: header on which both intmath.c and testintmath.c depend
Building testintmath with a single command:
gcc -Wall -ansi -pedantic -o testintmath testintmath.c intmath.c
Examples
testintmath: testintmath.o intmath.o
	gcc -o testintmath testintmath.o intmath.o
Complete Makefile #1
Three groups
o testintmath: link testintmath.o and intmath.o
o testintmath.o: compile testintmath.c, which depends on intmath.h
o intmath.o: compile intmath.c, which depends on intmath.h
all: testintmath
clobber: clean
	rm -f *~ \#*\# core
clean:
	rm -f testintmath *.o
Complete Makefile #2
# Build rules for non-file targets
all: testintmath
clobber: clean
	rm -f *~ \#*\# core
clean:
	rm -f testintmath *.o
# Build rules for file targets
testintmath: testintmath.o intmath.o
	gcc -o testintmath testintmath.o intmath.o
testintmath.o: testintmath.c intmath.h
	gcc -Wall -ansi -pedantic -c -o testintmath.o testintmath.c
intmath.o: intmath.c intmath.h
	gcc -Wall -ansi -pedantic -c -o intmath.o intmath.c
Useful Abbreviations
Abbreviations
o Target file: $@
o First item in the dependency list: $<
Example
testintmath: testintmath.o intmath.o
	gcc -o testintmath testintmath.o intmath.o
Complete Makefile #3
# Build rules for non-file targets
all: testintmath
clobber: clean
	rm -f *~ \#*\# core
clean:
	rm -f testintmath *.o
# Build rules for file targets
testintmath: testintmath.o intmath.o
	gcc -o $@ $< intmath.o
testintmath.o: testintmath.c intmath.h
	gcc -Wall -ansi -pedantic -c -o $@ $<
intmath.o: intmath.c intmath.h
	gcc -Wall -ansi -pedantic -c -o $@ $<
%.o: %.c
	gcc -Wall -ansi -pedantic -c -o $@ $<
3. Adding abbreviations
$@ and $<
5. Adding macros
CC and CFLAGS
References on Makefiles
Brief discussion in the King book
o Section 15.4 (pp. 320-322)
GNU make
o http://www.gnu.org/software/make/manual/html_mono/make.html
Cautionary notes
o Don't forget to use a TAB character, rather than blanks, at the start of each command line
o Be careful with how you use the rm -f command
[Figure: Input -> Program -> Output]
Timing
Most shells provide a tool to time program execution
o E.g., the bash time command
bash> time sort < bigfile.txt > output.txt
real    0m12.977s
user    0m12.860s
sys     0m0.010s
Breakdown of time
o Real: elapsed time between invocation and termination
o User: time spent executing the program
o System: time spent within the OS on the program's behalf
But, which parts of the code are the most time consuming?
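The user and system figures that time reports can also be read from inside a program. The following is a minimal sketch, assuming a POSIX system that provides getrusage; the function names cpu_times and timeval_to_seconds are illustrative and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds as a double. */
static double timeval_to_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + 1.0e-6 * (double)tv.tv_usec;
}

/* Store the user and system CPU time consumed so far by this process.
   Returns 0 on success, -1 if getrusage fails.
   (cpu_times is a hypothetical helper name.) */
int cpu_times(double *user_seconds, double *sys_seconds)
{
    struct rusage usage;

    assert(user_seconds != NULL && sys_seconds != NULL);
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    *user_seconds = timeval_to_seconds(usage.ru_utime);
    *sys_seconds = timeval_to_seconds(usage.ru_stime);
    return 0;
}
```

Calling cpu_times just before the program exits gives totals comparable to the user and sys lines above; like time, it still says nothing about which functions consumed that time.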
Instrumenting
Most operating systems provide a way to get the time
o E.g., the UNIX gettimeofday function
#include <sys/time.h>
struct timeval start_time, end_time;
gettimeofday(&start_time, NULL);
<execute some code here>
gettimeofday(&end_time, NULL);
float seconds = end_time.tv_sec - start_time.tv_sec +
1.0E-6F * (end_time.tv_usec - start_time.tv_usec);
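The fragment above can be wrapped into reusable helpers. This is a sketch under stated assumptions: elapsed_seconds and time_call are illustrative names, work stands for whatever code is being timed, and double replaces float so that microsecond resolution survives long intervals.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Elapsed wall-clock seconds between two gettimeofday() readings.
   The tv_usec difference may be negative; adding it to the tv_sec
   difference still gives the right result. */
double elapsed_seconds(const struct timeval *start, const struct timeval *end)
{
    assert(start != NULL && end != NULL);
    return (double)(end->tv_sec - start->tv_sec)
         + 1.0e-6 * (double)(end->tv_usec - start->tv_usec);
}

/* Time one call of work(), a hypothetical placeholder for the code
   under measurement. Returns the elapsed wall-clock seconds. */
double time_call(void (*work)(void))
{
    struct timeval start_time, end_time;

    assert(work != NULL);
    gettimeofday(&start_time, NULL);
    work();
    gettimeofday(&end_time, NULL);
    return elapsed_seconds(&start_time, &end_time);
}
```

Instrumenting this way measures only the regions you choose to wrap, which is the gap a profiler is meant to fill.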
Profiling
Gather statistics about your program's execution
Profiler Basics
Profiler is just a tool
o Only as good as its user
o Can help find hotspots, but you must analyze them
Analysis includes
o Deciding to do nothing
o Changing algorithm
o Changing low-level details
o Knowing when to stop (Amdahl's law)
Process
o Write code
o Make sure it's correct: verify correctness, test correctness
o Run profiler
o Possibly optimize code
o Make sure it's correct: verify correctness, test correctness
                                  called/total          parents
index  %time    self descendents  called+self       name                  index
                                  called/total          children
                                                        <spontaneous>
[1]     59.7   12.97        0.00                    internal_mcount [1]
                0.00        0.00               1/3      atexit [35]
-----------------------------------------------
                                                        <spontaneous>
[2]     40.3    0.00        8.75                    _start [2]
                0.00        8.75               1/1      main [3]
                0.00        0.00               2/3      atexit [35]
-----------------------------------------------
                0.00        8.75               1/1      _start [2]
[3]     40.3    0.00        8.75                 1  main [3]
                0.00        8.32               1/1      getBestMove [4]
                0.00        0.43               1/1      clock [20]
                0.00        0.00          1/747130      GameState_expandMove [6]
                0.00        0.00               1/1      exit [33]
                0.00        0.00               1/1      Move_read [36]
                0.00        0.00               1/1      GameState_new [37]
                0.00        0.00          6/747135      GameState_getStatus [31]
                0.00        0.00          1/747130      GameState_applyDeltas [25]
                0.00        0.00               1/1      GameState_write [44]
                0.00        0.00               1/1      sscanf [54]
                0.00        0.00         1/1698871      GameState_getPlayer [30]
                0.00        0.00               1/1      _fprintf [58]
                0.00        0.00               1/1      Move_write [59]
                0.00        0.00               3/3      GameState_playerToStr [63]
                0.00        0.00               1/2      strcmp [66]
                0.00        0.00               1/1      GameState_playerFromStr [68]
                0.00        0.00               1/1      Move_isValid [69]
                0.00        0.00               1/1      GameState_getSearchDepth [67]
-----------------------------------------------
                0.00        8.32               1/1      main [3]
[4]     38.3    0.00        8.32                 1  getBestMove [4]
                0.27        8.05               6/6      minimax [5]
                0.00        0.00          6/747130      GameState_expandMove [6]
                0.00        0.00        35/4755325      Delta_free [10]
                0.00        0.00          1/204617      GameState_genMoves [17]
                0.00        0.00          5/945027      Move_free [23]
                0.00        0.00          6/747130      GameState_applyDeltas [25]
                0.00        0.00          6/747129      GameState_unApplyDeltas [27]
                0.00        0.00         2/1698871      GameState_getPlayer [30]
-----------------------------------------------
                                            747123      minimax [5]
                0.27        8.05               6/6      getBestMove [4]
[5]     38.3    0.27        8.05          6+747123  minimax [5]
                0.63        3.56     747123/747130      GameState_expandMove [6]
                0.25        1.82   4755290/4755325      Delta_free [10]
                0.07        0.69     204616/204617      GameState_genMoves [17]
                0.02        0.36     945022/945027      Move_free [23]
                0.23        0.00     747123/747130      GameState_applyDeltas [25]
                0.22        0.00     747123/747129      GameState_unApplyDeltas [27]
                0.10        0.00   1698868/1698871      GameState_getPlayer [30]
                0.09        0.00     747129/747135      GameState_getStatus [31]
                0.01        0.00     542509/542509      GameState_getValue [32]
                                            747123      minimax [5]
-----------------------------------------------
                0.00        0.00          1/747130      main [3]
                0.00        0.00          6/747130      getBestMove [4]
                0.63        3.56     747123/747130      minimax [5]
[6]     19.3    0.63        3.56            747130  GameState_expandMove [6]
                0.47        2.99   4755331/5700361      calloc [7]
                0.11        0.00   2360787/2360787      .rem [28]
-----------------------------------------------
                0.00        0.00         1/5700361      Move_read [36]
                0.00        0.00         1/5700361      GameState_new [37]
                0.09        0.59    945028/5700361      GameState_genMoves [17]
                0.47        2.99   4755331/5700361      GameState_expandMove [6]
[7]     19.1    0.56        3.58           5700361  calloc [7]
                0.32        2.09   5700361/5700362      malloc [8]
                0.64        0.00   5700361/5700361      .umul [18]
                0.43        0.00   5700361/5700361      _memset [22]
                0.10        0.00   5700361/5700363      .udiv [29]
-----------------------------------------------
                0.00        0.00         1/5700362      _findbuf [41]
                0.32        2.09   5700361/5700362      calloc [7]
[8]     11.1    0.32        2.09           5700362  malloc [8]
                0.62        0.62   5700362/5700362      _malloc_unlocked
                0.23        0.20  5700362/11400732      _mutex_unlock [14]
                0.22        0.20  5700362/11400732      mutex_lock [15]
Complex format at the beginning; let's skip it for now.
Flat Profile
    %  cumulative     self               self    total
 time     seconds  seconds     calls  ms/call  ms/call  name
 57.1       12.97    12.97                              internal_mcount [1]
  4.8       14.05     1.08   5700352     0.00     0.00  _free_unlocked [12]
  4.4       15.04     0.99                              _mcount (693)
  3.5       15.84     0.80  22801464     0.00     0.00  _return_zero [16]
  2.8       16.48     0.64   5700361     0.00     0.00  .umul [18]
  2.8       17.11     0.63    747130     0.00     0.01  GameState_expandMove [6]
  2.5       17.67     0.56   5700361     0.00     0.00  calloc [7]
  2.1       18.14     0.47  11400732     0.00     0.00  _mutex_unlock [14]
  1.9       18.58     0.44  11400732     0.00     0.00  mutex_lock [15]
  1.9       19.01     0.43   5700361     0.00     0.00  _memset [22]
  1.9       19.44     0.43         1   430.00   430.00  .div [21]
  1.8       19.85     0.41   5157853     0.00     0.00  cleanfree [19]
  1.4       20.17     0.32   5700366     0.00     0.00  _malloc_unlocked [13]
  1.4       20.49     0.32   5700362     0.00     0.00  malloc [8]
  1.3       20.79     0.30   5157847     0.00     0.00  _smalloc [24]
  1.2       21.06     0.27         6    45.00  1386.66  minimax [5]
  1.1       21.31     0.25   4755325     0.00     0.00  Delta_free [10]
  1.0       21.54     0.23   5700352     0.00     0.00  free [9]
  1.0       21.77     0.23    747130     0.00     0.00  GameState_applyDeltas [25]
  1.0       21.99     0.22   5157845     0.00     0.00  realfree [26]
  1.0       22.21     0.22    747129     0.00     0.00  GameState_unApplyDeltas [27]
  0.5       22.32     0.11   2360787     0.00     0.00  .rem [28]
  0.4       22.42     0.10   5700363     0.00     0.00  .udiv [29]
  0.4       22.52     0.10   1698871     0.00     0.00  GameState_getPlayer [30]
  0.4       22.61     0.09    747135     0.00     0.00  GameState_getStatus [31]
  0.3       22.68     0.07    204617     0.00     0.00  GameState_genMoves [17]
  0.1       22.70     0.02    945027     0.00     0.00  Move_free [23]
  0.0       22.71     0.01    542509     0.00     0.00  GameState_getValue [32]
  0.0       22.71     0.00       104     0.00     0.00  _ferror_unlocked [357]
  0.0       22.71     0.00        64     0.00     0.00  _realbufend [358]
  0.0       22.71     0.00        54     0.00     0.00  nvmatch [60]
  0.0       22.71     0.00        52     0.00     0.00  _doprnt [42]
  0.0       22.71     0.00        51     0.00     0.00  memchr [61]
  0.0       22.71     0.00        51     0.00     0.00  printf [43]
  0.0       22.71     0.00        13     0.00     0.00  _write [359]
  0.0       22.71     0.00        10     0.00     0.00  _xflsbuf [360]
  0.0       22.71     0.00         7     0.00     0.00  _memcpy [361]
  0.0       22.71     0.00         4     0.00     0.00  .mul [62]
  0.0       22.71     0.00         4     0.00     0.00  ___errno [362]
  0.0       22.71     0.00         4     0.00     0.00  _fflush_u [363]
  0.0       22.71     0.00         3     0.00     0.00  GameState_playerToStr [63]
  0.0       22.71     0.00         3     0.00     0.00  _findbuf [41]
The second part of the profile looks like this; it's the simple (i.e., useful) part, and it corresponds to the prof tool.
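The ms/call columns can be checked by hand. The sketch below assumes that "self ms/call" is self seconds divided by the number of calls, and that "total ms/call" uses self plus descendant seconds; ms_per_call is an illustrative helper, not part of gprof. For the minimax row, 0.27 self seconds over 6 calls gives 45 ms, and the call graph's 0.27 + 8.05 seconds over the same 6 calls gives about 1386.67 ms. The listing shows 1386.66, presumably because gprof works from unrounded times.

```c
#include <assert.h>

/* Average milliseconds per call, given a time in seconds and a call
   count. This mirrors how the ms/call columns are read here; it is an
   illustration, not gprof's own code. */
double ms_per_call(double seconds, long calls)
{
    assert(calls > 0);
    return 1000.0 * seconds / (double)calls;
}
```

For minimax the call count used is 6, the number of non-recursive calls; the 747123 recursive calls shown in the call graph are not counted.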
Overhead of Profiling
(Same flat profile as above. The entries internal_mcount [1] and _mcount (693), at 57.1% and 4.4% of the time, belong to the profiler itself.)
Malloc/calloc/free/...
(Same flat profile as above. Memory-allocation entries include _free_unlocked [12], calloc [7], _malloc_unlocked [13], malloc [8], and free [9].)
expandMove
(Same flat profile as above. GameState_expandMove [6]: 2.8% of the time, 0.63 self seconds, 747130 calls, 0.01 total ms/call.)
   self               self    total
seconds     calls  ms/call  ms/call  name
  12.97                              internal_mcount [1]
   1.08   5700352     0.00     0.00  _free_unlocked [12]
   0.99                              _mcount (693)
   0.80  22801464     0.00     0.00  _return_zero [16]
   0.64   5700361     0.00     0.00  .umul [18]
   0.63    747130     0.00     0.01  GameState_expandMove [6]
   0.56   5700361     0.00     0.00  calloc [7]
   0.47  11400732     0.00     0.00  _mutex_unlock [14]
   0.44  11400732     0.00     0.00  mutex_lock [15]
   0.43   5700361     0.00     0.00  _memset [22]
   0.43         1   430.00   430.00  .div [21]
   0.41   5157853     0.00     0.00  cleanfree [19]
   0.32   5700366     0.00     0.00  _malloc_unlocked <cycle 1> [13]
   0.32   5700362     0.00     0.00  malloc [8]
   0.30   5157847     0.00     0.00  _smalloc <cycle 1> [24]
   0.27         6    45.00  1386.66  minimax [5]
   0.25   4755325     0.00     0.00  Delta_free [10]
   0.23   5700352     0.00     0.00  free [9]
   0.23    747130     0.00     0.00  GameState_applyDeltas [25]
   0.22   5157845     0.00     0.00  realfree [26]
   0.22    747129     0.00     0.00  GameState_unApplyDeltas [27]
   0.11   2360787     0.00     0.00  .rem [28]
   0.10   5700363     0.00     0.00  .udiv [29]
   0.10   1698871     0.00     0.00  GameState_getPlayer [30]
   0.09    747135     0.00     0.00  GameState_getStatus [31]
   0.07    204617     0.00     0.00  GameState_genMoves [17]
   0.02    945027     0.00     0.00  Move_free [23]
   0.01    542509     0.00     0.00  GameState_getValue [32]
   0.00       104     0.00     0.00  _ferror_unlocked [357]
   0.00         4     0.00     0.00  _thr_main [367]
   0.00         3     0.00     0.00  GameState_playerToStr [63]
   0.00         2     0.00     0.00  strcmp [66]
   0.00         1     0.00     0.00  GameState_getSearchDepth [67]
   0.00         1     0.00     0.00  GameState_new [37]
   0.00         1     0.00     0.00  GameState_playerFromStr [68]
   0.00         1     0.00     0.00  GameState_write [44]
   0.00         1     0.00     0.00  Move_isValid [69]
   0.00         1     0.00     0.00  Move_read [36]
   0.00         1     0.00     0.00  Move_write [59]
   0.00         1     0.00     0.00  check_nlspath_env [46]
   0.00         1     0.00   430.00  clock [20]
   0.00         1     0.00     0.00  exit [33]
   0.00         1     0.00  8319.99  getBestMove [4]
   0.00         1     0.00     0.00  getenv [47]
   0.00         1     0.00  8750.00  main [3]
   0.00         1     0.00     0.00  mem_init [70]
   0.00         1     0.00     0.00  number [71]
   0.00         1     0.00     0.00  scanf [53]
Using a Profiler
Test your code as you write it
o It is very hard to debug a lot of code all at once
o Isolate modules and test them independently
o Design your tests to cover boundary conditions
Time and profile your code only when you are done
o Don't optimize code unless you have to (you almost never will)
o Fixing your algorithm is almost always the solution
o Otherwise, running an optimizing compiler is usually enough
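The advice above about testing a module in isolation and covering boundary conditions can be sketched as follows. The gcd function is a hypothetical stand-in for a math function and is not taken from intmath.c; the choice that gcd(0, 0) returns 0 is an assumption made for this example.

```c
#include <assert.h>

/* Hypothetical example function (not from intmath.c): greatest common
   divisor of two non-negative ints, by Euclid's algorithm.
   gcd(0, 0) is defined here as 0. */
int gcd(int a, int b)
{
    assert(a >= 0 && b >= 0);
    while (b != 0) {
        int r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* Boundary-condition tests: zero arguments, equal arguments, coprime
   arguments, and one argument dividing the other. */
void test_gcd(void)
{
    assert(gcd(0, 0) == 0);
    assert(gcd(0, 7) == 7);
    assert(gcd(7, 0) == 7);
    assert(gcd(12, 12) == 12);
    assert(gcd(17, 5) == 1);
    assert(gcd(4, 12) == 4);
    assert(gcd(12, 18) == 6);
}
```

Only after tests like these pass does it make sense to time or profile the code.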
Summary
Two valuable UNIX tools
o Make: building a large program in pieces
o Gprof: profiling a program to see where the time goes
Planes      First leg   Paris to Istanbul   Total
747 + 737   8 Hours     4 Hours             12 Hours
SST + 737   3 Hours     4 Hours             7 Hours
Amdahl's Law
Fraction optimized limits overall speedup
Amdahl's Law:
\[ \text{Speedup} = \frac{1}{(1 - f) + \frac{f}{s}} \]
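As a worked check against the travel-time table, assume the first leg (8 of the original 12 hours) is the part being sped up. Then f = 8/12 = 2/3 and s = 8/3; both values are derived here and are not stated on the slide.

```latex
\[
\text{Speedup}
  = \frac{1}{(1 - f) + \frac{f}{s}}
  = \frac{1}{\frac{1}{3} + \frac{2/3}{8/3}}
  = \frac{1}{\frac{1}{3} + \frac{1}{4}}
  = \frac{12}{7} \approx 1.71
\]
```

This agrees with the table: 12 hours divided by 7 hours.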
Amdahl's Law
[Plot: Speedup Possible versus Fraction Optimized (f), for f from 0.0 to 1.0]
\[ \lim_{s \to \infty} \frac{1}{(1 - f) + \frac{f}{s}} = \frac{1}{1 - f} \]
Example: Parallelism
Parallel Processing - throw more processors at the problem
1024 parallel processors - LOTS OF MONEY!
90% of code is parallel (f = 0.9)
Parallel portion speeds up by 1024 (s = 1024)
Serial portion of code (1 - f) limits speedup
\[ \lim_{s \to \infty} \frac{1}{(1 - f) + \frac{f}{s}} = \frac{1}{1 - f} \]
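The numbers in the parallelism example can be plugged into the formula directly. This is a minimal sketch; amdahl_speedup and amdahl_limit are illustrative names introduced here.

```c
#include <assert.h>

/* Overall speedup predicted by Amdahl's law when a fraction f of the
   running time (0 <= f <= 1) is sped up by a factor s (s > 0). */
double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0);
    assert(s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}

/* Upper bound on the speedup as s grows without limit; requires f < 1. */
double amdahl_limit(double f)
{
    assert(f >= 0.0 && f < 1.0);
    return 1.0 / (1.0 - f);
}
```

With f = 0.9 and s = 1024, amdahl_speedup gives about 9.91, while amdahl_limit(0.9) is about 10. So 1024 processors already deliver nearly all the speedup that any number of processors could.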