HPC debugging

Victor Eijkhout

2022

Eijkhout: programming
Defensive programming
Debugging
Memory debugging
Parallel Debugging

Profiling and debugging; optimization and programming strategies.

1 Analysis basics

• Measurements: repeated and controlled
  (beware of transients; do you know where your data is?)
• Document everything
• Script everything

2 Compiler options

• Defaults are a starting point
• Use reporting options: -opt-report, -vec-report
  useful to check if optimization happened / could not happen
• Test numerical correctness before/after optimization change
  (there are options for numerical correctness)
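
One way to test numerical correctness across an optimization change is to compare results against a reference with a tolerance rather than bit for bit. The sketch below is only an illustration: the function name `nearly_equal` and both tolerance parameters are assumptions, and suitable tolerance values depend on the application.

```c
#include <assert.h>

/* Absolute value without needing the math library. */
static double abs_value(double x) {
    return x < 0.0 ? -x : x;
}

/* Returns 1 if a and b differ by at most rel_tol times the larger
   magnitude, or by at most abs_tol (useful near zero); 0 otherwise. */
int nearly_equal(double a, double b, double rel_tol, double abs_tol) {
    assert(rel_tol >= 0.0 && abs_tol >= 0.0);
    double diff  = abs_value(a - b);
    double mag_a = abs_value(a), mag_b = abs_value(b);
    double scale = mag_a > mag_b ? mag_a : mag_b;
    double bound = rel_tol * scale;
    if (abs_tol > bound)
        bound = abs_tol;
    return diff <= bound;
}
```

A result computed with the default flags can then be compared with the result from the optimized build, for instance with `nearly_equal(reference, optimized, 1e-12, 0.0)`.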

3 Optimization basics

• Use libraries when possible: don’t reinvent the wheel
• Premature optimization is the root of all evil (Knuth)

4 Code design for performance

• Keep inner loops simple: no conditionals, function calls, casts
• Avoid small functions: try macros or inlining
• Keep in mind all the cache, TLB, SIMD stuff from before
• SIMD: Fortran array syntax helps
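
To illustrate "no conditionals in inner loops", here is a small sketch in which a loop-invariant test is moved out of the loop body. Both function names are hypothetical. An optimizing compiler may well perform this transformation on its own, which is one reason to read the optimization reports before rewriting by hand.

```c
#include <assert.h>
#include <stddef.h>

/* The test on do_scale is evaluated in every iteration. */
void scale_or_copy_branchy(double *y, const double *x, size_t n,
                           double a, int do_scale) {
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        if (do_scale)
            y[i] = a * x[i];
        else
            y[i] = x[i];
    }
}

/* Same result: the test is done once, and each loop body is a
   single simple statement. */
void scale_or_copy_hoisted(double *y, const double *x, size_t n,
                           double a, int do_scale) {
    assert(x != NULL && y != NULL);
    if (do_scale) {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i];
    } else {
        for (size_t i = 0; i < n; i++)
            y[i] = x[i];
    }
}
```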

5 Multicore / multithread

• Use numactl: prevent process migration
• ‘first touch’ policy: allocate data where it will be used
• Scaling behaviour mostly influenced by bandwidth
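
The 'first touch' point can be sketched with OpenMP, under the assumption that the operating system places a memory page near the thread that first writes to it. The array is initialized in parallel with the same static schedule as the later compute loop, so each thread first touches the part it will work on. The function names are hypothetical. Without an OpenMP compile flag the pragmas are ignored and the code runs serially.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate n doubles and initialize them in parallel, so that pages
   are first touched by the thread that will later use them.
   Returns NULL if the allocation fails. */
double *alloc_first_touch(size_t n, double value) {
    assert(n > 0);
    double *x = malloc(n * sizeof *x);
    if (x == NULL)
        return NULL;
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        x[i] = value;
    return x;
}

/* Compute loop with the same static schedule as the initialization,
   so each thread works on the elements it touched first. */
void scale_in_place(double *x, size_t n, double a) {
    assert(x != NULL);
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        x[i] *= a;
}
```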

6 Multinode performance

• Influenced by load balancing
• Use HPCToolkit, Scalasca, TAU for plotting
• Explore ‘eager’ limit (mvapich2: environment variables)

7 Classes of programming errors

Logic errors:
functions behave differently from how you thought,
or interact in ways you didn’t envision

Hard to debug

8 More classes of errors

Coding errors:
send without receive
forget to allocate buffer

Debuggers can help

Defensive programming

9 Defensive programming

• Keep It Simple (‘restrict expressivity’)
• Example: use collective instead of spelling it out
• easier to write / harder to get wrong
  the library and runtime are likely to be better at optimizing than you

10 Memory management

Beware of memory leaks:
keep allocation and free in same lexical scope

C++ does this automatically with RAII
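
In C the pairing has to be written by hand. A minimal sketch, with hypothetical function names: the helper works on caller-provided scratch memory and never allocates or frees, so `malloc` and `free` sit a few lines apart in one function.

```c
#include <assert.h>
#include <stdlib.h>

/* Helper: fills scratch[i] with (i+1)^2 and returns the mean.
   It uses the caller's buffer and never allocates or frees. */
static double fill_and_average(double *scratch, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double v = (double)(i + 1);
        scratch[i] = v * v;
        sum += scratch[i];
    }
    return sum / n;
}

/* Allocation and release are in the same scope, so it is easy to
   check that every path that allocates also frees.
   Returns -1.0 if the allocation fails. */
double mean_of_squares(int n) {
    assert(n > 0);
    double *scratch = malloc((size_t)n * sizeof *scratch);
    if (scratch == NULL)
        return -1.0;
    double result = fill_and_average(scratch, n);
    free(scratch);   /* paired with the malloc above */
    return result;
}
```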

11 Modular design

Design for debuggability, also easier to optimize

Separation of concerns: try to keep code aspects separate

Premature optimization is the root of all evil (Knuth)

12 MPI performance design

Be aware of latencies: bundle messages
(this may go against separation of concerns)

Consider ‘eager limit’

Process placement, reduction in number of processes

Debugging

13

Debugging is like being the detective in a crime movie where you are also the murderer. (Filipe Fortes, 2013)

What do you do when your program misbehaves?

• Insert print statements, recompile, run again.
• Run your program in a debugger
• (also: attach a debugger, inspect a core dump)

14 Simple example: listing

tutorials/gdb/c/hello.c

#include <stdlib.h>
#include <stdio.h>
int main() {
  printf("hello world\n");
  return 0;
}

15 Simple example: running


%% cc -g -o hello hello.c
# regular invocation:
%% ./hello
hello world
# invocation from gdb:
%% gdb hello
GNU gdb 6.3.50-20050815 # ..... [version info]
Copyright 2004 Free Software Foundation, Inc. .... [copyright info
(gdb) run
Starting program: /home/eijkhout/tutorials/gdb/hello
Reading symbols for shared libraries +. done
hello world

Program exited normally.


(gdb) quit
%%

16 Source listing

%% cc -o hello hello.c
%% gdb hello
GNU gdb 6.3.50-20050815 # ..... version info
(gdb) list

Important to use the -g compile option! Compiled without it, as above, the executable carries no source-line information for gdb to list.

17 Run with arguments

tutorials/gdb/c/say.c

#include <stdlib.h>
#include <stdio.h>
int main(int argc,char **argv) {
  int i;
  for (i=0; i<atoi(argv[1]); i++)
    printf("hello world\n");
  return 0;
}

%% gdb say
.... the usual messages ...
(gdb) run 2
Starting program: /home/eijkhout/tutorials/gdb/c/say 2
Reading symbols for shared libraries +. done
hello world
hello world

18 Memory problems 1

// square.c
int nmax,i;
float *squares,sum;

fscanf(stdin,"%d",nmax);
for (i=1; i<=nmax; i++) {
  squares[i] = 1./(i*i); sum += squares[i];
}
printf("Sum: %e\n",sum);

%% cc -g -o square square.c
%% ./square
5000
Segmentation fault

The debugger will stop at the problem.
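
In the listing, `nmax` is passed to `fscanf` by value, so `fscanf` writes through whatever address that uninitialized integer happens to hold. The `squares` pointer is also never allocated. A hedged sketch of a safer read follows: the helper name `read_count` is hypothetical, and it only covers the input step. Passing a pointer and checking the return value of `fscanf` catches both the wrong argument and malformed input.

```c
#include <assert.h>
#include <stdio.h>

/* Read one positive integer from the stream `in` into *n.
   Returns 1 on success, 0 if nothing could be converted or the
   value is not positive. */
int read_count(FILE *in, int *n) {
    assert(in != NULL && n != NULL);
    /* fscanf returns the number of items converted; *n is only
       inspected when that conversion succeeded. */
    return fscanf(in, "%d", n) == 1 && *n > 0;
}
```

The caller would then allocate the array only after `read_count(stdin, &nmax)` has returned 1.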


19 Stack trace

Displaying a stack trace


gdb                 lldb
(gdb) where         (lldb) thread backtrace

(gdb) backtrace
#0 0x00007fff824295ca in __svfscanf_l ()
#1 0x00007fff8244011b in fscanf ()
#2 0x0000000100000e89 in main (argc=1, argv=0x7fff5fbfc7c0) at sq

20 Inspecting a stack frame

Investigate a specific frame


gdb                 lldb
frame 2             frame select 2

Then print variables and such.

21 Out-of-bounds errors

// up.c
int nlocal = 100,i;
double s, *array = (double*) malloc(nlocal*sizeof(double));
for (i=0; i<nlocal; i++) {
  double di = (double)i;
  array[i] = 1/(di*di);
}
s = 0.;
for (i=nlocal-1; i>=0; i++) {
  double di = (double)i;
  s += array[i];
}
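
The second loop in the listing is meant to run downward but increments `i`, so the index walks past the end of the array. One possible corrected version is sketched below, wrapped in a hypothetical function `inverse_square_sum`. As a further assumption not in the listing, the term for `i = 0` is skipped, since `1/(0*0)` would be infinite.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum 1/i^2 for i = 1 .. nlocal-1, adding the smallest terms first.
   Returns -1.0 if the allocation fails. */
double inverse_square_sum(int nlocal) {
    assert(nlocal > 1);
    double *array = malloc((size_t)nlocal * sizeof *array);
    if (array == NULL)
        return -1.0;
    array[0] = 0.0;   /* assumption: no term for i = 0 */
    for (int i = 1; i < nlocal; i++) {
        double di = (double)i;
        array[i] = 1.0 / (di * di);
    }
    double s = 0.0;
    /* i-- here: the listing's i++ makes the index grow without bound */
    for (int i = nlocal - 1; i >= 0; i--)
        s += array[i];
    free(array);
    return s;
}
```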

22 Out of bounds in debugger

Program received signal EXC_BAD_ACCESS, Could not access memo
Reason: KERN_INVALID_ADDRESS at address: 0x0000000100200000
0x0000000100000f43 in main (argc=1, argv=0x7fff5fbfe2c0) at u
15 s += array[i];
(gdb) print array
$1 = (double *) 0x100104d00
(gdb) print i
$2 = 128608

23 Breakpoints

Set a breakpoint at a line


gdb                 lldb
break foo.c:12      breakpoint set [ -f foo.c ] -l 12

24 Stepping

Stepping through a program


gdb / lldb   meaning
run          start a run
cont         continue from breakpoint
next         next statement on same level
step         next statement, this level or next

Memory debugging

25 Program with problems

tutorials/gdb/c/square1.c

#include <stdlib.h>
#include <stdio.h>
//codesnippet gdbsquare1c
int main(int argc,char **argv) {
  int nmax,i;
  float *squares,sum;

  fscanf(stdin,"%d",&nmax);
  squares = (float*) malloc(nmax*sizeof(float));
  for (i=1; i<=nmax; i++) {
    squares[i] = 1./(i*i);
    sum += squares[i];
  }
  printf("Sum: %e\n",sum);
//codesnippet end

  return 0;
}

26 Valgrind output

%% valgrind square1
==53695== Memcheck, a memory error detector
==53695== [stuff]
10
==53695== Invalid write of size 4
==53695== at 0x100000EB0: main (square1.c:10)
==53695== Address 0x10027e148 is 0 bytes after a block of si
==53695== at 0x1000101EF: malloc (vg_replace_malloc.c:236)
==53695== by 0x100000E77: main (square1.c:8)
==53695==
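
The invalid write "0 bytes after a block" corresponds to `squares[nmax]`: the loop runs from 1 to `nmax` inclusive while the block holds only `nmax` elements. The accumulator `sum` is also never initialized. One possible repair is sketched here as a hypothetical function `sum_inverse_squares`. It keeps the mathematical index `i` from 1 to `nmax` but stores at offset `i-1`. Allocating `nmax+1` elements would be an alternative.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum of 1/i^2 for i = 1 .. nmax, stored in a block of exactly nmax
   floats. Returns -1.0f if the allocation fails. */
float sum_inverse_squares(int nmax) {
    assert(nmax > 0);
    float *squares = malloc((size_t)nmax * sizeof *squares);
    if (squares == NULL)
        return -1.0f;
    float sum = 0.0f;               /* the listing leaves this unset */
    for (int i = 1; i <= nmax; i++) {
        /* index i-1 stays inside the block; multiplying as float
           also avoids integer overflow of i*i for large i */
        squares[i - 1] = 1.0f / ((float)i * (float)i);
        sum += squares[i - 1];
    }
    free(squares);
    return sum;
}
```

Running a program built around this version under valgrind is a reasonable way to check that the invalid write no longer occurs.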

Parallel Debugging

27 Debugging

I assume you know about gdb and valgrind...

• Interactive use of gdb, starting up multiple xterms:
  feasible on small scale
• Use gdb to inspect dump:
  can be useful, but often a program crashes hard and leaves no dump

Note: compile options -g -O0

28 Parallel debuggers

29 Buggy code

for (it=0; ; it++) {
  double randomnumber = ntids * ( rand() / (double)RAND_MAX )
  printf("[%d] iteration %d, random %e\n",mytid,it,randomnumb
  if (randomnumber>mytid && randomnumber<mytid+1./(ntids+1))
    MPI_Finalize();
  MPI_Barrier(comm);
}

30 Parallel inspection

31 Stack trace

32 Variable inspection
