24. Profiling II
Lecture 24
April 15, 2024
Gprof
"gprof: A Call Graph Execution Profiler", by S. Graham, P. Kessler, and M. McKusick; Proceedings of the SIGPLAN '82 Symposium on Compiler Construction, SIGPLAN Notices, Vol. 17, No. 6, pp. 120-126, June 1982.
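gprof builds its call graph from two kinds of data. Code inserted by compiling with `gcc -pg` counts calls on every caller-to-callee arc, and sampling of the program counter estimates each function's self time. gprof then charges each callee's time to its callers in proportion to how many of the calls each caller made. The C sketch below shows only that propagation step. It assumes an acyclic graph whose functions are numbered so that every caller comes before its callees; real gprof first collapses cycles into a single node. The names `arc_t` and `propagate_times` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FUNCS 64

/* One call-graph arc: `caller` invoked `callee` `count` times. */
typedef struct {
    int caller;
    int callee;
    long count;
} arc_t;

/*
 * Sketch of gprof-style time propagation on an acyclic call graph.
 * self[i]  : sampled self time of function i
 * total[i] : self[i] plus, for each callee c, the fraction of total[c]
 *            equal to i's share of all calls into c.
 * Assumption: every caller has a smaller index than its callees, so
 * walking indices from high to low visits callees before callers.
 */
void propagate_times(int n, const double *self,
                     const arc_t *arcs, size_t narcs, double *total)
{
    long calls_into[MAX_FUNCS] = {0};
    assert(n > 0 && n <= MAX_FUNCS);

    /* Total number of calls received by each function. */
    for (size_t k = 0; k < narcs; k++) {
        assert(arcs[k].caller >= 0 && arcs[k].caller < arcs[k].callee);
        assert(arcs[k].callee < n && arcs[k].count > 0);
        calls_into[arcs[k].callee] += arcs[k].count;
    }

    for (int i = n - 1; i >= 0; i--) {
        total[i] = self[i];
        for (size_t k = 0; k < narcs; k++) {
            if (arcs[k].caller != i)
                continue;
            int c = arcs[k].callee;
            /* charge i with the fraction of c's calls that came from i */
            total[i] += total[c] * (double)arcs[k].count / (double)calls_into[c];
        }
    }
}
```

For example, suppose main (0) calls A (1) and B (2) once each, A calls C (3) three times, B calls C once, and C has 4 s of self time. C's time is then split 3:1, so A is charged 3 s and B is charged 1 s. To get the real report, compile with `gcc -pg`, run the program so that it writes `gmon.out`, and then run `gprof ./prog gmon.out`.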
Configuration difference?
[Figure: IMB Reduce (NPROCS = 4)]
[Figure: Communication Matrix (IMB Reduce, NPROCS = 4)]
[Figure: IMB Reduce (NPROCS = 8, 1 host)]
[Figure: IMB Reduce (NPROCS = 16, 2 hosts)]
[Figure: IMB Gather (NPROCS = 32, 4 hosts)]
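A communication matrix records how many bytes each rank sent to every other rank. The C sketch below builds one. To fill it for a reduce, it assumes a binomial-tree reduction rooted at rank 0, which is one common algorithm. The MPI library behind the IMB runs may use a different one. It also assumes block placement, with `ranks_per_host` consecutive ranks on each host, so that intra-host and inter-host traffic can be separated. All names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_RANKS 64

typedef struct {
    int nprocs;
    long bytes[MAX_RANKS][MAX_RANKS];   /* bytes[src][dst] */
} comm_matrix_t;

void comm_matrix_init(comm_matrix_t *m, int nprocs)
{
    assert(nprocs > 0 && nprocs <= MAX_RANKS);
    memset(m, 0, sizeof *m);
    m->nprocs = nprocs;
}

/* What a PMPI send wrapper would call for every outgoing message. */
void comm_matrix_record(comm_matrix_t *m, int src, int dst, long nbytes)
{
    assert(src >= 0 && src < m->nprocs && dst >= 0 && dst < m->nprocs);
    m->bytes[src][dst] += nbytes;
}

/* Assumed algorithm: binomial-tree reduce to rank 0. Rank r sends its
   partial result once, at the step given by its lowest set bit. */
void comm_matrix_binomial_reduce(comm_matrix_t *m, long msg_bytes)
{
    for (int r = 1; r < m->nprocs; r++) {
        int lowbit = r & -r;
        comm_matrix_record(m, r, r - lowbit, msg_bytes);
    }
}

/* Bytes crossing host boundaries, assuming block rank placement. */
long comm_matrix_inter_host(const comm_matrix_t *m, int ranks_per_host)
{
    long total = 0;
    assert(ranks_per_host > 0);
    for (int s = 0; s < m->nprocs; s++)
        for (int d = 0; d < m->nprocs; d++)
            if (s / ranks_per_host != d / ranks_per_host)
                total += m->bytes[s][d];
    return total;
}
```

With NPROCS = 4, this model fills the matrix entries 1→0, 3→2, and 2→0. With NPROCS = 16 on 2 hosts (8 ranks per host), only rank 8's message to rank 0 crosses hosts.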
Darshan Internals
• Intercepts MPI-IO routines using the PMPI profiling interface
• Data recorded on each process at run time, then merged and stored during MPI_Finalize
• MPI_Wtime() collects timing information
• In-memory file record
  – Array of counters for I/O calls
  – Frequency count of common access sizes
• Dynamic linking at runtime
  – LD_PRELOAD enables overriding library functions
• Static linking at compile time
  – Inserting wrapper functions
  – Linker --wrap option
• Time overhead
  – MPI_Wtime() call: 165 ns
  – Function wrapping: 14 ns *
• Memory overhead
  – File record limit: 2 MB per process
  – Aggregate statistics beyond the limit
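The in-memory file record described in the slide can be sketched as a fixed-size table. Each file gets counters for its I/O calls and a frequency count of the access sizes seen. Once the table is full, later files are folded into a single aggregate record. This C sketch follows the slide's description rather than Darshan's source. The type and function names and the tiny limits are hypothetical, and Darshan's real record layout and overflow handling may differ. In the real tool, something like `record_access` would be called from a PMPI or `--wrap` wrapper after it forwards the call and timestamps it with MPI_Wtime().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical small limits so the overflow path is easy to exercise.
   The slide states Darshan's limit in bytes (2 MB of records per process). */
#define MAX_RECORDS 4
#define MAX_SIZES   8
#define NAME_LEN    64

typedef struct {
    char name[NAME_LEN];         /* file path, or "<aggregate>" for overflow */
    long reads, writes;          /* counters for I/O calls */
    long sizes[MAX_SIZES];       /* distinct access sizes observed */
    long size_hits[MAX_SIZES];   /* how often each size occurred */
    int nsizes;
} file_record_t;

typedef struct {
    file_record_t recs[MAX_RECORDS];
    int nrecs;
    file_record_t aggregate;     /* files seen after the table filled up */
} record_table_t;

void table_init(record_table_t *t)
{
    memset(t, 0, sizeof *t);
    strcpy(t->aggregate.name, "<aggregate>");
}

/* Find or create the record for `path`; fall back to the aggregate once full. */
static file_record_t *lookup(record_table_t *t, const char *path)
{
    for (int i = 0; i < t->nrecs; i++)
        if (strcmp(t->recs[i].name, path) == 0)
            return &t->recs[i];
    if (t->nrecs < MAX_RECORDS) {
        file_record_t *r = &t->recs[t->nrecs++];
        strncpy(r->name, path, NAME_LEN - 1);   /* last byte stays 0 from memset */
        return r;
    }
    return &t->aggregate;
}

/* Update counters and the access-size frequency table for one I/O call. */
void record_access(record_table_t *t, const char *path, int is_write, long size)
{
    file_record_t *r = lookup(t, path);
    if (is_write)
        r->writes++;
    else
        r->reads++;

    for (int i = 0; i < r->nsizes; i++) {
        if (r->sizes[i] == size) {
            r->size_hits[i]++;
            return;
        }
    }
    if (r->nsizes < MAX_SIZES) {          /* new size: start counting it */
        r->sizes[r->nsizes] = size;
        r->size_hits[r->nsizes++] = 1;
    }                                     /* else: size table full, not tracked */
}

long most_common_size(const file_record_t *r)
{
    assert(r->nsizes > 0);
    int best = 0;
    for (int i = 1; i < r->nsizes; i++)
        if (r->size_hits[i] > r->size_hits[best])
            best = i;
    return r->sizes[best];
}
```

The fixed capacity keeps memory overhead bounded regardless of how many files an application opens. The cost is that per-file detail is lost for files beyond the limit.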
int xStart = ____________________,
    yStart = ____________________,
    zStart = ____________________;
int xEnd = ____________________,
    yEnd = ____________________,
    zEnd = ____________________;