This tutorial discusses extracting data from files containing molecular dynamics simulation runtime results and plotting the data using gnuplot. The tutorial instructs the user to:
1) Plot the runtime over 200 simulation steps for a single-processor file to become familiar with the data format.
2) Plot the slowest runtime per time step from a multi-processor file to view performance from a user's perspective.
3) Plot the average runtime of the slowest process versus the number of processors to analyze scaling from a high performance computing viewpoint.
4) Plot the average runtime and its standard deviation versus the number of processors to analyze the load imbalance between processors, as a basis for improving the parallelization algorithm.
Tobias Neckel
Max-Planck, October 2013
Bash course - Tutorial 4
This tutorial is about extracting data from files and plotting it. This is the last tutorial on bash; you might use spare time to work on the first three tutorials as well.
Simulation runtime results
On the course website, you find a .tgz file containing runtime results of a parallel molecular dynamics (MD) simulation. The .tgz file contains a number of folders. The folders are named as follows: jg yyyymmdd hhmm CO2OPPKD????. The first part gives the date and time of the simulation, followed by the material simulated (CO2), a shortcut for the simulated scenario, and the parallelisation algorithm which has been used. For this tutorial, only the last part (????) is important as it contains the number of CPUs which have been used in the simulation. Each folder contains only a single file with 200 lines, where each line corresponds to one time step of the simulation. Each line has as many entries as processors were used for the simulation. Each entry is the time consumed by the corresponding process for the force calculation in the MD simulation.

Get familiar with the data and gnuplot. E.g., take the file for a single processor and plot the runtime for the 200 simulation steps (so the x-axis is the number of steps and the y-axis the runtime).

From a user's point of view, only the runtime of the slowest processor is relevant, as this is the time the user has to wait for results. Use one of the files that show the results for several processors and plot for each time step the slowest one.

From a high performance computing point of view, it is important to find out how an algorithm scales with growing process number. So for the x-axis, you should use the number of processors, for the y-axis the average runtime (averaged over the 200 steps) of the slowest process (important: in each step, some other process can be the slowest one).

If the parallelisation algorithm shall be improved, the load imbalance between the processors has to be known. Thus, instead of plotting just the slowest runtime, plot the average runtime of all processors and the standard deviation of the runtime.
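One possible way to extract the slowest runtime per time step is sketched below. It is not the course's reference solution. The file name sample_runtimes.dat and its three lines of data are made up for illustration, and whitespace-separated columns are an assumption about the file format. The gnuplot call is skipped if gnuplot is not installed.

```shell
#!/usr/bin/env bash
# Hypothetical sample: 3 time steps (lines) x 4 processes (columns).
# The real files have 200 lines; whitespace-separated entries are assumed.
cat > sample_runtimes.dat <<'EOF'
0.50 0.70 0.60 0.40
0.90 0.30 0.20 0.10
0.25 0.35 0.45 0.40
EOF

# For every line (time step), print the step number and the largest entry.
# Adding 0 forces a numeric comparison instead of a string comparison.
awk '{
    max = $1 + 0
    for (i = 2; i <= NF; i++) if ($i + 0 > max) max = $i + 0
    print NR, max
}' sample_runtimes.dat > slowest.dat

cat slowest.dat

# Plot step number (column 1) against slowest runtime (column 2),
# as ASCII art in the terminal, only if gnuplot is available.
if command -v gnuplot >/dev/null 2>&1; then
    gnuplot -e "set terminal dumb; set xlabel 'time step'; set ylabel 'runtime'; plot 'slowest.dat' using 1:2 with linespoints title 'slowest process'"
fi
```

For a single-processor file no preprocessing is needed: inside gnuplot, plot 'FILE' using 0:1 with lines plots the only column against gnuplot's zero-based line index, where FILE stands for the path of the file.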