Simplified Supercomputer Architecture Slide
Diagram: Login node; File system; Network; Node 01; Node 02; Node 03; Node 04
4 nodes; 48 cores; Global file system; Connected by network; Controlled by Login node
Serial Tasks
SAMtools, Perl or Python scripts, R scripts, and other non-parallel code
Diagram: Login node; File system; Network; Node 01; Node 02; Node 03; Node 04
Other 11 CPUs sit idle; Task takes longer; You are charged for all 12 CPUs
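The cost of a serial task on a whole-node allocation comes down to simple arithmetic. The sketch below is illustrative only: it assumes 12 cores per node as on the slide, a hypothetical 10-hour job, and a made-up function name. Real charging policies vary by site.

```python
def serial_job_usage(wall_hours, cores_per_node=12, cores_used=1):
    """Return (charged core-hours, useful core-hours, utilization).

    Assumes the whole node is charged for the full wall time,
    as the slide describes.
    """
    charged = wall_hours * cores_per_node  # every core on the node is billed
    useful = wall_hours * cores_used       # only one core does any work
    return charged, useful, useful / charged


# Hypothetical 10-hour serial job on one 12-core node.
charged, useful, utilization = serial_job_usage(10)
print(f"charged {charged} core-hours, used {useful}, utilization {utilization:.1%}")
```

With these assumed numbers, only 1 core in 12 (about 8%) of the charged time does useful work.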
On-node Parallelism
BWA, Bowtie, Velvet, etc. use pthreads or OpenMP to implement threads
Diagram: Login node; File system; Network; Node 01; Node 02; Node 03; Node 04
Makes use of all CPUs; Your task can run up to 12x faster*
Caveats
1. Do you have enough RAM to run all 12 threads?
2. Is the code actually more efficient with additional threads?
3. What happens if you need more than 12 CPUs?
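The first two caveats can be roughed out numerically. The sketch below is not from the slides. It uses Amdahl's law, a standard model of parallel speedup, to show why "up to 12x" is an upper bound, and a simple memory check for the thread count. The function names, the 24 GB node, and the 3 GB per thread are hypothetical. The memory check also assumes that memory use grows roughly linearly with thread count, which is not true of every tool.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def threads_that_fit(node_ram_gb, ram_per_thread_gb, cores=12):
    """Largest thread count allowed by both core count and node RAM."""
    by_memory = int(node_ram_gb // ram_per_thread_gb)
    return max(1, min(cores, by_memory))


# Speedup on 12 threads for code that is 100%, 95%, and 50% parallel.
for fraction in (1.0, 0.95, 0.5):
    print(fraction, round(amdahl_speedup(fraction, 12), 2))

# Hypothetical node with 24 GB RAM and a tool needing 3 GB per thread.
print(threads_that_fit(24, 3))
```

Under these assumptions, only perfectly parallel code reaches 12x. A 95% parallel code gets under 8x, and the memory limit would cap the run at 8 threads regardless.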
Diagram: Node 01; Node 02; Node 03; Node 04
You can make use of all available CPUs; Tasks can run up to 48x faster!
Caveats
1. Really good networking is needed because tasks communicate with one another
2. Algorithm must be written to explicitly make use of MPI. Most bioinformatics code is NOT L
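To give a sense of what "written to explicitly make use of MPI" involves, here is a minimal sketch of one common ingredient: each process (rank) works out which slice of the data it owns. This is plain Python with no MPI library. In a real MPI program the rank and size would come from the MPI runtime, and the ranks would also exchange messages, which this sketch omits. The function name and the 1000-item workload are hypothetical.

```python
def block_range(rank, size, n_items):
    """Half-open [start, stop) slice of n_items owned by this rank.

    Items are split as evenly as possible; the first `extra` ranks
    each take one additional item.
    """
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


size = 48       # 4 nodes x 12 cores, as on the slides
n_items = 1000  # hypothetical number of work items
ranges = [block_range(rank, size, n_items) for rank in range(size)]
print(ranges[0], ranges[-1])
```

The point is that the decomposition has to be designed into the program. Code that was written as a single serial loop does not gain this behaviour by being launched on more nodes.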
Diagram: Node 01; Node 02; Node 03; Node 04
You can make use of all available CPUs; Tasks can run up to 48x faster!
Caveats
1. Shared file system is needed for this approach
2. You have to figure out how to shoe-horn your algorithm into this model
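One reading of this model, offered here as an assumption rather than something the slide states, is a set of independent serial tasks spread across all 48 cores, with inputs and outputs living on the shared file system. The sketch below builds one command per input file and deals the commands out round-robin. The script name process_sample.py, the file names, and both function names are hypothetical.

```python
def build_task_list(inputs, command="python process_sample.py"):
    """One independent command line per input file (command is hypothetical)."""
    return [f"{command} {path}" for path in inputs]


def assign_round_robin(tasks, n_nodes=4, slots_per_node=12):
    """Deal tasks out across every core slot in the allocation."""
    n_slots = n_nodes * slots_per_node
    slots = [[] for _ in range(n_slots)]
    for index, task in enumerate(tasks):
        slots[index % n_slots].append(task)
    return slots


# Hypothetical workload: 100 input files on the shared file system.
inputs = [f"sample_{i:03d}.fastq" for i in range(100)]
slots = assign_round_robin(build_task_list(inputs))
print(len(slots), len(slots[0]), len(slots[-1]))
```

The "shoe-horn" step is the splitting itself: the work has to be cut into pieces that do not depend on one another, so each command can run without talking to the others.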
Diagram: Node 01; Node 02; Node 03; Node 04
Upcoming Intel Xeon Phi systems (2013 / Not to scale) Notice anything different?