Introduction to High-Performance Computing
Contents
http://www.uni-oldenburg.de/fk5/wr/aktuelles/artikel/art/neue-hochleistungsrechner-fuer-die-universitaet-oldenburg-2380/
• CARL
– multi-purpose cluster as a basic computing resource
– funded by the University/MWK and the DFG under grant number INST 184/157-1 FUGG (major research instrumentation according to Art. 91b GG)
• EDDY
– CFD cluster for wind energy research
– funded by the BMWi under grant number 0324005
http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Acknowledging_the_HPC_facilities_2016
• both clusters
– Linux (RHEL) as OS
– many scientific applications and libraries available
– job scheduler (SLURM)
https://wiki.hpcuser.uni-oldenburg.de/index.php?title=HPC_Facilities_of_the_University_of_Oldenburg_2016
• compared to the previous clusters, CARL and EDDY together provide 17x Rmax, 1.6x nodes, 3.3x cores, and 10x RAM
• HERO/FLOW (previous clusters)
– Rmax 27.2 Tflop/s
– not ranked
– 350 nodes
– 4,000 cores
– 10 TB RAM
• 128/158x MPC-LOM/STD
– multiple nodes per chassis
– 2x Intel Xeon E5-2650 v4
– 12 cores @ 2.2 GHz
– 128/256 GB RAM (8x16/32)
– 1 TB HDD
• 9x MPC-GPU
– as MPC-STD
– NVIDIA Tesla GPU
• 30x MPC-BIG
– 2x Intel Xeon E5-2667 v4
– 8 cores @ 3.2 GHz
– 512 GB RAM (16x32)
• 2x MPC-PP
– 4x Intel Xeon E7-8891 v4
– 10 cores @ 2.8 GHz
– 2048 GB RAM (64x32)
• both with
– 1x Intel P3700 2.0 TB NVMe Flash Adapter with up to 2.8/2.0 GB/s read/write performance
– free PCIe slots for future expansion
• 160/81x CFD-LOM/HIM
– multiple nodes per chassis
– 2x Intel Xeon E5-2650 v4
– 12 cores @ 2.2 GHz
– 64/128 GB RAM (8x8/16)
• 3x CFD-GPU
– as MPC-STD
– NVIDIA Tesla GPU
Interconnect
• FDR InfiniBand
– 54 Gbit/s data transfer rate (throughput)
– latency 0.7 µs
– message rate 137 million msg/s
– switched fabric topology (fat tree)
Parallel Filesystem
User Environment
• Linux
– use ssh as before with carl or eddy as login nodes
ssh -X [email protected]
• Windows
– use MobaXterm (recommended) or PuTTY
File Systems
http://wiki.hpcuser.uni-oldenburg.de/index.php?title=File_system_and_Data_Management
• Home ($HOME, /user/abcd1234)
– critical data that cannot easily be reproduced (program codes, initial conditions, results from data analysis)
• Data ($DATA, /gss/data/abcd1234)
– important data from simulations for on-going analysis and long-term (project duration) storage
• Work ($WORK, /gss/work/abcd1234)
– data storage during simulation runtime, pre- and post-processing, short-term (weeks) storage
• Scratch ($TMPDIR, /scratch/<job-dir>)
– temporary data storage during job runtime
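• a hedged sketch of how these locations are typically combined (the directory name myproject is hypothetical): bulky intermediate data stays in $WORK, results worth keeping move to $DATA, and codes stay in $HOME
# inspect intermediate data from a running simulation in Work
ls $WORK/myproject
# move results needed for the duration of the project to Data
mkdir -p $DATA/myproject
cp -r $WORK/myproject/results $DATA/myproject/
# program codes and analysis scripts stay in Home (critical, hard-to-reproduce data)
ls $HOME/myproject/src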
Quotas
https://wiki.hpcuser.uni-oldenburg.de/index.php?title=File_system_and_Data_Management#Quotas
Modules
Module Commands
https://wiki.hpcuser.uni-oldenburg.de/index.php?title=User_environment_-_The_usage_of_module_2016
• find modules
module avail [module-name]
module spider [module-name]
– list all modules [matching the given module name]
– spider is case-insensitive and understands regular expressions
• load/unload
module load <module-name>
module unload <module-name>
– to return to the default state
module restore
– a typical workflow is sketched below
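• a minimal sketch of that workflow (the module name netCDF/4.4.1 is only an assumption; real names and versions come from module avail or module spider):
module spider netcdf          # search case-insensitively for NetCDF modules
module load netCDF/4.4.1      # hypothetical name/version, check the spider output
module list                   # show what is currently loaded
module unload netCDF/4.4.1    # unload a single module again
module restore                # return to the default set of modules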
Toolchains
https://wiki.hpcuser.uni-oldenburg.de/index.php?title=Toolchains
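• as a rough sketch, a toolchain is a single module that provides a matching compiler, MPI library, and math libraries (the name intel/2016b is an assumption; the actual toolchain names are listed by module avail and on the wiki page above):
module load intel/2016b                    # hypothetical toolchain: compiler + MPI + math libraries
mpicc -o my_mpi_program my_mpi_program.c   # compiler wrappers from the toolchain are now on the PATH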
SGE vs SLURM Job Script
• compared to the old SGE scripts, some options have changed syntax, some are no longer needed, and some are new (see the example script below)
• a translation table for SGE, SLURM, and other batch systems: https://slurm.schedmd.com/rosetta.pdf
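• a minimal single-core job script as a hedged sketch (the resource values and the executable name myprogram are assumptions; the options required on CARL/EDDY are documented in the wiki):
#!/bin/bash
#SBATCH --ntasks=1              # a single task (one core)
#SBATCH --mem-per-cpu=2G        # memory per core
#SBATCH --time=0-02:00          # wall time limit (d-hh:mm)
#SBATCH --output=myjob.%j.out   # stdout/stderr file, %j expands to the job ID

# load the modules the program was built with before running it
./myprogram                     # hypothetical executable
• submit with sbatch myjob.job and monitor with squeue -u $USER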
Partitions
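• partitions take over the role of the SGE queues; a quick way to inspect and select them (the name mpcg.p is taken from the GPU overview below, the full list comes from sinfo):
sinfo                           # list all partitions with their nodes and states
squeue -p mpcg.p                # jobs currently in one partition
• in a job script a partition is requested with
#SBATCH --partition=mpcg.p      # request a specific partition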
Parallel Jobs
• MPI programs are started with srun inside the job allocation
• example: see the sketch below
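• a hedged sketch of an MPI job (the task count, the toolchain comment, and the executable name my_mpi_program are assumptions):
#!/bin/bash
#SBATCH --ntasks=48             # total number of MPI ranks
#SBATCH --mem-per-cpu=2G        # memory per rank
#SBATCH --time=0-06:00          # wall time limit (d-hh:mm)

# load the same toolchain/MPI module the program was compiled with
srun ./my_mpi_program           # srun starts one process per allocated task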
Job Arrays
Do not use sbatch in for-loops; such a loop can always be transformed into a job array (see the sketch below).
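• a hedged sketch of a job array replacing a loop over 100 sbatch calls (the input file naming scheme is an assumption):
#!/bin/bash
#SBATCH --ntasks=1               # each array task is a single-core job here
#SBATCH --time=0-01:00           # wall time limit per array task
#SBATCH --array=1-100            # 100 tasks with indices 1..100
#SBATCH --output=run.%A_%a.out   # %A = array job ID, %a = array index

# each task selects its own input file via the array index
./myprogram input_${SLURM_ARRAY_TASK_ID}.dat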
Using $TMPDIR
• the local disks are intended for temporary files from I/O-intensive applications
– $TMPDIR will be deleted after the job is finished
• example
. . .
#SBATCH --gres=tmpdir:500G
. . .
# change to TMPDIR
cd $TMPDIR
# create a lot of data
./mysim > largeDataFile
# post-processing reduces data file
./mypp < largeDataFile > $SLURM_SUBMIT_DIR/smallResultFile
Using GPUs
• a total of 12 identical GPU nodes are available
– 9x MPC-GPU (mpcg.p) and 3x CFD-GPU (cfdg.p)
– one Tesla P100 GPU per node
• modules are available for
– the CUDA SDK (examples and more)
– cuBLAS and cuFFT
• an example job request is sketched below
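• a hedged sketch of a GPU job request (the --gres=gpu:1 syntax and the CUDA module name are assumptions; the exact spelling for CARL/EDDY is in the wiki):
#!/bin/bash
#SBATCH --partition=mpcg.p      # GPU partition (mpcg.p on CARL, cfdg.p on EDDY)
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1            # request the Tesla P100 of the node (gres name assumed)
#SBATCH --time=0-02:00          # wall time limit (d-hh:mm)

module load CUDA                # module name assumed, check module avail/spider
./my_gpu_program                # hypothetical CUDA executable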
Migration of Data
• the new cluster has a new $HOME directory
– central ISILON storage system
• links to the old $HOME and /data/work allow you to copy data
– read-only on the new cluster (as long as the old cluster is accessible)
– please clean up your data before copying
$ pwd
/user/abcd1234
$ ls
… old_home_abcd1234 old_work_abcd1234 …
$ cp -ar old_home_abcd1234/somedir .
– eventually you should delete all data in the old_* directories (on a time scale of about 6 months)
MATLAB
https://wiki.hpcuser.uni-oldenburg.de/index.php?title=Configuration_MDCS_2016
• transition
– apart from the new version and the integration, nothing has changed
– the license transition will happen step by step (eventually, 32 licenses will remain on the old system)
• contact us
– e-mail: [email protected] (reaches everyone listed below and the IT services)
– Stefan Harfst: [email protected]
– Julius Brunken: [email protected]
– Wilke Trei: [email protected]