Cluster Computing at PIK
a tutorial
Ciaron Linstead
10th May 2016
Outline
1 Introduction
2 Environment Modules
3 SLURM
4 Python
5 Documentation
6 Questions
Ciaron Linstead IT Services 2
Introduction
Cluster configuration
Environment modules
SLURM - the workload scheduler
Create, submit and monitor jobs
Anaconda Python environment
Documentation
Introduction
Logged into cluster?
Run a Python, R, shell script?
Compiled and run C, C++, Fortran?
Submitted jobs via SLURM?
Comfortable on the command line (cd, ls, mkdir)?
Downloaded, compiled and installed third-party software on Linux?
Cluster configuration
Servers
Compute: 312x (2x Intel Xeon E5-2667v3)
RAM+: 4x (Compute w/ 256GB RAM, 16GB per core)
GPU: 2x (RAM+ w/ NVidia Tesla K40c)
Login: 2x (2x E5-2690v3 (12-core), 256GB RAM)
Visualisation: 2x (2x E5-2680v3 (12-core), 256GB RAM)
Interconnect
Mellanox FDR Connect-IB Infiniband (non-blocking)
3.5x throughput compared to iPlex Infiniband
0.25x latency
Filesystem
2 petabyte GPFS (parallel) filesystem storage
/p/projects, /p/tmp, /p/system
Compute node configuration
2x Intel Xeon E5-2667v3 "Haswell" CPUs
8 cores per CPU, 16 cores total
64GB DRAM: TruDDR4 2133MHz
(4GB per core)
No local disk
Compute node - RAM+
Regular compute nodes with 256GB memory
16GB per core
4 available
priority to large-memory jobs (but can run regular jobs)
--partition=ram_gpu
Compute node - GPU
Regular + 256GB RAM + Nvidia Tesla K40c GPU
2 available
1.66 TFlops per card (plus the 16 CPU cores)
CUDA/OpenCL interface for programming
--partition=ram_gpu --gres=gpu:1
Environment Modules - motivation
Have a look at the search path: echo $PATH
when you run "command", the shell goes through this list
shell runs the first one it finds
what if I want multiple versions?
same goes for library paths, e.g. NetCDF, MPI
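The lookup described above can be modelled in a few lines of Python (a simplified sketch of the shell's behaviour; real shells also consult builtins, aliases, and a command hash):

```python
import os

def which(command, path=None):
    """Return the first match for `command` on a colon-separated search
    path, the way the shell resolves a command name."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, command)
        # the shell runs the first executable file it finds
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None  # "command not found"

print(which("ls"))
```

Because the first match wins, prepending a directory to $PATH (which is exactly what `module load` does) changes which version of a program runs.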
Environment Modules
list software: "module avail"
-------------------- /p/system/modulefiles/compiler -----------------------------------
compiler/gnu/4.9.2 compiler/gnu/5.2.0 compiler/intel/15.0.3 compiler/intel/16.0.0
"module load compiler/intel/16.0.0" - load one
"module load compiler/intel" - load without naming a version
15.0.3 or 16.0.0? or 17.0.0? careful!
"module list" - what's loaded?
"module unload compiler/intel/16.0.0" - unload one
"module purge" - unload all
"module show compiler/intel/16.0.0" - what's being done?
modules - an example modulefile
module show nco/4.5.0
--------------------------------------------------------------
/p/system/modulefiles/tools/nco/4.5.0:
module-whatis Enable usage for nco version 4.5.0
setenv NCOROOT /p/system/packages/nco/4.5.0
prepend-path PATH /p/system/packages/nco/4.5.0/bin
prepend-path INCLUDE /p/system/packages/nco/4.5.0/include
prepend-path LD_LIBRARY_PATH /p/system/packages/nco/4.5.0/lib
prepend-path MANPATH /p/system/packages/nco/4.5.0/share/man
--------------------------------------------------------------
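The effect of `prepend-path` above can be modelled in Python (a minimal sketch; the modules system itself interprets Tcl modulefiles, not Python):

```python
import os

def prepend_path(env, var, value):
    """Mimic a modulefile's prepend-path: put `value` at the front of a
    colon-separated variable, creating the variable if it is unset."""
    old = env.get(var)
    env[var] = value if not old else value + os.pathsep + old

env = {"PATH": "/usr/bin:/bin"}
prepend_path(env, "PATH", "/p/system/packages/nco/4.5.0/bin")
print(env["PATH"])  # /p/system/packages/nco/4.5.0/bin:/usr/bin:/bin
```

Prepending rather than appending is what makes the loaded module's binaries win the $PATH search; `module unload` reverses these edits.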
Environment Modules
infrastructure loaded automatically
modules in /p/system/modulefiles, organised by category
add your own via $MODULEPATH
"export MODULEPATH=$MODULEPATH:$HOME/modulefiles"
add this to your .bashrc file
Exercise - Let's add a custom module!
Download and build a library
Install in our $HOME directory
Write a modulefile
recommendation: install in /some/path/<compiler>/package_name/version/
e.g. /home/linstead/software/intel/gmp/6.1.0
Prep: cp -r /home/linstead/phd16/ $HOME
Hints - build
cd
mkdir -p software/gmp/6.1.0 && cd software/gmp/6.1.0
tar xvf $HOME/phd16/gmp-6.1.0.tar.xz
cd gmp-6.1.0
module load compiler/gnu/5.2.0
./configure --prefix=$HOME/software/gmp/6.1.0
make && make install
Hints - module
mkdir -p $HOME/modulefiles/gmp
cp /home/linstead/modulefiles/gmp/6.1.0 $HOME/modulefiles/gmp
edit $HOME/modulefiles/gmp/6.1.0 to match your installation
SLURM - the workload scheduler
Simple Linux Utility for Resource Management (SLURM)
SLURM manages and allocates cluster resources
programs (typically) submitted as jobs to a queue
"sbatch myscript.sh"
myscript.sh is a regular script, plus SLURM info
SLURM - a simple submit script
#!/bin/bash

#SBATCH --partition=standard
#SBATCH --qos=short
#SBATCH --job-name=sumprimes
#SBATCH --output=sumprimes-%j.out
#SBATCH --error=sumprimes-%j.err
#SBATCH --account=its
#SBATCH --ntasks=1

$HOME/mycode/sumprimes 1 10000000001
submit with sbatch <filename>
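When many similar jobs are needed, a header like the one above can be generated rather than hand-edited. A minimal sketch (the option names come from the script above; the helper function itself is hypothetical):

```python
def make_submit_script(command, job_name="sumprimes", account="its",
                       qos="short", partition="standard", ntasks=1):
    """Build the text of an sbatch script like the example above."""
    options = {
        "partition": partition,
        "qos": qos,
        "job-name": job_name,
        "output": f"{job_name}-%j.out",
        "error": f"{job_name}-%j.err",
        "account": account,
        "ntasks": ntasks,
    }
    lines = ["#!/bin/bash", ""]
    lines += [f"#SBATCH --{key}={value}" for key, value in options.items()]
    lines += ["", command]
    return "\n".join(lines) + "\n"

script = make_submit_script("$HOME/mycode/sumprimes 1 10000000001")
print(script)
```

Write the result to a file and submit it with `sbatch <filename>` as usual.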
SLURM - the workload scheduler
--partition - physical nodes ("standard" or "ram_gpu")
"sinfo"
--qos - job type, determines limits and thus priority
"sacctmgr show qos" (or see my alias ssq)
interesting fields: Name, Priority, MaxWall, GrpTRES, MaxTRES
--job-name
distinguish between your jobs in the queue
%j (Job ID) gives jobs unique filenames
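The %j substitution is simple to illustrate: SLURM replaces it with the numeric job ID when creating the output files (a sketch of the substitution, not SLURM's own code):

```python
def expand_filename(pattern, job_id):
    """Expand SLURM's %j (job ID) placeholder in an --output/--error pattern."""
    return pattern.replace("%j", str(job_id))

# job 321608 from the scontrol example later in this tutorial:
print(expand_filename("sumprimes-%j.out", 321608))  # sumprimes-321608.out
```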
SLURM - the workload scheduler
--output/--error
location of standard output and error (e.g. "print" statements)
omit "error" and STDERR will go to "output"
--account
the project this job relates to
see "groups" command for the projects you belong to
--ntasks
number of copies of this program to run
--ntasks=1 for serial (non-MPI, non-parallel) jobs
SLURM - parallel (MPI) submit script
as previous example, with these differences:
#SBATCH --ntasks=128

module purge
module load mpi/intel/5.1.3
# run parallel code with mpirun
mpirun -bootstrap slurm -n $SLURM_NTASKS $HOME/mycode/sumprimes 0 10000000000
SLURM - the workload scheduler
--ntasks=128
give me 128 processor cores, I don't care where
SLURM will attempt to pack sockets and nodes
for performance, I may require packing/blocking
--nodes=8
--tasks-per-node=16
or
--nodes=16
--tasks-per-node=8
(but see next slide!)
SLURM - the workload scheduler
--nodes=16 and --tasks-per-node=8
uses half the cores on each node
the other half are available for other users
implications for memory/disk bandwidth
--exclusive gets you the whole node
8GB RAM per task with 16/8 above
up to 64GB per task on standard nodes
use sparingly!
see also --cpus-per-task
SLURM examples - threaded (OpenMP) submission script
OpenMP.sh
#!/bin/bash
# (options omitted for brevity)
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16

export OMP_NUM_THREADS=16
# OR export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

$HOME/mydir/myprog.exe
# OR srun $HOME/mydir/myprog.exe
submit with sbatch OpenMP.sh
SLURM - monitoring jobs
show queue: "squeue -u <username>"
alias sq='squeue -u <username>'
SQUEUE_FORMAT="%.18i %.9P %.8j %.8u %.2t %.10M %.10L %.10l %.6D %.6C %.8q %R"
scontrol show job <job_id>
SLURM - monitoring jobs
scontrol show job
JobId=321608 JobName=sumprimes
UserId=linstead(405) GroupId=users(100)
Priority=6858 Nice=0 Account=its QOS=short
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:05 TimeLimit=1-00:00:00 TimeMin=N/A
SubmitTime=2016-05-04T14:47:13 EligibleTime=2016-05-04T14:47:13
StartTime=2016-05-04T14:47:13 EndTime=2016-05-05T14:47:13
Partition=standard AllocNode:Sid=login01:15215
NodeList=cs-e14c01b[02-05]
BatchHost=cs-e14c01b02
NumNodes=4 NumCPUs=64 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=64,mem=229376,node=4
Socks/Node=* NtasksPerN:B:S:C=16:0:*:* CoreSpec=*
MinCPUsNode=16 MinMemoryCPU=3.50G MinTmpDiskNode=0
Command=/home/linstead/cluster-examples/sumprimes/mpi/slurm.sh
WorkDir=/home/linstead/cluster-examples/sumprimes/mpi
StdErr=/home/linstead/cluster-examples/sumprimes/mpi/sumprimes-321608.err
StdOut=/home/linstead/cluster-examples/sumprimes/mpi/sumprimes-321608.out
SLURM - monitoring jobs
sview - a graphical monitoring tool
SLURM - monitoring jobs
sview - right-click on a job
SLURM - monitoring jobs
Did my job(s) finish yet?
check squeue -u <username>
check sacct
SLURM - monitoring jobs
sacct
[15:47:13] linstead@login01:~$ sacct
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
321467 sumprimes standard its 32 CANCELLED+ 0:0
321467.batch batch its 8 CANCELLED 0:15
321467.0 pmi_proxy its 4 FAILED 7:0
321472 sumprimes standard its 32 COMPLETED 0:0
321472.batch batch its 8 COMPLETED 0:0
321472.0 pmi_proxy its 4 COMPLETED 0:0
325608 sumprimes standard its 64 FAILED 9:0
325608.batch batch its 16 FAILED 9:0
325608.0 pmi_proxy its 4 COMPLETED 0:0
SLURM - monitoring jobs
ExitCode n:m
n : code returned by the job script
m : signal which caused the process to terminate (if signalled)
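The n:m pair splits mechanically; a small helper (our own sketch, mirroring the description above):

```python
def parse_exit_code(field):
    """Split sacct's ExitCode field "n:m" into (return code, signal)."""
    n, m = field.split(":")
    return int(n), int(m)

# from the sacct listing above:
print(parse_exit_code("0:0"))   # (0, 0): clean exit
print(parse_exit_code("0:15"))  # (0, 15): terminated by signal 15 (SIGTERM)
```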
SLURM - monitoring jobs
By default sacct goes back to midnight
all my jobs in April: sacct -S04.01 -E05.01
Add/view extra fields: sacct --format=...
e.g. just ID and end time: sacct -ojobid,end
see man sacct
the format of most SLURM commands is configurable, either via --format/-o or $S???_FORMAT variables.
SLURM - monitoring jobs
Exercise: what time did jobs 325608 and 325611 start and end?
Did any fail? If so, what were the exit codes?
sacct --jobs 325608,325611 -a --format=start,end,state,exitcode
SLURM - job arrays
Submit and manage collections of similar jobs
--array=0-31
each job in the array takes the same settings from the submit script
each job has a unique index
$SLURM_ARRAY_TASK_ID
SLURM - job arrays
Task ID is available to my script:
#!/bin/bash
#SBATCH --qos=short
#SBATCH --partition=standard
#SBATCH --array=0-15
#SBATCH --output=jobarray-%A_%a.out
echo ${SLURM_ARRAY_TASK_ID}
(e.g. "./myprog inputfile_${SLURM_ARRAY_TASK_ID}")
SLURM - job arrays
Read $SLURM_ARRAY_TASK_ID in Python:
import os, sys

try:
    task = os.environ['SLURM_ARRAY_TASK_ID']
except KeyError:
    print("Not running with SLURM job arrays")
    sys.exit(1)
SLURM - job arrays
Read $SLURM_ARRAY_TASK_ID in C:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* getenv returns NULL when the variable is not set */
    const char *id = getenv("SLURM_ARRAY_TASK_ID");
    printf("task ID: %s\n", id ? id : "(not set)");
    return 0;
}
Python for scientific computing
Anaconda 2.3.0 (module load anaconda/2.3.0)
/p/system/packages/anaconda/2.3.0/bin/python
conda list
Python for scientific computing
Provides the conda package manager
I can install and maintain my own Python environment...
conda create -n testenv ipython
source activate testenv
conda install <packagename>
source deactivate
...and I can share it with colleagues
source activate testenv
conda env export > environment.yml
conda env create -f environment.yml
(can also manage R packages, we can set this up if there's interest)
Python for scientific computing
linstead@login01:~$ source activate testenv
discarding /p/system/packages/anaconda/2.3.0/bin from PATH
prepending /home/linstead/.conda/envs/testenv/bin to PATH
(testenv)linstead@login01:~$
Exercise
create a new environment
install packages ipython and matplotlib
Documentation / Help
Cluster User Guides:
https://www.pik-potsdam.de/services/it/hpc/user-guides
Environment Modules: http://modules.sourceforge.net/
conda package manager:
http://conda.pydata.org/docs/using/
SLURM: http://slurm.schedmd.com/
man pages (module, sbatch, srun, sinfo, squeue etc.)
Documentation / Help
Questions/Problems/Requests:
http://www.pik-potsdam.de/services/it/hpc
mailto:[email protected]
Questions
Any questions?