
Centre de Calcul de l'Institut National de Physique Nucléaire et de Physique des Particules

Job Submission on CC-IN2P3 GPU Farm


April 2019
Current Farm Architecture
 Access from an interactive host
 Univa Grid Engine as batch system (https://doc.cc.in2p3.fr/jobs_gpu)

[Diagram] From the interactive node, qlogin (interactive) and qsub (batch) lead to the GPU farm:
 K80 farm: 1 interactive worker node, 9 batch worker nodes
 V100 farm: 1 interactive worker node, 5 batch worker nodes



Workers Architecture

 Worker K80: 2 CPUs (2 x 8 cores) + 4 GPUs (2 K80 boards, each board = 2 GPUs)
 Worker V100: 2 CPUs (2 x 10 cores) + 4 GPUs (each board = 1 GPU)



Farm neighbourhood
 Worker nodes can access different types of storage

Software storage:
 /cvmfs (image catalog)
 /pbs/software (Python modules)

User storage:
 /pbs/throng

Data storage:
 /sps



Interactive mode
 Interactive Worker and Batch Worker nodes are identical in architecture (same CPUs, GPUs, memory)
 The batch scheduler provides shell access to Interactive Worker nodes

Interactive Worker access (qlogin):

qlogin -l GPU=<1-4>,sps=1,GPUtype=<K80|V100> -q mc_gpu_interactive -pe multicores_gpu 4

 -l: custom parameters (GPU: number of GPUs <1-4>; sps=1: request /sps resources; GPUtype: K80 or V100 farm)
 -q: queue
 -pe: number of CPU cores
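For instance, a filled-in interactive request (the values are illustrative, not recommendations): 2 V100 GPUs, /sps mounted, 4 CPU cores:

qlogin -l GPU=2,sps=1,GPUtype=V100 -q mc_gpu_interactive -pe multicores_gpu 4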



Batch mode
 The batch scheduler executes a program on Batch Worker nodes

Batch submission (qsub):

qsub -l GPU=<1-4>,sps=1,GPUtype=<K80|V100> [ options ] <file_to_execute>

 -l: custom parameters (GPU: number of GPUs <1-4>; sps=1: request /sps resources; GPUtype: K80 or V100 farm)
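A minimal filled-in sketch, assuming a script train.sh in your home directory (the script name, queue, and log paths are illustrative; queues and the -o/-e options are detailed on the next slide):

# illustrative submission: 1 K80 GPU, /sps mounted, logs in $HOME
qsub -l GPU=1,sps=1,GPUtype=K80 -q mc_gpu_long -pe multicores_gpu 4 \
     -o $HOME/train.out -e $HOME/train.err $HOME/train.sh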



Batch mode - Options
 All information can be found here:
General batch system: https://doc.cc.in2p3.fr/en:utiliser_le_systeme_batch_ge_depuis_le_centre_de_calcul
GPU jobs: https://doc.cc.in2p3.fr/en:jobs_gpu
Available queues: https://cctools.in2p3.fr/mrtguser/info_sge_queue.php
 !!! Access to the GPU queues requires a resource request from your user group !!!

Queue ( -q )
 Multicores (1 node): mc_gpu_medium (~5h), mc_gpu_long (~48h), mc_gpu_longlasting (~202h)
 Parallel (multinode, K80 only!): pa_gpu_long (~48h)

Environment ( -pe )
 Multicores (1 node): multicores_gpu 4
 Parallel (multinode, K80 only!): openmpigpu_2 x (with x = 2 * nb of nodes), openmpigpu_4 x (with x = 4 * nb of nodes); see the sketch after this list

Misc.
 Output file path: -o
 Error file path: -e
 Passing environment variables: -V
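A hedged multinode sketch, assuming GPU counts GPUs per node: a 2-node K80 job with 2 GPUs per node uses -pe openmpigpu_2 4 (x = 2 * 2 nodes); run_mpi.sh and the log paths are illustrative placeholders:

# illustrative parallel submission on the K80 farm
qsub -l GPU=2,sps=1,GPUtype=K80 -q pa_gpu_long -pe openmpigpu_2 4 \
     -o $HOME/mpi.out -e $HOME/mpi.err $HOME/run_mpi.sh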



Software Environment

Installed libraries
 Updates (n, n-1)
 GPU jobs: https://doc.cc.in2p3.fr/en:jobs_gpu

Custom environment
 Execute your job in a custom environment via Singularity



Why Singularity?
 The farm's default environment is updated about twice a year, so even the most recently installed version can already be obsolete (in this mad world of AI ^^)
 We keep no more than 2 CUDA versions installed on the farm: the current one and the one before it
 How do we keep reproducibility (for as long as possible)?

Example
 Say you need to run code that requires Tensorflow 1.13.0, which needs CUDA 10.0, but the latest CUDA version installed on the farm is 9.2
 Singularity gives you the opportunity to execute an image with the right pieces of software installed (i.e. the CUDA 10.0 library in this case)
 This software flexibility is of course only possible as long as the software remains compatible with the workers' hardware
 You can also create and use your own images, which brings maximum flexibility to the farm (see you @ CC Singularity Training Course)
CC-IN2P3 Singularity Image Catalog

 CC-IN2P3 provides an image catalog and compiled modules

Where to find what

Image catalog & corresponding compiled modules:
https://gitlab.in2p3.fr/ccin2p3-support/c3/hpc/gpu

Singularity images:
/cvmfs/singularity.in2p3.fr/images/HPC/GPU

Compiled-from-source modules (compute speed gain ~20%; Python 2.7 – 3.6, for K80 and V100):
/pbs/software/centos-7-x86_64/python/modules/tensorflow
More to come… (pytorch...)
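To see what is currently available, you can simply list these locations from any farm node (the paths are the ones given above):

ls /cvmfs/singularity.in2p3.fr/images/HPC/GPU
ls /pbs/software/centos-7-x86_64/python/modules/tensorflow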



How to use Singularity?

Command (from cca):

qsub -l sps=1,GPU=<nb_gpus>,GPUtype=<K80|V100> -q <queue> -pe multicores_gpu 4 \
     -o <output_path> -e <error_path> -V <path_to>/batch_launcher.sh

batch_launcher.sh (stored on user storage, e.g. under /pbs/throng):
#!/bin/bash
# executed on the worker
/bin/singularity exec --nv --bind /sps:/sps --bind /pbs:/pbs <image_path> <path_to>/start.sh

start.sh:
#!/bin/bash
# executed on the worker, inside the singularity image
source <path_to_python_env> activate <env>
python <path_to>/program.py
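A filled-in sketch of the submission line, with illustrative values (GPU count, queue, and paths are assumptions; the launcher chain is the one shown above):

qsub -l sps=1,GPU=1,GPUtype=V100 -q mc_gpu_long -pe multicores_gpu 4 \
     -o $HOME/job.out -e $HOME/job.err -V $HOME/batch_launcher.sh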



Workflow

[Workflow diagram]


Job Submission on CC-IN2P3 GPU Farm

Questions?

[email protected]

Thanks for your attention.

