LSF For Users: Mike Page SCD Consulting Services Group

LSF (Load Sharing Facility) is batch management software that can manage computing resources across multiple platforms. It runs on the Lightning cluster and is used to submit and manage jobs. Common commands to submit and check the status of jobs include bsub, bjobs, bqueues, and bhosts. Bsub is used to submit jobs and allows options to set resource limits, dependencies, queues, and more.

Uploaded by

Abraham Brito
Copyright
© Attribution Non-Commercial (BY-NC)
LSF for Users

Mike Page
[email protected]
SCD Consulting Services Group
SCD/HSS/CSG
What is LSF?
LSF - Load Sharing Facility

Batch Management Subsystem for multi-host, multi-vendor complexes

Same role as LoadLeveler or NQE, with the capability to manage computing resources across multiple platforms

LSF runs on the Lightning cluster

------------------------------------------------------------------------------
Documentation: /usr/local/docs/LSF/6.0/*.pdf
Hardware description: http://www.scd.ucar.edu/docs/lightning/overview.html
At a lightning command line enter: man lsfintro
Further reading: http://accl.grc.nasa.gov/lsf/about.html
To be able to access LSF
This has been added to your login processing:

. /usr/local/lsf/conf/profile.lsf (sh users)
or
source /usr/local/lsf/conf/cshrc.lsf (csh users)

These commands are executed before you receive a command prompt.
There is no need for you to add anything to your login files in order to use LSF.

These commands define the LSF environment:
LSF_SERVERDIR, LSF_BINDIR, LSF_LIBDIR, XLSF_UIDDIR, LSF_ENVDIR, PATH, MANPATH

-------------------------------------------------------------------
Check: env | grep -i lsf
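
The check above can also be scripted. The following is a minimal sketch that reports which of the LSF-specific variables listed on this slide are unset in the current shell. The function name check_lsf_env is hypothetical and not part of LSF; PATH and MANPATH are left out because they are set on any system.

```shell
#!/bin/sh
# check_lsf_env is a hypothetical helper, not an LSF command.
# It prints the LSF variables from the slide that are unset or empty
# and returns 1 if any are missing, 0 otherwise.
check_lsf_env() {
    missing=""
    for var in LSF_SERVERDIR LSF_BINDIR LSF_LIBDIR XLSF_UIDDIR LSF_ENVDIR; do
        # eval reads the variable whose name is stored in $var
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            missing="$missing $var"
        fi
    done
    if [ -n "$missing" ]; then
        echo "unset:$missing"
        return 1
    fi
    echo "LSF environment looks complete"
    return 0
}

check_lsf_env || echo "LSF profile may not have been sourced in this shell"
```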
Essential Commands
for Users

• bhosts
• bqueues
• bsub
• bjobs
• bhist
• bkill
• bpeek
• bmod
• bbot/btop
• bswitch
• bstop/bresume
Essential Commands
Purpose
• bhosts - information about available hosts (lshosts)
• bqueues - information about available queues
• bsub - submit jobs to batch subsystem
• bjobs - list jobs in the batch subsystem
• bhist - displays historical information about user’s jobs
• bpeek - displays stdout and stderr of user’s unfinished job
• bmod - modifies job submission options for user’s job
Essential Commands
Purpose (cont’d)
• bbot/btop - moves a pending job relative to user’s last/first job in a queue
• bswitch - switches user’s unfinished jobs from one queue to another
• bstop/bresume - suspends/resumes user’s unfinished jobs
• bkill - kill, suspend or resume user’s jobs
Essential Commands: bhosts
bhosts [-w | -l] [-R "res_req"] [host_name | host_group]
Displays information about hosts/platforms

lshosts [-w | -l] [-R "res_req"] [host_name | cluster_name]
lshosts -s [shared_resource_name ...]
Displays hosts and their static resource information

ln0126en$ bhosts
HOST_NAME  STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
ln0126en   ok      -     2    0      0    0      0      0
ln0127en   ok      -     2    0      0    0      0      0
ln0128en   ok      -     2    0      0    0      0      0
.
.
.
ln0440en   ok      -     2    0      0    0      0      0
ln0441en   ok      -     2    0      0    0      0      0
ln0442en   ok      -     2    0      0    0      0      0
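
The bhosts listing lends itself to simple post-processing. As a sketch, assuming the column order shown above (MAX in field 4, NJOBS in field 5), the following sums the free job slots on hosts whose status is ok. The function name free_slots is hypothetical, and the fallback sample lines only stand in for real output on a machine without LSF.

```shell
#!/bin/sh
# free_slots is a hypothetical helper, not an LSF command.
# It reads bhosts-style output on stdin and prints the sum of MAX - NJOBS
# over all hosts with STATUS "ok". Rows without a numeric MAX are skipped.
free_slots() {
    awk 'NR > 1 && $2 == "ok" && $4 ~ /^[0-9]+$/ { free += $4 - $5 }
         END { print free + 0 }'
}

if command -v bhosts >/dev/null 2>&1; then
    bhosts | free_slots
else
    # No LSF here: run on sample lines in the layout shown above.
    free_slots <<'EOF'
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
ln0126en ok - 2 0 0 0 0 0
ln0127en ok - 2 0 0 0 0 0
EOF
fi
```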
Essential Commands: bqueues
bqueues [-w | -l | -r] [-m host_name | -m all] [-u user_name | -u all] [queue_name …]

Displays information about queues.

By default, returns the following information about all queues: queue name, queue priority, queue status, job slot statistics, and job state statistics.

ln0126en$ bqueues
QUEUE_NAME  PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
special     500   Open:Active  -    -     -     -     0      0     0    0
premium     300   Open:Active  -    -     -     -     0      0     0    0
regular     200   Open:Active  -    -     -     -     0      0     0    0
economy     160   Open:Active  -    -     -     -     0      0     0    0
hold        104   Open:Active  -    -     -     -     0      0     0    0
standby     100   Open:Active  -    -     -     -     0      0     0    0
share       100   Open:Active  -    -     -     -     0      0     0    0
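
A quick way to see where jobs are waiting is to filter this listing on the PEND column. The sketch below assumes the default column order shown above (PEND in field 9). The function name pending_by_queue is hypothetical, and the job counts in the fallback sample are made up for illustration.

```shell
#!/bin/sh
# pending_by_queue is a hypothetical helper, not an LSF command.
# It reads bqueues-style output on stdin and prints "queue pending_count"
# for every queue whose PEND column (field 9) is greater than zero.
pending_by_queue() {
    awk 'NR > 1 && $9 ~ /^[0-9]+$/ && $9 > 0 { print $1, $9 }'
}

if command -v bqueues >/dev/null 2>&1; then
    bqueues | pending_by_queue
else
    # No LSF here: run on made-up sample lines in the layout shown above.
    pending_by_queue <<'EOF'
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
premium 300 Open:Active - - - - 4 1 3 0
regular 200 Open:Active - - - - 12 7 5 0
economy 160 Open:Active - - - - 0 0 0 0
EOF
fi
```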
Essential Commands: bsub
bsub [options] command [cmd_args]
Submits a job for batch execution
OPTION LIST
-B Sends mail at dispatch and initiation times.
-H Holds job in PSUSP and waits for bresume
-I | -Ip | -Is Submits as batch interactive
-K Submits job and locks cmd line with status updates
-N Sends job report by e-mail (use only with -I | -Is | -Ip or -o)
-r Rerun job on another host if host terminates
-x Exclusive execution mode
-a esub_parameters Specifies parallel job launcher (PJL) to be used
-b [[month:]day:]hour:minute Dispatch date/time
-C core_limit Limits size of core dumps (-C 0 recommended?)
-c [hours:]minutes[/host_name | /host_model] Cpu time limit
-D data_limit Per-process data segment size limit
-e err_file File to use as stderr
-E "pre_exec_command [arguments ...]" Pre-exec command invoked before batch stream command processing
-ext[sched] "external_scheduler_options" N/A
-f "local_file operator [remote_file]" ... Files to be copied between local/remote systems
-F file_limit Per process file size limit
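
The -c option above takes its limit as [hours:]minutes, optionally followed by /host_name or /host_model. As a sketch of how that format reads, the following converts such a value to plain minutes. The function name limit_to_minutes is hypothetical and the conversion happens entirely in the shell; LSF itself is not involved.

```shell
#!/bin/sh
# limit_to_minutes is a hypothetical helper, not an LSF command.
# It converts a "[hours:]minutes[/host]" limit string to whole minutes.
limit_to_minutes() {
    spec=${1%%/*}    # drop an optional /host_name or /host_model suffix
    echo "$spec" | awk -F: '{ if (NF == 2) print $1 * 60 + $2; else print $1 + 0 }'
}

limit_to_minutes 90
limit_to_minutes 1:30
limit_to_minutes 2:05/ln0126en
```

The first two calls describe the same limit, written once as minutes and once as hours:minutes.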
Essential Commands: bsub (cont’d)
bsub [options] command [cmd_args]

OPTION LIST (cont’d)


-g job_group_name Submits job to a job group
-G user_group Associates job with a specific group
-i input_file | -is input_file Specifies stdin for job
-J job_name | -J "job_name[index_list]%job_slot_limit" Specifies job name
-k "checkpoint_dir [checkpoint_period][method=method_name]" Makes a job checkpointable and specifies checkpoint directory
-L login_shell Uses login_shell for runtime environment
-m "host_name[@cluster_name][+[pref_level]] | host_group[+[pref_level]]" Selects and ranks hosts/groups on which to run
-M mem_limit Sets per process memory limit
-n min_proc[,max_proc] Sets min/max number of processors required to run job
-o out_file Specifies stdout
-P project_name Specifies project name
-p process_limit Limits total number of processes
-q queue_name Specifies queue for job (default provided by system)
-R "res_req" Specifies resource requirements
-sla service_class_name Specifies service class for job
-sp priority Specifies priority amongst user’s jobs
-S stack_limit Sets per-process stack limit
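
The -J form "job_name[index_list]%job_slot_limit" above names a job array. The sketch below only assembles that string from its parts, so the quoting is easy to get right. The function name job_array_spec and the example values (sweep, 1-10, 2) are hypothetical.

```shell
#!/bin/sh
# job_array_spec is a hypothetical helper, not an LSF command.
# Usage: job_array_spec name index_list [job_slot_limit]
# Prints name[index_list] or name[index_list]%job_slot_limit.
job_array_spec() {
    if [ -n "${3:-}" ]; then
        printf '%s[%s]%%%s\n' "$1" "$2" "$3"
    else
        printf '%s[%s]\n' "$1" "$2"
    fi
}

spec=$(job_array_spec sweep 1-10 2)
# Dry run: print the submission command instead of executing it.
echo "bsub -J \"$spec\" a.out"
```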
Essential Commands: bsub (cont’d)
bsub [options] command [cmd_args]

OPTION LIST (cont’d)


-t [[month:]day:]hour:minute Specifies job termination date
-T thread_limit Sets limit on number of concurrent threads
-U reservation_ID Uses reservation via brsvadd command
-u mail_user Mail-to address
-v swap_limit Sets total process virtual memory limit
-w 'dependency_expression' Defines dependencies to be met before job initiation
-wa '[signal | command | CHKPNT]' Specifies action to be taken before job control step occurs
-wt '[hours:]minutes' Specifies time interval before job control occurs to send warning signal
-W [hours:]minutes[/host_name | /host_model] Specifies run time limit for job
-Zs Spools command file and runs from there
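
For the -w option above, a common dependency expression is "run after these jobs have finished successfully", written with done(job_ID) conditions joined by &&. The following sketch builds such an expression from a list of job IDs. The function name done_dependency and the job IDs 101 and 102 are hypothetical, and the command is only printed, not submitted.

```shell
#!/bin/sh
# done_dependency is a hypothetical helper, not an LSF command.
# It joins its arguments into "done(id1) && done(id2) && ...".
done_dependency() {
    dep=""
    for id in "$@"; do
        if [ -z "$dep" ]; then
            dep="done($id)"
        else
            dep="$dep && done($id)"
        fi
    done
    echo "$dep"
}

dep=$(done_dependency 101 102)
# Dry run: print the submission command instead of executing it.
echo "bsub -w '$dep' a.out"
```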
The Importance of Being <

LSF usage is different from LL/NQS

bsub a.out
bsub -n 2 a.out
bsub myscript
bsub -q queuename a.out
bsub -i infile -o outfile -e errfile a.out

bsub < myscript
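
The point of the redirection is that with bsub < myscript LSF reads the script from standard input and picks up its embedded #BSUB lines as options, whereas with bsub myscript the script is simply the command to run and those lines are ordinary comments. As a sketch for seeing which options a script would contribute, the following extracts the #BSUB lines and strips trailing comments. The function name list_bsub_directives is hypothetical, and the comment stripping is naive: an option value containing # would be cut short.

```shell
#!/bin/sh
# list_bsub_directives is a hypothetical helper, not an LSF command.
# It prints the options found on "#BSUB" lines of a script, one line each,
# with any trailing "# comment" removed.
list_bsub_directives() {
    sed -n 's/^#BSUB[[:space:]]\{1,\}//p' "$1" | sed 's/[[:space:]]*#.*$//'
}

# Demonstrate on a small throwaway script.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#!/bin/ksh
#BSUB -n 1                # number of tasks
#BSUB -J seriallsf.test   # job name
#BSUB -q regular          # queue
./a.out
EOF
list_bsub_directives "$tmp"
rm -f "$tmp"
```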


Sample LSF script
Serial Job
#!/bin/ksh
#
# LSF batch script to run a serial code
#
#BSUB -P 93300070 # Project 93300070
#BSUB -n 1 # number of tasks
#BSUB -J seriallsf.test # job name
#BSUB -o seriallsf.out # output filename
#BSUB -e seriallsf.err # error filename
#BSUB -q regular # queue

# Fortran example
pgf90 -o samp_f -Mextend samp.f
./samp_f

# C example
pgcc -o samp_c samp.c
./samp_c

# C++ example
pgCC --no_auto_instantiation -o samp_cc samp.cc
./samp_cc

bsub < serial.lsf
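
When a submission is scripted, the job ID is usually wanted for later bjobs, bpeek or bkill calls. The sketch below assumes bsub acknowledges a submission with a line of the form "Job <1234> is submitted to queue <regular>." and extracts the number from it. The function name parse_job_id is hypothetical and the sample line stands in for real bsub output.

```shell
#!/bin/sh
# parse_job_id is a hypothetical helper, not an LSF command.
# It reads bsub's acknowledgement on stdin and prints only the job ID.
parse_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# On a system with LSF this would be: job_id=$(bsub < serial.lsf | parse_job_id)
job_id=$(echo 'Job <1234> is submitted to queue <regular>.' | parse_job_id)
echo "submitted job $job_id"
```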


Sample LSF script
MPI Job
#!/bin/ksh
#
# LSF batch script to run the test MPI code
#
#BSUB -P 93300070 # Project 93300070
#BSUB -a mpich_gm # select the mpich-gm elim
#BSUB -x # exclusive use of node (not_shared)
#BSUB -n 2 # number of total tasks
#BSUB -R "span[ptile=1]" # run 1 task per node
#BSUB -J mpilsf.test # job name
#BSUB -o mpilsf.out # output filename
#BSUB -e mpilsf.err # error filename
#BSUB -q regular # queue

# Fortran example
mpif90 -o mpi_samp_f mpisamp.f
mpirun.lsf ./mpi_samp_f

# C example
mpicc -o mpi_samp_c mpisamp.c
mpirun.lsf ./mpi_samp_c

# C++ example
mpicxx -o mpi_samp_cc mpisamp.cc
mpirun.lsf ./mpi_samp_cc
bsub < mpi.lsf
Sample LSF script
OpenMP Job

#!/bin/ksh
#
# LSF script to run the test OMP codes
#
#BSUB -P 93300070 # Proposal group 2 - Project 93300070
#BSUB -a mpich_gm # select the mpich-gm elim
#BSUB -x # exclusive use of node
#BSUB -n 2 # number of tasks
#BSUB -R "span[hosts=1]" # jobs run on one host
#BSUB -J omplsf.test # job name
#BSUB -o omplsf.out # output filename
#BSUB -e omplsf.err # error filename
#BSUB -q regular # queue

# Fortran example
pgf90 -o samp_f -Mextend -mp samp.f
export OMP_NUM_THREADS=1
./samp_f
export OMP_NUM_THREADS=2
./samp_f

# C example
pgcc -mp -o samp_c samp.c
export OMP_NUM_THREADS=1
./samp_c
export OMP_NUM_THREADS=2
./samp_c

# C++ example
pgCC --no_auto_instantiation -mp -o samp_cc samp.cc
export OMP_NUM_THREADS=1
./samp_cc
export OMP_NUM_THREADS=2
./samp_cc

bsub < omp.lsf
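
The OpenMP script repeats the same export-and-run pair for each thread count. As a sketch of the same pattern written as a loop, the following runs one command once per thread count with OMP_NUM_THREADS set for that run. The function name run_with_threads is hypothetical, and a plain sh -c command stands in for the compiled ./samp_c so the example runs anywhere.

```shell
#!/bin/sh
# run_with_threads is a hypothetical helper.
# Usage: run_with_threads "1 2 4" command [args...]
# Runs the command once per listed thread count, with OMP_NUM_THREADS
# set in the environment of that run only.
run_with_threads() {
    counts=$1
    shift
    for n in $counts; do
        OMP_NUM_THREADS=$n "$@"
    done
}

# Stand-in for ./samp_c: print the thread count the program would see.
run_with_threads "1 2" sh -c 'echo "threads=$OMP_NUM_THREADS"'
```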


Sample LSF script
MPMD Job
#!/bin/ksh
#
# LSF batch script to run the test MPMD codes
#
#BSUB -P 93300070 # Project 93300070
#BSUB -a mpich_gm
#BSUB -n 2
#BSUB -x
#BSUB -R "span[ptile=1]"
#BSUB -o mpmdlsf.out # output filename
#BSUB -e mpmdlsf.err # error filename
#BSUB -J mpmdlsf.test # job name
#BSUB -q regular # queue
#
#Build pgfile for mpmd run
rm -f pgfile
touch pgfile
#
EXE=../bin/itmpmd
#
j=0
for h in `echo $LSB_HOSTS`
do
echo ${h}" "${j}" "${EXE}${j} >> pgfile
j=`expr $j + 1`
done
#cat pgfile

# Fortran example
mpif90 -Mextend -o $EXE'0' ../src/mpmd/itmpmd.f
mpif90 -Mextend -o $EXE'1' ../src/mpmd/itmpmd.f
mpirun -pg pgfile /bin/pwd

# C example
mpicc -o $EXE'0' ../src/mpmd/itmpmd.c
mpicc -o $EXE'1' ../src/mpmd/itmpmd.c
mpirun -pg pgfile /bin/pwd

# C++ example
mpicxx --no_auto_instantiation -o $EXE'0' ../src/mpmd/itmpmd.cc
mpicxx --no_auto_instantiation -o $EXE'1' ../src/mpmd/itmpmd.cc
mpirun -pg pgfile /bin/pwd

rm $EXE'0' $EXE'1' pgfile

bsub < mpmd.lsf
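
The pgfile loop in the MPMD script is easier to follow with a concrete host list. The sketch below reproduces that loop as a function and runs it on a made-up LSB_HOSTS value: each output line holds the host name, the running counter, and the executable path with the counter appended. The function name build_pgfile is hypothetical, and the two host names are only used when LSB_HOSTS is not already set by LSF.

```shell
#!/bin/sh
# build_pgfile is a hypothetical helper mirroring the loop in the MPMD script.
# For each host in LSB_HOSTS it prints: host counter executable<counter>
build_pgfile() {
    exe=$1
    j=0
    for h in $LSB_HOSTS; do
        echo "$h $j $exe$j"
        j=$((j + 1))
    done
}

# Inside a job LSF sets LSB_HOSTS; outside one, fall back to sample hosts.
: "${LSB_HOSTS:=ln0126en ln0127en}"
build_pgfile ../bin/itmpmd
```

With the two sample hosts this prints one line for ../bin/itmpmd0 on the first host and one for ../bin/itmpmd1 on the second, which is why the script compiles the two executables under exactly those names.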


Sample LSF script
Hybrid Job
#!/bin/ksh
#
# LSF batch script to run the test mixed MPI/OMP codes
#
#BSUB -a mpich_gm # select mpich_gm elim
#BSUB -x # exclusive use of node
#BSUB -n 2 # sum of number of tasks
#BSUB -R "span[ptile=1]" # number of processes per node
#BSUB -o mixlsf.out # output filename
#BSUB -e mixlsf.err # error filename
#BSUB -J mixlsf.test # job name
#BSUB -q regular # queue
#
#Build pgfile for mix run
rm -f pgfile
touch pgfile
#
EXE=${PWD}/mix
#
echo $LSB_HOSTS
j=0
for h in `echo $LSB_HOSTS`
do
echo ${h}" "${j}" "${EXE} >> pgfile
j=`expr $j + 1`
done

# Fortran example
mpif90 -Mextend -mp -lmp -o mix mix.f
export OMP_NUM_THREADS=1
mpirun-env.pl -pg pgfile $EXE
export OMP_NUM_THREADS=2
mpirun-env.pl -pg pgfile $EXE

# C example
mpicc -mp -o mix mix.c
export OMP_NUM_THREADS=1
mpirun-env.pl -pg pgfile $EXE
export OMP_NUM_THREADS=2
mpirun-env.pl -pg pgfile $EXE

# C++ example
mpicxx --no_auto_instantiation -mp -o mix mix.cc
export OMP_NUM_THREADS=1
mpirun-env.pl -pg pgfile $EXE
export OMP_NUM_THREADS=2
mpirun-env.pl -pg pgfile $EXE

rm pgfile

bsub < mix.lsf


Essential Commands: bjobs
bjobs - Displays information about LSF jobs
bjobs -u user_name
bjobs -u all
bjobs -l
bjobs -r
bjobs -s
bjobs -q queue_name
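
A short summary of job states is often more useful than the full listing. The sketch below assumes the default bjobs layout, in which the third column is STAT (for example PEND or RUN), and counts jobs per state. The function name jobs_by_state is hypothetical, and the sample job lines in the fallback are made up for illustration.

```shell
#!/bin/sh
# jobs_by_state is a hypothetical helper, not an LSF command.
# It reads bjobs-style output on stdin and prints "STATE count" per state,
# sorted by state name. The header line is skipped.
jobs_by_state() {
    awk 'NR > 1 && NF >= 3 { count[$3]++ }
         END { for (s in count) print s, count[s] }' | sort
}

if command -v bjobs >/dev/null 2>&1; then
    bjobs -u all 2>/dev/null | jobs_by_state
else
    # No LSF here: run on made-up sample lines.
    jobs_by_state <<'EOF'
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
101 user1 RUN regular ln0126en ln0127en mpilsf.test Jan 1 10:00
102 user1 PEND regular ln0126en - seriallsf.test Jan 1 10:05
103 user2 PEND economy ln0126en - omplsf.test Jan 1 10:07
EOF
fi
```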
Essential Commands: bhist
bhist - displays historical information about jobs

bhist -J job_name
bhist -C start_time,end_time
bhist -D start_time,end_time
bhist -S start_time,end_time
bhist -T start_time,end_time
Essential Commands: bpeek
bpeek - displays stdout and stderr of user’s selected, unfinished job
bpeek -f uses ‘tail -f’ to display output instead of ‘cat’

bpeek [-q queue_name | -m host_name | -J job_name | job_ID | "job_ID[index_list]"]
Essential Commands: bmod
bmod - modifies job submission options of a job

bmod [bsub options] [job_ID | "job_ID[index]"]


bmod -g job_group_name | -gn [job_ID]
bmod [-sla service_class_name | -slan] [job_ID]
bmod [-h | -V]
Essential Commands: bbot, btop
bbot - moves a pending job relative to the last job in the queue
bbot job_ID | "job_ID[index_list]" [position]
bbot [-h | -V]

btop - moves a pending job relative to the first job in the queue
btop job_ID | "job_ID[index_list]" [position]
btop [-h | -V]
Essential Commands: bswitch
bswitch - switches unfinished jobs from one queue to another

bswitch [-J job_name] [-m host_name | -m host_group]
        [-q queue_name] [-u user_name | -u user_group | -u all]
        destination_queue [0]
bswitch destination_queue [job_ID | "job_ID[index_list]"] ...
bswitch [-h | -V]
Essential Commands: bstop/bresume
bstop - suspends unfinished jobs

bstop [-a] [-d] [-g job_group_name | -sla service_class_name]
      [-J job_name] [-m host_name | -m host_group]
      [-q queue_name] [-u user_name | -u user_group | -u all] [0]
      [job_ID | "job_ID[index]"] ...
bstop [-h | -V]

bresume - resumes one or more suspended jobs

bresume [-g job_group_name] [-J job_name] [-m host_name]
        [-q queue_name] [-u user_name | -u user_group | -u all] [0]
bresume [job_ID | "job_ID[index_list]"] ...
bresume [-h | -V]
Essential Commands: bkill
bkill - sends signals to kill, suspend, or resume unfinished jobs

bkill [-l] [-g job_group_name | -sla service_class_name]
      [-J job_name] [-m host_name | -m host_group]
      [-q queue_name] [-r | -s (signal_value | signal_name)]
      [-u user_name | -u user_group | -u all]
      [job_ID ... | 0 | "job_ID[index]" ...]
bkill [-h | -V]
Questions?
Comments?
