User Manual
This cluster has 16 CPU nodes, 1 GPU node, and 1 DGX GPU node.
Configuration:
CPU Nodes:
2x Intel Xeon Gold 6248 processors (20 cores each), 40 cores total,
192 GB memory.
Mellanox 100 Gbps interconnect.
Total 16 nodes and 640 cores for running MPI, OpenMP, and
hybrid jobs.
The nodes are named node1 to node16.
GPU Node:
2x Intel Xeon Gold 6248 processors (20 cores each), 40 cores total,
192 GB memory.
Mellanox 100 Gbps interconnect.
The GPU node has an Nvidia Tesla V100 GPU card, with the
necessary drivers installed and configured to work with the Slurm
job scheduler.
Total 40 cores for running GPU jobs.
The node is named gpu1.
Storage:
This cluster also has 200 TB of Lustre storage, which is
allotted to /home.
Access or login to the HPC is done using SSH (Secure Shell). You
can use software like PuTTY, or access SSH directly from a terminal /
command prompt.
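For example, from a terminal (the hostname below is a placeholder; use the address provided for the cluster):

ssh <username>@<cluster-address>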
a. Submitting Jobs
To submit a job, the user needs to create a job script as follows.
For GPU Nodes:
#!/bin/bash
#SBATCH --job-name=newjob
#SBATCH --partition=gpu
#SBATCH [email protected]
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --gres=gpu:1
#SBATCH --mail-type=ALL
#SBATCH --workdir=/home/iitgoa/
#SBATCH --output=newjob%j.out

## OpenMP case
## Make sure the ppn value and OMP_NUM_THREADS value are the same,
## or leave it to the SLURM_NTASKS environment variable
#export OMP_NUM_THREADS=$SLURM_NTASKS
#<your executable> >& out_$SLURM_JOBID

## MPI case
### Only one executable is allowed
mpirun -np <npvalue> <your/executable/with/path> >& out_$SLURM_JOBID
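For the CPU nodes, a similar script can be used. Below is a minimal sketch for an MPI job; the partition name "cpu" is an assumption and should be verified with sinfo on the cluster.

#!/bin/bash
#SBATCH --job-name=newjob
#SBATCH --partition=cpu            # assumed partition name; verify with sinfo
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=40       # each CPU node has 40 cores
#SBATCH --output=newjob%j.out

mpirun -np $SLURM_NTASKS <your/executable/with/path> >& out_$SLURM_JOBID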
Submit the script with:
sbatch myscript.sh
This command submits the job to the scheduler and returns a job ID.
This job ID will be used later for monitoring and managing jobs.
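On success, sbatch prints the job ID, for example (the ID shown is illustrative):

Submitted batch job 2588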
To monitor jobs in the queue:
squeue
This command lists all jobs currently in the queue along with their state.
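The default squeue output contains columns like the following:

JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)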
c. Cancel a job
scancel <jobid>
This command cancels the job with the given job ID.
To see the full details of a queued job, including its job script file:
scontrol show job <jobid>
The sacct command reports accounting information for jobs. For example:
sacct --format=JobID,JobName,MaxRSS,UserCPU,SystemCPU,CPUTime -j 2588
JobID  JobName  MaxRSS  UserCPU  SystemCPU  CPUTime
scontrol hold <jobid>
This command holds a job that is in the queue but not yet running.
scontrol release <jobid>
This command releases a held job so that it becomes eligible to run again.
b. Job scheduler
Using this menu, users can monitor all their scheduled jobs and
view details like Job ID, Partition Name, User Name, Job Name, Job
State, Time, Time Limit, CPUs, Nodes, and Node List. Users can also
perform actions including getting information about a job, deleting
("X") a job, holding a job, and releasing a held job.
Using this menu, users can submit a new job. This will open a
window to provide the necessary details like job name,
communication email, stdout file, and working directory.
This Add New Job window has an info button at each input,
which gives the required information about that input field.
4. Installing Packages
The master node is configured in such a way that all software
installed in /home will be available to all compute nodes, so
we recommend choosing the installation path as follows:
/home/<username>/softwares/<softwarename>/<version>
For example:
/home/hpcuser/softwares/fftw/3.3.7
For configure-based builds, set the prefix accordingly:
--prefix=/home/<username>/softwares/<softwarename>/<version>
For CMake-based builds, set the install prefix:
-DCMAKE_INSTALL_PREFIX=/home/<username>/softwares/<softwarename>/<version>
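As an illustration, a minimal sketch of building FFTW 3.3.7 from source into the recommended path (assuming the source archive has already been downloaded and extracted on the cluster):

cd fftw-3.3.7
./configure --prefix=/home/hpcuser/softwares/fftw/3.3.7
make
make install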
5. Modules
d. Load a module
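Assuming the cluster provides the standard Environment Modules (or Lmod) tool, a module can be loaded as follows; the module name shown is illustrative:

module avail                      # list the modules available on the cluster
module load fftw/3.3.7            # load a specific module/version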