This repo contains unit and integration tests for PyTorch at NERSC.
Run unit tests on Perlmutter:
module load pytorch/2.3.1
sbatch -A $youraccount scripts/
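
The `$youraccount` variable in the command above is not set by the module; it stands for your NERSC project name. A minimal sketch of setting it and guarding against an empty value before submitting follows. The name `m0000` is a made-up placeholder and `check_account` is a hypothetical helper, not part of this repo:

```shell
# Placeholder project name (an assumption); replace with your own allocation.
youraccount="m0000"

# Hypothetical helper: return non-zero when no account name is given,
# so a wrapper script can stop before calling sbatch.
check_account() {
    if [ -z "$1" ]; then
        echo "error: set youraccount to your NERSC project name" >&2
        return 1
    fi
    echo "submitting under account: $1"
}

check_account "$youraccount"
```

Checking the variable first avoids passing an empty string to `sbatch -A`.
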
Run all CPU tests on Cori in a conda environment:
conda activate my_env
sbatch scripts/
Run the DDP (DistributedDataParallel) test on two full Perlmutter nodes with the NCCL backend:
module load pytorch/2.3.1
sbatch -A $youraccount -N 2 --ntasks-per-node 4 scripts/ --backend nccl --init-method file --ranks-per-node 4
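
With `-N 2` and `--ntasks-per-node 4`, Slurm starts 2 x 4 = 8 tasks, one per rank. The sketch below shows how each task could derive its rank information from the standard Slurm environment variables `SLURM_PROCID`, `SLURM_LOCALID`, and `SLURM_NTASKS`. The `rank_layout` function is hypothetical, and the test script in this repo may obtain these values differently:

```shell
# Hypothetical helper: print the rank layout seen by one Slurm task.
# Falls back to a single-process layout when run outside Slurm.
rank_layout() {
    rank="${SLURM_PROCID:-0}"          # global rank across all nodes
    local_rank="${SLURM_LOCALID:-0}"   # rank within the current node
    world_size="${SLURM_NTASKS:-1}"    # total number of tasks in the job
    echo "rank=${rank} local_rank=${local_rank} world_size=${world_size}"
}

# Simulate task 5 of 8: the second rank on the second node.
SLURM_PROCID=5 SLURM_LOCALID=1 SLURM_NTASKS=8 rank_layout
```

For the simulated task this prints `rank=5 local_rank=1 world_size=8`, assuming Slurm's default block distribution, which places tasks 0 to 3 on the first node and tasks 4 to 7 on the second.
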