Slide 3.2 Virtual Machines and Containers - PPTX
Source: https://www.nvidia.com/
● Singularity allows you to map (bind) directories on your host system to directories within your
container using bind mounts.
○ The system-defined bind paths for Singularity containers include $HOME, /tmp, /proc, /sys, /dev, etc.
○ Singularity's bind feature (the -B flag) can be used to bind additional paths such as /mnt and /scratch, as shown:
■ singularity exec -B /mnt,/scratch --nv $TENSORFLOW python train.py
○ It is also recommended to use /scratch space instead of the default /tmp.
● Use the --nv option to grant your containers GPU support at runtime (a quick in-container check is sketched below)
○ singularity exec --help
■ -n/--nv Enable experimental Nvidia support
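A quick way to confirm from inside the container that --nv actually exposed a GPU (a minimal sketch, assuming a TensorFlow 1.x image such as the $TENSORFLOW container; this snippet is not part of the course material):

    from tensorflow.python.client import device_lib   # TF 1.x device-listing helper
    # List the devices TensorFlow can see; expect a '/device:GPU:0' entry when --nv works
    print([d.name for d in device_lib.list_local_devices()])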
Demo: Singularity Tensorflow - Matrix Multiplication
● Find the python script log-device-placement.py at
~/datascience/csds312/singularity/tensorflow
● Run TensorFlow interactively (a sketch of such a script follows the output below)
○ singularity exec --nv -B /scratch $TENSORFLOW python
~/datascience/csds312/singularity/tensorflow/log-device-placement.py
■ MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
■ 2020-09-22 15:59:54.975151: I tensorflow/core/common_runtime/placer.cc:927] b:
(Const)/job:localhost/replica:0/task:0/device:GPU:0
■ [[22. 28.]
■ [49. 64.]]
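The course script itself is not reproduced here; the following is a minimal sketch of a device-placement script that would produce output like the above (an assumption: it uses the TF 1.x Session API available in the $TENSORFLOW image):

    import tensorflow as tf

    # Two constant matrices and a MatMul; log_device_placement prints which device runs each op
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    print(sess.run(c))   # [[22. 28.]
                         #  [49. 64.]]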
Multiple GPUs
import tensorflow as tf            # TF 1.x-style explicit device placement

# single GPU
with tf.device('/gpu:0'):          # use only one GPU - gpu0
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')

# Multiple GPUs (a fuller runnable sketch follows below)
for d in ['/gpu:0', '/gpu:1']:     # request 2 GPUs (check nvidia-smi)
    with tf.device(d):             # use both GPUs - gpu0 & gpu1
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
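A fuller runnable sketch of this multi-GPU pattern (an illustrative example, not the course script: it assumes two visible GPUs and the TF 1.x Session API), running one MatMul per GPU and summing the results on the CPU:

    import tensorflow as tf

    c = []
    for d in ['/gpu:0', '/gpu:1']:                  # one MatMul per GPU
        with tf.device(d):
            a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
            b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
            c.append(tf.matmul(a, b))
    with tf.device('/cpu:0'):
        total = tf.add_n(c)                         # sum the per-GPU results on the CPU
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    print(sess.run(total))                          # [[ 44.  56.] [ 98. 128.]]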
Demo: Singularity container - Pull Images
● Repositories for images, i.e., locations of container images
○ Singularity Container image search:
■ singularity search “keyword” # e.g. cuda (also look for Singularity Container Tools)
○ Docker Hub (docker): https://hub.docker.com/ - keyword search (e.g. cuda) using the Docker Hub
search field at the top-left of the menu
● In HPC/VM (CSDS312) Environment - Pull Singularity Image
○ Pull a singularity GPU container (image) in your project space
/mnt/pan/courses/sxg125_csds312/singularity (you can also use
/scratch/pbsjobs/<caseID>/singularity space instead in HPC)
■ Command format: singularity pull shub://<singularity-container-full-path> # in HPC
● singularity pull shub://ucr-singularity/cuda-10.1-base:latest # load the singularity module first
● ll -h # check the size of the image
● singularity exec --nv ./cuda-10.1-base_latest.sif nvcc -V # compiler ver
○ Cuda compilation tools, release 10.1, V10.1.105
Demo: Singularity container - Pull Docker Image
■ Command format: singularity pull docker://<docker-container-full-path> # in VM
● SSH to csds312.case.edu
● Create a singularity directory and cd to it
● Pull the docker image
○ /usr/local/singularity/3.6.1/bin/singularity pull
docker://samuelcolvin/tensorflow-gpu-py36
● Check the image and its size
○ ll -h
■ -rwxr-xr-x. 1 root root 1.3G Sep 22 17:00
tensorflow-gpu-py36_latest.sif
Demo: Test Singularity container
● Test Tensorflow container
○ /usr/local/singularity/3.6.1/bin/singularity exec ./tensorflow-gpu-py36_latest.sif cat
/etc/os-release
■ NAME="Ubuntu"
■ VERSION="16.04.2 LTS (Xenial Xerus)"
■ ID=ubuntu
■ ID_LIKE=debian
■ PRETTY_NAME="Ubuntu 16.04.2 LTS"
■ VERSION_ID="16.04"
○ /usr/local/singularity/3.6.1/bin/singularity exec ./tensorflow-gpu-py36_latest.sif pip freeze
■ ...
■ tensorflow-gpu==1.3.0rc0
■ tensorflow-tensorboard==0.1.2
■ ...
Build Singularity Container
● In your PC or csds312 Virtual Machine
○ Singularity Installation - https://sylabs.io/guides/3.0/user-guide/installation.html or HPC Guide
to Singularity
■ You can use singularity installed in a VM - csds312 at /usr/local/singularity/3.6.1/bin
○ Build a singularity container in your PC or in a csds312 VM as a sudo user
■ Use build option
● sudo /usr/local/singularity/3.6.1/bin/singularity build --sandbox tensorflow-gpu
docker://samuelcolvin/tensorflow-gpu-py36 # directly from image location
■ Use singularity definition file
● cat /usr/local/singularity/definitions/opensees/opensees.def # location in Markov
○ Bootstrap: docker
○ From: sorcerer01/opensees ....
● e.g. sudo /usr/local/singularity/3.6.1/bin/singularity build opensees.img
/usr/local/singularity/definitions/opensees/opensees.def
Show Demo: Singularity Container - Sandbox
● Convert the image (“tensorflow-gpu-py36_latest.sif”) to a sandbox (“tensorflow-gpu”) so that
you can access its directory structure
○ sudo /usr/local/singularity/3.6.1/bin/singularity build --sandbox tensorflow-gpu
tensorflow-gpu-py36_latest.sif # from the image
■ [sudo] password for sxg125:
■ INFO: Creating sandbox directory...
■ INFO: Build complete: tensorflow-gpu
● Check files in a sandbox directory
○ ls tensorflow-gpu/bin/
● Shell to the sandbox directory (e.g. tensorflow-gpu) - ignore the warnings if any
○ sudo /usr/local/singularity/3.6.1/bin/singularity shell -w ./tensorflow-gpu
■ [sudo] password for sxg125:
■ Singularity>
Show Demo: Singularity Container - Test
● Test
○ Singularity> cat /etc/os-release
■ NAME="Ubuntu"
Show Demo: Singularity Container - Install Packages
● Check for a package, e.g. the Cython module (an optimising compiler for both the
Python and Cython programming languages)
○ Singularity> pip freeze | grep Cython
● Update packages
○ Singularity> apt-get update # apt-get for installing system packages
● Upgrade pip (for pip install)
○ Singularity> pip install --upgrade pip # pip install for installing python packages
● Install python module, Cython
○ Singularity> pip install cython
Show Demo: Singularity Container - Test
● Check if Cython has been installed
○ Singularity> pip freeze | grep Cython
■ Cython==0.29.21
● You may need to install dependency packages while installing a Python package:
○ Singularity> apt install <dependency-package>
● Exit from the shell: Singularity> exit
● Create an image from the sandbox that is ready to be transferred to HPC
○ sudo /usr/local/singularity/3.6.1/bin/singularity build tensorflow-gpu-py36_latest.img
tensorflow-gpu
● Test the latest image before transferring it to HPC
○ /usr/local/singularity/3.6.1/bin/singularity exec ./tensorflow-gpu-py36_latest.img pip freeze |
grep Cython
■ Cython==0.29.21
Show Demo: Singularity Container - Transfer Image
● Transfer the container image to HPC environment
■ Use Globus Transfer (recommended for larger files). Create a Globus Personal
Endpoint first
■ Other ways to transfer - HPC Guide to Transferring Files
■ Use scratch space (/scratch/pbsjobs/<create-your-caseID-directory>/singularity) or the
course space (/mnt/pan/courses/sxg125_csds312/)
● This saves your $HOME space, which is limited by a group quota
● Scratch provides 1 TB of temporary space; files are deleted after 14 days
● More details on HPC Guide to Storage and Quota
■ Example: SCP Transfer
● scp ~/singularity/tensorflow-gpu-py36_latest.img
<caseID>@markov.case.edu:/scratch/pbsjobs/<caseID>/singularity
Show Demo: Singularity Container - Test
● Test the transferred image
○ Open Markov terminal and load singularity module
○ singularity exec ./tensorflow-gpu-py36_latest.img pip freeze | grep Cython
■ Cython==0.29.21
● Run a job from HPC - HPC Guide to Singularity
○ A batch job is recommended
Singularity Container - MPI
● Pull the image from Docker:
○ singularity pull docker://rinnocente/gromed-ts-revised-2019:latest # single node
● Job script:
○ #!/bin/bash
○ #SBATCH -N 1 -n <# of CPUs> --mem=<memory-size>gb
○ module load singularity # load the singularity module (as on Markov)
○ cp -r <topology-and-data-files> $PFSDIR
○ cd $PFSDIR
○ singularity exec -B /scratch ~/<path-to-image>/gromed-ts-revised-2019_latest.sif mpiexec -np <# of CPUs> /usr/local/gromacs/bin/gmx_mpi mdrun -nb auto -s <TPR file> -nsteps <steps> -plumed <plumed-input-data>
● Search Docker Hub for other MPI images (for multi-node runs).
Kubernetes (K8s)
● Kubernetes
(https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/) is a
portable, extensible, open-source platform for managing containerized
workloads and services, that facilitates both declarative configuration and
automation (using YAML files).
○ Google open-sourced the Kubernetes project in 2014.
● In a production environment, there is a need to manage the containers that
run the applications and ensure that there is no downtime.
○ Kubernetes provides a framework to run distributed systems resiliently.
○ It takes care of scaling and failover for the applications, provides deployment patterns, and
more.
Kubernetes (K8s) Cont’d
Fig. (left) K8s cluster with 3 master servers and multiple nodes (servers) running Docker containers in pods;
(right) kubectl commands summary. YouTube source: https://www.youtube.com/watch?v=X48VuDVv0do
Demo: Accessing Nautilus Kubernetes Cluster
● Nautilus [https://pacificresearchplatform.org/nautilus/] is a HyperCluster for running
containerized Big Data Applications. It uses Kubernetes for managing and scaling
containerized applications and Rook for automating Ceph data services.
○ Ceph is known for providing scale-out file, block, and object storage within a single data
center
○ It also includes a robust set of multi-cluster federation capabilities.
● Access Nautilus:
○ Click on https://nautilus.optiputer.net/ for the Pacific Research Platform (PRP) Kubernetes portal,
which allows you to log in using your Case credentials via institutional (Case Western)
CILogon
○ Follow quickstart Guide (http://ucsd-prp.gitlab.io/userdocs/start/quickstart/)
Demo: Accessing Nautilus Kubernetes Cluster
● You can find the namespaces of your interest in the portal (https://nautilus.optiputer.net/) in
“Namespace Overview” menu. # cwru-dev
● Check the contributed servers:
○ ~/.local/bin/kubectl/kubectl get nodes # find nautilusc01.sci.cwru.edu
■ k8s-x1-01.calit2.optiputer.net Ready <none> 152d v1.21.5
■ nautilusc01.sci.cwru.edu Ready <none> 40d v1.21.5
■ netw-fiona-ucsf.stanford.edu Ready <none> 691d v1.21.5
● If you receive the error “Unable to connect to the server: failed to refresh token: oauth2: cannot fetch
token: 400 Bad Request …”, you need to get the new config file using “Get Config” (portal)
● You are granted the user role in the cluster.
○ To use the namespace of your interest, its owner needs to add you to it, so you need to contact
them.
Show Demo: Accessing namespace “cwru-dev”
● Set the context - “cluster: nautilus”; “namespace: cwru-dev” # use Markov Desktop
○ ~/.local/bin/kubectl/kubectl config set-context nautilus --namespace=cwru-dev
● Verify the context:
○ ~/.local/bin/kubectl/kubectl config get-contexts
● Find the pod configuration file “tensorflow-pod.yaml” at ~/datascience/kubernetes that
has information about the pod
○ apiVersion: v1 # required API version for a Pod object
○ kind: Pod
○ metadata:
○   name: gpu-pod-example # name of the pod
○ spec:
○   containers:
○   - name: gpu-container # name of the container
○     image: gitlab-registry.nrp-nautilus.io/prp/jupyter-stack/tensorflow # container image
Show Demo: Creating and Running a Pod
● Change directory to ~/datascience/kubernetes/ and create a tensorflow pod
using that yaml config file
○ ~/.local/bin/kubectl/kubectl create -f ~/datascience/kubernetes/tensorflow-pod.yaml
● Check the PODS running
○ ~/.local/bin/kubectl/kubectl get pods
● Login into your tensorflow pod
○ ~/.local/bin/kubectl/kubectl exec -it gpu-pod-example bash # i->interactive; t->terminal
● Check the tensorflow package in the container image in the pod (see
“~/admin/datascience/kubernetes/tensorflow-pod.yaml”)
○ (base) jovyan@gpu-pod-example:~$ pip freeze | grep tensorflow
● Open Jupyter Notebook
○ jupyter notebook --ip='127.0.0.1'
Show Demo: Creating and Running a Pod
● You will get a URL along with a token in the output, similar to:
○ http://gpu-pod-example:8888/?token=990f183e63868fd0b20fc71c729620d965b83bc32aaae409
● Open another terminal
● Set up port-forwarding to access the pod from your local machine (HPC)
○ ~/.local/bin/kubectl/kubectl port-forward gpu-pod-example 8888:8888 # print session status
● Open the browser and copy/paste the URL/token to open the Jupyter notebook
● Open/upload the notebook file “classification.ipnb” (it classifies images of
clothing using TensorFlow) and run the cells (see the sketch after this list)
● Log out from the container and sessions, and delete the pod
○ ~/.local/bin/kubectl/kubectl delete pod gpu-pod-example # podname-> gpu-pod-example
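The notebook's exact contents are course material and are not reproduced here; the following is a minimal sketch of this kind of clothing-image classification, assuming it follows the standard Keras Fashion-MNIST pattern and that the pod's image provides TF 2.x:

    import tensorflow as tf

    # Load the Fashion-MNIST clothing images and scale pixels to [0, 1]
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # A small dense classifier: flatten 28x28 images, one hidden layer, 10 clothing classes
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=5)
    model.evaluate(x_test, y_test, verbose=2)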
Exercises
● Reading
○ Introduction to Singularity: https://sylabs.io/guides/3.6/user-guide/introduction.html
● Install the Python module “opencv-python” in the “tensorflow-gpu” sandbox that
was created in class. Check whether any dependency packages need to be
installed (check the console messages) before installing opencv-python,
and install them.
○ Check if opencv-python is installed.
○ Create image out of the sandbox
○ Transfer the image to HPC at /scratch/pbsjobs/<caseID>/singularity
○ Test the image in HPC using a slurm batch job.