
HPC - CSDS312

Virtual Machines and Containers


Virtual Machines (VM)
● A VM is the representation of a physical machine by
software (ref. VMware white paper).
○ A VM has its own set of virtual hardware (e.g., RAM,
CPU, NIC, hard disks, etc.) upon which an operating
system (Guest OS - e.g., Windows Server 2003, Linux)
and applications are loaded.
○ A lightweight software layer called a hypervisor (an
emulator) coordinates between the VMs and the underlying
physical hardware.
○ The Host OS runs the hypervisor process, which creates and
runs the VMs.
○ VMs share the resources of the bare-metal server (host),
which the hypervisor manages.

Fig. Physical server with its own operating system (host OS) which manages
heterogeneous virtual machines, each with its own OS (Guest OS);
Source: https://www.nvidia.com/
VM Cont’d
● Such full virtualization is referred to as
“hardware-level virtualization.”
● Virtualization allows multiple virtual
machines to run in isolation, side-by-side on
the same physical machine.
○ A problem in one VM does not affect any other
VMs on the host.
● Because each VM carries a full copy of an OS, there is a great
deal of replication among the VMs sharing the host's resources,
which increases resource usage.
Demo: VM - Guest OS
● Check the Guest OS of the VDI (Virtual Desktop Infrastructure):
https://myapps.case.edu
○ Desktop -> CWRU Desktop -> Windows Icon -> Settings -> System -> About
● Check the Guest OS of the virtual machine for the class - csds312
○ ssh <caseID>@csds312.case.edu
○ cat /etc/os-release
■ Red Hat Enterprise Linux Server release 7.8 (Maipo)
● More info about csds312
○ lscpu
■ CPU(s): 2
■ Hypervisor vendor: VMware
■ Virtualization type: full
○ lsmem or cat /proc/meminfo
Containers
● Containers are a good way to bundle and run your applications.
○ A container consists of the application and its library and system dependencies
○ This is referred to as “operating system virtualization” (vs. hardware-level virtualization in VMs).
● This higher level of virtualization allows users to work with much smaller images or
containers than full VMs.
● Containers have relaxed isolation properties to share the Operating System
(OS) among the applications.
○ Containers are made possible using kernel features of the host OS (e.g., namespaces and
cgroups) and a layered file system, instead of the emulation layer (e.g., a hypervisor)
required to run virtual machines (see the example below).
● As containers are decoupled from the underlying infrastructure, they are
portable across clouds and OS distributions.
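
As a quick illustration of OS-level virtualization, the following commands (a sketch, assuming Docker is installed and using the public ubuntu:20.04 image, which is not part of the course material) show that a container carries its own distribution but reuses the host's kernel:

# Host distribution and kernel version
grep PRETTY_NAME /etc/os-release
uname -r

# The container reports a different distribution ...
docker run --rm ubuntu:20.04 grep PRETTY_NAME /etc/os-release
# ... but the same kernel version, because it shares the host kernel
docker run --rm ubuntu:20.04 uname -r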
VM Vs Container
● While VMs encapsulate the entire OS and applications, containers encapsulate
individual applications and their dependencies for portable deployment
○ all containers on a server share the same host OS, and are hence completely dependent on
that single OS

Source: https://www.nvidia.com/

Docker Engine is an open source containerization technology for building
and containerizing your applications (a minimal build-and-run sketch follows).
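
A minimal sketch of that build-and-run workflow (the my-app image tag and the Dockerfile it assumes are hypothetical, not from the course material):

# Build an image from the Dockerfile in the current directory
docker build -t my-app:latest .

# Run the application in a container created from that image
docker run --rm my-app:latest

# List local images and running containers
docker images
docker ps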
Singularity
● Singularity is popular for HPC as it
can be run as a job and doesn’t
require root access to run the job.
● It works well with schedulers (e.g.,
Slurm).
● A Singularity app runs as a user
process (not as root, as in Docker).
● It has access to all of the host's file system
and devices that any user process
has access to (in contrast to the isolated
filesystem in Docker) - see the example below.
Source: https://tin6150.github.io/
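
A quick check of both properties (a sketch; image.sif here is a placeholder for any image you have pulled, such as the tensorflow image used later):

# The container process runs as your own user, not as root
singularity exec image.sif whoami
singularity exec image.sif id -u

# Host locations such as $HOME are visible inside the container by default
singularity exec image.sif ls $HOME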
Demo: Singularity Basics
Run Singularity

● Open Markov Shell and request a GPU node
○ srun -p class --gres=gpu:1 -A sxg125_csds312 --pty bash
● Load the singularity module
● Check the environment variable for tensorflow ($TENSORFLOW) defined by the module
○ module display singularity
● singularity --help
○ Usage:
○ singularity [global options...]
○ exec Run a command within a container
Singularity Basics - Cont’d
Bind Paths

● Singularity allows you to map (bind) directories on your host system to directories within your
container using bind mounts.
○ The system-defined bind paths for a Singularity container are $HOME, /tmp, /proc, /sys, /dev, etc.
○ The binding feature of singularity with the -B flag can be used for /mnt and /scratch as shown:
■ singularity exec -B /mnt,/scratch --nv $TENSORFLOW python train.py
○ Also, it is recommended to use /scratch space instead of the default /tmp.

GPU support at run time (Demo)

● Use the --nv option to grant your containers GPU support at runtime (see the check below)
○ singularity exec --help
■ -n/--nv Enable experimental Nvidia support
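
A simple runtime check (a sketch; image.sif stands for a GPU-enabled image such as the tensorflow container used in the next demo):

# Without --nv, the host GPU driver libraries are usually not available inside the container
singularity exec image.sif nvidia-smi       # typically fails or reports no devices
# With --nv, the NVIDIA driver libraries and GPU devices are bound into the container
singularity exec --nv image.sif nvidia-smi  # should list the GPU allocated via --gres=gpu:1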
Demo: Singularity Tensorflow - Matrix Multiplication
● Find the python script log-device-placement.py at
~/datascience/csds312/singularity/tensorflow
● Run Tensorflow interactively
○ singularity exec --nv -B /scratch $TENSORFLOW python
~/datascience/csds312/singularity/tensorflow/log-device-placement.py
■ MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
■ 2020-09-22 15:59:54.975151: I tensorflow/core/common_runtime/placer.cc:927] b:
(Const)/job:localhost/replica:0/task:0/device:GPU:0
■ [[22. 28.]
■ [49. 64.]]
Multiple GPUs
import tensorflow as tf

# single GPU
with tf.device('/gpu:0'):  # Use only one gpu - gpu0
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)  # 2x2 result, matching the demo output above

# Multiple GPUs
for d in ['/gpu:0', '/gpu:1']:  # Request 2 GPUs (check nvidia-smi)
    with tf.device(d):  # Use both GPUs - gpu0 & gpu1
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])

# log_device_placement=True prints the device assigned to each op (as in the demo output)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True, allow_soft_placement=True))
print(sess.run(c))
Demo: Singularity container - Pull Images
● Repositories for images, i.e., locations of container images
○ Singularity container image search:
■ singularity search “keyword” # e.g. cuda (also look for Singularity Container Tools)
○ Docker Hub (docker): https://hub.docker.com/ - keyword search (e.g. cuda) in the Docker Hub
search field at the top-left of the menu
● In HPC/VM (CSDS312) Environment - Pull Singularity Image
○ Pull a singularity GPU container (image) in your project space
/mnt/pan/courses/sxg125_csds312/singularity (you can also use
/scratch/pbsjobs/<caseID>/singularity space instead in HPC)
■ Command format: singularity pull shub://<singularity-container-full-path> # in HPC
● singularity pull shub://ucr-singularity/cuda-10.1-base:latest # load module
● ll -h # check the size of the image
● singularity exec --nv ./cuda-10.1-base_latest.sif nvcc -V # compiler ver
○ Cuda compilation tools, release 10.1, V10.1.105
Demo: Singularity container - pull docker image
■ Command format: singularity pull docker://<docker-container-full-path> # in VM
● SSH to csds312.case.edu
● Create a singularity directory and cd to it
● Pull the docker image
○ /usr/local/singularity/3.6.1/bin/singularity pull
docker://samuelcolvin/tensorflow-gpu-py36
● Check the image and its size
○ ll -h
■ -rwxr-xr-x. 1 root root 1.3G Sep 22 17:00
tensorflow-gpu-py36_latest.sif
Demo: Test Singularity container
● Test Tensorflow container
○ /usr/local/singularity/3.6.1/bin/singularity exec ./tensorflow-gpu-py36_latest.sif cat
/etc/os-release
■ NAME="Ubuntu"
■ VERSION="16.04.2 LTS (Xenial Xerus)"
■ ID=ubuntu
■ ID_LIKE=debian
■ PRETTY_NAME="Ubuntu 16.04.2 LTS"
■ VERSION_ID="16.04"
○ /usr/local/singularity/3.6.1/bin/singularity exec ./tensorflow-gpu-py36_latest.sif pip freeze
■ ...
■ tensorflow-gpu==1.3.0rc0
■ tensorflow-tensorboard==0.1.2
■ ...
Build Singularity Container
● On your PC or the csds312 Virtual Machine
○ Singularity Installation - https://sylabs.io/guides/3.0/user-guide/installation.html or HPC Guide
to Singularity
■ You can use singularity installed in the csds312 VM at /usr/local/singularity/3.6.1/bin
○ Build a singularity container on your PC or in the csds312 VM as a sudo user (see the
definition-file sketch at the end of this slide)
■ Use build option
● sudo /usr/local/singularity/3.6.1/bin/singularity build --sandbox tensorflow-gpu
docker://samuelcolvin/tensorflow-gpu-py36 # directly from image location
■ Use singularity definition file
● cat /usr/local/singularity/definitions/opensees/opensees.def # location in Markov
○ Bootstrap: docker
○ From: sorcerer01/opensees ....
● e.g. sudo /usr/local/singularity/3.6.1/bin/singularity build opensees.img
/usr/local/singularity/definitions/opensees/opensees.def
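
For reference, a minimal definition file could look like the hypothetical sketch below (the base image, package, and runscript are illustrative; the actual opensees.def on Markov differs):

# Write a small (hypothetical) definition file ...
cat > mycontainer.def <<'EOF'
Bootstrap: docker
From: samuelcolvin/tensorflow-gpu-py36

%post
    # runs once at build time, inside the container
    pip install --upgrade pip

%runscript
    # what "singularity run mycontainer.img <args>" executes
    exec python "$@"
EOF

# ... and build an image from it (as a sudo user)
sudo /usr/local/singularity/3.6.1/bin/singularity build mycontainer.img mycontainer.def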
Show Demo: Singularity Container - Sandbox
● Convert the image (“tensorflow-gpu-py36_latest.sif”) to a sandbox (“tensorflow-gpu”) so that
you have access to its directory structure
○ sudo /usr/local/singularity/3.6.1/bin/singularity build --sandbox tensorflow-gpu
tensorflow-gpu-py36_latest.sif # from the image
■ [sudo] password for sxg125:
■ INFO: Creating sandbox directory...
■ INFO: Build complete: tensorflow-gpu
● Check files in a sandbox directory
○ ls tensorflow-gpu/bin/
● Shell to the sandbox directory (e.g. tensorflow-gpu) - ignore the warnings if any
○ sudo /usr/local/singularity/3.6.1/bin/singularity shell -w ./tensorflow-gpu
■ [sudo] password for sxg125:
■ Singularity>
Show Demo: Singularity Container - Test
● Test
○ Singularity> cat /etc/os-release
■ NAME="Ubuntu"
Show Demo: Singularity Container - Install Packages

Install additional packages

● Check for a package, e.g., the Cython module (an optimising compiler for both the
Python and Cython programming languages)
○ Singularity> pip freeze | grep Cython
● Update packages
○ Singularity> apt-get update # apt-get for installing system packages
● Upgrade pip (for pip install)
○ Singularity> pip install --upgrade pip # pip install for installing python packages
● Install python module, Cython
○ Singularity> pip install cython
Show Demo: Singularity Container - Test
● Check if Cython has been installed
○ Singularity> pip freeze | grep Cython
■ Cython==0.29.21
● You may need to install dependency packages while installing a python package:
○ Singularity> apt install <dependency-package>
● Exit from the shell: Singularity> exit
● Create an image from the sandbox that is ready to be transferred to HPC
○ sudo /usr/local/singularity/3.6.1/bin/singularity build tensorflow-gpu-py36_latest.img
tensorflow-gpu
● Test the latest image before transferring it to HPC
○ /usr/local/singularity/3.6.1/bin/singularity exec ./tensorflow-gpu-py36_latest.img pip freeze |
grep Cython
■ Cython==0.29.21
Show Demo: Singularity Container - Transfer Image
● Transfer the container image to the HPC environment
■ Use Globus Transfer (recommended for larger files). Create a Globus Personal
Endpoint first
■ Other ways to transfer - HPC Guide to Transferring Files
■ Use scratch space (/scratch/pbsjobs/<create-your-caseID-directory>/singularity) or the
course space (/mnt/pan/courses/sxg125_csds312/)
● This saves your $HOME space, which is limited by a group quota
● The scratch space is temporary (1 TB); files there are deleted after 14 days
● More details in the HPC Guide to Storage and Quota
■ Example: SCP Transfer
● scp ~/singularity/tensorflow-gpu-py36_latest.img
<caseID>@markov.case.edu:/scratch/pbsjobs/<caseID>/singularity
Show Demo: Singularity Container - Test
● Test the transferred image
○ Open Markov terminal and load singularity module
○ singularity exec ./tensorflow-gpu-py36_latest.img pip freeze | grep Cython
■ Cython==0.29.21
● Run a job on HPC - see the HPC Guide to Singularity
○ A batch job is recommended (a sample job-script sketch is shown below)
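
A minimal Slurm batch-script sketch for such a job (assumptions: the class partition/account used earlier in this deck, the Cython test command from this demo, and illustrative resource values; adjust paths and resources as needed):

#!/bin/bash
#SBATCH -p class -A sxg125_csds312      # partition and account used in this course
#SBATCH --gres=gpu:1                    # request one GPU
#SBATCH -N 1 -n 2 --mem=4gb             # illustrative node/CPU/memory request
#SBATCH --time=00:10:00

module load singularity

cd /scratch/pbsjobs/<caseID>/singularity
singularity exec --nv ./tensorflow-gpu-py36_latest.img pip freeze | grep Cython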
Singularity Container - MPI
● Pull the image from Docker:
○ singularity pull docker://rinnocente/gromed-ts-revised-2019:latest # single node
● Job script:
○ #SBATCH -N 1 -n <# of CPUs> --mem=<memory-size>gb
○ cp -r <topology-and-data-files> $PFSDIR
○ cd $PFSDIR
○ singularity exec -B /scratch ~/<path-to-image>/gromed-ts-revised-2019_latest.sif mpiexec -np
<# of CPUs> /usr/local/gromacs/bin/gmx_mpi mdrun -nb auto -s <TPR file> -nsteps <steps>
-plumed <plumed-input-data>
● Search Docker Hub for other MPI images (for multiple nodes).
Kubernetes (K8s)
● Kubernetes
(https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/) is a
portable, extensible, open-source platform for managing containerized
workloads and services that facilitates both declarative configuration and
automation (using YAML files).
○ Google open-sourced the Kubernetes project in 2014.
● In a production environment, there is a need to manage the containers that
run the applications and ensure that there is no downtime.
○ Kubernetes provides a framework to run distributed systems resiliently.
○ It takes care of scaling and failover for the applications, provides deployment patterns, and
more (a small declarative example is sketched below).
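
A sketch of the declarative, YAML-driven workflow (the my-app deployment and nginx image below are illustrative only, not part of the course demo):

# Describe the desired state in a YAML file ...
cat > my-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2                      # desired number of pod replicas
  selector:
    matchLabels: { app: my-app }
  template:
    metadata:
      labels: { app: my-app }
    spec:
      containers:
      - name: my-app
        image: nginx:1.21
EOF

# ... apply it and let Kubernetes converge to that state; scaling is declarative too
kubectl apply -f my-deployment.yaml
kubectl scale deployment my-app --replicas=4
kubectl get pods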
Kubernetes (K8s) Cont’d

Fig. (left) K8s cluster with 3 master servers and multiple nodes (servers) running Docker
containers in pods; (right) kubectl commands summary.
YouTube source: https://www.youtube.com/watch?v=X48VuDVv0do
Demo: Accessing Nautilus Kubernetes Cluster
● Nautilus [https://pacificresearchplatform.org/nautilus/] is a HyperCluster for running
containerized Big Data Applications. It uses Kubernetes for managing and scaling
containerized applications and Rook for automating Ceph data services.
○ Ceph is known for providing scale-out file, block, and object storage within a single data
center
○ also includes a robust set of multi-cluster federation capabilities.
● Access Nautilus:
○ Click on https://nautilus.optiputer.net/ for the Pacific Research Platform (PRP) Kubernetes portal,
which allows you to log in using your Case credentials via Institutional (Case Western)
CILogon
○ Follow the quickstart guide (http://ucsd-prp.gitlab.io/userdocs/start/quickstart/)
Demo: Accessing Nautilus Kubernetes Cluster

● In the documentation (http://ucsd-prp.gitlab.io/userdocs/start/quickstart/), you are asked to:
○ install the kubectl client (binary) on your machine (e.g. in Markov at ~/.user/local/kubectl/<version>
and create a module to access it)
○ download the configuration file from the portal menu “Get Config” and place it at ~/.kube/
○ There is also a link (https://ucsd-prp.gitlab.io/userdocs/running/toc-running/) to creating your
first ML (Machine Learning) job in kubernetes
● CWRU doesn’t have a Kubernetes cluster, but we have contributed a server to the Nautilus K8s cluster
● Use Markov Desktop for the Jupyter Notebook demo
Show Demo: Accessing Nautilus Kubernetes Cluster

● You can find the namespaces of your interest in the portal (https://nautilus.optiputer.net/) in
the “Namespace Overview” menu. # cwru-dev
● Check the contributed servers:
○ ~/.local/bin/kubectl/kubectl get nodes # find nautilusc01.sci.cwru.edu
■ k8s-x1-01.calit2.optiputer.net Ready <none> 152d v1.21.5
■ nautilusc01.sci.cwru.edu Ready <none> 40d v1.21.5
■ netw-fiona-ucsf.stanford.edu Ready <none> 691d v1.21.5
● If you receive the error “Unable to connect to the server: failed to refresh token: oauth2: cannot fetch
token: 400 Bad Request …”, you need to get the new config file using “Get Config” (portal)
● You are granted the user role in the cluster.
○ To use a namespace of interest, its owner needs to add you to that
namespace, so you will need to contact them.
Show Demo: Accessing namespace “cwru-dev”
● Set the context - “cluster: nautilus”; “namespace: cwru-dev” # use Markov Desktop
○ ~/.local/bin/kubectl/kubectl config set-context nautilus --namespace=cwru-dev
● Verify the context:
○ ~/.local/bin/kubectl/kubectl config get-contexts
● Find the pod configuration file “tensorflow-pod.yaml” at ~/datascience/kubernetes, which
has information about the pod:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-example        # name of the pod
spec:
  containers:
  - name: gpu-container        # name of the container
    image: gitlab-registry.nrp-nautilus.io/prp/jupyter-stack/tensorflow   # container image
Show Demo: Creating and Running a Pod
● Change directory to ~/datascience/kubernetes/ and create a tensorflow pod
using that yaml config file
○ ~/.local/bin/kubectl/kubectl create -f ~/datascience/kubernetes/tensorflow-pod.yaml
● Check the PODS running
○ ~/.local/bin/kubectl/kubectl get pods
● Login into your tensorflow pod
○ ~/.local/bin/kubectl/kubectl exec -it gpu-pod-example bash # i->interactive; t->terminal
● Check the tensorflow package in the container image in the pod (see
“~/admin/datascience/kubernetes/tensorflow-pod.yaml”)
○ (base) jovyan@gpu-pod-example:~$ pip freeze | grep tensorflow
● Open Jupyter Notebook
○ jupyter notebook --ip='127.0.0.1'
Show Demo: Creating and Running a Pod
● You will get a URL along with a token as one of the outputs, similar to:
○ http://gpu-pod-example:8888/?token=990f183e63868fd0b20fc71c729620d965b83bc32aaae409
● Open another terminal
● Set up port-forwarding to access the pod from your local machine (HPC)
○ ~/.local/bin/kubectl/kubectl port-forward gpu-pod-example 8888:8888 # print session status
● Open the Browser and copy/paste url/token to open the Jupyter notebook
● Open/upload the notebook file “classification.ipynb” (classify images of
clothing) with tensorflow commands and run the commands.
● Log out from the container and sessions, and delete the pod
○ ~/.local/bin/kubectl/kubectl delete pod gpu-pod-example # podname-> gpu-pod-example
Exercises
● Reading
○ Introduction to Singularity: https://sylabs.io/guides/3.6/user-guide/introduction.html
● Install the python module “opencv-python” in the “tensorflow-gpu” sandbox that
was created in class. Check if there are any dependency packages to be
installed (check the message in the console) before installing opencv-python,
and install them.
○ Check if opencv-python is installed.
○ Create image out of the sandbox
○ Transfer the image to HPC at /scratch/pbsjobs/<caseID>/singularity
○ Test the image in HPC using a slurm batch job.
