
Machine Learning

[Tutorial: Environment Setup]

I-Ching Tseng
[email protected]
National Taiwan University
March 2022
Outline
q Overview
q Package Management Tools
q GPU
q Docker
q Conclusion

Overview
q To run a machine learning (ML) model
Ø You have to set up an environment first
Ø Using virtualization or package management tools is a good practice
• You can migrate the code and reproduce the result easily
• Different applications will not affect each other
• If your environment is broken, just create a new environment

q In this tutorial
Ø We will provide some guidelines for setting up an environment
Ø We will help you understand the environment
• The software stack
• NVIDIA GPUs

Outline
q Overview
q Package Management Tools
Ø Prerequisites
Ø Conda
Ø Pipenv
Ø Summary
q GPU
q Docker
q Conclusion

Prerequisites
q Package management tools
Ø Help you manage the environment
Ø Do not manage the GPU driver
q To utilize GPUs, make sure the GPU driver is installed (see the check below)
[Stack diagram: Application → Conda/Pipenv → PyTorch → NVIDIA Driver (software) → NVIDIA GPU (hardware)]
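A quick sanity check (a minimal sketch; the versions reported will depend on your machine):

# Run on the host to confirm the NVIDIA driver is installed and the GPU is visible
nvidia-smi
# The output header reports the driver version and the highest CUDA version that driver supports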
Conda
q Conda
Ø An open source package and environment management system
Ø Supports Windows, macOS, and Linux

q We take Anaconda as an example

Quick Start - Anaconda
Steps (Linux commands):
Ø Install Anaconda with the installer (check the document for details): bash Anaconda3-2021.11-Linux-x86_64.sh
Ø Create an environment (you can replace test_env with your desired environment name): conda create -n test_env
Ø Install packages (you can find the command on the PyTorch official website): conda install -n test_env pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
Ø Activate the environment: conda activate test_env
Ø Run your application: python ml.py
Ø Leave the environment: conda deactivate
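To migrate the environment and reproduce it on another machine (one of the benefits mentioned in the Overview), Conda can export the package list; a minimal sketch, assuming the environment is named test_env and environment.yml is just a file name we choose:

# Export the environment (package names and versions) to a YAML file
conda env export -n test_env > environment.yml
# Recreate the same environment from that file on another machine
conda env create -f environment.yml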

Pipenv
q Pipenv
Ø A tool that creates and manages a virtualenv

Quick Start - Pipenv
q To know more about Pipenv, please check the document
Steps (Linux commands):
Ø Install Pipenv with pip3: pip3 install pipenv
Ø Install packages: pipenv install numpy torchvision torch --index https://download.pytorch.org/whl/cu113
Ø Activate the environment: pipenv shell
Ø Run your application: python ml.py
Ø Leave the environment: Ctrl + D
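Pipenv records what you install in a Pipfile and a Pipfile.lock inside the project directory, which is what makes the environment reproducible; a minimal sketch of reusing them elsewhere:

# Pin the exact package versions currently resolved for this project
pipenv lock
# On another machine, with Pipfile and Pipfile.lock copied into the project directory,
# install exactly the locked versions
pipenv sync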

Summary
q To utilize a GPU, you must install the driver on your host machine

q Using Conda or Pipenv to build environments is recommended


Ø Portable
Ø Reproducible
Ø Applications do not affect each other

q You can stop here if you just want to finish the homework

q Why is PyTorch so convenient?


Ø "We ship with everything in-built (PyTorch binaries include CUDA,
CuDNN, NCCL, MKL, etc.)." [Reference]
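You can verify this from an installed PyTorch; a minimal check (the numbers printed depend on the build you installed):

# Print the PyTorch version, the bundled (runtime) CUDA version, and the bundled cuDNN version
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())"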

Outline
q Overview
q Package Management Tools
q GPU
Ø NVIDIA GPUs
Ø Software Stack
Ø NVIDIA Driver
Ø CUDA
q Docker
q Conclusion

NVIDIA GPUs
q General Purpose Graphics Processing Units (GPGPU)
Ø GPUs were originally designed for computer graphics applications
Ø GPUs are good at parallelizing "simple and repetitive" computations
• E.g., matrix multiplication
Ø ML models involve massive amounts of matrix multiplication
• We use GPUs to accelerate ML model training

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
Software Stack
[Software stack diagram: applications (translation, image classification, regression) → frameworks (Caffe, TensorFlow, PyTorch, etc.) → convolution layer implementations (generic vs. cuDNN-optimized) → BLAS libraries (OpenBLAS, MKL2019, cuDNN/cuBLAS) → NVIDIA Driver → hardware (CPU, FPGA, GPU)]
NVIDIA Driver
q NVIDIA driver
Ø The software that allows operating systems (OS) to communicate with
GPUs
Ø Includes kernel modules

[Diagram: frameworks call the cuDNN-optimized conv. layer → cuDNN/cuBLAS (BLAS libraries) → CUDA Runtime API, all in user space; the NVIDIA driver (CUDA driver) runs in kernel space, on top of the GPU hardware]
CUDA
q Compute Unified Device Architecture (CUDA)
Ø "A parallel computing platform and application programming interface
that allows software to use NVIDIA GPUs" [Wikipedia]
q CUDA Runtime API vs. CUDA Driver API
Ø The driver CUDA version must be ≥ the runtime CUDA version
Ø Check the driver CUDA version (e.g., in the header of nvidia-smi)

Ø When we "install CUDA"
• We usually refer to the CUDA runtime
• You should check the framework compatibility
• The version should not be greater than the driver CUDA version
• You should choose the runtime CUDA version carefully
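A minimal way to compare the two versions on your own machine (assuming PyTorch provides your CUDA runtime):

# Driver CUDA version: shown in the header of nvidia-smi
nvidia-smi
# Runtime CUDA version: the one shipped with your framework
python -c "import torch; print(torch.version.cuda)"
# The version printed by the second command must be <= the driver CUDA version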

Outline
q Overview
q Package Management Tools
q GPU
q Docker
Ø Virtualization
Ø Why Use Containers?
Ø Containerization with Docker
Ø Pulling Docker Images
Ø NVIDIA Docker
q Conclusion

Virtualization
q Virtual machine (VM) and container

q You only have to know that


Ø Containers only virtualize software layers above the OS level
• It is a good choice if we only focus on specific hardware (e.g., NVIDIA GPUs)
Ø Containers are relatively lightweight

https://www.docker.com/resources/what-container
Why Use Containers?
q Containers can virtualize more complex environments
Ø Even if you "only want to train models"
• You may use other frameworks that do not ship with CUDA and cuDNN
• You may need NCCL to perform efficient parallel and distributed training
• You may need to run an old version of PyTorch whose bundled CUDA version is too old to work with the latest GPUs

q Slurm and Kubernetes are popular server management tools in both academia and industry
Ø Slurm supports Singularity containers
Ø Kubernetes runs applications in Docker containers

Containerization with Docker
q Docker
Ø A platform for building and running containers
Ø Docker installation
• Docker Desktop (for Mac and Windows) runs a VM
q Docker image
Ø A read-only template (built from a Dockerfile) used to create Docker containers
q Steps of setting up an environment with Docker
Ø Install Docker
• One-time effort
Ø Build/pull an image
• There are lots of built images
Ø Run the container
Ø Run your application

Pulling Docker Images
q Docker Hub
Ø A place for finding and sharing Docker images
• E.g., Docker Hub repository of PyTorch
q Check the Docker Hub and find the image tag
Ø 1.9.1-cuda11.1-cudnn8-devel vs. 1.9.1-cuda11.1-cudnn8-runtime?
• Roughly, the devel image additionally includes the CUDA build toolchain (headers, nvcc) for compiling custom extensions, while the runtime image only ships what is needed to run models

Ø Run "docker pull <image_tag>"

NVIDIA Docker (1/2)
q Using GPUs in Docker containers makes containers less portable
Ø Containers work in user space
• Root privilege only means you can use some privileged system calls
Ø Using NVIDIA GPUs requires kernel modules and user-level libraries
• The CUDA version of the driver user-space modules must be exactly the same
as the CUDA version of the driver kernel modules
• The runtime CUDA version can be smaller than the driver CUDA version
Ø The host driver must exactly match the version of the driver installed in
the container

q We should use NVIDIA Docker


Ø Install NVIDIA Docker
Ø You do not have to install the NVIDIA driver in the container

https://github.com/NVIDIA/nvidia-docker
NVIDIA Docker (2/2)
q Steps
Ø Install the latest NVIDIA driver
• One-time effort
Ø Install NVIDIA Docker
• One-time effort
Ø Build/pull an image
Ø Run the container
Ø Run your application
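Putting the steps together, a minimal sketch of a GPU-enabled run (same image tag as on the previous slide; the --gpus flag requires the NVIDIA Docker/Container Toolkit installed above):

# Run a container with access to all host GPUs and check that PyTorch can see them
docker run --gpus all --rm -it pytorch/pytorch:1.9.1-cuda11.1-cudnn8-runtime \
  python -c "import torch; print(torch.cuda.is_available())"
# Prints True if the host driver and NVIDIA Docker are set up correctly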

Outline
q Overview
q Package Management Tools
q GPU
q Docker
q Conclusion

Conclusion
q Whether or not you virtualize your environment
Ø You must install the NVIDIA driver on the host to utilize NVIDIA GPUs
Ø The runtime CUDA version must be less than or equal to the driver
CUDA version

q If you want to use NVIDIA GPUs in containers


Ø Using NVIDIA Docker makes your life easier
• You do not need to install NVIDIA drivers in containers
• Containers are more portable
Ø You only have to pull a pre-built Docker image from Docker Hub
• You do not have to set up CUDA, cuDNN, and frameworks yourself
• This is useful especially when the environment is complex

Q&A

Thank You!
