Environment Setup
I-Ching Tseng
[email protected]
[email protected]
National Taiwan University
March 2022
Outline
q Overview
q Package Management Tools
q GPU
q Docker
q Conclusion
Overview
q To run a machine learning (ML) model
Ø You have to set up an environment first
Ø Using virtualization or package management tools is a good practice
• You can migrate the code and reproduce the result easily
• Different applications will not affect each other
• If your environment is broken, just create a new environment
q In this tutorial
Ø We will provide some guidelines for setting up an environment
Ø We will help you understand the environment
• The software stack
• NVIDIA GPUs
Outline
q Overview
q Package Management Tools
Ø Prerequisites
Ø Conda
Ø Pipenv
Ø Summary
q GPU
q Docker
q Conclusion
Prerequisites
q Package management tools
Ø Help you manage the environment
Ø Do not manage the GPU driver
q To utilize GPUs, make sure the GPU driver is installed
[Stack diagram: Application → Conda/Pipenv → PyTorch → NVIDIA Driver (software) → NVIDIA GPU (hardware)]
Conda
q Conda
Ø An open source package and environment management system
Ø Supports Windows, macOS, and Linux
Quick Start - Anaconda
Steps and corresponding Linux commands:
• Create an environment (you can replace test_env with your desired environment name): conda create -n test_env
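Beyond creating the environment, a typical Conda workflow can be sketched as follows (the environment name test_env and the package choices are placeholders, not part of the original slide):

```shell
# Create an environment with a specific Python version
conda create -n test_env python=3.9

# Activate the environment before installing packages into it
conda activate test_env

# Install a package into the active environment
conda install numpy

# Leave the environment when you are done
conda deactivate

# If the environment breaks, simply remove it and create a new one
conda env remove -n test_env
```

This illustrates the "just create a new environment" practice from the Overview: environments are cheap to destroy and recreate, so a broken one never has to be repaired by hand.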
Pipenv
q Pipenv
Ø A tool that creates and manages a virtualenv
Quick Start - Pipenv
q To know more about Pipenv, please check the official documentation
[Table: steps and the corresponding Linux commands]
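The table's commands did not survive extraction; a typical Pipenv quick start looks like the following (a sketch — the package name requests is only an example):

```shell
# Install Pipenv for the current user
pip install --user pipenv

# Create a virtualenv for this project and install a package into it
# (the dependency is recorded in the Pipfile)
pipenv install requests

# Spawn a shell inside the virtualenv
pipenv shell

# Or run a single command inside the virtualenv without entering it
pipenv run python --version
```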
Summary
q To utilize GPUs, you must install the NVIDIA driver on your host machine
q You can stop here if you just want to finish the homework
Outline
q Overview
q Package Management Tools
q GPU
Ø NVIDIA GPUs
Ø Software Stack
Ø NVIDIA Driver
Ø CUDA
q Docker
q Conclusion
NVIDIA GPUs
q General-Purpose Graphics Processing Units (GPGPU)
Ø GPUs were originally designed for computer graphics applications
Ø GPUs are good at parallelizing "simple and repetitive" computations
• E.g., matrix multiplication
Ø ML models involve massive amounts of matrix multiplication
• We use GPUs to accelerate ML model training
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
Software Stack
[Stack diagram: Applications (Translation, Image Classification, Regression) → BLAS libraries (OpenBLAS, MKL2019, cuDNN/cuBLAS) → NVIDIA Driver → Hardware (CPU, FPGA, GPU)]
NVIDIA Driver
q NVIDIA driver
Ø The software that allows operating systems (OS) to communicate with GPUs
Ø Includes kernel modules
[Stack diagram: user space — frameworks, BLAS libraries (cuDNN/cuBLAS), CUDA Runtime API; kernel space — NVIDIA driver including the CUDA driver; hardware — GPU]
CUDA
q Compute Unified Device Architecture (CUDA)
Ø "A parallel computing platform and application programming interface that allows software to use NVIDIA GPUs" [Wikipedia]
q CUDA Runtime API vs. CUDA Driver API
Ø The driver CUDA version must be ≥ the runtime CUDA version
Ø Check the driver CUDA version with nvidia-smi
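As a concrete check: nvidia-smi prints the driver's CUDA version in its header, while the runtime version comes from the toolkit (nvcc --version) or the framework (e.g., torch.version.cuda in PyTorch). The comparison rule itself can be sketched in pure shell — the version numbers below are hypothetical:

```shell
# Hypothetical versions; in practice, read the driver version from
# `nvidia-smi` and the runtime version from your framework/toolkit.
driver_cuda="11.4"
runtime_cuda="11.1"

# `sort -V` performs a version-aware comparison; the setup is valid
# when the driver version is the larger (or equal) of the two.
if [ "$(printf '%s\n%s\n' "$runtime_cuda" "$driver_cuda" | sort -V | tail -n1)" = "$driver_cuda" ]; then
    echo "compatible"
else
    echo "runtime CUDA newer than driver CUDA"
fi
```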
Outline
q Overview
q Package Management Tools
q GPU
q Docker
Ø Virtualization
Ø Why use Containers?
Ø Containerization with Docker
Ø Pulling Docker Images
Ø NVIDIA Docker
q Conclusion
Virtualization
q Virtual machine (VM) and container
https://www.docker.com/resources/what-container
Why use Containers?
q Containers can virtualize more complex environments
Ø Even if you "only want to train models"
• You may use other frameworks that do not ship with CUDA and cuDNN
• You may need NCCL to perform efficient parallel and distributed training
• You may need to run an old version of PyTorch, but its default CUDA version is too old to communicate with the latest powerful GPU
Containerization with Docker
q Docker
Ø A platform for building and running containers
Ø Docker installation
• Docker Desktop (for Mac and Windows) runs a VM
q Docker image
Ø A read-only template with instructions for creating a Docker container
q Steps for setting up an environment with Docker
Ø Install Docker
• One-time effort
Ø Build/pull an image
• There are lots of prebuilt images
Ø Run the container
Ø Run your application
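The steps above can be sketched as shell commands (the image tag below is one of PyTorch's published Docker Hub tags; swap in whatever your project needs):

```shell
# Pull a prebuilt PyTorch image from Docker Hub
docker pull pytorch/pytorch:1.9.1-cuda11.1-cudnn8-runtime

# Run the container interactively, mounting the current directory
# so your code is visible inside it at /workspace
docker run -it --rm -v "$(pwd)":/workspace \
    pytorch/pytorch:1.9.1-cuda11.1-cudnn8-runtime bash

# Inside the container, run your application, e.g.:
# python train.py
```

Note that --rm discards the container on exit; since the image is unchanged, you can recreate an identical environment at any time — the same disposability argument made for Conda environments.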
Pulling Docker Images
q Docker Hub
Ø A place for finding and sharing Docker images
• E.g., the Docker Hub repository of PyTorch
q Check Docker Hub and find the image tag
Ø 1.9.1-cuda11.1-cudnn8-devel vs. 1.9.1-cuda11.1-cudnn8-runtime?
• devel images include the full CUDA toolkit (e.g., nvcc) for compiling CUDA code; runtime images contain only the libraries needed to run it, so they are smaller
NVIDIA Docker (1/2)
q Using GPUs in a Docker container makes the container less portable
Ø Containers work in user space
• Root privilege only means you can use some privileged system calls
Ø Using NVIDIA GPUs requires kernel modules and user-level libraries
• The CUDA version of the driver user-space modules must exactly match the CUDA version of the driver kernel modules
• The runtime CUDA version can be lower than the driver CUDA version
Ø The host driver must exactly match the version of the driver installed in the container
https://github.com/NVIDIA/nvidia-docker
NVIDIA Docker (2/2)
q Steps
Ø Install the latest NVIDIA driver
• One-time effort
Ø Install NVIDIA Docker
• One-time effort
Ø Build/pull an image
Ø Run the container
Ø Run your application
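With the driver and NVIDIA Docker installed, the steps above can be sketched as follows (the CUDA image tag is an example; --gpus all exposes every host GPU to the container):

```shell
# Verify that the NVIDIA driver is installed on the host
nvidia-smi

# Run a container with access to all GPUs; nvidia-smi inside the
# container should report the same driver version as the host
docker run --rm --gpus all nvidia/cuda:11.1.1-base-ubuntu20.04 nvidia-smi
```

If the second command fails while the first succeeds, the NVIDIA container runtime is the usual suspect, not the driver.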
Outline
q Overview
q Package Management Tools
q GPU
q Docker
q Conclusion
Conclusion
q Whether or not you virtualize your environment
Ø You must install the NVIDIA driver on the host to utilize NVIDIA GPUs
Ø The runtime CUDA version must be less than or equal to the driver CUDA version
Q&A
Thank You!