data_parallelism
Table of Contents:
1. Introduction
2. Data Parallelism
3. Model Parallelism
4. Code Example of Model Parallelism in PyTorch
5. Saving and Serving a Model Trained with Model Parallelism
○ Saving the Model
○ Serving for Online Inference
○ Inference on Multiple vs. Single Devices
6. Conclusion
1. Introduction
In distributed deep learning, there are two primary strategies for scaling training across multiple
devices (e.g., GPUs): Data Parallelism and Model Parallelism. Understanding these
strategies is crucial for efficiently training large models or training on large datasets.
2. Data Parallelism
Definition: Each device (GPU) holds a full copy of the model. The dataset is split into batches
that are distributed across devices. Each GPU processes a separate batch, computes
gradients, and the gradients are then aggregated to update the model weights.
Pros:
● Straightforward to implement.
● Scales well with large datasets.
Cons:
● Requires the full model (and its optimizer state) to fit in each device's memory.
● Synchronizing gradients across devices adds communication overhead.
Data parallelism is best when the model comfortably fits into a single GPU’s memory and you
have a large amount of data.
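As a minimal sketch of this setup (not part of the original example), data parallelism in PyTorch can be expressed with nn.DataParallel, which replicates the model on each visible GPU and splits every batch across the replicas; the layer sizes below are hypothetical, and for serious multi-GPU training torch.nn.parallel.DistributedDataParallel is generally preferred.
python
import torch
import torch.nn as nn

# A small model that fits comfortably on one device (hypothetical layer sizes)
model = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 10))

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

if torch.cuda.device_count() > 1:
    # Replicate the model on every visible GPU; each replica gets a slice of the batch
    model = nn.DataParallel(model)

batch = torch.randn(64, 1024).to(device)  # the batch is split across the replicas
output = model(batch)                     # per-replica outputs are gathered on device 0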
3. Model Parallelism
Definition: The model is split across multiple devices. Each device holds only a part of the
model. During the forward pass, intermediate outputs are passed between devices.
Pros:
● Enables training of very large models that cannot fit into a single GPU’s memory.
Cons:
● More complex to implement, since intermediate activations must be passed between devices.
● Devices can sit idle while waiting for each other unless pipelining is used.
Model parallelism is ideal when model size, rather than dataset size, is the bottleneck.
4. Code Example of Model Parallelism in PyTorch
Note: This is a simplified example assuming two GPUs, GPU 0 and GPU 1. The model’s first
half runs on GPU 0 and the second half on GPU 1.
python
import torch
import torch.nn as nn
import torch.optim as optim

# Device setup
device0 = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device1 = torch.device("cuda:1" if (torch.cuda.is_available() and torch.cuda.device_count() > 1) else "cpu")

class ModelParallelNN(nn.Module):
    def __init__(self):
        super(ModelParallelNN, self).__init__()
        # First half of the model on GPU 0
        self.fc1 = nn.Linear(1024, 512).to(device0)
        self.relu = nn.ReLU()
        # Second half of the model on GPU 1 (layer sizes chosen for illustration)
        self.fc2 = nn.Linear(512, 256).to(device1)
        self.fc3 = nn.Linear(256, 10).to(device1)

    def forward(self, x):
        # Forward pass starts on GPU 0, then activations move to GPU 1
        x = self.relu(self.fc1(x.to(device0)))
        x = self.relu(self.fc2(x.to(device1)))
        return self.fc3(x)

model = ModelParallelNN()

# Dummy data
data = torch.randn(64, 1024)  # 64 examples, 1024 features each
labels = torch.randint(0, 10, (64,)).to(device1)
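For completeness, a minimal training step on top of this class might look as follows (a sketch assuming the dummy data and labels above; the loss is computed on device1, where the output of the final layer lives):
python
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):                 # a few iterations just for illustration
    optimizer.zero_grad()
    outputs = model(data)              # forward pass crosses from device0 to device1
    loss = criterion(outputs, labels)  # labels already live on device1
    loss.backward()                    # autograd handles the cross-device graph
    optimizer.step()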
5. Saving and Serving a Model Trained with Model Parallelism
Saving the Model:
Saving works similarly to standard PyTorch models. The state_dict includes all parameters
from all devices.
python
torch.save(model.state_dict(), 'model_parallel.pth')
Loading:
python
model = ModelParallelNN()
model.load_state_dict(torch.load('model_parallel.pth'))
# Ensure parts of model are on correct devices if re-instantiated
model.fc1.to(device0)
model.fc2.to(device1)
model.fc3.to(device1)
Serving for Online Inference:
For online inference, it is often simplest to consolidate the model onto a single device (if it fits) by remapping all saved tensors with map_location:
python
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = ModelParallelNN()
model.load_state_dict(torch.load('model_parallel.pth', map_location=device))
model.to(device)
A simple inference helper can then run the forward pass with gradients disabled:
python
def infer(input_data):
    input_data = input_data.to(device)
    with torch.no_grad():
        output = model(input_data)
    return output
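For example, with an input shape matching the hypothetical model above:
python
sample = torch.randn(1, 1024)
prediction = infer(sample).argmax(dim=1)  # predicted class index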
Inference on Multiple Devices:
If the model is too large to fit on one device, you can perform inference similarly to the training
forward pass, with parts of the model on different GPUs.
python
def infer_parallel(input_data):
    input_data = input_data.to(device0)
    with torch.no_grad():
        output = model(input_data)  # output ends up on device1, where the final layer lives
    return output
In Practice:
● If possible, consolidate the model onto one device for inference to reduce complexity
and overhead.
● Use frameworks like TorchServe or NVIDIA Triton to handle multi-GPU deployment and
scaling.
● Convert models to ONNX and use efficient inference engines if needed (see the sketch below).
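As a rough sketch, assuming the model has already been consolidated onto a single device as shown above (the file name and tensor names here are illustrative):
python
dummy_input = torch.randn(1, 1024).to(device)
torch.onnx.export(model, dummy_input, "model_parallel.onnx",
                  input_names=["input"], output_names=["logits"])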
6. Conclusion
● Data Parallelism replicates the model across devices so each one processes a different
slice of the dataset; it is the straightforward choice when the model fits on a single device.
● Model Parallelism is used when the model is too large for a single device, splitting it
across multiple devices.
● When serving models for online inference, consider consolidating onto a single device if
feasible. If the model is too large, maintain model parallelism for inference.
● Saving and loading model-parallel-trained models involves saving the state_dict and
carefully loading it onto the appropriate devices.