Programming Assignment 2
Dataset
Same dataset as Programming Assignment 1.
Pretrained Model
● LLaMA 3B model
● On the cloud burst compute file system: /scratch/BDML25SP/
Key Focus
We will focus on three distributed training paradigms:
● Data Parallelism
● Tensor Parallelism
● Pipeline Parallelism
The goal of the assignment is to implement these techniques on 2 GPUs and achieve high
training efficiency, measured as time per epoch.
Deliverables
1. A report documenting:
a. Distributed training techniques used.
b. Training performance (time per epoch) and evaluation results.
c. Step-by-step guide on how to run the training code.
Evaluation
Compute the perplexity metric on the remaining 10% of the dataset. The assignment will be
evaluated primarily on how time-efficient the fine-tuning code is; the final perplexity score
will carry less weight.
Data Parallelism
Data parallelism replicates the model on both GPUs and splits the training data across them.
We covered this paradigm in class in the PyTorch Distributed paper
(https://arxiv.org/pdf/2006.15704).
Example code:
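The sketch below shows one way the data-parallel setup could look with PyTorch
DistributedDataParallel, launched via torchrun with one process per GPU. It is an
illustration, not a required implementation: the checkpoint path under /scratch/BDML25SP/,
the build_dataset() helper, and the hyperparameters are placeholder assumptions to be
replaced with the Assignment 1 model and data pipeline.

# Minimal data-parallel (DDP) fine-tuning sketch for 2 GPUs.
# Launch with: torchrun --nproc_per_node=2 <this script>
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler
from transformers import AutoModelForCausalLM

def build_dataset():
    # Placeholder: return a torch Dataset of tokenized examples from Assignment 1
    # (dicts with "input_ids", "attention_mask", "labels").
    raise NotImplementedError

def main():
    dist.init_process_group(backend="nccl")          # torchrun sets RANK / WORLD_SIZE / LOCAL_RANK
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder path: substitute the actual LLaMA 3B directory under /scratch/BDML25SP/.
    model = AutoModelForCausalLM.from_pretrained("/scratch/BDML25SP/llama-3b")
    model = DDP(model.to(local_rank), device_ids=[local_rank])  # replicate model, all-reduce grads

    dataset = build_dataset()
    sampler = DistributedSampler(dataset)            # each GPU sees a disjoint shard of the data
    loader = DataLoader(dataset, batch_size=4, sampler=sampler)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    for epoch in range(3):
        sampler.set_epoch(epoch)                     # reshuffle the shards every epoch
        for batch in loader:
            batch = {k: v.to(local_rank) for k, v in batch.items()}
            loss = model(**batch).loss               # causal-LM loss computed from the labels
            loss.backward()                          # DDP overlaps the gradient all-reduce here
            optimizer.step()
            optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()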
Tensor Parallelism
Tensor parallelism splits the weight matrices of large layers (such as Transformer blocks)
across multiple GPUs, so each GPU holds only a shard of each split layer's parameters. We
covered this in the Tofu paper (https://arxiv.org/pdf/1807.08887).
Example code:
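The sketch below illustrates the mechanics with a Megatron-style column/row split of a
Transformer-like MLP block on toy dimensions; the layer sizes and the forward-only demo are
assumptions kept for brevity. In a real fine-tuning run the same split would be applied to
the LLaMA attention and MLP projections, and the collectives would need to be autograd-aware
(e.g. those in torch.distributed.nn) or come from a tensor-parallel library.

# Megatron-style tensor parallelism sketch for a 2-layer MLP on 2 GPUs.
# Launch with: torchrun --nproc_per_node=2 <this script>
import os
import torch
import torch.nn as nn
import torch.distributed as dist

class ColumnParallelLinear(nn.Module):
    # Splits the weight along the output dimension: each rank holds out_features / world_size columns.
    def __init__(self, in_features, out_features, world_size):
        super().__init__()
        self.local = nn.Linear(in_features, out_features // world_size)

    def forward(self, x):
        return self.local(x)                         # output is this rank's slice of the columns

class RowParallelLinear(nn.Module):
    # Splits the weight along the input dimension and all-reduces the partial outputs.
    # bias=False here; a shared bias would be added once, after the all-reduce.
    def __init__(self, in_features, out_features, world_size):
        super().__init__()
        self.local = nn.Linear(in_features // world_size, out_features, bias=False)

    def forward(self, x_shard):
        partial = self.local(x_shard)
        dist.all_reduce(partial, op=dist.ReduceOp.SUM)  # sum partial results from both GPUs
        return partial

def main():
    dist.init_process_group(backend="nccl")
    rank = int(os.environ["LOCAL_RANK"])
    world = dist.get_world_size()
    torch.cuda.set_device(rank)

    hidden, ffn = 1024, 4096                         # toy sizes standing in for the real model dims
    fc1 = ColumnParallelLinear(hidden, ffn, world).cuda()
    fc2 = RowParallelLinear(ffn, hidden, world).cuda()

    x = torch.randn(8, hidden, device="cuda")        # input is replicated on both GPUs
    h = torch.nn.functional.gelu(fc1(x))             # each GPU holds half of the FFN activations
    y = fc2(h)                                       # all-reduce combines the halves into the full output
    if rank == 0:
        print(y.shape)                               # torch.Size([8, 1024])

    dist.destroy_process_group()

if __name__ == "__main__":
    main()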
Pipeline Parallelism
Pipeline parallelism assigns different layers of the model to different GPUs and streams
micro-batches through the stages so the GPUs can work concurrently. We have covered two
systems of this type: GPipe (https://arxiv.org/pdf/1811.06965) and PipeDream
(https://arxiv.org/pdf/1806.03377).
Example code:
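The sketch below is a hand-rolled, forward-only two-stage pipeline that streams micro-batch
activations between the ranks with point-to-point sends; the toy layers and micro-batch sizes
are placeholder assumptions. A full training implementation would also schedule the backward
passes across micro-batches, as GPipe and PipeDream do, for example via
torch.distributed.pipelining or DeepSpeed's pipeline engine.

# Forward-only two-stage pipeline sketch on 2 GPUs.
# Launch with: torchrun --nproc_per_node=2 <this script>
import os
import torch
import torch.nn as nn
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    hidden, n_micro, micro_bsz = 1024, 4, 4          # toy sizes standing in for the real model

    # Illustrative stand-ins for the first and second halves of the Transformer stack.
    if rank == 0:
        stage = nn.Sequential(*[nn.Linear(hidden, hidden) for _ in range(4)]).cuda()
    else:
        stage = nn.Sequential(*[nn.Linear(hidden, hidden) for _ in range(4)],
                              nn.Linear(hidden, 1)).cuda()

    for step in range(n_micro):
        if rank == 0:
            x = torch.randn(micro_bsz, hidden, device="cuda")   # stand-in micro-batch
            act = stage(x)
            dist.send(act, dst=1)                               # ship activations to the next stage
        else:
            act = torch.empty(micro_bsz, hidden, device="cuda")
            dist.recv(act, src=0)                               # receive activations from stage 0
            out = stage(act)
            print(f"rank 1 finished micro-batch {step}, output shape {tuple(out.shape)}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()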
Evaluation
Measure time per epoch as the primary metric. All other parts of the evaluation are the same
as in Programming Assignment 1.