
return model

Check Device Type

This snippet detects the most suitable device for computation, prioritizing GPU usage when available. It initially sets the device variable to "cpu" as a default. It then checks whether CUDA (NVIDIA's parallel computing platform) is available and, if so, sets the device to "cuda" to leverage GPU acceleration. If CUDA is not available but the machine supports Apple's Metal Performance Shaders (MPS), the device is set to "mps" to utilize Apple's GPU capabilities. Finally, it prints out the selected device. This approach optimizes performance by using whatever hardware acceleration is available.
Ref: https://developer.apple.com/metal/pytorch/

device = "cpu"
if torch.cuda.is_available():
device = "cuda"
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
# for apple mps
device = "mps"
print(f"using device: {device}...")

Data Loader

The DataLoaderLite class handles text data in a format suitable for training a GPT model. Upon initialization, it reads a text file from the specified path, encodes the text into tokens using the GPT-2 tokenizer from the tiktoken library, and stores these tokens as a PyTorch tensor. The size of each batch is defined by the parameters B (batch size) and T (sequence length). The class also calculates and prints the total number of tokens loaded and the number of batches per epoch, based on the batch size and sequence length.
The next_batch method retrieves the next batch of data from the tokenized text. It extracts a buffer of tokens starting from the current position and creates input (x) and target (y) tensors by shifting the buffer by one token. The method then advances the current position for the next batch; if the next batch would run past the end of the token list, the position is reset to the beginning, ensuring continuous looping through the dataset. This lightweight data loader facilitates seamless batch processing for training language models.

class DataLoaderLite:
    def __init__(self, B, T):
        self.B = B
        self.T = T
        with open('text', 'r') as f:
            text = f.read()
        enc = tiktoken.get_encoding('gpt2')
        tokens = enc.encode(text)
        self.tokens = torch.tensor(tokens)  # 1-D tensor of all tokens in the file

        print(f"loaded {len(self.tokens)} tokens")
        print(f"1 epoch = {len(self.tokens) // (B * T)} batches")
        self.current_position = 0

    def next_batch(self):
        B, T = self.B, self.T
        # grab B*T + 1 tokens: the extra token supplies the final target
        buf = self.tokens[self.current_position:self.current_position + B * T + 1]
        x = buf[:-1].view(B, T)  # inputs
        y = buf[1:].view(B, T)   # targets: inputs shifted by one token
        self.current_position += B * T
        # if the next batch would run past the end, wrap around to the start
        if self.current_position + B * T + 1 > len(self.tokens):
            self.current_position = 0
        return x, y
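A minimal usage sketch (illustrative only, assuming the token file referenced in the constructor is present): each call to next_batch returns input and target tensors of shape (B, T), where the targets are simply the inputs shifted left by one token:

loader = DataLoaderLite(B=4, T=32)
x, y = loader.next_batch()
print(x.shape, y.shape)                   # torch.Size([4, 32]) torch.Size([4, 32])
print(torch.equal(x[0, 1:], y[0, :-1]))   # True: y is x shifted by one position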

3. Training Loop
The basic training loop for a GPT-2 model using the custom data loader is described below. First, a DataLoaderLite instance is initialized with a batch size (B) of 4 and a sequence length (T) of 32 to supply the training data. Next, a GPT model is instantiated using the GPTConfig class and moved to the appropriate computational device (GPU or CPU) with model.to(device).
An AdamW optimizer is set up with the model parameters and a learning rate of 3e-4. The training loop runs for 2000 iterations; in each iteration, the next batch of data is retrieved from the data loader and transferred to the device. The gradients are reset with optimizer.zero_grad(), a forward pass through the model computes the logits and loss, the backward pass (loss.backward()) calculates the gradients, and optimizer.step() updates the model's weights. Finally, the loss for each step is printed, providing a measure of the model's performance during training. This loop trains the GPT model by iteratively processing batches of data, computing gradients, and updating weights.

train_loader = DataLoaderLite(B=4, T=32)

model = GPT(GPTConfig())
model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for i in range(2000):
    x, y = train_loader.next_batch()
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    logits, loss = model(x, y)
    loss.backward()
    optimizer.step()
    print(f"step: {i}, loss: {loss.item()}")

4. Text Generation
The provided code snippet illustrates the process of generating text sequences using a trained GPT-2
model. Initially, the random seed for both CPU and GPU computations is set to ensure reproducibility. The