
return model

Check Device Type

This snippet detects the most suitable device for computation, prioritizing GPU usage when available. It initially sets the device variable to "cpu" as a default. It then checks whether CUDA (NVIDIA's parallel computing platform) is available and, if so, sets the device to "cuda" to leverage GPU acceleration. If CUDA is not available but the machine supports Apple's Metal Performance Shaders (MPS), the device is set to "mps" to utilize Apple's GPU capabilities. Finally, it prints out the selected device. This approach optimizes performance by using whatever hardware acceleration is available.
Ref: https://developer.apple.com/metal/pytorch/

device = "cpu"
if torch.cuda.is_available():
device = "cuda"
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
# for apple mps
device = "mps"
print(f"using device: {device}...")

Data Loader

The DataLoaderLite class handles text data in a format suitable for training a GPT model. Upon initialization, it reads a text file from the specified path, encodes the text into tokens using the GPT-2 tokenizer from the tiktoken library, and stores these tokens as a PyTorch tensor. The size of each batch is defined by the parameters B (batch size) and T (sequence length). The class also calculates and prints the total number of tokens loaded and the number of batches per epoch, based on the batch size and sequence length.
The next_batch method retrieves the next batch of data from the tokenized text. It extracts a buffer of tokens starting from the current position and creates input (x) and target (y) tensors by shifting the buffer by one token. The method then advances the current position for the next batch; if the next batch would run past the end of the token list, the position is reset to the beginning, ensuring continuous looping through the dataset. This lightweight data loader facilitates seamless batch processing for training language models.

class DataLoaderLite:
    def __init__(self, B, T):
        self.B = B
        self.T = T
        with open('text', 'r') as f:
            text = f.read()
        enc = tiktoken.get_encoding('gpt2')
        tokens = enc.encode(text)
        self.tokens = torch.tensor(tokens)  # 1-D tensor of all tokens in the file

        print(f"loaded {len(self.tokens)} tokens")
        print(f"1 epoch = {len(self.tokens) // (B * T)} batches")
        self.current_position = 0

    def next_batch(self):
        B, T = self.B, self.T
        # grab B*T + 1 tokens: the extra token supplies the final target
        buf = self.tokens[self.current_position:self.current_position + B * T + 1]
        x = buf[:-1].view(B, T)  # inputs
        y = buf[1:].view(B, T)   # targets: inputs shifted by one token
        self.current_position += B * T
        # if the next batch would run past the end, wrap around to the start
        if self.current_position + B * T + 1 > len(self.tokens):
            self.current_position = 0
        return x, y
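A minimal usage sketch (illustrative only, assuming the token file referenced in the constructor is present): each call to next_batch returns input and target tensors of shape (B, T), where the targets are simply the inputs shifted left by one token:

loader = DataLoaderLite(B=4, T=32)
x, y = loader.next_batch()
print(x.shape, y.shape)                   # torch.Size([4, 32]) torch.Size([4, 32])
print(torch.equal(x[0, 1:], y[0, :-1]))   # True: y is x shifted by one position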

3. Training Loop
The basic training loop for a GPT-2 model using the custom data loader is described below. First, a DataLoaderLite instance is initialized with a batch size (B) of 4 and a sequence length (T) of 32 to supply the training data. Next, a GPT model is instantiated using the GPTConfig class and moved to the appropriate computational device (GPU or CPU) with model.to(device).
An AdamW optimizer is set up with the model parameters and a learning rate of 3e-4. The training loop runs for 2000 iterations; in each iteration, the next batch of data is retrieved from the data loader and transferred to the device. The gradients are reset with optimizer.zero_grad(), a forward pass through the model computes the logits and loss, the backward pass (loss.backward()) calculates the gradients, and optimizer.step() updates the model's weights. Finally, the loss for each step is printed, providing a measure of the model's performance during training. This loop trains the GPT model by iteratively processing batches of data, computing gradients, and updating weights.

train_loader = DataLoaderLite(B=4, T=32)

model = GPT(GPTConfig())
model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for i in range(2000):
    x, y = train_loader.next_batch()
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    logits, loss = model(x, y)
    loss.backward()
    optimizer.step()
    print(f"step: {i}, loss: {loss.item()}")

4. Text Generation
The provided code snippet illustrates the process of generating text sequences using a trained GPT-2
model. Initially, the random seed for both CPU and GPU computations is set to ensure reproducibility. The