
class CausalSelfAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        assert config.n_embd % config.n_head == 0
        # key, query, value projections for all heads, computed in one batched linear
        self.c_attn = nn.Linear(config.n_embd, 3 * config.n_embd)
        # output projection back to the embedding dimension
        self.c_proj = nn.Linear(config.n_embd, config.n_embd)
        self.c_proj.GPT_SCALE_UNIT = 1  # flag consulted during weight initialization
        self.n_head = config.n_head
        self.n_embd = config.n_embd
        # causal mask: lower-triangular matrix of ones, shaped (1, 1, T, T)
        self.register_buffer("bias",
            torch.tril(torch.ones(config.block_size, config.block_size))
            .view(1, 1, config.block_size, config.block_size))

    def forward(self, x):
        B, T, C = x.size()  # batch size, sequence length, embedding dimension
        qkv = self.c_attn(x)
        q, k, v = qkv.split(self.n_embd, dim=2)
        # reshape to (B, n_head, T, head_dim) so attention runs per head
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # scaled dot-product attention with the causal mask applied before softmax
        att = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(k.size(-1)))
        att = att.masked_fill(self.bias[:, :, :T, :T] == 0, float('-inf'))
        att = F.softmax(att, dim=-1)
        y = att @ v
        # re-assemble the head outputs side by side, then project
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        y = self.c_proj(y)
        return y
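
As a quick sanity check (a sketch, not part of the original file), the module maps a `(B, T, C)` tensor to a tensor of the same shape. The `SimpleNamespace` config below merely stands in for whatever `GPTConfig` dataclass the full document defines, and the imports are repeated here even though the document presumably has them earlier:

```python
import math
from types import SimpleNamespace

import torch
import torch.nn as nn
import torch.nn.functional as F

config = SimpleNamespace(n_embd=768, n_head=12, block_size=1024)  # assumed values
attn = CausalSelfAttention(config)
x = torch.randn(2, 16, config.n_embd)   # (B=2, T=16, C=768)
y = attn(x)
print(y.shape)                          # torch.Size([2, 16, 768])
```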

MLP

The MLP class is a simple position-wise feed-forward network with a GELU (Gaussian Error Linear Unit)
activation. Unlike ReLU, which can suffer from "dying neurons" because its gradient is exactly zero for
negative inputs, GELU is smooth everywhere, which improves gradient propagation; the same activation is
used in BERT and the original GPT-2, and the tanh approximation is chosen here to match the original
GPT-2 implementation. The class consists of two linear layers: the first (c_fc) expands the embedding
dimension to four times its size, while the second (c_proj) projects it back to the original embedding
dimension, the standard expand-and-contract transformer feed-forward design. The forward method applies
the first linear layer, the GELU activation, and the second linear layer in sequence to the input tensor x.

class MLP(nn.Module):
    def __init__(self, config):
        super().__init__()
        # expand to 4x the embedding dimension, apply GELU, then project back down
        self.c_fc = nn.Linear(config.n_embd, 4 * config.n_embd)
        self.gelu = nn.GELU(approximate="tanh")
        self.c_proj = nn.Linear(4 * config.n_embd, config.n_embd)
        self.c_proj.GPT_SCALE_UNIT = 1  # flag consulted during weight initialization

    def forward(self, x):
        x = self.c_fc(x)
        x = self.gelu(x)
        x = self.c_proj(x)
        return x
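
As a small illustration (not from the document), the tanh-approximate GELU tracks the exact GELU very closely; the approximation is kept mainly for parity with the original GPT-2 implementation rather than for accuracy reasons:

```python
import torch
import torch.nn as nn

x = torch.linspace(-4, 4, steps=9)
exact = nn.GELU()(x)                       # exact (erf-based) GELU
approx = nn.GELU(approximate="tanh")(x)    # tanh approximation, as used in MLP above
print((exact - approx).abs().max())        # tiny gap, on the order of 1e-3 or smaller
```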

Block

The Block class is a fundamental component of the GPT-2 architecture, combining self-attention and MLP
(multi-layer perceptron) modules with layer normalization and residual connections. Unlike the original
transformer, where layer normalization is applied after the self-attention or MLP sub-layer, GPT-2 places
layer normalization at the input of each sub-block, keeping the residual path clean. This design choice lets
gradients flow smoothly from the top layers down to the input/token layer, which improves training. The class
initializes two layer normalization layers (ln_1 and ln_2), a CausalSelfAttention module in which tokens
exchange information, and an MLP that transforms each token independently. In the forward method, the input
tensor x is layer-normalized and passed through self-attention, and the result is added back to x as a
residual connection; the same pattern is then repeated with the MLP. Together, these elements let the block
aggregate information across tokens and refine each token's representation, forming a robust building block
for the overall model.

class Block(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.ln_1 = nn.LayerNorm(config.n_embd)
        self.attn = CausalSelfAttention(config)
        self.ln_2 = nn.LayerNorm(config.n_embd)
        self.mlp = MLP(config)

    def forward(self, x):
        # pre-norm residual connections: normalize, transform, then add back to x
        x = x + self.attn(self.ln_1(x))
        x = x + self.mlp(self.ln_2(x))
        return x
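
Because each Block maps a `(B, T, n_embd)` tensor back to the same shape, blocks can be stacked freely along the residual stream. A brief sketch (config values are assumptions, not from the document):

```python
from types import SimpleNamespace
import torch

config = SimpleNamespace(n_embd=768, n_head=12, block_size=1024)  # assumed values
blocks = torch.nn.Sequential(*[Block(config) for _ in range(4)])  # stack of 4 blocks
x = torch.randn(2, 32, config.n_embd)
print(blocks(x).shape)   # torch.Size([2, 32, 768]) -- shape preserved through the stack
```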

GPT

The GPT class is the top-level structure of the GPT-2 model, combining the components above into a
complete transformer network for text generation. It initializes the essential elements: token and position
embeddings (wte and wpe), a stack of transformer blocks (h), a final layer normalization (ln_f), and the
output linear layer (lm_head). These modules are organized in a ModuleDict for convenient access and
management. The embeddings map input indices to dense vectors, and the transformer blocks apply self-
attention and MLP operations to these embeddings, progressively refining the representations.
To keep training consistent and efficient, the class shares weights between the token embeddings and the
output layer, and initializes weights following the original GPT-2 methodology. During the forward pass, the
model adds positional information to the token embeddings and passes the result through the transformer
blocks; the final layer normalization and the linear head then map the refined representations to logits over
the vocabulary.
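
The constructor itself is not reproduced on this page; the sketch below reconstructs its likely shape from the description above. The module names (wte, wpe, h, ln_f, lm_head) come from the text, while the GPTConfig fields (vocab_size, block_size, n_layer, n_embd) are assumptions:

```python
import torch.nn as nn

class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.transformer = nn.ModuleDict(dict(
            wte=nn.Embedding(config.vocab_size, config.n_embd),   # token embeddings
            wpe=nn.Embedding(config.block_size, config.n_embd),   # position embeddings
            h=nn.ModuleList([Block(config) for _ in range(config.n_layer)]),
            ln_f=nn.LayerNorm(config.n_embd),                     # final layer norm
        ))
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
        # weight sharing between the token embedding and the output projection
        self.transformer.wte.weight = self.lm_head.weight
```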