
Transformers

Chris Watkins
Programming with gradients

Set up a ‘model’ with:
• parameters, also known as trainable weights
• a forward function, which computes output from input
• a loss function

    import torch
    from torch import nn

    class MyModel( nn.Module ):

        def __init__( self ):
            super().__init__()
            self.weights1 = nn.Linear( 2, 5 )
            self.relu = nn.ReLU()
            self.weights2 = nn.Linear( 5, 3 )

        def forward( self, x ):
            x = self.weights1( x )
            x = self.relu( x )
            x = self.weights2( x )
            return x
Setting up the network for learning

    mymodel = MyModel()                      # set up my model, initialize weights
    all_my_weights = mymodel.parameters()    # get all the weights

    # put the weights in the optimizer
    my_favourite_optimizer = torch.optim.AdamW( all_my_weights )
Learning

The learning loop of every neural network:

1. Randomly select X (input) and Y (target) from the data.
2. output = mymodel.forward(X)
   (this is the ‘forward pass’)
3. loss = loss_function( output, Y )
   (loss is just one number)
4. loss.backward()
   (this computes all gradients of the loss with respect to the weights; the
   gradients are attached to the weight tensors. This is ‘back-propagation’.)
5. optimizer.step()
   (the optimizer has the weight tensors and the gradients; it now uses the
   gradients to slightly change the weights)
6. Zero all stored gradients.
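
A minimal sketch of this loop in PyTorch, using MyModel from the earlier slide; the mean-squared-error loss and the random training data are illustrative assumptions, not part of the slides:

    import torch
    from torch import nn

    mymodel = MyModel()                                # model from the earlier slide
    loss_function = nn.MSELoss()                       # assumed loss, for illustration
    optimizer = torch.optim.AdamW( mymodel.parameters() )

    # made-up data: 100 examples with 2-dimensional inputs and 3-dimensional targets
    X_data = torch.randn( 100, 2 )
    Y_data = torch.randn( 100, 3 )

    for step in range( 1000 ):
        idx = torch.randint( 0, len(X_data), (16,) )   # randomly select X and Y from data
        X, Y = X_data[idx], Y_data[idx]

        output = mymodel( X )                          # the ‘forward pass’
        loss = loss_function( output, Y )              # loss is just one number

        optimizer.zero_grad()                          # zero all stored gradients
        loss.backward()                                # back-propagation: compute all gradients
        optimizer.step()                               # use gradients to slightly change weights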
GPT transformer design overview

[Figure: input tokens (The, fat, cat, eats, ...) each produce a probability distribution over possible next tokens (big, fat, eats, dog, white, cat, …). Each token has a sequence of embedding vectors going through the transformer network (embed, transform, mix, process), which finally produces predictions of next tokens.]
Exploded view of a transformer head applied at token i in GPT

For each token j up to and including i, with embedding vector u_j:

    q_i = Q u_i        k_j = K u_j        v_j = V u_j

    y_ij = q_i . k_j                                   (attention scores)
    p_i0, …, p_ii = softmax( y_i0, …, y_ii )           (mixing proportions)

    output_i = p_i0 v_0 + … + p_ii v_i

The mixed output then goes through some feedforward neural network.
Note that we only mix with previous tokens in the sequence.
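
A minimal sketch of one such head in PyTorch; the dimensions and the class name are made up for illustration, and the score scaling by 1/sqrt(d_head) used in real GPT heads is included even though the slide omits it:

    import math
    import torch
    from torch import nn

    class OneAttentionHead( nn.Module ):
        # one causal self-attention head; d_model and d_head are illustrative sizes
        def __init__( self, d_model=64, d_head=16 ):
            super().__init__()
            self.Q = nn.Linear( d_model, d_head, bias=False )   # q_i = Q u_i
            self.K = nn.Linear( d_model, d_head, bias=False )   # k_j = K u_j
            self.V = nn.Linear( d_model, d_head, bias=False )   # v_j = V u_j

        def forward( self, u ):              # u: (seq_len, d_model) embedding vectors
            q, k, v = self.Q(u), self.K(u), self.V(u)
            y = q @ k.T / math.sqrt( k.shape[-1] )              # y_ij = q_i . k_j (scaled)
            # only mix with previous tokens: mask out j > i before the softmax
            mask = torch.triu( torch.ones_like(y, dtype=torch.bool), diagonal=1 )
            y = y.masked_fill( mask, float('-inf') )
            p = torch.softmax( y, dim=-1 )                      # mixing proportions p_ij
            return p @ v                                        # p_i0 v_0 + … + p_ii v_i

Calling OneAttentionHead() on a (seq_len, d_model) tensor returns one mixed vector per token, which would then go through the feedforward network.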
Travelling between symbol world and vector world

Symbol to vector: look up the token (e.g. ‘cat’) in the embedding table, which gives its stored (and trained) embedding vector.

Vector to symbol: take an embedding vector (usually the result of much processing) and compute its dot product with each vector in the stored output embedding table. This gives one logit per symbol (cow, cat, dog, …); the logits are turned into symbol probabilities with a softmax.
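
A minimal sketch of both directions in PyTorch; the five-word vocabulary and the embedding size are made up, and the output vector is random, standing in for the result of much processing:

    import torch
    from torch import nn

    vocab = ['cow', 'cat', 'dog', 'eats', 'the']       # made-up vocabulary
    d_model = 8
    embedding = nn.Embedding( len(vocab), d_model )    # stored (and trained) embedding table

    # symbol -> vector: look up ‘cat’ in the embedding table
    cat_id = torch.tensor( vocab.index('cat') )
    cat_vector = embedding( cat_id )                   # shape: (d_model,)

    # vector -> symbol: dot product with every embedding vector gives logits,
    # which softmax turns into symbol probabilities
    output_vector = torch.randn( d_model )             # stand-in for the processed vector
    logits = embedding.weight @ output_vector          # one logit per symbol
    probs = torch.softmax( logits, dim=-1 )            # probabilities over the vocabulary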
Multiple transformer heads

[Figure: token streams 0–3 passing through 3 transformer heads (Head 0, Head 1, Head 2), each with different weights.]

Each transformer head applies the same weights to each token stream.
The transformer heads all have different weights ‘inside’ them.
The outputs of the three transformer heads are concatenated to form the input to the next layer.
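
A minimal sketch of this concatenation, reusing the illustrative OneAttentionHead class from the earlier sketch; three heads follow the slide, the other names and sizes are assumptions:

    import torch
    from torch import nn

    class MultiHead( nn.Module ):
        # several heads with different weights; their outputs are concatenated
        def __init__( self, n_heads=3, d_model=64, d_head=16 ):
            super().__init__()
            self.heads = nn.ModuleList(
                [ OneAttentionHead(d_model, d_head) for _ in range(n_heads) ] )

        def forward( self, u ):                        # u: (seq_len, d_model)
            # each head applies its own weights to every token stream,
            # then the head outputs are concatenated along the feature dimension
            return torch.cat( [head(u) for head in self.heads], dim=-1 )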
Two layers of transformer heads, with three transformer heads in each layer

[Figure: token streams 0–3 pass through Layer 1 (Heads 0–2) and then Layer 2 (Heads 0–2): 6 different transformer heads in total.]
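
A minimal sketch of two stacked layers, reusing the hypothetical MultiHead class above; the linear projection back to d_model between layers is an assumption added so the shapes line up:

    import torch
    from torch import nn

    class TwoLayers( nn.Module ):
        # two layers of three heads each: 6 different transformer heads in total
        def __init__( self, d_model=64, d_head=16, n_heads=3 ):
            super().__init__()
            self.layer1 = MultiHead( n_heads, d_model, d_head )
            self.proj1 = nn.Linear( n_heads * d_head, d_model )   # back to width d_model
            self.layer2 = MultiHead( n_heads, d_model, d_head )
            self.proj2 = nn.Linear( n_heads * d_head, d_model )

        def forward( self, u ):                        # u: (seq_len, d_model)
            u = self.proj1( self.layer1(u) )
            u = self.proj2( self.layer2(u) )
            return u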
What have I left out?

Surely it can’t be so simple.

Well, I’ve only left out layer normalization and skip-connections…

It really is this simple.

But GPT-3 is very big:

• 96 attention layers
• Various embedding dimensions, up to 12,288
• 175,000,000,000 trainable weights
• Batch size of several million tokens
Why do transformers work so well?

Transformers have been the dominant NN architecture (for large problems) since 2017.

There is no general agreement or precise theory on why even small transformers work for non-NLP problems.

There is no agreement as to why very large and deep transformers work so well in language modelling and question answering.
