E02 Computer A Handout
Computer Assignment A
Mohammad Vali
GPT-2
[Figure: the GPT-2 pipeline. Input words ("The quick brown fox") are mapped to token IDs and embedding vectors (e.g. "quick" → 2068 → [0.007, -0.082, ..., -0.158]), passed through transformer layers 1-4, and a prediction head assigns each candidate next word a probability (p("jumps") = 0.792, p("jumped") = 0.111, ..., p("is") = 0.007, ..., p("<eos>") = 0.000). The next word is picked based on its probability, the sentence is extended with "jumps", and the process repeats.]
▶ Training works by predicting the next word in text, similar to generation.
▶ The transformer layers are the complex part [1, 2, 3]:
▶ Attention combines relationships across the text sequence into each word's representation.
▶ An MLP refines each word's representation (a sketch follows the table below).
▶ More parameters typically lead to better generation.

Model     GPT-2 S   GPT-2 M   GPT-2   GPT-3   GPT-4
#params   124M      355M      1.5B    175B    ?

Source: https://en.wikipedia.org/wiki/Large_language_model
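The attention and MLP bullets above can be made concrete in code. Below is a minimal NumPy sketch of single-head scaled dot-product attention, the mechanism the bullets name; real GPT-2 uses multiple heads, learned per-layer projections, layer normalization, and an MLP after each attention step.

```python
# Minimal sketch of single-head causal self-attention, as named in the bullets.
# Plain NumPy for clarity; real GPT-2 layers are multi-head and learned.
import numpy as np

def attention(X, Wq, Wk, Wv):
    """X: (seq_len, d) word representations -> refined (seq_len, d) outputs."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # pairwise relationship strengths
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)    # causal: no attending to future words
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                          # each word mixes in the others

# Tiny demo: four 8-dimensional word representations with random weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```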
[1] Blog post: https://jalammar.github.io/illustrated-gpt2/
[2] "An Introduction to Transformers", Richard E. Turner (2023). https://arxiv.org/abs/2304.10557
[3] YouTube: https://youtu.be/wjZofJX0v4M?si=zjk7DnuEs_egGhWr
Computer Assignment A
▶ Run code in JupyterHub to generate text with GPT-2 medium (355M params).
▶ You only need a web browser; no coding is required.
▶ Log in to https://jupyter.cs.aalto.fi and click 'fetch'. This creates a folder /introtoai2025/ which holds the notebook.
▶ To reset the notebook, select Kernel → Restart.
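For reference, generating text with GPT-2 medium can look roughly like the sketch below, again assuming the Hugging Face transformers library; the notebook you fetch may differ in detail.

```python
# Rough sketch of the assignment's task: sample text from GPT-2 medium.
# "gpt2-medium" is the 355M-parameter checkpoint on the Hugging Face hub.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-medium")
out = generator("The quick brown fox", max_new_tokens=20, do_sample=True)
print(out[0]["generated_text"])
```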