Building LLMs (2024)
Using Large Language Models (LLMs)
https://github.com/Lightning-AI/litgpt
STAGE 1: BUILDING
[Roadmap figure: Building an LLM: 1) Data preparation & sampling, 2) Attention mechanism, 3) LLM architecture; 4) Pretraining produces the Foundation model: 5) Training loop, 6) Model evaluation, 7) Load pretrained weights; 9) Finetuning (e.g., on an instruction dataset) produces a Classifier or a Personal assistant]
https://github.com/rasbt/LLMs-from-scratch
Sample text
"In the heart of the city stood the old library, a relic from a bygone era. Its
stone walls bore the marks of time, and ivy clung tightly to its facade …"
[Figure: input text → preprocessing steps → LLM → output layers; the LLM generates text one token at a time, e.g., starting from "This"]
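A minimal sketch of the data preparation & sampling step, assuming the tiktoken GPT-2 tokenizer (an assumption, not shown on the slides): the text is tokenized, and a sliding window produces input-target pairs in which the target is the input shifted by one token (next-token prediction).

import tiktoken
import torch

tokenizer = tiktoken.get_encoding("gpt2")

text = ("In the heart of the city stood the old library, a relic from a "
        "bygone era. Its stone walls bore the marks of time, and ivy clung "
        "tightly to its facade ...")
token_ids = tokenizer.encode(text)

max_length, stride = 8, 8   # context size and window step (illustrative values)
inputs, targets = [], []
for i in range(0, len(token_ids) - max_length, stride):
    inputs.append(torch.tensor(token_ids[i:i + max_length]))
    targets.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

# Each target is the input shifted by one position; during pretraining the
# LLM learns to predict the next token at every position.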
“Our training corpus includes a new mix of data from publicly available sources,
which does not include data from Meta’s products or services. We made an effort
to remove data from certain sites known to contain a high volume of personal
information about private individuals. We trained on 2 trillion tokens of data as
this provides a good performance–cost trade-off, up-sampling the most factual
sources in an effort to increase knowledge and dampen hallucinations.”
“To train the best language model, the curation of a large, high-
quality training dataset is paramount. In line with our design
principles, we invested heavily in pretraining data. Llama 3 is
pretrained on over 15T tokens that were all collected from publicly
available sources.”
Introducing Meta Llama 3: The most capable openly available LLM to date (2024), https://ai.meta.com/blog/meta-llama-3/
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (2024), https://arxiv.org/abs/2404.14219
[Architecture comparison figure: two GPT-style decoder stacks side by side. In each transformer block: LayerNorm (left) or RMSNorm (right) → masked multi-head attention → Dropout → shortcut (+) → LayerNorm/RMSNorm → feed-forward linear layers → Dropout → shortcut (+); tokenized text enters at the bottom, and a final LayerNorm plus linear output layer sits on top. Left: 36 × transformer blocks with 20 attention heads; right: 32 × transformer blocks with 32 attention heads.]
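A minimal sketch of one such GPT-style transformer block in PyTorch; the dimensions, layer names, and the use of nn.MultiheadAttention are illustrative assumptions, not the exact implementation from the talk.

import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    # One GPT-style block: pre-norm, masked multi-head attention,
    # feed-forward layers, dropout, and shortcut (residual) connections.
    def __init__(self, emb_dim=768, num_heads=12, drop_rate=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(emb_dim)
        self.att = nn.MultiheadAttention(
            emb_dim, num_heads, dropout=drop_rate, batch_first=True
        )
        self.norm2 = nn.LayerNorm(emb_dim)
        self.ff = nn.Sequential(
            nn.Linear(emb_dim, 4 * emb_dim),
            nn.GELU(),
            nn.Linear(4 * emb_dim, emb_dim),
        )
        self.drop = nn.Dropout(drop_rate)

    def forward(self, x):                      # x: (batch, seq_len, emb_dim)
        seq_len = x.size(1)
        # Causal mask: each position may only attend to itself and earlier tokens
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        shortcut = x
        h = self.norm1(x)
        attn_out, _ = self.att(h, h, h, attn_mask=mask, need_weights=False)
        x = shortcut + self.drop(attn_out)

        shortcut = x
        x = shortcut + self.drop(self.ff(self.norm2(x)))
        return x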
LitGPT
https://github.com/Lightning-AI/litgpt
[Figure: GPT model: tokenized text → token embedding layer + positional embedding layer → 12 × transformer blocks (LayerNorm 1 → masked multi-head attention → Dropout → shortcut (+); LayerNorm 2 → feed forward → Dropout → shortcut (+)) → final LayerNorm → linear output layer]
The original linear output layer maps 768 hidden units to 50,257 units (the number of tokens in the vocabulary).
We replace the original linear output layer with a layer that maps from 768 hidden units to only 2 units, where the 2 units represent the two classes ("spam" and "not spam").
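A minimal sketch of this head swap, assuming a GPT-2-small-style model object that is already instantiated and exposes its output projection as model.out_head (the attribute name and the freezing strategy are assumptions for illustration):

import torch.nn as nn

# Original head: nn.Linear(768, 50257), i.e., hidden size -> vocabulary size
num_classes = 2                                   # "spam" vs. "not spam"
model.out_head = nn.Linear(768, num_classes)      # new classification head

# Optionally freeze the pretrained weights and train only the new head
for param in model.parameters():
    param.requires_grad = False
for param in model.out_head.parameters():
    param.requires_grad = True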
https://magazine.sebastianraschka.com/p/finetuning-large-language-models
{
"instruction": "Rewrite the following sentence using passive voice.",
"input": "The team achieved great results.",
"output": "Great results were achieved by the team."
},
### Instruction:
Rewrite the following sentence using passive voice.
### Input:
The team achieved great results.
### Response:
Great results were achieved by the team.
[Figure: the formatted prompt above is fed to the LLM, and the model response is "Great results were achieved by the team."]
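A minimal sketch of how such a JSON entry can be rendered into the Alpaca-style prompt shown above (the function name and details are illustrative, not from the original slides):

def format_input(entry):
    # Render an instruction-dataset entry into the ### Instruction / ### Input /
    # ### Response prompt format.
    prompt = f"### Instruction:\n{entry['instruction']}"
    if entry.get("input"):
        prompt += f"\n\n### Input:\n{entry['input']}"
    prompt += "\n\n### Response:\n"
    return prompt

entry = {
    "instruction": "Rewrite the following sentence using passive voice.",
    "input": "The team achieved great results.",
    "output": "Great results were achieved by the team.",
}
# During instruction finetuning, format_input(entry) is the prompt and
# entry["output"] is the target text the model learns to generate.
print(format_input(entry) + entry["output"])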
MMLU
Rank  Model                  Average↑ (%)  Paper
1     Gemini Ultra (~1760B)  90            Gemini: A Family of Highly Capable Multimodal Models
2     GPT-4o                 88.7          GPT-4 Technical Report
model_answer = model(input)  # pseudocode: query the LLM with a benchmark question
https://github.com/EleutherAI/lm-evaluation-harness
https://github.com/Lightning-AI/litgpt/blob/main/tutorials/evaluation.md
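A minimal sketch of how a multiple-choice benchmark such as MMLU is typically scored: present the question and answer options, then compare how likely the model finds each answer letter as the next token. The Hugging Face-style model/tokenizer interface below is an assumption for illustration; the harnesses linked above implement this properly.

import torch

def pick_answer(model, tokenizer, question, choices):
    # Build a multiple-choice prompt and compare the model's scores
    # for each answer letter (A-D) as the next token.
    prompt = question + "\n"
    for letter, choice in zip("ABCD", choices):
        prompt += f"{letter}. {choice}\n"
    prompt += "Answer:"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits            # (1, seq_len, vocab_size)
    next_token_logits = logits[0, -1]         # scores for the next token
    letter_ids = [tokenizer(" " + c).input_ids[-1] for c in "ABCD"]
    return "ABCD"[int(next_token_logits[letter_ids].argmax())]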
AlpacaEval
The model's response is compared against a reference response from GPT-4 Preview, using a GPT-4-based auto-annotator as the judge.
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch07/03_model-evaluation/llm-instruction-eval-openai.ipynb
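A minimal sketch of the GPT-4-based auto-annotator idea, assuming the official openai Python package and an OPENAI_API_KEY in the environment; the prompt wording and judge model name are illustrative, not AlpacaEval's actual setup.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def judge(instruction, reference_response, model_response):
    # Ask a stronger "judge" model to score the candidate response
    # against a reference response on a 0-100 scale.
    prompt = (
        f"Instruction: {instruction}\n"
        f"Reference response: {reference_response}\n"
        f"Model response: {model_response}\n"
        "Score the model response from 0 to 100 and respond with the integer only."
    )
    reply = client.chat.completions.create(
        model="gpt-4-turbo",  # judge model name is illustrative
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content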
Preference finetuning: improves helpfulness + safety if developing a chatbot
Instruction finetuning
Continued pretraining
Code Llama: Open Foundation Models for Code, https://arxiv.org/abs/2308.12950
https://sebastianraschka.com/books/
https://lightning.ai
Slides
🗺 https://sebastianraschka.com/pdf/slides/2024-build-llms.pdf