Large Language Models: CSC413 Tutorial 9 Yongchao Zhou

The document discusses large language models (LLMs), including what they are, why they are useful, how they are trained, and some of their emergent capabilities and risks. LLMs have billions of parameters and are trained on massive datasets using self-supervised objectives. During training, they learn general language representations that allow them to perform many NLP tasks without extensive fine-tuning. Emergent capabilities include in-context learning, advanced prompting techniques, and zero-shot chains of thought. Risks include potential for harm from mistakes, misuse, biases, and attacks.


Large Language Models

CSC413 Tutorial 9
Yongchao Zhou
Overview
● What are LLMs?
● Why LLMs?
● Emergent Capabilities
○ Few-shot In-context Learning
○ Advanced Prompt Techniques
● LLM Training
○ Architectures
○ Objectives
● LLM Finetuning
○ Instruction finetuning
○ RLHF
○ Bootstrapping
● LLM Risks
What are Language Models?
● Narrow Sense
○ A probabilistic model that assigns a probability to every finite sequence, grammatical or not (see the factorization below)

● Broad Sense
○ Decoder-only models (GPT-X, OPT, LLaMA, PaLM)
○ Encoder-only models (BERT, RoBERTa, ELECTRA)
○ Encoder-decoder models (T5, BART)
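As a minimal illustration of the narrow-sense definition, an autoregressive (decoder-only) LM factorizes the probability of a token sequence with the chain rule (notation ours, not from the slides):

$$P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})$$

Training maximizes the log of this quantity over a corpus; encoder-only and encoder-decoder models optimize related objectives (e.g. masked-token or span reconstruction) rather than this left-to-right factorization.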
Large Language Models - Billions of Parameters

https://huggingface.co/blog/large-language-models
Large Language Models - Hundreds of Billions of Tokens

https://babylm.github.io/
Large Language Models - YottaFLOPs of Compute

https://web.stanford.edu/class/cs224n/slides/cs224n-2023-lecture11-prompting-rlhf.pdf
Why LLMs?
● Scaling Law for Neural Language Models
○ Performance depends strongly on scale! We keep getting better performance as we scale up the model, the data, and the compute (a schematic power-law form follows below)

https://arxiv.org/pdf/2001.08361.pdf
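The scaling laws in the cited paper are empirical power laws; schematically (the constants are fit to data in the paper and omitted here):

$$L(N) \approx \left(\tfrac{N_c}{N}\right)^{\alpha_N}, \qquad L(D) \approx \left(\tfrac{D_c}{D}\right)^{\alpha_D}, \qquad L(C) \approx \left(\tfrac{C_c}{C}\right)^{\alpha_C}$$

where L is test loss and N, D, C are parameter count, dataset size, and training compute, each varied while the others are not the bottleneck.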
Why LLMs?
● Generalization
○ We can now use one single model to solve many NLP tasks

https://arxiv.org/pdf/1910.10683.pdf
Why LLMs?
● Emergent Abilities
○ An ability is emergent if it is not present in smaller models but is present in larger models

https://docs.google.com/presentation/d/1yzbmYB5E7G8lY2-KzhmArmPYwwl7o7CUST1xRZDUu1Y/edit?resourcekey=0-6_TnUMoKWCk_FN2BiPxmbw#slide=id.g1fc34b3ac18_0_27
Emergent Capability - In-Context Learning

https://arxiv.org/pdf/2005.14165.pdf
Emergent Capability - In-Context Learning

https://www.cs.princeton.edu/courses/archive/fall22/cos597G/lectures/lec04.pdf
Emergent Capability - In-Context Learning
Pretraining + Fine-tuning Paradigm vs. Pretraining + Prompting Paradigm
● Fine-tuning (FT)
○ + Strongest task-specific performance
○ - Needs a curated, labeled dataset for each new task (typically 1k-100k examples)
○ - Poor generalization, spurious feature exploitation
● Few-shot (FS) (a prompt sketch follows this list)
○ + Much less task-specific data needed
○ + No spurious feature exploitation
○ - Challenging
● One-shot (1S)
○ + "Most natural," e.g. giving humans instructions
○ - Challenging
● Zero-shot (0S)
○ + Most convenient
○ - Challenging, can be ambiguous
Moving from fine-tuning toward zero-shot trades stronger task-specific performance for more convenience, more generality, and less data.
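To make the prompting paradigm concrete, here is a minimal sketch of a few-shot in-context prompt in the style of the GPT-3 paper's translation demos; the model's weights stay frozen and the "training examples" live entirely in the prompt (the prompt text is illustrative):

```python
# Few-shot in-context learning: task demonstrations go directly in the prompt.
few_shot_prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "plush giraffe => girafe en peluche\n"
    "cheese =>"  # the frozen model is expected to continue with "fromage"
)
# One-shot keeps a single demonstration; zero-shot keeps only the task description.
```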
Emergent Capability - Chain of Thoughts Prompting

https://arxiv.org/pdf/2201.11903.pdf
Emergent Capability - Chain of Thoughts Prompting

https://arxiv.org/pdf/2201.11903.pdf
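As a concrete illustration (a sketch in the spirit of the cited paper's exemplars, not a verbatim reproduction): the prompt shows a worked rationale before the answer, and the model is expected to imitate that reasoning pattern on the new question.

```python
# Chain-of-thought prompting: the exemplar includes intermediate reasoning steps.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
    "\n"
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, "
    "how many apples do they have?\n"
    "A:"  # the model should produce its own step-by-step reasoning, ending in 9
)
```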
Emergent Capability - Zero Shot CoT Prompting

https://arxiv.org/pdf/2205.11916.pdf
Emergent Capability - Zero Shot CoT Prompting

https://arxiv.org/pdf/2205.11916.pdf
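Zero-shot CoT needs no exemplars at all; per the cited paper, appending a simple reasoning trigger is enough to elicit step-by-step reasoning (the question below is illustrative):

```python
question = (
    "A juggler can juggle 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?"
)
# Step 1: elicit reasoning with the trigger phrase.
reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
# Step 2 (a second call): append an answer-extraction cue such as
# "Therefore, the answer (arabic numerals) is" to the generated reasoning.
```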
Emergent Capability - Self-Consistency Prompting

https://arxiv.org/pdf/2203.11171.pdf
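A minimal sketch of self-consistency decoding, assuming a hypothetical `generate` callable that samples one chain-of-thought completion and returns its final answer string (any LLM API could stand behind it):

```python
from collections import Counter

def self_consistency(generate, prompt, k=10, temperature=0.7):
    """Sample k diverse reasoning paths and majority-vote over final answers."""
    answers = [generate(prompt, temperature=temperature) for _ in range(k)]
    # Marginalize over reasoning paths: the most frequent final answer wins.
    return Counter(answers).most_common(1)[0][0]
```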
Emergent Capability - Least-to-Most Prompting

https://arxiv.org/pdf/2205.10625.pdf
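Least-to-most prompting runs in two stages: first ask the model to decompose the problem into simpler subquestions, then solve the subquestions in order, feeding each answer back into the context. A rough sketch (the prompt wording is ours, not the paper's):

```python
# Stage 1 - decomposition: elicit the list of subquestions.
decompose_prompt = (
    "Q: It takes Amy 4 minutes to climb to the top of a slide and 1 minute to slide down. "
    "The slide closes in 15 minutes. How many times can she slide before it closes?\n"
    "A: To answer this, we first need to answer: How long does one climb-and-slide trip take?"
)
# Stage 2 - sequential solving: answer each subquestion, append it to the context,
# and finally ask the original question with all intermediate answers available.
```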
Emergent Capability - Augmented Prompting Abilities

Advanced Prompting Technique → Ask a human to
● Zero-shot CoT Prompting → Explain the rationale
● Self-Consistency → Double check the answer
● Divide-and-Conquer → Decompose into easy subproblems

Large Language Models demonstrate some human-like behaviors!


Training Architectures

Encoder-decoder models (T5, BART) vs. Decoder-only models (GPT-X, PaLM)

http://jalammar.github.io/illustrated-transformer/
Training Objectives - UL2

https://arxiv.org/pdf/2205.05131.pdf
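UL2 mixes several denoising objectives. As one ingredient, here is a toy, self-contained sketch of T5-style span corruption; the sentinel naming, corruption rate, and span length are illustrative, not UL2's exact configuration:

```python
import random

def span_corrupt(tokens, corruption_rate=0.15, mean_span=3):
    """Toy span-corruption denoising: hide short spans behind sentinel tokens,
    and build a target that reconstructs the hidden spans."""
    inputs, targets, sentinel, i = [], [], 0, 0
    while i < len(tokens):
        if random.random() < corruption_rate / mean_span:
            span = tokens[i:i + mean_span]                 # hide a short span
            inputs.append(f"<extra_id_{sentinel}>")        # replace it with a sentinel
            targets += [f"<extra_id_{sentinel}>"] + span   # target restores the span
            sentinel += 1
            i += mean_span
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

# Example: span_corrupt("the cat sat on the mat and purred".split())
```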
What kinds of things does pretraining learn?

https://web.stanford.edu/class/cs224n/slides/cs224n-2023-lecture11-prompting-rlhf.pdf
Finetune - Instruction Finetune

https://arxiv.org/pdf/2210.11416.pdf
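Instruction finetuning trains on many tasks rephrased as natural-language instructions. The record below shows the illustrative shape of one training example (field names and wording are ours, not FLAN's exact format):

```python
example = {
    "instruction": "Classify the sentiment of the review as positive or negative.",
    "input": "The battery dies within an hour and the screen flickers constantly.",
    "output": "negative",
}
# Finetuning maximizes the likelihood of `output` given the instruction and input,
# summed over a large, diverse mixture of such instruction-formatted tasks.
```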
Finetune - RLHF

https://arxiv.org/pdf/2203.02155.pdf
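A high-level sketch of the usual RLHF recipe (simplified, not taken from the slides): supervised finetuning on human demonstrations, a reward model trained on human preference comparisons, then RL (e.g. PPO) against that reward model with a penalty for drifting from the SFT model. The function name and coefficient below are illustrative:

```python
def shaped_reward(rm_score, logp_policy, logp_sft, beta=0.1):
    """Reward used in the RL step: the learned reward minus a KL-style penalty
    that keeps the policy close to the supervised-finetuned model."""
    return rm_score - beta * (logp_policy - logp_sft)
```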
Application - ChatGPT

https://yaofu.notion.site/How-does-GPT-Obtain-its-Ability-Tracing-Emergent-Abilities-of-Language-Models-to-their-Sources-b9a57ac0fcf74f30a1ab9e3e36fa1dc1
Finetune - Bootstrapping

https://arxiv.org/pdf/2203.14465.pdf
Finetune - Bootstrapping

https://arxiv.org/pdf/2210.11610.pdf
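Both cited papers follow the same loop: sample rationales, keep the ones that reach a correct (or self-consistent) answer, and finetune on them. A sketch, where `model.generate_cot` and `model.finetune` are hypothetical methods:

```python
def bootstrap(model, problems, n_rounds=3):
    """Rationale bootstrapping in the spirit of STaR / self-improvement."""
    for _ in range(n_rounds):
        kept = []
        for question, gold_answer in problems:
            rationale, answer = model.generate_cot(question)
            if answer == gold_answer:            # keep only rationales that worked
                kept.append((question, rationale, answer))
        model.finetune(kept)                     # train on self-generated rationales
    return model
```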
Large Language Model Risks
● LLMs make mistakes (falsehoods, hallucinations)
● LLMs can be misused (misinformation, spam)
● LLMs can cause harms (toxicity, biases, stereotypes)
● LLMs can be attacked (adversarial examples, poisoning, prompt injection)
● LLMs can be useful as defenses (content moderation, explanations)


Resources for further reading
● https://web.stanford.edu/class/cs224n/
● https://stanford-cs324.github.io/winter2022/
● https://stanford-cs324.github.io/winter2023/
● https://www.cs.princeton.edu/courses/archive/fall22/cos597G/
● https://rycolab.io/classes/llm-s23/
● https://yaofu.notion.site/How-does-GPT-Obtain-its-Ability-Tracing-Emergent-Abilities-of-Language-Models-to-their-Sources-b9a57ac0fcf74f30a1ab9e3e36fa1dc1
● https://www.jasonwei.net/blog/emergence
Emergent Capability - Decomposed Prompting

https://arxiv.org/pdf/2210.02406.pdf
Training Techniques - Parallelism

https://openai.com/research/techniques-for-training-large-neural-networks
Training Techniques - Parallelism

https://www.microsoft.com/en-us/research/blog/deepspeed-extreme-scale-model-training-for-everyone/
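The cited posts cover data, pipeline, and tensor parallelism. As a toy illustration of the tensor-parallel idea, the snippet below splits one weight matrix column-wise across two simulated "devices" using NumPy; real systems do this across GPUs with communication collectives:

```python
import numpy as np

x = np.random.randn(4, 8)            # a batch of activations
W = np.random.randn(8, 16)           # the full weight matrix
W0, W1 = np.split(W, 2, axis=1)      # each "device" holds half of the columns
y = np.concatenate([x @ W0, x @ W1], axis=1)   # partial matmuls, then gather
assert np.allclose(y, x @ W)         # identical to the unsplit computation
# Data parallelism instead replicates W and splits the batch; pipeline parallelism
# splits the model by layers and streams micro-batches through the stages.
```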
