
Large Language Models

The document discusses the architecture and functioning of Large Language Models (LLMs), particularly focusing on the Transformer backbone and self-attention mechanisms. It highlights the importance of human feedback in training LLMs and the challenges associated with integrating human input effectively. Additionally, it explores the concept of in-context learning and emergent abilities in larger models, emphasizing the need for continual training and system-level thinking in LLM development.

https://www.linkedin.com/in/thota-adinarayana/

Introduction to Large Language Models

• Latest LLMs adopt the Transformer backbone.

• Core component: self-attention mechanism


• Put tokens into their context!
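
To make "put tokens into their context" concrete, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It assumes identity query/key/value projections (real Transformers learn these), and the function names are illustrative only, not taken from the slides.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, causal=False):
    """Single-head self-attention over a list of token vectors X.

    Assumption: queries, keys and values are the raw token vectors
    (no learned projection matrices), to keep the sketch short.
    """
    d = len(X[0])
    out = []
    for i, q in enumerate(X):
        scores = []
        for j, k in enumerate(X):
            if causal and j > i:
                # A causal (GPT-style) mask hides future tokens.
                scores.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(d))
        weights = softmax(scores)
        # Each output vector is a weighted mix of all visible token vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, X)) for c in range(d)])
    return out
```

Each output row is a convex combination of the input rows, so every token's new representation is literally a blend of its context.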

Introduction to Large Language Models
• Level of linguistic knowledge

Not applicable
Not applicable
Great. LLMs are robust to typos, coinage, and cacography.

Great. Dependency parsing and coreference resolution are almost solved. Syntactic information is captured in attention (Clark et al., 2019).

Great. Most current NLP benchmarks focus on this part.

It depends! LLMs still get confused when they meet unique contexts or special users (e.g., those in underrepresented groups).
How do LLMs acquire the knowledge of language?
• Unsupervised pre-training on a very large corpus
• There are many pre-training methods; here we focus on the one used by the GPT family.
• Language modeling: predict the next word

The gif is copied from The Illustrated GPT-2.
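
As a toy illustration of the "predict the next word" objective, the sketch below estimates next-word probabilities from bigram counts. This is a deliberately tiny stand-in: GPT-style models use a Transformer over subword tokens, not counts, and the function names here are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Return P(next word | previous word) as a dict."""
    following = counts.get(prev)
    if not following:
        return {}
    total = sum(following.values())
    return {word: c / total for word, c in following.items()}


corpus = ["the cat sat", "the cat ran", "the dog sat"]
probs = next_word_probs(train_bigram(corpus), "the")
```

The training signal comes from the text itself (each word is the label for the words before it), which is why no human annotation is needed at this stage.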


How do LLMs acquire the knowledge of language?

These examples are copied from Stanford CS224N/Ling284 slides (author: John Hewitt).
They are actually examples of masked language modeling, which is a bit different from how GPT is pre-trained.
From GPT-3 to ChatGPT:
Learn human intents behind their language

Information behind this sentence:


People usually use an imperative sentence to make a request. The listener is expected to complete that request.

OpenAI “Aligning language models to follow instructions” 2022
Follow Instructions & Align with Human Preference

Ouyang et al. “Training language models to follow instructions with human feedback” NeurIPS 2022
Follow Instructions & Align with Human Preference

Human-in-the-loop! (Discuss more later)

Ouyang et al. “Training language models to follow instructions with human feedback” NeurIPS 2022
Introduction to Large Language Models
GPT-4?

Hugging Face “Large Language Models: A New Moore's Law?” 2021
Introduction to Large Language Models
• In-context learning
• No parameter update
• Wrap “training” samples in the prompt

The gif is copied from https://ai.stanford.edu/blog/understanding-incontext/
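
A minimal sketch of wrapping "training" samples in the prompt. The `Input:`/`Output:` template and the function name are assumptions chosen for illustration; the slides do not prescribe a format. Note that nothing here touches model parameters: the demonstrations only change the text the model conditions on.

```python
def build_few_shot_prompt(demos, query, instruction=""):
    """Format (input, label) demonstrations plus a new query as one prompt.

    The trailing "Output:" is left empty so the model completes it.
    """
    blocks = []
    if instruction:
        blocks.append(instruction)
    for text, label in demos:
        blocks.append(f"Input: {text}\nOutput: {label}")
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


demos = [("great movie", "positive"), ("boring plot", "negative")]
prompt = build_few_shot_prompt(demos, "I loved it")
```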
Introduction to Large Language Models
• Open question:
• Why does in-context learning work?
• There are some hypotheses but no conclusion yet
• Xie et al. “An Explanation of In-context Learning as Implicit Bayesian Inference” ICLR 2022
• Akyürek et al. “What learning algorithm is in-context learning? Investigations with linear models” ICLR 2023
• von Oswald et al. “Transformers learn in-context by gradient descent” arXiv 2022

Introduction to Large Language Models
• Emergent abilities
• An ability is emergent if it is not present in smaller models but is present in larger models.
• In-context learning ability is one of them.
• Scaling up the model can unlock new abilities.

Emergence in few-shot prompting


This gif is copied from Jason Wei’s slides.

Wei et al. “Emergent Abilities of Large Language Models” TMLR 2022


The following content is my own opinion and is very subjective!

View LLMs from a system perspective


• Analogy: operating system (OS)
• Knowing a set of algorithms is not enough to build a good OS.
• Knowing a training algorithm/recipe is not enough to build a good LLM.

• Model patching & continual training of LLM are important.


• We shouldn’t always build a new LLM from scratch.
• I think this may be one reason for OpenAI’s success: they build LLMs the way one builds a system (maintenance, version control, incremental updates).

Yao Fu “How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources”
Unix Family Tree

Ecosystem Graphs
Put LLMs into a Larger System
• Analogy: operating system (OS)
• How do we interact with OS?
• How do we interact with LLMs?

This part is now also considered a part of the OS in general.
• Make the system more accessible, especially
for non-computer experts.

Put LLMs into a Larger System

The user briefly describes his/her goal. AutoGPT breaks the goal into detailed steps and refines its own plan.

AutoGPT demo
Put LLMs into a Larger System
The LLM functions as a controller and can use tools on its own.

https://openai.com/blog/chatgpt-plugins
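
A rough sketch of the controller idea, under several assumptions: the model emits JSON actions, tools are plain Python functions, and `scripted_llm` is a hard-coded stand-in for a real model call. None of these names or formats come from ChatGPT plugins; they only illustrate the loop of "model picks a tool, system runs it, result is fed back".

```python
import json


def word_count(text):
    """Toy tool: count whitespace-separated words."""
    return str(len(text.split()))


TOOLS = {"word_count": word_count}


def scripted_llm(transcript):
    """Hypothetical stand-in for an LLM: request a tool once, then answer."""
    if "Observation:" not in transcript:
        return json.dumps({"tool": "word_count",
                           "input": "put tokens into their context"})
    observation = transcript.rsplit("Observation: ", 1)[1].strip()
    return json.dumps({"answer": f"The text has {observation} words."})


def run_controller(goal, llm, tools, max_steps=5):
    """Let the model choose tools until it returns a final answer."""
    transcript = f"Goal: {goal}\n"
    for _ in range(max_steps):
        action = json.loads(llm(transcript))
        if "answer" in action:
            return action["answer"]
        tool = tools.get(action.get("tool"))
        if tool is None:
            result = f"unknown tool {action.get('tool')!r}"
        else:
            result = tool(action.get("input", ""))
        # Feed the tool result back so the model can use it next step.
        transcript += f"Action: {json.dumps(action)}\nObservation: {result}\n"
    return None  # step budget exhausted without a final answer


answer = run_controller("Count the words", scripted_llm, TOOLS)
```

The step limit and the "unknown tool" branch are there because a real model's output cannot be trusted to follow the schema.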
LLM as a Controller

I’m inspired by https://oval.cs.stanford.edu/ to add this illustration.
LLM as a Controller: Challenges
• How to design the interaction interface between LLMs and other components (e.g., external databases, API schemas)?
• Desiderata: robustness, unambiguity, privacy protection, ease of building for non-AI developers

• How to maintain the state of the LLM?
• Naïve solution: cram all the previous context into the prompt.
• Problems:
The sequence length is limited (recall the attention mechanism).
Multiple individual calls to the LLM cause great overhead.
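
The naïve solution and its length problem can be sketched as follows. This assumes a whitespace word count as a crude proxy for tokens and a hypothetical `max_tokens` budget; a real system would use the model's own tokenizer and limit.

```python
def build_prompt(history, new_message, max_tokens,
                 count_tokens=lambda s: len(s.split())):
    """Cram as many of the most recent turns as fit into the prompt.

    Older turns are silently dropped once the budget is used up,
    which is exactly the state-loss problem described above.
    """
    budget = max_tokens - count_tokens(new_message)
    kept = []
    for turn in reversed(history):
        cost = count_tokens(turn)
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    return "\n".join(list(reversed(kept)) + [new_message])


history = [
    "user: hi there",
    "bot: hello how can I help",
    "user: tell me about attention",
]
short_prompt = build_prompt(history, "user: and transformers", max_tokens=10)
full_prompt = build_prompt(history, "user: and transformers", max_tokens=100)
```

With the small budget only the latest turn survives, so the model no longer "remembers" the greeting; with the large budget the whole history is resent on every call, which is the overhead problem.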

Bring Human into the Loop
• Returning to the OS analogy
• What’s special about LLMs?
• LLMs can learn from the human-model interaction and evolve.

Bring Human into the Loop
Core challenges:
• How can we let humans easily provide feedback?
• Exploiting cheap labor is unethical, and it is infeasible for collecting domain-specific feedback.
• I think research from the HCI side is important.
• How can we let the LLM take feedback?
• Current approach: RLHF
• What’s next? (distinct challenges exist)

Chen et al. “Perspectives on Incorporating Expert Feedback into Model Updates” arXiv 2022
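
For the RLHF approach mentioned above, one ingredient is a reward model trained on human comparisons. The sketch below shows the pairwise preference loss, -log sigmoid(r_chosen - r_rejected), as used for reward modeling in Ouyang et al.; the scalar rewards are assumed to come from some reward model that is not shown, and the function name is illustrative.

```python
import math


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Loss for one human comparison: -log(sigmoid(r_chosen - r_rejected)).

    Written as softplus(-diff) in a numerically stable form.
    """
    diff = reward_chosen - reward_rejected
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# The loss is small when the reward model agrees with the human ranking
# and large when it prefers the response the human rejected.
agrees = pairwise_preference_loss(2.0, 0.0)
disagrees = pairwise_preference_loss(0.0, 2.0)
```

Note that this loss treats every human comparison as correct, which is one reason the noisy-feedback challenge discussed next matters.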
Distinct Challenges in Learning from Human Feedback

• Human feedback is noisy. The model should decide whether to take the feedback rather than viewing it as the ground truth.
• Out-of-distribution (OOD) detection -> “out-of-confidence” detection
• In OOD detection, we design an algorithm that assigns a score to an instance to indicate how much it belongs to the training distribution, or in other words, how well the model should be able to predict its label.
• I think the LLM should also assign a confidence score to the input question.
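
One possible reading of this idea, sketched under strong assumptions: the confidence score is the maximum softmax probability (a common OOD-detection baseline), and feedback that contradicts a confident prediction is held back rather than accepted as ground truth. The threshold value and the accept/hold rule are hypothetical choices, not something the slides specify.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits):
    """Maximum softmax probability, used here as the confidence score."""
    return max(softmax(logits))


def should_accept_feedback(logits, labels, feedback_label, threshold=0.9):
    """Accept feedback if it agrees with the model or the model is unsure.

    Disagreeing feedback on a high-confidence input is held for review
    instead of being treated as the ground truth.
    """
    probs = softmax(logits)
    predicted = labels[probs.index(max(probs))]
    if feedback_label == predicted:
        return True
    return max(probs) < threshold


labels = ["A", "B", "C"]
held = should_accept_feedback([4.0, 0.0, 0.0], labels, "B")
taken = should_accept_feedback([1.0, 0.5, 0.0], labels, "B")
```

Softmax confidence is known to be poorly calibrated in many settings, so this should be read as a placeholder for whatever score the system actually trusts.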
