
Large Language Models

The document discusses the architecture and functioning of Large Language Models (LLMs), particularly focusing on the Transformer backbone and self-attention mechanisms. It highlights the importance of human feedback in training LLMs and the challenges associated with integrating human input effectively. Additionally, it explores the concept of in-context learning and emergent abilities in larger models, emphasizing the need for continual training and system-level thinking in LLM development.

Uploaded by

md.nayim howlader


https://www.linkedin.com/in/thota-adinarayana/

Introduction to Large Language Models

• Latest LLMs adopt the Transformer backbone.

• Core component: self-attention mechanism

• Put tokens into their context!
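A minimal sketch of how self-attention puts each token into its context. This is an illustration only, not the slides' own code: it uses a single head and skips the learned query/key/value projections that real Transformers apply (an assumption made for brevity). The names `softmax`, `self_attention`, and the toy embeddings are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Single-head scaled dot-product self-attention with identity projections.

    X is a list of token vectors. Real models first multiply X by learned
    W_Q, W_K, W_V matrices; those are omitted here (assumption).
    """
    d = len(X[0])
    out = []
    for q in X:
        # Similarity of this token (query) to every token (keys), scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # Contextualized vector: attention-weighted average of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d embeddings (made up)
contextual = self_attention(tokens)
```

Each output vector is a weighted mix of all input vectors, so every token's representation now depends on the other tokens in the sequence.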

Introduction to Large Language Models
• Level of linguistic knowledge

Not applicable.
Not applicable.
Great. LLMs are robust to typos, coinages, and cacography.
Great. Dependency parsing and coreference resolution are almost solved. Syntactic information is captured in attention (Clark et al., 2019).
Great. Most current NLP benchmarks focus on this part.
It depends! LLMs still get confused when they meet unique contexts or special users (e.g., those in underrepresented groups).
How do LLMs acquire the knowledge of language?
• Unsupervised pre-training on a very large corpus
• There are many pre-training methods; here we focus on the one used by the GPT family.
• Language modeling: predict the next word
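To make "predict the next word" concrete, here is a toy sketch that uses bigram counts instead of a neural network. It is a deliberately swapped-in, much simpler technique than GPT-style pre-training; the corpus and the function names `train_bigram` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent next word and its estimated probability."""
    followers = counts[word]
    if not followers:
        return None, 0.0
    nxt, c = followers.most_common(1)[0]
    return nxt, c / sum(followers.values())

corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigram(corpus)
word, prob = predict_next(model, "sat")
```

The training signal comes from the text itself (no labels), which is why this kind of objective scales to very large unlabeled corpora.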

The GIF is copied from "The Illustrated GPT-2".

How do LLMs acquire the knowledge of language?

These examples are copied from the Stanford CS224N/Ling284 slides (author: John Hewitt).
They are actually examples of masked language modeling, which is a bit different from how GPT is pre-trained.
From GPT-3 to ChatGPT:
Learn human intents behind their language

Information behind this sentence:

People usually use imperative sentences
to make a request. The listener is
expected to complete that request.

OpenAI, “Aligning language models to follow instructions” 2022
Follow Instructions & Align with Human Preference

Ouyang et al. “Training language models to follow instructions with human feedback” NeurIPS 2022
Follow Instructions & Align with Human Preference

Human-in-the-loop! (Discussed further later.)

Ouyang et al. “Training language models to follow instructions with human feedback” NeurIPS 2022
Introduction to Large Language Models
GPT-4?

Hugging Face, “Large Language Models: A New Moore's Law?” 2021
Introduction to Large Language Models
• In-context learning
• No parameter update
• Wrap “training” samples in the prompt
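A small sketch of what wrapping "training" samples in the prompt can look like. The `Input:`/`Output:` template and the function name `build_few_shot_prompt` are assumptions for illustration; any consistent format could be used, and no model parameters are touched.

```python
def build_few_shot_prompt(demonstrations, query, instruction=""):
    """Wrap labeled demonstrations and a new query into one prompt string."""
    lines = [instruction] if instruction else []
    for text, label in demonstrations:
        lines.append(f"Input: {text}\nOutput: {label}")
    # The final block leaves the output blank for the model to complete.
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

demos = [("I loved this movie.", "positive"), ("The plot was dull.", "negative")]
prompt = build_few_shot_prompt(demos, "A wonderful surprise.", "Classify the sentiment.")
```

The resulting string is sent to the LLM as-is; the "learning" happens only through the forward pass over this context.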

The GIF is copied from https://ai.stanford.edu/blog/understanding-incontext/
Introduction to Large Language Models
• Open question:
• Why does in-context learning work?
• There are some hypotheses but no conclusion yet:
• Xie et al. “An Explanation of In-context Learning as Implicit Bayesian Inference” ICLR 2022
• Akyürek et al. “What learning algorithm is in-context learning? Investigations with linear models” ICLR 2023
• von Oswald et al. “Transformers learn in-context by gradient descent” arXiv 2022

Introduction to Large Language Models
• Emergent abilities
• An ability is emergent if it is not present in smaller models but is present in larger models.
• In-context learning is one of them.
• Scaling up the model can unlock new abilities.

Emergence in few-shot prompting

This gif is copied from Jason Wei’s slides.

Wei et al. “Emergent Abilities of Large Language Models” TMLR 2022

The following content is my own opinion and very subjective!

View LLMs from a system perspective

• Analogy: operating system (OS)
• Knowing a set of algorithms is not enough to build a good OS.
• Knowing a training algorithm/recipe is not enough to build a good LLM.

• Model patching & continual training of LLMs are important.

• We shouldn’t always build a new LLM from scratch.
• I think this may be one reason for OpenAI’s success: they build LLMs the way one builds a system (maintenance, version control, incremental updates).

Yao Fu, “How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources”
Unix Family Tree

Ecosystem Graphs
Put LLMs into a Larger System
• Analogy: operating system (OS)
• How do we interact with OS?
• How do we interact with LLMs?


This part is now also considered part of the OS in general.
• Make the system more accessible, especially for non-computer experts.

Put LLMs into a Larger System

The user briefly describes their goal. AutoGPT breaks the goal into detailed steps and refines its own plan.

AutoGPT demo
Put LLMs into a Larger System
The LLM functions as a controller and can use tools on its own.

https://openai.com/blog/chatgpt-plugins
LLM as a Controller

I’m inspired by https://oval.cs.stanford.edu/ to add this illustration.
LLM as a Controller: Challenges
• How should we design the interaction interface between LLMs and other components (e.g., external databases, API schemas)?
• Desiderata: robustness, unambiguity, privacy protection, and ease of building for non-AI developers
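One possible shape for such an interface, sketched under assumptions: the LLM emits a JSON tool call, and a thin dispatcher validates it before running anything. The JSON schema (`tool`, `args`), the `TOOLS` registry, and `dispatch` are all hypothetical and not taken from any real plugin API.

```python
import json

# Hypothetical tool registry: name -> (callable, required argument names).
TOOLS = {
    "add": (lambda a, b: a + b, {"a", "b"}),
    "lookup": (lambda key: {"capital_of_france": "Paris"}.get(key), {"key"}),
}

def dispatch(llm_output):
    """Validate a JSON tool call emitted by the LLM, then run the tool.

    Rejecting malformed calls, unknown tools, and wrong arguments is one
    way to aim for robustness and unambiguity.
    """
    try:
        call = json.loads(llm_output)
    except json.JSONDecodeError:
        return {"ok": False, "error": "output is not valid JSON"}
    if not isinstance(call, dict):
        return {"ok": False, "error": "tool call must be a JSON object"}
    name, args = call.get("tool"), call.get("args", {})
    if not isinstance(name, str) or name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    func, required = TOOLS[name]
    if not isinstance(args, dict) or set(args) != required:
        return {"ok": False, "error": f"expected arguments {sorted(required)}"}
    return {"ok": True, "result": func(**args)}

result = dispatch('{"tool": "add", "args": {"a": 2, "b": 3}}')
```

Returning a structured error instead of raising lets the controller feed the message back to the LLM so it can retry.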

• How should we maintain the state of an LLM?

• Naïve solution: cram all the previous context into the prompt.
• Problems:
• The sequence length is limited (recall the attention mechanism).
• Multiple individual calls to the LLM cause great overhead.
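The naive solution and its length problem can be sketched as follows. This is an illustration under assumptions: `count_tokens` is a whitespace word count standing in for a real tokenizer, and `build_prompt` and the budget value are hypothetical.

```python
def build_prompt(history, new_message, max_tokens,
                 count_tokens=lambda s: len(s.split())):
    """Cram past turns into the prompt, dropping the oldest when over budget.

    count_tokens is a crude stand-in for a real tokenizer (assumption).
    """
    kept = []
    budget = max_tokens - count_tokens(new_message)
    # Walk backwards so the most recent turns are kept first.
    for turn in reversed(history):
        cost = count_tokens(turn)
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    return "\n".join(list(reversed(kept)) + [new_message])

history = ["User: hi", "Bot: hello there", "User: what is attention"]
prompt = build_prompt(history, "User: and self-attention?", max_tokens=8)
```

With this small budget, the two oldest turns are silently dropped, so the model no longer sees them. That loss of state is exactly the limitation the slide points out.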

Bring Human into the Loop
• Returning to the OS analogy
• What’s special about LLMs?
• LLMs can learn from human-model interaction and evolve.

Bring Human into the Loop
Core challenges:
• How can we let humans easily provide feedback?
• Exploiting cheap labor is unethical, and it is infeasible for collecting domain-specific feedback.
• I think research from the HCI side is important.
• How can we let the LLM take feedback?
• Current approach: RLHF
• What’s next? (distinct challenges exist)
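For readers new to RLHF: in the Ouyang et al. setup, a reward model is first trained on human comparisons between pairs of responses, using a pairwise loss of the form -log σ(r_chosen - r_rejected). A minimal sketch of that loss for a single comparison follows; the function name is hypothetical, and the reward values would come from a neural network in practice.

```python
import math

def pairwise_preference_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)) for one human comparison."""
    diff = reward_chosen - reward_rejected
    # Equivalent to log(1 + exp(-diff)), computed in a numerically stable way.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

# The loss is small when the human-preferred response already scores higher.
agree = pairwise_preference_loss(2.0, 0.0)
disagree = pairwise_preference_loss(0.0, 2.0)
```

Note that this loss treats every human comparison as correct, which connects to the noisy-feedback concern raised in these slides.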

Chen et al. “Perspectives on Incorporating Expert Feedback into Model Updates” arXiv 2022
Distinct Challenges in Learning from Human Feedback

• Human feedback is noisy. The model should decide whether to take the feedback rather than viewing it as the ground truth.
• Out-of-distribution (OOD) detection -> “out-of-confidence” detection
• In OOD detection, we design an algorithm that assigns a score to an instance to indicate how much it belongs to the training distribution, or in other words, how capable the model should be of predicting its label.
• I think the LLM should also assign a confidence score to the input question.
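The score-then-threshold pattern described above can be sketched with one simple, well-known baseline: the maximum softmax probability of a classifier. This is a swapped-in example of an OOD-style score, not the slides' proposal for LLMs; the threshold value and function names are hypothetical.

```python
import math

def max_softmax_confidence(logits):
    """Maximum softmax probability: higher means the model is more confident."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)

def should_answer(logits, threshold=0.7):
    """Accept the instance only if the confidence score clears the threshold."""
    return max_softmax_confidence(logits) >= threshold

confident = should_answer([5.0, 0.0, 0.0])   # one class clearly dominates
uncertain = should_answer([0.1, 0.0, 0.0])   # nearly uniform prediction
```

How to define an analogous score for an open-ended input question, or for a piece of human feedback, is the open problem the slide raises.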

