
UET

Since 2004

ĐẠI HỌC CÔNG NGHỆ, ĐHQGHN


VNU-University of Engineering and Technology

Natural Language Processing - INT3406E 20

Large Language Model

Nguyen Van Vinh - UET


Outline

● Introduction to LM
● Large Language Models and applications

UET-FIT 2
Language Modeling (Mô hình ngôn ngữ)?

● What is the probability of “Tôi trình bày ChatGPT tại Trường ĐH Công
Nghệ” ?
● What is the probability of “Công Nghệ học Đại trình bày ChatGPT tại Tôi” ?
● What is the most likely next word after “Tôi trình bày ChatGPT tại Trường ĐH Công
nghệ, địa điểm …”, i.e., P(… | Tôi trình bày ChatGPT tại Trường ĐH Công nghệ, địa điểm)?
● A model that computes either of these for a word sequence W = w1, w2, …, wn:
P(W) or P(wn | w1, w2, …, wn-1)
is called a language model
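As an illustration of this definition, here is a minimal sketch of a bigram language model estimated from a single toy sentence (the corpus, counts, and add-one smoothing are purely illustrative and not part of the lecture):

```python
from collections import Counter

# Toy corpus: the example sentence from this slide, lowercased; purely illustrative.
corpus = "tôi trình bày chatgpt tại trường đh công nghệ".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_next(word, prev):
    """P(word | prev) estimated from bigram counts, with add-one smoothing."""
    vocab = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)

def p_sentence(words):
    """Chain rule under the bigram assumption: P(w1..wn) ~ product of P(wi | wi-1)."""
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= p_next(word, prev)
    return prob

# The grammatical sentence gets a higher probability than the scrambled one.
print(p_sentence("tôi trình bày chatgpt tại trường đh công nghệ".split()))
print(p_sentence("công nghệ học đại trình bày chatgpt tại tôi".split()))
```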

3
Large Language Model

4
Large Language Model (Hundreds of Billions of
Tokens)

5
6
Large Language Models - yottaFlops of Compute

Source: https://web.stanford.edu/class/cs224n/slides/cs224n-2023-lecture11-prompting-rlhf.pdf 7
Why LLMs?

● Double Descent

8
Why LLMs?

● Scaling Law for Neural Language Models


○ Performance depends strongly on scale! We keep getting better performance as
we scale the model, data, and compute up!
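The scaling-law observation can be made concrete with the power-law form reported by Kaplan et al. (2020), roughly L(N) ≈ (N_c / N)^α for parameter count N. The sketch below uses constants of about the order given in that paper, but they should be treated as illustrative only:

```python
# Illustrative power-law scaling of loss with parameter count N (Kaplan et al., 2020 form).
# The constants are rough, illustrative values; see the paper for the actual fitted numbers.
def predicted_loss(n_params, n_c=8.8e13, alpha=0.076):
    return (n_c / n_params) ** alpha

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss ~ {predicted_loss(n):.2f}")
```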

9
Why LLMs?

● Generalization
○ We can now use one single model to solve many NLP tasks

10
Why LLMs? Emergence in few-shot prompting

● Emergent Abilities: some abilities of LMs are not present in smaller models
but are present in larger models
Emergent Capability - In-Context Learning
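Since the slides illustrate in-context learning with figures, here is a minimal textual example of a few-shot prompt; the reviews and labels are made up:

```python
# A minimal few-shot (in-context learning) prompt: the model is shown input->label
# examples inside the prompt and is expected to continue the pattern, with no weight updates.
prompt = """Review: The film was a delight. Sentiment: positive
Review: Two hours I will never get back. Sentiment: negative
Review: A moving and beautifully shot story. Sentiment:"""
# Sending `prompt` to a sufficiently large LM typically yields " positive".
print(prompt)
```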

12
Emergent Capability - In-Context Learning

13
What is pre-training / fine-tuning?

● “Pre-train” a model on a large dataset for task X, then “fine-tune” it on a
dataset for task Y
● Key idea: X is somewhat related to Y, so a model that can do X will have
some good neural representations for Y as well (transfer learning)
● ImageNet pre-training is huge in computer vision: learning generic visual
features for recognizing objects

Can we find some task X that can be useful for a wide range of
downstream tasks Y?
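A minimal sketch of the pre-train / fine-tune recipe using the Hugging Face transformers library, with BERT as the pre-trained model for task X and a tiny made-up sentiment dataset standing in for task Y (model name, hyperparameters, and data are illustrative, not from the lecture):

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Tiny made-up dataset standing in for downstream task Y.
data = Dataset.from_dict({
    "text": ["great movie", "terrible plot", "loved it", "boring"],
    "label": [1, 0, 1, 0],
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Pre-trained encoder (task X: masked LM) plus a freshly initialized classification head.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=32)

data = data.map(tokenize, batched=True)

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=2)
Trainer(model=model, args=args, train_dataset=data).train()
```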

14
Pretraining + Prompting Paradigm

15
Prompt Engineering (2020 → now)
● Prompts involve instructions and context passed to a language model to
achieve a desired task

Prompt engineering is the practice of developing and optimizing prompts to
efficiently use language models (LMs) for a variety of applications.
16
Prompt Engineering Techniques
● Many advanced prompting techniques have been designed to
improve performance on complex tasks (a brief sketch follows this list):
○ Few-shot prompts
○ Chain-of-thought (CoT) prompting
○ Self-Consistency
○ Knowledge Generation Prompting
○ ReAct
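A small sketch combining two of the techniques above, chain-of-thought prompting and self-consistency; the generate() function is a hypothetical stand-in for a real LLM API call, and the exemplar is made up:

```python
from collections import Counter

# Chain-of-thought prompt: the exemplar shows intermediate reasoning before the answer.
cot_prompt = """Q: A shop has 3 boxes with 4 pens each. How many pens?
A: There are 3 boxes of 4 pens, so 3 * 4 = 12. The answer is 12.
Q: Lan has 5 bags with 6 apples each. How many apples?
A:"""

def generate(prompt, temperature=0.7):
    """Hypothetical LLM call; replace with a real API client in practice."""
    return "5 bags of 6 apples is 5 * 6 = 30. The answer is 30."

# Self-consistency: sample several reasoning paths and majority-vote the final answer.
answers = [generate(cot_prompt).rsplit("The answer is", 1)[-1].strip(" .") for _ in range(5)]
print(Counter(answers).most_common(1)[0][0])   # -> "30"
```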

17
Temperature and Top-p Sampling in LLMs

● Temperature and Top-p sampling are two essential parameters that can be
tweaked to control the output of LLMs
● Temperature (0-2): This parameter determines the creativity and diversity of the text
generated by the LLM. A higher temperature value (e.g., 1.5) leads to more
diverse and creative text, while a lower value (e.g., 0.5) results in more focused and
deterministic text.
● Top-p Sampling (0-1): This parameter balances diversity against high-probability
words by sampling only from the smallest set of top tokens whose cumulative
probability mass is at least the threshold p (see the sketch below).
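A minimal NumPy sketch of how these two parameters interact when sampling the next token; the toy logits are made up, and in a real LLM they would come from the model's final layer:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_p=0.9, rng=np.random.default_rng()):
    """Sample one token id from logits with temperature scaling and nucleus (top-p) filtering."""
    scaled = logits / max(temperature, 1e-8)           # higher T flattens the distribution
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                    # most probable tokens first
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]  # smallest set with mass >= top_p
    kept = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept))

# Toy vocabulary of 5 tokens; purely illustrative logits.
print(sample_next_token(np.array([2.0, 1.5, 0.3, -1.0, -2.0]), temperature=0.7, top_p=0.9))
```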

18
Three major forms of pre-training (LLMs)

19
BERT: Bidirectional Encoder Representations from
Transformers

Source: (Devlin et al, 2019): BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 20
Masked Language Modeling (MLM)

● Q: Why can’t we do language modeling with bidirectional models?

● Solution: Mask out k% of the input words, and then predict the masked words
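A simplified sketch of the masking step; it ignores BERT's 80/10/10 replacement rule and simply puts a [MASK] token at every selected position:

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Randomly replace ~mask_rate of tokens with [MASK]; return masked input and targets."""
    random.seed(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok          # the model is trained to predict these positions
        else:
            masked.append(tok)
    return masked, targets

print(mask_tokens("the students opened their books and started to read".split()))
```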

21
Next Sentence Prediction (NSP)

22
BERT pre-training

23
RoBERTa
● BERT is still under-trained
● Removed the next sentence prediction pre-training — it adds more noise than
benefits!
● Trained longer with 10x data & bigger batch sizes
● Pre-trained on 1,024 V100 GPUs for one day in 2019

24
(Liu et al., 2019): RoBERTa: A Robustly Optimized BERT Pretraining Approach
Text-to-text models: the best of both worlds (Bard)?
● Encoder-only models (e.g., BERT) enjoy the benefits of bidirectionality but they can’t be
used to generate text
● Decoder-only models (e.g., GPT-3, Llama 2) can do generation but they are left-to-right
LMs
● Text-to-text models combine the best of both worlds!

T5 = Text-to-Text Transfer Transformer

(Raffel et al., 2020): Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer 25
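Following the T5 framing above, a minimal text-to-text sketch using the Hugging Face transformers implementation of T5, assuming the transformers and sentencepiece packages are available; the model size and example sentence are illustrative:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is cast as text in -> text out via a task prefix.
inputs = tok("translate English to German: The house is wonderful.", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```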
How to use these pre-trained models?

26
From GPT to GPT-2 to GPT-3

27
Quiz

● Context size?
● The larger the context size, the more difficult it is?

28
GPT-3: language models are few-shot learners

● GPT-2 → GPT-3: 1.5B → 175B (# of parameters), ~14B → 300B (# of tokens)

29
GPT-3’s in-context learning

30
[2020] GPT-3 to [2022] ChatGPT

What’s new?
● Training on code

● Supervised instruction tuning

● RLHF = Reinforcement learning from human feedback

Source: Fu, 2022, “How does GPT Obtain its Ability? Tracing Emergent Abilities of Language
Models to their Sources" 31
How was ChatGPT developed?
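The figures on this and the previous slide describe the InstructGPT-style recipe. The skeleton below only outlines the order of the three stages; every function is a hypothetical placeholder, not real training code:

```python
# Highly simplified outline of the pipeline: SFT -> reward model -> RLHF.
# All functions are hypothetical placeholders for illustration only.

def supervised_finetune(base_model, demonstrations):
    """Step 1: fine-tune the pretrained LM on human-written (prompt, response) pairs."""
    return base_model  # placeholder

def train_reward_model(sft_model, ranked_comparisons):
    """Step 2: train a reward model on human rankings of candidate responses."""
    return lambda prompt, response: 0.0  # placeholder reward function

def rlhf(sft_model, reward_model, prompts):
    """Step 3: optimize the SFT model against the reward model (e.g., with PPO)."""
    return sft_model  # placeholder

policy = rlhf(supervised_finetune("base-lm", []), train_reward_model(None, []), [])
```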

32
Evaluation of LLMs

33
Newest LLMs

● Claude 2.1 (Anthropic)
○ 200K Context Window
○ 2x Decrease in Hallucination Rates
● GPT-4 Turbo (OpenAI)
○ 128K Context Window

Vietnamese
● PhoGPT (VinAI)
● FPT.AI
● VNG (Zalo)
● …

34
ChatGPT application for reading comprehension (ChatPdf)

● Fine-tune the ChatGPT model with training data in a specific domain
● Use LLM improvement techniques based on Retrieval-Augmented Generation
(RAG); a toy sketch of the retrieval step follows below
● Use effective prompting to achieve the expected output
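A toy sketch of the retrieval step in RAG; embed() is a hypothetical stand-in for a real embedding model, and the chunks, question, and prompt template are made up:

```python
import numpy as np

def embed(text):
    """Hypothetical embedding; a real system would call an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

def retrieve(question, chunks, k=2):
    """Rank document chunks by cosine similarity to the question and keep the top k."""
    q = embed(question)
    scores = [(float(q @ embed(c)), c) for c in chunks]
    return [c for _, c in sorted(scores, reverse=True)[:k]]

chunks = ["Chapter 1 ...", "Chapter 2 ...", "Chapter 3 ..."]
question = "What does chapter 2 say about training data?"
context = "\n".join(retrieve(question, chunks))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` would then be sent to the LLM (e.g., ChatGPT) for a grounded answer.
print(prompt)
```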

35
Large Language Model Risks

● LLMs make mistakes
○ (falsehoods, hallucinations)
● LLMs can be misused
○ (misinformation, spam)
● LLMs can cause harms
○ (toxicity, biases, stereotypes)
● LLMs can be attacked
○ (adversarial examples, poisoning, prompt injection)
● LLMs are costly to train and deploy

36
Summary

● Introduction to LLM
● Large Language models (types)

UET-FIT 37
UET
Since 2004

ĐẠI HỌC CÔNG NGHỆ, ĐHQGHN


VNU-University of Engineering and Technology

Thank you
Email me
[email protected]
