Continual Alignment of Language Models
Trang Vu
https://bit.ly/ajcai24-cl4llm
About me
● K08 HCMUT
What is Continual Learning?
Ring, Mark B. "CHILD: A first step towards continual learning." Machine Learning 28.1 (1997): 77-104.
Continual Learning (CL)
Why Continual Learning?
Shaheen, Khadija, et al. "Continual learning for real-world autonomous systems: Algorithms, challenges and frameworks." Journal of Intelligent & Robotic Systems 105.1 (2022): 9.
Why Continual Learning?
Thrun, Sebastian, and Tom M. Mitchell. "Lifelong robot learning." Robotics and autonomous systems 15.1-2 (1995): 25-46.
Why Continual Learning?
Liu, Bing, and Sahisnu Mazumder. "Lifelong and continual learning dialogue systems: learning during conversation." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 17. 2021.
Continual Learning (CL)
Continual Learning (CL)
Continual Learning (CL)
Challenge: Catastrophic Forgetting
Challenge: Catastrophic Forgetting
French, Robert M. "Catastrophic forgetting in connectionist networks." Trends in cognitive sciences 3.4 (1999): 128-135.
Basic Strategies for Continual Learning
Peng, Bohao, et al. "Scalable Language Model with Generalized Continual Learning." ICLR 2024.
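To make one of the basic strategies concrete, here is a minimal sketch of rehearsal (experience replay): keep a small buffer of past-task examples and mix them into every new-task batch. The buffer capacity, reservoir-sampling choice, and 0.2 replay ratio are illustrative assumptions, not values from the slides.

```python
import random

class ReplayBuffer:
    """Uniform sample over the example stream via reservoir sampling."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            # Keep each seen example with probability capacity / seen.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

def mixed_batch(new_batch, buffer, replay_ratio=0.2):
    """Augment each new-task batch with replayed old-task examples."""
    k = int(len(new_batch) * replay_ratio)
    return new_batch + buffer.sample(k)
```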
Large Language Models (LLMs)
Continual Learning with LLMs
What do LLMs know about?
But LLMs Do Need Updates!
So… How to Update LLMs?
Wu et al. "Continual learning for large language models: A survey." arXiv preprint arXiv:2402.01364 (2024).
Complexity of CL for LLMs
Askell et al. 2021. A General Language Assistant as a Laboratory for Alignment. arXiv:2112.00861.
Alignments of Large Language Models
Askell et al. 2021. A General Language Assistant as a Laboratory for Alignment. arXiv:2112.00861.
Alignments of Large Language Models
Askell et al. 2021. A General Language Assistant as a Laboratory for Alignment. arXiv:2112.00861.
Reinforcement Learning with Human Feedback
Lambert & Ustalov. Reinforcement Learning with Human Feedback Tutorial. ICML 2023.
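As a concrete anchor for the RLHF pipeline, below is a sketch of the standard KL-shaped reward used in the PPO stage: the reward-model score minus a penalty for drifting from the reference (SFT) policy. Tensor names and the beta value are illustrative.

```python
import torch

def shaped_rewards(reward_model_score, logprobs, ref_logprobs, beta=0.1):
    """Standard KL-shaped RLHF reward:
    r_total = r_RM - beta * (log pi(a|s) - log pi_ref(a|s)).
    The per-token KL penalty keeps the policy close to the SFT reference;
    the reward-model score is added on the final token of the response."""
    kl_penalty = -beta * (logprobs - ref_logprobs)   # (batch, seq_len)
    rewards = kl_penalty.clone()
    rewards[:, -1] += reward_model_score             # scalar RM score at end
    return rewards
```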
Alignment Tax
● Alignment-forgetting trade-off:
○ Aligning LLMs with RLHF can lead to forgetting pretrained abilities
Noukhovitch et al. 2023. Language Model Alignment with Elastic Reset. In NeurIPS 2023.
Lin et al. 2024. Mitigating the Alignment Tax of RLHF. In EMNLP 2024.
RLHF is a trade-off
Noukhovitch et al. 2023. Language Model Alignment with Elastic Reset. In NeurIPS 2023.
Elastic Reset
Noukhovitch et al. 2023. Language Model Alignment with Elastic Reset. In NeurIPS 2023.
Elastic Reset
Noukhovitch et al. 2023. Language Model Alignment with Elastic Reset. In NeurIPS 2023.
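A minimal sketch of the Elastic Reset idea from Noukhovitch et al.: maintain an exponential moving average (EMA) of the policy during RLHF, periodically reset the policy to the EMA, and reset the EMA to the initial model. The decay and reset interval here are illustrative, not the paper's tuned values.

```python
import copy
import torch

@torch.no_grad()
def ema_update(ema_model, model, decay=0.995):
    # Online EMA of the policy weights.
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1 - decay)

@torch.no_grad()
def elastic_reset(model, ema_model, init_model):
    model.load_state_dict(ema_model.state_dict())       # policy <- EMA
    ema_model.load_state_dict(init_model.state_dict())  # EMA <- init (e.g. SFT)

# Training-loop skeleton (ppo_update and reset_every are placeholders):
# init_model = copy.deepcopy(model); ema_model = copy.deepcopy(model)
# for step, batch in enumerate(rlhf_batches):
#     ppo_update(model, batch)
#     ema_update(ema_model, model)
#     if (step + 1) % reset_every == 0:
#         elastic_reset(model, ema_model, init_model)
```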
Heterogeneous Model Averaging (HMA)
[1] Lin et al. 2024. Mitigating the Alignment Tax of RLHF. In EMNLP 2024.
Heterogeneous Model Averaging (HMA)
[1] Lin et al. 2024. Mitigating the Alignment Tax of RLHF. In EMNLP 2024.
Heterogeneous Model Averaging (HMA)
[1] Lin et al. 2024. Mitigating the Alignment Tax of RLHF. In EMNLP 2024.
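A sketch of the HMA idea: rather than averaging the SFT and RLHF models with one global ratio, interpolate with a different ratio per layer group. The grouping scheme and the example ratios below are illustrative assumptions, not the paper's tuned values.

```python
import torch

@torch.no_grad()
def heterogeneous_average(sft_model, rlhf_model, layer_ratios):
    """layer_ratios maps a layer-name prefix to alpha in [0, 1];
    merged = alpha * rlhf + (1 - alpha) * sft, per parameter."""
    sft_state = sft_model.state_dict()
    merged = {}
    for name, rlhf_param in rlhf_model.state_dict().items():
        # Pick the ratio for this parameter's layer group (0.5 if unmatched).
        alpha = next((a for prefix, a in layer_ratios.items()
                      if name.startswith(prefix)), 0.5)
        merged[name] = alpha * rlhf_param + (1 - alpha) * sft_state[name]
    return merged

# Illustrative usage: keep early layers closer to SFT, later layers closer
# to the RLHF model (prefixes depend on the architecture):
# ratios = {"transformer.h.0": 0.2, "transformer.h.12": 0.5, "transformer.h.23": 0.8}
# model.load_state_dict(heterogeneous_average(sft_model, rlhf_model, ratios))
```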
Model Averaging vs Experience Replay
[1] Lin et al. 2024. Mitigating the Alignment Tax of RLHF. In EMNLP 2024.
Recap: Multiple-stage Training of LLMs
Qi, Xiangyu, et al. "Fine-tuning aligned language models compromises safety, even when users do not intend to!" ICLR 2024.
Mitigating Alignment Tax
Ouyang, Long, et al. "Training language models to follow instructions with human feedback." NeurIPS 2022.
Qi, Xiangyu, et al. "Fine-tuning aligned language models compromises safety, even when users do not intend to!" ICLR 2024.
Recap: Multiple-stage Training of LLMs
Qiu et al. "ProgressGym: Alignment with a Millennium of Moral Progress." NeurIPS 2024.
2 Scenarios of Continual Alignment
2 Scenarios of Continual Alignment
Persona Prompting
Hu and Collier. "Quantifying the Persona Effect in LLM Simulations." ACL 2024.
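A minimal sketch of persona prompting: steer an aligned model toward a target persona at inference time by prepending a profile description, with no weight updates. The persona fields and chat-message format are illustrative.

```python
def build_persona_prompt(persona: dict, question: str) -> list:
    """Prepend a persona profile as a system message before the user query."""
    persona_text = "; ".join(f"{k}: {v}" for k, v in persona.items())
    return [
        {"role": "system",
         "content": f"Answer as a person with this profile: {persona_text}."},
        {"role": "user", "content": question},
    ]

# Illustrative usage:
messages = build_persona_prompt(
    {"age": "34", "occupation": "nurse", "region": "rural Australia"},
    "How should hospitals allocate limited ICU beds?",
)
```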
Overgeneralization
Stephan et al. "RLVF: Learning from Verbal Feedback without Overgeneralization." ICML 2024.
Control Overgeneralization
Stephan et al. "RLVF: Learning from Verbal Feedback without Overgeneralization." ICML 2024.
Control Overgeneralization
Stephan et al. "RLVF: Learning from Verbal Feedback without Overgeneralization." ICML 2024.
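A sketch of the RLVF recipe at the loss level: apply a preference loss only on prompts where the verbal feedback is in scope, plus a KL constraint to the original model on out-of-scope prompts so behavior there is preserved. The DPO-style in-scope term and the loss weights are illustrative assumptions about the general shape of the method, not its exact objective.

```python
import torch
import torch.nn.functional as F

def rlvf_style_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l,
                    policy_logits_out, ref_logits_out,
                    beta=0.1, lam=1.0):
    # In-scope: DPO-style preference loss pushing the policy toward the
    # feedback-adherent completion (w) over the non-adherent one (l).
    margin = beta * ((policy_logp_w - ref_logp_w)
                     - (policy_logp_l - ref_logp_l))
    in_scope_loss = -F.logsigmoid(margin).mean()

    # Out-of-scope: KL to the reference model keeps behavior unchanged
    # where the feedback should not apply.
    out_scope_loss = F.kl_div(
        F.log_softmax(policy_logits_out, dim=-1),
        F.log_softmax(ref_logits_out, dim=-1),
        log_target=True, reduction="batchmean",
    )
    return in_scope_loss + lam * out_scope_loss
```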
Continual RLHF Training
Zhang et al. "CPPO: Continual Learning for Reinforcement Learning with Human Feedback." ICLR 2024.
Continual Proximal Policy Optimization (CPPO)
Zhang et al. "CPPO: Continual Learning for Reinforcement Learning with Human Feedback." ICLR 2024.
Continual Proximal Policy Optimization (CPPO)
Zhang et al. "CPPO: Continual Learning for Reinforcement Learning with Human Feedback." ICLR 2024.
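A sketch of the CPPO idea: weight each sample's contribution so that some samples drive policy learning (the clipped PPO term) while others mainly preserve old behavior (a KL-to-old-policy penalty). The weighting heuristic below is illustrative; the paper derives its own weighting strategies.

```python
import torch

def cppo_style_loss(ratio, advantage, kl_to_old, clip_eps=0.2):
    """ratio = pi_new / pi_old per sample; kl_to_old penalizes drift from
    the previous policy to retain earlier alignment."""
    # Standard clipped PPO surrogate per sample.
    ppo_term = torch.min(
        ratio * advantage,
        torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage,
    )
    # Illustrative weights: above-average-advantage samples emphasize
    # learning; the rest emphasize knowledge retention.
    learn_w = (advantage > advantage.mean()).float()
    retain_w = 1.0 - learn_w
    return (-(learn_w * ppo_term) + retain_w * kl_to_old).mean()
```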
Continual Proximal Policy Optimization (CPPO)
Qiu et al. "ProgressGym: Alignment with a Millennium of Moral Progress." NeurIPS 2024.
Summary
1. Catastrophic forgetting of previously learned knowledge (alignment tax)
2. Overgeneralization to the new preferences
3. Continual alignment is still underexplored due to a lack of data
Q&A
https://bit.ly/ajcai24-cl4llm