
Continual Alignment

The document discusses continual learning (CL) for large language models (LLMs), highlighting the challenges of catastrophic forgetting and the need for continual alignment with evolving human values. It presents strategies for mitigating alignment tax and overgeneralization, emphasizing the importance of integrating new preferences while preserving previously learned knowledge. The document also addresses the complexities of updating LLMs and the necessity for ongoing research in this area due to data limitations.


Continual Learning for Large Language Models
Trang Vu
https://bit.ly/ajcai24-cl4llm
About me

● Lecturer at Monash University

● K08 HCMUT

● Research areas: Efficiency, Multi-domains, Multilinguality, Machine Translation

2
What is Continual Learning?

3
Ring, Mark B. "CHILD: A first step towards continual learning." Machine Learning 28.1 (1997): 77-104.
Continual Learning (CL)

Image courtesy: https://mila.quebec/en/article/la-maml-look-ahead-meta-learning-for-continual-learning/

4
Why Continual Learning?

Shaheen, Khadija, et al. "Continual learning for real-world autonomous systems: Algorithms, challenges and frameworks." Journal of Intelligent & Robotic Systems 105.1 (2022): 9.
5
Why Continual Learning?

6
Thrun, Sebastian, and Tom M. Mitchell. "Lifelong robot learning." Robotics and autonomous systems 15.1-2 (1995): 25-46.
Why Continual Learning?

Liu, Bing, and Sahisnu Mazumder. "Lifelong and continual learning dialogue systems: learning during conversation." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 17. 2021.
7
Continual Learning (CL)

● Domain incremental learning


○ All tasks in the task sequence differ in the input distribution but
share the same label set
○ Example: a sequence of sentiment analysis tasks on product reviews: books -> computers -> …
○ Shared label classes: {positive, negative}

Example domains: Books → Kitchen Appliances → Computers → Smartphones

8
Continual Learning (CL)

● Class incremental learning


○ New classes are added to the incoming task

● Model suffers from catastrophic forgetting


○ A sudden drop in performance on previously learned tasks while learning the current task

9
Continual Learning (CL)

● Task incremental learning


○ A relaxation of class-incremental learning
○ Each task is assigned a unique id, which is attached to its data samples so that the corresponding task-specific parameters can be activated
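As a concrete illustration, here is a minimal sketch (not from the tutorial) of this task-incremental setting: a shared encoder with one output head per task, where the task id attached to each sample activates the corresponding task-specific head. The architecture, sizes, and toy data below are illustrative assumptions.

import torch
import torch.nn as nn

class TaskIncrementalModel(nn.Module):
    def __init__(self, input_dim=32, hidden_dim=64, classes_per_task=(2, 3)):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        # One output head per task; the task id selects which head is active.
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, c) for c in classes_per_task])

    def forward(self, x, task_id):
        return self.heads[task_id](self.encoder(x))

model = TaskIncrementalModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# A task sequence: each task arrives with its own toy data and a unique id.
task_stream = [
    (0, torch.randn(128, 32), torch.randint(0, 2, (128,))),
    (1, torch.randn(128, 32), torch.randint(0, 3, (128,))),
]
for task_id, x, y in task_stream:
    for _ in range(5):  # a few passes over the current task only
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x, task_id), y)
        loss.backward()
        opt.step()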

10
Challenge: Catastrophic Forgetting

11
Challenge: Catastrophic Forgetting

12
French, Robert M. "Catastrophic forgetting in connectionist networks." Trends in cognitive sciences 3.4 (1999): 128-135.
Basic Strategies for Continual Learning

13
Peng, Bohao, et al. "Scalable Language Model with Generalized Continual Learning." ICLR 2024.
Large Language Models (LLMs)

14
Continual Learning with LLMs

15
What do LLMs know about?

16
But LLMs do Need Update!

17
So… How to Update LLMs?

Wu et al. "Continual learning for large language models: A survey." arXiv preprint arXiv:2402.01364 (2024). 18
Complexity of CL for LLMs

1 Alignment → 2 Fine-tune the aligned model → 3 Continual alignment
19
Alignments of Large Language Models

● Alignment is the method of steering the generative


process to satisfy a specified property, reward or affinity
metric.

Askell et al. 2021. A General Language Assistant as a Laboratory for Alignment. Arxiv 2112.00861 20
Reinforcement Learning with Human Feedback

Lambert & Ustalov. Reinforcement Learning with Human Feedback Tutorial. ICML 2023. 23
Alignment Tax

● Alignment-forgetting trade-off:
○ Aligning LLMs with RLHF can lead to forgetting pretrained abilities

● Also referred to as reward hacking, language drift in the


literature

Noukhovitch et al. 2023. Language Model Alignment with Elastic Reset. In NeurIPS 2023. 24
Lin et al. 2024 Mitigating the Alignment Tax of RLHF. In EMNLP 2024
RLHF is a trade-off

Noukhovitch et al. 2023. Language Model Alignment with Elastic Reset. In NeurIPS 2023.
25
Elastic Reset

● Periodically reset the online model to an exponential moving average (EMA) of itself
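A minimal sketch of this idea, assuming `policy` is a PyTorch module and `rlhf_step` stands in for one RLHF/PPO update; the decay rate and reset period are illustrative, and the paper's full recipe (including how the EMA itself is reset) is not reproduced here.

import copy
import torch

def elastic_reset_training(policy, rlhf_step, num_steps=10_000,
                           ema_decay=0.995, reset_every=2_000):
    ema = copy.deepcopy(policy)  # slow-moving EMA copy of the online policy
    for step in range(1, num_steps + 1):
        rlhf_step(policy)  # one RLHF/PPO update of the online policy (assumed helper)
        with torch.no_grad():
            # Move the EMA weights a small step toward the current online weights.
            for p_ema, p in zip(ema.parameters(), policy.parameters()):
                p_ema.mul_(ema_decay).add_(p, alpha=1 - ema_decay)
        if step % reset_every == 0:
            # Periodically reset the online policy back to its EMA,
            # limiting drift away from the pretrained behaviour.
            policy.load_state_dict(ema.state_dict())
    return policy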

Noukhovitch et al. 2023. Language Model Alignment with Elastic Reset. In NeurIPS 2023.
26
Elastic Reset

Pareto front of the IMDB sentiment task with GPT-2

Noukhovitch et al. 2023. Language Model Alignment with Elastic Reset. In NeurIPS 2023.
27
Heterogeneous Model Averaging (HMA)

● Interpolating between pre and post RLHF model weights
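A minimal sketch of this interpolation, assuming both checkpoints share the same architecture; HMA's distinguishing idea is that different layers can receive different mixing ratios, which the hypothetical prefix-based `ratios` mapping below only approximates.

import torch

@torch.no_grad()
def interpolate_models(pre_rlhf, post_rlhf, ratios, default_ratio=0.5):
    """Return a state dict mixing pre- and post-RLHF weights per parameter."""
    post_state = post_rlhf.state_dict()
    merged = {}
    for name, w_pre in pre_rlhf.state_dict().items():
        w_post = post_state[name]
        if not w_pre.is_floating_point():
            merged[name] = w_post  # leave integer buffers untouched
            continue
        # Per-layer ("heterogeneous") ratio chosen by parameter-name prefix,
        # falling back to a uniform default.
        r = next((v for k, v in ratios.items() if name.startswith(k)), default_ratio)
        merged[name] = (1 - r) * w_pre + r * w_post
    return merged  # load with model.load_state_dict(merged)

Loading the merged state dict gives an interpolated policy whose position on the alignment-forgetting trade-off is controlled by the chosen ratios.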

[1] Lin et al. 2024 Mitigating the Alignment Tax of RLHF. In EMNLP 2024
28
Heterogeneous Model Averaging (HMA)

[1] Lin et al. 2024 Mitigating the Alignment Tax of RLHF. In EMNLP 2024
29
Heterogeneous Model Averaging (HMA)

● Interpolating between pre- and post-RLHF model weights achieves the strongest alignment-forgetting Pareto front

[1] Lin et al. 2024 Mitigating the Alignment Tax of RLHF. In EMNLP 2024
30
Model Averaging vs Experience Replay

● Model averaging outperforms Experience Replay on 2 out of 3 datasets
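For reference, a minimal sketch of the Experience Replay baseline being compared against: keep a bounded buffer of examples from earlier alignment stages and mix a fraction of them into each batch for the new stage. Buffer size, replacement rule, and mixing ratio are illustrative assumptions.

import random

class ReplayBuffer:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.items = []

    def add(self, example):
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Replace a random stored example once the buffer is full.
            self.items[random.randrange(self.capacity)] = example

    def sample(self, k):
        return random.sample(self.items, min(k, len(self.items)))

def mixed_batch(new_examples, buffer, replay_ratio=0.25):
    # Append replayed old-stage examples to every new-stage batch.
    n_replay = int(len(new_examples) * replay_ratio)
    return list(new_examples) + buffer.sample(n_replay)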

[1] Lin et al. 2024 Mitigating the Alignment Tax of RLHF. In EMNLP 2024
31
Recap: Multiple-stage Training of LLMs

1 Alignment → 2 Fine-tune the aligned model → 3 Continual alignment
32
Fine-tuning Aligned LLMs Compromises Safety

Fine-tuning GPT-3.5 Turbo leads to safety degradation, with harmfulness scores increasing across 11 categories after fine-tuning

Qi, Xiangyu, et al. "Fine-tuning aligned language models compromises safety, even when users do not intend to!." ICLR 2024 33
Mitigating Alignment Tax

● Incorporating pretraining data into RLHF finetuning to


minimize performance regression on standard NLP
datasets (Ouyang et al. 2022)
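A minimal sketch of that mitigation, in the spirit of InstructGPT's pretraining-mix ("PPO-ptx") term: add a language-modeling loss on pretraining data to the RLHF objective. The callables and coefficient below are assumed placeholders rather than the paper's implementation.

def mixed_rlhf_loss(policy, rlhf_batch, pretrain_batch,
                    rl_loss_fn, lm_loss_fn, ptx_coef=0.1):
    rl_loss = rl_loss_fn(policy, rlhf_batch)       # e.g. a clipped PPO loss (assumed helper)
    ptx_loss = lm_loss_fn(policy, pretrain_batch)  # next-token loss on pretraining text (assumed helper)
    # ptx_coef trades off the alignment reward against regression
    # on the original pretraining distribution.
    return rl_loss + ptx_coef * ptx_loss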

Fine-tuning GPT-3.5 Turbo while mixing in different numbers of safety samples

Ouyang, Long, et al. "Training language models to follow instructions with human feedback." NeurIPS 2022
Qi, Xiangyu, et al. "Fine-tuning aligned language models compromises safety, even when users do not intend to!." ICLR 2024 34
Recap: Multiple-stage Training of LLMs

1 Alignment → 2 Fine-tune the aligned model → 3 Continual alignment
35
Diverse Nature of Human Preference
● High level ethical principles
○ “Universal Declaration of Human Rights”

● Culturally specific values


○ Enlightenment values in the West
○ Confucian values in East Asia
○ Hindu or Islamic values

● Laws and regulations


○ GDPR in EU

● Social etiquette and best practices in various


human societies and professional settings
● Domain-specific human preferences
○ “Empathy” for health assistants
○ “Helpful” for customer service agents

Sorensen et al. 2024. A Roadmap to Pluralistic Alignment. ICML 2024
36
Human Values and Preferences Evolve

● Societal values, social norms and ethical guidelines evolve over time
● Preference diversity across different demographic groups
● Individuals’ preferences change over time

Qiu et al. “ProgressGym: Alignment with a Millennium of Moral Progress”. NeurIPS 2024
37
2 Scenarios of Continual Alignment

● Updating values or preferences

○ Update LLMs to reflect shifts in societal values
○ Unlearn outdated customs
○ Incorporate new values
○ Similar to model editing and machine unlearning

● Integrating new values

○ Add new demographic groups or value types
○ Preserve previously learned values
○ Similar to a standard continual learning problem

39
Persona Prompting

Hu and Collier. "Quantifying the Persona Effect in LLM Simulations." ACL 2024
40
Overgeneralization

● Prompting-based approaches are efficient, but tend to overgeneralize, i.e., forget the preferences on unrelated targets

41
Stephan et al. "RLVF: Learning from Verbal Feedback without Overgeneralization." ICML 2024
Control Overgeneralization

● Fine-tuning with DPO on the in-scope data
● Supervised context distillation (SCD) on the out-of-scope and near-scope prompts
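A minimal sketch of how the two losses could be combined, assuming precomputed DPO log-ratios for in-scope preference pairs and token ids sampled from the frozen original model for near-scope / out-of-scope prompts; it illustrates the recipe above rather than the paper's exact objective or weighting.

import torch
import torch.nn.functional as F

def dpo_loss(policy_logratios, ref_logratios, beta=0.1):
    # Standard DPO loss, where each log-ratio is
    # log p(chosen) - log p(rejected) under the policy / frozen reference.
    return -F.logsigmoid(beta * (policy_logratios - ref_logratios)).mean()

def scd_loss(policy_logits, frozen_model_token_ids):
    # Supervised context distillation: cross-entropy toward completions
    # generated by the frozen original model, anchoring behaviour on
    # prompts the feedback should not change.
    return F.cross_entropy(policy_logits.view(-1, policy_logits.size(-1)),
                           frozen_model_token_ids.view(-1))

def combined_loss(in_scope_terms, out_of_scope_terms, scd_weight=1.0):
    return dpo_loss(*in_scope_terms) + scd_weight * scd_loss(*out_of_scope_terms)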

42
Stephan et al. "RLVF: Learning from Verbal Feedback without Overgeneralization." ICML 2024
Control Overgeneralization

43
Stephan et al. "RLVF: Learning from Verbal Feedback without Overgeneralization." ICML 2024
Continual RLHF Training

● A desired policy should always generate high-reward


results with high probabilities
● Categorize the rollout samples into five types according to
their rewards and generation probabilities

44
Zhang et al. ”CPPO: Continual Learning for Reinforcement Learning with Human Feedback” ICLR 2024
Continual Proximal Policy Optimization (CPPO)

● Each rollout type has a weighting strategy for policy


learning (α(x)) and knowledge retention (β(x))

Objective: clipped policy-learning term + knowledge-retention penalty term
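A minimal sketch of a per-sample weighted objective in this spirit: α(x) scales the clipped PPO policy-learning term and β(x) scales a retention penalty that keeps the new policy close to the previous one. The exact penalty form and the per-type weighting rules are given in the paper and only approximated here.

import torch

def cppo_style_loss(logp_new, logp_old, logp_prev_policy, advantages,
                    alpha, beta, clip_eps=0.2):
    # Clipped PPO policy-learning term.
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    ppo_term = torch.min(ratio * advantages, clipped * advantages)
    # Knowledge-retention penalty: a KL-like term toward the previous policy.
    retention = logp_prev_policy.exp() * (logp_prev_policy - logp_new)
    # alpha and beta are per-sample weights chosen from the rollout's type
    # (combinations of high/low reward and high/low generation probability).
    return -(alpha * ppo_term).mean() + (beta * retention).mean()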
45
Zhang et al. ”CPPO: Continual Learning for Reinforcement Learning with Human Feedback” ICLR 2024
Continual Proximal Policy Optimization (CPPO)

● Each rollout type has a weighting strategy for policy


learning (α(x)) and knowledge retention (β(x))

46
Zhang et al. ”CPPO: Continual Learning for Reinforcement Learning with Human Feedback” ICLR 2024
Continual Proximal Policy Optimization (CPPO)

● CPPO exhibits better training stability

Training process of Task-2. The PPO algorithm is unstable at 7k steps and is


unable to continuously increase the reward score

47
Zhang et al. ”CPPO: Continual Learning for Reinforcement Learning with Human Feedback” ICLR 2024
Continual Proximal Policy Optimization (CPPO)

● CPPO exhibits better training stability

Training process of Task-2. The PPO algorithm is unstable at 7k steps and is


unable to continuously increase the reward score

Toy setting with 2 summarization tasks: how does it perform in the Helpful, Honest, Harmless framework for alignment?
48
Zhang et al. ”CPPO: Continual Learning for Reinforcement Learning with Human Feedback” ICLR 2024
Lack of Continual Alignment Data

● Collection of preference data is expensive

Qiu et al. “ProgressGym: Alignment with a Millennium of Moral Progress”. NeurIPS 2024
49
Summary

1. Catastrophic forgetting of previously learned knowledge (alignment tax)
2. Overgeneralization to the new preferences
3. Continual alignment is still underexplored due to a lack of data
50
Q&A
https://bit.ly/ajcai24-cl4llm
