Continual Alignment of Language Models
Trang Vu
https://bit.ly/ajcai24-cl4llm
About me
● K08 HCMUT
What is Continual Learning?
Ring, Mark B. "CHILD: A first step towards continual learning." Machine Learning 28.1 (1997): 77-104.
Continual Learning (CL)
Why Continual Learning?
Shaheen, Khadija, et al. "Continual learning for real-world autonomous systems: Algorithms, challenges and frameworks." Journal of Intelligent & Robotic Systems 105.1 (2022): 9.
Why Continual Learning?
Thrun, Sebastian, and Tom M. Mitchell. "Lifelong robot learning." Robotics and autonomous systems 15.1-2 (1995): 25-46.
Why Continual Learning?
Liu, Bing, and Sahisnu Mazumder. "Lifelong and continual learning dialogue systems: learning during conversation." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 17. 2021.
Continual Learning (CL)
Continual Learning (CL)
Continual Learning (CL)
Challenge: Catastrophic Forgetting
Challenge: Catastrophic Forgetting
French, Robert M. "Catastrophic forgetting in connectionist networks." Trends in cognitive sciences 3.4 (1999): 128-135.
Basic Strategies for Continual Learning
Peng, Bohao, et al. "Scalable Language Model with Generalized Continual Learning." ICLR 2024.
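To make one of the basic strategies concrete, here is a minimal sketch of rehearsal (experience replay): keep a small buffer of past-task examples and mix them into every new-task batch. The buffer capacity, reservoir-sampling choice, and 0.2 replay ratio are illustrative assumptions, not values from the slides.

```python
import random

class ReplayBuffer:
    """Uniform sample over the example stream via reservoir sampling."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            # Keep each seen example with probability capacity / seen.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

def mixed_batch(new_batch, buffer, replay_ratio=0.2):
    """Augment each new-task batch with replayed old-task examples."""
    k = int(len(new_batch) * replay_ratio)
    return new_batch + buffer.sample(k)
```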
Large Language Models (LLMs)
Continual Learning with LLMs
What do LLMs know about?
But LLMs Do Need Updates!
So… How to Update LLMs?
Wu et al. "Continual learning for large language models: A survey." arXiv preprint arXiv:2402.01364 (2024).
Complexity of CL for LLMs
Askell et al. 2021. A General Language Assistant as a Laboratory for Alignment. arXiv:2112.00861.
Alignments of Large Language Models
Askell et al. 2021. A General Language Assistant as a Laboratory for Alignment. arXiv:2112.00861.
Alignments of Large Language Models
Askell et al. 2021. A General Language Assistant as a Laboratory for Alignment. arXiv:2112.00861.
Reinforcement Learning with Human Feedback
Lambert & Ustalov. Reinforcement Learning with Human Feedback Tutorial. ICML 2023.
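As a concrete anchor for the RLHF pipeline, below is a sketch of the standard KL-shaped reward used in the PPO stage: the reward-model score minus a penalty for drifting from the reference (SFT) policy. Tensor names and the beta value are illustrative.

```python
import torch

def shaped_rewards(reward_model_score, logprobs, ref_logprobs, beta=0.1):
    """Standard KL-shaped RLHF reward:
    r_total = r_RM - beta * (log pi(a|s) - log pi_ref(a|s)).
    The per-token KL penalty keeps the policy close to the SFT reference;
    the reward-model score is added on the final token of the response."""
    kl_penalty = -beta * (logprobs - ref_logprobs)   # (batch, seq_len)
    rewards = kl_penalty.clone()
    rewards[:, -1] += reward_model_score             # scalar RM score at end
    return rewards
```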
Alignment Tax
● Alignment-forgetting trade-off:
○ Aligning LLMs with RLHF can lead to forgetting pretrained abilities
Noukhovitch et al. 2023. Language Model Alignment with Elastic Reset. In NeurIPS 2023.
Lin et al. 2024. Mitigating the Alignment Tax of RLHF. In EMNLP 2024.
RLHF is a trade-off
Noukhovitch et al. 2023. Language Model Alignment with Elastic Reset. In NeurIPS 2023.
Elastic Reset
Noukhovitch et al. 2023. Language Model Alignment with Elastic Reset. In NeurIPS 2023.
Elastic Reset
Noukhovitch et al. 2023. Language Model Alignment with Elastic Reset. In NeurIPS 2023.
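A minimal sketch of the Elastic Reset idea from Noukhovitch et al.: maintain an exponential moving average (EMA) of the policy during RLHF, periodically reset the policy to the EMA, and reset the EMA to the initial model. The decay and reset interval here are illustrative, not the paper's tuned values.

```python
import copy
import torch

@torch.no_grad()
def ema_update(ema_model, model, decay=0.995):
    # Online EMA of the policy weights.
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1 - decay)

@torch.no_grad()
def elastic_reset(model, ema_model, init_model):
    model.load_state_dict(ema_model.state_dict())       # policy <- EMA
    ema_model.load_state_dict(init_model.state_dict())  # EMA <- init (e.g. SFT)

# Training-loop skeleton (ppo_update and reset_every are placeholders):
# init_model = copy.deepcopy(model); ema_model = copy.deepcopy(model)
# for step, batch in enumerate(rlhf_batches):
#     ppo_update(model, batch)
#     ema_update(ema_model, model)
#     if (step + 1) % reset_every == 0:
#         elastic_reset(model, ema_model, init_model)
```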
Heterogeneous Model Averaging (HMA)
[1] Lin et al. 2024. Mitigating the Alignment Tax of RLHF. In EMNLP 2024.
Heterogeneous Model Averaging (HMA)
[1] Lin et al. 2024. Mitigating the Alignment Tax of RLHF. In EMNLP 2024.
Heterogeneous Model Averaging (HMA)
[1] Lin et al. 2024. Mitigating the Alignment Tax of RLHF. In EMNLP 2024.
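A sketch of the HMA idea: rather than averaging the SFT and RLHF models with one global ratio, interpolate with a different ratio per layer group. The grouping scheme and the example ratios below are illustrative assumptions, not the paper's tuned values.

```python
import torch

@torch.no_grad()
def heterogeneous_average(sft_model, rlhf_model, layer_ratios):
    """layer_ratios maps a layer-name prefix to alpha in [0, 1];
    merged = alpha * rlhf + (1 - alpha) * sft, per parameter."""
    sft_state = sft_model.state_dict()
    merged = {}
    for name, rlhf_param in rlhf_model.state_dict().items():
        # Pick the ratio for this parameter's layer group (0.5 if unmatched).
        alpha = next((a for prefix, a in layer_ratios.items()
                      if name.startswith(prefix)), 0.5)
        merged[name] = alpha * rlhf_param + (1 - alpha) * sft_state[name]
    return merged

# Illustrative usage: keep early layers closer to SFT, later layers closer
# to the RLHF model (prefixes depend on the architecture):
# ratios = {"transformer.h.0": 0.2, "transformer.h.12": 0.5, "transformer.h.23": 0.8}
# model.load_state_dict(heterogeneous_average(sft_model, rlhf_model, ratios))
```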
Model Averaging vs Experience Replay
[1] Lin et al. 2024. Mitigating the Alignment Tax of RLHF. In EMNLP 2024.
Recap: Multiple-stage Training of LLMs
Qi, Xiangyu, et al. "Fine-tuning aligned language models compromises safety, even when users do not intend to!" ICLR 2024.
Mitigating Alignment Tax
Ouyang, Long, et al. "Training language models to follow instructions with human feedback." NeurIPS 2022.
Qi, Xiangyu, et al. "Fine-tuning aligned language models compromises safety, even when users do not intend to!" ICLR 2024.
Recap: Multiple-stage Training of LLMs
Qiu et al. "ProgressGym: Alignment with a Millennium of Moral Progress." NeurIPS 2024.
2 Scenarios of Continual Alignment
2 Scenarios of Continual Alignment
Persona Prompting
Hu and Collier. "Quantifying the Persona Effect in LLM Simulations." ACL 2024.
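A minimal sketch of persona prompting: steer an aligned model toward a target persona at inference time by prepending a profile description, with no weight updates. The persona fields and chat-message format are illustrative.

```python
def build_persona_prompt(persona: dict, question: str) -> list:
    """Prepend a persona profile as a system message before the user query."""
    persona_text = "; ".join(f"{k}: {v}" for k, v in persona.items())
    return [
        {"role": "system",
         "content": f"Answer as a person with this profile: {persona_text}."},
        {"role": "user", "content": question},
    ]

# Illustrative usage:
messages = build_persona_prompt(
    {"age": "34", "occupation": "nurse", "region": "rural Australia"},
    "How should hospitals allocate limited ICU beds?",
)
```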
Overgeneralization
Stephan et al. "RLVF: Learning from Verbal Feedback without Overgeneralization." ICML 2024.
Control Overgeneralization
Stephan et al. "RLVF: Learning from Verbal Feedback without Overgeneralization." ICML 2024.
Control Overgeneralization
Stephan et al. "RLVF: Learning from Verbal Feedback without Overgeneralization." ICML 2024.
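A sketch of the RLVF recipe at the loss level: apply a preference loss only on prompts where the verbal feedback is in scope, plus a KL constraint to the original model on out-of-scope prompts so behavior there is preserved. The DPO-style in-scope term and the loss weights are illustrative assumptions about the general shape of the method, not its exact objective.

```python
import torch
import torch.nn.functional as F

def rlvf_style_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l,
                    policy_logits_out, ref_logits_out,
                    beta=0.1, lam=1.0):
    # In-scope: DPO-style preference loss pushing the policy toward the
    # feedback-adherent completion (w) over the non-adherent one (l).
    margin = beta * ((policy_logp_w - ref_logp_w)
                     - (policy_logp_l - ref_logp_l))
    in_scope_loss = -F.logsigmoid(margin).mean()

    # Out-of-scope: KL to the reference model keeps behavior unchanged
    # where the feedback should not apply.
    out_scope_loss = F.kl_div(
        F.log_softmax(policy_logits_out, dim=-1),
        F.log_softmax(ref_logits_out, dim=-1),
        log_target=True, reduction="batchmean",
    )
    return in_scope_loss + lam * out_scope_loss
```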
Continual RLHF Training
Zhang et al. "CPPO: Continual Learning for Reinforcement Learning with Human Feedback." ICLR 2024.
Continual Proximal Policy Optimization (CPPO)
Zhang et al. "CPPO: Continual Learning for Reinforcement Learning with Human Feedback." ICLR 2024.
Continual Proximal Policy Optimization (CPPO)
Zhang et al. "CPPO: Continual Learning for Reinforcement Learning with Human Feedback." ICLR 2024.
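A sketch of the CPPO idea: weight each sample's contribution so that some samples drive policy learning (the clipped PPO term) while others mainly preserve old behavior (a KL-to-old-policy penalty). The weighting heuristic below is illustrative; the paper derives its own weighting strategies.

```python
import torch

def cppo_style_loss(ratio, advantage, kl_to_old, clip_eps=0.2):
    """ratio = pi_new / pi_old per sample; kl_to_old penalizes drift from
    the previous policy to retain earlier alignment."""
    # Standard clipped PPO surrogate per sample.
    ppo_term = torch.min(
        ratio * advantage,
        torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage,
    )
    # Illustrative weights: above-average-advantage samples emphasize
    # learning; the rest emphasize knowledge retention.
    learn_w = (advantage > advantage.mean()).float()
    retain_w = 1.0 - learn_w
    return (-(learn_w * ppo_term) + retain_w * kl_to_old).mean()
```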
Continual Proximal Policy Optimization (CPPO)
Qiu et al. "ProgressGym: Alignment with a Millennium of Moral Progress." NeurIPS 2024.
Summary
1. Catastrophic forgetting of previously learned knowledge (alignment tax)
2. Overgeneralization to the new preferences
3. Continual alignment is still underexplored due to a lack of data
Q&A
https://bit.ly/ajcai24-cl4llm