CLIP model is an Efficient Online Lifelong Learner

Wang, Leyuan; Xiang, Liuyu; Wei, Yujie; Wang, Yunlong; He, Zhaofeng

arXiv:2405.15155 (cs)

[Submitted on 24 May 2024]

Abstract: Online Lifelong Learning (OLL) addresses the challenge of learning from continuous and non-stationary data streams. Existing online lifelong learning methods based on image classification models often require preset conditions, such as the total number of classes or the maximum memory capacity, which hinders truly never-ending learning and makes them impractical for real-world scenarios. In this work, we propose that vision-language models, such as Contrastive Language-Image Pretraining (CLIP), are more suitable candidates for online lifelong learning. We discover that maintaining symmetry between image and text is crucial during Parameter-Efficient Tuning (PET) of the CLIP model in online lifelong learning. To this end, we introduce the Symmetric Image-Text (SIT) tuning strategy. We conduct extensive experiments on multiple lifelong learning benchmark datasets and elucidate the effectiveness of SIT through gradient analysis. Additionally, we assess the impact of lifelong learning on the generalizability of CLIP and find that tuning the image encoder is beneficial for lifelong learning, while tuning the text encoder aids zero-shot learning.
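The claim that CLIP avoids a preset number of classes rests on how CLIP classifies: an image embedding is compared by cosine similarity with text embeddings of class-name prompts, so the "classifier head" is simply the list of class names seen so far. The following is a minimal sketch of that mechanism only, under loudly stated assumptions: `fake_text_encoder` is a hypothetical random stand-in for CLIP's text encoder, `TextHeadClassifier`, the prompt template, and the temperature are illustrative choices, and none of this is the authors' SIT tuning strategy, whose details the abstract does not give.

```python
import math
import random

DIM = 16  # assumed embedding size for this toy example


def normalize(v):
    # Scale a vector to unit L2 norm so dot products become cosine similarities.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def fake_text_encoder(prompt):
    # Hypothetical stand-in for CLIP's text encoder: a deterministic
    # pseudo-random vector seeded by the prompt string.
    rng = random.Random(prompt)
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


class TextHeadClassifier:
    """Hypothetical open-vocabulary classifier: one text embedding per class.

    The class set is not fixed in advance; a class is registered the first
    time its name appears in the stream.
    """

    def __init__(self, encode_text, temperature=0.01):
        self.encode_text = encode_text  # assumed text-encoder callable
        self.temperature = temperature  # assumed value, similar in scale to CLIP's logit scale
        self.class_names = []
        self.text_feats = []

    def add_class(self, name):
        # Encode a prompt for a new class name; known names are ignored.
        if name not in self.class_names:
            self.class_names.append(name)
            self.text_feats.append(normalize(self.encode_text(f"a photo of a {name}")))

    def logits(self, image_feat):
        # Cosine similarity between the image feature and every known class, scaled.
        img = normalize(image_feat)
        return [sum(a * b for a, b in zip(img, t)) / self.temperature for t in self.text_feats]

    def predict(self, image_feat):
        # Return the class name with the highest similarity score.
        scores = self.logits(image_feat)
        return self.class_names[max(range(len(scores)), key=scores.__getitem__)]


if __name__ == "__main__":
    clf = TextHeadClassifier(fake_text_encoder)
    rng = random.Random(0)
    for label in ["cat", "dog", "cat", "truck"]:  # labels arrive one by one
        clf.add_class(label)  # no preset total number of classes
        # Simulated image feature: the class's text feature plus small noise.
        img = [x + rng.gauss(0.0, 0.05) for x in fake_text_encoder(f"a photo of a {label}")]
        print(label, "->", clf.predict(img), "| known classes:", len(clf.class_names))
```

Because the head is only a growing list of text embeddings, adding a class means encoding one more prompt rather than resizing an output layer sized to a preset class count.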

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2405.15155 [cs.CV]
	(or arXiv:2405.15155v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.15155

Submission history

From: Leyuan Wang
[v1] Fri, 24 May 2024 02:21:49 UTC (321 KB)