Unified Vision and Language Prompt Learning

Zang, Yuhang; Li, Wei; Zhou, Kaiyang; Huang, Chen; Loy, Chen Change

Computer Science > Computer Vision and Pattern Recognition

arXiv:2210.07225 (cs)

[Submitted on 13 Oct 2022]

Title: Unified Vision and Language Prompt Learning

Authors: Yuhang Zang, Wei Li, Kaiyang Zhou, Chen Huang, Chen Change Loy


Abstract: Prompt tuning, a parameter- and data-efficient transfer learning paradigm that tunes only a small number of parameters in a model's input space, has become popular in the vision community since the emergence of large vision-language models such as CLIP. We present a systematic study of two representative prompt tuning methods, namely text prompt tuning and visual prompt tuning. A major finding is that none of the unimodal prompt tuning methods performs consistently well: text prompt tuning fails on data with high intra-class visual variances, while visual prompt tuning cannot handle low inter-class variances. To combine the best of both worlds, we propose a simple approach called Unified Prompt Tuning (UPT), which learns a tiny neural network to jointly optimize prompts across different modalities. Extensive experiments on over 11 vision datasets show that UPT achieves a better trade-off than its unimodal counterparts on both few-shot learning and domain generalization benchmarks. Code and models will be released to facilitate future research.
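The abstract describes UPT only as a tiny neural network that jointly produces prompts for both modalities while the backbone stays frozen. The sketch below illustrates that idea under our own assumptions, not the authors' code: a set of shared learnable tokens, a single self-attention layer with a residual connection standing in for the "tiny network", and two linear projections (the hypothetical `to_text` and `to_vis`) that map the refined tokens to the widths of the text and image encoders. All names and dimensions are illustrative, the frozen CLIP inputs are replaced by zero placeholders, and no training loop is shown; the paper's actual architecture may differ.

```python
import math
import random

Matrix = list[list[float]]


def rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.02) -> Matrix:
    """Small Gaussian initialisation for learnable weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (m x k) @ (k x n) -> (m x n)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        peak = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


class UnifiedPrompt:
    """Hypothetical sketch: n shared prompt tokens refined by one self-attention
    layer, then projected into a text-side and an image-side prompt."""

    def __init__(self, n_tokens: int, d: int, d_text: int, d_vis: int, seed: int = 0):
        rng = random.Random(seed)
        self.d = d
        self.u = rand_matrix(n_tokens, d, rng)      # shared learnable prompt tokens
        self.w_q = rand_matrix(d, d, rng)           # attention projections
        self.w_k = rand_matrix(d, d, rng)
        self.w_v = rand_matrix(d, d, rng)
        self.to_text = rand_matrix(d, d_text, rng)  # assumed projection to text width
        self.to_vis = rand_matrix(d, d_vis, rng)    # assumed projection to image width

    def trainable_params(self) -> int:
        # Only these matrices would be trained; the CLIP encoders stay frozen.
        mats = [self.u, self.w_q, self.w_k, self.w_v, self.to_text, self.to_vis]
        return sum(len(m) * len(m[0]) for m in mats)

    def forward(self) -> tuple[Matrix, Matrix]:
        q, k, v = (matmul(self.u, w) for w in (self.w_q, self.w_k, self.w_v))
        k_t = [list(col) for col in zip(*k)]
        scores = [[s / math.sqrt(self.d) for s in row] for row in matmul(q, k_t)]
        attended = matmul(softmax_rows(scores), v)
        # Residual connection: tokens exchange information but keep their identity.
        refined = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(self.u, attended)]
        return matmul(refined, self.to_text), matmul(refined, self.to_vis)


def prepend(prompts: Matrix, tokens: Matrix) -> Matrix:
    """Prompt tuning only changes the input sequence of a frozen encoder."""
    return prompts + tokens


upt = UnifiedPrompt(n_tokens=4, d=8, d_text=6, d_vis=10)
text_prompts, vis_prompts = upt.forward()
text_in = prepend(text_prompts, [[0.0] * 6 for _ in range(5)])    # stand-in word embeddings
vis_in = prepend(vis_prompts, [[0.0] * 10 for _ in range(9)])     # stand-in patch embeddings
print(len(text_in), len(text_in[0]))  # 9 6
print(len(vis_in), len(vis_in[0]))    # 13 10
print(upt.trainable_params())         # 352
```

Because both prompt sets are produced from the same shared tokens, a gradient from either encoder would update the common parameters, which is one plausible way to read "jointly optimize prompts across different modalities".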

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2210.07225 [cs.CV]
	(or arXiv:2210.07225v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2210.07225

Submission history

From: Yuhang Zang
[v1] Thu, 13 Oct 2022 17:50:24 UTC (5,027 KB)