Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?

Zhang, Yichi; Pan, Jiayi; Zhou, Yuchen; Pan, Rui; Chai, Joyce

arXiv:2311.00047 (cs)

[Submitted on 31 Oct 2023]

Abstract: Vision-Language Models (VLMs) are trained on vast amounts of data captured by humans, and thus emulate our understanding of the world. However, human perception of reality is not always faithful to the physical world, a phenomenon known as visual illusions. This raises a key question: do VLMs experience the same kinds of illusions as humans, or do they learn to represent reality faithfully? To investigate this question, we build a dataset containing five types of visual illusions and formulate four tasks to examine visual illusions in state-of-the-art VLMs. Our findings show that although overall alignment with human perception is low, larger models are closer to human perception and more susceptible to visual illusions. Our dataset and initial findings will promote a better understanding of visual illusions in humans and machines and provide a stepping stone toward future computational models that better align humans and machines in perceiving and communicating about the shared visual world. The code and data are available at this https URL.

Comments:	Accepted at EMNLP 2023 main conference
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2311.00047 [cs.AI]
	(or arXiv:2311.00047v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2311.00047

Submission history

From: Yichi Zhang
[v1] Tue, 31 Oct 2023 18:01:11 UTC (4,554 KB)