Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events

Chinchure, Aditya; Ravi, Sahithya; Ng, Raymond; Shwartz, Vered; Li, Boyang; Sigal, Leonid

Computer Science > Computer Vision and Pattern Recognition

arXiv:2412.05725 (cs)

[Submitted on 7 Dec 2024 (v1), last revised 7 Apr 2025 (this version, v2)]

Title:Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events

Authors:Aditya Chinchure, Sahithya Ravi, Raymond Ng, Vered Shwartz, Boyang Li, Leonid Sigal

View PDF

Abstract:The commonsense reasoning capabilities of vision-language models (VLMs), especially in abductive reasoning and defeasible reasoning, remain poorly understood. Most benchmarks focus on typical visual scenarios, making it difficult to discern whether model performance stems from keen perception and reasoning skills, or reliance on pure statistical recall. We argue that by focusing on atypical events in videos, clearer insights can be gained on the core capabilities of VLMs. Explaining and understanding such out-of-distribution events requires models to extend beyond basic pattern recognition and regurgitation of their prior knowledge. To this end, we introduce BlackSwanSuite, a benchmark for evaluating VLMs' ability to reason about unexpected events through abductive and defeasible tasks. Our tasks artificially limit the amount of visual information provided to models while questioning them about hidden unexpected events, or provide new visual information that could change an existing hypothesis about the event. We curate a comprehensive benchmark suite comprising over 3,800 MCQ, 4,900 generative and 6,700 yes/no questions, spanning 1,655 videos. After extensively evaluating various state-of-the-art VLMs, including GPT-4o and Gemini 1.5 Pro, as well as open-source VLMs such as LLaVA-Video, we find significant performance gaps of up to 32% from humans on these tasks. Our findings reveal key limitations in current VLMs, emphasizing the need for enhanced model architectures and training strategies. Our data and leaderboard is available at this http URL.

Comments:	CVPR 2025. For data, visit this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2412.05725 [cs.CV]
	(or arXiv:2412.05725v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.05725

Submission history

From: Aditya Chinchure [view email]
[v1] Sat, 7 Dec 2024 19:19:03 UTC (29,786 KB)
[v2] Mon, 7 Apr 2025 20:26:05 UTC (32,243 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators