R-Bench: Are your Large Multimodal Model Robust to Real-world Corruptions?

Li, Chunyi; Zhang, Jianbo; Zhang, Zicheng; Wu, Haoning; Tian, Yuan; Sun, Wei; Lu, Guo; Liu, Xiaohong; Min, Xiongkuo; Lin, Weisi; Zhai, Guangtao

arXiv:2410.05474 (cs)

[Submitted on 7 Oct 2024]

Abstract: The outstanding performance of Large Multimodal Models (LMMs) has led to their wide use in vision-related tasks. However, real-world corruptions mean that images are rarely as ideal as in simulations, which presents significant challenges for the practical application of LMMs. To address this issue, we introduce R-Bench, a benchmark focused on the **Real-world Robustness of LMMs**. Specifically, we: (a) model the complete link from user capture to LMM reception, comprising 33 corruption dimensions, organized into 7 steps according to the corruption sequence and 7 groups based on low-level attributes; (b) collect a reference/distorted image dataset before/after corruption, including 2,970 question-answer pairs with human labeling; (c) propose a comprehensive evaluation of absolute/relative robustness and benchmark 20 mainstream LMMs. Results show that while LMMs can correctly handle the original reference images, their performance is not stable when faced with distorted images, and there is a significant gap in robustness compared to the human visual system. We hope that R-Bench will inspire improvements in the robustness of LMMs, **extending them from experimental simulations to real-world applications**. Check this https URL for details.
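
The abstract names absolute and relative robustness without defining them. As a rough illustration only, the sketch below assumes that absolute robustness is accuracy on the distorted images and that relative robustness is the share of questions whose answer is unchanged between the reference and distorted image. Both definitions, the `Item` record, and the function names are assumptions made for this example; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One question-answer pair (hypothetical record layout)."""
    answer: str          # human-labeled ground truth
    pred_reference: str  # model answer on the reference image
    pred_distorted: str  # model answer on the distorted image


def absolute_robustness(items):
    """Assumed definition: accuracy on the distorted images."""
    if not items:
        raise ValueError("items must not be empty")
    return sum(it.pred_distorted == it.answer for it in items) / len(items)


def relative_robustness(items):
    """Assumed definition: fraction of answers unchanged by the corruption."""
    if not items:
        raise ValueError("items must not be empty")
    return sum(it.pred_distorted == it.pred_reference for it in items) / len(items)


# Toy data: four multiple-choice questions answered by one model.
items = [
    Item(answer="A", pred_reference="A", pred_distorted="A"),
    Item(answer="B", pred_reference="B", pred_distorted="C"),
    Item(answer="C", pred_reference="C", pred_distorted="C"),
    Item(answer="D", pred_reference="A", pred_distorted="A"),
]

print(absolute_robustness(items))  # 0.5
print(relative_robustness(items))  # 0.75
```

The last toy item shows why the two numbers can diverge under these assumptions: a model that is wrong on both images is still consistent, so it counts toward the relative score but not the absolute one.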

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Image and Video Processing (eess.IV)
Cite as:	arXiv:2410.05474 [cs.CV]
	(or arXiv:2410.05474v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2410.05474

Submission history

From: Chunyi Li
[v1] Mon, 7 Oct 2024 20:12:08 UTC (43,009 KB)
