ROME: Testing Image Captioning Systems via Recursive Object Melting

Yu, Boxi; Zhong, Zhiqing; Li, Jiaqi; Yang, Yixing; He, Shilin; He, Pinjia

doi:10.1145/3597926.3598094

Abstract: Image captioning (IC) systems aim to generate a text description of the salient objects in an image. In recent years, IC systems have been increasingly integrated into our daily lives, for example as assistance for visually impaired people and for description generation in Microsoft PowerPoint. However, even cutting-edge IC systems (e.g., Microsoft Azure Cognitive Services) and algorithms (e.g., OFA) can produce erroneous captions, leading to incorrect captioning of important objects, misunderstanding, and threats to personal safety. Existing testing approaches either fail to handle the complex form of IC system output (i.e., sentences in natural language) or generate unnatural images as test cases. To address these problems, we introduce Recursive Object MElting (Rome), a novel metamorphic testing approach for validating IC systems. Unlike existing approaches that generate test cases by inserting objects, which easily makes the generated images unnatural, Rome melts (i.e., removes and inpaints) objects. Rome assumes that the object set in the caption of an image includes the object set in the caption of the image generated from it by object melting. Given an image, Rome can recursively remove its objects to generate different pairs of images. We use Rome to test one widely adopted image captioning API and four state-of-the-art (SOTA) algorithms. The results show that the test cases generated by Rome look much more natural than those of the SOTA IC testing approach and achieve naturalness comparable to that of the original images. Meanwhile, by generating test pairs from 226 seed images, Rome reports a total of 9,121 erroneous issues with high precision (86.47%-92.17%). In addition, we use the test cases generated by Rome to retrain Oscar, which improves its performance across multiple evaluation metrics.
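The metamorphic relation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the object vocabulary `VOCAB`, the word-matching extraction in `caption_objects`, and the helper names `violates_relation` and `melting_pairs` are all assumptions made for illustration, and no real captioning or inpainting model is called. The sketch only shows the set-inclusion check between the captions of an image and its melted descendant, and how recursively removing one object at a time yields many image pairs from one seed.

```python
# Hypothetical object vocabulary (an assumption; a real pipeline would
# rely on an object detector or a noun extractor instead).
VOCAB = {"dog", "cat", "frisbee", "person", "bench"}


def caption_objects(caption, vocab=VOCAB):
    """Return the set of known objects mentioned in a caption."""
    words = {w.strip(".,!?").lower() for w in caption.split()}
    # Crude singularization so that "dogs" also matches "dog".
    words |= {w[:-1] for w in words if w.endswith("s")}
    return words & vocab


def violates_relation(ancestor_caption, descendant_caption):
    """Objects named for the melted image but not for its ancestor.

    The relation expects the ancestor's object set to include the
    descendant's, so a non-empty result flags a suspicious pair.
    """
    return caption_objects(descendant_caption) - caption_objects(ancestor_caption)


def melting_pairs(objects):
    """Enumerate (ancestor, descendant) object-set pairs.

    Each descendant is obtained by removing one object from its
    ancestor; the removal is applied recursively.
    """
    pairs = set()

    def recurse(current):
        for obj in current:
            child = current - {obj}
            if (current, child) not in pairs:
                pairs.add((current, child))
                recurse(child)

    recurse(frozenset(objects))
    return pairs


# A consistent pair: the frisbee was melted and is no longer mentioned.
print(violates_relation("a dog and a frisbee on grass", "a dog on grass"))
# A suspicious pair: "cat" appears only after melting.
print(violates_relation("a dog on grass", "a cat on grass"))
```

In this sketch a seed image with n removable objects yields n * 2^(n-1) direct ancestor/descendant pairs, which suggests how a small number of seed images can produce a large number of test pairs; how the paper actually selects pairs may differ.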

Comments:	Accepted by ISSTA 2023
Subjects:	Software Engineering (cs.SE)
ACM classes:	K.6.3; I.4.9
Cite as:	arXiv:2306.02228 [cs.SE]
	(or arXiv:2306.02228v2 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2306.02228
Related DOI:	https://doi.org/10.1145/3597926.3598094
