C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition

Li, Rongchang; Feng, Zhenhua; Xu, Tianyang; Li, Linze; Wu, Xiao-Jun; Awais, Muhammad; Atito, Sara; Kittler, Josef

Computer Science > Computer Vision and Pattern Recognition

arXiv:2407.06113 (cs)

[Submitted on 8 Jul 2024 (v1), last revised 19 Jul 2024 (this version, v2)]

Abstract: Compositional actions consist of dynamic (verbs) and static (objects) concepts. Humans can easily recognize unseen compositions using the learned concepts. For machines, solving such a problem requires a model to recognize unseen actions composed of previously observed verbs and objects, thus requiring so-called compositional generalization ability. To facilitate this research, we propose a novel Zero-Shot Compositional Action Recognition (ZS-CAR) task. For evaluating the task, we construct a new benchmark, Something-composition (Sth-com), based on the widely used Something-Something V2 dataset. We also propose a novel Component-to-Composition (C2C) learning method to solve the new ZS-CAR task. C2C includes an independent component learning module and a composition inference module. Last, we devise an enhanced training strategy to address the challenges of component variations between seen and unseen compositions and to handle the subtle balance between learning seen and unseen actions. The experimental results demonstrate that the proposed framework significantly surpasses the existing compositional generalization methods and sets a new state-of-the-art. The new Sth-com benchmark and code are available at this https URL.

Comments:	Accepted by ECCV2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.06113 [cs.CV]
	(or arXiv:2407.06113v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.06113

Submission history

From: Rongchang Li
[v1] Mon, 8 Jul 2024 16:49:01 UTC (1,630 KB)
[v2] Fri, 19 Jul 2024 04:20:32 UTC (1,616 KB)
