Semantically Guided Representation Learning For Action Anticipation

Diko, Anxhelo; Avola, Danilo; Prenkaj, Bardh; Fontana, Federico; Cinque, Luigi

doi:10.1007/978-3-031-73390-1_26

arXiv:2407.02309 (cs)

[Submitted on 2 Jul 2024]

Abstract: Action anticipation is the task of forecasting future activity from a partially observed sequence of events. However, this task is exposed to the intrinsic uncertainty of the future and to the difficulty of reasoning about interconnected actions. Unlike previous works that focus on extrapolating better visual and temporal information, we concentrate on learning action representations that are aware of their semantic interconnectivity based on prototypical action patterns and contextual co-occurrences. To this end, we propose the novel Semantically Guided Representation Learning (S-GEAR) framework. S-GEAR learns visual action prototypes and leverages language models to structure their relationships, inducing semanticity. To gather insights on S-GEAR's effectiveness, we test it on four action anticipation benchmarks, obtaining improved results compared to previous works: +3.5, +2.7, and +3.5 absolute points in Top-1 Accuracy on Epic-Kitchens 55, EGTEA Gaze+, and 50 Salads, respectively, and +0.8 in Top-5 Recall on Epic-Kitchens 100. We further observe that S-GEAR effectively transfers the geometric associations between actions from language to visual prototypes. Finally, S-GEAR opens new research frontiers in anticipation tasks by demonstrating the intricate impact of action semantic interconnectivity.
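The abstract does not say how the geometric associations between actions are transferred from language to visual prototypes. Purely as an illustration of the idea, and assuming "geometry" means the pattern of pairwise cosine similarities within a set of action embeddings, one could compare the similarity matrix of the visual prototypes with that of the language embeddings and penalize the gap. The function names (`similarity_matrix`, `geometry_alignment_loss`) and the mean-squared objective below are hypothetical; this is a sketch under those assumptions, not S-GEAR's actual loss.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors: List[Vector]) -> List[List[float]]:
    """n x n matrix of pairwise cosine similarities (the 'geometry' of a set)."""
    return [[cosine(p, q) for q in vectors] for p in vectors]


def geometry_alignment_loss(visual: List[Vector], language: List[Vector]) -> float:
    """Hypothetical objective: mean squared gap between the off-diagonal
    similarities of visual prototypes and of language embeddings.

    visual[i] and language[i] must describe the same action i; the two sets
    may live in spaces of different dimensionality, since only their n x n
    similarity matrices are compared.
    """
    n = len(visual)
    if n != len(language):
        raise ValueError("need exactly one language embedding per visual prototype")
    if n < 2:
        raise ValueError("need at least two actions to compare pairwise geometry")
    sim_v = similarity_matrix(visual)
    sim_l = similarity_matrix(language)
    gaps = [
        (sim_v[i][j] - sim_l[i][j]) ** 2
        for i in range(n)
        for j in range(n)
        if i != j  # the diagonal is always 1 and carries no relational information
    ]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Toy, made-up embeddings for three hypothetical actions, e.g. "cut", "slice", "wash".
    language = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
    visual = [[1.0, 0.0, 0.2], [0.1, 1.0, 0.0], [0.0, 0.2, 1.0]]
    print(f"alignment loss: {geometry_alignment_loss(visual, language):.4f}")
```

Comparing similarity matrices rather than the vectors themselves means the visual and language spaces need not share a dimensionality or a coordinate system: only the relative arrangement of actions is matched. Driving such a loss toward zero would make actions that are close in the language space (here, the made-up "cut" and "slice") also close among the visual prototypes.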

Comments:	Accepted as a full paper at ECCV'24 with Paper ID #4140
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2407.02309 [cs.CV]
	(or arXiv:2407.02309v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.02309
Related DOI:	https://doi.org/10.1007/978-3-031-73390-1_26

Submission history

From: Bardh Prenkaj
[v1] Tue, 2 Jul 2024 14:44:01 UTC (14,294 KB)
