Spatio-Temporal Multi-Task Learning Transformer for Joint Moving Object Detection and Segmentation

Mohamed, Eslam; El-Sallab, Ahmed

arXiv:2106.11401 (cs)

[Submitted on 21 Jun 2021]

Abstract: Moving objects have special importance for Autonomous Driving tasks. Detecting moving objects can be posed as Moving Object Segmentation, by segmenting the object pixels, or as Moving Object Detection, by generating a bounding box for each moving target. In this paper, we present a Multi-Task Learning (MTL) architecture, based on Transformers, that performs both tasks jointly in one network. Because motion features are central to the task, the whole setup is based on Spatio-Temporal aggregation. We evaluate the performance of the individual task architectures against the MTL setup, both with early shared encoders and with late shared encoder-decoder transformers. For the latter, we present a novel joint-task query decoder transformer that enables task-dedicated heads on top of the shared model. To evaluate our approach, we use the KITTI MOD [29] data set. Results show a 1.5% mAP improvement for Moving Object Detection and a 2% IoU improvement for Moving Object Segmentation over the individual task networks.
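
The two task formulations named in the abstract are scored with overlap measures: IoU over pixels for segmentation, and IoU over bounding boxes as the usual matching criterion behind mAP for detection. The following is a minimal sketch of both measures in plain Python. The function names `mask_iou` and `box_iou`, the list-based mask format, and the empty-mask convention are assumptions made for illustration; none of this is taken from the paper's evaluation code.

```python
# Illustrative sketch only: helper names and input formats are hypothetical,
# not taken from the paper or from the KITTI MOD tooling.

def mask_iou(pred, target):
    """IoU between two binary masks given as equal-sized 2D lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def box_iou(a, b):
    """IoU between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, `mask_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])` compares a two-pixel prediction against a one-pixel target, and `box_iou((0, 0, 2, 2), (1, 1, 3, 3))` compares two 2x2 boxes that overlap in a single unit square.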

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2106.11401 [cs.CV]
(or arXiv:2106.11401v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2106.11401

Submission history

From: Eslam Bakr Mohamed
[v1] Mon, 21 Jun 2021 20:30:44 UTC (10,051 KB)