Context Enhanced Transformer for Single Image Object Detection

An, Seungjun; Park, Seonghoon; Kim, Gyeongnyeon; Baek, Jeongyeol; Lee, Byeongwon; Kim, Seungryong

arXiv:2312.14492 (cs)

[Submitted on 22 Dec 2023 (v1), last revised 26 Dec 2023 (this version, v2)]

Abstract: With the increasing importance of video data in real-world applications, there is a rising need for efficient object detection methods that utilize temporal information. While existing video object detection (VOD) techniques employ various strategies to address this challenge, they typically depend on locally adjacent frames or randomly sampled images within a clip. Although recent Transformer-based VOD methods have shown promising results, their reliance on multiple inputs and additional network complexity to incorporate temporal information limits their practical applicability. In this paper, we propose a novel approach to single image object detection, called Context Enhanced TRansformer (CETR), which incorporates temporal context into DETR using a newly designed memory module. To efficiently store temporal information, we construct a class-wise memory that collects contextual information across data. Additionally, we present a classification-based sampling technique to selectively utilize the relevant memory for the current image. At test time, we introduce a test-time memory adaptation method that updates individual memory functions by considering the test distribution. Experiments on the CityCam and ImageNet VID datasets demonstrate the efficiency of the framework on various video systems. The project page and code will be made available at: this https URL.
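
The memory ideas named in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the class `ClassWiseMemory`, its method names, the exponential-moving-average update, and the top-k selection rule are all hypothetical choices made here for illustration, since the abstract does not specify how the memory is written, sampled, or adapted.

```python
from typing import Dict, List, Sequence


class ClassWiseMemory:
    """Toy class-wise memory: one feature vector per class (hypothetical design)."""

    def __init__(self, num_classes: int, dim: int, momentum: float = 0.9) -> None:
        self.momentum = momentum
        # One zero-initialised slot per class.
        self.slots: List[List[float]] = [[0.0] * dim for _ in range(num_classes)]

    def update(self, class_id: int, feature: Sequence[float]) -> None:
        # Assumed update rule: exponential moving average of incoming features.
        # The same call could be reused at test time to adapt the memory.
        m = self.momentum
        old = self.slots[class_id]
        self.slots[class_id] = [m * s + (1.0 - m) * f for s, f in zip(old, feature)]

    def sample(self, class_scores: Sequence[float], k: int) -> Dict[int, List[float]]:
        # Assumed sampling rule: keep the slots of the k highest-scoring classes
        # predicted for the current image.
        ranked = sorted(range(len(class_scores)),
                        key=lambda c: class_scores[c], reverse=True)
        return {c: self.slots[c] for c in ranked[:k]}


memory = ClassWiseMemory(num_classes=3, dim=2, momentum=0.5)
memory.update(1, [2.0, 4.0])                      # write a feature for class 1
selected = memory.sample([0.1, 0.7, 0.2], k=2)    # classes 1 and 2 score highest
print(selected)
```

In this sketch only the memory slots of the classes most likely to appear in the current image are returned, which mirrors the stated goal of selectively using relevant memory rather than attending to all of it.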

Comments:	Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2312.14492 [cs.CV]
	(or arXiv:2312.14492v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.14492

Submission history

From: Seungjun An
[v1] Fri, 22 Dec 2023 07:40:43 UTC (5,024 KB)
[v2] Tue, 26 Dec 2023 05:54:22 UTC (10,863 KB)
