Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

Sun, Peize; Zhang, Rufeng; Jiang, Yi; Kong, Tao; Xu, Chenfeng; Zhan, Wei; Tomizuka, Masayoshi; Li, Lei; Yuan, Zehuan; Wang, Changhu; Luo, Ping

arXiv:2011.12450 (cs)

[Submitted on 25 Nov 2020 (v1), last revised 26 Apr 2021 (this version, v2)]

Abstract: We present Sparse R-CNN, a purely sparse method for object detection in images. Existing work on object detection relies heavily on dense object candidates, such as $k$ anchor boxes pre-defined at every grid cell of an image feature map of size $H\times W$. In our method, by contrast, a fixed sparse set of $N$ learned object proposals is provided to the object recognition head to perform classification and localization. By replacing $HWk$ (up to hundreds of thousands) hand-designed object candidates with $N$ (e.g., 100) learnable proposals, Sparse R-CNN avoids all effort related to object candidate design and many-to-one label assignment. More importantly, final predictions are output directly, without a non-maximum suppression post-processing step. Sparse R-CNN demonstrates accuracy, run-time, and training convergence performance on par with well-established detector baselines on the challenging COCO dataset, e.g., achieving 45.0 AP with the standard $3\times$ training schedule and running at 22 fps with a ResNet-50 FPN model. We hope our work inspires a rethinking of the convention of dense priors in object detectors. The code is available at: this https URL.
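
To make the $HWk$ versus $N$ comparison in the abstract concrete, the following minimal sketch counts dense anchor candidates over a feature pyramid and compares the total with a fixed proposal budget. The input size (800 x 1333), the strides, and $k = 9$ anchors per cell are illustrative assumptions, not values stated in the abstract; only $N = 100$ is taken from it. The helper name `dense_candidate_count` is hypothetical.

```python
def ceil_div(n, d):
    """Integer ceiling division, used to size a feature map for a given stride."""
    return -(-n // d)


def dense_candidate_count(feature_sizes, k):
    """Total dense candidates: sum of H * W * k over all feature maps."""
    return sum(h * w * k for h, w in feature_sizes)


# Assumed setup (not from the abstract): an 800 x 1333 input,
# five pyramid levels, and k = 9 anchors per grid cell.
image_h, image_w = 800, 1333
strides = [8, 16, 32, 64, 128]
k = 9

feature_sizes = [(ceil_div(image_h, s), ceil_div(image_w, s)) for s in strides]
dense = dense_candidate_count(feature_sizes, k)

# N = 100 is the example proposal count given in the abstract.
sparse = 100

print(f"dense candidates (sum of HWk): {dense}")
print(f"sparse proposals (N):          {sparse}")
print(f"reduction factor:              {dense / sparse:.0f}x")
```

Under these assumed settings the dense count lands in the hundreds of thousands, which matches the order of magnitude the abstract mentions. Different input sizes or anchor configurations would shift the exact figure.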

Comments:	add test-dev; add crowdhuman
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2011.12450 [cs.CV]
	(or arXiv:2011.12450v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2011.12450

Submission history

From: Peize Sun
[v1] Wed, 25 Nov 2020 00:01:28 UTC (6,741 KB)
[v2] Mon, 26 Apr 2021 14:20:03 UTC (10,253 KB)
