Rethinking Pre-training and Self-training

Zoph, Barret; Ghiasi, Golnaz; Lin, Tsung-Yi; Cui, Yin; Liu, Hanxiao; Cubuk, Ekin D.; Le, Quoc V.

Computer Science > Computer Vision and Pattern Recognition

arXiv:2006.06882 (cs)

[Submitted on 11 Jun 2020 (v1), last revised 15 Nov 2020 (this version, v2)]

Title: Rethinking Pre-training and Self-training

Authors: Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le


Abstract: Pre-training is a dominant paradigm in computer vision. For example, supervised ImageNet pre-training is commonly used to initialize the backbones of object detection and segmentation models. He et al., however, report the surprising result that ImageNet pre-training has limited impact on COCO object detection. Here we investigate self-training as another method to utilize additional data on the same setup and contrast it against ImageNet pre-training. Our study reveals the generality and flexibility of self-training with three additional insights: 1) stronger data augmentation and more labeled data further diminish the value of pre-training, 2) unlike pre-training, self-training is always helpful when using stronger data augmentation, in both low-data and high-data regimes, and 3) in the case that pre-training is helpful, self-training improves upon pre-training. For example, on the COCO object detection dataset, pre-training helps when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data. Self-training, on the other hand, shows positive improvements from +1.3 to +3.4 AP across all dataset sizes. In other words, self-training works well on exactly the same setup on which pre-training does not work (using ImageNet to help COCO). On the PASCAL segmentation dataset, which is much smaller than COCO, pre-training does help significantly, yet self-training still improves upon the pre-trained model. On COCO object detection, we achieve 54.3 AP, an improvement of +1.5 AP over the strongest SpineNet model. On PASCAL segmentation, we achieve 90.5 mIOU, an improvement of +1.5% mIOU over the previous state-of-the-art result by DeepLabv3+.
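
The abstract names self-training without spelling out the procedure. As a rough illustration only, the sketch below shows a generic teacher-student pseudo-labeling loop on toy one-dimensional data: a teacher is fit on labeled points, it labels the unlabeled points, and a student is fit on both. The nearest-centroid classifier, the `min_margin` confidence filter, and all function names are assumptions made for this example; they are not taken from the paper, whose models are object detectors and segmentation networks.

```python
# Toy self-training (pseudo-labeling) loop. Every name here is hypothetical
# and the classifier is a stand-in, not the paper's detection/segmentation model.
from statistics import mean


def fit_centroids(xs, ys):
    """Fit a nearest-centroid classifier: one mean value per class label."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in sorted(set(ys))}


def predict(centroids, x):
    """Return (label, margin) for x; needs at least two classes.

    margin is the distance to the runner-up centroid minus the distance
    to the nearest one, used here as a crude confidence score.
    """
    ranked = sorted(centroids, key=lambda c: abs(x - centroids[c]))
    best, second = ranked[0], ranked[1]
    return best, abs(x - centroids[second]) - abs(x - centroids[best])


def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0):
    # 1) Train a teacher on the human-labeled data only.
    teacher = fit_centroids(labeled_x, labeled_y)
    # 2) Pseudo-label the unlabeled data, keeping only confident predictions
    #    (the confidence filter is an assumption of this sketch).
    kept = []
    for x in unlabeled_x:
        label, margin = predict(teacher, x)
        if margin >= min_margin:
            kept.append((x, label))
    # 3) Train a student on the labeled and pseudo-labeled data together.
    student_x = list(labeled_x) + [x for x, _ in kept]
    student_y = list(labeled_y) + [y for _, y in kept]
    student = fit_centroids(student_x, student_y)
    return teacher, student, kept


if __name__ == "__main__":
    teacher, student, kept = self_train(
        [0.0, 10.0], [0, 1], [1.0, 2.0, 5.2, 8.0, 9.0]
    )
    print("teacher centroids:", teacher)
    print("pseudo-labeled points kept:", kept)
    print("student centroids:", student)
```

In this toy run the ambiguous point 5.2 is discarded by the margin filter, and the student's centroids shift toward the pseudo-labeled points. That is the only sense in which the extra unlabeled data is "used" here; it says nothing about the accuracy gains reported above.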

Comments:	Accepted for publication at the Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020)
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2006.06882 [cs.CV]
	(or arXiv:2006.06882v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2006.06882

Submission history

From: Barret Zoph
[v1] Thu, 11 Jun 2020 23:59:16 UTC (3,584 KB)
[v2] Sun, 15 Nov 2020 19:41:27 UTC (7,160 KB)
