A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Chen, Jianfei; Gai, Yu; Yao, Zhewei; Mahoney, Michael W.; Gonzalez, Joseph E.

arXiv:2010.14298 (cs)

[Submitted on 27 Oct 2020]

Abstract: Fully quantized training (FQT), which uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model, is a promising approach to accelerate the training of deep neural networks. One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties. In this paper, we address this problem by presenting a statistical framework for analyzing FQT algorithms. We view the quantized gradient of FQT as a stochastic estimator of its full precision counterpart, a procedure known as quantization-aware training (QAT). We show that the FQT gradient is an unbiased estimator of the QAT gradient, and we discuss the impact of gradient quantization on its variance. Inspired by these theoretical results, we develop two novel gradient quantizers, and we show that these have smaller variance than the existing per-tensor quantizer. For training ResNet-50 on ImageNet, our 5-bit block Householder quantizer achieves only 0.5% validation accuracy loss relative to QAT, comparable to the existing INT8 baseline.
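
To make the "unbiased estimator" claim concrete, the following is a minimal sketch of a per-tensor quantizer with stochastic rounding, applied to a toy gradient vector. It is not the authors' implementation: the function names `quantize_per_tensor` and `mean_of_draws`, the min/max range choice, and the use of a flat Python list as the "tensor" are all assumptions made for illustration. The point it shows is that rounding up with probability equal to the fractional part makes the expected quantized value equal to the full-precision value, while each individual draw still carries quantization noise.

```python
import math
import random


def quantize_per_tensor(values, bits, rng):
    """Stochastically round `values` onto a uniform grid with 2**bits points.

    Hypothetical helper: one range (min, max) is shared by the whole tensor,
    which is what "per-tensor" is taken to mean in this sketch.
    """
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1              # number of grid intervals
    if hi == lo:                        # constant tensor: nothing to quantize
        return list(values)
    step = (hi - lo) / levels
    out = []
    for v in values:
        t = (v - lo) / step             # position in grid units
        floor_t = math.floor(t)
        # Round up with probability equal to the fractional part, so that
        # the expected grid index equals t (unbiased rounding).
        q = floor_t + (1 if rng.random() < t - floor_t else 0)
        q = min(q, levels)              # guard against float round-off at hi
        out.append(lo + q * step)
    return out


def mean_of_draws(values, bits, draws, seed=0):
    """Average many independent quantizations to estimate E[Q(values)]."""
    rng = random.Random(seed)
    sums = [0.0] * len(values)
    for _ in range(draws):
        for i, q in enumerate(quantize_per_tensor(values, bits, rng)):
            sums[i] += q
    return [s / draws for s in sums]


if __name__ == "__main__":
    grad = [-1.0, -0.3, 0.05, 0.42, 1.0]    # toy "gradient", made up
    print(quantize_per_tensor(grad, 5, random.Random(1)))
    print(mean_of_draws(grad, 5, draws=20000))
```

In this sketch the per-element variance of a single draw is at most `step ** 2 / 4`, and `step` grows with the range `hi - lo` shared by the whole tensor. A single large entry therefore inflates the noise on every other entry, which is one intuition for why quantizers with a finer-grained range could achieve lower variance than the per-tensor one.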

Comments:	24 pages
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2010.14298 [cs.LG]
	(or arXiv:2010.14298v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2010.14298

Submission history

From: Jianfei Chen
[v1] Tue, 27 Oct 2020 13:57:33 UTC (3,561 KB)
