DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training

Chen, Aochuan; Zhang, Yimeng; Jia, Jinghan; Diffenderfer, James; Liu, Jiancheng; Parasyris, Konstantinos; Zhang, Yihua; Zhang, Zheng; Kailkhura, Bhavya; Liu, Sijia

arXiv:2310.02025 (cs)

[Submitted on 3 Oct 2023 (v1), last revised 15 Mar 2024 (this version, v4)]

Abstract: Zeroth-order (ZO) optimization has become a popular technique for solving machine learning (ML) problems when first-order (FO) information is difficult or impossible to obtain. However, the scalability of ZO optimization remains an open problem: its use has primarily been limited to relatively small-scale ML problems, such as sample-wise adversarial attack generation. To the best of our knowledge, no prior work has demonstrated the effectiveness of ZO optimization in training deep neural networks (DNNs) without a significant decrease in performance. To overcome this roadblock, we develop DeepZero, a principled ZO deep learning (DL) framework that can scale ZO optimization to DNN training from scratch through three primary innovations. First, we demonstrate the advantages of coordinatewise gradient estimation (CGE) over randomized vector-wise gradient estimation in training accuracy and computational efficiency. Second, we propose a sparsity-induced ZO training protocol that extends the model pruning methodology, using only finite differences, to explore and exploit the sparse DL prior in CGE. Third, we develop the methods of feature reuse and forward parallelization to advance the practical implementation of ZO training. Our extensive experiments show that DeepZero achieves state-of-the-art (SOTA) accuracy on ResNet-20 trained on CIFAR-10, approaching FO training performance for the first time. Furthermore, we show the practical utility of DeepZero in applications of certified adversarial defense and DL-based partial differential equation error correction, achieving 10-20% improvement over SOTA. We believe our results will inspire future research on scalable ZO optimization and contribute to advancing black-box DL. Code is available at this https URL.
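The abstract contrasts coordinatewise gradient estimation (CGE) with randomized vector-wise gradient estimation (RGE). The sketch below illustrates the difference on a toy function using only function evaluations. It assumes common forward-difference forms with a smoothing parameter `mu` and standard Gaussian directions for RGE; these are textbook estimators and not necessarily the exact estimators or scaling used in the paper. The function names `cge` and `rge` and the `coords` argument are hypothetical and are not taken from the DeepZero code. `coords` is a rough stand-in for the sparsity idea: only the chosen coordinates are perturbed, and the rest of the estimate stays zero.

```python
import numpy as np


def cge(f, theta, mu=1e-3, coords=None):
    """Coordinatewise forward-difference gradient estimate (illustrative sketch).

    Perturbs one coordinate at a time: g_i = (f(theta + mu*e_i) - f(theta)) / mu.
    `coords` (hypothetical) restricts which coordinates are estimated; the others stay 0.
    """
    theta = np.asarray(theta, dtype=float)
    base = f(theta)                      # one shared query at the unperturbed point
    grad = np.zeros_like(theta)
    idx = range(theta.size) if coords is None else coords
    for i in idx:                        # each query is independent of the others
        e = np.zeros_like(theta)
        e.flat[i] = mu
        grad.flat[i] = (f(theta + e) - base) / mu
    return grad


def rge(f, theta, mu=1e-3, q=10, rng=None):
    """Randomized vector-wise forward-difference estimate with q Gaussian directions.

    g = (1/q) * sum_j (f(theta + mu*u_j) - f(theta)) / mu * u_j, with u_j ~ N(0, I).
    """
    theta = np.asarray(theta, dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    base = f(theta)
    grad = np.zeros_like(theta)
    for _ in range(q):
        u = rng.standard_normal(theta.shape)
        grad += (f(theta + mu * u) - base) / mu * u
    return grad / q


if __name__ == "__main__":
    f = lambda x: float(np.sum(x ** 2))  # toy loss; true gradient is 2x
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    print("true :", 2 * x)
    print("CGE  :", cge(f, x))
    print("RGE  :", rge(f, x, q=100, rng=np.random.default_rng(0)))
    print("CGE on coords [0, 2] only:", cge(f, x, coords=[0, 2]))

    # A few steps of ZO gradient descent driven purely by CGE estimates.
    theta = x.copy()
    for _ in range(100):
        theta -= 0.1 * cge(f, theta)
    print("after ZO-GD:", theta)
```

As a rough cost comparison under these assumptions, CGE spends `d + 1` function queries per estimate for `d` parameters, and its only error is the finite-difference bias. RGE spends `q + 1` queries but adds sampling noise that shrinks only as `q` grows. Restricting CGE to a subset of coordinates cuts its cost to `len(coords) + 1` queries. Because each coordinate's query does not depend on the others, the loop could in principle be run as parallel forward passes.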

Comments:	Accepted to ICLR'24. Code is available at this https URL
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2310.02025 [cs.LG]
	(or arXiv:2310.02025v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2310.02025

Submission history

From: Yimeng Zhang
[v1] Tue, 3 Oct 2023 13:05:36 UTC (1,906 KB)
[v2] Sun, 5 Nov 2023 04:15:39 UTC (1,905 KB)
[v3] Sun, 4 Feb 2024 00:55:18 UTC (1,561 KB)
[v4] Fri, 15 Mar 2024 15:28:11 UTC (1,561 KB)