Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

Bordelon, Blake; Noci, Lorenzo; Li, Mufan Bill; Hanin, Boris; Pehlevan, Cengiz

arXiv:2309.16620 (stat)

[Submitted on 28 Sep 2023 (v1), last revised 8 Dec 2023 (this version, v2)]

Abstract: The cost of hyperparameter tuning in deep learning has been rising with model sizes, prompting practitioners to find new tuning methods using a proxy of smaller networks. One such proposal uses $\mu$P parameterized networks, where the optimal hyperparameters for small width networks transfer to networks with arbitrarily large width. However, in this scheme, hyperparameters do not transfer across depths. As a remedy, we study residual networks with a residual branch scale of $1/\sqrt{\text{depth}}$ in combination with the $\mu$P parameterization. We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet. Furthermore, our empirical findings are supported and motivated by theory. Using recent developments in the dynamical mean field theory (DMFT) description of neural network learning dynamics, we show that this parameterization of ResNets admits a well-defined feature learning joint infinite-width and infinite-depth limit and show convergence of finite-size network dynamics towards this limit.
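As a rough illustration of the $1/\sqrt{\text{depth}}$ residual branch scale mentioned in the abstract, the sketch below runs a toy residual MLP at random initialization and compares the mean squared activation with and without the scale. It is not the authors' code: the block form `h + branch_scale * W relu(h) / sqrt(width)`, the Gaussian weights, and the names `residual_forward`, `mean_square`, and `compare` are all assumptions made for illustration. It covers only the forward pass at initialization, not the $\mu$P learning-rate or readout scaling and not training dynamics.

```python
import math
import random


def residual_forward(x, depth, branch_scale, rng):
    """Forward pass of a toy residual MLP at random initialization.

    Each block computes h <- h + branch_scale * W relu(h) / sqrt(width),
    where W has i.i.d. standard normal entries (an illustrative assumption).
    """
    width = len(x)
    h = list(x)
    for _ in range(depth):
        act = [max(v, 0.0) for v in h]
        # One dense layer with fresh Gaussian weights; dividing by sqrt(width)
        # keeps each branch coordinate order-one when act is order-one.
        branch = [
            sum(rng.gauss(0.0, 1.0) * a for a in act) / math.sqrt(width)
            for _ in range(width)
        ]
        h = [hv + branch_scale * bv for hv, bv in zip(h, branch)]
    return h


def mean_square(h):
    """Average squared entry of the activation vector."""
    return sum(v * v for v in h) / len(h)


def compare(depth, width=64, seed=0):
    """Return (scaled, unscaled) mean squared activations after `depth` blocks."""
    x_rng = random.Random(seed)
    x = [x_rng.gauss(0.0, 1.0) for _ in range(width)]
    # Same weight seed for both runs so only the branch scale differs.
    scaled = residual_forward(x, depth, 1.0 / math.sqrt(depth), random.Random(seed + 1))
    unscaled = residual_forward(x, depth, 1.0, random.Random(seed + 1))
    return mean_square(scaled), mean_square(unscaled)


if __name__ == "__main__":
    for depth in (4, 16, 32):
        scaled, unscaled = compare(depth)
        print(f"depth={depth:3d}  scaled={scaled:.3g}  unscaled={unscaled:.3g}")
```

In this toy setting one would expect the scaled activations to stay order-one as depth grows, while the unscaled ones grow roughly geometrically with depth. That is the kind of depth-dependent behavior the branch scale is meant to remove.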

Subjects:	Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2309.16620 [stat.ML]
	(or arXiv:2309.16620v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2309.16620

Submission history

From: Blake Bordelon
[v1] Thu, 28 Sep 2023 17:20:50 UTC (668 KB)
[v2] Fri, 8 Dec 2023 18:19:44 UTC (802 KB)