Deconstructing the Goldilocks Zone of Neural Network Initialization

Vysogorets, Artem; Dawid, Anna; Kempe, Julia

arXiv:2402.03579 (cs)

[Submitted on 5 Feb 2024 (v1), last revised 5 Jun 2024 (this version, v2)]

Abstract: The second-order properties of the training loss have a massive impact on the optimization dynamics of deep learning models. Fort & Scherlis (2019) discovered that a large excess of positive curvature and local convexity of the loss Hessian is associated with highly trainable initial points located in a region coined the "Goldilocks zone". Only a handful of subsequent studies touched upon this relationship, so it remains largely unexplained. In this paper, we present a rigorous and comprehensive analysis of the Goldilocks zone for homogeneous neural networks. In particular, we derive the fundamental condition resulting in excess of positive curvature of the loss, explaining and refining its conventionally accepted connection to the initialization norm. Further, we relate the excess of positive curvature to model confidence, low initial loss, and a previously unknown type of vanishing cross-entropy loss gradient. To understand the importance of excessive positive curvature for trainability of deep networks, we optimize fully-connected and convolutional architectures outside the Goldilocks zone and analyze the emergent behaviors. We find that strong model performance is not perfectly aligned with the Goldilocks zone, calling for further research into this relationship.

Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as:	arXiv:2402.03579 [cs.LG]
	(or arXiv:2402.03579v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.03579
Journal reference:	Proceedings of the 41st International Conference on Machine Learning, PMLR (2024) 235:49717-49732

Submission history

From: Artem Vysogorets
[v1] Mon, 5 Feb 2024 23:06:48 UTC (11,878 KB)
[v2] Wed, 5 Jun 2024 02:44:31 UTC (10,859 KB)
