Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization

Wen, Kaiyue; Ma, Tengyu; Li, Zhiyuan

arXiv:2307.11007v1 (cs)

[Submitted on 20 Jul 2023 (this version), latest version 23 Jul 2023 (v2)]

Abstract: Despite extensive studies, the underlying reason why overparameterized neural networks can generalize remains elusive. Existing theory shows that common stochastic optimizers prefer flatter minimizers of the training loss, and thus a natural potential explanation is that flatness implies generalization. This work critically examines this explanation. Through theoretical and empirical investigation, we identify the following three scenarios for two-layer ReLU networks: (1) flatness provably implies generalization; (2) there exist non-generalizing flattest models, and sharpness minimization algorithms fail to generalize; and (3), perhaps most surprisingly, there exist non-generalizing flattest models, but sharpness minimization algorithms still generalize. Our results suggest that the relationship between sharpness and generalization subtly depends on the data distributions and the model architectures, and that sharpness minimization algorithms do not only minimize sharpness to achieve better generalization. This calls for a search for other explanations for the generalization of overparameterized neural networks.
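The abstract does not spell out how sharpness is measured. One common choice in this line of work, assumed here for illustration and not taken from the text above, is the trace of the Hessian of the training loss. The sketch below computes that quantity for a hypothetical bias-free two-layer ReLU network f(x) = sum_j a_j * relu(w_j . x) with squared loss, once by central finite differences and once by a closed form. All function names (`forward`, `loss`, `hessian_trace_fd`, `hessian_trace_closed_form`) and the toy data are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def relu(z: float) -> float:
    return max(z, 0.0)


def unpack(params: Vector, m: int, d: int) -> Tuple[Vector, List[Vector]]:
    """Flat parameters -> output weights a (length m) and hidden weights W (m x d, row-major)."""
    a = params[:m]
    W = [params[m + j * d : m + (j + 1) * d] for j in range(m)]
    return a, W


def forward(a: Vector, W: List[Vector], x: Vector) -> float:
    """Assumed architecture: f(x) = sum_j a_j * relu(w_j . x), no biases."""
    return sum(a_j * relu(sum(w * xi for w, xi in zip(w_j, x))) for a_j, w_j in zip(a, W))


def loss(params: Vector, m: int, d: int, X: List[Vector], Y: Vector) -> float:
    """Assumed training loss: L = 1/(2n) * sum_k (f(x_k) - y_k)^2."""
    a, W = unpack(params, m, d)
    return sum((forward(a, W, x) - y) ** 2 for x, y in zip(X, Y)) / (2 * len(X))


def hessian_trace_fd(params: Vector, m: int, d: int, X: List[Vector], Y: Vector,
                     h: float = 1e-4) -> float:
    """tr(H) via central second differences along each parameter coordinate."""
    base = loss(params, m, d, X, Y)
    total = 0.0
    for i in range(len(params)):
        plus = list(params)
        plus[i] += h
        minus = list(params)
        minus[i] -= h
        total += (loss(plus, m, d, X, Y) - 2 * base + loss(minus, m, d, X, Y)) / h ** 2
    return total


def hessian_trace_closed_form(params: Vector, m: int, d: int, X: List[Vector]) -> float:
    """tr(H) = mean_k ||grad_theta f(x_k)||^2, valid away from ReLU kinks.

    f is linear in each single parameter there, so the residual term of each
    diagonal Hessian entry vanishes and the labels drop out.
    """
    a, W = unpack(params, m, d)
    total = 0.0
    for x in X:
        sq_norm_x = sum(xi * xi for xi in x)
        for a_j, w_j in zip(a, W):
            pre = sum(w * xi for w, xi in zip(w_j, x))
            total += relu(pre) ** 2            # contribution of d^2 L / d a_j^2
            if pre > 0:
                total += a_j ** 2 * sq_norm_x  # sum over k of d^2 L / d w_{j,k}^2
    return total / len(X)


if __name__ == "__main__":
    X = [[1.0, 0.5], [-0.3, 1.2], [0.8, -1.0]]
    Y = [1.0, 0.0, -0.5]
    params = [0.7, -1.1, 0.9, 0.4, -0.6, 1.3]  # a_1, a_2, then w_1, w_2
    print(hessian_trace_fd(params, 2, 2, X, Y))
    print(hessian_trace_closed_form(params, 2, 2, X))
```

The two estimates should agree when no pre-activation w_j . x lies within the finite-difference step of zero. The closed form makes one design point visible. Under this particular sharpness measure and loss, the trace depends only on the inputs and the parameters, not on the labels. This is one illustration of why flatness alone need not separate generalizing from non-generalizing solutions.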

Comments:	34 pages, 11 figures
Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:2307.11007 [cs.LG]
	(or arXiv:2307.11007v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2307.11007

Submission history

From: Kaiyue Wen
[v1] Thu, 20 Jul 2023 16:34:58 UTC (540 KB)
[v2] Sun, 23 Jul 2023 03:59:44 UTC (541 KB)
