mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization

Behdin, Kayhan; Song, Qingquan; Gupta, Aman; Keerthi, Sathiya; Acharya, Ayan; Ocejo, Borja; Dexter, Gregory; Khanna, Rajiv; Durfee, David; Mazumder, Rahul

arXiv:2302.09693 (stat)

[Submitted on 19 Feb 2023 (v1), last revised 1 Oct 2023 (this version, v2)]

Abstract: Modern deep learning models are over-parameterized, and different optima can yield widely varying generalization performance. Sharpness-Aware Minimization (SAM) modifies the underlying loss function so that gradient descent methods are steered toward flatter minima, which are believed to generalize better. We study a variant of SAM known as micro-batch SAM (mSAM), which averages the updates derived from adversarial perturbations across multiple shards (micro-batches) of a mini-batch during training. We extend a recently developed and well-studied general framework for flatness analysis to show theoretically that SAM achieves flatter minima than SGD, and that mSAM achieves even flatter minima than SAM. We support this theoretical result with a thorough empirical evaluation on a range of image classification and natural language processing tasks. We also show that, contrary to previous work, mSAM can be implemented in a flexible and parallelizable manner without significantly increasing computational cost. Our implementation of mSAM yields better generalization performance than SAM across a wide range of tasks, further supporting our theoretical framework.
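The micro-batch averaging described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the linear least-squares model, the helper names `loss_and_grad` and `msam_step`, and the values of `rho`, `lr`, and `num_micro_batches` are all assumptions chosen for illustration. Each micro-batch computes its own normalized adversarial perturbation, evaluates the gradient at the perturbed weights, and the resulting gradients are averaged before the update.

```python
import numpy as np


def loss_and_grad(w, X, y):
    """Mean squared error and its gradient for a linear model (toy stand-in)."""
    residual = X @ w - y
    loss = 0.5 * np.mean(residual ** 2)
    grad = X.T @ residual / len(y)
    return loss, grad


def msam_step(w, X, y, num_micro_batches=4, rho=0.05, lr=0.1):
    """One mSAM-style update (hypothetical helper, illustrative only)."""
    sam_grads = []
    micro_batches = zip(np.array_split(X, num_micro_batches),
                        np.array_split(y, num_micro_batches))
    for X_mb, y_mb in micro_batches:
        # Gradient of this micro-batch's loss at the current weights.
        _, g = loss_and_grad(w, X_mb, y_mb)
        # SAM-style ascent direction, scaled to radius rho.
        eps = rho * g / (np.linalg.norm(g) + 1e-12)
        # Gradient of the same micro-batch loss at the perturbed weights.
        _, g_adv = loss_and_grad(w + eps, X_mb, y_mb)
        sam_grads.append(g_adv)
    # Average the per-micro-batch gradients, then take a descent step.
    return w - lr * np.mean(sam_grads, axis=0)


# Toy usage on synthetic, noise-free regression data (assumed setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
w_true = rng.normal(size=5)
y = X @ w_true

w = np.zeros(5)
initial_loss, _ = loss_and_grad(w, X, y)
for _ in range(200):
    w = msam_step(w, X, y)
final_loss, _ = loss_and_grad(w, X, y)
```

With `num_micro_batches=1` the step reduces to a plain SAM update on the whole mini-batch, and with `rho=0` it reduces to an ordinary gradient step. Because each micro-batch is processed independently inside the loop, the per-shard work could in principle run in parallel, which is the property the abstract highlights.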

Comments: arXiv admin note: substantial text overlap with arXiv:2212.04343
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:2302.09693 [stat.ML] (or arXiv:2302.09693v2 [stat.ML] for this version)
DOI: https://doi.org/10.48550/arXiv.2302.09693

Submission history

From: Kayhan Behdin
[v1] Sun, 19 Feb 2023 23:27:12 UTC (366 KB)
[v2] Sun, 1 Oct 2023 02:19:50 UTC (41 KB)
