Project Proposal
I. Introduction
Background
The term "grokking" was introduced in recent studies to describe neural networks’
emergent generalization after prolonged training, particularly on tasks such as modular
addition and sparse parity (Power et al., 2022; Merrill et al., 2023). In the initial phase,
models memorize: they reach high training accuracy but fail to generalize. After extended
further training, they transition to high test accuracy, indicating that a generalizing
solution to the task has been learned. This transition, though powerful, typically demands
extensive training time, raising computational and efficiency challenges (Nanda et al., 2023).
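To ground the setting, the following minimal PyTorch sketch shows the kind of setup in which grokking is typically studied: full-batch training of a small network on modular addition with AdamW and strong weight decay. The architecture, modulus, and hyperparameters are illustrative assumptions, not the exact configurations of the cited papers.

import torch
import torch.nn as nn

p = 97  # modulus for the (a + b) mod p task

# Enumerate every (a, b) pair and its label; keep only a fraction for training,
# since sparse training data is part of what makes grokking visible.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
n_train = len(pairs) // 2
train_idx, test_idx = perm[:n_train], perm[n_train:]

class ModAddMLP(nn.Module):
    def __init__(self, p, d_embed=128, d_hidden=256):
        super().__init__()
        self.embed = nn.Embedding(p, d_embed)
        self.net = nn.Sequential(
            nn.Linear(2 * d_embed, d_hidden), nn.ReLU(), nn.Linear(d_hidden, p)
        )

    def forward(self, ab):
        e = self.embed(ab)                        # (batch, 2, d_embed)
        return self.net(e.flatten(start_dim=1))   # logits over the p residues

model = ModAddMLP(p)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

for step in range(50_000):  # generalization typically emerges long after perfect train accuracy
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Later sketches in this proposal reuse the names defined here (model, pairs, labels, train_idx, test_idx, loss_fn).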
Problem Statement
While grokking offers powerful generalization capabilities, its high computational cost and
lengthy training periods make it inefficient. This project addresses the challenge of
achieving grokking more efficiently by investigating structured learning dynamics and
targeted regularization strategies to encourage faster generalization.
Objectives
Research Question 1
What role do structured learning dynamics play in achieving efficient grokking?
II. Literature Review
2. Sparse Network Transitions: Merrill et al. (2023) characterized grokking as a phase transition
driven by the emergence of sparse subnetworks that come to dominate model predictions
post-grokking. This work highlights the role of network sparsification and suggests targeted
pruning as a potential method to improve efficiency (a minimal pruning sketch follows this list).
3. Role of Regularization and Weight Decay: Thilak et al. (2022) found that weight decay
facilitates grokking by encouraging models to prioritize generalizing components over
components that merely memorize the training data.
5. Learning Rate Schedules and Training Stability: Ganguli et al. (2022) examined dynamic
learning-rate schedules, such as cyclical and cosine annealing, and found that they can improve
model stability during transitions in emergent behavior, which could reduce the training time
required for grokking.
6. Competitive Subnetworks and Phase Transitions: Studies by Engel & Van den Broeck
(2001) on emergent behaviors in neural networks reveal that competing network
components influence generalization. Applying this to grokking could provide insights into
encouraging sparse, generalizable structures in fewer steps.
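As a concrete illustration of the pruning idea in item 2 above, the sketch below removes the smallest-magnitude weights globally and checks whether the surviving sparse subnetwork still solves the task. It reuses the hypothetical ModAddMLP setup from the earlier sketch and is only a rough proxy for a subnetwork analysis, not the procedure used by Merrill et al. (2023).

import copy
import torch
import torch.nn.utils.prune as prune

def sparse_subnetwork_accuracy(model, inputs, targets, sparsity=0.9):
    # Copy the trained model, prune the smallest-magnitude weights across all
    # linear layers, and measure accuracy of the remaining sparse subnetwork.
    pruned = copy.deepcopy(model)
    layers = [(m, "weight") for m in pruned.modules() if isinstance(m, torch.nn.Linear)]
    prune.global_unstructured(layers, pruning_method=prune.L1Unstructured, amount=sparsity)
    with torch.no_grad():
        preds = pruned(inputs).argmax(dim=-1)
    return (preds == targets).float().mean().item()

# e.g. sparse_subnetwork_accuracy(model, pairs[test_idx], labels[test_idx], sparsity=0.95)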
Research Gaps
The reviewed literature points to several strategies for encouraging generalization but lacks
a cohesive framework that combines these approaches to accelerate grokking. This project
will address this gap by synthesizing structured training methods, such as curriculum
learning and targeted sparsification, with regularization strategies to achieve efficient
grokking.
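As a rough illustration of how these ingredients might be combined, the sketch below pairs a simple curriculum schedule with weight decay and cosine learning-rate annealing, again reusing the names from the first sketch. The difficulty heuristic (operand size), the curriculum schedule, and the hyperparameters are illustrative assumptions for this proposal, not an established recipe.

import torch

def curriculum_indices(subset_pairs, step, total_steps):
    # Hypothetical curriculum: rank examples by a placeholder difficulty score
    # (operand size) and progressively unlock harder examples over training.
    difficulty = subset_pairs.sum(dim=1).float()
    order = torch.argsort(difficulty)
    frac = min(1.0, 0.25 + 0.75 * step / total_steps)
    return order[: max(1, int(frac * len(order)))]

total_steps = 50_000
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)

for step in range(total_steps):
    idx = train_idx[curriculum_indices(pairs[train_idx], step, total_steps)]
    loss = loss_fn(model(pairs[idx]), labels[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # anneal the learning rate along a cosine curve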
III. Methodology
Approach
This research will investigate the dynamics of grokking across various neural network
architectures, analyzing stages within the training process that correspond to
memorization, circuit formation, and cleanup. Using interpretability techniques, the study
will develop metrics to monitor progress within these phases.
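As a starting point, simple diagnostics of the kind sketched below (train/test accuracy and the total parameter norm, logged over training) can be used to segment a run into memorization, circuit-formation, and cleanup phases. These are assumed placeholder measures reusing the names from the first sketch, not the progress measures of Nanda et al. (2023).

import torch

@torch.no_grad()
def accuracy(model, inputs, targets):
    return (model(inputs).argmax(dim=-1) == targets).float().mean().item()

@torch.no_grad()
def weight_norm(model):
    # Total L2 norm of all parameters: a crude proxy for the cleanup phase,
    # in which weight decay shrinks components that only serve memorization.
    return sum(p.norm() ** 2 for p in model.parameters()).sqrt().item()

log = []
# Inside the training loop, e.g. every 100 steps:
# log.append({"step": step,
#             "train_acc": accuracy(model, pairs[train_idx], labels[train_idx]),
#             "test_acc": accuracy(model, pairs[test_idx], labels[test_idx]),
#             "weight_norm": weight_norm(model)})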
References
1. Nanda, N., et al. "Progress measures for grokking via mechanistic interpretability."
International Conference on Learning Representations (ICLR), 2023.
2. Merrill, W., et al. "A tale of two circuits: Grokking as competition of sparse and dense
subnetworks." ICLR Workshop on Understanding Foundation Models, 2023.
3. Thilak, V., et al. "The slingshot mechanism: An empirical study of adaptive optimizers
and the grokking phenomenon." NeurIPS 2022 Workshop.
4. Power, A., et al. "Grokking: Generalization beyond overfitting on small algorithmic
datasets." arXiv preprint arXiv:2201.02177, 2022.
5. Barak, B., et al. "Hidden progress in deep learning: SGD learns parities near the
computational limit." Advances in Neural Information Processing Systems (NeurIPS), 2022.
6. Engel, A., & Van den Broeck, C. Statistical Mechanics of Learning. Cambridge
University Press, 2001.
7. Liu, Z., et al. "Omnigrok: Grokking beyond algorithmic data." International Conference
on Learning Representations (ICLR), 2023.