
Double descent phenomenon

Fay Elhassan, Vongai Mitchell, Bahati Kilongo


April 10, 2025

1 Bias-Variance Tradeoff


Model complexity
The complexity of a model is its capacity to capture the underlying process that generated the data. It is often described by the number of predictors, independent variables, or features that the model takes into account to make accurate predictions.
The following are key factors that govern the model’s complexity and impact the model’s accuracy:

• The number of parameters.
• The norm of the model parameters.
• The number of training examples.
• For neural networks, in addition: the number of hidden layers, the number of neurons in each layer, and the form of the activation functions.

Bias-Variance Tradeoff
In the context of machine learning and model evaluation, the generalization error refers to the overall
error of a model when applied to unseen data, and it is a key concept in evaluating the performance
of a model. The generalization error is composed of three components: bias, variance, and irreducible
error.
Bias: Bias refers to the error introduced by the simplifying assumptions or limitations of a model
in capturing the true underlying relationship between the data points. It quantifies the deviation of
the model predictions from the true values. A high bias indicates that the model is too simplistic and
unable to capture the true complexity of the data, leading to underfitting. On the other hand, a low
bias means that the model is more capable of capturing the true relationship between data points.
Mathematically, bias can be calculated as:

Bias = E[f(x)] − f_true(x)

where:

f(x) represents the predictions of the model for a given input x,
E[f(x)] represents the expected value or average of the model predictions over different datasets,
f_true(x) represents the true underlying relationship between the data points.

Variance: Variance refers to the variability or spread in the predictions of a model for different datasets.
It quantifies how much the predictions of a model change when trained on different subsets of data.
A high variance indicates that the model is sensitive to the training data and may overfit, capturing
noise or random patterns in the data. On the other hand, a low variance means that the model is
more stable and consistent in its predictions. Mathematically, variance can be calculated as:

Variance = E[(f(x) − E[f(x)])²]

where:

f(x) represents the predictions of the model for a given input x,
E[f(x)] represents the expected value or average of the model predictions over different datasets.

Irreducible error: Irreducible error represents the inherent noise or randomness in the data that
cannot be reduced by any model. It is the minimum error that any model would have, regardless of
its complexity or performance. The relationship between bias, variance, and the generalization error
can be expressed by the following equation:

Generalization error = Bias² + Variance + Irreducible error


This equation highlights that the generalization error is the sum of the squared bias, the variance, and the irreducible error. Because reducing one of the first two terms often increases the other, the goal of model training is to strike the right balance between bias and variance so that the generalization error is minimized and the model generalizes well to unseen data. A model with high bias may underfit the data, while a model with high variance may overfit it. The ideal model has an appropriate level of complexity: one that captures the underlying patterns in the data without being too simplistic or too sensitive to noise. We can now represent the test error as a function of model complexity as follows:

Figure 1: Bias-Variance Tradeoff
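To make the decomposition above concrete, the following short Python sketch (not part of the original document; the polynomial model, the choice of f_true, and all parameter values are illustrative assumptions) estimates the Bias² and Variance terms empirically by refitting a model of a given complexity on many independently drawn training sets and comparing the averaged predictions with the true function.

# Minimal sketch: empirical bias^2/variance estimates for polynomial models
# of increasing degree. All names and values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def f_true(x):
    return np.sin(2 * np.pi * x)          # assumed "true" relationship

n_train, n_repeats, noise_std = 30, 200, 0.3
x_test = np.linspace(0, 1, 100)

for degree in [1, 3, 9]:
    preds = np.empty((n_repeats, x_test.size))
    for r in range(n_repeats):
        x = rng.uniform(0, 1, n_train)
        y = f_true(x) + rng.normal(0, noise_std, n_train)
        coeffs = np.polyfit(x, y, degree)          # least-squares polynomial fit
        preds[r] = np.polyval(coeffs, x_test)

    mean_pred = preds.mean(axis=0)                      # E[f(x)]
    bias2 = np.mean((mean_pred - f_true(x_test)) ** 2)  # Bias^2 averaged over x
    variance = np.mean(preds.var(axis=0))               # Variance averaged over x
    print(f"degree={degree}: bias^2={bias2:.3f}, variance={variance:.3f}")

In this setup, low-degree models typically show high bias and low variance, while high-degree models show the opposite, mirroring the tradeoff described above.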

2 The double descent phenomenon


Figure 2 shows that as model complexity increases, the test error first decreases and then rises to a peak. In many cases it is empirically observed that, beyond this peak, the test error starts to decrease again to a new minimum, supporting the intuition that bigger, more complex models can be better. This behavior is called the double descent phenomenon.
This second descent begins at the interpolation threshold, where the model complexity is just sufficient to fit the training data, driving the training error approximately to zero. The threshold splits the curve into an overparameterized (underconstrained) regime, in which the number of model parameters P is larger than the number of training examples N, and an underparameterized (overconstrained) regime, in which N is greater than P.
The second descent happens in the overparameterized regime and is driven by the factors of model complexity stated earlier.
The phenomenon is also illustrated by an animated visualization [Web2, 2023]: https://mlu-explain.github.io/double-descent/

Figure 2: Double descent.

Model-wise double descent


In modern machine learning it has been observed that the test error does not keep increasing indefinitely with model complexity: after passing a critical point (the interpolation threshold), the test error starts to decrease again. Because this second descent is driven by increasing model complexity, this type of double descent is called model-wise double descent. A minimal numerical sketch is given below.
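The following sketch illustrates one way to look for model-wise double descent numerically. It assumes a fixed training set of size N, random ReLU features whose number P plays the role of model complexity, and the minimum-norm least-squares solution computed with a pseudoinverse; these choices and all parameter values are assumptions made for illustration, not part of the original study. A peak in test error typically appears near P ≈ N, followed by a second descent as P grows further, although the exact shape depends on the noise level and feature distribution.

# Minimal sketch: model-wise double descent with random ReLU features.
# All parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
N, N_test, d = 50, 500, 5                 # training size, test size, input dim

X_train = rng.normal(size=(N, d))
X_test = rng.normal(size=(N_test, d))
w_star = rng.normal(size=d)
y_train = X_train @ w_star + 0.5 * rng.normal(size=N)
y_test = X_test @ w_star + 0.5 * rng.normal(size=N_test)

def random_features(X, W):
    return np.maximum(X @ W, 0.0)          # random ReLU feature map

for P in [5, 10, 25, 45, 50, 55, 100, 400, 1000]:
    W = rng.normal(size=(d, P)) / np.sqrt(d)
    Phi_train = random_features(X_train, W)
    Phi_test = random_features(X_test, W)
    # Minimum-norm least-squares fit; for P > N this interpolates the data.
    a = np.linalg.pinv(Phi_train) @ y_train
    test_mse = np.mean((Phi_test @ a - y_test) ** 2)
    print(f"P={P:5d}  test MSE={test_mse:.3f}")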

Sample-wise double descent


For a fixed model, increasing the number of training examples N changes the balance between the number of parameters P and the amount of data. In some cases, when the number of samples grows to approximately equal the number of parameters P, the test error has been observed to peak and then decrease again; this type of double descent is called sample-wise double descent.

Epoch-wise double descent


In complex models trained for a large number of optimization steps (epochs), the test error can decrease, increase, and then decrease again over the course of training. This is called epoch-wise double descent, and it is driven by the number of optimization steps rather than by the model size or the amount of training data. A template for tracking this behavior is sketched below.
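The sketch below is only a template (not taken from the original document) for tracking the test error epoch by epoch; it assumes scikit-learn's MLPRegressor trained with partial_fit, noisy labels, and illustrative parameter values. Whether a clear epoch-wise double descent appears depends on the model size, the label noise, and the training length.

# Minimal sketch: recording test error after each training epoch.
# All data, architecture, and hyperparameter choices are illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X_train = rng.normal(size=(200, 10))
y_train = np.sin(X_train.sum(axis=1)) + 0.3 * rng.normal(size=200)  # noisy labels
X_test = rng.normal(size=(1000, 10))
y_test = np.sin(X_test.sum(axis=1))

model = MLPRegressor(hidden_layer_sizes=(512,), solver="adam",
                     learning_rate_init=1e-3, random_state=0)

test_errors = []
for epoch in range(1000):
    model.partial_fit(X_train, y_train)   # one call ~ one pass over the data
    test_errors.append(np.mean((model.predict(X_test) - y_test) ** 2))

print("test MSE at epochs 10, 100, 500, 1000:",
      [round(test_errors[i - 1], 3) for i in (10, 100, 500, 1000)])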

Key point to note


Note that model-wise and sample-wise double descent are closely related: model-wise double descent arises from increasing the number of parameters P until it reaches or exceeds the number of samples N, whereas sample-wise double descent arises from increasing the number of samples N until it reaches the number of parameters P.

3 Double descent phenomenon and model generalization


The peak of the double descent curve occurs at the interpolation threshold, where a model fits the training data exactly but becomes sensitive to noise, resulting in a spike in test error. In the overparameterized regime, many different models can fit the training data and absorb its noise, and stochastic gradient descent (SGD) tends to select solutions among them that generalize well. Even without explicit regularization, overparameterized models can therefore exhibit good generalization performance on the test set, a behavior commonly attributed to the implicit regularization of SGD.

4 Double descent phenomenon with models


The double descent phenomenon refers to a peculiar behavior observed in certain models where the test error initially decreases as model complexity increases, then rises to a peak around the interpolation threshold, and finally decreases again as the model becomes even more complex. This phenomenon challenges the traditional bias-variance tradeoff and suggests that adding more complexity to a model does not always lead to overfitting.
Currently, the double descent phenomenon has been observed in models such as:
• Linear regression

• Linear Discriminant Analysis
• Logistic regression
On the other hand, research is still ongoing to determine whether or not the following models exhibit the double descent phenomenon:

• Quadratic Discriminant Analysis (work on Linear Discriminant Analysis may be extended)


• Random forests
• Support Vector Machines
• Neural networks with nonlinear activation

The double descent phenomenon has sparked significant interest in the machine learning com-
munity as it challenges our traditional understanding of model complexity and generalization error.
Further research is being conducted to better understand the underlying causes and implications of
this phenomenon in different models.

5 Python Code
The code aims to investigate whether the double descent phenomenon occurs for a synthetic dataset
and how the regularization parameter affects the shape of the test error curve. Linear regression is
used as a simple model to test for double descent, and a function for the L2 regularizer is defined to
further analyze the phenomenon.

5.1 Define Function


Regularized linear regression (ridge) solution:

a = (XᵀX + λI)⁻¹ Xᵀ y
The aim is to investigate the relationship between the model complexity (controlled by the number of samples N and the regularization parameter λ) and the generalization error (measured by the test error) for a fixed model, by varying the values of N and λ and observing the resulting test errors.

Figure 3: Linear Regression with L2 Regularizer
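The authors' code is shown in Figure 3; the following is only a minimal sketch of the experiment described above, written under stated assumptions (a fixed number of parameters d, Gaussian data, and illustrative values of N and λ). It fits ridge regression with the closed-form solution a = (XᵀX + λI)⁻¹Xᵀy, varies the number of training samples N for several values of λ, and records the test error; for small λ the error typically peaks near N ≈ d.

# Minimal sketch: sample-wise double descent for ridge regression.
# d, the noise level, and the grids of N and lambda are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
d = 100                                    # fixed number of features (parameters)
N_test = 2000
w_star = rng.normal(size=d) / np.sqrt(d)

X_test = rng.normal(size=(N_test, d))
y_test = X_test @ w_star + 0.5 * rng.normal(size=N_test)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (X^T X + lam I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for lam in [1e-6, 1e-2, 1.0]:
    errors = []
    for N in [20, 50, 80, 95, 100, 105, 120, 200, 500]:
        X = rng.normal(size=(N, d))
        y = X @ w_star + 0.5 * rng.normal(size=N)
        a = ridge_fit(X, y, lam)
        errors.append(np.mean((X_test @ a - y_test) ** 2))
    print(f"lambda={lam:g}: test MSEs over N = {np.round(errors, 3)}")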

When λ is small, the regularization is weak and the model tends to fit the training data closely, possibly leading to overfitting. When λ is large, the regularization is strong and the coefficients are shrunk towards zero, which can help prevent overfitting. The double descent phenomenon is observed for smaller values of λ, suggesting that a sufficiently large regularization parameter can suppress the double descent peak.

References
[Belkin et al., 2019] Belkin, M., Hsu, D., Ma, S., and Mandal, S. (2019). Reconciling modern machine-
learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of
Sciences, 116(32):15849–15854.
[Nakkiran et al., 2019] Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., and Sutskever, I.
(2019). Deep double descent: Where bigger models and more data hurt.

[Schaeffer et al., 2023] Schaeffer, R., Khona, M., Robertson, Z., Boopathy, A., Pistunova, K., Rocks, J. W., Fiete, I. R., and Koyejo, O. (2023). Double descent demystified: Identifying, interpreting and ablating the sources of a deep learning puzzle.
[Web1, 2023] Web1 (Accessed April 2023). The double descent phenomenon in machine learning. https://math.gatech.edu/sites/default/files/images/reu2021_liao.pdf.

[Web2, 2023] Web2 (Accessed April 2023). Double descent animated image. https://mlu-explain.github.io/double-descent/.
