Double Descent Synthesis
Bias-Variance Tradeoff
In machine learning and model evaluation, the generalization error refers to the overall
error of a model on unseen data, and it is a key quantity in assessing model performance.
The generalization error is composed of three components: bias, variance, and irreducible
error.
Bias: Bias refers to the error introduced by the simplifying assumptions or limitations of a model
in capturing the true underlying relationship in the data. It quantifies the deviation of
the model's average predictions from the true values. A high bias indicates that the model is too simplistic and
unable to capture the true complexity of the data, leading to underfitting. On the other hand, a low
bias means that the model is more capable of capturing the true relationship in the data.
Mathematically, bias can be calculated as:

Bias[f̂(x)] = E[f̂(x)] − f(x)

where:
• f̂(x) is the prediction at input x of a model trained on a random training set,
• E[·] denotes the expectation over training sets,
• f(x) is the true value at x.
Variance: Variance refers to the variability or spread in the predictions of a model for different datasets.
It quantifies how much the predictions of a model change when trained on different subsets of data.
A high variance indicates that the model is sensitive to the training data and may overfit, capturing
noise or random patterns in the data. On the other hand, a low variance means that the model is
more stable and consistent in its predictions. Mathematically, variance can be calculated as:
Var[f̂(x)] = E[(f̂(x) − E[f̂(x)])²]

where:
• E[f̂(x)] is the average prediction at x over training sets,
• the outer expectation measures how far individual predictions deviate from that average.
Irreducible error: Irreducible error represents the inherent noise or randomness in the data that
cannot be reduced by any model. It is the minimum error that any model would have, regardless of
its complexity or performance. The relationship between bias, variance, and the generalization error
can be expressed by the following equation:
E[(y − f̂(x))²] = Bias[f̂(x)]² + Var[f̂(x)] + σ²

where σ² denotes the irreducible error.
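To make this decomposition concrete, the sketch below estimates each term by Monte Carlo simulation; the sine target, noise level, model degree, and sample sizes are illustrative assumptions rather than anything from the original text.

import numpy as np

rng = np.random.default_rng(0)

def f_true(x):
    # True underlying function (an assumption for illustration)
    return np.sin(2 * np.pi * x)

sigma = 0.3              # noise standard deviation; sigma**2 is the irreducible error
n_train, n_trials = 30, 500
x0 = 0.25                # evaluation point
degree = 1               # a deliberately simple model, so the bias term is visible

# Train the same model class on many independent training sets and
# record its prediction at x0.
preds = np.empty(n_trials)
for t in range(n_trials):
    x = rng.uniform(0.0, 1.0, n_train)
    y = f_true(x) + sigma * rng.normal(size=n_train)
    coeffs = np.polyfit(x, y, degree)
    preds[t] = np.polyval(coeffs, x0)

bias2 = (preds.mean() - f_true(x0)) ** 2   # Bias^2 at x0
var = preds.var()                          # Variance at x0
print(f"Bias^2={bias2:.4f}  Variance={var:.4f}  "
      f"Irreducible={sigma**2:.4f}  Sum={bias2 + var + sigma**2:.4f}")

The printed sum approximates the expected squared test error at x0; increasing the polynomial degree should shift error from the bias term to the variance term.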
Figure 2: Double descent.
On the other hand, research is still ongoing to determine whether or not the following
models exhibit the double descent phenomenon:
• Linear Discriminant Analysis
• Logistic regression
The double descent phenomenon has sparked significant interest in the machine learning community
as it challenges our traditional understanding of model complexity and generalization error.
Further research is being conducted to better understand the underlying causes and implications of
this phenomenon in different models.
5 Python Code
The code aims to investigate whether the double descent phenomenon occurs for a synthetic dataset
and how the regularization parameter affects the shape of the test error curve. Linear regression is
used as a simple model to test for double descent, and a function implementing the closed-form
L2-regularized (ridge) solution is defined to further analyze the phenomenon:
â = (Xᵀ X + λ I)⁻¹ Xᵀ y
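A direct NumPy translation of this closed-form solution might look as follows; this is a minimal sketch with illustrative names, and np.linalg.solve is used instead of forming the inverse explicitly for numerical stability.

import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution a = (X^T X + lambda*I)^(-1) X^T y."""
    d = X.shape[1]
    # Solving the linear system avoids an explicit matrix inverse.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Example usage on random data (illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=50)
a_hat = ridge_fit(X, y, lam=1e-2)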
The goal is to investigate the relationship between the model complexity (controlled by the number
of samples N and the regularization parameter λ) and the generalization error (measured by the
test error on held-out data). By varying the values of N and λ and observing the resulting test
errors, the shape of the test error curve can be traced out.
When λ is small, the regularization is weak and the model fits the training data more
closely, possibly leading to overfitting. When λ is large, the regularization is strong and the
coefficients are shrunk toward zero, which can help prevent overfitting. The double descent
phenomenon is observed for smaller values of λ, suggesting that a sufficiently large
regularization parameter can suppress the double descent peak.
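A minimal sketch of such an experiment, assuming a synthetic setup with Gaussian features, a fixed linear teacher, and the number of training samples n swept past the number of features d (none of these choices are taken from the original code), is:

import numpy as np

rng = np.random.default_rng(0)
d = 30                                    # fixed number of features
w_true = rng.normal(size=d) / np.sqrt(d)  # teacher weights (assumed)
sigma = 0.5                               # label noise level
X_test = rng.normal(size=(2000, d))
y_test = X_test @ w_true + sigma * rng.normal(size=2000)

def avg_test_error(n, lam, trials=20):
    """Mean squared test error of closed-form ridge, averaged over trials."""
    errs = []
    for _ in range(trials):
        X = rng.normal(size=(n, d))
        y = X @ w_true + sigma * rng.normal(size=n)
        a = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        errs.append(np.mean((X_test @ a - y_test) ** 2))
    return float(np.mean(errs))

ns = list(range(5, 101, 5))
for lam in (1e-6, 1e-1, 10.0):
    errors = [avg_test_error(n, lam) for n in ns]
    n_peak = ns[int(np.argmax(errors))]
    print(f"lambda={lam:g}: test error peaks at n={n_peak} "
          f"(a peak near n=d={d} indicates double descent)")

With the smallest λ, the test error curve should show a sharp peak near the interpolation threshold n = d, while larger values of λ flatten the curve, consistent with the observation above.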