AIC and BIC
AIC = 2k − 2 ln(L̂)
Where:
k is the number of estimated parameters, and L̂ is the maximized value of the likelihood function, which measures how well the model explains the data. The log-likelihood is the natural logarithm of this likelihood.
ln(L̂) = ∑_{i=1}^{n} ln(f(y_i | θ))
Here, f(y_i | θ) is the probability density function evaluated at observation y_i, and θ represents the model parameters.
3. Count the Parameters (k): This includes all estimated parameters in the model. For
example, in a linear regression model, k would be the number of coefficients, including the
intercept.
4. Calculate AIC: Using the formula:
AIC = 2k − 2 ln(L̂)
This penalizes models with more parameters (to avoid overfitting) and rewards models with a
higher likelihood.
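As a minimal sketch of the steps above, the following computes the Gaussian log-likelihood of a simple linear fit and plugs it into the AIC formula. The function names and the toy data are illustrative, not from the original text:

```python
import numpy as np

def gaussian_log_likelihood(y, y_hat):
    """Log-likelihood of the residuals under a Gaussian error model,
    using the maximum-likelihood estimate of the error variance."""
    n = len(y)
    resid = y - y_hat
    sigma2 = np.mean(resid ** 2)  # MLE of the error variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def aic(log_lik, k):
    """AIC = 2k - 2 ln(L-hat)."""
    return 2 * k - 2 * log_lik

# Toy linear regression: y = b0 + b1*x + noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 1.5 + 2.0 * x + rng.normal(scale=1.0, size=50)
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

# k = 3: intercept, slope, and the estimated error variance sigma^2
print(aic(gaussian_log_likelihood(y, y_hat), k=3))
```

Note that the error variance counts as an estimated parameter here, which is why k = 3 rather than 2 for a straight-line fit.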
BIC = k ln(n) − 2 ln(L̂)
Where:
k is the number of estimated parameters, n is the number of observations, and L̂ is the maximized likelihood.
The term k ln(n) increases more rapidly with the number of parameters compared to AIC’s
penalty term 2k, making BIC more conservative.
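The two criteria can be compared directly on the same fit, since they share the −2 ln(L̂) term and differ only in the penalty. A small sketch (the log-likelihood value below is made up for illustration):

```python
import math

def aic(log_lik, k):
    """AIC = 2k - 2 ln(L-hat)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """BIC = k ln(n) - 2 ln(L-hat)."""
    return k * math.log(n) - 2 * log_lik

# Same fit, two criteria: with n = 100, each parameter costs
# ln(100) ≈ 4.61 under BIC versus a flat 2 under AIC.
log_lik, k, n = -120.0, 5, 100
print(round(aic(log_lik, k), 2))     # 250.0
print(round(bic(log_lik, k, n), 2))  # 263.03
```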
Key Differences:
Penalty Terms:
AIC penalizes models with more parameters using 2k, while BIC uses k ln(n). BIC's
penalty grows faster with the number of observations, making it more stringent for larger
datasets.
Focus: AIC is more focused on predictive accuracy, while BIC is more focused on finding
the "true" model by incorporating a stronger penalty for complexity.
Model Preference: AIC tends to select more complex models, while BIC leans toward
simpler models, especially as the dataset size grows.
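The crossover between the two penalties follows directly from the formulas: an extra parameter costs 2 under AIC but ln(n) under BIC, so BIC is the stricter criterion exactly when ln(n) > 2, i.e. n > e² ≈ 7.39. A quick check (helper names are illustrative):

```python
import math

def aic_penalty_per_param():
    """Marginal cost of one extra parameter under AIC."""
    return 2.0

def bic_penalty_per_param(n):
    """Marginal cost of one extra parameter under BIC."""
    return math.log(n)

# BIC overtakes AIC's penalty once n > e^2 ≈ 7.39.
for n in (5, 8, 1000):
    print(n, bic_penalty_per_param(n) > aic_penalty_per_param())
# 5 False
# 8 True
# 1000 True
```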
AICc = AIC + 2k(k + 1) / (n − k − 1)
Where:
n is the sample size and k is the number of estimated parameters.
This correction term accounts for small sample sizes and adjusts AIC upwards when n is small,
preventing overfitting in small datasets.
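A sketch of the correction, showing how it shrinks as n grows (the function name is illustrative):

```python
def aicc(aic_value, k, n):
    """Small-sample corrected AIC: AICc = AIC + 2k(k+1)/(n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic_value + (2 * k * (k + 1)) / (n - k - 1)

# For k = 3 the correction is 24/(n - 4): large for small n, negligible for big n.
print(round(aicc(10.0, 3, 20), 3))    # 11.5
print(round(aicc(10.0, 3, 2000), 3))  # 10.012
```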
BICc (Corrected BIC): Though less commonly used, a corrected BIC would adjust for
small-sample bias in a similar way; however, BIC's k ln(n) penalty already pushes
strongly toward simpler models as the dataset grows.
Log-Likelihood
Counting parameters
https://en.wikipedia.org/wiki/Akaike_information_criterion
AIC = −2 log(L) + 2(p + q + l + 1)
where l = 1 if c ≠ 0 and l = 0 if c = 0.
Note that the last term in parentheses is the number of parameters in the model (including σ²,
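The parameter count in that formula can be sketched as a small helper, assuming p AR terms, q MA terms, the constant c when present, and σ² (the helper names are illustrative, not a library API):

```python
def arima_num_params(p, q, include_constant):
    """p AR terms + q MA terms + the constant c (if present) + sigma^2."""
    l = 1 if include_constant else 0
    return p + q + l + 1

def arima_aic(log_lik, p, q, include_constant):
    """AIC = -2 log(L) + 2(p + q + l + 1)."""
    return -2 * log_lik + 2 * arima_num_params(p, q, include_constant)

# An ARIMA model with p = 2, q = 1 and a constant: 2 + 1 + 1 + 1 = 5 parameters
print(arima_num_params(2, 1, True))  # 5
```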