
AIC and BIC

AIC (Akaike Information Criterion)


AIC (Akaike Information Criterion) is a measure used for model selection, aiming to balance
goodness of fit with model complexity. It helps in identifying the model that best explains the
data without overfitting. AIC is defined as:

AIC = 2k − 2 ln(L̂)

Where:

k = number of parameters in the model

L̂ = the maximized value of the model's likelihood function

ln(L̂) = the log-likelihood of the model

Steps to Calculate AIC


1. Fit the Model: Estimate the model's parameters (e.g., coefficients in a linear regression
model).
2. Calculate the Log-Likelihood ln(L̂): The likelihood function measures how well the
model explains the data; the log-likelihood is its natural logarithm:

ln(L̂) = Σᵢ₌₁ⁿ ln f(yᵢ | θ)

Here, f(yᵢ | θ) is the probability density function and θ represents the model parameters.

3. Count the Parameters (k): This includes all estimated parameters in the model. For
example, in a linear regression model, k would be the number of coefficients, including the
intercept.
4. Calculate AIC: Using the formula:

AIC = 2k − 2 ln(L̂)

This penalizes models with more parameters (to avoid overfitting) and rewards models with a
higher likelihood.
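The four steps above can be sketched end to end for a simple Gaussian linear regression fitted by ordinary least squares (a minimal illustration; the data and variable names are made up):

```python
import math

# Illustrative data: y roughly linear in x, with noise
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 4.2, 5.8, 8.1, 9.9, 12.2, 13.8, 16.1]
n = len(x)

# Step 1: fit y = b0 + b1*x by ordinary least squares
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

# Step 2: Gaussian log-likelihood at the MLE (variance MLE = RSS / n)
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sigma2 = rss / n
loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

# Step 3: count the parameters: intercept, slope, and sigma^2
k = 3

# Step 4: AIC = 2k - 2 ln(L-hat)
aic = 2 * k - 2 * loglik
print(round(aic, 2))
```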

For an ARIMA(3, 2, 1) model:


k = 3 + 1 + 1 = 5 (3 AR coefficients, 1 MA coefficient, and the residual variance σ²) if there is no intercept
k = 3 + 1 + 1 + 1 = 6 if an intercept is also estimated
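This counting rule can be written as a tiny helper (a sketch; including the residual variance σ² as a parameter follows the convention used at the end of these notes):

```python
def arima_k(p: int, q: int, intercept: bool) -> int:
    """Number of estimated parameters in an ARIMA(p, d, q) model:
    p AR coefficients + q MA coefficients + the residual variance,
    plus one more if an intercept/constant is estimated.
    (d controls differencing and adds no parameters.)"""
    return p + q + 1 + (1 if intercept else 0)

print(arima_k(3, 1, intercept=False))  # ARIMA(3, 2, 1), no intercept -> 5
print(arima_k(3, 1, intercept=True))   # ARIMA(3, 2, 1), with intercept -> 6
```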
Choosing a Model Based on AIC Value
Lower AIC is Better: When comparing models, the one with the lowest AIC is generally
considered the best. A lower AIC indicates a better trade-off between fit and complexity.
Relative Differences: AIC values by themselves don't have much meaning, but
differences between AIC values do. A difference of 2 or more between two models is
conventionally taken as meaningful support for the model with the lower AIC.
Limitations: AIC doesn't account for correlation in errors or non-stationarity in the data,
and comparisons are only valid between models fitted to the same dataset.
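For instance, comparing two candidate models by their AIC difference (the AIC values here are invented purely for illustration):

```python
# Hypothetical AIC values for two candidate models fitted to the same data
aic_model_a = 152.3
aic_model_b = 149.8

delta = aic_model_a - aic_model_b
best = "B" if delta > 0 else "A"
print(f"Model {best} has the lower AIC (difference = {abs(delta):.1f})")

# A difference of 2 or more is conventionally taken as meaningful support
if abs(delta) >= 2:
    print("Meaningful support for the lower-AIC model")
```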

BIC (Bayesian Information Criterion)


BIC (Bayesian Information Criterion) is another criterion used for model selection. It is similar to
AIC but incorporates a stronger penalty for the number of parameters to avoid overfitting. The
formula for BIC is:

BIC = k ln(n) − 2 ln(L̂)

Where:

k = number of parameters in the model

n = number of data points (observations)

L̂ = the maximized value of the model's likelihood function

ln(L̂) = the log-likelihood of the model

Steps to Calculate BIC


1. Fit the Model: Estimate the model parameters as you would for AIC.
2. Calculate the Log-Likelihood ln(L̂): As for AIC, compute the natural log of the
maximized likelihood of the model.


3. Count the Parameters (k): Count the number of estimated parameters in the model,
including the intercept.
4. Number of Observations (n): Determine the number of data points in the dataset.
5. Calculate BIC:

BIC = k ln(n) − 2 ln(L̂)

The penalty term k ln(n) exceeds AIC's penalty 2k whenever ln(n) > 2 (roughly n ≥ 8), so for
all but the smallest datasets BIC is the more conservative criterion.
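Both criteria can be computed from the same log-likelihood; a minimal sketch (the values of k, n, and the log-likelihood are made up):

```python
import math

def aic(k: int, loglik: float) -> float:
    """AIC = 2k - 2 ln(L-hat)."""
    return 2 * k - 2 * loglik

def bic(k: int, n: int, loglik: float) -> float:
    """BIC = k ln(n) - 2 ln(L-hat)."""
    return k * math.log(n) - 2 * loglik

# With n = 100 observations, BIC's per-parameter penalty ln(100) ~ 4.6
# already exceeds AIC's per-parameter penalty of 2
k, n, loglik = 4, 100, -250.0
print(aic(k, loglik), bic(k, n, loglik))
```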

Choosing a Model Based on BIC Value


Lower BIC is Better: As with AIC, the model with the lowest BIC is preferred.
Stronger Penalty for Complexity: BIC penalizes models with more parameters more
heavily than AIC does. This means BIC generally prefers simpler models, especially for
large datasets.
Relative Differences: BIC values are meaningful only when comparing models. A
difference of 10 or more between models indicates strong evidence in favor of the model
with the lower BIC.

Comparison of AIC and BIC

| Criterion | Formula | Penalty for Complexity | Best For | Model Preference |
| --- | --- | --- | --- | --- |
| AIC | AIC = 2k − 2 ln(L̂) | Penalizes based on the number of parameters | Model selection when the goal is prediction | Tends to prefer more complex models |
| BIC | BIC = k ln(n) − 2 ln(L̂) | Stronger penalty for large data or more parameters | Model selection when seeking the "true" model | Prefers simpler models, especially as n increases |

Key Differences:

Penalty Terms:

AIC penalizes models with more parameters using 2k, while BIC uses k ln(n). BIC's
penalty grows with the number of observations, making it more stringent for larger
datasets.

Focus: AIC is more focused on predictive accuracy, while BIC is more focused on finding
the "true" model by incorporating a stronger penalty for complexity.
Model Preference: AIC is likely to select more complex models, while BIC leans toward
simpler models, especially as the dataset size grows.
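Tabulating the two penalty terms makes the difference concrete (a small illustration with k = 5):

```python
import math

k = 5  # number of estimated parameters
for n in [8, 50, 500, 5000]:
    aic_penalty = 2 * k           # constant in n
    bic_penalty = k * math.log(n) # grows with n
    print(f"n={n:5d}: AIC penalty = {aic_penalty}, "
          f"BIC penalty = {bic_penalty:.1f}")
```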

Pros and Cons of AIC and BIC

| Criterion | Pros | Cons |
| --- | --- | --- |
| AIC | Good for predictive model selection; less conservative (better for smaller datasets) | More likely to select overly complex models (overfitting) |
| BIC | Stronger penalty for complexity (better for finding the simplest model); considers dataset size | May select overly simple models (underfitting) when the goal is predictive accuracy; less flexible for small datasets |

AICc (Corrected AIC) and BICc (Corrected BIC)


For small sample sizes, the regular AIC and BIC can be biased because they don’t account for
the small-sample effects on model complexity. To adjust for this, AICc and BICc were
introduced.

AICc (Corrected AIC):

AICc = AIC + 2k(k + 1) / (n − k − 1)

Where:

n = Number of data points


k = Number of parameters

This correction term accounts for small sample sizes and adjusts AIC upwards when n is small,
preventing overfitting in small datasets.
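The correction is a direct transcription of the formula (a sketch; note the correction term blows up as k approaches n − 1):

```python
def aicc(aic: float, k: int, n: int) -> float:
    """AICc = AIC + 2k(k + 1) / (n - k - 1); requires n > k + 1."""
    if n <= k + 1:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# The correction matters for small n and fades as n grows
print(aicc(100.0, k=5, n=20))    # noticeable upward adjustment
print(aicc(100.0, k=5, n=2000))  # nearly identical to AIC
```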

BICc (Corrected BIC): Though less commonly used, BICc adjusts BIC for small-sample
bias in a similar way; in practice it is rarely needed, since BIC's k ln(n) penalty already
pushes strongly toward simpler models.

Summary of Differences between AIC, AICc, BIC, and BICc


AIC: Tends to select more complex models, best for prediction.
AICc: Adjusted for small datasets, making AIC more conservative when n is small.
BIC: Stronger preference for simpler models, especially with larger datasets.
BICc: Less commonly used. Theoretically would be an adjustment of BIC for small sample
sizes, though BIC is already conservative.

Counting Parameters

https://en.wikipedia.org/wiki/Akaike_information_criterion

For an ARIMA(p, d, q) model with constant c:

AIC = −2 ln(L) + 2(p + q + l + 1)

where L is the likelihood of the data, l = 1 if c ≠ 0, and l = 0 if c = 0. Note that the last term
in parentheses is the number of parameters in the model (including σ², the variance of the
residuals).
