Econometrics Cheat Sheet
I have tried to make it as detailed as possible while keeping explanations straightforward. It is organized by major
topics and includes definitions, equations, intuitive analogies, simple examples, and key points relevant to
quantitative finance.
1. Foundations of Econometrics
1.1 What Is Econometrics?
Definition: Econometrics uses statistical and mathematical models to analyze economic (and
financial) data. The goal is to estimate relationships and forecast future trends.
1. Random Variables
Example: Daily stock returns can be seen as a random variable since each day the return
changes.
A distribution tells us how values of a random variable are spread (mean, variance, skewness,
kurtosis).
The normal distribution is commonly used in finance, although empirical returns often show fatter tails than it implies.
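As a minimal sketch of the four moments mentioned above, the snippet below computes them for simulated daily returns. The function name `sample_moments`, the seed, and the simulated mean and volatility are illustrative assumptions, not values from any real dataset.

```python
import numpy as np

def sample_moments(x):
    """Return mean, variance, skewness, and (non-excess) kurtosis of a sample."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    var = x.var()                      # divides by n (population-style variance)
    z = (x - mean) / np.sqrt(var)      # standardized observations
    skew = np.mean(z**3)               # 0 for a symmetric distribution
    kurt = np.mean(z**4)               # 3 for a normal distribution
    return mean, var, skew, kurt

# Simulated daily returns (assumed normal, mean 0.05% and volatility 1% per day)
rng = np.random.default_rng(0)
returns = rng.normal(loc=0.0005, scale=0.01, size=100_000)
mean, var, skew, kurt = sample_moments(returns)
print(f"mean={mean:.5f} var={var:.6f} skew={skew:.3f} kurt={kurt:.3f}")
```

For real return series the kurtosis is usually well above 3, which is one simple way to see the fat tails.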
2. Regression Analysis
2.1 Ordinary Least Squares (OLS)
1. Definition
Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 1/11
OLS is the most basic regression technique. It fits a linear equation to the data by minimizing
the sum of squared residuals (differences between observed values and predicted values).
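2. Equation
$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i$$
$y_i$: Dependent variable.
$x_{i1}, \dots, x_{ik}$: Independent variables (e.g., factors like interest rates, GDP, etc.).
$\beta_0, \beta_1, \dots, \beta_k$: Parameters to be estimated.
$\varepsilon_i$: Error term.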
3. Minimization Criterion
$$\min_{\beta_0, \dots, \beta_k} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
4. Layman Analogy:
Imagine you have points on a 2D plane. You want to draw a single straight line through these
points that is “best fit” in the sense that the sum of squared vertical distances between the line
and the points is minimal.
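5. Example:
Suppose you want to predict a stock’s return ($y$) based on the market’s return ($x$). You run a
simple linear regression and find:
$$\hat{y} = 0.02 + 0.8 \times x$$
Here the intercept is 0.02 and the slope is 0.8: a one-unit increase in the market’s return is
associated with a 0.8-unit increase in the stock’s predicted return.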
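The fit in the example can be sketched with NumPy's least-squares solver. The data are simulated so that the true intercept and slope match the 0.02 and 0.8 above; the variable names, noise levels, and seed are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000
market = rng.normal(0.0, 0.01, size=n)            # simulated market returns (x)
noise = rng.normal(0.0, 0.005, size=n)            # simulated error term
stock = 0.02 + 0.8 * market + noise               # simulated stock returns (y)

X = np.column_stack([np.ones(n), market])         # intercept column plus regressor
beta_hat, *_ = np.linalg.lstsq(X, stock, rcond=None)  # minimizes the sum of squared residuals

fitted = X @ beta_hat
residuals = stock - fitted
ssr = float(residuals @ residuals)                # the minimized criterion
print("intercept:", beta_hat[0], "slope:", beta_hat[1], "SSR:", ssr)
```

With an intercept in the model, the residuals sum to (numerically) zero, which is a quick sanity check on any OLS fit.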
Key OLS assumptions include:
Linearity
No perfect collinearity
No autocorrelation in errors
Normality of errors (important for small sample inference)
Example:
If you want to see whether interest rate and inflation together significantly predict bond prices, you
might do an F-test:
$$H_0: \beta_{\text{interest rate}} = 0, \quad \beta_{\text{inflation}} = 0$$
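One way to compute the F-statistic for this joint hypothesis is to compare the residual sum of squares of the unrestricted model with that of the restricted model (here, intercept only). The sketch below uses simulated data; the helper names `rss` and `f_statistic` and all coefficients are assumptions for illustration.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def f_statistic(X_unrestricted, X_restricted, y):
    """F = ((RSS_r - RSS_u) / q) / (RSS_u / (n - k)), with q restrictions."""
    n, k = X_unrestricted.shape
    q = k - X_restricted.shape[1]
    rss_u = rss(X_unrestricted, y)
    rss_r = rss(X_restricted, y)
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))

rng = np.random.default_rng(1)
n = 500
interest = rng.normal(size=n)
inflation = rng.normal(size=n)
bond_price = 100.0 - 2.0 * interest - 1.0 * inflation + rng.normal(size=n)

X_u = np.column_stack([np.ones(n), interest, inflation])  # unrestricted model
X_r = np.ones((n, 1))                                     # model under H0: intercept only
F = f_statistic(X_u, X_r, bond_price)
print("F-statistic:", F)
```

With 2 restrictions and a large sample, the 5% critical value is roughly 3.0, so an F-statistic far above that rejects the null that both coefficients are zero.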
1. Definition
Endogeneity happens when the explanatory variable $x$ is correlated with the error term $\varepsilon$.
This violates the OLS exogeneity assumption and makes the estimates biased and inconsistent.
Stage 1: Regress $x$ on $z$.
Stage 2: Regress $y$ on the predicted $\hat{x}$ from stage 1.
3. Analogy
If you suspect your “cause” variable (e.g., number of employees in a firm) is correlated with
hidden factors (e.g., firm’s productivity shock), you find an instrument (like government hiring
policy changes) that only influences the outcome (firm’s profit) through changes in number of
employees, not through direct correlation with productivity shocks.
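The two stages can be sketched on simulated data in which a hidden factor `u` drives both the regressor and the outcome. Everything here (variable names, coefficients, the true effect of 0.5) is an assumption chosen so that the bias of plain OLS is visible.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(7)
n = 20_000
z = rng.normal(size=n)                        # instrument: affects x, not y directly
u = rng.normal(size=n)                        # hidden factor that ends up in the error term
x = 1.0 * z + u + rng.normal(size=n)          # endogenous regressor (correlated with u)
y = 2.0 + 0.5 * x + u + rng.normal(size=n)    # true effect of x on y is 0.5

const = np.ones(n)
beta_ols = ols(np.column_stack([const, x]), y)         # biased: x is correlated with the error

Z = np.column_stack([const, z])
x_hat = Z @ ols(Z, x)                                  # stage 1: predicted x from the instrument
beta_2sls = ols(np.column_stack([const, x_hat]), y)    # stage 2: regress y on predicted x

print("OLS slope:", beta_ols[1], "2SLS slope:", beta_2sls[1])
```

Note that the standard errors from a manually run second stage are not valid; a dedicated IV routine corrects them.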
1. Definition
A stationary process has a constant mean, constant variance, and autocovariances that
depend only on lag, not on time.
2. Why It Matters
Many time series methods assume stationarity. Non-stationary data can lead to spurious
regression results.
Example: You might run the ADF test on daily returns of a stock to check if they are stationary
(they usually are) or run it on price levels (often non-stationary).
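To illustrate the returns-versus-prices contrast, the sketch below runs a plain Dickey-Fuller regression (no lagged differences, so it is a simplification of the ADF test) on simulated data. The function name `dickey_fuller_tstat` is hypothetical; in practice a library routine such as `adfuller` in statsmodels would be the usual choice.

```python
import numpy as np

def dickey_fuller_tstat(y):
    """t-statistic on rho in: diff(y)_t = a + rho * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS coefficient covariance
    return float(beta[1] / np.sqrt(cov[1, 1]))

rng = np.random.default_rng(3)
returns = rng.normal(0.0, 0.01, size=2_000)   # simulated stationary returns
log_prices = np.cumsum(returns)               # random-walk log prices (unit root)

t_returns = dickey_fuller_tstat(returns)
t_prices = dickey_fuller_tstat(log_prices)
print("t-stat, returns:", t_returns)
print("t-stat, prices: ", t_prices)
```

The statistic is compared with Dickey-Fuller critical values (about -2.86 at the 5% level with a constant), not with the usual t-table. A value far below that threshold, as for the returns, rejects the unit root.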
3.2 Autoregressive (AR) Models
1. AR(1) Model
$$y_t = \phi_0 + \phi_1 y_{t-1} + \varepsilon_t$$
If $|\phi_1| < 1$, the series is stationary.
Interpretation: Today’s value depends on yesterday’s value and some noise.
2. Example
Suppose monthly inflation follows $y_t = 0.5 + 0.8\, y_{t-1} + \varepsilon_t$.
This means 80% of last month’s inflation carries into this month, plus a constant 0.5 and a random shock.
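A short simulation makes the example concrete. With a constant of 0.5 and a coefficient of 0.8, a stationary AR(1) has long-run mean 0.5 / (1 - 0.8) = 2.5. The shock volatility of 0.3 and the seed below are assumptions for illustration.

```python
import numpy as np

phi0, phi1 = 0.5, 0.8
long_run_mean = phi0 / (1.0 - phi1)           # unconditional mean of a stationary AR(1)

rng = np.random.default_rng(11)
n = 5_000
y = np.empty(n)
y[0] = long_run_mean                          # start at the long-run mean
for t in range(1, n):
    y[t] = phi0 + phi1 * y[t - 1] + rng.normal(0.0, 0.3)

# Estimate the AR(1) by regressing y_t on a constant and y_{t-1}
X = np.column_stack([np.ones(n - 1), y[:-1]])
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
phi0_hat, phi1_hat = coef
print("long-run mean:", long_run_mean, "sample mean:", y.mean())
print("phi0_hat:", phi0_hat, "phi1_hat:", phi1_hat)
```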
1. MA(1) Model
$$y_t = \theta_0 + \varepsilon_t + \theta_1 \varepsilon_{t-1}$$
Interpretation: Today’s value depends on current noise and the previous period’s noise.
2. Analogy
MA processes are like “smoothed noise”: each observation is a weighted combination of error
terms from current and past periods.
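A defining feature of an MA(1) is that its autocorrelation is non-zero at lag 1 and zero beyond it; the lag-1 value is $\theta_1 / (1 + \theta_1^2)$. The sketch below checks this on simulated data, with $\theta_1 = 0.6$ and the helper `autocorr` chosen purely for illustration.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(x[lag:] @ x[:-lag] / (x @ x))

theta0, theta1 = 0.0, 0.6
rng = np.random.default_rng(5)
eps = rng.normal(size=50_001)
y = theta0 + eps[1:] + theta1 * eps[:-1]      # y_t = theta0 + eps_t + theta1 * eps_{t-1}

rho1_theory = theta1 / (1.0 + theta1**2)      # theoretical lag-1 autocorrelation
rho1 = autocorr(y, 1)
rho2 = autocorr(y, 2)                         # should be close to zero for an MA(1)
print("lag 1:", rho1, "(theory:", rho1_theory, ") lag 2:", rho2)
```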
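1. ARMA(p, q)
$$y_t = \phi_0 + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$$
2. ARIMA(p, d, q)
For integrated series that need differencing ($\Delta y_t = y_t - y_{t-1}$): difference the series $d$ times, then fit an ARMA(p, q) model to the differenced series.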
1. Definition
A system of equations where each variable depends on its own lags and the lags of other
variables.
2. Use Case
If you have multiple time series (e.g., inflation, interest rate, GDP growth), a VAR captures
cross-variable dynamics.
3. Equation Example (VAR(1) with two variables):
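$$\begin{pmatrix} y_{1,t} \\ y_{2,t} \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{pmatrix}$$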
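A two-variable VAR(1) can be simulated and then estimated equation by equation with OLS, since every equation shares the same regressors. The coefficient values below are assumptions; the process is stable when all eigenvalues of the coefficient matrix lie inside the unit circle.

```python
import numpy as np

c = np.array([0.1, 0.2])                      # intercepts (c_1, c_2)
Phi = np.array([[0.5, 0.1],                   # row 1: phi_11, phi_12
                [0.2, 0.3]])                  # row 2: phi_21, phi_22
eigenvalues = np.linalg.eigvals(Phi)          # stability check

rng = np.random.default_rng(9)
n = 10_000
y = np.zeros((n, 2))
for t in range(1, n):
    y[t] = c + Phi @ y[t - 1] + rng.normal(0.0, 1.0, size=2)

# Regress y_t on a constant and both lagged variables, one column of B per equation
X = np.column_stack([np.ones(n - 1), y[:-1]])
B, *_ = np.linalg.lstsq(X, y[1:], rcond=None)  # shape (3, 2)
c_hat = B[0]
Phi_hat = B[1:].T                              # transpose so rows correspond to equations
print("eigenvalue moduli:", np.abs(eigenvalues))
print("Phi_hat:\n", Phi_hat)
```

The off-diagonal entries of `Phi_hat` are the cross-variable dynamics: how last period's value of one series feeds into the other.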
1. Definition
If a linear combination of two or more non-stationary series is itself stationary, the series are
“cointegrated.”
Example: Two asset prices that individually wander (random walks) but maintain a stable
long-run relationship.
2. Johansen Test
Cointegrated pairs can be used in pairs trading: if the spread deviates, you expect it to revert.
4. Error Correction Model (ECM)
If $y_t$ and $x_t$ are cointegrated, an ECM captures both short-term dynamics and the
adjustment back toward the long-term equilibrium.
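One common way to estimate an ECM is a two-step approach in the spirit of Engle and Granger: first estimate the long-run relation in levels, then regress the change in $y$ on the lagged deviation from that relation and the change in $x$. The sketch below assumes the form $\Delta y_t = c + \gamma\,(y_{t-1} - a - b x_{t-1}) + \delta\,\Delta x_t + u_t$; these symbols and all simulated values are illustrative assumptions, not notation from the original sheet.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(21)
n = 5_000
x = np.cumsum(rng.normal(size=n))             # random walk (non-stationary)
spread = np.zeros(n)
for t in range(1, n):
    spread[t] = 0.7 * spread[t - 1] + rng.normal(0.0, 0.5)   # stationary deviation
y = 1.0 + 2.0 * x + spread                    # cointegrated with x, long-run slope 2

# Step 1: long-run (levels) regression
a_hat, b_hat = ols(np.column_stack([np.ones(n), x]), y)
ect = y - a_hat - b_hat * x                   # error-correction term (estimated spread)

# Step 2: short-run regression with the lagged error-correction term
dy, dx = np.diff(y), np.diff(x)
X_ecm = np.column_stack([np.ones(n - 1), ect[:-1], dx])
const_hat, gamma_hat, delta_hat = ols(X_ecm, dy)
print("long-run slope:", b_hat, "adjustment speed:", gamma_hat, "short-run effect:", delta_hat)
```

In this simulation the implied adjustment speed is 0.7 - 1 = -0.3: about 30% of last period's deviation is corrected each period. A negative `gamma_hat` is what signals reversion toward equilibrium.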
In finance, volatility is not constant — it clusters. Large moves tend to be followed by large moves
(high volatility), and small moves by small moves (low volatility).
5.2 ARCH(q)
1. Definition
Autoregressive Conditional Heteroskedasticity: The variance of the error term depends on
past squared errors.
2. Equation
$$\varepsilon_t \sim N(0, \sigma_t^2), \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2$$
5.3 GARCH(p, q)
1. Definition
Generalized ARCH: Extends ARCH by allowing past variances to also affect current variance.
2. Equation
$$\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2$$
3. Interpretation
Today’s volatility depends on past squared shocks ($\varepsilon_{t-i}^2$) and past variance levels ($\sigma_{t-j}^2$).
4. Financial Example
If a market experiences a large shock (big $|\varepsilon_t|$), tomorrow’s volatility tends to spike, because $\varepsilon_t^2$ enters tomorrow’s variance.
IGARCH: Has a unit root in volatility process (volatility shocks can persist indefinitely).
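Volatility clustering can be seen by simulating a GARCH(1,1) and comparing the autocorrelation of the shocks with that of the squared shocks. The parameter values below are assumptions; with $\alpha + \beta < 1$ the unconditional variance is $\omega / (1 - \alpha - \beta)$, and as $\alpha + \beta$ approaches 1 (the IGARCH case) that expression no longer gives a finite value.

```python
import numpy as np

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate shocks eps_t and conditional variances sigma2_t from a GARCH(1,1)."""
    rng = np.random.default_rng(seed)
    eps = np.zeros(n)
    sigma2 = np.zeros(n)
    sigma2[0] = omega / (1.0 - alpha - beta)          # start at the unconditional variance
    eps[0] = np.sqrt(sigma2[0]) * rng.normal()
    for t in range(1, n):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * rng.normal()
    return eps, sigma2

def autocorr(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(x[lag:] @ x[:-lag] / (x @ x))

omega, alpha, beta = 0.05, 0.10, 0.85
eps, sigma2 = simulate_garch11(20_000, omega, alpha, beta, seed=2)

persistence = alpha + beta
uncond_var = omega / (1.0 - persistence)
acf_raw = autocorr(eps, 1)                    # shocks themselves: close to zero
acf_sq = autocorr(eps**2, 1)                  # squared shocks: clearly positive
print("persistence:", persistence, "unconditional variance:", uncond_var)
print("lag-1 autocorrelation, shocks:", acf_raw, "squared shocks:", acf_sq)
```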
Definition: Data that tracks multiple entities (firms, countries, stocks) over time.
Example: Daily returns of 100 stocks over 2 years (cross-sectional dimension = 100, time dimension
= 500 trading days).
1. Fixed Effects
2. Random Effects
Assumes entity-specific intercepts are random draws from a distribution.
More efficient than fixed effects if the random effect assumption is correct (no correlation
with regressors).
Allows you to use cross-sectional and time-series variation to glean more insights.
For instance, analyzing the effect of firm-specific variables on returns while controlling for each
firm’s characteristics.
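Controlling for each firm's time-invariant characteristics can be sketched with the within (fixed effects) transformation: subtract each firm's own time average from its data before running OLS. The simulated panel below is an assumption built so that the firm effect is correlated with the regressor, which biases pooled OLS.

```python
import numpy as np

rng = np.random.default_rng(13)
n_firms, n_periods = 100, 50
firm_effect = rng.normal(0.0, 2.0, size=n_firms)      # unobserved, constant per firm

# Regressor is correlated with the firm effect; the true slope is 1.5
x = firm_effect[:, None] + rng.normal(size=(n_firms, n_periods))
y = firm_effect[:, None] + 1.5 * x + rng.normal(size=(n_firms, n_periods))

# Pooled OLS slope: ignores firm effects, so it absorbs part of them
xc, yc = x - x.mean(), y - y.mean()
beta_pooled = float((xc * yc).sum() / (xc**2).sum())

# Within estimator: demean each firm's data over time, then run OLS
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
beta_fe = float((x_w * y_w).sum() / (x_w**2).sum())

print("pooled slope:", beta_pooled, "fixed-effects slope:", beta_fe)
```

Demeaning removes anything constant within a firm, which is why fixed effects stay consistent even when the firm effect is correlated with the regressors, the case where random effects would not be.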
7. Forecast Evaluation
7.1 In-Sample vs. Out-of-Sample
In-sample: Fit the model on a historical dataset and see how well it explains that same dataset.
Out-of-sample: Test the model on new or “future” data (not used in model fitting).
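The gap between the two can be illustrated by fitting a small and a deliberately over-parameterized regression on the first part of a sample and scoring both on the rest. The 40 irrelevant regressors, the split point, and the helper names are assumptions for illustration.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))

def fit_predict(X_train, y_train, X_test):
    """Fit OLS on the training data; return in-sample and out-of-sample predictions."""
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return X_train @ beta, X_test @ beta

rng = np.random.default_rng(17)
n, n_train = 300, 200
signal = rng.normal(size=n)
junk = rng.normal(size=(n, 40))               # 40 regressors unrelated to y
y = 0.5 * signal + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), signal])
X_big = np.column_stack([X_small, junk])

results = {}
for name, X in {"small": X_small, "big": X_big}.items():
    fit_in, fit_out = fit_predict(X[:n_train], y[:n_train], X[n_train:])
    results[name] = (rmse(y[:n_train], fit_in), rmse(y[n_train:], fit_out))
    print(name, "in-sample RMSE:", results[name][0], "out-of-sample RMSE:", results[name][1])
```

Adding regressors can never raise the in-sample error of an OLS fit, so the larger model always looks at least as good in sample; out of sample it typically does worse.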
1. AIC (Akaike Information Criterion)
$$\text{AIC} = 2k - 2\ln(\hat{L})$$
$k$: Number of parameters, $\hat{L}$: Maximized likelihood.
2. BIC (Bayesian Information Criterion)
$$\text{BIC} = k \ln(n) - 2\ln(\hat{L})$$
$n$: Number of observations.
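Both criteria can be computed directly from a model's maximized log-likelihood. The sketch below assumes Gaussian errors for a linear regression and counts the error variance as one of the $k$ parameters, which is one common convention; function names and data are illustrative.

```python
import numpy as np

def gaussian_loglik(resid):
    """Maximized Gaussian log-likelihood given residuals (variance set to RSS / n)."""
    n = len(resid)
    sigma2 = float(resid @ resid) / n
    return -0.5 * n * (np.log(2.0 * np.pi) + np.log(sigma2) + 1.0)

def aic(loglik, k):
    return 2.0 * k - 2.0 * loglik

def bic(loglik, k, n):
    return k * np.log(n) - 2.0 * loglik

rng = np.random.default_rng(19)
n = 400
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(size=n)       # x2 is irrelevant by construction

scores = {}
for name, cols in {"x1 only": [x1], "x1 and x2": [x1, x2]}.items():
    X = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ll = gaussian_loglik(y - X @ beta)
    k = X.shape[1] + 1                        # coefficients plus the error variance
    scores[name] = (aic(ll, k), bic(ll, k, n))
    print(name, "AIC:", scores[name][0], "BIC:", scores[name][1])
```

Lower values are better for both. Because $\ln(n) > 2$ once $n$ exceeds about 7, BIC penalizes each extra parameter more heavily than AIC and tends to choose smaller models.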
1. CAPM: $R_i - R_f = \alpha_i + \beta_i (R_m - R_f) + \varepsilon_i$
$R_i$: Return of asset $i$.
$R_f$: Risk-free rate.
Extend CAPM by adding extra factors (size factor, value factor, momentum, etc.).
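Estimating the CAPM amounts to an OLS regression of the asset's excess return on the market's excess return. The sketch below uses simulated daily data with a constant risk-free rate; the true alpha of 0 and beta of 1.2 are assumptions chosen for illustration. Multi-factor models would simply add more columns to `X`.

```python
import numpy as np

rng = np.random.default_rng(23)
n = 2_500                                     # roughly ten years of daily data
rf = np.full(n, 0.0001)                       # constant daily risk-free rate (assumption)
market = rf + rng.normal(0.0004, 0.01, size=n)
true_alpha, true_beta = 0.0, 1.2
asset = rf + true_alpha + true_beta * (market - rf) + rng.normal(0.0, 0.008, size=n)

excess_asset = asset - rf
excess_market = market - rf

X = np.column_stack([np.ones(n), excess_market])
coef, *_ = np.linalg.lstsq(X, excess_asset, rcond=None)
alpha_hat, beta_hat = coef

# Equivalent closed form for the slope: covariance over variance
beta_cov = np.cov(excess_asset, excess_market)[0, 1] / np.var(excess_market, ddof=1)
print("alpha:", alpha_hat, "beta:", beta_hat, "beta (cov/var):", beta_cov)
```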
9. Advanced Topics
9.1 Maximum Likelihood Estimation (MLE)
Example: GARCH parameters are typically estimated with MLE, by assuming errors follow a normal
(or Student’s t) distribution.
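As a rough sketch of what MLE involves for a GARCH(1,1) with normal errors: write down the negative log-likelihood implied by the variance recursion, then search for the parameters that minimize it. The coarse grid search and the variance-targeting shortcut (tying $\omega$ to the sample variance) are simplifications assumed here for brevity; real estimation would use a numerical optimizer or a dedicated package.

```python
import numpy as np

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate shocks from a GARCH(1,1) with normal innovations."""
    rng = np.random.default_rng(seed)
    eps, sigma2 = np.zeros(n), np.zeros(n)
    sigma2[0] = omega / (1.0 - alpha - beta)
    eps[0] = np.sqrt(sigma2[0]) * rng.normal()
    for t in range(1, n):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * rng.normal()
    return eps

def garch11_neg_loglik(params, eps):
    """Gaussian negative log-likelihood of shocks eps under GARCH(1,1) parameters."""
    omega, alpha, beta = params
    n = len(eps)
    sigma2 = np.empty(n)
    sigma2[0] = eps.var()                     # initialize with the sample variance
    for t in range(1, n):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return 0.5 * float(np.sum(np.log(2.0 * np.pi) + np.log(sigma2) + eps**2 / sigma2))

eps = simulate_garch11(3_000, omega=0.05, alpha=0.10, beta=0.85, seed=4)
sample_var = eps.var()

best = None                                   # (neg loglik, omega, alpha, beta)
for alpha in np.arange(0.02, 0.31, 0.04):
    for beta in np.arange(0.50, 0.97, 0.04):
        if alpha + beta >= 0.999:
            continue                          # keep the variance process stationary
        omega = sample_var * (1.0 - alpha - beta)   # variance targeting
        nll = garch11_neg_loglik((omega, alpha, beta), eps)
        if best is None or nll < best[0]:
            best = (nll, omega, alpha, beta)

nll_const = garch11_neg_loglik((sample_var, 0.0, 0.0), eps)  # constant-variance benchmark
print("best (nll, omega, alpha, beta):", best)
print("constant-variance nll:", nll_const)
```

Swapping the normal density for a Student's t density in `garch11_neg_loglik` would give the fat-tailed variant mentioned above.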
Definition: Incorporates prior beliefs and updates them with data to get posterior distributions.
Example: Bayesian VAR uses priors on parameters to handle over-parameterization issues in large
VAR models.
Non-linear methods (e.g., logistic regression for binary outcomes, neural nets for predictive tasks).
You want to forecast next month’s stock return for a particular asset. You suspect:
1. Past returns and market returns are relevant (time series aspect).
3. Other factors like macro variables (inflation, interest rate) matter.
Step-by-Step
1. Stationarity Checks
Check if the asset’s returns are stationary using ADF test. Typically, returns are stationary.
2. Build an AR(1)-GARCH(1,1) Model
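$$r_t = \phi_0 + \phi_1 r_{t-1} + \varepsilon_t$$
$$\varepsilon_t \sim N(0, \sigma_t^2), \qquad \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$$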
3. Include Exogenous Factors (VAR extension or just add them in the mean equation):
4. Estimate Model
5. Check Diagnostics
6. Forecast
Use the fitted model to project next period’s return ($r_{t+1}$) and volatility ($\sigma_{t+1}^2$).
7. Evaluate Performance
Compare forecast vs. actual using RMSE, MAE, or an out-of-sample R-squared.
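Steps 6 and 7 can be sketched as follows: a one-step-ahead forecast from an AR(1)-GARCH(1,1) with given parameters, then the three evaluation metrics. All numbers are hypothetical, and the out-of-sample R-squared here is defined against a benchmark forecast (for example, a historical mean), which is one common convention and an assumption of this sketch.

```python
import numpy as np

def one_step_forecast(r_t, eps_t, sigma2_t, phi0, phi1, omega, alpha, beta):
    """One-step-ahead mean and variance from an AR(1)-GARCH(1,1)."""
    mean_next = phi0 + phi1 * r_t
    var_next = omega + alpha * eps_t**2 + beta * sigma2_t
    return mean_next, var_next

def rmse(actual, forecast):
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((a - f) ** 2)))

def mae(actual, forecast):
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(a - f)))

def oos_r2(actual, forecast, benchmark):
    """1 - SSE(model) / SSE(benchmark); positive means the model beats the benchmark."""
    a = np.asarray(actual, dtype=float)
    sse_model = np.sum((a - np.asarray(forecast, dtype=float)) ** 2)
    sse_bench = np.sum((a - np.asarray(benchmark, dtype=float)) ** 2)
    return float(1.0 - sse_model / sse_bench)

# Hypothetical fitted parameters and latest observations, for illustration only
mean_next, var_next = one_step_forecast(
    r_t=0.01, eps_t=0.02, sigma2_t=0.0004,
    phi0=0.001, phi1=0.1, omega=0.00001, alpha=0.1, beta=0.85,
)
print("forecast return:", mean_next, "forecast variance:", var_next)

# Hypothetical realized returns, model forecasts, and a zero-return benchmark
actual = [0.01, -0.02, 0.03, 0.00]
forecast = [0.00, -0.01, 0.02, 0.01]
benchmark = [0.0, 0.0, 0.0, 0.0]
print("RMSE:", rmse(actual, forecast))
print("MAE:", mae(actual, forecast))
print("OOS R2:", oos_r2(actual, forecast, benchmark))
```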
2. AR(1):
$$y_t = \phi_0 + \phi_1 y_{t-1} + \varepsilon_t$$
3. MA(1):
$$y_t = \theta_0 + \varepsilon_t + \theta_1 \varepsilon_{t-1}$$
4. GARCH(1,1):
$$\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$$
5. Johansen Cointegration
Stage 1: $x = \pi_0 + \pi_1 z + \dots$
Stage 2: $y = \alpha_0 + \alpha_1 \hat{x} + \dots$
4. Overfitting: Use information criteria (AIC, BIC) or cross-validation to avoid too many parameters.
5. Interpretation of Coefficients: In time series, a strongly persistent AR(1) coefficient near 1 might
indicate near-random walk behavior, which can make forecasting more difficult.
6. Economic/Financial Theory helps guide variable selection and interpretation. Do not rely purely
on data-mining or purely on theory — a balance is best.
Use this cheat sheet as a quick reference while diving deeper into any specific topic.