
Econometrics Cheat Sheet - ?

The document is a comprehensive econometrics cheat sheet that covers foundational concepts, regression analysis, time series analysis, and advanced topics relevant to quantitative finance. It includes definitions, equations, examples, and key points for understanding econometric methods and their applications. Key topics discussed include OLS regression, hypothesis testing, endogeneity, time series models, GARCH models, panel data, and forecasting evaluation metrics.

Uploaded by

Abdul Moizz

Econometrics Cheat Sheet

Amit Kumar Jha



I have tried to make it as detailed as possible while keeping explanations straightforward. It is organized by major
topics and includes definitions, equations, intuitive analogies, simple examples, and key points relevant to
quantitative finance.

1. Foundations of Econometrics
1.1 What Is Econometrics?

Definition: Econometrics uses statistical and mathematical models to analyze economic (and
financial) data. The goal is to estimate relationships and forecast future trends.

Layman Explanation: Think of econometrics as a combination of “Economics + Statistics + Math”
that helps us understand how different variables move together and predict outcomes (like stock
prices, interest rates, etc.).

1.2 Basic Statistical Concepts

1. Random Variables

A random variable is a variable whose value is determined by the outcome of a random process, so it is not known in advance.

Example: Daily stock returns can be seen as a random variable since each day the return
changes.

2. Distributions (Normal, t-distribution, etc.)

A distribution tells us how values of a random variable are spread (mean, variance, skewness,
kurtosis).
Normal distribution is commonly used in finance.

t-distribution is useful when sample sizes are small or variance is unknown.


3. Moments

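1st moment: Mean $\mu = E[X]$.

2nd central moment: Variance $\sigma^2 = E[(X - \mu)^2]$.

3rd standardized moment: Skewness $E[(X - \mu)^3] / \sigma^3$ (measure of asymmetry).

4th standardized moment: Kurtosis $E[(X - \mu)^4] / \sigma^4$ (measure of tail heaviness; equals 3 for a normal distribution).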
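The four moments can be computed directly from a sample. The following is a minimal sketch using NumPy; the function name `sample_moments` is hypothetical, and it uses population-style (divide by n) estimators rather than bias-corrected ones.

```python
import numpy as np

def sample_moments(x):
    """Return (mean, variance, skewness, kurtosis) using divide-by-n estimators."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    dev = x - mean                          # deviations from the mean
    var = np.mean(dev ** 2)                 # 2nd central moment
    skew = np.mean(dev ** 3) / var ** 1.5   # 3rd standardized moment
    kurt = np.mean(dev ** 4) / var ** 2     # 4th standardized moment (normal = 3)
    return mean, var, skew, kurt

# Hypothetical symmetric sample: skewness should be zero.
mean, var, skew, kurt = sample_moments([1, 2, 3, 4, 5])
```

A kurtosis above 3 on daily returns would point to heavier tails than the normal distribution.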

2. Regression Analysis
2.1 Ordinary Least Squares (OLS)

1. Definition

Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 1/11
OLS is the most basic regression technique. It fits a linear equation to the data by minimizing
the sum of squared residuals (differences between observed values and predicted values).
2. Equation

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i$$

$y_i$: Dependent variable (e.g., asset returns).

$x_{i1}, \dots, x_{ik}$: Independent variables (e.g., factors like interest rates, GDP, etc.).

$\beta_0, \beta_1, \dots, \beta_k$: Parameters to be estimated.

$\varepsilon_i$: Error term (noise).


3. Minimization Criterion
$$\min_{\beta_0, \dots, \beta_k} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

4. Layman Analogy:

Imagine you have points on a 2D plane. You want to draw a single straight line through these
points that is “best fit” in the sense that the sum of squared vertical distances between the
line and the points is minimal.
5. Example:

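Suppose you want to predict a stock’s return ($y$) based on the market’s return ($x$). You run a
simple linear regression and find:

$$\hat{y} = 0.02 + 0.8 \times x$$

Interpretation: A 1 percentage point change in the market return is associated with an
approximately 0.8 percentage point change in the stock’s return. The intercept implies a
predicted return of 0.02 (2% if returns are in decimal form) when the market return is zero.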

6. Key Assumptions (CLASSICAL OLS)

Linearity
No perfect collinearity

Zero conditional mean of errors: $E(\varepsilon \mid x) = 0$


Homoscedasticity: Constant variance of error terms

No autocorrelation in errors
Normality of errors (important for small sample inference)
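The OLS fit described in this section can be sketched with NumPy's least-squares solver. This is an illustration under stated assumptions: the helper `ols_fit` is hypothetical, and the data are simulated so that the true intercept and slope match the 0.02 and 0.8 of the example.

```python
import numpy as np

def ols_fit(x, y):
    """Return OLS coefficients [intercept, slope(s)] for y regressed on x."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones(len(x)), x])   # prepend a column of ones
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta

# Simulated (not real) market and stock returns.
rng = np.random.default_rng(0)
market = rng.normal(0.0, 0.05, size=500)
stock = 0.02 + 0.8 * market + rng.normal(0.0, 0.01, size=500)

beta = ols_fit(market, stock)   # beta[0] is the intercept, beta[1] the slope
```

With real data the estimates would not sit this close to round numbers; the simulation only shows that minimizing squared residuals recovers the parameters that generated the data.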

2.2 Hypothesis Testing in Regression

t-test: Tests if an individual coefficient $\beta_j$ is significantly different from zero.

F-test: Tests if multiple coefficients are jointly zero (e.g., $\beta_1 = \beta_2 = 0$).

R-squared ($R^2$): Proportion of variance in $y$ explained by the regression.

Example:

If you want to see whether interest rate and inflation together significantly predict bond prices, you
might do an F-test:

$$H_0: \beta_{\text{interest rate}} = 0, \quad \beta_{\text{inflation}} = 0$$

$$H_1: \text{At least one of them is not zero}$$
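As a rough illustration of these statistics, the sketch below computes t-statistics, the F-statistic for "all slopes are jointly zero", and $R^2$ from first principles. The function name `regression_tests` and the simulated bond data are assumptions for illustration; p-values would additionally require the t and F distributions (for example from SciPy), which are left out here.

```python
import numpy as np

def regression_tests(X, y):
    """OLS with intercept; returns (t_stats, f_stat, r2) under classical assumptions."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    Z = np.column_stack([np.ones(n), np.asarray(X, dtype=float)])
    k = Z.shape[1] - 1                           # number of slope coefficients
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    rss = resid @ resid                          # unrestricted residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)            # RSS of the intercept-only model
    sigma2 = rss / (n - k - 1)                   # error variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Z.T @ Z)))
    t_stats = beta / se
    f_stat = ((tss - rss) / k) / (rss / (n - k - 1))
    return t_stats, f_stat, 1.0 - rss / tss

# Simulated (hypothetical) data: bond price driven by interest rate and inflation.
rng = np.random.default_rng(7)
rate = rng.normal(size=200)
inflation = rng.normal(size=200)
price = 100.0 - 5.0 * rate - 3.0 * inflation + rng.normal(size=200)
t_stats, f_stat, r2 = regression_tests(np.column_stack([rate, inflation]), price)
```

A large F-statistic relative to the F(k, n − k − 1) critical value leads to rejecting $H_0$ that both coefficients are zero.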


2.3 Endogeneity and Instrumental Variables

1. Definition

Endogeneity happens when the explanatory variable $x$ is correlated with the error term $\varepsilon$.
This violates the zero conditional mean assumption, so OLS estimates are biased and inconsistent.

2. Instrumental Variable (IV)


Introduce a new variable z (the “instrument”), which is correlated with x but uncorrelated
with the error term ε.

Then you do a two-stage approach (2SLS = Two-Stage Least Squares):

Stage 1: Regress $x$ on $z$.

Stage 2: Regress $y$ on the predicted $\hat{x}$ from stage 1.

3. Analogy
If you suspect your “cause” variable (e.g., number of employees in a firm) is correlated with
hidden factors (e.g., firm’s productivity shock), you find an instrument (like government hiring
policy changes) that only influences the outcome (firm’s profit) through changes in number of
employees, not through direct correlation with productivity shocks.
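The two-stage procedure can be sketched on simulated data where the true effect of x on y is 2 and a hidden factor `u` makes x endogenous. All variable names and the data-generating process below are assumptions for illustration. Note that standard errors taken naively from the second-stage regression are not valid, so this sketch reports point estimates only.

```python
import numpy as np

def _ols(X, y):
    """Least-squares coefficients for y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z."""
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    x_hat = Z @ _ols(Z, x)                        # stage 1: fitted values of x
    X_hat = np.column_stack([np.ones(n), x_hat])
    return _ols(X_hat, y)                         # stage 2: y on fitted x

rng = np.random.default_rng(1)
n = 20_000
z = rng.normal(size=n)                            # instrument
u = rng.normal(size=n)                            # hidden factor (unobserved in practice)
x = z + u + rng.normal(size=n)                    # x depends on both z and u
y = 1.0 + 2.0 * x + 3.0 * u + rng.normal(size=n)  # u also drives y -> endogeneity

beta_ols = _ols(np.column_stack([np.ones(n), x]), y)   # slope biased upward
beta_iv = two_stage_least_squares(y, x, z)             # slope close to 2
```

In this setup the OLS slope converges to about 3 rather than 2, because x picks up part of the effect of the hidden factor, while the instrument isolates the variation in x that is unrelated to it.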

3. Time Series Analysis


3.1 Stationarity

1. Definition
A stationary process has a constant mean, constant variance, and autocovariances that
depend only on lag, not on time.
2. Why It Matters
Many time series methods assume stationarity. Non-stationary data can lead to spurious
regression results.

3. Unit Root Tests


Augmented Dickey-Fuller (ADF)
Null hypothesis H0 : Unit root (non-stationary).

Alternative hypothesis H1 : Stationary.


Example: You might run the ADF test on daily returns of a stock to check if they are stationary
(they usually are) or run it on price levels (often non-stationary).
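To make the test concrete, here is a sketch of the simplest Dickey-Fuller regression, $\Delta y_t = a + \rho\, y_{t-1} + e_t$, with no lagged difference terms (the "augmented" version adds lags of $\Delta y_t$). The function name is hypothetical, the data are simulated, and the value −2.86 is the approximate asymptotic 5% critical value for the constant-only case; in practice a library routine that handles lag selection and p-values would normally be used.

```python
import numpy as np

def dickey_fuller_t(y):
    """t-statistic on rho in: dy_t = a + rho * y_{t-1} + e_t (no lagged differences)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - 2)
    se_rho = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se_rho

rng = np.random.default_rng(2)
returns = rng.normal(0.0, 0.01, size=1000)   # simulated stationary returns
log_prices = np.cumsum(returns)              # random walk built from those returns

t_returns = dickey_fuller_t(returns)
t_prices = dickey_fuller_t(log_prices)
CRITICAL_5PCT = -2.86                        # approximate, constant-only case
reject_unit_root_in_returns = t_returns < CRITICAL_5PCT
```

A statistic far below the critical value rejects the unit-root null, which is the usual outcome for returns; for price levels the statistic is typically much closer to zero.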

3.2 Autoregressive (AR) Models

1. AR(1) Model

$$y_t = \phi_0 + \phi_1 y_{t-1} + \varepsilon_t$$

If $|\phi_1| < 1$, the series is stationary.
Interpretation: Today’s value depends on yesterday’s value and some noise.
2. Example

Monthly inflation might be modeled as:

$$\text{inflation}_t = 0.5 + 0.8 \times \text{inflation}_{t-1} + \varepsilon_t$$

This means 80% of last month’s inflation carries into this month, plus a constant 0.5.
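For a stationary AR(1), the long-run mean is $\phi_0 / (1 - \phi_1)$, which for the inflation example is $0.5 / (1 - 0.8) = 2.5$. The sketch below simulates such a process and re-estimates the coefficients by OLS on the lagged value; the function names and the noise standard deviation of 0.3 are assumptions for illustration.

```python
import numpy as np

def simulate_ar1(phi0, phi1, n, rng, sigma=1.0):
    """Simulate y_t = phi0 + phi1 * y_{t-1} + eps_t, starting at the long-run mean."""
    y = np.empty(n)
    y[0] = phi0 / (1.0 - phi1)
    for t in range(1, n):
        y[t] = phi0 + phi1 * y[t - 1] + rng.normal(0.0, sigma)
    return y

def fit_ar1(y):
    """Estimate (phi0, phi1) by regressing y_t on a constant and y_{t-1}."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return beta

long_run_mean = 0.5 / (1.0 - 0.8)            # about 2.5
rng = np.random.default_rng(3)
inflation = simulate_ar1(0.5, 0.8, 5000, rng, sigma=0.3)
phi0_hat, phi1_hat = fit_ar1(inflation)
```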

3.3 Moving Average (MA) Models

1. MA(1) Model

$$y_t = \theta_0 + \varepsilon_t + \theta_1 \varepsilon_{t-1}$$

Interpretation: Today’s value depends on current noise and the previous period’s noise.
2. Analogy

MA processes are like “smoothed noise”: each observation is a weighted combination of error
terms from the current and past periods.
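A short worked derivation shows why an MA(1) has only one period of memory. Assuming $\varepsilon_t$ is white noise with variance $\sigma^2$ (an assumption not stated explicitly above):

```latex
\begin{aligned}
E[y_t] &= \theta_0 \\
\operatorname{Var}(y_t) &= (1 + \theta_1^2)\,\sigma^2 \\
\operatorname{Cov}(y_t, y_{t-1}) &= \theta_1 \sigma^2 \\
\operatorname{Cov}(y_t, y_{t-k}) &= 0 \quad \text{for } k \ge 2 \\
\rho_1 &= \frac{\theta_1}{1 + \theta_1^2}
\end{aligned}
```

The autocorrelation therefore cuts off after lag 1, which is the usual signature used to identify an MA(1) from a sample autocorrelation plot.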

3.4 ARMA and ARIMA Models

1. ARMA(p, q)

Combines AR and MA components:


$$y_t = \phi_0 + \sum_{i=1}^{p} \phi_i y_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t$$

2. ARIMA(p, d, q)
For Integrated series that need differencing ($\Delta y_t = y_t - y_{t-1}$):

$d$ is the order of differencing to make the series stationary.


3. Financial Application
ARIMA might model log-prices or returns for forecasting.
Example: An ARIMA(1,1,1) indicates you difference the data once (d=1), then apply an AR(1)
and an MA(1).
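The "I" step of ARIMA is just differencing, and forecasts of the differenced series are turned back into levels by cumulative summation. A minimal sketch, with hypothetical function names and made-up prices:

```python
import numpy as np

def difference(y, d=1):
    """Apply the difference operator d times."""
    return np.diff(np.asarray(y, dtype=float), n=d)

def undifference_once(dy, first_value):
    """Invert a single round of differencing given the first level."""
    return np.concatenate([[first_value], first_value + np.cumsum(dy)])

prices = np.array([100.0, 102.0, 101.0, 105.0])    # hypothetical price levels
log_prices = np.log(prices)
log_returns = difference(log_prices, d=1)           # d = 1: log returns
rebuilt = undifference_once(log_returns, log_prices[0])
```

Differencing log prices once gives log returns, which is why an ARIMA(p, 1, q) on log prices corresponds to an ARMA(p, q) on returns.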

3.5 Vector Autoregression (VAR)

1. Definition

A system of equations where each variable depends on its own lags and the lags of other
variables.
2. Use Case
If you have multiple time series (e.g., inflation, interest rate, GDP growth), a VAR captures
cross-variable dynamics.
3. Equation Example (VAR(1) with two variables):

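$$\begin{pmatrix} y_{1,t} \\ y_{2,t} \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} + \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{pmatrix}$$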
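A VAR(1) can be estimated equation by equation with OLS, since every equation has the same regressors. The sketch below simulates a two-variable system and recovers the coefficient matrix; the parameter values and function names are assumptions for illustration. The system is stable (stationary) when all eigenvalues of the coefficient matrix lie inside the unit circle.

```python
import numpy as np

def simulate_var1(c, Phi, n, rng, sigma=1.0):
    """Simulate y_t = c + Phi @ y_{t-1} + eps_t with independent normal shocks."""
    k = len(c)
    y = np.zeros((n, k))
    for t in range(1, n):
        y[t] = c + Phi @ y[t - 1] + rng.normal(0.0, sigma, size=k)
    return y

def fit_var1(y):
    """OLS on [1, y_{t-1}] for every equation at once; returns (c_hat, Phi_hat)."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    B, *_ = np.linalg.lstsq(X, y[1:], rcond=None)   # B has shape (k + 1, k)
    return B[0], B[1:].T                            # row j of Phi_hat = equation j

c = np.array([0.1, 0.2])
Phi = np.array([[0.5, 0.1],
                [0.2, 0.3]])
is_stable = bool(np.all(np.abs(np.linalg.eigvals(Phi)) < 1))

rng = np.random.default_rng(4)
y = simulate_var1(c, Phi, 5000, rng)
c_hat, Phi_hat = fit_var1(y)
```

The off-diagonal entries of `Phi_hat` are the cross-variable dynamics: for example, entry (1, 2) measures how last period's second variable feeds into the first.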


​ ​ ​ ​ ​ ​

​ ​ ​ ​ ​ ​

4. Cointegration and Error Correction


4.1 Cointegration

1. Definition

If two or more non-stationary series have a linear combination that is stationary, they are
“cointegrated.”
Example: Two asset prices that individually wander (random walks) but maintain a stable
long-run relationship.
2. Johansen Test

Tests for multiple cointegration relationships in a VAR framework.


3. Why Important for Quants

Cointegrated pairs can be used in pairs trading: if the spread deviates, you expect it to revert.
4. Error Correction Model (ECM)
If $y_t$ and $x_t$ are cointegrated, an ECM captures both short-term deviations and long-term
equilibrium:

$$\Delta y_t = \alpha (y_{t-1} - \beta x_{t-1}) + (\text{other terms}) + \varepsilon_t$$

$\alpha$ is the speed of adjustment to the equilibrium; it is expected to be negative so that deviations from equilibrium are corrected.
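One common way to estimate an ECM is the Engle-Granger two-step approach, which is a different tool from the Johansen test mentioned above: first estimate the long-run relationship by OLS, then regress the change in y on the lagged spread. The sketch below uses simulated data in which the true long-run coefficient is 2 and the true adjustment speed is −0.5; all names and parameter values are assumptions for illustration.

```python
import numpy as np

def _ols(X, y):
    """Least-squares coefficients for y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(5)
n = 3000
x = np.cumsum(rng.normal(size=n))           # random walk (non-stationary)
u = np.zeros(n)
for t in range(1, n):                        # stationary AR(1) deviation from equilibrium
    u[t] = 0.5 * u[t - 1] + rng.normal(scale=0.5)
y = 2.0 * x + u                              # y and x are cointegrated with beta = 2

# Step 1: long-run regression y_t = const + beta * x_t + spread_t
const_hat, beta_hat = _ols(np.column_stack([np.ones(n), x]), y)
spread = y - const_hat - beta_hat * x

# Step 2: ECM, dy_t = alpha * spread_{t-1} + gamma * dx_t + e_t
dy, dx = np.diff(y), np.diff(x)
alpha_hat, gamma_hat = _ols(np.column_stack([spread[:-1], dx]), dy)
```

A negative `alpha_hat` means that when y sits above its equilibrium level, the next change in y tends to be downward, which is the mean reversion that pairs trading relies on.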

5. Volatility and GARCH Models


5.1 Why Model Volatility?

In finance, volatility is not constant — it clusters. Large moves tend to be followed by large moves
(high volatility), and small moves by small moves (low volatility).

5.2 ARCH(q)

1. Definition

Autoregressive Conditional Heteroskedasticity: The variance of the error term depends on
past squared errors.
2. Equation
$$\varepsilon_t \sim N(0, \sigma_t^2), \qquad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2$$

5.3 GARCH(p, q)

1. Definition

Generalized ARCH: Extends ARCH by allowing past variances to also affect current variance.
2. Equation
$$\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2$$

3. Interpretation
Today’s variance depends on past squared shocks ($\varepsilon_{t-i}^2$) and past variance levels ($\sigma_{t-j}^2$).

4. Financial Example

If a market experiences a large shock (big $|\varepsilon_t|$), that can cause tomorrow’s volatility to spike.
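The GARCH(1,1) case of the recursion is easy to trace by hand. The sketch below, with a hypothetical function name and made-up parameters, starts at the unconditional variance $\omega / (1 - \alpha - \beta)$ and shows how one large shock raises the next period's variance, which then decays gradually.

```python
import numpy as np

def garch11_variance(eps, omega, alpha, beta):
    """Conditional variances sigma2[0..n] implied by shocks eps[0..n-1]."""
    eps = np.asarray(eps, dtype=float)
    sigma2 = np.empty(len(eps) + 1)
    sigma2[0] = omega / (1.0 - alpha - beta)     # unconditional variance as a start value
    for t in range(len(eps)):
        sigma2[t + 1] = omega + alpha * eps[t] ** 2 + beta * sigma2[t]
    return sigma2

# A shock of size 2 followed by a zero shock (made-up numbers).
path = garch11_variance([2.0, 0.0], omega=0.1, alpha=0.1, beta=0.8)
# Variance goes 1.0 -> 1.3 after the shock, then decays to 1.14.
```

The sum $\alpha + \beta$ (here 0.9) controls how slowly the variance returns to its long-run level; values close to 1 mean very persistent volatility.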

5.4 GARCH Extensions

EGARCH, GJR-GARCH: Capture asymmetry (volatility reacts more to negative returns).

IGARCH: Has a unit root in volatility process (volatility shocks can persist indefinitely).

6. Panel Data Econometrics


6.1 Panel Data Basics

Definition: Data that tracks multiple entities (firms, countries, stocks) over time.

Example: Daily returns of 100 stocks over 2 years (cross-sectional dimension = 100, time dimension
= 500 trading days).

6.2 Fixed Effects vs. Random Effects

1. Fixed Effects

Controls for unobserved time-invariant characteristics of each entity by using entity-specific
intercepts.

Example: Each stock might have a different baseline return.

2. Random Effects
Assumes entity-specific intercepts are random draws from a distribution.

More efficient than fixed effects if the random effect assumption is correct (no correlation
with regressors).
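The fixed-effects estimator can be computed with the "within" transformation: subtract each entity's own mean from its data, then run OLS on the demeaned values. The sketch below uses a small made-up panel where the entity intercepts are correlated with x, so pooled OLS overstates the slope while the within estimator recovers the true value of 1.5. Function and variable names are hypothetical.

```python
import numpy as np

def within_estimator(entity, x, y):
    """Slope from regressing entity-demeaned y on entity-demeaned x."""
    entity = np.asarray(entity)
    x_dm = np.asarray(x, dtype=float).copy()
    y_dm = np.asarray(y, dtype=float).copy()
    for e in np.unique(entity):
        mask = entity == e
        x_dm[mask] -= x_dm[mask].mean()      # remove the entity mean of x
        y_dm[mask] -= y_dm[mask].mean()      # remove the entity mean of y
    return (x_dm @ y_dm) / (x_dm @ x_dm)

# Three entities, four periods each (made-up, noise-free data).
entity = np.repeat([0, 1, 2], 4)
intercepts = np.array([0.0, 5.0, 10.0])[entity]      # entity-specific baselines
x = np.array([0, 1, 2, 3, 2, 3, 4, 5, 4, 5, 6, 7], dtype=float)
y = intercepts + 1.5 * x

fe_slope = within_estimator(entity, x, y)            # close to 1.5
pooled_slope = np.polyfit(x, y, 1)[0]                # larger than 1.5 here
```

Random effects would be the more efficient choice only if the intercepts were uncorrelated with x, which is deliberately not the case in this example.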

6.3 Why Important for Quants?

Allows you to use cross-sectional and time-series variation to glean more insights.

For instance, analyzing the effect of firm-specific variables on returns while controlling for each
firm’s characteristics.

7. Forecast Evaluation
7.1 In-Sample vs. Out-of-Sample

In-sample: Fit the model on a historical dataset and see how well it explains that same dataset.
Out-of-sample: Test the model on new or “future” data (not used in model fitting).

7.2 Evaluation Metrics

1. RMSE (Root Mean Squared Error)

Emphasizes large errors by squaring them.

2. MAE (Mean Absolute Error)


Averages absolute errors, less sensitive to outliers than RMSE.

3. MAPE (Mean Absolute Percentage Error)

Focuses on relative error.
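The three metrics are short enough to write out directly. A minimal sketch with hypothetical function names and made-up numbers; MAPE is returned in percent and is undefined when an actual value is zero.

```python
import numpy as np

def rmse(actual, forecast):
    """Root mean squared error."""
    err = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def mae(actual, forecast):
    """Mean absolute error."""
    err = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(err)))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    actual = np.asarray(actual, dtype=float)
    err = actual - np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(err / actual)) * 100.0)

actual = [100.0, 200.0]
forecast = [110.0, 180.0]
scores = rmse(actual, forecast), mae(actual, forecast), mape(actual, forecast)
```

Here the errors are −10 and 20: MAE is 15, RMSE is about 15.8 because the larger error is weighted more heavily, and MAPE is 10% because both errors are 10% of the actual value.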

7.3 Information Criteria

1. AIC (Akaike Information Criterion)

$$\text{AIC} = 2k - 2\ln(\hat{L})$$

$k$: Number of parameters, $\hat{L}$: Maximized likelihood.
2. BIC (Bayesian Information Criterion)

$$\text{BIC} = k\ln(n) - 2\ln(\hat{L})$$

$n$: Number of observations. Penalizes complexity more heavily than AIC whenever $\ln(n) > 2$ (that is, $n \geq 8$).
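Both criteria are simple functions of the maximized log-likelihood, and the model with the lower value is preferred. The sketch below compares two hypothetical models; the log-likelihood values are made up for illustration.

```python
import math

def aic(k, loglik):
    """Akaike information criterion from parameter count and maximized log-likelihood."""
    return 2 * k - 2 * loglik

def bic(k, n, loglik):
    """Bayesian information criterion; n is the number of observations."""
    return k * math.log(n) - 2 * loglik

n_obs = 50
small = {"k": 3, "loglik": -100.0}    # hypothetical simpler model
large = {"k": 6, "loglik": -98.0}     # hypothetical richer model, slightly better fit

aic_small, aic_large = aic(**small), aic(**large)
bic_small = bic(small["k"], n_obs, small["loglik"])
bic_large = bic(large["k"], n_obs, large["loglik"])
```

In this made-up case both criteria prefer the smaller model, because three extra parameters buy only two units of log-likelihood; the BIC gap is wider since its per-parameter penalty is $\ln(50) \approx 3.9$ rather than 2.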

8. Factor Models & Principal Components


8.1 Single-Factor Model (CAPM)

1. CAPM: $R_i - R_f = \alpha_i + \beta_i (R_m - R_f) + \varepsilon_i$

$R_i$: Return of asset $i$.

$R_f$: Risk-free rate.

$R_m$: Return of market portfolio.


2. Beta: Sensitivity of asset i to market.

3. Alpha: Excess return not explained by the market.

8.2 Multi-Factor Models (Fama-French)

Extend CAPM by adding extra factors. The Fama-French three-factor model adds a size factor and a value factor; later extensions add momentum and other factors.

8.3 Principal Component Analysis (PCA)

1. Definition: A technique to reduce dimensions by finding uncorrelated factors (principal
components).

2. Use Case in Finance


Often used on interest rate yield curves, extracting the main factors (level, slope, curvature).
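PCA can be sketched as an eigen-decomposition of the sample covariance matrix. The example below uses simulated (not real) yield changes for four maturities that share one common "level" shock, so the first component should explain most of the variance and load about equally on every maturity. The function name and the simulated volatilities are assumptions for illustration.

```python
import numpy as np

def pca(X):
    """Return (explained_variance_ratio, components) from the covariance eigen-decomposition."""
    X = np.asarray(X, dtype=float)
    cov = np.cov(X - X.mean(axis=0), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]               # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs         # columns are principal components

rng = np.random.default_rng(6)
level = rng.normal(0.0, 0.10, size=(500, 1))        # shock common to all maturities
noise = rng.normal(0.0, 0.02, size=(500, 4))        # maturity-specific noise
yield_changes = level + noise                       # 500 days x 4 maturities

ratio, components = pca(yield_changes)
```

On real yield-curve data the second and third components are commonly interpreted as slope and curvature; this simulation contains only a level factor, so those components here are just noise.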

9. Advanced Topics
9.1 Maximum Likelihood Estimation (MLE)

Definition: A general method of estimating parameters by maximizing the likelihood function of
observed data.

Example: GARCH parameters are typically estimated with MLE, by assuming errors follow a normal
(or Student’s t) distribution.
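For the normal case, the objective that MLE maximizes for a GARCH-type model can be written out explicitly. This is the standard Gaussian log-likelihood, stated here as a sketch; $\theta$ collects the mean and variance parameters, and $T$ is the number of observations (both symbols are introduced here for illustration):

```latex
\ln L(\theta) = -\frac{1}{2} \sum_{t=1}^{T}
  \left[ \ln(2\pi) + \ln \sigma_t^2(\theta)
         + \frac{\varepsilon_t^2(\theta)}{\sigma_t^2(\theta)} \right],
\qquad
\hat{\theta} = \arg\max_{\theta} \, \ln L(\theta)
```

Because $\sigma_t^2$ depends on the parameters through a recursion, there is no closed-form solution and the maximization is done numerically.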

9.2 Bayesian Econometrics

Definition: Incorporates prior beliefs and updates them with data to get posterior distributions.

Example: Bayesian VAR uses priors on parameters to handle over-parameterization issues in large
VAR models.

9.3 Non-Linear and Machine Learning Methods

Non-linear methods (e.g., logistic regression for binary outcomes, neural nets for predictive tasks).

Random Forest, Gradient Boosting for classification/regression tasks on economic data.

10. Putting It All Together: An Illustrative Example


Scenario

You want to forecast next month’s stock return for a particular asset. You suspect:

1. Past returns and market returns are relevant (time series aspect).

2. Volatility clustering might matter (GARCH aspect).

3. Other factors like macro variables (inflation, interest rate) matter.

Step-by-Step

1. Stationarity Checks

Check if the asset’s returns are stationary using ADF test. Typically, returns are stationary.
2. Build an AR(1)-GARCH(1,1) Model

Let $r_t$ be the asset return at time $t$.

$$r_t = \phi_0 + \phi_1 r_{t-1} + \varepsilon_t$$

$$\varepsilon_t \sim N(0, \sigma_t^2), \qquad \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$$

3. Include Exogenous Factors (VAR extension or just add them in the mean equation):

$$r_t = \phi_0 + \phi_1 r_{t-1} + \gamma_1 (\text{market return})_t + \gamma_2 (\text{inflation})_t + \varepsilon_t$$

4. Estimate Model

Use MLE or a specialized time-series package to estimate the parameters
$\phi_0, \phi_1, \gamma_1, \gamma_2, \omega, \alpha, \beta$.

5. Check Diagnostics

Ljung-Box test for autocorrelation in residuals.


ARCH LM test to confirm no volatility structure is left over in the residuals.

6. Forecast
Use the fitted model to project next period’s return ($r_{t+1}$) and volatility ($\sigma_{t+1}^2$).

7. Evaluate Performance
Compare forecast vs. actual using RMSE, MAE, or an out-of-sample R-squared.
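Steps 6 and 7 can be sketched for the plain AR(1)-GARCH(1,1) of step 2 (without the exogenous factors). The parameter values below are made up rather than estimated, and the function names are hypothetical. The out-of-sample R-squared compares the model's squared errors with those of a benchmark forecast such as the historical mean.

```python
def one_step_forecast(r_t, eps_t, sigma2_t, phi0, phi1, omega, alpha, beta):
    """One-step-ahead mean and variance forecasts from an AR(1)-GARCH(1,1)."""
    mean_next = phi0 + phi1 * r_t
    var_next = omega + alpha * eps_t ** 2 + beta * sigma2_t
    return mean_next, var_next

def out_of_sample_r2(actual, forecast, benchmark):
    """1 - SSE(model) / SSE(benchmark), where benchmark is a constant forecast."""
    sse_model = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    sse_bench = sum((a - benchmark) ** 2 for a in actual)
    return 1.0 - sse_model / sse_bench

# Made-up parameter values and latest observations.
mean_next, var_next = one_step_forecast(
    r_t=0.02, eps_t=0.01, sigma2_t=0.0004,
    phi0=0.001, phi1=0.1, omega=0.00001, alpha=0.05, beta=0.9,
)
vol_next = var_next ** 0.5    # volatility is the square root of the variance

r2_oos = out_of_sample_r2(actual=[1.0, 2.0, 3.0], forecast=[1.0, 2.0, 4.0], benchmark=2.0)
```

A positive out-of-sample R-squared means the model beat the benchmark on data it was not fitted to; a negative value means the benchmark would have been the better forecast.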

11. Key Formulas Summary


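1. OLS:

$$\min \sum (y_i - \hat{y}_i)^2$$

2. AR(1):

$$y_t = \phi_0 + \phi_1 y_{t-1} + \varepsilon_t$$

3. MA(1):

$$y_t = \theta_0 + \varepsilon_t + \theta_1 \varepsilon_{t-1}$$

4. GARCH(1,1):

$$\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$$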

5. Johansen Cointegration

System-based approach to find cointegrating vectors among multiple non-stationary time
series.

6. 2SLS (Instrumental Variables):

Stage 1: $x = \pi_0 + \pi_1 z + \dots$

Stage 2: $y = \alpha_0 + \alpha_1 \hat{x} + \dots$

12. Tips and Tricks


1. Always check stationarity for time-series data. Non-stationary data can lead to misleading
(spurious) regressions.

2. Look at residuals after fitting a model:

Do they show patterns? Are they autocorrelated? Is volatility stable or clustering?


3. Model complexity: Start simple (ARIMA, GARCH) before moving to higher-order or more complex
models (VAR, state-space, machine learning).

4. Overfitting: Use information criteria (AIC, BIC) or cross-validation to avoid too many parameters.
5. Interpretation of Coefficients: In time series, a strongly persistent AR(1) coefficient near 1 might
indicate near-random walk behavior, which can make forecasting more difficult.
6. Economic/Financial Theory helps guide variable selection and interpretation. Do not rely purely
on data-mining or purely on theory — a balance is best.

13. Concluding Remarks


Econometrics for quants is a toolbox:

Regression handles relationships among variables.

Time-series techniques handle dynamics and forecasts.

Cointegration reveals long-run equilibria.


GARCH captures volatility clustering.

Panel data uses cross-sectional plus time variation.

Factor models help decompose returns into systematic risks.

Always remember to:

Check assumptions (stationarity, no autocorrelation, homoscedasticity, or the correct GARCH
specification).

Validate models on out-of-sample data.


Combine theory (finance, economics) with rigorous statistical testing to avoid spurious results.

Use this cheat sheet as a quick reference while diving deeper into any specific topic.
