

Econometrics Cheat Sheet

Econometrics 242 (University of the Western Cape)


by Tyler Ransom, University of Oklahoma
@tyleransom

Data & Causality

Basics about data types and causality.

Types of data
Experimental: data from a randomized experiment
Observational: data collected passively
Cross-sectional: multiple units, one point in time
Time series: single unit, multiple points in time
Longitudinal (or Panel): multiple units followed over multiple time periods

Experimental data
• Correlation ⟹ Causality
• Very rare in the social sciences

Statistics basics

We examine a random sample of data to learn about the population.
Random sample: representative of the population
Parameter (θ): some number describing the population
Estimator of θ: rule assigning a value of θ to a sample; e.g. the sample average, Ȳ = (1/N) Σi Yi
Estimate of θ: what the estimator spits out for a particular sample (θ̂)
Sampling distribution: distribution of estimates across all possible samples
Bias of estimator W: E(W) − θ
Efficiency: W is efficient if Var(W) < Var(W̃)
Consistency: W is consistent if θ̂ → θ as N → ∞

Hypothesis testing

The way we answer yes/no questions about our population using a sample of data, e.g. "Does increasing public school spending increase student achievement?"
null hypothesis (H0): typically H0: θ = 0
alt. hypothesis (Ha): typically Ha: θ ≠ 0
significance level (α): tolerance for making a Type I error (e.g. 10%, 5%, or 1%)
test statistic (T): some function of the sample of data
critical value (c): value of T such that we reject H0 if |T| > c; c depends on α and on whether the test is 1- or 2-sided
p-value: largest α at which we fail to reject H0; reject H0 if p < α

Simple Regression Model

Regression is useful because we can estimate a ceteris paribus relationship between some variable x and our outcome y:

y = β0 + β1 x + u

We want to estimate β̂1, which gives us the effect of x on y.

OLS formulas

To estimate β̂0 and β̂1, we make two assumptions:
1. E(u) = 0
2. E(u|x) = E(u) for all x
When these hold, we get the following formulas:
β̂0 = ȳ − β̂1 x̄
β̂1 = Ĉov(y, x) / V̂ar(x)
fitted values: ŷi = β̂0 + β̂1 xi
residuals: ûi = yi − ŷi
Total Sum of Squares: SST = Σi (yi − ȳ)²
Explained Sum of Squares: SSE = Σi (ŷi − ȳ)²
Residual Sum of Squares: SSR = Σi ûi²
R-squared: R² = SSE/SST, the "fraction of variance in y explained by x"

Algebraic properties of OLS estimates

Σi ûi = 0 (mean and sum of residuals are zero)
Σi xi ûi = 0 (zero covariance between x and residuals)
The OLS line (SRF) always passes through (x̄, ȳ)
SSE + SSR = SST
0 ≤ R² ≤ 1

Interpretation and functional form

Our model is restricted to be linear in parameters, but not linear in x. Other functional forms can give a more realistic model.

Model        DV       RHS       Interpretation of β1
Level-level  y        x         Δy = β1 Δx
Level-log    y        log(x)    Δy ≈ (β1/100) [1% Δx]
Log-level    log(y)   x         %Δy ≈ (100 β1) Δx
Log-log      log(y)   log(x)    %Δy ≈ β1 %Δx
Quadratic    y        x + x²    Δy = (β1 + 2β2 x) Δx

Note: DV = dependent variable; RHS = right-hand side

Multiple Regression Model

Multiple regression is more useful than simple regression because we can more plausibly estimate ceteris paribus relationships (i.e. E(u|x) = E(u) is more plausible):

y = β0 + β1 x1 + ··· + βk xk + u

β̂1, …, β̂k: partial effect of each of the x's on y
β̂0 = ȳ − β̂1 x̄1 − ··· − β̂k x̄k
β̂j = Ĉov(y, residualized xj) / V̂ar(residualized xj)
where "residualized xj" means the residuals from an OLS regression of xj on all other x's (i.e. x1, …, xj−1, xj+1, …, xk)

Gauss-Markov Assumptions

1. y is a linear function of the β's
2. y and the x's are randomly sampled from the population
3. No perfect multicollinearity
4. E(u|x1, …, xk) = E(u) = 0 (Unconfoundedness)
5. Var(u|x1, …, xk) = Var(u) = σ² (Homoskedasticity)

When (1)–(4) hold: OLS is unbiased, i.e. E(β̂j) = βj
When (1)–(5) hold: OLS is the Best Linear Unbiased Estimator (BLUE)

Variance of u (a.k.a. "error variance")

σ̂² = SSR/(N − k − 1) = (1/(N − k − 1)) Σi ûi²

Variance and Standard Error of β̂j

Var(β̂j) = σ² / [SSTj (1 − Rj²)], j = 1, …, k
where
SSTj = (N − 1) Var(xj) = Σi (xij − x̄j)²
Rj² = R² from a regression of xj on all other x's
Standard deviation: √Var; standard error: √V̂ar
se(β̂j) = √( σ̂² / [SSTj (1 − Rj²)] ), j = 1, …, k

Classical Linear Model (CLM)

Add a 6th assumption to Gauss-Markov:
6. u is distributed N(0, σ²)
We need this to know the exact distribution of β̂j.
• If A6 fails, we need asymptotics to test the β's
• Then interpret the distribution of β̂j as asymptotic (not exact)

Testing Hypotheses about the β's

• Under A1–A6, we can test hypotheses about the β's
• Or (much more plausible) under A1–A5 plus asymptotics
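The OLS formulas and algebraic properties above can be verified numerically. A minimal sketch in Python (numpy assumed available; the data-generating process and variable names are illustrative, not from the sheet):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000
x = rng.normal(size=N)
u = rng.normal(size=N)          # E(u) = 0 and E(u|x) = E(u) by construction
y = 2.0 + 0.5 * x + u           # true beta0 = 2, beta1 = 0.5

# Slope and intercept from the formulas: beta1 = Cov(y,x)/Var(x), beta0 = ybar - beta1*xbar
beta1_hat = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Fitted values, residuals, and the sums of squares
y_hat = beta0_hat + beta1_hat * x
u_hat = y - y_hat
SST = ((y - y.mean()) ** 2).sum()
SSE = ((y_hat - y.mean()) ** 2).sum()
SSR = (u_hat ** 2).sum()
R2 = SSE / SST

# Algebraic properties: residuals sum to zero, zero covariance with x, SSE + SSR = SST
assert abs(u_hat.sum()) < 1e-8
assert abs((x * u_hat).sum()) < 1e-8
assert abs(SSE + SSR - SST) < 1e-6
```

With a large sample, `beta1_hat` lands near the true value 0.5 and R² stays between 0 and 1, as the sheet's properties require.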

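The "residualized xj" formula for multiple regression, and the standard-error formula se(β̂j) = √(σ̂²/[SSTj(1−Rj²)]), can likewise be checked. A sketch assuming numpy, with an illustrative correlated-regressor setup (this is the Frisch-Waugh-Lovell result the sheet states):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
x1 = rng.normal(size=N)
x2 = 0.6 * x1 + rng.normal(size=N)                  # x1 and x2 correlated
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=N)  # true beta1 = 2

# Full multiple regression via least squares
X = np.column_stack([np.ones(N), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Residualize x1: regress x1 on a constant and x2, keep the residuals
Z = np.column_stack([np.ones(N), x2])
x1_resid = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# beta1 from the full regression equals Cov(y, residualized x1)/Var(residualized x1)
beta1_fwl = np.cov(y, x1_resid, ddof=1)[0, 1] / np.var(x1_resid, ddof=1)
assert abs(beta1_fwl - beta_hat[1]) < 1e-6

# se(beta1) from sigma^2 / [SST1 (1 - R1^2)], with k = 2 regressors
sigma2_hat = ((y - X @ beta_hat) ** 2).sum() / (N - 2 - 1)
SST1 = ((x1 - x1.mean()) ** 2).sum()
R1sq = 1 - (x1_resid ** 2).sum() / SST1   # R^2 of x1 on the other x's
se_beta1 = np.sqrt(sigma2_hat / (SST1 * (1 - R1sq)))
```

Note how SSTj(1 − Rj²) is just the residualized sum of squares: the more collinear xj is with the other regressors, the larger the variance of β̂j.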



t-test for simple hypotheses

To test a simple hypothesis like
H0: βj = 0
Ha: βj ≠ 0
use a t-test:
t = (β̂j − 0) / se(β̂j)
where 0 is the null hypothesized value.
Reject H0 if p < α or if |t| > c (see: Hypothesis testing)

F-test for joint hypotheses

Can't use a t-test for joint hypotheses, e.g.:
H0: β3 = 0, β4 = 0, β5 = 0
Ha: β3 ≠ 0 OR β4 ≠ 0 OR β5 ≠ 0
Instead, use the F statistic:
F = [(SSRr − SSRur)/(dfr − dfur)] / [SSRur/dfur] = [(SSRr − SSRur)/q] / [SSRur/(N − k − 1)]
where
SSRr = SSR of the restricted model (if H0 is true)
SSRur = SSR of the unrestricted model (if H0 is false)
q = number of equalities in H0
N − k − 1 = degrees of freedom of the unrestricted model
Reject H0 if p < α or if F > c (see: Hypothesis testing)
Note: F > 0, always

Qualitative data

• Can use qualitative data in our model
• Must create a dummy variable
• e.g. "Yes" represented by 1 and "No" by 0

dummy variable trap: perfect collinearity that happens when too many dummy variables are included in the model

y = β0 + β1 happy + β2 not_happy + u

The above equation suffers from the dummy variable trap: units can only be "happy" or "not happy," so including both would result in perfect collinearity with the intercept.

Interpretation of dummy variables

Interpretation of dummy variable coefficients is always relative to the excluded category (e.g. not happy):

y = β0 + β1 happy + β2 age + u

β1: avg. y for those who are happy compared to those who are unhappy, holding age fixed

Interaction terms

interaction term: when two x's are multiplied together

y = β0 + β1 happy + β2 age + β3 (happy × age) + u

β3: difference in the age slope for those who are happy compared to those who are unhappy

Linear Probability Model (LPM)

When y is a dummy variable, e.g.

happy = β0 + β1 age + β2 income + u

the β's are interpreted as changes in probability:
Δ Pr(y = 1) = β1 Δx
By definition, homoskedasticity is violated in the LPM.

Time Series (TS) data

• Observe one unit over many time periods
• e.g. US quarterly GDP, 3-month T-bill rate, etc.
• New G-M assumption: no serial correlation in ut
• Remove the random sampling assumption (it makes no sense here)

Two focuses of TS data
1. Causality (e.g. ↑ taxes ⟹ ↓ GDP growth)
2. Forecasting (e.g. AAPL stock price next quarter?)

Requirements for TS data
To properly use TS data for causal inference / forecasting, the data must be free of the following elements:
Trends: y always ↑ or ↓ every period
Seasonality: y always ↑ or ↓ at regular intervals
Non-stationarity: y has a unit root, i.e. is not stable
Otherwise, R² and the β̂j's are misleading.

AR(1) and Unit Root Processes

AR(1) model (Auto-Regressive of order 1):
yt = ρ yt−1 + ut
Stable if |ρ| < 1; unit root if |ρ| ≥ 1
"Non-stationary," "unit root," and "integrated" are all synonymous.

Correcting for Non-stationarity

The easiest way is to take a first difference:
First difference: use Δyt = yt − yt−1 instead of yt
Test for a unit root: Augmented Dickey-Fuller (ADF) test
H0 of the ADF test: y has a unit root

TS Forecasting

A good forecast ft minimizes the forecasting error:
min over ft of E[e²t+1 | It] = E[(yt+1 − ft)² | It]
where It is the information set.
RMSE measures forecast performance (on future data):
Root Mean Squared Error = √[ (1/m) Σ from h=0 to m−1 of ê²T+h+1 ]
The model with the lowest RMSE is the best forecast.
• Can choose ft in many ways
• Basic way: ŷT+1 from a linear model
• ARIMA and ARMA-GARCH are cutting-edge models

Granger causality

z Granger causes y if, after controlling for past values of y, past values of z help forecast yt.

CLM violations

Heteroskedasticity
• Test: Breusch-Pagan or White tests (H0: homoskedasticity)
• If H0 is rejected, SEs, t-stats, and F-stats are invalid
• Instead use heteroskedasticity-robust SEs and t- and F-stats

Serial correlation
• Test: Breusch-Godfrey test (H0: no serial correlation)
• If H0 is rejected, SEs, t-stats, and F-stats are invalid
• Instead use HAC SEs and t- and F-stats
• HAC: "Heteroskedasticity and Autocorrelation Consistent"

Measurement error
• Measurement error in x can be a violation of A4
• Attenuation bias: β̂j is biased towards 0

Omitted Variable Bias

When an important x is excluded: omitted variable bias.
The bias depends on two forces:
1. The partial effect of x2 on y (i.e. β2)
2. The correlation between x2 and x1
Which direction does the bias go?

          Corr(x1, x2) > 0   Corr(x1, x2) < 0
β2 > 0    Positive bias      Negative bias
β2 < 0    Negative bias      Positive bias

Note: "positive bias" means β̂1 is too big; "negative bias" means β̂1 is too small.
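The bias-direction table can be illustrated by simulation: with β2 > 0 and Corr(x1, x2) > 0, the short regression's slope comes out too big. A sketch assuming numpy (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000
x1 = rng.normal(size=N)
x2 = 0.8 * x1 + rng.normal(size=N)                   # Corr(x1, x2) > 0
y = 1.0 + 1.0 * x1 + 2.0 * x2 + rng.normal(size=N)   # beta2 = 2 > 0

# Short regression omitting x2: the x1 slope absorbs beta2 times the
# slope from regressing x2 on x1 (here 0.8)
beta1_short = np.cov(y, x1, ddof=1)[0, 1] / np.var(x1, ddof=1)

# Probability limit: beta1 + beta2 * 0.8 = 1 + 1.6 = 2.6 -> positive bias
print(beta1_short)
```

Flipping the sign of either the 0.8 or the 2.0 flips the sign of the bias, exactly as the table says.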

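The F statistic defined earlier can be computed directly from the restricted and unrestricted residual sums of squares. A sketch of testing H0: β2 = β3 = 0, assuming numpy (names and data illustrative):

```python
import numpy as np

def ssr(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return (resid ** 2).sum()

rng = np.random.default_rng(3)
N, k = 400, 3
x1, x2, x3 = rng.normal(size=(3, N))
y = 1.0 + 0.5 * x1 + rng.normal(size=N)   # beta2 = beta3 = 0, so H0 is true here

X_ur = np.column_stack([np.ones(N), x1, x2, x3])   # unrestricted model
X_r = np.column_stack([np.ones(N), x1])            # restricted: drop x2, x3
SSR_ur, SSR_r = ssr(X_ur, y), ssr(X_r, y)

q = 2                      # number of equalities in H0
df_ur = N - k - 1          # degrees of freedom of the unrestricted model
F = ((SSR_r - SSR_ur) / q) / (SSR_ur / df_ur)
# Under H0 (and the CLM), F ~ F(q, N - k - 1); reject if F exceeds the critical value
```

Since SSRr ≥ SSRur by construction, F ≥ 0 always, matching the note in the F-test section.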



How to resolve E(u|x) ≠ 0

How can we get unbiased β̂j's when E(u|x) ≠ 0?
• Include lagged y as a regressor
• Include proxy variables for omitted ones
• Use instrumental variables
• Use a natural experiment (e.g. diff-in-diff)
• Use panel data

Instrumental Variables (IV)

A variable z, called the instrument, satisfies:
1. Cov(z, u) = 0 (not testable)
2. Cov(z, x) ≠ 0 (testable)
z typically comes from a natural experiment.

β̂IV = Ĉov(z, y) / Ĉov(z, x)

• SEs are much larger when using IV compared to OLS
• Be aware of weak instruments

When there are multiple instruments:
• use Two-Stage Least Squares (2SLS)
• exclude at least as many z's as endogenous x's
1st stage: regress the endogenous x on the z's and the exogenous x's
2nd stage: regress y on x̂ and the exogenous x's

Test for weak instruments: an instrument is weak if
• the 1st-stage F stat < 10
• or the 1st-stage |t| < √10 ≈ 3.2

Difference in Differences (DiD)

Can get causal effects from pooled cross-sectional data.
A natural experiment divides units into treatment and control groups:

yit = β0 + δ0 d2t + β1 dTit + δ1 (d2t × dTit) + uit

where
• d2t = dummy for being in time period 2
• dTit = dummy for being in the treatment group
• δ̂1 = difference in differences:
δ̂1 = (ȳtreat,2 − ȳcontrol,2) − (ȳtreat,1 − ȳcontrol,1)

Extensions:
• Can also include x's in the model
• Can also use with more than 2 time periods
• δ̂1 has the same interpretation, different math formula

Validity:
• Need y to change across time and treatment for reasons only due to the policy
• a.k.a. the parallel trends assumption

Panel data

Follow the same sample of units over multiple time periods:

yit = β0 + β1 xit1 + ··· + βk xitk + ai + uit,  where νit = ai + uit

• νit = composite error
• ai = unit-specific unobservables
• uit = idiosyncratic error
• Allow E(a|x) ≠ 0
• Maintain E(u|x) = 0

Four different methods of estimating the βj's:
1. Pooled OLS (i.e. ignore the composite error)
2. First differences (FD):
Δyit = β1 Δxit1 + ··· + βk Δxitk + Δuit
estimated via Pooled OLS on the transformed data
3. Fixed effects (FE):
yit − ȳi = β1 (xit1 − x̄i1) + ··· + βk (xitk − x̄ik) + (uit − ūi)
estimated via Pooled OLS on the transformed data
4. Random effects (RE):
yit − θȳi = β0 (1 − θ) + β1 (xit1 − θx̄i1) + ··· + βk (xitk − θx̄ik) + (νit − θν̄i)
estimated via FGLS, where
θ = 1 − √[ σu² / (σu² + T σa²) ]
β̂RE → β̂FE as θ → 1
β̂RE → β̂POLS as θ → 0
RE assumes E(a|x) = 0

Cluster-robust SEs
• Serial correlation of νit is a problem
• Use cluster-robust SEs
• These correct for serial correlation and heteroskedasticity
• Cluster at the unit level

Binary dependent variables

Three options for estimation when y is binary (0/1):
• Linear Probability Model
• Logit
• Probit

The latter two are nonlinear models:
P(y = 1 | x) = G(β0 + β1 x1 + β2 x2)
where G(·) is some nonlinear function satisfying 0 < G(·) < 1.

Trade-offs with logit/probit

Disadvantages:
• It is much harder to estimate and interpret the β's!
• Can no longer use OLS; instead use maximum likelihood
• Nonlinear G(·) ⟹
  – must use the chain rule to compute slopes
  – the slope of the tangent line depends on x!

Main advantage:
• Now 0 < ŷ < 1 ⟹ more realistic
• (Recall: in the LPM, it is possible to have negative predicted probabilities)

Common choices for G(·)

Logit model:
G(xβ) = exp(xβ) / [1 + exp(xβ)] = Λ(xβ)

Probit model:
G(xβ) = ∫ from −∞ to xβ of φ(z) dz = Φ(xβ)
where φ(·) is the standard normal pdf

Interpreting logit/probit parameter estimates

• β's that come from logit/probit ≠ β's from the LPM
• But the sign is the same
• In logit/probit, we have
∂p(x)/∂xj = βj g(xβ)
where g(xβ) is the first derivative of G(xβ)
• In the LPM, we have ∂p(x)/∂xj = βj

Layout by Winston Chang, http://wch.github.io/latexsheet/
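The logit formulas above are easy to verify numerically: Λ(xβ) stays strictly between 0 and 1, and the partial effect βj·g(xβ) varies with x, unlike the constant slope of the LPM. A sketch assuming numpy (coefficients illustrative):

```python
import numpy as np

def G_logit(z):
    """Logit CDF: G(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(-z))

def g_logit(z):
    """Logit pdf: g(z) = G'(z) = G(z) * (1 - G(z))."""
    Gz = G_logit(z)
    return Gz * (1.0 - Gz)

beta0, beta1 = -1.0, 0.5            # illustrative coefficients
x = np.array([-4.0, 0.0, 4.0])
xb = beta0 + beta1 * x

p = G_logit(xb)                     # P(y = 1 | x), always in (0, 1)
slope = beta1 * g_logit(xb)         # partial effect dp/dx, depends on x

assert np.all((p > 0) & (p < 1))                # unlike the LPM
assert not np.allclose(slope, slope[0])         # slope varies with x
```

The chain-rule slope βj·g(xβ) is largest where p is near 0.5 and shrinks in the tails, which is exactly why logit/probit coefficients cannot be read directly as changes in probability.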

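The IV slope formula β̂IV = Ĉov(z, y)/Ĉov(z, x) from earlier in the sheet can be checked in a simulation where x is endogenous. A sketch assuming numpy (the data-generating process is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200_000
z = rng.normal(size=N)                        # instrument: relevant and exogenous
u = rng.normal(size=N)
x = 1.0 * z + 0.8 * u + rng.normal(size=N)    # Cov(x, u) != 0: x is endogenous
y = 2.0 * x + u                               # true beta1 = 2

# OLS is biased upward here because Cov(x, u) > 0
beta_ols = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)

# IV uses only the variation in x induced by z, which is uncorrelated with u
beta_iv = np.cov(z, y, ddof=1)[0, 1] / np.cov(z, x, ddof=1)[0, 1]
```

In this setup OLS drifts above 2 while IV recovers the true slope, at the cost of a larger standard error, as the sheet warns.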
