
Econometrics Cheat Sheet
By Marcelo Moreno - Universidad Rey Juan Carlos
The Econometrics Cheat Sheet Project

Basic concepts

Definitions
Econometrics - a social science discipline with the objective of quantifying the relationships between economic agents, testing economic theories, and evaluating and implementing government and business policies.
Econometric model - a simplified representation of reality used to explain economic phenomena.
Ceteris paribus - all other relevant factors remain constant.

Data types
Cross section - data taken at a given moment in time, a static photo. Order doesn't matter.
Time series - observations of variables across time. Order does matter.
Panel data - a time series for each observation of a cross section.
Pooled cross sections - combines cross sections from different time periods.

Phases of an econometric model
1. Specification. 2. Estimation. 3. Validation. 4. Utilization.

Regression analysis
Study and predict the mean value of a variable (dependent variable, y) based on fixed values of other variables (independent variables, x's). In econometrics it is common to use Ordinary Least Squares (OLS) for regression analysis.

Correlation analysis
Correlation analysis doesn't distinguish between dependent and independent variables.
- Simple correlation measures the degree of linear association between two variables:
  r = Cov(x, y) / (σx · σy) = Σ(xi − x̄)(yi − ȳ) / sqrt(Σ(xi − x̄)² · Σ(yi − ȳ)²)
- Partial correlation measures the degree of linear association between two variables while controlling for a third.

Assumptions and properties

Econometric model assumptions
Under these assumptions, the OLS estimator presents good properties. Gauss-Markov assumptions:
1. Parameter linearity (and weak dependence in time series). y must be a linear function of the β's.
2. Random sampling. The sample has been randomly taken from the population. (Cross sections only.)
3. No perfect collinearity:
   - No independent variable is constant: Var(xj) ≠ 0, ∀j = 1, ..., k
   - There is no exact linear relation between independent variables.
4. Conditional mean zero and correlation zero:
   a. There are no systematic errors: E(u | x1, ..., xk) = E(u) = 0 → strong exogeneity (a implies b).
   b. There are no relevant variables left out of the model: Cov(xj, u) = 0, ∀j = 1, ..., k → weak exogeneity.
5. Homoscedasticity. The variability of the residuals is the same for all levels of x: Var(u | x1, ..., xk) = σu²
6. No auto-correlation. Residuals don't contain information about any other residuals: Corr(ut, us | x1, ..., xk) = 0, ∀t ≠ s
7. Normality. Residuals are independent and identically distributed: u ∼ N(0, σu²)
8. Data size. The number of observations available must be greater than the (k + 1) parameters to estimate. (Already satisfied in asymptotic situations.)

Asymptotic properties of OLS
Under the econometric model assumptions and the Central Limit Theorem (CLT):
- Hold 1 to 4a: OLS is unbiased. E(β̂j) = βj
- Hold 1 to 4: OLS is consistent. plim(β̂j) = βj (with only 4b, weak exogeneity, instead of 4a: biased but consistent)
- Hold 1 to 5: asymptotic normality of OLS (then, 7 is necessarily satisfied): u is asymptotically N(0, σu²)
- Hold 1 to 6: unbiased estimate of σu². E(σ̂u²) = σu²
- Hold 1 to 6: OLS is BLUE (Best Linear Unbiased Estimator), i.e. efficient.
- Hold 1 to 7: hypothesis testing and confidence intervals can be done reliably.

Ordinary Least Squares

Objective - minimize the Sum of Squared Residuals (SSR): min Σ ûi², where ûi = yi − ŷi

Simple regression model
Equation: yi = β0 + β1·xi + ui
Estimation: ŷi = β̂0 + β̂1·xi
where:
β̂0 = ȳ − β̂1·x̄
β̂1 = Cov(y, x) / Var(x)

Multiple regression model
Equation: yi = β0 + β1·x1i + ... + βk·xki + ui
Estimation: ŷi = β̂0 + β̂1·x1i + ... + β̂k·xki
where:
β̂0 = ȳ − β̂1·x̄1 − ... − β̂k·x̄k
β̂j = Cov(y, resid xj) / Var(resid xj)
Matrix form: β̂ = (XᵀX)⁻¹(Xᵀy)

Interpretation of coefficients
Model       | Dependent | Independent | β1 interpretation
Level-level | y         | x           | Δy = β1·Δx
Level-log   | y         | log(x)      | Δy ≈ (β1/100)·(%Δx)
Log-level   | log(y)    | x           | %Δy ≈ (100·β1)·Δx
Log-log     | log(y)    | log(x)      | %Δy ≈ β1·(%Δx)
Quadratic   | y         | x + x²      | Δy = (β1 + 2·β2·x)·Δx

Error measurements
Sum of Squared Residuals: SSR = Σ ûi² = Σ(yi − ŷi)²
Explained Sum of Squares: SSE = Σ(ŷi − ȳ)²
Total Sum of Squares: SST = SSE + SSR = Σ(yi − ȳ)²
Standard Error of the Regression: σ̂u = sqrt(SSR / (n − k − 1))
Standard Error of the β̂'s: se(β̂) = sqrt(σ̂u² · (XᵀX)⁻¹)
Root Mean Squared Error: RMSE = sqrt(Σ(yi − ŷi)² / n)
Absolute Mean Error: AME = Σ|yi − ŷi| / n
Mean Percentage Error: MPE = (Σ|ûi / yi| / n) · 100

CS-24.5.2-EN - github.com/marcelomijas/econometrics-cheatsheet - CC-BY-4.0 license
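The OLS formulas above can be checked numerically. The following is a minimal NumPy sketch: it estimates β̂ = (XᵀX)⁻¹(Xᵀy) on simulated data (the true coefficients 2, 0.5 and −1.3 are illustrative, not from the cheat sheet) and then computes the error measurements defined above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data for an illustrative model y = 2 + 0.5*x1 - 1.3*x2 + u.
n, k = 200, 2
x = rng.normal(size=(n, k))
u = rng.normal(scale=0.8, size=n)
y = 2.0 + 0.5 * x[:, 0] - 1.3 * x[:, 1] + u

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones(n), x])

# OLS in matrix form: beta_hat = (X'X)^(-1) X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Error measurements from the cheat sheet.
y_hat = X @ beta_hat
ssr = np.sum((y - y_hat) ** 2)            # Sum of Squared Residuals
sse = np.sum((y_hat - y.mean()) ** 2)     # Explained Sum of Squares
sst = np.sum((y - y.mean()) ** 2)         # Total Sum of Squares (= SSE + SSR)
r2 = 1 - ssr / sst                        # R-squared (see next section)
sigma_u = np.sqrt(ssr / (n - k - 1))      # Standard Error of the Regression
se_beta = sigma_u * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))

print("beta_hat:", beta_hat.round(2))
print("R^2:", round(r2, 3))
```

With enough observations, beta_hat lands close to the true coefficients, and SSE + SSR reproduces SST up to rounding, as the decomposition requires.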


R-squared
A measure of the goodness of fit, i.e. how well the regression fits the data:
R² = SSE/SST = 1 − SSR/SST
- Measures the percentage of variation of y that is linearly explained by the variations of the x's.
- Takes values between 0 (no linear explanation) and 1 (total explanation).
When the number of regressors increases, the value of the R-squared also increases, whether or not the new variables are relevant. To solve this problem, there is an adjusted (or corrected) R-squared by degrees of freedom:
R̄² = 1 − (n − 1)/(n − k − 1) · SSR/SST = 1 − (n − 1)/(n − k − 1) · (1 − R²)
For big sample sizes: R̄² ≈ R²

Hypothesis testing

Definitions
A hypothesis test is a rule designed to decide, from a sample, whether or not there is evidence to reject a hypothesis made about one or more population parameters.
Elements of a hypothesis test:
- Null hypothesis (H0) - the hypothesis to be tested.
- Alternative hypothesis (H1) - the hypothesis that cannot be rejected when H0 is rejected.
- Test statistic - a random variable whose probability distribution is known under H0.
- Critical value (C) - the value against which the test statistic is compared to determine whether H0 is rejected. It sets the frontier between the regions of acceptance and rejection of H0.
- Significance level (α) - the probability of rejecting the null hypothesis when it is true (Type I Error). It is chosen by whoever conducts the test; commonly 10%, 5% or 1%.
- p-value - the highest significance level at which H0 cannot be rejected.
The rule is: if p-value < α, there is evidence to reject H0, and thus evidence to accept H1.
[The original shows sketches of the H0 distribution for a two-tailed test (acceptance region 1 − α between −C and C, rejection regions α/2 on each tail) and a one-tailed test (rejection region α beyond C).]

Individual tests
Test whether a parameter is significantly different from a given value, ϑ:
- H0: βj = ϑ
- H1: βj ≠ ϑ
Under H0: t = (β̂j − ϑ) / se(β̂j) ∼ t(n−k−1)
If |t| > |t(n−k−1, α/2)|, there is evidence to reject H0.
Individual significance test - tests whether a parameter is significantly different from zero:
- H0: βj = 0
- H1: βj ≠ 0
Under H0: t = β̂j / se(β̂j) ∼ t(n−k−1)
If |t| > |t(n−k−1, α/2)|, there is evidence to reject H0.

The F test
Simultaneously tests multiple (linear) hypotheses about the parameters. It makes use of an unrestricted model and a restricted model:
- Unrestricted model - the model on which we want to test the hypothesis.
- Restricted model - the model on which the hypothesis to test has been imposed.
Then, looking at the errors, there are:
- SSR_UR - the SSR of the unrestricted model.
- SSR_R - the SSR of the restricted model.
Under H0: F = (SSR_R − SSR_UR)/SSR_UR · (n − k − 1)/q ∼ F(q, n−k−1)
where k is the number of parameters of the unrestricted model and q is the number of linear hypotheses tested.
If F > F(q, n−k−1), there is evidence to reject H0.
Global significance test - tests whether all the parameters associated to the x's are simultaneously equal to zero:
- H0: β1 = β2 = ... = βk = 0
- H1: β1 ≠ 0 and/or β2 ≠ 0 ... and/or βk ≠ 0
In this case, the formula for the F statistic simplifies to:
Under H0: F = R²/(1 − R²) · (n − k − 1)/k ∼ F(k, n−k−1)
If F > F(k, n−k−1), there is evidence to reject H0.

Confidence intervals
The confidence interval at the (1 − α) confidence level can be calculated as:
β̂j ∓ t(n−k−1, α/2) · se(β̂j)

Dummy variables
Dummy (or binary) variables are used for qualitative information like sex, civil status, country, etc.
- A dummy takes the value 1 in a given category and 0 in the rest.
- Dummies are used to analyze and model structural changes in the model parameters.
If a qualitative variable has m categories, only (m − 1) dummy variables have to be included.

Structural change
Structural change refers to changes in the values of the parameters of the econometric model produced by the effect of different sub-populations. Structural change can be included in the model through dummy variables.
The location of the dummy variable (D) matters:
- On the intercept (additive effect) - represents the mean difference between the values produced by the structural change:
  y = β0 + δ1·D + β1·x1 + u
- On the slope (multiplicative effect) - represents the effect (slope) difference between the values produced by the structural change:
  y = β0 + β1·x1 + δ1·D·x1 + u
Chow's structural test - analyzes the existence of structural changes in all the model parameters. It is a particular expression of the F test, where H0: no structural change (all δ = 0).

Changes of scale
Changes in the measurement units of the variables:
- In the endogenous variable, y* = y·λ - affects all model parameters: βj* = βj·λ, ∀j = 1, ..., k
- In an exogenous variable, xj* = xj·λ - only affects the parameter linked to that variable: βj* = βj/λ
- Same scale change on endogenous and exogenous variables - only affects the intercept: β0* = β0·λ

Changes of origin
Changes in the measurement origin of the variables (endogenous or exogenous), y* = y + λ - only affect the model's intercept: β0* = β0 + λ

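A structural change on the intercept can be illustrated with one simulated example (all numbers below are illustrative): the coefficient δ1 on the dummy D estimates the mean difference between the two sub-populations.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative two-group data: the group with D = 1 has an intercept
# 2.5 units higher (structural change on the intercept, additive effect).
n = 100
D = np.repeat([0.0, 1.0], n // 2)
x1 = rng.normal(size=n)
y = 1.0 + 2.5 * D + 0.7 * x1 + rng.normal(scale=0.5, size=n)

# y = b0 + d1*D + b1*x1 + u : d1 estimates the mean intercept difference.
X = np.column_stack([np.ones(n), D, x1])
b0, d1, b1 = np.linalg.solve(X.T @ X, X.T @ y)
print("estimated intercept shift:", round(d1, 2))  # close to the true 2.5
```

An individual significance test on δ1 (previous section) then checks whether the structural change is statistically relevant; testing all δ's jointly is exactly Chow's test.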


Multicollinearity
- Perfect multicollinearity - there are independent variables that are constant and/or there is an exact linear relation between independent variables. This breaks the third (3) econometric model assumption.
- Approximate multicollinearity - there are independent variables that are approximately constant and/or there is an approximately linear relation between independent variables. It does not break any econometric model assumption, but it has an effect on OLS.
Consequences
- Perfect multicollinearity - the OLS equation system cannot be solved, due to infinite solutions.
- Approximate multicollinearity:
  - Small sample variations can induce big variations in the OLS estimates.
  - The variance of the OLS estimators of the collinear x's increases, so inference on those parameters is affected. Their estimation is very imprecise (big confidence intervals).
Detection
- Correlation analysis - look for high correlations between independent variables, |r| > 0.7.
- Variance Inflation Factor (VIF) - indicates the increase of Var(β̂j) caused by the multicollinearity:
  VIF(β̂j) = 1 / (1 − Rj²)
  where Rj² denotes the R-squared from a regression of xj on all the other x's.
  - Values from 4 to 10 - there might be multicollinearity problems.
  - Values > 10 - there are multicollinearity problems.
One typical symptom of multicollinearity is that the regression coefficients of the model are not individually different from zero (due to high variances), but jointly they are different from zero.
Correction
- Delete one of the collinear variables.
- Perform factorial analysis (or any other dimension reduction technique) on the collinear variables.
- Interpret the coefficients with multicollinearity jointly.

Heteroscedasticity
The residuals ui of the population regression function do not have the same variance σu²:
Var(u | x1, ..., xk) = Var(u) ≠ σu²
This breaks the fifth (5) econometric model assumption.
Consequences
- OLS estimators are still unbiased.
- OLS estimators are still consistent.
- OLS is no longer efficient, but it is still a LUE (Linear Unbiased Estimator).
- Variance estimates of the estimators are biased: the construction of confidence intervals and hypothesis testing is not reliable.
Detection
- Graphs - look for scatter patterns on x vs. u or x vs. y plots. [The original includes small scatter sketches illustrating these patterns.]
- Formal tests - White, Bartlett, Breusch-Pagan, etc. Commonly, H0: no heteroscedasticity.
Correction
- Use OLS with a variance-covariance matrix estimator robust to heteroscedasticity (HC), for example the one proposed by White.
- If the variance structure is known, use Weighted Least Squares (WLS) or Generalized Least Squares (GLS):
  - Supposing that Var(u) = σu²·xi, divide the model variables by the square root of xi and apply OLS.
  - Supposing that Var(u) = σu²·xi², divide the model variables by xi (the square root of xi²) and apply OLS.
- If the variance structure is not known, use Feasible Weighted Least Squares (FWLS), which estimates a possible variance, divides the model variables by it, and then applies OLS.
- Make a new model specification, for example a logarithmic transformation (lower variance).

Auto-correlation
The residual of any observation, ut, is correlated with the residual of any other observation. The observations are not independent:
Corr(ut, us | x1, ..., xk) = Corr(ut, us) ≠ 0, ∀t ≠ s
The "natural" context of this phenomenon is time series. It breaks the sixth (6) econometric model assumption.
Consequences
- OLS estimators are still unbiased.
- OLS estimators are still consistent.
- OLS is no longer efficient, but it is still a LUE (Linear Unbiased Estimator).
- Variance estimates of the estimators are biased: the construction of confidence intervals and hypothesis testing is not reliable.
Detection
- Graphs - look for scatter patterns on ut−1 vs. ut plots, or make use of a correlogram. [The original shows ut−1 vs. ut sketches for no auto-correlation (Ac.), positive auto-correlation (Ac. +) and negative auto-correlation (Ac. −).]
- Formal tests - Durbin-Watson, Breusch-Godfrey, etc. Commonly, H0: no auto-correlation.
Correction
- Use OLS with a variance-covariance matrix estimator robust to heteroscedasticity and auto-correlation (HAC), for example the one proposed by Newey-West.
- Use Generalized Least Squares. Supposing yt = β0 + β1·xt + ut, with ut = ρ·ut−1 + εt, where |ρ| < 1 and εt is white noise:
  - If ρ is known, create a quasi-differentiated model where ut is white noise and estimate it by OLS.
  - If ρ is not known, estimate it by, for example, the Cochrane-Orcutt method, create a quasi-differentiated model where ut is white noise, and estimate it by OLS.
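The VIF definition above is easy to compute directly. The sketch below uses illustrative simulated data in which x3 is nearly a linear combination of x1 and x2, so its VIF should exceed the >10 threshold.

```python
import numpy as np

rng = np.random.default_rng(3)

def vif(X, j):
    """VIF of column j of X: regress x_j on the other columns (plus an
    intercept) and return 1 / (1 - R_j^2)."""
    n = X.shape[0]
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    target = X[:, j]
    coef, *_ = np.linalg.lstsq(others, target, rcond=None)
    resid = target - others @ coef
    r2_j = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2_j)

# Illustrative data: x3 is almost a linear combination of x1 and x2.
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.1, size=n)
X = np.column_stack([x1, x2, x3])

print([round(vif(X, j), 1) for j in range(3)])
```

Note that when one variable is a combination of others, the VIFs of all variables involved inflate, which is why the correction step considers the collinear variables as a group.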

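The Cochrane-Orcutt correction for auto-correlation can be sketched end to end on simulated AR(1) errors (the true ρ = 0.7 and β1 = 2 below are illustrative): estimate the model by OLS, estimate ρ from the residuals, quasi-differentiate, and re-estimate.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative AR(1) errors: u_t = rho*u_{t-1} + eps_t, eps_t white noise.
n, rho = 500, 0.7
eps = rng.normal(scale=0.5, size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + eps[t]
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + u

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

# Step 1: OLS on the original model; keep the residuals.
X = np.column_stack([np.ones(n), x])
resid = y - X @ ols(X, y)

# Step 2: estimate rho by regressing u_t on u_{t-1} (Cochrane-Orcutt step).
rho_hat = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])

# Step 3: quasi-differentiate and re-estimate by OLS; the error of the
# transformed model is (approximately) white noise.
y_star = y[1:] - rho_hat * y[:-1]
x_star = x[1:] - rho_hat * x[:-1]
X_star = np.column_stack([np.ones(n - 1), x_star])
b0_star, b1 = ols(X_star, y_star)
b0 = b0_star / (1 - rho_hat)  # the transformed intercept is beta0*(1 - rho)
print("rho_hat:", round(rho_hat, 2), "beta1_hat:", round(b1, 2))
```

In practice the last two steps are iterated until ρ̂ converges; one pass is usually enough to see the idea.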
