EC226 - Econometrics (Revision Guide - Simple Linear Regression)
SUMMARY OF HANDOUT 1: TWO VARIABLES LINEAR REGRESSION ANALYSIS
• REGRESSION
• INTERPRETING COEFFICIENTS
Cov(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
Drawbacks of Covariance: covariance depends on the units in which x and y are measured and is unbounded, so its magnitude tells us little about the strength of the relationship (only its sign is informative).
Solution?
Switch to Correlation. Think of correlation as a modified, unit-free version of covariance.
Corr(x, y) = \rho(x, y) = \frac{Cov(x, y)}{\sqrt{V(x) \cdot V(y)}}
The closer the correlation to + 1, the stronger the positive linear relationship.
The closer the correlation to - 1, the stronger the negative linear relationship.
Note: Covariance and Correlation are measures of linear association. If the covariance (and correlation)
between two variables is 0, then it doesn’t imply that the variables are independent. They may still have some
non-linear relationship.
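As a quick numerical check, here is a minimal Python sketch of the two formulas above (numpy assumed; the data values are made up purely for illustration):

import numpy as np

# Made-up data purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Sample covariance: sum of cross-deviations divided by (n - 1)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# Correlation: covariance scaled to be unit-free and bounded between -1 and +1
corr_xy = cov_xy / np.sqrt(x.var(ddof=1) * y.var(ddof=1))

print(cov_xy)    # same value as np.cov(x, y)[0, 1]
print(corr_xy)   # same value as np.corrcoef(x, y)[0, 1]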
REGRESSION
Looks at the linear causal association between the random variables (Causal is the key word here)
To keep it simple, think of 2 variables for now (we can also have more than 2 variables)
y – Dependent Variable or Endogenous Variable or Regressand
x – Independent Variable or Exogenous Variable or Explanatory Variable or Regressor
Population: y_i = E(y|x) + \varepsilon_i, i.e. y_i = \alpha + \beta x_i + \varepsilon_i, where \varepsilon_i is the random error term for the population (also called the disturbance term).
Sample: y_i = \hat{y}_i + e_i, i.e. y_i = a + b x_i + e_i, where e_i is the random error term for the sample (also called the residual).
Note: The assumptions are on population error term (𝜀𝑖 ), not on sample error term (𝑒𝑖 )
Result: 𝑎 = 𝑦̅ − 𝑏𝑥̅
Note: While working with Step 2, we find that \sum_{i=1}^{n} e_i = 0. This is one of the algebraic properties of OLS. So, the sum of residuals is zero (this is not an assumption, this is by construction).
Result: b = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} (there are multiple ways of writing the 'b' formula)
Note: While working with Step 2, we find that \sum_{i=1}^{n} x_i e_i = 0. This is one of the algebraic properties of OLS.
s^2 = \frac{\sum_{i=1}^{n} e_i^2}{DoF} = \frac{RSS}{DoF}
where DoF (degrees of freedom) = n – total number of parameters that we estimate in the model (in the two-variable model, DoF = n – 2, since we estimate a and b)
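A minimal Python sketch of these results (numpy assumed; the data values are made up for illustration): computing b and a, checking the two algebraic properties of the residuals, and computing s^2 with DoF = n - 2:

import numpy as np

# Made-up data purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)

# OLS slope and intercept from the formulas above
b = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x ** 2) - n * x.mean() ** 2)
a = y.mean() - b * x.mean()

# Residuals and the two algebraic properties (they hold by construction)
e = y - (a + b * x)
print(np.isclose(np.sum(e), 0.0))      # sum of residuals is zero
print(np.isclose(np.sum(x * e), 0.0))  # sum of x_i * e_i is zero

# Estimate of the error variance: RSS / DoF, with DoF = n - 2 here
rss = np.sum(e ** 2)
s2 = rss / (n - 2)
print(a, b, s2)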
Note: Gauss Markov Theorem is a conditional statement. For OLS estimators to be BLUE, the first three CLRM
assumptions have to be satisfied. The fourth assumption has nothing to do with the BLUE property of OLS
estimators.
Note: The unbiasedness property of OLS estimators relies on Assumption 1. It doesn’t require Assumption 2
and Assumption 3.
V(a|x) = \sigma^2 \cdot \frac{\sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} (x_i - \bar{x})^2}
V(b|x) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
We need to add Assumption 2 and Assumption 3 to obtain the above formulas. These variance formulas are the ones used to establish the efficiency of the OLS estimators (mathematical proof omitted).
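A minimal Python sketch of these variance formulas (numpy assumed; the same made-up data as in the earlier sketch), with the unknown \sigma^2 replaced by its estimate s^2:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)

b = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x ** 2) - n * x.mean() ** 2)
a = y.mean() - b * x.mean()
e = y - (a + b * x)
s2 = np.sum(e ** 2) / (n - 2)           # estimate of sigma^2

sxx = np.sum((x - x.mean()) ** 2)       # sum of squared deviations of x
var_a_hat = s2 * np.sum(x ** 2) / (n * sxx)   # estimated V(a|x)
var_b_hat = s2 / sxx                          # estimated V(b|x)
print(var_a_hat, var_b_hat)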
Note: Assumption 4 (Distribution of Population errors) has no role in the unbiasedness and efficiency of OLS
estimators. Then, why do we even have this assumption? Well, this assumption helps us with Hypothesis
Testing (Next Part)
4. Compute t-statistic
t = \frac{b - \beta_0}{s_b}, where s_b = \sqrt{\frac{s^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}} (s_b is the standard error of b)
5. Rejection Rule
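A minimal Python sketch of steps 4 and 5 (numpy and scipy assumed; made-up data), testing H0: \beta = 0 against a two-sided alternative at the 5% level, both of which are assumed here purely for illustration:

import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)

b = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x ** 2) - n * x.mean() ** 2)
a = y.mean() - b * x.mean()
e = y - (a + b * x)
s2 = np.sum(e ** 2) / (n - 2)

sb = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))   # standard error of b

beta0 = 0.0                                      # hypothesised value under H0 (illustrative)
t_stat = (b - beta0) / sb
t_crit = stats.t.ppf(0.975, df=n - 2)            # two-sided 5% critical value, n - 2 DoF

print(t_stat, t_crit)
print("Reject H0" if abs(t_stat) > t_crit else "Do not reject H0")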
Total Sum of Squares (TSS) = Explained Sum of Squares (ESS) + Residual Sum of Squares (RSS)
R^2 = 1 - \frac{RSS}{TSS} = \frac{ESS}{TSS}
Interpretation of R^2: the proportion of the total variation in the dependent variable that the model can explain.
1. Maximum value of R^2 is 1
2. Minimum value of R^2 is 0
3. R^2 cannot be negative
4. We can only compare the R^2 of two models if the models have the same dependent variable. If two models have different dependent variables, then we cannot do a comparison of R^2.
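A minimal numerical check of the decomposition and of the two equivalent R^2 expressions (numpy assumed; made-up data):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)

b = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x ** 2) - n * x.mean() ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x
e = y - y_hat

tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
rss = np.sum(e ** 2)                    # residual sum of squares

print(np.isclose(tss, ess + rss))       # TSS = ESS + RSS
print(1 - rss / tss, ess / tss)         # both expressions give the same R^2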
INTERPRETING COEFFICIENTS
1. y_i = α + βx_i + ε_i
β: if x increases by 1 unit, then y is expected to change by β units. The units of both variables (x and y) matter, so pay attention to the units.
2. y_i = α + β ln(x_i) + ε_i
β: if x increases by 1%, then y is expected to change by approximately β/100 units. The unit of the variable that has a log doesn't matter. In this case, the variable 'x' has a log, so its unit won't matter in the interpretation. The variable 'y' doesn't have a log, so its unit will be part of the interpretation.
Note: There are some cases under which this pattern of interpretation won’t hold. Example: Interpretation
will change if you increase x by 100% instead of 1%.
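Worked example (hypothetical numbers): with β = 50 in this level-log model, a 1% increase in x is expected to increase y by about 50/100 = 0.5 units. A 100% increase in x (a doubling), however, changes y by β·ln(2) ≈ 0.69β units rather than β units, which is why the simple 1% rule breaks down for large changes.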
3. ln(y_i) = α + βx_i + ε_i
β: if x increases by 1 unit, then y is expected to change by approximately 100·β %. As mentioned earlier, the unit of the variable that has a log doesn't matter. In this case, the variable 'y' has a log, so its unit won't matter in the interpretation. The variable 'x' doesn't have a log, so its unit will be part of the interpretation.
Note: There are some cases under which this pattern of interpretation won't hold. Example: interpretation will change if β > 0.10. In that case, the interpretation would be: if x increases by 1 unit, then y is expected to increase by (e^β − 1) × 100 %.
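Worked example (hypothetical numbers): if β = 0.25, the approximate reading would be a 25% increase in y per unit increase in x, while the exact figure is (e^0.25 − 1) × 100 ≈ 28.4%, which is why the exact formula matters once β gets large.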
4. ln(y_i) = α + β ln(x_i) + ε_i
As mentioned earlier, the unit of the variable that has a log doesn't matter. In this case, both variables have a log, so their units won't matter in the interpretation.
β: the elasticity of y with respect to x. In other words, if x increases by 1%, then y is expected to increase by β%.
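Worked example (hypothetical numbers): if β = 0.8 in this log-log specification, a 1% increase in x is expected to increase y by about 0.8%, and a 10% increase in x by roughly 8%.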