EC226 - Econometrics (Revision Guide - Simple Linear Regression)


EC226 ECONOMETRICS

(OFFERED BY UNIVERSITY OF WARWICK)

SUMMARY OF HANDOUT 1:
TWO VARIABLES LINEAR REGRESSION ANALYSIS

MADE BY: SHUBHAM KALRA


LIST OF TOPICS COVERED
(QUIZ INCLUDED):

• CORRELATION VERSUS REGRESSION ANALYSIS

• REGRESSION

• CLASSICAL LINEAR REGRESSION MODEL (CLRM) ASSUMPTIONS

• ESTIMATING THE POPULATION PARAMETERS

• PROPERTIES OF OLS ESTIMATORS

• HYPOTHESIS TESTING (5-Step Procedure)

• MEASURE OF GOODNESS OF FIT

• INTERPRETING COEFFICIENTS

ADDITIONAL RESOURCES AND SUPPORT:

• TEST YOUR KNOWLEDGE - ACCESS THE ONLINE QUIZ

• STRUGGLING WITH ECONOMETRICS? SCHEDULE A FREE DISCUSSION CALL



HANDOUT 1: TWO VARIABLES LINEAR REGRESSION ANALYSIS

CORRELATION VERSUS REGRESSION ANALYSIS

Correlation and Covariance: Measures of LINEAR ASSOCIATION

$$\mathrm{Cov}(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

If 𝐶𝑜𝑣(𝑥, 𝑦) = 0, then there is no linear relationship between x and y.


If 𝐶𝑜𝑣(𝑥, 𝑦) > 0, then there is a positive linear relationship between x and y.
If 𝐶𝑜𝑣(𝑥, 𝑦) < 0, then there is a negative linear relationship between x and y.

Drawbacks of Covariance:

1) It is not a scale-free measure.


Example: Let x be height (in inches) and y be weight (in kilograms)
Assume Cov (Height, Weight) = +2000
If you decide to measure y (i.e. weight) in grams instead of kilograms, then the covariance between height and
weight will change.

2) It tells only the direction of the linear relationship, not its strength.


Continuing with the previous example, Cov (Height, Weight) = +2000 indicates a positive linear relationship
(since the covariance is > 0). However, the number 2000 tells us nothing about the strength of that
relationship: we cannot say whether the linear relationship is strongly positive, mildly positive or weakly
positive.

Solution?
Switch to Correlation. Think of correlation as a modified version of covariance.

$$\mathrm{Corr}(x, y) = \rho(x, y) = \frac{\mathrm{Cov}(x, y)}{\sqrt{V(x)\,V(y)}}$$

Correlation takes care of both drawbacks of covariance.

1) Correlation is a scale-free measure


Example: Let x be height (in inches) and y be weight (in kilograms)
Assume Corr (Height, Weight) = + 0.75
If you decide to measure y (i.e. weight) in grams instead of kilograms, then the correlation between height
and weight won’t change.

2) Tells the direction as well as the strength of the linear relationship

Correlation takes values between −1 and +1.



If 𝐶𝑜𝑟𝑟(𝑥, 𝑦) = 0, then there is no linear relationship between x and y.
If 𝐶𝑜𝑟𝑟(𝑥, 𝑦) > 0, then there is a positive linear relationship between x and y.
If 𝐶𝑜𝑟𝑟(𝑥, 𝑦) < 0, then there is a negative linear relationship between x and y.
If 𝐶𝑜𝑟𝑟(𝑥, 𝑦) = + 1, then there is a perfect positive linear relationship between x and y.
If 𝐶𝑜𝑟𝑟(𝑥, 𝑦) = − 1, then there is a perfect negative linear relationship between x and y.

The closer the correlation is to +1, the stronger the positive linear relationship.
The closer the correlation is to −1, the stronger the negative linear relationship.

Note: Covariance and Correlation are measures of linear association. If the covariance (and correlation)
between two variables is 0, then it doesn’t imply that the variables are independent. They may still have some
non-linear relationship.
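
A quick numerical check of the two fixes above (a minimal sketch in Python; the statistics module functions used here require Python 3.10+, and the height/weight numbers are hypothetical, chosen only for illustration):

```python
# Minimal sketch: covariance changes with units, correlation does not.
# The height/weight values are hypothetical, for illustration only.
import statistics

height_in = [60, 62, 65, 68, 70, 72]           # height in inches
weight_kg = [55, 58, 63, 70, 74, 80]           # weight in kilograms
weight_g = [w * 1000 for w in weight_kg]       # same weights, in grams

# Covariance is not scale-free: switching kg -> g inflates it 1000-fold.
print(statistics.covariance(height_in, weight_kg))  # roughly +45 (inch-kg units)
print(statistics.covariance(height_in, weight_g))   # roughly +45,000 (inch-g units)

# Correlation is scale-free: the change of units leaves it untouched.
print(statistics.correlation(height_in, weight_kg))
print(statistics.correlation(height_in, weight_g))  # identical value
```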

Why switch to regression from correlation (covariance)?


Correlation doesn't show us a cause-and-effect relationship. If Corr(x, y) = +0.9, then x and y have a strong
positive linear relationship, i.e. they move together in the same direction. But we cannot tell whether x causes
y, whether y causes x, or whether both are simply moving together without any direct relationship.

REGRESSION

Looks at the linear causal association between the random variables (Causal is the key word here)

To keep it simple, think of 2 variables for now (we can also have more than 2 variables)
y – Dependent Variable or Endogenous Variable or Regressand
x – Independent Variable or Exogenous Variable or Explanatory Variable or Regressor

Population Equations:

$E(y|x) = \alpha + \beta x_i$
$y_i = E(y|x) + \varepsilon_i$
$y_i = \alpha + \beta x_i + \varepsilon_i$

Sample Equations:

$\hat{y}_i = a + b x_i$
$y_i = \hat{y}_i + e_i$
$y_i = a + b x_i + e_i$

$\alpha$ : Population Intercept Parameter (unknown but constant); $a$ : Estimator of $\alpha$
$\beta$ : Population Slope Parameter (unknown but constant); $b$ : Estimator of $\beta$
$\varepsilon_i$ : Random error term (for the population), also called the Disturbance Term; $e_i$ : Random error term (for the sample), also called the Residual



CLASSICAL LINEAR REGRESSION MODEL (CLRM) ASSUMPTIONS

1) $E(\varepsilon_i | x_i) = E(\varepsilon_i) = 0$ (the errors have zero mean and are mean-independent of $x_i$)
2) $V(\varepsilon_i | x_i) = \sigma^2$ (Homoscedasticity)
3) $\mathrm{Cov}(\varepsilon_i, \varepsilon_j | x_i) = 0$, $i \neq j$ (the errors are serially uncorrelated over observations)
4) $\varepsilon_i | x_i \sim N(0, \sigma^2)$ (the errors are normally distributed)

Note: The assumptions are on the population error term ($\varepsilon_i$), not on the sample residuals ($e_i$).

ESTIMATING THE POPULATION PARAMETERS

So far, we have introduced three population parameters: $\alpha$, $\beta$ and $\sigma^2$.

$\alpha$ : Population Intercept Parameter
$\beta$ : Population Slope Parameter
$\sigma^2$ : Variance of $\varepsilon_i | x_i$ (this is Assumption 2)

Question: How to estimate these population parameters?

Use the method of OLS to find estimators of $\alpha$ and $\beta$ (this requires derivation).

We won't do any derivation for the estimator of $\sigma^2$; there is a straightforward formula for it.

Method of Ordinary Least Squares (OLS)


Choose the estimator of $\alpha$ (denoted by '$a$') and the estimator of $\beta$ (denoted by '$b$') such that the residual sum of squares (RSS) is minimized.

Mathematical expression for RSS ($\sum_{i=1}^{n} e_i^2$):
From the sample equations, we know that $y_i = a + b x_i + e_i$. This implies

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(y_i - a - b x_i)^2$$

How to find the estimator of 𝛼 (denoted by ‘a’)


Step 1: Partially differentiate $\sum_{i=1}^{n} e_i^2$ with respect to '$a$'.
Step 2: Set the result from Step 1 equal to 0 and solve for '$a$'.

Result: $a = \bar{y} - b\bar{x}$

Note: While working through Step 2, we find that $\sum_{i=1}^{n} e_i = 0$. This is one of the algebraic properties of OLS: the sum of the residuals is zero (this is not an assumption; it holds by construction).

How to find the estimator of 𝛽 (denoted by ‘b’)


Step 1: Partially differentiate $\sum_{i=1}^{n} e_i^2$ with respect to '$b$'.
Step 2: Set the result from Step 1 equal to 0 and solve for '$b$'.

Result: $b = \dfrac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$ (there are multiple ways of writing the formula for '$b$')

Note: While working through Step 2, we find that $\sum_{i=1}^{n} x_i e_i = 0$. This is one of the algebraic properties of OLS.



We can use this algebraic property to show that the sample covariance between the residuals ($e_i$) and the independent variable ($x_i$) is 0. (Again, this is not an assumption; it holds by construction.)

Straightforward formula for $s^2$ (the estimator of $\sigma^2$) - no derivation for this!

$$s^2 = \frac{\sum_{i=1}^{n} e_i^2}{DoF} = \frac{RSS}{DoF}$$

where DoF (degrees of freedom) = n − (total number of parameters estimated in the model). In the two-variable model we estimate two parameters ($\alpha$ and $\beta$), so DoF = n − 2.
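
The whole routine can be checked numerically. Below is a minimal sketch in plain Python (the (x, y) data are hypothetical) that computes $a$, $b$ and $s^2$ with the formulas above and verifies the two algebraic properties $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} x_i e_i = 0$:

```python
# Minimal OLS sketch for the two-variable model; hypothetical data.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# b = (sum x_i y_i - n*x_bar*y_bar) / (sum x_i^2 - n*x_bar^2)
b = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) \
    / (sum(xi ** 2 for xi in x) - n * x_bar ** 2)
a = y_bar - b * x_bar                           # a = y_bar - b*x_bar

e = [yi - a - b * xi for xi, yi in zip(x, y)]   # residuals

# Algebraic properties: hold by construction, up to rounding error.
print(sum(e))                                   # ~0
print(sum(xi * ei for xi, ei in zip(x, e)))     # ~0

rss = sum(ei ** 2 for ei in e)
s2 = rss / (n - 2)                              # DoF = n - 2 here
print(a, b, s2)
```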

PROPERTIES OF OLS ESTIMATORS

Gauss Markov Theorem


If the first 3 CLRM assumptions are satisfied, then the OLS estimators are BLUE.

Best (minimum variance among all linear unbiased estimators)

Linear (a linear function of the dependent variable $y_i$)
Unbiased ($E(b|x) = \beta$, $E(a|x) = \alpha$)
Estimators

Note: Gauss Markov Theorem is a conditional statement. For OLS estimators to be BLUE, the first three CLRM
assumptions have to be satisfied. The fourth assumption has nothing to do with the BLUE property of OLS
estimators.

Unbiasedness of OLS Estimators

For unbiasedness, we need to show:


𝐸(𝑎|𝑥) = 𝛼
𝐸(𝑏|𝑥) = 𝛽

Note: The unbiasedness property of the OLS estimators relies on Assumption 1. It requires neither Assumption 2
nor Assumption 3.

Variance of OLS Estimators

$$V(a|x) = \sigma^2 \cdot \frac{\sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n}(x_i - \bar{x})^2}$$

$$V(b|x) = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

We need Assumptions 2 and 3, in addition to Assumption 1, to derive these formulas. These variance formulas
are what establish the efficiency of the OLS estimators (mathematical proof omitted).

Note: Assumption 4 (Distribution of Population errors) has no role in the unbiasedness and efficiency of OLS
estimators. Then, why do we even have this assumption? Well, this assumption helps us with Hypothesis
Testing (Next Part)
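
One way to see these results is by simulation. The sketch below (plain Python; the values of $\alpha$, $\beta$, $\sigma$ and the fixed x values are all hypothetical choices) draws many samples from a population satisfying the CLRM assumptions, estimates $b$ in each, and compares the mean and variance of the estimates with $\beta$ and $\sigma^2 / \sum_{i=1}^{n}(x_i - \bar{x})^2$:

```python
# Simulation sketch: unbiasedness and variance of the OLS slope estimator.
import random

random.seed(0)
alpha, beta, sigma = 1.0, 2.0, 1.5              # hypothetical population values
x = [1, 2, 3, 4, 5, 6, 7, 8]                    # fixed regressor values
x_bar = sum(x) / len(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)

b_draws = []
for _ in range(20000):
    # Errors drawn i.i.d. N(0, sigma^2): the CLRM assumptions hold by design.
    y = [alpha + beta * xi + random.gauss(0, sigma) for xi in x]
    y_bar = sum(y) / len(y)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b_draws.append(b)

mean_b = sum(b_draws) / len(b_draws)
var_b = sum((bd - mean_b) ** 2 for bd in b_draws) / (len(b_draws) - 1)

print(mean_b)              # close to beta = 2.0 -> unbiasedness
print(var_b)               # close to the theoretical variance below
print(sigma ** 2 / sxx)    # V(b|x) = sigma^2 / sum (x_i - x_bar)^2
```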



HYPOTHESIS TESTING (5-Step Procedure)

1. Set up the Null Hypothesis 𝐻0 : 𝛽 = 𝛽0


where $\beta_0$ is a constant. It could be any value: 0, 5, −2, etc.
Make sure you set the hypothesis on the population parameter, not on the estimator. So, don't write
$H_0: b = 5$.

2. Set up the Alternative Hypothesis


𝐻1 : 𝛽 ≠ 𝛽0 (This is a two-tailed test)
Alternative Hypothesis can also be one-tailed 𝐻1 : 𝛽 > 𝛽0 OR 𝐻1 : 𝛽 < 𝛽0

3. Compute critical values from the t-table


Degrees of Freedom (𝐷𝑜𝐹) = 𝑛 − number of estimated parameters in the model
where, n is the sample size

4. Compute t-statistic

$$t = \frac{b - \beta_0}{s_b}, \quad \text{where } s_b = \sqrt{\frac{s^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$

($s_b$ is the standard error of $b$)

5. Rejection Rule

Reject 𝐻0 if | t-statistic | > | t critical |

Do Not Reject 𝐻0 if | t-statistic | < | t critical |
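
Putting the five steps together, here is a minimal sketch in Python. The data and the null value $\beta_0 = 0$ are hypothetical, the test is two-tailed at the 5% level, and scipy is assumed to be available for the t critical value:

```python
# Sketch of the 5-step t-test for H0: beta = beta0 vs H1: beta != beta0.
from scipy.stats import t as t_dist

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
n = len(x)
beta0 = 0.0                                   # Step 1: H0: beta = beta0

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
a = y_bar - b * x_bar

e = [yi - a - b * xi for xi, yi in zip(x, y)]
s2 = sum(ei ** 2 for ei in e) / (n - 2)       # DoF = n - 2
s_b = (s2 / sxx) ** 0.5                       # standard error of b

t_stat = (b - beta0) / s_b                    # Step 4: t-statistic
t_crit = t_dist.ppf(0.975, df=n - 2)          # Step 3: two-tailed, 5% level

# Step 2 was the two-tailed alternative H1: beta != beta0; Step 5 is the rule:
print(t_stat, t_crit)
print("Reject H0" if abs(t_stat) > abs(t_crit) else "Do not reject H0")
```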

MEASURE OF GOODNESS OF FIT

Total Sum of Squares (TSS) = Explained Sum of Squares (ESS) + Residual Sum of Squares (RSS)

TSS is the total amount of variation in the dependent variable


Mathematical Expression: $\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$

ESS is the amount of variation explained by the model


Mathematical Expression: $\sum_{i=1}^{n}(\hat{y}_i - \bar{\hat{y}})^2$ (since the residuals sum to zero, the mean of the fitted values $\bar{\hat{y}}$ equals $\bar{y}$)

RSS is the amount of variation NOT explained by the model


Mathematical Expression: $\sum_{i=1}^{n} e_i^2$

Measure of Goodness of Fit is 𝑅2 (also known as coefficient of determination)

$$R^2 = 1 - \frac{RSS}{TSS} = \frac{ESS}{TSS}$$

Interpretation of 𝑅2 : The proportion of the total variation in the dependent variable that the model can
explain.



Some key points about 𝑅 2

1. Maximum Value of 𝑅2 is 1
2. Minimum Value of 𝑅2 is 0
3. 𝑅2 cannot be negative
4. We can only compare the $R^2$ of two models if they have the same dependent variable. If two models
have different dependent variables, their $R^2$ values cannot be compared.
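
A minimal sketch (plain Python, hypothetical data) that computes TSS, ESS and RSS from a fitted line and confirms both the decomposition TSS = ESS + RSS and the equivalence of the two $R^2$ formulas:

```python
# Sketch: goodness of fit for the fitted line y_hat = a + b*x.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]                          # fitted values

tss = sum((yi - y_bar) ** 2 for yi in y)                  # total variation
ess = sum((yh - y_bar) ** 2 for yh in y_hat)              # explained variation
rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))     # unexplained variation

print(tss, ess + rss)             # TSS = ESS + RSS (up to rounding)
print(1 - rss / tss, ess / tss)   # the two R^2 formulas agree
```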

INTERPRETING COEFFICIENTS

1. 𝑦𝑖 = 𝛼 + 𝛽𝑥𝑖 + 𝜀𝑖

The units of both the variables (x and y) matter. So, pay attention to the units.

𝛽: If x increases by 1 unit, then y is expected to increase by 𝛽 units.

𝛽 is the marginal response of y with respect to x

2. 𝑦𝑖 = 𝛼 + 𝛽 𝑙𝑛(𝑥𝑖 ) + 𝜀𝑖

The unit of the variable that has a log doesn't matter. In this case, the variable 'x' has a log, so its unit won't
matter in the interpretation. The variable 'y' doesn't have a log, so its unit will be part of the interpretation.

It would be easier to interpret 𝛽/100

𝛽/100 : If x increases by 1%, then y is expected to increase by 𝛽/100 units

Note: There are some cases under which this pattern of interpretation won’t hold. Example: Interpretation
will change if you increase x by 100% instead of 1%.

3. 𝑙𝑛(𝑦𝑖 ) = 𝛼 + 𝛽𝑥𝑖 + 𝜀𝑖

As mentioned earlier, the unit of the variable that has a log doesn't matter. In this case, the variable 'y' has a
log, so its unit won't matter in the interpretation. The variable 'x' doesn't have a log, so its unit will be part
of the interpretation.

It would be easier to interpret 𝛽 ∗ 100

𝛽 ∗ 100 : If x increases by 1 unit, then y is expected to increase by 𝛽 ∗ 100 %

Note: There are some cases where this pattern of interpretation won't hold. Example: the approximation
becomes unreliable if $\beta > 0.10$. In that case, the interpretation would be: if x increases by 1 unit, then y is
expected to increase by $(e^{\beta} - 1) \times 100$ %.



4. ln (𝑦𝑖 ) = 𝛼 + 𝛽 ln (𝑥𝑖 ) + 𝜀𝑖

As mentioned earlier, the unit of the variable that has a log doesn’t matter. In this case, both the variables
have a log, so their units won’t matter in the interpretation.

𝛽: Elasticity of y with respect to x. In other words, if x increases by 1%, then y is expected to increase by 𝛽%
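
A short numeric illustration of all four interpretations (plain Python; every coefficient value below is hypothetical). It also makes the note in case 3 concrete: the approximation $\beta \times 100$ tracks the exact effect $(e^{\beta} - 1) \times 100$ only while $\beta$ is small:

```python
# Sketch: interpreting slope coefficients in the four functional forms.
# All beta values below are hypothetical, chosen for illustration.
import math

# 1) y = a + b*x         : +1 unit of x -> +b units of y
b_linear = 2.5
print(b_linear)                      # +2.5 units of y per 1-unit rise in x

# 2) y = a + b*ln(x)     : +1% of x    -> +b/100 units of y
b_linlog = 40.0
print(b_linlog / 100)                # +0.4 units of y per 1% rise in x

# 3) ln(y) = a + b*x     : +1 unit of x -> approximately +b*100 % of y
b_small, b_big = 0.05, 0.40
print(b_small * 100, (math.exp(b_small) - 1) * 100)  # 5.0 vs ~5.13: close
print(b_big * 100, (math.exp(b_big) - 1) * 100)      # 40 vs ~49.2: use exact form

# 4) ln(y) = a + b*ln(x) : +1% of x    -> +b % of y (elasticity)
b_loglog = 0.8
print(b_loglog)                      # y rises ~0.8% per 1% rise in x
```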

