The Simple Regression Model

This document provides an overview of the simple linear regression model. It discusses (1) defining the model as relating a dependent variable y to an independent variable x, plus an error term, (2) estimating the model parameters β0 and β1 using the method of moments and ordinary least squares, and (3) the key assumptions that the error term has a mean of zero and is uncorrelated with the independent variable. Examples are provided to illustrate key concepts such as the population regression function.


Department of Finance & Banking, University of Malaya

The Simple Regression Model

Dr. Aidil Rizal Shahrin


[email protected]

October 10, 2020


Contents

1 Simple Regression Model

2 Estimating β0 & β1
2.1 Method of Moments
2.2 Ordinary Least Squares

3 Sample Regression Function (SRF)


3.1 Example...

4 Goodness of Fit

5 Units of Measurement and Functional Form

5.1 Changing Units of Measurement
5.2 Incorporating Nonlinearities
5.3 Meaning of ’Linear’ Regression
Contents

6 Expected Values and Variances of OLS Estimator


6.1 Unbiasedness of OLS
6.2 Variances of the OLS Estimators
6.2.1 Sampling Variances of the OLS Estimators
6.2.2 Estimating the Error Variance

7 Regression through Origin



Simple Regression Model

i. Most of the time in econometric analysis, we start from this premise: y and x are two variables representing some population (the most common scenario involves more than two random variables).
ii. We are interested in ‘explaining y in terms of x’.
iii. In modeling ‘y in terms of x’, we confront three issues:
1. There is never an exact relationship between two variables, so how do we allow for other factors to affect y? (Notice that in theory the relationship is always exact.)
2. What is the functional relationship between y and x? Linear, quadratic, cubic, or even exponential? If you have economic theory, it will give guidance.
3. How sure are we that we are capturing ‘ceteris paribus’? Remember, we are interested in causality!



Simple Regression Model

iv. The simple linear regression model (also known as the two-variable regression model or bivariate linear regression model) relates the two variables x and y:

y = β0 + β1 x + u (1)

(If a term such as β2 x2 were added, the model would become a multiple regression. If β1 = 0, x does not explain y; if β1 > 0, y increases with x.)

v. The variables y and x have several names, as shown in Tab.1.
Y X
Dependent variable Independent variable
Explained variable Explanatory variable
Response variable Control variable
Predicted variable Predictor variable
Regressand Regressor
Table 1: Terminology for simple regression



Simple Regression Model

vi. The variable u in Eq.1, called the error term or disturbance in the relationship, represents factors other than x that affect y. Simply put, u stands for all unobserved factors that affect y.
vii. Eq.1 also addresses the functional relationship between y and x. If the other factors in u are held fixed (remember ceteris paribus?), so that the change in u is zero, ∆u = 0, then x has a linear effect on y:

∆y/∆x = β1 if ∆u = 0 (2)

viii. Thus, β1 in Eq.2 is the slope parameter in Eq.1, and it is of primary interest in applied economics. β0 in Eq.1 is the intercept parameter, sometimes called the constant term. It also has its uses, although it is rarely central to the analysis.



Simple Regression Model

ix. Linearity of Eq.1 implies that a one-unit change in x has the same effect on y regardless of the initial value of x, as shown in Eq.2. Why does this not work for the idea of increasing returns in economics?
x. The most difficult issue to address is whether Eq.1 really allows us to draw ceteris paribus conclusions about how x affects y. β1 does measure the effect of x on y holding the other factors in u fixed. If any of the unobservables in u is related to x, we will not get reliable estimators of β0 and β1 in Eq.1 (more on this later).
xi. One key assumption about u is that, as long as we have the intercept β0 in Eq.1, the following always holds:

E(u) = 0 (3)



Simple Regression Model

xii. A natural measure of how x and u are related is the correlation coefficient. If u and x are uncorrelated, then, as random variables, they are not linearly related.
xiii. However, correlation only measures linear dependence between variables: u can be uncorrelated with x while being correlated with x², which correlation fails to capture.
xiv. A much stronger and better assumption concerns the expected value of u given x, the conditional expectation of u given x:

E(u|x) = E(u) (4)

Remember that this holds when u and x are statistically independent. Eq.4 says the average value of u does not depend on the value of x.



Simple Regression Model

xv. Thus, we can combine Eq.4 and Eq.3 to have

E(u|x) = E(u) = 0 (5)

If we take the conditional expectation of Eq.1 given x, we have:

E(y|x) = E(β0 + β1 x + u|x)
= E(β0 |x) + E(β1 x|x) + E(u|x)
= β0 + β1 E(x|x) + E(u|x)   (6)
= β0 + β1 x

where Eq.6 is called the population regression function (PRF).

Remember, Eq.6 states that the average value of y changes



Simple Regression Model

with x; it does not say y equals β0 + β1 x for all units in the population. Compare:

y = β0 + β1 x + u
E(y|x) = β0 + β1 x

Notice the difference? Refer to Fig.1; can you relate it to Eq.6?



Simple Regression Model

Figure 1: E(y|x) as a linear function of x (this line is called the population regression function)



Estimating β0 & β1

i. Now we address how to estimate β0 and β1 in Eq.1. (An estimator must be a function of the sample.)
ii. In order to do this, we need a sample from the population of interest.
iii. Let i = 1, . . . , n index the observations of a sample of size n on both y and x, where we can now rewrite Eq.1 for each i as:

yi = β0 + β1 xi + ui (7)

where Eq.7 can be any of these:

y1 = β0 + β1 x1 + u1
y2 = β0 + β1 x2 + u2
⋮
yn = β0 + β1 xn + un
Estimating β0 & β1

iv. For example, xi is annual income and yi is annual savings for family i during a particular year, and we have collected data on 15 families, so n = 15.
v. Two key equations in estimating β0 and β1 of Eq.1:

E(u) = 0, holds with an intercept (8)
Cov(x, u) = E(xu) = 0, x and u are not correlated (9)

Eq.8 holds when we have an intercept. For Eq.9, if E(u|x) = E(u), then cov(x, u) and corr(x, u) will equal zero. Why? Since E(u) = 0,

cov(x, u) = E(xu) − E(x)E(u) = E(xu)



Estimating β0 & β1

And, by the law of iterated expectations, we have

E(xu) = E[E(xu|x)]

But
E(xu|x) = xE(u|x)
Thus

E(xu) = E[E(xu|x)]
= E[xE(u|x)]
= E[x0]
=0

we have proved Eq.9.



Method of Moments

i. From Eq.1, u = y − β0 − β1 x, and inserting this into Eq.8 and Eq.9, we have

E(u) = E(y − β0 − β1 x) = 0 (10)
E(xu) = E[x(y − β0 − β1 x)] = 0 (11)

ii. Remember from before (refer to the mathematical statistics notes) that the method of moments estimates the population mean of Y, E(Y) = µ, by its sample counterpart ∑ᵢ₌₁ⁿ Yi /n. Applying this to Eq.10 and Eq.11, we have (the ‘hat’ ˆ denotes an estimate):

n⁻¹ ∑ᵢ₌₁ⁿ (yi − β̂0 − β̂1 xi ) = 0 (12)
n⁻¹ ∑ᵢ₌₁ⁿ xi (yi − β̂0 − β̂1 xi ) = 0 (13)
Method of Moments
iii. Eq.12 can be rewritten as (using properties of summation):

ȳ = β̂0 + β̂1 x̄, or β̂0 = ȳ − β̂1 x̄ (14)

iv. Inserting Eq.14 into Eq.13, we have

∑ᵢ₌₁ⁿ xi [yi − (ȳ − β̂1 x̄) − β̂1 xi ] = 0

v. Finally, after manipulation of the above (refer to your textbook),

β̂1 = ∑ᵢ₌₁ⁿ (xi − x̄)(yi − ȳ) / ∑ᵢ₌₁ⁿ (xi − x̄)² (15)

With ∑ᵢ₌₁ⁿ (xi − x̄)² > 0, there must be variation in x. Using the same x for all observations will not work: no variation. Why?
Method of Moments
Because then (xi − x̄) = 0 for every i! The numerator can happen to be zero, but the denominator must not be. Thus, Eq.14 and Eq.15 are estimators of β0 and β1, respectively, based on the method of moments.
vi. Using simple algebra, Eq.15 can be rewritten as

β̂1 = ρ̂xy · (σ̂y / σ̂x) (16)

where ρ̂xy is the sample correlation between xi and yi, and σ̂y and σ̂x denote the sample standard deviations of y and x, respectively.
vii. If xi and yi are positively correlated in the sample, β̂1 > 0, and vice versa (standard deviations are always positive, so β̂1 takes the sign of ρ̂xy).
viii. Thus, simple regression is an analysis of correlation between two variables, and so one must be careful in inferring causality.
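The estimators in Eq.14 and Eq.15 are easy to compute directly. Below is a minimal Python sketch, not from the slides; the income/savings numbers are made up purely for illustration:

```python
# Method-of-moments estimates of beta0 and beta1 (Eq.14 and Eq.15).
# The income/savings numbers are made up for illustration only.
x = [20.0, 35.0, 50.0, 65.0, 80.0]   # annual income (thousands)
y = [1.5, 2.4, 4.1, 5.0, 6.8]        # annual savings (thousands)

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Eq.15: slope = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
# Eq.14: intercept = ybar - b1 * xbar
b0 = ybar - b1 * xbar

print(b0, b1)
```

On this toy sample the formulas give β̂1 = 0.088 and β̂0 = −0.44; a different sample would of course give different estimates.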
Ordinary Least Squares

i. We can also estimate β0 and β1 in Eq.1 using OLS.
ii. Firstly, the residual for observation i is the difference between the actual yi and its fitted value ŷi, or

ûi = yi − ŷi = yi − β̂0 − β̂1 xi (17)

where the fitted value ŷi equals β̂0 + β̂1 xi.


iii. The idea of OLS is to minimize the sum of squared residuals with respect to β̂0 and β̂1, or

min ∑ᵢ₌₁ⁿ ûᵢ² w.r.t. β̂0, β̂1 (18)



Ordinary Least Squares

Thus, we have

∂/∂β̂0 ∑ᵢ₌₁ⁿ (yi − β̂0 − β̂1 xi )² = 0 (19)
∂/∂β̂1 ∑ᵢ₌₁ⁿ (yi − β̂0 − β̂1 xi )² = 0 (20)

By solving Eq.19 and Eq.20, we have

−2 ∑ᵢ₌₁ⁿ (yi − β̂0 − β̂1 xi ) = 0 (21)
−2 ∑ᵢ₌₁ⁿ xi (yi − β̂0 − β̂1 xi ) = 0 (22)



Ordinary Least Squares

iv. From Eq.21, solving for β̂0 gives exactly Eq.14. Then inserting this solution into Eq.22, with some algebraic manipulation, we end up with Eq.15.
v. Thus, the method of moments and OLS produce the same estimators for β0 and β1, namely Eq.14 and Eq.15, respectively.
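One way to see what the first-order conditions Eq.21 and Eq.22 mean is to check them at the closed-form estimates: the residuals sum to zero and are orthogonal to x in the sample. A quick Python check with made-up data:

```python
# Check the OLS first-order conditions Eq.21 and Eq.22: at the closed-form
# estimates, the residuals sum to zero and are orthogonal to x.
# Toy data, made up for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)       # Eq.15
b0 = ybar - b1 * xbar                          # Eq.14

u_hat = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]  # residuals, Eq.17

print(sum(u_hat))                                    # Eq.21: ~ 0
print(sum(xi * ui for xi, ui in zip(x, u_hat)))      # Eq.22: ~ 0
```

Both sums come out as zero up to floating-point noise, confirming that the closed-form estimates solve Eq.21 and Eq.22.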



Sample Regression Function (SRF)

i. Previously, we discussed the PRF in Eq.6, or

E(y|x) = β0 + β1 x

which is unknown and fixed in the population (β0 and β1 are unknown and fixed)!
ii. With estimates of β0 and β1 from Eq.14 and Eq.15 (either by MM or OLS), we have

ŷ = β̂0 + β̂1 x (23)

the sample regression function (SRF), which is not fixed. Since the SRF is obtained from a given sample of data, a new sample will give different estimates β̂0 and β̂1. Refer to Fig.2.
Sample Regression Function (SRF)

Figure 2: Fitted values and residuals



Example...

i. Say you are interested in studying whether return on equity (roe) influences CEO salary.
ii. The model you have in mind is (you assume it is linear)

salary = β0 + β1 roe

iii. While the econometric model is

salary = β0 + β1 roe + u



Example...

iv. And the PRF is

E(salary|roe) = β0 + β1 roe

Why do we need the PRF, even though it is unknown? Refer to Fig.1, which shows the PRF line. The line is unknown because it belongs to the population, and it is fixed. Say roe = 20%; the PRF gives E(salary|roe = 20%).
v. And the SRF is the estimate of the PRF, or

ŝalary = β̂0 + β̂1 roe

vi. Based on n = 209, we have

ŝalary = 963.191 + 18.501 roe
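The fitted SRF can be used directly for prediction. A minimal Python sketch using the estimates reported above (the evaluation at roe = 20% follows the earlier PRF discussion; salary is in thousands of dollars):

```python
# Prediction from the fitted SRF in the text:
# salary_hat = 963.191 + 18.501 * roe, with salary in thousands of
# dollars and roe in percent (n = 209 sample from the text).
b0_hat, b1_hat = 963.191, 18.501

def salary_hat(roe):
    return b0_hat + b1_hat * roe

print(salary_hat(20.0))   # predicted salary (thousands) when roe = 20%
```

For roe = 20%, the SRF predicts a salary of about 1,333.211 thousand dollars, i.e., roughly $1.33 million.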



Example...

Figure 3: The OLS regression line ŝalary = 963.191 + 18.501 roe and the (unknown) PRF



Goodness of Fit

i. Define:
a. The total sum of squares (SST) as:

SST ≡ ∑ᵢ₌₁ⁿ (yi − ȳ)² (24)

which measures the total sample variation in the yi: how spread out the yi are in the sample.
b. The explained sum of squares (SSE) as:

SSE ≡ ∑ᵢ₌₁ⁿ (ŷi − ȳ)² (25)

which measures the total sample variation in the ŷi.



Goodness of Fit

c. The residual sum of squares (SSR) as:

SSR ≡ ∑ᵢ₌₁ⁿ ûᵢ² (26)

which measures the total sample variation in the ûi.
ii. The total variation in y can always be expressed as the sum of the explained variation SSE and the unexplained variation SSR, or (see the notes for the proof; different textbooks use different abbreviations)

SST = SSE + SSR (27)

iii. Dividing Eq.27 by SST gives

1 = SSE/SST + SSR/SST
Goodness of Fit

iv. The R-squared of the regression, or coefficient of determination, is defined as:

R² ≡ SSE/SST = 1 − SSR/SST (28)

v. R² is the ratio of the explained variation to the total variation; thus it is interpreted as the fraction of the sample variation in y that is explained by x.
vi. The value of R² is always between zero and one, because SSE ≤ SST. We normally multiply R² by 100 when interpreting it, putting it in percentage form.
vii. In cross-sectional analysis, a low R² is common.
viii. Beware that using R² as the main gauge of success for an econometric analysis can lead to trouble. A low R² does not mean that the regression result is useless.
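The decomposition in Eq.27 and the R² of Eq.28 can be verified numerically. A Python sketch with made-up data:

```python
# SST, SSE, SSR and R-squared (Eq.24-28), plus the identity SST = SSE + SSR
# (Eq.27). The data are made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.3, 3.8, 5.1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
y_hat = [b0 + b1 * xi for xi in x]                     # fitted values

sst = sum((yi - ybar) ** 2 for yi in y)                # Eq.24
sse = sum((yh - ybar) ** 2 for yh in y_hat)            # Eq.25
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # Eq.26
r2 = sse / sst                                         # Eq.28

print(r2, sst, sse + ssr)   # the last two agree, confirming Eq.27
```

The two forms of R² in Eq.28, SSE/SST and 1 − SSR/SST, agree exactly because of the identity Eq.27.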
Goodness of Fit

ix. A word of caution! All the discussion above assumes that we have an intercept, as in Eq.1. More on this later.



Changing Units of Measurement

i. In the CEO salary regression, we obtained the following:

ŝalary = 963.191 + 18.501 roe
n = 209, R² = 0.0132 (29)

where salary is in thousands of dollars and roe is in percentage form.
ii. If we convert salary from thousands of dollars to dollars (we multiply the salary data by 1,000) and call it salardol, we have

ŝalardol = 963,191 + 18,501 roe (30)


Changing Units of Measurement

iii. So, when the dependent variable is multiplied (divided) by a constant c, the intercept and slope estimates are both multiplied (divided) by c, or

(ŷ × c) = (β̂0 × c) + (β̂1 × c) × x (31)

iv. If we convert roe to decimal form (we divide the roe data by 100) and call it roedec, we have

ŝalary = 963.191 + 1,850.1 roedec (32)

v. So when the independent variable is divided (multiplied) by a constant c, the slope estimate is multiplied (divided) by c:

ŷ = β̂0 + (β̂1 × c)(x/c) (33)

vi. The R² is not affected by changes in the units of y or x.
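The scaling rules in Eq.31 and Eq.33 are easy to confirm by re-running OLS on rescaled data. A Python sketch (the roe/salary numbers here are made up, not the n = 209 sample from the text):

```python
# Rescaling and OLS estimates (Eq.31 and Eq.33): multiplying y by c scales
# both coefficients by c; dividing x by c multiplies the slope by c.
def ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
         / sum((a - xbar) ** 2 for a in x)
    return ybar - b1 * xbar, b1

roe = [10.0, 15.0, 20.0, 25.0]            # percent; made-up data
salary = [900.0, 1100.0, 1300.0, 1600.0]  # thousands of dollars; made-up

b0, b1 = ols(roe, salary)
b0_dol, b1_dol = ols(roe, [s * 1000 for s in salary])  # y in dollars
b0_dec, b1_dec = ols([r / 100 for r in roe], salary)   # x in decimals

print(b0_dol / b0, b1_dol / b1)   # both 1000: Eq.31
print(b0_dec - b0, b1_dec / b1)   # intercept unchanged, slope x100: Eq.33
```

The intercept is untouched by rescaling x, exactly as Eq.33 says.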
Incorporating Nonlinearities

i. Why, in applied work in the social sciences, do we encounter regression equations where the dependent variable appears in logarithmic form?
ii. Refer to the wage equation below:

ŵage = −0.90 + 0.54 educ (34)

where wage is in dollars per hour and educ denotes years of schooling. In Eq.34, an additional year of education is predicted to increase the hourly wage by 54 cents, no matter the level of education. This is not reasonable, since we expect an extra year of schooling at higher education levels to raise the predicted wage by more than at lower levels (the constant effect arises from the linearity of Eq.34).



Incorporating Nonlinearities

iii. A more realistic model is:

log(wage) = β0 + β1 educ + u (35)

where log(·) is the natural logarithm. In Eq.35, with ∆u = 0, we have

∆ log(wage) = β1 × ∆educ (36)

The key relationship is this approximation:

log(y1 ) − log(y0 ) ≈ (y1 − y0 )/y0 = ∆y/y0 , or
log(wage1 ) − log(wage0 ) ≈ ∆wage/wage0 (37)
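How good is the approximation in Eq.37? A short Python check comparing the exact log change with ∆y/y0 for small and large percentage changes:

```python
import math

# Quality of the log approximation in Eq.37: log(y1) - log(y0) ~ (y1 - y0)/y0.
# Accurate for small percentage changes, noticeably off for large ones.
y0 = 10.0
for pct in (0.01, 0.05, 0.30):           # 1%, 5%, 30% increases in y
    y1 = y0 * (1 + pct)
    exact = math.log(y1) - math.log(y0)  # exact log change
    approx = (y1 - y0) / y0              # the approximation; equals pct
    print(f"{pct:.2f}: exact={exact:.5f} approx={approx:.5f}")
```

For a 1% change the two agree to about four decimal places; for a 30% change the approximation overstates the log change by roughly 0.04, which is why the percentage interpretation below is best used for small changes.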



Incorporating Nonlinearities

where the approximation works well for small changes in y. Substituting Eq.37 into Eq.36 and multiplying both sides by 100, we have (dropping the subscript)

(∆wage/wage) × 100 = (β1 × 100) ∆educ (38)

where the LHS of Eq.38 is the percentage change in wage. The estimated version of Eq.35 is

log(wage)^ = 0.584 + 0.083 educ (39)

where each year of education (∆educ = 1) increases wage by a constant percentage of 8.3% (0.083 × 100). Since the percentage change in wage is the same for each additional year of educ, the dollar change in wage for an extra year of educ grows as education increases; Eq.35 implies an increasing return to
Incorporating Nonlinearities

education. This can easily be seen by exponentiating Eq.35, which gives

wage = exp(β0 + β1 educ + u) (40)

With u = 0, the graph of Eq.40 is shown in Fig.4.

Figure 4: wage = exp(β0 + β1 educ), with β1 > 0



Incorporating Nonlinearities

Those with higher education are predicted to see much larger wage increases than those with lower education.
iv. Below is a summary of functional forms involving logarithms (more on this later).

Model D.V. I.V. Interpretation of β1


Level-level y x ∆y = β1 ∆x
Level-log y log(x) ∆y = (β1 /100)%∆x
Log-level (semi-elasticity) log(y) x %∆y = (100β1 )∆x
Log-log (elasticity) log(y) log(x) %∆y = β1 %∆x
Table 2: Summary of functional forms involving logarithms



Meaning of ’Linear’ Regression

i. The simple linear regression model is

y = β0 + β1 x + u

where ‘linear’ refers to linearity in the parameters β0 and β1.
ii. There are no restrictions on how y and x relate to the original explained and explanatory variables of interest. For example, we can use simple regression to estimate a model such as:

cons = β0 + β1 √inc + u



Meaning of ’Linear’ Regression

iii. The following model is not linear in the parameters, so we cannot rely on linear regression:

cons = 1/(β0 + β1 inc) + u

It requires a nonlinear regression model.



Unbiasedness of OLS

i. Previously, we claimed that the key assumption of simple regression analysis is E(u|x) = 0.
ii. Remember, β̂0 and β̂1 are estimators of the population parameters β0 and β1, respectively, of Eq.1.
iii. The focus of this section is to study the properties of the distributions of β̂0 and β̂1 over different random samples from the population.
iv. We first establish an important property of the OLS estimators, unbiasedness, which depends on a simple set of assumptions (SLR stands for simple linear regression).



Unbiasedness of OLS

Assumption SLR.1: Linear in Parameters


In the population model, the dependent variable y is related to
independent variable x, and the error (or disturbance), u as

y = β0 + β1 x + u (41)

where β0 and β1 are the population intercept and slope


parameters, respectively.

v. Eq.41 is not restrictive; by choosing y and x appropriately (e.g., taking logs), we can obtain interesting nonlinear relationships within the linear model of Eq.41.

Assumption SLR.2: Random Sampling


We have a random sample of size n, {(xi , yi ) : i = 1, 2, . . . , n},
following the population model in Eq.41.
Unbiasedness of OLS

vi. Failure of random sampling is normally related to time series analysis and sample-selection problems. Not all cross-sectional samples can be viewed as outcomes of random samples, but many can.
vii. We can write Eq.41 in terms of the random sample as

yi = β0 + β1 xi + ui , i = 1, 2, . . . , n, (42)

where ui is the error or disturbance for observation i. Refer to Fig.5:



Unbiasedness of OLS

Figure 5: Graph of yi = β0 + β1 xi + ui



Unbiasedness of OLS

Assumption SLR.3: Sample Variation in X


The sample outcomes on x, namely, {xi , i = 1, . . . , n}, are not all
the same value.
viii. This is easily fulfilled unless the variation of x in the population is minimal or the sample size is small.
ix. Assumption SLR.3 fails if the sample standard deviation of the xi is zero.

Assumption SLR.4: Zero Conditional Mean
The error u has an expected value of zero given any value of the explanatory variable. In other words,

E(u|x) = 0



Unbiasedness of OLS
x. For a random sample, assumption SLR.4 implies that E(ui |xi ) = 0 for all i = 1, 2, . . . , n.
xi. In statistical derivations, conditioning on the sample values of the independent variable is the same as treating the xi as fixed in repeated samples. Say we choose n sample values x1 , x2 , . . . , xn ; then we obtain a sample on y. Next, another sample of y is obtained using the same values x1 , x2 , . . . , xn . Then another y is obtained, again using the same values x1 , x2 , . . . , xn , and so on.
xii. This fixed-in-repeated-samples scenario is not very realistic in non-experimental contexts. For example, in studying the relationship between consumption (y) and income (x), it would require choosing values of income ahead of time and then sampling individuals with those particular income levels. In reality, individual income and consumption are both sampled randomly.
Unbiasedness of OLS

Fixed-in-repeated-samples assumption
The danger of the fixed-in-repeated-samples assumption is that it always implies that ui and xi are independent.

xiii. To show that the OLS estimators are unbiased, the estimator in Eq.15 can be rewritten as (refer to the textbook for the detailed derivation)

β̂1 = β1 + (1/SSTx ) ∑ᵢ₌₁ⁿ di ui (43)

where:

SSTx = ∑ᵢ₌₁ⁿ (xi − x̄)² (44)
di = xi − x̄ (45)
Unbiasedness of OLS

Taking the expected value of Eq.43 (the derivation implicitly conditions on {x1 , x2 , . . . , xn }; as a result, SSTx and the di are nonrandom):

E(β̂1 ) = β1 + E[(1/SSTx ) ∑ᵢ₌₁ⁿ di ui ]
= β1 + (1/SSTx ) ∑ᵢ₌₁ⁿ E(di ui )
= β1 + (1/SSTx ) ∑ᵢ₌₁ⁿ di E(ui )   (46)
= β1 + (1/SSTx ) ∑ᵢ₌₁ⁿ di · 0
= β1
Unbiasedness of OLS

For β̂0 , Eq.14 can be rewritten as (average Eq.42 across i to get ȳ = β0 + β1 x̄ + ū, and insert this into Eq.14):

β̂0 = ȳ − β̂1 x̄
= β0 + β1 x̄ + ū − β̂1 x̄   (47)
= β0 + (β1 − β̂1 )x̄ + ū

xiv. Taking the expected value of Eq.47,

E(β̂0 ) = β0 + E[(β1 − β̂1 )x̄] + E(ū)
= β0 + E[(β1 − β̂1 )]x̄   (48)
= β0

since E(ū) = 0 by assumptions SLR.2 and SLR.4, and E(β̂1 ) = β1 , which implies that E[(β1 − β̂1 )] = 0.
Unbiasedness of OLS

Unbiasedness of the OLS Estimators
When assumptions SLR.1 through SLR.4 hold, β̂0 (Eq.14) and β̂1 (Eq.15) are unbiased estimators of β0 and β1 in Eq.1.

xv. Remember that unbiasedness is a feature of the sampling distributions of β̂0 and β̂1.
xvi. If the sample we obtain is somehow ‘typical’, then our estimates should be ‘near’ the population values. Unfortunately, we might be unlucky: the point estimate could be far from β1, and we can never know for sure whether this is the case.
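Unbiasedness can be illustrated by simulation: draw many random samples from a population with known β1 and average the resulting β̂1. A Python sketch (the population values β0 = 1.0, β1 = 0.5 and the error distribution are arbitrary choices for the demo, not from the slides):

```python
import random

# Monte Carlo illustration of unbiasedness: the average of beta1_hat over
# many random samples is close to the true beta1. Population values and
# the error distribution are arbitrary choices for this demo.
random.seed(0)
beta0, beta1 = 1.0, 0.5
reps, n = 2000, 50

est = []
for _ in range(reps):
    x = [random.uniform(0.0, 10.0) for _ in range(n)]              # SLR.3
    y = [beta0 + beta1 * xi + random.gauss(0.0, 1.0) for xi in x]  # SLR.1, SLR.4
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
         / sum((xi - xbar) ** 2 for xi in x)                       # Eq.15
    est.append(b1)

print(sum(est) / reps)   # close to beta1 = 0.5
```

Individual estimates vary from sample to sample, as point xvi warns, but their average across samples is very close to β1.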



Variances of the OLS Estimators

i. It is important to know how far we can expect β̂1 to be from β1 on average.
ii. The easiest measure of spread in the distribution of β̂1 (and β̂0) to work with is the variance or its square root, the standard deviation.

Assumption SLR.5: Homoskedasticity
The error u has the same variance given any value of the explanatory variable. In other words,

Var(u|x) = σ²

iii. The homoskedasticity assumption plays no role in showing that β̂0 and β̂1 are unbiased.
Variances of the OLS Estimators

iv. σ² is often called the error variance or disturbance variance (see the textbook, pg. 45, for the proof).
v. The square root of σ²,

√σ² = σ (49)

is the standard deviation of the error. A larger σ means that the distribution of the unobservables affecting y is more spread out.



Variances of the OLS Estimators

vi. Assumptions SLR.4 and SLR.5 can also be stated as

E(y|x) = β0 + β1 x, (SLR.4)
Var(y|x) = σ². (SLR.5)

In other words, the conditional expectation of y given x is linear, but the variance of y given x is constant. This situation is graphed in Fig.6:



Variances of the OLS Estimators

Figure 6: The simple regression model under homoskedasticity

vii. When Var(u|x) depends on x (is not constant), the error term exhibits heteroskedasticity (nonconstant variance).
Variances of the OLS Estimators

viii. Because Var(u|x) = Var(y|x), heteroskedasticity is present whenever Var(y|x) is a function of x. In Fig.7 we have a heteroskedasticity problem, where the variance increases as the educ level increases.

Figure 7: Var(wage|educ) increasing with educ



Sampling Variances of the OLS Estimators

i. Under assumptions SLR.1 through SLR.5,

Var(β̂0 ) = σ² n⁻¹ ∑ᵢ₌₁ⁿ xᵢ² / ∑ᵢ₌₁ⁿ (xi − x̄)² = σ² n⁻¹ ∑ᵢ₌₁ⁿ xᵢ² / SSTx (50)
Var(β̂1 ) = σ² / ∑ᵢ₌₁ⁿ (xi − x̄)² = σ² / SSTx (51)

where these are conditional on the sample values {x1 , . . . , xn } (for the proof, refer to the textbook).
ii. Eq.50 and Eq.51 are invalid in the presence of heteroskedasticity.



Sampling Variances of the OLS Estimators

iii. Most of the time we are interested in Eq.51. It depends on the error variance σ² and on the total variation in {x1 , . . . , xn }. The larger the error variance, the larger is Var(β̂1). This makes sense, since more variation in the unobservables affecting y makes it more difficult to estimate β1 precisely. More variability in the independent variable xi, by contrast, is preferred, and the total variation in the xi increases with the sample size.



Estimating the Error Variance

i. The problem with Eq.50 and Eq.51 is that σ² is unknown. So we use data to estimate σ², which then allows us to estimate Var(β̂0) and Var(β̂1).
ii. To estimate σ², we use the property of the conditional variance, which states that

Var(u|x) = E(u²|x) − [E(u|x)]² = E(u²|x) = σ² (52)

And using the law of total variance,

Var(u) = E[Var(u|x)] + Var[E(u|x)]
= E[σ²] + Var[0]   (53)
= σ²


Estimating the Error Variance

Thus

Var(u|x) = Var(u) = σ² (54)

And

Var(u) = E[u − E(u)]²
= E[u²]   (55)
= σ²

iii. So, an unbiased estimator for σ² is n⁻¹ ∑ᵢ₌₁ⁿ uᵢ².
iv. However, we do not observe the errors ui, but we do have estimates of them: the OLS residuals ûi. If we replace the errors with the residuals, we have

n⁻¹ ∑ᵢ₌₁ⁿ ûᵢ² = SSR/n (56)
Estimating the Error Variance

v. However, Eq.56 is still biased, because it fails to account for two restrictions on the residuals (refer to your textbook). The unbiased estimator of σ² is

σ̂² = (1/(n − 2)) ∑ᵢ₌₁ⁿ ûᵢ² = SSR/(n − 2) (57)

where n − 2 is the degrees of freedom: 2 is deducted from n because of the two restrictions mentioned earlier.
vi. If we replace σ² in Eq.50 and Eq.51 with Eq.57, we have unbiased estimators of Var(β̂0) and Var(β̂1).



Estimating the Error Variance

vii. Later on, we will need estimators of the standard deviations of β̂0 and β̂1, and this requires estimating σ. Its natural estimator is

σ̂ = √σ̂² (58)

called the standard error of the regression (SER) (also known as the standard error of the estimate or the root mean squared error). The estimator in Eq.58 is a biased but consistent estimator of σ.
viii. Since our focus is on β̂1, the natural estimator of the standard deviation of β̂1, sd(β̂1), obtained by taking the square root of Eq.51, is

se(β̂1 ) = σ̂/√SSTx = σ̂/[∑ᵢ₌₁ⁿ (xi − x̄)²]^(1/2) (59)

called the standard error of β̂1.
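Putting Eq.57 and Eq.59 together, σ̂² and se(β̂1) follow mechanically from the residuals. A Python sketch on made-up data:

```python
# Error-variance estimate sigma_hat^2 = SSR/(n-2) (Eq.57) and the standard
# error of beta1_hat (Eq.59), on a small made-up sample.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 2.3, 2.8, 4.2, 4.9, 6.3]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sst_x = sum((xi - xbar) ** 2 for xi in x)                     # Eq.44
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
b0 = ybar - b1 * xbar
ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))   # sum of squared residuals

sigma2_hat = ssr / (n - 2)                 # Eq.57: n - 2 degrees of freedom
se_b1 = sigma2_hat ** 0.5 / sst_x ** 0.5   # Eq.59: standard error of beta1_hat

print(sigma2_hat, se_b1)
```

Note the n − 2 divisor in `sigma2_hat`, reflecting the two restrictions Eq.21 and Eq.22 impose on the residuals.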


Regression through Origin

i. A regression through the origin:

ỹ = β̃1 x (60)

Eq.60 passes through the point x = 0, ỹ = 0, the origin.
ii. The slope estimate of Eq.60 (refer to the textbook for the derivation) is

β̃1 = ∑ᵢ₌₁ⁿ xi yi / ∑ᵢ₌₁ⁿ xᵢ² (61)



Regression through Origin

iii. When regression through the origin is used, one must be careful in interpreting the R² typically reported by software (unless stated otherwise), because this R² is obtained without removing the sample average of {yi : i = 1, . . . , n} when computing SST. In other words,

R² = 1 − ∑ᵢ₌₁ⁿ (yi − β̃1 xi )² / ∑ᵢ₌₁ⁿ yᵢ² (62)

where the denominator acts as if we knew the average value of y in the population to be zero.
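Eq.61 and the origin-regression R² of Eq.62 in a short Python sketch (the data are made up for illustration):

```python
# Regression through the origin: slope from Eq.61 and the uncentered
# R-squared of Eq.62. Made-up data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.2, 3.9, 6.1, 8.3]

b1_tilde = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)  # Eq.61
ssr0 = sum((yi - b1_tilde * xi) ** 2 for xi, yi in zip(x, y))
r2_origin = 1 - ssr0 / sum(yi ** 2 for yi in y)   # Eq.62: y is NOT demeaned

print(b1_tilde, r2_origin)
```

Because the denominator in Eq.62 uses ∑yᵢ² rather than ∑(yi − ȳ)², this uncentered R² is not comparable to the R² of Eq.28 from a regression with an intercept.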

