
Linear Regression with One Regressor



Outline

1 The Linear Regression Model

2 The OLS Estimation Method for a Linear Regression Model

3 The Algebraic Properties of the OLS Estimator

4 Measures of Fit

5 The Least Squares Assumptions

6 Sampling Distribution of the OLS Estimators



The Linear Regression Model

Definition of regress in Merriam-Webster’s dictionary

Merriam-Webster gives the following definition of the word "regress":


1 An act or the privilege of going or coming back
2 Movement backward to a previous and especially worse or more
primitive state or condition
3 The act of reasoning backward



The Linear Regression Model

The meaning of regression in statistics?

In statistics, regression analysis focuses on the conditional mean of the
dependent variable given the independent variables, which is a function
of the values of the independent variables.
A very simple functional form for a conditional expectation is a linear
function. That is, we can model the conditional mean as follows,

E(Y | X = x) = f (x) = β0 + β1 x (1)


The above equation is a simple linear regression function.



The Linear Regression Model

Research question:

Let’s introduce regression analysis with an application: test scores
versus class sizes in California school districts.
Can reducing class sizes increase students’ test scores?

How can we answer this question?



The Linear Regression Model

Randomized controlled experiment

Randomly choose 42 students and divide them into two classes, one
having 20 students and the other having 22.
They are taught the same subject by the same teachers.
Randomization ensures that the difference in class size is the only
systematic factor affecting the test scores of the two classes.



The Linear Regression Model

Compute conditional means

Compute the expected values of test scores, given the different class
sizes.

E(TestScore|ClassSize = 20)
E(TestScore|ClassSize = 22)

The effect of class size on test scores is

E(TestScore|ClassSize = 20) − E(TestScore|ClassSize = 22)
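With random assignment, the difference in the two classes' sample average scores estimates this effect. A minimal simulated Python sketch (all numbers are hypothetical, not from real data):

```python
import numpy as np

# simulate the randomized experiment: 42 students split into two classes
rng = np.random.default_rng(42)
scores_small = 660 + rng.normal(0, 15, 20)  # class with 20 students
scores_large = 650 + rng.normal(0, 15, 22)  # class with 22 students

# estimate E(TestScore|ClassSize = 20) - E(TestScore|ClassSize = 22)
effect = scores_small.mean() - scores_large.mean()
print(effect)  # close to the simulated true effect of 10 points
```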



The Linear Regression Model

The population regression function for test scores on class sizes

We use a linear regression function to describe the relationship
between test scores and class sizes.
The population regression function or the population regression line

E(TestScore|ClassSize) = β0 + β1 ClassSize (2)



The Linear Regression Model

The simple linear regression model for test scores on class sizes

Test scores also depend on factors other than class size. We can lump
all these other factors into a single term and set up a simple linear
regression model as follows,

TestScore = β0 + β1 ClassSize + OtherFactors (3)


If we assume E(OtherFactors|ClassSize) = 0, then the simple linear
regression model becomes the population regression line.



The Linear Regression Model

A distinction between the population regression function and the population regression model

A population regression function


It’s a deterministic relation between class size and the expectation of
test scores.
However, we cannot compute the exact value of the test score of a
particular observation.
A population regression model
It’s a complete description of a data generating process (DGP).
The association between test scores and class size is not deterministic;
it depends on the values of other factors.



The Linear Regression Model

An interpretation of the population regression model

Now we have set up the simple linear regression model,

TestScore = β0 + β1 ClassSize + OtherFactors

What do β1 and β0 represent in the model?



The Linear Regression Model

Interpret β1

Let ∆TestScore and ∆ClassSize denote the respective changes in TestScore and ClassSize.


Holding other factors constant, we have

∆TestScore = β1 ∆ClassSize

where β0 is removed because it is a constant.


Then, we get

β1 = ∆TestScore / ∆ClassSize
That is, β1 measures the change in the test score resulting from a
one-unit change in the class size.
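For example, using the estimate β̂1 = −2.28 obtained later in this lecture, reducing the class size by two students is predicted to change the test score by ∆TestScore = (−2.28) × (−2) = 4.56 points, holding other factors constant.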



The Linear Regression Model

Marginal effect

When TestScore and ClassSize are continuous variables, we can write β1 as

β1 = dTestScore / dClassSize

We often call β1 the marginal effect of the class size on the test
score.



The Linear Regression Model

Holding other things constant

The phrase "holding other factors constant" is important. Without
it, we cannot disentangle the effect of class sizes on test scores from
the effects of other factors.
"Holding other things constant" is often expressed by the Latin phrase
ceteris paribus.



The Linear Regression Model

Interpret β0

β0 is the intercept in the model.


Sometimes it has a real-world meaning, but sometimes it merely
represents an intercept.
In the regression model of test scores on class sizes, β0 is the test score
when the class size and all other factors are zero, which is obviously
nonsensical.



The Linear Regression Model

The general linear regression model

Consider two random variables Y and X. For both, there are n
observations, so that each observation i = 1, 2, . . . , n is associated with
a pair of values (Xi , Yi ).
Then a simple linear regression model that associates Y with X is

Yi = β0 + β1 Xi + ui , for i = 1, . . . , n (4)
Yi is called the dependent variable, the regressand, or the LHS
(left-hand side) variable.
Xi is called the independent variable, the regressor, or the RHS
(right-hand side) variable.



The Linear Regression Model

The general linear regression model (cont’d)

β0 is the intercept, or the constant term. It may have an economic
meaning or a merely mathematical one; it determines the level of the
regression line, i.e., the point of intersection with the Y axis.
β1 is the slope of the population regression line. Since β1 = dYi /dXi ,
it is the marginal effect of X on Y. That is, holding other things
constant, a one-unit change in X will make Y change by β1 units.
ui is the error term. ui = Yi − (β0 + β1 Xi ) incorporates all the other
factors besides X that determine the value of Y.
β0 + β1 Xi represents the population regression function (or the
population regression line).



The Linear Regression Model

A graphical illustration of a linear regression model



The OLS Estimation Method for a Linear Regression Model

The intuition for the OLS and minimization

We use the ordinary least squares (OLS) estimation method to
estimate the simple linear regression model.

Yi = β0 + β1 Xi + ui , for i = 1, . . . , n



The OLS Estimation Method for a Linear Regression Model

Ordinary

It means that the OLS estimator is a very basic method, from which
we may derive some variations.
Other least squares estimators include the weighted least squares (WLS)
and the generalized least squares (GLS) estimators.



The OLS Estimation Method for a Linear Regression Model

Least

It means that the OLS estimator tries to minimize something. The
"something" is the mistakes we make when we try to guess (estimate)
the values of the parameters in the model.
If our guesses for β0 and β1 are b0 and b1 , then the mistake of our
guess for observation i is

ûi = Yi − b0 − b1 Xi



The OLS Estimation Method for a Linear Regression Model

Squares

It represents the actual quantity that we minimize. The OLS estimator
does not attempt to minimize each ûi individually.
We minimize the sum of the squared mistakes,

∑_{i=1}^n ûi²

Squaring avoids possible offsetting between positive and negative
values of ûi in ∑_i ûi .



The OLS Estimation Method for a Linear Regression Model

The OLS estimators for β0 and β1

Let b0 and b1 be some estimators of β0 and β1 , respectively.


The OLS estimators are the solution to the following minimization
problem:

min_{b0 ,b1} S(b0 , b1 ) = ∑_{i=1}^n ûi² = ∑_{i=1}^n (Yi − b0 − b1 Xi )² (5)

where S(b0 , b1 ) is a function of b0 and b1 .



The OLS Estimation Method for a Linear Regression Model

The first order conditions

Evaluated at the optimal solution (β̂0 , β̂1 ), the FOCs are

∂S/∂b0 (β̂0 , β̂1 ) = ∑_{i=1}^n (−2)(Yi − β̂0 − β̂1 Xi ) = 0 (6)

∂S/∂b1 (β̂0 , β̂1 ) = ∑_{i=1}^n (−2)(Yi − β̂0 − β̂1 Xi ) Xi = 0 (7)



The OLS Estimation Method for a Linear Regression Model

Get the OLS estimator β̂0

From the first condition, we have


∑_{i=1}^n Yi − n β̂0 − β̂1 ∑_{i=1}^n Xi = 0

β̂0 = (1/n) ∑_{i=1}^n Yi − β̂1 (1/n) ∑_{i=1}^n Xi = Ȳ − β̂1 X̄ (8)



The OLS Estimation Method for a Linear Regression Model

Get the OLS estimator β̂1

From the second condition, we have


∑_{i=1}^n Xi Yi − β̂0 ∑_{i=1}^n Xi − β̂1 ∑_{i=1}^n Xi² = 0

Substituting β̂0 = Ȳ − β̂1 X̄ from (8),

∑_{i=1}^n Xi Yi − (1/n) ∑_{i=1}^n Xi ∑_{i=1}^n Yi + β̂1 (1/n) (∑_{i=1}^n Xi )² − β̂1 ∑_{i=1}^n Xi² = 0

β̂1 = [n ∑_{i=1}^n Xi Yi − ∑_{i=1}^n Xi ∑_{i=1}^n Yi ] / [n ∑_{i=1}^n Xi² − (∑_{i=1}^n Xi )²] (9)



The OLS Estimation Method for a Linear Regression Model

A trick of collecting terms

∑_i (Xi − X̄)(Yi − Ȳ) = ∑_i Xi Yi − X̄ ∑_i Yi − Ȳ ∑_i Xi + n X̄ Ȳ
= ∑_i Xi Yi − 2n X̄ Ȳ + n X̄ Ȳ
= ∑_i Xi Yi − n X̄ Ȳ
= (1/n) (n ∑_i Xi Yi − ∑_i Xi ∑_i Yi )

Similarly, we can show that ∑_i (Xi − X̄)² = (1/n) [n ∑_i Xi² − (∑_i Xi )²].



The OLS Estimation Method for a Linear Regression Model

Concise expressions of β̂1

Collecting terms in the expression for β̂1 , we have

β̂1 = ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ) / ∑_{i=1}^n (Xi − X̄)²

The sample covariance of X and Y is
s_XY = (1/(n−1)) ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ)
The sample variance of X is s_X² = (1/(n−1)) ∑_{i=1}^n (Xi − X̄)²
β̂1 can also be written as

β̂1 = s_XY / s_X²



The OLS Estimation Method for a Linear Regression Model

Summary of the OLS estimators

In sum, the OLS estimators for β0 and β1 are

β̂1 = ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ) / ∑_{i=1}^n (Xi − X̄)² = s_XY / s_X² (10)

β̂0 = Ȳ − β̂1 X̄ (11)
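As an illustration, here is a minimal Python sketch of equations (10) and (11), assuming the data sit in NumPy arrays x and y (the simulated numbers below are hypothetical):

```python
import numpy as np

def ols_fit(x, y):
    # closed-form OLS estimates from equations (10) and (11)
    x_bar, y_bar = x.mean(), y.mean()
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# hypothetical simulated data: test scores falling with class size
rng = np.random.default_rng(0)
x = rng.uniform(15, 25, size=100)          # class sizes
y = 700 - 2 * x + rng.normal(0, 10, 100)   # test scores plus noise
b0, b1 = ols_fit(x, y)
print(b0, b1)  # estimates should land near 700 and -2
```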



The OLS Estimation Method for a Linear Regression Model

The predicted values, residuals, and the sample regression line

Ŷi = β̂0 + β̂1 Xi

The predicted values: Ŷi for i = 1, . . . , n


The residuals: ûi = Yi − Ŷi for i = 1, . . . , n
The sample regression line: β̂0 + β̂1 Xi
The sample average point (X̄, Ȳ) is always on the sample regression
line because

Ȳ = β̂0 + β̂1 X̄



The OLS Estimation Method for a Linear Regression Model

A comparison between the population regression model and the sample counterparts

                      Population               Sample
Regression functions  β0 + β1 Xi               β̂0 + β̂1 Xi
Parameters            β0 , β1                  β̂0 , β̂1
Errors vs residuals   ui                       ûi
The regression model  Yi = β0 + β1 Xi + ui     Yi = β̂0 + β̂1 Xi + ûi



The OLS Estimation Method for a Linear Regression Model

The OLS estimates of the relationship between test scores and the student-teacher ratio

TestScore = β0 + β1 ClassSize + OtherFactors

Let’s first do some simple exploratory analysis before the regression
analysis.



The OLS Estimation Method for a Linear Regression Model

Basic summary statistics

Some commonly used summary statistics are computed, including the
mean, standard deviation, median, minimum, maximum, and quantiles
(percentiles), etc.

Table: Summary of the distributions of student-teacher ratios and test scores

            Average   Std. dev.   25%      50%      75%
TestScore   654.16    19.05       640.05   654.45   666.66
STR         19.64     1.89        18.58    19.72    20.87



The OLS Estimation Method for a Linear Regression Model

Scatterplot

The correlation coefficient between the two variables is -0.23.


The OLS Estimation Method for a Linear Regression Model

Regression analysis

\widehat{TestScore} = 698.93 − 2.28 × STR
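For reference, estimates like these can be reproduced with standard software. A sketch using statsmodels, assuming the California school data are available as a CSV file with columns TestScore and STR (the file name caschool.csv is an assumption):

```python
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical file name; any table with TestScore and STR columns works
df = pd.read_csv("caschool.csv")
fit = smf.ols("TestScore ~ STR", data=df).fit()

print(fit.params)    # Intercept ~ 698.93, STR ~ -2.28
print(fit.rsquared)  # ~ 0.051 (discussed later in this lecture)
```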



The OLS Estimation Method for a Linear Regression Model

Interpretation of the estimated coefficients

What does the slope tell us?


How large is the effect actually?
What does the intercept mean?



The Algebraic Properties of the OLS Estimator

The algebraic properties of the OLS estimator

Let’s first look at some of the algebraic properties of the OLS
estimators.
These properties hold regardless of any statistical assumptions.



The Algebraic Properties of the OLS Estimator

TSS, ESS, and SSR

From Yi = Ŷi + ûi , we can define


The total sum of squares: TSS = ∑_{i=1}^n (Yi − Ȳ)²
The explained sum of squares: ESS = ∑_{i=1}^n (Ŷi − Ȳ)²
The sum of squared residuals: SSR = ∑_{i=1}^n ûi² = ∑_{i=1}^n (Yi − Ŷi )²
The "deviation from the mean" form is only valid when an intercept is
included in the regression model.



The Algebraic Properties of the OLS Estimator

Some algebraic properties among ûi , Ŷi , and Yi

∑_{i=1}^n ûi = 0 (12)

(1/n) ∑_{i=1}^n Ŷi = Ȳ (13)

∑_{i=1}^n ûi Xi = 0 (14)

TSS = ESS + SSR (15)
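These properties are easy to verify numerically. A small sketch with simulated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

# OLS by the closed-form formulas (10) and (11)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

print(np.isclose(u_hat.sum(), 0))          # property (12)
print(np.isclose(y_hat.mean(), y.mean()))  # property (13)
print(np.isclose(np.sum(u_hat * x), 0))    # property (14)

tss = np.sum((y - y.mean()) ** 2)
ess = np.sum((y_hat - y.mean()) ** 2)
ssr = np.sum(u_hat ** 2)
print(np.isclose(tss, ess + ssr))          # property (15)
```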



Measures of Fit

Goodness of Fit: R²

R² = ESS/TSS = 1 − SSR/TSS (16)

R² is often called the coefficient of determination.
It indicates the proportion of the variance in the dependent variable
that is predictable from the independent variable(s).



Measures of Fit

R² ∈ [0, 1]

R² = 0 when β̂1 = 0.

β̂1 = 0 ⇒ Yi = β̂0 + ûi ⇒ Ŷi = Ȳ = β̂0

⇒ ESS = ∑_{i=1}^n (Ŷi − Ȳ)² = 0 ⇒ R² = 0

R² = 1 when ûi = 0 for all i = 1, . . . , n.

ûi = 0 for all i ⇒ SSR = ∑_{i=1}^n ûi² = 0 ⇒ R² = 1



Measures of Fit

R² = r²_XY

r_XY is the sample correlation coefficient:

r_XY = s_XY / (s_X s_Y ) = ∑_i (Xi − X̄)(Yi − Ȳ) / [∑_i (Xi − X̄)² ∑_i (Yi − Ȳ)²]^{1/2}



Measures of Fit

R² = r²_XY (cont’d)

ESS = ∑_{i=1}^n (Ŷi − Ȳ)² = ∑_{i=1}^n (β̂0 + β̂1 Xi − Ȳ)²
= ∑_{i=1}^n (Ȳ − β̂1 X̄ + β̂1 Xi − Ȳ)²
= ∑_{i=1}^n [β̂1 (Xi − X̄)]² = β̂1² ∑_{i=1}^n (Xi − X̄)²
= [∑_{i=1}^n (Xi − X̄)(Yi − Ȳ) / ∑_{i=1}^n (Xi − X̄)²]² ∑_{i=1}^n (Xi − X̄)²
= [∑_{i=1}^n (Xi − X̄)(Yi − Ȳ)]² / ∑_{i=1}^n (Xi − X̄)²



Measures of Fit

R² = r²_XY (cont’d)

It follows that

R² = ESS/TSS = [∑_{i=1}^n (Xi − X̄)(Yi − Ȳ)]² / [∑_{i=1}^n (Xi − X̄)² ∑_{i=1}^n (Yi − Ȳ)²] = r²_XY

Note: This property holds only for the linear regression model with
one regressor and an intercept.
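A quick numerical check of this property on simulated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 3.0 - 1.5 * x + rng.normal(size=500)

# OLS fit and R-squared from the sums of squares
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - (b0 + b1 * x)
r2 = 1 - np.sum(u_hat ** 2) / np.sum((y - y.mean()) ** 2)

r_xy = np.corrcoef(x, y)[0, 1]    # sample correlation coefficient
print(np.isclose(r2, r_xy ** 2))  # True: R-squared equals r_XY squared
```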



Measures of Fit

The use of R²

R² is usually the first statistic that we look at when judging how well
the regression model fits the data.
However, we cannot merely rely on R² to judge whether the
regression model is "good" or "bad".



Measures of Fit

The standard error of regression (SER) as a measure of fit

SER = [(1/(n−2)) ∑_{i=1}^n ûi²]^{1/2} = s (17)

SER has the same units as ui , which are the units of Yi .
SER measures the average "size" of the OLS residuals.
The root mean squared error (RMSE) is closely related to the SER:

RMSE = [(1/n) ∑_{i=1}^n ûi²]^{1/2}

As n → ∞, the SER and the RMSE converge to each other.
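A minimal sketch of both measures, assuming u_hat holds the OLS residuals from a fitted model (note the n − 2 degrees-of-freedom adjustment in the SER):

```python
import numpy as np

def ser_and_rmse(u_hat):
    # standard error of the regression (eq. (17)) and the RMSE
    n = u_hat.size
    ser = np.sqrt(np.sum(u_hat ** 2) / (n - 2))  # n - 2 degrees of freedom
    rmse = np.sqrt(np.mean(u_hat ** 2))
    return ser, rmse
```

For large n the two values are nearly identical, since dividing by n − 2 instead of n hardly matters.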



Measures of Fit

R² and SER for the application of test scores vs. class sizes

In the application of test scores vs. class sizes, R² is 0.051, or 5.1%,
which implies that the regressor STR explains only 5.1% of the
variance of the dependent variable TestScore.
SER is 18.6, which means that the standard deviation of the regression
residuals is 18.6 points on the test.



The Least Squares Assumptions

Assumption 1: The conditional mean of ui given Xi is zero

E (ui |Xi ) = 0 (18)

If the equation above is satisfied, then Xi is called exogenous.
This assumption can be stated a little more strongly as E(u|X = x) = 0
for any value x; that is, E(ui |X1 , . . . , Xn ) = 0.
It follows from the law of iterated expectations that
E(u) = E(E(u|X)) = E(0) = 0.



The Least Squares Assumptions

An illustration of Assumption 1

Figure: An illustration of E (u|X = x) = 0



The Least Squares Assumptions

Correlation and conditional mean

E (ui |Xi ) = 0 ⇒ Cov(ui , Xi ) = 0

A simple proof:

Cov(ui , Xi ) = E(ui Xi ) − E(ui )E(Xi )
= E(Xi E(ui |Xi )) − 0 · E(Xi )
= 0

where the law of iterated expectations is used twice, at the second
equality.
By contraposition, it follows that

Cov(ui , Xi ) ≠ 0 ⇒ E(ui |Xi ) ≠ 0



The Least Squares Assumptions

Assumption 2: (Xi , Yi ) for i = 1, . . . , n are i.i.d.

Each pair of X and Y, i.e., (Xi , Yi ) for i = 1, . . . , n, is drawn
randomly from the same joint distribution of X and Y.
Cases that may violate the i.i.d. assumption:
Time series data, Cov(Yt , Yt−1 ) ≠ 0: the serial correlation problem.
Spatial data, Cov(Yr , Ys ) ≠ 0, where r and s refer to two neighboring
regions: the spatial correlation problem.



The Least Squares Assumptions

Assumption 3: large outliers are unlikely

0 < E (Xi4 ) < ∞ and 0 < E (Yi4 ) < ∞

A large outlier is an extreme value of X or Y.
On a technical level, if X and Y are bounded, then they have finite
fourth moments, i.e., finite kurtosis.
The essence of this assumption is that a large outlier can strongly
influence the results, so we need to rule out large outliers in
estimation.



The Least Squares Assumptions

The influential observations and the leverage effects

Figure: How an outlier can influence the OLS estimates
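The leverage effect in this figure is easy to reproduce. A small simulated sketch (all numbers hypothetical) showing how a single extreme point pulls the OLS slope away from the true value:

```python
import numpy as np

def ols_slope(x, y):
    # OLS slope estimate from equation (10)
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 50)
y = 1.0 + 0.5 * x + rng.normal(0, 1, 50)
print(ols_slope(x, y))  # close to the true slope of 0.5

# add one extreme observation with high leverage
x_out = np.append(x, 30.0)
y_out = np.append(y, -50.0)
print(ols_slope(x_out, y_out))  # pulled far below 0.5 by a single point
```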

