Lecture 3
4 Measures of Fit
Research question:
Randomly choose 42 students and divide them into two classes, one with 20 students and the other with 22.
They are taught the same subject by the same teachers.
Randomization ensures that all other factors are balanced across the two classes on average, so the difference in class size is the only systematic factor affecting test scores.
Compute the expected values of test scores, given the different class sizes:
E(TestScore|ClassSize = 20)
E(TestScore|ClassSize = 22)
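In a sample, these conditional expectations are estimated by the average test score within each class. A minimal sketch in Python, using simulated scores (the distributions below are entirely hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical experiment: 42 students randomly split into classes of 20 and 22.
# The score distributions are made up purely for illustration.
scores_20 = rng.normal(loc=660, scale=15, size=20)  # class with 20 students
scores_22 = rng.normal(loc=655, scale=15, size=22)  # class with 22 students

# Within-class sample averages estimate the conditional expectations
print("estimate of E(TestScore|ClassSize = 20):", round(scores_20.mean(), 2))
print("estimate of E(TestScore|ClassSize = 22):", round(scores_22.mean(), 2))
```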
We can lump all the other factors affecting test scores into a single error term, u, and set up a simple linear regression model as follows,

TestScore = β0 + β1 × ClassSize + u
Interpret β1
∆TestScore = β1 × ∆ClassSize, so that

β1 = ∆TestScore / ∆ClassSize
That is, β1 measures the change in the test score resulting from a
one-unit change in the class size.
Marginal effect
β1 = dTestScore / dClassSize

We often call β1 the marginal effect of class size on the test score.
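For instance, with a hypothetical estimate β1 = −2, reducing the class size by 3 students changes the predicted test score by ∆TestScore = β1 × ∆ClassSize = (−2) × (−3) = +6 points.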
Interpret β0

Yi = β0 + β1 Xi + ui , for i = 1, . . . , n    (4)

β0 is the intercept: the expected value of Yi when Xi = 0. In the test score example, it is the predicted test score for a class size of zero, which has no economic meaning by itself; β0 simply pins down the level of the regression line.
Yi is called the dependent variable, the regressand, or the LHS (left-hand side) variable.
Xi is called the independent variable, the regressor, or the RHS (right-hand side) variable.
ui is the error term, which lumps together all the other factors affecting Yi.
Yi = β0 + β1 Xi + ui , for i = 1, . . . , n
Ordinary
It means that the OLS estimator is a very basic method, from which we may derive some variations of the OLS estimator.
Other least squares estimators: the weighted least squares (WLS), and the generalized least squares (GLS).
Least Squares
For any candidate coefficients (b0 , b1 ), the prediction mistake for observation i is ûi = Yi − b0 − b1 Xi .
It means that OLS chooses (b0 , b1 ) to minimize the sum of squared prediction mistakes,

S(b0 , b1 ) = ∑_{i=1}^n (Yi − b0 − b1 Xi )²    (5)
Minimizing S with respect to b0 and b1 gives the first-order conditions

∂S/∂b0 (β̂0 , β̂1 ) = ∑_{i=1}^n (−2)(Yi − β̂0 − β̂1 Xi ) = 0    (6)

∂S/∂b1 (β̂0 , β̂1 ) = ∑_{i=1}^n (−2)(Yi − β̂0 − β̂1 Xi )Xi = 0    (7)
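As a quick check of conditions (6) and (7), here is a small sketch that builds S(b0 , b1 ) symbolically for a toy (made-up) data set with sympy and solves the two first-order conditions:

```python
import sympy as sp

b0, b1 = sp.symbols("b0 b1", real=True)

# Toy data, made up purely for illustration
X = [1, 2, 3, 4]
Y = [2, 4, 5, 8]

# Sum of squared prediction mistakes S(b0, b1)
S = sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, Y))

# First-order conditions (6) and (7): both partial derivatives equal zero
foc = [sp.Eq(sp.diff(S, b0), 0), sp.Eq(sp.diff(S, b1), 0)]
print(sp.solve(foc, [b0, b1]))  # {b0: 0, b1: 19/10} -- the OLS estimates
```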
∑_i (Xi − X̄)(Yi − Ȳ) = ∑_i Xi Yi − X̄ ∑_i Yi − Ȳ ∑_i Xi + ∑_i X̄ Ȳ
= ∑_i Xi Yi − 2n X̄ Ȳ + n X̄ Ȳ
= ∑_i Xi Yi − n X̄ Ȳ
= (1/n) [ n ∑_i Xi Yi − (∑_i Xi)(∑_i Yi) ]
Similarly, we can show that ∑_i (Xi − X̄)² = (1/n) [ n ∑_i Xi² − (∑_i Xi)² ].
The sample covariance of X and Y is sXY = (1/(n − 1)) ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ).
The sample variance of X is sX² = (1/(n − 1)) ∑_{i=1}^n (Xi − X̄)².
β̂1 can also be written as β̂1 = sXY / sX² :

β̂1 = ∑_{i=1}^n (Xi − X̄)(Yi − Ȳ) / ∑_{i=1}^n (Xi − X̄)² = sXY / sX²    (10)

β̂0 = Ȳ − β̂1 X̄    (11)
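A minimal numpy sketch of formulas (10) and (11) on made-up data, cross-checked against numpy's own least squares fit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data for illustration: X like class sizes, Y like test scores
n = 100
X = rng.uniform(15, 30, size=n)
Y = 700 - 2.0 * X + rng.normal(0, 10, size=n)

# Formula (10): slope = sample covariance / sample variance
beta1_hat = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
# Formula (11): intercept = Ybar - slope * Xbar
beta0_hat = Y.mean() - beta1_hat * X.mean()

# Cross-check: numpy's degree-1 polynomial fit returns [intercept, slope]
print(beta0_hat, beta1_hat)
print(np.polynomial.polynomial.polyfit(X, Y, 1))
```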
                         Population                  Sample
Regression function      β0 + β1 Xi                  β̂0 + β̂1 Xi
Parameters               β0 , β1                     β̂0 , β̂1
Errors vs. residuals     ui                          ûi
The regression model     Yi = β0 + β1 Xi + ui        Yi = β̂0 + β̂1 Xi + ûi
[Figures: scatterplot of the data and the fitted OLS regression line]
The OLS fitted values and residuals satisfy

∑_{i=1}^n ûi = 0    (12)

(1/n) ∑_{i=1}^n Ŷi = Ȳ    (13)

∑_{i=1}^n ûi Xi = 0    (14)
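These three properties are easy to confirm numerically; a small sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data for illustration
X = rng.normal(20, 3, size=50)
Y = 5 + 0.8 * X + rng.normal(0, 2, size=50)

# OLS estimates via formulas (10) and (11)
b1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X
u_hat = Y - Y_hat

print(u_hat.sum())             # (12): zero up to floating-point error
print(Y_hat.mean(), Y.mean())  # (13): the two means coincide
print((u_hat * X).sum())       # (14): zero up to floating-point error
```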
TSS = ESS + SSR    (15)

where TSS = ∑_{i=1}^n (Yi − Ȳ)² is the total sum of squares, ESS = ∑_{i=1}^n (Ŷi − Ȳ)² is the explained sum of squares, and SSR = ∑_{i=1}^n ûi² is the sum of squared residuals.
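A sketch of why (15) holds: write Yi − Ȳ = (Ŷi − Ȳ) + ûi and square both sides, so that

TSS = ∑_{i=1}^n [(Ŷi − Ȳ) + ûi]² = ESS + SSR + 2 ∑_{i=1}^n ûi (Ŷi − Ȳ),

and the cross term vanishes because ∑_{i=1}^n ûi (Ŷi − Ȳ) = (β̂0 − Ȳ) ∑_{i=1}^n ûi + β̂1 ∑_{i=1}^n ûi Xi = 0 by properties (12) and (14).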
Goodness of Fit: R²
R² = ESS / TSS = 1 − SSR / TSS    (16)
R² ∈ [0, 1]
R² = 0 when β̂1 = 0.

R² = rXY²
ESS = ∑_{i=1}^n (Ŷi − Ȳ)² = ∑_{i=1}^n (β̂0 + β̂1 Xi − Ȳ)²
= ∑_{i=1}^n (Ȳ − β̂1 X̄ + β̂1 Xi − Ȳ)²    (substituting β̂0 = Ȳ − β̂1 X̄ from (11))
= ∑_{i=1}^n [β̂1 (Xi − X̄)]² = β̂1² ∑_{i=1}^n (Xi − X̄)²
= [∑_{i=1}^n (Xi − X̄)(Yi − Ȳ)]² / [∑_{i=1}^n (Xi − X̄)²]² × ∑_{i=1}^n (Xi − X̄)²
= [∑_{i=1}^n (Xi − X̄)(Yi − Ȳ)]² / ∑_{i=1}^n (Xi − X̄)²
R² = rXY² (cont'd)
It follows that

R² = ESS / TSS = [∑_{i=1}^n (Xi − X̄)(Yi − Ȳ)]² / [∑_{i=1}^n (Xi − X̄)² ∑_{i=1}^n (Yi − Ȳ)²] = rXY²

where rXY is the sample correlation coefficient between X and Y.
Note: This property holds only for the linear regression model with
one regressor and an intercept.
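The identity is easy to verify numerically; a quick sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up data for illustration
X = rng.normal(0, 1, size=200)
Y = 1 + 2 * X + rng.normal(0, 1, size=200)

# OLS fit with one regressor and an intercept
b1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

# R^2 from the sums of squares, per (16)
R2 = ((Y_hat - Y.mean()) ** 2).sum() / ((Y - Y.mean()) ** 2).sum()

# Squared sample correlation coefficient
r_XY = np.corrcoef(X, Y)[0, 1]
print(R2, r_XY ** 2)  # equal up to floating-point error
```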
The use of R²
R² is usually the first statistic that we look at when judging how well the regression model fits the data.
However, we cannot rely on R² alone to judge whether the regression model is "good" or "bad".
The standard error of the regression (SER) measures the typical size of the residuals:

SER = √[ (1/(n − 2)) ∑_{i=1}^n ûi² ] = s_û    (17)
As n → ∞, the SER and the RMSE coincide: the RMSE uses 1/n in place of 1/(n − 2), and the difference vanishes in large samples.
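A small sketch of this convergence on made-up data, computing both quantities for increasing n:

```python
import numpy as np

rng = np.random.default_rng(3)

for n in (10, 100, 10_000):
    # Made-up data for illustration
    X = rng.normal(0, 1, size=n)
    Y = 1 + 2 * X + rng.normal(0, 5, size=n)

    # OLS residuals
    b1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
    b0 = Y.mean() - b1 * X.mean()
    u_hat = Y - (b0 + b1 * X)

    SER = np.sqrt((u_hat ** 2).sum() / (n - 2))
    RMSE = np.sqrt((u_hat ** 2).sum() / n)
    print(n, SER, RMSE)  # the gap between SER and RMSE shrinks as n grows
```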
R² and SER for the application of test scores vs. class sizes
An illustration of Assumption 1
A simple proof: