
Lecture 3

Simple linear regression, cont.


BIOST 515

January 13, 2004


1

Breakdown of sums of squares

The simplest regression estimate for $Y_i$ is $\bar{Y}$ (an intercept-only model). $Y_i - \bar{Y}$ is the total error and can be broken down further by

$$Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})$$

total error = residual error + error explained by regression


2

[Figure: scatterplot with a single observation $(x_i, y_i)$ labeled, illustrating the decomposition: the total deviation $y_i - \bar{y}$ splits into the residual $y_i - \hat{y}_i$ (measured from the fitted point $(x_i, \hat{y}_i)$) and the regression component $\hat{y}_i - \bar{y}$, relative to the point $(\bar{x}, \bar{y})$.]
3

If we square the previous expression and sum over all observations, we get

$$\sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2$$

$$SSTO = SSR + SSE,$$

where $SSTO$ is the corrected (total) sum of squares of the observations, $SSR$ is the regression sum of squares, and $SSE$ is the error sum of squares.
4
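As a quick numerical check in R, the identity can be verified from a fitted model. A minimal sketch, assuming the smsa data and the Mortality ~ Education fit (lm1) used on the following slides:

# Verify SSTO = SSR + SSE for the SMSA example
lm1 <- lm(Mortality ~ Education, data = smsa)

Ybar <- mean(smsa$Mortality)                  # intercept-only estimate of Y
SSR  <- sum((fitted(lm1) - Ybar)^2)           # regression sum of squares
SSE  <- sum(resid(lm1)^2)                     # error sum of squares
SSTO <- sum((smsa$Mortality - Ybar)^2)        # corrected total sum of squares

all.equal(SSTO, SSR + SSE)                    # TRUE, up to floating-point error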

Intuitively, if $SSR$ is "large" compared to $SSE$, then $\beta_1$ is significantly different from zero.

Recall that $Z_2 = SSE/\sigma^2 \sim \chi^2_{N-2}$. It can also be shown that, under $H_0$, $Z_1 = SSR/\sigma^2 \sim \chi^2_1$ and that $Z_1$ and $Z_2$ are independent. Under $H_0$,

$$F = \frac{Z_1/1}{Z_2/(N-2)} = \frac{SSR}{SSE/(N-2)} \sim F_{1,N-2}.$$

If the observed statistic

$$F_{obs} > F_{1,N-2,1-\alpha},$$

then we reject $H_0 : \beta_1 = 0$.
5

The calculations for the F-test are usually presented in an analysis of variance (ANOVA) table.

Source of variation | Sum of squares                                 | Degrees of freedom | Mean square   | E[Mean square]
Regression          | $SSR = \sum_{i=1}^{N} (\hat{Y}_i - \bar{Y})^2$ | 1                  | $SSR/1$       | $\sigma^2 + \beta_1^2 \sum_{i=1}^{N} (X_i - \bar{X})^2$
Error               | $SSE = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2$     | $N-2$              | $SSE/(N-2)$   | $\sigma^2$
Total               | $SSTO = \sum_{i=1}^{N} (Y_i - \bar{Y})^2$      | $N-1$              |               |

lm1=lm(Mortality~Education,data=smsa)
anova(lm1)

Analysis of Variance Table

Response: Mortality
Df Sum Sq Mean Sq F value Pr(>F)
Education 1 59662 59662 20.508 3.008e-05 ***
Residuals 58 168737 2909
6

$F_{obs} = 59662/(168737/58) = 20.51 > F_{1,58,0.95} = 4.01.$

Therefore, we reject $H_0 : \beta_1 = 0$.

To get SSTO:

alm1=anova(lm1)
SSTO=sum(alm1$"Sum Sq")
print(SSTO)

[1] 228398.3

Where do the degrees of freedom come from?


7
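These reference values can be reproduced in R from the ANOVA table computed above (a sketch; qf and pf give the F quantile and distribution function):

Fobs <- alm1$"F value"[1]            # 20.508, from the ANOVA table above
qf(0.95, df1 = 1, df2 = 58)          # critical value F_{1,58,0.95}, about 4.01
1 - pf(Fobs, df1 = 1, df2 = 58)      # p-value, about 3.0e-05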

In class, we will show that the t-test and F-test are equivalent for $H_0 : \beta_1 = 0$. However, the t-test is somewhat more adaptable, as it can be used for one-sided alternatives. We can also easily calculate it for different hypothesized values in $H_0$.

One-sided t-test for the SMSA example: $H_0 : \beta_1 = 0$ vs. $H_A : \beta_1 < 0$.

$$t_{obs} = \frac{\hat{\beta}_1}{\hat{se}(\hat{\beta}_1)} = -4.529$$

$t^{N-2}_{\alpha} = -1.627 > -4.529$; therefore we reject $H_0$ in favor of $H_A$.
8
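A minimal R sketch of this one-sided test, pulling the t statistic from summary(lm1); the choice alpha = 0.05 is an assumption, since the slide does not state alpha:

cf   <- summary(lm1)$coefficients    # matrix with rows "(Intercept)", "Education"
tobs <- cf["Education", "t value"]   # about -4.529
qt(0.05, df = 58)                    # lower-tail critical value (alpha = 0.05 assumed)
pt(tobs, df = 58)                    # one-sided p-value for HA: beta1 < 0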

Coefficient of Determination

$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$$

• Often referred to as the proportion of variation explained by the predictor

• Because $0 \le SSE \le SSTO$, $0 \le R^2 \le 1$

• As predictors are added to the model, $R^2$ will not decrease

• Large $R^2$ does not necessarily imply a "good" model


9

• $R^2$ does not
  – measure the magnitude of the slope
  – measure the appropriateness of the model

From the SMSA example with education as a predictor of mortality:

R2=alm1$"Sum Sq"[1]/SSTO
print(R2)

0.261217

R2 = 0.26
10
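The same quantity is reported directly by summary(); as a one-line check:

summary(lm1)$r.squared   # 0.261217, matching the calculation above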

Prediction

Sometimes, we would like to be able to predict the outcome for a new value of the predictor. The new outcome is defined as

$$y_{new} = \beta_0 + \beta_1 x_{new} + \epsilon$$

with an estimated value of

$$\hat{y}_{new} = \hat{\beta}_0 + \hat{\beta}_1 x_{new} + \hat{\epsilon}.$$

The expected value is

$$E[\hat{y}_{new}] = \beta_0 + \beta_1 x_{new},$$
11

and the variance is

$$var(\hat{y}_{new}) = \sigma^2 \left( 1 + \frac{1}{N} + \frac{(x_{new} - \bar{x})^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2} \right).$$

The $100(1-\alpha)\%$ prediction interval is given by

$$\hat{\beta}_0 + \hat{\beta}_1 x_{new} \pm t_{N-2,1-\alpha/2} \times \hat{\sigma} \times \left\{ 1 + \frac{1}{N} + \frac{(x_{new} - \bar{x})^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2} \right\}^{1/2}.$$

Note: We have assumed $\epsilon \sim N(0, \sigma^2)$ to construct the prediction interval. If the error terms are not close to normal,
12

then the prediction interval could be misleading. This is not the case for the interval for the fitted response, which only requires approximate normality for $\hat{\beta}_0$ and $\hat{\beta}_1$.
13
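In R, both intervals are available from predict(); a minimal sketch, where Education = 10 is just an illustrative new value:

newdat <- data.frame(Education = 10)
predict(lm1, newdata = newdat, interval = "prediction", level = 0.95)  # interval for a new observation
predict(lm1, newdata = newdat, interval = "confidence", level = 0.95)  # interval for the fitted (mean) response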


[Figure: scatterplot of Mortality (roughly 800 to 1100) against Education (roughly 9 to 12) for the SMSA data.]
14
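A minimal R sketch to reproduce a plot of this kind (the overlaid fitted line is an addition for reference, not necessarily part of the original figure):

plot(Mortality ~ Education, data = smsa,
     xlab = "Education", ylab = "Mortality")
abline(lm1)   # overlay the fitted least-squares line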

Maximum Likelihood Estimation

Assumptions about the distribution of $\epsilon_i$ are not necessary for least squares estimation. If we assume that $\epsilon_i \sim_{iid} N(0, \sigma^2)$, then $Y_i \sim_{iid} N(\beta_0 + \beta_1 x_i, \sigma^2)$ and

$$p(Y_i | \beta_0, \beta_1, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2} (Y_i - (\beta_0 + \beta_1 x_i))^2 \right\}.$$

The likelihood is then equal to

$$L(\beta_0, \beta_1, \sigma^2) = \left( \frac{1}{\sqrt{2\pi\sigma^2}} \right)^N \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{N} (Y_i - (\beta_0 + \beta_1 x_i))^2 \right\}.$$
15

The maximum likelihood estimators (MLEs) are those values of $\beta_0$, $\beta_1$ and $\sigma^2$ that maximize $L$ or, equivalently, $l = \log(L)$.

$$l \propto -\frac{N}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (Y_i - (\beta_0 + \beta_1 x_i))^2.$$

The MLEs for the simple linear regression model are given by

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x},$$

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{N} Y_i (x_i - \bar{x})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}$$
16

and

$$\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2.$$

The MLEs for $\beta_0$ and $\beta_1$ are the same as the least squares estimators. However, the MLE for $\sigma^2$ is not. Recall that the least squares estimate of $\sigma^2$ is unbiased; the MLE of $\sigma^2$ is biased (although it is consistent).
17
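A quick numerical comparison of the two variance estimates for the SMSA fit (a sketch; here N = 60):

N   <- nrow(smsa)               # 60 observations (total df 59 = N - 1)
SSE <- sum(resid(lm1)^2)
SSE / (N - 2)                   # unbiased least squares estimate, about 2909
SSE / N                         # MLE, slightly smaller (biased downward)
summary(lm1)$sigma^2            # R reports the unbiased version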

Considerations in the use of regression

1. Regression models are only interpretable over the range of the observed data.

2. The disposition of x plays an important role in the model fit.

3. Outliers or erroneous data can disturb the model fit.

4. A regression result indicating that two variables are related is not, by itself, evidence of causality.
18

Multiple Linear Regression

Example:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon,$$

$$E(y) = 2 + 8x_1 + 10x_2$$

$\beta_1$ indicates the change in the expected response per unit change in $x_1$ when $x_2$ is held constant. Likewise, $\beta_2$ represents the change in the expected response per unit change in $x_2$ when $x_1$ is held constant.
19
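As an illustration of this interpretation, a small simulation sketch in R using the stated coefficients; the sample size, predictor ranges, error standard deviation, and the name lm2 are arbitrary choices:

set.seed(1)
n  <- 100
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
y  <- 2 + 8 * x1 + 10 * x2 + rnorm(n, sd = 5)   # E(y) = 2 + 8*x1 + 10*x2

lm2 <- lm(y ~ x1 + x2)
coef(lm2)   # estimates should be close to 2, 8 and 10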

[Figure: plot of the regression surface E(y) as a function of x1 and x2, with x1 ranging from 0 to 10.]
20

We now consider the model

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i, \qquad (1)$$

$i = 1, \ldots, n$, $E[\epsilon_i] = 0$, $var(\epsilon_i) = \sigma^2$ and $cov(\epsilon_i, \epsilon_j) = 0$. The parameter $\beta_j$, $j = 1, \ldots, p$, represents the expected change in $y_i$ per unit change in $x_j$, holding the remaining predictors $x_i$ ($i \ne j$) constant.
21

We can use the model defined in (1) to describe more complicated models. For example, we might be interested in a cubic polynomial model,

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \epsilon.$$

If we let $x_1 = x$, $x_2 = x^2$ and $x_3 = x^3$, then we can rewrite the regression model as

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon,$$

which is a multiple linear regression model with 3 predictors. How do we interpret this model?
22
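In R, such a model can be fit directly with lm(); a self-contained sketch with simulated placeholder data (all names, coefficients, and values are illustrative):

set.seed(2)
x <- runif(50, -2, 2)
y <- 1 + 2 * x - 0.5 * x^2 + 0.3 * x^3 + rnorm(50, sd = 0.5)

# I() protects ^ from being interpreted as a formula operator
lm_cubic <- lm(y ~ x + I(x^2) + I(x^3))
coef(lm_cubic)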

Interactions

We may also want to include interaction effects:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2.$$

If we let $x_3 = x_1 x_2$, this model is equivalent to

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3.$$
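In R's formula language the interaction term can be specified without constructing x3 by hand; a sketch with simulated placeholder data (names, coefficients, and sample size are illustrative):

set.seed(3)
x1 <- runif(100)
x2 <- runif(100)
y  <- 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2 + rnorm(100, sd = 0.2)

# x1:x2 adds only the product term; x1 * x2 expands to x1 + x2 + x1:x2
lm_int <- lm(y ~ x1 * x2)
coef(lm_int)   # estimates should be close to 1, 2, 3 and 4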
