
Lecture 1.

Simple Linear Regression

STAT 4355 Applied Linear Models

Spring 2020
Outline

1. Introduction (Ch 1)
2. Simple Linear Regression Model (Ch 2.1)
3. Least-Squares Estimation of the Parameters (Ch 2.2)
4. Hypothesis Testing on the Slope and Intercept (Ch 2.3)
5. Interval Estimation in Simple Linear Regression (Ch 2.4)
6. Prediction of New Observations (Ch 2.5)
7. Coefficient of Determination (Ch 2.6)
8. Regression Through the Origin (Ch 2.10)
9. Estimation by Maximum Likelihood (Ch 2.11)

Introduction

Time spent on Instagram
Instagram (IG or Insta) is a photo- and video-sharing social networking service owned by Facebook, Inc. Instagram users can follow other users to add their content to a feed (Wikipedia).

A Facebook machine learning researcher, Jennifer, suspects that users with a larger number of followings tend to spend more time on Instagram.

She randomly chooses 25 IG users and collects information on their number of followings and daily average use.

Time spent on Instagram (cont’d)
Let y represent daily average IG use in minutes and x represent the number of followings. The observations are plotted with a scatter diagram.

This display clearly suggests a relationship between daily IG use and the number of followings. The data points generally, but not exactly, fall along a straight line.

Regression analysis
Regression analysis is a statistical technique for investigating and modeling the relationship between variables.

The equation of a straight line relating these two variables is

y = β0 + β1 x,

where β0 is the intercept and β1 is the slope.

Regression analysis (cont’d)

The data points do not fall exactly on a straight line. Let the difference between the observed value of y and the straight line (β0 + β1 x) be an error ε.

The error may be made up of the effects of other variables on daily IG use, measurement error, and so forth.

Regression analysis (cont’d)

Now we have a simple linear regression model

y = β0 + β1 x + ε

- y: dependent variable or response variable
- x: independent variable, predictor, or regressor variable
- β0: intercept
- β1: slope
- ε: error

The Origin of the term “Regression”

“Regression analysis was first developed by Sir Francis Galton in the latter part of the 19th century. Galton had studied the relation between heights of parents and children and noted that the heights of children of both tall and short parents appeared to “revert” or “regress” to the mean of the group. He considered this tendency to be a regression to “mediocrity”. Galton developed a mathematical description of this regression tendency, the precursor of today’s regression models.

The term regression persists to this day to describe statistical relations between variables.”

from Kutner’s book

Regress to the mean: Galton’s Example

- The regression line predicts that, on average, sons’ heights would have moved closer to the average height (regress to the mean).

Regression Effect & Fallacy
- Regression effect: it is unlikely, on average, to always remain far out from the mean (in either direction).
- Attributing these expected variations to an outside influence is the regression fallacy.

- Examples
  - In a test-retest situation, those who scored below the mean on the first test tend on average to show improvement, while those who scored above the mean tend on average to fall back.
  - In MLB, the 1st-year mean batting average for “Rookies of the Year” is 0.285. In their 2nd year, these “Rookies of the Year” had a mean b.a. of 0.272. This is called the “sophomore slump”, which is blamed on the stress of the limelight, players getting big heads, etc.
  - The 30-yr. mean batting average (b.a.) for the major leagues is 0.260.

Use of Regression
Regression models are used for several purposes, including the following:
1. Data description
  - Engineers and scientists frequently use equations to summarize or describe a set of data. Regression analysis is helpful in developing such equations.
  - For example, we may collect a considerable amount of data on IG users’ daily use and followings, and a regression model would probably be a much more convenient and useful summary of those data than a table or even a graph.
2. Prediction and estimation
  - Many applications of regression involve prediction of the response variable.
  - For example, we may wish to predict daily average IG use for a specified number of followings.
  - However, even when the model form is correct, poor estimates of the model parameters may still cause poor prediction performance.

Simple Linear Regression Model

Simple Linear Regression Model

y = β0 + β1 x + ε

- β0 and β1 are unknown constants.
- ε is a random error.
- The errors are assumed to have mean zero and unknown variance σ².
- Additionally, we assume that the errors are uncorrelated and independent of the predictor.

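To make the model concrete, here is a minimal R sketch (R is used later in these slides) that simulates data from this model; β0 = 10, β1 = 2, σ = 3, and n = 25 are arbitrary illustrative values, not estimates from any data set.

set.seed(1)
n <- 25; beta0 <- 10; beta1 <- 2; sigma <- 3  # arbitrary illustrative values
x <- runif(n, 0, 10)                   # regressor values
eps <- rnorm(n, mean = 0, sd = sigma)  # errors with mean 0 and variance sigma^2
y <- beta0 + beta1 * x + eps           # responses from y = beta0 + beta1*x + eps
plot(x, y)                             # scatter diagram of the simulated data
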
Simple Linear Regression Model (cont’d)

- For each possible value of x, there is a probability distribution of y, with

  E(y|x) = β0 + β1 x
  Var(y|x) = Var(β0 + β1 x + ε) = σ²

- Thus, the mean of y is a linear function of x, although the variance of y does not depend on the value of x. Furthermore, because the errors are uncorrelated, the responses are also uncorrelated.

- We call β0 and β1 regression coefficients.
  β1: change in the mean of y produced by a unit change in x
  β0: mean of y when x = 0

Least-Squares Estimation of the Parameters

Least Squares Estimation

- The parameters β0 and β1 are unknown and must be estimated using sample data.
- We have n pairs of data: (y1, x1), (y2, x2), . . . , (yn, xn).

Least Squares Estimation (cont’d)
Method of least squares: we want to minimize the sum of the squares of the differences between the observations yi and the straight line.

Least Squares Estimation (cont’d)

A simple linear regression model:

yi = β0 + β1 xi + εi,  for i = 1, 2, . . . , n

Then the least-squares criterion is

S(β0, β1) = Σᵢ₌₁ⁿ (yi − β0 − β1 xi)²

By minimizing S(β0, β1), we obtain the estimators of β0 and β1, say β̂0 and β̂1.

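As a sanity check, S(β0, β1) can also be minimized numerically and compared with R's closed-form fit; a minimal sketch, using the simulated x and y from the earlier example:

S <- function(b) sum((y - b[1] - b[2] * x)^2)  # least-squares criterion
optim(c(0, 0), S)$par  # numerical minimizer of S (approximate)
coef(lm(y ~ x))        # closed-form least-squares estimates for comparison
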
Estimation of β0 and β1

The least-squares estimators of β0 and β1, say β̂0 and β̂1, must satisfy

∂S/∂β0 at (β̂0, β̂1):  −2 Σᵢ₌₁ⁿ (yi − β̂0 − β̂1 xi) = 0
∂S/∂β1 at (β̂0, β̂1):  −2 Σᵢ₌₁ⁿ (yi − β̂0 − β̂1 xi) xi = 0

Estimation of β0 and β1 (cont’d)
Simplifying the two equations yields

n β̂0 + β̂1 Σᵢ₌₁ⁿ xi = Σᵢ₌₁ⁿ yi
β̂0 Σᵢ₌₁ⁿ xi + β̂1 Σᵢ₌₁ⁿ xi² = Σᵢ₌₁ⁿ yi xi

⇒ the least-squares normal equations

The solution to the normal equations is

β̂1 = [Σᵢ₌₁ⁿ yi xi − (Σᵢ₌₁ⁿ yi)(Σᵢ₌₁ⁿ xi)/n] / [Σᵢ₌₁ⁿ xi² − (Σᵢ₌₁ⁿ xi)²/n],
β̂0 = ȳ − β̂1 x̄,

where ȳ = (1/n) Σᵢ₌₁ⁿ yi and x̄ = (1/n) Σᵢ₌₁ⁿ xi.

Estimation of β0 and β1 (cont’d)
Let

Sxx = Σᵢ₌₁ⁿ (xi − x̄)²  (corrected sum of squares of x)
Sxy = Σᵢ₌₁ⁿ (xi − x̄)(yi − ȳ)  (corrected sum of cross products of x and y)
Syy = Σᵢ₌₁ⁿ (yi − ȳ)²  (corrected sum of squares of y)

Then,

β̂1 = Sxy / Sxx

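These formulas translate directly into vectorized R; a minimal sketch, again using the simulated x and y from the earlier example:

Sxx <- sum((x - mean(x))^2)                # corrected sum of squares of x
Sxy <- sum((x - mean(x)) * (y - mean(y)))  # corrected sum of cross products
b1 <- Sxy / Sxx               # slope estimate
b0 <- mean(y) - b1 * mean(x)  # intercept estimate
c(b0, b1)                     # agrees with coef(lm(y ~ x))
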
Estimation of β0 and β1 (cont’d)

β̂0 and β̂1 are the least-squares estimators of the intercept and slope, respectively.

The fitted simple linear regression model is then

Ê(y|x) = β̂0 + β̂1 x.

Example: Rocket Propellant Data
A rocket motor is manufactured by bonding an igniter propellant and a sustainer propellant together inside a metal housing.

The shear strength of the bond between the two types of propellant is an important quality characteristic.

It is suspected that shear strength is related to the age in weeks of the batch of sustainer propellant.

Observation i   Age of Propellant xi (weeks)   Shear Strength yi (psi)
1               15.50                          2158.70
2               23.75                          1678.15
…               …                              …
20              21.50                          1753.70

Rocket Propellant Example (cont’d)

Fit a linear regression model!

Rocket Propellant Example (cont’d)

Needed calculations for the LSEs:

i       xi       yi         xi²       xi yi
1       15.50    2158.70    240.25    33459.85
2       23.75    1678.15    564.06    39856.06
…       …        …          …         …
20      21.50    1753.70    462.25    37704.55
Total   267.25   42627.15   4677.69   528492.64

Rocket Propellant Example (cont’d)

From the textbook, pp. 15-17:

ȳ = 2,131.3575,  x̄ = 13.3625,  n = 20
Σᵢ₌₁ⁿ yi = 42,627.15,  Σᵢ₌₁ⁿ xi = 267.25
Σᵢ₌₁ⁿ xi yi = 528,492.64,  Σᵢ₌₁ⁿ xi² = 4,677.69

Rocket Propellant Example (cont’d)

Sxx = Σᵢ₌₁ⁿ (xi − x̄)² = 1,106.56
Sxy = Σᵢ₌₁ⁿ (xi − x̄)(yi − ȳ) = −41,112.65

β̂1 = −37.15 and β̂0 = 2,627.82

Thus, Ê(y|x) = 2,627.82 − 37.15x

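The same estimates can be reproduced in R from the summary totals on the previous slides; this sketch assumes only those textbook totals, not the raw data:

n <- 20
sum_x <- 267.25; sum_y <- 42627.15
sum_x2 <- 4677.69; sum_xy <- 528492.64
Sxx <- sum_x2 - sum_x^2 / n        # 1,106.56
Sxy <- sum_xy - sum_x * sum_y / n  # -41,112.65
b1 <- Sxy / Sxx                    # -37.15
b0 <- sum_y / n - b1 * sum_x / n   # 2,627.82
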
Rocket Propellant Example (cont’d)

How do we interpret β̂0 and β̂1?

The slope −37.15 is the average weekly decrease in propellant shear strength due to the age of the propellant.

The intercept 2,627.82 represents the shear strength in a batch of propellant immediately following manufacture.

Residual ei

The difference between the observed value yi and the corresponding fitted value ŷi is a residual.

The ith residual is

ei = yi − ŷi = yi − (β̂0 + β̂1 xi),  for i = 1, 2, . . . , n

Observation i   Observed Value yi   Fitted Value ŷi   Residual ei
1               2158.70             2051.94           106.76
2               1678.15             1745.42           −67.27
…               …                   …                 …
20              1753.70             1829.02           −75.32

Residual ei (cont’d)

[Figure: illustration of residuals]

Residual ei (cont’d)

After obtaining the least-squares fit, a number of interesting questions come to mind:
- How well does this equation fit the data?
- Is the model likely to be useful in prediction?
- Are any of the basic assumptions (such as constant variance and uncorrelated errors) violated?
⇒ model adequacy checking (later!)

The residuals play a key role in evaluating model adequacy.

R programming: Reading data and fitting a model

propellant <- read.csv("data/lecture1/Rocket Prop.csv")  # read the data
x <- propellant[,3]  # regressor: age of propellant (weeks)
y <- propellant[,2]  # response: shear strength (psi)
fit <- lm(y~x)       # fit the simple linear regression model
summary(fit)         # estimates, standard errors, t tests, R-squared

R output
Call:
lm(formula = y ~ x)

Residuals:
Min 1Q Median 3Q Max
-215.98 -50.68 28.74 66.61 106.76

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2627.822 44.184 59.48 < 2e-16 ***
x -37.154 2.889 -12.86 1.64e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 96.11 on 18 degrees of freedom
Multiple R-squared: 0.9018,  Adjusted R-squared: 0.8964
F-statistic: 165.4 on 1 and 18 DF,  p-value: 1.643e-10

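The fitted values and residuals tabulated on the earlier residual slide can be extracted directly from the fit object; a minimal sketch:

# Reproduce the observed/fitted/residual table from the fitted model
head(cbind(observed = y, fitted = fitted(fit), residual = resid(fit)))
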
Properties of the LSE

Goal: find E(β̂0), Var(β̂0), E(β̂1), and Var(β̂1).

β̂0 and β̂1 are linear combinations of the observations yi. In particular,

β̂1 = Sxy / Sxx = [Σᵢ₌₁ⁿ (xi − x̄)(yi − ȳ)] / [Σᵢ₌₁ⁿ (xi − x̄)²] = Σᵢ₌₁ⁿ ci yi,

with ci = (xi − x̄)/Sxx. (The ȳ term drops out of the numerator because Σᵢ₌₁ⁿ (xi − x̄) = 0.)

Properties of the LSE (cont’d)

E(β̂1) = β1

Var(β̂1) = σ² / Sxx

E(β̂0) = β0

Var(β̂0) = σ² (1/n + x̄²/Sxx)

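These expectation and variance results can be checked by simulation; a self-contained R sketch, in which β0 = 10, β1 = 2, and σ = 3 are arbitrary assumed true values:

# Monte Carlo check of E(slope) = beta1 and Var(slope) = sigma^2 / Sxx
set.seed(2)
n_sim <- 25; beta0 <- 10; beta1 <- 2; sigma <- 3  # assumed true values
x_sim <- runif(n_sim, 0, 10)
est <- replicate(5000, {
  y_sim <- beta0 + beta1 * x_sim + rnorm(n_sim, sd = sigma)
  coef(lm(y_sim ~ x_sim))          # (intercept, slope) for each sample
})
rowMeans(est)                      # approximately (beta0, beta1): unbiasedness
var(est[2, ])                      # approximately the theoretical variance below
sigma^2 / sum((x_sim - mean(x_sim))^2)  # Var(slope) = sigma^2 / Sxx
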
Properties of the least-squares fit
- The sum of the residuals in any regression model that contains an intercept β0 is always zero; that is, Σᵢ₌₁ⁿ ei = 0.
- The sum of the observed values yi equals the sum of the fitted values ŷi; that is, Σᵢ₌₁ⁿ yi = Σᵢ₌₁ⁿ ŷi.

Properties of the least-squares fit (cont’d)
- The least-squares regression line always passes through the centroid [the point (ȳ, x̄)] of the data.
- The sum of the residuals weighted by the corresponding value of the regressor variable always equals zero; that is, Σᵢ₌₁ⁿ xi ei = 0.

Properties of the least-squares fit (cont’d)
- The sum of the residuals weighted by the corresponding fitted value always equals zero; that is, Σᵢ₌₁ⁿ ŷi ei = 0.

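All of these identities can be verified numerically on the rocket propellant fit from the R slide; results are zero only up to floating-point rounding:

e <- resid(fit)              # residuals from fit <- lm(y ~ x)
sum(e)                       # ~0: residuals sum to zero
sum(y) - sum(fitted(fit))    # ~0: observed and fitted totals agree
sum(x * e)                   # ~0: regressor-weighted residual sum
sum(fitted(fit) * e)         # ~0: fitted-value-weighted residual sum
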
Estimation of σ²

- Recall: for a sample x1, x2, . . . , xn, the sample variance is

  s² = [1/(n − 1)] Σᵢ₌₁ⁿ (xi − x̄)²  and  E(s²) = σ².

- Similarly, our estimated variance in the regression model is

  σ̂² = [1/(n − 2)] Σᵢ₌₁ⁿ ei² = [1/(n − 2)] Σᵢ₌₁ⁿ (yi − ŷi)²  (also denoted s²).

Estimation of σ² (cont’d)

SSE = Σᵢ₌₁ⁿ ei²   (sum of squared errors)
    = Σᵢ₌₁ⁿ (yi − β̂0 − β̂1 xi)²
    = Σᵢ₌₁ⁿ (yi − ȳ + β̂1 x̄ − β̂1 xi)²   [substituting β̂0 = ȳ − β̂1 x̄]
    = Σᵢ₌₁ⁿ (yi − ȳ)² − 2 β̂1 Σᵢ₌₁ⁿ (yi − ȳ)(xi − x̄) + β̂1² Σᵢ₌₁ⁿ (xi − x̄)²
    = Syy − 2 (Sxy/Sxx) Sxy + (Sxy/Sxx)² Sxx
    = Syy − Sxy²/Sxx = Syy − β̂1 Sxy = Syy − β̂1² Sxx

Estimation of σ² (cont’d)

E(σ̂²) = E[ (1/(n − 2)) Σᵢ₌₁ⁿ ei² ] = (1/(n − 2)) E(SSE) = σ²   (why?)

Thus, an unbiased estimator of σ² is

σ̂² = (1/(n − 2)) Σᵢ₌₁ⁿ ei² = SSE/(n − 2) = MSE   (mean squared error)

SSE has n − 2 degrees of freedom because two degrees of freedom are associated with β̂0 and β̂1 in obtaining ŷi.

Rocket Propellant Example: Computing MSE

From the textbook, pp. 21-22:

n = 20
Sxx = 1,106.56,  Sxy = −41,112.65,  Syy = 1,693,737.60

SSE = Syy − Sxy²/Sxx = 166,402.65
σ̂² = MSE = SSE/(n − 2) = 166,402.65 / 18 = 9,244.59
σ̂ = √9,244.59 = 96.15

From the R output, Residual standard error = 96.11.

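The same computation in R, using the fitted model object from the R slide:

SSE <- sum(resid(fit)^2)       # sum of squared residuals
MSE <- SSE / df.residual(fit)  # SSE / (n - 2), here n - 2 = 18
sqrt(MSE)                      # residual standard error, about 96.11
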
Alternative Form of the Model

Suppose that we redefine the regressor variable xi as the deviation from its own average, say xi − x̄.

The regression model then becomes

yi = β0 + β1 (xi − x̄) + β1 x̄ + εi
   = (β0 + β1 x̄) + β1 (xi − x̄) + εi
   = β0* + β1 (xi − x̄) + εi,

where β0* = β0 + β1 x̄.

Note that β̂0* = ȳ.

The fitted model is Ê(y|x) = ȳ + β̂1 (x − x̄).

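A quick R check of this centered parameterization on the propellant data: the slope is unchanged and the new intercept equals ȳ.

fit_centered <- lm(y ~ I(x - mean(x)))  # regress on deviations from the mean
coef(fit_centered)   # intercept equals mean(y); slope matches the original fit
mean(y)
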
Hypothesis Testing on the Slope and Intercept

Hypothesis Testing on the Slope and Intercept

We are often interested in testing hypotheses and constructing confidence intervals about the model parameters.

We need an additional assumption: the model errors εi are normally distributed. That is, the errors are normally and independently distributed with mean 0 and variance σ²:

εi ~ N(0, σ²)

Hypothesis Testing for β1

H0: β1 = β10 vs H1: β1 ≠ β10

We assume that εi ~ N(0, σ²). Hence, yi ~ N(β0 + β1 xi, σ²), and

β̂1 = Σᵢ₌₁ⁿ ci yi  ⇒  β̂1 ~ N(β1, σ²/Sxx).

Thus,

(β̂1 − β1) / √(σ²/Sxx) ~ N(0, 1)

Under H0, the test statistic is

Z0 = (β̂1 − β10) / √(σ²/Sxx) ~ N(0, 1)

Do we know σ²?

Hypothesis Testing for β1 (cont’d)
Typically σ² is unknown.

We know that σ̂² is an unbiased estimator of σ² (shown earlier).

Also, (n − 2) σ̂² / σ² ~ χ²(n − 2), and σ̂² and β̂1 are independent (why? this will be discussed later).

Under H0,

t0 = (β̂1 − β10) / √(σ̂²/Sxx) ~ t(n − 2)

Rejection region: |t0| > t(α/2, n − 2)

Hypothesis Testing for β1 (cont’d)

The denominator of the test statistic is often called the estimated standard error, or more simply, the standard error of the slope. That is,

se(β̂1) = √(σ̂²/Sxx)

Therefore, we often see t0 written as

t0 = (β̂1 − β10) / se(β̂1)

Hypothesis Testing for β0
A similar procedure can be used to test hypotheses about β0.

To test

H0: β0 = β00 vs H1: β0 ≠ β00,

we would use the test statistic

t0 = (β̂0 − β00) / se(β̂0),

where

se(β̂0) = √[σ̂² (1/n + x̄²/Sxx)]

is the standard error of the intercept.

We reject H0 if |t0| > t(α/2, n − 2).

Rocket Propellant Example: Testing H0: β1 = 0
From the textbook, p. 25.

Test H0: β1 = 0.

We know β̂1 = −37.15, Sxx = 1,106.56, and σ̂² = 9,244.59. Then

se(β̂1) = √(σ̂²/Sxx) = 2.89

The test statistic is

t0 = β̂1 / se(β̂1) = −12.85

If we choose α = 0.05, the critical value of t is t(0.025, 18) = 2.101.

Thus, we would reject H0 and conclude that there is a linear relationship between shear strength and the age of the propellant.

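The same test can be reproduced by hand in R, taking the summary quantities above as given (values from the slides, not recomputed from raw data):

b1 <- -37.15; Sxx <- 1106.56; MSE <- 9244.59; n <- 20
se_b1 <- sqrt(MSE / Sxx)   # 2.89
t0 <- b1 / se_b1           # -12.85
qt(0.975, df = n - 2)      # critical value t(0.025, 18) = 2.101
2 * pt(abs(t0), df = n - 2, lower.tail = FALSE)  # two-sided p-value, ~1.6e-10
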
Rocket Propellant Example: Testing H0: β0 = 0
Test H0: β0 = 0.

We know β̂0 = 2,627.82, n = 20, x̄ = 13.3625, Sxx = 1,106.56, and σ̂² = 9,244.59.

Then,

se(β̂0) = √[σ̂² (1/n + x̄²/Sxx)] = 44.20

The test statistic is

t0 = β̂0 / se(β̂0) = 59.45

If we choose α = 0.05, the critical value of t is t(0.025, 18) = 2.101.

Thus, we would reject H0.

Testing Significance of Regression

The test for significance of regression is a test to determine whether there is a linear relationship between the response y and the regressor x. That is,

H0: β1 = 0 vs H1: β1 ≠ 0

This is a special case of the hypotheses H0: β1 = β10, with β10 = 0.

Failing to reject H0: β1 = 0 implies that there is no linear relationship between x and y.

Testing Significance of Regression (cont’d)
1) Situations where H0: β1 = 0 is not rejected

[Figure: two scatter diagrams]
Left: x is of little value in explaining the variation in y, and the best estimator of y for any x is ȳ.
Right: the true relationship between x and y is not linear.

Thus, failing to reject H0 is equivalent to saying that there is no linear relationship between y and x.

Testing Significance of Regression (cont’d)
2) Situations where H0: β1 = 0 is rejected

If H0 is rejected, this implies that x is of value in explaining the variability in y.

However, rejecting H0 could mean:

[Figure: two scatter diagrams]
Left: the straight-line model is adequate.
Right: even though there is a linear effect, better results could be obtained with the addition of higher-order polynomial terms in x.

