Linear Regression

Example: Toluca Company

Toluca manufactures replacement parts for refrigeration equipment.

Parts are manufactured periodically in varying lot sizes. Producing a
particular part involves setting up the production process and performing
machine and assembly operations. Based on 25 recent production runs,
the company wants to determine the relationship between lot size and
work hours.

1
[Scatter plot of work hours (Y) against lot size (X) with fitted line
y = 3.5702x + 62.366, R² = 0.8215]
2
Simple Linear Regression Model
Basic model:
Yi = β0 + β1Xi + εi

where
Yi is the response variable (or dependent variable).
Xi is the predictor variable (or independent or explanatory variable).
εi is a random error term with E[εi] = 0 and Var[εi] = σ².
εi and εj are uncorrelated for i ≠ j.
β0 and β1 are parameters (intercept and slope).

NB: "Linear regression model" means linear in the parameters.

For example, Yi = β0 + β1Xi² + εi is a linear regression model.

3
Simple Linear Regression Model
Fixed versus random X
Some results for regression analysis assume that X is fixed (controlled).
This requirement can often be relaxed.

4
Simple Linear Regression Model (with fixed X)
Regression function:
E(Yi | xi) = E(β0 + β1xi + εi) = β0 + β1xi + E(εi) = β0 + β1xi

Variance of Yi:
Var(Yi | xi) = Var(εi) = σ²

Correlation of the Yi's:
Corr(Yi, Yj) = 0, for i ≠ j

5
Simple Linear Regression Model (with fixed X)

[Diagram: regression line E(Y) = β0 + β1X with an observation (Xi, Yi)
and its error εi]
6
Estimation: Least Squares (LS) method
Principle: minimize the sum of squared errors of the regression model!

Q(β0, β1) = Σi εi² = Σi (Yi − β0 − β1Xi)²

Derivation: set the partial derivatives to zero.

∂Q/∂β0 = Σi −2(Yi − β0 − β1Xi) = 0   ⇒   Σi Yi = n β̂0 + β̂1 Σi Xi

∂Q/∂β1 = Σi −2Xi(Yi − β0 − β1Xi) = 0   ⇒   Σi XiYi = β̂0 Σi Xi + β̂1 Σi Xi²

These are the normal equations. Solving them gives

β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

β̂0 = Ȳ − β̂1 X̄

7
Example: Toluca Company
Matlab code:
>> Ex = mean(X);
>> Ey = mean(Y);
>> b1 = sum( (X-Ex).*(Y-Ey) ) / sum( (X-Ex).^2 )
b1 =
3.5702
>> b0 = Ey - b1*Ex
b0 =
62.3659
>> plot(X,Y,'o'), hold on
>> plot( [20,120], [b0+b1*20, b0+b1*120])

[Scatter plot of the Toluca data with the fitted regression line]
8
Point estimation of mean response
For a given X, the mean response is estimated as

Ŷ = β̂0 + β̂1X

The residual of the i'th observation is

ei = Yi − Ŷi

9
Properties of the LS regression function
1. The sum of residuals is zero: Σi ei = 0

2. The sum of squared residuals Σi ei² is minimum.

3. The regression line always goes through (X̄, Ȳ).

10
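These properties can be checked numerically. The sketch below uses Python/NumPy on synthetic data (the Toluca measurements themselves are not reproduced in this handout; the intercept, slope, and noise level are made-up values of similar magnitude):

```python
import numpy as np

# Synthetic data standing in for the Toluca data (assumed values)
rng = np.random.default_rng(0)
X = rng.uniform(20, 120, 25)
Y = 62.4 + 3.57 * X + rng.normal(0, 50, 25)

# Least-squares estimates from the normal equations
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
e = Y - (b0 + b1 * X)

print(abs(e.sum()) < 1e-8)                          # property 1: residuals sum to zero
print(abs((b0 + b1 * X.mean()) - Y.mean()) < 1e-8)  # property 3: line passes through (X̄, Ȳ)
```

Both checks print True (up to floating-point rounding); property 2 holds by construction, since the normal equations were derived by minimizing Q.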
Estimation of the error variance σ²
The error variance is estimated as:

S² = 1/(n−2) Σi (Yi − Ŷi)²

Why do we divide by (n−2)?

Two degrees of freedom are lost due to estimation of β0 and β1.

11
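A small simulation makes the (n−2) divisor plausible: averaged over many repeated samples, S² comes out close to the true σ². This is a sketch on synthetic data (all parameter values are assumptions, not from the Toluca data):

```python
import numpy as np

# Simulation sketch: dividing by (n-2) makes S^2 (approximately) unbiased for sigma^2.
rng = np.random.default_rng(1)
n, sigma = 25, 50.0
X = np.linspace(20, 120, n)

s2 = []
for _ in range(5000):
    Y = 62.4 + 3.57 * X + rng.normal(0, sigma, n)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    e = Y - (b0 + b1 * X)
    s2.append(np.sum(e ** 2) / (n - 2))

print(np.mean(s2))  # close to sigma^2 = 2500
```

Dividing by n instead of (n−2) would systematically underestimate σ², because the fitted line adapts to the sample and makes the residuals smaller than the true errors.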
Test of the Simple Linear Regression model

Is there a significant relationship between Y and X?

or equivalently

Can we assume that β1 = 0?

To test the hypothesis β1 = 0 versus the alternative β1 ≠ 0, we must
know the sampling distribution of β̂1.

12
Sampling distribution of β̂1
β1 is estimated as

β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² = Σ ki Yi,  where ki = (Xi − X̄) / Σ(Xi − X̄)²

If Yi is normally distributed, then β̂1 is also normally distributed.

It can be shown that:

E(β̂1) = β1 (unbiased)

Var(β̂1) = σ² / Σ(Xi − X̄)²

13
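The two moment results can be checked by simulation. The sketch below (Python/NumPy, synthetic parameter values) draws many samples and compares the empirical mean and variance of β̂1 with β1 and σ²/Σ(Xi − X̄)²:

```python
import numpy as np

# Simulation sketch: empirical mean/variance of b1 vs. the theory above.
# All parameter values are assumptions, not the Toluca data.
rng = np.random.default_rng(6)
n, sigma, beta0, beta1 = 25, 50.0, 62.4, 3.57
X = np.linspace(20, 120, n)
Sxx = np.sum((X - X.mean()) ** 2)

b1s = []
for _ in range(10000):
    Y = beta0 + beta1 * X + rng.normal(0, sigma, n)
    b1s.append(np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx)

print(np.mean(b1s))  # close to beta1 = 3.57
print(np.var(b1s))   # close to sigma^2 / Sxx
```

Note that Var(β̂1) shrinks as the X values spread out: a larger Σ(Xi − X̄)² gives a more precise slope estimate.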
Sampling distribution of β̂1

(β̂1 − β1) / √Var(β̂1)  ~  N(0,1) distribution

What if σ² (and hence Var(β̂1)) must be estimated?

σ̂² = S² = 1/(n−2) Σi (Yi − Ŷi)²     S²β̂1 = S² / Σ(Xi − X̄)²

(β̂1 − β1) / Sβ̂1  ~  t(n−2) distribution

14
t-test of the hypothesis 1=0
Null hypothesis: 1=0
Alternative: 10

Under the null hypothesis:


ˆ 1
has a t(n − 2) − distribution
Sˆ 1

The null hypothesis is rejected at significance level  if

ˆ 1
 t1− 2
Sˆ 1

/2 /2

t1-/2
15
Example: Toluca Company
Matlab code:
>> b0 = 62.3659;
>> b1 = 3.5702;
>> n = 25;
>> e = Y - (b0+b1*X);
>> Se = sqrt( sum(e.^2)/(n-2) );
>> Sb1 = Se / sqrt( sum((X-mean(X)).^2) );
>> p = 2*tcdf(-abs(b1/Sb1),n-2)
p =
4.4488e-010

16
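The same pipeline can be sketched in Python/SciPy. Since the Toluca measurements are not reproduced in this handout, the example below uses synthetic data (so the p-value will not match the MATLAB output above), but the steps are identical:

```python
import numpy as np
from scipy import stats

# Two-sided t-test of beta1 = 0, on synthetic data (assumed values)
rng = np.random.default_rng(2)
n = 25
X = rng.uniform(20, 120, n)
Y = 62.4 + 3.57 * X + rng.normal(0, 50, n)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
e = Y - (b0 + b1 * X)
Se = np.sqrt(np.sum(e ** 2) / (n - 2))           # S, residual standard error
Sb1 = Se / np.sqrt(np.sum((X - X.mean()) ** 2))  # standard error of b1
p = 2 * stats.t.cdf(-abs(b1 / Sb1), n - 2)       # two-sided p-value
print(p < 0.05)  # the slope is clearly significant at this noise level
```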
Sampling distribution of β̂0
β̂0 is normally distributed with

E(β̂0) = β0 (unbiased)

Var(β̂0) = σ² [ 1/n + X̄² / Σ(Xi − X̄)² ]

Therefore

(β̂0 − β0) / Sβ̂0  ~  t(n−2) distribution

where S²β̂0 = S² [ 1/n + X̄² / Σ(Xi − X̄)² ]

17
t-test of the hypothesis 0=0
Null hypothesis: 0=0 (regression line goes through the origin)
Alternative: 00

Under the null hypothesis:


ˆ 0
has a t(n − 2) − distribution
Sˆ 0

The null hypothesis is rejected at significance level  if

ˆ 0
 t 1−  2
Sˆ 0

/2 /2

t1-/2
18
Example: Toluca Company
Matlab code:
>> b0 = 62.3659;
>> b1 = 3.5702;
>> n = 25;
>> e = Y - (b0+b1*X);
>> Se = sqrt( sum(e.^2)/(n-2) );
>> Sb0 = sqrt( Se^2*(1/n + ...
mean(X)^2/(sum((X-mean(X)).^2)) ) );
>> p = 2*tcdf(-abs(b0/Sb0),n-2)
p =
0.0267

19
Analysis of Variance (ANOVA) of regression model

[Diagram: regression line Ŷ = β̂0 + β̂1X with an observation Yi, showing
the decomposition of the total deviation Yi − Ȳ into the residual
Yi − Ŷi and the regression part Ŷi − Ȳ]
20
Variance breakdown
Yi − Ȳ = (Yi − Ŷi) + (Ŷi − Ȳ)

It can be shown that

Σ(Yi − Ȳ)² = Σ(Yi − Ŷi)² + Σ(Ŷi − Ȳ)²

SST = SSE + SSR

where
– Total sum of squares: SST = Σ(Yi − Ȳ)²
– Error sum of squares: SSE = Σ(Yi − Ŷi)²
– Regression sum of squares: SSR = Σ(Ŷi − Ȳ)²

21
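The decomposition is easy to verify numerically; the cross term 2 Σ(Yi − Ŷi)(Ŷi − Ȳ) vanishes because of the normal equations. A sketch on synthetic data (assumed values, not the Toluca data):

```python
import numpy as np

# Numeric check of SST = SSE + SSR for a least-squares fit
rng = np.random.default_rng(3)
X = rng.uniform(20, 120, 25)
Y = 62.4 + 3.57 * X + rng.normal(0, 50, 25)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Yhat = b0 + b1 * X

SST = np.sum((Y - Y.mean()) ** 2)
SSE = np.sum((Y - Yhat) ** 2)
SSR = np.sum((Yhat - Y.mean()) ** 2)
print(abs(SST - (SSE + SSR)) / SST < 1e-9)  # identity holds up to rounding
```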
Breakdown of degrees of freedom
Total sum of squares: SST = Σi (Yi − Ȳ)²,   df = n−1

There are n different Y's, but only (n−1) degrees of freedom since the
mean value is estimated.

Error sum of squares: SSE = Σi (Yi − Ŷi)²,   df = n−2

There are n different Y's, but 2 degrees of freedom are used to
estimate the model Ŷ = β̂0 + β̂1X.

Regression sum of squares: SSR = Σi (Ŷi − Ȳ)²,   df = 1

Ŷi has 2 degrees of freedom, but one is lost to estimation of the mean
value.
22
ANOVA Table

Source of     Sum of squares       df      Mean squares
Variation
Regression    SSR = Σ(Ŷi − Ȳ)²     1       SSR / 1
Error         SSE = Σ(Yi − Ŷi)²    n−2     SSE / (n−2)
Total         SST = Σ(Yi − Ȳ)²     n−1     SST / (n−1)
23
F-distribution
[Plot of the density of the F(1,18) distribution]

24
F-test of 1 = 0
The null hypothesis is rejected for large F:

SSR / 1
 F1−  (1,n − 2)  reject H0 F-test is one-sided !
SSE /(n − 2)

F1−  (1, n − 2)
Matlab code:
>> SSE = sum( (Y-(b0+b1*X)).^2 );
>> SSR = sum( (b0+b1*X-mean(Y)).^2 );
>> p=1-fcdf(SSR/(SSE/23),1,23)
p =
4.4489e-010

P-value of F test is identical to P-value for t-test


The two tests are essentially identical in this case.

25
Coefficient of determination (R2)
The coefficient of determination is defined as:

R² = 1 − SSE/SST = SSR/SST

R² is the fraction of the variance of Y explained by the regression model.

[Two sketches: R² = 1 when all points lie on the fitted line
Ŷ = β̂0 + β̂1X; R² = 0 when the fitted line is flat, Ŷ = Ȳ]

In simple linear regression, R² = corr(X, Y)².

26
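The relation between R² and the correlation coefficient can be confirmed numerically; a sketch on synthetic data (assumed values, not the Toluca data):

```python
import numpy as np

# Check that R^2 equals the squared correlation of X and Y
rng = np.random.default_rng(4)
X = rng.uniform(20, 120, 25)
Y = 62.4 + 3.57 * X + rng.normal(0, 50, 25)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Yhat = b0 + b1 * X

R2 = 1 - np.sum((Y - Yhat) ** 2) / np.sum((Y - Y.mean()) ** 2)
r = np.corrcoef(X, Y)[0, 1]
print(abs(R2 - r ** 2) < 1e-10)
```

This holds only for simple (one-predictor) linear regression with an intercept; with several predictors, R² generalizes to the squared correlation between Y and Ŷ.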
Matrix approach to linear regression
The regression model can be formulated in terms of matrices:

Y = Xβ + ε

where

Y = [Y1, Y2, …, Yn]'    X = [1 X1; 1 X2; …; 1 Xn]    β = [β0, β1]'    ε = [ε1, ε2, …, εn]'

and

E(Y) = Xβ    E(ε) = 0    Cov(ε) = σ²I  (a diagonal matrix with σ² on the diagonal)
27
Matrix approach to linear regression
The normal equations were derived earlier:

n β̂0 + β̂1 Σ Xi = Σ Yi
β̂0 Σ Xi + β̂1 Σ Xi² = Σ Xi Yi

In matrix form, the normal equations can be written:

X'X β̂ = X'Y

or

β̂ = (X'X)⁻¹ X'Y
28
Example: Toluca Company
Matlab code:
>> X = [ones(25,1) X];
>> b = inv(X'*X)*X'*Y
b =
62.3659
3.5702

Alternatively, use Matlab’s built-in function REGRESS:

>> b = regress(Y,X)
b =
62.3659
3.5702

REGRESS can also return various statistics and confidence intervals.
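As an aside: forming the explicit inverse (X'X)⁻¹, as inv does above, is numerically less stable than letting a least-squares routine solve the system (MATLAB's backslash operator X\Y, or numpy.linalg.lstsq in Python, both QR-based). A sketch in Python/NumPy on synthetic data (assumed values) showing the two routes agree on a well-conditioned problem:

```python
import numpy as np

# Normal-equations solve vs. a QR-based least-squares solve
rng = np.random.default_rng(5)
x = rng.uniform(20, 120, 25)
Y = 62.4 + 3.57 * x + rng.normal(0, 50, 25)
X = np.column_stack([np.ones(25), x])  # design matrix with intercept column

b_normal = np.linalg.solve(X.T @ X, X.T @ Y)     # normal equations
b_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)  # QR-based least squares
print(np.allclose(b_normal, b_lstsq))
```

For ill-conditioned X (e.g. nearly collinear columns), the normal-equations route squares the condition number, so the QR-based solve is preferred in practice.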

29
