Linear Regression

[Figure: scatter plot with fitted regression line, $y = 3.5702x + 62.366$, $R^2 = 0.8215$.]
Simple Linear Regression Model
Basic model:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where
$Y_i$ is the response variable (or dependent variable).
$X_i$ is the predictor variable (or independent or explanatory variable).
$\varepsilon_i$ is a random error term with $E[\varepsilon_i] = 0$ and $\mathrm{Var}[\varepsilon_i] = \sigma^2$.
$\varepsilon_i$ and $\varepsilon_j$ are uncorrelated for $i \neq j$.
$\beta_0$ and $\beta_1$ are parameters (intercept and slope).
Simple Linear Regression Model
Fixed versus random X
Some results for regression analysis assume that X is fixed (controlled).
This requirement can often be relaxed.
Simple Linear Regression Model (with fixed X)
Regression function:
$$E(Y_i \mid x_i) = E(\beta_0 + \beta_1 x_i + \varepsilon_i) = \beta_0 + \beta_1 x_i + E(\varepsilon_i) = \beta_0 + \beta_1 x_i$$
Variance of $Y_i$:
$$\mathrm{Var}(Y_i) = \mathrm{Var}(\varepsilon_i) = \sigma^2$$
Correlation of the $Y_i$'s:
$$\mathrm{Corr}(Y_i, Y_j) = 0, \quad \text{for } i \neq j$$
Simple Linear Regression Model (with fixed X)
[Figure: the regression line $Y = \beta_0 + \beta_1 X$ with a data point $(X_i, Y_i)$ and its error $\varepsilon_i$.]
Estimation: Least Squares (LS) method
Principle: minimize the sum of squared errors of the regression model:
$$Q(\beta_0, \beta_1) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$$
Derivation:
$$\frac{\partial Q}{\partial \beta_0} = -2 \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i) = 0 \quad\Rightarrow\quad \sum_{i=1}^{n} Y_i = n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} X_i$$
$$\frac{\partial Q}{\partial \beta_1} = -2 \sum_{i=1}^{n} X_i (Y_i - \beta_0 - \beta_1 X_i) = 0 \quad\Rightarrow\quad \sum_{i=1}^{n} X_i Y_i = \hat\beta_0 \sum_{i=1}^{n} X_i + \hat\beta_1 \sum_{i=1}^{n} X_i^2$$
These two conditions are the normal equations. Solving them gives
$$\hat\beta_1 = \frac{\sum_i (X_i - \bar X)(Y_i - \bar Y)}{\sum_i (X_i - \bar X)^2}, \qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X$$
Example: Toluca Company
Matlab code:
>> Ex = mean(X);
>> Ey = mean(Y);
>> b1 = sum( (X-Ex).*(Y-Ey) ) / sum( (X-Ex).^2 )
b1 =
3.5702
>> b0 = Ey - b1*Ex
b0 =
62.3659
>> plot(X,Y,'o'), hold on
>> plot( [20,120], [b0+b1*20, b0+b1*120])
[Figure: scatter plot of the Toluca data (X from 20 to 120) with the fitted regression line.]
Point estimation of mean response
For a given X, the mean response is estimated as
$$\hat Y = \hat\beta_0 + \hat\beta_1 X$$
Properties of the LS regression function
1. The sum of residuals is zero: $\sum_{i=1}^{n} e_i = 0$, where $e_i = Y_i - \hat Y_i$.
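This zero-sum property follows from the first normal equation and is easy to verify numerically. Below is a small Python sketch (the slides' own examples use MATLAB; the data here are made-up illustration values, not the Toluca data):

```python
# Verify properties of least-squares residuals (Python sketch;
# the data below are made-up illustration values, not the Toluca data).
X = [30, 50, 70, 90, 110]
Y = [150, 250, 300, 390, 450]
n = len(X)

xbar = sum(X) / n
ybar = sum(Y) / n

# Least-squares estimates from the normal equations on the slides
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
     sum((x - xbar) ** 2 for x in X)
b0 = ybar - b1 * xbar

# Residuals e_i = Y_i - Yhat_i
e = [y - (b0 + b1 * x) for x, y in zip(X, Y)]

print(abs(sum(e)) < 1e-9)                             # True: residuals sum to zero
print(abs(sum(x * r for x, r in zip(X, e))) < 1e-6)   # True: residuals orthogonal to X
```

The second check corresponds to the second normal equation: the residuals are also uncorrelated with the predictor values.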
Estimation of the error variance $\sigma^2$
The error variance is estimated as:
$$S^2 = \frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat Y_i)^2$$
Test of the Simple Linear Regression model
Does X help to explain Y? Test the null hypothesis $\beta_1 = 0$, or equivalently, that the mean response $E(Y_i)$ does not depend on $X_i$.
Sampling distribution of $\hat\beta_1$
$\beta_1$ is estimated as
$$\hat\beta_1 = \frac{\sum_i (X_i - \bar X)(Y_i - \bar Y)}{\sum_i (X_i - \bar X)^2} = \sum_i k_i Y_i, \qquad \text{where } k_i = \frac{X_i - \bar X}{\sum_j (X_j - \bar X)^2}$$
so $\hat\beta_1$ is a linear combination of the $Y_i$.
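The identity above, that the slope estimate is a weighted sum of the responses, can be checked numerically. A short Python sketch (made-up data; the slides' own examples use MATLAB):

```python
# Check that b1 is a linear combination of the Y_i with weights k_i
# (Python sketch with made-up data, not the Toluca data set).
X = [20, 40, 60, 80, 100]
Y = [120, 210, 260, 330, 420]

n = len(X)
xbar = sum(X) / n
ybar = sum(Y) / n
Sxx = sum((x - xbar) ** 2 for x in X)

# Direct least-squares slope
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / Sxx

# Same slope written as a weighted sum of the responses: b1 = sum_i k_i * Y_i
k = [(x - xbar) / Sxx for x in X]
b1_linear = sum(ki * y for ki, y in zip(k, Y))

print(abs(b1 - b1_linear) < 1e-12)   # True: the two forms agree
print(abs(sum(k)) < 1e-12)           # True: the weights sum to zero
```

The weights summing to zero is what makes the two forms agree: adding any constant to all the $Y_i$ leaves $\hat\beta_1$ unchanged.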
Sampling distribution of $\hat\beta_1$
$$\frac{\hat\beta_1 - \beta_1}{\sqrt{\mathrm{Var}(\hat\beta_1)}} \sim N(0,1), \qquad \mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar X)^2}$$
Replacing $\sigma^2$ by its estimate $S^2 = \frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat Y_i)^2$ gives $S^2_{\hat\beta_1} = S^2 / \sum_{i=1}^{n} (X_i - \bar X)^2$, and
$$\frac{\hat\beta_1 - \beta_1}{S_{\hat\beta_1}} \sim t(n-2)$$
t-test of the hypothesis $\beta_1 = 0$
Null hypothesis: $\beta_1 = 0$
Alternative: $\beta_1 \neq 0$
Reject $H_0$ if $\left| \hat\beta_1 / S_{\hat\beta_1} \right| > t_{1-\alpha/2}(n-2)$.
[Figure: t(n−2) density with rejection probability $\alpha/2$ in each tail beyond $\pm t_{1-\alpha/2}$.]
Example: Toluca Company
Matlab code:
>> b0 = 62.3659;
>> b1 =3.5702;
>> n = 25;
>> e = Y - (b0+b1*X);
>> Se = sqrt( sum(e.^2/(n-2) ));
>> Sb1 = Se / sqrt(sum((X-mean(X)).^2));
>> p = 2*tcdf(-abs(b1/Sb1),n-2)
p =
4.4488e-010
Sampling distribution of $\hat\beta_0$
$\hat\beta_0$ is normally distributed with
$$E(\hat\beta_0) = \beta_0 \ \text{(unbiased)} \qquad \text{and} \qquad \mathrm{Var}(\hat\beta_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar X^2}{\sum_i (X_i - \bar X)^2} \right)$$
Therefore
$$\frac{\hat\beta_0 - \beta_0}{S_{\hat\beta_0}} \sim t(n-2), \qquad \text{where } S^2_{\hat\beta_0} = S^2 \left( \frac{1}{n} + \frac{\bar X^2}{\sum_i (X_i - \bar X)^2} \right)$$
t-test of the hypothesis $\beta_0 = 0$
Null hypothesis: $\beta_0 = 0$ (regression line goes through the origin)
Alternative: $\beta_0 \neq 0$
Reject $H_0$ if $\left| \hat\beta_0 / S_{\hat\beta_0} \right| > t_{1-\alpha/2}(n-2)$.
[Figure: t(n−2) density with rejection probability $\alpha/2$ in each tail beyond $\pm t_{1-\alpha/2}$.]
Example: Toluca Company
Matlab code
>> b0 = 62.3659;
>> b1 =3.5702;
>> n = 25;
>> e = Y - (b0+b1*X);
>> Se = sqrt( sum(e.^2/(n-2) ));
>> Sb0 = sqrt( Se^2*(1/n + ...
mean(X)^2/(sum((X-mean(X)).^2)) ) );
>> p = 2*tcdf(-abs(b0/Sb0),n-2)
p =
0.0267
Analysis of Variance (ANOVA) of regression model
[Figure: regression line $Y = \beta_0 + \beta_1 X$ showing the decomposition of the total deviation $Y_i - \bar Y$ into the error part $Y_i - \hat Y_i$ and the regression part $\hat Y_i - \bar Y$.]
Variance breakdown
$$Y_i - \bar Y = (Y_i - \hat Y_i) + (\hat Y_i - \bar Y)$$
$$\sum_i (Y_i - \bar Y)^2 = \sum_i (Y_i - \hat Y_i)^2 + \sum_i (\hat Y_i - \bar Y)^2$$
where
– Total sum of squares: $SST = \sum_i (Y_i - \bar Y)^2$
– Error sum of squares: $SSE = \sum_i (Y_i - \hat Y_i)^2$
– Regression sum of squares: $SSR = \sum_i (\hat Y_i - \bar Y)^2$
so that $SST = SSE + SSR$.
Breakdown of degrees of freedom
Total sum of squares: $SST = \sum_{i=1}^{n} (Y_i - \bar Y)^2$, with df = n−1.
There are n different Y's, but only n−1 degrees of freedom, since the mean value is estimated.
Error sum of squares: $SSE = \sum_{i=1}^{n} (Y_i - \hat Y_i)^2$, with df = n−2.
Two degrees of freedom are lost to the estimation of $\hat\beta_0$ and $\hat\beta_1$.
Regression sum of squares: $SSR = \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$, with df = 1.
$\hat Y_i$ has 2 degrees of freedom ($\hat\beta_0$ and $\hat\beta_1$), but one is lost to estimation of the mean value.
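The decomposition SST = SSE + SSR can be confirmed numerically. A short Python sketch (made-up illustration data; the slides' own examples use MATLAB):

```python
# Numerical check of the sum-of-squares decomposition SST = SSE + SSR
# (Python sketch; the data are made-up illustration values).
X = [10, 20, 30, 40, 50, 60]
Y = [95, 130, 180, 230, 265, 320]

n = len(X)
xbar = sum(X) / n
ybar = sum(Y) / n

# Least-squares fit
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
     sum((x - xbar) ** 2 for x in X)
b0 = ybar - b1 * xbar
Yhat = [b0 + b1 * x for x in X]

# The three sums of squares from the slides
SST = sum((y - ybar) ** 2 for y in Y)
SSE = sum((y - yh) ** 2 for y, yh in zip(Y, Yhat))
SSR = sum((yh - ybar) ** 2 for yh in Yhat)

print(abs(SST - (SSE + SSR)) < 1e-6)   # True: the sums of squares add up
```

The cross term $2\sum (Y_i - \hat Y_i)(\hat Y_i - \bar Y)$ vanishes because the residuals sum to zero and are orthogonal to X, which is why the decomposition is exact.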
ANOVA Table

Source       Sum of squares                                  df     Mean square
Regression   $SSR = \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$    1      $SSR / 1$
Error        $SSE = \sum_{i=1}^{n} (Y_i - \hat Y_i)^2$       n−2    $\frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat Y_i)^2$
Total        $SST = \sum_{i=1}^{n} (Y_i - \bar Y)^2$         n−1    $\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar Y)^2$
F-distribution
[Figure: density of the F(1,18)-distribution.]
F-test of $\beta_1 = 0$
The null hypothesis is rejected for large F:
$$F = \frac{SSR / 1}{SSE / (n-2)} > F_{1-\alpha}(1, n-2) \quad\Rightarrow\quad \text{reject } H_0$$
The F-test is one-sided!
Matlab code:
>> SSE = sum( (Y-(b0+b1*X)).^2 );
>> SSR = sum( (b0+b1*X-mean(Y)).^2 );
>> p=1-fcdf(SSR/(SSE/23),1,23)
p =
4.4489e-010
Coefficient of determination ($R^2$)
The coefficient of determination is defined as:
$$R^2 = 1 - \frac{SSE}{SST} = \frac{SSR}{SST}$$
$R^2$ is the fraction of the variance of Y explained by the regression model.
$R^2 = 1$ when the fit $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$ is perfect ($\hat Y_i = Y_i$ for all $i$); $R^2 = 0$ when $\hat Y = \bar Y$ (the regression explains nothing).
For simple linear regression, $R = \mathrm{corr}(X, Y)$ up to the sign of the slope, i.e. $R^2 = \mathrm{corr}(X, Y)^2$.
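The two forms of $R^2$ and the link to the sample correlation can be checked numerically. A Python sketch with made-up data (not the Toluca data):

```python
# R^2 computed as 1 - SSE/SST and as SSR/SST, compared with corr(X, Y)^2
# (Python sketch with made-up illustration data).
import math

X = [15, 35, 55, 75, 95]
Y = [110, 190, 240, 330, 400]

n = len(X)
xbar = sum(X) / n
ybar = sum(Y) / n
Sxx = sum((x - xbar) ** 2 for x in X)
Syy = sum((y - ybar) ** 2 for y in Y)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))

# Least-squares fit
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar
Yhat = [b0 + b1 * x for x in X]

SSE = sum((y - yh) ** 2 for y, yh in zip(Y, Yhat))
SSR = sum((yh - ybar) ** 2 for yh in Yhat)
SST = Syy

r2_a = 1 - SSE / SST
r2_b = SSR / SST
r2_corr = (Sxy / math.sqrt(Sxx * Syy)) ** 2   # squared sample correlation

print(abs(r2_a - r2_b) < 1e-12)     # True: both definitions agree
print(abs(r2_a - r2_corr) < 1e-12)  # True: equals corr(X, Y)^2
```

The last equality holds only for simple linear regression, where $SSR = \hat\beta_1^2 S_{xx} = S_{xy}^2 / S_{xx}$.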
Matrix approach to linear regression
The regression model can be formulated in terms of matrices:
$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
where
$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \qquad \mathbf{X} = \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}, \qquad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \qquad \boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
and
$$E(\mathbf{Y}) = \mathbf{X}\boldsymbol{\beta}, \qquad E(\boldsymbol{\varepsilon}) = \mathbf{0}, \qquad \mathrm{Cov}(\boldsymbol{\varepsilon}) = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 \mathbf{I}$$
Matrix approach to linear regression
The normal equations were derived earlier:
$$n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} X_i = \sum_{i=1}^{n} Y_i$$
$$\hat\beta_0 \sum_{i=1}^{n} X_i + \hat\beta_1 \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_i Y_i$$
or, in matrix form, $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y}$, so that
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$
Example: Toluca Company
Matlab code:
>> X = [ones(25,1) X];
>> b = inv(X'*X)*X'*Y
b =
62.3659
3.5702
>> b = regress(Y,X)
b =
62.3659
3.5702