
Chapter 17

This document provides an introduction to simple linear regression. Simple linear regression examines the relationship between a dependent variable (Y) and one independent variable (X) using a linear equation of the form Y = β0 + β1X + ε, where β0 is the Y-intercept, β1 is the slope, and ε is the error term. The coefficients β0 and β1 are estimated using the method of least squares, which finds the line that minimizes the sum of squared differences between the observed data points and the regression line. The quality of fit is assessed based on how well the linear model predicts the dependent variable from the independent variable.

Simple Linear Regression

Introduction

• In Chapters 17 to 19, we examine the relationship
  between interval variables via a mathematical equation.
• The motivation for using the technique:
  – Forecast the value of a dependent variable (Y) from
    the values of independent variables (X1, X2, …, Xk).
  – Analyze the specific relationships between the
    independent variables and the dependent variable.
The Model
The model has a deterministic and a probabilistic component.

[Scatter plot: House Cost vs. House Size]

Building a house costs about $75 per square foot, and most lots
sell for $25,000, so the deterministic component is:

    House cost = 25,000 + 75(Size)

However, house costs vary even among same-size houses! Since
cost behaves unpredictably, we add a random component ε:

    House cost = 25,000 + 75(Size) + ε
• The first order linear model

    Y = β0 + β1X + ε

  Y  = dependent variable
  X  = independent variable
  β0 = Y-intercept
  β1 = slope of the line (Rise/Run)
  ε  = error variable

  β0 and β1 are unknown population parameters, and are
  therefore estimated from the data.
Estimating the Coefficients
• The estimates are determined by
  – drawing a sample from the population of interest,
  – calculating sample statistics,
  – producing a straight line that cuts into the data.

[Scatter plot of sample points]
Question: What should be considered a good line?
The Least Squares (Regression) Line

A good line is one that minimizes the sum of squared
differences between the points and the line.

Let us compare two lines through the points (1,2), (2,4),
(3,1.5), and (4,3.2): the line Y = X and the horizontal line
Y = 2.5.

Sum of squared differences (Y = X):
  (2 - 1)² + (4 - 2)² + (1.5 - 3)² + (3.2 - 4)² = 7.89
Sum of squared differences (Y = 2.5):
  (2 - 2.5)² + (4 - 2.5)² + (1.5 - 2.5)² + (3.2 - 2.5)² = 3.99

The smaller the sum of squared differences, the better the
fit of the line to the data.
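As a quick check, the two sums of squared differences above can be computed directly. A minimal sketch; the candidate lines Y = X and Y = 2.5 are the two lines from the comparison above:

```python
# Compute the sum of squared vertical differences between the four
# sample points and each candidate line.
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]

def sum_sq_diff(points, predict):
    """Sum of squared vertical distances from the points to a line."""
    return sum((y - predict(x)) ** 2 for x, y in points)

sse_line1 = sum_sq_diff(points, lambda x: x)      # the line Y = X
sse_line2 = sum_sq_diff(points, lambda x: 2.5)    # horizontal line Y = 2.5
print(round(sse_line1, 2), round(sse_line2, 2))   # prints: 7.89 3.99
```

The horizontal line Y = 2.5 has the smaller sum of squared differences, so by the least squares criterion it fits these four points better.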
The Estimated Coefficients

To calculate the estimates of the line coefficients that
minimize the differences between the data points and the
line, use the formulas:

    b1 = cov(X,Y) / sX²
    b0 = Ȳ - b1X̄

The regression equation that estimates the equation of the
first order linear model is:

    Ŷ = b0 + b1X
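The coefficient formulas above can be sketched in code. The sample data below are made up for illustration; only the formulas b1 = cov(X,Y)/sX² and b0 = Ȳ - b1X̄ come from the text:

```python
from statistics import mean

# Least squares coefficients computed from first principles on a
# small made-up sample (illustrative data only).
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(X)
x_bar, y_bar = mean(X), mean(Y)
# Sample covariance and sample variance (divide by n - 1).
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / (n - 1)
s2_x = sum((x - x_bar) ** 2 for x in X) / (n - 1)

b1 = cov_xy / s2_x          # slope estimate
b0 = y_bar - b1 * x_bar     # intercept estimate
print(f"Y-hat = {b0:.2f} + {b1:.2f}X")
```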
The Simple Linear Regression Line

• Example 17.2 (Xm17-02)
  – A car dealer wants to find the relationship between the
    odometer reading (independent variable X) and the selling
    price (dependent variable Y) of used cars.
  – A random sample of 100 cars is selected, and the data
    recorded.
  – Find the regression line.

      Car   Odometer   Price
       1     37388     14636
       2     44758     14122
       3     45833     14016
       4     30862     15590
       5     31705     15568
       6     34010     14718
       .       .         .
• Solution
  – Solving by hand: Calculate a number of statistics, with
    n = 100:

    X̄ = 36,009.45;   sX² = Σ(Xi - X̄)² / (n - 1) = 43,528,690
    Ȳ = 14,822.82;   cov(X,Y) = Σ(Xi - X̄)(Yi - Ȳ) / (n - 1) = -2,712,511

    b1 = cov(X,Y) / sX² = -2,712,511 / 43,528,690 = -.0623
    b0 = Ȳ - b1X̄ = 14,822.82 - (-.0623)(36,009.45) = 17,067

    Ŷ = b0 + b1X = 17,067 - .0623X

• Solution – continued
– Using the computer (Xm17-02)

Tools > Data Analysis > Regression >
[Shade the Y range and the X range] > OK
Xm17-02
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.8063
R Square 0.6501
Adjusted R Square 0.6466

Yˆ  17,067  .0623X
Standard Error 303.1
Observations 100

ANOVA
df SS MS F Significance F
Regression 1 16734111 16734111 182.11 0.0000
Residual 98 9005450 91892
Total 99 25739561
 Coefficients Standard Error t Stat P-value
Intercept 17067 169 100.97 0.0000
Odometer -0.0623 0.0046 -13.49 0.0000
Interpreting the Linear Regression Equation

[Odometer Line Fit Plot: Price vs. Odometer]

    Ŷ = 17,067 - .0623X

The intercept is b0 = $17,067. Do not interpret the intercept as
the "price of cars that have not been driven": there are no data
points with odometer readings near zero.

The slope is b1 = -.0623. For each additional mile on the
odometer, the price decreases by an average of $0.0623.
Error Variable: Required Conditions

• The error is a critical part of the regression model.


• Four requirements involving the distribution of ε must
  be satisfied:
  – The probability distribution of ε is normal.
  – The mean of ε is zero: E(ε) = 0.
  – The standard deviation of ε is σε, a constant, for all
    values of X.
  – The set of errors associated with different values of Y
    are all independent.
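A sketch of data generated under these four conditions; the parameter values below are illustrative, borrowed loosely from the house-cost example, not from any dataset in the chapter:

```python
import random
from statistics import mean, stdev

# Simulate Y = beta0 + beta1*X + eps with errors that satisfy the
# four conditions: normal, mean zero, constant sigma, independent.
random.seed(1)
beta0, beta1, sigma = 25_000, 75, 2_000   # made-up parameter values

X = [random.uniform(1_000, 4_000) for _ in range(5_000)]
eps = [random.gauss(0, sigma) for _ in X]   # independent N(0, sigma) draws
Y = [beta0 + beta1 * x + e for x, e in zip(X, eps)]

# The simulated errors should have mean near 0 and sd near sigma.
print(round(mean(eps)), round(stdev(eps)))
```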
The Normality of ε

[Plot: three normal curves centered on the line β0 + β1X at
X1, X2, X3]

The standard deviation remains constant, but the mean value
changes with X:
    E(Y|X1) = β0 + β1X1
    E(Y|X2) = β0 + β1X2
    E(Y|X3) = β0 + β1X3

From the first three assumptions we have: Y is normally
distributed with mean E(Y) = β0 + β1X and a constant standard
deviation σε.
Assessing the Model
• The least squares method will produce a regression line
  whether or not there is a linear relationship between
  X and Y.
• Consequently, it is important to assess how well
the linear model fits the data.
• Several methods are used to assess the model.
All are based on the sum of squares for errors,
SSE.
Sum of Squares for Errors
– This is the sum of squared differences between the points
  and the regression line.
– It can serve as a measure of how well the line fits the
  data. SSE is defined by

    SSE = Σ (Yi - Ŷi)²,   summed over i = 1, …, n

– A shortcut formula:

    SSE = (n - 1)[sY² - cov(X,Y)² / sX²]
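The definition and the shortcut formula should agree. A sketch checking this on a small made-up sample (the data are illustrative only):

```python
from statistics import mean

# Compare SSE computed from its definition with the shortcut
# formula SSE = (n - 1) * (sy^2 - cov(X,Y)^2 / sx^2).
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(X)

x_bar, y_bar = mean(X), mean(Y)
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / (n - 1)
s2_x = sum((x - x_bar) ** 2 for x in X) / (n - 1)
s2_y = sum((y - y_bar) ** 2 for y in Y) / (n - 1)

b1 = cov_xy / s2_x
b0 = y_bar - b1 * x_bar

# Direct definition: sum of squared residuals about the fitted line.
sse_direct = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
# Shortcut formula.
sse_shortcut = (n - 1) * (s2_y - cov_xy ** 2 / s2_x)

print(round(sse_direct, 6), round(sse_shortcut, 6))
```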
Standard Error of Estimate
– The mean error is equal to zero.
– If σε is small, the errors tend to be close to zero
  (close to the mean error), and the model fits the data well.
– Therefore, we can use σε as a measure of the suitability
  of using a linear model.
– An estimator of σε is given by sε, the standard error of
  estimate:

    sε = sqrt( SSE / (n - 2) )
• Example 17.3
  – Calculate the standard error of estimate for Example 17.2,
    and describe what it tells you about the model fit.
• Solution

    sY² = Σ(Yi - Ȳ)² / (n - 1) = 259,996      (calculated before)

    SSE = (n - 1)[sY² - cov(X,Y)² / sX²]
        = 99[259,996 - (-2,712,511)² / 43,528,690] = 9,005,450

    sε = sqrt(SSE / (n - 2)) = sqrt(9,005,450 / 98) = 303.13

  It is hard to assess the model based on sε alone, even when
  compared with the mean value of Y:
    sε = 303.1,   Ȳ = 14,823
Testing the Slope
– When no linear relationship exists between two variables,
  the regression line should be horizontal.

[Two scatter plots]

Linear relationship:                 No linear relationship:
different inputs (X) yield           different inputs (X) yield
different outputs (Y);               the same output (Y);
the slope is not equal to zero.      the slope is equal to zero.
• We can draw inferences about β1 from b1 by testing
    H0: β1 = 0
    H1: β1 ≠ 0  (or < 0, or > 0)
  – The test statistic is

      t = (b1 - β1) / sb1,   where sb1 = sε / sqrt((n - 1)sX²)

    is the standard error of b1.
  – If the error variable is normally distributed, the
    statistic has a Student t distribution with d.f. = n - 2.
• Example 17.4
– Test to determine whether there is enough evidence
to infer that there is a linear relationship between the
car auction price and the odometer reading for all
three-year-old Tauruses, in Example 17.2.
Use  = 5%.
• Solving by hand
  – To compute t we need the values of b1 and sb1:

      b1 = -.0623
      sb1 = sε / sqrt((n - 1)sX²)
          = 303.1 / sqrt((99)(43,528,690)) = .00462

      t = (b1 - β1) / sb1 = (-.0623 - 0) / .00462 = -13.49

  – The rejection region is t > t.025 or t < -t.025 with
    d.f. = n - 2 = 98. Approximately, t.025 = 1.984.
    Since t = -13.49 < -1.984, we reject H0.
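The hand computation of the test statistic can be reproduced as:

```python
import math

# t-test for the slope in Example 17.4:
# s_b1 = s_eps / sqrt((n - 1) * sx^2), t = (b1 - 0) / s_b1.
n = 100
b1 = -0.0623
s_eps = 303.1
s2_x = 43_528_690

s_b1 = s_eps / math.sqrt((n - 1) * s2_x)   # standard error of b1
t = (b1 - 0) / s_b1                        # test statistic under H0: beta1 = 0
print(round(s_b1, 5), round(t, 2))         # prints: 0.00462 -13.49
```

Since |t| = 13.49 far exceeds the critical value of about 1.984, H0 is rejected.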
Xm17-02
• Using the computer
Price Odometer SUMMARY OUTPUT
14636 37388
14122 44758 Regression Statistics
14016 45833 Multiple R 0.8063
15590 30862 R Square 0.6501 There is overwhelming evidence to infer
15568 31705 Adjusted R Square 0.6466
14718 34010 Standard Error 303.1 that the odometer reading affects the
14470 45854 Observations 100 auction selling price.
15690 19057
15072 40149 ANOVA
14802 40237 df SS MS F Significance F
15190 32359 Regression 1 16734111 16734111 182.11 0.0000
14660 43533 Residual 98 9005450 91892
15612 32744 Total 99 25739561
15610 34470
14634 37720 Coefficients Standard Error t Stat P-value
14632 41350 Intercept 17067 169 100.97 0.0000
15740 24469 Odometer -0.0623 0.0046 -13.49 0.0000
Coefficient of Determination
– To measure the strength of the linear relationship we use
  the coefficient of determination:

    R² = cov(X,Y)² / (sX² sY²),   or equivalently, R² = r²XY

  or,

    R² = 1 - SSE / Σ(Yi - Ȳ)²      (with SSE as defined above)
• To understand the significance of this coefficient, note
  that the overall variability in Y is explained in part by
  the regression model, while the rest remains unexplained
  (the error).

[Plot: two data points (X1,Y1) and (X2,Y2) of a certain
sample, with the regression line and Ȳ marked]

  Variation in Y = SSR + SSE
  Total variation in Y = variation explained by the
  regression line + unexplained variation (error)

  (Y1 - Ȳ)² + (Y2 - Ȳ)² = (Ŷ1 - Ȳ)² + (Ŷ2 - Ȳ)²
                           + (Y1 - Ŷ1)² + (Y2 - Ŷ2)²
• R² measures the proportion of the variation in Y that is
  explained by the variation in X:

    R² = 1 - SSE / Σ(Yi - Ȳ)²
       = [Σ(Yi - Ȳ)² - SSE] / Σ(Yi - Ȳ)²
       = SSR / Σ(Yi - Ȳ)²

• R² takes on any value between zero and one.
  R² = 1: perfect match between the line and the data points.
  R² = 0: there is no linear relationship between X and Y.
• Example 17.5
– Find the coefficient of determination for Example 17.2;
what does this statistic tell you about the model?
• Solution
  – Solving by hand:

    R² = cov(X,Y)² / (sX² sY²)
       = (-2,712,511)² / [(43,528,690)(259,996)] = .6501

– Using the computer: from the regression output we have
  R Square = 0.6501.

  SUMMARY OUTPUT

  Regression Statistics
  Multiple R          0.8063
  R Square            0.6501
  Adjusted R Square   0.6466
  Standard Error      303.1
  Observations        100

  65% of the variation in the auction selling price is
  explained by the variation in odometer reading. The rest
  (35%) remains unexplained by this model.
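The hand calculation of R² from Example 17.5 can be reproduced as:

```python
# R^2 = cov(X,Y)^2 / (sx^2 * sy^2), using the summary statistics
# computed earlier for the used-car data.
cov_xy = -2_712_511
s2_x = 43_528_690
s2_y = 259_996

r2 = cov_xy ** 2 / (s2_x * s2_y)
print(round(r2, 4))   # prints: 0.6501
```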
ANOVA
df SS MS F Significance F
Regression 1 16734111 16734111 182.11 0.0000
Residual 98 9005450 91892
Total 99 25739561

            Coefficients   Standard Error   t Stat    P-value
Intercept      17067           169          100.97     0.0000
Odometer      -0.0623          0.0046       -13.49     0.0000
