Topic 9-Simple Linear Regression

This document discusses simple linear regression analysis. It introduces scatter plots, correlation, and regression. Key topics covered include the linear regression model, finding the least squares regression equation, and interpreting the intercept and slope. An example is provided to demonstrate determining the linear regression equation that can be used to estimate textbook selling prices based on number of pages from sample data.

Uploaded by

Derrick Vincent

BFS1024

Statistics for Finance

Topic 9: SIMPLE LINEAR REGRESSION

Chap 9-1
Chapter Topics

• Introduction to scatter plots, correlation analysis & regression analysis
• Simple linear regression
• Finding the least squares regression (linear regression equation)
• Coefficient of correlation & coefficient of determination
• Estimation

Correlation vs. Regression
• A scatter plot (or scatter diagram) can be used to show the relationship between two numerical variables.
• Correlation analysis is used to measure the strength of the association (linear relationship) between two variables.
• Correlation is concerned only with the strength of the relationship.
• Regression analysis is used to:
    • Predict the value of a dependent variable based on the value of independent variables
    • Explain the impact of changes in an independent variable on the dependent variable

Dependent variable (Y): the variable you wish to explain
Independent variable (X): the variable used to explain the dependent variable

[Figure: scatter plot of Y against X, both axes 0 to 10: Perfect Negative Correlation]

[Figure: scatter plot of Y against X, both axes 0 to 10: Perfect Positive Correlation]

[Figure: scatter plot of Y against X, both axes 0 to 10: Zero Correlation]

[Figure: scatter plot of Y against X, both axes 0 to 10: Strong Positive Correlation]

Example: Data

House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700

Example: Scatter Plot

[Figure: scatter plot of House Price ($1000s), axis 0 to 450, against Square Feet, axis 0 to 3000]

Interpretation:
• There appears to be a positive linear relationship between house price and size. This means that, in general, the bigger the house, the higher the price.
• There is no visible pattern of a non-linear relationship; hence, a linear model is suitable to reflect the relationship between house price and size.
• The relationship also appears to be strong, as the data reveal a clear and predictable pattern.

SIMPLE LINEAR REGRESSION

Simple Linear Regression Model

• Only one independent variable, X
• Relationship between X and Y is described by a linear function
• Changes in Y are related to changes in X

Types of Relationships

[Figure: four scatter-plot panels of Y against X; one column shows linear relationships, the other shows curvilinear relationships]

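The Linear Regression Model

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where
• $Y_i$ is the dependent variable
• $\beta_0$ is the population Y intercept
• $\beta_1$ is the population slope coefficient
• $X_i$ is the independent variable
• $\varepsilon_i$ is the random error term

$\beta_0 + \beta_1 X_i$ is the linear component; $\varepsilon_i$ is the random error component.
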
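As an illustrative sketch only, the model can be simulated in Python to see how the linear component and the random error term combine. The parameter values (intercept 5, slope 2), the normal distribution assumed for the error term, the uniform range for X, and the function name `simulate_model` are all assumptions made for this illustration; none of them comes from the notes.

```python
import random


def simulate_model(n, beta0, beta1, sigma, seed=0):
    """Draw n points from Y_i = beta0 + beta1 * X_i + e_i.

    Assumptions for illustration: e_i is normal with mean 0 and
    standard deviation sigma, and X_i is uniform on [0, 10].
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(0, 10)
        error = rng.gauss(0, sigma)      # random error component
        y = beta0 + beta1 * x + error    # linear component plus error
        xs.append(x)
        ys.append(y)
    return xs, ys


xs, ys = simulate_model(n=1000, beta0=5.0, beta1=2.0, sigma=1.0)

# Subtracting the linear component recovers the simulated errors,
# whose average should be close to zero.
errors = [y - (5.0 + 2.0 * x) for x, y in zip(xs, ys)]
print(f"mean error = {sum(errors) / len(errors):.3f}")
```

With real data the population values of the intercept and slope are unknown, which is why they are estimated from a sample.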
Linear Regression Equation
The simple linear regression equation provides an estimate of the population regression line:

$$\hat{Y}_i = b_0 + b_1 X_i$$

where
• $\hat{Y}_i$ is the estimated (or predicted) Y value for observation i
• $b_0$ is the estimate of the regression intercept
• $b_1$ is the estimate of the regression slope
• $X_i$ is the value of X for observation i

Interpretation of the Intercept and the Slope
• b0 (called the intercept) is the estimated mean value of Y when the value of X is zero.

Template: The ______ (y) is ______ (b0) when ______ (x) is zero.

• b1 (called the slope) is the estimated change in the mean value of Y for every one-unit change in X.

Template: For every additional ______ (1 unit) of ______ (x), the ______ (y) will ______ (increase/decrease) by ______ (b1).

Note:
b0 is expressed in the units of Y. b1 is expressed in units of Y per one unit of X.
Finding the Least Squares Equation

$$b_1 = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}$$

$$b_0 = \bar{Y} - b_1 \bar{X}$$

These values can be obtained from the calculator.
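For readers without a statistical calculator, the two formulas can be sketched in Python as below. This is a minimal illustration using only the standard library; the function name `least_squares` and the four sample points are hypothetical and were chosen so that the points lie exactly on Y = 1 + 2X.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for Y-hat = b0 + b1 * X using the sum formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = sum_x2 - sum_x ** 2 / n
    if denominator == 0:
        raise ValueError("all X values are identical; the slope is undefined")

    b1 = (sum_xy - sum_x * sum_y / n) / denominator   # slope
    b0 = sum_y / n - b1 * sum_x / n                   # intercept = Y-bar - b1 * X-bar
    return b0, b1


# Hypothetical points lying exactly on Y = 1 + 2X
b0, b1 = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```

The slope is computed first because the intercept formula depends on it.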
Example 1
Nicholas from MMU is concerned about the cost of textbooks to students. He believes there is a relationship between the number of pages in a textbook and its selling price. To provide insight into the problem, he selects a sample of 8 textbooks currently on sale in the bookstore.

Find the least squares regression equation relating the number of pages in a textbook to its selling price.

Book                          Pages   Price (RM)
Introduction to Statistics    500     84
Basic Algebra                 700     75
Introduction to Psychology    800     99
Introduction to Sociology     600     72
Business Management           400     69
Introduction to Biology       500     81
Fundamentals of Finance       600     63
Principles of Marketing       800     93

Example 1

Develop a regression equation for the information given in Example 1 that can be used to estimate the selling price (Y) based on the number of pages (X).

$$n = 8, \quad \sum X = 4900, \quad \sum Y = 636$$

$$\sum XY = 397{,}200, \quad \sum X^2 = 3{,}150{,}000, \quad \sum Y^2 = 51{,}606$$

$$b_1 = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}} = \frac{397200 - \dfrac{4900(636)}{8}}{3150000 - \dfrac{4900^2}{8}} = 0.05143$$

$$b_0 = \bar{Y} - b_1 \bar{X} = 79.5 - 0.05143(612.5) = 48$$

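The hand calculation can be cross-checked with a short Python sketch that rebuilds the sums from the eight textbooks in Example 1. The variable names are illustrative only; the data are taken from the table in the example.

```python
# Pages (X) and selling prices in RM (Y) for the 8 textbooks in Example 1
pages = [500, 700, 800, 600, 400, 500, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]

n = len(pages)
sum_x = sum(pages)                                    # 4900
sum_y = sum(prices)                                   # 636
sum_xy = sum(x * y for x, y in zip(pages, prices))    # 397200
sum_x2 = sum(x * x for x in pages)                    # 3150000

b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

print(f"b1 = {b1:.5f}, b0 = {b0:.2f}")  # b1 = 0.05143, b0 = 48.00
```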
Thus, the regression equation is:
Y’ = 48 + 0.05143 X

The slope of the line is 0.05143.
• For every additional page in a textbook, the selling price will increase by RM0.05143.
It means that each additional page adds about 5 sen to the price.

The intercept is 48 (when X = 0).
• The selling price is RM48 when the number of pages is zero.
It means that a book with no pages would cost RM48. This has no practical meaning, since no book is without pages.
Find Least Squares Regression using EXCEL

Data > Data Analysis > Regression

Linear Regression Example
Excel Summary Output

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
                  df    SS            MS            F          Significance F
Regression        1     18934.9348    18934.9348    11.0848    0.01039
Residual/Error    8     13665.5652    1708.1957
Total             9     32600.5000

               Coefficients   Standard Error   t Stat     P-value    Lower 95%    Upper 95%
Intercept      98.24833       58.03348         1.69296    0.12892    -35.57720    232.07386
Square Feet    0.10977        0.03297          3.32938    0.01039    0.03374      0.18580
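As a cross-check outside Excel, the key figures in the summary output can be reproduced from the ten house records with a few lines of Python. This is a sketch using only the standard library; the variable names are illustrative, and it assumes the usual definition of the standard error of the estimate as the square root of the residual sum of squares divided by n - 2.

```python
import math

# House data from the example: size in square feet (X), price in $1000s (Y)
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(square_feet)
mean_x = sum(square_feet) / n
mean_y = sum(price) / n

# Sums of squared deviations and cross-products
sxx = sum((x - mean_x) ** 2 for x in square_feet)
sst = sum((y - mean_y) ** 2 for y in price)            # total SS
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(square_feet, price))

b1 = sxy / sxx                  # slope (Square Feet coefficient)
b0 = mean_y - b1 * mean_x       # intercept
ssr = b1 * sxy                  # regression SS
sse = sst - ssr                 # residual SS
r_square = ssr / sst
std_error = math.sqrt(sse / (n - 2))

print(f"intercept = {b0:.5f}, slope = {b1:.5f}")
print(f"R Square = {r_square:.5f}, Standard Error = {std_error:.5f}")
```

The printed values should agree with the Coefficients, R Square, and Standard Error entries in the Excel output up to rounding.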
Coefficient of Correlation & Coefficient of Determination

Correlation Coefficient (r)
• Measures the strength of the linear relationship between 2 variables
• Also called Pearson’s Product Moment Correlation Coefficient
• It can range from -1.00 to 1.00

r = 1 indicates a perfect positive relationship
r = 0 indicates no linear relationship
r = -1 indicates a perfect negative relationship
r values close to 0 indicate weak correlation

Formula to Calculate the Coefficient of Correlation

$$r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}}$$

A calculator can be used to obtain this value.

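The same formula can be written as a small Python function, shown here as a sketch rather than a replacement for the calculator. The function name `pearson_r` and the two tiny data sets are hypothetical; they were chosen so that the points fall exactly on a rising line and on a falling line.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from the computational sum formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )
    if denominator == 0:
        raise ValueError("r is undefined when X or Y has no variation")
    return numerator / denominator


r_up = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])     # points on a rising line
r_down = pearson_r([1, 2, 3, 4], [9, 7, 5, 3])   # points on a falling line
print(r_up, r_down)  # 1.0 -1.0
```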
Coefficient of Determination ($r^2$)
• It is the proportion of the total variation in the dependent variable (Y) that is explained by the variation in the independent variable (X).
• It is the square of the coefficient of correlation.
• It ranges from 0 to 1.
• It does not give any information on the direction of the relationship between the variables.
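
As a worked check, assuming the usual identity that the coefficient of determination equals the regression sum of squares divided by the total sum of squares, the ANOVA figures in the Excel summary output for the house price example give:

```latex
r^2 = \frac{SS_{\text{Regression}}}{SS_{\text{Total}}}
    = \frac{18934.9348}{32600.5000}
    \approx 0.58082
```

This agrees with the R Square value in the Excel output, and its positive square root, about 0.76211, matches Multiple R.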

Example 1
Nicholas from MMU is concerned about the cost to
students of textbooks. He believes there is a
relationship between the number of pages in the text
and the selling price of the book. To provide insight
into the problem he selects a sample of 8 textbooks
currently on sale in the bookstore.

Compute the correlation coefficient & the coefficient of determination.

$$r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}} = \frac{397200 - \dfrac{4900(636)}{8}}{\sqrt{\left[3150000 - \dfrac{4900^2}{8}\right]\left[51606 - \dfrac{636^2}{8}\right]}} = 0.614$$

It indicates a moderate positive relationship between the number of pages & the selling price of the book.
Coefficient of Determination ($r^2$)

$$r = 0.614, \quad r^2 = 0.377$$

That is, 37.7% of the total variation in the selling price of the book (Y) is explained by the variation in the number of pages of the book (X).

The balance of 62.3% is the unexplained variation.

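The split between explained and unexplained variation can be made concrete with a short Python sketch on the Example 1 data. It fits the line, measures the total variation in price around its mean, and separates the part left over as squared residuals. The variable names are illustrative only.

```python
# Pages (X) and selling prices in RM (Y) for the 8 textbooks in Example 1
pages = [500, 700, 800, 600, 400, 500, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]

n = len(pages)
mean_x = sum(pages) / n
mean_y = sum(prices) / n

sxx = sum((x - mean_x) ** 2 for x in pages)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pages, prices))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Total variation in Y, and the part the fitted line leaves unexplained
sst = sum((y - mean_y) ** 2 for y in prices)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(pages, prices))
ssr = sst - sse                     # variation explained by the line

r_squared = ssr / sst
print(f"explained   = {r_squared:.1%}")      # explained   = 37.7%
print(f"unexplained = {1 - r_squared:.1%}")  # unexplained = 62.3%
```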
Estimation

What is the estimated selling price for a book that has 800 pages?

Price = RM48 + 0.05143 (Number of Pages)
      = RM48 + 0.05143 (800)
      = RM89.14

Types of Estimation

• Interpolated estimate: an estimation made within the given data range.
• Extrapolated estimate: an estimation made outside the given data range.

An interpolated estimate is generally more reliable than an extrapolated estimate, because the fitted line is supported by data only within the observed range.

Using the regression equation in Example 1, compute Y when X = 1100 and when X = 550.

The regression equation is:
Y’ = 48 + 0.05143 X

For X = 1100:
Price = RM48 + 0.05143 (Number of Pages)
      = RM48 + 0.05143 (1100)
      = RM104.57

For X = 550:
Price = RM48 + 0.05143 (Number of Pages)
      = RM48 + 0.05143 (550)
      = RM76.29
Why is the estimate at X = 1100 not reliable, whereas the estimate at X = 550 is more reliable?

The minimum value of X is 400 pages & the maximum value of X is 800 pages.

X = 1100 is outside the given data range, so it is an extrapolated estimate.

X = 550 is within the given data range, so it is an interpolated estimate.

An interpolated estimate is generally more reliable.

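The range check described above can be sketched in Python. The function name `estimate_price` is hypothetical, and the coefficients 48 and 0.05143 are the rounded values from Example 1, so the results are accurate only to that rounding.

```python
# Observed page counts from Example 1; they define the data range for X
pages = [500, 700, 800, 600, 400, 500, 600, 800]

B0 = 48.0       # intercept from Example 1 (RM)
B1 = 0.05143    # slope from Example 1 (RM per page)


def estimate_price(num_pages):
    """Return (estimated price in RM, kind of estimate)."""
    within_range = min(pages) <= num_pages <= max(pages)
    kind = "interpolated" if within_range else "extrapolated"
    return B0 + B1 * num_pages, kind


for x in (1100, 550):
    price, kind = estimate_price(x)
    print(f"X = {x}: RM{price:.2f} ({kind})")
# X = 1100: RM104.57 (extrapolated)
# X = 550: RM76.29 (interpolated)
```

Labelling each estimate this way makes it harder to report an extrapolated value without noticing that it lies outside the 400 to 800 page range.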
Exercise 1
A company manufacturing machine parts would like to develop a model to estimate the number of worker hours required for production runs of varying lot sizes. A random sample of 14 production runs is selected with the following results.
a) Calculate the correlation coefficient & coefficient of determination.
b) Determine the least squares regression line.
c) Estimate the worker hours for these lot sizes: 35 units & 100 units.
d) Which of the two estimates is more reliable?

Lot Size    Worker Hours
20          50
20          55
30          73
30          67
40          87
40          95
50          108
50          112
60          128
60          135
70          148
70          160
80          170
80          162

Exercise 2
Sunflowers, a chain of women’s clothing stores, has
improved its market share over the past 25 years by
increasing the number of stores in the chain. As the
director of special projects and planning, you need to
develop a strategic plan for opening several new stores.
This plan must be able to forecast annual sales for all
potential stores under consideration. You believe that the
size of the store is significantly related to its success and
want to incorporate this information in the decision-
making process. To estimate the relationship between the
store size (sq. ft) and its annual sales, a sample of 14 stores
was selected.
a) Calculate and interpret the correlation coefficient & coefficient of determination.
b) Determine the least squares regression line.
c) Estimate the annual sales for these store sizes: 1.4 sq. ft. & 6 sq. ft.
d) Which of the two estimates is more reliable?

Store    Sq. Ft.    Annual Sales ($’000)
1        1.7        3.7
2        1.6        3.9
3        2.8        6.7
4        5.6        9.5
5        1.3        3.4
6        2.2        5.6
7        1.3        3.7
8        1.1        2.7
9        3.2        5.5
10       1.5        2.9
11       5.2        10.7
12       4.6        7.6
13       5.8        11.8
14       3.0        4.1
