Chapter 8 Simple Linear Regression
CHAPTER 8
1. Draw a scatter plot (scatter diagram) to see the relationship between two variables.
2. Understand and interpret the terms dependent variable and independent variable.
3. Find the linear regression model and use it to make predictions.
4. Study the strength of the relationship, which is called correlation analysis.
Examples:
1. A sociologist wants to find out whether an increase in the crime rate is due to an increase in the cost of living. X = cost of living (independent variable), Y = crime rate (dependent variable).
2. A fitness instructor wants to find out the relationship between weight loss and the amount of workout time. X = amount of workout time (independent variable), Y = weight loss (dependent variable).
A scatter plot is a plot of the paired (x, y) values. It is used to examine the relationship between two variables, X and Y.
Figure: scatter plot with no particular pattern, indicating no relationship between X and Y.
Question: You are a marketing analyst for Hasbro Toys. You gather the following data. Draw a scatter plot of the data.
Ad (RM)          1   2   3   4   5
Sales (Units)    1   1   2   2   4
Answer:
Figure: scatter plot of Sales, Y (vertical axis, 0 to 4) against Advertising, X (horizontal axis, 0 to 5).
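If no plotting library is at hand, a scatter plot can be roughed out as text. The sketch below is a hypothetical helper (`text_scatter` is not from the slides) that assumes small, non-negative integer data like the Hasbro table; for real work a plotting library such as matplotlib would normally be used.

```python
def text_scatter(xs, ys):
    """Return a crude text scatter plot for small non-negative integer data.

    Each row is one y value (largest at the top); '*' marks an observed (x, y) pair.
    Column alignment assumes single-digit x and y values.
    """
    points = set(zip(xs, ys))
    max_x, max_y = max(xs), max(ys)
    rows = []
    for y in range(max_y, -1, -1):
        cells = ["*" if (x, y) in points else "." for x in range(max_x + 1)]
        rows.append(f"{y} | " + " ".join(cells))
    # x-axis labels, indented to line up with the columns above
    rows.append("    " + " ".join(str(x) for x in range(max_x + 1)))
    return "\n".join(rows)


advertising = [1, 2, 3, 4, 5]  # X, Ad (RM)
sales = [1, 1, 2, 2, 4]        # Y, Sales (units)
print(text_scatter(advertising, sales))
```

The stars rise from left to right, which suggests a positive relationship between advertising and sales.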
The linear regression model is a mathematical equation that describes the linear relationship between X and Y. It can be used to predict values of Y from known values of X. It represents a straight line, so it has the form y = mx + c, where m is the slope and c is the y-intercept.
$$Y = \alpha + \beta X + \varepsilon$$
where α = y-intercept, β = slope, ε = random error component.
This regression line is usually estimated from paired sample data. The estimated regression line is given by
$$Y' = a + bX$$
where
a = estimated y-intercept, b = estimated slope.
The values of a and b are found by the method of least squares, which is slightly different from the familiar method of finding the equation of a line that you learned in algebra.
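$$b = \frac{n\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{n\left(\sum X^2\right) - \left(\sum X\right)^2}, \qquad a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$
Now we can fit the regression line to the data using the values of a and b. The estimated regression line is
$$Y' = a + bX$$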
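The summation formulas for a and b translate directly into code. This is a minimal sketch with a hypothetical function name, `least_squares`; it applies the slope formula first and then a = ΣY/n − b·ΣX/n. As an illustration it is run on the Hasbro advertising data, for which the slides do not fit a line.

```python
def least_squares(xs, ys):
    """Estimate a and b in Y' = a + bX using the summation formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (X, Y) pairs of equal length")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all X values are equal, so the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = sum_y / n - b * sum_x / n
    return a, b


# Hasbro data: Ad (RM) and Sales (units)
a, b = least_squares([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])
print(f"Y' = {a:.1f} + {b:.1f}X")  # Y' = -0.1 + 0.7X
```

Here ΣX = 15, ΣY = 10, ΣXY = 37 and ΣX² = 55, so b = 35/50 = 0.7 and a = 2 − 0.7(3) = −0.1.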
Question: You are an economist for the county cooperative. You gather the following data.
Fertilizer (lb.)   4     6     10    12
Yield (lb.)        3.0   5.5   6.5   9.0
Find the estimated regression line relating crop yield and fertilizer.
Fertilizer, X   Yield, Y   X²     XY
4               3.0        16     12
6               5.5        36     33
10              6.5        100    65
12              9.0        144    108
Total: 32       24.0       296    218
Mean:  8        6
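The worksheet columns above are just a few sums and two means. A small sketch, with a hypothetical helper `worksheet_totals`, that rebuilds the Total and Mean rows from the raw fertilizer data:

```python
def worksheet_totals(xs, ys):
    """Column totals and means used in the least-squares worksheet."""
    n = len(xs)
    totals = {
        "sum_x": sum(xs),
        "sum_y": sum(ys),
        "sum_x2": sum(x * x for x in xs),
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),
    }
    totals["mean_x"] = totals["sum_x"] / n
    totals["mean_y"] = totals["sum_y"] / n
    return totals


fertilizer = [4, 6, 10, 12]         # X, lb.
yield_lb = [3.0, 5.5, 6.5, 9.0]     # Y, lb.
t = worksheet_totals(fertilizer, yield_lb)
print(t)
```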
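$$b = \frac{4(218) - (32)(24.0)}{4(296) - (32)^2} = \frac{104}{160} = 0.65$$
$$a = 6 - 0.65(8) = 0.8$$
Therefore, the estimated regression line is
$$Y' = 0.8 + 0.65X$$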
Answer:
Figure: scatter plot of Yield (Y) against Fertilizer (X) with the fitted line Y' = 0.8 + 0.65X.
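Once the Total row is known, a and b need only five numbers. The sketch below, using a hypothetical function `coefficients_from_totals`, plugs the fertilizer totals (n = 4, ΣX = 32, ΣY = 24.0, ΣX² = 296, ΣXY = 218) into the least-squares formulas.

```python
def coefficients_from_totals(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for Y' = a + bX from the worksheet's column totals."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)  # a = mean(Y) - b * mean(X)
    return a, b


a, b = coefficients_from_totals(n=4, sum_x=32, sum_y=24.0, sum_x2=296, sum_xy=218)
print(f"Y' = {a:.2f} + {b:.2f}X")  # Y' = 0.80 + 0.65X
```

Working from totals is handy for hand calculation, but floating-point results such as a = 0.7999999999999998 should be rounded before reporting.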
Answer: What do a and b in the regression line mean?
1. Y-intercept, a = 0.8: the average crop yield (Y) is expected to be 0.8 lb when no fertilizer (X) is used, that is, Y' = 0.8 when X = 0.
2. Slope, b = 0.65: crop yield (Y) is expected to increase by 0.65 lb for each 1 lb increase in fertilizer (X).
Question: A student wants to know the relationship between the number of pages and the price of a book. To analyze this, he selects a sample of 8 textbooks currently on sale in a bookstore. Develop a regression line to fit the data given.
Book               History  Algebra  Geometry  Physics  Sociology  Biology  Statistics  Nursing
No. of Pages (X)   500      700      800       600      400        500      600         800
Price (Y)          84       75       99        72       69         81       63          93
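Book         Pages, X   Price, Y   X²           XY
History      500        84         250,000      42,000
Algebra      700        75         490,000      52,500
Geometry     800        99         640,000      79,200
Physics      600        72         360,000      43,200
Sociology    400        69         160,000      27,600
Biology      500        81         250,000      40,500
Statistics   600        63         360,000      37,800
Nursing      800        93         640,000      74,400
Total:       4900       636        3,150,000    397,200
Mean:        612.5      79.5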
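$$b = \frac{8(397{,}200) - (4900)(636)}{8(3{,}150{,}000) - (4900)^2} = \frac{61{,}200}{1{,}190{,}000} \approx 0.0514$$
$$a = 79.5 - 0.0514(612.5) \approx 48$$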
Therefore, the estimated regression line is
$$Y' = 48 + 0.0514X$$
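Because the book totals are whole numbers, the least-squares formulas can also be evaluated exactly, which shows how much the rounding of b matters. This sketch swaps in exact rational arithmetic via Python's standard `fractions.Fraction`; the function name `exact_coefficients` is hypothetical.

```python
from fractions import Fraction


def exact_coefficients(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for Y' = a + bX as exact fractions (no rounding)."""
    n, sum_x, sum_y, sum_x2, sum_xy = map(Fraction, (n, sum_x, sum_y, sum_x2, sum_xy))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = exact_coefficients(8, 4900, 636, 3_150_000, 397_200)
print(a, b, round(float(b), 4))  # 48 9/175 0.0514
```

The exact slope is 61,200/1,190,000 = 9/175 ≈ 0.051429, and the intercept comes out as exactly 48, so the rounded line Y' = 48 + 0.0514X differs from the exact one only in the fourth significant figure of b.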
Now that we have estimated the regression line, we can predict Y for a given value of X by substituting X into the estimated regression line, $Y' = a + bX$. However, the value of X used must lie within the range of X values in the data set.
For Example 3, predict the price of a book that has 550 pages. Since 550 lies within the observed range of 400 to 800 pages,
$$Y' = 48 + 0.0514(550) = 76.27$$
Thus, a book with 550 pages is estimated to cost RM76.27.
REMEMBER! To predict Y, the value of X must lie within the range of X values in the data set.
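The range rule can be enforced in code instead of left to memory. A minimal sketch, assuming a hypothetical `predict` helper that raises an error for X outside the observed range; raising is only one possible design, and a warning would also work.

```python
def predict(x, a, b, x_min, x_max):
    """Return Y' = a + bX, refusing X outside the observed range [x_min, x_max]."""
    if not x_min <= x <= x_max:
        raise ValueError(
            f"X = {x} is outside the data range [{x_min}, {x_max}]; "
            "extrapolation is not reliable"
        )
    return a + b * x


pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = predict(550, a=48, b=0.0514, x_min=min(pages), x_max=max(pages))
print(f"RM{price:.2f}")  # RM76.27
```

With the unrounded slope b = 9/175 the prediction would be about RM76.29; the RM76.27 above comes from using the rounded b = 0.0514.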
Correlation measures the strength of the linear relationship between two variables: is the relationship strong or weak?
A numerical measure of correlation for quantitative data is the Pearson correlation coefficient, r. The formula is given by
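$$r = \frac{n\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\left(\sum X^2\right) - \left(\sum X\right)^2\right]\left[n\left(\sum Y^2\right) - \left(\sum Y\right)^2\right]}}$$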
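The Pearson formula uses the same sums as the slope formula plus ΣY². A minimal sketch with a hypothetical function `pearson_r`; Python 3.10+ also ships `statistics.correlation`, which computes the same coefficient. As an illustration it is applied to the Hasbro advertising data, which the slides do not analyse for correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r from the summation formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])  # Hasbro: Ad (RM) vs Sales
print(round(r, 4))  # 0.9037
```

A perfect increasing line gives r = 1 and a perfect decreasing line gives r = −1, which is why values close to ±1 indicate a strong linear relationship.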
Question: A food analyst wants to know how much a person would spend on food, given a certain amount of income. He selects a random sample of 7 people and records their income and food expenditure, as shown below.
Income (RM 00)      35   49   21   39   15   28   25
Food expenditure     9   15    7   11    5    8    9
Question: (i) Find the estimated regression line for the data.
(ii) How much would a person spend on food if his income is RM 3000?
(iii) Compute the Pearson correlation coefficient, r, and interpret its value.
Income, X   Food, Y   X²     Y²    XY
35          9         1225   81    315
49          15        2401   225   735
21          7         441    49    147
39          11        1521   121   429
15          5         225    25    75
28          8         784    64    224
25          9         625    81    225
Total:      212       64     7222  646   2150
Mean:       30.2857   9.1429
Answer:
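$$r = \frac{7(2150) - (212)(64)}{\sqrt{\left[7(7222) - (212)^2\right]\left[7(646) - (64)^2\right]}} = \frac{1482}{\sqrt{(5610)(426)}} = 0.9587$$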
The value r = 0.9587 shows a very strong positive relationship between income and food expenditure: as income increases, food expenditure also tends to increase.
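All three parts of the food question use the same worksheet sums, so they can be checked together. This is a sketch under two assumptions: income is entered in RM 00 (so RM 3000 becomes X = 30, which lies inside the observed range 15 to 49), and the prediction is reported in whatever units the food expenditure column uses.

```python
import math

income = [35, 49, 21, 39, 15, 28, 25]  # X, in RM 00
food = [9, 15, 7, 11, 5, 8, 9]         # Y, food expenditure

n = len(income)
sum_x, sum_y = sum(income), sum(food)
sum_xy = sum(x * y for x, y in zip(income, food))
sum_x2 = sum(x * x for x in income)
sum_y2 = sum(y * y for y in food)

# Shared pieces of the slope and correlation formulas
sxy = n * sum_xy - sum_x * sum_y   # 7(2150) - (212)(64) = 1482
sxx = n * sum_x2 - sum_x ** 2      # 7(7222) - 212^2 = 5610
syy = n * sum_y2 - sum_y ** 2      # 7(646) - 64^2 = 426

b = sxy / sxx                      # (i) slope
a = sum_y / n - b * sum_x / n      # (i) intercept
predicted = a + b * 30             # (ii) income RM 3000 = 30 in RM 00
r = sxy / math.sqrt(sxx * syy)     # (iii) Pearson r

print(f"(i)   Y' = {a:.4f} + {b:.4f}X")   # Y' = 1.1422 + 0.2642X
print(f"(ii)  Y' at X = 30: {predicted:.2f}")  # 9.07
print(f"(iii) r = {r:.4f}")                # 0.9587
```

If food expenditure is also recorded in RM 00 (an assumption, since the table does not state its units), the prediction of about 9.07 corresponds to roughly RM907.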