Chapter 8-Simple Linear Regression
Topics (variables on an interval or ratio scale)
• Correlation analysis
– Pearson’s coefficient of correlation (r)
– Coefficient of determination (r2)
– Test for significance of the correlation coefficient
• Regression analysis
– Simple linear regression equation
– F test for significance of the simple linear regression model
– t test for significance of individual slope coefficients
Examples
A businessperson may want to know whether the sales volume for a given month is related to the amount of iPhone advertising done that month.
Simple Linear Regression and Correlation
Quantitative Variables
midterm, final
weight, height
Predict y based on x
x                   y
Midterm score       Final score
Height of father    Height of son
Weight (x or y)     Height (x or y)
Example 1: Scatter Diagram
[Scatter diagram: independent variable x on the horizontal axis, dependent variable y on the vertical axis]
The information from the sample implies that the sales volume is related to the advertising costs.
Independent variable is a variable that provides the basis for estimation. It is the predictor variable.
Dependent variable is the variable that is being predicted or estimated.
Correlation Analysis
A statistical measurement that describes the relationship between two or more variables.
Correlation Analysis (interval or ratio scale)
• Coefficient of correlation
– A measure of the strength and direction of the linear relationship between
two variables.
• Interval scale or ratio scale
– Pearson product-moment correlation coefficient
– Pearson’s correlation coefficient
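For reference, the coefficient named above can be written out as follows. This is the standard textbook definition, given as a reminder rather than taken from the slides; $\bar{x}$ and $\bar{y}$ denote the sample means of the two variables.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator carries the sign (direction) of the relationship, and the denominator scales the result so that it stays between -1 and 1.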
How to run Excel : Data / Data Analysis / Correlation
Pearson’s correlation coefficient (r) can assume any value from -1 to 1.
Interpretation of Correlation
Example 1: Correlation analysis
Interpret r = 0.759 : There is a strong positive linear relationship
between the advertising cost and sales volume.
Coefficient of Determination
• Approximately 57.61% of the total variation in the sales volume (million baht) can be explained by the variation in the advertising cost (million baht).
• The variation in the advertising cost (million baht) explains 57.61% of the total variation in the sales volume (million baht).
• The regression model explains 57.61% of the total variation in the sales volume (million baht).
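As a quick check on where 57.61% comes from, the relationship can be sketched as below. The numeric values are the ones printed in the Excel output for Example 1 (Multiple R, and the Regression and Total sums of squares); the symbols SSR, SSE and SST follow the usual ANOVA naming and are an assumption about the notation intended here.

```latex
r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST},
\qquad
r^2 = (0.75901411)^2 \approx 0.5761,
\qquad
\frac{SSR}{SST} = \frac{1065.789474}{1850} \approx 0.5761
```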
SAS output
[SAS output for the correlation analysis, annotated to mark r and the P-value]
t Test for Significance of the Correlation Coefficient (ρ)
This method is statistical inference. We use the sample correlation coefficient (r) to draw a conclusion about the population correlation coefficient ($\rho$) at a given level of significance ($\alpha$).
Hypotheses
Test statistic
$t = r\sqrt{\dfrac{n-2}{1-r^2}}$ with $df = n - 2$
Examples for Correlation
EX. 2
X= No. of Police
Y= No. of Crimes
Correlation Analysis
The CORR Procedure
1 With Variables: Police
1 Variables:      Crimes
Simple Statistics
Variable  N  Mean      Std Dev  Sum        Minimum   Maximum   Label
Police    8  18.25000  5.87367  146.00000  11.00000  27.00000  Police
Crimes    8  11.87500  6.44621  95.00000   5.00000   21.00000  Number of Crimes
Pearson Correlation Coefficients, N = 8
Prob > |r| under H0: Rho=0
          Crimes
Police    -0.87440   (r)
          0.0045     (P-value)
r = -0.8744: There is a strong negative linear relationship between the number of police and the number of crimes.
$H_0: \rho \ge 0$ (There is no negative correlation between the number of police and the number of crimes.)
$H_a: \rho < 0$ (There is a negative correlation between the number of police and the number of crimes.)
ts: $t = r\sqrt{\dfrac{n-2}{1-r^2}} = -0.8744\sqrt{\dfrac{8-2}{1-(-0.8744)^2}} = -4.4143$
D&C : The null hypothesis is rejected and the conclusion would be that there is a significant
negative correlation between no. of police and no. of crimes.
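The test statistic for EX. 2 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `correlation_t_statistic` is an illustrative choice, and the inputs are the r and n reported in the SAS output above.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for testing a correlation coefficient (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Values reported for EX. 2 (number of police vs. number of crimes).
t_police = correlation_t_statistic(r=-0.8744, n=8)
degrees_of_freedom = 8 - 2

print(f"t = {t_police:.4f} with df = {degrees_of_freedom}")
```

The result should agree with the hand calculation of about -4.4143 shown in the worked example.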
d. At the 0.05 level of significance, is there evidence of a correlation between the number of police and the number of crimes?
$H_0: \rho = 0$
$H_a: \rho \ne 0$
P-value: 0.0045
EX. 3
1. A professor is studying the relationship between students’ grades and their part-time work. She analyzes data from a random sample of 15 college students, with the SAS results displayed below.
Correlation Analysis
The CORR Procedure
b. $r^2 = 0.003$
c. At the 0.05 level of significance, is there evidence of a positive correlation between grade-point average and number of hours worked?
$H_0: \rho \le 0$
$H_a: \rho > 0$
Critical value: $t_{0.05,\,13} = 1.771$
ts: $t = r\sqrt{\dfrac{n-2}{1-r^2}} = 0.05476\sqrt{\dfrac{15-2}{1-0.05476^2}} = 0.1977$
$H_0: \rho = 0$
$H_a: \rho \ne 0$
P-value: 0.8463
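The P-values that SAS prints under "Prob > |r|" are two-sided tail areas of the t distribution with n - 2 degrees of freedom. The sketch below approximates them with Simpson's rule using only the standard library; it is an illustration of the idea, not the algorithm SAS uses, and the helper names `t_pdf` and `two_sided_p_value` are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|), via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t


# Test statistics taken from the worked examples in the slides.
p_police = two_sided_p_value(-4.4143, df=6)    # EX. 2, n = 8
p_grades = two_sided_p_value(0.1977, df=13)    # EX. 3, n = 15

print(f"EX. 2: p = {p_police:.4f}")
print(f"EX. 3: p = {p_grades:.4f}")
```

With a 0.05 level of significance, the first value falls below 0.05 and the second does not, which matches the P-values 0.0045 and 0.8463 reported in the two examples.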
Simple linear regression is used to estimate the linear relationship between two quantitative variables.
Regression Analysis (interval or ratio scale)
• Regression equation
– An equation that expresses the linear relationship between two variables.
Actual $y_i$
Estimated $y_i$ ($\hat{y}_i$)
$\hat{y} = a + bx$
$e_i = y_i - \hat{y}_i$
Minimize $\sum e_i^2$
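Minimizing the sum of squared errors leads to closed-form expressions for the slope and the intercept. They are stated here as the standard least squares result, not as something shown on the slide; $\bar{x}$ and $\bar{y}$ are the sample means.

```latex
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
```

One consequence is that the fitted line always passes through the point $(\bar{x}, \bar{y})$.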
Error = $y - \hat{y}$
[Figure: final score (y) against midterm score (x); marked values: final = 25, midterm = 18]
Standard Error of the Estimate
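A common way to write the standard error of the estimate for simple linear regression is sketched below. The formula is the usual textbook one and is an assumption about what the slide displayed; the numeric check uses the Residual row of the Excel output for Example 1 (SS = 784.2105263, df = 8).

```latex
S_e = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}
    = \sqrt{\frac{SSE}{n - 2}} = \sqrt{MSE},
\qquad
S_e = \sqrt{\frac{784.2105263}{8}} \approx 9.9008
```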
Regression Equation: $\hat{y} = a + bx$
a = y-intercept (the value of $\hat{y}$ when x = 0)
b = slope (change in y per unit change in x)
From Example 1
The BIG BEAR Company has advertised its products on TV several times. The advertising costs and the sales volume after advertising were recorded on ten occasions as follows.
How to run Excel : Data / Data Analysis / Regression
Excel Output for Regression
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.75901411
R Square            0.57610242   (r²)
Adjusted R Square   0.52311522
Standard Error      9.900824     (Se)
Observations        10           (n)
ANOVA
            df   SS            MS            F             Significance F
Regression  1    1065.789474   1065.789474   10.87248322   0.01090193
Residual    8    784.2105263   98.02631579
Total       9    1850
The coefficient of determination: $r^2 = 0.5761$
Sample size = 10
The standard error of the estimate: $S_e = 9.9008$
The value of a = 18.9474
The value of b = 11.8421
The least squares regression equation is $\hat{y} = 18.9474 + 11.8421x$
Predicted sales given x = 2.5: $\hat{y} = 18.9474 + 11.8421(2.5) = 48.5527$ million baht
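The summary figures for Example 1 can be cross-checked from the ANOVA rows alone. The sketch below is a minimal illustration with variable names of my own choosing; it takes the sums of squares and coefficients printed in the Excel output and recomputes r², Se, F and the prediction at x = 2.5.

```python
import math

# Sums of squares from the ANOVA table of Example 1.
n = 10
ssr = 1065.789474          # Regression SS
sse = 784.2105263          # Residual SS
sst = ssr + sse            # Total SS (about 1850)

r_squared = ssr / sst                  # coefficient of determination
mse = sse / (n - 2)                    # mean square error
se = math.sqrt(mse)                    # standard error of the estimate
f_stat = (ssr / 1) / mse               # F = MSR / MSE with one predictor


def predict_sales(x: float, a: float = 18.9474, b: float = 11.8421) -> float:
    """Predicted sales volume (million baht) for advertising cost x."""
    return a + b * x


print(f"r^2 = {r_squared:.4f}, Se = {se:.4f}, F = {f_stat:.4f}")
print(f"Predicted sales at x = 2.5: {predict_sales(2.5):.4f} million baht")
```

Each recomputed value should agree with the corresponding figure in the output to the precision shown there.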
Example 2 The following data are the monthly salaries and the grade point averages (GPA) for information systems students.
X = GPA    Y = Monthly Salary ($)
2.6        3300
3.4        3600
3.6        4000
3.2        3500
3.5        3900
2.9        3600
Analyze these data using the Excel output for the estimated regression equation, with GPA as the independent variable.
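Because the six data points for Example 2 are listed in full, the regression can also be fitted directly. The sketch below uses the standard least squares formulas with only the Python standard library; the function name `fit_simple_linear_regression` and the dictionary keys are illustrative choices, not part of the original material.

```python
import math

gpa = [2.6, 3.4, 3.6, 3.2, 3.5, 2.9]              # x
salary = [3300, 3600, 4000, 3500, 3900, 3600]     # y, monthly salary in dollars


def fit_simple_linear_regression(x, y):
    """Least squares fit of y = a + b*x, plus r, r^2 and Se."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                       # slope
    a = y_bar - b * x_bar               # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)      # Pearson correlation
    se = math.sqrt(sse / (n - 2))       # standard error of the estimate
    return {"a": a, "b": b, "r": r, "r_squared": r * r, "se": se}


fit = fit_simple_linear_regression(gpa, salary)
predicted_salary = fit["a"] + fit["b"] * 3.5      # student with GPA 3.50

print({name: round(value, 4) for name, value in fit.items()})
print(f"Predicted monthly salary at GPA 3.50: {predicted_salary:.2f}")
```

The fitted intercept, slope, r² and Se should match the Excel summary output for this example up to rounding.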
Excel >> Data >> Data Analysis
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.863634916
R Square            0.745865268   (r²)
Adjusted R Square   0.682331585
Standard Error      145.8896288   (Se)
Observations        6             (n)
ANOVA
            df   SS            MS            F             Significance F
Regression  1    249864.8649   249864.8649   11.73968254   0.02662527
Residual    4    85135.13514   21283.78378
Total       5    335000
The coefficient of Determination = 0.7459
Interpretation: Approximately 74.59% of the total variation in monthly salary can be explained
by the variation in GPA
OR: The variation in GPA explains 74.59% of the total variation in monthly salary.
OR: The regression model explains 74.59% of the total variation in monthly salary.
Sample size = 6
a = y-intercept (let x = 0)
b = slope = (change in y) / (change in x)
The value of a = 1790.5405
Interpretation: If a student's GPA is 0.00, the predicted monthly salary is $1790.5405.
The least squares regression equation is ŷ = 1790.5405 + 581.0811x
Predict the monthly salary of a student who obtained a GPA of 3.50.
ŷ = 1790.5405 + 581.0811(3.50) = $3824.3244
More Problems
72.7057+1.3283x
132.4792
0.8146
9.1910
Weight = y
Choose Computer Output A1
F Test for Significance of Simple Linear Regression
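A compact way to state this test is sketched below. The symbol $\beta$ for the population slope is an assumption (the slides do not show it), and k denotes the number of independent variables, so k = 1 for simple linear regression. The numeric line uses the ANOVA values from the Excel output for Example 1.

```latex
H_0: \beta = 0 \qquad H_a: \beta \neq 0
\\[4pt]
F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / \bigl(n - (k + 1)\bigr)},
\qquad
F = \frac{1065.789474 / 1}{784.2105263 / 8} \approx 10.87
```

In the Excel output the corresponding P-value is labelled Significance F (0.0109 for Example 1), so the model is significant at the 0.05 level.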
From Example 1
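SAS output (annotated)
$\hat{y} = a + bx$: $\hat{y} = 18.94737 + 11.84211x$
n = 10
ANOVA table: SS and MS for Regression
$S_e = \sqrt{MSE}$
$r^2 = SSR / SST$
Parameter estimates: a (intercept); b, $S_b$, t and P-value (X)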
More Problems
[Annotated computer outputs for further problems (n = 10 and n = 11), marking y, x, a, b, $S_b$, t, the P-value, $S_e = \sqrt{MSE}$ and the coefficient of determination $r^2 = SSR / SST$]
$\hat{y} = a + bx$ (interval or ratio scale)
Degrees of freedom: $n - (k + 1)$, where k = number of independent variables $X_i$
From Example 1
Example 1: Regression analysis – t test
Determine whether there is any positive linear relationship in the population between the advertising cost (million baht) and sales volume (million baht), at $\alpha = 0.05$.
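The coefficient table for Example 1 is not reproduced in this text, so the sketch below recovers the slope's t statistic from the F statistic instead. It relies on the identity t² = F, which holds when there is a single independent variable; the standard error of the slope derived this way is an illustration, not a value read from the original output. The critical value 1.860 comes from a standard t table (df = 8, one tail, 0.05) and is likewise not taken from the slides.

```python
import math

n = 10
b = 11.84211               # slope from the regression output of Example 1
f_stat = 10.87248322       # F from the ANOVA table of Example 1

# With one independent variable, t^2 = F; t takes the sign of the slope.
t_stat = math.copysign(math.sqrt(f_stat), b)
s_b = b / t_stat           # implied standard error of the slope (derived)
df = n - (1 + 1)           # n - (k + 1) with k = 1

critical_value = 1.860     # t table, df = 8, right tail area 0.05 (assumption)
reject_h0 = t_stat > critical_value

print(f"t = {t_stat:.4f}, Sb = {s_b:.4f}, df = {df}, reject H0: {reject_h0}")
```

Since the computed t exceeds the critical value, the sketch points to a significant positive linear relationship at the 0.05 level.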
Exercise for Simple Linear Regression
4) The owner of Mazda Motors wants to study the relationship between the age of a car and its selling price. She analyzes data from a random sample of 12 used cars sold at Mazda Motors during the last year. We want to estimate the selling price (y) based on the age of the car (x).
Table 1
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.543646
R Square            0.295551
Adjusted R Square   0.225106
Standard Error      1.966875
Observations        12
ANOVA
            df   SS         MS         F          Significance F
Regression  1    16.23069   16.23069   4.195499   0.067702
Residual    10   38.68597   3.868597
Total       11   54.91667
                       Coefficients   Standard Error   t Stat     P-value
Intercept              13.1814        2.158124         6.107805   0.000115
Selling price($000s)   -0.61733       0.301389         -2.04829   0.067702
Table 2
Choose Table 2
a) Name the explanatory variable. Answer: Age of a car
b) Name the estimated variable. Answer: Selling price
c) Determine and interpret the coefficient of determination.
R² = SSR / SST = 12.58729 / 42.58917 = 0.2956
The regression model explains 29.56% of the total variation in the selling price.
There is a moderate negative linear relationship between the age of a used car and its selling price.
e) State the simple linear regression equation. ŷ = 11.17724 - 0.47876x
b = slope = (change in y) / (change in x) = (change in selling price) / (change in age) = -0.47876 / 1
Each additional year of age decreases the predicted selling price by $478.76.
i) Determine the value of the standard error of the estimate.
F = 4.20
k) At the 0.10 level of significance, is there evidence of a linear relationship between
the age of a car and its selling price?
$H_0: \beta = 0$
$H_a: \beta \ne 0$
P-value = 0.0677
D&C: Because the P-value (0.0677) is less than 0.10, the null hypothesis is rejected and the conclusion is that there is a significant linear relationship between the age of a car and its selling price.
l. At the 0.10 level of significance, is there evidence of a negative linear relationship
between the age of a car and its selling price?
$H_0: \beta \ge 0$
$H_a: \beta < 0$
TS: $t = b / S_b = -2.05$
D&C: The null hypothesis is rejected and the conclusion is that there is a significant negative linear relationship between the age of a car and its selling price.
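Excel reports a two-sided P-value for each coefficient, so a one-tailed test such as part l needs a conversion. The sketch below shows one common way to do it: halve the two-sided value when the sign of t agrees with the alternative hypothesis, and use one minus that half otherwise. The function name `one_tailed_p_value` and its `alternative` argument are hypothetical; the inputs are the t Stat and P-value printed in the output above.

```python
def one_tailed_p_value(two_sided_p: float, t_stat: float, alternative: str) -> float:
    """Convert a two-sided P-value into a one-tailed one.

    alternative is "less" for Ha: slope < 0 or "greater" for Ha: slope > 0.
    """
    half = two_sided_p / 2
    if alternative == "less":
        return half if t_stat < 0 else 1 - half
    if alternative == "greater":
        return half if t_stat > 0 else 1 - half
    raise ValueError("alternative must be 'less' or 'greater'")


alpha = 0.10
p_left = one_tailed_p_value(0.067702, t_stat=-2.04829, alternative="less")
reject_h0 = p_left < alpha

print(f"one-tailed P-value = {p_left:.6f}, reject H0 at {alpha}: {reject_h0}")
```

Here the one-tailed P-value is half of 0.067702, which is below 0.10, so the sketch reaches the same decision as the worked answer.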
THE END