Chapter 8-Simple Linear Regression

Chapter 8 discusses Simple Linear Regression and Correlation, focusing on the relationship between independent and dependent variables. It covers correlation analysis, regression analysis, and the significance of correlation coefficients, providing examples from various fields. The chapter explains how to interpret correlation coefficients and regression equations, along with the standard error of the estimate.

Uploaded by

kgzawhein1910
Chapter 8

Simple Linear Regression and Correlation

Topics (interval or ratio scale)

• Independent variables vs. Dependent variables

• Correlation analysis
– Pearson’s coefficient of correlation (r)
– Coefficient of determination (r²)
– Test for significance of the correlation coefficient

• Regression analysis
– Simple linear regression equation
– F test for significance of simple linear regression model
– t test for significance of individual slope coefficients

• Standard error of the estimate (Se)

Examples
• A businessperson may want to know whether the volume of sales for a given month is related to the amount of advertising done for the iPhone that month.

• Educators are interested in determining whether the number of hours a student studies is related to the student’s score on a particular exam.

• A zoologist may want to know whether the birth weight of a certain animal is related to its life span.

• Medical researchers are interested in such questions as:
– Is caffeine related to heart damage?
– Is there a relationship between a person’s age and his/her blood pressure?

Simple Linear Regression and Correlation

This topic is the study of the relationship between two variables, with the aim of developing numerical measures that express that relationship.

Quantitative variables, for example:
– midterm score, final score
– weight, height

Predict y based on x

X = Independent variable, explanatory variable, predictor

Y = Dependent variable, predicted variable, estimated variable

Examples of (x, y) pairs:
– x = Midterm score, y = Final score
– x = Height of father, y = Height of son
– Weight and height (either one may serve as x or as y)

Example 1: Scatter Diagram

[Scatter diagram of y (dependent variable) against x (independent variable).]

The information from the sample implies that the sales volume is related to the advertising costs.
An independent variable is a variable that provides the basis for estimation. It is the predictor variable.
A dependent variable is the variable that is being predicted or estimated.

Correlation Analysis

A statistical measurement that describes the relationship between two or more variables.

Correlation Analysis (interval or ratio scale)

• Coefficient of correlation
– A measure of the strength and direction of the linear relationship between two variables.
• Interval scale or ratio scale
– Pearson product-moment correlation coefficient
– Pearson’s correlation coefficient

How to run Excel : Data / Data Analysis / Correlation

The correlation coefficient ( r ) = 0.7590
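
As a rough cross-check of the Excel result, the correlation coefficient can also be computed by hand. The sketch below assumes the ten (advertising cost, sales volume) pairs from the Example 1 table of this chapter; the function name `pearson_r` is only illustrative and is not part of the slides.

```python
import math

# Example 1 data (BIG BEAR Company), both in million baht.
adv_cost = [2, 4, 2, 3, 1, 1, 2, 2, 2, 3]
sales = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(adv_cost, sales)
print(round(r, 4))       # 0.759
print(round(r ** 2, 4))  # 0.5761 (coefficient of determination)
```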

Pearson’s correlation coefficient (r) can assume any value from -1 to 1.

Interpretation of Correlation

r = 0.759 There is a strong positive linear relationship between x and y.

r = 0.514 There is a moderate positive linear relationship between x and y.

r = -0.1378 There is a weak negative linear relationship between x and y.

r = -0.9211 There is a strong negative linear relationship between x and y.

Example 1: Correlation analysis
Interpret r = 0.759: There is a strong positive linear relationship between the advertising cost and the sales volume.

Coefficient of Determination

Interpret r² = 0.759² = 0.5761

• Approximately 57.61% of the total variation in the sales volume (million baht) can be explained by the variation in the advertising cost (million baht).

• The variation in the advertising cost (million baht) explains 57.61% of the total variation in the sales volume (million baht).

• The regression model explains 57.61% of the total variation in the sales volume (million baht).

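SAS output

[SAS correlation output with the correlation coefficient r and its P-value marked.]

If the P-value is less than α, reject H0.

t distribution

[Figure: t distribution curves with the rejection regions marked − and +; Ts denotes the test statistic.]
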
t Test for Significance of the Correlation Coefficient (ρ)

This method is statistical inference. We use the sample correlation coefficient (r) to draw a conclusion about the population correlation coefficient (ρ) at a particular level of significance (α).

Hypotheses

Two-tailed:     H0: ρ = 0    HA: ρ ≠ 0
Left-tailed:    H0: ρ ≥ 0    HA: ρ < 0
Right-tailed:   H0: ρ ≤ 0    HA: ρ > 0

Test statistic

$$t = r\sqrt{\frac{n-2}{1-r^{2}}}, \qquad df = n - 2$$

Examples for Correlation

EX. 2

X = No. of Police

Y = No. of Crimes

Correlation Analysis
The CORR Procedure
1 With Variables: Police
1 Variables: Crimes

Simple Statistics
Variable   N   Mean       Std Dev   Sum         Minimum    Maximum    Label
Police     8   18.25000   5.87367   146.00000   11.00000   27.00000   Police
Crimes     8   11.87500   6.44621   95.00000    5.00000    21.00000   Number of Crimes

Pearson Correlation Coefficients, N = 8
Prob > |r| under H0: Rho=0
          Crimes
Police    -0.87440   (r)
          0.0045     (P-value)

r = -0.8744: There is a strong negative linear relationship between the number of police and the number of crimes.

r² = 0.7646: The regression model explains 76.46% of the total variation in the number of crimes.

Ho: ρ ≥ 0 (There is no negative correlation between the number of police and the number of crimes.)

Ha: ρ < 0 (There is a negative correlation between the number of police and the number of crimes.)

cv: $-t_{\alpha,\,n-2} = -t_{0.05,\,6} = -1.943$

ts: $$t = r\sqrt{\frac{n-2}{1-r^{2}}} = -0.8744\sqrt{\frac{8-2}{1-(-0.8744)^{2}}} = -4.4143$$

D&C: Since -4.4143 < -1.943, the null hypothesis is rejected, and the conclusion is that there is a significant negative correlation between the number of police and the number of crimes.
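
The test statistic for this example can be reproduced with a few lines of Python. This is only a sketch: the function name `correlation_t_statistic` is hypothetical, and the critical value is copied from the slide rather than computed.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing rho, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# EX. 2: r = -0.8744 from a sample of n = 8.
t_stat = correlation_t_statistic(-0.8744, 8)
critical_value = -1.943  # -t(0.05, 6), as given on the slide

# Left-tailed test: reject H0 when the statistic falls below the critical value.
reject_h0 = t_stat < critical_value
print(round(t_stat, 4), reject_h0)  # -4.4143 True
```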

d. At the 0.05 level of significance, is there evidence of a correlation between the number of police and the number of crimes?

Ho: ρ = 0

Ha: ρ ≠ 0

P-value: 0.0045 < α = 0.05

D&C: The null hypothesis is rejected, and the conclusion is that there is a significant correlation between the number of police and the number of crimes.

X = Part-time work ( hours ) Y = GPA

EX. 31. A professor is doing a study of the relationship between students’ grades and their
part-time work. She analyzes data from a random sample of 15 college students
with the SAS results displayed below.

Correlation Analysis
The CORR Procedure

Pearson Correlation Coefficients, N = 15
Prob > |r| under H0: Rho=0
        HOURS
GPA     0.05476   (r)
        0.8463    (P-value)

a. r = 0.05476
There is a weak positive linear relationship between students’ grades and part-time work.

b. r² = 0.003
The regression model explains 0.30% of the total variation in students’ grades (GPA).

c. At the 0.05 level of significance, is there evidence of a positive correlation between grade-point average and number of hours worked?

Ho: ρ ≤ 0

Ha: ρ > 0

cv: $t_{\alpha,\,n-2} = t_{0.05,\,13} = 1.771$

ts: $$t = r\sqrt{\frac{n-2}{1-r^{2}}} = 0.05476\sqrt{\frac{15-2}{1-0.05476^{2}}} = 0.1977$$

D&C: Since 0.1977 < 1.771, the null hypothesis is not rejected, and the conclusion is that there is no significant positive correlation between grade-point average and the number of hours worked.

d. At the 0.05 level of significance, is there evidence of a correlation between grade-point average and number of hours worked?

Ho: ρ = 0

Ha: ρ ≠ 0

P-value: 0.8463 > α = 0.05

D&C: The null hypothesis is not rejected, and the conclusion is that there is no significant correlation between grade-point average and the number of hours worked.

Simple Linear Regression

Simple linear regression is used to estimate the linear relationship between two quantitative variables.

Regression Analysis (interval or ratio scale)

• Regression equation
– An equation that expresses the linear relationship between two variables.

• General form
– Population model: Y = α + βX
   α = intercept on the Y axis, where X = 0
   β = slope coefficient in the population
– Sample equation: ŷ = a + bx
   a = intercept on the y axis, where x = 0
   b = slope coefficient in the sample

• Least squares principle
– Minimizing the sum of the squares of the vertical distances between the actual values and the predicted values of Y.

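[Figure: regression line ŷ = a + bx with an actual value yi and its estimated value ŷi. The residual is ei = yi - ŷi, and least squares minimizes ∑(ei²).]

[Figure: final score (y) against midterm score (x), showing the error y - ŷ between the actual final score y and the predicted final score ŷ at a midterm score of 18; the values 28 and 25 are marked.]
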
Standard Error of the Estimate

• A measure of the scatter, or dispersion, of the observed values around the line of regression.

Regression Equation: ŷ = a + bx
a = y-intercept (the value of ŷ when x = 0)
b = slope = Δy / Δx

Example of Regression Equation:

ŷ = 18.9474 + 11.8421x

From Example 1

The BIG BEAR Company has advertised its products on TV several times. The advertising cost and the sales volume after advertising were recorded on ten occasions, as follows.

X = Advertising cost (million baht)
Y = Sales volume (million baht)

Advertising cost (million baht)   Sales volume (million baht)
2                                 30
4                                 60
2                                 40
3                                 60
1                                 30
1                                 40
2                                 40
2                                 50
2                                 30
3                                 70
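
The least squares principle can be applied directly to these ten observations. The following is a minimal sketch, assuming the usual closed-form estimates b = Sxy / Sxx and a = ȳ − b·x̄; the helper name `least_squares` is illustrative only.

```python
# Example 1 data (BIG BEAR Company), both in million baht.
adv_cost = [2, 4, 2, 3, 1, 1, 2, 2, 2, 3]
sales = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]


def least_squares(x, y):
    """Return (a, b) of the line y-hat = a + b*x that minimizes the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


a, b = least_squares(adv_cost, sales)
print(round(a, 4), round(b, 4))  # 18.9474 11.8421
```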

How to run Excel: Data / Data Analysis / Regression

Excel Output for Regression

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.75901411
R Square (r²)         0.57610242
Adjusted R Square     0.52311522
Standard Error (Se)   9.900824
Observations (n)      10

ANOVA
             df   SS            MS            F             Significance F
Regression   1    1065.789474   1065.789474   10.87248322   0.01090193
Residual     8    784.2105263   98.02631579
Total        9    1850

                 Coefficients     Standard Error     t Stat        P-value
Intercept (a)    18.9473684       8.498818559        2.22941204    0.056348647
Adv_Cost (x)     11.8421053 (b)   3.591406333 (Sb)   3.297344875   0.01090193
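
The main figures in this output can be rebuilt from the raw Example 1 data. The sketch below is not how Excel computes them internally; it simply applies the textbook formulas Se = √MSE, r² = SSR / SST, F = MSR / MSE and t = b / Sb. The formula Sb = Se / √Sxx is the standard one for simple linear regression and is not stated on the slides.

```python
import math

# Example 1 data (BIG BEAR Company), both in million baht.
x = [2, 4, 2, 3, 1, 1, 2, 2, 2, 3]
y = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = sxy / sxx            # slope
a = mean_y - b * mean_x  # intercept

sst = sum((yi - mean_y) ** 2 for yi in y)                    # total variation
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual variation
ssr = sst - sse                                              # explained variation

mse = sse / (n - 2)     # residual mean square, df = n - 2
se = math.sqrt(mse)     # standard error of the estimate
r_squared = ssr / sst   # coefficient of determination
f_stat = ssr / mse      # regression df = 1, so MSR = SSR
sb = se / math.sqrt(sxx)  # standard error of the slope
t_stat = b / sb

print(round(r_squared, 4), round(se, 4))  # 0.5761 9.9008
print(round(f_stat, 4), round(sb, 4), round(t_stat, 4))  # 10.8725 3.5914 3.2973
```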

The coefficient of determination: r² = 0.5761
Sample size: n = 10
The standard error of estimate: Se = 9.9008
The value of a = 18.9474
The value of b = 11.8421
The least squares regression equation is ŷ = 18.9474 + 11.8421x
Predicted sales, given x = 2.5: ŷ = 18.9474 + 11.8421(2.5) = 48.5527 million baht
The standard error of the slope: Sb = 3.5914

Predict y based on x

X = Independent variable , Explanatory variable , Predictor

Y = Dependent variable , Predicted , Estimated

r² = Coefficient of determination

Se = Standard error of estimate

Sb = Standard error of slope

The least squares regression equation: ŷ = a + bx

Example 2: The following data are the monthly salaries and the grade point averages of students who obtained a bachelor’s degree in business administration with a major in information systems.

X = GPA
Y = Monthly salary ($)

GPA   Monthly salary ($)
2.6   3300
3.4   3600
3.6   4000
3.2   3500
3.5   3900
2.9   3600

Analyze these data using the Excel output of the estimated regression equation, with GPA as the independent variable.

Excel >> Data >> Data Analysis

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.863634916
R Square (r²)         0.745865268
Adjusted R Square     0.682331585
Standard Error (Se)   145.8896288
Observations (n)      6

ANOVA
             df   SS            MS            F             Significance F
Regression   1    249864.8649   249864.8649   11.73968254   0.02662527
Residual     4    85135.13514   21283.78378
Total        5    335000

                Coefficients      Standard Error     t Stat        P-value
Intercept (a)   1790.540541       545.9568169        3.279637666   0.030511138
GPA (x)         581.0810811 (b)   169.5932486 (Sb)   3.426322013   0.02662527

The coefficient of determination: r² = 0.7459
Interpretation: Approximately 74.59% of the total variation in monthly salary can be explained by the variation in GPA.
OR: The variation in GPA explains 74.59% of the total variation in monthly salary.
OR: The regression model explains 74.59% of the total variation in monthly salary.

Sample size: n = 6

The standard error of estimate: Se = 145.8896

a = y-intercept (let x = 0)
b = slope = (change in y) / (change in x) = Δy / Δx

The value of a = 1790.5405
Interpretation: If a student has a GPA of 0.00, the monthly salary is predicted to be $1790.5405.

The value of b = 581.0811
Interpretation: If a student’s GPA increases by 1 point, the monthly salary is expected to increase by $581.0811.

Slope = Δy / Δx = Δsalary / ΔGPA = 581.0811 / 1

Slope: How does y change if x changes by 1 unit?

The least squares regression equation is ŷ = 1790.5405 + 581.0811x

Predict the monthly salary of a student who obtained a GPA of 3.50.

ŷ = 1790.5405 + 581.0811x, x = 3.50
  = 1790.5405 + 581.0811(3.50)
  = $3824.3244

The standard error of the slope: Sb = 169.5932
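
The same prediction can be obtained by fitting the line to the six (GPA, salary) pairs of Example 2 and evaluating it at x = 3.50. This is a sketch only; `fit_line` and `predict` are illustrative names, not functions from Excel or SAS.

```python
# Example 2 data: GPA and monthly salary in dollars.
gpa = [2.6, 3.4, 3.6, 3.2, 3.5, 2.9]
salary = [3300, 3600, 4000, 3500, 3900, 3600]


def fit_line(x, y):
    """Least squares intercept a and slope b for y-hat = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predict(a, b, x_new):
    """Point prediction from the fitted line."""
    return a + b * x_new


a, b = fit_line(gpa, salary)
predicted_salary = predict(a, b, 3.50)
print(round(a, 2), round(b, 2))    # 1790.54 581.08
print(round(predicted_salary, 2))  # 3824.32
```

The prediction is only meaningful for GPAs near the observed range (2.6 to 3.6); the intercept at GPA 0.00 is an extrapolation.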

More Problems

[Problem slides not recoverable from the extracted text. Surviving annotations: variables labelled x and y (Weight = y); answers 72.7057 + 1.3283x, 132.4792, 0.8146 and 9.1910; “Choose Computer Output A1”.]

- Excel Output

- How to run and read SAS Output

F Test for Significance of Simple Linear Regression

From Example 1

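SAS output

[SAS regression output for Example 1 (n = 10), annotated with the dependent variable y, the SS and MS columns of the regression row, the intercept a, and for X the slope b, its standard error Sb, the t value and the P-value.]

ŷ = a + bx
ŷ = 18.94737 + 11.84211x

Se = √MSE

r² = SSR / SST
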
More Problems

[Annotated computer outputs for further problems (n = 10 and n = 11), marking y, x, the intercept a, the slope b, its standard error Sb, the t value, the P-value, the coefficient of determination r² = SSR / SST, and the standard error of estimate Se = √MSE.]

ŷ = a + bx

ŷ = 35.87983 + 0.47961(time); for time = 60, ŷ = 64.6564 marks

y-intercept (let x = 0): If a student spends 0 minutes per week in the library, the exam score would be 35.8798 marks.

Slope = Δscore / Δtime = 0.47961 / 1: Each additional minute of library time is expected to increase the exam score by 0.47961 marks.

Coefficient of determination: r² = SSR / SST = 1948.97581 / 2418.18182 = 0.8060
The regression model explains 80.6% of the total variation in exam score.
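
The arithmetic on this slide is easy to verify. The short sketch below just plugs the slide's coefficients and sums of squares into the two formulas; the variable names are illustrative.

```python
# Coefficients and sums of squares as given on the slide.
a, b = 35.87983, 0.47961
ssr, sst = 1948.97581, 2418.18182

library_time = 60  # value of x used on the slide
predicted_score = a + b * library_time
r_squared = ssr / sst

print(round(predicted_score, 4))  # 64.6564
print(round(r_squared, 4))        # 0.806
```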
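t Test for Significance of Linear Relationship between two variables (β)

(Interval or ratio scale)

Any linear relationship?        H0: β = 0    HA: β ≠ 0
Negative linear relationship?   H0: β ≥ 0    HA: β < 0
Positive linear relationship?   H0: β ≤ 0    HA: β > 0

Test statistic: t = b / Sb, with df = n – (k + 1), where k = number of independent variables (Xi)
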
From Example 1

Example 1: Regression analysis – t test

Determine whether there is any linear relationship in the population between the advertising cost (million baht) and sales volume (million baht), at α = 0.05.

Example 1: Regression analysis – t test

Determine whether there is any positive linear relationship in the population between the advertising cost (million baht) and sales volume (million baht), at α = 0.05.
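
One way to work through both versions of this test is sketched below, using b and Sb from the Excel output for Example 1. The critical values 2.306 and 1.860 are standard t-table entries for 8 degrees of freedom (two-tailed and one-tailed at α = 0.05); they are an assumption here, since they do not appear in the surviving slide text.

```python
# Slope and its standard error from the Excel output for Example 1.
b = 11.8421053
sb = 3.591406333
n, k = 10, 1      # sample size and number of independent variables
df = n - (k + 1)  # degrees of freedom = 8

t_stat = b / sb

# Assumed t-table values for df = 8 at alpha = 0.05.
t_two_tailed = 2.306  # t(0.025, 8)
t_one_tailed = 1.860  # t(0.05, 8)

# H0: beta = 0 vs HA: beta != 0 (any linear relationship)
reject_any = abs(t_stat) > t_two_tailed
# H0: beta <= 0 vs HA: beta > 0 (positive linear relationship)
reject_positive = t_stat > t_one_tailed

print(df, round(t_stat, 4))         # 8 3.2973
print(reject_any, reject_positive)  # True True
```

Both decisions agree with the P-value of 0.0109 in the Excel output, which is below 0.05.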

Exercise for Simple Linear Regression

4) The owner of Mazda Motors wants to study the relationship between the age of a car (x) and its selling price (y). She analyzes data from a random sample of 12 used cars sold at Mazda Motors during the last year. We want to estimate the selling price based on the age of the car.

Table 1

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.543646
R Square            0.295551
Adjusted R Square   0.225106
Standard Error      1.966875
Observations        12

ANOVA
             df   SS         MS         F          Significance F
Regression   1    16.23069   16.23069   4.195499   0.067702
Residual     10   38.68597   3.868597
Total        11   54.91667

                        Coefficients   Standard Error   t Stat     P-value
Intercept               13.1814        2.158124         6.107805   0.000115
Selling price ($000s)   -0.61733       0.301389         -2.04829   0.067702

Table 2

[Regression output with the age of the car as the independent variable; not recoverable from the extracted text.]

Choose Table 2 (Table 1 uses the selling price as the x variable, whereas here the selling price is the variable to be estimated).
a) Define the name of the explanatory variable.
Age of a car

b) Define the name of the estimated variable.
Selling price

c) Determine and interpret the coefficient of determination.
R² = SSR / SST = 12.58729 / 42.58917 = 0.2956
The regression model explains 29.56% of the total variation in the selling price.

d) Interpret the coefficient of correlation, r = -0.5436.
There is a moderate negative linear relationship between the age of a used car and its selling price.

e) State the simple linear regression equation.
ŷ = 11.17724 - 0.47876x

f) Predict the selling price of a car that has been used for 10 years.
x = 10: ŷ = 11.17724 - 0.47876(10) = 6.38964 thousand dollars = 6389.64 dollars

g) Interpret the meaning of the y-intercept in the context of this problem.
y-intercept = 11.17724: The price of a new car is $11,177.24.
OR: The purchase price of the car is $11,177.24.

h) Interpret the meaning of the slope in the context of this problem.
b = slope = Δy / Δx = Δ(selling price) / Δ(age) = -0.47876 / 1: For each additional year of use, the selling price is expected to decrease by $478.76.

i) Determine the value of the standard error of the estimate.
Se = √MSE = √3.00019 = 1.7321

j) Define the value of the F statistic.
F = 4.20
k) At the 0.10 level of significance, is there evidence of a linear relationship between the age of a car and its selling price?

Ho: β = 0

Ha: β ≠ 0

P-value: 0.0677 < α = 0.10

D&C: The null hypothesis is rejected, and the conclusion is that there is a significant linear relationship between the age of a car and its selling price.

l) At the 0.10 level of significance, is there evidence of a negative linear relationship between the age of a car and its selling price?

Ho: β ≥ 0

Ha: β < 0

CV: $-t_{\alpha,\,n-2} = -t_{0.10,\,12-2} = -1.372$

TS: $t = \dfrac{b}{S_b} = -2.05$

D&C: Since -2.05 < -1.372, the null hypothesis is rejected, and the conclusion is that there is a significant negative linear relationship between the age of a car and its selling price.
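
The answers to this exercise can be cross-checked from the two sums of squares quoted in part c. The sketch below assumes only SSR, SST and n = 12 from the exercise, and relies on the fact that in simple linear regression the slope's t statistic satisfies t² = F, with the sign of t taken from the (negative) slope.

```python
import math

# Values quoted in the exercise answers (Table 2).
ssr, sst, n = 12.58729, 42.58917, 12

sse = sst - ssr
mse = sse / (n - 2)  # residual mean square, df = 10
se = math.sqrt(mse)  # standard error of the estimate
r_squared = ssr / sst
f_stat = ssr / mse   # regression df = 1, so MSR = SSR

# In simple linear regression t^2 = F; the slope is negative here.
t_stat = -math.sqrt(f_stat)

critical_value = -1.372  # -t(0.10, 10), as given in part l
reject_h0 = t_stat < critical_value

print(round(r_squared, 4), round(se, 4))  # 0.2956 1.7321
print(round(f_stat, 2), round(t_stat, 2), reject_h0)  # 4.2 -2.05 True
```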
THE END