L5 - Simple Linear Regression Students
Linear Regression
and Correlation
Analysis
Chapter 14, Groebner
After completing this chapter, you should be able to:
• Understand and calculate correlation between two variables
• Understand scatter plots
• Calculate and interpret the simple correlation
Scatter Plots and Correlation
• A scatter plot (or scatter diagram) is used to show the
relationship between two variables
• Correlation analysis is used to measure strength of
the association (linear relationship) between two
variables
– Only concerned with strength of the relationship
Scatter Plot Examples
[Figure: scatter plots of y against x illustrating linear relationships, curvilinear relationships, strong relationships, weak relationships, and no relationship]
Correlation Coefficient, r
• Correlation measures the strength of the linear
association between two variables
• The sample correlation coefficient, r, is a
measure of the strength of the linear
relationship between two variables, based on
sample observations
Features of r
• Unit free
• Range between -1 and 1
• The closer to -1, the stronger the negative linear
relationship
• The closer to 1, the stronger the positive linear
relationship
• If |r| > 0.5, this indicates a strong relationship
• The closer to 0, the weaker the linear relationship
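The "unit free" and range properties can be checked numerically. The sketch below is only an illustration: the height and weight figures are hypothetical data invented for this example (they are not from the slides), and the helper `correlation` is an assumed name that uses the standard sum-of-deviations formula for r.

```python
from math import sqrt

def correlation(x, y):
    """Sample correlation coefficient r for two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical illustrative data (an assumption, not taken from the slides)
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 69, 74, 88]

r_metric = correlation(height_cm, weight_kg)

# Change the units of both variables: cm -> inches, kg -> pounds
height_in = [h / 2.54 for h in height_cm]
weight_lb = [w * 2.20462 for w in weight_kg]
r_imperial = correlation(height_in, weight_lb)

# Flipping the sign of one variable flips the sign of r
r_negated = correlation(height_cm, [-w for w in weight_kg])

print(f"r (cm, kg)  = {r_metric:.4f}")
print(f"r (in, lb)  = {r_imperial:.4f}")
print(f"r (negated) = {r_negated:.4f}")
```

Rescaling the units leaves r unchanged, and every value stays between -1 and 1.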
Examples of Approximate r Values
[Figure: five scatter plots of y against x with r = -1, r = -.6, r = 0, r = +.3, and r = +1]
Calculating the Correlation Coefficient
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}} $$
Number of TV Ads    Number of Cars Sold
1                   14
3                   24
2                   18
1                   17
3                   27
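As a sketch of how the formula applies to the table above, the following Python computes each sum of deviations and then r. The variable names are illustrative choices, not part of the slides.

```python
from math import sqrt

tv_ads = [1, 3, 2, 1, 3]          # x: number of TV ads
cars_sold = [14, 24, 18, 17, 27]  # y: number of cars sold

n = len(tv_ads)
x_bar = sum(tv_ads) / n       # mean of x
y_bar = sum(cars_sold) / n    # mean of y

# Sums of cross-products and squared deviations from the means
sum_xy_dev = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))
sum_xx_dev = sum((x - x_bar) ** 2 for x in tv_ads)
sum_yy_dev = sum((y - y_bar) ** 2 for y in cars_sold)

r = sum_xy_dev / sqrt(sum_xx_dev * sum_yy_dev)

print(f"x_bar = {x_bar}, y_bar = {y_bar}")
print(f"sum (x - x_bar)(y - y_bar) = {sum_xy_dev}")
print(f"sum (x - x_bar)^2 = {sum_xx_dev}, sum (y - y_bar)^2 = {sum_yy_dev}")
print(f"r = {r:.4f}")  # r = 0.9366
```

With these five observations the sums are 20, 4 and 114, so r = 20 / √456 ≈ 0.94, a strong positive linear relationship.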
Significance Test for Correlation
• Hypotheses
H0: ρ = 0 (no correlation)
HA: ρ ≠ 0 (correlation exists)
• Test statistic
$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$
(with n – 2 degrees of freedom)
The Greek letter ρ (rho) represents the population correlation coefficient
t-distribution table
Conf. Level 0.1 0.3 0.5 0.7 0.8 0.9 0.95 0.98 0.99
One Tail 0.45 0.35 0.25 0.15 0.10 0.05 0.025 0.01 0.005
Two Tails 0.90 0.70 0.50 0.30 0.20 0.10 0.05 0.02 0.01
d.f. Values of t
1 0.1584 0.5095 1.0000 1.9626 3.0777 6.3137 12.7062 31.8200 63.6559
2 0.1421 0.4447 0.8165 1.3860 1.8860 2.9200 4.3030 6.9650 9.9250
3 0.1366 0.4242 0.7650 1.2500 1.6380 2.3530 3.1820 4.5410 5.8410
. . . . . . . . . .
. . . . . . . . . .
12 0.1283 0.3947 0.6950 1.0830 1.3560 1.7820 2.1790 2.6810 3.0550
13 0.1281 0.3940 0.6940 1.0790 1.3500 1.7710 2.1600 2.6500 3.0120
14 0.1280 0.3933 0.6920 1.0760 1.3450 1.7610 2.1450 2.6240 2.9770
15 0.1278 0.3928 0.6910 1.0740 1.3410 1.7530 2.1310 2.6020 2.9470
16 0.1277 0.3923 0.6900 1.0710 1.3370 1.7460 2.1200 2.5830 2.9210
17 0.1276 0.3919 0.6890 1.0690 1.3330 1.7400 2.1100 2.5670 2.8980
18 0.1274 0.3915 0.6880 1.0670 1.3300 1.7340 2.1010 2.5520 2.8780
19 0.1274 0.3912 0.6880 1.0660 1.3280 1.7290 2.0930 2.5390 2.8610
20 0.1273 0.3909 0.6870 1.0640 1.3250 1.7250 2.0860 2.5280 2.8450
21 0.1272 0.3906 0.6860 1.0630 1.3230 1.7210 2.0800 2.5180 2.8310
22 0.1271 0.3904 0.6860 1.0610 1.3210 1.7170 2.0740 2.5080 2.8190
23 0.1271 0.3902 0.6850 1.0600 1.3190 1.7140 2.0687 2.5000 2.8070
24 0.1270 0.3900 0.6850 1.0590 1.3180 1.7109 2.0639 2.4922 2.7970
25 0.1269 0.3898 0.6840 1.0580 1.3160 1.7080 2.0595 2.4850 2.7870
26 0.1269 0.3896 0.6840 1.0580 1.3150 1.7060 2.0560 2.4790 2.7790
Example
Test the significance of the correlation coefficient between the number of
TV ads and the number of cars sold at the 1% level of significance.
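One possible way to work this example is sketched below. It assumes the five TV ads / cars sold observations from the earlier table and reads the two-tailed critical value for d.f. = 3 at the 0.01 level (5.8410) from the t-distribution table. The variable names are illustrative.

```python
from math import sqrt

tv_ads = [1, 3, 2, 1, 3]          # x
cars_sold = [14, 24, 18, 17, 27]  # y
n = len(tv_ads)

x_bar = sum(tv_ads) / n
y_bar = sum(cars_sold) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))
sxx = sum((x - x_bar) ** 2 for x in tv_ads)
syy = sum((y - y_bar) ** 2 for y in cars_sold)
r = sxy / sqrt(sxx * syy)

# Test statistic: t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 d.f.
df = n - 2
t_stat = r / sqrt((1 - r ** 2) / df)

# Two-tailed critical value for alpha = 0.01 and d.f. = 3,
# taken from the t-distribution table (row 3, "Two Tails 0.01" column)
t_critical = 5.8410

reject_h0 = abs(t_stat) > t_critical
print(f"r = {r:.4f}, t = {t_stat:.4f}, d.f. = {df}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under these assumptions t ≈ 4.63, which is smaller than 5.8410. So although r is large, H0: ρ = 0 is not rejected at the 1% level with only five observations.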
Regression Analysis
• Regression analysis is used to:
– Predict the value of a dependent variable based on the value
of at least one independent variable (regression equation)
– Explain the impact of changes in an independent variable on
the dependent variable (R² or r²)
Simple Linear Regression Model
Types of Regression Models
[Figure: two scatter plots, one showing a positive linear relationship and one showing a relationship that is NOT linear]
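The estimated simple linear regression equation
$$ \hat{y} = b_0 + b_1 x $$
where
$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$
or, equivalently,
$$ b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} $$
and
$$ b_0 = \bar{y} - b_1 \bar{x} $$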
Simple Linear Regression
Example
• A real estate agent wishes to examine the relationship
between the number of TV ads and the number of cars
sold
• A random sample of 5 observations is selected
– Dependent variable (y) = number of cars sold
– Independent variable (x) = number of TV ads
Find the least squares regression equation
Number of TV Ads    Number of Cars Sold
1                   14
3                   24
2                   18
1                   17
3                   27
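A minimal sketch of the calculation follows. It uses the computational form of the slope, b1 = (Σxy − ΣxΣy/n) / (Σx² − (Σx)²/n), and the intercept b0 = ȳ − b1·x̄. The variable names are illustrative.

```python
tv_ads = [1, 3, 2, 1, 3]          # x: independent variable
cars_sold = [14, 24, 18, 17, 27]  # y: dependent variable
n = len(tv_ads)

sum_x = sum(tv_ads)
sum_y = sum(cars_sold)
sum_xy = sum(x * y for x, y in zip(tv_ads, cars_sold))
sum_x2 = sum(x ** 2 for x in tv_ads)

# Slope and intercept of the least squares line
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

print(f"sum x = {sum_x}, sum y = {sum_y}, sum xy = {sum_xy}, sum x^2 = {sum_x2}")
print(f"y_hat = {b0:.2f} + {b1:.2f} x")  # y_hat = 10.00 + 5.00 x

# Predicted cars sold for each observed number of TV ads
predicted = [b0 + b1 * x for x in tv_ads]
print(predicted)
```

For this sample the fitted line is ŷ = 10 + 5x: each additional TV ad is associated with an estimated 5 more cars sold.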
Coefficient of Determination, R²
• The coefficient of determination, or R², is the proportion of the total
variation in the dependent variable (y) that is explained by the
independent variable (x). If the value of R² is close to 1, it indicates a
‘good fit’ model.
$$ R^2 = \frac{SSR}{SST}, \quad \text{where } 0 \le R^2 \le 1 $$
• From the example, R² =
• SST = total sum of squares = $\sum (y - \bar{y})^2$
– Measures the variation of the y_i values
around their mean $\bar{y}$
• SSE = error sum of squares = $\sum (y - \hat{y})^2$
– Unexplained variation attributable to
factors other than the relationship
between x and y
• SSR = regression sum of squares = $\sum (\hat{y} - \bar{y})^2$
– Explained variation attributable to the
relationship between x and y
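The three sums of squares can be checked numerically. The sketch below assumes the TV ads / cars sold sample used elsewhere in these slides and fits the least squares line first. It then verifies that SST = SSR + SSE and computes R² = SSR / SST.

```python
tv_ads = [1, 3, 2, 1, 3]          # x
cars_sold = [14, 24, 18, 17, 27]  # y
n = len(tv_ads)

x_bar = sum(tv_ads) / n
y_bar = sum(cars_sold) / n

# Least squares slope and intercept
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))
      / sum((x - x_bar) ** 2 for x in tv_ads))
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * x for x in tv_ads]  # fitted values

sst = sum((y - y_bar) ** 2 for y in cars_sold)             # total variation
sse = sum((y - yh) ** 2 for y, yh in zip(cars_sold, y_hat))  # unexplained
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)               # explained

r_squared = ssr / sst

print(f"SST = {sst}, SSE = {sse}, SSR = {ssr}")
print(f"R^2 = {r_squared:.4f}")  # R^2 = 0.8772
```

Here SST = 114, SSE = 14 and SSR = 100, so about 88% of the variation in cars sold is explained by the number of TV ads.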
Significance of the regression slope coefficient
If the t test statistic > the t critical value, or the t test statistic < –(t critical value),
reject H0.
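The decision rule can be sketched in Python for the TV ads / cars sold sample. Several things here are assumptions not stated on the slide: the hypotheses H0: β1 = 0 against HA: β1 ≠ 0, the usual standard error of the slope s_b1 = s_e / √Σ(x − x̄)² with s_e = √(SSE / (n − 2)), and a 5% significance level. The critical value 3.1820 is read from the t-distribution table (d.f. = 3, two tails 0.05).

```python
from math import sqrt

tv_ads = [1, 3, 2, 1, 3]          # x
cars_sold = [14, 24, 18, 17, 27]  # y
n = len(tv_ads)

x_bar = sum(tv_ads) / n
y_bar = sum(cars_sold) / n
sxx = sum((x - x_bar) ** 2 for x in tv_ads)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))

b1 = sxy / sxx           # estimated slope
b0 = y_bar - b1 * x_bar  # estimated intercept

# Standard error of the estimate and of the slope (assumed standard formulas)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(tv_ads, cars_sold))
s_e = sqrt(sse / (n - 2))
s_b1 = s_e / sqrt(sxx)

t_stat = b1 / s_b1   # test statistic under H0: beta1 = 0
t_critical = 3.1820  # assumed alpha = 0.05, two-tailed, d.f. = n - 2 = 3

reject_h0 = t_stat > t_critical or t_stat < -t_critical
print(f"b1 = {b1:.2f}, s_b1 = {s_b1:.4f}, t = {t_stat:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under these assumptions t ≈ 4.63 exceeds 3.1820, so H0 is rejected at the 5% level. In simple linear regression this t value equals the t statistic of the correlation test.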
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.762114627
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.330323650
Observations         10
ANOVA
              df    SS            MS            F           Significance F
Regression    1     18934.9352    18934.935     11.08476    0.010394
Residual      8     13665.577     1708.19565
Total         9     32600.5000
where SSR = $\sum (\hat{y} - \bar{y})^2$ (Regression), SSE = $\sum (y - \hat{y})^2$ (Residual), and SST = $\sum (y - \bar{y})^2$ (Total)
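The entries of the summary output are linked by a few simple formulas, which the sketch below recomputes from the sums of squares. It assumes n = 10 observations and a single independent variable, so the degrees of freedom are 1 for regression and n − 2 for the residual. Small differences in the last digits come from rounding in the printed output.

```python
from math import sqrt

n = 10                # observations
ssr = 18934.9352      # regression sum of squares
sse = 13665.577       # residual (error) sum of squares
sst = 32600.5000      # total sum of squares

df_regression = 1     # one independent variable (assumed)
df_residual = n - 2
df_total = n - 1

msr = ssr / df_regression   # mean square regression
mse = sse / df_residual     # mean square error

f_stat = msr / mse          # F statistic
r_squared = ssr / sst       # coefficient of determination
multiple_r = sqrt(r_squared)
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)
standard_error = sqrt(mse)  # standard error of the estimate

print(f"MSE = {mse:.4f}, F = {f_stat:.4f}")
print(f"R Square = {r_squared:.4f}, Multiple R = {multiple_r:.4f}")
print(f"Adjusted R Square = {adjusted_r_squared:.4f}")
print(f"Standard Error = {standard_error:.4f}")
```

Because Significance F (0.010394) is below 0.05, the regression would be judged significant at the 5% level.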