Testing The Significance of The Correlation Coefficient
Testing The Significance of The Correlation Coefficient
Correlation Coefficient
Learning Outcomes
Calculate and interpret the correlation coefficient
The sample data are used to compute r, the correlation coefficient for
the sample. If we had data for the entire population, we could find the
population correlation coefficient. But because we have only have
sample data, we cannot calculate the population correlation coefficient.
The sample correlation coefficient, r, is our estimate of the unknown
population correlation coefficient.
Note
If r is significant and the scatter plot shows a linear trend, the line
can be used to predict the value of y for values of x that are within
:
the domain of observed x values.
If r is not significant OR if the scatter plot does not show a linear
trend, the line should not be used for prediction.
If r is significant and if the scatter plot shows a linear trend, the line
may NOT be appropriate or reliable for prediction OUTSIDE the
domain of observed x values in the data.
Drawing a Conclusion
There are two methods of making the decision. The two methods are
equivalent and give the same result.
Note
Calculation Notes:
𝑟√𝑛 − 2
The formula for the test statistic is
𝑡=
√1 − 𝑟2
. The value of the test statistic, t, is shown in the computer or
calculator output along with the p-value. The test statistic t has
the same sign as the correlation coefficient r.
The p-value is the combined area in both tails.
Example
try it
For a given line of best fit, you computed that r = 0.6501 using n = 12
data points and the critical value is 0.576. Can the line be used for
prediction? Why or why not?
If the scatter plot looks linear then, yes, the line can be used for
prediction, because r > the positive critical value.
Example
try it
For a given line of best fit, you compute that r = 0.5204 using n = 9
data points, and the critical value is 0.666. Can the line be used for
prediction? Why or why not?
No, the line cannot be used for prediction, because r < the positive
critical value.
Example 3
Try it
For a given line of best fit, you compute that r = –0.7204 using n = 8
data points, and the critical value is = 0.707. Can the line be used for
prediction? Why or why not?
Yes, the line can be used for prediction, because r < the negative
critical value.
:
Example
try it
For a given line of best fit, you compute that r = 0 using n = 100 data
points. Can the line be used for prediction? Why or why not?
No, the line cannot be used for prediction no matter what the sample
size is.
The regression line equation that we calculate from the sample data
gives the best-fit line for our particular sample. We want to use this
best-fit line for the sample as an estimate of the best-fit line for the
population. Examining the scatterplot and testing the significance of
the correlation coefficient helps us determine if it is appropriate to do
this.
The y values for each x value are normally distributed about the line
with the same standard deviation. For each x value, the mean of the y
values lies on the regression line. More y values lie near the line than
are scattered further away from the line.
Concept Review
𝑦^ = 𝑎 + 𝑏𝑥
Linear regression is a procedure for fitting a straight line of the form
The slope b and intercept a of the least-squares line estimate the slope
β and intercept α of the population (true) regression line. To estimate
the population standard deviation of y, σ, use the standard deviation of
the residuals, s.
𝑆𝑆𝐸
𝑠=
√𝑛 − 2
The variable ρ (rho) is the population correlation coefficient.
The TI-83, 83+, 84, 84+ calculator function LinRegTTest can perform
this test (STATS TESTS LinRegTTest).
Formula Review
𝑦^ = 𝑎 + 𝑏𝑥
Least Squares Line or Line of Best Fit:
where