Testing The Significance of The Correlation Coefficient

Uploaded by

Yoj Milana

Testing the Significance of the Correlation Coefficient
Learning Outcomes
Calculate and interpret the correlation coefficient

The correlation coefficient, r, tells us about the strength and direction
of the linear relationship between x and y. However, the reliability of the
linear model also depends on how many observed data points are in
the sample. We need to look at both the value of the correlation
coefficient r and the sample size n, together.

We perform a hypothesis test of the “significance of the correlation
coefficient” to decide whether the linear relationship in the sample
data is strong enough to use to model the relationship in the
population.

The sample data are used to compute r, the correlation coefficient for
the sample. If we had data for the entire population, we could find the
population correlation coefficient. But because we have only
sample data, we cannot calculate the population correlation coefficient.
The sample correlation coefficient, r, is our estimate of the unknown
population correlation coefficient.
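
As a minimal sketch of how r is obtained from sample data, the Python below applies the standard Pearson formula. This section does not state that formula, so it is an assumption here, and the function name `sample_correlation` and the five data points are hypothetical, chosen only for illustration.

```python
import math

def sample_correlation(xs, ys):
    """Sample correlation coefficient r (standard Pearson formula, assumed)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical sample of n = 5 points, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)
print(f"r = {r:.4f}, n = {len(x)}")
```

Both r and n from a calculation like this feed into the hypothesis test; r alone does not settle whether the relationship is significant.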

The symbol for the population correlation coefficient is ρ, the
Greek letter “rho.”
ρ = population correlation coefficient (unknown)
r = sample correlation coefficient (known; calculated from sample
data)

The hypothesis test lets us decide whether the value of the population
correlation coefficient ρ is “close to zero” or “significantly different
from zero.” We decide this based on the sample correlation coefficient r
and the sample size n.

If the test concludes that the correlation coefficient is significantly
different from zero, we say that the correlation coefficient is
“significant.”

Conclusion: “There is sufficient evidence to conclude that there is a
significant linear relationship between x and y because the correlation
coefficient is significantly different from zero.”
What the conclusion means: There is a significant linear relationship
between x and y. We can use the regression line to model the linear
relationship between x and y in the population.

If the test concludes that the correlation coefficient is not
significantly different from zero (it is close to zero), we say that
the correlation coefficient is “not significant.”

Conclusion: “There is insufficient evidence to conclude that there is a
significant linear relationship between x and y because the correlation
coefficient is not significantly different from zero.”
What the conclusion means: There is not a significant linear
relationship between x and y. Therefore, we CANNOT use the
regression line to model a linear relationship between x and y in the
population.

Note

If r is significant and the scatter plot shows a linear trend, the line
can be used to predict the value of y for values of x that are within
the domain of observed x values.
If r is not significant OR if the scatter plot does not show a linear
trend, the line should not be used for prediction.
If r is significant and if the scatter plot shows a linear trend, the line
may NOT be appropriate or reliable for prediction OUTSIDE the
domain of observed x values in the data.
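
The domain restriction in this note can be sketched in Python as a guard around the prediction. This is only an illustration: the function name `predict_within_domain`, the line coefficients, and the observed x values are hypothetical, and the sketch assumes r has already been found significant and the scatter plot looks linear.

```python
def predict_within_domain(a, b, x_new, x_observed):
    """Return y-hat = a + b*x_new, refusing to extrapolate beyond observed x."""
    lo, hi = min(x_observed), max(x_observed)
    if not lo <= x_new <= hi:
        raise ValueError(
            f"x = {x_new} is outside the observed domain [{lo}, {hi}]"
        )
    return a + b * x_new

# Hypothetical line y-hat = 2.2 + 0.6x fitted to x values from 1 to 5
observed_x = [1, 2, 3, 4, 5]
print(predict_within_domain(2.2, 0.6, 3.5, observed_x))  # inside the domain

try:
    predict_within_domain(2.2, 0.6, 9, observed_x)       # outside the domain
except ValueError as err:
    print("Prediction refused:", err)
```

Raising an error is one possible design choice; a gentler version might return the prediction together with a warning.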

Performing the Hypothesis Test

Null Hypothesis: H0: ρ = 0
Alternate Hypothesis: Ha: ρ ≠ 0

What the Hypotheses Mean in Words

Null Hypothesis H0: The population correlation coefficient IS NOT
significantly different from zero. There IS NOT a significant linear
relationship (correlation) between x and y in the population.
Alternate Hypothesis Ha: The population correlation coefficient IS
significantly DIFFERENT FROM zero. There IS A SIGNIFICANT
LINEAR RELATIONSHIP (correlation) between x and y in the
population.

Drawing a Conclusion

There are two methods of making the decision. The two methods are
equivalent and give the same result.

Method 1: Using the p-value
Method 2: Using a table of critical values

In this chapter of this textbook, we will always use a significance level
of 5%, α = 0.05.

Note

Using the p-value method, you could choose any appropriate
significance level you want; you are not limited to using α = 0.05. But
the table of critical values provided in this textbook assumes that we
are using a significance level of 5%, α = 0.05. (If we wanted to use a
different significance level than 5% with the critical value method, we
would need different tables of critical values that are not provided in
this textbook.)

Method 1: Using a p-value to make a decision

To calculate the p-value using LinRegTTest:

On the LinRegTTest input screen, on the line prompt for β or ρ,
highlight “≠ 0”.
The output screen shows the p-value on the line that reads “p =”.
(Most computer statistical software can calculate the p-value.)

If the p-value is less than the significance level (α = 0.05):

Decision: Reject the null hypothesis.
Conclusion: “There is sufficient evidence to conclude that there is
a significant linear relationship between x and y because the
correlation coefficient is significantly different from zero.”

If the p-value is NOT less than the significance level (α = 0.05):

Decision: DO NOT REJECT the null hypothesis.
Conclusion: “There is insufficient evidence to conclude that there
is a significant linear relationship between x and y because the
correlation coefficient is NOT significantly different from zero.”
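
The decision rule above can be sketched as a small Python helper. The function name `decide` and the p-values passed to it are hypothetical; the only logic taken from the text is that the null hypothesis is rejected when the p-value is less than α, and not rejected otherwise (including when the two are equal).

```python
def decide(p_value, alpha=0.05):
    """Apply the p-value rule: reject H0 only when p < alpha."""
    if p_value < alpha:
        return "Reject H0: the correlation coefficient is significant."
    return "Do not reject H0: the correlation coefficient is not significant."

# Hypothetical p-values, for illustration only
print(decide(0.003))
print(decide(0.05))   # NOT less than alpha, so H0 is not rejected
```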

Calculation Notes:

You will use technology to calculate the p-value. The following
describes the calculations to compute the test statistic and the
p-value:
The p-value is calculated using a t-distribution with n – 2 degrees
of freedom.
The formula for the test statistic is
$t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$.
The value of the test statistic, t, is shown in the computer or
calculator output along with the p-value. The test statistic t has
the same sign as the correlation coefficient r.
The p-value is the combined area in both tails.

An alternative way to calculate the p-value (p) given by LinRegTTest is
the command 2*tcdf(abs(t),10^99, n-2) in 2nd DISTR.
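
For readers without a calculator, the same calculation can be sketched in Python using only the standard library. The function names are hypothetical, and the tail area is approximated with Simpson's rule purely to keep the sketch dependency-free; that is an assumption of this example, not how LinRegTTest or statistical software works internally. The inputs r = 0.801 and n = 10 are the values used in the first critical-value example in this section.

```python
import math

def t_statistic(r, n):
    """Test statistic t = r*sqrt(n - 2) / sqrt(1 - r^2) for H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Combined area in both tails beyond |t| (Simpson's rule, steps even)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t

r, n = 0.801, 10
t = t_statistic(r, n)
p = two_tailed_p(t, n - 2)
print(f"t = {t:.3f}, df = {n - 2}, p = {p:.4f}")
```

Here `two_tailed_p(t, n - 2)` plays the role of 2*tcdf(abs(t),10^99, n-2): it returns the area in both tails of the t-distribution with n – 2 degrees of freedom.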

Method 2: Using a table of Critical Values to make a decision

The 95% Critical Values of the Sample Correlation Coefficient Table
can be used to give you a good idea of whether the computed value of
r is significant or not. Compare r to the appropriate critical value in the
table. If r is not between the positive and negative critical values, then
the correlation coefficient is significant. If r is significant, then you may
want to use the line for prediction.
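
A short worked derivation, not part of the original text, suggests why the table and the t-test give the same decision. Setting the test statistic equal to the two-tailed critical t value and solving for r gives the critical value of r. The symbols $t^*$ and $r^*$ are introduced here for illustration, and $t^* \approx 2.306$ (df = 8, α = 0.05) is taken from a standard t-table rather than from this section.

```latex
t^* = \frac{r^*\sqrt{n-2}}{\sqrt{1-(r^*)^2}}
\;\Longrightarrow\;
(t^*)^2\bigl(1-(r^*)^2\bigr) = (r^*)^2\,(n-2)
\;\Longrightarrow\;
r^* = \frac{t^*}{\sqrt{(t^*)^2 + n - 2}}

\text{For } n = 10 \ (df = 8):\quad
r^* = \frac{2.306}{\sqrt{2.306^2 + 8}} \approx 0.632
```

The result agrees with the critical value 0.632 that the table lists for df = 8.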

Example

Suppose you computed r = 0.801 using n = 10 data points. df = n – 2 =
10 – 2 = 8. The critical values associated with df = 8 are –0.632 and
+0.632. If r < negative critical value or r > positive critical value, then r
is significant. Since r = 0.801 and 0.801 > 0.632, r is significant and the
line may be used for prediction. If you view this example on a number
line, it will help you.

r is not significant between –0.632 and +0.632. r = 0.801 > +0.632.
Therefore, r is significant.

try it

For a given line of best fit, you computed that r = 0.6501 using n = 12
data points and the critical value is 0.576. Can the line be used for
prediction? Why or why not?

If the scatter plot looks linear then, yes, the line can be used for
prediction, because r > the positive critical value.

Example

Suppose you computed r = –0.624 with 14 data points. df = 14 – 2 = 12.
The critical values are –0.532 and 0.532. Since –0.624 < –0.532, r is
significant and the line can be used for prediction.

r = –0.624 < –0.532. Therefore, r is significant.

try it

For a given line of best fit, you compute that r = 0.5204 using n = 9
data points, and the critical value is 0.666. Can the line be used for
prediction? Why or why not?

No, the line cannot be used for prediction, because r < the positive
critical value.

Example 3

Suppose you computed r = 0.776 and n = 6. df = 6 – 2 = 4. The critical
values are –0.811 and 0.811. Since –0.811 < 0.776 < 0.811, r is not
significant, and the line should not be used for prediction.

–0.811 < r = 0.776 < 0.811. Therefore, r is not significant.

Try it

For a given line of best fit, you compute that r = –0.7204 using n = 8
data points, and the critical value is 0.707. Can the line be used for
prediction? Why or why not?

Yes, the line can be used for prediction, because r < the negative
critical value.

Example

Suppose you computed the following correlation coefficients. Using the
table at the end of the chapter, determine if r is significant and the line
of best fit associated with each r can be used to predict a y value. If it
helps, draw a number line.

1. r = –0.567 and the sample size, n, is 19. The df = n – 2 = 17. The
critical value is –0.456. –0.567 < –0.456 so r is significant.
2. r = 0.708 and the sample size, n, is nine. The df = n – 2 = 7. The
critical value is 0.666. 0.708 > 0.666 so r is significant.
3. r = 0.134 and the sample size, n, is 14. The df = 14 – 2 = 12. The
critical value is 0.532. 0.134 is between –0.532 and 0.532 so r is
not significant.
4. r = 0 and the sample size, n, is five. No matter what the df is, r =
0 is between the two critical values so r is not significant.

try it

For a given line of best fit, you compute that r = 0 using n = 100 data
points. Can the line be used for prediction? Why or why not?

No, the line cannot be used for prediction no matter what the sample
size is.
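
The critical-value comparison used throughout these examples can be sketched in Python. The function name `is_significant` is hypothetical; the (r, critical value) pairs are the ones quoted in the examples and exercises above, so no table lookup is attempted here.

```python
def is_significant(r, critical_value):
    """r is significant when it lies outside [-critical_value, +critical_value]."""
    return abs(r) > abs(critical_value)

# (r, critical value) pairs quoted in this section's examples and exercises
cases = [
    (0.801, 0.632),
    (0.6501, 0.576),
    (-0.624, 0.532),
    (0.5204, 0.666),
    (0.776, 0.811),
    (-0.7204, 0.707),
    (-0.567, 0.456),
    (0.708, 0.666),
    (0.134, 0.532),
]

for r, crit in cases:
    verdict = "significant" if is_significant(r, crit) else "not significant"
    print(f"r = {r:+.4f}, critical value = ±{crit}: {verdict}")
```

Comparing |r| with the positive critical value is equivalent to checking whether r falls outside the interval between the negative and positive critical values.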

Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that
certain assumptions about the data are satisfied. The premise of this
test is that the data are a sample of observed points taken from a larger
population. We have not examined the entire population because it is
not possible or feasible to do so. We are examining the sample to draw
a conclusion about whether the linear relationship that we see between
x and y in the sample data provides strong enough evidence so that we
can conclude that there is a linear relationship between x and y in the
population.

The regression line equation that we calculate from the sample data
gives the best-fit line for our particular sample. We want to use this
best-fit line for the sample as an estimate of the best-fit line for the
population. Examining the scatterplot and testing the significance of
the correlation coefficient helps us determine if it is appropriate to do
this.

The assumptions underlying the test of significance are:

1. There is a linear relationship in the population that models the
average value of y for varying values of x. In other words, the
expected value of y for each particular x value lies on a straight line
in the population. (We do not know the equation for the line for the
population. Our regression line from the sample is our best
estimate of this line in the population.)
2. The y values for any particular x value are normally distributed
about the line. This implies that there are more y values scattered
closer to the line than are scattered farther away. Assumption (1)
implies that these normal distributions are centered on the line: the
means of these normal distributions of y values lie on the line.
3. The standard deviations of the population y values about the line
are equal for each value of x. In other words, each of these normal
distributions of y values has the same shape and spread about the
line.
4. The residual errors are mutually independent (no pattern).
5. The data are produced from a well-designed, random sample or
randomized experiment.

The y values for each x value are normally distributed about the line
with the same standard deviation. For each x value, the mean of the y
values lies on the regression line. More y values lie near the line than
are scattered further away from the line.

Concept Review

Linear regression is a procedure for fitting a straight line of the form
$\hat{y} = a + bx$ to data. The conditions for regression are:

Linear: In the population, there is a linear relationship that models
the average value of y for different values of x.
Independent: The residuals are assumed to be independent.
Normal: The y values are distributed normally for any value of x.
Equal variance: The standard deviation of the y values is equal for
each x value.
Random: The data are produced from a well-designed random
sample or randomized experiment.

The slope b and intercept a of the least-squares line estimate the slope
β and intercept α of the population (true) regression line. To estimate
the population standard deviation of y, σ, use the standard deviation of
the residuals, s:

$s = \sqrt{\dfrac{SSE}{n-2}}$

The variable ρ (rho) is the population correlation coefficient.

To test the null hypothesis H0: ρ = hypothesized value, use a linear
regression t-test. The most common null hypothesis is H0: ρ = 0, which
indicates there is no linear relationship between x and y in the
population.

The TI-83, 83+, 84, 84+ calculator function LinRegTTest can perform
this test (STAT TESTS LinRegTTest).

Formula Review

Least Squares Line or Line of Best Fit: $\hat{y} = a + bx$

where a = y-intercept, b = slope

Standard deviation of the residuals: $s = \sqrt{\dfrac{SSE}{n-2}}$

where

SSE = sum of squared errors

n = the number of data points
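
The two formulas above can be tied together with a minimal Python sketch. The least-squares expressions for a and b are the standard ones and are assumed here, since this section does not state them; the function names and the five data points are hypothetical.

```python
import math

def least_squares_line(xs, ys):
    """Return intercept a and slope b of y-hat = a + b*x (standard formulas)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

def residual_std_dev(xs, ys):
    """Return (SSE, s), where s = sqrt(SSE / (n - 2))."""
    a, b = least_squares_line(xs, ys)
    # SSE: sum of squared differences between observed y and predicted y-hat
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sse, math.sqrt(sse / (len(xs) - 2))

# Hypothetical sample of n = 5 points, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
sse, s = residual_std_dev(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x, SSE = {sse:.2f}, s = {s:.4f}")
```

The divisor n – 2 matches the degrees of freedom used for the t-distribution in the significance test.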
