Correlation Coefficients: Find Pearson's Correlation Coefficient
Subject   x    y
1         43   99
2         21   65
3         25   79
4         42   75
5         57   87
6         59   81
Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².
Subject   x    y    xy    x²    y²
1         43   99
2         21   65
3         25   79
4         42   75
5         57   87
6         59   81
Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be
43 × 99 = 4,257.
Subject   x    y    xy
1         43   99   4257
2         21   65   1365
3         25   79   1975
4         42   75   3150
5         57   87   4959
6         59   81   4779
Step 3: Take the square of the numbers in the x column, and put the result in the
x² column.
Subject   x    y    xy     x²
1         43   99   4257   1849
2         21   65   1365   441
3         25   79   1975   625
4         42   75   3150   1764
5         57   87   4959   3249
6         59   81   4779   3481
Step 4: Take the square of the numbers in the y column, and put the result in the
y² column.
Subject   x    y    xy     x²     y²
1         43   99   4257   1849   9801
2         21   65   1365   441    4225
3         25   79   1975   625    6241
4         42   75   3150   1764   5625
5         57   87   4959   3249   7569
6         59   81   4779   3481   6561
Step 5: Add up all of the numbers in each column and put the result at the
bottom. The Greek letter sigma (Σ) is a short way of saying “sum of.”
Σx = 247
Σy = 486
Σxy = 20,485
Σx² = 11,409
Σy² = 40,022
n is the sample size; in our case, n = 6.
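The chart-and-sum steps above can be checked with a few lines of Python; a minimal sketch using the six (x, y) pairs from the chart:

```python
# The six (x, y) pairs from the worked example.
xs = [43, 21, 25, 42, 57, 59]
ys = [99, 65, 79, 75, 87, 81]

# Column sums: Σx, Σy, Σxy, Σx², Σy².
sum_x = sum(xs)                              # 247
sum_y = sum(ys)                              # 486
sum_xy = sum(x * y for x, y in zip(xs, ys))  # 20,485
sum_x2 = sum(x * x for x in xs)              # 11,409
sum_y2 = sum(y * y for y in ys)              # 40,022

n = len(xs)  # sample size, 6
```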
Step 6: Plug the sums into the Pearson correlation coefficient formula:
r = [n(Σxy) − (Σx)(Σy)] / √{[n(Σx²) − (Σx)²][n(Σy²) − (Σy)²]}
r = [6(20,485) − (247)(486)] / √{[6(11,409) − 247²][6(40,022) − 486²]}
r = 2,868 / √(7,445 × 3,936) ≈ 0.5298
The range of the correlation coefficient is from −1 to 1. Our result is 0.5298, or 52.98%, which means the variables
have a moderate positive correlation.
Sample problem: test the significance of the correlation coefficient r = 0.565 using the table of critical
values for the PPMC. Test at α = 0.01 for a sample size of 9.
Step 1: Subtract two from the sample size to get df, the degrees of freedom:
9 − 2 = 7
Step 2: Look the value up in the PPMC table. With df = 7 and α = 0.01, the table value is 0.798.
Step 3: Draw a graph, so you can more easily see the relationship.
r = 0.565 does not fall into the “reject” region (above 0.798), so there isn’t enough evidence to
conclude that a strong linear relationship exists in the data.
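The decision rule can be sketched in a few lines of Python. The t statistic at the end is a standard equivalent form of the test, included only as a cross-reference to a t table; it is not part of the critical-value method shown above:

```python
import math

r = 0.565      # sample correlation being tested
n = 9          # sample size
alpha = 0.01   # two-tailed significance level
df = n - 2     # degrees of freedom = 7

# Critical value read from a PPMC table for df = 7 and alpha = 0.01.
r_critical = 0.798

# Decision rule: reject H0 (no linear relationship) only if |r| > critical value.
reject = abs(r) > r_critical  # False: not enough evidence

# Equivalent t statistic, for comparison against a t table with df = 7.
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)  # ≈ 1.81, well below t_crit ≈ 3.50
```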
r value:
+1   Perfect positive relationship
 0   No relationship
−1   Perfect negative relationship
The images show that a strong negative correlation means that the graph has a downward slope
from left to right: as the x-values increase, the y-values get smaller. A strong positive correlation
means that the graph has an upward slope from left to right: as the x-values increase, the y-values
get larger.
A correlation coefficient gives you an idea of how well data fits a line or curve. Pearson wasn’t the
original inventor of the term correlation, but his coefficient became one of the most popular ways to
measure it.
Francis Galton (who was also involved with the development of the interquartile range) was the first
person to measure correlation, originally termed “co-relation,” which actually makes sense considering
you’re studying the relationship between a couple of different variables. In Co-Relations and Their
Measurement, he said “The statures of kinsmen are co-related variables; thus, the stature of the father is
correlated to that of the adult son and so on; but the index of co-relation … is different in the different
cases.” It’s worth noting, though, that Galton mentioned in his paper that he had borrowed the term from
biology, where “co-relation and correlation of structure” was being used, but that until the time of his
paper it hadn’t been properly defined.
In 1892, British statistician Francis Ysidro Edgeworth published a paper called “Correlated Averages”
(Philosophical Magazine, 5th Series, 34, 190–204) in which he used the term “Coefficient of Correlation.” It
wasn’t until 1896 that British mathematician Karl Pearson used “Coefficient of Correlation” in two
papers: Contributions to the Mathematical Theory of Evolution and Mathematical Contributions to the
Theory of Evolution. III. Regression, Heredity and Panmixia. It was the second paper that introduced the
Pearson product-moment correlation formula for estimating correlation.