Correlation Coefficient

Statistics
Dr. Ansam Al-Obaidi
Correlation
The word “correlation” is a term used in everyday conversation to describe some
type of relationship between variables. You know that the amount you eat is
correlated with the amount that you weigh; that how hard you work in school is
correlated with how successful you'll be in life (well, maybe!).
In statistics, correlation has a precise definition. It's a measure of the strength of the
relationship between two variables.
Scatter Plots
Before we take up the discussion of correlation, we need to examine a
way to display the relation between two variables x and y. The most common and
easiest way is a scatter plot. The following example illustrates a scatter plot.
Example
In Europe and Asia, m-commerce is popular. M-commerce users have special
mobile phones that work like electronic wallets as well as provide phone and Internet
services. Users can do everything from paying for parking to buying a TV set or
soda from a machine to banking to checking sports scores on the Internet. For the
years 2000 through 2004, was there a relationship between the year and the number
of m-commerce users? Construct a scatter plot. Let x = the year and let y = the
number of m-commerce users, in millions.
(a) Table showing the number of m-commerce users (in millions) by year
(b) Scatter plot showing the number of m-commerce users (in millions) by year
Question
Amelia plays basketball for her high school. She wants to improve to play at the
college level. She notices that the number of points she scores in a game goes up in
response to the number of hours she practices her jump shot each week. She records
the following data:
Construct a scatter plot and state whether what Amelia thinks appears to be true.
A scatter plot shows the direction of a relationship between the variables. A clear
direction happens when there is either:
• High values of one variable occurring with high values of the other variable or low
values of one variable occurring with low values of the other variable (a positive
relationship).
• High values of one variable occurring with low values of the other variable (a
negative relationship).
You can determine the strength of the relationship by looking at the scatter plot and
seeing how close the points are to a line, a power function, an exponential function,
or to some other type of function. For a linear relationship there is an exception.
Consider a scatter plot where all the points fall on a horizontal line providing a
"perfect fit." The horizontal line would in fact show no relationship.
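The two directions, and the horizontal-line exception, can also be checked numerically. The sketch below is a minimal illustration under stated assumptions: the data are made up, and `sum_of_products` and `direction` are hypothetical helper names, not part of the lecture. It adds up the products of each point's deviations from the mean of x and the mean of y. The sum is positive when high values pair with high values, negative when high values pair with low values, and zero when every point lies on a horizontal line.

```python
def sum_of_products(xs, ys):
    """Sum of (x - mean of x) * (y - mean of y) over all points."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def direction(xs, ys):
    """Label the direction suggested by the sign of the sum of products."""
    s = sum_of_products(xs, ys)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "none"


# Made-up illustrative data (hypothetical, not from the lecture).
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 7, 9]    # high with high, low with low
falling = [9, 7, 5, 4, 2]   # high with low
flat = [5, 5, 5, 5, 5]      # all points on a horizontal line

print(direction(hours, rising))   # positive
print(direction(hours, falling))  # negative
print(direction(hours, flat))     # none
```

The sign only indicates direction. It says nothing about how closely the points follow a line, which is what the correlation coefficient measures.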
When you look at a scatterplot, you want to notice the overall pattern and any
deviations from the pattern. The following scatterplot examples illustrate these
concepts.
Pearson's Correlation Coefficient
The Pearson correlation coefficient (usually denoted as r) assumes that X and Y are
jointly distributed as bivariate normal, i.e. X and Y each are normally distributed,
and that they are linearly related. When these assumptions are not satisfied,
nonparametric versions can be used to estimate correlation. These include the
Spearman correlation coefficient.
The Pearson correlation coefficient is a statistic that measures the strength and direction
of the linear relationship between two variables. The correlation coefficient does not
tell you anything about the cause-and-effect relationship between the variables. The
Pearson correlation coefficient ranges in value from -1 to +1. The absolute value of
the correlation coefficient tells you how strongly the variables are linearly related.
A value of either +1 or -1 means that you can perfectly predict the values of one
variable from the values of the other.
• If the correlation coefficient is +1, all points fall on a line with values of both
variables increasing together.
• If the correlation coefficient is -1, all points fall on a line but as values of one
variable increase the values of the other variable decrease.
• It is 0 when there is no linear relationship between two variables.
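These three cases can be reproduced with a few lines of code. The sketch below assumes made-up data and a hypothetical helper `pearson_r` that applies the deviation-score formula described in the section on computing r. It is meant as an illustration, not as a reference implementation.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviation scores: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


# Made-up illustrative data (hypothetical, not from the lecture).
xs = [1, 2, 3, 4, 5]
increasing = [2 * x + 1 for x in xs]    # points on a rising line
decreasing = [10 - 3 * x for x in xs]   # points on a falling line

sym_xs = [-2, -1, 0, 1, 2]
parabola = [x * x for x in sym_xs]      # strong but non-linear relationship

r_up = pearson_r(xs, increasing)
r_down = pearson_r(xs, decreasing)
r_curve = pearson_r(sym_xs, parabola)

print(r_up, r_down, r_curve)
```

The parabola case is worth noting: y is completely determined by x, yet r is 0 because the relationship is not linear.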
The figure below shows plots, correlation coefficients and summary lines for
correlations of different sizes.
Scatterplots with Correlation Coefficient
Computing Pearson's correlation coefficient (r)
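Let’s take the table below as an example to calculate the correlation coefficient.
In the table, x and y are deviation scores: each value minus the mean of its variable.
Pearson's correlation is computed by dividing the sum of the xy column (Σxy) by the
square root of the product of the sum of the x² column (Σx²) and the sum of the y²
column (Σy²). The resulting formula is:

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} $$

Substituting the column sums from the table into this formula gives r.
An alternative computational formula that avoids the step of computing deviation
scores, written in terms of the raw scores X and Y and the number of pairs N, is:

$$ r = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}} $$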
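Both ways of computing r can be compared directly in code. The sketch below is a minimal illustration: the five data pairs are illustrative values chosen for easy arithmetic and are not claimed to be the table from the lecture, and the function names `r_from_deviations` and `r_from_raw_scores` are hypothetical. The first function uses deviation scores, the second uses only sums of the raw scores.

```python
import math


def r_from_deviations(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x and y as deviation scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


def r_from_raw_scores(xs, ys):
    """Computational formula using raw sums only (no deviation scores)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Illustrative data (an assumption, not the lecture's table).
X = [1, 3, 5, 5, 6]
Y = [4, 6, 10, 12, 13]

r_dev = r_from_deviations(X, Y)
r_raw = r_from_raw_scores(X, Y)

# For these values: sum(xy) = 30, sum(x^2) = 16, sum(y^2) = 60.
print(round(r_dev, 3))  # 0.968
print(round(r_raw, 3))  # 0.968
```

The two results agree because the computational formula is an algebraic rearrangement of the deviation-score formula.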
Example
Imagine that you’re studying the relationship between newborns’ weight and length.
You have the weights and lengths of the 10 babies born last month at your local
hospital. You enter the data in a table and find the value of the correlation coefficient
between weight and length.
Exercise:
The table below shows the marks obtained by a group of students, in two separate
tests.
The first test is out of 50 marks while the second test is out of 30 marks. Let x and y
represent the marks obtained in Test 1 and Test 2, respectively. Find the value of
the correlation coefficient between x and y.
When to use the Pearson correlation coefficient
The Pearson correlation coefficient (r) is one of several correlation coefficients that
you need to choose between when you want to measure a correlation. The Pearson
correlation coefficient is a good choice when all of the following are true:
• Both variables are quantitative.
• The variables are normally distributed: You can create a histogram of each
variable to verify whether the distributions are approximately normal. It’s not a
problem if the variables are a little non-normal.
• The data have no outliers: Outliers are observations that don’t follow the same
patterns as the rest of the data. A scatterplot is one way to check for outliers.
• The relationship is linear: “Linear” means that the relationship between the two
variables can be described reasonably well by a straight line. You can use a
scatterplot to check whether the relationship between two variables is linear or not.
Your scatterplot may look something like one of the following:
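Alongside the visual check, the effect of a single outlier on r can be seen numerically. The sketch below assumes made-up data and a hypothetical helper `pearson_r` based on the deviation-score formula. It computes r for six points that lie exactly on a line, then again after one point that breaks the pattern is added.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviation scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


# Made-up illustrative data (hypothetical, not from the lecture).
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]    # exactly y = 2x

r_clean = pearson_r(xs, ys)

# Add one observation that does not follow the pattern.
xs_out = xs + [7]
ys_out = ys + [0]

r_outlier = pearson_r(xs_out, ys_out)

print(round(r_clean, 2))    # 1.0
print(round(r_outlier, 2))  # 0.25
```

One stray point out of seven is enough to pull r from 1 down to 0.25 here, which is why outliers should be checked before relying on the Pearson coefficient.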
Pearson vs. Spearman’s rank correlation coefficients
The Spearman rank-order correlation coefficient (Spearman’s correlation) is a
nonparametric measure of the strength and direction of association that exists
between two ranked variables. The test is used for either ordinal variables or for
continuous data that has failed the assumptions necessary for conducting the
Pearson's product-moment correlation. It’s a better choice than the Pearson
correlation coefficient when one or more of the following is true:
• The variables are ordinal. (Spearman’s correlation requires only that the two
variables be measured on an ordinal, interval or ratio scale.)
• The variables aren’t normally distributed.
• The data include outliers.
• The relationship between the variables is non-linear and monotonic. A
monotonic relationship is a relationship that does one of the following: (1) as the
value of one variable increases, so does the value of the other variable; or (2) as the
value of one variable increases, the other variable value decreases. Examples of
monotonic and non-monotonic relationships are presented in the diagram below.
While there are several ways to check whether a monotonic relationship exists
between your two variables, a practical one is to create a scatterplot (for example
using statistical software such as SPSS), plotting one variable against the other,
and then visually inspect the scatterplot to check for monotonicity. Your scatterplot
may look something like one of the following:
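The difference between the two coefficients can also be seen numerically. Spearman's correlation is equivalent to Pearson's correlation computed on the ranks of the data, with tied values given the average of their ranks. The sketch below relies on that fact; the data are made up and the helper names `average_ranks`, `pearson_r` and `spearman_r` are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's r from deviation scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def spearman_r(xs, ys):
    """Spearman's correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Made-up monotonic but non-linear data (hypothetical): y = x cubed.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

r_pearson = pearson_r(xs, ys)
r_spearman = spearman_r(xs, ys)

print(round(r_pearson, 3))   # below 1: the points do not lie on a line
print(round(r_spearman, 3))  # 1.0: the ranks agree perfectly
```

Because y always increases with x, the ranks match exactly and Spearman's coefficient is 1, while Pearson's coefficient falls short of 1 because the relationship is curved.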
