Correlation Coefficient

Correlation is a statistical measure that quantifies the degree of linear association between two variables, with types including positive, negative, and no correlation. The correlation coefficient, r, ranges from -1 to +1, indicating the strength and direction of the relationship, while Pearson's r specifically measures linear relationships and is sensitive to outliers. Alternatives to Pearson's correlation include Spearman rank and Kendall Tau, which can assess non-linear relationships.

Uploaded by

sanjeeva jayasuriya

CORRELATION AND PEARSON’S COEFFICIENT

(Alba González)

CORRELATION

WHAT IS CORRELATION?

Correlation is a statistical measure that quantifies the degree of linear association between two
variables, indicating how closely they change together at a constant rate. It is frequently used to
describe simple relationships in data without making any statement about cause and effect.

TYPES OF CORRELATION

There is a positive linear correlation when the variable on the y-axis increases as the variable on
the x-axis increases. For example, family income tends to rise with a person’s education level.

A negative linear correlation is found when one variable decreases as the other variable increases.
This is shown by a downward-sloping straight regression line. For example, the longer it takes
a worker to reach their workplace, the lower their job satisfaction tends to be.

No correlation means that no pattern can be detected between the variables. For example,
no pattern would be expected between the amount of ice cream sold and the number of shark attacks.

There is a non-linear correlation when there is a relationship between the variables, but the relationship
is not linear (straight); this type is not the focus of this project. For example, the adoption of a
new technology might follow an S-shaped growth pattern: slow initial uptake, followed by a rapid
increase, and then a plateau once it is fully established.

CORRELATION COEFFICIENT

HOW IS CORRELATION MEASURED?

It is measured by the correlation coefficient, r, a measure that quantifies the strength and
direction of a linear relationship between two variables. It ranges from -1 to +1; the closer r is
to either of these values, the stronger the positive or negative relationship. If the correlation
coefficient is 0, there is no linear relationship.

Size of r       Interpretation
0.90 to 1.00    Very high correlation
0.70 to 0.89    High correlation
0.50 to 0.69    Moderate correlation
0.30 to 0.49    Low correlation
0.00 to 0.29    Little if any correlation
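
The size-of-r bands can be turned into a small lookup. The sketch below is illustrative only: the function name `interpret_r` is made up for this example, and it assumes the bands apply to the absolute value of r, with the sign reported separately as the direction.

```python
def interpret_r(r: float) -> str:
    """Return the verbal label for a correlation coefficient.

    Assumption: the bands are applied to |r|; the sign only gives direction.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size >= 0.90:
        return "Very high correlation"
    if size >= 0.70:
        return "High correlation"
    if size >= 0.50:
        return "Moderate correlation"
    if size >= 0.30:
        return "Low correlation"
    return "Little if any correlation"


print(interpret_r(0.767))  # High correlation
print(interpret_r(-0.95))  # Very high correlation
```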

SCATTERPLOTS

We can use scatterplots to visualize correlations. The correlation coefficient, r, shows how closely the
points in the scatterplot follow a straight line; the stronger the relationship (the larger the absolute
value of r), the closer the points lie to the line fitted to the data.

PEARSON’S COEFFICIENT

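Pearson’s r measures the strength and direction (decreasing or increasing, depending on the sign) of a
linear relationship between two variables X and Y. It can be defined as:

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

Where:

- $X_i$ and $Y_i$ are the individual data points.

- $\bar{X}$ and $\bar{Y}$ are the means of the two variables.

- $n$ is the number of paired observations.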
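
As a minimal sketch, Pearson’s r can be computed directly from its definition: the sum of the products of the deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the sample data are made up for illustration and do not come from the document.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed directly from the definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations of each variable
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Made-up illustrative data
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, score), 3))  # 0.775
```

Here the deviations give a numerator of 6 and sums of squares of 10 and 6, so r = 6 / sqrt(60), which is about 0.775.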

Strength refers to how closely a change in one variable is accompanied by a change in the other. The closer
r is to +1 or -1, the stronger the relationship. In a scatterplot, the points will lie closer to the line.

Direction indicates whether the linear relationship between the variables is positive or negative. In the
scatterplot, if the line slopes upward the relationship is positive, and if it slopes downward it is negative.

INTERPRETATION

The closer the points of the scatterplot lie to the line, the stronger the relationship between the
variables. The further they move from the line, the weaker the relationship.

For example:

This scatterplot corresponds to a weak negative correlation, as the values do not lie close to the
straight line. Because the slope is negative, one variable tends to decrease as the other increases.

It is important to understand that a negative correlation should not be mistaken for no correlation.
For instance, a Pearson coefficient of -0.9 indicates a stronger correlation than +0.7, and a
correlation of +0.8 is not stronger than one of -0.8.

PROPERTIES

LINK WITH THE COEFFICIENT OF DETERMINATION

The square of Pearson’s correlation coefficient, R², also known as the coefficient of determination,
gives the proportion of the variation in one variable that is explained by the variation in the other
variable.

Looking at the scatterplot:

The Pearson correlation coefficient (r) is 0.767. This is close to +1, so we can conclude that there is a
strong correlation between life expectancy and years of schooling. The R² is 0.59, which is the square
of r (0.767² ≈ 0.59). From a statistical point of view, a linear regression model accounts for 59% of the
variation in life expectancy based on the duration of schooling.
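
The link between r and the coefficient of determination can be checked numerically. The sketch below uses made-up data (not the life expectancy data set) and hypothetical helper names; it fits a least-squares line, computes R² as 1 minus the ratio of residual to total sum of squares, and compares it with the square of r.

```python
import math


def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared_from_fit(xs, ys):
    """R^2 of the least-squares line y = a + b*x, as 1 - SS_res / SS_tot."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r ** 2, 3), round(r_squared_from_fit(xs, ys), 3))  # 0.6 0.6

# The arithmetic quoted in the text: r = 0.767 gives R^2 of about 0.59
print(round(0.767 ** 2, 2))  # 0.59
```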

SYMMETRY

This coefficient is symmetric: the correlation between x and y is the same as the correlation between y
and x. Thus, swapping the axes does not affect Pearson’s r.

Using the previous example, the mean years of schooling can be plotted against life expectancy,
and the values of r and R² remain unchanged.

INSENSITIVITY TO SCALE AND LOCATION

Pearson’s coefficient has no unit; it is dimensionless. Therefore, multiplying x or y by a positive
number, or adding or subtracting a constant, has no impact on the outcome: the points will sit at
different locations in the scatterplot, but the correlation remains constant. Multiplying one variable
by a negative number reverses the sign of r but leaves its absolute value unchanged.

In these scatterplots, the x-axis values differ: miles were obtained by dividing by 1.61, and
minutes by multiplying by 60. As a result, the slope and the plotted values change, but
the correlation remains constant.
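
Symmetry and insensitivity to scale and location can both be checked with a few lines of code. The distances and travel times below are hypothetical values chosen for illustration. Note that a negative multiplier reverses the sign of r while preserving its magnitude.

```python
import math


def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: distance in kilometres and travel time in hours
km = [2.0, 5.0, 8.0, 12.0, 20.0]
hours = [0.2, 0.3, 0.5, 0.6, 1.1]

r = pearson_r(km, hours)
miles = [d / 1.61 for d in km]        # change of unit on one variable
minutes = [t * 60 for t in hours]     # change of unit on the other
shifted = [d + 100 for d in km]       # change of location
negated = [-d for d in km]            # negative scale factor

print(math.isclose(r, pearson_r(hours, km)))        # symmetry
print(math.isclose(r, pearson_r(miles, minutes)))   # positive rescaling
print(math.isclose(r, pearson_r(shifted, hours)))   # shift
print(math.isclose(-r, pearson_r(negated, hours)))  # sign flips, size stays
```

Each comparison prints True, up to floating-point rounding handled by `math.isclose`.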

CONDITIONS FOR EXISTENCE

To apply the Pearson correlation coefficient, the following conditions must be satisfied:

- The variables must be measured at the interval or ratio level; thus, the x and y variables are
quantitative and are expressed as real numbers.
- The data should be organized as paired observations arranged in two columns: each x value
with its corresponding y value.
- The variance and covariance of x and y must be defined, and the variances must be non-zero.

PROBLEMS WITH PEARSON’S CORRELATION COEFFICIENT

LINEARITY ASSUMPTION

Pearson correlation assumes a linear relationship between the variables; if the relationship is
non-linear, the coefficient will not accurately represent the association. When the correlation
coefficient is weak, we may conclude that there is no relationship between the two variables,
when in fact the relationship is simply non-linear.
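
A small constructed example makes the point concrete. Below, y is determined exactly by x through y = x², yet Pearson’s r is zero because the relationship is not linear. The data are synthetic and chosen to be symmetric around zero.

```python
import math


def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# y depends perfectly on x, but through a parabola rather than a line
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(abs(r) < 1e-12)  # True: no linear association is detected
```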

HIGH CORRELATION DOES NOT IMPLY LINEAR RELATIONSHIP

If we obtain r = 0.82, most people will picture points scattered around a straight regression line. But
the same value might come from data like this:

SENSITIVE TO OUTLIERS

To illustrate this, consider an example measuring the relationship between life expectancy and health
expenditure.

Pearson’s coefficient is 0.54 with the outlier and 0.71 without it. Visually, the scatterplot
values also fit the line better once the outlier is removed. A single outlier therefore changes the
outcome of the analysis noticeably.
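
The effect of a single outlier can be reproduced on synthetic data. The values below are invented for illustration and are not the life expectancy and health expenditure data; they consist of eight nearly collinear points plus one point far from the trend.

```python
import math


def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented data: eight points close to a straight line
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0, 6.8, 8.1]

r_without = pearson_r(x, y)
# Add one point that lies far from the trend
r_with = pearson_r(x + [9], y + [1.0])

print(f"without outlier: {r_without:.2f}")
print(f"with outlier:    {r_with:.2f}")  # much lower than without
```

One extra observation out of nine is enough to move the coefficient from nearly 1 to roughly half that value.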

This leads to the following question: should outliers be excluded from the analysis?

It depends on the context, on what the outlier represents, and on the sample size; if the sample is
large enough, some outliers are to be expected. Therefore, keep the outliers if they represent genuine
data from the population studied, and remove them if they result from experimental or measurement
errors, or if there is another well-justified reason to exclude them.

In the context of the example, is there a good reason to remove the outlier? Not really: nothing
indicates that the outlier should be removed; in fact, it seems relevant to keep it in the analysis.

ESTIMATOR BIAS
The sample Pearson correlation coefficient can slightly underestimate the absolute value of the
population correlation, especially for small sample sizes. This bias is most noticeable when the
absolute value of the correlation lies around 0.5 to 0.7.

CORRELATION IS NOT CAUSATION

Observing a high correlation between two variables does not imply a causal relationship in which the
value of one variable directly influences the other. In some situations, even when Pearson’s r is high,
the correlation may be coincidental; this is known as a spurious correlation.

An example is the per capita consumption of mozzarella cheese and the number of civil engineering
doctorates awarded in the US. In the scatterplot the correlation is remarkably high, but we can safely
assume that there is no causality at play.

ALTERNATIVES TO THE PEARSON CORRELATION COEFFICIENT

The Spearman rank correlation coefficient serves as an alternative to the Pearson correlation
coefficient. The Spearman rank correlation is the Pearson correlation coefficient calculated between
the ranks of the x and y values. For instance, we assign rank 1 to the smallest x value, rank 2 to the
next, and so on, and do the same for the y values. The Spearman rank correlation is then just the
Pearson correlation between the two new lists of values.
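
The rank-then-correlate recipe can be sketched as follows. The helper names are hypothetical, and the handling of ties (tied values share the average of their ranks) is a common convention assumed here; the text itself does not discuss ties.

```python
import math


def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks (assumed)."""
    ordered = sorted(values)
    # First 0-based position of v, plus half the number of extra ties, plus 1
    return [ordered.index(v) + (ordered.count(v) - 1) / 2 + 1 for v in values]


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson's r between the two rank lists."""
    return pearson_r(ranks(xs), ranks(ys))


print(ranks([10, 30, 20]))  # [1.0, 3.0, 2.0]

# A monotonic but non-linear relationship (y = x cubed)
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rho(x, y))   # 1.0
```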

Another rank-based correlation is Kendall’s Tau. It is a non-parametric procedure, so the data do
not have to be normally distributed, unlike Pearson’s correlation, which is parametric. This
alternative is preferred over Spearman’s when there is very little data.
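
As a rough sketch, Kendall’s Tau can be computed by comparing every pair of observations: a pair is concordant when x and y move in the same direction and discordant when they move in opposite directions. The version below is the simple tau-a variant, an assumption made here because the text does not specify one; tied pairs count as neither concordant nor discordant.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x2 - x1) * (y2 - y1)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Made-up data: 10 pairs in total, 8 concordant and 2 discordant
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(kendall_tau(x, y))  # 0.6
```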

Both alternatives are useful for assessing whether a y variable increases or decreases consistently as
an x variable changes. Their main advantage over Pearson’s is that they can detect the correlation even
when the relationship is not linear. When one variable consistently moves in a single direction as
the other increases, the relationship is called monotonic.

In the following examples we can compare the alternatives with Pearson’s correlation:

The Spearman rank correlation and Kendall Tau values are both 1, meaning a perfectly monotonic
increasing relationship; a value of -1 would indicate a perfectly monotonic decreasing one. In contrast,
the Pearson correlation coefficient is 0.86 in the first plot and 0.85 in the second, as it is affected
by the non-linear character of the relationship between x and y. But what happens when the
relationship is not monotonic?

As expected, the values of Spearman rank and Kendall Tau are affected by the non-monotonic
relationship between x and y. Like Pearson’s coefficient, they cannot detect the
association between the two variables in the second plot.
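
The two situations can be reproduced with synthetic data. The sketch below does not use the data behind the plots; it assumes a cubic curve as the monotonic case and a parabola as the non-monotonic case, with average ranks for ties and the tau-a variant of Kendall’s Tau.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) - 1) / 2 + 1 for v in values]


def spearman_rho(xs, ys):
    return pearson_r(ranks(xs), ranks(ys))


def kendall_tau(xs, ys):
    """Tau-a: tied pairs count as neither concordant nor discordant."""
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x2 - x1) * (y2 - y1)
        score += (direction > 0) - (direction < 0)
    n = len(xs)
    return score / (n * (n - 1) / 2)


x = [-3, -2, -1, 0, 1, 2, 3]
cubic = [v ** 3 for v in x]      # monotonic, non-linear
parabola = [v ** 2 for v in x]   # non-monotonic

for name, y in (("monotonic", cubic), ("non-monotonic", parabola)):
    print(name,
          round(pearson_r(x, y), 3),
          round(spearman_rho(x, y), 3),
          round(kendall_tau(x, y), 3))
```

For the cubic curve, Pearson’s r falls below 1 while both rank-based coefficients equal 1. For the parabola, all three coefficients are zero even though y is fully determined by x.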

