LEARNING OBJECTIVES:
At the end of this lesson, the student is expected to:
1. Apply a variety of statistical tools to process and manage numerical data;
2. Use the methods of linear regression and correlations to predict the value of a variable
given certain conditions; and
3. Recognize the importance of statistical analyses in making decisions.
CORRELATION
Correlation is a statistic that measures the degree to which two variables move in relation to
each other.
In finance, the correlation can measure the movement of a stock with that of a benchmark
index, such as the S&P 500.
Correlation measures association, but it does not show whether x causes y or vice versa, or
whether the association is caused by a third, perhaps unseen, factor.
Correlation, in the finance and investment industries, is a statistic that measures the degree to
which two securities move in relation to each other. Correlations are used in advanced portfolio
management, computed as the correlation coefficient, which has a value that must fall between
-1.0 and +1.0.
The Formula for Correlation is (Pearson Product-Moment Correlation):
\[ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} \]
Where: r = correlation coefficient
X̄ = average of observations of variable X
Ȳ = average of observations of variable Y
Example of Correlation
Investment managers, traders, and analysts find it very important to calculate
correlation because the risk reduction benefits of diversification rely on this statistic. Financial
spreadsheets and software can calculate the value of correlation quickly.
As a hypothetical example, assume that an analyst needs to calculate the correlation for
the following two data sets:
X: (41, 19, 23, 40, 55, 57, 33)
Y: (94, 60, 74, 71, 82, 76, 61)
There are three steps involved in finding the correlation. The first is to add up all the X
values to find SUM(X), add up all the Y values to find SUM(Y), and multiply each X value by its
corresponding Y value and sum the products to find SUM(X,Y):
SUM(X) = (41 + 19 + 23 + 40 + 55 + 57 + 33) = 268
SUM(Y) = (94 + 60 + 74 + 71 + 82 + 76 + 61) = 518
SUM(X,Y) = (41 x 94) + (19 x 60) + (23 x 74) + ... + (33 x 61) = 20,391
The next step is to take each X value, square it, and sum up all these values to
find SUM(X²). The same must be done for the Y values:
SUM(X²) = (41²) + (19²) + (23²) + ... + (33²) = 11,534
SUM(Y²) = (94²) + (60²) + (74²) + ... + (61²) = 39,174
Noting that there are seven observations, n, the following formula can be used
to find the correlation coefficient, r:
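\[ r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{\left(n \sum X^2 - \left(\sum X\right)^2\right)\left(n \sum Y^2 - \left(\sum Y\right)^2\right)}} \]
Here \sum XY is SUM(X,Y), \sum X is SUM(X), \sum X^2 is SUM(X²), and likewise for Y.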
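The lesson states this sum-based formula without showing how it relates to the deviation form given under "The Formula for Correlation". The following derivation is a sketch of why the two agree, using only the identity \bar{X} = \sum X / n (and the same for Y):

```latex
\begin{align*}
\sum (X - \bar{X})(Y - \bar{Y}) &= \sum XY - \frac{\sum X \sum Y}{n}
  = \frac{n \sum XY - \sum X \sum Y}{n} \\
\sum (X - \bar{X})^2 &= \sum X^2 - \frac{\left(\sum X\right)^2}{n}
  = \frac{n \sum X^2 - \left(\sum X\right)^2}{n} \\
\sum (Y - \bar{Y})^2 &= \sum Y^2 - \frac{\left(\sum Y\right)^2}{n}
  = \frac{n \sum Y^2 - \left(\sum Y\right)^2}{n} \\
r &= \frac{\frac{1}{n}\left(n \sum XY - \sum X \sum Y\right)}
          {\frac{1}{n}\sqrt{\left(n \sum X^2 - \left(\sum X\right)^2\right)
                            \left(n \sum Y^2 - \left(\sum Y\right)^2\right)}}
  = \frac{n \sum XY - \sum X \sum Y}
         {\sqrt{\left(n \sum X^2 - \left(\sum X\right)^2\right)
                \left(n \sum Y^2 - \left(\sum Y\right)^2\right)}}
\end{align*}
```

The factors of 1/n cancel, so either form may be used. The sum-based form is often more convenient for hand calculation because it avoids computing each deviation from the mean.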
In this example, the correlation would be:
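\[ r = \frac{7 \times 20{,}391 - (268 \times 518)}{\sqrt{\left(7 \times 11{,}534 - 268^2\right)\left(7 \times 39{,}174 - 518^2\right)}} \]
\[ r = \frac{3{,}913}{7{,}248.4} \]
\[ r = 0.54 \]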
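The hand calculation above can be checked with a few lines of Python. This is only a sketch of the same three steps using the standard library; the variable names are chosen here for illustration and are not part of the lesson.

```python
import math

# Data sets from the hypothetical example above.
x = [41, 19, 23, 40, 55, 57, 33]
y = [94, 60, 74, 71, 82, 76, 61]
n = len(x)  # seven observations

# Step 1: SUM(X), SUM(Y), and SUM(X,Y).
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Step 2: SUM(X^2) and SUM(Y^2).
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Step 3: substitute into the sum-based formula for r.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 268 518 20391 11534 39174
print(numerator, round(denominator, 1))      # 3913 7248.4
print(round(r, 2))                           # 0.54
```

Each printed value matches the corresponding intermediate result in the worked example.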
Definitions:
1. Two variables are positively correlated if their values tend to increase together or decrease
together.
2. Two variables are negatively correlated if the values of one variable tend to increase while the
values of the other decrease.
3. Two variables are not correlated, or have zero correlation, if the values of one variable show no
consistent tendency to increase or decrease as the values of the other increase.
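The three definitions can be illustrated with a short Python sketch. The helper name pearson_r and the small data sets below are made up for illustration and do not come from the lesson; the helper applies the deviation-from-the-mean formula for r.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from deviations about the means (illustrative helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sum_sq_x = sum((a - mean_x) ** 2 for a in xs)
    sum_sq_y = sum((b - mean_y) ** 2 for b in ys)
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)

# Hypothetical data, chosen only to show the sign of r in each case.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases as x increases
falling = [10, 8, 6, 4, 2]   # decreases as x increases
no_trend = [2, 1, 4, 1, 2]   # no consistent rise or fall with x

print(pearson_r(x, rising))    # 1.0  (positive correlation)
print(pearson_r(x, falling))   # -1.0 (negative correlation)
print(pearson_r(x, no_trend))  # 0.0  (zero correlation)
```

Real data rarely produce exactly +1, -1, or 0; these data sets were picked so that the sign of each case is unmistakable.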
PEARSON PRODUCT-MOMENT CORRELATION
Fortunately, Karl Pearson invented a formula that can give a numerical value to the measure of
a correlation. This formula not only shows how strongly two data sets are correlated but also reveals
whether the correlation is direct or inverse, or whether the data sets are not correlated. The formula
named after him is called the Pearson Product-Moment Correlation.
\[ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} \]
Karl Pearson (1857-1936)
He was an influential English mathematician and biostatistician. In 1911, he founded the world's
first university statistics department at University College London, and contributed significantly to
the fields of biometrics and meteorology. He was also a proponent of social Darwinism and eugenics.
Interpreting Pearson Product-Moment Correlation
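r = +1 represents a perfect positive relationship; values close to +1 indicate a strong positive relationship
r = -1 represents a perfect negative relationship; values close to -1 indicate a strong negative relationship
r = 0 represents no linear relationship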
Example 1: Ice Cream Sales
The local ice cream shop keeps track of how much ice cream it sells versus the
temperature on that day. Here are its figures for the last 12 days (a = x, b = y):
Example 2: Height & Weights
The following table gives the heights and weights of 10 friends:
Find the coefficient of correlation.
NAME        Height (cm), X   Weight (kg), Y   (X - X̄)   (Y - Ȳ)   (X - X̄)²   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
ALBERT      180              87                10.5      17.1     110.25     292.41      179.55
BETH        176              55                 6.5     -14.9      42.25     222.01      -96.85
CINDY       144              52               -25.5     -17.9     650.25     320.41      456.45
DAVID       195              94                25.5      24.1     650.25     580.81      614.55
EMILY       159              87               -10.5      17.1     110.25     292.41     -179.55
FRANK       185              79                15.5       9.1     240.25      82.81      141.05
GARY        166              59                -3.5     -10.9      12.25     118.81       38.15
HELEN       173              64                 3.5      -5.9      12.25      34.81      -20.65
IDA         149              45               -20.5     -24.9     420.25     620.01      510.45
JEREMY      168              77                -1.5       7.1       2.25      50.41      -10.65
MEAN / SUM  169.5 (mean)     69.9 (mean)                         2250.5     2614.9     1632.50
\[ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{1632.50}{\sqrt{(2250.5)(2614.9)}} = 0.67 \]
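The column totals in the table for Example 2 can be reproduced with a short Python sketch. It follows the deviation-from-the-mean formula step by step; the variable names are illustrative choices, not part of the lesson.

```python
import math

# Heights (cm) and weights (kg) of the 10 friends, in table order.
heights = [180, 176, 144, 195, 159, 185, 166, 173, 149, 168]
weights = [87, 55, 52, 94, 87, 79, 59, 64, 45, 77]

# Means of each column.
mean_h = sum(heights) / len(heights)
mean_w = sum(weights) / len(weights)

# Deviations from the means, one per row of the table.
dev_h = [h - mean_h for h in heights]
dev_w = [w - mean_w for w in weights]

# Column totals: squared deviations and products of deviations.
sum_sq_h = sum(d * d for d in dev_h)
sum_sq_w = sum(d * d for d in dev_w)
sum_products = sum(dh * dw for dh, dw in zip(dev_h, dev_w))

r = sum_products / math.sqrt(sum_sq_h * sum_sq_w)

print(round(mean_h, 1), round(mean_w, 1))      # 169.5 69.9
print(round(sum_sq_h, 1), round(sum_sq_w, 1))  # 2250.5 2614.9
print(round(sum_products, 1))                  # 1632.5
print(round(r, 2))                             # 0.67
```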
Example 3: Grades
Find the coefficient of correlation.
Math Score, X   English Score, Y   (X - X̄)   (Y - Ȳ)   (X - X̄)²   (Y - Ȳ)²   (X - X̄)(Y - Ȳ)
4               5                   -0.8       0         0.64       0          0.00
5               4                    0.2      -1         0.04       1         -0.20
9               8                    4.2       3        17.64       9         12.60
2               3                   -2.8      -2         7.84       4          5.60
8               9                    3.2       4        10.24      16         12.80
1               2                   -3.8      -3        14.44       9         11.40
2               1                   -2.8      -4         7.84      16         11.20
7               6                    2.2       1         4.84       1          2.20
6               7                    1.2       2         1.44       4          2.40
4               5                   -0.8       0         0.64       0          0.00
4.8 (mean)      5 (mean)                                65.6       60.0       58.0   (sums)
\[ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{58.0}{\sqrt{(65.6)(60.0)}} = 0.92 \]
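As a cross-check on Example 3, the sketch below computes r for the grades data in two ways: with the deviation-from-the-mean formula used in the table, and with the sum-based formula from the first worked example. The function names are hypothetical and chosen only for this illustration.

```python
import math

# Scores from Example 3, in table order.
math_scores = [4, 5, 9, 2, 8, 1, 2, 7, 6, 4]
english_scores = [5, 4, 8, 3, 9, 2, 1, 6, 7, 5]

def r_from_deviations(xs, ys):
    """r = sum of products of deviations / sqrt(product of summed squared deviations)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sum_sq_x = sum((a - mean_x) ** 2 for a in xs)
    sum_sq_y = sum((b - mean_y) ** 2 for b in ys)
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)

def r_from_sums(xs, ys):
    """r from n, SUM(X), SUM(Y), SUM(X,Y), SUM(X^2), and SUM(Y^2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r_dev = r_from_deviations(math_scores, english_scores)
r_sum = r_from_sums(math_scores, english_scores)

print(round(r_dev, 2), round(r_sum, 2))  # 0.92 0.92
```

Both routes give the same coefficient, so the choice between them is a matter of convenience.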
TASK 4
Submit the e-file or scanned docs in the GClassroom
(Task #4 submission/attachment under Announcement)
QA. Find the correlation coefficient (5 points), and answer the missing parts. (total = 20 points)
COUNTRY COVID CASES (X) DEATHS (Y) (X - X̄) (Y - Ȳ) (X - X̄)² (Y - Ȳ)² (X - X̄)(Y - Ȳ)
Indonesia 59,968 30,770
Philippines 23,011 10,874
Vietnam 191 35 1 9
Thailand 212 79 4 10
Myanmar 1,406 1,146
Malaysia 2,229 809 5
Cambodia 46 0 2 11 13
Laos 4 0 6 7 14
Singapore 59 29 12
Brunei 18 3 3 8 15
r = #16
QB. Find the correlation coefficient (5 points), and answer the missing parts. (total = 15 points)
NAME Calculus II (X) Calculus III (Y) (X - X̄) (Y - Ȳ) (X - X̄)² (Y - Ȳ)² (X - X̄)(Y - Ȳ)
ADRIAN 85 78 24
SHERWIN 81 75 17 20 28
GERALD 90 76 25
RAUL 84 77 19 21 29
NIKKO 83 75 26
JAYSON 85 76 18 22 30
FRANCIS 82 75 27
84.3 76 23 31
r = #32