BUSINESS STATISTICS AND RESEARCH

MODULE 3: Karl Pearson’s Coefficient of Correlation

DAY 5 / 6
BRMS Team
Decision Science Area
CMS Business School
Jain (Deemed-to-be University)

bschool.cms.ac.in
AGENDA- DAY 5 / 6
• Introduction
• Comparative Analysis
• Correlation Analysis
• Measures of correlation
• Significance of measuring correlation
• Karl Pearson’s Correlation coefficient

Introduction

• In simple terms, association can be understood as a connection or relationship.
• We have built our professions in an environment where
understanding the association between many entities has
become vital.
• Consciously or otherwise, we work with many variables in our
daily routine and strive to understand the association between
many of them.

Introduction

• Let’s take a few examples:
• Is there any association between the number of hours an employee logs
in every day and their productivity?
• Is there any association between the demand for my product/service
and the Covid-19 pandemic outbreak?
• As people age, will their maturity improve?
• Will the end-to-end travel time decrease if the frequency of public
transport increases?

Introduction

• Also, there are other questions like:
• If the USD exchange value decreases, will the demand for my
product/service decrease? If yes, to what extent?
• Will there be an increase in the attrition rate in my organization if
no performance-based bonuses are given? If yes, by how much?

Introduction

• We can find the answers to the above questions by measuring
the association between the variables.
• This means we need to identify the following details:
• Is there any association between the selected variables?
• If yes, in what direction: positive or negative?
• If yes, how strong is the relationship?
• What is the magnitude of the relationship?

Introduction

• The most popular measures of association are correlation analysis
and regression analysis.
• Correlation analysis helps in understanding the strength and
direction of the association, whereas regression analysis helps in
understanding the magnitude of the association between given
variables.
• Correlation analysis can be conducted by measuring the coefficient
of correlation between the variables.

Introduction

• Karl Pearson’s coefficient of correlation can be used to
measure the strength and direction of the linear association
between numerical variables.
• Spearman’s rank correlation coefficient can be used to
measure the association when the data are ordinal.

Introduction

• Regression analysis can be conducted to understand the
magnitude of the relationship between the variables.
• This can be done by arriving at a linear equation (also known as
modelling) that states the relation between a dependent variable
and independent variable(s).
• In this module, the above measures of association will be
dealt with.

Comparative Analysis

• We live in an environment full of variables.
• Some we understand and most we don’t.
• The environment we progress in demands maximum
understanding of these variables.
• One of the major requirements in understanding these variables
is understanding their association.

Comparative Analysis

• There is a need to answer the questions:
• How are the variables around us related to each other?
• Are they really related to each other?
• If yes, how strong is their relationship, and in which way are they
related?

Correlation Analysis

• Often, an analysis of data concerning two or more
quantitative variables is needed to look for any statistical
relationship or association between them.
• The knowledge of such a relationship is important for making
inferences in a given situation.

Correlation Analysis

• Consider an example:
• Typically, in the summer as the temperature increases people
are thirstier.
• Consider the two numerical variables, temperature and water
consumption.
• We would expect the higher the temperature, the more water a
given person would consume.
• Thus we would say that in the summer, temperature and water
consumption are positively correlated.

Correlation Analysis
• Consider another example:
• For seven random summer days, a person recorded the
temperature and their water consumption, during a three-hour
period spent outside.

Temperature (°F)   Water Consumption (Ounces)
75                 16
83                 20
85                 25
85                 27
92                 32
97                 48
99                 48

Correlation Analysis
• A scatter plot of these data helps visualize what appears to be a somewhat
linear relationship between temperature and the amount of
water one drinks.
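
One way to put a number on this visual impression is Pearson's coefficient, which this module covers in later slides. The sketch below is an illustration, not part of the original slides: it computes r for the seven recorded days using deviations about the means. The function name `pearson_r` is a name chosen here for the example.

```python
from math import sqrt

# Data from the slide: temperature (°F) and water consumption (ounces)
temperature = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]


def pearson_r(x, y):
    """Pearson's r computed from deviations about the two means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(temperature, water)
print(round(r, 3))  # 0.963
```

A value this close to +1 agrees with the positive, roughly linear pattern described for these data.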

Correlation Analysis

• Similarly, we come across various examples in our daily life, such as:
• Budget for rations and the number of visitors / special occasions at home
• Family income and expenditure on luxury items
• Frequency of smoking and lung damage
• Age and sign legibility distance
• No. of occupants in a hotel and its water / electricity consumption
• The list is never-ending.
• Hence, correlation can be defined as “a measure of association
between two numerical variables”.

Significance of Measuring Correlation

• Correlation analysis contributes to the understanding of
economic behavior, aids in locating the critically important
variables on which others depend, may reveal to the economist
the connections by which disturbances spread, and may suggest
the paths through which stabilizing forces may become
effective.
• Correlation reduces the range of uncertainty of
our predictions. A prediction based on correlation analysis will
tend to be more reliable and nearer to reality.

Correlation Coefficient

• The sample correlation coefficient, ‘r’, measures the direction and
the strength of the linear association between two paired numerical
variables.
• It varies between -1 and +1 (-1 ≤ r ≤ +1).
• The values can be interpreted as shown in the tables on the
next slides.
• Direction of the association: The association can be either
positive or negative.

Correlation Coefficient
• Positive Correlation: As the ‘X’ variable increases, so does the ‘Y’
variable.

r value   Positive Correlation Interpretation
+1        Perfect positive linear relationship
+0.9      Strong positive association
+0.5      Moderate positive association
+0.25     Weak positive association
0         No linear relationship

Example: In the summer, as the temperature increases, so does thirst.

Correlation Coefficient
• Negative Correlation: As the ‘X’ variable increases, the ‘Y’ variable
decreases.

r value   Negative Correlation Interpretation
-1        Perfect negative linear relationship
-0.9      Strong negative association
-0.5      Moderate negative association
-0.25     Weak negative association

Example: As the price of an item increases, the number of items sold decreases.

Correlation Coefficient

• If ‘r’ equals zero, then there is no linear association between the
two variables.
• The closer ‘r’ is to one (in magnitude), the stronger the linear
association.
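
The interpretation rules above can be sketched as a small helper. The slides give only example values (0.25, 0.5, 0.9), not cut-off points, so the thresholds `strong=0.7` and `moderate=0.3` below are assumptions made for illustration. The function name `describe_r` is likewise hypothetical.

```python
def describe_r(r, strong=0.7, moderate=0.3):
    """Return a verbal label for r. The cut-offs are illustrative assumptions."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no linear association"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        return f"perfect {direction} linear relationship"
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} association"


# Label the example values used in the interpretation tables
for value in (1, 0.9, 0.5, 0.25, 0, -0.25, -0.5, -0.9, -1):
    print(value, "->", describe_r(value))
```

With these assumed cut-offs the labels reproduce the wording of the two interpretation tables. Other textbooks draw the boundaries elsewhere, so any such label should be read as a rough guide.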

Measures of Correlation

• The degree of relationship between the two variables can be
measured using the following methods:
a. Scatter diagram.
b. Karl Pearson’s coefficient of correlation.
c. Spearman’s rank correlation coefficient.

Scatter Diagram

• It is a graphical presentation of bivariate data.
• Here one variable (X) is taken along the x-axis and the other
variable (Y) is taken along the y-axis, and each pair of (X, Y)
values is represented by a point on the graph.
• A rough estimate of the correlation can be obtained from the
pattern of the points in the scatter diagram.

Scatter Diagram
• If the points lie exactly on an upward-sloping straight line, the
variables are said to be perfectly positively correlated.
• If the points are clustered around an upward-sloping line, the variables are
positively correlated.

Scatter Diagram
• If the points lie exactly on a downward-sloping straight line, the
variables are said to be perfectly negatively correlated.
• If the points are clustered around a downward-sloping line, the variables are negatively
correlated.
• If the points are spread all over the graph with no pattern, the variables are not correlated.

Karl Pearson’s Correlation Coefficient

• It is a mathematical measure based on covariance and variances.
• Covariance is a descriptive measure of the linear association
between two variables.
• Covariance describes the extent to which a change in one
variable (x) is paired with a comparable change in another
variable (y).
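
In symbols, one common way to write these ideas is shown below. This is a sketch in standard textbook notation rather than the notation of the slides: the symbols s_xy (sample covariance), s_x and s_y (sample standard deviations) and the n − 1 divisor are assumptions. The divisor cancels in r, so using n instead gives the same coefficient.

```latex
s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right),
\qquad
r = \frac{s_{xy}}{s_x\, s_y},
\qquad
s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2},
\quad
s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}
```

Dividing the covariance by both standard deviations removes the units of x and y, which is why r always lies between -1 and +1.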

Properties of Karl Pearson’s Correlation Coefficient
• The value of r does not depend upon the units of measurement.
• The value of r does not depend upon which variable is labelled
‘X’ and which is labelled ‘Y’.
• The correlation coefficient lies between -1 and +1. A positive value of r
means a positive linear relationship; a negative value means a
negative linear relationship.
• If r = ±1, then all the points of the scatter diagram lie exactly on a
straight line, and the correlation is said to be perfect positive if r
= +1 and perfect negative if r = -1.
• ‘r’ measures only the linear relationship between ‘X’ and ‘Y’.
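
These properties can be checked numerically. The sketch below is an illustration only: it reuses the temperature and water data from the earlier example, and the conversion factor of about 29.57 mL per ounce assumes the slide means US fluid ounces. The helper name `pearson_r` is chosen here for the example.

```python
from math import sqrt, isclose


def pearson_r(x, y):
    """Pearson's r computed from deviations about the two means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


temp_f = [75, 83, 85, 85, 92, 97, 99]      # °F
water_oz = [16, 20, 25, 27, 32, 48, 48]    # ounces

# Change of units: °F -> °C, ounces -> millilitres (assumed US fluid ounces)
temp_c = [(f - 32) * 5 / 9 for f in temp_f]
water_ml = [oz * 29.57 for oz in water_oz]

r_original = pearson_r(temp_f, water_oz)
r_converted = pearson_r(temp_c, water_ml)
r_swapped = pearson_r(water_oz, temp_f)

print(isclose(r_original, r_converted))  # True: units do not matter
print(isclose(r_original, r_swapped))    # True: the X/Y labels do not matter

# Points lying exactly on a straight line give r = +1 or r = -1
x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2 * v + 3 for v in x]), 6))   # 1.0
print(round(pearson_r(x, [10 - 3 * v for v in x]), 6))  # -1.0

# r measures only linear association: y = x^2 on a symmetric range gives r = 0
x_sym = [-2, -1, 0, 1, 2]
print(pearson_r(x_sym, [v ** 2 for v in x_sym]))  # 0.0
```

The last case shows why r = 0 should be read as "no linear relationship" and not as "no relationship": y is fully determined by x there, yet r is zero.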

Karl Pearson’s Correlation Coefficient
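
The formula on this slide did not survive extraction. A standard computational form of Pearson's coefficient, consistent with the X, Y, X², Y² and XY columns used in the worked solutions of this module, is sketched below. It is presented as the likely intended formula, not as a verbatim copy of the slide.

```latex
r = \frac{n\sum XY - \sum X \sum Y}
         {\sqrt{\left[\,n\sum X^{2} - \left(\sum X\right)^{2}\right]
                \left[\,n\sum Y^{2} - \left(\sum Y\right)^{2}\right]}}
```

Here n is the number of paired observations. This form is algebraically equivalent to dividing the covariance by the product of the two standard deviations, and it needs only the five column totals.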

PRACTICE: Numerical Problems

Karl Pearson’s Correlation Coefficient
• A travel and leisure magazine provides an annual list of the 500
best hotels in India. The magazine provides a rating for each
hotel along with a brief description that includes the size of the
hotel, amenities and the cost per night for a double room. A
sample of 12 of the top-rated hotels in India is as follows:


Hotel                      Location                   No. of Rooms   Cost/night (Rs. ’00)
Cubs Trail Resort          Kanha, MP                  220            499
Seasons Resort and Spa     Cochin, Kerala             727            340
Buffalo Inn                Coorg, Karnataka           285            585
Swasti Heritage Hotel      Udaipur, Rajasthan         273            495
Tiger Den                  Jim Corbett, Uttarakhand   145            495
Snowden Spa and resorts    Dharmashala, HP            213            279
Sun & Sand Beach Resort    Panjim, Goa                398            279
Sand Stone Beach Resort    Mahabalipuram, TN          343            455
Snow View Towers           Gangtok, Sikkim            250            595
Six Seasons Beach Resort   Vizag, AP                  414            367
Golden Sands               Mapusa, Goa                400            675
Chiru Towers               Hyderabad, Telangana       700            420

Problem 1
Questions:
a. Develop a scatter diagram with the number of rooms on the
horizontal axis and the cost per night on the vertical axis. Does
there appear to be a relationship between the number of rooms
and the cost per night? Discuss.
b. What is the correlation coefficient? What does it tell you about
the relationship between the number of rooms and the cost per
night for a double room? Does this appear reasonable? Discuss.

The data points on the scatter diagram do not follow any clear pattern. They
are clustered neither around a positive slope nor around a negative slope.
Hence, there appears to be little or no relationship between the
number of rooms and the cost per night per room.

X = number of rooms, Y = cost per night (Rs. ’00)

           X       Y        X²        Y²        XY
         220     499     48400    249001    109780
         727     340    528529    115600    247180
         285     585     81225    342225    166725
         273     495     74529    245025    135135
         145     495     21025    245025     71775
         213     279     45369     77841     59427
         398     279    158404     77841    111042
         343     455    117649    207025    156065
         250     595     62500    354025    148750
         414     367    171396    134689    151938
         400     675    160000    455625    270000
         700     420    490000    176400    294000
Total   4368    5484   1959026   2680322   1921817
Solution 1
Since r = -0.29, there is a weak negative correlation between the number of
rooms and the cost of a room per night.
This appears reasonable, as the result is consistent with the scatter diagram, where the points show no clear pattern.
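
The value of r can be reproduced from the column totals of the working table. The sketch below assumes the standard computational formula for Pearson's coefficient. The variable names are chosen here for illustration.

```python
from math import sqrt

# Column totals from the Problem 1 working table
# (X = number of rooms, Y = cost per night)
n = 12
sum_x, sum_y = 4368, 5484
sum_x2, sum_y2, sum_xy = 1959026, 2680322, 1921817

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(numerator)    # -892308
print(round(r, 2))  # -0.29
```

The negative numerator fixes the sign of r, and the denominator scales it into the range from -1 to +1.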

Problem 2
A newly appointed finance secretary receives feedback from his team in a
review meeting about the rising unemployment in the country. Coming from
a science background, he decides to examine various parameters to understand
the real reason behind the rise in the unemployment rate. One of the
parameters he selects is industrial production. He asks his team for data on the
industrial production index and the number of unemployed people between 2012
and 2019.
He gets the following table, which gives indices of industrial production and the
number of registered unemployed people (in lakh). He decides to use
correlation analysis to understand the relationship in the given data.
Use Karl Pearson’s coefficient of correlation to find out what the
finance secretary discovers from the given data.
Problem 2

Year                          2012   2013   2014   2015   2016   2017   2018   2019
Index of Production            100    102    104    107    105    112    103     99
Number Unemployed (in lakh)     15     12     13     11     12     12     19     26

X = index of production, Y = number unemployed (in lakh)

          X     Y      X²     Y²      XY
        100    15   10000    225    1500
        102    12   10404    144    1224
        104    13   10816    169    1352
        107    11   11449    121    1177
        105    12   11025    144    1260
        112    12   12544    144    1344
        103    19   10609    361    1957
         99    26    9801    676    2574
Total   832   120   86648   1984   12388

Solution 2
Since r ≈ -0.619, there is a moderate negative correlation between the Index of
Production and the number unemployed.
This means that as the index of production increases, unemployment tends to
decrease moderately.
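
As a check on the arithmetic, the sketch below rebuilds the column totals from the raw Problem 2 data and then applies the standard computational formula for Pearson's coefficient. The variable names are illustrative. With the tabulated totals the coefficient evaluates to about -0.619, which is in the moderate negative range.

```python
from math import sqrt

index_of_production = [100, 102, 104, 107, 105, 112, 103, 99]  # X
unemployed_lakh = [15, 12, 13, 11, 12, 12, 19, 26]             # Y

n = len(index_of_production)
sum_x = sum(index_of_production)
sum_y = sum(unemployed_lakh)
sum_x2 = sum(x * x for x in index_of_production)
sum_y2 = sum(y * y for y in unemployed_lakh)
sum_xy = sum(x * y for x, y in zip(index_of_production, unemployed_lakh))

# These should match the "Total" row of the working table
print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 832 120 86648 1984 12388

r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 3))  # -0.619
```

Recomputing the totals from the raw columns is a cheap way to catch a slip in any single row before it reaches the final coefficient.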

Questions?

Problem 3
A financial analyst wanted to find out whether inventory turnover influences
any company’s earnings per share (in percent). A random sample of 7
companies listed in a stock exchange was selected and the following data was
recorded for each. Find the strength of association between inventory turnover
and earnings per share. Interpret this finding to the analyst.

Problem 3

Company   Inventory Turnover (no. of times)   Earnings per Share (percent)
A         4                                   11
B         5                                   9
C         7                                   13
D         8                                   7
E         6                                   13
F         3                                   8
G         5                                   8

X = inventory turnover, Y = earnings per share

         X    Y    X²    Y²    XY
         4   11    16   121    44
         5    9    25    81    45
         7   13    49   169    91
         8    7    64    49    56
         6   13    36   169    78
         3    8     9    64    24
         5    8    25    64    40
Total   38   69   224   717   378

Solution 3
Since r ≈ 0.134, there is a weak positive correlation between inventory
turnover and earnings per share.
This means that as inventory turnover increases, earnings per share
increase only slightly, not significantly.
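
The same coefficient can also be obtained from deviations about the means instead of raw sums. The sketch below applies that equivalent form to the Problem 3 data. The variable names are illustrative, and the result evaluates to about 0.134 for the tabulated values.

```python
from math import sqrt

inventory_turnover = [4, 5, 7, 8, 6, 3, 5]     # X, number of times
earnings_per_share = [11, 9, 13, 7, 13, 8, 8]  # Y, percent

n = len(inventory_turnover)
mean_x = sum(inventory_turnover) / n
mean_y = sum(earnings_per_share) / n

# Sums of cross-products and squares of deviations about the means
sxy = sum(
    (x - mean_x) * (y - mean_y)
    for x, y in zip(inventory_turnover, earnings_per_share)
)
sxx = sum((x - mean_x) ** 2 for x in inventory_turnover)
syy = sum((y - mean_y) ** 2 for y in earnings_per_share)

r = sxy / sqrt(sxx * syy)
print(round(r, 3))  # 0.134
```

With only seven companies and r this close to zero, the data give little support for any linear link between the two measures.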

Problem 4
A nutritionist well-known for her nutritional prescriptions to pregnant women
wishes to estimate the association between gestational age and infant birth
weight in order to enhance her prescriptions. For this, a small study is
conducted involving 10 infants to investigate the association between
gestational age at birth, measured in weeks, and birth weight, measured in
grams. Calculate the association and give recommendations to the nutritionist.

Problem 4

Infant ID   Gestational Age (weeks)   Birth Weight (grams)
1           35                        1895
2           36                        2030
3           29                        1440
4           40                        2835
5           36                        3090
6           42                        3827
7           40                        3260
8           37                        2690
9           41                        3285
10          38                        2920

X = gestational age (weeks), Y = birth weight (grams)

          X       Y      X²         Y²        XY
         35    1895    1225    3591025     66325
         36    2030    1296    4120900     73080
         29    1440     841    2073600     41760
         40    2835    1600    8037225    113400
         36    3090    1296    9548100    111240
         42    3827    1764   14645929    160734
         40    3260    1600   10627600    130400
         37    2690    1369    7236100     99530
         41    3285    1681   10791225    134685
         38    2920    1444    8526400    110960
Total   374   27272   14116   79198104   1042114


Solution 4
Since r = 0.89, there is a strong positive correlation between gestational age
and birth weight.
It can be inferred that as the gestational age increases, the birth weight also
tends to increase.
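
The sign of the coefficient can be confirmed by computing r as covariance divided by the product of the standard deviations. The sketch below uses Python's standard `statistics` module for the means and sample standard deviations. The covariance is computed by hand with the matching n − 1 divisor, and the variable names are illustrative.

```python
from statistics import mean, stdev

gestational_age = [35, 36, 29, 40, 36, 42, 40, 37, 41, 38]  # weeks
birth_weight = [1895, 2030, 1440, 2835, 3090,
                3827, 3260, 2690, 3285, 2920]               # grams

n = len(gestational_age)
mean_x, mean_y = mean(gestational_age), mean(birth_weight)

# Sample covariance with divisor n - 1, matching statistics.stdev
covariance = sum(
    (x - mean_x) * (y - mean_y)
    for x, y in zip(gestational_age, birth_weight)
) / (n - 1)

r = covariance / (stdev(gestational_age) * stdev(birth_weight))
print(covariance > 0)  # True: the two variables move in the same direction
print(round(r, 2))     # 0.89
```

A positive covariance means above-average gestational ages tend to pair with above-average birth weights, so the correlation is positive.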

Problem 5
The success of a shopping center can be represented as a function of the
distance (in miles) from the center of the population and the number of clients
(in hundreds of people) who will visit. The data is given in the table below.
Calculate the linear correlation coefficient.

No. of Customers (hundreds)    8    7    6    4    2    1
Distance (miles)              15   19   25   23   34   40
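
The slides give no worked solution for this problem. As a hedged sketch, the coefficient can be computed from the table with the standard computational formula. The helper name `pearson_r_from_sums` is chosen here for illustration, and the printed value is computed from the tabulated data, not quoted from the source.

```python
from math import sqrt

customers = [8, 7, 6, 4, 2, 1]       # hundreds of people
distance = [15, 19, 25, 23, 34, 40]  # miles


def pearson_r_from_sums(x, y):
    """Pearson's r via the raw-sum (computational) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r_from_sums(customers, distance)
print(round(r, 3))  # -0.95
```

A value this close to -1 indicates a strong negative linear association: in this sample, the farther the shopping center is from the center of the population, the fewer customers visit.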
