
Correlation: (For M.B.A. I Semester)

The document discusses correlation and different types of correlation: 1. Correlation measures the relationship between two or more variables and can be positive, negative, or no correlation. 2. Positive correlation occurs when variables move in the same direction, while negative correlation is when they move in opposite directions. 3. Types of correlation include simple, multiple, partial, linear, and non-linear correlation.

Uploaded by

Arun Mishra
Copyright
© © All Rights Reserved

Correlation

(For M.B.A. I Semester)

by Prof. Nitin Karoulia


9033099006
Example – Marks of 7 students
Student Marks in Physics Marks in Chemistry
A 82 71
B 68 63
C 76 61
D 39 45
E 54 48
F 75 72
G 47 42
Scatter Diagram
[Figure: scatter diagram of the marks above; horizontal axis runs from 35 to 85, vertical axis from 0 to 80]
Correlation

Correlation is a statistical method that enables the researcher to find whether two
variables are related and to what extent they are related.
Correlation is considered as the sympathetic movement of two or more variables.
When a change in one variable is accompanied by changes in the other variables,
in either the same or the opposite direction, the variables are said to be
correlated. Given data that record values for two or more variables, we can study
how those variables vary together.
 
When the values of one variable increase as another variable increases, the
correlation is said to be positive. If the values of one variable decrease as
another variable increases, the correlation is negative. When a change in one
variable is accompanied by no change in the other, there is said to be no
correlation between the two.
 

Correlation can be of three types:

1. Simple correlation
2. Multiple correlation
3. Partial correlation

We are going to focus on correlation and its types.

Correlation Definition

The relationship between two or more variables is called correlation. It can be
expressed as a number that describes the relationship between two variables.
Simple correlation is the related variation between any two variables.

Multiple correlation and partial correlation concern related variation
among three or more variables. Two variables are correlated when they vary in
such a way that higher and lower values of one variable correspond to
higher and lower values of the other variable. They are also correlated when
higher values of one variable correspond to lower values of the other.
 
Correlation Symbol
 Symbol of correlation = r
 

Correlation Formula
The formula for correlation is as follows:

Correlation (r) = [N∑XY − (∑X)(∑Y)] / (√[N∑X² − (∑X)²] × √[N∑Y² − (∑Y)²])

Where,
X and Y are the variables.
N = Number of values or elements
X = First Score
Y = Second Score
∑XY = Sum of the products of the first and second scores
∑X = Sum of first scores
∑Y = Sum of second scores
∑X² = Sum of squared first scores
∑Y² = Sum of squared second scores

For the regression line discussed later, b = the slope of the regression line, also
called the regression coefficient, and a = the intercept of the regression line on the y-axis.
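As an illustration, the formula can be sketched in Python and applied to the marks of the 7 students from the opening example. The function name pearson_r is an illustrative choice, not something taken from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from the raw-score sums in the formula above."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Marks of the 7 students (A to G) from the opening example
physics = [82, 68, 76, 39, 54, 75, 47]
chemistry = [71, 63, 61, 45, 48, 72, 42]

r_marks = pearson_r(physics, chemistry)
print(round(r_marks, 2))  # about 0.93
```

A value this close to 1 agrees with what the scatter diagram suggests: students with high marks in Physics tend to have high marks in Chemistry.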

Coefficient of Correlation
The coefficient of correlation, r, also called the linear correlation coefficient,
measures the strength and the direction of a linear relationship
between two variables. It is also known as the Pearson product-moment
correlation coefficient. The algebraic method of measuring
correlation is called the coefficient of correlation. There are mainly
three coefficients of correlation:

1. Karl Pearson’s coefficient of correlation
2. Spearman’s rank correlation coefficient
3. Coefficient of concurrent deviations

Karl Pearson’s Coefficient of correlation


 
The most important algebraic method of measuring correlation is Karl
Pearson’s coefficient of correlation, also called the Pearsonian coefficient of
correlation. It is widely used in statistics. It is denoted
by r.

The formula is given by

r = [N∑xy − (∑x)(∑y)] / (√[N∑x² − (∑x)²] × √[N∑y² − (∑y)²])

Interpretation of Karl Pearson’s Coefficient of correlation


 
Karl Pearson’s coefficient of correlation, denoted by r, measures the degree of
correlation between two variables. r takes values between –1 and 1.

When r is –1, we say there is a perfect negative correlation.
When r is a value between –1 and 0, we say that there is a negative
correlation.
When r is 0, we say there is no correlation (no linear relationship).
When r is a value between 0 and 1, we say there is a positive
correlation.
When r is 1, we say there is a perfect positive correlation.
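The five cases above can be written as a small lookup. This is only a sketch; the function name describe_r and the exact label strings are illustrative choices.

```python
def describe_r(r):
    """Map a correlation coefficient to the verbal labels listed above."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == -1:
        return "perfect negative correlation"
    if r < 0:
        return "negative correlation"
    if r == 0:
        return "no correlation"
    if r < 1:
        return "positive correlation"
    return "perfect positive correlation"

for value in (-1, -0.4, 0, 0.33, 1):
    print(value, "->", describe_r(value))
```

With real data r is rarely exactly –1, 0 or 1, so in practice almost every computed coefficient falls into one of the two middle cases.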

Properties of the Coefficient of correlation


 
1. The coefficient of correlation has a well-defined formula.
2. The coefficient of correlation is a pure number and is independent of the
unit of measurement.
3. The coefficient of correlation lies between –1 and 1.
4. The coefficient of correlation between x and y is the same as that
between y and x.
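Properties 2 to 4 can be checked numerically. The sketch below uses made-up height and weight figures (they are not from the slides) and computes r in the equivalent deviation form.

```python
from math import sqrt, isclose

def pearson_r(xs, ys):
    """Pearson's r in deviation form: sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )

# Hypothetical data: heights in centimetres and weights in kilograms
height_cm = [150, 160, 165, 172, 180]
weight_kg = [52, 58, 63, 70, 77]
r_original = pearson_r(height_cm, weight_kg)

# Property 2: changing the units (cm to inches, kg to pounds) leaves r unchanged
height_in = [h / 2.54 for h in height_cm]
weight_lb = [w * 2.20462 for w in weight_kg]
r_rescaled = pearson_r(height_in, weight_lb)

# Property 4: r between x and y equals r between y and x
r_swapped = pearson_r(weight_kg, height_cm)

print(isclose(r_original, r_rescaled), isclose(r_original, r_swapped))
print(-1 <= r_original <= 1)  # Property 3
```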

Types of Correlation
 There are different types of Correlation. They are listed as follows:

Positive Correlation
A positive correlation is one in which the variables move in the same direction.

Negative Correlation
A negative correlation is one in which the variables move in opposite directions.

Partial Correlation
The correlation is partial if we study the relationship between two variables while keeping
all other variables constant.
Example:
The relationship between yield and rainfall at a constant temperature is a partial
correlation.
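The slides do not give a formula for partial correlation. As a sketch, the standard first-order formula computes it from the three pairwise coefficients: r(xy.z) = (r_xy − r_xz × r_yz) / √[(1 − r_xz²)(1 − r_yz²)]. The pairwise values below are hypothetical and only illustrate the yield, rainfall and temperature example.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical pairwise coefficients: x = yield, y = rainfall, z = temperature
r_yield_rain = 0.80
r_yield_temp = 0.60
r_rain_temp = 0.50

r_partial = partial_r(r_yield_rain, r_yield_temp, r_rain_temp)
print(round(r_partial, 3))
```

If temperature were uncorrelated with both yield and rainfall, the partial coefficient would simply equal the ordinary coefficient between yield and rainfall.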

Linear Correlation
When a change in one variable is accompanied by a change in the other variable in a
constant ratio, we say the correlation is linear. When there is a linear correlation, the
plotted points lie on a straight line.

Example:
Consider the variables with the following values.
X : 10 20 30 40 50
Y : 20 40 60 80 100

Here, there is a linear relationship between the variables. The ratio X : Y is 1:2 at all
points. Also, if we plot them, the points lie on a straight line.
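The constant ratio in this example can be checked directly with a short sketch:

```python
X = [10, 20, 30, 40, 50]
Y = [20, 40, 60, 80, 100]

# The ratio X : Y is the same at every point (1 : 2, that is Y / X = 2)
ratios = [y / x for x, y in zip(X, Y)]

# Equal steps in X produce equal steps in Y, so the points lie on a straight line
x_steps = [b - a for a, b in zip(X, X[1:])]
y_steps = [b - a for a, b in zip(Y, Y[1:])]

print(ratios)   # [2.0, 2.0, 2.0, 2.0, 2.0]
print(x_steps)  # [10, 10, 10, 10]
print(y_steps)  # [20, 20, 20, 20]
```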

Zero Order Correlation


Zero-order correlation is one of the most common and basic techniques for analyzing the
relationship between two variables: it is the correlation between them with no other
variable held constant. The value of a correlation coefficient
can vary from -1 to +1. A -1 indicates a perfect negative correlation, while a +1
indicates a perfect positive correlation. A correlation of zero means there is no
linear relationship between the two variables.
 

Scatter Plot Correlation


A scatter plot is a type of mathematical diagram that uses Cartesian coordinates to
display values of two variables for a set of data. Scatter plots often show at a
glance whether a relationship exists between two sets of data. When the plotted data
resemble a line rising from left to right, the slope of the line is
positive, so there is a positive correlation between the two sets of data.

Spearman's Correlation
Spearman's rank correlation coefficient lets us identify the strength of
correlation within a data set of two variables, and whether the correlation is
positive or negative. The Spearman coefficient is denoted by the Greek letter rho
(ρ).
ρ = 1 − 6∑d² / [n(n² − 1)]
where d is the difference between the two ranks of each observation and n is the number of observations.
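A minimal sketch of the rank formula, applied to the marks of the 7 students from the opening example. It assumes there are no tied values (true for these marks); the helper names ranks and spearman_rho are illustrative.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid when there are no ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Marks of the 7 students (A to G) from the opening example
physics = [82, 68, 76, 39, 54, 75, 47]
chemistry = [71, 63, 61, 45, 48, 72, 42]

rho = spearman_rho(physics, chemistry)
print(round(rho, 3))  # 0.786
```

Here the rank differences are 1, –1, 2, –1, 0, –2, 1, so ∑d² = 12 and ρ = 1 − 72/336.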
 
Non Linear Correlation
When the amount of change in one variable is not in a constant ratio to the change
in the other variable, we say that the correlation is non-linear.

Example:

Consider the variables with the following values.
X : 10 20 30 40 50
Y : 10 30 70 90 120

Here there is a non-linear relationship between the variables. The ratio between
them is not fixed for all points. Also, if we plot them on a graph, the points will not
lie on a straight line; they will follow a curve.

Non-linear correlation is also known as curvilinear correlation.
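The same check used for a linear relationship shows why this example is non-linear: neither the ratio Y / X nor the step in Y stays constant.

```python
X = [10, 20, 30, 40, 50]
Y = [10, 30, 70, 90, 120]

ratios = [round(y / x, 2) for x, y in zip(X, Y)]
y_steps = [b - a for a, b in zip(Y, Y[1:])]

print(ratios)   # [1.0, 1.5, 2.33, 2.25, 2.4]
print(y_steps)  # [20, 40, 20, 30]
```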

Simple Correlation
If there are only two variables under study, the correlation is said to be simple.

Example:

The correlation between price and demand is simple.

Multiple Correlation
When one variable is related to a number of other variables, the correlation is not
simple. It is multiple if there is one variable on one side and a set of variables on
the other side.

Example:

The relationship of yield with rainfall and fertilizer taken together is a multiple
correlation.
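The slides give no formula here. As a sketch, for one variable y and two others x1 and x2, the standard multiple correlation coefficient can be computed from the three pairwise coefficients as R² = (r_y1² + r_y2² − 2 × r_y1 × r_y2 × r_12) / (1 − r_12²). The values below are hypothetical.

```python
from math import sqrt

def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of y on x1 and x2 from the three pairwise r values."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)

# Hypothetical pairwise coefficients: y = yield, x1 = rainfall, x2 = fertilizer
R = multiple_r(r_y1=0.80, r_y2=0.60, r_12=0.50)
print(round(R, 3))
```

The result is at least as large as the stronger of the two simple correlations, since two variables taken together cannot explain less than one of them alone.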
 
Weak Correlation
The correlation coefficient ranges from -1 to +1. If the linear correlation
coefficient takes values close to 0, the correlation is weak.

Positive Correlation
A positive correlation is a relationship between two variables in which both variables
move in the same direction. When the values of two variables x and y
move in the same direction, the correlation is said to be positive. That is, in positive
correlation, when there is an increase in x, there will be an increase in y also.
Similarly, when there is a decrease in x, there will be a decrease in y also.

Positive Correlation Example

Price and supply are two variables that are positively correlated. When price
increases, supply also increases; when price decreases, supply decreases.

Positive Correlation Graph



Strong Positive Correlation


 
In a strong positive correlation, the variables change in the same direction and the points
lie close together, nearly forming a line.

Weak Positive Correlation


 
In a weak positive correlation, the variables change in the same direction but the points
on the graph are dispersed.

Negative Correlation
In a negative correlation, as the values of one variable increase, the values
of the second variable decrease, and as the values of one variable decrease, the
values of the other increase. When the values of two variables x and y
move in opposite directions, we say the correlation is negative. That is, in negative
correlation, when there is an increase in x, there will be a decrease in y. Similarly,
when there is a decrease in x, there will be an increase in y.

Negative Correlation Example

When price increases, demand decreases; when price decreases, demand
increases. So price and demand are negatively correlated.

Perfect Negative Correlation


The closer the correlation coefficient is to either -1 or +1, the stronger the relationship
between the two variables. A perfect negative correlation of -1.0 indicates that,
for every member of the sample, a higher score on one variable is related to a lower
score on the other variable.

Correlation Data Sets


In statistics, we sometimes have to study the relationship between two or more
variables. The statistical technique used to study the relationships between
variables is called the correlation technique. Correlation analysis is the analysis of
association between two or more variables. The tendency of two or more variables
to vary together, directly or inversely, is called correlation.

Two variables are said to be correlated if a change in one of the variables is accompanied
by a corresponding change in the other variable. That is, when two variables move
together, they are said to be correlated.

Let us take an example to understand the term correlation. Given data on the
heights and weights of students in a school, we can expect that taller students
tend to weigh more and that shorter students tend to weigh less.

Correlation Analysis
Correlation is a term that refers to the strength of a relationship
between two variables. Correlation and regression analysis are
related in the sense that both deal with relationships among
variables. The correlation coefficient is a measure of linear association
between two variables. Values of the correlation coefficient are
always between -1 and +1. The value of -1 represents a perfect
negative correlation while a value of +1 represents a perfect positive
correlation. A value of 0 means that there is no linear relationship between
the variables being tested.

Interpretation of coefficient of correlation based on the probable error

1. If the coefficient of correlation is less than the probable error, then it is not
significant.
2. If the coefficient of correlation is more than six times the probable error, it is
significant.
3. If the probable error is very small and the coefficient of correlation is 0.5 or more, then the
coefficient of correlation is significant.

Values of r strictly between 0 and ±1 are said to show a limited degree of correlation. A
limited degree of correlation may be positive or negative. Limited correlation can be
high, moderate or low depending on whether its magnitude is close to 1 or 0.
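The slides do not state how the error is computed. The sketch below assumes the usual textbook expression for the probable error of r, P.E. = 0.6745 × (1 − r²) / √N, and applies rules 1 and 2 to the magnitude of r. The function names and the "inconclusive" label for the band the rules leave open are illustrative.

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of r for n pairs: 0.6745 * (1 - r^2) / sqrt(n) (assumed formula)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

def significance(r, n):
    """Apply rules 1 and 2 above to the magnitude of r."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"  # the rules above do not cover this middle band

print(round(probable_error(0.33, 6), 3), significance(0.33, 6))
print(round(probable_error(0.93, 7), 3), significance(0.93, 7))
```

With only 6 pairs, r = 0.33 lies between one and six times its probable error, so these rules do not settle the matter, whereas r = 0.93 from 7 pairs is well beyond six times its probable error.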
 

Covariance Correlation
Covariance and correlation both describe the degree to which two random variables
vary together. Suppose that X and Y are real-valued
random variables for an experiment, with means E(X), E(Y) and
variances var(X), var(Y), respectively. The covariance of X and Y is
defined by

cov(X, Y) = E[(X - E(X))(Y - E(Y))]

and the correlation of X and Y is defined by cor(X, Y) = cov(X, Y) /
[std(X) std(Y)], where std(X) = √var(X) and std(Y) = √var(Y).
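As a sketch, the two definitions can be evaluated on a finite data set by treating every observed pair as equally likely, so that each expectation E[...] becomes a plain average. The data are the marks of the 7 students from the opening example; the function names mirror the notation above.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def cov(xs, ys):
    """Covariance as the average of (x - E(X)) * (y - E(Y)) over all pairs."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def cor(xs, ys):
    """Correlation: cov(X, Y) / (std(X) * std(Y)), with std(X) = sqrt(cov(X, X))."""
    return cov(xs, ys) / (sqrt(cov(xs, xs)) * sqrt(cov(ys, ys)))

# Marks of the 7 students (A to G) from the opening example
physics = [82, 68, 76, 39, 54, 75, 47]
chemistry = [71, 63, 61, 45, 48, 72, 42]

print(round(cov(physics, chemistry), 2), round(cor(physics, chemistry), 2))
```

The covariance depends on the units of the data, while dividing by the two standard deviations yields the unit-free coefficient r.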
 

Cross Correlation
The cross-correlation function is a measure of the similarity between
two data sets. One set is displaced relative to the other, corresponding
values of the two sets are multiplied together, and the products are
summed to give the value of the cross-correlation. When the two sets
are almost the same, the products will mostly be positive and the
cross-correlation will be large. When the sets are unlike, some of the products will be
positive and some negative, and the sum will be small.
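The procedure described above can be sketched as follows. The two series are made up for illustration: b is a copy of a displaced by one position, so the sum of products should peak at a lag of 1.

```python
def cross_correlation(a, b, lag):
    """Sum of products a[i] * b[i + lag] over the positions where both values exist."""
    total = 0
    for i, value in enumerate(a):
        j = i + lag
        if 0 <= j < len(b):
            total += value * b[j]
    return total

# Hypothetical series: b is a shifted one position to the right
a = [1, -2, 3, -1, 0, 2]
b = [0, 1, -2, 3, -1, 0]

values = {lag: cross_correlation(a, b, lag) for lag in range(-3, 4)}
best_lag = max(values, key=values.get)

print(values)
print(best_lag)  # 1
```

At the matching displacement every product is a square and therefore non-negative; at other displacements positive and negative products partly cancel.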
 
Correlation Examples
Given below are some examples to calculate correlation.
 

Solved Example
Question:
Determine the correlation value for the given set of X and Y values:
X Values Y Values
21 2.5
23 3.1
37 4.2
19 5.6
24 6.4
33 8.4

Solution:
 
Let us count the number of values. N = 6
Determine the values of XY, X² and Y² for each pair:
X Value   Y Value   X×Y     X²     Y²
21        2.5       52.5    441    6.25
23        3.1       71.3    529    9.61
37        4.2       155.4   1369   17.64
19        5.6       106.4   361    31.36
24        6.4       153.6   576    40.96
33        8.4       277.2   1089   70.56
Determine the following values: ∑X, ∑Y, ∑XY, ∑X², ∑Y²
∑X = 157
∑Y = 30.2
∑XY = 816.4
∑X² = 4365
∑Y² = 176.38
Correlation (r) = [N∑XY − (∑X)(∑Y)] / √([N∑X² − (∑X)²][N∑Y² − (∑Y)²])
= (6 × 816.4 − 157 × 30.2) / √([6 × 4365 − (157)²][6 × 176.38 − (30.2)²])
= 157 / √(1541 × 146.24)
≈ 157 / 474.72
r ≈ 0.33
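The arithmetic of this solved example can be reproduced step by step with a short script (variable names are illustrative):

```python
from math import sqrt

X = [21, 23, 37, 19, 24, 33]
Y = [2.5, 3.1, 4.2, 5.6, 6.4, 8.4]

n = len(X)
sum_x = sum(X)
sum_y = sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

# Rounded to hide floating-point noise in the decimal sums
print(n, sum_x, round(sum_y, 2), round(sum_xy, 2), sum_x2, round(sum_y2, 2))
print(round(r, 2))  # 0.33
```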
Spearman’s Correlation Coefficient

Regression
We’ve seen how to explore the relationship between two quantitative variables
graphically with a scatterplot. When the relationship has a straight-line pattern, the
Pearson correlation coefficient describes it numerically. We can analyze the data
further by finding an equation for the straight line that best describes the pattern.
This equation predicts the value of the response (y) variable from the value of the
explanatory (x) variable.

Much of mathematics is devoted to studying variables that are deterministically
related. Saying that x and y are related in this manner means that once we are told
the value of x, the value of y is completely specified. For example, suppose the cost
of a small pizza at a restaurant is $10 plus $0.75 per topping. If we let x = number of toppings and
y = price of the pizza, then y = 10 + 0.75x. If we order a 3-topping pizza, then
y = 10 + 0.75(3) = 12.25.

There are many variables x and y that appear to be related to one another,
but not in a deterministic fashion. Suppose we examine the relationship between
x = high school GPA and y = college GPA. The value of y cannot be determined just
from knowledge of x, and two different students could have the same x value but
very different y values. Yet there is a tendency for students who have
high (low) high school GPAs also to have high (low) college GPAs. Knowledge of a
student’s high school GPA should be quite helpful in enabling us to predict how that
person will do in college.
Regression analysis is the part of statistics that deals with the investigation of the
relationship between two or more variables related in a nondeterministic fashion.
 
Historical Note: The statistical use of the word regression dates back to Francis
Galton, who studied heredity in the late 1800s. One of Galton’s interests was
whether or not a man’s height as an adult could be predicted by his parents’
heights. He discovered that it could, but the relationship was such that very tall
parents tended to have children who were shorter than they were, and very short
parents tended to have children taller than themselves. He initially described this
phenomenon by saying that there was a “reversion to mediocrity” but later
changed to the terminology “regression to mediocrity.”

The least-squares line is the line that makes the sum of the squares of the vertical
distances of the data points from the line as small as possible.
 
 Equation for Least Squares (Regression) Line
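A minimal sketch of fitting the least-squares line ŷ = a + bx, with b the slope and a the intercept as defined earlier. The closed-form expressions b = [N∑xy − (∑x)(∑y)] / [N∑x² − (∑x)²] and a = ȳ − b·x̄ are the standard ones and are an assumption here, since they are not written out in the text. The data are the marks of the 7 students from the opening example.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y_hat = a + b * x that minimises the squared vertical distances."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * sum_x / n                                 # intercept
    return a, b

# Predict Chemistry marks (y) from Physics marks (x)
physics = [82, 68, 76, 39, 54, 75, 47]
chemistry = [71, 63, 61, 45, 48, 72, 42]

a, b = least_squares_line(physics, chemistry)
print(round(a, 2), round(b, 3))
```

For points that already lie exactly on a line, such as the pizza prices y = 10 + 0.75x, the same function returns that line's intercept and slope.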
 

When talking about regression equations, the following terms are used for x and y:
x: predictor variable, explanatory variable, or independent variable
y: response variable or dependent variable
 
 Extrapolation is the use of the least-squares line for prediction outside the range of
values of the explanatory variable x that you used to obtain the line. Extrapolation
should not be done!
 
Measuring the Contribution of x in Predicting y
 We can consider how much the errors of prediction of y were reduced by using the
information provided by x.

The coefficient of determination, r², represents the proportion of the total sample
variation in y (measured by the sum of squares of deviations of the sample y values
about their mean ȳ) that is explained by (or attributed to) the linear relationship
between x and y.
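As a sketch, r² can be computed as one minus the ratio of the unexplained variation (squared deviations of y about the fitted line) to the total variation (squared deviations of y about its mean). The data are again the marks of the 7 students from the opening example, and the function name r_squared is illustrative.

```python
def r_squared(xs, ys):
    """Coefficient of determination: 1 - SSE / SS_yy for the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)  # total variation of y about its mean
    b = ss_xy / ss_xx                           # slope of the least-squares line
    a = mean_y - b * mean_x                     # intercept of the least-squares line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation left unexplained
    return 1 - sse / ss_yy

physics = [82, 68, 76, 39, 54, 75, 47]
chemistry = [71, 63, 61, 45, 48, 72, 42]

print(round(r_squared(physics, chemistry), 2))  # about 0.87
```

So roughly 87% of the variation in the Chemistry marks is explained by their straight-line relationship with the Physics marks, which matches the square of r ≈ 0.93.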

Interpretation: 98% of the total sample variation in y is explained by the straight-line
relationship between y and x, with the total sample variation in y being
measured by the sum of squares of deviations of the sample y values about their
mean ȳ.

Interpretation: An r² of .98 means that the sum of squares of deviations of the y
values about their predicted values has been reduced 98% by the use of the least
squares equation ŷ = -2.2 + 2.3x, instead of ȳ, to predict y.
Thank you
