Correlation: (For M.B.A. I Semester)
Correlation
Correlation is a statistical method that enables the researcher to find out whether two
variables are related and to what extent they are related.
Correlation can be regarded as the sympathetic movement of two or more variables.
When a change in one variable is accompanied by changes in other variables, in either
the same or the opposite direction, the variables are said to be correlated. Given data
in which two or more variables are measured for each observation, we can study how
these variables vary together.
When the values of one variable increase as the values of another variable increase, the
correlation is said to be positive. If, on the other hand, the values of one variable
decrease as the values of the other increase, the correlation is negative. When a change
in one variable is not accompanied by any systematic change in the other, there is said
to be no correlation between the two.
Correlation Definition
Multiple correlation and partial correlation describe related variation among three or
more variables. Two variables are correlated when they vary in such a way that the
higher and lower values of one variable correspond to the higher and lower values of
the other variable (positive correlation), or when the higher values of one variable
correspond to the lower values of the other (negative correlation).
Correlation Symbol
Symbol of correlation = r
Correlation Formula
The formula for correlation is as follows:
Correlation (r) = [N∑XY − (∑X)(∑Y)] / √{[N∑X² − (∑X)²] [N∑Y² − (∑Y)²]}
Where,
X and Y are the variables.
N = Number of pairs of values
X = First Score
Y = Second Score
∑XY = Sum of the products of the first and second scores
∑X = Sum of first scores
∑Y = Sum of second scores
∑X² = Sum of squared first scores
∑Y² = Sum of squared second scores
The symbols a and b do not appear in this formula; they belong to the regression line
y = a + bx, where
b = slope of the regression line, also called the regression coefficient
a = intercept of the regression line on the y-axis
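The raw-sums formula above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name pearson_r and the sample values are assumptions made for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from raw sums."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Hypothetical data: y rises steadily with x, then falls steadily with x.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # very close to +1
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # very close to -1
```

Because of floating-point rounding, the printed values may differ from exactly +1 and -1 in the last decimal places.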
Coefficient of Correlation
The coefficient of correlation, r, also called the linear correlation coefficient,
measures the strength and the direction of a linear relationship
between two variables. It is also called the Pearson product-moment
correlation coefficient. The algebraic method of measuring
correlation is called the coefficient of correlation. There are mainly
three coefficients of correlation:
1. Karl Pearson’s coefficient of correlation
2. Spearman’s rank correlation coefficient
3. Coefficient of concurrent deviations
Types of Correlation
There are different types of Correlation. They are listed as follows:
Positive Correlation
A positive correlation is a correlation in the same direction.
Negative Correlation
A negative correlation is a correlation in the opposite direction.
Partial Correlation
The correlation is partial if we study the relationship between two variables while
keeping all other variables constant.
Example:
The relationship between yield and rainfall at a constant temperature is a partial
correlation.
Linear Correlation
When a change in one variable is accompanied by a change in the other variable in a
constant ratio, we say the correlation is linear. When there is a perfect linear
correlation, the plotted points lie on a straight line.
Example:
Consider the variables with the following values.
X : 10 20 30 40 50
Y : 20 40 60 80 100
Here, there is a linear relationship between the variables: the ratio of X to Y is 1:2 at
every point. If we plot the points, they lie on a straight line.
Spearman's Correlation
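Spearman's rank correlation coefficient is computed from the ranks of the values rather
than from the values themselves. It lets us identify the strength of the correlation
within a data set of two variables, and whether the correlation is positive or negative.
The Spearman coefficient is denoted by the Greek letter rho (ρ).
=> ρ = 1 − [6∑d²] / [n(n² − 1)]
where d is the difference between the two ranks of each pair and n is the number of
pairs. This form of the formula applies when there are no tied ranks.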
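As a rough illustration of the rank formula, the Python sketch below ranks each variable, takes the rank differences d and applies ρ = 1 − 6∑d² / [n(n² − 1)]. The helper names ranks and spearman_rho and the sample data are assumptions for this example, and the code deliberately rejects tied values, which this simple formula does not handle.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no two values are equal."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho from rank differences (no-ties formula)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    if len(set(xs)) != len(xs) or len(set(ys)) != len(ys):
        raise ValueError("this simple formula assumes no tied values")
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Hypothetical data: y always increases with x, but not in a constant ratio.
print(spearman_rho([10, 20, 30, 40, 50], [10, 30, 70, 90, 120]))  # 1.0
# Hypothetical data: the ordering of y is the exact reverse of x.
print(spearman_rho([1, 2, 3], [3, 2, 1]))                         # -1.0
```

The first call shows that ρ depends only on the ordering of the values: the ranks agree perfectly even though the points do not lie on a straight line.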
Non Linear Correlation
When the amount of change in one variable is not in a constant ratio to the change
in the other variable, we say that the correlation is non linear.
Example:
Consider the variables with the following values
X : 10 20 30 40 50
Y : 10 30 70 90 120
Here there is a non linear relationship between the variables. The ratio between
them is not fixed for all points. Also if we plot them on the graph, the points will not
be in a straight line. It will be a curve.
Non linear correlation is also known as curvilinear correlation.
Simple Correlation
If there are only two variables under study, the correlation is said to be simple.
Example:
The correlation between price and demand is simple.
Multiple Correlation
When one variable is related to a number of other variables, the correlation is not
simple. It is multiple if there is one variable on one side and a set of variables on
the other side.
Example:
The relationship of yield with rainfall and fertilizer taken together is a multiple
correlation.
Weak Correlation
The correlation coefficient ranges from -1 to +1. If the linear correlation
coefficient takes a value close to 0, the correlation is weak.
Positive Correlation
A positive correlation is a relationship between two variables in which both variables
move in the same direction. When the values of two variables x and y move in the same
direction, the correlation is said to be positive. That is, in a positive correlation,
an increase in x is accompanied by an increase in y, and a decrease in x is accompanied
by a decrease in y.
Positive Correlation Example
Price and supply are two variables that are positively correlated. When price
increases, supply also increases; when price decreases, supply decreases.
Negative Correlation
In a negative correlation, as the values of one variable increase, the values of the
other variable decrease, and vice versa. When the values of two variables x and y
move in opposite directions, we say the correlation is negative. That is, in a negative
correlation, an increase in x is accompanied by a decrease in y, and a decrease in x is
accompanied by an increase in y.
Correlation Analysis
Correlation is a term that refers to the strength of a relationship
between two variables. Correlation and regression analysis are
related in the sense that both deal with relationships among
variables. The correlation coefficient is a measure of linear association
between two variables. Values of the correlation coefficient are
always between -1 and +1. A value of -1 represents a perfect
negative correlation, while a value of +1 represents a perfect positive
correlation. A value of 0 means that there is no linear relationship between
the variables being tested; a non-linear relationship may still exist.
Values of r strictly between 0 and +1, or between -1 and 0, indicate a limited degree of
correlation, which may be positive or negative. Limited correlation can be high,
moderate or low depending on whether the absolute value of r is close to 1 or close to 0.
Covariance Correlation
Covariance and correlation both describe the degree to which two random variables vary
together. Suppose that X and Y are real-valued
random variables for an experiment, with means E(X), E(Y) and
variances var(X), var(Y), respectively. The covariance of X and Y is
defined by
cov(X, Y) = E[(X - E(X))(Y - E(Y))]
and the correlation of X and Y is defined by
cor(X, Y) = cov(X, Y) / [std(X) std(Y)],
where std(X) = √var(X) and std(Y) = √var(Y).
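The two definitions can be tried out on a small data set. The sketch below is only illustrative: it assumes the listed values are the whole population, so each expectation E[·] becomes an arithmetic mean (division by N), and the function names and data are made up for the example.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """cov(X, Y) = E[(X - E(X))(Y - E(Y))], with E taken as the arithmetic mean."""
    mean_x, mean_y = mean(xs), mean(ys)
    return mean([(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)])

def correlation(xs, ys):
    """cor(X, Y) = cov(X, Y) / (std(X) * std(Y)); var(X) is cov(X, X)."""
    std_x = math.sqrt(covariance(xs, xs))
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)

# Hypothetical data; ys_scaled expresses the same quantity in units 100 times smaller.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
ys_scaled = [100 * y for y in ys]

print(covariance(xs, ys), covariance(xs, ys_scaled))    # covariance grows with the units
print(correlation(xs, ys), correlation(xs, ys_scaled))  # correlation stays the same
```

The comparison shows why correlation is usually preferred for reporting: covariance depends on the units of measurement, while correlation is unit-free.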
Cross Correlation
The cross correlation function is a measure of the similarity between
two data sets. One set is displaced relative to the other, corresponding
values of the two sets are multiplied together, and the products are
summed to give the value of the cross correlation at that displacement. When the two
sets are almost the same at a given displacement, most of the products will be positive
and the cross correlation is large. When the sets are unlike, some of the products will
be positive and some negative, and the sum will be small.
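The displace, multiply and sum procedure can be sketched in a few lines of Python. This is an unnormalised version written for illustration only; the function name cross_correlation, the lag convention (a[i] is paired with b[i + lag]) and the two data sets are assumptions of the example.

```python
def cross_correlation(a, b, lag):
    """Sum of the products a[i] * b[i + lag] over the positions where both exist."""
    total = 0
    for i in range(len(a)):
        j = i + lag
        if 0 <= j < len(b):
            total += a[i] * b[j]
    return total

# Hypothetical data: 'shifted' is roughly 'signal' delayed by one position.
signal = [0, 1, 2, 1, 0, -1, -2, -1]
shifted = [-1, 0, 1, 2, 1, 0, -1, -2]

# Try several displacements and keep the one with the largest sum of products.
values = {lag: cross_correlation(signal, shifted, lag) for lag in range(-3, 4)}
best_lag = max(values, key=values.get)
print(values)
print(best_lag)  # 1: the sets line up best when displaced by one position
```

At a displacement of one position the matching values multiply to positive products and the sum is largest; at other displacements positive and negative products partly cancel.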
Correlation Examples
Given below are some examples to calculate correlation.
Solved Example
Question:
Determine the correlation coefficient for the following set of X and Y values:
X Values   Y Values
21         2.5
23         3.1
37         4.2
19         5.6
24         6.4
33         8.4
Solution:
Let us count the number of values. N = 6
Determine the values of XY, X² and Y² for each pair:
X Value   Y Value   X*Y     X*X    Y*Y
21        2.5       52.5    441    6.25
23        3.1       71.3    529    9.61
37        4.2       155.4   1369   17.64
19        5.6       106.4   361    31.36
24        6.4       153.6   576    40.96
33        8.4       277.2   1089   70.56
Determine the following values: ∑X, ∑Y, ∑XY, ∑X², ∑Y²
∑X=157
∑Y=30.2
∑XY=816.4
∑X²=4365
∑Y²=176.38
Correlation (r) = [N∑XY − (∑X)(∑Y)] / √{[N∑X² − (∑X)²] [N∑Y² − (∑Y)²]}
= [6 × 816.4 − 157 × 30.2] / √{[6 × 4365 − (157)²] [6 × 176.38 − (30.2)²]}
= (4898.4 − 4741.4) / √(1541 × 146.24)
= 157 / 474.72
r ≈ 0.33
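The arithmetic of this solved example can be checked with a short Python script. It is a verification sketch only: it rebuilds the column sums from the six (X, Y) pairs given in the question and substitutes them into the formula; the variable names are chosen for the example.

```python
import math

# The six pairs from the solved example.
x = [21, 23, 37, 19, 24, 33]
y = [2.5, 3.1, 4.2, 5.6, 6.4, 8.4]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))   # column X*Y
sum_x2 = sum(a * a for a in x)              # column X*X
sum_y2 = sum(b * b for b in y)              # column Y*Y

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(n, sum_x, round(sum_y, 2), round(sum_xy, 2), sum_x2, round(sum_y2, 2))
print(round(r, 2))  # 0.33
```

A value of about 0.33 indicates a weak positive linear relationship between X and Y in this data set.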
Spearman’s Correlation Coefficient
Regression
We’ve seen how to explore the relationship between two quantitative variables
graphically with a scatterplot. When the relationship has a straight-line pattern, the
Pearson correlation coefficient describes it numerically. We can analyze the data
further by finding an equation for the straight line that best describes the pattern.
This equation predicts the value of the response (y) variable from the value of the
explanatory (x) variable.
Much of mathematics is devoted to studying variables that are deterministically
related. Saying that x and y are related in this manner means that once we are told
the value of x, the value of y is completely specified. For example, suppose the cost
of a small pizza at a restaurant is $10 plus $0.75 per topping. If we let x = number of
toppings and y = price of the pizza, then y = 10 + 0.75x. If we order a 3-topping pizza,
then y = 10 + 0.75(3) = 12.25.
There are many variables x and y that would appear to be related to one another,
but not in a deterministic fashion. Suppose we examine the relationship between
x = high school GPA and y = college GPA. The value of y cannot be determined just
from knowledge of x, and two different students could have the same x value but
have very different y values. Yet there is a tendency for those students who have
high (low) high school GPAs also to have high (low) college GPAs. Knowledge of a
student’s high school GPA should be quite helpful in enabling us to predict how that
person will do in college.
Regression analysis is the part of statistics that deals with investigation of the
relationship between two or more variables related in a nondeterministic fashion.
Historical Note: The statistical use of the word regression dates back to Francis
Galton, who studied heredity in the late 1800s. One of Galton’s interests was
whether or not a man’s height as an adult could be predicted by his parents’
heights. He discovered that it could, but the relationship was such that very tall
parents tended to have children who were shorter than they were, and very short
parents tended to have children taller than themselves. He initially described this
phenomenon by saying that there was a “reversion to mediocrity” but later
changed the terminology to “regression to mediocrity.”
The least-squares line is the line that makes the sum of the squares of the vertical
distances of the data points from the line as small as possible.
Equation for Least Squares (Regression) Line
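In its standard textbook form, the least-squares line is written ŷ = a + bx, with slope
b = [N∑XY − (∑X)(∑Y)] / [N∑X² − (∑X)²] and intercept a = ȳ − b·x̄, where x̄ and ȳ are the
means of the x and y values. The Python sketch below assumes that standard form; the function name least_squares_line is illustrative, and the data are the X and Y values of the linear correlation example in these notes.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    spread_x = n * sum_x2 - sum_x ** 2
    if spread_x == 0:
        raise ValueError("the slope is undefined when all x values are equal")
    b = (n * sum_xy - sum_x * sum_y) / spread_x   # slope (regression coefficient)
    a = sum_y / n - b * sum_x / n                 # intercept: y-bar minus b times x-bar
    return a, b

# X and Y from the linear correlation example: every Y is exactly twice X.
a, b = least_squares_line([10, 20, 30, 40, 50], [20, 40, 60, 80, 100])
print(a, b)        # intercept 0 and slope 2, up to floating-point rounding
print(a + b * 35)  # predicted y for x = 35, a value inside the observed range
```

Because these points lie exactly on a straight line, the fitted line passes through all of them; with real data the line only minimises the sum of squared vertical distances.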
When talking about regression equations, the following terms are used for x and y:
x: predictor variable, explanatory variable, or independent variable
y: response variable or dependent variable
Extrapolation is the use of the least-squares line for prediction outside the range of
values of the explanatory variable x that were used to obtain the line. Extrapolation
should be avoided, because the straight-line pattern may not hold outside that range.
Measuring the Contribution of x in Predicting y
We can consider how much the errors of prediction of y were reduced by using the
information provided by x.
The coefficient of determination, r², represents the proportion of the total sample
variation in y (measured by the sum of squares of deviations of the sample y values
about their mean ȳ) that is explained by (or attributed to) the linear relationship
between x and y.
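One way to see this proportion is to compare the prediction errors with and without x. The Python sketch below is an illustration under stated assumptions: it fits the least-squares line in its standard form ŷ = a + bx, measures the total variation SST = ∑(y − ȳ)² and the unexplained variation SSE = ∑(y − ŷ)², and returns r² = 1 − SSE/SST. The function name and the symbols SST and SSE are not from these notes; the data are the six pairs of the solved example.

```python
def coefficient_of_determination(xs, ys):
    """r squared = 1 - SSE / SST for the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept (deviation form).
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    # Total variation of y about its mean, and variation left over after using x.
    sst = sum((y - mean_y) ** 2 for y in ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / sst

x = [21, 23, 37, 19, 24, 33]
y = [2.5, 3.1, 4.2, 5.6, 6.4, 8.4]
r_squared = coefficient_of_determination(x, y)
print(round(r_squared, 3))
```

For this data the result is roughly 0.11, the square of r ≈ 0.33, so only about a tenth of the variation in y is accounted for by its linear relationship with x.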