2020 Correlation
Correlation
Akash Gautam
University of Hyderabad
All content following this page was uploaded by Akash Gautam on 16 August 2020.
© Springer Nature Switzerland AG 2020
J. Vonk, T. K. Shackelford (eds.), Encyclopedia of Animal Cognition and Behavior,
https://doi.org/10.1007/978-3-319-47829-6_214-1

The word correlation is used in everyday life to denote some form of relation between two random variables. We might say, for example, that we have noticed a relation between unemployment and inflation. Correlation is a real number that gives us an idea of the degree of association between two variables. At a basic level, it is also known as a “bivariate” statistic. Thus, correlation is a unitless value.

There are different ways of estimating the correlation between two random variables.

Scatter Diagram
A scatter diagram is a diagrammatic method to observe correlation. Here, the values of one variable (x) are depicted on the horizontal axis and the values of the other variable (y) on the vertical axis. A pattern
between the coordinates would indicate the correlation. If the trend is downward sloping from left to right, it means a negative correlation. If it is upward sloping from left to right, it means a positive correlation. If the points are scattered with no discernible linear pattern, then the variables are said to be uncorrelated (Fig. 1).

Pearson’s Correlation Coefficient
The correlation coefficient is another method of measuring the statistical relationship, or strength of association, between two variables. In a sample, it is represented by the value “r.” It can range from −1.00 to +1.00 (Fig. 2).

When the correlation coefficient approaches +1.00, it implies a positive relationship (perfectly positive at exactly +1.00), that is, a high level of association between both variables. It means that the greater or lower the score of a member on one variable, the greater or lower the score will be on the other variable, respectively. For instance, there is a positive correlation between height and weight of plants up to a particular age.

Though there are many measures of correlation, the Pearson product-moment correlation coefficient, which measures the degree of linear association between two continuous variables, is the most used measure (Gilbert 1986). Pearson’s coefficient r is obtained for a sample drawn from a population. The population value of Pearson’s coefficient is called rho (ρ), and r is therefore an estimate of ρ. The formula for r is as follows:

$$ r = \frac{N \sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[N \sum x^2 - \left(\sum x\right)^2\right]\left[N \sum y^2 - \left(\sum y\right)^2\right]}} $$

Note: In this formula, N is equal to the number of pairs of scores, and Σxy is called the sum of the cross products.

Spearman’s Rank Correlation Coefficient
Another measure used to calculate the correlation coefficient is Spearman’s rank correlation method. It was formulated by the English psychologist Charles Spearman. This method assesses how well a monotonic function describes the relation between two variables, unlike Pearson’s correlation, which captures linear relationships only. Spearman’s correlation is a nonparametric measure and is generally calculated for ordinal or ranked data (Kothari 2004).
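To make the contrast between the two coefficients concrete, the following is a minimal Python sketch rather than a reference implementation. The function `pearson_r` follows the sum-based formula for r given above. The function `spearman_rs` is computed here as Pearson's r applied to ranks, which is a standard equivalent definition but an assumption relative to this entry. The function names, the simple ranking helper (which assumes no tied values), and the toy data are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson's r from the raw sums, following the formula for r above."""
    n = len(x)  # N: number of pairs of scores
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))  # sum of the cross products
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def ranks(values):
    """Rank values from 1 (smallest) to n. Assumption: no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rs(x, y):
    """Spearman's coefficient, computed as Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Toy data (hypothetical): y grows with x monotonically, but not linearly.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print(round(pearson_r(x, y), 3))  # about 0.943
print(spearman_rs(x, y))          # 1.0
```

On this toy data the relationship is perfectly monotonic but not linear, so Spearman's coefficient equals 1 while Pearson's r stays below 1.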
Correlation, Fig. 1 Scatter diagrams showing positive, negative, and no correlation between x and y variables
Correlation, Fig. 2 The values of “r” range from −1 (absolute negative correlation) to +1 (absolute positive correlation), including 0 (no correlation)
Spearman’s correlation coefficient (r_s) can be calculated by the following formula:

$$ r_s = 1 - \frac{6 \sum D^2}{N\left(N^2 - 1\right)} $$

where
D = difference between the ranks of the two variables for each observation
N = number of observations

The range of r_s is also +1 to −1, similar to Pearson’s coefficient.

Mantel Test
The Mantel statistic Z_m is calculated from two distance matrices as:

$$ Z_m = \sum_{i=1}^{n} \sum_{j=1}^{n} g_{ij} d_{ij} $$

where g_ij and d_ij are the corresponding entries of the two distance matrices and n represents the total population.
Here, Z_m is just the sum of the products of both distances. Therefore, Z_m will depend on the number of populations and on the distances. In the late 1970s, the Mantel test met with several criticisms for bias, as the test works well only for large sample sizes. To avoid this bias, a null distribution must be generated empirically by permuting the rows and columns of one of the distance matrices. In its standardized form, Z_m ranges from +1 to −1.
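The permutation procedure described above can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions: the two matrices are square, symmetric lists of lists of the same size; the rows and columns of the second matrix are permuted jointly; and the one-sided p-value uses the common (count + 1) / (permutations + 1) convention, which this entry does not specify. The function names, the number of permutations, and the toy data are hypothetical.

```python
import random


def mantel_z(g, d):
    """Raw Mantel statistic: sum of g[i][j] * d[i][j] over all i and j."""
    n = len(g)
    return sum(g[i][j] * d[i][j] for i in range(n) for j in range(n))


def mantel_permutation_test(g, d, n_perm=999, seed=0):
    """Return (observed Z_m, one-sided permutation p-value)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(g)
    observed = mantel_z(g, d)
    index = list(range(n))
    count = 0
    for _ in range(n_perm):
        rng.shuffle(index)
        # Permute rows and columns of d with the same ordering.
        permuted = [[d[index[i]][index[j]] for j in range(n)] for i in range(n)]
        if mantel_z(g, permuted) >= observed:
            count += 1
    # Assumed convention: the observed arrangement counts as one permutation.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value


# Toy data (hypothetical): four populations placed on a line.
positions = [0, 1, 3, 6]
g = [[abs(a - b) for b in positions] for a in positions]
d = [[2 * value for value in row] for row in g]  # d is proportional to g

z, p = mantel_permutation_test(g, d)
print(z, p)
```

Because the second matrix is proportional to the first in this toy case, the observed statistic is the largest value any permutation can produce, so the estimated p-value is small.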
Kendall Rank Correlation Coefficient
The Kendall rank correlation method is an alternative to Spearman’s correlation coefficient method, as it can also be used to detect nonlinear associations between two variables. This simply means that it can measure monotonic as well as linear relationships. The Kendall rank correlation coefficient (τ) is a nonparametric measure and is generally used for small sample sizes or for tied ranks. Kendall’s correlation coefficient establishes the strength of association based on the pattern of concordance and discordance between the pairs of observations. For any given set of paired variables x and y, τ can be calculated as:

$$ \tau = \frac{(\text{No. of concordant pairs}) - (\text{No. of discordant pairs})}{n(n-1)/2} $$

where the denominator n(n − 1)/2 represents the number of all possible pairs of bivariate data points. τ also ranges from +1 to −1 for untied ranks.

Phi Correlation Coefficient
The phi correlation method is again a nonparametric test that is used to show associations between two noncontinuous variables, for example, the number of males and females in a school. The English mathematician Karl Pearson gave this method. The phi coefficient (φ) can be calculated easily from the chi-square result as follows:

$$ \phi = \sqrt{\frac{\chi^2}{\text{sample size}}} $$

However, φ is calculated only when p < 0.05; otherwise, φ² (i.e., the ratio of chi-square to sample size) is sufficient to measure the amount of variance in the dependent variable that can be explained by the independent variable. For example, a φ of 0.72 means that the independent variable accounts for 52% of the variance in the dependent variable. φ ranges from 0 to +1, where +1 means there is a perfect 1 to 1 correlation between the two variables (Frey 2018).
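The formulas for τ and φ in the two sections above are simple enough to check by hand. The sketch below is illustrative only. It assumes untied data for τ, and for φ it assumes a 2 × 2 table of counts with the ordinary Pearson chi-square statistic (no continuity correction), a choice this entry does not specify. The function names and the toy numbers are hypothetical.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) / (n(n - 1) / 2), untied data."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            direction = (x[i] - x[j]) * (y[i] - y[j])
            if direction > 0:
                concordant += 1  # both variables move the same way
            elif direction < 0:
                discordant += 1  # the variables move in opposite ways
    return (concordant - discordant) / (n * (n - 1) / 2)


def phi_from_2x2(a, b, c, d):
    """Return (phi, chi-square) for the 2 x 2 table [[a, b], [c, d]].

    Assumption: Pearson chi-square without continuity correction.
    """
    n = a + b + c + d  # sample size
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.sqrt(chi_square / n), chi_square


# Toy ranks (hypothetical): 8 concordant and 2 discordant pairs out of 10.
print(kendall_tau([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.6

# Toy 2 x 2 table (hypothetical counts).
phi, chi_square = phi_from_2x2(30, 10, 10, 30)
print(chi_square, phi, phi ** 2)  # 20.0 0.5 0.25
```

In the second example φ² is 0.25, which under the interpretation given above would correspond to 25% of the variance being accounted for.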
Conclusion
Correlation is one of the popular statistical tools used to calculate the degree of relationship between two variables, and it is widely utilized by researchers. Depending upon the value of the correlation coefficient (r), one can categorize correlation as positive, negative, or no correlation. To obtain the correlation, one can proceed manually through the above formulas or use readily available statistical analysis software such as Microsoft Excel, SPSS, etc. However, the main thing is to draw the proper inference from the calculated r values in light of the number of samples and the p-value.

References
Diniz-Filho, J. A., Soares, T. N., Lima, J. S., Dobrovolski, R., Landeiro, V. L., de Campos Telles, M. P., Rangel, T. F., & Bini, L. M. (2013). Mantel test in population genetics. Genetics and Molecular Biology, 36(4), 475–485.
Frey, B. (2018). The SAGE encyclopedia of educational research, measurement, and evaluation (pp. 1–4). Thousand Oaks: SAGE Publications, Inc.
Gilbert, N. (1986). Statistics. Philadelphia: W.B. Saunders.
Kothari, C. R. (2004). Research methodology: Methods and techniques. New Delhi: New Age International Ltd.