Wubante
Review
Cite This Article As: Wubante M (2020). REVIEW ON CORRELATION RESEARCH. Inter. J. Eng. Lit. Cult. 8(4): 99-106
In case of ungrouped data of a bivariate distribution, the following three methods are used to compute the value of the coefficient of correlation:
1. Scatter Diagram Method
2. Pearson’s Product Moment Coefficient of Correlation
3. Spearman’s Rank Order Coefficient of Correlation
A scatter diagram, or dot diagram, is a graphic device for drawing certain conclusions about the correlation between two variables.
In preparing a scatter diagram, the observed pairs of observations are plotted as dots on graph paper in a two-dimensional space, taking the measurements on variable X along the horizontal axis and those on variable Y along the vertical axis. The placement of these dots on the graph reveals whether the two variables change in the same or in opposite directions. It is an easy and simple but rough method of studying correlation, since it gives no exact numerical value. The points are plotted on the graph using convenient scales for the two series. The plotted points tend to concentrate in a band whose width depends on the degree of correlation: the narrower the band, the higher the correlation. ‘The line of best fit’ is drawn freehand, and its direction indicates the nature of the correlation. Examples of scatter diagrams showing various degrees of correlation are shown in the figure below.
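The plotting procedure described above can be sketched in plain Python with no plotting library, as an illustration only: each (X, Y) pair is scaled onto a character grid, X horizontally and Y vertically, and marked with a dot. The function name `ascii_scatter` and the grid size are hypothetical choices; the data are the cigarette and longevity pairs from Example 2 in the Pearson section.

```python
def ascii_scatter(xs, ys, width=40, height=12, mark="*"):
    """Plot paired observations as marks on a character grid.

    X runs along the horizontal axis and Y along the vertical axis; each
    variable is scaled linearly onto the grid (the "convenient scales").
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty series of equal length")
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1  # guard against a constant series
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        # row 0 is the top line, so larger Y values map to smaller row indices
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = mark
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)  # horizontal (X) axis
    return "\n".join(lines)


# Cigarettes smoked (X) and years lived (Y) from the Pearson Example 2 table
cigarettes = [25, 35, 10, 40, 85, 75, 60, 45, 50]
years_lived = [63, 68, 72, 62, 65, 46, 51, 60, 55]
print(ascii_scatter(cigarettes, years_lived))
```

In practice a plotting library such as matplotlib (`pyplot.scatter`) would normally be used; the grid version only mirrors the hand procedure of choosing scales and placing one dot per pair.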
The Pearson correlation coefficient is just one of many types of correlation coefficients used in statistics. The following section gives the formula, examples of when the coefficient is used, and its significance.
Karl Pearson’s product-moment correlation coefficient (or simply, Pearson’s correlation coefficient) is a measure of the strength of a linear association between two variables and is denoted by r or r_xy (x and y being the two variables involved).
This method of correlation attempts to draw a line of best fit through the data of two variables, and the value of the Pearson correlation coefficient, r, indicates how far away all these data points are from this line of best fit: the closer the points lie to the line, the closer r is to +1 or −1.
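The link between r and the line of best fit can be made precise for a straight line fitted by least squares: r² equals 1 − SS_res/SS_tot, the share of the variation in Y that the line accounts for, so points lying close to the line give |r| near 1. A minimal Python check of this identity on the cigarette data from Example 2; the function names are illustrative, not from the source.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviation scores: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def r_squared_from_fit(xs, ys):
    """1 - SS_res / SS_tot: share of the variation in Y captured by the line."""
    a, b = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # scatter about the line
    ss_tot = sum((y - my) ** 2 for y in ys)                         # scatter about the mean
    return 1 - ss_res / ss_tot


xs = [25, 35, 10, 40, 85, 75, 60, 45, 50]
ys = [63, 68, 72, 62, 65, 46, 51, 60, 55]
r = pearson_r(xs, ys)
print(round(r, 2), round(r ** 2, 3), round(r_squared_from_fit(xs, ys), 3))
```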
The Pearson correlation coefficient is a very helpful statistic that measures the strength of the relationship between two variables. In statistics it is often referred to simply as Pearson’s r. When examining the relationship between two variables, it is a good idea to compute the Pearson correlation coefficient to determine just how strong that relationship is.
Formula
In order to determine how strong the relationship between two variables is, a formula is used to produce what is referred to as the coefficient value. The coefficient value can range from −1.00 to +1.00. If the coefficient value is negative, the variables are negatively correlated: as one value increases, the other decreases. If the value is positive, the variables are positively correlated: both values increase or decrease together. Let’s look at the formula for computing the Pearson correlation coefficient.
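For reference, the coefficient can be written in deviation-score form (with X̄ and Ȳ the means of the two series) and in the equivalent raw-score form, which works directly from the column totals N, ΣX, ΣY, ΣX², ΣY² and ΣXY of the kind tabulated in Example 2. These are the standard textbook forms, given here as a sketch rather than as the author’s own notation:

```latex
r_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}
              {\sqrt{\sum (X - \bar{X})^2 \; \sum (Y - \bar{Y})^2}}
       = \frac{N\sum XY - \sum X \sum Y}
              {\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}
```

The denominator is always positive, so the sign of r comes from the numerator: it is negative when large values of X tend to pair with small values of Y.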
102 Inter. J. Eng. Lit. Cult.
The value of the coefficient of correlation is +0.85. This shows a high degree of relationship between the two variables.
Example 2
Calculate the coefficient of correlation between the number of cigarettes smoked and the longevity of the test subjects.
Solution
Let us first assign random variables to our data in the following way:
X – The number of cigarettes smoked
Y – Years lived
Let us now construct a table to compute all the values we are going to use in our correlation formula. Note that here N = 9.
X     X²      Y     Y²      XY
25    625     63    3969    1575
35    1225    68    4624    2380
10    100     72    5184    720
40    1600    62    3844    2480
85    7225    65    4225    5525
75    5625    46    2116    3450
60    3600    51    2601    3060
45    2025    60    3600    2700
50    2500    55    3025    2750
ΣX = 425    ΣX² = 24525    ΣY = 542    ΣY² = 33188    ΣXY = 24640
(ΣX)² = 425² = 180625    (ΣY)² = 542² = 293764
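Substituting these values in the formula, we get:
$$r_{xy} = \frac{9(24640) - (425)(542)}{\sqrt{[9(24525) - 180625][9(33188) - 293764]}} = \frac{-8590}{\sqrt{40100 \times 4928}} \approx -0.61$$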
This implies a negative correlation between the variables considered: the higher the number of cigarettes smoked per week in the last 5 years, the fewer the years lived. Note that this DOES NOT mean that smoking cigarettes decreases life span, because other factors might be responsible for a person’s death; correlation alone does not establish causation. It is nevertheless an important conclusion.
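The arithmetic of Example 2 can be checked with a short Python sketch that computes the same column totals and applies the raw-score form of Pearson’s formula. The function name `pearson_raw_score` is illustrative, not from the source.

```python
import math


def pearson_raw_score(xs, ys):
    """Pearson's r from the column totals N, ΣX, ΣY, ΣX², ΣY², ΣXY."""
    if len(xs) != len(ys):
        raise ValueError("series must have equal length")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator, (sum_x, sum_x2, sum_y, sum_y2, sum_xy)


cigarettes = [25, 35, 10, 40, 85, 75, 60, 45, 50]   # X
years_lived = [63, 68, 72, 62, 65, 46, 51, 60, 55]  # Y
r, totals = pearson_raw_score(cigarettes, years_lived)
print(totals)        # (425, 24525, 542, 33188, 24640)
print(round(r, 2))   # -0.61
```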
When data are measured on at least an ordinal scale, the ordered categories can be replaced by their ranks and Pearson’s correlation coefficient calculated on these ranks. This is called Spearman’s rank correlation coefficient (r, often written r_s or ρ) and provides a measure of how closely two sets of rankings agree with each other.
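That definition can be applied directly: when neither set of ranks contains ties, running Pearson’s formula on the ranks reproduces the rank-difference result. A minimal Python sketch using the two professors’ rankings from Rank Correlation Example 1 below; the function name is illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson's r computed from deviation scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Ranks given to students A-J by the two professors (already ranks, no ties)
professor_k = [1, 3, 4, 7, 6, 9, 8, 10, 2, 5]
professor_m = [1, 2, 5, 9, 6, 8, 10, 7, 4, 3]
spearman_r = pearson(professor_k, professor_m)
print(round(spearman_r, 2))  # 0.83
```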
A correlation can easily be drawn as a scatter graph, but the most precise way to compare several pairs of data is to use a statistical test: this establishes whether the correlation is really significant or whether it could have been the result of chance alone.
Spearman’s rank correlation coefficient is a technique that can be used to summarize the strength and direction (negative or positive) of a relationship between two variables. The result will always be between −1 and +1.
This non-parametric analysis tool provides a way to compare two sets of ordinal data (data that can be rank
ordered in a meaningful manner). The result, r, is a measure of the association between two datasets.
You may want to know whether two reviewers give similar ratings to movies, or whether two assessment techniques provide similar results. If r is +1, the two rankings agree perfectly: when one series increases, the other does too. If r is −1, there is a perfect negative relationship: as one series increases, the other decreases. At zero there is no monotonic relationship between the two series.
This does not work for sets whose relationship is not monotonic; a parabolic function is one example.
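Strictly, the limitation concerns relationships that are not monotonic. Because Spearman’s coefficient works on ranks, a curve along which Y always rises as X rises gets r = 1, while a U-shaped parabola can give r = 0. The Python sketch below illustrates both cases. It also gives tied values the mean of their rank positions, a common convention that the text does not discuss (an assumption here); the function names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) to N; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # i and j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    # Ranking smallest-first instead of largest-first gives the same r,
    # as long as both series are ranked in the same direction.
    return pearson(average_ranks(xs), average_ranks(ys))


xs = [-3, -2, -1, 0, 1, 2, 3]
print(round(spearman(xs, [x ** 3 for x in xs]), 2))  # curved but monotonic: 1.0
print(round(spearman(xs, [x ** 2 for x in xs]), 2))  # parabola: 0.0
```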
Step 1: Collect the data, and create a table from your data.
Step 2: Rank the data: ranking is achieved by giving rank ‘1’ to the biggest number in a column, ‘2’ to the second biggest value and so on, so that the smallest value in the column gets the highest rank number (N). This should be done for both sets of measurements.
Step 3: Calculate the difference in ranks (D): this is the difference between the ranks of the two values on each row of the table.
Step 4: Square the differences (D²) to remove negative values.
Step 5: Sum column D² to obtain ΣD².
Step 6: Calculate r = 1 − 6ΣD² / [N(N² − 1)].
Step 7: Interpret the results.
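Steps 2 to 6 can be sketched in Python as follows, assuming no tied values (ties would need average ranks). The function names are illustrative; the data are the Mathematics and General Science scores of students A-E from Example 2 below.

```python
def rank_descending(values):
    """Step 2: rank 1 for the largest value, 2 for the next largest, and so on.

    Assumes no tied values; ties would need average ranks.
    """
    if len(set(values)) != len(values):
        raise ValueError("tied values are not handled in this sketch")
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rank_difference(xs, ys):
    """Steps 2-6: rank both series, difference the ranks, apply the formula."""
    n = len(xs)
    r1, r2 = rank_descending(xs), rank_descending(ys)
    d = [a - b for a, b in zip(r1, r2)]      # Step 3: D = R1 - R2 (these always sum to 0)
    sum_d2 = sum(di * di for di in d)        # Steps 4-5: square D and sum to get ΣD²
    r = 1 - 6 * sum_d2 / (n * (n * n - 1))   # Step 6
    return r, d, sum_d2


math_scores = [8, 7, 9, 5, 1]
science_scores = [10, 8, 7, 4, 5]
r, d, sum_d2 = spearman_rank_difference(math_scores, science_scores)
print(d, sum_d2, round(r, 2))  # [1, 1, -2, -1, 1] 8 0.6
```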
Rank Correlation
Example 1: In a speech contest, two professors judged 10 students. Their judgments were given as ranks, which are presented below. Determine the extent to which their judgments were in agreement.
Students   Professor K’s Ranks (R1)   Professor M’s Ranks (R2)   Difference D = R1 − R2   D²
A          1                          1                           0                       0
B          3                          2                           1                       1
C          4                          5                          -1                       1
D          7                          9                          -2                       4
E          6                          6                           0                       0
F          9                          8                           1                       1
G          8                          10                         -2                       4
H          10                         7                           3                       9
I          2                          4                          -2                       4
J          5                          3                           2                       4
N = 10    ΣD = 0    ΣD² = 28
$$r = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 28}{10(10^2 - 1)} = 1 - \frac{168}{990} = 1 - .17 = +.83$$
The value of the coefficient of correlation between the two professors’ rankings is positive and high (+.83), indicating close agreement between their judgments.
Example 2: The following data give the scores of 5 students in Mathematics and General Science respectively. Compute the correlation between the two series of test scores by the rank-difference method.
Students   Score in Math   Score in Gen. Sci.   Rank in Test 1 (R1)   Rank in Test 2 (R2)   D = R1 − R2   D²
A          8               10                   2                     1                      1            1
B          7               8                    3                     2                      1            1
C          9               7                    1                     3                     -2            4
D          5               4                    4                     5                     -1            1
E          1               5                    5                     4                      1            1
N = 5    ΣD = 0    ΣD² = 8
$$r = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 8}{5(5^2 - 1)} = 1 - \frac{48}{120} = 1 - .40 = +.60$$
The value of the coefficient of correlation between scores in Mathematics and General Science is positive and moderate.
The correlation coefficient is used to measure the strength of the relationship between two variables.
A coefficient can range from r = −1.00 to r = +1.00.
Merely computing a correlation has no significance until we determine how large the coefficient must be in order to be significant, what the correlation tells us about the data, and what we mean by the obtained value of the coefficient of correlation.
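For Pearson’s r, one common textbook answer to the first of these questions (a standard test that the source does not spell out, added here as an assumption) converts r into a t statistic with N − 2 degrees of freedom, t = r√(N − 2)/√(1 − r²), and compares |t| with the critical value from a t table. The test assumes roughly bivariate-normal data. A minimal Python sketch; the function name is hypothetical, and an exact p-value would need a t-distribution routine (for example from SciPy), which this standard-library sketch does not attempt.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: population correlation = 0, with its degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    df = n - 2
    return r * math.sqrt(df / (1 - r * r)), df


# r and N from the cigarette/longevity example in the Pearson section
t, df = correlation_t(-0.61, 9)
print(round(t, 2), df)  # -2.04 7
```

The resulting |t| is then compared with the tabled critical value for N − 2 degrees of freedom at the chosen significance level; a larger |t| makes it less likely that the correlation arose by chance alone.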
1. Subject Characteristics
Individuals or groups may differ in other characteristics, and these characteristics might be a cause of variation in both of the variables being studied.
2. Location
3. Instrumentation
Instrument decay: care must be taken to ensure that observers do not become tired, bored or inattentive.
Data collector characteristics: the gender, age or ethnicity of data collectors may affect particular responses.
4. Testing: Experience of responding to the first instrument may influence subject responses to the second instrument.
5. Mortality: Loss of subjects may make a relationship more (or less) likely in the remaining data.
1. What is the relationship between gender, academic performance and university drug use?
2. Is there an association between personality type and seniority in companies?
3. What is the relationship between obesity and heart disease?
4. Is there a relationship between family income and grade point average?
5. What is the relationship between education and income?
6. Is there a relationship between intelligence and self-esteem?
7. What relationship exists between anxiety and achievement?
8. Use of aptitude tests to predict success.

CONCLUSION
Correlational research is designed to test research hypotheses in cases where it is not possible or desirable to experimentally manipulate the independent variable of interest. It is also desirable because it allows the investigation of behavior in naturally occurring situations.