Group Assignment
Correlation quantifies the degree to which two random variables are linearly related. To measure
the strength of this association, we calculate a correlation coefficient.
A correlation coefficient (from co- and relation) is a numerical assessment of the strength of
the relationship between the x and y values in a set of (x, y) pairs.
In this section, we introduce the most commonly used correlation coefficient. An investigator is
often interested in how two or more attributes of individuals or objects in a population are related
to one another. For example, an environmental researcher might wish to know how the lead
content of soil varies with distance from a major highway. A common measure of the strength of
a linear relationship is Pearson’s sample correlation coefficient, also called the
product-moment correlation coefficient and denoted r. It is based on the sum of the products of
Zx and Zy, the standardized (z-score) values of x and y, for each observation in the bivariate
data set: r = ∑ ZxZy / (n - 1), where n is the number of observations.
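To make the z-score definition concrete, here is a minimal Python sketch that standardizes each variable with its sample standard deviation (divisor n - 1), sums the products of the paired z-scores, and divides by n - 1. The function name pearson_r and the distance and lead values are hypothetical, chosen only to echo the highway example; they are not data from any study.

```python
import statistics


def pearson_r(xs, ys):
    """Pearson's sample correlation coefficient computed from z-scores."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # Sample standard deviations (divisor n - 1); both must be nonzero.
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    # Standardize each observation and sum the products of paired z-scores.
    sum_zx_zy = sum(
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    )
    return sum_zx_zy / (n - 1)


# Hypothetical data: distance from a highway (m) and soil lead content (ppm).
distance = [10, 20, 30, 40, 50]
lead = [95, 80, 62, 51, 30]
r = pearson_r(distance, lead)
print(round(r, 3))  # strongly negative: lead content falls as distance grows
```

The sketch fails with a division by zero if either variable is constant, since the z-scores are then undefined.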
Although there are several different correlation coefficients, Pearson’s correlation coefficient is
by far the most commonly used, and so the name “Pearson’s” is often omitted and it is referred
to as simply the correlation coefficient.
Properties of r
The value of r does not depend on the unit of measurement for either variable. For
example, if x is height, the corresponding Z score is the same whether height is expressed
in inches, meters, or miles, and thus the value of the correlation coefficient is not
affected. The correlation coefficient measures the inherent strength of the linear
relationship between two numerical variables.
The value of r does not depend on which of the two variables is considered x.
The value of r is between -1 and 1. A value near the upper limit, 1, indicates a substantial
positive relationship, whereas an r close to the lower limit, -1, suggests a substantial
negative relationship.
The correlation coefficient r = 1 only when all the points in a scatterplot of the data lie
exactly on a straight line that slopes upward. Similarly, r = -1 only when all the points
lie exactly on a downward-sloping line. Only when there is a perfect linear relationship
between x and y in the sample does r take on one of its two possible extreme values.
The value of r is a measure of the extent to which x and y are linearly related—that is, the
extent to which the points in the scatterplot fall close to a straight line. A value of r close
to 0 does not rule out any strong relationship between x and y; there could still be a
strong relationship but one that is not linear.
A p-value can also be used to assess the significance of the relationship. When the p-value
is below 0.05, the correlation is conventionally considered statistically significant.
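Several of the properties above can be checked numerically. The following Python sketch uses hypothetical height and weight values and an algebraically equivalent form of Pearson’s r (the sum of cross-deviations divided by the square root of the product of the sums of squared deviations). It illustrates that r is unchanged by a change of units or by swapping x and y, that it equals 1 or -1 for perfectly linear data, and that it can be 0 even when y is completely determined by x in a nonlinear way.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from sums of squared deviations and cross-deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: heights in inches and weights in pounds.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 120, 135, 150, 160, 172]

# Unit invariance: converting inches to meters leaves r unchanged.
heights_m = [h * 0.0254 for h in heights_in]
r_inches = pearson_r(heights_in, weights_lb)
r_meters = pearson_r(heights_m, weights_lb)

# Symmetry: swapping the roles of x and y leaves r unchanged.
r_swapped = pearson_r(weights_lb, heights_in)

# Perfect linear relationships give the extreme values 1 and -1.
xs = [-3, -2, -1, 0, 1, 2, 3]
r_line_up = pearson_r(xs, [2 * x + 1 for x in xs])
r_line_down = pearson_r(xs, [-2 * x + 1 for x in xs])

# A strong but nonlinear relationship (y = x squared) can still give r = 0.
r_parabola = pearson_r(xs, [x ** 2 for x in xs])

print(r_inches, r_meters, r_swapped)
print(r_line_up, r_line_down, r_parabola)
```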
Assumptions of the Pearson correlation coefficient: these are the assumptions your data must
meet if you want to use Pearson’s r.
Both variables are on an interval or ratio level of measurement
Data from both variables follow a normal distribution
Your data have no outliers
Your data are from a random or representative sample
You expect a linear relationship between the two variables
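When the data are ranks, Spearman’s rank correlation coefficient is calculated as
r = 1 - 6 ∑ D^2 / (n(n^2 - 1))
Where,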
r = Rank correlation coefficient
D = Difference between the ranks of two items
n = the number of observations
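The symbols defined above (a rank correlation coefficient r, rank differences D, and n observations) match the usual formula for Spearman’s rank correlation, r = 1 - 6 ∑ D^2 / (n(n^2 - 1)), which applies when there are no tied ranks. Assuming that is the formula intended, a minimal Python sketch looks like this. The helper names and the two lists of judges’ scores are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. This sketch assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch assumes no tied values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rank_r(xs, ys):
    """Rank correlation via r = 1 - 6 * sum(D^2) / (n * (n^2 - 1))."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    rank_x, rank_y = ranks(xs), ranks(ys)
    # D is the difference between the two ranks of each item.
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical scores given to five contestants by two judges.
judge_a = [7.5, 8.0, 6.0, 9.0, 5.5]
judge_b = [7.0, 8.5, 6.5, 8.0, 5.0]
rank_r = spearman_rank_r(judge_a, judge_b)
print(rank_r)  # the judges' rankings differ only for two contestants
```

With tied values the shortcut formula is no longer exact; a common alternative is to assign average ranks to ties and compute Pearson’s r on the ranks.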