
Group Assignment

Correlation is defined as the quantification of the degree to which two random variables are related, provided that the relationship is linear. Pearson's correlation coefficient measures the strength and direction of the linear relationship between two variables and ranges from -1 to 1. There are different types of correlation coefficients that can be used depending on the type and distribution of the variables.

Uploaded by

datum

Correlation

Correlation quantifies the degree to which two random variables are related, provided that the
relationship is linear. To measure the degree of association between two variables, we calculate
a correlation coefficient.
A correlation coefficient (from co- and relation) is a numerical assessment of the strength of
the relationship between the x and y values in a set of (x, y) pairs.
In this section, we introduce the most commonly used correlation coefficient. An investigator is
often interested in how two or more attributes of individuals or objects in a population are related
to one another. For example, an environmental researcher might wish to know how the lead
content of soil varies with distance from a major highway. A common measure of the strength of
a linear relationship is Pearson’s sample correlation coefficient, also called the product-moment
correlation coefficient. It is based on the sum of the products of Zx and Zy for each
observation in the bivariate data set, ∑ ZxZy, where Zx and Zy are the Z scores of the x and y values.

Pearson’s sample correlation coefficient r is given by


$$r = \frac{\sum Z_x Z_y}{n - 1}$$

Although there are several different correlation coefficients, Pearson’s correlation coefficient is
by far the most commonly used, and so the name “Pearson’s” is often omitted and it is referred
to as simply the correlation coefficient.
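The calculation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `pearson_r` and the distance and lead values are made up for the example, and the Z scores use the sample standard deviation (divisor n - 1) so that they match the n - 1 in the formula.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's sample correlation: sum of Z-score products divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Z score of every observation.
    zx = [(x - mean_x) / sd_x for x in xs]
    zy = [(y - mean_y) / sd_y for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)


# Hypothetical data: distance from a highway (m) and soil lead content (ppm).
distance = [10, 20, 30, 40, 50]
lead = [95, 80, 72, 60, 41]
r = pearson_r(distance, lead)
print(round(r, 2))  # strongly negative: lead content falls as distance grows
```

For these made-up values r comes out at about -0.99, a strong negative linear relationship.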
Properties of r
• The value of r does not depend on the unit of measurement for either variable. For
example, if x is height, the corresponding Z score is the same whether height is expressed
in inches, meters, or miles, and thus the value of the correlation coefficient is not
affected. The correlation coefficient measures the inherent strength of the linear
relationship between two numerical variables.
• The value of r does not depend on which of the two variables is considered x.
• The value of r is between -1 and 1. A value near the upper limit, 1, indicates a substantial
positive relationship, whereas an r close to the lower limit, -1, suggests a substantial
negative relationship.
• The correlation coefficient r = 1 only when all the points in a scatterplot of the data lie
exactly on a straight line that slopes upward. Similarly, r = -1 only when all the points
lie exactly on a downward-sloping line. Only when there is a perfect linear relationship
between x and y in the sample does r take on one of its two possible extreme values.
• The value of r is a measure of the extent to which x and y are linearly related, that is, the
extent to which the points in the scatterplot fall close to a straight line. A value of r close
to 0 does not rule out a strong relationship between x and y; there could still be a
strong relationship, but one that is not linear.
• A p-value can be used to assess the statistical significance of the correlation. When the
p-value is below 0.05, the correlation is conventionally considered statistically significant.
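Several of these properties can be checked numerically. The sketch below is illustrative only: `pearson_r` is a hypothetical helper that uses the algebraically equivalent form Sxy / sqrt(Sxx * Syy) rather than explicit Z scores, and all data values are invented.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r via sums of squares and cross-products (same value as the Z-score form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical heights (inches) and weights (pounds).
heights_in = [60, 62, 65, 68, 71, 74]
weights = [115, 120, 140, 150, 170, 185]

# Property 1: changing the unit of measurement leaves r unchanged.
heights_m = [h * 0.0254 for h in heights_in]
r_inches = pearson_r(heights_in, weights)
r_meters = pearson_r(heights_m, weights)

# Property 2: swapping the roles of x and y leaves r unchanged.
r_swapped = pearson_r(weights, heights_in)

# Perfect upward and downward lines give the two extreme values.
r_up = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
r_down = pearson_r([1, 2, 3, 4], [9, 7, 5, 3])

# A strong but nonlinear relationship (y = x^2 on symmetric x) gives r = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
r_curve = pearson_r(xs, [x ** 2 for x in xs])
```

In the last case y is completely determined by x, yet r is 0, which is why a value of r near 0 rules out only a linear relationship.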
Assumptions of the Pearson correlation coefficient: these are the assumptions your data must
meet if you want to use Pearson’s r.
• Both variables are on an interval or ratio level of measurement
• Data from both variables follow a normal distribution
• Your data have no outliers
• Your data are from a random or representative sample
• You expect a linear relationship between the two variables

Spearman’s rho (rs):


• Computes the correlation between two ordinal, or ranked, variables
• If either of the two variables is not normally distributed, Spearman’s rank correlation
coefficient is more appropriate, i.e. it is a nonparametric method
• This method is based on the ranks of the items rather than on their actual values
• The advantage of this method over the others is that it can be used even when the actual
values of the items are unknown, as long as their ranks are known
• For example, if you want to know the correlation between the honesty and wisdom of the
boys in your class, you can use this method by ranking the boys on each attribute
• It can also be used to find the degree of agreement between the judgments of two
examiners or two judges
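The formula is:

$$r = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$

Where,
• r = Rank correlation coefficient
• D = Difference between the two ranks given to each item
• n = the number of observations
This form of the formula applies when there are no tied ranks.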
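A minimal Python sketch of the rank-difference calculation, using the usual form r = 1 - 6∑D² / (n(n² - 1)). It assumes there are no tied values, and the helper names `ranks` and `spearman_rho` as well as the examiner scores are invented for illustration.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman's rank correlation from squared rank differences (no ties)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    if len(set(xs)) != n or len(set(ys)) != n:
        raise ValueError("this simple formula assumes no tied values")
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical marks given to five students by two examiners.
examiner_a = [72, 85, 60, 91, 78]
examiner_b = [70, 88, 65, 95, 68]
rho = spearman_rho(examiner_a, examiner_b)
print(rho)  # ranks differ for only two students, so agreement is high
```

Here the ranks are [2, 4, 1, 5, 3] and [3, 4, 1, 5, 2], so ∑D² = 2 and rho = 1 - 12/120 = 0.9. Because only ranks are used, any monotonically increasing relationship, linear or not, gives rho = 1.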

Correlation coefficients | Data type
Pearson’s Product Moment Correlation Coefficient | Computes the correlation between two interval or ratio variables
Spearman’s rho (rs) | Computes the correlation between two ordinal, or ranked, variables
Cramer’s Phi (φC) | Measure of association between two nominal variables (φC = 1)
The Phi Coefficient (rφ) | Computes the correlation between two naturally occurring dichotomous variables
Point biserial correlation | Measure of association between an interval/ratio variable and a dichotomous variable
Biserial correlation coefficient (rbi) | Computes the degree of relationship between two interval (or ratio) scales where, for some logical reason, one of the two is more sensibly interpreted as an artificially created dichotomous nominal scale
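As an illustration of one row of the table, the phi coefficient for two dichotomous variables can be computed from the counts of a 2 x 2 table, and it equals Pearson's r applied to the same data coded as 0 and 1. The sketch below assumes that 0/1 coding; the function names and the data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r via sums of squares and cross-products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def phi_from_counts(a, b, c, d):
    """Phi for a 2 x 2 table: a = (1, 1), b = (1, 0), c = (0, 1), d = (0, 0)."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical dichotomous data coded 0/1 for ten individuals.
x = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

# Cell counts of the 2 x 2 table.
pairs = list(zip(x, y))
a = pairs.count((1, 1))
b = pairs.count((1, 0))
c = pairs.count((0, 1))
d = pairs.count((0, 0))

phi = phi_from_counts(a, b, c, d)
r = pearson_r(x, y)
print(phi, r)  # the two values agree up to rounding
```

With these counts (a = 4, b = 1, c = 1, d = 4) both calculations give 0.6.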

Limitations of correlation coefficient


• It quantifies only the strength of the linear relationship between two variables.
• Care must be taken when the data contain any outliers, or pairs of observations that lie
considerably outside the range of the other data points.
• A high correlation between two variables does not imply a cause-and-effect relationship.
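The sensitivity to outliers is easy to demonstrate. In this sketch (invented data, hypothetical helper `pearson_r`), eight points show almost no linear relationship, and adding a single extreme pair pushes r close to 1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r via sums of squares and cross-products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data with no clear linear trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 1.9, 2.0, 2.2, 1.8, 2.1, 2.0, 1.9]
r_without = pearson_r(x, y)

# The same data plus one pair far outside the range of the others.
r_with = pearson_r(x + [20], y + [10.0])

print(round(r_without, 2), round(r_with, 2))
```

One observation out of nine is enough to turn a weak negative r into a strong positive one, so a scatterplot should be inspected before r is interpreted.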
Applications of correlation research
• It can provide insight into complex real-world relationships, helping researchers to develop
theories and make predictions.
• It can tell us about the direction of the relationship, the form (shape) of the relationship, and
the degree (strength) of the relationship between two variables.
• Besides describing relationships, it can be used to assess performance. For example, with the
help of correlation it is possible to form an idea of the working capacity of a person.
• It helps companies determine which variables they want to investigate further, and it
allows for rapid hypothesis testing.
