Correlation and Regression Notes
Correlation
Correlation is a statistical measure of the relationship between two or more
variables that move in the same direction or in opposite directions. With correlation, two or
more variables can be compared to determine whether a relationship exists and to measure
how strong it is. The correlation coefficient gives the strength of the relationship between
the variables, and its sign gives the direction.
A correlation may be positive, negative or zero. The first role of correlation
is to determine the strength of the relationship between the two variables plotted on the
x-axis and y-axis. The measure of this magnitude is called the correlation coefficient. The
data required to compute this coefficient are paired: two continuous measurements (x, y)
taken on the same entity.
If there is a perfect relationship, a straight line can be drawn through all the data
points. The greater the change in y for a constant change in x, the steeper the slope of the
line. The slope itself does not change the correlation coefficient, which is +1 or -1 for any
perfect linear relationship with a non-zero slope. In a less than perfect relationship, the
closer the data points lie to a straight line, the stronger the relationship and the greater
the absolute value of the correlation coefficient. In contrast, a zero correlation indicates
no linear relationship between the two variables.
Positive Correlation
One variable increases as the other increases, or decreases as the other
decreases. Example: body temperature and pulse.
Negative Correlation
One variable increases as the other decreases, or decreases as the other
increases. Example: insulin and blood sugar.
Zero Correlation
There is no linear relationship between the variables (r = 0).
The Coefficient of Correlation
A measure of the strength and direction of the linear relationship between two variables,
defined as the covariance of the variables divided by the product of their standard deviations.
Its value always lies between -1 and +1.
Correlation coefficient:
$$ r = \frac{\operatorname{Cov}(x, y)}{s_x \, s_y} $$
where s_x and s_y are the standard deviations (S.D.) of x and y.
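The formula translates directly into code. A minimal sketch in plain Python with no libraries beyond math; pearson_r is a hypothetical name and the data are made up. The sample covariance and standard deviations all use the n - 1 divisor, which cancels in the ratio, and r is undefined when either variable is constant.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Correlation coefficient r = Cov(x, y) / (s_x * s_y) for paired data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and standard deviations, all with divisor n - 1;
    # the divisor cancels in the ratio, so using n throughout gives the same r.
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    sd_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    sd_y = sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov_xy / (sd_x * sd_y)


print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.7746
print(pearson_r([1, 2, 3], [3, 2, 1]))                        # -1.0
```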
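Regression
Regression models how a response variable changes with one or more predictor variables,
so that the response can be predicted from the predictors.
Types of Regression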
Simple Linear Regression (1 response – 1 predictor)
Multiple Regression (1 response – Many predictors)
Logistic Regression (Categorical response – Binary / Nominal / Ordinal; predictors of any type)
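The regression line equations are calculated from the means of x and y, their standard
deviations s_x and s_y, and the correlation coefficient r. Both lines pass through the point of
means.
Regression equation of x on y:
$$ x - \bar{x} = b_{xy}\,(y - \bar{y}), \qquad b_{xy} = r\,\frac{s_x}{s_y} = \frac{\operatorname{Cov}(x, y)}{s_y^{2}} $$
Regression equation of y on x:
$$ y - \bar{y} = b_{yx}\,(x - \bar{x}), \qquad b_{yx} = r\,\frac{s_y}{s_x} = \frac{\operatorname{Cov}(x, y)}{s_x^{2}} $$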
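A minimal sketch of both regression lines for paired data, using the least-squares slopes b_yx = Cov(x, y) / s_x² and b_xy = Cov(x, y) / s_y². regression_lines is a hypothetical helper and the data are made up. The common divisor of the covariance and variances cancels, so plain sums of squares and cross-products are enough. Both lines pass through the point of means, and the product of the two slopes equals r².

```python
def regression_lines(x, y):
    """Return (a, b_yx) for y = a + b_yx*x and (c, b_xy) for x = c + b_xy*y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means (the divisor cancels)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b_yx = sxy / sxx  # = Cov(x, y) / s_x^2 = r * s_y / s_x
    b_xy = sxy / syy  # = Cov(x, y) / s_y^2 = r * s_x / s_y
    # Intercepts chosen so each line passes through (mean_x, mean_y)
    y_on_x = (mean_y - b_yx * mean_x, b_yx)
    x_on_y = (mean_x - b_xy * mean_y, b_xy)
    return y_on_x, x_on_y


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
(a, b_yx), (c, b_xy) = regression_lines(x, y)
print(f"y on x: y = {a:.2f} + {b_yx:.2f} x")      # y on x: y = 2.20 + 0.60 x
print(f"x on y: x = {c:.2f} + {b_xy:.2f} y")      # x on y: x = -1.00 + 1.00 y
print(f"b_yx * b_xy = {b_yx * b_xy:.4f} (= r^2)")  # b_yx * b_xy = 0.6000 (= r^2)
```

The two lines coincide only when r = +1 or -1; otherwise predicting y from x and predicting x from y give different lines.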