Lecture 7 8 Weeks Correlation and Regression
Correlation
A correlation is a relationship between two variables.
The data can be represented by the ordered pairs (x,
y) where x is the independent (or explanatory)
variable, and y is the dependent (or response)
variable.
Example:
x    1    2    3    4    5
y   –4   –2   –1    0    2
[Figure: scatter plot of the example data, with x on the horizontal axis and y on the vertical axis.]
Correlation Coefficient
The correlation coefficient is a measure of the
strength and the direction of a linear relationship
between two variables. The symbol r represents the
sample correlation coefficient. The formula for r is
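$$
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}.
$$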
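The formula for r can be checked numerically. The sketch below applies it to the five (x, y) pairs from the earlier example; the function name `correlation_coefficient` is illustrative and not part of the lecture.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r, computed from the sums in the formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Data from the example: x = 1..5, y = -4, -2, -1, 0, 2
x = [1, 2, 3, 4, 5]
y = [-4, -2, -1, 0, 2]

r = correlation_coefficient(x, y)
print(round(r, 3))  # about 0.99, a strong positive linear correlation
```

Here the sums are n = 5, Σx = 15, Σy = –5, Σxy = –1, Σx² = 55, and Σy² = 25, so r = 70 / (√50 · √100) ≈ 0.99.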
Linear Correlation
[Figure: four scatter plots of y against x.]
Strong negative correlation: r = –0.91
Strong positive correlation: r = 0.88
Weak positive correlation: r = 0.42
Nonlinear correlation: r = 0.07
Residuals
After verifying that the linear correlation between two
variables is significant, next we determine the equation
of the line that can be used to predict the value of y for
a given value of x.
[Figure: scatter plot with the regression line, marking the observed y-value, the predicted y-value, and the residual d_3 between them.]
For each data point, d_i represents the difference between
the observed y-value and the predicted y-value for a
given x-value on the line. These differences are called
residuals.
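As a rough illustration, the residuals can be computed once a line is available. This section does not state how the line's slope and intercept are obtained, so the sketch below assumes the standard least-squares formulas. It reuses the five (x, y) pairs from the correlation example, and the function names are illustrative.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares line (assumed fitting method)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """d_i = observed y-value minus predicted y-value, for each data point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


x = [1, 2, 3, 4, 5]
y = [-4, -2, -1, 0, 2]

m, b = least_squares_line(x, y)
d = residuals(x, y, m, b)
print(m, b)                       # slope is about 1.4, intercept about -5.2
print([round(di, 2) for di in d])  # residuals are small and sum to about zero
```

For a least-squares line with an intercept, the residuals always sum to zero (up to rounding), which is a quick sanity check on the fit.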
Regression equation
Example continued:
Using the equation ŷ = –4.07x + 93.97, we can predict
the test score for a student who watches 9 hours of TV.
ŷ = –4.07x + 93.97
= –4.07(9) + 93.97
= 57.34
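The same prediction can be written as a small function. This is a minimal sketch: the coefficients come from the equation above, while the name `predict_test_score` and its parameter names are illustrative.

```python
def predict_test_score(hours_tv, slope=-4.07, intercept=93.97):
    """Predicted test score y-hat for a given number of hours of TV watched."""
    return slope * hours_tv + intercept


score = predict_test_score(9)
print(round(score, 2))  # 57.34
```

A prediction like this is only meaningful for x-values within or near the range of the data used to find the line.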
Linear Correlation
[Figure: four scatter plots of y against x.]
Negative Linear Correlation: as x increases, y tends to decrease.
Positive Linear Correlation: as x increases, y tends to increase.
No Correlation
Nonlinear Correlation