Lecture 07 Regression
Example:
x     1    2    3    4    5
y    –4   –2   –1    0    2
[Figure: scatter plot of the (x, y) pairs above]
Larson & Farber, Elementary Statistics: Picturing the World, 3e
Linear Correlation
[Figure: four scatter plots, one per case below]
Negative Linear Correlation: as x increases, y tends to decrease.
Positive Linear Correlation: as x increases, y tends to increase.
No Correlation
Nonlinear Correlation
Correlation Coefficient
The correlation coefficient is a measure of the
strength and the direction of a linear relationship
between two variables. The symbol r represents the
sample correlation coefficient. The formula for r is
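\[
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}
\]
where n is the number of (x, y) data pairs.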
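The formula for r can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `correlation_coefficient` is an illustrative choice, and the data are the x and y values from the example at the start of the lecture.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r, computed from the five sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Data from the opening example of the lecture.
x = [1, 2, 3, 4, 5]
y = [-4, -2, -1, 0, 2]

r = correlation_coefficient(x, y)
print(round(r, 3))  # 0.99
```

A value this close to 1 indicates a strong positive linear correlation for these five points.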
[Figure: four scatter plots with their correlation coefficients]
r = –0.91: Strong negative correlation
r = 0.88: Strong positive correlation
r = 0.42: Weak positive correlation
r = 0.07: Nonlinear Correlation
Calculating a Correlation Coefficient
In Words                                        In Symbols
1. Find the sum of the x-values.                $\sum x$
2. Find the sum of the y-values.                $\sum y$
3. Multiply each x-value by its
   corresponding y-value and find the sum.      $\sum xy$
4. Square each x-value and find the sum.        $\sum x^2$
5. Square each y-value and find the sum.        $\sum y^2$
6. Substitute the five sums into the
   formula for r:
\[
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}
\]
Example:
\[
r = \frac{60}{\sqrt{50}\,\sqrt{74}} \approx 0.986
\]
There is a strong positive linear correlation between x
and y.
Correlation Coefficient
Example:
The following data represent the number of hours
12 different students watched television during the
weekend and the score each student earned on a
test the following Monday.
a.) Display the scatter plot.
b.) Calculate the correlation coefficient r.

Hours, x          0   1   2   3   3   5   5   5   6   7   7  10
Test score, y    96  85  82  74  95  68  76  84  58  65  75  50
Correlation Coefficient
Example continued:
[Figure: scatter plot of test score (y, axis marked 20 to 100) against hours watching TV (x, axis marked 2 to 10)]
Correlation Coefficient
Example continued:
Hours, x           0     1     2     3     3     5     5     5     6     7     7    10
Test score, y     96    85    82    74    95    68    76    84    58    65    75    50
xy                 0    85   164   222   285   340   380   420   348   455   525   500
x^2                0     1     4     9     9    25    25    25    36    49    49   100
y^2             9216  7225  6724  5476  9025  4624  5776  7056  3364  4225  5625  2500

$\sum x = 54$, $\sum y = 908$, $\sum xy = 3724$, $\sum x^2 = 332$, $\sum y^2 = 70836$
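The table and its sums can be reproduced, and the remaining substitution into the formula for r carried out, with a short script. This is a sketch for checking the arithmetic rather than part of the original slides; the variable names are illustrative.

```python
from math import sqrt

hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]
n = len(hours)

# Rows of the table: products and squares for each student.
xy = [x * y for x, y in zip(hours, scores)]
x_squared = [x * x for x in hours]
y_squared = [y * y for y in scores]

# The five sums.
sum_x, sum_y = sum(hours), sum(scores)
sum_xy, sum_x2, sum_y2 = sum(xy), sum(x_squared), sum(y_squared)
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 54 908 3724 332 70836

# Substitute the sums into the formula for r.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator
print(f"r = {r:.3f}")  # r = -0.831
```

Because r is close to –1, the data show a strong negative linear correlation: as the hours spent watching television increase, test scores tend to decrease.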
[Figure: a data point and the regression line, with the residual d_3 marked against the predicted y-value]
For each data point, d_i represents the difference between the
observed y-value and the predicted y-value for the given x-value on
the line. These differences are called residuals.
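Residuals can be computed directly once a line is chosen. The sketch below is an illustration under stated assumptions: the text above does not say which line is used, so the code assumes the least-squares line fitted to the television and test-score data from the earlier example, and the names `slope`, `intercept`, and `residuals` are illustrative.

```python
hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]
n = len(hours)

sum_x, sum_y = sum(hours), sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_x2 = sum(x * x for x in hours)

# Assumption: the line is the least-squares line y_hat = slope * x + intercept.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = sum_y / n - slope * sum_x / n
print(f"{slope:.3f} {intercept:.3f}")  # -4.067 93.970

# Residual d_i = observed y-value minus predicted y-value.
predicted = [slope * x + intercept for x in hours]
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]

for x, y, d in zip(hours, scores, residuals):
    print(f"x = {x:2d}  observed = {y:3d}  residual = {d:7.2f}")
```

A positive residual means the observed point lies above the line, and a negative residual means it lies below. For a least-squares line the residuals sum to zero, up to floating-point rounding.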