Lecture Week 12 - Intro To Regression
Lecture Week 12
Introduction to Regression
Linear Regression
How well do scores on one variable predict scores on
another?
• Are stress scores related to health? (Correlation)
• Can you predict health given stress score? (Regression)
2024-03-27
Thought experiment
• You are a 1st year student taking Intro Psych. You are
trying to figure out what grade you are likely to get on
the first exam. You know that in past years, the average
of the first exam has been 68, with a standard deviation
of 12.
• If your professor selects your name randomly from the class list
and guesses what grade you’ll get, what should she guess?
• If the professor does this for all students in the course, what amount
of error will there be in her predictions, on average? (How many
percentage points will she be off?)
• What is your best guess for the grade you’ll get?
• What information could improve your professor’s estimate?
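The thought experiment can be sketched numerically. The grades below are hypothetical, chosen only so that the class mean is 68, and "error on average" is taken here to mean root-mean-square error (an assumption). Under that reading, guessing the mean for every student gives an error equal to the standard deviation, and any other single guess does worse.

```python
import math

# Hypothetical exam grades, chosen so the class mean is 68 (not real data).
grades = [52, 60, 68, 76, 84]

def rms_error(guess, scores):
    """Root-mean-square error when the same guess is made for every score."""
    return math.sqrt(sum((y - guess) ** 2 for y in scores) / len(scores))

mean_grade = sum(grades) / len(grades)

# The standard deviation (N denominator) is the RMS error around the mean.
sd_grade = rms_error(mean_grade, grades)

print(mean_grade)                        # 68.0
print(round(sd_grade, 2))                # 11.31
print(round(rms_error(75, grades), 2))   # 13.3, larger than guessing the mean
```

With the lecture's numbers (mean 68, standard deviation 12), the same reasoning says the professor should guess 68 and will be off by about 12 percentage points.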
How do we determine the slope and intercept? We have a line
that has been fit to our data in the most accurate possible
way, but we don’t know the numbers that correspond to that
line.
[Figure: scatterplot of Y against X with a fitted line, Ŷ = bX + a, whose slope and intercept are unknown: Ŷ = ?X + ?]
Slope: $b = r \frac{S_Y}{S_X} = \frac{\mathrm{cov}_{XY}}{S_X^2}$
Intercept: $a = \bar{Y} - b\bar{X}$
The regression line (i.e., the a and b) we want is the BEST FITTING line
that minimizes prediction errors.
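As a minimal sketch of the slope and intercept formulas, the code below computes b both ways (from r and from the covariance) and then a. The stress and health scores are made up for illustration, and all variances and covariances use an N - 1 denominator (an assumption; any consistent choice gives the same slope).

```python
import math

# Hypothetical stress (X) and health (Y) scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [9, 7, 8, 4, 2]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Covariance and standard deviations, all with the same N - 1 denominator.
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
r = cov_xy / (s_x * s_y)

b_from_r = r * s_y / s_x         # slope as r * (S_Y / S_X)
b_from_cov = cov_xy / s_x ** 2   # slope as cov_XY / S_X^2
a = mean_y - b_from_cov * mean_x # intercept as mean(Y) - b * mean(X)

print(round(b_from_r, 3), round(b_from_cov, 3), round(a, 3))  # -1.7 -1.7 11.1
```

For these made-up scores the fitted line is Ŷ = -1.7X + 11.1, and both slope formulas agree.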
What would your regression equation look like if there were no correlation
between X and Y?
$b = r \frac{S_Y}{S_X} = \frac{\mathrm{cov}_{XY}}{S_X^2} = 0$
$a = \bar{Y} - b\bar{X} = \bar{Y}$
$\hat{Y} = bX + a = 0 \cdot X + \bar{Y} = \bar{Y}$
When r = 0, the best prediction for Y (for any value of X) is the mean of Y.
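The r = 0 case can be checked with a small example. The data below are hypothetical, constructed so that the covariance of X and Y is exactly zero; the slope then comes out as 0 and every prediction equals the mean of Y.

```python
# Hypothetical data constructed so that X and Y have zero covariance.
x = [1, 2, 3, 4, 5]
y = [1, 4, 5, 4, 1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)

b = cov_xy / var_x        # slope: 0 because the covariance is 0
a = mean_y - b * mean_x   # intercept: reduces to the mean of Y
predictions = [b * xi + a for xi in x]

print(b, a)          # 0.0 3.0
print(predictions)   # [3.0, 3.0, 3.0, 3.0, 3.0]
```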
[Figure: scatterplot of Y against X with r = 0]
When r = 0, the best prediction for Y (for any value of X) is the mean of Y.
Our typical prediction error will then be S_Y, the standard deviation of Y.
When r = 1 (or r = −1), all observations fall exactly on the regression line,
so there is no error in our prediction.
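The two extremes can be compared numerically. This is a sketch on hypothetical data: `fit_line`, `rms_prediction_error`, and `sd` are illustrative helper names, and the standard deviation is computed with an N denominator (an assumption) so that it is directly comparable to the root-mean-square prediction error.

```python
import math

def fit_line(x, y):
    """Return (b, a) for the least-squares line Y-hat = b*X + a."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum_xy / sum_xx  # same as cov_XY / S_X^2; the denominators cancel
    return b, mean_y - b * mean_x

def rms_prediction_error(x, y):
    """Root-mean-square distance between Y and the fitted line."""
    b, a = fit_line(x, y)
    return math.sqrt(sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y)) / len(y))

def sd(values):
    """Standard deviation with an N denominator (assumed for comparability)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

x = [1, 2, 3, 4, 5]
y_uncorrelated = [1, 4, 5, 4, 1]   # constructed so that r = 0
y_perfect = [3, 5, 7, 9, 11]       # Y = 2X + 1, so r = 1

# With r = 0 the prediction error matches the standard deviation of Y.
print(rms_prediction_error(x, y_uncorrelated), sd(y_uncorrelated))
# With r = 1 every point is on the line, so the error is zero.
print(rms_prediction_error(x, y_perfect))  # 0.0
```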
[Figure: scatterplot with the fitted regression line; the squared prediction error for a point is $(Y - \hat{Y})^2$]
• Least Squares Criterion: The sum of squared
deviations between Y and the regression line is
less than between Y and any other line. When
we meet this criterion, we obtain the line of
best fit.
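The least squares criterion can be illustrated by comparing the sum of squared deviations for the fitted line against a few nearby lines. The data are hypothetical and `sse` is an illustrative helper name; the comparison covers only a handful of alternative lines, so it demonstrates the criterion rather than proving it.

```python
# Hypothetical scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [9, 7, 8, 4, 2]

def sse(b, a, x, y):
    """Sum of squared deviations between Y and the line Y-hat = b*X + a."""
    return sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope and intercept.
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x

best = sse(b, a, x, y)

# Nudge the slope and/or intercept away from the fitted values.
others = [
    sse(b + db, a + da, x, y)
    for db in (-0.5, 0.0, 0.5)
    for da in (-1.0, 0.0, 1.0)
    if (db, da) != (0.0, 0.0)
]

print(round(best, 2))                 # 5.1
print(all(o > best for o in others))  # True
```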
$b = r \frac{S_Y}{S_X} = \frac{\mathrm{cov}_{XY}}{S_X^2}$
$a = \bar{Y} - b\bar{X}$