Lecture Week 12 - Intro To Regression

stats for psychology part 2

Uploaded by

saradump16

2024-03-27

Lecture Week 12

Introduction to Regression

Correlation and regression


• Descriptive and inferential statistical procedures for
when the analysis involves paired outcome measures
• Correlation: degree of relation between two variables
• Regression: prediction of a score on Y from knowledge of a
score on X
• Examples
• Scores on 2 exams
• Is there a relation between the grades on the 2 midterms?
• Does score on midterm 1 predict score on midterm 2?

Linear Regression
How well do scores on one variable predict scores on
another?
• Are stress scores related to health? (Correlation)
• Can you predict health given stress score? (Regression)

• We use a linear equation to predict scores on one variable
from scores on another
• Regression only works for linear relations

• We estimate how accurate the predictions are likely to be


Thought experiment
• You are a 1st year student taking Intro Psych. You are
trying to figure out what grade you are likely to get on
the first exam. You know that in past years, the average
of the first exam has been 68, with a standard deviation
of 12.
• If your professor selects your name randomly from the class list
and guesses what grade you’ll get, what should she guess?
• If the professor does this for all students in the course, what amount
of error will there be in her predictions, on average? (How many
percentage points will she be off?)
• What is your best guess for the grade you’ll get?
• What information could improve your professor’s estimate?

Knowing that our data rarely fall perfectly along a straight
line, how do we determine the slope and intercept? We have a
line that has been fit to our data in the most accurate
possible way, but we don’t know the numbers that correspond
to that line.

[Figure: scatterplot of Y against X with a fitted line,
Ŷ = bX + a, whose slope and intercept are still unknown:
Ŷ = ?X + ?]

We compute the regression coefficients from the correlation (r) and our data

Slope: $b = r \frac{S_Y}{S_X} = \frac{\mathrm{cov}_{XY}}{S_X^2}$
Intercept: $a = \bar{Y} - b\bar{X}$

The regression line (i.e., the a and b) we want is the BEST FITTING line
that minimizes prediction errors.
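
As a minimal sketch, the slope and intercept formulas can be computed directly from paired scores. The function name `fit_line` and the midterm scores are made up for illustration, and the sketch assumes sample statistics with n − 1 in the denominator.

```python
import numpy as np

def fit_line(x, y):
    """Return (b, a) for the line Y-hat = b*X + a."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]           # Pearson correlation
    s_x = x.std(ddof=1)                   # sample SD of X
    s_y = y.std(ddof=1)                   # sample SD of Y
    cov_xy = np.cov(x, y, ddof=1)[0, 1]   # sample covariance of X and Y
    b_from_r = r * s_y / s_x              # slope, first form
    b_from_cov = cov_xy / s_x**2          # slope, second form
    assert np.isclose(b_from_r, b_from_cov)  # the two forms agree
    a = y.mean() - b_from_r * x.mean()    # intercept
    return b_from_r, a

# Hypothetical scores on two midterms (not from the lecture)
midterm1 = [55, 62, 68, 71, 80, 90]
midterm2 = [58, 60, 72, 70, 78, 88]
b, a = fit_line(midterm1, midterm2)
print(f"Y-hat = {b:.3f} * X + {a:.3f}")
```

Either slope formula can be used; the internal check only confirms that they give the same value on the same data.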


What would your regression equation look like if there were no correlation
between X and Y?

$b = r \frac{S_Y}{S_X} = \frac{\mathrm{cov}_{XY}}{S_X^2} = 0$
$a = \bar{Y} - b\bar{X} = \bar{Y}$

$\hat{Y} = bX + a = 0 \cdot X + \bar{Y} = \bar{Y}$
When r = 0, the best prediction for Y (for any value of X) is the mean of Y.
[Figure: scatterplot of Y against X showing no relation (r = 0)]

When r = 0, predicting the mean of Y for everyone gives a typical prediction
error of S_Y, the standard deviation of Y.

When r = 1 or r = −1, all observations fall exactly on the regression line, so
there is no error in our prediction.

As the size of r increases from 0 to 1 (in either direction), the amount of error
in the prediction decreases, since the average spread of the observations about
the regression line decreases.
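
The link between r and prediction error can be checked numerically. The sketch below uses made-up stress and health scores and a hypothetical helper `rms_error`. It also relies on the identity that the error about the regression line equals S_Y·sqrt(1 − r²); this is a standard result that the slides do not state, and it holds here because the same n − 1 denominator is used throughout.

```python
import numpy as np

def rms_error(y, y_hat):
    """Root-mean-square prediction error, using n - 1 to match the sample SD."""
    y = np.asarray(y, dtype=float)
    return np.sqrt(np.sum((y - y_hat) ** 2) / (len(y) - 1))

# Hypothetical stress (X) and health (Y) scores, made up for illustration
x = np.array([2, 4, 5, 7, 8, 10], dtype=float)
y = np.array([9, 7, 8, 5, 4, 3], dtype=float)

r = np.corrcoef(x, y)[0, 1]
s_y = y.std(ddof=1)
b = r * s_y / x.std(ddof=1)
a = y.mean() - b * x.mean()

# Ignore X entirely and always guess the mean of Y
error_mean_only = rms_error(y, np.full_like(y, y.mean()))
# Use the regression line to predict Y from X
error_regression = rms_error(y, b * x + a)

print(error_mean_only, s_y)                        # these two values agree
print(error_regression, s_y * np.sqrt(1 - r**2))   # and so do these two
```

Guessing the mean corresponds to the r = 0 case. The stronger the correlation, the smaller the second error becomes relative to the first.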

How do we find the best fit?


• We have our observed values of Y (denoted Yi): those we
actually measured
• We have our predicted values of Y (denoted Ŷ): those that
fall on our regression line
• The line of best fit will minimize the deviations, or
errors, between each observed and predicted value (each Y
and Ŷ)
• We call these errors of prediction “residuals”


How do we find the best fit?


• As with deviations around a mean, the deviations around
the line sum to zero
• The deviations of the points above the line and of the
points below it cancel out exactly, since the line passes
through the point (mean of X, mean of Y)
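
A short numerical check of these two properties, using made-up data. Here `np.polyfit` with `deg=1` stands in for the hand formulas for b and a; it returns the least-squares slope and intercept.

```python
import numpy as np

# Hypothetical paired scores, made up for illustration
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([2, 1, 4, 3, 7, 5, 9], dtype=float)

b, a = np.polyfit(x, y, deg=1)   # least-squares slope and intercept
residuals = y - (b * x + a)      # observed minus predicted

print(residuals.sum())                         # zero, up to rounding error
print(np.isclose(b * x.mean() + a, y.mean()))  # line passes through the means
```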


What do we mean by the best fit?


• The line that minimizes the sum of the squared
errors:

$\sum (Y - \hat{Y})^2$
• Least Squares Criterion: The sum of squared
deviations between Y and the regression line is
less than between Y and any other line. When
we meet this criterion, we obtain the line of
best fit.
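
The least squares criterion can be illustrated by comparing the fitted line with nearby alternatives. In this sketch the data, the helper name `sse`, and the sizes of the nudges are all arbitrary choices for illustration.

```python
import numpy as np

def sse(x, y, b, a):
    """Sum of squared errors between observed Y and the line b*X + a."""
    return float(np.sum((y - (b * x + a)) ** 2))

# Hypothetical paired scores, made up for illustration
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([2, 1, 4, 3, 7, 5, 9], dtype=float)

r = np.corrcoef(x, y)[0, 1]
b = r * y.std(ddof=1) / x.std(ddof=1)   # slope from the lecture formula
a = y.mean() - b * x.mean()             # intercept from the lecture formula
best = sse(x, y, b, a)

# Nudge the slope and intercept: every alternative line does worse
for db in (-0.5, 0.0, 0.5):
    for da in (-1.0, 0.0, 1.0):
        if db == 0.0 and da == 0.0:
            continue                    # skip the fitted line itself
        assert sse(x, y, b + db, a + da) > best

print(f"SSE of fitted line: {best:.3f}")
```

Trying a handful of alternative lines is only a spot check, not a proof, but no choice of slope and intercept will produce a smaller sum of squared errors.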


How do we know we have the line of best fit?

• The equations we use for calculating a and b
(the regression coefficients) lead to the line
that minimizes the squared errors

$b = r \frac{S_Y}{S_X} = \frac{\mathrm{cov}_{XY}}{S_X^2}$
$a = \bar{Y} - b\bar{X}$


12.1

In a study of alcoholics, the correlation between blood alcohol
concentration (BAC) and the score on a 50-item test of recent
memory is r = -.83. The mean BAC is .01% with s = .004, and
the mean number of items correctly answered on the memory
test is 35, with s = 6. What is the predicted score for an
individual with BAC = .015%?
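
One way to work this problem is to plug the given values into the slope and intercept formulas. The sketch below treats BAC as X and the memory score as Y; the variable names are chosen for illustration.

```python
r = -0.83                    # correlation between BAC and memory score
mean_x, s_x = 0.01, 0.004    # BAC (%): mean and standard deviation
mean_y, s_y = 35.0, 6.0      # memory score: mean and standard deviation

b = r * s_y / s_x            # slope: -1245 items per percentage point of BAC
a = mean_y - b * mean_x      # intercept: 47.45
y_hat = b * 0.015 + a        # predicted memory score at BAC = .015%

print(f"{y_hat:.1f}")        # about 28.8 items correct
```

The prediction falls below the mean of 35 because this BAC is above average and the correlation is negative.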
