Lecture Week 13 - Regression

2024-04-05

Least squares criterion: The best regression line is the one that produces
the smallest sum of squared errors of prediction.

∑e = 0   and   ∑e² = minimum

The regression line provides the best prediction of Y values from known X
values, but it is rare that the observed Y is exactly as predicted (this only
happens when r = ±1).

Ŷ = Y only when r = ±1

The error of the estimate

[Figure: scatterplot with a regression line; the vertical distances from each point to the line are the errors of estimation]

• The error of the estimate is a measure of how well our regression line fits our data.
• To calculate it:
  • Calculate the predicted value (ŷ) for a certain x.
  • Subtract that from the observed value (y) for that x.

Review: Standard Deviation

• Remember,

  s_y = √[ ∑(y − ȳ)² / (N − 1) ]

• If we know the x score we're interested in, we can calculate its corresponding predicted y score. This is the case in regression.
• Then, the best estimate of Y is Ŷ (y-hat), not Ȳ (y-bar).

The standard error of the estimate

[Figure: scatterplot showing the spread of all the errors of the estimate about the regression line]

• Using the same logic that we used for calculating s_y, we can calculate a measure of the average deviations about the regression line, called the standard error of the estimate, S(Y−Ŷ).

• Definitional formula:

  S(Y−Ŷ) = √[ ∑(Y − Ŷ)² / (n − 2) ]

• This gives a measure of the standard deviation of points (or errors) about the regression line – the standard deviation of y predicted from x.

When r = 0

S(Y−Ŷ) = S_Y (for large n), since Ŷ = Ȳ

• Example: Ŷ = .01X + .533, r = 0, Ȳ = 0.6; here S(Y−Ŷ) = 4.56 and S_Y = 4.30.

[Figure: scatterplot with an essentially flat regression line; the errors about the line are as large as the errors about the mean]

2024-04-05

The standard error of the estimate: example

S(Y−Ŷ) = 2.01 for the regression line Ŷ = 0.468X + 4.08

[Figure: scatterplot of Y against X with the regression line, plus dashed lines at Ŷ + S(Y−Ŷ) and Ŷ − S(Y−Ŷ)]

• Assumption of homoscedasticity: the width of the y distribution does not depend on the value of X. In this example, the dashed lines are parallel.

Simple regression
• Predicting Y (criterion variable) from X (predictor variable)

Multiple regression
• Predicting Y (criterion) from several Xs (predictors)

• For both, your criterion should be at least ordinal, preferably interval or ratio.
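To make the definitional formula concrete, here is a minimal Python sketch (the data are made up for illustration): it fits a least-squares line and then computes the standard error of the estimate with the n − 2 denominator.

```python
import numpy as np

# Hypothetical (x, y) sample -- any paired data would do.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([4.1, 5.6, 5.0, 6.3, 6.1, 7.2, 7.0, 8.1, 8.0, 9.2])

# Least-squares fit: np.polyfit returns [slope, intercept] for deg=1.
b, a = np.polyfit(x, y, deg=1)
y_hat = b * x + a

# Standard error of the estimate: sqrt( sum((y - y_hat)^2) / (n - 2) ).
n = len(y)
s_est = np.sqrt(np.sum((y - y_hat) ** 2) / (n - 2))
print(f"Y-hat = {b:.3f}X + {a:.3f};  S(Y - Y-hat) = {s_est:.3f}")
```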

Using multiple regression

• MR requires a set of IVs and a DV that are inter-correlated to various degrees.
• MR can be really useful when you want to test a complex, real-world measure that would be hard to control or simplify to the standards we'd require in a lab experiment.
• But remember – this lack of control means we can infer correlation only, not causation!

Various uses of MR

• Can you predict a criterion variable from several predictors?
  • ANOVA summary table – significance test on F
  • R² – variance in Y accounted for by the set of predictors
• Does each predictor variable contribute to the prediction of the criterion? Does adding another variable improve the prediction?
  • Test whether each regression coefficient is significantly different from zero (given its standard error): a t-test for each regression coefficient.
• Which predictor variable contributes most to the prediction of Y?
  • Evaluated by comparing the βs (standardized bs).
• Can you predict future performance? Can you generalize your regression equation to other data?
  • e.g., based on his GPA, his score on the MCAT, and his volunteer experience, predict John's grades in 1st-year med school.

The regression equation

Simple regression: ŷ = bx + a
  b is the slope, a is the y-intercept.

Multiple regression: ŷ = b₀ + b₁x₁ + b₂x₂ + b₃x₃ + … + b_k x_k
  b₀ is the y-intercept; b₁, b₂, …, b_k are the regression coefficients for each predictor; k is the number of predictor variables.

Still attempting to minimize ∑(y − ŷ)²

Assumptions of MR

• y is normally distributed
  • Check skew/SE_skew < 3 (SPSS descriptives)
  • Check for outliers
• Best if the x variables are normally distributed (though not necessary – they can be dichotomous)
• Linear relationship between y and each x variable
• N is "large enough"
  • Rules of thumb (although these depend on alpha, strength of correlation, etc.):
    • To test the overall model: N ≥ 50 + 8m (m is the number of predictors)
    • To test individual predictors: N ≥ 104 + m
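A sketch of the multiple-regression equation in Python (all numbers hypothetical): `np.linalg.lstsq` finds the bs that minimize ∑(y − ŷ)², exactly the least-squares criterion above.

```python
import numpy as np

# Hypothetical data: 6 cases, 3 predictors (x1, x2, x3) and a criterion y.
X = np.array([[3.2, 610.0, 4.0],
              [2.8, 580.0, 3.0],
              [3.9, 700.0, 5.0],
              [3.1, 640.0, 2.0],
              [3.5, 660.0, 4.0],
              [2.5, 550.0, 1.0]])
y = np.array([3.4, 2.9, 3.9, 3.0, 3.6, 2.4])

# A leading column of 1s lets lstsq estimate b0 (the y-intercept) too.
X1 = np.column_stack([np.ones(len(y)), X])

# Least squares: the b's that minimize sum((y - y_hat)^2).
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ b
print("b0, b1, b2, b3:", np.round(b, 4))
print("sum of squared errors:", round(float(np.sum((y - y_hat) ** 2)), 4))
```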

Assumptions of MR (continued)

• No multivariate outliers
  • Tested by looking at correlations between Y and each X, and via Mahalanobis distance, Cook's distance, or leverage.
  • Cook's < 1: good.
• No multicollinearity
  • Two IVs are highly correlated (> 0.90), e.g., scores on two different IQ tests.
• No singularity
  • One IV is a combination of several other IVs, e.g., the overall score on an IQ test is singular with the subtest scores on that IQ test.
• Multicollinearity and singularity are both issues of redundancy – you don't want to include the same information twice in your regression model.
• Homoscedasticity
  • The variance of the residuals (y − ŷ) is similar across the range of scores.
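One common way to screen for the redundancy problems above is the tolerance/VIF diagnostic that SPSS reports alongside the coefficients. A minimal sketch using statsmodels; the variables here are invented and deliberately redundant:

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Invented predictors: two nearly identical IQ tests plus an unrelated IV.
rng = np.random.default_rng(0)
iq_test_a = rng.normal(100, 15, size=50)
iq_test_b = iq_test_a + rng.normal(0, 3, size=50)   # close to a copy of test A
motivation = rng.normal(50, 10, size=50)

# Design matrix with a constant column (index 0), as VIF expects.
X = np.column_stack([np.ones(50), iq_test_a, iq_test_b, motivation])

# VIF = 1 / tolerance; a high VIF (low tolerance) flags a redundant IV.
for i, name in enumerate(["iq_test_a", "iq_test_b", "motivation"], start=1):
    print(name, round(variance_inflation_factor(X, i), 2))
```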

Choosing predictors

• Best if each IV (x) is strongly correlated with the DV (y) but uncorrelated with the other IVs.
• Identify the lowest number of IVs that will predict the DV.
• Each IV then explains only an independent component of variance in the DV.
• Which IVs are in the model will influence how useful each IV is in the model – shared variance is key in MR.

Calculations

• How do we calculate all those b's?
• It's complicated to do by hand – use a computer!
• But some concepts are familiar.
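"Use a computer" in practice: the same analysis SPSS runs can be reproduced with, for example, statsmodels in Python. A minimal sketch with invented data:

```python
import numpy as np
import statsmodels.api as sm

# Invented data: 30 cases, 3 predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=30)

# add_constant prepends the intercept column; OLS fits all the b's at once.
results = sm.OLS(y, sm.add_constant(X)).fit()
print(results.params)     # b0 (intercept) and b1..b3
print(results.summary())  # R-squared, overall F test, and a t test per coefficient
```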

SS_regression: ∑(ŷ − ȳ)²

• The portion of the variance in Y that is explained by all the predictors – the portion of variance in y that is related to changes in x.
• This is the difference between our predicted y values and the mean of y, since ȳ is the best prediction of any y value in the absence of any useful IVs.

Test F for significance

F = MS_regression / MS_residual

• The numerator is the signal we're interested in evaluating: variability in y that is explained by x.
• The denominator is the noise we're not interested in: variability in y that is not explained by x.
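The same signal-to-noise ratio computed by hand in Python (helper function and data are my own illustration, not from the lecture):

```python
import numpy as np
from scipy.stats import f as f_dist

def overall_f_test(X, y):
    """Overall F for a least-squares model; X has one column per predictor."""
    n, m = X.shape
    X1 = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    y_hat = X1 @ b

    ss_reg = np.sum((y_hat - y.mean()) ** 2)  # signal: explained variability
    ss_res = np.sum((y - y_hat) ** 2)         # noise: unexplained variability
    ms_reg = ss_reg / m                       # df_regression = m
    ms_res = ss_res / (n - m - 1)             # df_residual = n - m - 1
    F = ms_reg / ms_res
    return F, f_dist.sf(F, m, n - m - 1)      # upper-tail p value

# Made-up data: 20 cases, 3 predictors.
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))
y = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.5, size=20)
print(overall_f_test(X, y))
```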

Using SPSS

We'll see if we can predict Performance on Comprehensive Exams from three predictors.

[Slides: SPSS screenshots showing the data file and the regression analysis dialogs]

Significance test for the whole model

• The F test
• Tests the null hypothesis that the multiple R = 0
  • i.e., that the correlation between the DV and all the IVs = 0

F = MS_regression / MS_residual

Significance tests for coefficients

• t test
• Only testing whether that IV adds unique variance to R²

Output

Model Summary (predictors: (Constant), Grades in grad school, Professional motivation, Composite of qualifications for grad school; DV: Performance on comprehensive exams)
  R = .838, R Square = .702, Adjusted R Square = .256, Std. Error of the Estimate = 3.896

• R Square is how much variance in Comps Performance is explained by our 3 predictors.
• The Std. Error of the Estimate is the standard deviation of the residuals (the difference between predicted Y and observed Y scores).

ANOVA (DV: Performance on comprehensive exams)
  Regression: SS = 71.640,  df = 3, MS = 23.880, F = 1.573, Sig. = .411
  Residual:   SS = 30.360,  df = 2, MS = 15.180
  Total:      SS = 102.000, df = 5

Coefficients (DV: Performance on comprehensive exams)
  (Constant):              b = −4.722, SE = 9.066,              t = −.521, Sig. = .654   ← the intercept
  Professional motivation: b = .658,  SE = .872,  Beta = .319,  t = .755,  Sig. = .529,  Tolerance = .832, VIF = 1.203
  Composite of qualifications for grad school:
                           b = .272,  SE = .589,  Beta = .291,  t = .462,  Sig. = .690,  Tolerance = .374, VIF = 2.671
  Grades in grad school:   b = .416,  SE = .646,  Beta = .402,  t = .644,  Sig. = .586,  Tolerance = .381, VIF = 2.622

Which X is contributing the most to the prediction of Y?

• We can't interpret the relative size of the bs, because each is relative to its variable's scale.
• But the βs (standardized bs) can be interpreted: a bigger β means more contribution to the prediction.

ŷ = b₀ + b₁x₁ + b₂x₂ + b₃x₃

z_ŷ = β₁·z_x₁ + β₂·z_x₂ + β₃·z_x₃
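A sketch of where the βs come from: refit the model after z-scoring every variable, and the slopes that come out are the standardized coefficients. The data are invented, with the three predictors deliberately put on wildly different raw scales:

```python
import numpy as np

def standardized_betas(X, y):
    """Slopes from a regression on z-scored variables = the betas."""
    zX = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    Z1 = np.column_stack([np.ones(len(zy)), zX])
    b, *_ = np.linalg.lstsq(Z1, zy, rcond=None)
    return b[1:]  # drop the intercept, which is ~0 for z-scores

# Invented data: three predictors on very different raw scales.
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3)) * np.array([1.0, 100.0, 0.01])
y = 0.4 * X[:, 0] + 0.002 * X[:, 1] + 20.0 * X[:, 2] + rng.normal(size=30)

# Unlike the raw bs, these values are directly comparable across predictors.
print(np.round(standardized_betas(X, y), 3))
```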

Another example

• A university admissions officer is trying to predict the future grades of applicants.
• Use three variables (High School GPA, SAT, and Quality of letters of recommendation) to predict University GPA.
• The regression equation for these data is:

  ŷ = 0.3764·X₁ + 0.0012·X₂ + 0.0227·X₃ − 0.1533
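Using that equation for an actual prediction is just arithmetic; the applicant's scores below are made up:

```python
# X1 = High School GPA, X2 = SAT, X3 = letters-of-recommendation rating.
hs_gpa, sat, letters = 3.5, 1200.0, 8.0  # hypothetical applicant

predicted_gpa = 0.3764 * hs_gpa + 0.0012 * sat + 0.0227 * letters - 0.1533
print(round(predicted_gpa, 2))  # predicted University GPA: about 2.79
```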

Correlations (N = 100; N = 99 for pairs involving Quality of letters of recommendation)
  University GPA & Highschool GPA:                        r = .264**  (p = .008)
  University GPA & SAT:                                   r = −.419** (p < .001)
  University GPA & Quality of letters of recommendation:  r = .350**  (p < .001)
  Highschool GPA & SAT:                                   r = .269**  (p = .007)
  Highschool GPA & Quality of letters of recommendation:  r = .627**  (p < .001)
  SAT & Quality of letters of recommendation:             r = .218*   (p = .030)
  **. Correlation is significant at the 0.01 level (2-tailed). *. Correlation is significant at the 0.05 level (2-tailed).

• These are the correlations between each predictor and Y (and among the predictors). Not typically the focus of MR.

Regression analysis

Model Summary (predictors: (Constant), Quality of letters of recommendation, SAT, Highschool GPA)
  R = .634, R Square = .402, Adjusted R Square = .384, Std. Error of the Estimate = 84.960

• R Square is how much variance in University GPA is explained by our 3 predictors.
• R will always be positive (the bs tell us about the direction of the relationship between Y and each X).
• The Std. Error of the Estimate is the standard deviation of the residuals (the difference between predicted Y and observed Y scores).

ANOVA (DV: University GPA; predictors: (Constant), Quality of letters of recommendation, SAT, Highschool GPA)
  Regression: SS = 461685.1, df = 3,  MS = 153895.031, F = 21.321, Sig. = .000
  Residual:   SS = 685722.1, df = 95, MS = 7218.127
  Total:      SS = 1147407,  df = 98

• MS_residual (MS_error) is a measure of the difference between observed and predicted scores. The square root of this = SE_estimate.

Coefficients (DV: University GPA)
  (Constant):      b = 72.695, SE = 43.203,               t = 1.683,  Sig. = .096
  Highschool GPA:  b = 27.901, SE = 14.944, Beta = .193,  t = 1.867,  Sig. = .065
  SAT:             b = −.247,  SE = .037,   Beta = −.547, t = −6.628, Sig. = .000
  Quality of letters of recommendation:
                   b = 25.097, SE = 7.343,  Beta = .349,  t = 3.418,  Sig. = .001

• The b for Highschool GPA is a measure of the relationship between University GPA and Highschool GPA when SAT and Letters of Rec. are held constant –
  • aka a measure of this relationship with the effects of the other two variables removed from the relationship;
  • aka the average of the bs between HS-GPA and U-GPA at each level of SAT and Letters scores.
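The "held constant" interpretation can be made concrete: the multiple-regression b for one predictor equals the simple slope you get after removing the other predictor from both that predictor and Y. A sketch with invented data:

```python
import numpy as np

def slope(x, y):
    """Simple least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]

def remove(x, z):
    """Residuals of x after taking out what z predicts (z 'held constant')."""
    b, a = np.polyfit(z, x, 1)
    return x - (b * z + a)

# Invented, correlated predictors.
rng = np.random.default_rng(4)
x2 = rng.normal(size=200)
x1 = 0.6 * x2 + rng.normal(size=200)
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=200)

# b for x1 from the multiple regression...
X1 = np.column_stack([np.ones(200), x1, x2])
b_multiple = np.linalg.lstsq(X1, y, rcond=None)[0][1]

# ...matches the simple slope once x2 is removed from both x1 and y.
b_partial = slope(remove(x1, x2), remove(y, x2))
print(round(b_multiple, 4), round(b_partial, 4))  # the two values agree
```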

Different types of running MR analysis

• Standard (SPSS: Enter method)
  • All IVs enter the model at once.
  • Each IV is evaluated in terms of what it adds to the prediction of Y that is different from the prediction from the other IVs.
• Sequential (Hierarchical)
  • IVs are entered in the order you specify.
  • Each IV is evaluated in terms of what it adds to the prediction of Y at the point it's entered.
• Stepwise (Statistical)
  • Controversial – based only on the data.
  • Forward, backward, stepwise.

Output: predicting University GPA from Highschool GPA alone

Model Summary (predictors: (Constant), Highschool GPA)
  R = .264, R Square = .070, Adjusted R Square = .060, Std. Error of the Estimate = 104.885

ANOVA (DV: University GPA)
  Regression: SS = 81027.002, df = 1,  MS = 81027.002, F = 7.365, Sig. = .008
  Residual:   SS = 1078088,   df = 98, MS = 11000.902
  Total:      SS = 1159115,   df = 99

Coefficients (DV: University GPA)
  (Constant):     b = −66.291, SE = 30.972,              t = −2.140, Sig. = .035
  Highschool GPA: b = 38.236,  SE = 14.089, Beta = .264, t = 2.714,  Sig. = .008

Ask yourself:
• Is the model significant?
• Does HS-GPA contribute significantly to our prediction of U-GPA?
• How much variance in U-GPA is explained by HS-GPA?

Output: adding SAT

Model Summary (predictors: (Constant), SAT, Highschool GPA)
  R = .573, R Square = .329, Adjusted R Square = .315, Std. Error of the Estimate = 89.552

ANOVA (DV: University GPA)
  Regression: SS = 381219.9, df = 2,  MS = 190609.931, F = 23.768, Sig. = .000
  Residual:   SS = 777895.5, df = 97, MS = 8019.542
  Total:      SS = 1159115,  df = 99

Coefficients (DV: University GPA)
  (Constant):     b = 130.701, SE = 41.665,               t = 3.137,  Sig. = .002
  Highschool GPA: b = 58.762,  SE = 12.488, Beta = .406,  t = 4.705,  Sig. = .000
  SAT:            b = −.238,   SE = .039,   Beta = −.528, t = −6.118, Sig. = .000

Ask yourself:
• Is the model significant?
• Does HS-GPA contribute significantly to our prediction of U-GPA? What about SAT?
• How much variance in U-GPA is explained by HS-GPA and SAT together?

What variables go in the regression equation?

• Generally, the ones that contribute significantly to your model.
• It is possible to have a significant R (model) and some variables that are not contributing significantly to it.
  • These probably should be removed from the equation, as they are not adding any appreciable explanation,
  • but there may be theoretical reasons for leaving them in.

Correlated predictors

• Remember, we want to have predictors that are highly correlated with the criterion but not with one another.
• The contribution of each variable to the regression model really represents how much variance it explains in the model that can't be explained by the other variables in the model.
• If predictors are highly correlated with one another, knowing something about one tells you something about the other,
  • so variance in Y explained by one predictor can also be explained by another predictor.

[Venn diagram: IV1, IV2, and the DV as overlapping circles. Region a is variance shared only between IV1 and the DV; region c only between IV2 and the DV; region b is shared by both IVs and the DV.]

• On its own, IV1 would explain a + b variance in Y; on its own, IV2 would explain c + b variance in Y.
• With both variables in the model, the importance of each will decrease.
• Further, the apparent importance of each will depend on which was entered first.

Evaluating adding predictors

Model Summary (predictors: (Constant), Quality of letters of recommendation, SAT, Highschool GPA)
  R = .632, R Square = .400, Adjusted R Square = .381, Std. Error of the Estimate = .593

ANOVA (DV: University GPA)
  Regression: SS = 22.211, df = 3,  MS = 7.404, F = 21.085, Sig. = .000
  Residual:   SS = 33.358, df = 95, MS = .351
  Total:      SS = 55.569, df = 98

Coefficients (DV: University GPA)
  (Constant):     b = −.153, SE = .325,              t = −.472, Sig. = .638
  Highschool GPA: b = .376,  SE = .115, Beta = .363, t = 3.277, Sig. = .001
  SAT:            b = .001,  SE = .000, Beta = .356, t = 4.023, Sig. = .000
  Quality of letters of recommendation:
                  b = .023,  SE = .051, Beta = .045, t = .443,  Sig. = .659

• We can look at the p values to see if a given variable contributes significantly to the model.

Evaluating adding predictors: R² change

Variables Entered/Removed (DV: University GPA; all requested variables entered)
  Model 1: Highschool GPA (Enter)
  Model 2: SAT (Enter)
  Model 3: Quality of letters of recommendation (Enter)

Model Summary (with change statistics)
  Model 1: R = .545, R Square = .297, Adj. R Square = .290, SE of the Estimate = .634; R Square Change = .297, F Change = 41.042, df1 = 1, df2 = 97, Sig. F Change = .000
  Model 2: R = .631, R Square = .398, Adj. R Square = .386, SE of the Estimate = .590; R Square Change = .101, F Change = 16.142, df1 = 1, df2 = 96, Sig. F Change = .000
  Model 3: R = .632, R Square = .400, Adj. R Square = .381, SE of the Estimate = .593; R Square Change = .001, F Change = .196,  df1 = 1, df2 = 95, Sig. F Change = .659
  (Model 1 predictors: (Constant), Highschool GPA; Model 2: + SAT; Model 3: + Quality of letters of recommendation)

• Look also at SE_est!

Cheating…

• When doing MR, we want to think about the theory and what variables make sense to include in the analysis.
• We don't want to just try entering variables in all different orders and see which model is best.
• That being said, SPSS provides several ways to do just that.
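The R² change statistics can be reproduced by fitting the nested models and applying the F-change formula, F = (ΔR² / df₁) / ((1 − R²_full) / df₂). A sketch with invented data (helper functions are my own, not SPSS output):

```python
import numpy as np
from scipy.stats import f as f_dist

def r_squared(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ b
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def f_change(X_small, X_full, y):
    """Test the R-squared gained by the extra predictors in X_full."""
    r2_small, r2_full = r_squared(X_small, y), r_squared(X_full, y)
    df1 = X_full.shape[1] - X_small.shape[1]  # number of predictors added
    df2 = len(y) - X_full.shape[1] - 1        # residual df of the full model
    F = ((r2_full - r2_small) / df1) / ((1.0 - r2_full) / df2)
    return r2_full - r2_small, F, f_dist.sf(F, df1, df2)

# Invented data: is the second predictor worth adding?
rng = np.random.default_rng(5)
X = rng.normal(size=(99, 2))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=99)
print(f_change(X[:, :1], X, y))  # (R Square Change, F Change, Sig. F Change)
```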

Simultaneous vs. Hierarchical vs. Stepwise Regression

• Simultaneous (SPSS: Enter)
  • Put all the predictors in at once and see whether they contribute to the model.
• Hierarchical
  • Enter the predictors into the model in a specific order that is based on previous work or theory.
  • Should only be used if you have good reason to believe one variable is more important than another.
  • The contribution of each variable to the model is assessed at the point it is entered.
• Stepwise (SPSS: Forward, Backward, Stepwise)
  • The order of entry is decided by the data alone; if a variable doesn't add to the model, it's removed.

Avoid stepwise (statistical) methods!

They are not grounded in good theory – they look only at the numbers. This is not the way we should do science or statistics.

What is b?

• A measure of the relationship between the predictor and the criterion, controlling for all other predictors in the model:
  • the relationship between X₁ and Y if X₂ is set at a constant;
  • the average relationship between X₁ and Y if it's measured at all possible values of X₂.

Regression coefficients

• b coefficients
  • Unstandardized (in units of each X variable).
  • Show the change in Y that's associated with a one-unit change in X – in each raw data scale.
  • Use these to construct your regression equation for real-world predictions (i.e., to predict scores on the Y variable).
• β coefficients
  • Standardized (in units of standard deviation).
  • Show the change in Y that's associated with a one-unit change in X – but now the changes are in standard deviations of both variables.
  • Show how strongly each predictor variable influences Y.
  • Use these to compare the relative effects of variables in a regression model.
• Both b coefficients and β coefficients can be interpreted as controlling for the effects of the other variables.

Beta coefficients

• A measure of how strongly each predictor (x) influences the criterion (y).
• β₁ = 1.5 means that a change of one standard deviation in X₁ is associated with an increase of 1.5 standard deviations in our predicted score.
• The higher the beta value, the greater the impact of that predictor on the criterion.
• With only one predictor variable in the model, beta is equivalent to the correlation coefficient between the predictor and the criterion variable.
• With more than one predictor variable, we can't compare the contribution of each predictor by simply comparing the unstandardized coefficients (the bs); the beta coefficients allow such comparisons.

• With standardized variables: a 1 standard deviation change in X₁ equals a β₁ standard deviation change in Y.
• A negative beta coefficient means that a 1 standard deviation increase in X is expected to result in a decrease in Y.
  • e.g., β = −3 means a 1 standard deviation increase in X is expected to result in a 3 standard deviation decrease in Y.
• This is very similar to the slope we've thought of before – in fact, it is the slope of our regression line (in standardized units), keeping all other variables constant.
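Two facts from these slides checked numerically: β is just b rescaled into standard-deviation units (β = b · sₓ / s_y), and with a single predictor β equals r. A sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(50, 10, size=40)
y = 2.0 * x + rng.normal(0, 15, size=40)

b = np.polyfit(x, y, deg=1)[0]            # unstandardized slope
beta = b * x.std(ddof=1) / y.std(ddof=1)  # rescaled into SD units
r = np.corrcoef(x, y)[0, 1]               # Pearson correlation

print(round(beta, 4), round(r, 4))        # identical with one predictor
```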

Beta coefficients

• A negative beta means your two variables (X, Y) are negatively correlated. Don't ignore the sign!
• BUT if a beta coefficient seems to change sign as you add or remove other X variables, you may have multicollinearity.
• Normally, a beta coefficient should not change sign as you add more independent variables.
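A toy illustration of the warning above (all values invented): when two predictors are nearly copies of each other, the individual coefficients become unstable and may even take opposite signs, even though each predictor alone correlates positively with y.

```python
import numpy as np

rng = np.random.default_rng(7)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.05, size=300)  # nearly collinear with x1
y = x1 + x2 + rng.normal(size=300)

# Alone, each predictor has a clearly positive simple slope...
print(np.polyfit(x1, y, 1)[0], np.polyfit(x2, y, 1)[0])

# ...but together, the near-redundancy makes the two b's erratic:
# one shared effect gets split arbitrarily, with huge standard errors.
X = np.column_stack([np.ones(300), x1, x2])
print(np.linalg.lstsq(X, y, rcond=None)[0][1:])
```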
