
Unit 10: More Regression

1
Unit 10 Outline

• More Multiple Regression Topics


– Binary Predictors to compare 3 or more group means
– Contrast Testing
– Multiple comparisons (& the Bonferroni correction)
– Transformation of Variables

2
Example: Inference for 3+ Means – Bone Density
• Studies suggest a link between exercise and healthy bones
• A study of 30 rats examined the effect of jumping on the bone
density of growing rats
• Three treatment groups
– No jumping (10 rats - group 1)
– 30 cm jump (10 rats - group 2)
– 60 cm jump (10 rats - group 3)
• 10 jumps per day, 5 days per week for 8 weeks
• Bone density measured after 8 weeks
• Test to see if the jumping treatments affect bone density
(measured in mg/cm3)

3
Inference for 3+ Means Example – Bone density
• As always, first visualize the data:

[Side-by-side plot of bone density (mg/cm³), roughly 550 to 700, for the
three groups]

Groups, means, and SD's:

Group             Mean    SD
1 – No jumping     601   27.4
2 – 30 cm jump     613   19.3
3 – 60 cm jump     639   16.6

• We'd like to do a t-test, but there's no t-test formula for 3 groups

• Solution: Regression with Binary Predictors
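
A minimal Stata sketch of this first look (assuming the variables are named
bonedensity and group, as in the oneway output shown later):

. tabstat bonedensity, by(group) statistics(mean sd n)   // means & SD's by group
. graph box bonedensity, over(group)                     // side-by-side boxplots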
4
The F-test in a Binary Regression Model
• A naive application of regression here might use the codes
no jump = 1, low jump = 2, high jump = 3 in a single predictor,
but this is mathematically incorrect: it forces the group means to be
equally spaced along a single line.
• A correct way to apply regression here is to create (I – 1) binary
variables (variables coded 0 or 1, sometimes called dummy
variables) that recreate the groups: here, lowjump = 1 for the 30 cm
group and highjump = 1 for the 60 cm group, with the no-jump group
as the reference.
• Then to determine if there are any differences among the 3
groups, we can just perform the F-test from this regression model,
as sketched below.
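
A minimal sketch of this setup in Stata for the rat data (the names lowjump
and highjump match the contrast examples later in this unit; group is
assumed coded 1/2/3 as above):

. gen lowjump = (group == 2)     // 1 for the 30 cm group, 0 otherwise
. gen highjump = (group == 3)    // 1 for the 60 cm group, 0 otherwise
. regress bonedensity lowjump highjump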

5
Connection to Classic ANOVA
• In classic ANOVA, the model is set up in such a way that it
looks at the group means directly (rather than modeling the
differences between groups like the regression with binary
predictors does). This actually makes the algebra easier by
hand (but more complicated to explain/understand).
• The F-test from classic ANOVA is mathematically
equivalent to the F-test from a regression with binary
predictors. It compares the variability between group means
(the model) vs. the variability within groups (the error).
• It is less general than our binary-predictor approach, because
in classic ANOVA it is difficult to include both binary and
quantitative predictors.
• Anyway, an example from Stata is shown on the next slide…
ANOVA Results from Stata
. oneway bonedensity group

                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      7433.86667      2   3716.93333       7.98     0.0019
Within groups       12579.5        27    465.907407
------------------------------------------------------------------------
    Total           20013.3667     29    690.116092

Bartlett's test for equal variances:  chi2(2) =  2.3353  Prob>chi2 = 0.311
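
As a check, the F statistic is just the ratio of the two mean squares in
the table: F = 3716.93 / 465.91 ≈ 7.98, which on (2, 27) degrees of freedom
gives p = 0.0019, so there is evidence that the jumping treatments affect
mean bone density.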

7
Building a Regression Model with Binary Predictors
• Below are the summary statistics from a survey given out during the
regular school year asking how many texts the student sends per day, split
across class year:
. tabulate year, summarize(text_day)
| Summary of text_day
year | Mean Std. Dev. Freq.
------------+------------------------------------
freshman | 68.074074 73.094736 27
junior | 28.645161 29.214778 31
senior | 21.4 19.043993 20
sophomore | 56.56044 134.71164 91
------------+------------------------------------
Total | 49.118343 104.87412 169

• If we were to run a regression with the binary x-variables of just soph,
junior, and senior, what would be the resulting estimated regression
model (aka, the formula for the regression model)? Hint: which group
is the reference group?

ŷ = 68.07 − 11.51(X_soph) − 39.43(X_jr) − 46.67(X_sr)
Solution from Stata:
. regress text_day soph junior senior

Source | SS df MS Number of obs = 169


-------------+------------------------------ F( 3, 165) = 1.31
Model | 43101.4669 3 14367.1556 Prob > F = 0.2718
Residual | 1804660.17 165 10937.3343 R-squared = 0.0233
-------------+------------------------------ Adj R-squared = 0.0056
Total | 1847761.63 168 10998.5811 Root MSE = 104.58

------------------------------------------------------------------------------
text_day | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
soph | -11.51363 22.91892 -0.50 0.616 -56.7658 33.73853
junior | -39.42891 27.53005 -1.43 0.154 -93.7855 14.92768
senior | -46.67407 30.85374 -1.51 0.132 -107.5931 14.24495
_cons | 68.07407 20.12676 3.38 0.001 28.33488 107.8133
------------------------------------------------------------------------------

a) Is there any evidence that these groups send a different number of
texts per day, on average?
b) From this model, which group sends the most text messages? Which
group sends the fewest?
c) What is the estimate of the st.dev. of texts sent within the groups?
Solution:
a) H0: β1 = β2 = β3 = 0
   HA: at least one βi ≠ 0
   α = 0.05
   F = (SSM / df_M) / (SSE / df_E) = 1.31
   p-value = 0.272
Since our p-value is not less than 0.05, we cannot reject the null. The 4
groups may send about the same number of texts per day on average.

b) Since all the coefficients for the group differences are negative, that
means the reference group, freshmen, send the most texts. Seniors
send the fewest since their coefficient is most negative.

c) root MSE = s_e = 104.58


Expanding Regression: combo of binary
and quantitative predictors
• Mathematically, there is no reason we cannot have both binary
and quantitative predictors
• It’s simple to do in software (Stata): just include both types in a
regression model
• This allows us to compare group means while controlling for the
effect of a quantitative predictor.
• This also allows us to look at interactions of effects, which we
will not cover in this course. An interaction effect is where the
effect of one variable differs across different groups
• For example (weight vs. height): every extra inch may add 5
lbs. for men while adding only 3 lbs. for women, on average
• Visual: this allows for non-parallel lines (see the sketch below)
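
A minimal sketch of that height/weight example in Stata (weight, height,
and male are hypothetical variables, not from the course dataset):

. gen male_height = male * height           // hypothetical interaction term
. regress weight height male male_height    // allows a different slope per sex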
Combining Binary and Quantitative
Predictors
. regress text_day soph junior senior haircut

Source | SS df MS Number of obs = 167


-------------+------------------------------ F( 4, 162) = 1.07
Model | 47339.9435 4 11834.9859 Prob > F = 0.3750
Residual | 1798015.69 162 11098.8623 R-squared = 0.0257
-------------+------------------------------ Adj R-squared = 0.0016
Total | 1845355.63 166 11116.6002 Root MSE = 105.35

------------------------------------------------------------------------------
text_day | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
soph | -11.86717 23.09216 -0.51 0.608 -57.46763 33.73328
junior | -39.0642 27.73669 -1.41 0.161 -93.83627 15.70787
senior | -47.79541 32.14659 -1.49 0.139 -111.2758 15.68496
haircut | .1501553 .1961588 0.77 0.445 -.2372026 .5375133
_cons | 62.95767 21.34816 2.95 0.004 20.80112 105.1142
------------------------------------------------------------------------------

What is the model statement for this regression? What is the
interpretation of each coefficient estimate (b) in this model?
Unit 10 Outline

• More Multiple Regression Topics


– Binary Predictors to compare 3 or more group means
– Contrast Testing
– Multiple comparisons (& the Bonferroni correction)
– Transformation of Variables

13
Contrasts
• After the omnibus F-test has shown overall significance in a multiple
regression, we can then investigate other comparisons of combinations of
parameters using contrasts

• This makes a lot of sense in the binary regression setting, where a
combination of parameters can lead to a useful comparison. For the rats
example, what might be an interesting comparison of the 3 groups
involved (no jump, low jump, high jump)?
• We could compare the control (group 1) versus the 2 treatment groups
combined (groups 2 and 3).
• What would this mean in terms of the model β’s?
H0: β1 + β2 = 0
• We may want to compare just the 2 levels of jumping to see if there is
an effect of height (group 2 versus group 3).
• What would this mean in terms of the model β’s?
H0: β1 = β2
14
Results in Stata
After first fitting a regression in Stata, contrasts are easy to do using
the test command (make sure you run the appropriate regress first):

To test whether the combined effect of the two treatment groups is
different from zero, H0: β1 + β2 = 0:

. test lowjump + highjump == 0

 ( 1)  lowjump + highjump = 0

       F(  1,    27) =    8.59
            Prob > F =    0.0068

To test whether the two treatment groups are equal to each other,
H0: β1 = β2:

. test lowjump == highjump

 ( 1)  lowjump - highjump = 0

       F(  1,    27) =    7.37
            Prob > F =    0.0114

Note: the textbook's definition of contrasts is for the ANOVA setting, not
for multiple regression, so the formulas are completely different but have a
similar feel. So do NOT refer to your text for this topic!!!
Unit 10 Outline

• More Multiple Regression Topics


– Binary Predictors to compare 3 or more group means
– Contrast Testing
– Multiple comparisons (& the Bonferroni correction)
– Transformation of Variables

16
The multiple comparisons problem
• To test H0: μ1 = μ2 = . . . = μI, why not simply conduct multiple two-
sample t-tests instead of using the binary regression F-test?
• For example, with I = 9 there are 36 possible pair-wise
t-tests, each at α = 0.05. What is the probability of rejecting a true H0
at least once?
– P(at least one Type I error) = 1 − (0.95)^36 ≈ 0.84
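
This number is easy to verify in Stata:

. display 1 - 0.95^36    // ≈ 0.84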
• This inflated Type I error is due to multiple comparisons: we have
looked at many tests at once (each with α = 0.05), and this will lead to
some significant results that are not truly there (simply by chance).
• In general, we would like the probability of a type I error to be some
fixed value α (e.g., α = 0.05)
• This is accomplished using the overall F-test
• What happens when we start using multiple contrasts or looking at all
the different t-stats in one regression model?
• If we don’t have any contrasts pre-specified, we can just look at all the
pairwise two-sample t-tests to see a difference in groups, but adjust α.
17
The Bonferroni correction
• A solution to the multiple comparisons problem is the adjustment of
α levels using the Bonferroni correction
• This correction is a conservative solution
• Suppose we wish to perform all possible pairs of comparisons
among I groups

• There are (I choose 2) = I! / (2!(I − 2)!) = I(I − 1)/2 such comparisons

• The Bonferroni correction: to protect the overall level of α, we must
perform each individual test at level

      α* = α / (I choose 2)

18
Example - Bonferroni correction
• Suppose we wish to perform pair-wise comparisons among 3
groups but still maintain an overall α = 0.05
• If I = 3, there are

      (3 choose 2) = 3! / (2! · 1!) = 3 possible comparisons:

(Group 1 versus 2), (1 versus 3), and (2 versus 3)
• The Bonferroni correction says that if we want an overall α level
of < 0.05, then we do each of the 3 tests at the
α* = 0.05 / 3 = 0.0167 level
• Thus, with each test at level α* = 0.0167, this Bonferroni
correction gives an overall α < 0.05
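
As an aside, Stata's oneway command can produce all Bonferroni-adjusted
pairwise comparisons directly (shown here for the rat data from earlier):

. oneway bonedensity group, bonferroni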

19
Unit 10 Outline

• More Multiple Regression Topics


– Binary Predictors to compare 3 or more group means
– Contrast Testing
– Multiple comparisons (& the Bonferroni correction)
– Transformation of Variables

20
Transformation of Variables

• Way back in Unit 2, we mentioned the importance of linearity
and normality of residuals in any regression model.
• A violation of this can lead to incorrect conclusions for
hypothesis tests in regression: we may fail to reject the
null hypothesis of no relationship when one is clearly there,
just not linearly (or vice versa)
• How to correct this? Non-linear transformations… like logging,
taking the square root, or raising to a power
• This works just fine since these functions are all increasing
functions (which just means an increase on the converted scale
means an increase on the original scale). The order of
observations is preserved.
• Let’s go back to the text messaging data (y = texts, x = class year)
Histogram of residuals, and scatterplot of residuals vs. fitted

[Left panel: histogram of the residuals (density vs. residuals, 0 to about
1500). Right panel: residuals vs. fitted values, with fitted values ranging
from about 20 to 70.]

Why is the residual-vs.-fitted plot just 4 vertical bars? Does
that violate Regression's Assumptions?

What is a cause for concern in the above graphs?
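
A minimal Stata sketch for reproducing these diagnostic plots:

. regress text_day soph junior senior
. predict r, residuals     // save the residuals
. histogram r              // histogram of residuals
. rvfplot                  // residuals vs. fitted values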


Fixing non-linearity and non-normality

• When attempting to fix non-linearity or non-normality, follow
these steps:
1. Check the histogram of the y-variable for symmetry
   a. If right-skewed, consider logging or taking the square root
   b. If left-skewed, consider raising to the second power
   c. Use the symmetric version of y in all future analyses
2. After making the y-variable symmetric, look at the
   scatterplot of [converted] y vs. x (or multiple x's)
   a. If not linear, consider transforming the x in a similar
      fashion as the steps for the y-variable above
   b. Continue for all x-variables considered for the model
Example: Predicting Text Messages

• We want to create a regression to predict the number of text
messages (text_day) a person sends per day. The candidate
predictors are:

cellphones: the number of different cellphones the student has ever owned
fastest_drive: the fastest the student has ever driven, in mph
haircut: how much the student's last haircut cost ($)
senior: a 0/1 binary variable for whether or not the student is a senior

• We want to begin our model building process by first making sure
everything will work linearly, and then fit the model…
1. Making the y-variable symmetric

• We first check a histogram of the y-variable to see if it is skewed:

[Histogram of text_day: density vs. texts per day (0 to 1500), strongly
right-skewed]

• It is skewed-right, so we take the log of it:

gen log_text = log(text_day+1)

• And check the histogram of log_text:

[Histogram of log_text: density vs. log_text (0 to 8), roughly symmetric]

• Looks good, so that is what we will use going forward in all models

Note: the "+1" is just so that our computer does not vomit when we take
the log of zero
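
For reference, the Stata commands behind this step (histogram draws the
plots; the gen line repeats the one above):

. histogram text_day                 // right-skewed
. gen log_text = log(text_day+1)     // +1 guards against log(0)
. histogram log_text                 // now roughly symmetric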
2. Checking the scatterplots of log(y) vs. each x

[Four scatterplots of log_text (0 to 8) against each candidate predictor:
log_text vs. cellphones (0 to 20), log_text vs. fastest_drive (0 to 200),
log_text vs. haircut (0 to 250), and log_text vs. senior (0 or 1)]

Which of these look linear-ish? Which of these could benefit
from a transformation of x?
Fixing Symmetry (and Linearity) in the x's

[Top row (cellphones): histogram of cellphones (0 to 20); histogram of
log_cellphones (0 to 3); scatterplot of log_text vs. log_cellphones]

[Bottom row (haircut): histogram of haircut (0 to 250); histogram of
log_haircut (0 to 5); scatterplot of log_text vs. log_haircut]

Note: haircut won't ever get fixed completely since there are a lot of people
all piled up at zero. But it is more symmetric and more linear
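
A sketch of the corresponding Stata commands (the +1 offsets are an
assumption, following the log(text_day+1) convention used for y; for
haircut some offset is needed because of the pile-up at zero):

. gen log_cellphones = log(cellphones+1)
. gen log_haircut = log(haircut+1)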
3. Fit the Model
. regress log_text log_cellphones log_haircut fastest_drive senior

Source | SS df MS Number of obs = 167


-------------+------------------------------ F( 4, 162) = 11.05
Model | 40.7124798 4 10.1781199 Prob > F = 0.0000
Residual | 149.24163 162 .921244633 R-squared = 0.2143
-------------+------------------------------ Adj R-squared = 0.1949
Total | 189.95411 166 1.14430187 Root MSE = .95981

--------------------------------------------------------------------------------
log_text | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------------+----------------------------------------------------------------
log_cellphones | .8642687 .1891676 4.57 0.000 .4907163 1.237821
log_haircut | .1166675 .057846 2.02 0.045 .002438 .230897
fastest_drive | .0077078 .0025508 3.02 0.003 .0026707 .012745
senior | -.6377016 .2404366 -2.65 0.009 -1.112496 -.1629077
_cons | .9308021 .4171938 2.23 0.027 .1069629 1.754641
--------------------------------------------------------------------------------
[Post-fit diagnostics: histogram of residuals (roughly −4 to 2) and
scatterplot of residuals vs. fitted values (fitted roughly 2 to 5)]
Interpreting the Fitted Model
a) What is the formula for this regression model?
ŷ = 0.9308 + 0.864(x1) + 0.117(x2) + 0.0077(x3) − 0.6377(x4)
where x1 = log_cellphones, x2 = log_haircut, x3 = fastest_drive, x4 = senior

b) What is the interpretation of the coefficient for senior in this
model?

Since the sign of the coefficient is negative, seniors send fewer
text messages per day than non-seniors, controlling for the other
3 predictors. In fact, comparing a senior to a non-senior, we
expect a multiplicative change of e^(−0.6377) = 0.529 in # texts sent
(almost half as many), controlling for the other 3 variables in the
model.
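
The multiplicative factor can be pulled straight from the fitted model in
Stata (run after the regress above):

. display exp(_b[senior])    // ≈ 0.529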

29
Unit 10: Main Points
• When you are trying to compare means across 3 or more groups,
this should be done via a regression with (I – 1) binary predictors
Note: for a categorical response this would be done via a chi-sq test

• If there is evidence of a difference among groups, then an a priori
hypothesis can be tested via a contrast F-test

• Care must be taken when doing many different hypothesis tests so
as not to inflate the Type I error (use the Bonferroni correction).

• Fixing linearity can be done with a non-linear transformation
of the y-variable or x-variable (or both); symmetric variables
usually work best in regression. Log-transforming usually
works best, but it only works for right-skewed variables, and it
makes interpretation more difficult.
30
