Multiple Regression
Readings: Chapter 11
Introduction
• Simple Linear Regression (Chapters 2 and 10): used when there is a single quantitative
explanatory variable.
• Multiple Regression: used when there are 2 or more quantitative explanatory variables
which will be used to predict the quantitative response variable.
The Model
y = β0 + β1 x + ε (simple linear regression, one predictor)
y = β0 + β1 x1 + β2 x2 + . . . + βp xp + ε (multiple regression, p predictors)
Conditions to check:
• Independence: Responses yi ’s are independent of each other (examine the way in which sub-
jects/units were selected in the study).
• Normality: For any fixed value of x, the response y varies according to a normal distribution
(normal probability plot of the residuals).
• Linearity: The mean response has a linear relationship with x (scatter plot of y against each
predictor variable).
• Constant variability: The standard deviation of y (σ) is the same for all values of x (scatter
plots of residuals against predicted values).
Steps in a multiple regression analysis:
1. Look at each variable individually.
– Means, standard deviations, minimums, maximums, and outliers (if any), plus stem plots or
histograms, are all good ways to show what is happening with your individual variables.
– In SPSS, Analyze → Descriptive Statistics → Explore.
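(Outside SPSS, the same summaries are easy to get. Below is a minimal Python sketch, assuming the data have been loaded into a pandas DataFrame; the file name and column names are hypothetical.)

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical file and column names; substitute your own data.
    df = pd.read_csv("mydata.csv")  # columns: y, x1, x2, x3

    # Means, standard deviations, minimums, maximums, and quartiles
    print(df.describe())

    # A histogram of each variable helps spot skewness and outliers
    df.hist(figsize=(8, 6))
    plt.show()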
2. Look at the relationships between the variables using correlations and scatter plots.
– In SPSS, Analyze → Correlate → Bivariate. Put all your variables (all the x’s and y) into
the “variables” box, and hit “ok”.
– The correlations help us determine which relationships between y and each x are
strongest.
– Are there strong x-to-x relationships?
– Look at scatter plots between each pair of variables, too.
– We are only interested in keeping the variables which had strong relationships with the
response variable y.
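(A Python sketch of the same step, continuing with the hypothetical DataFrame df from the sketch above.)

    from pandas.plotting import scatter_matrix
    import matplotlib.pyplot as plt

    # Correlation of every pair of variables: y with each x, and x with x
    print(df.corr())

    # A scatter plot for each pair of variables
    scatter_matrix(df, figsize=(8, 8))
    plt.show()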
3. Do a regression using all the potential explanatory variables.
– This will include an ANOVA table and coefficients output like in Chapter 10. These
regression results will indicate/confirm which relationships are strong.
– The regression equation is
ŷ = b0 + b1 x1 + . . . + bp xp
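(In Python, the statsmodels package produces the same kind of coefficients table and ANOVA F-test that SPSS reports. A sketch, again with the hypothetical variable names used above.)

    import statsmodels.formula.api as smf

    # Fit y-hat = b0 + b1*x1 + b2*x2 + b3*x3 by least squares
    model = smf.ols("y ~ x1 + x2 + x3", data=df).fit()

    # The summary shows R-squared, the ANOVA F statistic and its p-value,
    # and a t test and 95% confidence interval for each coefficient.
    print(model.summary())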
• We had ANOVA results for simple linear regression in Chapter 10, but since there was only
one regression coefficient β1 , we didn't need to use them.
• Coefficient of determination R2 :
R2 = SSM/SST.
– R2 measures the fraction of the variation in the values of y that is explained by the linear
regression of y on x1 , x2 , . . . , xp .
– It measures the amount of linear association between the response variable y and the
multiple explanatory variables.
– Hypotheses:
H0 : β1 = β2 = . . . = βp = 0,
Ha : At least one of the regression coefficients is not zero.
– Test statistic:
F = MSM/MSE.
When H0 is true, the F statistic follows the F (p, n − p − 1) distribution. When Ha is true,
the F statistic tends to be large.
– P-value (read from the SPSS output).
– State conclusions in terms of the problem.
– Note:
∗ The F -test is an overall test that tells us whether we want to proceed.
∗ Rejecting H0 means that we need further analysis (individual t-tests) to see which
regression coefficients are different from zero (think back: we used the Bonferroni multiple
comparison procedure when we rejected the null hypothesis in a one-way ANOVA F -
test).
∗ Even if the p-value is small, we still need to look at R2 . If the R2 is small, it means
that the model (variables) we are using does not do a very good job of explaining the
variation in y.
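(If you only have the mean squares from the ANOVA table, the p-value can be computed directly from the F distribution. A sketch; the numbers below are placeholders, not from any output in these notes.)

    from scipy import stats

    msm, mse = 2000.0, 100.0   # placeholder mean squares for model and error
    p, n = 3, 30               # number of predictors and sample size

    F = msm / mse
    p_value = stats.f.sf(F, dfn=p, dfd=n - p - 1)  # upper-tail area
    print(f"F = {F:.2f}, p-value = {p_value:.4g}")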
– Confidence intervals for the individual regression coefficients have the form
bj ± t∗ SEbj ,
∗ Note: SPSS will give us 95% confidence intervals, but you may have to use the estimates
for the coefficients and their standard errors to find other confidence intervals (use t
table and n − p − 1 degrees of freedom to get t∗ ).
– The individual t test of H0 : βj = 0 uses the test statistic
t = bj /SEbj ,
which has the t(n − p − 1) distribution when H0 is true.
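(A sketch of both calculations from a reported estimate and standard error; the numbers are placeholders, not from any output in these notes.)

    from scipy import stats

    b_j, se_bj = 3.80, 1.25   # placeholder estimate and standard error
    n, p = 30, 3              # sample size and number of predictors
    dof = n - p - 1           # degrees of freedom for the t distribution

    # 95% confidence interval: bj +/- t* SEbj
    t_star = stats.t.ppf(0.975, df=dof)
    print(f"95% CI: ({b_j - t_star * se_bj:.3f}, {b_j + t_star * se_bj:.3f})")

    # Individual t test of H0: beta_j = 0
    t = b_j / se_bj
    print(f"t = {t:.3f}, p-value = {2 * stats.t.sf(abs(t), df=dof):.4f}")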
4. Interpretation of results.
– Sometimes variables that are significant by themselves may not be significant when other
variables are included too.
– The significance tests for individual regression coefficients assess the significance of each
predictor variable assuming that all other predictors are included in the regression equation.
5. Residuals.
– Use residuals to help determine whether the multiple regression model is appropriate for
the data.
– Plot residuals versus each of the explanatory variables and versus the response variable.
– Look for outliers, influential observations, evidence of a nonlinear relation, and anything
else unusual.
– Use a normal probability plot to check whether the residuals are normally distributed
(the points should fall along a straight, increasing line).
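(A Python sketch of these checks, continuing with the hypothetical fitted model and DataFrame from the earlier sketches.)

    import matplotlib.pyplot as plt
    from scipy import stats

    resid = model.resid          # residuals from the fitted model
    fitted = model.fittedvalues  # predicted values

    # Residuals versus predicted values: look for curvature or fanning
    plt.scatter(fitted, resid)
    plt.axhline(0, color="gray")
    plt.xlabel("Predicted values"); plt.ylabel("Residuals")
    plt.show()

    # Residuals versus each explanatory variable (hypothetical names)
    for x in ["x1", "x2", "x3"]:
        plt.scatter(df[x], resid)
        plt.axhline(0, color="gray")
        plt.xlabel(x); plt.ylabel("Residuals")
        plt.show()

    # Normal probability plot: points should follow a straight line
    stats.probplot(resid, plot=plt)
    plt.show()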
6. Refine the model - We are interested in keeping only the variables with the strongest
relationships.
– Try deleting the variable with the largest p-value (the weakest relationship), and
re-run the regression. You may have to do this again and again, each time deleting a
variable with a weak relationship.
– Check to see if R2 , s, p-values from the F -test and individual t-tests change much.
∗ R2 should not drop too much when you remove a variable.
∗ The standard deviation s should be as small as possible.
∗ The test statistic from the ANOVA F -test should be as large as possible and its p-value
as small as possible.
∗ Any x variables left in the equation should have a significant p-value from their t-tests
(their coefficient confidence intervals should not contain 0), unless taking out a slightly
insignificant coefficient makes R2 and s move in the wrong direction.
How do we know which variables should be included in our model and which should
not?
• Procedure 1: Start with a model that contains all your explanatory variables with strong
correlations, run the regression, and then remove one at a time whichever variables aren’t
significant from the t-test until you find that your R2 starts to decrease too rapidly or your s
goes up too rapidly. You may end up leaving in one or more variables which are not significant
on their own. You just have to see what removing them does to the whole model. (This is the
procedure that we will follow in the lecture notes and that you should use for this
class.)
• Procedure 2: Start with a model that contains only one explanatory variable and add one
variable at a time till you find that your R2 is no longer increasing rapidly.
Sometimes there may be more than one appropriate choice for your model. The
most important thing is to be able to explain why you chose the model you did.
Not every model is as easy to define as the one in the CHEESE example below.
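(Procedure 1 can be automated; below is a rough Python sketch, not a definitive recipe. For this class you should still examine R2, s, and the F -test after each deletion rather than trust p-values alone. The variable names and the 0.05 cutoff are assumptions for illustration.)

    import statsmodels.formula.api as smf

    predictors = ["x1", "x2", "x3"]   # hypothetical variable names

    while predictors:
        model = smf.ols("y ~ " + " + ".join(predictors), data=df).fit()
        pvals = model.pvalues.drop("Intercept")  # individual t-test p-values
        worst = pvals.idxmax()
        if pvals[worst] <= 0.05:
            break  # every remaining predictor is significant; stop
        print(f"dropping {worst} (p = {pvals[worst]:.3f}); "
              f"R^2 = {model.rsquared:.3f}, s = {model.mse_resid ** 0.5:.3f}")
        predictors.remove(worst)

    if predictors:
        print("final model:", model.model.formula)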
Example 1: As cheddar cheese matures a variety of chemical processes take place. The taste
of mature cheese is related to the concentration of several chemicals in the final product. In a
study of cheddar cheese from the La Trobe Valley of Victoria, Australia, samples of cheese were
analyzed for their chemical composition and were subjected to taste tests. Data for one type of
cheese-manufacturing process appear below. The variable “Case” is used to number the
observations from 1 to 30. “Taste” is the response variable of interest. The taste scores were
obtained by combining the scores from several tasters.
Three chemicals whose concentrations were measured were acetic acid, hydrogen sulfide, and
lactic acid. For acetic acid and hydrogen sulfide, natural log transformations were taken.
Thus the explanatory variables are the transformed concentrations of acetic acid (“Acetic”) and
hydrogen sulfide (“H2S”) and the untransformed concentration of lactic acid (“Lactic”). These
data are based on experiments performed by G. T. Lloyd and E. H. Ramshaw of the CSIRO
Division of Food Research, Victoria, Australia.
Case Taste Acetic H2S Lactic
1 12.3 4.543 3.135 0.86
2 20.9 5.159 5.043 1.53
3 39 5.366 5.438 1.57
4 47.9 5.759 7.496 1.81
5 5.6 4.663 3.807 0.99
6 25.9 5.697 7.601 1.09
7 37.3 5.892 8.726 1.29
8 21.9 6.078 7.966 1.78
9 18.1 4.898 3.85 1.29
10 21 5.242 4.174 1.58
11 34.9 5.74 6.142 1.68
12 57.2 6.446 7.908 1.9
13 0.7 4.477 2.996 1.06
14 25.9 5.236 4.942 1.3
15 54.9 6.151 6.752 1.52
16 40.9 6.365 9.588 1.74
17 15.9 4.787 3.912 1.16
18 6.4 5.412 4.7 1.49
19 18 5.247 6.174 1.63
20 38.9 5.438 9.064 1.99
21 14 4.564 4.949 1.15
22 15.2 5.298 5.22 1.33
23 32 5.455 9.242 1.44
24 56.7 5.855 10.199 2.01
25 16.8 5.366 3.664 1.31
26 11.6 6.043 3.219 1.46
27 26.5 6.458 6.962 1.72
28 0.7 5.328 3.912 1.25
29 13.4 5.802 6.685 1.08
30 5.5 6.176 4.787 1.25
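(The SPSS output pages from the original handout are not reproduced here. As a sketch, the analysis in parts a through h can be run in Python on the data above; statsmodels and pandas are assumed to be available.)

    import pandas as pd
    import statsmodels.formula.api as smf
    from io import StringIO

    # The CHEESE data from the table above
    cheese = pd.read_csv(StringIO("""Case,Taste,Acetic,H2S,Lactic
    1,12.3,4.543,3.135,0.86
    2,20.9,5.159,5.043,1.53
    3,39,5.366,5.438,1.57
    4,47.9,5.759,7.496,1.81
    5,5.6,4.663,3.807,0.99
    6,25.9,5.697,7.601,1.09
    7,37.3,5.892,8.726,1.29
    8,21.9,6.078,7.966,1.78
    9,18.1,4.898,3.85,1.29
    10,21,5.242,4.174,1.58
    11,34.9,5.74,6.142,1.68
    12,57.2,6.446,7.908,1.9
    13,0.7,4.477,2.996,1.06
    14,25.9,5.236,4.942,1.3
    15,54.9,6.151,6.752,1.52
    16,40.9,6.365,9.588,1.74
    17,15.9,4.787,3.912,1.16
    18,6.4,5.412,4.7,1.49
    19,18,5.247,6.174,1.63
    20,38.9,5.438,9.064,1.99
    21,14,4.564,4.949,1.15
    22,15.2,5.298,5.22,1.33
    23,32,5.455,9.242,1.44
    24,56.7,5.855,10.199,2.01
    25,16.8,5.366,3.664,1.31
    26,11.6,6.043,3.219,1.46
    27,26.5,6.458,6.962,1.72
    28,0.7,5.328,3.912,1.25
    29,13.4,5.802,6.685,1.08
    30,5.5,6.176,4.787,1.25
    """))

    # Part a: numerical summaries for each variable
    print(cheese[["Taste", "Acetic", "H2S", "Lactic"]].describe())

    # Part c: correlation of each pair of variables
    print(cheese[["Taste", "Acetic", "H2S", "Lactic"]].corr())

    # Parts e-h: regress Taste on all three explanatory variables;
    # the summary gives the ANOVA F test, t tests, and 95% intervals
    full = smf.ols("Taste ~ Acetic + H2S + Lactic", data=cheese).fit()
    print(full.summary())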
a. For each of the variables in the data set, find the mean, median, standard deviation, and IQR.
Display each distribution with a boxplot.
b. Make a scatter plot for each pair of variables in the CHEESE data set (you will have 6
plots). Describe the relationships.
c. Which explanatory variables (x’s–Acetic, H2S, Lactic) are most strongly correlated to the
response variable (y, taste)? Calculate the correlation for each pair of variables and report
the P-value for the test of zero population correlation in each case.
d. Which variables look important at this time?
e. Perform a multiple regression using the explanatory variables which look important at this
point. Give the fitted regression equation.
f. State your hypotheses for an ANOVA F -test, give the test statistic and its p-value, and
state your conclusion.
g. Report the t test statistics and the p-values for the tests of the regression coefficients of
your explanatory variables. What conclusions do you draw from these tests?
h. Give the 95% confidence intervals for the regression coefficients of your explanatory vari-
ables. Do any of the intervals contain the point 0? (This should verify your answer to part
g).
k. One variable looks like a good candidate to be dropped. Which one is it? Try running the
multiple regression again without this variable. Look at parts e through j again.
What changed? What stayed the same or improved?

       Full model (Acetic, H2S, Lactic)   Reduced model (H2S, Lactic)
R2     65.2%                              65.2%
s      10.1307                            9.9424
l. Using the better model, predict the “taste” for an H2S=4 and Lactic=1.
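(Continuing the sketch after the data table: refit without the dropped variable and predict. The choice of H2S and Lactic as the kept variables follows part l.)

    # Part k: refit without Acetic, then predict for part l
    reduced = smf.ols("Taste ~ H2S + Lactic", data=cheese).fit()
    print(reduced.summary())

    new = pd.DataFrame({"H2S": [4], "Lactic": [1]})
    print(reduced.predict(new))   # predicted taste at H2S = 4, Lactic = 1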
m. Now look at a residual plot for each of the variables you still have in the model. Do a
normal probability plot, too.