Chapter 4 - Multiple Regression Analysis

Lecture notes on regression analysis

Multiple Regression Analysis

 Multiple regression predicts an outcome (dependent variable) from several independent variables simultaneously.
 Behavior is rarely a function of just one variable; it is instead influenced by many variables.
 The idea is that we should be able to obtain a more accurate predicted score by using multiple variables to predict our outcome.
 Multiple regression analysis is more amenable to ceteris paribus analysis because it allows us to explicitly control for many other factors that simultaneously affect the dependent variable.
 Because multiple regression models can accommodate many explanatory variables that may be correlated, we can hope to infer causality in cases where simple regression analysis would be misleading.
 Naturally, if we add more factors to our model that are useful for explaining y, then more of the variation in y can be explained.
Formal Statement of the Model
 The general regression model is:

Y = α + β1X1 + β2X2 + … + βkXk + u

 α, β1, …, βk are parameters
 X1, X2, …, Xk are known constants
 u is the error term
Several Predictor Variables

 The relationship between one dependent variable and two or more independent variables is a linear function:

Y = β0 + β1X1 + β2X2 + … + βPXP + ε

 β0 is the population Y-intercept; β1, …, βP are the population slopes; ε is the random error.
 Y is the dependent (response) variable; X1, …, XP are the independent (explanatory) variables.
 For two independent variables, the general form of the multiple regression equation is:

Y = a + b1X1 + b2X2 + u

 X1 and X2 are the independent variables.
 a is the Y-intercept.
 b1 is the net change in Y for each unit change in X1, holding X2 constant.
 It is called a partial regression coefficient, a net regression coefficient or just a regression coefficient.
 b2 is the net change in Y for each unit change in X2, holding X1 constant. A least-squares fit of this equation is sketched below.
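As a minimal illustration (not part of the original notes), the coefficients a, b1 and b2 can be estimated by ordinary least squares; the data below are made-up values for five hypothetical salespeople.

```python
import numpy as np

# Hypothetical data (assumed, not from the notes):
# years of education (X1), motivation score (X2), annual sales (Y)
X1 = np.array([12.0, 14.0, 16.0, 12.0, 18.0])
X2 = np.array([32.0, 35.0, 45.0, 50.0, 65.0])
Y = np.array([350.0, 400.0, 500.0, 520.0, 700.0])

# Design matrix with a leading column of ones for the intercept a
X = np.column_stack([np.ones_like(X1), X1, X2])

# Solve the least-squares problem Y = a + b1*X1 + b2*X2 + u
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
a, b1, b2 = coef
print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```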
Statistical Model for Multiple Regression
 The correlation matrix is used in multiple regression to estimate the parameters.
 Analysing the correlation matrix is an important step in the solution of any problem involving many independent variables.
 The correlation coefficient r12 indicates the relationship between variable 1 and variable 2, and so on.
 Suppose that we took our 5 randomly selected salespeople and collected the information in Table 1.2.
Example
 You have the following statistics from the data.
 Use the pairwise correlation coefficients to calculate the multiple correlation coefficient (R), as sketched below.
 Here R = 0.9360, meaning the combined correlation of Years in Education and Motivation with Annual Sales is 0.9360.
 So the two variables together have a strong relationship with annual sales.
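For two predictors, R can be computed from the three pairwise correlations via the standard formula R² = (r_y1² + r_y2² − 2·r_y1·r_y2·r_12) / (1 − r_12²). A sketch follows; the correlation values are assumed for illustration, since Table 1.2 is not reproduced in these notes.

```python
import math

# Assumed pairwise correlations (Table 1.2 is not reproduced here)
r_y1 = 0.90  # annual sales vs years of education
r_y2 = 0.85  # annual sales vs motivation
r_12 = 0.70  # years of education vs motivation

# Two-predictor formula for the squared multiple correlation
R_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
R = math.sqrt(R_squared)
print(f"R = {R:.4f}")
```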
Multiple Regression Estimation of Parameters
Let's Make a Prediction
 You interviewed a potential salesperson: she had 13 years of education and scored 49 on the Higgins Motivation scale.
 How much money would this salesperson bring in on an annual basis? A sketch of the prediction step follows.
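The fitted equation from the notes is not reproduced here, so the coefficients below are hypothetical; the sketch only shows how the prediction step works.

```python
# Hypothetical fitted coefficients (the notes' actual estimates are not shown here)
a, b1, b2 = 50.0, 20.0, 5.0

# Plug the candidate's values into Y = a + b1*X1 + b2*X2
education, motivation = 13, 49
predicted_sales = a + b1 * education + b2 * motivation
print(f"Predicted annual sales: {predicted_sales:.1f}")
```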
Example
 Mr Kuwaza considers a multiple regression model relating sales volume (Y) to price (X1) and advertising (X2) as follows:

Correlation Matrix
 The standard deviation estimates were calculated as follows:

SDy = 10.25
SDx1 = 2.75
SDx2 = 4.15

 Calculate b1, b2 and a (a sketch of the method follows).


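With a correlation matrix and the standard deviations, the slopes can be recovered from the standardized (beta) coefficients: beta1 = (r_y1 − r_y2·r_12) / (1 − r_12²), b1 = beta1·SDy/SDx1 (and similarly for b2), with a = Ȳ − b1·X̄1 − b2·X̄2. The sketch below uses the standard deviations from the slide but assumed correlations and means, since the slide's correlation matrix and data are not reproduced here.

```python
# Assumed correlations (the slide's correlation matrix is not reproduced here)
r_y1, r_y2, r_12 = -0.80, 0.60, -0.30

# Standard deviations from the slide
SDy, SDx1, SDx2 = 10.25, 2.75, 4.15

# Assumed sample means (needed only for the intercept)
mean_y, mean_x1, mean_x2 = 120.0, 8.0, 15.0

# Standardized slopes from the two-predictor normal equations
beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12**2)
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12**2)

# Convert to unstandardized slopes and solve for the intercept
b1 = beta1 * SDy / SDx1
b2 = beta2 * SDy / SDx2
a = mean_y - b1 * mean_x1 - b2 * mean_x2
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}, a = {a:.3f}")
```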
Analysis of Variance in Regression

 The specific test considered here is called analysis of variance (ANOVA); it is a hypothesis test appropriate for comparing means in two or more independent comparison groups.
 The ANOVA framework can also be used to test the validity of the multiple regression model, where an F-test is used to test the significance of the overall model, as sketched below.
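A sketch of that overall F-test, assuming the R², sample size and predictor count below: F = (R²/k) / ((1 − R²)/(n − k − 1)), with k and n − k − 1 degrees of freedom.

```python
from scipy import stats

# Assumed values: R-squared, sample size, number of predictors
R2, n, k = 0.876, 30, 2

# Overall F statistic for the regression and its p-value
F = (R2 / k) / ((1 - R2) / (n - k - 1))
p_value = stats.f.sf(F, k, n - k - 1)
print(f"F = {F:.2f}, p = {p_value:.4g}")
```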
Testing the Group of Means
 One study design is to recruit a group of individuals and
then randomly split this group into three or more smaller
groups (i.e., each participant is allocated to one, and only
one, group).
 For example, a researcher wishes to know whether different
pacing strategies affect the time to complete a marathon.
 The researcher randomly assigns volunteers to one of three groups: (a) start slowly and then increase speed, (b) start fast and slow down, or (c) run at a steady pace throughout.
 The time to complete the marathon is the outcome
(dependent) variable.
 This study design is illustrated schematically in the diagram
below
Example
A clinical trial is run to compare weight loss programs and
participants are randomly assigned to one of the
comparison programs and are counseled on the details of
the assigned program. Participants follow the assigned
program for 8 weeks. The outcome of interest is weight
loss, defined as the difference in weight measured at the
start of the study (baseline) and weight measured at the
end of the study (8 weeks), measured in pounds.
Three popular weight loss programs are considered. The
first is a low calorie diet. The second is a low fat diet and
the third is a low carbohydrate diet. For comparison
purposes, a fourth group is considered as a control group.
Participants in the fourth group are told that they are
participating in a study of healthy behaviors, with weight
loss being only one component of interest.
The control group is included here to assess the placebo
effect (i.e., weight loss due to simply participating in the
study). A total of twenty patients agree to participate in
the study and are randomly assigned to one of the four
diet groups. Weights are measured at baseline and
patients are counseled on the proper implementation of
the assigned diet (with the exception of the control
group). After 8 weeks, each patient's weight is again
measured and the difference in weights is computed by
subtracting the 8 week weight from the baseline weight.
Positive differences indicate weight losses and negative
differences indicate weight gains. For interpretation
purposes, we refer to the differences in weights as
weight losses and the observed weight losses are shown
below.
 Is there a statistically significant difference in
the mean weight loss among the four diets?
Step 1. Set up Hypotheses and Determine the Level of Significance

H0: μ1 = μ2 = μ3 = μ4
H1: The means are not all equal

α = 0.05
Step 2. Determine the Critical Value
 The appropriate critical value can be found in a
table of probabilities for the F distribution.
 In order to determine the critical value of F we
need degrees of freedom, df1=k-1 and df2=N-k.
In this example, df1=k-1=4-1=3 and df2=N-k=20-
4=16. The critical value is 3.24 and the decision
rule is as follows: Reject H0 if F > Critical Value
 where k is the number of independent comparison groups and N is the total sample size. (The lookup is sketched below.)
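The critical value quoted above can be reproduced in software rather than from an F table; a sketch using SciPy:

```python
from scipy import stats

# Upper 5% point of the F distribution with df1 = 3 and df2 = 16
critical_value = stats.f.ppf(0.95, dfn=3, dfd=16)
print(f"Critical value = {critical_value:.2f}")  # ~= 3.24
```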
Step 3. Compute Group Means

 In order to compute the sums of squares, we must first compute the sample means for each group and the overall mean based on the total sample.

 If we pool all N = 20 observations, the overall mean is 3.6. (A sketch of this step follows.)
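A sketch of this step; the observations below are assumed for illustration and were chosen to reproduce the group means quoted in these notes (6.6, 3.0, 3.4 and 1.2), since the original data table is not reproduced here.

```python
import numpy as np

# Assumed weight losses (pounds), chosen to match the group means in the notes
groups = {
    "low calorie": np.array([8, 9, 6, 7, 3]),
    "low fat": np.array([2, 4, 3, 5, 1]),
    "low carbohydrate": np.array([3, 5, 4, 2, 3]),
    "control": np.array([2, 2, -1, 0, 3]),
}

for name, x in groups.items():
    print(f"{name}: mean = {x.mean():.1f}")

# Overall mean across all N = 20 observations (3.55, rounded to 3.6 in the notes)
overall_mean = np.concatenate(list(groups.values())).mean()
print(f"overall mean = {overall_mean:.2f}")
```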
Step 4: Calculating the Sum of Squares Between Groups

 Compute SSB = Σ nj(X̄j − X̄)², where nj and X̄j are the size and mean of group j and X̄ is the overall mean.

 SSB = 5(6.6 − 3.6)² + 5(3.0 − 3.6)² + 5(3.4 − 3.6)² + 5(1.2 − 3.6)²
 SSB = 45.0 + 1.8 + 0.2 + 28.8 = 75.8
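The same computation in code, using only quantities quoted in the notes:

```python
# Group means, overall mean and common group size from the notes
group_means = [6.6, 3.0, 3.4, 1.2]
overall_mean, n = 3.6, 5

# Sum of squares between groups: SSB = sum of n * (group mean - overall mean)^2
ssb = sum(n * (m - overall_mean) ** 2 for m in group_means)
print(f"SSB = {ssb:.1f}")  # 75.8
```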
Step 5: Calculating the Sum of Squares of Errors

 Next we compute SSE = Σ(X − X̄j)², the sum of the squared differences between each observation and its own group mean. We will compute SSE in parts (checked in the sketch below).

 For the participants in the low calorie diet:
 Thus, Σ(X − X̄1)² = 21.2

 For the participants in the low fat diet:
 Thus, Σ(X − X̄2)² = 10.0

 For the participants in the low carbohydrate diet:
 Thus, Σ(X − X̄3)² = 5.2

 For the participants in the control group:
 Thus, Σ(X − X̄4)² = 10.8

 Therefore SSE = 21.2 + 10.0 + 5.2 + 10.8 = 47.2
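Using the same assumed data as in the Step 3 sketch, SSE can be checked in code:

```python
import numpy as np

# Same assumed data as in the Step 3 sketch
groups = [
    np.array([8, 9, 6, 7, 3]),   # low calorie
    np.array([2, 4, 3, 5, 1]),   # low fat
    np.array([3, 5, 4, 2, 3]),   # low carbohydrate
    np.array([2, 2, -1, 0, 3]),  # control
]

# Sum of squared deviations of each observation from its own group mean
sse = sum(((x - x.mean()) ** 2).sum() for x in groups)
print(f"SSE = {sse:.1f}")  # 21.2 + 10.0 + 5.2 + 10.8 = 47.2
```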
Step 6: Calculating the F Statistic
 The ANOVA table is therefore:

Source of Variation    SS      df    MS      F
Between groups         75.8     3    25.27   8.56
Error (within groups)  47.2    16     2.95
Total                 123.0    19
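A sketch reproducing the table's quantities from SSB and SSE:

```python
from scipy import stats

# Sums of squares from Steps 4 and 5, group count and total sample size
ssb, sse, k, N = 75.8, 47.2, 4, 20

msb = ssb / (k - 1)  # mean square between groups
mse = sse / (N - k)  # mean square within groups (error)
F = msb / mse
p = stats.f.sf(F, k - 1, N - k)
print(f"MSB = {msb:.2f}, MSE = {mse:.2f}, F = {F:.2f}, p = {p:.4f}")
```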
Step 7. Interpretation of Results
 We reject H0 because 8.56 > 3.24. We have statistically significant evidence at α = 0.05 to show that there is a difference in mean weight loss among the four diets.
 In this example, we find that there is a statistically significant
difference in mean weight loss among the four diets
considered.
 In this example, participants in the low calorie diet lost an
average of 6.6 pounds over 8 weeks, as compared to 3.0 and
3.4 pounds in the low fat and low carbohydrate groups,
respectively.
 Participants in the control group lost an average of 1.2 pounds, which could be attributed to the placebo effect, because these participants were not in an active arm of the trial specifically targeted at weight loss.
Exercise

 Calcium is an essential mineral that regulates the heart and is important for blood clotting and for building healthy bones. The National Osteoporosis Foundation recommends a daily calcium intake of 1000-1200 mg/day for adult men and women. While calcium is contained in some foods, most adults do not get enough calcium in their diets and take supplements. Unfortunately, some of the supplements have side effects such as gastric distress, making them difficult for some patients to take on a regular basis.
 A study is designed to test whether there is a difference in mean
daily calcium intake in adults with normal bone density, adults
with osteopenia (a low bone density which may lead to
osteoporosis) and adults with osteoporosis. Adults 60 years of age
with normal bone density, osteopenia and osteoporosis are
selected at random from hospital records and invited to
participate in the study. Each participant's daily calcium intake is
measured based on reported food intake and supplements. The
data are shown below.
 Is there a statistically significant difference in mean calcium intake in patients with normal bone density as compared to patients with osteopenia and osteoporosis? We will run the ANOVA using the five-step approach; a software sketch is shown below.
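A minimal sketch of running this one-way ANOVA in software; the calcium intakes below are hypothetical stand-ins, since the exercise's data table is not reproduced in these notes.

```python
from scipy import stats

# Hypothetical daily calcium intakes (mg/day) for the three groups
normal = [1200, 1000, 980, 900, 750, 800]
osteopenia = [1000, 1100, 700, 800, 500, 700]
osteoporosis = [890, 650, 1100, 900, 400, 350]

# One-way ANOVA: F statistic and p-value
F, p = stats.f_oneway(normal, osteopenia, osteoporosis)
print(f"F = {F:.2f}, p = {p:.4f}")
```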
Computer Output (ANOVA)
Interpretation
 The output table shows the results of the ANOVA analysis and whether we have a statistically significant difference between our group means.
 We can see that the significance level is 0.003 (p = .003), which is below 0.05; therefore, there is a statistically significant difference in mean calcium intake between the groups.
 Therefore we reject the null hypothesis.