Multiple Linear Reg Ex 2

Multiple Regression Equation

The coefficients of the multiple regression model are estimated using sample data.

Multiple regression equation with k independent variables:

ŷi = b0 + b1x1i + b2x2i + … + bk xki

where ŷi is the estimated (predicted) value of y, b0 is the estimated intercept, and b1, …, bk are the estimated slope coefficients.

In practice we will always use a computer to obtain the regression slope coefficients and other regression summary measures.
MultipleLinearRegEx2 -1
Example of MLRM (2 Independent Variables)
• A distributor of frozen dessert pies wants to evaluate factors thought to influence demand. Data are collected for 15 weeks.
• Dependent variable: Pie sales (units per week)
• Independent variables: Price (in $) and Advertising (in $100s)
Pie Sales Example

Week   Pie Sales   Price ($)   Advertising ($100s)
  1       350        5.50          3.3
  2       460        7.50          3.3
  3       350        8.00          3.0
  4       430        8.00          4.5
  5       350        6.80          3.0
  6       380        7.50          4.0
  7       430        4.50          3.0
  8       470        6.40          3.7
  9       450        7.00          3.5
 10       490        5.00          4.0
 11       340        7.20          3.5
 12       300        7.90          3.2
 13       440        5.90          4.0
 14       450        5.00          3.5
 15       300        7.00          2.7

Multiple regression equation:

Sales = b0 + b1(Price) + b2(Advertising)
ŷ = b0 + b1X1 + b2X2
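The slides obtain the coefficients from Excel; as a cross-check, a minimal pure-Python sketch (no third-party libraries) can solve the least-squares normal equations (XᵀX)b = Xᵀy for this data directly:

```python
# Fit Sales = b0 + b1*Price + b2*Advertising by ordinary least squares,
# solving the 3x3 normal equations (X'X)b = X'y with Gaussian elimination.
# Data are the 15 weeks from the table above.
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
adv   = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]

X = [[1.0, p, a] for p, a in zip(price, adv)]   # design matrix with intercept column

# Build X'X and X'y
XtX = [[sum(X[i][r] * X[i][c] for i in range(15)) for c in range(3)] for r in range(3)]
Xty = [sum(X[i][r] * sales[i] for i in range(15)) for r in range(3)]

# Solve the augmented system by Gaussian elimination with partial pivoting
A = [row[:] + [Xty[r]] for r, row in enumerate(XtX)]
for col in range(3):
    piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    for r in range(col + 1, 3):
        f = A[r][col] / A[col][col]
        for c in range(col, 4):
            A[r][c] -= f * A[col][c]
b = [0.0, 0.0, 0.0]
for r in (2, 1, 0):   # back substitution
    b[r] = (A[r][3] - sum(A[r][c] * b[c] for c in range(r + 1, 3))) / A[r][r]

# b ≈ [306.526, -24.975, 74.131], matching the Excel regression output
print([round(v, 3) for v in b])
```

Solving the normal equations directly is fine for a small textbook example like this; real software uses numerically stabler decompositions, but the fitted coefficients agree.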
Estimating a Multiple Linear Regression Equation
• Excel will be used to generate the coefficients and measures of goodness of fit for multiple regression
• Excel: Tools / Data Analysis... / Regression
Multiple Regression Output

Regression Statistics
Multiple R           0.72213
R Square             0.52148
Adjusted R Square    0.44172
Standard Error       47.46341
Observations         15

Estimated equation: Sales = 306.526 − 24.975(Price) + 74.131(Advertising)

ANOVA         df    SS          MS          F        Significance F
Regression     2    29460.027   14730.013   6.53861  0.01201
Residual      12    27033.306    2252.776
Total         14    56493.333

              Coefficients  Standard Error  t Stat    P-value  Lower 95%  Upper 95%
Intercept     306.52619     114.25389        2.68285  0.01993   57.58835  555.46404
Price         -24.97509      10.83213       -2.30565  0.03979  -48.57626   -1.37392
Advertising    74.13096      25.96732        2.85478  0.01449   17.55303  130.70888
The Multiple Regression Equation

Sales = 306.526 − 24.975(Price) + 74.131(Advertising)

where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.

b1 = −24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.

b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.
Coefficient of Determination, R²

• Reports the proportion of total variation in y explained by all x variables taken together:

R² = SSR / SST = (regression sum of squares) / (total sum of squares)

• This is the ratio of the explained variability to total sample variability.
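The ratio can be checked directly with the ANOVA sums of squares from the Excel output for the pie sales data (note that SSR + SSE = SST):

```python
# R^2 = SSR / SST, using the ANOVA sums of squares from the Excel output
SSR = 29460.027   # regression (explained) sum of squares
SSE = 27033.306   # residual (error) sum of squares
SST = 56493.333   # total sum of squares (= SSR + SSE)
r_squared = SSR / SST
print(round(r_squared, 5))  # 0.52148
```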
Coefficient of Determination, R² (continued)

From the regression output:

R² = SSR / SST = 29460.0 / 56493.3 = 0.52148

52.1% of the variation in pie sales is explained by the variation in price and advertising.
Estimation of Error Variance

• Consider the population regression model:

Yi = β0 + β1x1i + β2x2i + … + βK xKi + εi

• The unbiased estimate of the variance of the errors is

s²e = Σi e²i / (n − K − 1) = SSE / (n − K − 1)

where ei = yi − ŷi

• The square root of the variance, se, is called the standard error of the estimate.
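Plugging the pie sales ANOVA values into this formula reproduces the Standard Error reported by Excel:

```python
import math

# Standard error of the estimate: s_e = sqrt(SSE / (n - K - 1))
SSE, n, K = 27033.306, 15, 2
s2e = SSE / (n - K - 1)     # error variance estimate, 12 degrees of freedom
se = math.sqrt(s2e)
print(round(se, 5))  # 47.46341
```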
Standard Error, se

From the regression output:

se = 47.463

The magnitude of this value can be compared to the average y value.
Adjusted Coefficient of Determination, R²
• R² never decreases when a new X variable is added to the model, even if the new variable is not an important predictor variable.
• This can be a disadvantage when comparing models.
• What is the net effect of adding a new variable?
  • We lose a degree of freedom when a new X variable is added.
  • Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
Adjusted Coefficient of Determination, R² (continued)
• Used to correct for the fact that adding even non-relevant independent variables will still reduce the error sum of squares:

adjusted R² = 1 − [SSE / (n − K − 1)] / [SST / (n − 1)]

(where n = sample size, K = number of independent variables)

• Adjusted R² provides a better comparison between multiple regression models with different numbers of independent variables.
• It penalizes excessive use of unimportant independent variables.
• It is smaller than R².
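Applying the formula to the pie sales ANOVA values reproduces Excel's Adjusted R Square:

```python
# Adjusted R^2 = 1 - [SSE/(n-K-1)] / [SST/(n-1)], using the ANOVA values
SSE, SST, n, K = 27033.306, 56493.333, 15, 2
adj_r2 = 1 - (SSE / (n - K - 1)) / (SST / (n - 1))
print(round(adj_r2, 5))  # 0.44172
```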
Adjusted Coefficient of Determination, R² (continued)

From the regression output:

adjusted R² = 0.44172

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.
Coefficient of Multiple Correlation
• The coefficient of multiple correlation is the correlation between the predicted value and the observed value of the dependent variable:

R = r(ŷ, y) = √R²

• It is the square root of the coefficient of multiple determination.
• Used as another measure of the strength of the linear relationship between the dependent variable and the independent variables.
• Comparable to the correlation between Y and X in simple regression.
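For the pie sales data, taking the square root of R² recovers the "Multiple R" value in the Excel output:

```python
import math

# Coefficient of multiple correlation R = sqrt(R^2), with R^2 = SSR / SST
SSR, SST = 29460.027, 56493.333
R = math.sqrt(SSR / SST)
print(round(R, 5))  # 0.72213
```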
Evaluating Individual Regression Coefficients
• Use t-tests for individual coefficients.
• A t-test shows whether a specific independent variable is conditionally important.
• Hypotheses:
  • H0: βj = 0 (Xj has no linear influence on y)
  • H1: βj ≠ 0 (Xj has a linear influence on y)
Evaluating Individual Regression Coefficients (continued)
• H0: βj = 0 (Xj has no linear influence on y)
• H1: βj ≠ 0 (Xj has a linear influence on y)

Test statistic:

t = (bj − 0) / s_bj  ~  t(n − K − 1)
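Dividing each estimated coefficient by its standard error from the output table gives the t Stat column:

```python
# t statistic for each coefficient: t = (b_j - 0) / s_bj,
# using the coefficients and standard errors from the Excel output
coeffs = {"Price": (-24.97509, 10.83213), "Advertising": (74.13096, 25.96732)}
t_stats = {name: b / sb for name, (b, sb) in coeffs.items()}
print({k: round(v, 4) for k, v in t_stats.items()})
# Price ≈ -2.3057, Advertising ≈ 2.8548, matching the t Stat column
```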
Evaluating Individual Regression Coefficients (continued)

From the regression output:

t-value for Price: t = −2.306, with p-value .0398
t-value for Advertising: t = 2.855, with p-value .0145
Example: Evaluating Individual Regression Coefficients

H0: βj = 0   H1: βj ≠ 0

From Excel output:

              Coefficients  Standard Error  t Stat    P-value
Price         -24.97509     10.83213        -2.30565  0.03979
Advertising    74.13096     25.96732         2.85478  0.01449

d.f. = 15 − 2 − 1 = 12, α = .05, so the critical value is t12,.025 = 2.1788.

The test statistic for each variable falls in the rejection region (both |t| > 2.1788, and both p-values < .05).

Decision: Reject H0 for each variable.
Conclusion: There is evidence that both Price and Advertising affect pie sales at α = .05.
Confidence Interval Estimate for the Slope

Confidence interval limits for the population slope βj:

bj ± t(n−K−1, α/2) · s_bj   where t has (n − K − 1) d.f.

              Coefficients  Standard Error
Intercept     306.52619     114.25389
Price         -24.97509      10.83213
Advertising    74.13096      25.96732

Here, t has (15 − 2 − 1) = 12 d.f.

Example: Form a 95% confidence interval for the effect of changes in price (x1) on pie sales:

−24.975 ± (2.1788)(10.832)

So the interval is −48.576 < β1 < −1.374.
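The interval arithmetic above is short enough to verify directly:

```python
# 95% CI for the Price slope: b1 ± t(12, .025) * s_b1
b1, s_b1, t_crit = -24.975, 10.832, 2.1788
margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin
print(round(lower, 3), round(upper, 3))  # -48.576 -1.374
```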
Confidence Interval Estimate for the Slope (continued)

Confidence interval for the population slope βj:

              Coefficients  Standard Error  …  Lower 95%  Upper 95%
Intercept     306.52619     114.25389       …   57.58835  555.46404
Price         -24.97509      10.83213       …  -48.57626   -1.37392
Advertising    74.13096      25.96732       …   17.55303  130.70888

Example: Excel output also reports these interval endpoints. Weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each $1 increase in the selling price.
Test on All Coefficients
• F-test for overall significance of the model
• Shows whether there is a linear relationship between all of the X variables considered together and Y
• Uses the F test statistic
• Hypotheses:
  H0: β1 = β2 = … = βK = 0 (no linear relationship)
  H1: at least one βi ≠ 0 (at least one independent variable affects Y)
F-Test for Overall Significance
• Test statistic:

F = MSR / s²e = [SSR / K] / [SSE / (n − K − 1)]

where F has K (numerator) and (n − K − 1) (denominator) degrees of freedom.

• The decision rule is: Reject H0 if F > F(K, n−K−1, α)
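With the mean squares from the pie sales ANOVA table, the statistic works out as follows:

```python
# Overall F statistic from the ANOVA table: F = MSR / MSE
MSR, MSE = 14730.013, 2252.776
F = MSR / MSE
print(round(F, 4))  # ≈ 6.5386, matching the ANOVA F column
```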
F-Test for Overall Significance (continued)

From the regression output:

F = MSR / MSE = 14730.0 / 2252.8 = 6.5386

with 2 and 12 degrees of freedom; the p-value for the F-test (Significance F) is 0.01201.
F-Test for Overall Significance (continued)

H0: β1 = β2 = 0   H1: β1 and β2 not both zero
α = .05, df1 = 2, df2 = 12
Critical value: F.05 = 3.885

Test statistic: F = MSR / MSE = 6.5386

Decision: Since the F test statistic falls in the rejection region (6.5386 > 3.885, p-value < .05), reject H0.
Conclusion: There is evidence that at least one independent variable affects Y.
Using the Equation to Make Predictions

Predict sales for a week in which the selling price is $5.50 and advertising is $350:

Sales = 306.526 − 24.975(Price) + 74.131(Advertising)
      = 306.526 − 24.975(5.50) + 74.131(3.5)
      = 428.62

Note that Advertising is in $100s, so $350 means that X2 = 3.5. Predicted sales are 428.62 pies.
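The prediction is a single evaluation of the fitted equation:

```python
# Predicted weekly sales at Price = $5.50, Advertising = $350 (so X2 = 3.5)
b0, b1, b2 = 306.526, -24.975, 74.131
sales_hat = b0 + b1 * 5.50 + b2 * 3.5
print(round(sales_hat, 2))  # 428.62
```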
Tests on a Subset of Regression Coefficients
• Consider a multiple regression model involving variables xj and zj, and the null hypothesis that the z variable coefficients are all zero:

yi = β0 + β1x1i + … + βK xKi + α1z1i + … + αr zri + εi

H0: α1 = α2 = … = αr = 0
H1: at least one αj ≠ 0 (j = 1, …, r)
Tests on a Subset of Regression Coefficients (continued)
• Goal: compare the error sum of squares for the complete model with the error sum of squares for the restricted model.
• First run a regression for the complete model and obtain SSE.
• Next run a restricted regression that excludes the z variables (the number of variables excluded is r) and obtain the restricted error sum of squares SSE(r).
• Compute the F statistic and apply the decision rule for a significance level α:

Reject H0 if F = [(SSE(r) − SSE) / r] / s²e > F(r, n−K−r−1, α)
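The steps above can be sketched as a small helper. The function name and the numbers below are illustrative assumptions, not taken from the slides; the slides do not run a subset test on the pie sales data.

```python
# Partial F statistic for testing r excluded coefficients:
# F = [(SSE_restricted - SSE_full) / r] / s2e, compared with F(r, n-K-r-1, alpha).
# Helper name and inputs are illustrative, not from the slides.
def partial_f(sse_restricted, sse_full, r, s2e):
    return ((sse_restricted - sse_full) / r) / s2e

# Toy numbers for illustration: dropping 2 z variables raises SSE
# from 27000 to 30000, with full-model error variance s2e = 1500
F = partial_f(30000.0, 27000.0, 2, 1500.0)
print(F)  # 1.0
```

A small F here would mean the excluded z variables add little explanatory power beyond the x variables already in the model.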
