Nsbe9ege Ism Ch12
Multiple Regression
a. $\hat{y} = 12 + 5(11) + 6(24) + 2(27) = 265$
b. $\hat{y} = 12 + 5(31) + 6(20) + 2(17) = 321$
c. $\hat{y} = 12 + 5(32) + 6(29) + 2(13) = 372$
d. $\hat{y} = 12 + 5(30) + 6(26) + 2(9) = 336$
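The four predictions above can be checked mechanically. The following is a minimal sketch, assuming the fitted equation is ŷ = 12 + 5x1 + 6x2 + 2x3 as implied by the arithmetic; the helper name `predict` is illustrative and not part of the text.

```python
def predict(intercept, slopes, xs):
    """Evaluate y_hat = b0 + sum(b_j * x_j) for a single observation."""
    return intercept + sum(b * x for b, x in zip(slopes, xs))

# Coefficients read off the worked arithmetic (assumed: b0=12, b1=5, b2=6, b3=2).
B0 = 12
SLOPES = (5, 6, 2)

# (x1, x2, x3) for parts a through d.
cases = {"a": (11, 24, 27), "b": (31, 20, 17), "c": (32, 29, 13), "d": (30, 26, 9)}

predictions = {part: predict(B0, SLOPES, xs) for part, xs in cases.items()}
print(predictions)
```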
b2 = .065: All else being equal, an increase in the plane’s weight by one ton will
increase the expected number of hours in the design effort by an estimated .065
million or 65 thousand worker-hours.
b3 = -.018: All else being equal, an increase in the percentage of parts in common
with other models will result in a decrease in the expected number of hours in the
design effort by an estimated .018 million or 18 thousand worker-hours.
b2 = -.073: All else being equal, an increase of one unit in the change over the
quarter in bond sales by financial institutions results in an estimated .073 decrease in
the change over the quarter in the bond interest rates.
12.8 a. b1 = .052: All else being equal, an increase of one hundred dollars in
weekly income results in an estimated .052 quart per week increase in milk
consumption. b2 = 1.14: All else being equal, an increase in family size by one
person will result in an estimated increase in milk consumption by 1.14 quarts
per week.
b. The intercept term b0 of -.025 is the estimated milk consumption, in quarts
per week, for a family whose weekly income is 0 dollars and that has
0 members. This is likely extrapolating beyond the observed data
series and is not a useful interpretation.
12.9 a. b1 = .653: All else being equal, a one unit increase in the average number of
meals eaten per week will result in an estimated .653 pound gained during
freshman year.
b2 = -1.345: All else being equal, a one unit increase in the average number of
hours of exercise per week will result in an estimated 1.345 pound weight loss.
b3 = .613: All else being equal, a one unit increase in the average number of
beers consumed per week will result in an estimated .613 pound weight gain.
b. The intercept term b0 of 7.35 is the estimated amount of weight gain during the
freshman year given that the meals eaten is 0, hours exercise is 0 and there are
no beers consumed per week. This is likely extrapolating beyond the observed
data series and is not a useful interpretation.
12.10
Compute the slope coefficients for the model $\hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i}$, given that
$$b_1 = \frac{s_y\,(r_{x_1 y} - r_{x_1 x_2}\, r_{x_2 y})}{s_{x_1}\,(1 - r_{x_1 x_2}^2)}, \qquad b_2 = \frac{s_y\,(r_{x_2 y} - r_{x_1 x_2}\, r_{x_1 y})}{s_{x_2}\,(1 - r_{x_1 x_2}^2)}$$
a. b 1=100 ¿ ¿
b. b 1=100 ¿ ¿
c. b 1=100 ¿ ¿
d. b 1=100 ¿ ¿
For Y = b0 + b1X1 + b2X2, when the correlation between X1 and X2 is 1, the
factor $1 - r_{x_1 x_2}^2$ in the denominator goes to 0 and the slope coefficients are undefined.
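A short sketch of the two-variable slope formulas makes the zero-denominator case concrete. The function name and the input values are hypothetical and are not taken from the exercise; only the formulas follow the text.

```python
def slopes_from_correlations(s_y, s_x1, s_x2, r_x1y, r_x2y, r_x1x2):
    """Slopes b1, b2 for y_hat = b0 + b1*x1 + b2*x2 from standard deviations and correlations."""
    denom = 1.0 - r_x1x2 ** 2
    if denom == 0.0:
        # Perfect correlation between x1 and x2: the slopes are undefined.
        raise ZeroDivisionError("x1 and x2 are perfectly correlated")
    b1 = s_y * (r_x1y - r_x1x2 * r_x2y) / (s_x1 * denom)
    b2 = s_y * (r_x2y - r_x1x2 * r_x1y) / (s_x2 * denom)
    return b1, b2

# Hypothetical inputs: uncorrelated predictors, so each slope reduces to s_y * r / s_x.
b1, b2 = slopes_from_correlations(s_y=100, s_x1=50, s_x2=50,
                                  r_x1y=0.5, r_x2y=0.5, r_x1x2=0.0)

# With r_x1x2 = 1 the denominator vanishes.
try:
    slopes_from_correlations(100, 50, 50, 0.5, 0.5, 1.0)
    undefined = False
except ZeroDivisionError:
    undefined = True
```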
Analysis of Variance
Source DF SS MS F P
12-4 Statistics for Business and Economics, 9th Edition, Global Edition
All else being equal, for every one unit increase in the price of electricity, we estimate
that sales will increase by 15,421 mwh. Note that this estimated coefficient is not
significantly different from zero (p-value = .468).
All else being equal, for every additional residential customer who uses electricity in the
heating of their home, we estimate that sales will increase by 2.31 mwh.
Analysis of Variance
Source          DF           SS           MS       F      P
Regression       1  1.20121E+12  1.20121E+12  289.66  0.000
Residual Error  66  2.73696E+11   4146912960
Total           67  1.47490E+12
Analysis of Variance
Source          DF           SS           MS      F      P
Regression       2  6.63398E+11  3.31699E+11  26.57  0.000
Residual Error  65  8.11506E+11  12484705377
Total           67  1.47490E+12
All else being equal, a one-unit increase in the price of electricity will reduce electricity sales by
an estimated 173,882 mwh.
All else being equal, an increase in the degree days (departure from normal weather) by
one unit will increase electricity sales by 56.7 mwh.
Note that the coefficient on the price variable is now negative, as expected, and it is
significantly different from zero (p-value = .000).
Analysis of Variance
Source          DF           SS           MS       F      P
Regression       2  1.20353E+12  6.01767E+11  144.14  0.000
Residual Error  65  2.71369E+11   4174903356
Total           67  1.47490E+12
All else being equal, an increase in personal disposable income by one unit will increase
electricity sales by 318 mwh.
All else being equal, an increase in degree days by one unit will increase electricity sales
by 57.1 mwh.
All else being equal, a one unit increase in the horsepower of the engine will reduce fuel
mileage by .10489 mpg. All else being equal, an increase in the weight of the car by 100
pounds will reduce fuel mileage by .66143 mpg.
e.
Horsepower and weight remain significant negative independent variables throughout,
whereas the number of cylinders is insignificant. The sizes of the coefficients change
as the combination of independent variables changes. This is likely due to strong
correlation among the independent variables.
e. Explanatory power has marginally increased from the first model to the last. The
estimated coefficient on price is not significantly different from zero. Displacement
and fuel mileage have the expected signs. The coefficient on weight has the wrong
sign; however, it is not significantly different from zero (p-value of .953).
12.15
From the given Analysis of Variance table,
$$SST = SSR + SSE; \quad s_e^2 = \frac{SSE}{n-K-1}; \quad R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST};$$
$$\bar{R}^2 = 1 - \frac{SSE/(n-K-1)}{SST/(n-1)}; \quad n - 1 = (n-K-1) + K$$
a. $SSE = 150$; $s_e^2 = \frac{150}{18} = 8.33$; $s_e = 2.8868$
b. $SST = SSR + SSE = 200 + 150 = 350$
c. $\bar{R}^2 = 1 - \frac{150/18}{350/26} = 0.3810$
c. $R^2$ = .7368, $\bar{R}^2$ = .7187
c. $R^2$ = .80, $\bar{R}^2$ = .7822
c. $R^2$ = .8421, $\bar{R}^2$ = .8382
12.19
a. $R^2 = \frac{SSR}{SST} = \frac{3.474}{3.991} = 0.8705$. The coefficient of determination indicates that 87.05% of
the variation in the dependent variable can be explained by the variation in the
independent variables.
b. $SSE = 3.991 - 3.474 = 0.517$
c. $\bar{R}^2 = 1 - \frac{SSE/(n-K-1)}{SST/(n-1)} = 1 - \frac{0.517/(15-3-1)}{3.991/(15-1)} = 0.8351$
d. $R = \sqrt{0.8705} = 0.9330$. A correlation is commonly described as strong
when its absolute value exceeds 0.70. Thus, there is a strong correlation between
the independent and the dependent variables.
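The quantities in 12.19 all follow from SSR, SST, n, and K. The sketch below recomputes them; the function name `fit_statistics` is illustrative only.

```python
import math

def fit_statistics(ssr, sst, n, k):
    """Return SSE, R^2, adjusted R^2, and multiple R from the sums of squares."""
    sse = sst - ssr
    r2 = ssr / sst
    adj_r2 = 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))
    return {"SSE": sse, "R2": r2, "adj_R2": adj_r2, "R": math.sqrt(r2)}

# Values from 12.19: SSR = 3.474, SST = 3.991, n = 15, K = 3.
stats = fit_statistics(ssr=3.474, sst=3.991, n=15, k=3)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```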
b.
c. . This is the sample correlation between observed and predicted
values of milk consumption.
b.
c. . This is the sample correlation between observed and predicted
values of weight gained.
12.22 a.
Regression Analysis: Y profit versus X2 offices
The regression equation is
Y profit = 1.55 -0.000120 X2 offices
Predictor         Coef     SE Coef      T      P
Constant        1.5460      0.1048  14.75  0.000
X2 offi    -0.00012033  0.00001434  -8.39  0.000
b.
Regression Analysis: X1 revenue versus X2 offices
The regression equation is
X1 revenue = - 0.078 +0.000543 X2 offices
c.
Regression Analysis: Y profit versus X1 revenue
The regression equation is
Y profit = 1.33 - 0.169 X1 revenue
Predictor      Coef  SE Coef      T      P
Constant     1.3262   0.1386   9.57  0.000
X1 reven   -0.16913  0.03559  -4.75  0.000
d.
Regression Analysis: X2 offices versus X1 revenue
The regression equation is
X2 offices = 957 + 1631 X1 revenue
Predictor    Coef  SE Coef      T      P
Constant    956.9    476.5   2.01  0.057
X1 reven   1631.3    122.3  13.34  0.000
12.23 Given the regression results, where the numbers in parentheses are the sample standard
errors of the coefficient estimates:
a. The two-sided 95% confidence intervals for the three regression slope coefficients are
given by $b_j \pm t_{n-K-1,\,\alpha/2}\, s_{b_j}$; $t_{21,\,0.025} = 2.080$
12.24
Given the regression results where the numbers in parentheses are the sample standard
error of the coefficient estimates
a. The two-sided 95% confidence intervals for the three regression slope coefficients are
given by
12.26 Given the regression results where the numbers in parentheses are the sample standard
error of the coefficient estimates
a. The two-sided 95% confidence intervals for the three regression slope coefficients are
given by
12.27
Given $b_j \pm t_{n-K-1,\,\alpha/2}\, s_{b_j}$
The 90% CI for the coefficient $\beta_1$: $0.631 \pm 1.729(0.096)$ = 0.465 up to 0.797
The 95% CI for the coefficient $\beta_1$: $0.631 \pm 2.093(0.096)$ = 0.430 up to 0.832
b. $b_2 = 0.066$; $s_{b_2} = 0.037$; $n = 23$; $t_{19,\,0.025} = 2.093$, $t_{19,\,0.005} = 2.861$
The 95% CI for the coefficient $\beta_2$: $0.066 \pm 2.093(0.037)$ = -0.011 up to 0.143
The 99% CI for the coefficient $\beta_2$: $0.066 \pm 2.861(0.037)$ = -0.040 up to 0.172
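The intervals above all have the form estimate ± critical value × standard error. A minimal sketch follows; the helper name `coefficient_ci` is illustrative, and the critical values are the tabulated t values quoted in the solution (19 degrees of freedom) rather than values computed in code.

```python
def coefficient_ci(b, se, t_crit):
    """Two-sided confidence interval b +/- t_crit * se for a slope coefficient."""
    half_width = t_crit * se
    return b - half_width, b + half_width

# beta_1: b1 = 0.631, s_b1 = 0.096
ci_b1_90 = coefficient_ci(0.631, 0.096, 1.729)
ci_b1_95 = coefficient_ci(0.631, 0.096, 2.093)

# beta_2: b2 = 0.066, s_b2 = 0.037
ci_b2_95 = coefficient_ci(0.066, 0.037, 2.093)
ci_b2_99 = coefficient_ci(0.066, 0.037, 2.861)

for label, (lo, hi) in [("b1 90%", ci_b1_90), ("b1 95%", ci_b1_95),
                        ("b2 95%", ci_b2_95), ("b2 99%", ci_b2_99)]:
    print(f"{label}: {lo:.3f} up to {hi:.3f}")
```

Both intervals for β2 contain 0, which is consistent with the failure to reject H0: β2 = 0 at the 5% level in part c.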
c. $H_0: \beta_2 = 0$; $H_1: \beta_2 \neq 0$.
$t = \frac{0.066}{0.037} = 1.784$; $t_{19,\,0.05} = 1.729$, $t_{19,\,0.025} = 2.093$. Therefore, reject $H_0$ at the 10% significance level, but do not
reject $H_0$ at the 5% level.
d. $H_0: \beta_1 = \beta_2 = 0$; $H_1$: At least one $\beta_i \neq 0$ for $i = 1, 2$.
$$F = \frac{[SSE(R) - SSE]/(K-1)}{SSE/(n-K-1)} = \frac{(3.397 - 0.392)/2}{0.392/(23-3-1)} = 72.825$$
Since the value of the test statistic, F, is greater than the critical value, $F_{2,\,19,\,0.01}$, reject $H_0$.
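The t ratio in part c and the F statistic in part d can be recomputed directly. This is a sketch only: the function names are illustrative, and the critical values are the tabulated ones quoted in the solution.

```python
def t_ratio(b, se, beta_null=0.0):
    """Student's t statistic for H0: beta = beta_null."""
    return (b - beta_null) / se

def f_statistic(sse_restricted, sse_unrestricted, num_df, n, k):
    """F statistic comparing a restricted model with the full K-variable model."""
    numerator = (sse_restricted - sse_unrestricted) / num_df
    denominator = sse_unrestricted / (n - k - 1)
    return numerator / denominator

t = t_ratio(0.066, 0.037)
F = f_statistic(sse_restricted=3.397, sse_unrestricted=0.392, num_df=2, n=23, k=3)

# Two-sided decisions using t(19) critical values 1.729 (10%) and 2.093 (5%).
reject_at_10pct = abs(t) > 1.729
reject_at_5pct = abs(t) > 2.093

print(f"t = {t:.3f}, F = {F:.3f}")
```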
12.28 a.
12.29 a.
c.
90% CI: .653 ± 1.721(.189); .3277 up to .9783
95% CI: .653 ± 2.080(.189); .2599 up to 1.0461
99% CI: .653 ± 2.831(.189); .1179 up to 1.1881
12.30 a.
= -1.337
Therefore, do not reject H0 at the 20% level
b. H1: At least one βj ≠ 0
F 3,16,.01 = 5.29
Therefore, reject H0 at the 1% level
12.31 a.
95% CI: 7.878 ± 1.988(1.809); 4.2817 up to 11.4743
99% CI: 7.878 ± 2.635(1.809); 3.1113 up to 12.6447
b. , ,
Therefore, reject H0 at the .5% level
12.32 a. All else being equal, an extra $1 in mean per capita personal income leads to an
expected extra $.4 of net revenue per capita from the lottery
b.
95% CI: .8772 ± 2.064(.3107); .2359 up to 1.5185
Therefore the 95% confidence interval for the expected increase in the dollars of net
revenue per capita per year generated by the lottery resulting from a one-unit
increase in number of hotel, motel, inn, and resort rooms per thousand persons in the
country, if the other variables do not change, runs from .2359 up to 1.5185. Also,
since the 95% confidence interval does not include 0, we conclude that the
coefficient on x2 in the population regression is statistically significant.
c.
= -1.318, -1.711
Therefore, reject H0 at the 10% level but not at the 5% level
12.33
a. $b_3 = 5.403$; $s_{b_3} = 3.833$; $n = 21$; $t_{17,\,0.025} = 2.110$
12.34 a.
99% CI: .05146 ± 2.712(.01367); 0.014387 up to 0.088533
Therefore, the 99% confidence interval for the expected increase in the number of
full-time firefighters resulting from a 1 unit increase in the amount of
intergovernmental grants per capita, if the other variables do not change, runs from
0.0144 up to 0.0885.
Also, since the 99% confidence interval for β5 does not include 0, we conclude that
this variable is statistically significant.
b.
= 1.304
Therefore, do not reject H0 at the 20% level and conclude that population density
is not statistically significant.
c.
= 2.024, 2.429
Therefore, reject H0 at the 5% level but not at the 2% level, and conclude that the percentage of
the population that is male and between 12 and 21 years of age is statistically significant at the
5% level but not at the 2% level.
12.35
Testing the hypothesis that the coefficients on all three predictor variables are equal to zero for the
given Analysis of Variance tables
a. $H_0: \beta_1 = \beta_2 = \beta_3 = 0$; $H_1$: At least one $\beta_j \neq 0$.
F 3,23,.01 = 4.76
Therefore, reject H0 at the 1% level
F 2,45,.01 = 5.18
Therefore, reject H0 at the 1% level
F 2,27,.01 = 5.49
Therefore, reject H0 at the 1% level
b. Analysis of Variance table:
, F 3,21,.01 = 4.87
Therefore, reject H0 at the 1% level
12.40
= =
12.41
Let $\beta_3$ be the coefficient on the number of teenagers in the household.
$H_0: \beta_3 = 0$; $H_1: \beta_3 \neq 0$
$$F = \frac{[SSE(R) - SSE]/R}{s_e^2} = \frac{(91.3 - 72.6)/1}{3.025} = 6.18, \quad F_{1,\,24,\,0.01} = 7.82$$
Therefore, do not reject $H_0$ at the 1% level of significance, as there is insufficient evidence
that the number of teenagers in the household affects the consumption of bread.
12.42 a. =
= =
b. Since , then
c. =
= where
12.43
Given the estimated multiple regression equation,
a. $\hat{y} = 4.2 + 5.3(10) - 4.4(23) + 6.8(9) - 0.8(12) = 7.6$
b. $\hat{y} = 4.2 + 5.3(23) - 4.4(18) + 6.8(10) - 0.8(11) = 106.1$
c. $\hat{y} = 4.2 + 5.3(15) - 4.4(16) + 6.8(5) - 0.8(0) = 47.3$
d. $\hat{y} = 4.2 + 5.3(-10) - 4.4(13) + 6.8(-8) - 0.8(-16) = -147.6$
12.47 a. All else being equal, a one square foot increase in the lot size is expected to increase
the selling price of the house by $1.468
b. 98.43% of the variation in the selling price of homes can be explained by the
variation in house size, lot size, number of bedrooms, and number of bathrooms
c. , , = 1.341, 1.753
Therefore, reject H0 at the 10% level but not at the 5% level
d.
Analysis of Variance
Source           DF      SS      MS       F      P
Regression        2  5850.0  2925.0  192.23  0.000
Residual Error  147  2236.8    15.2
Total           149  8086.8
New Obs     Fit  SE Fit            95% CI            95% PI
      1  21.242   0.975  (19.316, 23.168)  (13.296, 29.188)X
New Obs  horspwr  weight
      1      140    3000
Analysis of Variance
Source           DF      SS      MS       F      P
Regression        3  5891.6  1963.9  130.62  0.000
Residual Error  146  2195.1    15.0
Total           149  8086.8
New Obs     Fit  SE Fit            95% CI            95% PI
      1  21.112   0.972  (19.190, 23.033)  (13.211, 29.012)
New Obs  horspwr  weight  cylinder
      1      140    3000      6.00
12.49
Computing values of $y_i$ when $x_i$ = 1, 2, 4, 6, 8, 10
$x_i$                           1      2       4       6       8      10
$y_i = 2x_i^{1.4}$                 2  5.278  13.929  24.572  36.758  50.238
$y_i = 2 + 6x_i - 1.4x_i^2$      6.6    8.4     3.6   -12.4   -39.6     -78
12.51
Computing values of $y_i$ when $x_i$ = 1, 2, 4, 6, 8, 10
$x_i$                           1      2       4       6       8      10
$y_i = 3x_i^{1.2}$                 3  6.892  15.834  25.757  36.377  47.547
$y_i = 3 + 5x_i - 1.9x_i^2$      6.1    5.4    -7.4   -35.4   -78.6    -137
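The tabulated values in 12.49 and 12.51 can be regenerated as follows. This sketch assumes the quadratic terms carry negative coefficients (-1.4 and -1.9), since that is what the tabulated values imply; the function names are illustrative.

```python
XS = [1, 2, 4, 6, 8, 10]

def power_curve(a, p, x):
    """y = a * x**p"""
    return a * x ** p

def quadratic(c0, c1, c2, x):
    """y = c0 + c1*x + c2*x**2"""
    return c0 + c1 * x + c2 * x ** 2

# 12.49: y = 2x^1.4 and y = 2 + 6x - 1.4x^2 (negative sign assumed from the table).
table_12_49 = {
    "power": [round(power_curve(2, 1.4, x), 3) for x in XS],
    "quadratic": [round(quadratic(2, 6, -1.4, x), 1) for x in XS],
}

# 12.51: y = 3x^1.2 and y = 3 + 5x - 1.9x^2 (negative sign assumed from the table).
table_12_51 = {
    "power": [round(power_curve(3, 1.2, x), 3) for x in XS],
    "quadratic": [round(quadratic(3, 5, -1.9, x), 1) for x in XS],
}

print(table_12_49)
print(table_12_51)
```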
12.53 There are many possible answers. Relationships that can be approximated by a
nonlinear quadratic model include many supply functions, production functions, and cost
functions, including average cost versus the number of units produced.
12.54 To estimate the function with linear least squares, solve the equation for
. Since , plug into the equation and algebraically manipulate:
Conduct the variable transformations and estimate the model using least squares.
12.55
a. All else being equal, a 1% increase in annual consumption expenditures will be
associated with a 1.1545% increase in expenditures on vacation travel.
All else being equal, a 1% increase in the size of the household will be associated with a
0.4468% decrease in expenditures on vacation travel.
b. 16.1% of the variation in vacation travel expenditures can be explained by the
variations in the log of total consumption expenditures and log of the number of
members in the household.
c. 95% CI: $1.1556 \pm 1.96(0.0546)$ = 1.048 up to 1.261
d. $H_0: \beta_2 = 0$; $H_1: \beta_2 < 0$; $t = \frac{-0.4468}{0.0456} = -9.80$; $-t_{2330,\,0.01} = -2.33$. Therefore, reject $H_0$ at
the 1% level
12.56 a. A 1% increase in median income leads to an expected .68% increase in store size.
12.57
a. All else being equal, a 1% increase in the price of fish will be associated with a
decrease of 0.536% in the tons of fish consumed annually in France.
b. All else being equal, a 1% increase in the price of potatoes will be associated with an
increase of 0.208% in the tons of fish consumed annually in France.
Using this form, now regress the log of Y on the logs of the three independent variables
and obtain the estimated regression slope coefficients.
12.59 a. Coefficients for exponential models can be estimated by taking the logarithm
of both sides of the multiple regression model to obtain an equation that is linear in the
logarithms of the variables.
b. The constant elasticity of Y with respect to X4 is the regression slope coefficient on the
log X4 term of the logarithmic model.
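The log transformation described in 12.59 can be illustrated with one explanatory variable. This is a minimal sketch on noise-free, made-up data (y = 2x^1.4), not the exercise's data set; with several explanatory variables the same idea applies, with each log X term getting its own slope. The function name `fit_log_log` is illustrative.

```python
import math

def fit_log_log(xs, ys):
    """Least squares of ln(y) on ln(x); returns (b0, b1) for the model y = b0 * x**b1."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_lx = sum(lx) / n
    mean_ly = sum(ly) / n
    sxy = sum((a - mean_lx) * (b - mean_ly) for a, b in zip(lx, ly))
    sxx = sum((a - mean_lx) ** 2 for a in lx)
    b1 = sxy / sxx                          # slope = constant elasticity
    b0 = math.exp(mean_ly - b1 * mean_lx)   # back-transform the intercept
    return b0, b1

# Illustrative data generated from y = 2 * x**1.4 (an assumption for this example).
xs = [1, 2, 4, 6, 8, 10]
ys = [2 * x ** 1.4 for x in xs]

b0, b1 = fit_log_log(xs, ys)
print(f"b0 = {b0:.4f}, elasticity b1 = {b1:.4f}")
```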
Quadratic model:
Cubic model:
All three of the models appear to fit the data well. The cubic model appears to fit the data
the best as the standard error of the estimate is lowest. In addition, explanatory power is
marginally higher for the cubic model than the other models.
12.61
Results for: GermanImports.xls
Regression Analysis: LogYt versus LogX1t, LogX2t
The regression equation is
LogYt = - 4.07 + 1.36 LogX1t + 0.101 LogX2t
Predictor      Coef  SE Coef       T      P  VIF
Constant    -4.0709   0.3100  -13.13  0.000
LogX1t      1.35935  0.03005   45.23  0.000  4.9
LogX2t      0.10094  0.05715    1.77  0.088  4.9
Analysis of Variance
Source          DF      SS      MS        F      P
Regression       2  21.345  10.673  4715.32  0.000
Residual Error  28   0.063   0.002
Total           30  21.409
Source  DF  Seq SS
LogX1t   1  21.338
LogX2t   1   0.007
German real imports will continue to increase with real private consumption and the real
exchange rate.
12.62
The model constants when the dummy variable equals 1 are
a. $\hat{y} = 9 + 6x_1$; $b_0 = 18$
b. $\hat{y} = 7 + 4x_1$; $b_0 = 9$
c. $\hat{y} = 4 + 4x_1$; $b_0 = 12$
12.63
The model constants and the slope coefficients of when the dummy variable equals 1 are
a. , b0 = 3.2, slope coefficient of = 13.5
b. , b0 = -3.3, slope coefficient of = 8.6
c. , b0 = 21.1, slope coefficient of = 7.1
12.64 The coefficient on the dummy variable indicates that, for a given
difference between the spot price in the current year and the spot price in the previous year,
the difference between the OPEC price in the current year and the OPEC price in the previous
year is $5.22 higher in 1974, during the oil embargo, than in other years.
The dummy variable x2 is equal to 1 for the year 1974 and 0 otherwise, to represent the specific
effect of the oil embargo of that year. The graph indicates that, keeping other factors
constant, the year-over-year difference in the OPEC price is $5.22 higher in 1974, during
the oil embargo, than in other years.
12.65
a. All else being equal, expected selling price is higher by €3,054 if a car has an airbag.
b. All else being equal, expected selling price is higher by €1,969 if a car has a sunroof.
c. 95% CI: $3054 \pm 1.984(738)$ = €1,590 up to €4,518
d. $H_0: \beta_5 = 0$; $H_1: \beta_5 > 0$; $t = \frac{1969}{738} = 2.668$; $t_{100,\,0.05} = 1.660$; therefore, reject $H_0$ at the 5%
level of significance as the test statistic is greater than the critical value.
12.66 a. All else being equal, the price-earnings ratio is higher by 1.23 for a regional company
than a national company.
b. , , = 2.462, 2.756
Therefore, reject H0 at the 2% level but not at the 1% level
c. H1: At least one βj ≠ 0
F 2,29,.05 = 3.33
Therefore, reject H0 at the 5% level and conclude that at least one coefficient is not
equal to 0. Hence at least one predictor variable is a significant predictor of the price-
earnings ratio.
12.67 35.6% of the variation in overall performance in law school can be explained by the
variation in undergraduate GPA, scores on the LSATs, and whether the student’s letters of
recommendation are unusually strong. The overall model is significant since we can
reject the null hypothesis that the model has no explanatory power in favor of the
alternative hypothesis that the model has significant explanatory power. The individual
regression coefficients that are significantly different from zero include the scores on the
LSAT and whether the student’s letters of recommendation were unusually strong. The
coefficient on undergraduate GPA was not found to be significant at the 5% level.
12.68
a. All else being equal, the annual salary of the attorney general is, on average, €5,787
higher if justices of the state supreme court can be removed from office by the
governor, judicial review board, or majority vote of the supreme court.
b. All else being equal, the annual salary of the attorney general of the state is €2,630
lower if the supreme court justices are elected on partisan ballots.
c. $H_0: \beta_5 = 0$; $H_1: \beta_5 > 0$; $t = \frac{5787}{2834} = 2.042$; $t_{27,\,0.05} = 1.703$. Therefore, reject $H_0$ at the 5%
level
d. $H_0: \beta_6 = 0$; $H_1: \beta_6 < 0$; $t = \frac{-2630}{1630} = -1.613$; $-t_{27,\,0.05} = -1.703$. Therefore, do not reject $H_0$
at the 5% level
e. $t_{27,\,0.025} = 2.052$
95% CI: $570 \pm 2.052(134.9)$ = 293.185 up to 846.815
Since this confidence interval does not include 0, we can state with 95%
confidence that there is a positive relationship between $x_i$ and $\hat{y}$.
12.69 a. All else being equal, the average rating of a course is 6.21 units higher if a guest
visiting lecturer is brought in than if otherwise.
b. , , = 1.725
Therefore, reject H0 at the 5% level
c. 56.9% of the variation in the average course rating can be explained by the variation in
the percentage of time spent in group discussions, the dollars spent on preparing the
course materials, the dollars spent on food and drinks, and whether a guest lecturer is
brought in.
H1: At least one βj ≠ 0
F 4,20,.01 = 4.43
Therefore, reject H0 at the 1% level
d. = 2.086
95% CI: .52 ± 2.086(.21); .0819 up to .9581
Therefore, the 95% confidence interval for the expected increase in the average
course rating resulting from a one dollar increase of money spent on preparing the
course material, if the other variables do not change, runs from .0819 to .9581. Also,
since the interval does not include zero, the coefficient is statistically significant.
12.70 34.4% of the variation in a test on understanding business statistics can be explained by
which course was taken, the student’s GPA, the teacher that taught the course, the gender
of the student, the pre-test score, the number of credit hours completed, and the age of the
student. The regression model has significant explanatory power:
H1: At least one βj ≠ 0
Therefore, reject H0 at the 1% level and conclude that at least one coefficient is not
equal to 0. Hence at least one predictor variable is a significant predictor of the business
statistics test.
12.71
Results for: Student Performance.xls
Regression Analysis: Y versus X1, X2, X3, X4, X5
The regression equation is
Y = 2.00 + 0.0099 X1 + 0.0763 X2 - 0.137 X3 + 0.064 X4 + 0.138 X5
Analysis of Variance
Source          DF      SS      MS     F      P
Regression       5  2.2165  0.4433  1.51  0.229
Residual Error  21  6.1598  0.2933
Total           26  8.3763
The model is not significant (p-value of the F-test = .229). The model only explains 26.5%
of the variation in GPA with the hours spent studying, hours spent preparing for tests, hours
spent in bars, whether or not students take notes or mark highlights when reading texts, and
the average number of credit hours taken per semester. The only independent variables that
are marginally significant (10% level but not the 5% level) include number of hours spent
in bars and average number of credit hours. The other independent variables are not
significant at common levels of alpha.
12.72 a. Begin the analysis with the correlation matrix – identify important independent
variables as well as correlations between the independent variables
Correlations: Salary, Experience, yearsenior, Gender_1F
Salary Experien yearseni
Experien 0.883
0.000
Analysis of Variance
Source           DF          SS          MS       F      P
Regression        3  5559163505  1853054502  273.54  0.000
Residual Error  146   989063178     6774405
Total           149  6548226683
84.9% of the variation in annual salary (in dollars) can be explained by the variation in the
years of experience, the years of seniority, and the gender of the employee. All of the
variables are significant at the .01 level of significance (p-values of .000, .000, and .006
respectively). The F-test of the significance of the overall model shows that we reject
the null hypothesis that all of the slope coefficients are jointly equal to zero in favor of
the alternative that at least one slope coefficient is not equal to zero. The F-test yielded a p-value of .000.
b.
, = -2.326
Therefore, reject H0 at the 1% level and conclude that the annual salaries for
females are statistically significantly lower than they are for males.
c. Add an interaction term and test the significance of the slope coefficient on
the interaction term.
Adding the interaction term of Salary and Gender_1F, the regression equation is
obtained as:
Salary = 23336 + 388 Experience + 467 yearsenior - 12468 Gender_1F + 0.425 int. term
, = -1.282, -1.645
Therefore, do not reject H0 at either level and conclude that the rate of salary
increase for females is not statistically significantly lower than it is for males at
either level.
12.73
Two variables are included as predictor variables. Following is the effect on the estimated
slope coefficients when these two variables have a correlation equal to
a. 0.91. A very strong linear relationship between the two variables will have a major effect
on the estimated slope coefficients.
b. 0.38. The linear relationship between the two variables is moderately weak, which will
have a slight effect on the estimated slope coefficients.
c. -0.64. This indicates a moderately strong linear relationship between the two variables
and will have a moderate effect on the estimated slope coefficients.
d. -0.11. It shows a very weak linear relationship between the two variables, which will
have almost no effect on the estimated slope coefficients.
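One way to quantify these four cases: with exactly two predictors, the sampling variance of each estimated slope is inflated by the factor 1/(1 - r²), where r is the correlation between the predictors. This is the variance inflation factor (VIF) that the Minitab output elsewhere in this chapter reports. The sketch below is illustrative only; the function name is not from the text.

```python
def variance_inflation(r):
    """VIF for a two-predictor model: 1 / (1 - r^2), r = correlation between the predictors."""
    return 1.0 / (1.0 - r ** 2)

# Correlations from parts a through d.
correlations = {"a": 0.91, "b": 0.38, "c": -0.64, "d": -0.11}
vifs = {part: round(variance_inflation(r), 2) for part, r in correlations.items()}
print(vifs)
```

Only the magnitude of the correlation matters, which is why the negative correlations in parts c and d are judged by their absolute values.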
12.75
n = 58 with four independent variables. One of the independent variables has a
correlation of 0.48 with the dependent variable.
Correlation between the independent variable and the dependent variable is not
necessarily evidence of a large Student’s t statistic. A high correlation among the
independent variables inflates the variance of the estimated slope coefficient, which
could result in a very small Student’s t statistic.
12.76 n = 49 with two independent variables. One of the independent variables has a correlation
of .56 with the dependent variable.
Correlation between the independent variable and the dependent variable is not
necessarily evidence of a small Student’s t statistic. A high correlation among the
independent variables inflates the variance of the estimated slope coefficient, which
could result in a very small Student’s t statistic.
12.77–12.79 Reports can be written by following the extended Case Study on the data file
Cotton – see Section 12.9
12.77
51.5% of the variation in the ratio of the company’s payments for state and local taxes to
total state and local tax revenue can be explained by the variation in insurance company
state concentration ratio, per capita income, ratio of nonfarm income to the sum of farm
and nonfarm income, ratio of the insurance company’s net after-tax income to insurance
reserves, and average of insurance reserves. The only independent variables that are
significant (at the 5% level and, in fact, at all common levels of alpha) are the ratio of
nonfarm income to the sum of farm and nonfarm income (X3) and the average of insurance
reserves (X5). The other independent variables are not significant at common levels of
alpha.
12.78
31.2% of the variation in the overall opinion of residence hall can be explained by the
variation in the satisfaction with roommates, with floor, with hall, and with resident
advisor. The F-test of the significance of the overall model shows that we reject the null
hypothesis that all of the slope coefficients are jointly equal to zero in favor of the
alternative that at least one slope coefficient is not equal to zero. The F-test yielded a p-value of .000.
All of the variables are significant at the .1 level of significance but not at the .05 level of
significance except the variable – satisfaction with resident advisor. The variable –
satisfaction with resident advisor is significant at all common levels of alpha.
12.79
73% of the variation in the commercial paper certificate of deposit rate less commercial
paper rate can be explained by the variation in commercial paper rate, and ratio of loans
and investments to capital. The two independent variables are significant at the .1 and .05
levels of significance. Commercial paper rate is significant in explaining commercial
paper certificate of deposit rate less commercial paper rate at all common levels of alpha.
12.80 Begin the analysis by selecting all the variables. Stepwise, delete the predictor
variables that are not significant. The final regression model includes the predictor variables
per capita disposable income and percent of population in urban areas.
Analysis of Variance
Source          DF      SS      MS      F      P
Regression       2  2.3760  1.1880  11.51  0.000
Residual Error  48  4.9523  0.1032
Total           50  7.3282
12.81
Regression Analysis: Female Emplo versus Percap Disp, Male Unemplo, ...
The regression equation is
Female Employ = 62.3 + 0.000421 Percap Disp_1 - 0.507 Male Unemploy
- 0.000146 Mfg Pcap - 1.34 Female Unemploy
Analysis of Variance
Source          DF      SS      MS     F      P
Regression       4  419.30  104.82  8.74  0.000
Residual Error  46  551.85   12.00
Total           50  971.15
43.2% of the variation in the percentage of females in the labor force can be explained by
the variation in per capita disposable personal income, the percentage of males
unemployed, the manufacturing payroll per worker, and the unemployment rate of
women. The only independent variable that is statistically significant (at the 10% level and,
in fact, at all common levels of alpha) is per capita disposable personal income (X1). The other
independent variables are not significant at common levels of alpha. Also, the F-test of
the significance of the overall model shows that we reject the null hypothesis that all of
the slope coefficients are jointly equal to zero in favor of the alternative that at least one
slope coefficient is not equal to zero. The F-test yielded a p-value of .000.
12.82
Regression Analysis: y_manufgrowt versus x1_aggrowth, x2_exportgro, ...
The regression equation is
y_manufgrowth = 2.15 + 0.493 x1_aggrowth + 0.270 x2_exportgrowth
- 0.117 x3_inflation
Predictor     Coef  SE Coef      T      P  VIF
Constant    2.1505   0.9695   2.22  0.032
x1_aggro    0.4934   0.2020   2.44  0.019  1.0
x2_expor   0.26991  0.06494   4.16  0.000  1.0
x3_infla  -0.11709  0.05204  -2.25  0.030  1.0
Source    DF  Seq SS
x1_aggro   1   80.47
x2_expor   1  227.02
x3_infla   1   66.50
39.3% of the variation in the manufacturing growth can be explained by the variation in
agricultural growth, exports growth, and rate of inflation.
The F-test of the significance of the overall model shows that we reject the null hypothesis
that all of the slope coefficients are jointly equal to zero in favor of the alternative that
at least one slope coefficient is not equal to zero. The F-test yielded a p-value of .000.
All the independent variables are significant at the .1 and .05 levels of significance.
Exports growth is significant in explaining the manufacturing growth at all common
levels of alpha.
The coefficients on the agricultural growth and exports growth variables have the
expected positive sign: all else being equal, as the percentage of agricultural growth or
exports growth increases, the percentage of manufacturing growth also increases. The
coefficient on the rate of inflation is negative, which indicates that the higher the
inflation rate, the lower the manufacturing growth.
12.83 The method of least squares in regression analysis yields estimators that are BLUE –
Best Linear Unbiased Estimators. This result holds when the assumptions regarding the
behavior of the error term are true. BLUE estimators are the most efficient (best)
estimators out of the class of all linear unbiased estimators. The advent of computing power
incorporating the method of least squares has dramatically increased its use.
12.84 The analysis of variance table identifies how the total variability of the dependent
variable (SST) is split up between the portion of variability that is explained by the
regression model (SSR) and the part that is unexplained (SSE). The Coefficient of
Determination (R2) is derived as the ratio of SSR to SST. The analysis of variance table
also computes the F statistic for the test of the significance of the overall regression –
whether all of the slope coefficients are jointly equal to zero. The associated p-value is also
generally reported in this table.
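The decomposition described in 12.84 can be sketched in a few lines. The inputs below are taken from the horsepower and weight ANOVA table shown earlier in this chapter (SSR = 5850.0, SSE = 2236.8, n = 150, K = 2); the function name `anova_table` is illustrative.

```python
def anova_table(ssr, sse, n, k):
    """Build the ANOVA quantities from the regression and error sums of squares."""
    sst = ssr + sse                 # total variability
    msr = ssr / k                   # mean square regression, K d.f.
    mse = sse / (n - k - 1)         # mean square error, n-K-1 d.f.
    return {
        "SST": sst,
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,             # tests H0: all slope coefficients equal zero
        "R2": ssr / sst,            # coefficient of determination
    }

table = anova_table(ssr=5850.0, sse=2236.8, n=150, k=2)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```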
12.85 a. False – If the regression model does not explain a large enough portion of the
variability of the dependent variable, then the error sum of squares can be larger than
the regression sum of squares.
b. False – The sum of several simple linear regressions will not equal a multiple regression
since the assumption of ‘all else being equal’ will be violated in the simple linear regressions.
The multiple regression ‘holds’ ‘all else being equal’ in calculating the partial effect that a
change in one of the independent variables has on the dependent variable.
c. True
d. False – While the regular coefficient of determination (R2) cannot be negative, the
adjusted coefficient of determination can become negative. If the independent
variables added into a regression equation have very little explanatory power, the loss
of degrees of freedom may more than offset the added explanatory power.
e. True
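Part d can be illustrated with a small calculation. This is a sketch under stated assumptions: the sums of squares, sample size, and number of predictors below are hypothetical values chosen only to show that the adjusted coefficient of determination, 1 − [SSE/(n − K − 1)] / [SST/(n − 1)], can fall below zero while R² stays between 0 and 1.

```python
def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R^2 = 1 - [SSE / (n - k - 1)] / [SST / (n - 1)]."""
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


# Hypothetical example: 12 observations, 4 weak predictors.
sse, sst, n, k = 95.0, 100.0, 12, 4
r2 = 1.0 - sse / sst                      # ordinary R^2, never negative
adj = adjusted_r_squared(sse, sst, n, k)  # penalized for lost degrees of freedom
print(f"R2 = {r2:.3f}, adjusted R2 = {adj:.3f}")
```

With so little explanatory power, the loss of four degrees of freedom outweighs the small reduction in SSE, and the adjusted value is negative.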
12.86 If one model (model 2) contains more explanatory variables than the other (model 1),
then SST remains the same for both models but SSR will be at least as high for the model
with more explanatory variables. Since SST = SSR1 + SSE1 = SSR2 + SSE2 and SSR2 ≥ SSR1,
it follows that SSE1 ≥ SSE2. Hence, the coefficient of determination cannot fall when
explanatory variables are added, and it must be interpreted in conjunction with whether
or not the regression slope coefficients on the explanatory variables are significantly
different from zero.
12.87 This is a classic example of what happens when there is a high degree of correlation
between the independent variables. The overall model can be shown to have significant
explanatory power and yet none of the slope coefficients on the independent variables are
significantly different from zero. This is due to the effect that high correlation among the
independent variables has on the variance of the estimated slope coefficients.
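The variance effect in 12.87 is commonly summarized by the variance inflation factor (VIF), which also appears in the Minitab output for a later exercise. As a minimal sketch, assuming a model with exactly two predictors whose sample correlation is r, the VIF for either predictor is 1/(1 − r²). The function name is hypothetical; the value .353 is the SAT math / SAT verbal correlation quoted later in this chapter.

```python
def vif_two_predictors(r):
    """VIF for either predictor in a two-predictor model with correlation r."""
    if abs(r) >= 1:
        raise ValueError("perfectly correlated predictors: VIF is undefined")
    return 1.0 / (1.0 - r * r)


for r in (0.0, 0.353, 0.9, 0.99):
    # The variance of each slope estimate is multiplied by this factor
    # relative to the case of uncorrelated predictors.
    print(f"r = {r:<5} VIF = {vif_two_predictors(r):.2f}")
```

As r approaches 1 the factor grows without bound, which is why individual t-statistics can be small even when the overall F-test is significant.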
12.88
12.89 a. All else being equal, a unit change in population, industry size, measure of economic
quality, measure of political quality, measure of environmental quality, measure of health
and educational quality, and measure of social life results in a respective estimated change
of 4.983, 2.198, 3.816, -.310, -.886, 3.215, and .085 in new business starts in the industry.
b. R² = .766. 76.6% of the variability in new business starts in the industry can be
explained by the variability in the independent variables: population, industry size,
and economic, political, environmental, health and educational, and social quality of life.
c. t_{62,.05} = 1.67; therefore, the 90% CI: 3.816 ± 1.67(2.063) = .3708 up to 7.2612
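d. Critical values for the two-sided test: t_{62,.025} = ± 1.999
Therefore, do not reject H0 at the 5% level
e. Critical values for the two-sided test: t_{62,.025} = ± 1.999
Therefore, reject H0 at the 5% level
f. H0: β1 = β2 = ... = β7 = 0, H1: At least one βj ≠ 0 (j = 1, ..., 7)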
12.90 a. All else being equal, an increase of one question results in a decrease of 1.834 in
expected percentage of responses received. All else being equal, an increase in one
word in length of the questionnaire results in a decrease of .016 in expected percentage
of responses received.
b. 63.7% of the variability in the percentage of responses received can be explained by the
variability in the number of questions asked and the number of words.
c. H0: β1 = β2 = 0, H1: At least one βj ≠ 0 (j = 1, 2)
12.91 a. All else being equal, a 1% increase in course time spent in group discussions
results in an expected increase of .3817 in the average rating of the course. All else
being equal, a dollar increase in money spent on the preparation of subject matter
materials results in an expected increase of .5172 in the average rating by participants
of the course. All else being equal, a dollar increase in expenditure on non-course-
related materials results in an expected increase of .0753 in the average rating of the
course.
b. 57.9% of the variation in the average rating of the course can be explained by the linear
relationship with % of class time spent on discussions, money spent on the preparation
of subject matter materials, and money spent on non-class-related materials.
c. H0: β1 = β2 = β3 = 0, H1: At least one βj ≠ 0 (j = 1, 2, 3)
F_{3,21,.05} = 3.07
Therefore, reject H0 at the 5% level
d. t_{21,.05} = 1.721, 90% CI: .3817 ± 1.721(.2018), or .0344 up to .729
Therefore, the 90% confidence interval for the expected increase in the average rating
of the course resulting from a 1% increase of class time spent on discussions, if the
other variables do not change, runs from .0344 up to .729. Also, since the interval
does not include zero, the coefficient is statistically significant at the 10% level.
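The interval in part d follows the usual form b ± t·s_b. A minimal sketch of the arithmetic, using the estimate (.3817), standard error (.2018), and critical value (1.721) quoted above; the function name is hypothetical:

```python
def confidence_interval(b, se, t_crit):
    """Return (lower, upper) for b +/- t_crit * se."""
    margin = t_crit * se
    return b - margin, b + margin


low, high = confidence_interval(b=0.3817, se=0.2018, t_crit=1.721)
print(f"{low:.4f} up to {high:.4f}")
# Zero lies outside the interval exactly when |b / se| exceeds t_crit.
print("interval excludes zero:", low > 0 or high < 0)
```

Checking whether zero falls inside the interval is equivalent to the two-sided t-test at the matching significance level, here 10%.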
f. t = 1.09, t_{21,.05} = 1.721.
Therefore, do not reject H0 at the 10% level. We conclude that the expenditure on
non-course-related materials is not statistically significant in explaining the average
rating of the course at the 10% level.
12.92
Regression Analysis: y_rating versus x1_expgrade, x2_Numstudents
91.5% of the variation in the rating can be explained by the linear dependence on the
expected grade and the number of students in the class.
The F-test of the significance of the overall model shows that we reject the null
hypothesis that all of the slope coefficients are jointly equal to zero in favor of the
alternative that at least one slope coefficient is not equal to zero. The F-test yielded a
p-value of .000.
The coefficients of expected grade and number of students are significant at common
levels of alpha.
All else being equal, a 1 point increase in the expected grade is associated with a 1.41
increase in the rating. The coefficient on the number of students is negative, which
indicates that the higher the student count, the lower the rating.
12.93
H0: β1 = β2 = ... = β5 = 0, H1: At least one βj ≠ 0 (j = 1, ..., 5)
F_{5,55,.01} = 3.37
Therefore, reject H0 at the 1% level
12.94 a. All else being equal, each extra point in a student’s expected score leads to an
expected increase of .469 in the actual score.
b. t_{103,.025} = 1.96; therefore, the 95% CI: 3.369 ± 1.96(.456) = 2.4752 up to 4.2628
Therefore, the 95% confidence interval for the expected increase in a student’s actual
score resulting from a 1-hour increase in time spent on the course, if the other
variables do not change, runs from 2.4752 up to 4.2628. Also, since the interval does
not include zero, the coefficient is statistically significant at the 5% level.
c. Critical values for the two-sided test of the coefficient on grade point average:
t_{103,.025} = 1.96, t_{103,.005} = 2.58
Therefore, reject H0 at the 5% level but not at the 1% level. We conclude that a
student’s grade point average is statistically significant in explaining a student’s actual
score in the examination at the 5% level but not at the 1% level.
d. 68.6% of the variation in the exam scores is explained by their linear dependence on a
student’s expected score, hours per week spent working on the course, and a student’s
grade point average.
e. H0: β1 = β2 = β3 = 0, H1: At least one βj ≠ 0 (j = 1, 2, 3)
F_{3,103,.01} = 3.95
Reject H0 at any common level of alpha
f. R = √.686 ≈ .828. This is the sample correlation between the observed and the
predicted values of a student’s actual scores in the examination.
g.
b. Critical values: t_{22,.05} = 1.717, t_{22,.025} = 2.074.
Therefore, reject H0 at the 5% level but not at the 2.5% level
c.
d. H0: β1 = β2 = 0, H1: At least one βj ≠ 0 (j = 1, 2)
F_{2,22,.01} = 5.72
Reject H0 at any common level of alpha
e. R, the square root of R², is the sample correlation between the observed and the
predicted values of the change in the real deposit rate.
c. Critical value: -2.576; therefore, reject H0 at the .5% level. We conclude that
turnovers per minute are statistically significant in explaining the minutes played in the
season at all common levels of alpha.
d. Critical value: 2.576; therefore, reject H0 at the .5% level. We conclude that
assists per minute are statistically significant in explaining the minutes played in the
season at all common levels of alpha.
e. 52.39% of the variability in minutes played in the season can be explained by the
variability in all 9 variables.
f. R = √.5239 ≈ .724. This is the sample correlation between the observed and the
predicted values of the minutes played in season.
12.97 a.
b. Critical value: 1.296; therefore, do not reject H0 at the 20% level. We conclude that the
average tax rate, as a proportion of GNP, is not statistically significant in explaining the
growth rate in real GDP at any common level of alpha.
c. 17% of the variation in the growth rate in GDP can be explained by the variations in
real income per capita and the average tax rate, as a proportion of GNP.
d. R = √.17 ≈ .41. This is the sample correlation between the observed and the
predicted values of the growth rate in GDP.
12.98 57.3% of the variation in the female amateur golfers’ winnings per tournament can be
explained by variations in average length of drive, percentage times drive ends, percentage
times green reached, percentage times par saved after hitting into sand trap, average
number of putts taken on greens reached, average number of putts taken on greens not
reached in regulation, and the number of years the golfer has played.
The independent variables that are significant at the 10% level (in fact, at all common
levels of alpha) are average length of drive (X1) and average number of putts taken on
greens reached in regulation (X5). The independent variable average number of putts taken
on greens not reached in regulation (X6) is significant at the 10% level but not at the 5%
level. The other independent variables are not significant at common levels of alpha.
The F-test of the significance of the overall model shows that we reject the null hypothesis
that all slope coefficients equal zero in favor of the alternative that at least one slope
coefficient is not equal to zero. The F-test yielded a p-value of .000.
The sample correlation between the observed and the predicted values of the female
amateur golfers’ winnings per tournament is .7572.
A report can be written by following the Cotton Case Study and testing the significance of the
model. See section 12.9
Source DF Seq SS
SATverb 1 3.7516
SATmath 1 0.9809
HSPct 1 0.2846
The regression model indicates positive coefficients, as expected, for all three
independent variables. The greater the high school rank, and the higher the SAT verbal
and SAT math scores, the larger the Econ GPA. The high school rank variable has the
smallest t-statistic and is removed from the model:
Source DF Seq SS
SATverb 1 3.7109
SATmath 1 1.2379
Both SAT variables are now statistically significant at the .05 level and appear to pick up
separate influences on the dependent variable. The simple correlation coefficient
between SAT math and SAT verbal is relatively low at .353. Thus, multicollinearity will
not be dominant in this regression model.
The final regression model, with conditional t-statistics in parentheses under the
coefficients, is:
(3.36) (2.65)
S = .4196 R2 = .305 n = 67
0.003 0.001
Source DF Seq SS
Acteng 1 3.5362
ACTmath 1 1.0529
ACTss 1 1.4379
ACTcomp 1 0.0001
HSPct 1 1.4983
The regression shows that only high school rank is significant at the .05 level. We may
suspect multicollinearity between the variables, particularly since there is a ‘total’ ACT
score (ACT composite) as well as the components that make up the ACT composite.
Since conditional significance is dependent on which other independent variables are
included in the regression equation, drop one variable at a time. ACTmath has the lowest
t-statistic and is removed:
Source DF Seq SS
Acteng 1 3.5362
ACTss 1 2.1618
ACTcomp 1 0.2211
HSPct 1 1.6048
Again, high school rank is the only conditionally significant variable. ACTcomp has the
lowest t-statistic and is removed:
Source DF Seq SS
Acteng 1 3.5362
ACTss 1 2.1618
HSPct 1 1.6579
Now ACTss and high school rank are conditionally significant. ACTenglish has a
t-statistic less than 2 and is removed:
Source DF Seq SS
ACTss 1 4.6377
HSPct 1 1.8746
Both of the independent variables are statistically significant at the .05 level and hence,
the final regression model, with conditional t-statistics in parentheses under the
coefficients, is:
(3.53) (2.70)
S = .5070 R2 = .271 n = 71
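The one-variable-at-a-time strategy used above can be sketched as a loop. This is an illustration under stated assumptions, not the procedure as run in Minitab: `backward_eliminate` and `toy_fit` are hypothetical names, the refitting step is replaced by a lookup table, and every t-statistic in that table is hypothetical except 3.53 and 2.70, which are the final-model values quoted above (their assignment to ACTss and HSPct, in that order, is assumed). The cutoff of 2 follows the rule of thumb used in the text.

```python
def backward_eliminate(variables, fit_t_stats, t_crit=2.0):
    """Drop the predictor with the smallest |t| until all have |t| >= t_crit.

    fit_t_stats is a caller-supplied function that refits the model on the
    given variables and returns {variable: t-statistic}.
    """
    kept = list(variables)
    dropped = []
    while kept:
        t_stats = fit_t_stats(kept)  # conditional t-statistics change at each step
        weakest = min(kept, key=lambda v: abs(t_stats[v]))
        if abs(t_stats[weakest]) >= t_crit:
            break                    # every remaining variable is significant
        kept.remove(weakest)
        dropped.append(weakest)
    return kept, dropped


# Stand-in for refitting: hypothetical t-statistics for each variable set.
TOY_T = {
    frozenset({"Acteng", "ACTmath", "ACTss", "ACTcomp", "HSPct"}):
        {"Acteng": 0.9, "ACTmath": 0.1, "ACTss": 1.2, "ACTcomp": 0.3, "HSPct": 2.5},
    frozenset({"Acteng", "ACTss", "ACTcomp", "HSPct"}):
        {"Acteng": 1.0, "ACTss": 1.5, "ACTcomp": 0.4, "HSPct": 2.6},
    frozenset({"Acteng", "ACTss", "HSPct"}):
        {"Acteng": 1.4, "ACTss": 2.3, "HSPct": 2.7},
    frozenset({"ACTss", "HSPct"}):
        {"ACTss": 3.53, "HSPct": 2.70},
}


def toy_fit(variables):
    return TOY_T[frozenset(variables)]


kept, dropped = backward_eliminate(
    ["Acteng", "ACTmath", "ACTss", "ACTcomp", "HSPct"], toy_fit
)
print("kept:", kept)
print("dropped, in order:", dropped)
```

Refitting after each removal matters because conditional significance depends on which other variables remain in the equation.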
c. The regression model with the SAT variables is the better predictor because the
standard error of the estimate is smaller than for the ACT model (.4196 vs. .5070). The
R2 measure cannot be directly compared due to the sample size differences.
12.100
Correlations: hseval, Comper, Homper, Indper, sizehse, incom72
hseval Comper Homper Indper sizehse
Comper -0.335
0.001
All variables are conditionally significant with the exception of Indper and Homper.
Since Homper has the smaller t-statistic, it is removed:
Analysis of Variance
Source DF SS MS F P
Regression 3 822.53 274.18 17.29 0.000
Residual Error 86 1364.10 15.86
Total 89 2186.63
This becomes the final regression model. The three independent variables are
conditionally significant in explaining the house value at .05 level and hence, the final
regression model, with conditional t-statistics in parentheses under the coefficients, is:
The selection of a community with the objective of having larger house values would
include communities where the percent of commercial property is low, the median rooms
per residence is high and the per capita income is high.
b.
Regression Analysis: deaths versus vehwt, impcars, lghttrks, carage
The regression equation is
deaths = 2.60 +0.000064 vehwt - 0.00121 impcars + 0.00833 lghttrks
- 0.0395 carage
Predictor Coef SE Coef T P VIF
Constant 2.597 1.247 2.08 0.043
vehwt 0.0000643 0.0001908 0.34 0.738 10.9
impcars -0.001213 0.005249 -0.23 0.818 10.6
lghttrks 0.008332 0.001397 5.96 0.000 1.2
carage -0.03946 0.01916 -2.06 0.045 1.4
Analysis of Variance
Source DF SS MS F P
Regression 3 0.183482 0.061161 21.96 0.000
Residual Error 45 0.125326 0.002785
Total 48 0.308809
Analysis of Variance
Source DF SS MS F P
Regression 2 0.174458 0.087229 29.87 0.000
Residual Error 46 0.134351 0.002921
Total 48 0.308809
The model has light trucks and car age as the significant variables. Note that car age is
marginally significant (p-value of .052) and hence could also be dropped from the
model.
c. The regression modeling indicates that the percentage of light trucks is conditionally
significant in all of the models and hence is an important predictor in the model. Car
age and imported cars are marginally significant predictors when only light trucks is
included in the model.
The proportion of urban population and the proportion of rural roads that are surfaced are
negatively related to crash deaths. Average rural speed is positively related, but the
relationship is not as strong as for the proportion of urban population and surfaced roads.
The simple correlation coefficients among the independent variables are relatively low and
hence multicollinearity should not be dominant in this model. Note the relatively narrow
range for average rural speed. This would indicate that there is not much variability in
this independent variable.
b. Multiple regression
Regression Analysis: deaths versus Prurpop, Prsurf, Ruspeed
The regression equation is
deaths = -0.0086 - 0.149 Prurpop - 0.181 Prsurf + 0.00457 Ruspeed
Analysis of Variance
Source DF SS MS F P
Regression 3 0.172207 0.057402 18.91 0.000
Residual Error 45 0.136602 0.003036
Total 48 0.308809
The model has conditionally significant variables for percent urban population and
percent surfaced roads. Since average rural speed is not conditionally significant, it is
dropped from the model:
Analysis of Variance
Source DF SS MS F P
Regression 2 0.169612 0.084806 28.03 0.000
Residual Error 46 0.139197 0.003026
Total 48 0.308809
This becomes the final model since both variables are conditionally significant.
f. Conclude that the proportion of urban population and the percent of rural roads that
are surfaced are important independent variables in explaining crash deaths. All else
being equal, the higher the proportion of urban population, the lower the crash
deaths, consistent with the negative coefficient on Prurpop. All else being equal,
increases in the proportion of rural roads that are surfaced will result in lower crash
deaths. The average rural speed is not conditionally significant.
The correlation matrix shows that multicollinearity is not likely to be a problem in this
model since all of the correlations among the independent variables are relatively low.
The range for applying the regression model (variable means + / - 2 standard deviations):
Hseval 21.03 +/- 2(4.957) = 11.12 to 30.94
Sizehse 5.48 +/- 2(.24) = 5.0 to 5.96
Taxhse 130.13 +/- 2(48.89) = 32.35 to 227.91
Comper .16 +/- 2(.063) = .034 to .286
Incom72 3361 +/- 2(317) = 2727 to 3995
Totexp 1488848 +/- 2(1265564) = not a good approximation
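The ranges listed above can be reproduced with a few lines of code. This is a minimal sketch: the means and spread figures are copied from the lines above, the spread figure in parentheses is treated as a standard deviation, and the function name is hypothetical.

```python
# (mean, standard deviation) pairs copied from the summary lines above.
summary = {
    "Hseval": (21.03, 4.957),
    "Sizehse": (5.48, 0.24),
    "Taxhse": (130.13, 48.89),
    "Comper": (0.16, 0.063),
    "Incom72": (3361, 317),
    "Totexp": (1488848, 1265564),
}


def two_sd_range(mean, sd):
    """Return the interval mean +/- 2 standard deviations."""
    return mean - 2 * sd, mean + 2 * sd


ranges = {name: two_sd_range(mean, sd) for name, (mean, sd) in summary.items()}
for name, (low, high) in ranges.items():
    note = "  (lower bound is negative)" if low < 0 else ""
    print(f"{name:8s} {low:.3f} to {high:.3f}{note}")
```

For Totexp the lower bound is negative, which is why the text flags the two-standard-deviation rule as a poor approximation for that variable.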
b. Regression models:
Regression Analysis: hseval versus sizehse, Taxhse, ...
The regression equation is
hseval = - 31.1 + 9.10 sizehse - 0.00058 Taxhse - 22.2 Comper + 0.00120 incom72
+ 0.000001 totexp
Analysis of Variance
Source DF SS MS F P
Regression 5 982.98 196.60 13.72 0.000
Residual Error 84 1203.65 14.33
Total 89 2186.63
Neither Taxhse nor income is conditionally significant. Dropping one variable at a time,
eliminate Taxhse first and then income:
Analysis of Variance
Source DF SS MS F P
Regression 3 974.55 324.85 23.05 0.000
Residual Error 86 1212.08 14.09
Total 89 2186.63
This is the final regression model. All of the independent variables are conditionally
significant.
Both the size of house and total government expenditures enhance the market value of
homes, while the percent of commercial property tends to reduce the market value of homes.
c. In the final regression model, the tax variable was not found to be conditionally
significant and hence it is difficult to support the developer’s claim.
There is a negative association between the dependent and the independent variables.
High correlation among the independent variables does not appear to be a problem since
the correlation between the independent variables is low.
Variable Q3 Maximum
Retsales 15.015 20.307
Unemploy 6.400 8.300
PerInc 34867 53448
Analysis of Variance
Source DF SS MS F P
Regression 2 47.674 23.837 6.73 0.003
Residual Error 48 170.122 3.544
Total 50 217.796
b. All else being equal, the conditional effect of a $1,000 decrease in per capita income
would be to increase retail sales by an estimated $.041.
Analysis of Variance
Source DF SS MS F P
Regression 3 48.897 16.299 4.54 0.007
Residual Error 47 168.899 3.594
Total 50 217.796
The population variable is not conditionally significant and adds little explanatory power;
therefore, it will not improve the multiple regression model.
12.105
a. Final regression model 1 to predict residential investment using prime interest rate, GDP,
Money supply, and Price index for finished goods:
Analysis of Variance
Source DF SS MS F P
Regression 4 3903137 975784 395.09 0.000
Residual Error 203 501362 2470
Total 207 4404499
This will be the final model with prime rate as the interest rate variable since all of the
independent variables except government spending are conditionally significant. Note the
significant multicollinearity that exists between the independent variables.
The prime interest rate and the money supply are negatively related to the dependent
variable, residential investment, whereas GDP and the price index are positively related
to it, as expected. The standard error of the estimate of 49.7 indicates substantial
variation between observed and predicted values.
Final regression model 2 to predict residential investment using Federal funds interest rate, GDP,
Money supply, and Price index for finished goods:
Regression Analysis: Residential Inve versus Fed Funds Rate, GDP, ...
The regression equation is
Residential Investment = - 170 - 6.81 Fed Funds Rate + 0.0922 GDP
- 0.151 Money Supply + 0.283 Price index_Finished goods
Analysis of Variance
Source DF SS MS F P
Regression 4 3901058 975265 393.25 0.000
Residual Error 203 503441 2480
Total 207 4404499
The model with the federal funds rate as the interest rate variable is also a final model,
since all of the independent variables except government spending are conditionally
significant. Again, high correlation among the independent variables will be a problem
with this regression model.
As expected, the federal funds interest rate and the money supply are negatively related
to the dependent variable, residential investment, whereas GDP and the price index are
positively related to it. The standard error of the estimate of 49.8 indicates substantial
variation between observed and predicted values.
In both regression models, 88.6% of the variation in residential investment can be
explained by variations in the independent variables, and the standard errors of the
estimate are almost equal. Hence, the two equations predict about equally well.
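The comparison of the two models can be verified directly from their analysis of variance tables. A minimal sketch, using the SSR, SSE, and error degrees of freedom printed in the two tables above; the dictionary keys are hypothetical labels:

```python
import math

# Sums of squares and error degrees of freedom from the two ANOVA tables.
models = {
    "prime_rate": {"ssr": 3903137, "sse": 501362, "df_error": 203},
    "fed_funds": {"ssr": 3901058, "sse": 503441, "df_error": 203},
}

for name, m in models.items():
    m["sst"] = m["ssr"] + m["sse"]                   # total sum of squares
    m["r2"] = m["ssr"] / m["sst"]                    # coefficient of determination
    m["s_e"] = math.sqrt(m["sse"] / m["df_error"])   # standard error of the estimate
    print(f"{name}: SST = {m['sst']}, R2 = {m['r2']:.3f}, s_e = {m['s_e']:.1f}")
```

Both models share the same total sum of squares, so any difference in fit shows up only in the small gap between the two error sums of squares.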
b.
Prime interest rate as the interest rate variable:
b_1 ± t_{n−K−1, α/2} s_{b_1}: –7.173 +/– 1.97(1.314) = –7.173 +/– 2.589 or (–9.762, –4.584)
12.106
Regression analysis to predict Breast Cancer Death Rate:
Analysis of Variance
Source DF SS MS F P
Regression 3 0.000050366 0.000016789 7.72 0.000
Residual Error 47 0.000102145 0.000002173
Total 50 0.000152511
This is the final regression model. All else being equal, an increase of 1 nurse per 100,000
population increases the B cancer death rate by an estimated .000004 per 1000 population.
All else being equal, a 1% increase in the percentage of female smokers increases the death
rate by an estimated .000194 per 1000 population. All else being equal, a 1% increase in
the percentage of binge drinkers decreases the death rate by an estimated .000169 per 1000
population. The t statistics indicate that all three independent variables are significant
at the 5% level, and hence a significant relationship exists. The small standard error of
the estimate of .0015 indicates little variation between observed and predicted values. 33%
of the variation in the B cancer death rate is explained by the variation in the number of
nurses, percent of female smokers, and percent of binge drinkers.
Median Income
Per Fam Pov -0.748
0.000
Analysis of Variance
Source DF SS MS F P
Regression 5 0.0055526 0.0011105 17.29 0.000
Residual Error 45 0.0028899 0.0000642
Total 50 0.0084425
This is the final regression model. All else being equal, an increase of 1 nurse per 100,000
population increases the L cancer death rate by an estimated .000034 per 1000 population.
All else being equal, a 1% increase in the percentage of smokers increases the death rate
by an estimated .00221. All else being equal, a 1% increase in the percentage of binge
drinkers decreases the death rate by an estimated .00101. All else being equal, a $1
increase in the median household income decreases the death rate by an estimated .000001.
All else being equal, a 1% increase in the percent of families below poverty decreases the
death rate by an estimated .00137. The t statistics indicate that all the independent
variables are significant at the 5% level, and hence a significant relationship exists. The
small standard error of the estimate of .008 indicates little variation between observed
and predicted values. 65.8% of the variation in the L cancer death rate is explained by the
variation in the number of nurses, percent of smokers, percent of binge drinkers, median
household income, and percent of families below poverty.
12.107
Correlations: Salary, age, Experience, Years Jr, Years Senior, Gender, Market
Salary age Experience Years Jr Years Senior Gender
age 0.749
0.000
Experience 0.883 0.877
0.000 0.000
Years Jr 0.698 0.712 0.803
0.000 0.000 0.000
Years Senior 0.777 0.583 0.674 0.312
0.000 0.000 0.000 0.000
Gender -0.429 -0.234 -0.378 -0.367 -0.292
0.000 0.004 0.000 0.000 0.000
Market 0.026 -0.134 -0.150 -0.113 -0.017 0.062
0.750 0.103 0.067 0.169 0.833 0.453
The correlation matrix indicates several independent variables that should provide good
explanatory power in the regression model. We would expect that Experience, years at
junior level analyst, and years at senior level analyst are likely to be conditionally
significant:
Analysis of Variance
Source DF SS MS F P
Regression 6 12947735493 2157955916 172.80 0.000
Residual Error 143 1785774544 12487934
Total 149 14733510038
Analysis of Variance
Source DF SS MS F P
Regression 5 12929178260 2585835652 206.37 0.000
Residual Error 144 1804331777 12530082
Total 149 14733510038
This is the final model. All of the independent variables are conditionally significant
and the model explains a sizeable portion of the variability in salary.
To test the hypothesis that the rate of change in female salaries as a function of
Experience is less than the rate of change in male salaries as a function of Experience,
the dummy variable Gender is used to see if the slope coefficient for Experience (X1)
is different for males and females. The following model is used:
Create the variable X4X1 and then test for conditional significance in the regression
model. If it proves to be a significant predictor of salaries then there is strong evidence to
conclude that the rate of change in female salaries as a function of Experience is different
than for males:
Analysis of Variance
Source DF SS MS F P
Regression 6 12964050859 2160675143 174.62 0.000
Residual Error 143 1769459178 12373840
Total 149 14733510038
The regression shows that the newly created variable of Fem(exp) is conditionally
significant at the 10% level but not at the 5% level. And we conclude that the rate of
change in female salaries as a function of experience is less than that of male salaries at
the 10% level. We cannot conclude that the rate of change in female salaries as a function
of Experience differs from that of male salaries at the 5% level.
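The role of the product variable can be made concrete with a short sketch. Everything numeric here is hypothetical: the three observations, the coefficient values, and the coding of Gender as 1 = female and 0 = male are assumptions for illustration, and the function names are not from the original solution.

```python
def add_interaction(rows, dummy_key, x_key, new_key):
    """Return copies of the rows with new_key = dummy * x added."""
    return [{**row, new_key: row[dummy_key] * row[x_key]} for row in rows]


def experience_slope(b_experience, b_interaction, gender):
    """Slope of salary with respect to Experience for gender = 0 or 1."""
    return b_experience + b_interaction * gender


# Hypothetical observations; Gender is assumed coded 1 = female, 0 = male.
rows = [
    {"Experience": 10, "Gender": 0},
    {"Experience": 10, "Gender": 1},
    {"Experience": 4, "Gender": 1},
]
augmented = add_interaction(rows, "Gender", "Experience", "Fem_exp")

# Hypothetical coefficients, chosen only to illustrate the arithmetic.
b_exp, b_fem_exp = 1000.0, -150.0
male_slope = experience_slope(b_exp, b_fem_exp, gender=0)
female_slope = experience_slope(b_exp, b_fem_exp, gender=1)
print("male slope:", male_slope, "female slope:", female_slope)
```

The product variable is zero for every male observation, so its coefficient measures only the difference between the female and male slopes on Experience; testing that coefficient against zero is the test described above.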
a. Correlation matrix:
There exists a positive relationship between EconGPA and all of the independent variables,
which is expected. Note that there is a high correlation between the composite ACT score and
the individual components, which is again, as expected. Thus, high correlation among the
independent variables is likely to be a serious concern in this regression model.
Analysis of Variance
Source DF SS MS F P
Regression 6 8.1778 1.3630 5.52 0.000
Residual Error 64 15.8166 0.2471
Total 70 23.9945
As expected, high correlation among the independent variables is affecting the results. A
strategy of dropping the variable with the lowest t-statistic in each successive model
leads to dropping the following variables (in order): 1) ACTmath, 2) ACTeng, 3)
ACTss, 4) HSPct. The two variables that remain in the final model are gender and
ACTcomp:
Analysis of Variance
Source DF SS MS F P
Regression 2 7.0705 3.5352 14.54 0.000
Residual Error 70 17.0192 0.2431
Total 72 24.0897
b. The model could be used in college admission decisions by creating a predicted GPA
in economics based on sex and ACT comp scores. This predicted GPA could then be
used with other factors in deciding admission. Note that this model predicts that
females will outperform males with equal test scores. Using this model as the only
source of information may lead to charges of unequal treatment.
12.109
Correlations: Real Home Pr, Year, Real Buildin, U. S. Popula, Long Interes, ...
Regression Analysis: Real Home Price versus Year, Real Building Co, ...
Analysis of Variance
Source DF SS MS F P
Regression 5 49651.0 9930.2 57.72 0.000
Residual Error 115 19783.9 172.0
Total 120 69434.9
Regression Analysis: Real Home Price versus Year, Real Building Co, ...
The regression equation is
Real Home Price Index = 765 - 0.373 Year + 1.01 Real Building Cost Index - 3.42
Long Interest Rate + 0.328 Consumer Price Index
Analysis of Variance
Source DF SS MS F P
Regression 4 49368 12342 71.34 0.000
Residual Error 116 20067 173
Total 120 69435
a. All else being equal, the model predicts a downward trend in real home prices over the
long time period. This is evident from the coefficient of -0.373 on the independent
variable Year.
b. The housing price bubble can be identified by predicting the real home price index
using the obtained model for the years in the first part of the 21st century.
12.110
Correlations: Sale 2 Price, Time Interva, Sale 1 Price, Atlanta, Chicago, ...
Chicago Dallas
Dallas -0.333
0.000
Oakland -0.333 -0.333
0.000 0.000
There exists a negative relationship between Sale 2 price and all of the independent variables
except Sale 1 price and Oakland. Note that there is a high correlation between the Sale 1 price
and Sale 2 price, as expected. High correlation among the independent variables is likely to be a
serious concern in this regression model.
Regression Analysis: Sale 2 Price versus Time Interva, Sale 1 Price, ...
Analysis of Variance
Source DF SS MS F P
Regression 5 5.46776E+13 1.09355E+13 25892.07 0.000
Residual Error 3994 1.68687E+12 422350080
Total 3999 5.63645E+13
97% of the variation in the second or final sales price can be explained by variations in the
interval between house sales, and the initial house sales price with adjustments for the four major
U.S. market areas.
All the variables are highly significant in explaining the final sales price at all common
levels of alpha. The time interval and initial sales price have the expected sign in the
regression model. The F-test of the significance of the overall model shows that we reject
the null hypothesis that all slope coefficients equal zero in favor of the alternative that
at least one slope coefficient is not equal to zero. The F-test yielded a p-value of .000.
The sample correlation between the observed and the predicted values of the final sales
price is .985.
12.111
House value models:
Analysis of Variance
Source DF SS MS F P
Regression 4 1079.08 269.77 20.70 0.000
Residual Error 85 1107.55 13.03
Total 89 2186.63
All of the independent variables are conditionally significant. Now add the percent of
commercial property to the model to see if it is significant:
Analysis of Variance
Source DF SS MS F P
Regression 5 1080.07 216.01 16.40 0.000
Residual Error 84 1106.56 13.17
Total 89 2186.63
With a t-statistic of -.27, we have not found strong enough evidence to reject the
hypothesis that the slope coefficient on percent commercial property is zero. The
conditional F test, based on the error sums of squares of the two models above:
F = [(1107.55 − 1106.56)/1] / 13.17 = .075
With 1 degree of freedom in the numerator and (90-5-1) = 84 degrees of freedom in the
denominator, the critical value of F at the .05 level is 3.95. Since .075 is below this
critical value, at any common level of alpha, do not reject the hypothesis that the percent
commercial property has no effect on house values.
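The conditional F statistic compares the error sum of squares of the base model with that of the model containing the added variable. A minimal sketch, using the figures in the house value analysis of variance tables of this exercise (base model SSE = 1107.55; with percent commercial property SSE = 1106.56 and MSE = 13.17; with percent industrial property SSE = 1089.86 and MSE = 12.97); the function name is hypothetical:

```python
def conditional_f(sse_restricted, sse_full, mse_full, r=1):
    """F statistic for testing r added coefficients jointly equal to zero."""
    return ((sse_restricted - sse_full) / r) / mse_full


F_CRIT_05 = 3.95  # critical value of F with (1, 84) df quoted in the text

f_commercial = conditional_f(1107.55, 1106.56, 13.17)
f_industrial = conditional_f(1107.55, 1089.86, 12.97)

for label, f_value in (("commercial", f_commercial), ("industrial", f_industrial)):
    decision = "reject" if f_value > F_CRIT_05 else "do not reject"
    print(f"percent {label} property: F = {f_value:.3f}, {decision} H0")
```

With a single added variable, this F statistic equals the square of that variable's t-statistic up to rounding, so the two tests always agree.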
Analysis of Variance
Source DF SS MS F P
Regression 5 1096.77 219.35 16.91 0.000
Residual Error 84 1089.86 12.97
Total 89 2186.63
Likewise, the coefficient on percent industrial property is not significantly different
from zero. The conditional F test, relative to the four-variable base model:
F = [(1107.55 − 1089.86)/1] / 12.97 = 1.364
With 1 degree of freedom in the numerator and (90-5-1) = 84 degrees of freedom in the
denominator, the critical value of F at the .05 level is 3.95. Again the statistic is lower
than the critical value of F at common levels of alpha; therefore, do not reject the
hypothesis that the percent industrial property has no effect on house values.
Analysis of Variance
Source DF SS MS F P
Regression 3 0.00237926 0.00079309 13.41 0.000
Residual Error 86 0.00508785 0.00005916
Total 89 0.00746711
Analysis of Variance
Source DF SS MS F P
Regression 2 0.0023414 0.0011707 19.87 0.000
Residual Error 87 0.0051257 0.0000589
Total 89 0.0074671
Both of the independent variables are significant. This becomes the base model, to which we
now add percent commercial property and percent industrial property one at a time:
Analysis of Variance
Source DF SS MS F P
Regression 3 0.0032936 0.0010979 22.62 0.000
Residual Error 86 0.0041735 0.0000485
Total 89 0.0074671
The conditional F statistic is [(.0051257 − .0041735)/1] / .0000485 = 19.63.
With 1 degree of freedom in the numerator and (90-3-1) = 86 degrees of freedom in the
denominator, the critical value of F at the .05 level is 3.95. Since 19.63 exceeds the
critical value, we conclude that the percentage of commercial property has a statistically
significant positive impact on tax rate.
Analysis of Variance
Source DF SS MS F P
Regression 3 0.00238178 0.00079393 13.43 0.000
Residual Error 86 0.00508533 0.00005913
Total 89 0.00746711
The percent industrial property is insignificant with a t-statistic of only -.83. The F-test
confirms that the variable does not have a significant impact on tax rate:
F = [(.0051257 − .00508533)/1] / .00005913 = .68, which is below the critical value of 3.95.
In conclusion, we found no evidence to back three of the activists’ claims and strong
evidence to reject one of them. We concluded that commercial development will have no
effect on house value, while it will actually increase tax rate. In addition, we concluded
that industrial development will have no effect on house value or tax rate.
It was important to include all of the other independent variables in the regression models
because the conditional significance of any one variable is influenced by which other
independent variables are in the regression model. Therefore, it is important to test if direct
relationships can be ‘explained’ by the relationships with other predictor variables.
12.112
a.
To predict the percentage of students who graduate in 4 years from highly ranked private
colleges, the following potential predictor variables are selected:
b.
Multiple regression using the listed predictor variables:
Analysis of Variance
Source DF SS MS F P
Regression 4 0.91267 0.22817 53.98 0.000
Residual Error 93 0.39313 0.00423
Total 97 1.30580
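The entries in this ANOVA table are linked by simple arithmetic: each mean square is its sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. A minimal sketch using the values above, with n = 98 inferred from the total of 97 degrees of freedom and an illustrative function name `anova_row`:

```python
def anova_row(ssr, sse, k, n):
    """Return (MSR, MSE, F) for a regression with k predictors and n observations."""
    msr = ssr / k             # regression mean square
    mse = sse / (n - k - 1)   # residual (error) mean square
    return msr, mse, msr / mse

# SSR and SSE from the table above; n = 98 is inferred from Total DF = 97
msr, mse, f_stat = anova_row(ssr=0.91267, sse=0.39313, k=4, n=98)

print(f"MSR = {msr:.5f}, MSE = {mse:.5f}, F = {f_stat:.2f}")
```

The results reproduce the MS and F columns of the table up to rounding.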
c.
The final regression after eliminating the insignificant predictor variable, Admission rate:
Analysis of Variance
Source DF SS MS F P
Regression 3 0.91262 0.30421 72.73 0.000
Residual Error 94 0.39318 0.00418
Total 97 1.30580
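One way to see that little is lost by dropping Admission rate is to compare R-squared and adjusted R-squared for the two models. The following sketch computes both from the ANOVA tables in parts b and c; n = 98 is inferred from the total degrees of freedom, and the function names are illustrative.

```python
def r_squared(sse, sst):
    """Coefficient of determination: 1 - SSE/SST."""
    return 1 - sse / sst

def adjusted_r_squared(sse, sst, n, k):
    """R-squared adjusted for the number of predictors k."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

n, sst = 98, 1.30580  # n inferred from Total DF = 97

# Part b: four predictors; part c: three predictors (Admission rate removed)
r2_full = r_squared(0.39313, sst)
r2_reduced = r_squared(0.39318, sst)
adj_full = adjusted_r_squared(0.39313, sst, n, k=4)
adj_reduced = adjusted_r_squared(0.39318, sst, n, k=3)

print(f"R-squared:          full {r2_full:.4f}, reduced {r2_reduced:.4f}")
print(f"Adjusted R-squared: full {adj_full:.4f}, reduced {adj_reduced:.4f}")
```

R-squared is essentially unchanged at about .699, while adjusted R-squared rises slightly for the reduced model, consistent with removing a variable that adds almost no explanatory power.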
d.
Undergrad.Enrollment and Quality Rank are negatively related to 4-year Grad.Rate.
Student/faculty Ratio has the expected sign for the dependent variable.
All else being equal, a 1-unit increase in Undergrad.Enrollment will decrease 4-year Grad.Rate
by an estimated .000009. All else being equal, a 1-unit increase in Student/faculty Ratio will
increase 4-year Grad.Rate by an estimated .0153. All else being equal, a 1-unit increase in
Quality Rank will decrease 4-year Grad.Rate by an estimated .0039.
12.113
a.
To predict the cost with financial aid for students at highly ranked private colleges, the following
potential predictor variables are selected:
Undergrad. Enrollment, Admission Rate, Cost After Need-based Aid, Need Met, Cost After
Non-Need-Based Aid, Average Debt, and Cost Rank
b.
Multiple regression using the listed predictor variables:
Regression Analysis: FinaidCost versus Undergrad. E, Admission Ra, ...
Analysis of Variance
Source DF SS MS F P
Regression 7 2736003281 390857612 27.89 0.000
Residual Error 89 1247231043 14013832
Total 96 3983234323
c.
The final regression after sequentially eliminating the insignificant predictor variables
Undergrad. Enrollment, Cost After Need-based Aid, Average Debt, and Cost Rank:
Analysis of Variance
Source DF SS MS F P
Regression 3 2693664779 897888260 64.75 0.000
Residual Error 93 1289569544 13866339
Total 96 3983234323
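As a hedged cross-check on the sequential eliminations, the four dropped variables can also be tested jointly by comparing the residual sums of squares of the full model in part b and the reduced model in part c. The critical value used below (about 2.47 for 4 and 89 degrees of freedom at the .05 level) is an approximate table value, not taken from the output above.

```python
# Residual SS and MS from the ANOVA tables in parts b and c
sse_full = 1247231043     # 7 predictors, 89 df
sse_reduced = 1289569544  # 3 predictors, 93 df
mse_full = 14013832       # residual MS of the full model
num_dropped = 4           # variables removed between the two models

# Joint F statistic for H0: all four dropped coefficients equal zero
f_joint = ((sse_reduced - sse_full) / num_dropped) / mse_full

approx_critical_f = 2.47  # approximate F(4, 89) at .05; an assumption from standard tables
print(f"Joint F = {f_joint:.3f}, exceeds critical value: {f_joint > approx_critical_f}")
```

The joint statistic is below 1, so under these assumptions the four variables are not jointly significant, which supports the reduced model.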
d.
Admission Rate is negatively related to Cost with financial aid. Need Met and Cost After
Non-Need-Based Aid are positively related to Cost with financial aid.
All else being equal, a 1% increase in Admission Rate will decrease Cost with financial aid by
an estimated $10,314. All else being equal, a 1% increase in Need Met will increase Cost with
financial aid by an estimated $22,484. All else being equal, a $1 increase in Cost After
Non-Need-Based Aid will increase Cost with financial aid by an estimated $.27.
12.114
a. daycode2 is a dummy variable where the first interview is coded 0 and the second interview
is coded 1.
Analysis of Variance
Source DF SS MS F P
Regression 7 117606 16801 86.77 0.000
Residual Error 8209 1589454 194
Total 8216 1707060
Analysis of Variance
Source DF SS MS F P
Regression 8 187086 23386 126.29 0.000
Residual Error 8208 1519974 185
Total 8216 1707060
Analysis of Variance
Source DF SS MS F P
Regression 8 128336 16042 83.43 0.000
Residual Error 8206 1577814 192
Total 8214 1706151
Analysis of Variance
Source DF SS MS F P
Regression 8 125197 15650 81.53 0.000
Residual Error 8105 1555766 192
Total 8113 1680963
12.115
a.
daycode2 is a dummy variable where the first interview is coded 0 and the second interview
is coded 1. activity_level1 is a dummy variable where level 1 is coded 0 and level 2 is coded 1.
Regression Analysis: HEI2005 versus sr_did_lm_wt, smoker, ...
Analysis of Variance
Source DF SS MS F P
Regression 9 148319 16480 95.74 0.000
Residual Error 5318 915359 172
Total 5327 1063678
Analysis of Variance
Source DF SS MS F P
Regression 10 177400 17740 106.43 0.000
Residual Error 5317 886279 167
Total 5327 1063678
Analysis of Variance
Source DF SS MS F P
Regression 10 149300 14930 86.87 0.000
Residual Error 5315 913505 172
Total 5325 1062805
Analysis of Variance
Source DF SS MS F P
Regression 10 151259 15126 88.20 0.000
Residual Error 5287 906658 171
Total 5297 1057916
12.116
a. daycode2 is a dummy variable where the first interview is coded 0 and the second interview
is coded 1.
Analysis of Variance
Source DF SS MS F P
Regression 7 7009.6 1001.4 121.80 0.000
Residual Error 8209 67491.9 8.2
Total 8216 74501.5
Analysis of Variance
Source DF SS MS F P
Regression 8 7166.63 895.83 109.20 0.000
Residual Error 8208 67334.92 8.20
Total 8216 74501.55
Analysis of Variance
Source DF SS MS F P
Regression 8 7066.32 883.29 107.54 0.000
Residual Error 8206 67400.82 8.21
Total 8214 74467.14
Analysis of Variance
Source DF SS MS F P
Regression 8 7438.61 929.83 113.83 0.000
Residual Error 8105 66208.22 8.17
Total 8113 73646.83
12.117
a.
daycode2 is a dummy variable where the first interview is coded 0 and the second interview
is coded 1. activity_level1 is a dummy variable where level 1 is coded 0 and level 2 is coded 1.
Regression Analysis: daily_cost versus sr_did_lm_wt, smoker, ...
Analysis of Variance
Source DF SS MS F P
Regression 9 6451.80 716.87 94.88 0.000
Residual Error 5318 40181.04 7.56
Total 5327 46632.84
Analysis of Variance
Source DF SS MS F P
Regression 10 6476.42 647.64 85.75 0.000
Residual Error 5317 40156.42 7.55
Total 5327 46632.84
Analysis of Variance
Source DF SS MS F P
Regression 10 6462.21 646.22 85.58 0.000
Residual Error 5315 40132.81 7.55
Total 5325 46595.02
Analysis of Variance
Source DF SS MS F P
Regression 10 6405.63 640.56 84.47 0.000
Residual Error 5287 40092.04 7.58
Total 5297 46497.67