Homework 2
Data on last year’s sales (y, in 100,000s of dollars) in 15 sales districts are given in the file sales.
This file also contains promotional expenditures (x1, in thousands of dollars), the number of
active accounts (x2), the number of competing brands (x3), and the district potential (x4, coded)
for each of the districts.
a) A model with all four regressors is proposed:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + e, \quad e \sim N(0, \sigma^2)$
Estimate the model and interpret the coefficients.
b) Using the model in (a), obtain a prediction for the sales in a district where x1 =
3.0, x2 = 45, x3 = 10 and x4 = 5. Obtain the corresponding 95% prediction interval.
The prediction is 122.384 with a 95% confidence interval of (116.868,
127.900).
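Part (b) asks for a prediction interval, which is wider than a confidence interval for the mean response because it also carries the error variance of a new observation. The sketch below shows how one could be recovered from the other. It rests on assumptions that the solution does not state: that the interval quoted above is the 95% confidence interval for the mean response, that MSE = 26.2 from part (c) iv belongs to the same four-regressor model, and that `t_crit = 2.228` (the two-sided 5% critical value of t with 10 degrees of freedom) was used.

```python
import math

# Values quoted in the solution
y_hat = 122.384                      # predicted sales at x1=3.0, x2=45, x3=10, x4=5
ci_low, ci_high = 116.868, 127.900   # reported 95% interval

# Assumptions (not stated in the solution)
mse = 26.2       # error mean square of the full model, taken from part (c) iv
t_crit = 2.228   # t critical value, 10 error degrees of freedom, two-sided 5%


def prediction_interval(y_hat, ci_low, ci_high, mse, t_crit):
    """Turn a confidence interval for the mean response into a prediction interval."""
    # Standard error of the fitted mean, backed out of the interval half-width
    se_fit = (ci_high - ci_low) / 2 / t_crit
    # A new observation adds the error variance to the variance of the fit
    se_pred = math.sqrt(mse + se_fit ** 2)
    half_width = t_crit * se_pred
    return y_hat - half_width, y_hat + half_width


pi_low, pi_high = prediction_interval(y_hat, ci_low, ci_high, mse, t_crit)
print(f"95% prediction interval: ({pi_low:.2f}, {pi_high:.2f})")
```

Under these assumptions the prediction interval comes out noticeably wider than (116.868, 127.900). If the quoted interval is already a prediction interval, this calculation does not apply.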
c)
i. $H_0: \beta_4 = 0$, $H_a: \beta_4 \neq 0$
$$F = \frac{(SSE_R - SSE_F)/(P - I)}{SSE_F/(n - p - 1)} = \frac{(43967.9 - 262.1)/(4 - 2)}{262.1/(10 - 2 - 1)} = \frac{21852.9}{37.44} = 583.68$$
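The partial F computation above can be checked with a few lines of Python. This is only a sketch: the function name `partial_f` is made up for illustration, and the degrees of freedom (2 and 7) are copied exactly as they are written in the solution. They are plain inputs, so a different reduced model or error degrees of freedom can be substituted.

```python
def partial_f(sse_reduced, sse_full, df_num, df_error):
    """Partial F statistic comparing a reduced model against the full model."""
    # Extra sum of squares explained by the dropped terms, per degree of freedom
    numerator = (sse_reduced - sse_full) / df_num
    # Error mean square of the full model
    denominator = sse_full / df_error
    return numerator / denominator


# Numbers as written in the solution
f_stat = partial_f(43967.9, 262.1, df_num=4 - 2, df_error=10 - 2 - 1)
print(round(f_stat, 2))
```

At full precision the ratio is about 583.63. The 583.68 in the solution comes from rounding the denominator to 37.44 before dividing.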
iii. $H_0: \beta_2 = \beta_3$, $H_a: \beta_2 \neq \beta_3$
$t = \frac{4.59}{1.04} = 4.41$, $t^*(10) = 2.228 \Rightarrow t > t^*$, so we reject $H_0$ and conclude that $\beta_2 \neq \beta_3$.
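As a quick check of the t test, the sketch below reads 4.59 as the estimated difference between the two coefficients and 1.04 as its standard error. That reading is an assumption, since the solution does not label the two numbers. The critical value 2.228 is the two-sided 5% point of the t distribution with n - p - 1 = 15 - 4 - 1 = 10 degrees of freedom.

```python
def t_statistic(estimate, std_error):
    """t ratio for testing that the estimated quantity is zero."""
    return estimate / std_error


t_stat = t_statistic(4.59, 1.04)   # assumed: estimate of beta2 - beta3 and its standard error
t_crit = 2.228                     # t critical value, 10 degrees of freedom, two-sided 5%
reject_h0 = abs(t_stat) > t_crit

print(round(t_stat, 2), reject_h0)
```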
iv. $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$
$n = 15$, $p = 4$, $MSE = 26.2$, $MSR = 22321.3$, so
$F = \frac{MSR}{MSE} = 851.96$
$F^*(4, 10) = 3.48 < 851.96 \Rightarrow F > F^*$
So we reject $H_0$ and conclude that at least one $\beta_j$ is not equal to 0.
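The overall F test can be reproduced from the quantities quoted above. This is a minimal sketch in which the critical value 3.48 is typed in from the solution rather than computed.

```python
n, p = 15, 4                  # observations and regressors
msr, mse = 22321.3, 26.2      # regression and error mean squares from the solution

df_regression = p             # numerator degrees of freedom
df_error = n - p - 1          # denominator degrees of freedom

f_stat = msr / mse
f_crit = 3.48                 # F*(4, 10) at the 5% level, as quoted in the solution
reject_h0 = f_stat > f_crit

print(df_regression, df_error, round(f_stat, 2), reject_h0)
```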
PROBLEM 2
e)
PMSE of a:
PMSE of b:
PMSE of c:
PMSE of d:
This means the best model is
PROBLEM 3
Data on crime-related statistics for 47 U.S. states in 1960 are given in the file crimerate. The dataset
includes:
a) Fit a full model with all the predictors; summarize the model and interpret the coefficients.
With all predictors equal to 0, the predicted crime rate would be -692.
Holding the other predictors fixed:
For each 1-year increase in age, the crime rate increases by 1.040.
If the state is southern, the crime rate decreases by 8.3.
For each 1-year increase in median schooling, the crime rate increases by 1.802.
For each unit increase in police expenditure in 1959, the crime rate increases by 1.61.
For each unit increase in police expenditure in 1960, the crime rate decreases by 0.67.
For each unit increase in labor force participation, the crime rate decreases by 0.41.
For each 1000
b)
Using an alpha of 0.05, the following variables are not significant (each has a p-value above 0.05). The Regression row is the overall F-test, which is significant.
Source       F-Value   P-Value
Regression      8.46     0.000
S               0.31     0.581
PE              2.31     0.138
PE-1            0.34     0.565
LF              0.07     0.791
M               0.62     0.438
Pop             0.10     0.752
NW              0.01     0.911
UE1             1.89     0.178
Wealth          1.68     0.203
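The decision rule applied to the table is simply "p-value greater than alpha". The sketch below applies it to the p-values listed above. The dictionary holds only the variables shown in the table, so any predictor that is absent from the table is absent here too.

```python
ALPHA = 0.05

# P-values copied from the table (the overall Regression row is left out)
p_values = {
    "S": 0.581, "PE": 0.138, "PE-1": 0.565, "LF": 0.791, "M": 0.438,
    "Pop": 0.752, "NW": 0.911, "UE1": 0.178, "Wealth": 0.203,
}

# A variable is not significant when its p-value exceeds alpha
not_significant = [name for name, p in p_values.items() if p > ALPHA]
print(not_significant)
```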
This is the result of the model.
b) Plot the data Y=ln(Price) versus X=Age of Bottle. Fit a straight line through the
data by least squares and produce the analysis of variance table.
c) What would you conclude about the price of vintage port as exhibited by
this set of data and your analysis? To the nearest cent, at what per-year
rate would you expect the price of a bottle of vintage port to rise if a
similar price pattern continued into the future?
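The price of the wine increases with the age of the bottle. The fitted slope means
ln(price) increases by 0.03465 per year of age, so the price is multiplied by
$e^{0.03465} \approx 1.035$ each year: a rise of about 3.5% per year, or roughly 4 cents
per dollar of price to the nearest cent.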
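Because the line was fitted to ln(price), the slope translates into a multiplicative yearly factor rather than a fixed dollar amount. A short check of that conversion, using only the slope quoted above:

```python
import math

slope = 0.03465                        # fitted slope of ln(price) on age, per year

growth_factor = math.exp(slope)        # price multiplier for one extra year of age
percent_per_year = (growth_factor - 1) * 100

print(f"factor = {growth_factor:.4f}, increase = {percent_per_year:.2f}% per year")
```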
d) A subsequent advertisement three years later on Tuesday, November 25, 1975
offered 1937 vintage port at $20.00 per bottle. If it can be assumed that a straight
line relationship is preserved, and applies also to this new data point, how much per
bottle per year does it appear prices have risen in the intervening three years? Are
your answers here and in c. consistent, or does it appear that per year prices have
accelerated?
PROBLEM 5
Presidential Election Data (1916-2008): The data in Pres Elec.txt (posted in BB) were collected
by Professor Ray Fair of Yale University, who has found that the proportion of votes
obtained by a presidential candidate in a United States presidential election can be
predicted accurately by three macroeconomic variables, incumbency, and a variable that
indicates whether the election was held during or just after a war. The variables considered
are given below.
Variables for the Presidential Election Data (1916-2008)
Variable  Definition
YEAR      Election year
V         Democratic share of the two-party presidential vote
I         Indicator variable (1 if there is a Democratic incumbent at the time of
          the election and -1 if there is a Republican incumbent)
D         Indicator variable (1 if a Democratic incumbent is running for election,
          -1 if a Republican incumbent is running for election, and 0 otherwise)
W         Indicator variable (1 for the elections of 1920, 1944, and 1948, and 0
          otherwise)
G         Growth rate of real per capita GDP in the first three quarters of the
          election year
P         Absolute value of the growth rate of the GDP deflator in the first 15
          quarters of the administration
N         Number of quarters in the first 15 quarters of the administration in
          which the growth rate of real per capita GDP is greater than 3.2%
All growth rates are annual rates in percentage points. Consider fitting the initial model to
the data.
$V = \beta_0 + \beta_1 I + \beta_2 D + \beta_3 W + \beta_4 (G \cdot I) + \beta_5 P + \beta_6 N + \epsilon$
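The only non-standard term in this model is the interaction G·I, which flips the sign of the growth rate when the incumbent is Republican. The sketch below shows how one row of regressors could be assembled for a single election. The helper names `design_row` and `predict` and the example inputs are hypothetical, chosen purely for illustration, and no fitted coefficients are assumed.

```python
def design_row(I, D, W, G, P, N):
    """Regressor row [1, I, D, W, G*I, P, N] for one election."""
    return [1.0, float(I), float(D), float(W), float(G) * float(I), float(P), float(N)]


def predict(beta, row):
    """Predicted V for a coefficient vector beta = [b0, ..., b6]."""
    return sum(b * x for b, x in zip(beta, row))


# Hypothetical election: Republican incumbent party (I = -1), no incumbent running,
# no war, 2.0% growth, 3.0% inflation, 4 strong-growth quarters
row = design_row(I=-1, D=0, W=0, G=2.0, P=3.0, N=4)
print(row)
```

Once the coefficients have been estimated, `predict(beta, row)` gives the fitted Democratic vote share for that election.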
However, the model with the best R-sq is the one with all variables.
Using this to predict our values, we get