Homework 2

This document contains 5 problems related to regression analysis. Problem 1 involves fitting a regression model to sales data and interpreting the coefficients. Problem 2 compares different time series models. Problem 3 fits a full regression model to crime rate data and interprets the coefficients. Problem 4 analyzes the relationship between vintage port prices and age by fitting a linear regression model on logarithmically transformed data. Problem 5 considers a regression model for predicting presidential election outcomes based on macroeconomic variables.

Uploaded by

Aashna Mehta
Copyright: © All Rights Reserved

PROBLEM 1

Data on last year’s sales (y, in 100,000s of dollars) in 15 sales districts are given in the file sales.
This file also contains promotional expenditures (x1, in thousands of dollars), the number of
active accounts (x2), the number of competing brands (x3), and the district potential (x4, coded)
for each of the districts.
a) A model with all four regressors is proposed:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + e, \quad e \sim N(0, \sigma^2)$
Estimate the model and interpret the coefficients.

b) Using the model in (a), obtain a prediction for the sales in a district where x1 = 3.0,
x2 = 45, x3 = 10 and x4 = 5. Obtain the corresponding 95% prediction interval.
The prediction is 122.384 with a 95% prediction interval of (116.868, 127.900).
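A minimal sketch of how such a prediction interval is computed, assuming the usual formula $\hat{y}_0 \pm t^* \sqrt{s^2 (1 + x_0'(X'X)^{-1} x_0)}$. The helper names `solve` and `prediction_interval` and the toy data are hypothetical; the sales file itself is not reproduced here, so the numbers below are not the homework's.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def prediction_interval(X, y, x0, t_crit):
    """X: rows of regressors without the intercept; x0: new regressor values.

    t_crit is the two-sided t critical value for n - p - 1 degrees of freedom,
    supplied by the caller (for example 2.228 for 10 degrees of freedom at 5%).
    """
    rows = [[1.0] + list(r) for r in X]          # add the intercept column
    n, k = len(rows), len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(XtX, Xty)                       # least-squares coefficients
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (n - k)     # mean squared error
    v0 = [1.0] + list(x0)
    h = sum(a * b for a, b in zip(v0, solve(XtX, v0)))  # x0'(X'X)^{-1} x0
    y0 = sum(b * v for b, v in zip(beta, v0))
    half = t_crit * math.sqrt(s2 * (1.0 + h))
    return y0, (y0 - half, y0 + half)

# Hypothetical toy data with one regressor, for illustration only.
toy_X = [[1], [2], [3], [4]]
toy_y = [2.1, 3.9, 6.2, 7.8]
print(prediction_interval(toy_X, toy_y, [5], t_crit=4.303))
```

The `1.0 + h` term is what separates a prediction interval from a confidence interval for the mean response, which uses `h` alone and is therefore narrower.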
c)
i. $H_0: \beta_4 = 0$ vs $H_a: \beta_4 \neq 0$

F-value = 0.41 with a p-value = 0.538. Since 0.538 ≥ α = 0.05, we fail to reject the
null hypothesis.
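ii. $H_0: \beta_4 = \beta_3 = 0$ vs $H_a$: at least one of $\beta_3$, $\beta_4 \neq 0$

$F = \dfrac{(SSE_R - SSE_F)/(p - q)}{SSE_F/(n - p - 1)} = \dfrac{(43967.9 - 262.1)/(4 - 2)}{262.1/(15 - 4 - 1)} = \dfrac{21852.9}{26.21} = 833.76$

Here p = 4 is the number of regressors in the full model and q = 2 the number in the
reduced model, so the denominator uses the full model's n − p − 1 = 10 error degrees of
freedom.

$F^*(2, 10) = 4.10 < 833.76 = F \Rightarrow F > F^*$

so we reject H0: at least one of β3, β4 is not equal to 0.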

iii. $H_0: \beta_2 = \beta_3$ vs $H_a: \beta_2 \neq \beta_3$

$t = \dfrac{4.59}{1.04} = 4.41$, $t^*(10) = 2.228 \Rightarrow |t| > t^*$, so we reject H0: β2 is not equal to β3.
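iv. $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ vs $H_a$: at least one $\beta_j \neq 0$

n = 15, p = 4, MSE = 26.2, MSR = 22321.3, so

$F = \dfrac{MSR}{MSE} = 851.96$

$F^*(4, 10) = 3.48 < 851.96 \Rightarrow F > F^*$
So we reject H0: at least one β is not equal to 0.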
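The test statistics in part (c) can be checked with a few lines of arithmetic. This is a sketch under one stated assumption: with n = 15 and p = 4, the full model has n − p − 1 = 10 error degrees of freedom, which is consistent with MSE = 26.2 ≈ 262.1/10 in part iv. The function name `partial_f` is illustrative, not taken from any package.

```python
def partial_f(sse_reduced, sse_full, n_dropped, df_error_full):
    """Partial F statistic for dropping n_dropped regressors from the full model."""
    numerator = (sse_reduced - sse_full) / n_dropped
    denominator = sse_full / df_error_full
    return numerator / denominator

n, p = 15, 4
df_error = n - p - 1  # 10 error degrees of freedom in the full model

# ii. drop x3 and x4 (two regressors)
f_ii = partial_f(43967.9, 262.1, n_dropped=2, df_error_full=df_error)

# iii. t statistic: estimated difference divided by its standard error
t_iii = 4.59 / 1.04

# iv. overall F test
f_iv = 22321.3 / 26.2

print(round(f_ii, 2), round(t_iii, 2), round(f_iv, 2))
```

Each statistic is then compared with its 5% critical value: F*(2, 10) = 4.10, the two-sided t*(10) = 2.228, and F*(4, 10) = 3.48.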
PROBLEM 2

a) $y_t = \beta_0 + \beta_1 t + e_t$: Deterministic Trend Model

b) $y_t = \beta_0 + \beta_1 y_{t-1} + e_t$: Stochastic Trend Model

c) $y_t = \beta_0 + \beta_1 y_{t-4} + e_t$: Stochastic Seasonal Model

d) $y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 y_{t-4} + e_t$: Stochastic Seasonal + Trend Model

e)
PMSE of a:
PMSE of b:
PMSE of c:
PMSE of d:
This means the best model is
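A hedged sketch of how the PMSE (prediction mean squared error) comparison could be carried out for models a) to c): fit each model on the first part of the series, forecast the held-out part, and average the squared forecast errors. All function names and the toy series are hypothetical, and the homework's data and PMSE values are not reproduced. Model d) has two regressors and would need a multiple-regression fit instead of `fit_line`.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def pmse(actual, predicted):
    """Mean of squared prediction errors over the holdout period."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def pmse_trend(series, n_train):
    """Model a): y_t = b0 + b1 * t, fitted on the first n_train points."""
    t = list(range(1, len(series) + 1))
    b0, b1 = fit_line(t[:n_train], series[:n_train])
    preds = [b0 + b1 * ti for ti in t[n_train:]]
    return pmse(series[n_train:], preds)

def pmse_lag(series, n_train, lag):
    """Models b) (lag=1) and c) (lag=4): y_t = b0 + b1 * y_{t-lag}.

    Holdout forecasts use the observed value at t - lag.
    """
    b0, b1 = fit_line(series[:n_train - lag], series[lag:n_train])
    preds = [b0 + b1 * series[t - lag] for t in range(n_train, len(series))]
    return pmse(series[n_train:], preds)

# Hypothetical quarterly series, for illustration only.
toy = [10.2, 12.1, 9.8, 14.0, 11.1, 13.2, 10.9, 15.3, 12.0, 14.1, 11.7, 16.2]
scores = {
    "a": pmse_trend(toy, 8),
    "b": pmse_lag(toy, 8, lag=1),
    "c": pmse_lag(toy, 8, lag=4),
}
print(min(scores, key=scores.get), scores)
```

The model with the smallest PMSE on the holdout period is the one preferred for forecasting.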
PROBLEM 3
Data on crime-related statistics for 47 U.S. states in 1960 are given in the file crimerate. The dataset
includes:

• Crime rate: Number of offenses known to police per 1,00,000 population
• Age: Age distribution – Number of males aged 14-24 per 1,000 of total state population
• S: Binary variable distinguishing southern states (1) from the rest of the states
• Ed: Mean number of years of schooling x10 of the population, 25 years or older
• PE: Police expenditure – Per capita expenditure on police protection by state and local
government in 1960
• PE-1: Police expenditure – Per capita expenditure on police protection by state and local
government in 1959
• LF: Labor force participation rate per 1,000 civilian urban males in the age group 14-24
• M: The number of males per 1,000 females
• Pop: The state population size in 100,000
• NW: The number of non-whites per 1,000
• UE1: Unemployment rate of urban males per 1,000 in the age group 14-24
• UE2: Unemployment rate of urban males per 1,000 in the age group 35-39
• Wealth: Median value of transferable goods and assets or family income (units 10
dollars)
• IncIneq: Income inequality – Number of families per 1,000 earning below one-half of the
median income

a) Fit a full model with all the predictors; summarize the model and interpret the coefficients.
With all of the predictors equal to 0, the predicted crime rate would be -692 (an
extrapolation with no practical meaning).
Each of the following holds with the other predictors held fixed:
For each additional male aged 14-24 per 1,000 of population, the crime rate will
increase by 1.040.
If the state is southern, the crime rate will decrease by 8.3.
For each one-unit increase in Ed (0.1 year of mean schooling), the crime rate will
increase by 1.802.
For each unit increase in police expenditure in 1959, the crime rate will
increase by 1.61.
For each unit increase in police expenditure in 1960, the crime rate will
decrease by 0.67.
For each unit increase of labor force participation, the crime rate will
decrease by 0.41.
For each 1000
b)
Using an alpha of 0.05, these are the variables that are not significant:
Source       F-Value   P-Value
Regression      8.46     0.000
S               0.31     0.581
PE              2.31     0.138
PE-1            0.34     0.565
LF              0.07     0.791
M               0.62     0.438
Pop             0.10     0.752
NW              0.01     0.911
UE1             1.89     0.178
Wealth          1.68     0.203
This is the result of the model.

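For the partial F test of the reduced model (the nine variables above removed, leaving 4
predictors) against the full model (13 predictors, n = 47):

$F = \dfrac{(R_f^2 - R_p^2)/(13 - 4)}{(1 - R_f^2)/(n - 13 - 1)} = \dfrac{(0.7692 - 0.2298)/9}{(1 - 0.7692)/33} = \dfrac{0.05993}{0.006994} = 8.57$

$F = 8.57 > F^*(9, 33) \approx 2.18$

This means that at least one of the removed terms is significant. This can also
be seen by the fact that our R-sq is much lower for the new model.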
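The partial F statistic can be cross-checked from the two R-sq values. The sketch below assumes the standard form $F = \dfrac{(R_f^2 - R_p^2)/q}{(1 - R_f^2)/(n - p - 1)}$, with n = 47 states, p = 13 predictors in the full model, and q = 9 dropped predictors (the nine variable rows in the table). These counts are read off the problem statement, and the function name is illustrative.

```python
def partial_f_from_r2(r2_full, r2_reduced, n_dropped, n, p_full):
    """Partial F statistic computed from the R-squared of full and reduced models."""
    numerator = (r2_full - r2_reduced) / n_dropped
    denominator = (1.0 - r2_full) / (n - p_full - 1)
    return numerator / denominator

# R-sq values quoted in the solution: 0.7692 (full) and 0.2298 (reduced).
f_stat = partial_f_from_r2(0.7692, 0.2298, n_dropped=9, n=47, p_full=13)
print(round(f_stat, 2))
```

The numerator uses the full-model R-sq minus the reduced-model R-sq, so the statistic is never negative. A large value indicates that the dropped variables jointly explain a meaningful share of the variation.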
PROBLEM 4
In “The Chicago Maroon” for Friday, November 10, 1972, the Party Mart advertised per
bottle prices for vintage port as given in the accompanying table.
Year   Price ($)     Year   Price ($)
1890   50.00         1941   10.00
1900   35.00         1944    5.99
1920   25.00         1948    8.98
1931   11.98         1950    6.98
1934   15.00         1952    4.99
1935   13.00         1955    5.98
1940    6.98         1960    4.98
a) Plot the data and examine them. Would it be sensible to fit a regression of
the response “price” on the predictor ‘year’? What disadvantages can you
see?

b) Plot the data Y=ln(Price) versus X=Age of Bottle. Fit a straight line through the
data by least square and produce the analysis of variance table.
c) What would you conclude about the price of vintage port as exhibited by
this set of data and your analysis? To the nearest cent, at what per-year
rate would you expect the price of a bottle of vintage port to rise if a
similar price pattern continued into the future?

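The price of the wine does increase as the age of the bottle increases. The
ln(price) will increase by 0.03465 for each additional year of age. Therefore, the
price is multiplied by $e^{0.03465} \approx 1.035$ each year, a rise of about 3.5% per
year, or roughly 4 cents per dollar of price to the nearest cent.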
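The slope quoted above can be reproduced from the advertised prices with a short least-squares fit. This sketch assumes the age of each bottle is 1972 minus its vintage year (the slope does not depend on that offset); the variable names are illustrative.

```python
import math

# Vintage years and per-bottle prices from the 1972 advertisement.
years = [1890, 1900, 1920, 1931, 1934, 1935, 1940,
         1941, 1944, 1948, 1950, 1952, 1955, 1960]
prices = [50.00, 35.00, 25.00, 11.98, 15.00, 13.00, 6.98,
          10.00, 5.99, 8.98, 6.98, 4.99, 5.98, 4.98]

ages = [1972 - y for y in years]             # assumed definition of age
log_prices = [math.log(p) for p in prices]   # response Y = ln(price)

n = len(ages)
mean_age = sum(ages) / n
mean_logp = sum(log_prices) / n
sxx = sum((a - mean_age) ** 2 for a in ages)
sxy = sum((a - mean_age) * (lp - mean_logp) for a, lp in zip(ages, log_prices))

slope = sxy / sxx                            # change in ln(price) per year of age
intercept = mean_logp - slope * mean_age
growth_factor = math.exp(slope)              # multiplicative price change per year

print(round(slope, 5), round(growth_factor, 4))
```

Because the model is linear in ln(price), the slope translates to a constant percentage change per year, not a constant dollar amount.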
d) A subsequent advertisement three years later on Tuesday, November 25, 1975
offered 1937 vintage port at $20.00 per bottle. If it can be assumed that a straight
line relationship is preserved, and applies also to this new data point, how much per
bottle per year does it appear prices have risen in the intervening three years? Are
your answers here and in c. consistent, or does it appear that per year prices have
accelerated?
PROBLEM 5
Presidential Election Data (1916-2008): The data (Pres Elec.txt, posted in BB) were collected
by Professor Ray Fair of Yale University, who has found that the proportion of votes
obtained by a presidential candidate in a United States presidential election can be
predicted accurately by three macroeconomic variables, incumbency, and a variable which
indicates whether the election was held during or just after a war. The variables considered
are given below.
Variables for the Presidential Election Data (1916-2008)
Variable   Definition
YEAR       Election year
V          Democratic share of the two-party presidential vote
I          Indicator variable (1 if there is a Democratic incumbent at the time of the election and -1 if there is a Republican incumbent)
D          Indicator variable (1 if a Democratic incumbent is running for election, -1 if a Republican incumbent is running for election, and 0 otherwise)
W          Indicator variable (1 for the elections of 1920, 1944, and 194, and 0 otherwise)
G          Growth rate of real per capita GDP in the first three quarters of the election year
P          Absolute value of the growth rate of the GDP deflator in the first 15 quarters of the administration
N          Number of quarters in the first 15 quarters of the administration in which the growth rate of real per capita GDP is greater than 3.2%

All growth rates are annual rates in percentage points. Consider fitting the initial model to
the data.
$V = \beta_0 + \beta_1 I + \beta_2 D + \beta_3 W + \beta_4 (G \cdot I) + \beta_5 P + \beta_6 N + \epsilon$
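To make the role of the interaction term concrete, the sketch below builds the regressor row that corresponds to the coefficients β0 to β6 in the model above. The function names and the coefficient values in the example are hypothetical and are not estimates from the election data.

```python
def fair_regressors(I, D, W, G, P, N):
    """Return [1, I, D, W, G*I, P, N], the row matching beta_0 ... beta_6."""
    return [1.0, I, D, W, G * I, P, N]

def predict_vote(beta, I, D, W, G, P, N):
    """Predicted Democratic vote share for given coefficients and inputs."""
    row = fair_regressors(I, D, W, G, P, N)
    return sum(b * x for b, x in zip(beta, row))

# Hypothetical coefficients, for illustration only.
example_beta = [0.5, 0.0, 0.0, 0.0, 0.01, 0.0, 0.0]
print(predict_vote(example_beta, I=-1, D=0, W=0, G=2.0, P=3.0, N=1))
```

Since I is +1 for a Democratic incumbent and -1 for a Republican one, the product G·I lets growth raise the Democratic share under a Democratic incumbent and lower it under a Republican incumbent.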

a) Do we need to keep the variable I in the above model?

No, we do not need to keep the I term, because
F-value = 1.10 ⇒ p-value = 0.310 > α = 0.05, so we fail to reject $H_0: \beta_1 = 0$.

b) Do we need to keep the interaction variable (G.I) in the above model?

Yes, we do need to keep the interaction term, because
F-value = 36.53 ⇒ p-value = 0.0009 < α = 0.05, so we reject $H_0: \beta_4 = 0$.

c) Now consider the data from 1916-2000. Examine different models to
produce the model or models that perform best in predicting future
(2004 and 2008) presidential elections. Include interaction terms if
needed. Write a brief report.

Using the find subset function of Minitab, we can see which
variables give us the best R-sq and therefore, in theory, the best
model.
We can see that $V = \beta_0 + \beta_1 D + \beta_2 G + \beta_3 N + \beta_4 (G \cdot I)$ gives us the best
adjusted R-sq value of 73.8. If we use this model to predict, we get

year = 2004 ⇒ vote = 0.4502, which gives a difference of 0.037 from the actual value, and

year = 2008 ⇒ vote = 0.5344, which gives a difference of 0.0035.

However, the model with the best R-sq is the model with all variables.
Using this to predict our values, we get
