Students Tutorial Answers Week12
where price = used car price in dollars and age = age of the car in years.
The EXCEL results obtained using Ordinary Least Squares are presented below:
Regression Statistics
R²               0.077
Standard Error   42069
Observations     117
(a) Interpret the t Stat and the p-values in the EXCEL output. What do you need to assume?
The t Stat and p-values in the EXCEL output are derived from two-tailed tests with null hypotheses that the associated population parameter equals 0. Hence, larger t stats (in absolute value) and lower p-values mean we are more confident that the associated population parameter is nonzero. Here, the p-values for both the intercept and the Age coefficient are below 1%, hence we can be confident that both population parameters are nonzero (the estimates are statistically significant).
We need to assume the disturbances are normal or, because the sample size is large, invoke the CLT.
(b) Calculate a 95% confidence interval for the coefficient on age.
The standard normal critical value is 1.96, hence the 95% confidence interval is:
\[ -2658 \pm 1.96 \times 856 = -2658 \pm 1678 = (-4336,\ -980) \]
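The interval in (b) can be checked with a few lines of Python. This is a minimal sketch that assumes the slope on age is -2658 with standard error 856, as used in the working, and that takes the critical value from the standard normal distribution; the variable names are illustrative only.

```python
from statistics import NormalDist

# Slope on age and its standard error, as used in the worked answer.
b_age = -2658.0
se_age = 856.0

# Two-sided 95% critical value from the standard normal (about 1.96).
z_crit = NormalDist().inv_cdf(0.975)

# Confidence interval: estimate plus or minus critical value times standard error.
margin = z_crit * se_age
ci_lower = b_age - margin
ci_upper = b_age + margin

print(f"critical value = {z_crit:.2f}")
print(f"95% CI for the age coefficient: ({ci_lower:.0f}, {ci_upper:.0f})")
```

Because the whole interval lies below zero, it agrees with the conclusion in (a) that the age coefficient is nonzero.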
(c) Interpret the R² value.
The regression model including age explains 7.7% of the variation in used car prices.
(d) Test whether the estimated coefficient of Age is significantly less than zero at the 5% level of significance.
Unlike in (a), this is a one-tailed test:
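A minimal sketch of such a one-tailed test, assuming the hypotheses H0: β1 = 0 against H1: β1 < 0, the slope (-2658) and standard error (856) used in (b), and the large-sample normal approximation. The hypotheses and variable names are assumptions made for illustration.

```python
from statistics import NormalDist

# Slope on age and its standard error, as used in part (b).
b_age = -2658.0
se_age = 856.0
alpha = 0.05

# Test statistic under H0: beta_1 = 0.
t_stat = b_age / se_age

# Lower-tail critical value: reject H0 when t_stat falls below it (about -1.645).
z_crit = NormalDist().inv_cdf(alpha)

# One-sided p-value: probability of a statistic at least this negative under H0.
p_value = NormalDist().cdf(t_stat)

reject_h0 = t_stat < z_crit
print(f"t = {t_stat:.3f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under these assumptions the test statistic lies well below the critical value, so the null hypothesis is rejected in favour of a negative age coefficient.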
(e) Estimate a 95% confidence interval for the mean price for a secondhand passenger car that is 10 years old and interpret the result. Note: the sample mean of age is 6.44 years.
A 10-year-old car is expected to be valued at 47469 - 10 × 2658 = $20,889.
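Boundaries of the confidence interval for this prediction can be found by:
\[ \hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}, \]
where s = 42069, se(b1) = 856 and hence
\[ \sum_i (x_i - \bar{x})^2 = \left(\frac{42069}{856}\right)^2 = 2415. \]
Hence:
\[ 20889 \pm 1.98 \times 42069 \sqrt{\frac{1}{117} + \frac{(10 - 6.44)^2}{2415}} = 20889 \pm 9783 \]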
We are 95% confident that the mean price of 10-year-old cars lies between $11,106 and $30,672. While the impact of age on price is precisely estimated, the CI is quite wide because of the large amount of unexplained variation that is indicated by the very low R² value reported.
(Note: use of normal critical values here would be acceptable given the large sample size and would make little practical difference, as the critical value would be 1.96 rather than 1.98.)
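The arithmetic in (e) can be reproduced as follows. This is a sketch under the figures quoted in the answer (intercept 47469, slope -2658, s = 42069, se(b1) = 856, n = 117, mean age 6.44, critical value 1.98); the variable names are illustrative only.

```python
import math

# Figures quoted in the worked answer.
b0, b1 = 47469.0, -2658.0   # intercept and slope on age
s = 42069.0                 # standard error of the regression
se_b1 = 856.0               # standard error of the slope
n = 117                     # number of observations
x_bar = 6.44                # sample mean of age
x0 = 10.0                   # age at which the mean price is estimated
t_crit = 1.98               # t critical value used in the answer

# Since se(b1) = s / sqrt(Sxx), the sum of squared deviations of age is:
sxx = (s / se_b1) ** 2

# Point estimate of the mean price at x0.
y_hat = b0 + b1 * x0

# Standard error of the estimated mean, then the interval half-width.
se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
margin = t_crit * se_mean
ci_lower, ci_upper = y_hat - margin, y_hat + margin

print(f"Sxx = {sxx:.0f}, estimate = {y_hat:.0f}, margin = {margin:.0f}")
print(f"95% CI for the mean price: ({ci_lower:.0f}, {ci_upper:.0f})")
```

Replacing `t_crit` with 1.96 narrows the interval only slightly, which is the point made in the note about normal critical values.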
Anzac Garage's pricing scheme based on the age of the car is not working out very well. When its secondhand cars are compared with cars of the same age from other dealers, prices often diverge. One of their consultants noted that the value of a secondhand car should depend on both the Odometer reading and the Age of the vehicle. This consultant wanted to estimate the following two simple linear regression models separately:
where Odometer = distance the car has travelled since leaving the factory in kilometers. A senior consultant advised use of a multiple linear regression model instead:
(f) Discuss why the simple linear regression methods may not be preferable to the multiple regression method, in general, and in the context of this problem. The resultant OLS estimates for the multiple regression model are given below:
The predictive performance of the model will improve as relevant variables are added to a simple regression model.
Also, the assumption that the disturbance is uncorrelated with the explanatory variables is critical for the unbiased estimation of the coefficients of included variables. In the simple regression of price on age, it will be violated if variables that affect price and are correlated with age have been omitted from the model. This is likely to be the case here with the distance the car has travelled.
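The omitted variable argument can be illustrated with a small simulation. This is a sketch on entirely hypothetical data: the effect sizes, the 15,000 km per year relationship and the noise levels are assumptions chosen for illustration and are not estimates from the tutorial data.

```python
import random

random.seed(0)

# Hypothetical data-generating process (all numbers are illustrative assumptions).
TRUE_AGE_EFFECT = -500.0   # price change per extra year, holding odometer fixed
TRUE_ODO_EFFECT = -0.2     # price change per extra kilometre, holding age fixed
KM_PER_YEAR = 15000.0      # older cars have travelled further on average

n = 5000
age = [random.uniform(0, 15) for _ in range(n)]
# Odometer is correlated with age (simulated values may be negative; harmless here).
odometer = [KM_PER_YEAR * a + random.gauss(0, 20000) for a in age]
price = [
    50000 + TRUE_AGE_EFFECT * a + TRUE_ODO_EFFECT * km + random.gauss(0, 5000)
    for a, km in zip(age, odometer)
]

def simple_slope(x, y):
    """OLS slope from regressing y on x alone: sum of cross-deviations over Sxx."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Regressing price on age alone leaves odometer in the disturbance term.
slope_price_on_age = simple_slope(age, price)

# The simple slope absorbs the odometer effect that travels with age.
expected_biased_slope = TRUE_AGE_EFFECT + TRUE_ODO_EFFECT * KM_PER_YEAR

print(f"true age effect:          {TRUE_AGE_EFFECT:.0f}")
print(f"simple regression slope:  {slope_price_on_age:.0f}")
print(f"age effect plus bias:     {expected_biased_slope:.0f}")
```

In this setup the simple regression slope sits near the true age effect plus the odometer effect carried by age, rather than near the true age effect itself. That is the bias a multiple regression including Odometer is meant to remove.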
SUMMARY OUTPUT

Regression Statistics
R Square         0.150
Standard Error   40568
Observations     117

                Coefficients   Standard Error   t Stat   P-value
Intercept       53867          6825             7.893    0.000
Odometer (km)   0.270          0.087            3.110    0.002
Age             360            1108             0.325    0.746
2. Computing Exercise #4
Refer to the Computing Work document and answer question 3 on page 23 on multiple regression.
After estimating three import equations, the first two being simple linear regressions and the third being a multiple regression containing GNE and relative prices as explanatory variables, you were asked the following discussion question:
The p-values for β1 and β2 are both < 0.0005 and hence, at all conventional significance levels, one would reject the null hypotheses that these coefficients are individually equal to zero.
In addition, you could argue that the multiple regression model is better because it guards against the omitted variable bias that is likely in the two simple linear regression models.
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.9867
R Square            0.9736
Adjusted R Square   0.9713
Standard Error      3140.3680
Observations        26

            t Stat    P-value
Intercept   1.488     0.150
GNE         23.406    0.000
Price       -4.722    0.000
3. SIA: Sydney housing prices.
Recall the housing price data for Sydney suburbs used in Question 6 in Week 3. Your statistically naive friend has been doing some analysis of Sydney housing prices using these data and has asked you for help. In addition to the price data, a number of characteristics associated with each suburb have been collected and are likely to explain some of the large variation in housing prices across suburbs that is observed in the data. Your friend was very interested in the impact on housing prices of being located under the flight path. The regression of housing price on the flightpath variable (Model 1) provided a result that he did not expect. On your advice he ran a second regression (Model 2) that included several extra explanatory variables. Results for Model 1 and Model 2 are presented in the table, together with a full description of the variables used in the analysis.
Housing price is the mean of the median price of houses sold in each suburb for two quarters (September and December 2002), measured in thousands of dollars;
Distance to CBD is distance measured in kilometers of the suburb from Sydney's CBD;
Distance to Airport is distance measured in kilometers of the suburb from Sydney Airport;
Distance to beach is distance of the suburb measured in kilometers from the nearest beach;
Flightpath is a dummy variable that equals 1 if the suburb is under the flight path and equals 0 otherwise.
(a) How would you interpret the regression estimates for the parameters in Model 1 and explain why your friend found the result to be unexpected?
(b) Explain why the results in Model 1 are unreliable as a basis for determining the impact on housing prices of being located under the flight path. Which of the assumptions associated with simple linear regression has clearly been violated in Model 1?
You would like to make a statement about the impact of being under the flight path holding other factors constant. This is not possible with Model 1, as it is a simple linear regression and hence there is potential for omitted (confounding) variables that lead to biased estimates of the impact of being situated under the flight path.
For example, proximity to the beach is likely to impact on housing prices and be correlated with being under the flight path. In Model 1, the variable Distance to beach is in the disturbance term and hence leads to a violation of the assumption that E(u|X) = 0.
(c) Write a brief description of the results for Flightpath in Model 2 in terms of the parameter estimate, its interpretation and its statistical significance.
For statistical significance:
H0: βi = 0 versus H1: βi ≠ 0, where βi is the ith regression coefficient.
Because we have a large sample size, we can invoke the CLT and use standard normal critical values when evaluating the test statistics given by bi/se(bi).
If we choose α = 0.05, then the decision rule will be to reject if |bi/se(bi)| > 1.96.
The test statistic for Flightpath (51.5/50.2 = 1.03) indicates that this parameter is not statistically different from zero.
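The decision rule above can be applied mechanically. A minimal sketch for the Flightpath coefficient in Model 2, using the estimate (51.5) and standard error (50.2) from the results table and the standard normal approximation; the variable names are illustrative only.

```python
from statistics import NormalDist

# Flightpath estimate and standard error from Model 2.
b_flightpath = 51.5
se_flightpath = 50.2
alpha = 0.05

# Test statistic for H0: beta_i = 0 against a two-sided alternative.
t_stat = b_flightpath / se_flightpath

# Two-sided critical value (about 1.96) and the matching p-value.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

reject_h0 = abs(t_stat) > z_crit
print(f"t = {t_stat:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The statistic falls inside the non-rejection region, matching the conclusion that the Flightpath effect in Model 2 is not statistically different from zero.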
(d) Interpret the overall fit of Model 2.
Model 2 produces an R² of 0.372, so 37.2% of the variation in Sydney housing prices is explained by the explanatory variables in the regression.
(e) Use Model 2 to predict the average housing price for the suburb of Randwick, which is 5.21 kms from the CBD, 1.78 kms from the beach, 6.62 kms from the airport and is not deemed to be under the flight path.
\[ \text{Prediction} = 853.5 + 0 - 21.5 \times 5.21 + 21 \times 6.62 - 13.9 \times 1.78 = 855.763 \]
The predicted average house price for Randwick is $855,763.
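The same prediction can be computed programmatically. This sketch assumes the Model 2 coefficients with the signs used in the prediction arithmetic (negative for distance to the CBD and to the beach, positive for distance to the airport); the dictionary keys are illustrative names, not labels from the original output.

```python
# Model 2 coefficients (housing price is measured in thousands of dollars).
coefficients = {
    "intercept": 853.5,
    "flightpath": 51.5,
    "dist_cbd": -21.5,
    "dist_airport": 21.0,
    "dist_beach": -13.9,
}

# Randwick's characteristics as given in the question.
randwick = {
    "flightpath": 0,       # not under the flight path
    "dist_cbd": 5.21,      # km
    "dist_airport": 6.62,  # km
    "dist_beach": 1.78,    # km
}

# Prediction = intercept + sum of coefficient times characteristic.
prediction = coefficients["intercept"] + sum(
    coefficients[name] * value for name, value in randwick.items()
)

print(f"Predicted price: {prediction:.3f} thousand dollars")
print(f"Predicted price: ${prediction * 1000:,.0f}")
```

Changing `flightpath` to 1 would add the Flightpath coefficient to the prediction, which shows how the dummy variable shifts the intercept.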
Multiple regression results for Sydney housing prices*
Dependent variable: Housing price

Explanatory variables    Model 1    Model 2
Intercept                569.9      853.5
                         (20.6)     (35.5)
Flightpath               216.2      51.5
                         (56.0)     (50.2)
Distance to CBD                     -21.5
                                    (3.4)
Distance to Airport                 21.0
                                    (2.9)
Distance to beach                   -13.9
                                    (2.3)
Observations             503        503
R-squared                0.029      0.372

*Numbers in brackets below coefficient estimates are standard errors.