Tutorial Answers

This document provides sample solutions to tutorial questions related to multiple linear regression analysis. It includes interpretations of regression output, confidence intervals, hypothesis tests on coefficients, and predictions using estimated regression models. Sample questions are worked through related to determinants of used car prices and Sydney housing prices.

Uploaded by

Petrina
Copyright
© © All Rights Reserved

ECON 1203 Tutorial Sample Solutions

Semester 1 2015
Weeks 11 and 12

1. Recall the Anzac Garage data (AnzacG.xls) used previously and available in the Excel
data subfolder on the Moodle site under Tutorial Questions and Information. We
previously considered the simple linear regression model given by:

\[ \text{price}_i = \beta_0 + \beta_1\,\text{age}_i + u_i \]
where price = the price of a used car, in dollars, and age = the age of the car, in years.
The Excel results obtained using ordinary least squares to estimate this model are
presented below:
Regression Statistics

R²                  0.077
Standard Error      42069
Observations        117

              Coefficients   Standard Error   t Stat    p-value
Intercept            47469             6748    7.035      0.000
Age                  -2658              856   -3.106      0.002

(a)

Interpret the t-Stat and the p-values in the output above. What do you need
to assume for these interpretations to be correct?

The t-stats and p-values in the Excel output come from two-tailed tests of the null
hypothesis that the associated population parameter equals 0. Larger t-stats in absolute
value, and correspondingly smaller p-values, are stronger evidence that the population
parameter is non-zero. Here, the p-values for both the intercept and the coefficient on age
are below 0.01, so in each case we reject the null hypothesis that the population
parameter is zero at the 1% level of significance.
For these interpretations to be correct we need either to assume that the disturbances are
normally distributed or, because the sample size is large, to invoke the CLT.
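As a check on the output, the t-stats and two-tailed p-values can be recomputed from the reported coefficients and standard errors. The sketch below assumes the large-sample (standard normal) approximation rather than the t distribution that Excel uses, and the names `estimates` and `t_and_p` are illustrative only.

```python
from statistics import NormalDist

# Estimates and standard errors copied from the Excel output above.
estimates = {"Intercept": (47469, 6748), "Age": (-2658, 856)}

def t_and_p(coef, se):
    """Return the t statistic for H0: parameter = 0 and its two-tailed p-value.

    The p-value uses the standard normal approximation (large-sample case).
    """
    t = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

results = {name: t_and_p(coef, se) for name, (coef, se) in estimates.items()}
for name, (t, p) in results.items():
    print(f"{name}: t = {t:.3f}, two-tailed p = {p:.4f}")
```

Because the inputs are rounded, the recomputed t-stat for age may differ from the reported -3.106 in the third decimal place.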
(b)

Calculate a 95% confidence interval for the coefficient on age.

The standard normal critical value is 1.96, hence the 95% confidence interval is:
-2658 ± 1.96 × 856 = -2658 ± 1678 = (-4336, -980)
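The same interval can be reproduced in a few lines. This is a minimal sketch that assumes the standard normal critical value, as in the solution; the variable names are illustrative.

```python
from statistics import NormalDist

b_age, se_age = -2658, 856  # coefficient on age and its standard error

# Two-sided 95% interval: the critical value is the 97.5th percentile of N(0, 1).
z_crit = NormalDist().inv_cdf(0.975)
half_width = z_crit * se_age
ci = (b_age - half_width, b_age + half_width)
print(f"z = {z_crit:.2f}, CI = ({ci[0]:.0f}, {ci[1]:.0f})")
```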
(c)

Interpret the R² value.

The regression model including age explains 7.7% of the variation in used car prices.
(d)

Test whether the estimated coefficient on Age is significantly less than zero at
the 5% level of significance.

Unlike in (a), this is a one-tailed test:
H0: β1 = 0; H1: β1 < 0
Decision rule: Reject H0 if b1/se(b1) < -1.645
Test statistic: b1/se(b1) = -3.106 < -1.645, and hence reject H0
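The decision rule can be checked numerically. The sketch below assumes the standard normal approximation used in the solution; it takes the reported t-stat as given and also computes the one-tailed p-value, which the solution does not report.

```python
from statistics import NormalDist

t_age = -3.106   # t Stat for Age reported in the output
alpha = 0.05

# Lower-tail critical value: reject H0 when the statistic falls below it.
z_crit = NormalDist().inv_cdf(alpha)
reject_h0 = t_age < z_crit
# One-tailed p-value: probability of a statistic this low or lower under H0.
p_one_tailed = NormalDist().cdf(t_age)
print(f"critical value = {z_crit:.3f}, reject H0: {reject_h0}, p = {p_one_tailed:.4f}")
```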
(e)

Estimate a 95% confidence interval for the mean price for a second-hand
passenger car that is 10 years old, and interpret the result. Note: the sample
mean of age is 6.44 years.

A 10-year-old car is expected to be valued at $47469 - 10 × 2658 = $20889.


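The boundaries of the confidence interval for this prediction can be found by:

\[ \hat{Y}_p \pm t_{\alpha/2,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(X_p - \bar{X})^2}{\sum (X_i - \bar{X})^2}} \]

where s = 42069, se(b1) = 856 and hence

\[ \sum (X_i - \bar{X})^2 = \frac{s^2}{(se(b_1))^2} = \frac{42069^2}{856^2} = 2415 \]

Hence:

\[ 20889 \pm 1.98 \times 42069 \sqrt{\frac{1}{117} + \frac{(10 - 6.44)^2}{2415}} = 20889 \pm 9783 \]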
We are 95% confident that the expected price of a 10-year-old car falls between
$11,106 and $30,672. While the impact of age on price is precisely estimated, the CI is
quite wide because of the large amount of unexplained variation that is indicated by
the very low R² value reported. (Note: use of normal critical values here would be
acceptable given the large sample size, and would make little practical difference as
the critical value would be 1.96 rather than 1.98.)
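The calculation in (e) can be reproduced step by step. This is a sketch using only the figures quoted in the question and solution; the variable names are illustrative, and the critical value 1.98 is taken from the solution rather than computed.

```python
import math

b0, b1 = 47469, -2658      # estimated intercept and slope on age
s, se_b1 = 42069, 856      # regression standard error and se of the slope
n, x_bar = 117, 6.44       # sample size and sample mean of age
x_p = 10                   # age at which the mean price is predicted
t_crit = 1.98              # critical value used in the solution (115 d.f.)

y_hat = b0 + b1 * x_p
# Recover the sum of squared deviations of age from se(b1) = s / sqrt(Sxx).
sxx = (s / se_b1) ** 2
half_width = t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / sxx)
lower, upper = y_hat - half_width, y_hat + half_width
print(f"prediction = {y_hat}, half-width = {half_width:.0f}")
print(f"95% CI for the mean price: ({lower:.0f}, {upper:.0f})")
```

Setting `t_crit = 1.96` instead shows how little the interval changes when normal critical values are used.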
Anzac Garage is worried about its pricing scheme, which is based solely on the age of
the car. When its second-hand car prices are compared with the prices of cars of the
same age at other dealerships, they are often different. A consultant notes that the
value of a second-hand car should depend on both the odometer reading and the age
of the vehicle. This consultant wanted to estimate the following two simple linear
regression models separately:

\[ \text{price}_i = \beta_0 + \beta_1\,\text{age}_i + u_i \]
\[ \text{price}_i = \alpha_0 + \alpha_1\,\text{odometer}_i + v_i \]
where odometer = distance the car has travelled since leaving the factory, in
kilometres. A senior consultant advised the use of a multiple linear regression model
instead, i.e.,:

\[ \text{price}_i = \gamma_0 + \gamma_1\,\text{odometer}_i + \gamma_2\,\text{age}_i + v_i \]

(f)

Discuss why the simple linear regression methods may not be preferable to the
multiple regression method, in general, and in the context of this problem. The
resultant OLS estimates for the multiple regression model are given below:
SUMMARY OUTPUT

Regression Statistics

R Square            0.150
Standard Error      40568
Observations        117

                Coefficients   Standard Error   t Stat    P-value
Intercept              53867             6825    7.893      0.000
Odometer (km)         -0.270            0.087   -3.110      0.002
Age                     -360             1108   -0.325      0.746

The predictive performance of a simple regression model will generally improve as relevant
variables are added to it.
Also, the assumption that the disturbance is uncorrelated with the explanatory variables is
critical for unbiased estimation of the coefficients on the included variables. In the simple
regression of price on age it is violated if variables that affect price and are correlated with
age have been omitted from the model. This is likely to be the case here with the distance the
car has travelled, since older cars will typically have been driven further.
We see that R² has improved (approximately doubled, from 0.077 to 0.150) with the addition of
odometer, and the coefficient on age is now much smaller in magnitude (-360 rather than -2658)
and statistically insignificant (p-value 0.746).
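The change in the age coefficient can be linked to the omitted-variable argument through a standard OLS identity, which holds exactly when both regressions use the same observations. The notation here is introduced only for this illustration: the simple-regression slope on age is written as b-tilde, the multiple-regression estimates as b-hat, and d-hat is the slope from an auxiliary regression of odometer on age. That auxiliary regression is not reported in the tutorial; the value below is only what the rounded outputs imply.

```latex
\[
\tilde{b}_{\text{age}} = \hat{b}_{\text{age}} + \hat{b}_{\text{odo}}\,\hat{d}
\quad\Longrightarrow\quad
-2658 = -360 + (-0.270)\,\hat{d}
\quad\Longrightarrow\quad
\hat{d} \approx 8511
\]
```

On these figures each extra year of age goes with roughly 8,500 more kilometres on the odometer, so most of the apparent age effect in the simple regression is picking up the effect of distance travelled.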
2. Sydney housing prices, encore.
Recall the housing price data for Sydney suburbs used previously. Your statistically
naive friend has been doing some analysis of Sydney housing prices using these data
and has asked you for help. In addition to the price data, a number of characteristics
associated with the suburb have been collected and are likely to explain some of the
large variation in housing prices across suburbs that are observed in the data. Your
friend is very interested in the impact on housing prices of being located under the
flight path. He ran a regression of housing price on the flightpath variable (Model 1) and
the results surprised him. On your advice he ran a second regression (Model 2) that
included several extra explanatory variables. Results for Model 1 and Model 2 are
presented in the table below. Note that:

Housing price is the mean of the median price of houses sold in each suburb for two
quarters (September and December 2002) measured in thousands of dollars;

Distance to CBD is the distance, measured in kilometres, of the suburb from Sydney's
CBD;

Distance to Airport is the distance, measured in kilometres, of the suburb from the
Sydney airport;

Distance to beach is the distance, measured in kilometres, of the suburb from the
nearest beach;

Flightpath is a dummy variable that equals 1 if the suburb is under the flight path and
0 otherwise.
Multiple regression results for Sydney housing prices*

Dependent variable: Housing price

Explanatory variables     Model 1      Model 2

Intercept                 569.9        853.5
                          (20.6)       (35.5)

Flightpath                216.2        51.5
                          (56.0)       (50.2)

Distance to CBD                        -21.5
                                       (3.4)

Distance to Airport                    21.0
                                       (2.9)

Distance to beach                      -13.9
                                       (2.3)

Observations              503          503
R squared                 0.029        0.372

* Numbers in brackets below coefficient estimates are standard errors.


(a)

How would you interpret the regression estimates for the parameters in Model 1?
Explain why your friend found these results to be unexpected.

Because the estimate of β1 (the coefficient on Flightpath) is positive, houses under the
flight path sell on average for more ($216,200 more) than houses not under the flight path.
The intercept implies an average price of $569,900 for suburbs not under the flight path.
This is surprising because you would expect the aircraft noise associated with being under
the flight path to be unattractive and hence to lead to lower, not higher, house prices.
(b)

Explain why the results in Model 1 are unreliable as a basis for determining the
impact on housing prices of being located under the flight path. Which of the
assumptions associated with simple linear regression has clearly been violated in
Model 1?

Your friend would like to make a statement about the impact on prices of being under the
flight path holding other factors constant. This is not possible with Model 1, as it is a
simple linear regression and hence there is the potential for omitted (confounding) variables
that lead to biased estimates of the impact of being situated under the flight path.
For example, given Sydney's geographical layout, proximity to the beach is likely to impact
on housing prices and to be correlated with being under the flight path. In Model 1, the
variable capturing distance to beach is in the disturbance term and hence leads to a violation
of the assumption that E(ui|Xi) = 0, where ui is the disturbance.
(c)

Write a brief description of the results for Flightpath in Model 2 in terms of the
parameter estimate, its interpretation, and its statistical significance.

The estimated parameter indicates a $51,500 premium (much smaller than for Model 1) for
suburbs under the flight path relative to those not under the flight path, holding other factors
constant.
For statistical significance:
H0: βi = 0 versus H1: βi ≠ 0, where βi is the ith regression coefficient.
Because we have a large sample size we can invoke the CLT and use standard normal critical
values when evaluating the test statistics given by bi/se(bi).
If we choose α = 0.05 then the decision rule will be to reject H0 if |bi/se(bi)| > 1.96.
The test statistic for Flightpath (51.5/50.2 = 1.03) is below 1.96, so we cannot reject the
hypothesis that this parameter is zero. This indicates that, ceteris paribus, for houses in the
Sydney suburbs, there is no statistically significant effect on price from being located under
the flight path.
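The same decision rule can be applied to every coefficient in Model 2. The sketch below uses the estimates and standard errors from the table and the 1.96 critical value from the solution; the dictionary names are illustrative.

```python
# Model 2 estimates and standard errors (thousands of dollars) from the table.
model_2 = {
    "Intercept": (853.5, 35.5),
    "Flightpath": (51.5, 50.2),
    "Distance to CBD": (-21.5, 3.4),
    "Distance to Airport": (21.0, 2.9),
    "Distance to beach": (-13.9, 2.3),
}
Z_CRIT = 1.96  # two-tailed 5% standard normal critical value

t_stats = {name: b / se for name, (b, se) in model_2.items()}
significant = {name: abs(t) > Z_CRIT for name, t in t_stats.items()}
for name, t in t_stats.items():
    verdict = "reject H0" if significant[name] else "do not reject H0"
    print(f"{name}: t = {t:.2f} -> {verdict}")
```

On these figures Flightpath is the only coefficient in Model 2 whose test statistic falls inside the non-rejection region.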
(d)

Interpret the overall fit of Model 2.

Model 2 produces an R² of 0.372: 37.2% of the variation in Sydney housing prices is
explained by the explanatory variables in the regression.
(e)

Use Model 2 to predict the average house price for the suburb of Randwick, which
is 5.21 kms from the CBD, 1.78 kms from the beach, 6.62 kms from the airport
and is not deemed to be under the flight path.

Prediction = 853.5 + 0 - 21.5 × 5.21 + 21 × 6.62 - 13.9 × 1.78
= 855.763
The predicted average house price for Randwick is $855,763.
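The prediction is the intercept plus each Model 2 coefficient multiplied by Randwick's value for that variable. A minimal sketch, with illustrative key names such as `cbd_km` that are not part of the original data:

```python
# Model 2 coefficients (housing price in thousands of dollars).
coefs = {
    "intercept": 853.5,
    "flightpath": 51.5,
    "cbd_km": -21.5,
    "airport_km": 21.0,
    "beach_km": -13.9,
}

# Randwick's characteristics as given in the question.
randwick = {"flightpath": 0, "cbd_km": 5.21, "airport_km": 6.62, "beach_km": 1.78}

def predict(coefs, suburb):
    """Intercept plus the sum of coefficient times characteristic."""
    return coefs["intercept"] + sum(coefs[k] * v for k, v in suburb.items())

prediction = predict(coefs, randwick)
print(f"Predicted price: ${prediction * 1000:,.0f}")
```

Changing `flightpath` to 1 adds the estimated 51.5 (thousand dollars) to the prediction, although that coefficient is not statistically different from zero.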
3. Work through Problem 44 on p. 622 of Sharpe. (This requires first downloading the data
entitled Demographics from MyStatLab, available under Chapter 17: Multiple
Regression.)
4. Work through Problem 22 on pp. 664-665 of Sharpe.
