Multiple Linear Regression Example: Problem Statement

The document discusses using multiple linear regression to predict used car prices based on mileage for different car brands. It presents two competing hypotheses: 1) That the intercepts differ for BMW, Porsche, and Jaguar but slopes are the same, and 2) That both intercepts and slopes differ between the brands. To test the second hypothesis, the model includes dummy variables for brand and interaction terms between brand and mileage. This full model will determine if price predictions based on mileage have different slopes for each brand.

Uploaded by

Govind Naik
Copyright
© All Rights Reserved

Multiple Linear Regression

Example
Problem Statement
Mileage is often thought of as a good predictor of the sale price of a used car. Does this
same conjecture hold for so-called “luxury cars”: Porsches, Jaguars, and BMWs? More precisely, do
the slopes and intercepts differ when comparing mileage and price for these three brands of cars?
To answer this question, data were randomly selected from an Internet car sale site. (Adapted slightly
from Cannon et al. 2013, Chapters 1 and 4.)

Competing Hypotheses
There are many hypothesis tests to run here. It’s important to first think about the model that we will
fit to address these questions. We want to predict Price (in thousands of dollars) based
on Mileage (in thousands of miles). A simple linear regression equation for this would
be $\widehat{\text{Price}} = b_0 + b_1 \cdot \text{Mileage}$.
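
As a minimal sketch of what fitting this simple model involves, the snippet below estimates b0 and b1 by ordinary least squares with NumPy. The data are hypothetical and noise-free, generated from assumed values b0 = 60 and b1 = -0.5 purely for illustration; they are not the car data described in the problem statement.

```python
import numpy as np

# Hypothetical, noise-free illustration data (NOT the data set from the text):
# mileage in thousands of miles, price in thousands of dollars.
mileage = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
price = 60.0 - 0.5 * mileage  # generated from assumed b0 = 60, b1 = -0.5

# Design matrix: a column of ones for the intercept b0, then Mileage for b1.
X = np.column_stack([np.ones_like(mileage), mileage])

# Ordinary least squares: pick (b0, b1) minimising the sum of squared residuals.
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
b0, b1 = coef

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

Because the made-up prices lie exactly on a line, the fit recovers the assumed intercept and slope; with real data the estimates would carry sampling error.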
We are dealing with a more complicated example in this case, though. We also need to include
CarType in our model. Since CarType has three levels (BMW, Porsche, and Jaguar), we encode it
as two dummy variables with BMW as the baseline (since it occurs first alphabetically in the list of
three car types). This model would help us determine whether there is a statistically significant
difference in the intercepts when predicting Price from Mileage for the three car types, assuming
that the slope is the same for all three lines:

$$\widehat{\text{Price}} = b_0 + b_1 \cdot \text{Mileage} + b_2 \cdot \text{Porsche} + b_3 \cdot \text{Jaguar}.$$
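
The dummy coding can be sketched as follows. This is an illustration under stated assumptions, not the original analysis: the nine observations are hypothetical and noise-free, and the generating values (b0 = 50, b1 = -0.5, b2 = 10, b3 = -5) are invented. The point is only to show how the two 0/1 columns shift the intercept for each brand while the Mileage slope stays shared.

```python
import numpy as np

# Hypothetical, noise-free illustration data (NOT the data set from the text).
car_type = np.array(["BMW"] * 3 + ["Porsche"] * 3 + ["Jaguar"] * 3)
mileage = np.tile([10.0, 30.0, 50.0], 3)  # thousands of miles

# Two dummy variables; BMW is the baseline, so it has both dummies equal to 0.
porsche = (car_type == "Porsche").astype(float)
jaguar = (car_type == "Jaguar").astype(float)

# Prices generated from assumed coefficients b0=50, b1=-0.5, b2=10, b3=-5.
price = 50.0 - 0.5 * mileage + 10.0 * porsche - 5.0 * jaguar

# Columns: intercept, Mileage, Porsche dummy, Jaguar dummy.
X = np.column_stack([np.ones_like(mileage), mileage, porsche, jaguar])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
b0, b1, b2, b3 = coef

# Each brand gets its own intercept; all three lines share the slope b1.
intercepts = {"BMW": b0, "Porsche": b0 + b2, "Jaguar": b0 + b3}
for brand, intercept in intercepts.items():
    print(f"{brand}: Price-hat = {intercept:.2f} + ({b1:.2f}) * Mileage")
```

Under this coding, b2 and b3 are the differences in intercept between Porsche and BMW and between Jaguar and BMW, so testing whether they are zero is a test of whether the intercepts differ.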
This is not exactly what the problem is asking for, though. It asks whether there is also a
difference in the slopes of the three fitted lines for the three car types. To find out, we need to
add interaction terms between Mileage and each of the dummy variables Porsche and Jaguar. BMW
remains the baseline: there is no explicit BMW:Mileage term in the model, but setting Jaguar and
Porsche equal to 0 leaves b1 as the Mileage slope for BMW:

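$$\widehat{\text{Price}} = b_0 + b_1 \cdot \text{Mileage} + b_2 \cdot \text{Porsche} + b_3 \cdot \text{Jaguar} + b_4 \cdot \text{Mileage} \cdot \text{Jaguar} + b_5 \cdot \text{Mileage} \cdot \text{Porsche}.$$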
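
To make the role of the interaction terms concrete, here is a hedged sketch that fits the full model and reads off one intercept and one slope per brand. The data are again hypothetical and noise-free, and the generating coefficients in `true_coef` are invented for illustration. The coefficient order follows the equation above: b4 multiplies Mileage and Jaguar, and b5 multiplies Mileage and Porsche.

```python
import numpy as np

# Hypothetical, noise-free illustration data (NOT the data set from the text).
car_type = np.array(["BMW"] * 3 + ["Porsche"] * 3 + ["Jaguar"] * 3)
mileage = np.tile([10.0, 30.0, 50.0], 3)  # thousands of miles

porsche = (car_type == "Porsche").astype(float)
jaguar = (car_type == "Jaguar").astype(float)

# Columns in the order b0, b1, b2, b3, b4, b5 of the full model.
X = np.column_stack([
    np.ones_like(mileage),  # b0: intercept (BMW baseline)
    mileage,                # b1: Mileage slope (BMW baseline)
    porsche,                # b2: intercept shift for Porsche
    jaguar,                 # b3: intercept shift for Jaguar
    mileage * jaguar,       # b4: slope shift for Jaguar
    mileage * porsche,      # b5: slope shift for Porsche
])

# Assumed coefficients used only to generate the illustration prices.
true_coef = np.array([50.0, -0.5, 10.0, -5.0, -0.2, 0.1])
price = X @ true_coef

coef, *_ = np.linalg.lstsq(X, price, rcond=None)
b0, b1, b2, b3, b4, b5 = coef

# Setting the dummies to 0 or 1 gives one (intercept, slope) pair per brand.
lines = {
    "BMW": (b0, b1),
    "Porsche": (b0 + b2, b1 + b5),
    "Jaguar": (b0 + b3, b1 + b4),
}
for brand, (intercept, slope) in lines.items():
    print(f"{brand}: Price-hat = {intercept:.2f} + ({slope:.2f}) * Mileage")
```

In this parameterisation, b4 and b5 measure how far the Jaguar and Porsche slopes depart from the BMW slope, so the question of differing slopes comes down to whether those two coefficients are distinguishable from zero.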
