MTH613 Lecture 4 - Topic 5
MULTIPLE LINEAR REGRESSION
When we have two or more explanatory variables in our model, it is called multiple linear regression.
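In general terms, a multiple linear regression model with k explanatory variables has the form

    y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon

where y is the dependent variable, x_1, …, x_k are the explanatory (independent) variables, the \beta_i are the regression coefficients, and \varepsilon is the error term.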
EXAMPLE
Let's say that we would like to predict the price of a car, and use age and mileage as explanatory variables. Since we use more than one explanatory variable, it is called multiple linear regression.
Notice that Excel and Minitab print a second R² statistic, called the coefficient of determination adjusted for degrees of freedom, which has been adjusted to take into account the sample size and the number of independent variables.
The rationale for this statistic is that, if the number of independent variables k is large relative to the sample size n, the unadjusted R² value may be unrealistically high.
To understand this point, consider what would happen if the sample size were 2 in a simple linear regression model. The line would fit the data perfectly, resulting in R² = 1 when, in fact, there may be no linear relationship.
To avoid creating a false impression, the adjusted R² is often calculated. Its formula is as follows:
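In standard notation, with n observations and k independent variables,

    \text{Adjusted } R^2 = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)} = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}

so the adjustment penalizes a model for using many independent variables relative to the sample size.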
The overall validity of the model is tested with the hypotheses H0: β1 = β2 = ⋯ = βk = 0 versus H1: at least one βi ≠ 0. If the null hypothesis is true, none of the independent variables x1, x2, . . . , xk is linearly related to y, and therefore the model is invalid. If at least one βi is not equal to 0, the model does have some validity.
The total variation in the dependent variable, measured by Σ(yᵢ − ȳ)², can be decomposed into two parts: the explained variation (measured by SSR) and the unexplained variation (measured by SSE). That is:
Total variation in y = SSR + SSE
If SSR is large relative to SSE, the coefficient of determination will be high, signifying a good model.
On the other hand, if SSE is large, most of the variation will be unexplained, which indicates that the model provides a poor fit and consequently has little validity.
To judge whether SSR is large enough relative to SSE to allow us to infer that at least one coefficient is not equal to 0, we compute the ratio of the two mean squares.
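In symbols, the test statistic is

    F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)}

which, when the null hypothesis is true, follows an F distribution with k and n − k − 1 degrees of freedom; a large value of F is evidence against H0.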
The calculation of the test statistic is summarized in an analysis of variance (ANOVA) table in Excel.
TESTING THE VALIDITY OF THE MODEL
Sample calculation of the test statistic, summarized in an analysis of variance (ANOVA) table in Excel.
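The Excel output itself is not reproduced here; in general, a regression ANOVA table has the following layout (n observations, k independent variables):

Source of variation    df           SS     MS                       F
Regression             k            SSR    MSR = SSR/k              F = MSR/MSE
Residual (Error)       n − k − 1    SSE    MSE = SSE/(n − k − 1)
Total                  n − 1        SST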
Coefficient of determination
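For reference, the (unadjusted) coefficient of determination is computed from the same decomposition of the total variation:

    R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}

It gives the proportion of the total variation in y that is explained by the regression model.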
The relationship among the standard error of estimate sε, R², and the F-statistic is summarized below: a small sε, an R² close to 1, and a large F-statistic all point to a good-fitting model, while a large sε, an R² near 0, and a small F-statistic indicate a poor one.
EXAMPLE
Using multiple linear regression, create a model from the data
given such that both age and mileage are included as
explanatory variables.
EXAMPLE
Solution
Using Excel to generate the regression analysis.
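The Excel regression output is not reproduced here. As an illustrative sketch only, the same model could be fitted in Python with statsmodels, assuming the data sit in a file named car_prices.csv with columns price, age, and mileage (hypothetical names):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data layout: one row per car, with price in thousands
# of dollars, age in years, and mileage in thousands of miles.
cars = pd.read_csv("car_prices.csv")   # assumed file name

# Fit price on age and mileage by ordinary least squares.
model = smf.ols("price ~ age + mileage", data=cars).fit()

# The summary reports the coefficients, standard error of estimate,
# R-squared, adjusted R-squared, and the ANOVA F-statistic.
print(model.summary())
```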
EXAMPLE
Solution
We are asked to predict the price of a car that is two years old and that has been driven 50 thousand miles.
From the analysis report:
Intercept = 32.45
Age coefficient = −1.54
Mileage coefficient = −0.15
Standard error of estimate = 0.79
EXAMPLE
Solution
Now our least squares equation for this problem is as follows:
ŷ = 32.45 − 1.54(Age) − 0.15(Mileage)
Inserting the known values into the equation to find the corresponding price, given that the car is two years old and has been driven 50 thousand miles:
ŷ = 32.45 − 1.54(2) − 0.15(50) = 32.45 − 3.08 − 7.50 = 21.87
So the predicted price is approximately 21.87, in the same units as the price data (presumably thousands of dollars). Note that the standard error of estimate (0.79) describes the typical size of the prediction error; it is not added to the point prediction.
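As a quick check of the arithmetic, using the coefficients reported above (the variable names below are only for illustration):

```python
# Reported least-squares coefficients from the regression output.
intercept = 32.45
b_age = -1.54        # change in price per year of age
b_mileage = -0.15    # change in price per thousand miles driven

age = 2              # years
mileage = 50         # thousands of miles

# Point prediction: y-hat = b0 + b1*age + b2*mileage
predicted_price = intercept + b_age * age + b_mileage * mileage
print(round(predicted_price, 2))   # 21.87
```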