
STQS 2234

STATISTICAL MODELLING

TITLE
ASSIGNMENT 5

PREPARED FOR
DR. MARINA BINTI ZAHARI

GROUP 7
NORAINSYIRAH BINTI MOHAMED NORDIN (A151588)
NUR SYAHIDAH BINTI KHALAPIAH (A150105)
NUR NADIRAH BINTI MOHAMAD YOHYI (A151055)
SITI NUR ZAWANIE BINTI MD SOBRI (A149121)
NUR AZILA BINTI BAHARUDDIN (A148328)
NURFARIHA NADHIRAH BINTI AHMAD (A151675)
NOORMARINA BINTI MOKHTAR (A151110)

QUESTION 1:
1. Consider the multiple linear regression model $y = X\beta + \varepsilon$. If $\hat{\beta}$ denotes the least squares estimator of $\beta$, show that $\hat{\beta} = \beta + [(X'X)^{-1}X']\varepsilon$.

The least squares estimator $\hat{\beta}$ minimizes
$$SSE = \sum_{i=1}^{n} e_i^2 = e'e, \qquad \text{where } e = y - \hat{y} \text{ and } \hat{y} = X\hat{\beta}.$$
Expanding,
$$SSE = (y - X\hat{\beta})'(y - X\hat{\beta}) = y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}.$$
Setting the derivative with respect to $\hat{\beta}$ to zero,
$$\frac{\partial SSE}{\partial \hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0,$$
which gives the normal equations $X'X\hat{\beta} = X'y$, so from the fitted model $\hat{y} = X\hat{\beta}$,
$$\hat{\beta} = (X'X)^{-1}X'y.$$
Substituting the multiple linear regression model $y = X\beta + \varepsilon$,
$$\hat{\beta} = [(X'X)^{-1}X'][X\beta + \varepsilon] = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon = \beta + [(X'X)^{-1}X']\varepsilon.$$
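As a quick numerical check of this identity, the following minimal sketch simulates a regression and verifies that $\hat{\beta} - \beta = (X'X)^{-1}X'\varepsilon$. The design matrix, true coefficients and error scale are illustrative assumptions, not the assignment's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 25, 3                        # illustrative sizes, not the assignment's data
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # design matrix with intercept
beta = np.array([2.0, -1.0, 0.5])   # assumed true coefficients
eps = rng.normal(scale=0.3, size=n)
y = X @ beta + eps

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y        # least squares estimator (X'X)^{-1} X'y

# The identity: beta_hat - beta should equal (X'X)^{-1} X' eps
print(np.allclose(beta_hat - beta, XtX_inv @ X.T @ eps))  # True
```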

3(e)
[Figures i-iv: residual plots versus $x_1$, $x_2$, $x_3$ and $x_4$; discussed in part (e) of Question 3 below.]

QUESTION 2:

QUESTION 3
a) $H_0: \beta_j = 0$ for all $j = 1, 2, \ldots, 5$
$H_1$: at least one $\beta_j \neq 0$

Test statistic: $f_0 = 4.81$

Since $f_0 = 4.81 > f_{0.01,5,19} = 4.17$ and p-value $= 0.0052 < 0.01$, we reject $H_0$. We conclude that the response is linearly related to at least one of $x_1$, $x_2$, $x_3$, $x_4$ and $x_5$.
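The critical value and p-value above can be reproduced with SciPy; this is a minimal sketch, assuming the reported F statistic of 4.81 on 5 and 19 degrees of freedom.

```python
from scipy import stats

f0, df1, df2 = 4.81, 5, 19
f_crit = stats.f.ppf(1 - 0.01, df1, df2)   # upper 1% point of F(5, 19), approx. 4.17
p_value = stats.f.sf(f0, df1, df2)         # P(F > f0), approx. 0.005
print(f_crit, p_value, f0 > f_crit)
```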

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic: $t_0 = 2.47$
Since $|t_0| = 2.47 > t_{0.025,19} = 2.093$ and p-value $= 0.023 < 0.05$, with $\alpha = 0.05$ we reject the null hypothesis. This indicates that the predictor $x_1$ contributes to the model.

$H_0: \beta_2 = 0$
$H_1: \beta_2 \neq 0$
Test statistic: $t_0 = 2.74$
Since $|t_0| = 2.74 > t_{0.025,19} = 2.093$ and p-value $= 0.013 < 0.05$, with $\alpha = 0.05$ we reject the null hypothesis. This indicates that the predictor $x_2$ contributes to the model.

$H_0: \beta_3 = 0$
$H_1: \beta_3 \neq 0$
Test statistic: $t_0 = 2.42$
Since $|t_0| = 2.42 > t_{0.025,19} = 2.093$ and p-value $= 0.026 < 0.05$, with $\alpha = 0.05$ we reject the null hypothesis. This indicates that the predictor $x_3$ contributes to the model.

$H_0: \beta_4 = 0$
$H_1: \beta_4 \neq 0$
Test statistic: $t_0 = 2.79$
Since $|t_0| = 2.79 > t_{0.025,19} = 2.093$ and p-value $= 0.012 < 0.05$, with $\alpha = 0.05$ we reject the null hypothesis. This indicates that the predictor $x_4$ contributes to the model.

$H_0: \beta_5 = 0$
$H_1: \beta_5 \neq 0$
Test statistic: $t_0 = 0.25$
Since $|t_0| = 0.25 < t_{0.025,19} = 2.093$ and p-value $= 0.801 > 0.05$, with $\alpha = 0.05$ we fail to reject the null hypothesis. This indicates that the predictor $x_5$ could be deleted from the model.
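Each of the five t tests above follows the same template; a minimal SciPy sketch, using the reported t statistics as inputs, is:

```python
from scipy import stats

t_stats = {"b1": 2.47, "b2": 2.74, "b3": 2.42, "b4": 2.79, "b5": 0.25}
df = 19
t_crit = stats.t.ppf(1 - 0.05 / 2, df)     # two-sided 5% critical value, approx. 2.093
for name, t0 in t_stats.items():
    p = 2 * stats.t.sf(abs(t0), df)        # two-sided p-value
    print(name, round(p, 3), "reject H0" if abs(t0) > t_crit else "fail to reject H0")
```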

b) When $x_5$ was excluded and the model re-fitted, the resulting model is better than the model that includes $x_5$.

c) Some of the residuals are large in magnitude; these observations are outliers.

d) $R^2_{adj}$ in (a) is the smallest of the three. $R^2_{adj}$ for (b), where $x_5$ is removed and all 25 observations are retained, is higher than in (a) but lower than in (c). $R^2_{adj}$ for (c), where $x_5$ is removed and 24 observations are used, is the highest. This shows that $R^2_{adj}$ increases at each re-fit, because the variables remaining in the model are all useful.
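For reference, the adjusted $R^2$ behind this comparison penalizes SSE by the residual degrees of freedom; a small helper, with SSE, SST, $n$ and $p$ as assumed inputs, is:

```python
def adjusted_r_squared(sse: float, sst: float, n: int, p: int) -> float:
    """R^2_adj = 1 - [SSE / (n - p)] / [SST / (n - 1)],
    where p counts all parameters including the intercept."""
    return 1 - (sse / (n - p)) / (sst / (n - 1))
```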

e)
Residuals versus $x_1$
- The points are randomly scattered, but more of them lie in the negative region at the bottom of the plot, indicating that the model over-predicts those observations. There are also some outliers.

Residuals versus $x_2$
- The points are randomly scattered, with more points in the negative region, again indicating over-prediction, and some outliers. The points at the bottom left are plotted close to one another.

Residuals versus $x_3$
- The points are randomly scattered, with more points in the negative region, indicating over-prediction, and some outliers. The points in the negative region mostly share the same $x$ value.

Residuals versus $x_4$
- The points are randomly scattered and roughly balanced between the positive and negative regions. There are also some outliers.
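A residual-versus-regressor plot of the kind described above can be produced as follows (a minimal matplotlib sketch; `x` and `residuals` stand in for the fitted model's regressor and residual vectors, which are not reproduced here):

```python
import matplotlib.pyplot as plt

def residual_plot(x, residuals, xlabel):
    """Scatter residuals against one regressor; the horizontal line at zero
    makes asymmetry (over- or under-prediction) easy to see."""
    plt.scatter(x, residuals)
    plt.axhline(0, linestyle="--")
    plt.xlabel(xlabel)
    plt.ylabel("Residual")
    plt.show()
```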

f) $H_0: \beta_j = 0$ for all $j = 1, 2, 3, 4$
$H_1$: at least one $\beta_j \neq 0$

Test statistic: $f_0 = 21.79$

Since $f_0 = 21.79 > f_{0.05,4,20} = 2.87$, we reject $H_0$. We conclude that the response is linearly related to at least one of $x_1$, $x_2$, $x_3$ and $x_4$.

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic: $t_0 = 5.76$
Since $|t_0| = 5.76 > t_{0.025,20} = 2.086$, with $\alpha = 0.05$ we reject the null hypothesis. This indicates that the predictor $x_1$ contributes to the model.

$H_0: \beta_2 = 0$
$H_1: \beta_2 \neq 0$
Test statistic: $t_0 = 5.96$
Since $|t_0| = 5.96 > t_{0.025,20} = 2.086$, with $\alpha = 0.05$ we reject the null hypothesis. This indicates that the predictor $x_2$ contributes to the model.

$H_0: \beta_3 = 0$
$H_1: \beta_3 \neq 0$
Test statistic: $t_0 = 2.90$
Since $|t_0| = 2.90 > t_{0.025,20} = 2.086$, with $\alpha = 0.05$ we reject the null hypothesis. This indicates that the predictor $x_3$ contributes to the model.

$H_0: \beta_4 = 0$
$H_1: \beta_4 \neq 0$
Test statistic: $t_0 = 4.99$
Since $|t_0| = 4.99 > t_{0.025,20} = 2.086$, with $\alpha = 0.05$ we reject the null hypothesis. This indicates that the predictor $x_4$ contributes to the model.

g) The residual plots against $x_1$, $x_2$, $x_3$ and $x_4$ are bounded between -1 and 1, and no pattern is shown in the plots. Each plot has an outlier. The points in all the plots are symmetrically distributed, and most of them are close to zero.

QUESTION 4:

Based on the $R^2$ criterion, the best model is the one with the two predictors PctComp and PctTD, since $R^2$ shows a substantial increase there, jumping from 64.8 to 85.1.

Based on the adjusted $R^2$ and MSE criteria, the best model is the one with the seven predictors Att, PctComp, Yds, YdsperAtt, TD, PctTD and PctInt, as this model has the largest adjusted $R^2$ (100.0) and the smallest MSE (5.1).

Based on the $C_p$ criterion, there are eight possible best models:

i. the model with 6 predictors containing Att, PctComp, Yds, YdsperAtt, PctTD and PctInt;
ii. the model with 6 predictors containing Att, Comp, PctComp, YdsperAtt, PctTD and PctInt;
iii. the model with 7 predictors containing Att, PctComp, Yds, YdsperAtt, TD, PctTD and PctInt;
iv. the model with 7 predictors containing Att, PctComp, Yds, YdsperAtt, PctTD, Int and PctInt;
v. the model with 8 predictors containing Att, PctComp, Yds, YdsperAtt, TD, PctTD, Int and PctInt;
vi. the model with 8 predictors containing Att, Comp, PctComp, Yds, YdsperAtt, TD, PctTD and PctInt;
vii. the model with 9 predictors containing Att, PctComp, Yds, YdsperAtt, TD, PctTD, Lng, Int and PctInt;
viii. and the model with 9 predictors containing Att, Comp, PctComp, Yds, YdsperAtt, TD, PctTD, Int and PctInt.

All of those models are unbiased models, because their $C_p$ values equal (or are below) the number of parameters, $p$.
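Mallows' $C_p$ used here compares each subset model's SSE against the full model's MSE; a minimal sketch of the computation, with SSE_p, MSE_full, $n$ and $p$ as assumed inputs, is:

```python
def mallows_cp(sse_p: float, mse_full: float, n: int, p: int) -> float:
    """C_p = SSE_p / MSE_full - (n - 2p); a model is roughly unbiased
    when C_p is close to p (the parameter count of the subset model)."""
    return sse_p / mse_full - (n - 2 * p)
```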

QUESTION 5:
