L2C-Multiple Regression C 2022-03-03 21 - 20 - 04
Outline
Comparing Two Regression Models
Full Model Vs. Reduced Model
Partial F-test
The Change in SSE
Sequential Sum Squares Regression
Comparing Two Regression Models
Two models are nested if one model contains all the
terms of the other
Comparing Two Regression Models
– Full Model vs. Reduced Model Cont’d
Full model
Has 𝐾 𝑋-variables
$\hat{Y} = b_0 + b_1 X_1 + \cdots + b_L X_L + b_{L+1} X_{L+1} + \cdots + b_K X_K$
Reduced model
Has 𝐿 𝑋-variables (𝐿 < 𝐾)
The subset of 𝑋-variables being tested is not in it
$\hat{Y} = b_0 + b_1 X_1 + \cdots + b_L X_L$
Partial F-test
$H_0: \beta_{L+1} = \beta_{L+2} = \cdots = \beta_K = 0$
𝑋-variables in the subset do not significantly improve the
model when all the other 𝑋-variables are included
Partial F-test
Look at how much the SSE changes when the subset of 𝑋-variables is added to the model
As $SST = SSR + SSE$, and SST is the same for both models:
The reduced model has fewer 𝑋-variables, so its SSR and $r^2$ are expected to be smaller, while its SSE is expected to be larger
The full model contains more 𝑋-variables, so its SSR and $r^2$ should be larger, while its SSE should be smaller
Partial F-test Cont’d
Partial F-test statistic
$F = \dfrac{(SSE_{reduced} - SSE_{full})/(K - L)}{SSE_{full}/(n - K - 1)}$
with $(K - L)$, $(n - K - 1)$ d.f.
where $SSE_{reduced}$ = SSE for the reduced model
$SSE_{full}$ = SSE for the full model
𝐾 = no. of 𝑋-variables in the full model
𝐿 = no. of 𝑋-variables in the reduced model
p-value = $P(F_{(K-L),(n-K-1)} \ge F)$
Reject $H_0$ if $F > F_{\alpha,(K-L),(n-K-1)}$ or p-value < $\alpha$
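The statistic above can be sketched in a few lines of Python. This is a minimal illustration, not taken from the slides: the function name `partial_f` and the input values are hypothetical, and only the F statistic and its degrees of freedom are computed (the p-value would need an F-distribution routine).

```python
def partial_f(sse_reduced, sse_full, k, l, n):
    """Return (F, df1, df2) for the partial F-test of a nested model pair."""
    df1 = k - l        # number of X-variables being tested
    df2 = n - k - 1    # error d.f. of the full model
    f_stat = ((sse_reduced - sse_full) / df1) / (sse_full / df2)
    return f_stat, df1, df2


# Hypothetical inputs, for illustration only (not from the apartment example)
f_stat, df1, df2 = partial_f(sse_reduced=120.0, sse_full=80.0, k=4, l=2, n=30)
print(f_stat, df1, df2)
```

The result is then compared with the critical value $F_{\alpha,(K-L),(n-K-1)}$ from a table or from software.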
Example
Gross floor area and net floor area both measure the size of an apartment, so examine them as a group to see whether “size” is related to apartment price
This is tested given that the variables “Age” and “Floor” are already included in the model
Example Cont’d
Full model
Example – Discussion of Results Cont’d
The reduced model had $R^2 = 0.3487$, which almost doubled to 0.6631 after adding gross floor area and net floor area
The p-value for GrossFA is 0.0096, while that for NetFA is 0.1554, showing that size significantly affects apartment price, but the model may not need both of these variables
Example Cont’d
Example – Partial F-test Cont’d
$H_0: \beta_{GrossFA} = \beta_{NetFA} = 0$
$H_1$: At least one of $\beta_{GrossFA}, \beta_{NetFA} \neq 0$
Type I SS –
Sequential Sum Squares Regression
Type I SS
The SSR due to a particular 𝑋-variable after including the preceding 𝑋-variables
Type I SS = $SSR(X_{L+1} \mid X_1, \ldots, X_L)$
Increment in SSR from adding one extra 𝑋-variable (i.e. $X_{L+1}$)
The order in which the variables are entered into the model affects the Type I SS
Type I SS –
Sequential Sum Squares Regression Cont’d
[Figure: Venn diagram of Price, GrossFA and Age, with regions labelled SSR(GrossFA) and SSR(Age|GrossFA)]
Overlapping area: counted in SSR(GrossFA), as GrossFA is the first 𝑋-variable entered
Hence, SSR(GrossFA) + SSR(Age|GrossFA) = SSR(Full)
Type I SS –
Sequential Sum Squares Regression Cont’d
For a regression model with two 𝑋-variables ($X_1$ and $X_2$)
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$
$SSR_{full} = SSR(X_1) + SSR(X_2 \mid X_1)$
Partial F-test using Type I SS
$H_0: \beta_2 = 0$
$H_1: \beta_2 \neq 0$
$F = \dfrac{SSR(X_2 \mid X_1)}{SSE_{full}/(n - K - 1)}$ with 1, $(n - K - 1)$ d.f.
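The two-variable decomposition can be checked numerically. The sketch below uses a tiny made-up data set (not the apartment data) and a hypothetical helper `sse` that fits ordinary least squares through the normal equations in plain Python. It computes the Type I SS in both entry orders and the partial F for $\beta_2$.

```python
def sse(columns, y):
    """SSE of the least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented p x (p + 1) matrix
    a = [[sum(row[j] * row[m] for row in rows) for m in range(p)]
         + [sum(row[j] * yi for row, yi in zip(rows, y))]
         for j in range(p)]
    # Gaussian elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda idx: abs(a[idx][c]))
        a[c], a[piv] = a[piv], a[c]
        for i in range(c + 1, p):
            factor = a[i][c] / a[c][c]
            for m in range(c, p + 1):
                a[i][m] -= factor * a[c][m]
    # Back substitution for the coefficients
    b = [0.0] * p
    for j in reversed(range(p)):
        b[j] = (a[j][p] - sum(a[j][m] * b[m] for m in range(j + 1, p))) / a[j][j]
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Made-up data for illustration only
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 3, 7, 8, 11, 12]
n, k = len(y), 2

sst = sse([], y)                  # intercept-only model: SSE equals SST
sse_full = sse([x1, x2], y)
ssr_full = sst - sse_full

# Order X1, X2
ssr_x1 = sst - sse([x1], y)
ssr_x2_given_x1 = sse([x1], y) - sse_full

# Order X2, X1
ssr_x2 = sst - sse([x2], y)
ssr_x1_given_x2 = sse([x2], y) - sse_full

# Partial F for beta_2 using Type I SS, with 1 and n - K - 1 d.f.
f_beta2 = ssr_x2_given_x1 / (sse_full / (n - k - 1))
print(ssr_x1, ssr_x2_given_x1, ssr_x2, ssr_x1_given_x2, f_beta2)
```

Both orders add up to the same $SSR_{full}$, but the share credited to each variable depends on which one enters first.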
Type I SS –
Sequential Sum Squares Regression Cont’d
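For a regression model with 𝐾 𝑋-variables
$\hat{Y} = b_0 + b_1 X_1 + \cdots + b_L X_L + b_{L+1} X_{L+1} + \cdots + b_K X_K$
Partial F-test
$H_0: \beta_{L+1} = \beta_{L+2} = \cdots = \beta_K = 0$
$H_1$: At least one of $\beta_{L+1}, \beta_{L+2}, \ldots, \beta_K \neq 0$
$F = \dfrac{SSR(X_{L+1}, \ldots, X_K \mid X_1, \ldots, X_L)/(K - L)}{SSE_{full}/(n - K - 1)}$
with $(K - L)$, $(n - K - 1)$ d.f.
where $SSR(X_{L+1}, \ldots, X_K \mid X_1, \ldots, X_L)$
$= SSR(X_{L+1} \mid X_1, \ldots, X_L) + SSR(X_{L+2} \mid X_1, \ldots, X_{L+1}) + \cdots + SSR(X_K \mid X_1, \ldots, X_{K-1})$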
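The conditional SSR in this numerator is the same quantity as the change in SSE used in the earlier form of the partial F-test. A short derivation, assuming both models are fitted to the same data so that SST is shared (the labels $SSR_{full}$ and $SSR_{reduced}$ are introduced here for the SSR of each model):

```latex
\begin{aligned}
SSR(X_{L+1}, \ldots, X_K \mid X_1, \ldots, X_L)
  &= SSR_{full} - SSR_{reduced} \\
  &= (SST - SSE_{full}) - (SST - SSE_{reduced}) \\
  &= SSE_{reduced} - SSE_{full}
\end{aligned}
```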
Example – Type I SS Cont’d
𝑋-variables being tested are
listed towards the end
SSR(Age)
SSR(Floor|Age)
SSR(GrossFA|Age, Floor)
SSR(NetFA|Age, Floor, GrossFA)
Example – Type I SS Cont’d
$H_0: \beta_{GrossFA} = \beta_{NetFA} = 0$
$H_1$: At least one of $\beta_{GrossFA}, \beta_{NetFA} \neq 0$
$F = \dfrac{[SSR(GrossFA \mid Age, Floor) + SSR(NetFA \mid Age, Floor, GrossFA)]/(K - L)}{SSE_{full}/(n - K - 1)}$
with $K - L = 4 - 2$ and $n - K - 1 = 80 - 4 - 1$, giving $F = 34.987$
The same partial F-test statistic and conclusion as in the analysis using the change in SSE.
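As a rough cross-check, the same statistic can be recovered from the two $R^2$ values quoted in the discussion of results, since dividing the numerator and denominator of the partial F by SST turns each SSE into $1 - R^2$. The sketch below assumes $n = 80$, which is inferred from the error d.f. of 75 with $K = 4$. The function name is hypothetical.

```python
def partial_f_from_r2(r2_reduced, r2_full, k, l, n):
    """Partial F computed from R-squared values of nested models (same SST)."""
    numerator = (r2_full - r2_reduced) / (k - l)
    denominator = (1.0 - r2_full) / (n - k - 1)
    return numerator / denominator


# R-squared values as quoted for the reduced and full models; n = 80 is inferred
f_stat = partial_f_from_r2(r2_reduced=0.3487, r2_full=0.6631, k=4, l=2, n=80)
print(round(f_stat, 3))
```

The result lands very close to 34.987. The small gap comes from the $R^2$ values being rounded to four decimal places.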
Example – Type I SS Cont’d
What happens if we put the 𝑋-variables being tested at the beginning of the model statement?
Type I SS changed!
Ordering matters!
SSR(GrossFA)
SSR(NetFA|GrossFA)
SSR(Age|GrossFA, NetFA)
SSR(Floor|GrossFA, NetFA, Age)
Type II SS –
Partial Sum Squares Regression
Type II SS
The SSR due to a particular 𝑋-variable after including all the other 𝑋-variables
Type II SS = $SSR(X_i \mid \text{all } X_j \text{ where } j \neq i)$
Additional SSR contributed by this 𝑋-variable alone
The order in which the variables are entered into the model does not affect the Type II SS
Can be used to test the significance of an individual 𝑋-variable
$H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$
Partial $F = \dfrac{SSR(X_i \mid \text{all } X_j \text{ where } j \neq i)}{SSE_{full}/(n - K - 1)}$, d.f. = 1, $(n - K - 1)$
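A small numerical sketch of the Type II SS follows, again on made-up data rather than the apartment data. The helpers `sse` and `type2_ss` are hypothetical names. Each Type II SS is obtained as the rise in SSE when that one variable is dropped from the full model.

```python
def sse(columns, y):
    """SSE of the least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented p x (p + 1) matrix
    a = [[sum(row[j] * row[m] for row in rows) for m in range(p)]
         + [sum(row[j] * yi for row, yi in zip(rows, y))]
         for j in range(p)]
    # Gaussian elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda idx: abs(a[idx][c]))
        a[c], a[piv] = a[piv], a[c]
        for i in range(c + 1, p):
            factor = a[i][c] / a[c][c]
            for m in range(c, p + 1):
                a[i][m] -= factor * a[c][m]
    # Back substitution for the coefficients
    b = [0.0] * p
    for j in reversed(range(p)):
        b[j] = (a[j][p] - sum(a[j][m] * b[m] for m in range(j + 1, p))) / a[j][j]
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def type2_ss(columns, y, i):
    """SSR(X_i | all other X_j): SSE without column i minus SSE of the full model."""
    others = columns[:i] + columns[i + 1:]
    return sse(others, y) - sse(columns, y)


# Made-up data for illustration only
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 3, 7, 8, 11, 12]
n, k = len(y), 2

mse_full = sse([x1, x2], y) / (n - k - 1)
ss_x1 = type2_ss([x1, x2], y, 0)
ss_x2 = type2_ss([x1, x2], y, 1)
ss_x1_swapped = type2_ss([x2, x1], y, 1)   # same variable, different entry order

partial_f_x1 = ss_x1 / mse_full            # d.f. = 1, n - K - 1
partial_f_x2 = ss_x2 / mse_full
print(ss_x1, ss_x2, ss_x1_swapped, partial_f_x1, partial_f_x2)
```

Swapping the column order leaves each Type II SS unchanged. Unlike the Type I SS, the Type II values need not add up to the SSR of the full model when the 𝑋-variables are correlated.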
Type II SS –
Partial Sum Squares Regression Cont’d
[Figure: Venn diagram of Price, GrossFA and Age, with regions labelled SSR(GrossFA|Age) and SSR(Age|GrossFA)]
Partial F-test
$H_0: \beta_{Floor} = 0$
$H_1: \beta_{Floor} \neq 0$
Partial $F = \dfrac{SSR(X_{Floor} \mid X_{GrossFA}, X_{NetFA}, X_{Age})}{SSE_{full}/(n - K - 1)}$, with $n - K - 1 = 80 - 4 - 1$
$= 6.2877$
At 𝛼 = 5%, d.f. = 1, 75, C.V. = 4.00
Reject $H_0$, as partial F > C.V., i.e. Floor is significantly related to the apartment price
Same conclusion as the t-test, as $t^2 = (2.508)^2 \approx$ partial F
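The arithmetic behind this comparison can be checked directly. The values below are the ones quoted above. The variable names are only for illustration.

```python
t_stat = 2.508          # t statistic for Floor, as quoted above
partial_f = 6.2877      # partial F for Floor, as quoted above
critical_value = 4.00   # 5% critical value used above for d.f. = 1, 75

t_squared = t_stat ** 2
reject_h0 = partial_f > critical_value
print(t_squared, reject_h0)
```

The squared t statistic agrees with the partial F up to the rounding of t to three decimal places.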
Summary – Type I & Type II SS
Type I SS is a decomposition of SSR, measuring the
contributions of predictors in a specific order
Used for the partial F-test