STAT4027 Assignment 1
Lewis Hastie
August 29, 2023
1 Question 1
1.1 a)
We will first calculate the moment estimator $\hat{r}$. We have

$$E(Y) = \bar{y} \qquad (1)$$

$$\frac{r}{r-1} = \bar{y} \qquad (2)$$

$$\hat{r} = \frac{\bar{y}}{\bar{y}-1} \qquad (3)$$

For the variance we apply the delta method,

$$\mathrm{Var}[g(Y)] \approx \sigma^2\,[g'(\mu)]^2, \quad \text{where } g(x) = \frac{x}{x-1} \text{ from above.} \qquad (4)$$

Now $Y$ in this instance is our sample mean, thus,

$$\sigma^2 = \mathrm{Var}(\bar{y}) = \frac{1}{n}\times\mathrm{Var}(Y) \qquad (5)$$

$$= \frac{r}{n(r-1)^2(r-2)} \qquad (6)$$

and,

$$g'(x) = \frac{-1}{(x-1)^2} \qquad (7)$$

As such,

$$\mathrm{Var}(g(\bar{y})) = \frac{r}{n(r-1)^2(r-2)}\times\left(\frac{-1}{(\mu-1)^2}\right)^2, \quad \text{where } \mu = \frac{r}{r-1} \qquad (8)$$

$$= \frac{r}{n(r-1)^2(r-2)}\times\frac{1}{\left(\frac{r}{r-1}-1\right)^4} \qquad (9)$$

$$= \frac{r}{n(r-1)^2(r-2)}\times(r-1)^4 \qquad (10)$$

$$\mathrm{Var}(\hat{r}) = \frac{\hat{r}(\hat{r}-1)^2}{n(\hat{r}-2)}. \qquad (11)$$

Thus we have moment estimator $\hat{r} = \frac{\bar{y}}{\bar{y}-1}$, with variance $\mathrm{Var}(\hat{r}) = \frac{\hat{r}(\hat{r}-1)^2}{n(\hat{r}-2)}$.
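For a quick numerical check, the estimator and its delta-method standard error can be computed directly in R; a minimal sketch, assuming the sample is stored in a numeric vector y (the function name r_moment is a hypothetical helper, not part of the assignment):

r_moment <- function(y) {
  n <- length(y)
  ybar <- mean(y)
  r_hat <- ybar / (ybar - 1)                            # equation (3)
  var_hat <- r_hat * (r_hat - 1)^2 / (n * (r_hat - 2))  # equation (11)
  list(r_hat = r_hat, se = sqrt(var_hat))
}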
2 Question 2
2.1 a)
We have the log-likelihood of our model,

$$\ell_N(\beta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 + \frac{1}{2}\sum_{i=1}^{n}\log\lambda_i - \frac{1}{2\sigma^2}(Y - X\beta)'\Sigma^{-1}(Y - X\beta), \qquad (12)$$

where $W = \Sigma^{-1} = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$. Now we know $\frac{\partial}{\partial s}(x - As)'W(x - As) = -2A'W(x - As)$, assuming $W$ is symmetric (which $\Sigma^{-1}$ is). Thus the derivative of our log-likelihood with respect to $\beta$ will be,

$$\frac{\partial\ell_N(\beta)}{\partial\beta} = \frac{-1}{2\sigma^2}\left(-2X'\Sigma^{-1}(Y - X\beta)\right) \qquad (13)$$

Setting this to zero and solving,

$$0 = \frac{-1}{2\sigma^2}\left(-2X'W(Y - X\beta)\right) \qquad (14)$$

$$0 = -2X'W(Y - X\beta) \qquad (15)$$

$$0 = -2X'WY + 2X'WX\beta \qquad (16)$$

$$X'WY = X'WX\beta \qquad (17)$$

$$\hat{\beta} = (X'WX)^{-1}X'WY \qquad (18)$$

Similarly, differentiating with respect to $\sigma^2$,

$$\frac{\partial\ell_N(\beta)}{\partial\sigma^2} = \frac{-n}{2\sigma^2} + \frac{1}{2\sigma^4}\left((Y - X\beta)'W(Y - X\beta)\right) \qquad (20)$$

$$0 = \frac{-n}{2\sigma^2} + \frac{1}{2\sigma^4}\left((Y - X\beta)'W(Y - X\beta)\right) \qquad (21)$$

$$\frac{n}{2\sigma^2} = \frac{1}{2\sigma^4}\left((Y - X\beta)'W(Y - X\beta)\right) \qquad (22)$$

$$\hat{\sigma}^2 = \frac{1}{n}\left((Y - X\hat{\beta})'W(Y - X\hat{\beta})\right) \qquad (23)$$

As required.
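These closed-form updates are straightforward to compute; a minimal R sketch, assuming a design matrix X, response vector Y, and weight vector lambda are already defined:

W <- diag(lambda)                                       # W = Sigma^{-1}
beta_hat <- solve(t(X) %*% W %*% X, t(X) %*% W %*% Y)   # equation (18)
resid <- Y - X %*% beta_hat
sigma2_hat <- as.numeric(t(resid) %*% W %*% resid) / nrow(X)  # equation (23)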
2.2 b)
Figure 1 shows the converged parameter estimates, and Figure 2 shows the final values for $\lambda_i$.
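For reference, the estimates can be obtained by iterating between the closed-form updates of part a) and an update for the weights. The sketch below assumes the standard EM weight update for a $t$-model with $\nu$ degrees of freedom, $\lambda_i = (\nu+1)/(\nu + r_i^2/\sigma^2)$; this specific update rule is an assumption, not taken from the assignment.

# Hypothetical EM-style iteration for a t regression model via its
# normal scale-mixture representation; the weight update is an assumption.
fit_t_em <- function(X, Y, nu, tol = 1e-8, max_iter = 500) {
  lambda <- rep(1, nrow(X))
  beta <- rep(0, ncol(X))
  for (k in seq_len(max_iter)) {
    W <- diag(lambda)
    beta_new <- solve(t(X) %*% W %*% X, t(X) %*% W %*% Y)  # weighted LS
    r <- as.numeric(Y - X %*% beta_new)
    sigma2 <- sum(lambda * r^2) / nrow(X)
    lambda <- (nu + 1) / (nu + r^2 / sigma2)               # assumed E-step
    if (max(abs(beta_new - beta)) < tol) break
    beta <- beta_new
  }
  list(beta = beta_new, sigma2 = sigma2, lambda = lambda)
}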
2.3 c)
We can see from the density plot in Figure 6 that the normal density fit is the worst, as it fails to capture the outliers. The t distribution fit is significantly better, as its fatter tails capture the presence of outliers.
Figure 5: Model fit for the normal distribution (in red) and the Student t distribution (in black)
Figure 6: Density plot with the observed smoothed density and fitted normal and t densities
for the residuals
3 Question 3
3.1 a)
It is known that the CDF evaluated at a random draw from its own distribution follows a uniform distribution. Thus we can find the inverse function of our CDF, whose input variable follows a uniform distribution on the domain zero to one, that is $X \sim U(0, 1)$. Thus by randomly sampling from $X$ and evaluating this inverse, we can simulate from a Rayleigh distribution.

$$\text{Let } x = F(t) = 1 - \exp\left(-\frac{t^2}{2\sigma^2}\right). \qquad (24)$$

$$-\frac{t^2}{2\sigma^2} = \ln(1 - x) \qquad (25)$$

$$t^2 = -2\sigma^2\ln(1 - x) \qquad (26)$$

$$t = \sigma\sqrt{-2\ln(1 - x)} \qquad (27)$$
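A minimal R sketch of this inverse-CDF sampler, assuming $\sigma$ is given (rrayleigh is a hypothetical helper name):

rrayleigh <- function(n, sigma) {
  x <- runif(n)                   # X ~ U(0, 1)
  sigma * sqrt(-2 * log(1 - x))   # equation (27)
}
samples <- rrayleigh(10000, sigma = 2)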
3.2 b)
We have the pdf of our Rayleigh distribution,

$$f(t) = \frac{t}{\sigma^2}\exp\left(-\frac{t^2}{2\sigma^2}\right) \qquad (28)$$

Thus we wish to write the joint likelihood of our $n$ samples in exponential family form,

$$\prod_{i=1}^{n} f(t_i) = \prod_{i=1}^{n}\frac{t_i}{\sigma^2}\exp\left(-\frac{t_i^2}{2\sigma^2}\right) \qquad (29)$$

$$= \exp\left(\ln\prod_{i=1}^{n}\frac{t_i}{\sigma^2} - \sum_{i=1}^{n}\frac{t_i^2}{2\sigma^2}\right) \qquad (30)$$

$$= \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n} t_i^2 + \sum_{i=1}^{n}\ln t_i - n\ln(\sigma^2)\right) \qquad (31)$$

Hence, $T(t) = \sum_{i=1}^{n} t_i^2$, $\theta = \frac{-1}{2\sigma^2}$, $b(\theta) = \ln(\sigma^2) = -\ln(-2\theta)$. We can now obtain the expected value and variance results in a straightforward manner,

$$E[T^2] = E(Y^2) = \frac{\partial b(\theta)}{\partial\theta} = \frac{-1}{\theta} = 2\sigma^2 \qquad (32)$$

$$\mathrm{Var}[T^2] = \mathrm{Var}[Y^2] = \frac{\partial^2 b(\theta)}{\partial\theta^2} = \frac{1}{\theta^2} = 4\sigma^4 \qquad (33)$$

$$E(T) = \int_0^\infty t\times f(t)\,dt \qquad (34)$$

$$= \int_0^\infty t\times\frac{t}{\sigma^2}\exp\left(-\frac{t^2}{2\sigma^2}\right)dt \qquad (35)$$

applying integration by parts, (36)

$$= \left[-t\exp\left(-\frac{t^2}{2\sigma^2}\right)\right]_0^\infty + \int_0^\infty\exp\left(-\frac{t^2}{2\sigma^2}\right)dt \qquad (37)$$

$$= 0 + \sqrt{2\pi}\,\sigma\int_0^\infty\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{t^2}{2\sigma^2}\right)dt \qquad (38)$$

$$= \frac{\sqrt{2\pi}\,\sigma}{2} \qquad (39)$$

$$= \sqrt{\frac{\pi}{2}}\,\sigma \qquad (40)$$

Now we know $E(T^2) = 2\sigma^2$, thus we can calculate our variance as follows,

$$\mathrm{Var}(T) = E(T^2) - [E(T)]^2 = 2\sigma^2 - \frac{\pi}{2}\sigma^2 = \frac{4-\pi}{2}\sigma^2. \qquad (41)$$
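These moments can be sanity-checked by simulation; a short R sketch, reusing the hypothetical rrayleigh sampler from part a):

set.seed(1)
sigma <- 2
t <- rrayleigh(1e6, sigma)
mean(t)    # close to sqrt(pi/2) * sigma = 2.5066
var(t)     # close to (4 - pi)/2 * sigma^2 = 1.7168
mean(t^2)  # close to 2 * sigma^2 = 8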
3.3 c)
We wish to find the log-likelihood function and the value of $\sigma$ that maximises it.

$$L(\sigma; t_i) = \prod_{i=1}^{n}\frac{t_i}{\sigma^2}\exp\left(-\frac{t_i^2}{2\sigma^2}\right) \qquad (45)$$

$$= \frac{1}{\sigma^{2n}}\times\prod_{i=1}^{n} t_i\times\exp\left(\sum_{i=1}^{n}\frac{-t_i^2}{2\sigma^2}\right) \qquad (46)$$

thus our log-likelihood is given by, (47)

$$\ell(\sigma; t_i) = \sum_{i=1}^{n}\ln(t_i) - n\ln(\sigma^2) - \sum_{i=1}^{n}\frac{t_i^2}{2\sigma^2} \qquad (48)$$

taking the derivative with respect to $\sigma^2$, (49)

$$\frac{\partial\ell(\sigma; t_i)}{\partial\sigma^2} = \frac{-n}{\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n} t_i^2 \qquad (50)$$

setting our derivative to zero and solving, (51)

$$0 = \frac{-n}{\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n} t_i^2 \qquad (52)$$

$$\frac{n}{\sigma^2} = \frac{1}{2\sigma^4}\sum_{i=1}^{n} t_i^2 \qquad (53)$$

$$\sigma^2 = \frac{1}{2n}\sum_{i=1}^{n} t_i^2 \qquad (54)$$

$$\hat{\sigma} = \sqrt{\frac{1}{2n}\sum_{i=1}^{n} t_i^2} \qquad (55)$$

Now in order to obtain the standard error, we can make use of the fact that the Rayleigh distribution belongs to the one-parameter exponential family, so its MLE asymptotically attains the Cramer-Rao lower bound, in which the variance of any unbiased estimator is bounded below by the reciprocal of the Fisher information $I(\theta)$, where

$$I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\ln f(x;\theta)\right] \qquad (56)$$

$$\frac{\partial\ell(\sigma; t_i)}{\partial\sigma} = \frac{-2n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n} t_i^2 \qquad (57)$$

taking the second derivative, (58)

$$\frac{\partial^2\ell(\sigma; t_i)}{\partial\sigma^2} = \frac{2n}{\sigma^2} - \frac{3}{\sigma^4}\sum_{i=1}^{n} t_i^2 \qquad (59)$$

taking the expectation, (60)

$$E\left[\frac{\partial^2}{\partial\sigma^2}\ln f(t;\sigma)\right] = \frac{2n}{\sigma^2} - \frac{3}{\sigma^4}\times 2n\sigma^2 \qquad (61)$$

$$= \frac{-4n}{\sigma^2} \qquad (62)$$

multiplying by $-1$ and taking the reciprocal, (63)

$$\mathrm{Var}(\hat{\sigma}) = \frac{1}{I(\sigma)} = \frac{\sigma^2}{4n} \qquad (64)$$

$$SE(\hat{\sigma}) = \frac{\sigma}{2\sqrt{n}} \qquad (65)$$
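A minimal R sketch of the MLE and its estimated standard error, assuming the observations are stored in a vector t:

n <- length(t)
sigma_hat <- sqrt(sum(t^2) / (2 * n))   # equation (55)
se_sigma <- sigma_hat / (2 * sqrt(n))   # equation (65), plugging in sigma_hat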
3.4 d)
We have that our quasi-likelihood function is given by,

$$Q(\mu, t) = \int_t^\mu\frac{t - x}{\phi V(x)}\,dx \qquad (66)$$

Given $\mu = E(T) = \sqrt{\frac{\pi}{2}}\,\sigma$ and $\mathrm{Var}(T) = \frac{4-\pi}{2}\sigma^2$, we can find our variance function $V(\mu)$ as follows,

$$\mu = \sqrt{\frac{\pi}{2}}\,\sigma \qquad (67)$$

$$\sigma = \sqrt{\frac{2}{\pi}}\,\mu \qquad (68)$$

substituting this into our variance, (69)

$$V(\mu) = \frac{4-\pi}{2}\times\left(\sqrt{\frac{2}{\pi}}\,\mu\right)^2 \qquad (70)$$

$$V(\mu) = \frac{4-\pi}{\pi}\,\mu^2 \qquad (71)$$

Hence, taking $\phi = 1$,

$$Q(\mu, t) = \int_t^\mu\frac{t - x}{\phi V(x)}\,dx \qquad (72)$$

$$= \int_t^\mu\frac{t - x}{\frac{4-\pi}{\pi}x^2}\,dx \qquad (73)$$

$$= \frac{\pi}{4-\pi}\int_t^\mu\frac{t}{x^2} - \frac{1}{x}\,dx \qquad (74)$$

$$= \frac{\pi}{4-\pi}\left[\frac{-t}{x} - \ln(x)\right]_t^\mu \qquad (75)$$

$$= \frac{\pi}{4-\pi}\left[\frac{-t}{\mu} + 1 - \ln(\mu) + \ln(t)\right] \qquad (76)$$

as required.
Next we will find the MQLE and show it is equivalent to the MOM estimator. We begin by finding the MQLE for $\sigma$.

$$\sum_{i=1}^{n} Q(\mu_i, t_i) = \frac{\pi}{4-\pi}\sum_{i=1}^{n}\left(\frac{-t_i}{\mu} + 1 - \ln(\mu) + \ln(t_i)\right) \qquad (78)$$

$$= \frac{\pi}{4-\pi}\left[\frac{-1}{\mu}\sum_{i=1}^{n} t_i + n - n\ln(\mu) + \sum_{i=1}^{n}\ln(t_i)\right] \qquad (79)$$

taking the derivative (the constant factor $\frac{\pi}{4-\pi}$ does not affect the root, so we drop it), (80)

$$\frac{\partial Q}{\partial\mu} = \frac{1}{\mu^2}\sum_{i=1}^{n} t_i - \frac{n}{\mu} \qquad (81)$$

setting to zero and solving yields, (82)

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} t_i = \bar{t} \qquad (83)$$

substituting into our expression relating $\sigma$ and $\mu$, (84)

$$\hat{\sigma}_{MQLE} = \sqrt{\frac{2}{\pi}}\,\bar{t} \qquad (85)$$

The MOM estimator equates $E(T)$ with $\bar{t}$,

$$E(T) = \bar{t} \qquad (86)$$

$$\sqrt{\frac{\pi}{2}}\,\sigma = \bar{t} \qquad (87)$$

$$\hat{\sigma}_{MOM} = \sqrt{\frac{2}{\pi}}\,\bar{t} \qquad (88)$$

so the two estimators coincide. For the standard error,

$$\mathrm{Var}(\hat{\sigma}_{MQLE}) = \frac{2}{\pi}\mathrm{Var}(\bar{t}) \qquad (89)$$

$$= \frac{2}{\pi}\times\frac{1}{n^2}\times\sum_{i=1}^{n}\mathrm{Var}(t_i) \qquad (90)$$

$$= \frac{2}{\pi}\times\frac{1}{n^2}\times n\times\left(\frac{4-\pi}{2}\sigma^2\right) \qquad (91)$$

$$= \frac{4-\pi}{n\pi}\sigma^2 \qquad (92)$$

$$SE(\hat{\sigma}_{MQLE}) = \sigma\sqrt{\frac{4-\pi}{n\pi}} \qquad (93)$$

Now to assess its efficiency compared to the MLE we can compare the ratio of their variances,

$$\frac{\mathrm{Var}(\hat{\sigma}_{MLE})}{\mathrm{Var}(\hat{\sigma}_{MQLE})} = \frac{\sigma^2/(4n)}{(4-\pi)\sigma^2/(n\pi)} = \frac{\pi}{4(4-\pi)} \approx 0.915 \qquad (94)$$

Thus as the ratio of our variances is less than one, we can conclude that the MQLE is less efficient than our MLE.
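This efficiency comparison can be verified by simulation; a short R sketch, again using the hypothetical rrayleigh sampler from part a):

set.seed(2)
sigma <- 2; n <- 100; reps <- 5000
mle <- replicate(reps, { t <- rrayleigh(n, sigma); sqrt(sum(t^2) / (2 * n)) })
mqle <- replicate(reps, { t <- rrayleigh(n, sigma); sqrt(2 / pi) * mean(t) })
var(mle) / var(mqle)   # close to pi / (4 * (4 - pi)) = 0.915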
3.5 e)
3.5.1 i)
$$\mathrm{Var}(\hat{\sigma}^2) = \mathrm{Var}\left(\frac{1}{2n}\sum_{i=1}^{n} t_i^2\right) \qquad (97)$$

$$= \left(\frac{1}{2n}\right)^2\sum_{i=1}^{n}\mathrm{Var}(t_i^2) \qquad (98)$$

$$= \frac{1}{4n^2}\times n\times 4\sigma^4 = \frac{1}{n}\sigma^4 \qquad (99)$$

$$SE(\hat{\sigma}^2) = \frac{\sigma^2}{\sqrt{n}} \qquad (100)$$
3.5.2 ii)
We apply the delta method with $g(\sigma) = \sigma^2$, using

$$\mathrm{Var}(\hat{\sigma}) = \frac{\sigma^2}{4n}. \qquad (101)$$

Taking the derivative of $g$ yields,

$$g'(\sigma) = 2\sigma \qquad (102)$$

thus,

$$\mathrm{Var}(\hat{\sigma}^2) \approx (2\sigma)^2\times\frac{\sigma^2}{4n} \qquad (103)$$

$$= \frac{\sigma^4}{n} \qquad (104)$$

$$SE(\hat{\sigma}^2) = \frac{\sigma^2}{\sqrt{n}} \qquad (105)$$
as required.
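Both routes give the same standard error, which a quick simulation can corroborate (a sketch, assuming the rrayleigh helper as before):

set.seed(3)
sigma <- 2; n <- 50
sig2_hat <- replicate(5000, { t <- rrayleigh(n, sigma); sum(t^2) / (2 * n) })
sd(sig2_hat)   # close to sigma^2 / sqrt(n) = 4 / sqrt(50) = 0.566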
4 Question 4
4.1 a)
We first wish to put the pdf of our distribution into exponential family form,
$$f(y_i) = \frac{\Gamma(y_i + r)}{\Gamma(y_i + 1)\Gamma(r)}(1 - p_i)^r p_i^{y_i} \qquad (106)$$

$$= \exp\left(\ln\left(\frac{\Gamma(y_i + r)}{\Gamma(y_i + 1)\Gamma(r)}\right) + r\ln(1 - p_i) + y_i\ln(p_i)\right) \qquad (107)$$

so that the natural parameter is $\theta_i = \ln(p_i)$, with $b(\theta_i) = -r\ln(1 - p_i) = -r\ln(1 - e^{\theta_i})$. We wish to use the fact that $E(T(Y)) = E(y_i) = b'(\theta)$, and $\mathrm{Var}(T(Y)) = \mathrm{Var}(y_i) = b''(\theta)$. Thus from above we can deduce,

$$\mu_i = E(y_i) = b'(\theta) = \frac{re^{\theta_i}}{1 - e^{\theta_i}} = \frac{rp_i}{1 - p_i}, \qquad (110)$$

and (111)

$$\mathrm{Var}(y_i) = b''(\theta) = \frac{re^{\theta_i}}{(1 - e^{\theta_i})^2} = \frac{rp_i}{(1 - p_i)^2} \qquad (112)$$

Rearranging our expression for the mean and substituting this into our variance,

$$\mu_i = \frac{rp_i}{1 - p_i} \;\Rightarrow\; p_i = \frac{\mu_i}{r + \mu_i}, \qquad (113)$$

substituting, (114)

$$\mathrm{Var}(y_i) = \left(r\cdot\frac{\mu_i}{r + \mu_i}\right)\bigg/\left(1 - \frac{\mu_i}{r + \mu_i}\right)^2 \qquad (115)$$

$$= \left(r\cdot\frac{\mu_i}{r + \mu_i}\right)\bigg/\left(\frac{r}{r + \mu_i}\right)^2 \qquad (116)$$

$$= \frac{\mu_i(\mu_i + r)}{r} \qquad (117)$$
as required.
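This mean-variance relationship matches the mu parameterisation of the negative binomial in R, where $\mathrm{Var}(y) = \mu + \mu^2/r$, and can be checked by simulation:

set.seed(4)
r <- 3; mu <- 5
y <- rnbinom(1e6, size = r, mu = mu)
mean(y)  # close to mu = 5
var(y)   # close to mu * (mu + r) / r = 13.33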
4.2 b)
We have the modified response,

$$z_i = \eta_i + (y_i - \mu_i)\frac{\partial\eta_i}{\partial\mu_i} \qquad (118)$$

where,

$$\eta_i = \ln(\mu_i) \qquad (119)$$

$$\frac{\partial\eta_i}{\partial\mu_i} = \frac{1}{\mu_i} \qquad (120)$$

Thus,

$$z_i = \ln(\mu_i) + (y_i - \mu_i)\times\frac{1}{\mu_i} \qquad (121)$$

The working weights are

$$w_i = \mathrm{Var}(z_i)^{-1} = \left(\frac{\partial\mu_i}{\partial\eta_i}\right)^2\frac{1}{V_i} \qquad (122)$$

where,

$$V_i = \frac{\mu_i(\mu_i + r)}{r}, \qquad \frac{\partial\mu_i}{\partial\eta_i} = \mu_i. \qquad (123)$$

Thus,

$$w_i = \frac{r}{(\mu_i + r)\mu_i}\times\mu_i^2 \qquad (124)$$

$$= \frac{r\mu_i}{\mu_i + r} \qquad (125)$$
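A minimal R sketch of one IRLS pass with these quantities, assuming a design matrix X, response y, current coefficient vector beta, and a fixed r (all object names are placeholders):

eta <- as.numeric(X %*% beta)
mu <- exp(eta)                 # log link
z <- eta + (y - mu) / mu       # modified response, equation (121)
w <- r * mu / (mu + r)         # working weights, equation (125)
W <- diag(w)
beta <- solve(t(X) %*% W %*% X, t(X) %*% W %*% z)  # weighted least squares update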
4.3 c)
We can estimate $r$ using the Newton-Raphson algorithm based on the following recursion, where in this case $\theta = r$, and the derivatives are evaluated at $\theta = \theta^{(k)}$. It should be noted that we are conditioning on $\beta$ and thus assuming $\mu_i$ is known.

$$\theta^{(k+1)} = \theta^{(k)} - \left(\frac{\partial^2\ell(\theta)}{\partial\theta^2}\right)^{-1}\times\frac{\partial\ell(\theta)}{\partial\theta} \qquad (126)$$

Thus we must compute the derivatives of our log-likelihood. We have our likelihood,

$$L(r; y) = \prod_{i=1}^{n}\left[\frac{\Gamma(y_i + r)}{\Gamma(y_i + 1)\Gamma(r)}\right]\prod_{i=1}^{n}(1 - p_i)^r\prod_{i=1}^{n} p_i^{y_i}$$

$$\ell(r; y) = \sum_{i=1}^{n}\ln\left(\frac{\Gamma(y_i + r)}{\Gamma(y_i + 1)\Gamma(r)}\right) + \sum_{i=1}^{n} r\ln(1 - p_i) + \sum_{i=1}^{n} y_i\ln(p_i)$$

$$\ell(r; y) = \sum_{i=1}^{n}\left[\ln\Gamma(y_i + r) - \ln\Gamma(y_i + 1) - \ln\Gamma(r)\right] + \sum_{i=1}^{n} r\ln\left(\frac{r}{\mu_i + r}\right) + \sum_{i=1}^{n} y_i\ln\left(\frac{\mu_i}{\mu_i + r}\right)$$

Differentiating the last two sums,

$$\frac{\partial}{\partial r}\left[\sum_{i=1}^{n} r\ln\left(\frac{r}{\mu_i + r}\right)\right] = \sum_{i=1}^{n}\ln\left(\frac{r}{\mu_i + r}\right) + \sum_{i=1}^{n}\frac{\mu_i}{\mu_i + r}, \qquad \frac{\partial}{\partial r}\left[\sum_{i=1}^{n} y_i\ln\left(\frac{\mu_i}{\mu_i + r}\right)\right] = -\sum_{i=1}^{n}\frac{y_i}{\mu_i + r}$$

so,

$$\frac{\partial\ell(r; y)}{\partial r} = \sum_{i=1}^{n}\left[\psi(y_i + r) - \psi(r)\right] + \sum_{i=1}^{n}\ln\left(\frac{r}{\mu_i + r}\right) + \sum_{i=1}^{n}\frac{\mu_i}{\mu_i + r} - \sum_{i=1}^{n}\frac{y_i}{\mu_i + r}$$

where $\psi$ is the digamma function. We can calculate the second derivative of our log-likelihood in a similar fashion, with resulting expression,

$$\frac{\partial^2\ell(r; y)}{\partial r^2} = \sum_{i=1}^{n}\left[\psi'(y_i + r) - \psi'(r)\right] + \sum_{i=1}^{n}\frac{\mu_i}{r(r + \mu_i)} - \sum_{i=1}^{n}\frac{\mu_i}{(\mu_i + r)^2} + \sum_{i=1}^{n}\frac{y_i}{(\mu_i + r)^2}.$$

Substituting into the recursion,

$$r^{(k+1)} = r^{(k)} - \left[\sum_{i=1}^{n}\left[\psi'(y_i + r^{(k)}) - \psi'(r^{(k)})\right] + \sum_{i=1}^{n}\frac{\mu_i}{r^{(k)}(r^{(k)} + \mu_i)} - \sum_{i=1}^{n}\frac{\mu_i}{(\mu_i + r^{(k)})^2} + \sum_{i=1}^{n}\frac{y_i}{(\mu_i + r^{(k)})^2}\right]^{-1}$$

$$\times\left[\sum_{i=1}^{n}\left[\psi(y_i + r^{(k)}) - \psi(r^{(k)})\right] + \sum_{i=1}^{n}\ln\left(\frac{r^{(k)}}{\mu_i + r^{(k)}}\right) + \sum_{i=1}^{n}\frac{\mu_i}{\mu_i + r^{(k)}} - \sum_{i=1}^{n}\frac{y_i}{\mu_i + r^{(k)}}\right]$$

An estimate $\hat{\beta}$ for $\beta$ is first computed using IRLS conditional on $r$, and then an estimate for $r$ is calculated through repeated iteration of the above expression, given $\hat{\beta}$.
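A minimal R sketch of this Newton-Raphson update for r, assuming vectors y and mu and a starting value r (digamma and trigamma are the base-R $\psi$ and $\psi'$):

newton_r <- function(y, mu, r, tol = 1e-8, max_iter = 100) {
  for (k in seq_len(max_iter)) {
    score <- sum(digamma(y + r) - digamma(r)) + sum(log(r / (mu + r))) +
      sum(mu / (mu + r)) - sum(y / (mu + r))
    hess <- sum(trigamma(y + r) - trigamma(r)) + sum(mu / (r * (r + mu))) -
      sum(mu / (mu + r)^2) + sum(y / (mu + r)^2)
    r_new <- r - score / hess          # recursion (126) with theta = r
    if (abs(r_new - r) < tol) break
    r <- r_new
  }
  r_new
}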
4.4 d)
4.5 e)
Figure 9: Fitted mean and variance of the negative binomial model and the Poisson model

We can see in Figure 9 that the negative binomial and Poisson models predict very similar mean values when the variance of our negative binomial is not substantially larger than its mean; intuitively this makes sense, as the Poisson model assumes the mean is equal to the variance. For larger values of the variance (> 40) the NB and Poisson mean values diverge, as the variance of the NB model grows significantly larger than its mean value.
4.6 f)
The table of estimates produced by the glm.nb function corresponds extremely closely to the estimates derived in Figure 7; any difference observed between these two figures is due to rounding in Figure 7.
5 Question 5
5.1 a)
5.1.1 i)
Based on the above table, we can see that m1 provides a balance between the three models in terms of the r-squared and AIC metrics. Although m2 has a higher r-squared, this is due to it possessing more parameters, which artificially inflates the r-squared; as a result it has a higher AIC score. m3, whilst possessing the lowest AIC, has a significantly reduced r-squared compared to the other two models. Thus it is clear that m1 provides a parsimonious balance between model complexity and prediction accuracy.
5.1.2 ii)
Figure 13: Fitted line plot for each quantile and a mean regression

Unfortunately, Figure 14 indicates that for extreme quantile levels the standard error of our intercept estimate diverges rapidly. A secondary plot of just the intercept estimate is provided in Figure 15, to show how the estimate and its standard error change over the majority of its range.
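Plots of this kind are commonly produced with the quantreg package; a hedged sketch of how they could be generated (the data frame and variable names are placeholders, and the assignment's actual code may differ):

library(quantreg)
taus <- seq(0.05, 0.95, by = 0.05)
fit <- rq(y ~ x, tau = taus, data = dat)  # one fit per quantile level
plot(summary(fit))                        # coefficient paths with confidence bands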
5.2 b)
5.2.1 i)
Find below the three fitted GAM models, where gam0 corresponds to a linear model, gam1 uses default splines, and gam2 uses cubic splines.

The results indicate that although the two true GAMs perform better than the linear model, with r-squared scores of approximately 6%, the model fit in every case is quite poor: for the linear model the r-squared is even reported as negative, indicating a very poor fit. The r-squared is so low because month fits the data poorly on its own; the models fail to capture the relationship between month and air passengers because they do not control for year, which, when controlled for, reveals a clear pattern.
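A hedged sketch of how these three models could be fitted with the mgcv package (the data frame and variable names are placeholders; the assignment's actual code may differ):

library(mgcv)
gam0 <- gam(passengers ~ month, data = dat)                # linear in month
gam1 <- gam(passengers ~ s(month), data = dat)             # default thin-plate spline
gam2 <- gam(passengers ~ s(month, bs = "cr"), data = dat)  # cubic regression spline
summary(gam1)$r.sq                                         # adjusted r-squared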
5.2.2 ii)
Find below the summary tables, including parameter estimates, r-squared, and deviance for each year. Note that the r-squared and deviance are very high compared to those found for gam0, gam1, and gam2. This is because each GAM receives data pertaining to one year only, and thus controls for the year. Furthermore, each successive year has a higher intercept, indicating that there are more passengers with each successive year. Figure 32 illustrates these observations more clearly.
Figure 16: Prediction value and confidence interval for December 1962
5.2.3 iii)
Assuming that the observations are taken in the middle of the month, the GAM for 1960 indicates there would be a total of 539.4304 passengers on June 15th, 1960.
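A sketch of how such a mid-month prediction could be obtained (object and variable names are placeholders; this assumes month is coded numerically, so that 6.5 represents mid-June):

predict(gam_1960, newdata = data.frame(month = 6.5), se.fit = TRUE)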
Figure 32: Fitted GAMs for years 1949-1960. The shaded area represents the standard error.
Figure 33: Observed and fitted GAM for 1960 with splines