
STAT4027 Assignment 1

Lewis Hastie

STAT4027

August 29, 2023



1 Question 1
1.1 a)
We first calculate the moment estimator $\hat{r}$. Equating the theoretical mean with the sample mean,

$$E(Y) = \bar{y}, \qquad \frac{r}{r-1} = \bar{y}, \qquad \hat{r} = \frac{\bar{y}}{\bar{y}-1}.$$

Now it is given that

$$\operatorname{Var}[g(Y)] \approx \sigma^2 [g'(\mu)]^2, \quad \text{where } g(x) = \frac{x}{x-1} \text{ from above.}$$
Now $Y$ in this instance is our sample mean; thus,

$$\sigma^2 = \operatorname{Var}(\bar{y}) = \frac{1}{n}\operatorname{Var}(Y) = \frac{r}{n(r-1)^2(r-2)},$$

and

$$g'(x) = \frac{-1}{(x-1)^2}.$$
As such,

$$\operatorname{Var}(g(\bar{y})) = \frac{r}{n(r-1)^2(r-2)} \times \left(\frac{-1}{(\mu-1)^2}\right)^2, \quad \text{where } \mu = \frac{r}{r-1},$$

$$= \frac{r}{n(r-1)^2(r-2)} \times \frac{1}{\left(\frac{r}{r-1}-1\right)^4} = \frac{r}{n(r-1)^2(r-2)} \times (r-1)^4 = \frac{r(r-1)^2}{n(r-2)},$$

so that, plugging in $\hat{r}$,

$$\operatorname{Var}(\hat{r}) = \frac{\hat{r}(\hat{r}-1)^2}{n(\hat{r}-2)}.$$

Thus we have moment estimator $\hat{r} = \dfrac{\bar{y}}{\bar{y}-1}$ and variance $\operatorname{Var}(\hat{r}) = \dfrac{\hat{r}(\hat{r}-1)^2}{n(\hat{r}-2)}$.
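As an illustration, the following is a minimal R sketch (not part of the original solution) that computes $\hat{r}$ and its delta-method standard error for a hypothetical sample `y`; it assumes $\bar{y} > 1$ and $\hat{r} > 2$ so that both formulas are defined.

```r
# Minimal sketch: moment estimator of r and its delta-method standard error.
mom_r <- function(y) {
  n     <- length(y)
  ybar  <- mean(y)
  r_hat <- ybar / (ybar - 1)                          # moment estimator
  v_hat <- r_hat * (r_hat - 1)^2 / (n * (r_hat - 2))  # approximate variance
  c(r_hat = r_hat, se = sqrt(v_hat))
}
```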


2 Question 2
2.1 a)
We have the log-likelihood of our model,

$$\ell_N(\beta) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 + \frac{1}{2}\sum_{i=1}^{n}\log\lambda_i - \frac{1}{2\sigma^2}(Y - X\beta)'\Sigma^{-1}(Y - X\beta).$$

Now we know that $\frac{\partial}{\partial s}(x - As)'W(x - As) = -2A'W(x - As)$, assuming $W$ is symmetric (which $\Sigma^{-1}$ is). Thus the derivative of the log-likelihood with respect to $\beta$ is

$$\frac{\partial \ell_N(\beta)}{\partial \beta} = \frac{-1}{2\sigma^2}\left(-2X'\Sigma^{-1}(Y - X\beta)\right).$$

Setting this derivative to zero and letting $W = \Sigma^{-1}$,

$$0 = \frac{-1}{2\sigma^2}\left(-2X'W(Y - X\beta)\right)$$
$$0 = -2X'WY + 2X'WX\beta$$
$$X'WY = X'WX\beta$$
$$\hat{\beta} = (X'WX)^{-1}X'WY.$$

Similarly, the derivative of the log-likelihood with respect to $\sigma^2$ is

$$\frac{\partial \ell_N(\beta)}{\partial \sigma^2} = \frac{-n}{2\sigma^2} + \frac{1}{2\sigma^4}(Y - X\beta)'\Sigma^{-1}(Y - X\beta).$$

Setting this derivative to zero, again with $W = \Sigma^{-1}$,

$$0 = \frac{-n}{2\sigma^2} + \frac{1}{2\sigma^4}(Y - X\beta)'W(Y - X\beta)$$
$$\frac{n}{2\sigma^2} = \frac{1}{2\sigma^4}(Y - X\beta)'W(Y - X\beta)$$
$$\hat{\sigma}^2 = \frac{1}{n}(Y - X\beta)'W(Y - X\beta).$$

As required.
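A minimal R sketch of these closed-form updates is given below; the design matrix `X`, response `Y`, and weights `lambda` are hypothetical inputs, and taking $\Sigma^{-1} = \operatorname{diag}(\lambda_1,\dots,\lambda_n)$ is an assumption consistent with the $\sum_i \log\lambda_i$ term in the log-likelihood above.

```r
# Minimal sketch: conditional ML estimates of beta and sigma^2 given weights lambda.
wls_estimates <- function(X, Y, lambda) {
  W        <- diag(lambda)                               # Sigma^{-1} (assumed diagonal)
  beta_hat <- solve(t(X) %*% W %*% X, t(X) %*% W %*% Y)  # (X'WX)^{-1} X'WY
  res      <- Y - X %*% beta_hat
  sigma2   <- as.numeric(t(res) %*% W %*% res) / nrow(X)
  list(beta = beta_hat, sigma2 = sigma2)
}
```

In the iterative scheme of part b), these updates would alternate with an update of the $\lambda_i$ until convergence.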

2.2 b)
Figure 1 shows the converged parameter estimates, and Figure 2 shows the final values of $\lambda_i$.


Figure 1: Converged Estimates for Parameters

Figure 2: Final Values of 𝜆𝑖

Figure 3 indicates the observations considered outliers, deemed so because $1/\lambda_i > 2.5$. They correspond to the first, second, third, fourth, and fourteenth data points.

Figure 3: Observations deemed outliers

2.3 c)
We can see from the density plot in Figure 6 that the normal density fit is the worst, as it fails to capture the outliers. The t distribution fit is significantly better, as its fatter tails capture the presence of outliers.


Figure 4: Convergence Plot

Figure 5: Model fit for the normal distribution (in red) and the Student t distribution (in black)


Figure 6: Density plot with the observed smoothed density and fitted normal and t densities
for the residuals

3 Question 3
3.1 a)
It is known that applying the CDF of a distribution to a random variable from that distribution yields a uniform random variable (the probability integral transform). Thus we can find the inverse of the CDF, whose input follows a uniform distribution on $(0,1)$, that is, $X \sim U(0,1)$. By randomly sampling $X$ and evaluating this inverse function, we can simulate from a Rayleigh distribution.

$$\text{Let } x = F(t) = 1 - \exp\left(-\frac{t^2}{2\sigma^2}\right).$$
$$-\frac{t^2}{2\sigma^2} = \ln(1 - x)$$
$$t^2 = -2\sigma^2\ln(1 - x)$$
$$t = \sqrt{-2\sigma^2\ln(1 - x)}, \quad \text{as } t \ge 0, \text{ with } x \sim U(0,1).$$
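A minimal R sketch of this inverse-CDF simulator (the function name is illustrative):

```r
# Minimal sketch: simulate n Rayleigh(sigma) draws by inverting the CDF.
r_rayleigh <- function(n, sigma) {
  x <- runif(n)                      # x ~ U(0, 1)
  sqrt(-2 * sigma^2 * log(1 - x))    # t = F^{-1}(x)
}
```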

3.2 b)
We have the pdf of the Rayleigh distribution,

$$f(t) = \frac{t}{\sigma^2}\exp\left(-\frac{t^2}{2\sigma^2}\right).$$

Thus we wish to write the joint likelihood of our $n$ samples in exponential family form and observe the sufficient statistic.

$$\prod_{i=1}^{n} f(t_i) = \prod_{i=1}^{n}\frac{t_i}{\sigma^2}\exp\left(-\frac{t_i^2}{2\sigma^2}\right)
= \exp\left(\ln\prod_{i=1}^{n}\frac{t_i}{\sigma^2} - \sum_{i=1}^{n}\frac{t_i^2}{2\sigma^2}\right)
= \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n} t_i^2 + \sum_{i=1}^{n}\ln t_i - n\ln(\sigma^2)\right).$$

Hence $T(t) = \sum_{i=1}^{n} t_i^2$, $\theta = \dfrac{-1}{2\sigma^2}$, and $b(\theta) = \ln(\sigma^2) = -\ln(-2\theta)$. We can now obtain the expected value and variance in a straightforward manner,

$$E[T^2] = \frac{\partial b(\theta)}{\partial \theta} = -\frac{1}{\theta} = 2\sigma^2,$$
$$\operatorname{Var}[T^2] = \frac{\partial^2 b(\theta)}{\partial \theta^2} = \frac{1}{\theta^2} = 4\sigma^4.$$

Now $E(T)$ is defined as

$$E(T) = \int_0^{\infty} t\,f(t)\,dt = \int_0^{\infty} t \times \frac{t}{\sigma^2}\exp\left(-\frac{t^2}{2\sigma^2}\right)dt.$$

Applying integration by parts,

$$= \left[-t\exp\left(-\frac{t^2}{2\sigma^2}\right)\right]_0^{\infty} + \int_0^{\infty}\exp\left(-\frac{t^2}{2\sigma^2}\right)dt$$
$$= 0 + \sqrt{2\pi}\,\sigma\int_0^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{t^2}{2\sigma^2}\right)dt$$
$$= \frac{\sqrt{2\pi}\,\sigma}{2} = \sqrt{\frac{\pi}{2}}\,\sigma.$$

Now we know $E(T^2) = 2\sigma^2$, so we can calculate the variance as follows,

$$\operatorname{Var}(T) = E(T^2) - E(T)^2 = 2\sigma^2 - \left(\sqrt{\frac{\pi}{2}}\,\sigma\right)^2 = 2\sigma^2 - \frac{\pi}{2}\sigma^2 = \frac{4-\pi}{2}\,\sigma^2.$$
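As a quick sanity check (not part of the original solution), these moments can be compared against draws from the sampler sketched in part a) for an arbitrary choice of $\sigma$:

```r
# Hypothetical Monte Carlo check of E(T) and Var(T) against the formulas above.
set.seed(1)
sigma <- 2
t_sim <- r_rayleigh(1e6, sigma)
c(sim_mean = mean(t_sim), theory_mean = sqrt(pi / 2) * sigma)
c(sim_var  = var(t_sim),  theory_var  = (4 - pi) / 2 * sigma^2)
```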


3.3 c)
We wish to write down the log-likelihood function and find the value of $\sigma$ that maximises it.

$$L(\sigma; t_i) = \prod_{i=1}^{n}\frac{t_i}{\sigma^2}\exp\left(-\frac{t_i^2}{2\sigma^2}\right) = \frac{1}{\sigma^{2n}}\times\prod_{i=1}^{n} t_i \times\exp\left(-\sum_{i=1}^{n}\frac{t_i^2}{2\sigma^2}\right).$$

Thus our log-likelihood is given by

$$\ell(\sigma; t_i) = \sum_{i=1}^{n}\ln(t_i) - n\ln(\sigma^2) - \sum_{i=1}^{n}\frac{t_i^2}{2\sigma^2}.$$

Taking the derivative with respect to $\sigma^2$,

$$\frac{\partial \ell(\sigma; t_i)}{\partial \sigma^2} = \frac{-n}{\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n} t_i^2.$$

Setting this derivative to zero and solving,

$$0 = \frac{-n}{\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n} t_i^2$$
$$\frac{n}{\sigma^2} = \frac{1}{2\sigma^4}\sum_{i=1}^{n} t_i^2$$
$$\sigma^2 = \frac{1}{2n}\sum_{i=1}^{n} t_i^2$$
$$\hat{\sigma} = \sqrt{\frac{1}{2n}\sum_{i=1}^{n} t_i^2}.$$

Now, in order to obtain the standard error, we can use the fact that the Rayleigh distribution belongs to the one-parameter exponential family, so its MLE is (asymptotically) unbiased and efficient, and appeal to the Cramér-Rao lower bound: the variance of an unbiased estimator is bounded below by the reciprocal of the Fisher information $I(\theta)$, where

$$I(\theta) = -E\left[\frac{\partial^2}{\partial \theta^2}\ln f(x;\theta)\right].$$

Taking the derivative of the log-likelihood with respect to $\sigma$,


$$\frac{\partial \ell(\sigma; t_i)}{\partial \sigma} = \frac{-2n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n} t_i^2.$$

Taking the second derivative,

$$\frac{\partial^2 \ell(\sigma; t_i)}{\partial \sigma^2} = \frac{2n}{\sigma^2} - \frac{3}{\sigma^4}\sum_{i=1}^{n} t_i^2.$$

Taking the expectation (using $E\left[\sum_{i=1}^{n} t_i^2\right] = 2n\sigma^2$),

$$E\left[\frac{\partial^2 \ell(\sigma; t_i)}{\partial \sigma^2}\right] = \frac{2n}{\sigma^2} - \frac{3}{\sigma^4}\times 2n\sigma^2 = \frac{-4n}{\sigma^2}.$$

Multiplying by $-1$ and taking the reciprocal,

$$\operatorname{Var}(\hat{\sigma}) = \frac{1}{I(\sigma)} = \frac{\sigma^2}{4n}, \qquad \mathrm{SE}(\hat{\sigma}) = \frac{\sigma}{2\sqrt{n}}.$$
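A minimal R sketch of the MLE and its approximate standard error for a hypothetical sample `t` (the standard error plugs $\hat{\sigma}$ in for the unknown $\sigma$):

```r
# Minimal sketch: MLE of sigma and its Cramer-Rao standard error (plug-in).
rayleigh_mle <- function(t) {
  n         <- length(t)
  sigma_hat <- sqrt(sum(t^2) / (2 * n))
  c(sigma_hat = sigma_hat, se = sigma_hat / (2 * sqrt(n)))
}
```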

3.4 d)
We have that our quasi-likelihood function is given by

$$Q(\mu, t) = \int_t^{\mu}\frac{t - x}{\phi V(x)}\,dx.$$

Given $\mu = E(T) = \sqrt{\dfrac{\pi}{2}}\,\sigma$ and $\operatorname{Var}(T) = \dfrac{4-\pi}{2}\,\sigma^2$, we can find the variance function $V(\mu)$ as follows,

$$\mu = \sqrt{\frac{\pi}{2}}\,\sigma \quad\Rightarrow\quad \sigma = \sqrt{\frac{2}{\pi}}\,\mu.$$

Substituting this into our variance,

$$V(\mu) = \frac{4-\pi}{2}\times\left(\sqrt{\frac{2}{\pi}}\,\mu\right)^2 = \frac{4-\pi}{\pi}\,\mu^2.$$

Thus we can now calculate our quasi-likelihood function as given above,


$$Q(\mu, t) = \int_t^{\mu}\frac{t - x}{\phi V(x)}\,dx = \int_t^{\mu}\frac{t - x}{\frac{4-\pi}{\pi}\,x^2}\,dx \qquad (\text{taking the dispersion } \phi = 1)$$
$$= \frac{\pi}{4-\pi}\int_t^{\mu}\left(\frac{t}{x^2} - \frac{1}{x}\right)dx$$
$$= \frac{\pi}{4-\pi}\left[\frac{-t}{x} - \ln(x)\right]_t^{\mu}$$
$$= \frac{\pi}{4-\pi}\left[\frac{-t}{\mu} + 1 - \ln(\mu) + \ln(t)\right],$$

as required.

Next we will find the MQLE and show it is equivalent to the MOM estimator. We begin
by finding the MQLE for 𝜎.

$$\sum_{i=1}^{n} Q(\mu, t_i) = \frac{\pi}{4-\pi}\sum_{i=1}^{n}\left(\frac{-t_i}{\mu} + 1 - \ln(\mu) + \ln(t_i)\right) = \frac{\pi}{4-\pi}\left[-\frac{1}{\mu}\sum_{i=1}^{n} t_i + n - n\ln(\mu) + \sum_{i=1}^{n}\ln(t_i)\right].$$

Taking the derivative with respect to $\mu$ (ignoring the constant factor $\frac{\pi}{4-\pi}$, which does not affect the root),

$$\frac{\partial Q}{\partial \mu} = \frac{1}{\mu^2}\sum_{i=1}^{n} t_i - \frac{n}{\mu}.$$

Setting this to zero and solving yields

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} t_i = \bar{t}.$$

Substituting into the relationship between $\sigma$ and $\mu$,

$$\hat{\sigma}_{MQLE} = \sqrt{\frac{2}{\pi}}\,\bar{t}.$$

Now the MOM estimator for 𝜎,

$$E(T) = \bar{t} \;\Rightarrow\; \sqrt{\frac{\pi}{2}}\,\sigma = \bar{t} \;\Rightarrow\; \hat{\sigma}_{MOM} = \sqrt{\frac{2}{\pi}}\,\bar{t},$$

which is identical to the MQLE.

We can find its standard error as follows,


$$\operatorname{Var}(\hat{\sigma}_{MQLE}) = \frac{2}{\pi}\operatorname{Var}(\bar{t}) = \frac{2}{\pi}\times\frac{1}{n^2}\times\sum_{i=1}^{n}\operatorname{Var}(t_i) = \frac{2}{\pi}\times\frac{1}{n^2}\times n\times\frac{4-\pi}{2}\,\sigma^2 = \frac{4-\pi}{n\pi}\,\sigma^2,$$
$$\mathrm{SE}(\hat{\sigma}_{MQLE}) = \sigma\sqrt{\frac{4-\pi}{n\pi}}.$$

Now to assess its efficiency compared to the MLE we can compare the ratio of their
variances,

$$\frac{\operatorname{Var}(\hat{\sigma}_{MLE})}{\operatorname{Var}(\hat{\sigma}_{MQLE})} = \left(\frac{\sigma^2}{4n}\right)\bigg/\left(\frac{4-\pi}{n\pi}\,\sigma^2\right) = \frac{n\pi\,\sigma^2}{4n(4-\pi)\,\sigma^2} = \frac{\pi}{4(4-\pi)} \approx 0.915 < 1.$$

Thus, as the ratio of the variances is less than one, the MLE has the smaller variance and we conclude that the MQLE is less efficient than the MLE.

3.5 e)
3.5.1 i)

$$\operatorname{Var}(\hat{\sigma}^2) = \operatorname{Var}\left(\frac{1}{2n}\sum_{i=1}^{n} t_i^2\right) = \left(\frac{1}{2n}\right)^2\sum_{i=1}^{n}\operatorname{Var}(t_i^2) = \frac{1}{4n^2}\times n\times 4\sigma^4 = \frac{\sigma^4}{n},$$
$$\mathrm{SE}(\hat{\sigma}^2) = \frac{\sigma^2}{\sqrt{n}}.$$

3.5.2 ii)

Using the delta method, we have $\operatorname{Var}(\hat{\sigma}^2) \approx [g'(\sigma)]^2 \times \operatorname{Var}(\hat{\sigma})$, where $g(x) = x^2$. Recall from 3c) that

$$\operatorname{Var}(\hat{\sigma}) = \frac{\sigma^2}{4n}.$$

Taking the derivative of $g$ and evaluating it at $\sigma$ yields


$$g'(\sigma) = 2\sigma,$$

thus,

$$\operatorname{Var}(\hat{\sigma}^2) \approx (2\sigma)^2\times\frac{\sigma^2}{4n} = \frac{\sigma^4}{n}, \qquad \mathrm{SE}(\hat{\sigma}^2) = \frac{\sigma^2}{\sqrt{n}},$$

as required.

4 Question 4
4.1 a)
We first wish to put the pdf of our distribution into exponential family form,

$$f(y_i) = \frac{\Gamma(y_i + r)}{\Gamma(y_i + 1)\Gamma(r)}(1 - p_i)^r p_i^{y_i} = \exp\left(\ln\left(\frac{\Gamma(y_i + r)}{\Gamma(y_i + 1)\Gamma(r)}\right) + r\ln(1 - p_i) + y_i\ln(p_i)\right).$$

Thus we have that,

$$T(Y) = y_i, \qquad \theta_i = \ln(p_i) \;\Rightarrow\; p_i = e^{\theta_i},$$
$$b(\theta_i) = -r\ln(1 - e^{\theta_i}), \qquad a(\phi) = 1, \qquad c(y) = \ln\left(\frac{\Gamma(y_i + r)}{\Gamma(y_i + 1)\Gamma(r)}\right).$$

We wish to use the fact that 𝐸(𝑇 (𝑌 )) = 𝐸(𝑦𝑖 ) = 𝑏′(𝜃), and 𝑉 𝑎𝑟(𝑇 (𝑌 )) = 𝑉 𝑎𝑟(𝑦𝑖 ) = 𝑏′′(𝜃).
Thus from above we can deduce,

$$\mu_i = E(y_i) = b'(\theta) = \frac{re^{\theta_i}}{1 - e^{\theta_i}} = \frac{rp_i}{1 - p_i},$$

and

$$\operatorname{Var}(y_i) = b''(\theta) = \frac{re^{\theta_i}}{(1 - e^{\theta_i})^2} = \frac{rp_i}{(1 - p_i)^2}.$$

Rearranging our expression for the mean and substituting this into our variance,


$$\mu_i = \frac{rp_i}{1 - p_i} \;\Rightarrow\; p_i = \frac{\mu_i}{r + \mu_i}.$$

Substituting,

$$\operatorname{Var}(y_i) = \left(r\cdot\frac{\mu_i}{r + \mu_i}\right)\bigg/\left(1 - \frac{\mu_i}{r + \mu_i}\right)^2 = \left(r\cdot\frac{\mu_i}{r + \mu_i}\right)\bigg/\left(\frac{r}{r + \mu_i}\right)^2 = \frac{\mu_i(\mu_i + r)}{r},$$

as required.

4.2 b)
We have the modified response,

$$z_i = \eta_i + (y_i - \mu_i)\frac{\partial \eta_i}{\partial \mu_i},$$
where

$$\eta_i = \ln(\mu_i), \qquad \frac{\partial \eta_i}{\partial \mu_i} = \frac{1}{\mu_i}.$$

Thus,

$$z_i = \ln(\mu_i) + (y_i - \mu_i)\times\frac{1}{\mu_i}.$$

Now for our weight, we have that

$$w_i = \operatorname{Var}(z_i)^{-1} = \left(\frac{\partial \mu_i}{\partial \eta_i}\right)^2\frac{1}{V_i},$$

where

$$\left(\frac{\partial \mu_i}{\partial \eta_i}\right)^2 = \mu_i^2, \qquad V_i = \operatorname{Var}(y_i) = \frac{(\mu_i + r)\mu_i}{r}.$$

Thus,

$$w_i = \frac{r}{(\mu_i + r)\mu_i}\times\mu_i^2 = \frac{r\mu_i}{\mu_i + r}.$$
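A minimal R sketch of a single IRLS step using this working response and weight; the design matrix `X`, counts `y`, current coefficient vector `beta`, and fixed `r` are hypothetical inputs.

```r
# Minimal sketch: one IRLS update for the NB regression with a log link.
irls_step <- function(X, y, beta, r) {
  eta <- as.vector(X %*% beta)
  mu  <- exp(eta)                              # inverse log link
  z   <- eta + (y - mu) / mu                   # working response z_i
  w   <- r * mu / (mu + r)                     # weights w_i
  W   <- diag(w)
  solve(t(X) %*% W %*% X, t(X) %*% W %*% z)    # updated beta
}
```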


4.3 c)
We can estimate $r$ using the Newton-Raphson algorithm based on the following recursion, where in this case $\theta = r$ and the derivatives are evaluated at $\theta = \theta^{(k)}$. It should be noted that we are conditioning on $\beta$ and thus treating $\mu_i$ as known.

$$\theta^{(k+1)} = \theta^{(k)} - \left(\frac{\partial^2 \ell(\theta)}{\partial \theta^2}\right)^{-1}\times\frac{\partial \ell(\theta)}{\partial \theta}.$$
Thus we must compute the derivatives of the log-likelihood. We have the likelihood,

$$L(r; y_i) = \prod_{i=1}^{n}\left[\frac{\Gamma(y_i + r)}{\Gamma(y_i + 1)\Gamma(r)}\right]\prod_{i=1}^{n}(1 - p_i)^r\prod_{i=1}^{n} p_i^{y_i},$$

thus our log-likelihood,

$$\ell(r; y_i) = \sum_{i=1}^{n}\ln\left(\frac{\Gamma(y_i + r)}{\Gamma(y_i + 1)\Gamma(r)}\right) + \sum_{i=1}^{n} r\ln(1 - p_i) + \sum_{i=1}^{n} y_i\ln(p_i),$$

substituting our expression for 𝑝𝑖 and simplifying our logarithmic expression,

$$\ell(r; y_i) = \sum_{i=1}^{n}\left[\ln\Gamma(y_i + r) - \ln\Gamma(y_i + 1) - \ln\Gamma(r)\right] + \sum_{i=1}^{n} r\ln\left(\frac{r}{\mu_i + r}\right) + \sum_{i=1}^{n} y_i\ln\left(\frac{\mu_i}{\mu_i + r}\right),$$

now we have the derivatives,

$$\frac{\partial}{\partial r}\left[\sum_{i=1}^{n} r\ln\left(\frac{r}{\mu_i + r}\right)\right] = \sum_{i=1}^{n}\ln\left(\frac{r}{\mu_i + r}\right) + \sum_{i=1}^{n}\frac{\mu_i}{\mu_i + r}, \qquad \frac{\partial}{\partial r}\left[\sum_{i=1}^{n} y_i\ln\left(\frac{\mu_i}{\mu_i + r}\right)\right] = -\sum_{i=1}^{n}\frac{y_i}{\mu_i + r},$$

thus the derivative of our log-likelihood is,

$$\frac{\partial \ell(r; y_i)}{\partial r} = \sum_{i=1}^{n}\left[\psi(y_i + r) - \psi(r)\right] + \sum_{i=1}^{n}\ln\left(\frac{r}{\mu_i + r}\right) + \sum_{i=1}^{n}\frac{\mu_i}{\mu_i + r} - \sum_{i=1}^{n}\frac{y_i}{\mu_i + r},$$

where $\psi$ is the digamma function. We can calculate the second derivative of the log-likelihood in a similar fashion, with the resulting expression


$$\frac{\partial^2 \ell(r; y_i)}{\partial r^2} = \sum_{i=1}^{n}\left[\psi'(y_i + r) - \psi'(r)\right] + \sum_{i=1}^{n}\frac{\mu_i}{r(r + \mu_i)} - \sum_{i=1}^{n}\frac{\mu_i}{(\mu_i + r)^2} + \sum_{i=1}^{n}\frac{y_i}{(\mu_i + r)^2}.$$

Thus the recursive relationship between $r^{(k+1)}$ and $r^{(k)}$ is as follows,

$$r^{(k+1)} = r^{(k)} - \left[\sum_{i=1}^{n}\left[\psi'(y_i + r^{(k)}) - \psi'(r^{(k)})\right] + \sum_{i=1}^{n}\frac{\mu_i}{r^{(k)}(r^{(k)} + \mu_i)} - \sum_{i=1}^{n}\frac{\mu_i}{(\mu_i + r^{(k)})^2} + \sum_{i=1}^{n}\frac{y_i}{(\mu_i + r^{(k)})^2}\right]^{-1}$$
$$\times\left[\sum_{i=1}^{n}\left[\psi(y_i + r^{(k)}) - \psi(r^{(k)})\right] + \sum_{i=1}^{n}\ln\left(\frac{r^{(k)}}{\mu_i + r^{(k)}}\right) + \sum_{i=1}^{n}\frac{\mu_i}{\mu_i + r^{(k)}} - \sum_{i=1}^{n}\frac{y_i}{\mu_i + r^{(k)}}\right],$$

where an estimate $\hat{\beta}$ is first computed using IRLS conditioned on $r$, and then an estimate of $r$ is calculated through repeated iteration of the above expression, given $\hat{\beta}$.
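A minimal R sketch of this Newton-Raphson update, conditional on fitted means `mu`; the inputs and starting value are hypothetical.

```r
# Minimal sketch: Newton-Raphson iteration for r given the fitted means mu.
update_r <- function(y, mu, r0, tol = 1e-8, maxit = 100) {
  r <- r0
  for (k in seq_len(maxit)) {
    score <- sum(digamma(y + r) - digamma(r)) +
             sum(log(r / (mu + r))) +
             sum(mu / (mu + r)) - sum(y / (mu + r))
    hess  <- sum(trigamma(y + r) - trigamma(r)) +
             sum(mu / (r * (r + mu))) -
             sum(mu / (mu + r)^2) + sum(y / (mu + r)^2)
    r_new <- r - score / hess
    if (abs(r_new - r) < tol) { r <- r_new; break }
    r <- r_new
  }
  r
}
```

In the full algorithm this update alternates with the IRLS step for $\beta$ until both estimates stabilise.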

4.4 d)

Figure 7: Estimates at each iteration rounded to three decimal places


4.5 e)

Figure 8: Convergence of Estimates

Figure 9: Fitted mean and variance of Negative Binomial Model and Poisson Model

We can see in Figure 9 that the negative binomial and Poisson models predict very similar mean values when the variance of the negative binomial is not substantially larger than its mean; intuitively this is expected, as the Poisson model assumes the mean is equal to the variance. For larger values of the variance (> 40) the NB and Poisson mean values diverge, as the variance of the NB model grows significantly larger than its mean.

4.6 f)

Figure 10: Estimates provided by glm.nb package

The table of estimates provided by the glm.nb function corresponds extremely closely to the estimates derived in Figure 7: any difference observed between these two figures is due to rounding in Figure 7.

5 Question 5
5.1 a)
5.1.1 i)

Figure 11: Performance metrics for the three models

Based on the above table, we can see that m1 provides a balance between the three models in terms of the r-squared and AIC metrics. Although m3 has a higher r-squared, this is due to it possessing more parameters, which artificially inflates the r-squared; as a result it has a higher AIC score. The remaining model, whilst possessing the lowest AIC, has a significantly reduced r-squared compared to the other two. Thus m1 provides a parsimonious balance between model complexity and prediction accuracy.


5.1.2 ii)

Figure 12: Table of estimates for a given quantile level

Figure 13: Fitted line plot for each quantile and a mean regression

Unfortunately, Figure 14 indicates that for extreme quantile levels the standard error of the intercept estimate diverges rapidly. A secondary plot of just the intercept estimate is provided in Figure 15 to show how the estimate and its standard error behave over the majority of the quantile range.
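For reference, a hedged sketch of how such quantile fits and coefficient-versus-quantile plots are typically produced with the quantreg package; the data frame `dat`, the variable names, and the quantile grid are assumptions rather than the actual code used here.

```r
# Illustrative sketch only: quantile regression over a grid of quantile levels.
library(quantreg)
taus <- seq(0.05, 0.95, by = 0.05)          # assumed quantile levels
fit  <- rq(y ~ x, tau = taus, data = dat)   # hypothetical response y, predictor x
sfit <- summary(fit, se = "boot")           # bootstrap standard errors
plot(sfit)                                  # coefficients (with SE bands) vs quantile level
```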

5.2 b)
5.2.1 i)

Find below the three fitted GAM models, where gam0 corresponds to a linear model,
gam1 with default splines, and gam2 with cubic splines.
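A hedged sketch of how the three fits could be specified with the mgcv package; the data frame `air` and its column names are assumptions about how the data were arranged.

```r
# Illustrative sketch: linear fit, default spline basis, and cubic regression spline.
library(mgcv)
gam0 <- gam(passengers ~ month, data = air)                # linear in month
gam1 <- gam(passengers ~ s(month), data = air)             # default (thin-plate) spline
gam2 <- gam(passengers ~ s(month, bs = "cr"), data = air)  # cubic regression spline
sapply(list(gam0 = gam0, gam1 = gam1, gam2 = gam2),
       function(m) summary(m)$r.sq)                        # adjusted r-squared
```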
The results indicate that although the two true GAMs perform better than the linear model, with r-squared scores of approximately 6%, the model fit in every case is quite poor; for the linear model the r-squared is even reported as negative, indicating a very poor fit.


Figure 14: Parameter value for a given quantile level

Figure 15: Intercept value for a given quantile level

The r-squared is very low because month on its own fits the data poorly: the model fails to capture the relationship between month and air passengers because it does not control for year, which, when controlled for, reveals a clear pattern.

5.2.2 ii)

Find below the summary tables, including parameter estimates, r-squared and deviance explained for each year. Note that the r-squared and deviance explained are very high compared to those found for gam0, gam1 and gam2. This is because each GAM receives data pertaining to one year only, and thus controls for the year. Furthermore, each successive year has a higher intercept, indicating that there are more passengers each year. Figure 32 illustrates these observations more clearly.
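A hedged sketch of the per-year fits, assuming the data are the classic AirPassengers series reshaped into year and month columns; the reshaping and the smooth specification (default basis dimension) are assumptions.

```r
# Illustrative sketch: one GAM of passengers on month for each year 1949-1960.
library(mgcv)
air_df <- data.frame(
  passengers = as.numeric(AirPassengers),
  year       = rep(1949:1960, each = 12),
  month      = rep(1:12, times = 12)
)
fits <- lapply(split(air_df, air_df$year),
               function(d) gam(passengers ~ s(month), data = d))
sapply(fits, function(m) coef(m)[1])   # intercepts increase with each successive year
```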


Figure 16: Prediction Value and confidence interval for December 1962

Figure 17: Summary Table for GAM0 (Linear Model)

5.2.3 iii)

Assuming that the observations are taken in the middle of the month, the GAM for 1960
indicates there will be a total of 539.4304 individual passengers on June 15th, 1960.
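A hedged sketch of how such a prediction and interval could be obtained from the 1960 fit in the sketch above; treating 15 June as month = 6 follows the mid-month assumption stated here.

```r
# Illustrative sketch: predicted passengers for mid-June 1960 with an approximate 95% interval.
pred <- predict(fits[["1960"]], newdata = data.frame(month = 6), se.fit = TRUE)
c(fit   = pred$fit,
  lower = pred$fit - 1.96 * pred$se.fit,
  upper = pred$fit + 1.96 * pred$se.fit)
```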


Figure 18: Summary Table for GAM1

Figure 19: Summary Table for GAM2

Figure 20: Summary Table for 1949


Figure 21: Summary Table for 1950

Figure 22: Summary Table for 1951

Figure 23: Summary Table for 1952


Figure 24: Summary Table for 1953

Figure 25: Summary Table for 1954

Figure 26: Summary Table for 1955


Figure 27: Summary Table for 1956

Figure 28: Summary Table for 1957

Figure 29: Summary Table for 1958


Figure 30: Summary Table for 1959

Figure 31: Summary Table for 1960


Figure 32: Fitted GAMs for years 1949-1960. Shaded Area represents standard error

Figure 33: Observed and fitted GAM for 1960 with splines

