MAST30027: Modern Applied Statistics

Assignment 3, 2023.
Due: 5pm Monday September 25th

• This assignment is worth 14% of your total mark.

• To get full marks, show your working, including 1) the R commands and outputs you use, 2)
mathematical derivations, and 3) rigorous explanations of why you reach your conclusions or answers.
If you provide only final answers, you will receive zero marks.
• The assignment you hand in must be typed (except for math formulas) and submitted
via the LMS as a single PDF document only (no other formats are allowed). For math formulas,
you may include a photograph of them. Your answers must be clearly numbered and in the same
order as the assignment questions.
• The LMS will not accept late submissions. It is your responsibility to ensure that your
assignments are submitted correctly and on time, and problems with online submissions are
not a valid excuse for submitting a late or incorrect version of an assignment.

• We will mark a selected set of problems. We will select problems worth ≥ 50% of the full
marks listed.
• If you need an extension, please contact the lecturer before the due date with appropriate
justification and supporting documents. Late assignments will only be accepted if you have
obtained an extension from the lecturer before the due date. To ensure that the lecturer can
respond to your extension request before the due date, please send it at least 24 hours before the
due date. Under no circumstances will an assignment be marked if solutions for it have been
released.
• Also, please read the “Assessments” section on the “Subject Overview” page of the LMS.

1. The file assignment3_prob1_2023.txt contains 300 observations. We can read the
observations and make a histogram as follows.

> X = scan(file="assignment3_prob1_2023.txt", what=double())


Read 300 items
> length(X)
[1] 300
> hist(X)

We will model the observed data using a mixture of three Poisson distributions. Specifically,
we assume the observations X1, ..., X300 are independent of each other, and each Xi follows
this mixture model:
Zi ∼ categorical(π1, π2, 1 − π1 − π2),
Xi | Zi = 1 ∼ Poisson(λ1),
Xi | Zi = 2 ∼ Poisson(λ2),
Xi | Zi = 3 ∼ Poisson(λ3).

The Poisson distribution has probability mass function

f(x; λ) = λ^x e^(−λ) / x!.

We aim to obtain the MLE of the parameters θ = (π1, π2, λ1, λ2, λ3) using the EM algorithm.
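The quantity that the EM algorithm increases is the incomplete (observed-data) log-likelihood of this mixture. The following is a minimal sketch of how it could be evaluated in R. The function names `mix_pmf` and `incomplete_loglik` are hypothetical, and the data are simulated stand-ins rather than the contents of the assignment file.

```r
# Simulated stand-in data (an assumption; not the assignment file).
set.seed(1)
z <- sample(1:3, size = 300, replace = TRUE, prob = c(0.3, 0.3, 0.4))
x <- rpois(300, lambda = c(3, 20, 35)[z])

# Mixture pmf: weighted sum of the three Poisson pmfs (hypothetical helper).
mix_pmf <- function(x, pi1, pi2, lambda) {
  w <- c(pi1, pi2, 1 - pi1 - pi2)
  w[1] * dpois(x, lambda[1]) +
    w[2] * dpois(x, lambda[2]) +
    w[3] * dpois(x, lambda[3])
}

# Incomplete log-likelihood: sum over observations of log mixture pmf.
incomplete_loglik <- function(x, pi1, pi2, lambda) {
  sum(log(mix_pmf(x, pi1, pi2, lambda)))
}

ll <- incomplete_loglik(x, 0.3, 0.3, c(3, 20, 35))
```

Evaluating the mixture pmf directly is adequate for counts of this size; for much larger counts a log-sum-exp formulation would be more robust against underflow.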

(a) (5 marks) Let X = (X1, ..., X300) and Z = (Z1, ..., Z300). Derive the expectation
of the complete log-likelihood, Q(θ, θ0) = E_{Z|X,θ0}[log P(X, Z | θ)].
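As a point of orientation only, and not as a substitute for the derivation requested, the expectation has the following general shape for a three-component Poisson mixture. Here θ0 is written as \theta^{0} and is assumed to denote the current parameter values, and \pi_3 is shorthand introduced here for 1 − π1 − π2.

```latex
Q(\theta, \theta^{0})
  = \sum_{i=1}^{300} \sum_{k=1}^{3}
    P(Z_i = k \mid X_i, \theta^{0})
    \left[ \log \pi_k + X_i \log \lambda_k - \lambda_k - \log (X_i!) \right],
\qquad \pi_3 = 1 - \pi_1 - \pi_2 .
```

Each bracketed term is the log of π_k times the Poisson pmf f(X_i; λ_k) given above.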

(b) (3 marks) Derive the E-step of the EM algorithm.
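For a finite mixture, the E-step amounts to computing the posterior probability of each component for each observation. A minimal sketch in R follows, assuming the model above. The function name `e_step` and the six example counts are hypothetical.

```r
# E-step sketch (hypothetical helper): posterior component probabilities.
e_step <- function(x, pi1, pi2, lambda) {
  w <- c(pi1, pi2, 1 - pi1 - pi2)
  # n x 3 matrix with entries w_k * Poisson(x_i; lambda_k)
  joint <- outer(x, 1:3, function(xi, k) w[k] * dpois(xi, lambda[k]))
  # Normalise each row so the three probabilities sum to one.
  joint / rowSums(joint)
}

# Made-up counts, evaluated at the first set of initial values.
x <- c(2, 4, 19, 22, 36, 41)
r <- e_step(x, 0.3, 0.3, c(3, 20, 35))
```

Row i of `r` holds P(Zi = k | Xi, θ0) for k = 1, 2, 3.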

(c) (5 marks) Derive the M-step of the EM algorithm.
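The standard M-step for a Poisson mixture updates each weight to the average posterior probability and each rate to a posterior-weighted mean of the data. The sketch below illustrates this in R, assuming a matrix `r` of posterior probabilities is already available. The name `m_step` and the toy inputs are hypothetical.

```r
# M-step sketch (hypothetical helper).
# x: vector of counts; r: n x 3 matrix of posterior probabilities.
m_step <- function(x, r) {
  nk <- colSums(r)                 # effective number of points per component
  list(pi = nk / length(x),        # updated mixture weights
       lambda = colSums(r * x) / nk)  # posterior-weighted means
}

# Toy check with hard (0/1) assignments, so the answer is easy to verify by hand.
x <- c(2, 4, 20, 24, 30, 40)
r <- rbind(c(1, 0, 0), c(1, 0, 0),
           c(0, 1, 0), c(0, 1, 0),
           c(0, 0, 1), c(0, 0, 1))
est <- m_step(x, r)
```

With hard assignments the updates reduce to group proportions and group means, which is a convenient sanity check before plugging in soft probabilities.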

(d) (5 marks) Note: Your answer for this problem should be typed. Handwritten
solutions or screen-captured R code/figures will not be marked.
Implement the EM algorithm and obtain the MLE of the parameters by applying the implemented
algorithm to the observed data, X1, ..., X300. Set the EM iterations to stop when either
the number of EM iterations reaches 100 (max.iter = 100) or the incomplete log-likelihood
has changed by less than 0.00001 (ε = 0.00001). Run the EM algorithm twice with
the following two sets of initial values and report the estimates with the highest incomplete
log-likelihood.

                     π1    π2    λ1   λ2   λ3
1st initial values   0.3   0.3   3    20   35
2nd initial values   0.1   0.2   5    25   40

For each EM run, check that the incomplete log-likelihood increases at each EM step by
plotting it against the iteration number.
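The stopping rule and the two-start comparison described above can be sketched as follows. This is one possible arrangement, not a model answer: the function name `em_poisson3` is hypothetical, the argument `eps` stands for the tolerance ε, and the data are simulated because the assignment file is not reproduced here.

```r
# Simulated stand-in data (an assumption; replace with the scanned file).
set.seed(2023)
z <- sample(1:3, 300, replace = TRUE, prob = c(0.3, 0.3, 0.4))
x <- rpois(300, c(3, 20, 35)[z])

em_poisson3 <- function(x, pi1, pi2, lambda, max.iter = 100, eps = 1e-5) {
  w <- c(pi1, pi2, 1 - pi1 - pi2)
  # n x 3 matrix of w_k * Poisson(x_i; lambda_k)
  joint <- function(w, lambda) {
    outer(x, 1:3, function(xi, k) w[k] * dpois(xi, lambda[k]))
  }
  ll <- sum(log(rowSums(joint(w, lambda))))  # log-likelihood at initial values
  for (iter in seq_len(max.iter)) {
    j <- joint(w, lambda)
    r <- j / rowSums(j)                      # E-step
    nk <- colSums(r)
    w <- nk / length(x)                      # M-step: weights
    lambda <- colSums(r * x) / nk            # M-step: rates
    ll <- c(ll, sum(log(rowSums(joint(w, lambda)))))
    # Stop once the incomplete log-likelihood changes by less than eps.
    if (abs(ll[length(ll)] - ll[length(ll) - 1]) < eps) break
  }
  list(pi = w, lambda = lambda, loglik = ll)
}

fit1 <- em_poisson3(x, 0.3, 0.3, c(3, 20, 35))
fit2 <- em_poisson3(x, 0.1, 0.2, c(5, 25, 40))
# Keep the run whose final incomplete log-likelihood is higher.
best <- if (max(fit1$loglik) >= max(fit2$loglik)) fit1 else fit2
```

A call such as `plot(fit1$loglik, type = "b", xlab = "EM iteration", ylab = "incomplete log-likelihood")` would then display the trace for the first run, and likewise for `fit2`.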

2. The file assignment3_prob2_2023.txt contains 100 observations. We can read the 300
observations from Problem 1 and the 100 new observations and make histograms as
follows.

> X = scan(file="assignment3_prob1_2023.txt", what=double())


Read 300 items

> X0 = scan(file="assignment3_prob2_2023.txt", what=double())
Read 100 items
> length(X)
[1] 300
> length(X0)
[1] 100
> par(mfrow=c(1,2))
> hist(X0)
> hist(c(X,X0))

Let X1, ..., X300 and X301, ..., X400 denote the 300 observations from assignment3_prob1_2023.txt
and the 100 observations from assignment3_prob2_2023.txt, respectively. We assume the
observations X1, ..., X400 are independent of each other. We model X1, ..., X300 (from
assignment3_prob1_2023.txt) using the mixture of three Poisson distributions (as we did
in Problem 1), but we model X301, ..., X400 (from assignment3_prob2_2023.txt) using
one of the three Poisson distributions. Specifically, for i = 1, ..., 300, Xi follows this mixture
model:
Zi ∼ categorical(π1, π2, 1 − π1 − π2),
Xi | Zi = 1 ∼ Poisson(λ1),
Xi | Zi = 2 ∼ Poisson(λ2),
Xi | Zi = 3 ∼ Poisson(λ3),
and for i = 301, ..., 400,
Xi ∼ Poisson(λ2).
We aim to obtain the MLE of the parameters θ = (π1, π2, λ1, λ2, λ3) using the EM algorithm.

(a) (5 marks) Let X = (X1, ..., X400) and Z = (Z1, ..., Z300). Derive the expectation
of the complete log-likelihood, Q(θ, θ0) = E_{Z|X,θ0}[log P(X, Z | θ)].

(b) (5 marks) Derive the E-step and M-step of the EM algorithm.
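Because X301, ..., X400 are known to come from Poisson(λ2), they carry no latent label and enter only the λ2 update. The sketch below shows one EM step under that reading of the model. The name `em_step_prob2` and the example counts are hypothetical, and the update should be checked against your own derivation.

```r
# One EM step for the Problem 2 model (hypothetical helper).
# x: mixture observations; x0: observations known to be Poisson(lambda_2).
em_step_prob2 <- function(x, x0, w, lambda) {
  j <- outer(x, 1:3, function(xi, k) w[k] * dpois(xi, lambda[k]))
  r <- j / rowSums(j)             # E-step: only x has latent labels
  nk <- colSums(r)
  num <- colSums(r * x)           # weighted sums of counts per component
  den <- nk
  num[2] <- num[2] + sum(x0)      # x0 contributes fully to component 2
  den[2] <- den[2] + length(x0)
  list(pi = nk / length(x),       # weights depend on x only
       lambda = num / den)
}

# Made-up counts for illustration.
x  <- c(2, 4, 19, 22, 36, 41)
x0 <- c(18, 21, 20)
step <- em_step_prob2(x, x0, c(0.3, 0.3, 0.4), c(3, 20, 35))
base <- em_step_prob2(x, numeric(0), c(0.3, 0.3, 0.4), c(3, 20, 35))
```

Calling the function with an empty `x0` recovers the Problem 1 update, so `base` and `step` differ only in their second rate.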

(c) (5 marks) Note: Your answer for this problem should be typed. Handwritten
solutions or screen-captured R code/figures will not be marked.
Implement the EM algorithm and obtain the MLE of the parameters by applying the implemented
algorithm to the observed data, X1, ..., X400. Set the EM iterations to stop when either
the number of EM iterations reaches 100 (max.iter = 100) or the incomplete log-likelihood
has changed by less than 0.00001 (ε = 0.00001). Run the EM algorithm twice with
the following two sets of initial values and report the estimates with the highest incomplete
log-likelihood.

                     π1    π2    λ1   λ2   λ3
1st initial values   0.3   0.3   3    20   35
2nd initial values   0.1   0.2   5    25   40

For each EM run, check that the incomplete log-likelihood increases at each EM step by
plotting it against the iteration number.
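For this second model, the incomplete log-likelihood to monitor has two parts: the mixture term for the first 300 observations and a plain Poisson(λ2) term for the remaining 100. A minimal sketch, with the hypothetical name `incomplete_loglik2` and made-up counts:

```r
# Incomplete log-likelihood for the Problem 2 model (hypothetical helper).
# x: mixture observations; x0: observations known to be Poisson(lambda_2).
incomplete_loglik2 <- function(x, x0, w, lambda) {
  mix <- outer(x, 1:3, function(xi, k) w[k] * dpois(xi, lambda[k]))
  sum(log(rowSums(mix))) +                  # mixture part
    sum(dpois(x0, lambda[2], log = TRUE))   # single-Poisson part
}

# Made-up counts for illustration.
x  <- c(2, 4, 19, 22, 36, 41)
x0 <- c(18, 21, 20)
w  <- c(0.3, 0.3, 0.4)
ll_both <- incomplete_loglik2(x, x0, w, c(3, 20, 35))
ll_mix  <- incomplete_loglik2(x, numeric(0), w, c(3, 20, 35))
```

The difference `ll_both - ll_mix` is exactly the Poisson(λ2) log-likelihood of `x0`, which gives a quick check that both parts are being counted.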
