INT 3405E 21: Machine Learning First Semester 2023-2024

Week 2
Student Name: Nguyen Tuan Hung UID: 21020205
• Feel free to talk to other students in the class when doing the homework. You should, however, write
down your solution yourself. You also must indicate on each homework with whom you collaborated
and cite any other sources you use including Internet sites.
• You will write your solution in LaTeX and submit the PDF file in a zip file, including relevant materials,
through courses.uet.vnu.edu.vn
• Don't be late.

1 Homework 1 - 10pts

SOLUTION

The likelihood of θ with respect to data set D is computed as follows:

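\[
L(\theta) = \prod_{i=1}^{n} \binom{N}{s_i}\, \theta^{s_i} (1-\theta)^{N-s_i}
\]
Taking the logarithm:
\[
l(\theta) = \alpha + \sum_{i=1}^{n} \left[ s_i \log\theta + (N-s_i)\log(1-\theta) \right]
\]
where $\alpha$ collects the terms that do not depend on $\theta$.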

Since $\log$ is strictly increasing, $\arg\max_\theta l(\theta) = \arg\max_\theta L(\theta)$.
With that in mind, we can find a sensible value for $\theta$ by locating the maximum of $l(\theta)$.

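\[
l'(\theta) = \log e \cdot \sum_{i=1}^{n} \left( \frac{s_i}{\theta} - \frac{N-s_i}{1-\theta} \right)
\]
(the factor $\log e$ equals 1 for the natural logarithm).
\[
l'(\theta) = 0 \iff \frac{\sum_{i=1}^{n} s_i}{\theta} = \frac{\sum_{i=1}^{n} (N-s_i)}{1-\theta} \iff \frac{s}{\theta} = \frac{nN-s}{1-\theta}
\]
\[
\theta = \frac{s}{nN} \quad \text{with } s = \sum_{i=1}^{n} s_i
\]
When the number of observations $n$ equals $N$, this is $s/N^2$.

Since $l'(\theta) > 0$ just below $\frac{s}{nN}$ and $l'(\theta) < 0$ just above it, $l$ attains its maximum there, so $\frac{s}{nN}$ is our sensible value for $\theta$.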
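
As a rough numerical check of the derivation above, the sketch below compares the closed-form estimate (total successes divided by total trials) with a grid search over the log-likelihood. The data values, the function names, and the grid resolution are assumptions made for illustration only; they are not part of the assignment.

```python
import math


def binomial_log_likelihood(theta, counts, n_trials):
    """Log-likelihood of theta for i.i.d. Binomial(n_trials, theta) counts."""
    return sum(
        math.log(math.comb(n_trials, s))
        + s * math.log(theta)
        + (n_trials - s) * math.log(1.0 - theta)
        for s in counts
    )


def binomial_mle(counts, n_trials):
    """Closed-form maximiser: total successes over total trials, s / (n * N)."""
    return sum(counts) / (len(counts) * n_trials)


# Hypothetical data: n = 5 observations, each counting successes in N = 10 trials.
counts = [3, 5, 4, 6, 2]
n_trials = 10

theta_hat = binomial_mle(counts, n_trials)

# Brute-force search over a grid strictly inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: binomial_log_likelihood(t, counts, n_trials))

print(theta_hat, theta_grid)
```

With these assumed counts, the grid maximiser should land on the grid point closest to the closed-form value, which is what the sign change of $l'(\theta)$ predicts.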

2 Homework 2 - 10pts

SOLUTION.
Regarding multi-class classification:

\[
P(y = k \mid x) = \theta_k(y) \Rightarrow P(y = k, x = k) = \theta_k(x)\,\theta_k(y)
\]

Therefore, p(X) is independent. (1)
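On the other hand, with $I(x = k)$ denoting the indicator function, the likelihood of $\theta$ with
respect to data set D is computed as follows:

\[
L(\theta) = \prod_{i=1}^{n} \prod_{k=1}^{C} \theta_k^{I(x_i = k)}
\]
Taking the logarithm:
\[
l(\theta) = \sum_{i=1}^{n} \sum_{k=1}^{C} I(x_i = k) \log\theta_k
\]
Using a Lagrange multiplier with the constraint $g(\theta) = \sum_{i=1}^{C} \theta_i - 1 = 0$, we can augment
the original function as:

\[
l_a(\theta) = \sum_{i=1}^{n} \sum_{k=1}^{C} I(x_i = k) \log\theta_k - \lambda\, g(\theta)
\]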
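MLE estimator:
\[
\arg\max_\theta l_a(\theta) \iff l_a'(\theta_k) = 0 \;\&\; l_a'(\theta_k^{+}) < 0 \;\&\; l_a'(\theta_k^{-}) > 0
\]
The derivative of $l_a(\theta)$ with respect to $\theta_k$ is:

\[
\frac{\partial l_a(\theta)}{\partial \theta_k} = \sum_{i=1}^{n} I(x_i = k) \cdot \frac{1}{\theta_k} - \lambda
\]
\[
\frac{\partial l_a(\theta)}{\partial \theta_k} = 0 \iff \lambda = \frac{\sum_{i=1}^{n} I(x_i = k)}{\theta_k} = \frac{N_k}{\theta_k}
\]
with $N_k$ the number of occurrences of value $k$ in D.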
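On the other hand, since $\lambda$ is a single scalar, repeating the equation above for all possible
values of $k$ shows that:

\[
\lambda = \frac{N_i}{\theta_i} \;\; \forall i \in \{1, \dots, C\} \Rightarrow N_i = \lambda\, \theta_i
\Rightarrow \sum_{i=1}^{C} N_i = \lambda \sum_{i=1}^{C} \theta_i = \lambda = N
\]
where $N = \sum_{i=1}^{C} N_i$ is the total number of observations in D.

Therefore, $\theta_k = \frac{N_k}{\lambda} = \frac{N_k}{N}$ is our sensible value.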
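
The result above says that the estimate for each class is its relative frequency in D. The following is a minimal sketch of that computation, together with a check that shifting probability mass between two classes lowers the log-likelihood. The sample values, the function names, and the perturbation size `eps` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def categorical_mle(samples):
    """Relative-frequency estimate: theta_k = N_k / N for each observed value k."""
    counts = Counter(samples)
    total = len(samples)
    return {k: c / total for k, c in counts.items()}


def categorical_log_likelihood(theta, samples):
    """Sum of log(theta[x_i]) over all observations x_i."""
    return sum(math.log(theta[x]) for x in samples)


# Hypothetical data set D with three classes.
samples = ["a", "b", "a", "c", "a", "b"]
theta_hat = categorical_mle(samples)

# Move a small amount of mass from class "a" to class "c";
# the perturbed vector still sums to 1, so the constraint holds.
eps = 0.05
perturbed = dict(theta_hat)
perturbed["a"] -= eps
perturbed["c"] += eps

print(theta_hat)
print(categorical_log_likelihood(theta_hat, samples))
print(categorical_log_likelihood(perturbed, samples))
```

Only values that actually occur in the samples receive an entry here; a class with zero occurrences would get $\theta_k = 0$ under the same formula.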
