
CS 4870: Machine Learning - Homework #4

Due on March 6, 2018 at 11:59 PM

Professor Kilian Weinberger, 8:40 AM

A,D,K,M,Z


Problem 1
a. From the question:
\[
P(R) = \frac{1}{2}, \qquad P(B) = \frac{1}{2}, \qquad P(H \mid R) = \frac{3}{5}, \qquad P(H \mid B) = \frac{7}{10}
\]

By Law of Total Probability, we have:

\[
P(H) = P(H \mid R)\,P(R) + P(H \mid B)\,P(B) = \frac{3}{5}\cdot\frac{1}{2} + \frac{7}{10}\cdot\frac{1}{2} = \frac{6}{20} + \frac{7}{20} = \frac{13}{20}
\]
By Bayes' Rule:
\[
P(R \mid H) = \frac{P(H \mid R)\,P(R)}{P(H)} = \frac{3}{5}\cdot\frac{1}{2}\cdot\frac{20}{13} = \frac{6}{13}
\]
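As a quick sanity check, the arithmetic can be reproduced with exact fractions in a short Python snippet (the variable names are just illustrative):

```python
from fractions import Fraction as F

# Quantities given in the problem statement
p_R, p_B = F(1, 2), F(1, 2)                    # prior over hat color
p_H_given_R, p_H_given_B = F(3, 5), F(7, 10)   # P(heads | hat color)

# Law of total probability, then Bayes' rule
p_H = p_H_given_R * p_R + p_H_given_B * p_B
p_R_given_H = p_H_given_R * p_R / p_H

print(p_H)          # 13/20
print(p_R_given_H)  # 6/13
```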

b. From the question (probabilities are $P(\text{[coin] is heads} \mid \text{hat color})$):
\[
\begin{aligned}
P(P \mid R) &= 3/5 & P(P \mid B) &= 7/10 \\
P(N \mid R) &= 3/10 & P(N \mid B) &= 1/5 \\
P(D \mid R) &= 1/2 & P(D \mid B) &= 1/10 \\
P(Q \mid R) &= 4/5 & P(Q \mid B) &= 2/5
\end{aligned}
\]

We make the naive Bayes assumption. By Bayes' Rule (notice the $1/2$ prior terms cancel; the third flip is tails, so the dime contributes $1 - P(D \mid \cdot)$):
\[
P(HHTH \mid R) = \frac{3}{5}\cdot\frac{3}{10}\cdot\frac{1}{2}\cdot\frac{4}{5} = \frac{9}{125}
\]
\[
P(HHTH \mid B) = \frac{7}{10}\cdot\frac{1}{5}\cdot\frac{9}{10}\cdot\frac{2}{5} = \frac{63}{1250}
\]
\[
P(R \mid HHTH) = \frac{P(HHTH \mid R)}{P(HHTH \mid R) + P(HHTH \mid B)} = \frac{10}{17}
\]
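The same kind of check can be scripted for the whole sequence. The helper below (a sketch; the function and argument names are made up) multiplies the per-coin likelihoods of the observed flips HHTH under each hat color and normalizes; with equal priors the 1/2 terms cancel exactly as noted above.

```python
from fractions import Fraction as F

def posterior_red(heads_given_R, heads_given_B, flips):
    """Naive Bayes posterior P(R | flips), assuming equal priors on R and B.

    heads_given_R, heads_given_B: per-coin P(heads | hat color), in flip order.
    flips: string of 'H'/'T' outcomes, one flip per coin.
    """
    like_R = like_B = F(1)
    for pR, pB, outcome in zip(heads_given_R, heads_given_B, flips):
        like_R *= pR if outcome == 'H' else 1 - pR
        like_B *= pB if outcome == 'H' else 1 - pB
    return like_R / (like_R + like_B)

# Part (b): penny, nickel, dime, quarter probabilities from the table above
print(posterior_red([F(3, 5), F(3, 10), F(1, 2), F(4, 5)],
                    [F(7, 10), F(1, 5), F(1, 10), F(2, 5)], "HHTH"))  # 10/17
```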

c. After examining the data table (probabilities are $P(\text{[coin] is heads} \mid \text{hat color})$):
\[
\begin{aligned}
P(P \mid R) &= 3/4 & P(P \mid B) &= 1/10 \\
P(N \mid R) &= 7/8 & P(N \mid B) &= 3/10 \\
P(D \mid R) &= 1/2 & P(D \mid B) &= 9/10 \\
P(Q \mid R) &= 1/8 & P(Q \mid B) &= 4/10
\end{aligned}
\]

By Bayes' Rule (notice the $1/2$ prior terms cancel):
\[
P(HHTH \mid R) = \frac{3}{4}\cdot\frac{7}{8}\cdot\frac{1}{2}\cdot\frac{1}{8} = \frac{21}{512}
\]
\[
P(HHTH \mid B) = \frac{1}{10}\cdot\frac{3}{10}\cdot\frac{1}{10}\cdot\frac{4}{10} = \frac{3}{2500}
\]
\[
P(R \mid HHTH) = \frac{P(HHTH \mid R)}{P(HHTH \mid R) + P(HHTH \mid B)} = \frac{4375}{4503}
\]
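Reusing the posterior_red helper sketched in part (b) (assuming those definitions are still in scope) with the part (c) table gives the same answer:

```python
# Part (c) heads probabilities for penny, nickel, dime, quarter
print(posterior_red([F(3, 4), F(7, 8), F(1, 2), F(1, 8)],
                    [F(1, 10), F(3, 10), F(9, 10), F(4, 10)], "HHTH"))  # 4375/4503
```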


d. $X$ = data (heads/tails outcomes) and $y$ = red/blue hat. The naive Bayes assumption holds because, conditioned on the hat color, the different coins are flipped independently of each other; hence the features can reasonably be assumed to be conditionally independent.

Problem 2
a.
\[
P(\text{Ham} \mid 0,0,1) = \frac{P(0,0,1 \mid \text{Ham})\,P(\text{Ham})}{P(0,0,1)} = 0
\]
\[
P(\text{Ham} \mid 1,1,1) = \frac{P(1,1,1 \mid \text{Ham})\,P(\text{Ham})}{P(1,1,1)} = \frac{1/5 \cdot 1/3}{1/5 \cdot 1/3 + 0 \cdot 2/3} = 1
\]
\[
P(\text{Ham} \mid 1,0,0) = \frac{P(1,0,0 \mid \text{Ham})\,P(\text{Ham})}{P(1,0,0)} = 0
\]
\[
P(\text{Ham} \mid 0,0,0) = \frac{P(0,0,0 \mid \text{Ham})\,P(\text{Ham})}{P(0,0,0)} = \frac{0}{0} = \text{undefined}
\]
Yes, the last one is undefined, and it is unreasonable for every prediction to be guaranteed to be exactly 0, exactly 1, or undefined. This is due to the fact that we do not use Laplace smoothing, so a feature value never seen with a class forces that class's estimate to zero.
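A small generic helper (a sketch; the names are made up, and the counts from the homework's data table are not reproduced here) makes this failure mode concrete: without smoothing, a feature value never observed with a class zeroes out that class's score, so the posterior collapses to 0 or 1, and to 0/0 when both scores vanish.

```python
from fractions import Fraction as F

def posterior_ham(cond_ham, cond_spam, p_ham, p_spam, x):
    """Naive Bayes posterior P(Ham | x) for binary features x.

    cond_ham, cond_spam: per-feature P(feature = 1 | class).
    Returns None when both class scores are zero (the 0/0 undefined case).
    """
    score_h, score_s = p_ham, p_spam
    for p_h, p_s, xi in zip(cond_ham, cond_spam, x):
        score_h *= p_h if xi else 1 - p_h
        score_s *= p_s if xi else 1 - p_s
    total = score_h + score_s
    return None if total == 0 else score_h / total
```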

b.

• Collecting more emails would help with our predictions, because a larger sample gives more reliable probability estimates.

• Extracting more features from each email would allow us to classify emails more accurately.

• Duplicating emails with uncommon features would not help; it only distorts the distribution of the emails.

• Making stronger assumptions is helpful; assuming our features are independent of each other would be more realistic for our data.

c.
\[
P(1,0,1 \mid \text{Ham}) = P(\text{bacon}=1 \mid \text{Ham})\,P(\text{ip}=0 \mid \text{Ham})\,P(\text{mispell}=1 \mid \text{Ham}) = 1 \cdot \frac{2}{5} \cdot \frac{3}{5} = \frac{6}{25}
\]
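This product can be verified directly (reusing the Fraction import from the sketch above):

```python
# P(bacon=1 | Ham), P(ip=0 | Ham), P(mispell=1 | Ham), read off as above
print(F(1) * F(2, 5) * F(3, 5))  # 6/25
```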

d.
\[
\begin{aligned}
P(\text{bacon}=1 \mid \text{Spam}) &= \frac{1}{10} & P(\text{bacon}=1 \mid \text{Ham}) &= \frac{5}{5} \\
P(\text{ip}=1 \mid \text{Spam}) &= \frac{3}{10} & P(\text{ip}=1 \mid \text{Ham}) &= \frac{3}{5} \\
P(\text{mispell}=1 \mid \text{Spam}) &= \frac{7}{10} & P(\text{mispell}=1 \mid \text{Ham}) &= \frac{3}{5} \\
P(\text{Spam}) &= \frac{2}{3} & P(\text{Ham}) &= \frac{1}{3}
\end{aligned}
\]


\[
P(\text{Ham} \mid 0,0,1) = \frac{P(0,0,1 \mid \text{Ham})\,P(\text{Ham})}{P(0,0,1)} = 0
\]
\[
P(\text{Ham} \mid 1,1,1) = \frac{P(1,1,1 \mid \text{Ham})\,P(\text{Ham})}{P(1,1,1)} = \frac{60}{67}
\]
\[
P(\text{Ham} \mid 1,0,0) = \frac{P(1,0,0 \mid \text{Ham})\,P(\text{Ham})}{P(1,0,0)} = \frac{80}{101}
\]
\[
P(\text{Ham} \mid 0,0,0) = \frac{P(0,0,0 \mid \text{Ham})\,P(\text{Ham})}{P(0,0,0)} = 0
\]
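Plugging the estimates listed above into the posterior_ham helper from part (a) (assuming it and the Fraction import are still in scope) reproduces all four posteriors:

```python
cond_ham  = [F(5, 5), F(3, 5), F(3, 5)]     # P(bacon/ip/mispell = 1 | Ham)
cond_spam = [F(1, 10), F(3, 10), F(7, 10)]  # P(bacon/ip/mispell = 1 | Spam)
p_ham, p_spam = F(1, 3), F(2, 3)

for x in [(0, 0, 1), (1, 1, 1), (1, 0, 0), (0, 0, 0)]:
    print(x, posterior_ham(cond_ham, cond_spam, p_ham, p_spam, x))
# (0,0,1) -> 0, (1,1,1) -> 60/67, (1,0,0) -> 80/101, (0,0,0) -> 0
```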

e.
\[
\begin{aligned}
P(\text{bacon}=1 \mid \text{Spam}) &= \frac{5}{18} & P(\text{bacon}=1 \mid \text{Ham}) &= \frac{9}{13} \\
P(\text{ip}=1 \mid \text{Spam}) &= \frac{7}{18} & P(\text{ip}=1 \mid \text{Ham}) &= \frac{7}{13} \\
P(\text{mispell}=1 \mid \text{Spam}) &= \frac{11}{18} & P(\text{mispell}=1 \mid \text{Ham}) &= \frac{7}{13} \\
P(\text{Spam}) &= \frac{18}{31} & P(\text{Ham}) &= \frac{13}{31}
\end{aligned}
\]

\[
P(1,0,1 \mid \text{Ham}) = P(\text{bacon}=1 \mid \text{Ham})\,P(\text{ip}=0 \mid \text{Ham})\,P(\text{mispell}=1 \mid \text{Ham}) = \frac{9}{13}\cdot\frac{6}{13}\cdot\frac{7}{13} \approx 0.172
\]

\[
P(\text{Ham} \mid 0,0,1) = \frac{P(0,0,1 \mid \text{Ham})\,P(\text{Ham})}{P(0,0,1)} \approx 0.170
\]
\[
P(\text{Ham} \mid 1,1,1) = \frac{P(1,1,1 \mid \text{Ham})\,P(\text{Ham})}{P(1,1,1)} \approx 0.687
\]
\[
P(\text{Ham} \mid 1,0,0) = \frac{P(1,0,0 \mid \text{Ham})\,P(\text{Ham})}{P(1,0,0)} \approx 0.617
\]
\[
P(\text{Ham} \mid 0,0,0) = \frac{P(0,0,0 \mid \text{Ham})\,P(\text{Ham})}{P(0,0,0)} \approx 0.216
\]
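Repeating the check with the smoothed estimates (same assumptions about the helper being in scope) gives the decimal posteriors above; in particular, the (1,1,1) case works out to roughly 0.687 with these numbers:

```python
cond_ham  = [F(9, 13), F(7, 13), F(7, 13)]   # smoothed P(feature = 1 | Ham)
cond_spam = [F(5, 18), F(7, 18), F(11, 18)]  # smoothed P(feature = 1 | Spam)
p_ham, p_spam = F(13, 31), F(18, 31)

for x in [(0, 0, 1), (1, 1, 1), (1, 0, 0), (0, 0, 0)]:
    print(x, float(posterior_ham(cond_ham, cond_spam, p_ham, p_spam, x)))
# ~0.170, ~0.687, ~0.617, ~0.216
```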


Problem 3
1.
\[
\begin{aligned}
p(y=1 \mid \vec{x}) &= \frac{\prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=1)\,p(y=1)}{p(\vec{x})} \\
&= \frac{\prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=1)\,p(y=1)}{p(\vec{x} \mid y=1)\,p(y=1) + p(\vec{x} \mid y=0)\,p(y=0)} && \text{(sum rule)} \\
&= \frac{\prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=1)\,p(y=1)}{\prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=1)\,p(y=1) + \prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=0)\,p(y=0)} && \text{(naive Bayes assumption and product rule)}
\end{aligned}
\]

2. Dividing numerator and denominator by the numerator:
\[
\begin{aligned}
p(y=1 \mid \vec{x}) &= \frac{\prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=1)\,p(y=1)}{\prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=1)\,p(y=1) + \prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=0)\,p(y=0)} \\
&= \frac{1}{1 + \dfrac{\prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=0)\,p(y=0)}{\prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=1)\,p(y=1)}} \\
&= \frac{1}{1 + \exp\!\left(\log \dfrac{\prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=0)\,p(y=0)}{\prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=1)\,p(y=1)}\right)} \\
&= \frac{1}{1 + \exp\!\left(-\log \dfrac{\prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=1)\,p(y=1)}{\prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=0)\,p(y=0)}\right)}
\end{aligned}
\]
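A quick numerical check of this identity (a sketch with arbitrary made-up per-feature likelihoods and priors) confirms that normalizing the two class scores is the same as applying a sigmoid to the log-odds:

```python
import math
import random

random.seed(0)
d = 5
# Arbitrary per-feature likelihoods p([x]_a | y) for a single fixed input x
like1 = [random.uniform(0.1, 0.9) for _ in range(d)]  # class y = 1
like0 = [random.uniform(0.1, 0.9) for _ in range(d)]  # class y = 0
prior1, prior0 = 0.3, 0.7

score1 = prior1 * math.prod(like1)
score0 = prior0 * math.prod(like0)

direct = score1 / (score1 + score0)            # normalized class scores
log_odds = math.log(score1) - math.log(score0)
via_sigmoid = 1.0 / (1.0 + math.exp(-log_odds))

print(abs(direct - via_sigmoid) < 1e-12)  # True
```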

3. Define $\vec{w}$ and $b$ as follows:
\[
w_\alpha = [\vec{w}]_\alpha = \frac{\mu_{\alpha 1} - \mu_{\alpha 0}}{\sigma_\alpha^2}
\]
\[
b = \log\!\left(\frac{p(y=1)}{p(y=0)}\right) - \sum_{\alpha=1}^{d} \frac{\mu_{\alpha 1}^2 - \mu_{\alpha 0}^2}{2\sigma_\alpha^2}
\]


Then, given that $p([\vec{x}]_\alpha \mid y) = \frac{1}{\sqrt{2\pi}\,\sigma_\alpha} \exp\!\left(-\frac{(x_\alpha - \mu_{\alpha y})^2}{2\sigma_\alpha^2}\right)$,
\[
\begin{aligned}
h(\vec{x}) = 1 &\iff \frac{P(y=1 \mid \vec{x})}{P(y=0 \mid \vec{x})} > 1 \\
&\iff \frac{\prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=1)\,p(y=1)}{\prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=0)\,p(y=0)} > 1 \\
&\iff \log \frac{\prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=1)}{\prod_{\alpha=1}^{d} p([\vec{x}]_\alpha \mid y=0)} + \log \frac{p(y=1)}{p(y=0)} > 0 \\
&\iff \log \frac{\prod_{\alpha=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_\alpha} \exp\!\left(-\frac{(x_\alpha - \mu_{\alpha 1})^2}{2\sigma_\alpha^2}\right)}{\prod_{\alpha=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_\alpha} \exp\!\left(-\frac{(x_\alpha - \mu_{\alpha 0})^2}{2\sigma_\alpha^2}\right)} + \log \frac{p(y=1)}{p(y=0)} > 0 \\
&\iff \log \frac{\exp\!\left(-\sum_{\alpha=1}^{d} \frac{(x_\alpha - \mu_{\alpha 1})^2}{2\sigma_\alpha^2}\right)}{\exp\!\left(-\sum_{\alpha=1}^{d} \frac{(x_\alpha - \mu_{\alpha 0})^2}{2\sigma_\alpha^2}\right)} + \log \frac{p(y=1)}{p(y=0)} > 0 \\
&\iff \log\!\left(\exp\!\left(-\sum_{\alpha=1}^{d} \frac{(x_\alpha - \mu_{\alpha 1})^2 - (x_\alpha - \mu_{\alpha 0})^2}{2\sigma_\alpha^2}\right)\right) + \log \frac{p(y=1)}{p(y=0)} > 0 \\
&\iff -\sum_{\alpha=1}^{d} \frac{(x_\alpha - \mu_{\alpha 1})^2 - (x_\alpha - \mu_{\alpha 0})^2}{2\sigma_\alpha^2} + \log \frac{p(y=1)}{p(y=0)} > 0 \\
&\iff -\sum_{\alpha=1}^{d} \frac{-2x_\alpha \mu_{\alpha 1} + \mu_{\alpha 1}^2 + 2x_\alpha \mu_{\alpha 0} - \mu_{\alpha 0}^2}{2\sigma_\alpha^2} + \log \frac{p(y=1)}{p(y=0)} > 0 \\
&\iff \sum_{\alpha=1}^{d} \frac{\mu_{\alpha 1} - \mu_{\alpha 0}}{\sigma_\alpha^2}\,x_\alpha - \sum_{\alpha=1}^{d} \frac{\mu_{\alpha 1}^2 - \mu_{\alpha 0}^2}{2\sigma_\alpha^2} + \log \frac{p(y=1)}{p(y=0)} > 0 \\
&\iff \sum_{\alpha=1}^{d} w_\alpha x_\alpha + \left(\log\!\left(\frac{p(y=1)}{p(y=0)}\right) - \sum_{\alpha=1}^{d} \frac{\mu_{\alpha 1}^2 - \mu_{\alpha 0}^2}{2\sigma_\alpha^2}\right) > 0 \\
&\iff \vec{w} \cdot \vec{x} + b > 0
\end{aligned}
\]

Therefore, Gaussian naive Bayes with shared variance is a linear classifier.
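To sanity-check the algebra, the sketch below (with arbitrary made-up Gaussian parameters) compares the direct decision $P(y=1 \mid \vec{x}) > P(y=0 \mid \vec{x})$, evaluated in log space, against the sign of $\vec{w} \cdot \vec{x} + b$ as defined above; the two agree on every random input.

```python
import math
import random

random.seed(1)
d = 4
mu0 = [random.uniform(-1, 1) for _ in range(d)]        # class-0 means
mu1 = [random.uniform(-1, 1) for _ in range(d)]        # class-1 means
sigma = [random.uniform(0.5, 2.0) for _ in range(d)]   # shared per-feature std dev
prior1 = 0.4
prior0 = 1.0 - prior1

def gauss_log_pdf(x, mu, s):
    return -math.log(math.sqrt(2.0 * math.pi) * s) - (x - mu) ** 2 / (2.0 * s ** 2)

# Weight vector and bias exactly as defined in the derivation above
w = [(m1 - m0) / s ** 2 for m0, m1, s in zip(mu0, mu1, sigma)]
b = math.log(prior1 / prior0) - sum((m1 ** 2 - m0 ** 2) / (2.0 * s ** 2)
                                    for m0, m1, s in zip(mu0, mu1, sigma))

for _ in range(1000):
    x = [random.uniform(-3, 3) for _ in range(d)]
    log_score1 = math.log(prior1) + sum(gauss_log_pdf(xi, m, s)
                                        for xi, m, s in zip(x, mu1, sigma))
    log_score0 = math.log(prior0) + sum(gauss_log_pdf(xi, m, s)
                                        for xi, m, s in zip(x, mu0, sigma))
    assert (log_score1 > log_score0) == (sum(wi * xi for wi, xi in zip(w, x)) + b > 0)

print("Linear rule matches the Gaussian naive Bayes decisions.")
```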
