Classification (Naive Bayes)
*The instructor thanks Dr Jae-Gil Lee for sharing the lecture slides.
Contents
• Bayes theorem: P(C | A) = P(A | C) P(C) / P(A)
Rule of Multiplication
The probability that Events A and B both occur is equal to the probability that
Event A occurs times the probability that Event B occurs, given that A has
occurred.
P(A ∩ B) = P(A) P(B|A)
Example: Rule of Multiplication
• An urn contains 6 red marbles and 4 black marbles. Two marbles are
drawn without replacement from the urn. What is the probability that both of
the marbles are black?
• Let A = the event that the first marble is black, and
• let B = the event that the second marble is black.
P(A) = 4/10 (4 of the 10 marbles in the urn are black)
P(B|A) = 3/9 (3 of the 9 remaining marbles are black)
P(A ∩ B) = P(A) P(B|A) = (4/10)(3/9) = 2/15 ≈ 0.133
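A quick sketch to check this arithmetic with exact fractions (the variable names are ours, not from the slides):

```python
from fractions import Fraction

p_a = Fraction(4, 10)          # P(A): first draw is black (4 of 10)
p_b_given_a = Fraction(3, 9)   # P(B|A): second draw is black (3 of 9 left)

# Rule of multiplication: P(A and B) = P(A) * P(B|A)
p_both = p_a * p_b_given_a
print(p_both, float(p_both))   # 2/15 0.1333...
```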
• Bayes theorem: P(C | A) = P(A | C) P(C) / P(A)
Example (1): Bayes Theorem
• A doctor knows that meningitis causes a stiff neck 50% of the time: P(S|M) = 0.5
• The prior probability of any patient having meningitis is 1/50,000: P(M) = 1/50000
• The prior probability of any patient having a stiff neck is 1/20: P(S) = 1/20
• If a patient has a stiff neck, what is the probability that he/she has meningitis? P(M|S) = ?
P(M|S) = P(S|M) P(M) / P(S) = (0.5 × 1/50000) / (1/20) = 0.0002
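The same computation as a quick check (names are ours):

```python
# Bayes theorem: P(M|S) = P(S|M) * P(M) / P(S)
p_s_given_m, p_m, p_s = 0.5, 1 / 50000, 1 / 20
print(p_s_given_m * p_m / p_s)  # 0.0002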
Example (2): Bayes Theorem
• It rains only 5 days out of the year. When it rains, the weatherman predicts rain 90% of the time; when it does not rain, he predicts rain 10% of the time. If the weatherman predicts rain, what is the probability that it will actually rain?
• Let A1 = the event that it rains, A2 = the event that it does not rain, and B = the event that the weatherman predicts rain.
• P(A1) = 5/365 = 0.0136985 [It rains 5 days out of the year.]
• P(A2) = 360/365 = 0.9863014 [It does not rain 360 days out of the year.]
• P(B|A1) = 0.9 [When it rains, the weatherman predicts rain 90% of the time.]
• P(B|A2) = 0.1 [When it does not rain, the weatherman predicts rain 10% of the time.]
P(A1|B) = P(A1) P(B|A1) / [ P(A1) P(B|A1) + P(A2) P(B|A2) ]
        = (0.0136985 × 0.9) / (0.0136985 × 0.9 + 0.9863014 × 0.1)
        = 0.0123287 / 0.1109588 ≈ 0.111
Even when the weatherman predicts rain, it only rains about 11% of the time.
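A sketch of the full calculation; the denominator P(B) comes from the law of total probability (variable names are ours):

```python
p_rain, p_dry = 5 / 365, 360 / 365     # P(A1), P(A2)
p_pred_rain, p_pred_dry = 0.9, 0.1     # P(B|A1), P(B|A2)

# Law of total probability: P(B) = P(A1) P(B|A1) + P(A2) P(B|A2)
p_pred = p_rain * p_pred_rain + p_dry * p_pred_dry

# Bayes theorem: P(A1|B)
print(p_rain * p_pred_rain / p_pred)   # ~0.111
```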
Bayesian Classifiers (1/2)
• Goal: compute the posterior probability of each class C given the attributes A1, A2, …, An:
P(C | A1 A2 … An) = P(A1 A2 … An | C) P(C) / P(A1 A2 … An)
• Naive assumption: the attributes are conditionally independent given the class, so P(A1 A2 … An | C) = P(A1 | C) P(A2 | C) … P(An | C)
• We can estimate P(Ai | Cj) for all Ai and Cj
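Under the naive assumption, scoring a class is just the prior times a product of per-attribute likelihoods. A minimal sketch (the helper name and the individual likelihood values are ours for illustration; the 0.0024 × 7/10 vs. 0 × 3/10 comparison mirrors the taxable-income example that follows):

```python
from math import prod

def nb_score(likelihoods, prior):
    """Unnormalized posterior: P(C) * product of P(Ai | C)."""
    return prior * prod(likelihoods)

# Hypothetical per-attribute likelihoods for one test record under two classes
score_no = nb_score([4/7, 4/7, 0.0072], prior=7/10)   # ~0.0024 * 7/10
score_yes = nb_score([1.0, 0.0, 1.2e-9], prior=3/10)  # 0 * 3/10
print("No" if score_no > score_yes else "Yes")        # No
```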
• For a continuous attribute, assume a normal distribution for each (attribute, class) pair:
P(Ai | Cj) = (1 / sqrt(2π σij²)) exp( −(Ai − μij)² / (2 σij²) )
[Training-data excerpt: record 2 = (No, Married, 100K, class No); record 3 = (No, Single, 70K, class No); record 4 = (Yes, Married, 120K, class No)]
• Taxable income:
  • If class = No: sample mean = 110, sample variance = 2975
  • If class = Yes: sample mean = 90, sample variance = 25
P(Income = 120 | No) = (1 / sqrt(2π × 2975)) exp( −(120 − 110)² / (2 × 2975) ) ≈ 0.0072
• For the test record X: since P(X|No) P(No) = 0.0024 × 7/10 > P(X|Yes) P(Yes) = 0 × 3/10,
we have P(No|X) > P(Yes|X) => Class = No
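A minimal sketch of the class-conditional normal density, checked against the statistics above (the function name is ours):

```python
import math

def gaussian_likelihood(x, mean, variance):
    """P(Ai = x | Cj) under the per-class normal assumption."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Taxable income = 120 under class No (mean 110, variance 2975)
print(gaussian_likelihood(120, 110, 2975))  # ~0.0072
```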
M-Estimate of Conditional Probability
• The m-estimate smooths the conditional probability: P(Ai | C) = (nc + m p) / (n + m), where n is the number of training examples of class C, nc is the number of those with value Ai, p is a prior estimate of the probability, and m is the weight given to the prior
• An example: when nc = 0, the estimate is m p / (n + m) > 0 (instead of 0!), so a single unseen value no longer zeroes out the whole product
• We are assuming that p = 1 / number of attribute values = 1/3
• Our value of m is arbitrary, and we will use m = 4
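A sketch of the estimator with the p and m chosen above (the function name and the sample counts n = 3, nc = 0 are ours, for illustration):

```python
def m_estimate(n_c, n, p=1/3, m=4):
    """m-estimate of P(Ai | C): (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)

# A value never seen with the class still gets nonzero probability (instead of 0!)
print(m_estimate(n_c=0, n=3))  # (0 + 4/3) / 7 ~= 0.19
```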
Summary of Naïve Bayes
Basic Idea
[Figure: candidate decision boundaries B1 and B2 with their margins (delimited by the parallel hyperplanes b11/b12 and b21/b22) and the support vectors that define them]
• Let the data D be (X1, y1), …, (X|D|, y|D|), where each Xi is a training tuple and yi is its associated class label
• There are infinitely many lines (hyperplanes) separating the two classes, but we want to find the best one (the one that minimizes classification error on unseen data)
• SVM searches for the hyperplane with the largest margin, i.e., the maximum marginal hyperplane (MMH)
Formalization (1/3)
• A separating hyperplane: w · x − b = 0
• w: a normal vector to the hyperplane
• b: a scalar value (bias)
• In the linearly separable case, the margin between the hyperplanes w · x − b = 1 and w · x − b = −1 is 2/||w||
• Primal form: minimize ||w||² / 2 subject to yi (w · xi − b) ≥ 1 for all i (substituting ||w|| with ||w||² / 2 for mathematical convenience, since maximizing 2/||w|| is equivalent)
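In practice the primal problem is handed to a QP solver (e.g., LIBSVM, linked below). Purely as an illustration, here is a minimal hinge-loss subgradient-descent sketch for a linear SVM (a Pegasos-style approximation chosen for brevity, not the method formalized above; the toy data and names are made up):

```python
import random

def train_linear_svm(data, lam=0.01, epochs=200):
    """Approximately minimize lam/2 * ||w||^2 + average hinge loss."""
    dim = len(data[0][0])
    w, b, t = [0.0] * dim, 0.0, 0
    for _ in range(epochs):
        random.shuffle(data)
        for x, y in data:          # y in {-1, +1}
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) - b)
            w = [wi * (1 - eta * lam) for wi in w]  # regularizer subgradient
            if margin < 1:         # hinge loss active: point is inside the margin
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b -= eta * y
    return w, b

# Tiny linearly separable toy set
toy = [([2.0, 2.5], 1), ([2.5, 2.0], 1), ([0.0, 0.5], -1), ([0.5, 0.0], -1)]
w, b = train_linear_svm(toy)
print(w, b)
```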
SVM Related Links
• http://www.csie.ntu.edu.tw/~cjlin/libsvm/
• http://www.svms.org/
• http://www.support-vector-machines.org/
• http://www.kernel-machines.org/
Thank You!
Questions?