5 Naive Bayes
S. Sumitra
Department of Mathematics
Indian Institute of Space Science and Technology
Binary Classification
Attributes are discrete-valued
Bayes' theorem:
\[ p(y \mid x) = \frac{p(y)\, p(x \mid y)}{p(x)} \]
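As a small numerical illustration of Bayes' theorem, the sketch below computes the posterior over two classes for one fixed observation x. The prior and likelihood values are made up for illustration, and the function name `posterior` is hypothetical; the evidence p(x) is obtained from the law of total probability.

```python
def posterior(prior, likelihood):
    """Return p(y | x) for every class y via Bayes' theorem.

    prior[y]      = p(y)
    likelihood[y] = p(x | y) for one fixed observation x
    """
    # Evidence: p(x) = sum over y of p(y) * p(x | y)
    evidence = sum(prior[y] * likelihood[y] for y in prior)
    return {y: prior[y] * likelihood[y] / evidence for y in prior}


# Made-up numbers, for illustration only
prior = {1: 0.3, 0: 0.7}
likelihood = {1: 0.8, 0: 0.1}

post = posterior(prior, likelihood)
print(post)  # the two posteriors sum to 1
```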
Joint Probability distribution
\[ P(A, B, C) = P(A)\, P(B, C \mid A) = P(A)\, P(B \mid A)\, P(C \mid A, B) \]
Definition
Let X, Y, and Z be sets of random variables. X is conditionally independent of Y given Z in a distribution P if
\[ P(X = x, Y = y \mid Z = z) = P(X = x \mid Z = z)\, P(Y = y \mid Z = z) \]
for all $x \in \mathrm{Range}(X)$, $y \in \mathrm{Range}(Y)$, $z \in \mathrm{Range}(Z)$.
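The definition can be checked numerically on a small joint table. The sketch below is one possible way to do this, assuming the joint distribution is given as a dictionary keyed by (x, y, z); the function name `is_cond_independent` and both example distributions are hypothetical.

```python
import itertools


def is_cond_independent(joint, tol=1e-9):
    """Check whether X is conditionally independent of Y given Z.

    joint[(x, y, z)] = P(X = x, Y = y, Z = z); missing keys count as 0.
    """
    xs = sorted({k[0] for k in joint})
    ys = sorted({k[1] for k in joint})
    zs = sorted({k[2] for k in joint})
    for z in zs:
        p_z = sum(joint.get((x, y, z), 0.0) for x in xs for y in ys)
        if p_z == 0:
            continue  # conditioning on a zero-probability event is undefined
        for x, y in itertools.product(xs, ys):
            p_xy = joint.get((x, y, z), 0.0) / p_z
            p_x = sum(joint.get((x, yy, z), 0.0) for yy in ys) / p_z
            p_y = sum(joint.get((xx, y, z), 0.0) for xx in xs) / p_z
            if abs(p_xy - p_x * p_y) > tol:
                return False
    return True


# A joint built as p(z) p(x | z) p(y | z), so the property holds by construction
pz = {0: 0.4, 1: 0.6}
px_z = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.7, 1: 0.3}}  # px_z[z][x]
py_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}  # py_z[z][y]
joint_ci = {
    (x, y, z): pz[z] * px_z[z][x] * py_z[z][y]
    for z in pz for x in (0, 1) for y in (0, 1)
}

# A joint in which X always equals Y, so X and Y stay dependent given Z
joint_dep = {(0, 0, 0): 0.25, (1, 1, 0): 0.25, (0, 0, 1): 0.25, (1, 1, 1): 0.25}

print(is_cond_independent(joint_ci), is_cond_independent(joint_dep))
```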
Binary Classification
Let $\{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$ be the given data, where $x_i \in \mathbb{R}^n$ and $y_i \in \{1, 0\}$.
Discrete valued attributes
Bayes Theorem
\[ p(y = 1 \mid x) = \frac{p(y = 1)\, p(x \mid y = 1)}{p(x)} \]
Conditional Independence
$x = (W = F,\ X = T,\ Y = F)^{\top}$
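Assuming the attributes W, X, and Y are conditionally independent given the class label y (the standard naive Bayes assumption, which the slide appears to be setting up), the class-conditional likelihood of this example would factor as follows. The lowercase y is the class label and the uppercase Y is the third attribute.

```latex
\[
p(x \mid y) = p(W = F \mid y)\, p(X = T \mid y)\, p(Y = F \mid y)
\]
```

Under this assumption each factor can be estimated separately from counts of a single attribute within a class, instead of estimating one table over all joint attribute values.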
Categorical Attributes & Multiclass
Data     A1  A2  A3  A4  y_i ∈ {0, 1, 2}
x_1^T    1   1   0   1   1
x_2^T    2   0   1   1   2
x_3^T    1   1   0   1   1
x_4^T    3   1   1   0   0
x_5^T    3   2   1   0   0
x_6^T    1   2   1   0   2
$x^{\top} = (3, 0, 1, 1)$
Categorical Attributes
m trials
n outcomes
$p_i$ is the probability of the $i$-th outcome, $i = 1, 2, \ldots, n$
Consider the data $x^{\top} = (x^1, x^2, \ldots, x^n)$, where $x^j$ is the number of times the $j$-th outcome appears in $m$ trials, $j = 1, 2, \ldots, n$.
\[
p(x^1, x^2, \ldots, x^n) =
\begin{cases}
\dfrac{m!}{x^1! \cdots x^n!}\, p_1^{x^1} \times \cdots \times p_n^{x^n}, & \text{when } \sum_{j=1}^{n} x^j = m \\
0, & \text{otherwise}
\end{cases}
\]
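The multinomial probability mass function can be evaluated directly from its definition. The sketch below is a minimal version using only the standard library; the function name `multinomial_pmf` and the example probabilities are chosen here for illustration.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs, m):
    """p(x^1, ..., x^n) for outcome counts `counts` in m trials.

    counts[j] is the number of times outcome j occurred,
    probs[j] is the probability of outcome j.
    """
    if sum(counts) != m:
        return 0.0  # the counts must add up to the number of trials
    # Multinomial coefficient m! / (x^1! ... x^n!), computed with exact integers
    coef = factorial(m) // prod(factorial(c) for c in counts)
    return coef * prod(p ** c for p, c in zip(probs, counts))


# Example: 4 trials, 3 outcomes
probs = (0.5, 0.3, 0.2)
p_valid = multinomial_pmf((2, 1, 1), probs, m=4)    # 12 * 0.5^2 * 0.3 * 0.2
p_invalid = multinomial_pmf((2, 1, 1), probs, m=5)  # counts do not sum to m
print(p_valid, p_invalid)
```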
Naive Bayes
Two types
Categorical features
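$A_j \in \{1, 2, \ldots, l\}$
\[ p(x^j = \alpha \mid y = c) = [\theta_{\alpha c}]_j \]
\[ [\theta_{\alpha c}]_j = \frac{\sum_i \mathbb{1}(x_i^j = \alpha \cap y_i = c) + 1}{\sum_i \mathbb{1}(y_i = c) + l} \]
\[ \hat{y} = \arg\max_c \; \pi_c \prod_j [\theta_{\alpha c}]_j \]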
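The add-one estimate and the arg max rule can be tried on the six training rows of the categorical table shown earlier, with $x^{\top} = (3, 0, 1, 1)$ as the point to classify. The sketch below makes two assumptions that the slides do not state: the class prior $\pi_c$ is taken as the fraction of training rows with label c, and l is taken per attribute as the number of distinct values that attribute takes in the training data. The function names are hypothetical, and exact fractions are used so the scores are easy to verify by hand.

```python
from collections import Counter
from fractions import Fraction

# Training rows (A1, A2, A3, A4) and labels from the table
X = [(1, 1, 0, 1), (2, 0, 1, 1), (1, 1, 0, 1),
     (3, 1, 1, 0), (3, 2, 1, 0), (1, 2, 1, 0)]
y = [1, 2, 1, 0, 0, 2]


def fit(X, y):
    n_features = len(X[0])
    class_count = Counter(y)  # sum_i 1(y_i = c)
    # Assumption: l for attribute j = number of distinct values seen in training
    n_values = [len({row[j] for row in X}) for j in range(n_features)]
    # joint_count[(j, alpha, c)] = sum_i 1(x_i^j = alpha and y_i = c)
    joint_count = Counter(
        (j, row[j], c) for row, c in zip(X, y) for j in range(n_features)
    )
    return class_count, n_values, joint_count


def theta(model, j, alpha, c):
    """Add-one smoothed estimate of p(x^j = alpha | y = c)."""
    class_count, n_values, joint_count = model
    return Fraction(joint_count[(j, alpha, c)] + 1, class_count[c] + n_values[j])


def scores(model, x):
    """pi_c * prod_j theta for every class c (unnormalised posterior)."""
    class_count, _, _ = model
    n = sum(class_count.values())
    out = {}
    for c in class_count:
        s = Fraction(class_count[c], n)  # assumed prior pi_c = N_c / N
        for j, alpha in enumerate(x):
            s *= theta(model, j, alpha, c)
        out[c] = s
    return out


model = fit(X, y)
s = scores(model, (3, 0, 1, 1))
y_hat = max(s, key=s.get)
print(s, y_hat)  # y_hat is 2 for this data
```

With these assumptions the unnormalised scores are 3/400 for class 0, 1/400 for class 1, and 1/100 for class 2, so class 2 is selected. Without the add-one term, classes 0 and 1 would both receive a score of zero because some attribute value of the query never occurs in their rows.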
Multinomial features
Multinomial Naive Bayes
\[ = w_0 + w_c^{\top} x \]
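One common reading of this linear form, assumed here rather than stated on the slide, is that the log of the unnormalised class score under multinomial naive Bayes is $\log \pi_c + \sum_j x^j \log \theta_{jc}$, so that $w_0 = \log \pi_c$ and the $j$-th entry of $w_c$ is $\log \theta_{jc}$. The multinomial coefficient is dropped because it does not depend on c. The sketch below uses add-one smoothing and a made-up toy data set of count vectors; all names are hypothetical.

```python
import math


def fit_multinomial_nb(X, y, n_classes):
    """Return (w0, W): per-class intercepts and weight vectors.

    Assumes every class has at least one training row.
    """
    n_features = len(X[0])
    w0, W = [], []
    for c in range(n_classes):
        rows = [x for x, label in zip(X, y) if label == c]
        w0.append(math.log(len(rows) / len(X)))  # log pi_c
        # Total count of each outcome within class c, with add-one smoothing
        totals = [sum(r[j] for r in rows) + 1 for j in range(n_features)]
        denom = sum(totals)
        W.append([math.log(t / denom) for t in totals])  # log theta_jc
    return w0, W


def predict(w0, W, x):
    """Pick the class with the largest linear score w0 + w_c . x."""
    scores = [b + sum(wj * xj for wj, xj in zip(w, x)) for b, w in zip(w0, W)]
    return max(range(len(scores)), key=scores.__getitem__), scores


# Toy count vectors (made up): class 0 favours outcome 0, class 1 favours 1 and 2
X_counts = [(3, 0, 1), (2, 1, 0), (0, 2, 3), (1, 3, 2)]
y_counts = [0, 0, 1, 1]

w0, W = fit_multinomial_nb(X_counts, y_counts, n_classes=2)
label_a, _ = predict(w0, W, (4, 0, 0))
label_b, _ = predict(w0, W, (0, 2, 2))
print(label_a, label_b)
```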