
Data Mining
Classification: Alternative Techniques

Bayesian Classifiers

Introduction to Data Mining, 2nd Edition
by Tan, Steinbach, Karpatne, Kumar

Bayes Classifier

● A probabilistic framework for solving classification problems

● Conditional probability:

  P(Y | X) = P(X, Y) / P(X)
  P(X | Y) = P(X, Y) / P(Y)

● Bayes theorem:

  P(Y | X) = P(X | Y) P(Y) / P(X)

Example of Bayes Theorem

● Given:
  – A doctor knows that meningitis causes stiff neck 50% of the time
  – Prior probability of any patient having meningitis is 1/50,000
  – Prior probability of any patient having stiff neck is 1/20

● If a patient has stiff neck, what's the probability he/she has meningitis?

  P(M | S) = P(S | M) P(M) / P(S) = (0.5 × 1/50,000) / (1/20) = 0.0002
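
As a quick check of the arithmetic, here is the same computation in Python (all values taken directly from the slide):

# Bayes theorem for the meningitis example: P(M|S) = P(S|M) P(M) / P(S)
p_s_given_m = 0.5        # meningitis causes stiff neck 50% of the time
p_m = 1 / 50_000         # prior probability of meningitis
p_s = 1 / 20             # prior probability of stiff neck
print(p_s_given_m * p_m / p_s)   # 0.0002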


Using Bayes Theorem for Classification

● Consider each attribute and class label as random variables

● Given a record with attributes (X1, X2, …, Xd)
  – Goal is to predict class Y
  – Specifically, we want to find the value of Y that maximizes P(Y | X1, X2, …, Xd)

● Can we estimate P(Y | X1, X2, …, Xd) directly from data?



Example Data

Given a Test Record:

  X = (Refund = No, Divorced, Income = 120K)

  Tid  Refund  Marital Status  Taxable Income  Evade
  1    Yes     Single          125K            No
  2    No      Married         100K            No
  3    No      Single          70K             No
  4    Yes     Married         120K            No
  5    No      Divorced        95K             Yes
  6    No      Married         60K             No
  7    Yes     Divorced        220K            No
  8    No      Single          85K             Yes
  9    No      Married         75K             No
  10   No      Single          90K             Yes

  (Refund and Marital Status are categorical, Taxable Income is continuous, Evade is the class.)

● Can we estimate P(Evade = Yes | X) and P(Evade = No | X)?

In the following we will replace Evade = Yes by Yes, and Evade = No by No.


Using Bayes Theorem for Classification

● Approach:
  – Compute the posterior probability P(Y | X1, X2, …, Xd) using Bayes theorem:

    P(Y | X1, X2, …, Xd) = P(X1, X2, …, Xd | Y) P(Y) / P(X1, X2, …, Xd)

  – Maximum a-posteriori: choose the Y that maximizes P(Y | X1, X2, …, Xd)

  – Equivalent to choosing the value of Y that maximizes P(X1, X2, …, Xd | Y) P(Y)

● How to estimate P(X1, X2, …, Xd | Y)?



Naïve Bayes Classifier

● Assume independence among attributes Xi when class is given:

  – P(X1, X2, …, Xd | Yj) = P(X1 | Yj) P(X2 | Yj) … P(Xd | Yj)

  – Now we can estimate P(Xi | Yj) for all Xi and Yj combinations from the training data

  – A new point is classified as Yj if P(Yj) Π P(Xi | Yj) is maximal, as sketched below.
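
A minimal sketch of this decision rule, assuming the priors P(Yj) and the conditionals P(Xi | Yj) have already been estimated; the dictionary layout is illustrative, not from the slides:

# priors[y] = P(Y = y); cond[y][i][x] = P(X_i = x | Y = y)
def naive_bayes_classify(record, priors, cond):
    """Return the class y that maximizes P(y) * prod_i P(x_i | y)."""
    best_y, best_score = None, -1.0
    for y, p_y in priors.items():
        score = p_y
        for i, x in enumerate(record):
            score *= cond[y][i].get(x, 0.0)   # P(X_i = x | Y = y)
        if score > best_score:
            best_y, best_score = y, score
    return best_y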



Conditional Independence

● X and Y are conditionally independent given Z if P(X | Y, Z) = P(X | Z)

● Example: arm length and reading skills
  – A young child has shorter arm length and limited reading skills, compared to adults
  – If age is fixed, there is no apparent relationship between arm length and reading skills
  – Arm length and reading skills are conditionally independent given age


Naïve Bayes on Example Data

Given a Test Record:

  X = (Refund = No, Divorced, Income = 120K)

(Using the training table from the Example Data slide.)

● P(X | Yes) = P(Refund = No | Yes)
             × P(Divorced | Yes)
             × P(Income = 120K | Yes)

● P(X | No) = P(Refund = No | No)
            × P(Divorced | No)
            × P(Income = 120K | No)



Estimate Probabilities from Data

(Using the training table from the Example Data slide.)

● Class: P(Y) = Nc / N
  – e.g., P(No) = 7/10, P(Yes) = 3/10

● For categorical attributes:

  P(Xi | Yk) = |Xik| / Nc

  – where |Xik| is the number of instances having attribute value Xi and belonging to class Yk
  – Examples (computed in the sketch below):
    P(Status = Married | No) = 4/7
    P(Refund = Yes | Yes) = 0
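
A short sketch of these counting estimates on the 10-record table (tuples hold Refund, Marital Status, and Evade; the continuous Taxable Income attribute is handled on the next slides):

from collections import Counter

records = [
    ("Yes", "Single",   "No"),  ("No", "Married",  "No"),
    ("No",  "Single",   "No"),  ("Yes", "Married", "No"),
    ("No",  "Divorced", "Yes"), ("No", "Married",  "No"),
    ("Yes", "Divorced", "No"),  ("No", "Single",   "Yes"),
    ("No",  "Married",  "No"),  ("No", "Single",   "Yes"),
]
n_c = Counter(r[-1] for r in records)                   # class counts N_c
priors = {c: n / len(records) for c, n in n_c.items()}  # P(No)=0.7, P(Yes)=0.3

# |X_ik| / N_c, e.g. P(Status = Married | No) and P(Refund = Yes | Yes)
married_no = sum(1 for r in records if r[1] == "Married" and r[2] == "No")
refund_yes_yes = sum(1 for r in records if r[0] == "Yes" and r[2] == "Yes")
print(married_no / n_c["No"], refund_yes_yes / n_c["Yes"])   # 4/7, 0.0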

Estimate Probabilities from Data

● For continuous attributes:

  – Discretization: partition the range into bins
    ◆ Replace continuous value with bin value
    ◆ Attribute changed from continuous to ordinal
    (a tiny sketch follows below)

  – Probability density estimation:
    ◆ Assume attribute follows a normal distribution
    ◆ Use data to estimate parameters of distribution (e.g., mean and standard deviation)
    ◆ Once probability distribution is known, use it to estimate the conditional probability P(Xi | Y)
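
A tiny sketch of the discretization route (the bin edges here are illustrative, not from the slides):

# Map Taxable Income to an ordinal bin label, then estimate
# P(bin | class) by counting, exactly as for categorical attributes.
bins = [80_000, 120_000]            # illustrative edges: low / medium / high
labels = ["low", "medium", "high"]

def discretize(income):
    for edge, label in zip(bins, labels):
        if income < edge:
            return label
    return labels[-1]

print(discretize(95_000))   # 'medium'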



Estimate Probabilities from Data

(Using the training table from the Example Data slide.)

● Normal distribution:

  P(Xi | Yj) = 1 / sqrt(2π σij²) · exp( −(Xi − μij)² / (2 σij²) )

  – One for each (Xi, Yj) pair

● For (Income, Class = No):
  – If Class = No:
    ◆ sample mean = 110
    ◆ sample variance = 2975

  P(Income = 120 | No) = 1 / (sqrt(2π) · 54.54) · exp( −(120 − 110)² / (2 · 2975) ) = 0.0072
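
Evaluating this density in code (mean and variance from the slide):

import math

def gaussian_pdf(x, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(gaussian_pdf(120, 110, 2975))   # ≈ 0.0072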

Example of Naïve Bayes Classifier

Given a Test Record:

  X = (Refund = No, Divorced, Income = 120K)

Naïve Bayes Classifier:

  P(Refund = Yes | No) = 3/7
  P(Refund = No | No) = 4/7
  P(Refund = Yes | Yes) = 0
  P(Refund = No | Yes) = 1
  P(Marital Status = Single | No) = 2/7
  P(Marital Status = Divorced | No) = 1/7
  P(Marital Status = Married | No) = 4/7
  P(Marital Status = Single | Yes) = 2/3
  P(Marital Status = Divorced | Yes) = 1/3
  P(Marital Status = Married | Yes) = 0

  For Taxable Income:
    If class = No: sample mean = 110, sample variance = 2975
    If class = Yes: sample mean = 90, sample variance = 25

● P(X | No) = P(Refund = No | No) × P(Divorced | No) × P(Income = 120K | No)
            = 4/7 × 1/7 × 0.0072 = 0.0006

● P(X | Yes) = P(Refund = No | Yes) × P(Divorced | Yes) × P(Income = 120K | Yes)
             = 1 × 1/3 × 1.2 × 10^-9 = 4 × 10^-10

Since P(X | No) P(No) > P(X | Yes) P(Yes),
P(No | X) > P(Yes | X) => Class = No
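
Putting the pieces together, a compact sketch of this exact comparison (all numbers from the slide):

import math

def gaussian_pdf(x, mean, var):
    """Normal density used for the continuous Income attribute."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# X = (Refund = No, Divorced, Income = 120K)
p_x_no  = (4/7) * (1/7) * gaussian_pdf(120, 110, 2975)  # ≈ 0.0006
p_x_yes = 1.0   * (1/3) * gaussian_pdf(120, 90, 25)     # ≈ 4e-10
print("No" if p_x_no * (7/10) > p_x_yes * (3/10) else "Yes")  # -> No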



Example of Naïve Bayes Classifier

Given a Test Record:

  X = (Refund = No, Divorced, Income = 120K)

Naïve Bayes Classifier: (conditional probabilities and Taxable Income parameters as on the previous slide)

● P(Yes) = 3/10
  P(No) = 7/10

● P(Yes | Divorced) = 1/3 × 3/10 / P(Divorced)
  P(No | Divorced) = 1/7 × 7/10 / P(Divorced)

● P(Yes | Refund = No, Divorced) = 1 × 1/3 × 3/10 / P(Divorced, Refund = No)
  P(No | Refund = No, Divorced) = 4/7 × 1/7 × 7/10 / P(Divorced, Refund = No)
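
Checking these numerators in code (the shared evidence terms cancel when comparing the two classes, so the numerators alone decide):

# P(Divorced) and P(Divorced, Refund = No) cancel in the comparison.
print((1/3) * (3/10), (1/7) * (7/10))                # 0.1 vs 0.1 — a tie
print(1 * (1/3) * (3/10), (4/7) * (1/7) * (7/10))    # 0.1 vs ≈ 0.057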


Issues with Naïve Bayes Classifier

Naïve Bayes Classifier: (conditional probabilities and Taxable Income parameters as on the previous slide)

● P(Yes) = 3/10
  P(No) = 7/10

● P(Yes | Married) = 0 × 3/10 / P(Married)
  P(No | Married) = 4/7 × 7/10 / P(Married)



Issues with Naïve Bayes Classifier

Consider the table with Tid = 7 deleted:

  Tid  Refund  Marital Status  Taxable Income  Evade
  1    Yes     Single          125K            No
  2    No      Married         100K            No
  3    No      Single          70K             No
  4    Yes     Married         120K            No
  5    No      Divorced        95K             Yes
  6    No      Married         60K             No
  8    No      Single          85K             Yes
  9    No      Married         75K             No
  10   No      Single          90K             Yes

Naïve Bayes Classifier:

  P(Refund = Yes | No) = 2/6
  P(Refund = No | No) = 4/6
  P(Refund = Yes | Yes) = 0
  P(Refund = No | Yes) = 1
  P(Marital Status = Single | No) = 2/6
  P(Marital Status = Divorced | No) = 0
  P(Marital Status = Married | No) = 4/6
  P(Marital Status = Single | Yes) = 2/3
  P(Marital Status = Divorced | Yes) = 1/3
  P(Marital Status = Married | Yes) = 0/3

  For Taxable Income:
    If class = No: sample mean = 91, sample variance = 685
    If class = Yes: sample mean = 90, sample variance = 25

Given X = (Refund = Yes, Divorced, 120K):

  P(X | No) = 2/6 × 0 × 0.0083 = 0
  P(X | Yes) = 0 × 1/3 × 1.2 × 10^-9 = 0

Naïve Bayes will not be able to classify X as Yes or No!


Issues with Naïve Bayes Classifier

● If one of the conditional probabilities is zero, then the entire expression becomes zero

● Need to use other estimates of conditional probabilities than simple fractions

● Probability estimation:

  Original:    P(Ai | C) = Nic / Nc
  Laplace:     P(Ai | C) = (Nic + 1) / (Nc + c)
  m-estimate:  P(Ai | C) = (Nic + m·p) / (Nc + m)

  where:
    c:   number of classes
    p:   prior probability of the class
    m:   parameter
    Nc:  number of instances in the class
    Nic: number of instances having attribute value Ai in class c
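
A small sketch of the smoothed estimates, applied to the zero probability from the previous slide (the m and p values are illustrative):

def laplace(n_ic, n_c, c):
    """Laplace estimate: (N_ic + 1) / (N_c + c)."""
    return (n_ic + 1) / (n_c + c)

def m_estimate(n_ic, n_c, m, p):
    """m-estimate: (N_ic + m*p) / (N_c + m)."""
    return (n_ic + m * p) / (n_c + m)

# P(Marital Status = Divorced | No) was 0/6 with Tid = 7 deleted.
# With c = 2 classes (per the slide's legend) the zero disappears:
print(laplace(0, 6, 2))           # 0.125
print(m_estimate(0, 6, 3, 1/3))   # ≈ 0.111 with illustrative m = 3, p = 1/3
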
Example of Naïve Bayes Classifier

  Name           Give Birth  Can Fly  Live in Water  Have Legs  Class
  human          yes         no       no             yes        mammals
  python         no          no       no             no         non-mammals
  salmon         no          no       yes            no         non-mammals
  whale          yes         no       yes            no         mammals
  frog           no          no       sometimes      yes        non-mammals
  komodo         no          no       no             yes        non-mammals
  bat            yes         yes      no             yes        mammals
  pigeon         no          yes      no             yes        non-mammals
  cat            yes         no       no             yes        mammals
  leopard shark  yes         no       yes            no         non-mammals
  turtle         no          no       sometimes      yes        non-mammals
  penguin        no          no       sometimes      yes        non-mammals
  porcupine      yes         no       no             yes        mammals
  eel            no          no       yes            no         non-mammals
  salamander     no          no       sometimes      yes        non-mammals
  gila monster   no          no       no             yes        non-mammals
  platypus       no          no       no             yes        mammals
  owl            no          yes      no             yes        non-mammals
  dolphin        yes         no       yes            no         mammals
  eagle          no          yes      no             yes        non-mammals

A: attributes, M: mammals, N: non-mammals

Test record:
  Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no, Class = ?

  P(A | M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
  P(A | N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042

  P(A | M) P(M) = 0.06 × 7/20 = 0.021
  P(A | N) P(N) = 0.0042 × 13/20 = 0.0027

P(A | M) P(M) > P(A | N) P(N) => Mammals
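
The same comparison in code, with the counts read off the table:

# Naive Bayes for the test record (Give Birth = yes, Can Fly = no,
# Live in Water = yes, Have Legs = no); counts taken from the table.
p_m, p_n = 7 / 20, 13 / 20
p_a_m = (6/7) * (6/7) * (2/7) * (2/7)        # P(A | M) ≈ 0.06
p_a_n = (1/13) * (10/13) * (3/13) * (4/13)   # P(A | N) ≈ 0.0042
print("mammals" if p_a_m * p_m > p_a_n * p_n else "non-mammals")  # mammals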


Naïve Bayes (Summary)

● Robust to isolated noise points

● Handles missing values by ignoring the instance during probability estimate calculations

● Robust to irrelevant attributes

● Independence assumption may not hold for some attributes
  – Use other techniques such as Bayesian Belief Networks (BBN)

Naïve Bayes

● How does Naïve Bayes perform on the following dataset?

[dataset figure]

Conditional independence of attributes is violated


Naïve Bayes

● How does Naïve Bayes perform on the following dataset?

[dataset figure]

Naïve Bayes can construct oblique decision boundaries



Naïve Bayes

● How does Naïve Bayes perform on the following dataset?

          X = 1   X = 2   X = 3   X = 4
  Y = 1     1       1       1       0
  Y = 2     0       1       0       0
  Y = 3     0       0       1       1
  Y = 4     0       0       1       1

Conditional independence of attributes is violated


Bayesian Belief Networks

● Provides graphical representation of probabilistic relationships among a set of random variables

● Consists of:
  – A directed acyclic graph (DAG)
    ◆ Node corresponds to a variable
    ◆ Arc corresponds to dependence relationship between a pair of variables
    (Diagram: a small DAG in which nodes A and B each have an arc into node C.)

  – A probability table associating each node with its immediate parents



Conditional Independence

(Diagram: D → C, with C → A and C → B.)

  D is parent of C
  A is child of C
  B is descendant of D
  D is ancestor of A

● A node in a Bayesian network is conditionally independent of all of its nondescendants, if its parents are known

Conditional Independence

● Naïve Bayes assumption:

  (Diagram: class node Y with an arc to each attribute node X1, X2, X3, X4, …, Xd.)



Probability Tables

● If X does not have any parents, table contains prior probability P(X)

● If X has only one parent (Y), table contains conditional probability P(X | Y)
  (Diagram: Y → X.)

● If X has multiple parents (Y1, Y2, …, Yk), table contains conditional probability P(X | Y1, Y2, …, Yk)


Example of Bayesian Belief Network

(Network: Exercise and Diet are parents of Heart Disease; Heart Disease is the parent of Chest Pain and Blood Pressure.)

  Exercise = Yes  0.7        Diet = Healthy    0.25
  Exercise = No   0.3        Diet = Unhealthy  0.75

  Heart Disease:
           E=Yes       E=Yes         E=No        E=No
           D=Healthy   D=Unhealthy   D=Healthy   D=Unhealthy
  HD=Yes   0.25        0.45          0.55        0.75
  HD=No    0.75        0.55          0.45        0.25

  Chest Pain:               Blood Pressure:
           HD=Yes  HD=No             HD=Yes  HD=No
  CP=Yes   0.8     0.01     BP=High  0.85    0.2
  CP=No    0.2     0.99     BP=Low   0.15    0.8



Example of Inferencing using BBN

● Given: X = (E = No, D = Healthy, CP = Yes, BP = High)
  – Compute P(HD | E, D, CP, BP)?

● P(HD = Yes | E = No, D = Healthy) = 0.55
  P(CP = Yes | HD = Yes) = 0.8
  P(BP = High | HD = Yes) = 0.85
  – P(HD = Yes | E = No, D = Healthy, CP = Yes, BP = High)
    ∝ 0.55 × 0.8 × 0.85 = 0.374

● P(HD = No | E = No, D = Healthy) = 0.45
  P(CP = Yes | HD = No) = 0.01
  P(BP = High | HD = No) = 0.2
  – P(HD = No | E = No, D = Healthy, CP = Yes, BP = High)
    ∝ 0.45 × 0.01 × 0.2 = 0.0009

=> Classify X as Yes
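
A small sketch of this inference; because CP and BP depend on the other variables only through HD, the posterior is proportional to the three-factor product on the slide:

# P(HD | E=No, D=Healthy, CP=Yes, BP=High)
#   ∝ P(HD | E=No, D=Healthy) * P(CP=Yes | HD) * P(BP=High | HD)
p_hd = {"Yes": 0.55, "No": 0.45}   # P(HD | E=No, D=Healthy)
p_cp = {"Yes": 0.80, "No": 0.01}   # P(CP=Yes | HD)
p_bp = {"Yes": 0.85, "No": 0.20}   # P(BP=High | HD)

scores = {hd: p_hd[hd] * p_cp[hd] * p_bp[hd] for hd in ("Yes", "No")}
total = sum(scores.values())       # normalizing constant
print({hd: round(s / total, 4) for hd, s in scores.items()})
# {'Yes': 0.9976, 'No': 0.0024} -> classify X as HD = Yes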

