CS3491-AI ML-Chapter 3

CS3491-ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING


INTRODUCTION TO MACHINE LEARNING

CHAPTER 3: Bayesian Decision Theory
Probability and Inference
• Result of tossing a coin is in {Heads, Tails}
• Random variable X ∈ {1, 0}
• Bernoulli: $P\{X = x\} = p_o^{x}\,(1 - p_o)^{1 - x}$
• Sample: $\mathcal{X} = \{x^t\}_{t=1}^{N}$
• Estimation: $\hat{p}_o = \dfrac{\#\{\text{Heads}\}}{\#\{\text{Tosses}\}} = \dfrac{\sum_t x^t}{N}$
• Prediction of next toss: Heads if $\hat{p}_o > 1/2$, Tails otherwise
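A minimal Python sketch of this estimate and prediction, using an invented sample of tosses:

```python
# Estimate the Bernoulli parameter p_o from a sample of coin tosses
# and predict the next toss (illustrative data, not from the slides).
tosses = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]    # 1 = Heads, 0 = Tails

p_hat = sum(tosses) / len(tosses)          # p_o estimate = #{Heads} / #{Tosses}
prediction = "Heads" if p_hat > 0.5 else "Tails"

print(f"p_o estimate = {p_hat:.2f}, predict: {prediction}")
```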
Classification
• Credit scoring: inputs are income and savings; output is low-risk vs. high-risk
• Input: $x = [x_1, x_2]^T$, Output: $C \in \{0, 1\}$
• Prediction:
$$\text{choose } \begin{cases} C = 1 & \text{if } P(C = 1 \mid x_1, x_2) > 0.5 \\ C = 0 & \text{otherwise} \end{cases}$$
or equivalently
$$\text{choose } \begin{cases} C = 1 & \text{if } P(C = 1 \mid x_1, x_2) > P(C = 0 \mid x_1, x_2) \\ C = 0 & \text{otherwise} \end{cases}$$
Bayes’ Rule
$$\underbrace{P(C \mid x)}_{\text{posterior}} = \frac{\overbrace{P(C)}^{\text{prior}}\;\overbrace{p(x \mid C)}^{\text{likelihood}}}{\underbrace{p(x)}_{\text{evidence}}}$$

where
$$P(C = 0) + P(C = 1) = 1$$
$$p(x) = p(x \mid C = 1)\,P(C = 1) + p(x \mid C = 0)\,P(C = 0)$$
$$P(C = 0 \mid x) + P(C = 1 \mid x) = 1$$
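A minimal sketch of Bayes' rule for two classes, with invented prior and likelihood values, including the 0.5-threshold decision from the previous slide:

```python
# Two-class Bayes' rule: posterior = prior * likelihood / evidence.
# Priors and likelihood values below are made-up numbers for illustration.
prior = {0: 0.6, 1: 0.4}                 # P(C=0), P(C=1)
likelihood = {0: 0.05, 1: 0.20}          # p(x | C=0), p(x | C=1) at some x

evidence = sum(likelihood[c] * prior[c] for c in (0, 1))           # p(x)
posterior = {c: likelihood[c] * prior[c] / evidence for c in (0, 1)}

# Choose C=1 if P(C=1 | x) > 0.5, i.e. if it beats P(C=0 | x).
chosen = 1 if posterior[1] > 0.5 else 0
print(posterior, "-> choose C =", chosen)
```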
Bayes’ Rule: K>2 Classes
p x | C i P C i 
P C i | x 
p x
p x | C i P C i 
 K
 p x |C k P C k 
k 1

K
P C i  0 and  P C i  1
i 1

choose C i if P C i | x maxk P C k | x

6
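The same computation for K classes, again with illustrative numbers:

```python
# Posteriors for K classes by normalizing prior * likelihood, then argmax.
# Priors and likelihoods are illustrative values, not from the slides.
priors = [0.5, 0.3, 0.2]                 # P(C_k), k = 1..K
likelihoods = [0.10, 0.30, 0.05]         # p(x | C_k) at some input x

joint = [p * l for p, l in zip(priors, likelihoods)]
evidence = sum(joint)                    # p(x) = sum_k p(x | C_k) P(C_k)
posteriors = [j / evidence for j in joint]

chosen = max(range(len(posteriors)), key=lambda k: posteriors[k])
print(posteriors, "-> choose class", chosen + 1)
```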
Losses and Risks
• Actions: $\alpha_i$
• Loss of $\alpha_i$ when the state is $C_k$: $\lambda_{ik}$
• Expected risk (Duda and Hart, 1973):
$$R(\alpha_i \mid x) = \sum_{k=1}^{K} \lambda_{ik}\,P(C_k \mid x)$$
choose $\alpha_i$ if $R(\alpha_i \mid x) = \min_k R(\alpha_k \mid x)$
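A sketch of risk minimization with a general (made-up) loss matrix and posterior:

```python
# Expected risk R(alpha_i | x) = sum_k lambda_ik * P(C_k | x);
# the loss matrix and posterior vector are illustrative values.
loss = [[0.0, 1.0, 5.0],     # lambda_ik: rows = actions, cols = true classes
        [2.0, 0.0, 1.0],
        [1.0, 3.0, 0.0]]
posterior = [0.2, 0.5, 0.3]  # P(C_k | x)

risks = [sum(l * p for l, p in zip(row, posterior)) for row in loss]
best = min(range(len(risks)), key=lambda i: risks[i])
print(risks, "-> choose action", best + 1)
```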
Losses and Risks: 0/1 Loss
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ 1 & \text{if } i \ne k \end{cases}$$

$$R(\alpha_i \mid x) = \sum_{k=1}^{K} \lambda_{ik}\,P(C_k \mid x) = \sum_{k \ne i} P(C_k \mid x) = 1 - P(C_i \mid x)$$

For minimum risk, choose the most probable class.
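A quick numerical check of this reduction, with an invented posterior vector:

```python
# With 0/1 loss, the expected risk of action i is 1 - P(C_i | x),
# so minimizing risk is the same as picking the most probable class.
posterior = [0.2, 0.5, 0.3]                      # made-up P(C_k | x)

risks = [1.0 - p for p in posterior]             # R(alpha_i | x) under 0/1 loss
assert min(range(3), key=lambda i: risks[i]) == max(range(3), key=lambda i: posterior[i])
```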
Losses and Risks: Reject
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ \lambda & \text{if } i = K + 1,\; 0 < \lambda < 1 \\ 1 & \text{otherwise} \end{cases}$$

$$R(\alpha_{K+1} \mid x) = \lambda \sum_{k=1}^{K} P(C_k \mid x) = \lambda$$

$$R(\alpha_i \mid x) = \sum_{k \ne i} P(C_k \mid x) = 1 - P(C_i \mid x)$$

choose $C_i$ if $P(C_i \mid x) > P(C_k \mid x)$ for all $k \ne i$ and $P(C_i \mid x) > 1 - \lambda$;
reject otherwise
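A sketch of the resulting decision rule with a reject option (lambda and the posteriors are illustrative):

```python
# Decision with a reject option: accept the most probable class only if its
# posterior exceeds 1 - lambda, otherwise reject. Values are illustrative.
def decide_with_reject(posterior, lam=0.3):
    best = max(range(len(posterior)), key=lambda k: posterior[k])
    if posterior[best] > 1.0 - lam:
        return best          # choose class C_best
    return "reject"          # too uncertain: pay the reject cost lambda instead

print(decide_with_reject([0.85, 0.10, 0.05]))   # confident -> class 0
print(decide_with_reject([0.40, 0.35, 0.25]))   # ambiguous -> 'reject'
```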
Discriminant Functions
choose $C_i$ if $g_i(x) = \max_k g_k(x)$, where $g_i(x)$, $i = 1, \ldots, K$, can be

$$g_i(x) = \begin{cases} -R(\alpha_i \mid x) \\ P(C_i \mid x) \\ p(x \mid C_i)\,P(C_i) \end{cases}$$

K decision regions $\mathcal{R}_1, \ldots, \mathcal{R}_K$:

$$\mathcal{R}_i = \{\,x \mid g_i(x) = \max_k g_k(x)\,\}$$
K=2 Classes
• Dichotomizer (K = 2) vs. polychotomizer (K > 2)
• $g(x) = g_1(x) - g_2(x)$
$$\text{choose } \begin{cases} C_1 & \text{if } g(x) > 0 \\ C_2 & \text{otherwise} \end{cases}$$
• Log odds: $\log \dfrac{P(C_1 \mid x)}{P(C_2 \mid x)}$
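A minimal sketch of a dichotomizer based on the sign of the log odds (the posterior passed in is illustrative):

```python
# Dichotomizer: a single discriminant g(x) = log P(C1|x) - log P(C2|x);
# choose C1 when g(x) > 0. The posterior value is made up for illustration.
import math

def dichotomize(p_c1_given_x):
    p_c2_given_x = 1.0 - p_c1_given_x
    g = math.log(p_c1_given_x) - math.log(p_c2_given_x)   # log odds
    return "C1" if g > 0 else "C2"

print(dichotomize(0.7))   # C1
print(dichotomize(0.3))   # C2
```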
Utility Theory
• Probability of state $S_k$ given evidence x: $P(S_k \mid x)$
• Utility of $\alpha_i$ when the state is $S_k$: $U_{ik}$
• Expected utility:
$$EU(\alpha_i \mid x) = \sum_k U_{ik}\,P(S_k \mid x)$$
Choose $\alpha_i$ if $EU(\alpha_i \mid x) = \max_j EU(\alpha_j \mid x)$
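A sketch of choosing the action with maximum expected utility (the utility matrix and state probabilities are made up):

```python
# Expected utility EU(alpha_i | x) = sum_k U_ik * P(S_k | x); pick the action
# with the highest expected utility. Utilities and posteriors are illustrative.
utility = [[100, -20],      # U_ik: rows = actions, cols = states S_k
           [ 10,  10]]
p_state = [0.3, 0.7]        # P(S_k | x)

eu = [sum(u * p for u, p in zip(row, p_state)) for row in utility]
best = max(range(len(eu)), key=lambda i: eu[i])
print(eu, "-> choose action", best + 1)
```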
Value of Information
• Expected utility using x only:
$$EU(x) = \max_i \sum_k U_{ik}\,P(S_k \mid x)$$
• Expected utility using x and a new feature z:
$$EU(x, z) = \max_i \sum_k U_{ik}\,P(S_k \mid x, z)$$
• z is useful if $EU(x, z) > EU(x)$
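Continuing the utility sketch above: compare the best expected utility before and after observing z (the sharper posterior after z is, of course, invented):

```python
# Value of information: compare the best achievable expected utility with x
# alone against the one after also observing z. All numbers are illustrative.
utility = [[100, -20], [10, 10]]          # U_ik

def best_eu(p_state):
    return max(sum(u * p for u, p in zip(row, p_state)) for row in utility)

eu_x  = best_eu([0.5, 0.5])               # P(S_k | x): still very uncertain
eu_xz = best_eu([0.9, 0.1])               # P(S_k | x, z): z sharpened the belief
print(eu_x, eu_xz, "-> z is useful" if eu_xz > eu_x else "-> z adds nothing")
```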
Bayesian Networks
• Also known as graphical models or probabilistic networks
• Nodes are hypotheses (random variables), and the probability of a node corresponds to our belief in the truth of the hypothesis
• Arcs are direct influences between hypotheses
• The structure is represented as a directed acyclic graph (DAG)
• The parameters are the conditional probabilities on the arcs
• (Pearl, 1988, 2000; Jensen, 1996; Lauritzen, 1996)
Causes and Bayes’ Rule

[Figure: a single causal arc from rain (R) to wet grass (W); causal inference follows the arc, diagnostic inference runs against it.]

Diagnostic inference: knowing that the grass is wet, what is the probability that rain is the cause?

$$P(R \mid W) = \frac{P(W \mid R)\,P(R)}{P(W)} = \frac{P(W \mid R)\,P(R)}{P(W \mid R)\,P(R) + P(W \mid \lnot R)\,P(\lnot R)} = \frac{0.9 \times 0.4}{0.9 \times 0.4 + 0.2 \times 0.6} = 0.75$$
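A short check of this diagnostic computation in Python, using the numbers from the slide:

```python
# Diagnostic inference in the rain -> wet grass network:
# P(R | W) = P(W | R) P(R) / (P(W | R) P(R) + P(W | ~R) P(~R)).
P_R = 0.4
P_W_given_R, P_W_given_notR = 0.9, 0.2

P_W = P_W_given_R * P_R + P_W_given_notR * (1 - P_R)
P_R_given_W = P_W_given_R * P_R / P_W
print(round(P_R_given_W, 2))   # 0.75
```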
Causal vs Diagnostic Inference
Causal inference: if the sprinkler is on, what is the probability that the grass is wet?

$$P(W \mid S) = P(W \mid R, S)\,P(R \mid S) + P(W \mid \lnot R, S)\,P(\lnot R \mid S)$$
$$= P(W \mid R, S)\,P(R) + P(W \mid \lnot R, S)\,P(\lnot R) = 0.95 \times 0.4 + 0.90 \times 0.6 = 0.92$$

(The second step uses the fact that R and S are independent in this network.)

Diagnostic inference: if the grass is wet, what is the probability that the sprinkler is on?

$$P(S \mid W) = 0.35 > 0.2 = P(S)$$
$$P(S \mid R, W) = 0.21$$

Explaining away: knowing that it has rained decreases the probability that the sprinkler is on.
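A sketch that reproduces these numbers by enumerating the joint. P(R) = 0.4 and P(S) = 0.2 appear on the slides; the entries P(W | R, ~S) = 0.90 and P(W | ~R, ~S) = 0.10 are assumptions chosen so that the quoted 0.92, 0.35, and 0.21 come out:

```python
# Sprinkler/rain network: R and S are independent parents of W.
# P(R)=0.4 and P(S)=0.2 are from the slides; P(W|R,~S)=0.90 and
# P(W|~R,~S)=0.10 are assumed entries that reproduce the quoted results.
from itertools import product

P_R, P_S = 0.4, 0.2
P_W = {(True, True): 0.95, (True, False): 0.90,    # P(W=1 | R, S)
       (False, True): 0.90, (False, False): 0.10}

def joint(r, s, w):
    pr = P_R if r else 1 - P_R
    ps = P_S if s else 1 - P_S
    pw = P_W[(r, s)] if w else 1 - P_W[(r, s)]
    return pr * ps * pw

p_w = sum(joint(r, s, True) for r, s in product([True, False], repeat=2))
p_w_given_s = sum(joint(r, True, True) for r in (True, False)) / P_S      # causal
p_s_given_w = sum(joint(r, True, True) for r in (True, False)) / p_w      # diagnostic
p_s_given_rw = joint(True, True, True) / sum(joint(True, s, True) for s in (True, False))
print(round(p_w_given_s, 2), round(p_s_given_w, 2), round(p_s_given_rw, 2))  # 0.92 0.35 0.21
```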
Bayesian Networks: Causes
Causal inference:

$$P(W \mid C) = P(W \mid R, S)\,P(R, S \mid C) + P(W \mid \lnot R, S)\,P(\lnot R, S \mid C) + P(W \mid R, \lnot S)\,P(R, \lnot S \mid C) + P(W \mid \lnot R, \lnot S)\,P(\lnot R, \lnot S \mid C)$$

using the fact that, given C, R and S are independent:

$$P(R, S \mid C) = P(R \mid C)\,P(S \mid C)$$

Diagnostic: $P(C \mid W) = ?$
Bayesian Nets: Local structure

$P(F \mid C) = ?$

$$P(C, S, R, W, F) = P(C)\,P(S \mid C)\,P(R \mid C)\,P(W \mid S, R)\,P(F \mid R)$$

$$P(X_1, \ldots, X_d) = \prod_{i=1}^{d} P(X_i \mid \mathrm{parents}(X_i))$$
Bayesian Networks: Inference
$$P(C, S, R, W, F) = P(C)\,P(S \mid C)\,P(R \mid C)\,P(W \mid R, S)\,P(F \mid R)$$

$$P(C, F) = \sum_S \sum_R \sum_W P(C, S, R, W, F)$$

$$P(F \mid C) = P(C, F)\,/\,P(C)$$   Not efficient!

Belief propagation (Pearl, 1988)
Junction trees (Lauritzen and Spiegelhalter, 1988)
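A brute-force sketch of this (inefficient) enumeration. The network structure matches the factorization above, but every conditional-probability value below is a placeholder, since the slide gives none; only the shape of the computation matters:

```python
# Compute P(F=1 | C=1) by summing the factored joint over S, R, W.
# All conditional-probability values here are placeholders, not slide data.
from itertools import product

P_C = 0.5                                 # P(C=1)
P_S = {True: 0.1, False: 0.5}             # P(S=1 | C)
P_R = {True: 0.8, False: 0.1}             # P(R=1 | C)
P_W = {(True, True): 0.95, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.10}   # P(W=1 | R, S)
P_F = {True: 0.7, False: 0.05}            # P(F=1 | R)

def joint(c, s, r, w, f):
    def b(p, v):                          # P(var = v) given P(var = 1) = p
        return p if v else 1 - p
    return (b(P_C, c) * b(P_S[c], s) * b(P_R[c], r)
            * b(P_W[(r, s)], w) * b(P_F[r], f))

p_cf = sum(joint(True, s, r, w, True) for s, r, w in product([True, False], repeat=3))
print("P(F=1 | C=1) =", round(p_cf / P_C, 3))
```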
Bayesian Networks: Classification

Bayes' rule inverts the arc (diagnostic direction):

$$P(C \mid x) = \frac{p(x \mid C)\,P(C)}{p(x)}$$
Naive Bayes’ Classifier

Given C, the inputs $x_j$ are independent:

$$p(x \mid C) = p(x_1 \mid C)\,p(x_2 \mid C) \cdots p(x_d \mid C)$$
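A minimal naive Bayes sketch with made-up priors and per-feature likelihoods for two classes:

```python
# Naive Bayes: multiply per-feature likelihoods p(x_j | C) with the prior,
# then pick the class with the largest product. All numbers are illustrative.
import math

priors = {"low-risk": 0.6, "high-risk": 0.4}
# p(x_j | C) for an observed x with d = 2 features (made-up values):
likelihoods = {"low-risk": [0.5, 0.7], "high-risk": [0.2, 0.1]}

def log_score(c):
    # work in log space: log P(C) + sum_j log p(x_j | C)
    return math.log(priors[c]) + sum(math.log(p) for p in likelihoods[c])

print(max(priors, key=log_score))   # 'low-risk'
```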
Influence Diagrams
[Figure: an influence diagram with a decision node, chance nodes, and a utility node.]
Association Rules
• Association rule: X → Y
• Support (X → Y):
$$P(X, Y) = \frac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers}\}}$$
• Confidence (X → Y):
$$P(Y \mid X) = \frac{P(X, Y)}{P(X)} = \frac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers who bought } X\}}$$
• Apriori algorithm (Agrawal et al., 1996)
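A toy computation of support and confidence over an invented basket data set:

```python
# Support and confidence of a rule X -> Y over a toy basket data set
# (the transactions below are made up for illustration).
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
           {"milk"}, {"bread", "milk"}]

X, Y = {"bread"}, {"milk"}
n = len(baskets)
n_x = sum(1 for b in baskets if X <= b)            # baskets containing X
n_xy = sum(1 for b in baskets if (X | Y) <= b)     # baskets containing X and Y

support = n_xy / n          # P(X, Y)
confidence = n_xy / n_x     # P(Y | X)
print(f"support = {support:.2f}, confidence = {confidence:.2f}")
```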
