CS3491-AI ML-Chapter 3
Machine Learning
CHAPTER 3: Bayesian Decision Theory
Probability and Inference
Result of tossing a coin is in {Heads, Tails}
Random var X ∈ {1, 0}
Bernoulli: $P\{X = 1\} = p_o^X (1 - p_o)^{1 - X}$
Sample: $\mathcal{X} = \{x^t\}_{t=1}^{N}$
Estimation: $p_o = \#\{\text{Heads}\} / \#\{\text{Tosses}\} = \sum_t x^t / N$
Prediction of next toss: Heads if $p_o > 1/2$, Tails otherwise
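A minimal sketch of this estimate and prediction in Python; the sample of tosses below is made up for illustration:

    # Estimate the Bernoulli parameter p_o from a sample of tosses (1 = Heads, 0 = Tails)
    # and predict the next toss. The sample is an illustrative placeholder.
    sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

    p_o = sum(sample) / len(sample)            # p_o = #{Heads} / #{Tosses}
    prediction = "Heads" if p_o > 0.5 else "Tails"
    print(f"p_o = {p_o:.2f}, predict {prediction}")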
Classification
Credit scoring: Inputs are income and savings.
Output is low-risk vs high-risk
Input: $\mathbf{x} = [x_1, x_2]^T$, Output: $C \in \{0, 1\}$
Prediction:
choose $C = 1$ if $P(C = 1 \mid x_1, x_2) > 0.5$, $C = 0$ otherwise
or equivalently
choose $C = 1$ if $P(C = 1 \mid x_1, x_2) > P(C = 0 \mid x_1, x_2)$, $C = 0$ otherwise
Bayes’ Rule
$$\underbrace{P(C \mid \mathbf{x})}_{\text{posterior}} = \frac{\overbrace{P(C)}^{\text{prior}}\ \overbrace{p(\mathbf{x} \mid C)}^{\text{likelihood}}}{\underbrace{p(\mathbf{x})}_{\text{evidence}}}$$
$P(C = 0) + P(C = 1) = 1$
$p(\mathbf{x}) = p(\mathbf{x} \mid C = 1)\, P(C = 1) + p(\mathbf{x} \mid C = 0)\, P(C = 0)$
$P(C = 0 \mid \mathbf{x}) + P(C = 1 \mid \mathbf{x}) = 1$
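A small numeric sketch of the two-class rule; the prior and likelihood values are made-up placeholders, not figures from the credit-scoring example:

    # Two-class Bayes' rule at a single observed input x.
    # Priors and likelihood values are illustrative placeholders.
    prior = {0: 0.6, 1: 0.4}              # P(C=0), P(C=1)
    likelihood = {0: 0.05, 1: 0.20}       # p(x | C=0), p(x | C=1)

    evidence = sum(likelihood[c] * prior[c] for c in (0, 1))             # p(x)
    posterior = {c: likelihood[c] * prior[c] / evidence for c in (0, 1)}

    chosen = 1 if posterior[1] > 0.5 else 0   # same as posterior[1] > posterior[0]
    print(posterior, "-> choose C =", chosen)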
Bayes’ Rule: K>2 Classes
$$P(C_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_i)\, P(C_i)}{p(\mathbf{x})} = \frac{p(\mathbf{x} \mid C_i)\, P(C_i)}{\sum_{k=1}^{K} p(\mathbf{x} \mid C_k)\, P(C_k)}$$
$P(C_i) \geq 0$ and $\sum_{i=1}^{K} P(C_i) = 1$
choose $C_i$ if $P(C_i \mid \mathbf{x}) = \max_k P(C_k \mid \mathbf{x})$
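The same idea for K classes, as a sketch — compute every posterior and take the argmax (the numbers are placeholders):

    # K-class Bayes' rule: normalize likelihood * prior, then pick the largest posterior.
    priors = [0.5, 0.3, 0.2]              # P(C_1), ..., P(C_K)
    likelihoods = [0.10, 0.40, 0.05]      # p(x | C_1), ..., p(x | C_K)

    evidence = sum(l * p for l, p in zip(likelihoods, priors))           # p(x)
    posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]

    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    print(posteriors, "-> choose C_%d" % (best + 1))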
Losses and Risks
Actions: αi
Loss of αi when the state is Ck : λik
Expected risk (Duda and Hart, 1973)
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid \mathbf{x})$$
choose $\alpha_i$ if $R(\alpha_i \mid \mathbf{x}) = \min_k R(\alpha_k \mid \mathbf{x})$
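A sketch of the expected-risk computation for a general loss matrix; the loss values and posteriors below are illustrative placeholders:

    # Expected risk R(a_i | x) = sum_k lambda_ik * P(C_k | x); choose the minimum-risk action.
    # Loss matrix and posteriors are illustrative placeholders.
    loss = [[0.0, 2.0, 5.0],
            [1.0, 0.0, 1.0],
            [3.0, 2.0, 0.0]]              # loss[i][k] = lambda_ik
    posteriors = [0.5, 0.3, 0.2]          # P(C_k | x)

    risks = [sum(loss[i][k] * posteriors[k] for k in range(len(posteriors)))
             for i in range(len(loss))]
    best = min(range(len(risks)), key=lambda i: risks[i])
    print(risks, "-> choose action a_%d" % (best + 1))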
Losses and Risks: 0/1 Loss
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ 1 & \text{if } i \neq k \end{cases}$$
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid \mathbf{x}) = \sum_{k \neq i} P(C_k \mid \mathbf{x}) = 1 - P(C_i \mid \mathbf{x})$$
With 0/1 loss, minimizing the risk is equivalent to choosing the class with the highest posterior $P(C_i \mid \mathbf{x})$.
Losses and Risks: Reject
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ \lambda & \text{if } i = K + 1,\quad 0 < \lambda < 1 \\ 1 & \text{otherwise} \end{cases}$$
$$R(\alpha_{K+1} \mid \mathbf{x}) = \lambda \sum_{k=1}^{K} P(C_k \mid \mathbf{x}) = \lambda$$
$$R(\alpha_i \mid \mathbf{x}) = \sum_{k \neq i} P(C_k \mid \mathbf{x}) = 1 - P(C_i \mid \mathbf{x})$$
choose $C_i$ if $P(C_i \mid \mathbf{x}) > P(C_k \mid \mathbf{x})\ \forall k \neq i$ and $P(C_i \mid \mathbf{x}) > 1 - \lambda$
reject otherwise
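A sketch of the reject rule in code; the posteriors and the reject cost λ are placeholders:

    # Classify only when the best posterior exceeds 1 - lambda; otherwise reject.
    # Posteriors and reject cost are illustrative placeholders.
    posteriors = [0.45, 0.35, 0.20]       # P(C_1 | x), ..., P(C_K | x)
    lam = 0.3                             # reject cost, 0 < lambda < 1

    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    decision = "choose C_%d" % (best + 1) if posteriors[best] > 1 - lam else "reject"
    print(decision)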
Discriminant Functions
choose $C_i$ if $g_i(\mathbf{x}) = \max_k g_k(\mathbf{x})$, where $g_i(\mathbf{x})$, $i = 1, \ldots, K$, can be
$$g_i(\mathbf{x}) = \begin{cases} -R(\alpha_i \mid \mathbf{x}) \\ P(C_i \mid \mathbf{x}) \\ p(\mathbf{x} \mid C_i)\, P(C_i) \end{cases}$$
K=2 Classes
Dichotomizer (K=2) vs Polychotomizer (K>2)
$g(\mathbf{x}) = g_1(\mathbf{x}) - g_2(\mathbf{x})$
choose $C_1$ if $g(\mathbf{x}) > 0$, $C_2$ otherwise
Log odds: $\log \dfrac{P(C_1 \mid \mathbf{x})}{P(C_2 \mid \mathbf{x})}$
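As a sketch, the dichotomizer can be written directly in terms of log odds (the posterior values are placeholders):

    # Dichotomizer via log odds: g(x) = log P(C_1 | x) - log P(C_2 | x).
    # Posterior values are illustrative placeholders.
    from math import log

    p_c1, p_c2 = 0.7, 0.3                 # P(C_1 | x), P(C_2 | x)
    g = log(p_c1) - log(p_c2)             # log odds
    print("choose C_1" if g > 0 else "choose C_2")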
Utility Theory
Prob of state k given evidence x: $P(S_k \mid \mathbf{x})$
Utility of $\alpha_i$ when the state is $S_k$: $U_{ik}$
Expected utility:
$$EU(\alpha_i \mid \mathbf{x}) = \sum_k U_{ik}\, P(S_k \mid \mathbf{x})$$
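A small sketch of picking the action with the highest expected utility; the utility matrix and state posteriors are placeholders:

    # EU(a_i | x) = sum_k U_ik * P(S_k | x); choose the action that maximizes it.
    # Utilities and state posteriors are illustrative placeholders.
    U = [[10, -5],                        # utilities of a_1 in states S_1, S_2
         [ 0,  2]]                        # utilities of a_2 in states S_1, S_2
    p_state = [0.3, 0.7]                  # P(S_1 | x), P(S_2 | x)

    EU = [sum(U[i][k] * p_state[k] for k in range(len(p_state))) for i in range(len(U))]
    best = max(range(len(EU)), key=lambda i: EU[i])
    print(EU, "-> choose action a_%d" % (best + 1))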
Value of Information
Expected utility using x only: $EU(\mathbf{x}) = \max_i EU(\alpha_i \mid \mathbf{x})$; gathering a new piece of information z is worthwhile if the expected utility using both x and z exceeds $EU(\mathbf{x})$.
Bayesian Networks
Aka graphical models, probabilistic networks
Nodes are hypotheses (random vars) and the prob corresponds to our belief in the truth of the hypothesis
Arcs are direct influences between hypotheses
The structure is represented as a directed acyclic graph (DAG)
The parameters are the conditional probs in the arcs
(Pearl, 1988, 2000; Jensen, 1996; Lauritzen, 1996)
Causes and Bayes’ Rule
Diagnostic inference: Knowing that the grass is wet, what is the probability that rain is the cause?
(Figure: rain → wet grass is the causal direction; inferring rain from wet grass is the diagnostic direction.)
$$P(R \mid W) = \frac{P(W \mid R)\, P(R)}{P(W)} = \frac{P(W \mid R)\, P(R)}{P(W \mid R)\, P(R) + P(W \mid \neg R)\, P(\neg R)} = \frac{0.9 \times 0.4}{0.9 \times 0.4 + 0.2 \times 0.6} = 0.75$$
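The same computation as a runnable check, using the probabilities given above:

    # Diagnostic inference P(R | W) for the wet-grass example.
    p_r = 0.4                # P(R)
    p_w_given_r = 0.9        # P(W | R)
    p_w_given_not_r = 0.2    # P(W | ~R)

    p_w = p_w_given_r * p_r + p_w_given_not_r * (1 - p_r)    # P(W)
    print(p_w_given_r * p_r / p_w)                            # Bayes' rule -> 0.75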
Causal vs Diagnostic Inference
Causal inference: If the sprinkler is on, what is the probability that the grass is wet?
Diagnostic: $P(C \mid W) = ?$
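One way to make the causal question concrete is to sum out the other parent of the wet-grass node; a sketch of the expansion (the conditional $P(W \mid S, R)$ appears in the factorization under "Bayesian Nets: Local structure" below):
$$P(W \mid S) = P(W \mid R, S)\, P(R \mid S) + P(W \mid \neg R, S)\, P(\neg R \mid S)$$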
Bayesian Nets: Local structure
P (F | C) = ?
$$P(C, S, R, W, F) = P(C)\, P(S \mid C)\, P(R \mid C)\, P(W \mid S, R)\, P(F \mid R)$$
$$P(X_1, \ldots, X_d) = \prod_{i=1}^{d} P(X_i \mid \mathrm{parents}(X_i))$$
Bayesian Networks: Inference
$P(C, S, R, W, F) = P(C)\, P(S \mid C)\, P(R \mid C)\, P(W \mid R, S)\, P(F \mid R)$
$P(C, F) = \sum_S \sum_R \sum_W P(C, S, R, W, F)$
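A minimal sketch of this inference by brute-force enumeration; all CPT numbers below are made-up placeholders, not values from the slides:

    # Exact inference by enumeration in the C, S, R, W, F network.
    # P(C,S,R,W,F) = P(C) P(S|C) P(R|C) P(W|S,R) P(F|R); all CPT values are placeholders.
    from itertools import product

    p_c = 0.5                                          # P(C=1)
    p_s_given_c = {True: 0.1, False: 0.5}              # P(S=1 | C)
    p_r_given_c = {True: 0.8, False: 0.1}              # P(R=1 | C)
    p_w_given_sr = {(True, True): 0.95, (True, False): 0.90,
                    (False, True): 0.90, (False, False): 0.10}   # P(W=1 | S, R)
    p_f_given_r = {True: 0.7, False: 0.05}             # P(F=1 | R)

    def bern(p, value):
        # probability that a binary variable with P(var=1) = p takes `value`
        return p if value else 1.0 - p

    def joint(c, s, r, w, f):
        return (bern(p_c, c) * bern(p_s_given_c[c], s) * bern(p_r_given_c[c], r) *
                bern(p_w_given_sr[(s, r)], w) * bern(p_f_given_r[r], f))

    # P(C=1, F=1) = sum over S, R, W of the joint
    p_cf = sum(joint(True, s, r, w, True) for s, r, w in product([True, False], repeat=3))
    print(p_cf)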
Bayesian Networks: Classification
Naive Bayes’ Classifier
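The naive Bayes classifier assumes the inputs are independent given the class, so the class likelihood factorizes as $p(\mathbf{x} \mid C) = \prod_j p(x_j \mid C)$. A minimal sketch with made-up discrete likelihoods and priors:

    # Naive Bayes: score each class by prior * product of per-feature likelihoods.
    # All probabilities are illustrative placeholders.
    priors = {"low-risk": 0.6, "high-risk": 0.4}
    likelihoods = {                        # p(x_j = observed value | C) for two features
        "low-risk":  [0.7, 0.5],
        "high-risk": [0.2, 0.3],
    }

    scores = {}
    for c in priors:
        score = priors[c]
        for p_feature in likelihoods[c]:
            score *= p_feature
        scores[c] = score                  # proportional to P(C | x)

    print(max(scores, key=scores.get))     # -> low-risk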
Influence Diagrams
(Figure: influence diagram with a decision node.)
Association Rules
Association rule: X → Y
Support (X → Y): $P(X, Y) = \dfrac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers}\}}$
Confidence (X → Y): $P(Y \mid X) = \dfrac{P(X, Y)}{P(X)} = \dfrac{\#\{\text{customers who bought } X \text{ and } Y\}}{\#\{\text{customers who bought } X\}}$
Apriori algorithm (Agrawal et al., 1996)
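A quick sketch of computing support and confidence for a rule X → Y from a toy list of market baskets (the transactions are made up for the example):

    # Support and confidence of the rule X -> Y over illustrative transactions.
    transactions = [
        {"milk", "bread"},
        {"milk", "bread", "butter"},
        {"bread"},
        {"milk", "butter"},
        {"milk", "bread"},
    ]
    X, Y = {"milk"}, {"bread"}

    n_total = len(transactions)
    n_x = sum(1 for t in transactions if X <= t)          # bought X
    n_xy = sum(1 for t in transactions if (X | Y) <= t)   # bought X and Y

    support = n_xy / n_total       # P(X, Y) = 3/5
    confidence = n_xy / n_x        # P(Y | X) = 3/4
    print(support, confidence)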