AI Learning
LESSON 13-14
Reading
◦ Chapter 18
◦ Chapter 20
2. Statistical Learning
◦ Parameter Estimation:
◦ Maximum Likelihood (ML); Maximum A Posteriori (MAP); Bayesian; Continuous case
◦ Learning Parameters for a Bayesian Network
◦ Naive Bayes
◦ Maximum Likelihood estimates; Priors
◦ Learning Structure of Bayesian Networks
Types of feedback:
◦ Supervised learning: correct answers for each example
◦ Unsupervised learning: correct answers not given
◦ Reinforcement learning: occasional rewards
Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples.
$IG(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - \mathrm{Remainder}(A)$
Choose the attribute with the largest IG
Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root.
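A minimal sketch of this computation in Python; the counts are the standard AIMA restaurant example (6 positive and 6 negative examples; Patrons splits them into None, Some, Full):

```python
import math

def entropy(p, n):
    """I(p/(p+n), n/(p+n)): entropy of a set with p positive, n negative examples."""
    h = 0.0
    for count in (p, n):
        if count:
            q = count / (p + n)
            h -= q * math.log2(q)
    return h

def information_gain(p, n, splits):
    """IG(A) = I(p/(p+n), n/(p+n)) - Remainder(A), where splits lists
    the (p_i, n_i) counts for each value of attribute A."""
    remainder = sum((pi + ni) / (p + n) * entropy(pi, ni) for pi, ni in splits)
    return entropy(p, n) - remainder

# AIMA restaurant data: 6 positive, 6 negative examples overall; Patrons
# splits them into None (0+, 2-), Some (4+, 0-), and Full (2+, 4-).
print(information_gain(6, 6, [(0, 2), (4, 0), (2, 4)]))  # ~0.541 bits
```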
Substantially simpler than the "true" tree: a more complex hypothesis isn't justified by the small amount of data.
Coin Flip
[Figure: a bag of three coins C1, C2, C3; one coin, C2, is drawn]
P(H|C2) = 0.5
P(C2) = 1/3
Using Prior Knowledge
Should we always use a uniform prior?
Background knowledge:
◦ Heads => we have a take-home midterm
◦ Dan doesn't like take-homes…
◦ => Dan is more likely to use a coin biased in his favor
[Figure: the bag of coins redrawn to reflect the background-knowledge prior over C1, C2, C3]
P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9
Bayesian estimate: $P(H) = \sum_i P(H \mid C_i)\, P(C_i \mid \text{data}) = 0.680$
Comparison
After more experiments: HTH⁸ (H, T, then 8 more heads; 9 heads and 1 tail in total)
ML (Maximum Likelihood):
◦ P(H) = 0.5
◦ after 10 experiments: P(H) = 0.9
Bayesian:
◦ P(H) = 0.68
◦ after 10 experiments: P(H) = 0.9
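A minimal sketch of both estimators over the three-coin hypothesis space; the prior (0.05, 0.25, 0.70) over (C1, C2, C3) is an assumption, chosen because it reproduces the 0.680 estimate above:

```python
# Three-coin hypothesis space from the slides: P(H|C1)=0.1, P(H|C2)=0.5, P(H|C3)=0.9.
# The prior below is an assumption chosen to reproduce the 0.680 estimate.
biases = [0.1, 0.5, 0.9]
prior = [0.05, 0.25, 0.70]

def likelihood(bias, flips):
    """P(flips | coin with this bias); flips is a string of 'H'/'T'."""
    p = 1.0
    for f in flips:
        p *= bias if f == "H" else 1 - bias
    return p

def estimates(flips):
    likes = [likelihood(b, flips) for b in biases]
    # ML: the bias of the single most likely coin.
    ml = biases[max(range(3), key=lambda i: likes[i])]
    # Bayesian: average the biases, weighted by the posterior over coins.
    post = [l * p for l, p in zip(likes, prior)]
    z = sum(post)
    bayes = sum(b * w / z for b, w in zip(biases, post))
    return ml, bayes

print(estimates("HT"))          # ML = 0.5, Bayesian ~ 0.680
print(estimates("HTHHHHHHHH"))  # both ~ 0.9
```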
Prior Hypothesis
In general, we have to pick from a continuous distribution of biased coins.
Continuous Case
[Figure: two priors over the coin bias θ on [0, 1], one uniform and one shaped by background knowledge, and the resulting ML, MAP, and Bayesian estimates under each]
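The standard conjugate treatment of this continuous case puts a Beta prior on the bias θ; a minimal sketch, where Beta(1,1) is the uniform prior and Beta(5,5) is an assumed stand-in for the background-knowledge prior in the figure:

```python
# Beta-Bernoulli estimates for the coin's bias theta.
# Beta(1,1) is the uniform prior; Beta(5,5) stands in for "background
# knowledge" (the actual shape on the slide's plots is assumed here).
def estimates(a, b, heads, tails):
    ml = heads / (heads + tails)                          # ignores the prior entirely
    map_ = (a + heads - 1) / (a + b + heads + tails - 2)  # posterior mode
    bayes = (a + heads) / (a + b + heads + tails)         # posterior mean = predictive P(H)
    return ml, map_, bayes

print(estimates(1, 1, 9, 1))  # uniform prior: ML 0.9, MAP 0.9, Bayesian ~ 0.833
print(estimates(5, 5, 9, 1))  # stronger prior pulls MAP and Bayesian toward 0.5
```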
Bayes Rule: $P(H \mid E) = \dfrac{P(E \mid H)\,P(H)}{P(E)}$
1. $P(H \mid E) = \dfrac{P(H \wedge E)}{P(E)}$ (def. of conditional probability)
2. $P(E \mid H) = \dfrac{P(H \wedge E)}{P(H)}$ (def. of conditional probability)
3. $P(H \wedge E) = P(E \mid H)\,P(H)$ (multiply #2 by $P(H)$)
4. QED: $P(H \mid E) = \dfrac{P(E \mid H)\,P(H)}{P(E)}$ (substitute #3 into #1)
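A quick numeric sanity check of the rule; the numbers P(H) = 0.3, P(E|H) = 0.8, P(E|¬H) = 0.1 are illustrative, not from the lecture:

```python
# Illustrative numbers only (not from the lecture).
p_h = 0.3              # P(H)
p_e_given_h = 0.8      # P(E | H)
p_e_given_not_h = 0.1  # P(E | ~H)

# Total probability: P(E) = P(E|H) P(H) + P(E|~H) P(~H)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes rule: P(H|E) = P(E|H) P(H) / P(E)
p_h_given_e = p_e_given_h * p_h / p_e
print(p_h_given_e)  # 0.24 / 0.31 ~ 0.774
```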
Pr(A | E, B), with P(¬A | ·) in parentheses:
◦ e, b: 0.9 (0.1)
◦ e, ¬b: 0.2 (0.8)
◦ ¬e, b: 0.85 (0.15)
◦ ¬e, ¬b: 0.01 (0.99)
[Figure: the burglary network; Earthquake and Burglary are parents of Alarm, Earthquake is a parent of Radio, and Alarm is a parent of Nbr1Calls and Nbr2Calls]
Given Parents, X is Independent of Non-Descendants
E B R A J M
T F T T F T
F F F F F T
F T F T T T
F F F T T T
F T F F F F
We have: the Bayes net structure and the observations above.
We need: the Bayes net parameters.
Parameter Estimation and Bayesian Networks
E B R A J M
T F T T F T
F F F F F T
F T F T T T
F F F T T T
F T F F F F
...
Now compute each CPT entry from the data:
◦ P(B) = ?
◦ P(A|E,¬B) = ?
◦ P(A|¬E,B) = ?
◦ P(A|¬E,¬B) = ?
Prior + data = either MAP or Bayesian estimate
[Figure: a prior density over each parameter combines with the data to give a posterior, from which the MAP or Bayesian estimate is read off]
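A counting sketch for these estimates from the table above; the function and variable names are mine, and α = 1 Laplace smoothing (equivalently, the Bayesian posterior mean under a uniform Beta(1,1) prior) stands in for one common smoothed choice:

```python
# Observations of (E, B, R, A, J, M), transcribed from the table above.
data = [
    (True,  False, True,  True,  False, True),
    (False, False, False, False, False, True),
    (False, True,  False, True,  True,  True),
    (False, False, False, True,  True,  True),
    (False, True,  False, False, False, False),
]
E, B, R, A, J, M = range(6)

def estimate(rows, target, given=None, alpha=0):
    """ML estimate when alpha = 0; alpha = 1 gives Laplace smoothing,
    i.e. the posterior mean under a uniform Beta(1,1) prior."""
    if given:
        rows = [r for r in rows if all(r[v] == val for v, val in given.items())]
    positives = sum(r[target] for r in rows)
    return (positives + alpha) / (len(rows) + 2 * alpha)

print(estimate(data, B))                       # ML: P(B) = 2/5 = 0.4
print(estimate(data, A, {E: False, B: True}))  # ML: P(A|~E,B) = 1/2
print(estimate(data, B, alpha=1))              # smoothed: 3/7 ~ 0.43
```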
[Figures: the alarm network (Earthqk and Burgl are parents of Alarm; Alarm is a parent of N1 and N2) and a naive Bayes spam model (Spam is the parent of the word indicators Nigeria, Sex, Nude)]
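The spam model in the figure is a naive Bayes classifier; a minimal sketch, with word probabilities that are illustrative placeholders rather than lecture values:

```python
# Naive Bayes spam model: Spam -> {Nigeria, Sex, Nude}.
# All probabilities below are illustrative, not from the lecture.
p_spam = 0.4
p_word_given_spam = {"nigeria": 0.30, "sex": 0.25, "nude": 0.20}
p_word_given_ham = {"nigeria": 0.01, "sex": 0.05, "nude": 0.02}

def p_spam_given_words(present):
    """P(Spam | word indicators), assuming words independent given the class."""
    like_spam, like_ham = p_spam, 1 - p_spam
    for w in p_word_given_spam:
        if w in present:
            like_spam *= p_word_given_spam[w]
            like_ham *= p_word_given_ham[w]
        else:
            like_spam *= 1 - p_word_given_spam[w]
            like_ham *= 1 - p_word_given_ham[w]
    return like_spam / (like_spam + like_ham)

print(p_spam_given_words({"nigeria", "nude"}))  # high posterior spam probability
```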
What if we don’t know structure?
Learning the Structure of Bayesian Networks
Search through the space…
◦ of possible network structures!
◦ (for now, assume we observe all variables)
For each structure, learn parameters.
Pick the one that fits the observed data best.
◦ Caveat: won't we end up fully connected?
Problem?
Exponential number of networks!
And we need to learn parameters for each!
Exhaustive search is out of the question!
So what now?
Local search! (see the sketch below)
◦ Start with some network structure
◦ Try to make a change (add, delete, or reverse an edge)
◦ See if the new network is any better
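A minimal hill-climbing sketch, assuming binary variables and a BIC score to measure "better"; the three-variable dataset and all names are illustrative:

```python
import math
from itertools import product

# Illustrative dataset over three binary variables (X0, X1, X2).
DATA = [
    (True,  False, True),
    (False, False, False),
    (False, True,  True),
    (False, False, True),
    (False, True,  False),
    (True,  True,  True),
]
N_VARS = 3

def bic_score(edges, data):
    """Decomposable BIC: ML log-likelihood per node minus a complexity penalty."""
    score = 0.0
    for x in range(N_VARS):
        parents = [p for (p, c) in edges if c == x]
        for cfg in product([False, True], repeat=len(parents)):
            rows = [r for r in data if all(r[p] == v for p, v in zip(parents, cfg))]
            k = sum(r[x] for r in rows)
            for count in (k, len(rows) - k):
                if count:
                    score += count * math.log(count / len(rows))
        score -= 0.5 * 2 ** len(parents) * math.log(len(data))  # one parameter per parent config
    return score

def is_acyclic(edges):
    """Kahn's algorithm: the graph is a DAG iff every node can be peeled off."""
    indeg = {v: sum(c == v for (_, c) in edges) for v in range(N_VARS)}
    queue = [v for v, d in indeg.items() if d == 0]
    seen = 0
    while queue:
        v = queue.pop()
        seen += 1
        for (p, c) in edges:
            if p == v:
                indeg[c] -= 1
                if indeg[c] == 0:
                    queue.append(c)
    return seen == N_VARS

def neighbors(edges):
    """All structures one edge addition, deletion, or reversal away."""
    for a in range(N_VARS):
        for b in range(N_VARS):
            if a == b:
                continue
            if (a, b) in edges:
                yield edges - {(a, b)}               # delete
                yield (edges - {(a, b)}) | {(b, a)}  # reverse
            elif (b, a) not in edges:
                yield edges | {(a, b)}               # add

def hill_climb(data):
    current, best = frozenset(), bic_score(frozenset(), data)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(current):
            if is_acyclic(cand) and bic_score(cand, data) > best:
                current, best, improved = cand, bic_score(cand, data), True
                break  # restart the search from the improved structure
    return current, best

print(hill_climb(DATA))
```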
The Big Picture
We described how to do MAP (and ML) learning of a Bayes net
(including structure)