AI Learning

The document discusses topics related to machine learning, including inductive learning methods like decision tree learning using information gain, and statistical learning topics such as parameter estimation, naive Bayes classification, and learning Bayesian networks. It provides examples and explanations of key concepts in machine learning like constructing hypotheses from examples and measuring learning performance on test data.

Learning

LESSON 13-14
Reading
Chapter 18
Chapter 20

3/21/2016 503043 - LEARNING 2


Outline
1. Inductive Learning
• Learning agents
• Inductive learning
• Decision tree learning

2. Statistical Learning
◦ Parameter Estimation:
◦ Maximum Likelihood (ML); Maximum A Posteriori (MAP); Bayesian; Continuous case
◦ Learning Parameters for a Bayesian Network
◦ Naive Bayes
◦ Maximum Likelihood estimates; Priors
◦ Learning Structure of Bayesian Networks



Learning
Learning is essential for unknown environments,
◦ i.e., when the designer lacks omniscience

Learning is useful as a system construction method,
◦ i.e., expose the agent to reality rather than trying to write it down

Learning modifies the agent's decision mechanisms to improve performance



Learning agents

[Figure omitted: architecture of a learning agent (performance element plus learning element)]


Learning element
Design of a learning element is affected by
◦ Which components of the performance element are to be
learned
◦ What feedback is available to learn these components
◦ What representation is used for the components

Type of feedback:
◦ Supervised learning: correct answers for each example
◦ Unsupervised learning: correct answers not given
◦ Reinforcement learning: occasional rewards



Inductive learning
Simplest form: learn a function from examples

f is the target function
An example is a pair (x, f(x))

Problem: find a hypothesis h such that h ≈ f, given a training set of examples

(This is a highly simplified model of real learning:
◦ Ignores prior knowledge
◦ Assumes examples are given)



Inductive learning method
Construct/adjust h to agree with f on training set
(h is consistent if it agrees with f on all examples)

E.g., curve fitting:

Ockham’s razor: prefer the simplest hypothesis consistent with data



Learning decision trees
Problem: decide whether to wait for a table at a restaurant,
based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)



Attribute-based representations
Examples described by attribute values (Boolean, discrete, continuous)
E.g., situations where I will/won't wait for a table:

Classification of examples is positive (T) or negative (F)



Decision trees
One possible representation for hypotheses. E.g., here is the "true" tree for deciding whether to wait:



Expressiveness
Decision trees can express any function of the input attributes.
E.g., for Boolean functions, truth table row → path to leaf:

Trivially, there is a consistent decision tree for any training set, with one path to a leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples

Prefer to find more compact decision trees


Hypothesis spaces
How many distinct decision trees with n Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with 2^n rows = 2^(2^n)

E.g., with 6 Boolean attributes, there are 2^64 = 18,446,744,073,709,551,616 trees

How many purely conjunctive hypotheses (e.g., Hungry ∧ ¬Rain)?
Each attribute can be in (positive), in (negative), or out
⇒ 3^n distinct conjunctive hypotheses

A more expressive hypothesis space
◦ increases the chance that the target function can be expressed
◦ increases the number of hypotheses consistent with the training set
⇒ may get worse predictions
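As a quick arithmetic check, the two counts above can be verified directly (an illustrative snippet, not part of the slides):

```python
# Count the hypothesis spaces for n = 6 Boolean attributes.
n = 6
num_trees = 2 ** (2 ** n)       # one Boolean function per truth table with 2^n rows
num_conjunctions = 3 ** n       # each attribute: positive literal, negative literal, or absent
print(num_trees)                # 18446744073709551616
print(num_conjunctions)         # 729
```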



Decision tree learning
Aim: find a small tree consistent with the training examples
Idea: (recursively) choose the "most significant" attribute as the root of the (sub)tree



Choosing an attribute
Idea: a good attribute splits the examples into subsets that
are (ideally) "all positive" or "all negative"

Patrons? is a better choice



Using information theory
To implement Choose-Attribute in the DTL algorithm

Information Content (Entropy):
I(P(v1), … , P(vn)) = Σi −P(vi) log2 P(vi)

For a training set containing p positive examples and n negative examples:

I(p/(p+n), n/(p+n)) = −(p/(p+n)) log2(p/(p+n)) − (n/(p+n)) log2(n/(p+n))



Information gain
A chosen attribute A divides the training set E into subsets E1, … , Ev according to their values for A, where A has v distinct values.

remainder(A) = Σi=1..v (pi+ni)/(p+n) · I(pi/(pi+ni), ni/(pi+ni))

Information Gain (IG), or reduction in entropy from the attribute test:

IG(A) = I(p/(p+n), n/(p+n)) − remainder(A)

Choose the attribute with the largest IG
Choose the attribute with the largest IG



Information gain
For the training set, p = n = 6, I(6/12, 6/12) = 1 bit

Consider the attributes Patrons and Type (and others too):

IG(Patrons) = 1 − [2/12·I(0,1) + 4/12·I(1,0) + 6/12·I(2/6, 4/6)] ≈ 0.541 bits
IG(Type) = 1 − [2/12·I(1/2,1/2) + 2/12·I(1/2,1/2) + 4/12·I(2/4,2/4) + 4/12·I(2/4,2/4)] = 0 bits

Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root

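The slide's numbers can be reproduced with a short script (a sketch; the helper names `entropy` and `remainder` are chosen here to follow the slide's notation):

```python
import math

def entropy(ps):
    """Information content I(p1, ..., pn) = sum of -p * log2(p), with 0*log2(0) = 0."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

def remainder(splits):
    """splits: one (p_i, n_i) positive/negative count pair per attribute value."""
    total = sum(p + n for p, n in splits)
    return sum((p + n) / total * entropy([p / (p + n), n / (p + n)])
               for p, n in splits)

I0 = entropy([6/12, 6/12])                                    # 1 bit, since p = n = 6
gain_patrons = I0 - remainder([(0, 2), (4, 0), (2, 4)])       # None, Some, Full
gain_type = I0 - remainder([(1, 1), (1, 1), (2, 2), (2, 2)])  # French, Italian, Thai, Burger
print(gain_patrons, gain_type)   # ≈ 0.541 and ≈ 0
```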


Example contd.
Decision tree learned from the 12 examples:

Substantially simpler than the "true" tree: a more complex hypothesis isn't justified by the small amount of data



Performance measurement
How do we know that h ≈ f ?
1. Use theorems of computational/statistical learning theory
2. Try h on a new test set of examples
(use same distribution over example space as training set)

Learning curve = % correct on test set as a function of training set size



Summary 1
Learning is needed for unknown environments (and for lazy designers)
Learning agent = performance element + learning element
For supervised learning, the aim is to find a simple hypothesis approximately consistent with the training examples
Decision tree learning using information gain
Learning performance = prediction accuracy measured on the test set



Statistical Learning
Parameter Estimation:
◦ Maximum Likelihood (ML)
◦ Maximum A Posteriori (MAP)
◦ Bayesian
◦ Continuous case

Learning Parameters for a Bayesian Network


Naive Bayes
◦ Maximum Likelihood estimates
◦ Priors

Learning Structure of Bayesian Networks

Coin Flip
C1 C2 C3

P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9

Which coin will I use?


P(C1) = 1/3 P(C2) = 1/3 P(C3) = 1/3

Prior: Probability of a hypothesis before we make any observations
Coin Flip
C1 C2 C3

P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9

Which coin will I use?


P(C1) = 1/3 P(C2) = 1/3 P(C3) = 1/3
Uniform Prior: All hypotheses are equally likely before we make any observations
Experiment 1: Heads
Which coin did I use?
P(C1|H) = ? P(C2|H) = ? P(C3|H) = ?

C1 C2 C3

P(H|C1)=0.1 P(H|C2) = 0.5 P(H|C3) = 0.9

P(C1)=1/3 P(C2) = 1/3 P(C3) = 1/3


Experiment 1: Heads
Which coin did I use?
P(C1|H) = 0.066 P(C2|H) = 0.333 P(C3|H) = 0.6

Posterior: Probability of a hypothesis given data


C1 C2 C3

P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9


P(C1) = 1/3 P(C2) = 1/3 P(C3) = 1/3
Terminology
Prior:
◦ Probability of a hypothesis before we see any data
Uniform Prior:
◦ A prior that makes all hypotheses equally likely
Posterior:
◦ Probability of a hypothesis after we saw some data
Likelihood:
◦ Probability of data given hypothesis



Experiment 2: Tails
Which coin did I use?
P(C1|HT) = ? P(C2|HT) = ? P(C3|HT) = ?

C1 C2 C3

P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9


P(C1) = 1/3 P(C2) = 1/3 P(C3) = 1/3
Experiment 2: Tails
Which coin did I use?
P(C1|HT) = 0.21 P(C2|HT) = 0.58 P(C3|HT) = 0.21

C1 C2 C3

P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9


P(C1) = 1/3 P(C2) = 1/3 P(C3) = 1/3


Your Estimate?
What is the probability of heads after two experiments?

Most likely coin: C2
Best estimate for P(H): P(H|C2) = 0.5

C1 C2 C3

P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9


P(C1) = 1/3 P(C2) = 1/3 P(C3) = 1/3
Your Estimate?
Maximum Likelihood Estimate: The best hypothesis that fits the observed data, assuming a uniform prior

Most likely coin: C2
Best estimate for P(H): P(H|C2) = 0.5

C2

P(H|C2) = 0.5
P(C2) = 1/3
Using Prior Knowledge
Should we always use a Uniform Prior ?
Background knowledge:
◦ Heads => we get a take-home midterm
◦ Dan doesn't like take-homes…
◦ => Dan is more likely to use a coin biased in his favor

C1 C2 C3

P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9


Using Prior Knowledge
We can encode it in the prior:

P(C1) = 0.05 P(C2) = 0.25 P(C3) = 0.70


C1 C2 C3

P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9


Experiment 1: Heads
Which coin did I use?
P(C1|H) = ? P(C2|H) = ? P(C3|H) = ?

C1 C2 C3

P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9


P(C1) = 0.05 P(C2) = 0.25 P(C3) = 0.70



Experiment 1: Heads
Which coin did I use?
P(C1|H) = 0.006 P(C2|H) = 0.165 P(C3|H) = 0.829
Compare with ML posterior after Exp 1:
P(C1|H) = 0.066 P(C2|H) = 0.333 P(C3|H) = 0.600
C1 C2 C3

P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9


P(C1) = 0.05 P(C2) = 0.25 P(C3) = 0.70
Experiment 2: Tails
Which coin did I use?

P(C1|HT) = ? P(C2|HT) = ? P(C3|HT) = ?

C1 C2 C3

P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9


P(C1) = 0.05 P(C2) = 0.25 P(C3) = 0.70
Experiment 2: Tails
Which coin did I use?
P(C1|HT) = 0.035  P(C2|HT) = 0.481  P(C3|HT) = 0.485

C1 C2 C3

P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9


P(C1) = 0.05 P(C2) = 0.25 P(C3) = 0.70
Your Estimate?
What is the probability of heads after two experiments?

Most likely coin: C3
Best estimate for P(H): P(H|C3) = 0.9

C1 C2 C3

P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9


P(C1) = 0.05 P(C2) = 0.25 P(C3) = 0.70
Your Estimate?
Maximum A Posteriori (MAP) Estimate:
The best hypothesis that fits observed data
assuming a non-uniform prior

Most likely coin: C3 (P(C3) = 0.70)
Best estimate for P(H): P(H|C3) = 0.9



Did We Do The Right Thing?
P(C1|HT)=0.035 P(C2|HT)=0.481 P(C3|HT)=0.485

C1 C2 C3

P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9



Did We Do The Right Thing?
P(C1|HT) =0.035 P(C2|HT)=0.481 P(C3|HT)=0.485
C2 and C3 are almost
equally likely

C1 C2 C3

P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9



A Better Estimate
P(H) = Σi P(Ci|HT) P(H|Ci) = 0.680

Recall:
P(C1|HT)=0.035 P(C2|HT)=0.481 P(C3|HT)=0.485

C1 C2 C3
P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9



Bayesian Estimate
Bayesian Estimate: Minimizes prediction error,
given data and (generally) assuming a non-uniform prior

P(H) = Σi P(Ci|HT) P(H|Ci) = 0.680

P(C1|HT)=0.035 P(C2|HT)=0.481 P(C3|HT)=0.485

C1 C2 C3
P(H|C1) = 0.1 P(H|C2) = 0.5 P(H|C3) = 0.9
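The posterior updates and the three estimates from the coin example can be sketched in a few lines. The biases and priors are the slide's numbers; the function and variable names are chosen here for illustration:

```python
# Sketch of the three-coin example: Bayes-rule updates, MAP, and Bayesian estimates.
coins = {"C1": 0.1, "C2": 0.5, "C3": 0.9}          # P(H | coin)

def posterior(prior, flips):
    """Update P(coin) given a sequence of 'H'/'T' flips (Bayes rule + normalize)."""
    post = dict(prior)
    for f in flips:
        for c, ph in coins.items():
            post[c] *= ph if f == "H" else (1 - ph)
        z = sum(post.values())
        post = {c: p / z for c, p in post.items()}
    return post

informed = {"C1": 0.05, "C2": 0.25, "C3": 0.70}    # background-knowledge prior
post = posterior(informed, "HT")
map_est = coins[max(post, key=post.get)]           # bias of the most probable coin
bayes_est = sum(post[c] * coins[c] for c in coins) # posterior-weighted average of biases
print({c: round(p, 3) for c, p in post.items()})   # {'C1': 0.035, 'C2': 0.481, 'C3': 0.485}
print(map_est, round(bayes_est, 3))                # 0.9 0.68
```

Running the same update with the uniform prior {1/3, 1/3, 1/3} reproduces the earlier slides' numbers as well.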
Comparison
After more experiments: HTHHHHHHHH (HT followed by eight more heads)

ML (Maximum Likelihood):
P(H) = 0.5
after 10 experiments: P(H) = 0.9

MAP (Maximum A Posteriori):
P(H) = 0.9
after 10 experiments: P(H) = 0.9

Bayesian:
P(H) = 0.68
after 10 experiments: P(H) = 0.9



Comparison
ML (Maximum Likelihood):
Easy to compute
MAP (Maximum A Posteriori):
Still easy to compute
Incorporates prior knowledge
Bayesian:
Minimizes error => great when data is scarce
Potentially much harder to compute



Summary For Now

Estimate                        Prior     Hypothesis
Maximum Likelihood Estimate     Uniform   The most likely
Maximum A Posteriori Estimate   Any       The most likely
Bayesian Estimate               Any       Weighted combination


Continuous Case
In the previous example,
◦ we chose from a discrete set of three coins

In general,
◦ we have to pick from a continuous distribution of biased coins
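The slides do not name a particular continuous distribution; a standard choice for a coin bias is the Beta family, under which the update has a closed form. The following sketch assumes a Beta prior (an assumption made here, not stated in the slides):

```python
# With a Beta(a, b) prior over the coin bias, the posterior after
# h heads and t tails is Beta(a + h, b + t).
def beta_update(a, b, flips):
    h = flips.count("H")
    t = flips.count("T")
    return a + h, b + t

# Uniform prior = Beta(1, 1); after the sequence HT the posterior is Beta(2, 2)
a, b = beta_update(1, 1, "HT")
map_est = (a - 1) / (a + b - 2)     # posterior mode (MAP), defined for a, b > 1
bayes_est = a / (a + b)             # posterior mean (Bayesian estimate)
print(a, b, map_est, bayes_est)     # 2 2 0.5 0.5
```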


Continuous Case

[Figure omitted: a prior density over the coin bias on [0, 1]]


Continuous Case

[Figures omitted: densities over the coin bias, in three columns (Prior, after Exp 1: Heads, after Exp 2: Tails); top row with the uniform prior, bottom row with the background-knowledge prior]


Continuous Case
Posterior after 2 experiments:

[Figures omitted: posterior densities with the ML, MAP, and Bayesian estimates marked, under the uniform prior and under the background-knowledge prior]


After 10 Experiments...
Posterior:

[Figures omitted: posterior densities after 10 experiments, with the ML, MAP, and Bayesian estimates marked, under the uniform prior and under the background-knowledge prior]


After 100 Experiments...

[Figure omitted: posterior after 100 experiments]


Topics
Parameter Estimation:
◦ Maximum Likelihood (ML)
◦ Maximum A Posteriori (MAP)
◦ Bayesian
◦ Continuous case

Learning Parameters for a Bayesian Network


Naive Bayes
◦ Maximum Likelihood estimates
◦ Priors

Learning Structure of Bayesian Networks



Review: Conditional Probability
P(A | B) is the probability of A given B
Assumes that B is the only info known.
Defined by:

P(A | B) = P(A ∧ B) / P(B)

[Venn diagram omitted: events A and B and their intersection A ∧ B]


Conditional Independence
A and B are not independent, since P(A|B) < P(A)

[Venn diagram omitted: A, B, and A ∧ B]


Conditional Independence
But: A and B are made independent by C

P(A|C) = P(A|B,C)

[Venn diagram omitted: A, B, and C, with intersections A∧B, A∧C, B∧C]
Bayes Rule

P(H | E) = P(E | H) P(H) / P(E)

Simple proof from the definition of conditional probability:

1. P(H | E) = P(H ∧ E) / P(E)            (def. cond. prob.)
2. P(E | H) = P(H ∧ E) / P(H)            (def. cond. prob.)
3. P(H ∧ E) = P(E | H) P(H)              (multiply #2 by P(H))
QED: P(H | E) = P(E | H) P(H) / P(E)     (substitute #3 into #1)
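As a quick numeric illustration of the rule (the numbers here are chosen for illustration, not taken from the slides):

```python
# Bayes rule on made-up numbers: P(H|E) = P(E|H) P(H) / P(E).
p_h = 0.05              # prior P(H)
p_e_given_h = 0.9       # likelihood P(E | H)
p_e_given_not_h = 0.2   # P(E | not H)

# P(E) by total probability, then Bayes rule
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
p_h_given_e = p_e_given_h * p_h / p_e
print(round(p_h_given_e, 3))   # 0.191
```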


An Example Bayes Net
Structure: Earthquake → Radio; Earthquake → Alarm ← Burglary; Alarm → Nbr1Calls; Alarm → Nbr2Calls

P(B=t) = 0.05   P(B=f) = 0.95

P(A|E,B), with P(¬A) in parentheses:
e, b:   0.9  (0.1)
e, ¬b:  0.2  (0.8)
¬e, b:  0.85 (0.15)
¬e, ¬b: 0.01 (0.99)

Given Parents, X is Independent of Non-Descendants



Given Markov Blanket, X is Independent of All Other Nodes

MB(X) = Par(X) ∪ Childs(X) ∪ Par(Childs(X))



Parameter Estimation and Bayesian Networks

E B R A J M
T F T T F T
F F F F F T
F T F T T T
F F F T T T
F T F F F F
...

We have: the Bayes Net structure and observations
We need: the Bayes Net parameters
Parameter Estimation and Bayesian Networks

E B R A J M
T F T T F T
F F F F F T
F T F T T T
F F F T T T
F T F F F F
...

P(B) = ?

[Figures omitted: a prior density over the value of P(B); prior + data yields either a MAP or a Bayesian estimate]


Parameter Estimation and Bayesian Networks
E B R A J M
T F T T F T
F F F F F T
F T F T T T
F F F T T T
F T F F F F
...
P(A|E,B) = ?
P(A|E,¬B) = ?
P(A|¬E,B) = ?
P(A|¬E,¬B) = ?
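One way to sketch these estimates from the five rows above. The helper `p_a_given` and the use of Laplace smoothing are illustrative choices made here, not taken from the slides; smoothing is one standard way to handle parent configurations with no observations:

```python
# ML and Laplace-smoothed estimates of P(A | E, B) from the five observations,
# one string of T/F values per row, in variable order E, B, R, A, J, M.
rows = ["TFTTFT", "FFFFFT", "FTFTTT", "FFFTTT", "FTFFFF"]
data = [dict(zip("EBRAJM", r)) for r in rows]

def p_a_given(e, b, smooth=False):
    """Estimate P(A=T | E=e, B=b); smoothing avoids 0/0 on unseen parent values."""
    match = [d for d in data if d["E"] == e and d["B"] == b]
    hits = sum(d["A"] == "T" for d in match)
    if smooth:
        return (hits + 1) / (len(match) + 2)      # Laplace (add-one) smoothing
    return hits / len(match) if match else None   # None: no data for this case

print(p_a_given("F", "T"))               # 0.5  (rows 3 and 5: A = T, then F)
print(p_a_given("T", "T"))               # None (no e,b rows were observed)
print(p_a_given("T", "T", smooth=True))  # 0.5  (the prior pulls the unseen case to 1/2)
```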


Recap
Given a BN structure (with discrete or continuous variables), we can learn the parameters of the conditional probability tables.

[Figures omitted: two example networks: the alarm network (Earthqk, Burgl → Alarm → N1, N2) and a spam model (Spam → Nigeria, Sex, Nude)]
What if we don't know structure?

Learning The Structure of Bayesian Networks
Search thru the space…
◦ of possible network structures!
◦ (for now, assume we observe all variables)
For each structure, learn parameters
Pick the one that fits observed data best
◦ Caveat: won't we end up fully connected?

When scoring, add a penalty proportional to model complexity
Learning The Structure of Bayesian Networks
Search thru the space
For each structure, learn parameters
Pick the one that fits observed data best

Problem?
Exponential number of networks!
And we need to learn parameters for each!
Exhaustive search out of the question!
So what now?



Learning The Structure of Bayesian Networks

Local search!
◦ Start with some network structure
◦ Try to make a change
◦ (add or delete or reverse edge)
◦ See if the new network is any better

◦What should be the initial state?

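The local-search loop above can be sketched as follows. The scoring function is left abstract (a real implementation might use a BIC-style score: log-likelihood minus a complexity penalty), and full acyclicity checking is omitted for brevity; the function names are chosen here for illustration:

```python
import itertools

def neighbors(edges, nodes):
    """All structures one change away: add, delete, or reverse a directed edge."""
    out = []
    for a, b in itertools.permutations(nodes, 2):
        if (a, b) in edges:
            out.append(edges - {(a, b)})                # delete edge
            out.append(edges - {(a, b)} | {(b, a)})     # reverse edge
        elif (b, a) not in edges:
            out.append(edges | {(a, b)})                # add edge (skip 2-cycles;
                                                        # full cycle checking omitted)
    return out

def hill_climb(nodes, score, start=frozenset(), steps=100):
    """Greedy local search: move to the best-scoring neighbor until no gain."""
    current = set(start)
    for _ in range(steps):
        best = max(neighbors(current, nodes), key=score, default=None)
        if best is None or score(best) <= score(current):
            return current          # local optimum reached
        current = best
    return current
```

With a toy score that rewards one particular edge and penalizes edge count, the search finds exactly that edge and stops, illustrating how the complexity penalty prevents a fully connected result.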


Initial Network Structure?
Uniform prior over random networks?

Network which reflects expert knowledge?



Learning BN Structure

[Figure omitted]
The Big Picture
We described how to do MAP (and ML) learning of a Bayes net
(including structure)

How would Bayesian learning (of BNs) differ?

Find all possible networks
Calculate their posteriors
When doing inference, return a weighted combination of predictions from all networks!
