0% found this document useful (0 votes)

42 views58 pages

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation

This document provides an introduction to Bayesian networks. It discusses how Bayesian networks can model complex probability distributions through local conditional probabilities rather than requiring specification of a joint probability distribution. The key components of Bayesian networks are defined, including nodes to represent variables, directed edges to represent dependencies, and conditional probability tables. Methods for performing inference in Bayesian networks are also introduced, including marginalization, conditioning, and propagating information along edges in the network.

Uploaded by

Aneez Shrestha

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

42 views58 pages

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation

Uploaded by

Aneez Shrestha

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 58

Bayesian Networks: Construction,

Inference, Learning and Causal

Interpretation
Volker Tresp
Summer 2014

1
Introduction

• So far we were mostly concerned with supervised learning: we predicted one or several
target variables based on information on one or several input variables

• If applicable, supervised learning is often the best solution and powerful models such
as Neural Networks and SVMs are available

• But there are cases where supervised learning is not applicable: when there is not one
target variable of interest but many, or when in each data point different variables
might be available or missing

• Typical example: medical domain with many kinds of diseases, symptoms, and context
information: for a given patient little is known and one is interested in the prediction
of many possible diseases

• Bayesian networks can deal with these challenges, which is the reason for their popu-
larity in probabilistic reasoning and machine learning

2
Bayes Nets

• Deterministic rule-based systems were the dominating approach during the first phase
of AI. A major problem was that deterministic rules cannot deal with uncertainty and
ambiguity

• Probability was rejected since a naive representation would require 2M parameters;

how should all these numbers be specified?

• In Bayes nets one constructs a high-dimensional probability distribution based on a

set of local probabilistic rules

• Bayes nets are closely related to a causal world model

• Bayes Nets started as a small community and then developed to one of the main
approaches in AI

• They also have a great influence on machine learning

3
Definition of a Bayes Net

• The random variables in a domain are displayed as nodes (vertices)

• Directed links (arcs, edges) represent direct (causal) dependencies between parent
node and child node

• Quantification of the dependency:

– For nodes without parents one specifies a priori probabilities

P (A = i) ∀i

– For nodes with parents, one specifies conditional probabilities. E.g., for two parents

P (A = i|B = j, C = k) ∀i, j, k

4
Joint Probability Distribution

• A Bayes net specifies a probability distribution in the form

M
Y
P (X1, . . . XM ) = P (Xi|par(Xi))
i=1
where par(Xi) is the set of parent nodes. This set can be zero.

5
Factorization of Probability Distributions

• Let’s start with the factorization of a probability distribution (see review on Probability
Theory)
M
Y
P (X1, . . . XM ) = P (Xi|X1, . . . , Xi−1)
i=1

• This decomposition can be done with an arbitrary ordering of variables; each variable
is conditioned on all predecessor variables

• The dependencies can be simplified if a variable does not depend on all of its prede-
cessors
P (Xi|X1, . . . , Xi−1) = P (Xi|par(Xi))
with
par(Xi) ⊆ X1, . . . , Xi−1

6
Causal Ordering

• When the ordering of the variables corresponds to a causal ordering, we obtain a causal
probabilistic network

• A decomposition obeying the causal ordering typically yields a representation with the
smallest number of parent variables

• Deterministic and probabilistic causality: it might be that the underlying true model is
deterministic causal. The probabilistic model leaves out many influential factors. The
assumption is that the un-modeled factors should only significantly influence individual
nodes (and thus appear as noise), but NOT pairs or larger sets of variables (which
would induce dependencies)!

7
Design of a Bayes Net

• The expert needs to be clear about the important variables in the domain

• The expert must indicate direct causal dependencies by specifying the directed link in
the net

• The expert needs to quantify the causal dependencies: define the conditional proba-
bility tables

8
Inference

• The most important important operation is inference: given that the state a set of
random variables is known, what is the probability distribution of one or several of the
remaining variables

• Let X be the set of random variables. Let X m ⊆ X be the set of known (measured)
variables and let X q ∈ X \X m be the variable of interest and let X r = X \(X m ∪
X q ) be the set of remaining variables

9
Inference: Marginalization and Conditioning

• In the simplest inference approach one proceeds as follows :

– We calculate the probability distribution of the known variables and the query
variable via marginalization
X
q m
P (X , X ) = P (X1, . . . XM )
Xr

– The normalization is calculated as

X
P (X m) = P (X q , X m)
Xq

– Calculation of the conditional probability distributions

q m P (X q , X m)
P (X |X ) =
P (X m)

10
Inference in Bayes nets without Cycles in Undirected Net

• By construction there are no cycles in the directed net; the structure of a Bayesian
net is a directed acyclic graph (DAG)

• But there might be cycles when one ignores the directions

• Let’s first consider the simpler case without cycles in the undirected graph; the struc-
ture of the the Bayes net is a poly-tree: there is at most one directed path between
two nodes

11
Marginalization

• The marginalization can be computationally expensive: the sum contains exponentially

r
many terms; for binary variables 2|X |

• The goal is now to exploit the structure of the net to calculate the marginalization
efficient

12
Example: Markov Chain

• Consider the Markov chain of length M with binary variables where the first variable
X1 and the last variable XM are known

• We can apply the iterative formula for propagating information on X1 down the chain
X X
π(Xi) = P (Xi|X1) = P (Xi, Xi−1|X1) = P (Xi|Xi−1)P (Xi−1|X1)
Xi−1 Xi−1

• We can apply the iterative formula for propagating information on XM up the chain
X
λ(Xi) = P (XM |Xi) = P (XM , Xi+1|Xi)
Xi+1
X
= P (Xi+1|Xi)P (XM |Xi+1)
Xi+1

13
Posterior Probability

• Then we can calculate for any Xi

π(Xi = 1)λ(Xi = 1)
=
π(Xi = 1)λ(Xi = 1) + π(Xi = 0)λ(Xi = 0)

14
Propagation in Polytrees

• Inference in polytrees can be performed in a similar (but more complex) way

• π and λ propagation is generalized to polytrees

15
Max-Propagation

• With similar efficiency, one can calculate the maximum likely configuration (Max-
product Rule)

r
Xmax = arg max P (X r , X m)
Xr

16
Hidden Markov Models

• Hidden Markov Models (HMMs) are the basis of modern speech recognition systems.
An HMM is a Bayesian network with latent variables

• States corresponds to phonemes; measurements correspond to the acoustic spectrum

• The HMM contains the transition probability between states P (Xi|Xi−1) and emis-
sion probabilities P (Y |X).

• To find the most likely sequence of states, the Viterbi Algorithm is employed, which
is identical to the Max-Prop Algorithm

17
Loopy Belief Propagation: Belief Propagation in Bayes Nets
with Loops

• When there a loops in the undirected Bayes net, belief propagation is not applicable:
There cannot be a local message passing rule since information arriving at a node
from different paths can be correlated

• Loopy Belief Propagation the application of belief propagation to Bayes nets with
cycles (although strictly not correct)

• The local update rules are applied until convergence is achieved (which is not always
the case)

• Loopy Belief Propagation is applied in Probabilistic Relational Models which are ty-
pically large Bayes nets describing domains where relationships are important, e.g.,
friend of

18
Loopy Belief Propagation in Decoding

• Most interesting and surprising: Turbo codes and Low-Density Parity-Check (LDPC)
Codes use Loopy Belief Propagation for decoding. Both become very close to the
Shannon limit and require long code words and thus produce delays. Thus they are
not useful for interactive communication but for broadcasting of information (mobile
communication) and in space applications (NASA)

19
Junction tree algorithm: Correct Inference

• Most (non-)commercial Bayes nets contain the junction tree algorithm which realizes
correct probabilistic inference

• The junction tree algorithm combined variables such that the resulting net has no
loops

• The junction tree can be inefficient when the combines variables have many states

20
Example: Rumor Network

• Alan hears a rumor that the CEO gets fired with P (A) = 0.5

• Mary and Susan completely believe Alan P (M |A) = P (S|A) = 1 but they also
have independent sources P (M |¬A) = P (S|¬A) = 0.2

• Furthermore let’s assume that P (J|M, S) = 1, P (J|M, ¬S) = P (J|¬M, S) =

0.8, P (J|¬S, ¬M ) = 0

• What is the probability that Jack believes the rumor?

• In the example, we get P (J) = 0.648

21
Rumors: Loopy Belief Propagation

• One correctly calculates

X
P (M ) = P (M |A)P (A)
A
X
P (S) = P (S|A)P (A)
A

• But then incorrectly

X
P (J) = P (J|M, S)P (M )P (S)
M,S
since P (M, S) 6= P (M )P (S). In the example, we get P (J) = 0.744

• Incidentally, P (J|A) and of course P (J|M, S) would be calculated correctly. Con-

ditioning can break the loop!

22
Rumors: Junction Tree

• One forms the variable Z with four states and P (Z = z1,1|A) = P (M |A)P (S|A),
P (Z = z1,0|A) = P (M |A)(1 − P (S|A)), P (Z = z0,1|A) = (1 −
P (M |A))P (S|A), P (Z = z0,0|A) = (1−P (M |A))(1−P (S|A)), P (Z =
z1,1|¬A) = P (M |¬A)P (S|¬A), ...

• Also, P (J|Z = z1,1) = P (J|M, S), P (J|Z = z1,0) = P (J|M, ¬S),

P (J|Z = z0,1) = P (J|¬M, S), P (J|Z = z0,0) = P (J|¬M, ¬S)

• Then
X
P (Z) = P (Z|A)P (A)
A
and
X
P (J) = P (J|Z)P (Z)
Z
gives the correct solution

23
Design of a Bayes Net

• The expert needs to be clear about the important variables in the domain

• The expert must indicate direct causal dependencies by specifying the directed link in
the net

• The expert needs to quantify the causal dependencies: define the conditional proba-
bility tables

• This can be challenging if a node has many parents: if a binary node has n binary
parents, then the expert needs to specify 2n numbers!

• To simplify this task one often makes simplifying assumptions; the best-known one is
the Noisy-Or Assumption

24
Noisy-Or

• Let X ∈ {0, 1} be a binary node with binary parents U1, . . . , Un

• Let qi be the probability, that X = 0 (symptom), when Ui = 1 (disease) and all

other (diseases) Uj = 0; this means, that ci = 1 − qi is the influence of a single
parent (disease) on X (the symptom)
Qn
• Then one assumes, P (X = 0|U1, . . . , Un) = i=1 Uiqi, or equivalently,
n
Y
P (X = 1|U1, . . . , Un) = 1 − Uiqi
i=1

• This means that if several diseases are present that can cause the symptom, then the
probability of the symptom increases (if compared to the probabilities for the single
disease

25
Maximum Likelihood Learning with Complete Data

• We assume that all nodes in the Bayesian net have been observed for N instances
(e.g., N patients)

• ML-parameter estimation simply means counting

• Let θi,j,k be defines as

θi,j,k = P (Xi = k|par(Xi) = j)

• This means that θi,j,k is the probability that Xi is in state k ist, when its parents
are in the state j (we assume that the states of the parents can be enumerated in a
systematic way)

• Than the ML estimate is

ML = Ni,j,k
θi,j,k P
k Ni,j,k
Here, Ni,j,k is the number of samples in which Xi = k and par(Xi) = j.

26
MAP-estimate for Integrating Prior Knowledge

• Often counts are very small and a ML-estimate has high variance

• One simply specifies efficient counts (counts from virtual data) which then can be
treated as real counts Let αi,j,k ≥ 0 be virtual counts for Ni,j,k

• One obtains the maximum a posteriori (MAP) estimate as

M AP = αi,j,k + Ni,j,k
θi,j,k P
k (αi,j,k + Ni,j,k )

27
Missing Data

• The problem of missing data is an important issue in statistical modeling

• In the simplest case, one can assume that data are missing at random

• Data is not missing at random, if for example, I analyse the wealth distribution in a
city and rich people tend to refuse to report their income

• For some models the EM (Expectation Maximization)-algorithm can be applied and

leads to ML or MAP estimates

• Consider a particular data point l. In the E-step we calculate the probability for mar-
ginal probabilities of interest given the known information in that data point dl and
given the current estimates of the parameters θ̂, using belief proagation or the junction
tree algorithm. Then we get
N
X
E(Ni,j,k ) = P (Xi = k, par(Xi) = j|dl , θ̂)
l=1
28
Note that if the parent and child nodes are known then P (Xi = k, par(Xi) =
j|dl , θ̂) is either zero or one, otherwise a number between zero and one

• Based on the E-step, we get in the M-step

ML = E(Ni,j,k )
θ̂i,j,k P
k E(Ni,j,k )

• E-Step and M-Step are iterated until convergence. One can show that EM does not
decrease the likelihood in each step; EM might converge to local optima, even if there
might only be a global optimum for the model with complete data
EM-Learning with the HMM and the Kalman Filter

• HMMs are trained using the so-called Baum-Welch-Algorithm. This is exactly an EM

algorithm

• Offline Versions of the Kalman filter can also be performed via EM

29
Beyond Table Representations

• For learning the conditional probabilities, many approaches have been employed

• Decision Trees, Neural networks, log-linear models, ...

30
Structural Learning in Bayes Nets

• One can also consider learning the structure of a Bayes net and maybe even discover
causality

• In structural learning, several points need to be considered

• There are models that are structural equivalent. For example in a net with only two
variables A and B one might show that there is statistical correlation between the
two variables, but it is impossible to decide if A → B or A ← B. Colliders (nodes
where arrow-head meet) can make directions identifiable

• If C is highly correlated with A and B and A and B are also highly correlated, it
might be clear from the data that C depends on both A and B but difficult to decide
if it only depends on A or only depends on B

• In general, the structure of the Bayes model and its parameters model the joint dis-
tribution of all the variables under consideration

31
• Recall that the way the data was collected can also have influence: I will get a different
distribution if I consider data from the general population and data collected from
patients visiting a specialists in a rich neighborhood
Causal Interpretation of Structure an Parameters

• One has to be very cautious with a causal interpretation od a learned model

• One assumption is that the world under consideration is causal

• Another assumption is that all relevant information in part of the model. Variables
that influence more than one variable in the domain are confounders and can lead
to structures and quantified dependencies which do not agree with the causal world
assumption (a gene for the ability of eating with sticks; smoking and lung cancer,
considering the correlation with income of some genetic mutation)

• Sometimes temporal information is available which constraints the direction of links

(Granger causality)

32
Structure Learning via Greedy Search

• In the most common approaches one defines a cost function and looks for the structure
that is optimal under the cost function. One has to deal with many local optima

• Greedy Search: One starts with an initial network (fully connected, empty))and makes
local changes (removal of directed link, adding a link, reversing direction of a link, ...)
and accepts the change, when the cost function improves

• Greedy search can be started from different initial conditions

• Alternatives: Simulated Annealing, Genetic Algorithms

33
Cost Functions

• As a cost function one might use a cross-validation set

• Often BIC (Bayesian Information Criterion) is used:

1 M
log L − log N
N 2N
(M is the number of parameters; N is the number of data points)

• Recall, that Bayesian model selection is based on P (D|M). For some models with
complete data, this term can be calculated explicitly and can be used for model selec-
tion

34
Constrained-Based Methods for Structural Learning

• One performs statistical independence tests and uses those to decide on network
structure

• For the example in the V-structure in the image M and S are marginally mutually
independent but they might become dependent given J. J is dependent on both M
and S

• In the other structure S and M are dependent and S and S are dependent. Also, M
and S are dependent. But now M and S become independent given that J is known!

• The structure with the collider can be identified. The structures (2), (3), (4) are all
structurally equivalent

35
Concluding Remarks

• Bayesian networks are used as expert systems in medical domains

• The underlying theory can be used to derive the likelihood of complex models (e.g.,
HMMs)

• Markov Nets are related to Bayesian networks and use undirected edges; they typical
do not have a causal interpretation

• Bayes nets and Markov nets are the most important representatives of the class of
graphical models

• In the lecture we focussed on nets with discrete variables. Also commonly studied are
nets with continous variables and Gaussian distributions

Service Manual: DNP-720AE
No ratings yet
Service Manual: DNP-720AE
58 pages
Strategy Papers and Cases Questions
0% (1)
Strategy Papers and Cases Questions
9 pages
Natgeo-Formation-Of-Earth-2000002398-Article Quiz and Answers
No ratings yet
Natgeo-Formation-Of-Earth-2000002398-Article Quiz and Answers
4 pages
Bayes Nets 2016
No ratings yet
Bayes Nets 2016
62 pages
CS 391L: Machine Learning: Bayesian Learning: Beyond Naïve Bayes
No ratings yet
CS 391L: Machine Learning: Bayesian Learning: Beyond Naïve Bayes
29 pages
Lecture 5 Bayesian Networks
No ratings yet
Lecture 5 Bayesian Networks
12 pages
Unit 6
No ratings yet
Unit 6
126 pages
AI Unit 5 Notes
No ratings yet
AI Unit 5 Notes
35 pages
Bayesian Neworks
No ratings yet
Bayesian Neworks
32 pages
Lecture Bayesian Networks
No ratings yet
Lecture Bayesian Networks
50 pages
UNIT - V - ML - Final
No ratings yet
UNIT - V - ML - Final
105 pages
Good BayesianNetworksPrimer
No ratings yet
Good BayesianNetworksPrimer
23 pages
Mount Zion College of Engineering and Technology
No ratings yet
Mount Zion College of Engineering and Technology
22 pages
BNetwork Presentation
No ratings yet
BNetwork Presentation
18 pages
ML Unit-5
No ratings yet
ML Unit-5
104 pages
AAI Module 3 Notes
No ratings yet
AAI Module 3 Notes
7 pages
Unit 4 - Acting Logically
No ratings yet
Unit 4 - Acting Logically
33 pages
AI & ML Unit 2 Notes
No ratings yet
AI & ML Unit 2 Notes
12 pages
CS3491-AI ML-Chapter 3
No ratings yet
CS3491-AI ML-Chapter 3
23 pages
Probabilistic Reasoning
No ratings yet
Probabilistic Reasoning
58 pages
2021 Lecture09 BayesianNetworks
No ratings yet
2021 Lecture09 BayesianNetworks
60 pages
Lec7 - Bayesian Network I
No ratings yet
Lec7 - Bayesian Network I
62 pages
Bayesian and Inference
No ratings yet
Bayesian and Inference
86 pages
Exp1 A09 DS
No ratings yet
Exp1 A09 DS
6 pages
Bayesian Belief Network, Exact Inference, Approx Inference, Causal Network
No ratings yet
Bayesian Belief Network, Exact Inference, Approx Inference, Causal Network
15 pages
Artificial Intelligence: Adina Magda Florea
No ratings yet
Artificial Intelligence: Adina Magda Florea
36 pages
Bayesian Belief Network
No ratings yet
Bayesian Belief Network
53 pages
Bayesian Networks: Independencies and Inference: Scott Davies and Andrew Moore
No ratings yet
Bayesian Networks: Independencies and Inference: Scott Davies and Andrew Moore
21 pages
Bayesian Networks and Belief Propagation
No ratings yet
Bayesian Networks and Belief Propagation
67 pages
Bayesian Belief Network
No ratings yet
Bayesian Belief Network
41 pages
Prob Inf
No ratings yet
Prob Inf
56 pages
ASHTIKA
No ratings yet
ASHTIKA
9 pages
Lecture-8 Machine Learning With Python
No ratings yet
Lecture-8 Machine Learning With Python
35 pages
EECS6895 AdvancedBigDataAnalytics Lecture6
No ratings yet
EECS6895 AdvancedBigDataAnalytics Lecture6
81 pages
Learning Bayesian Networks (Neapolitan, Richard) PDF
100% (1)
Learning Bayesian Networks (Neapolitan, Richard) PDF
704 pages
PPT06-Probabilistic Reasoning
No ratings yet
PPT06-Probabilistic Reasoning
31 pages
4.1 Bayes Decision Theory
No ratings yet
4.1 Bayes Decision Theory
23 pages
Bayesian Networks in AI
No ratings yet
Bayesian Networks in AI
8 pages
Bayesian Networks: A Tutorial
No ratings yet
Bayesian Networks: A Tutorial
73 pages
Unit-5 Bayes' Rule and Bayesian Network
No ratings yet
Unit-5 Bayes' Rule and Bayesian Network
9 pages
BN DBN SSM HMM - Ghahramani
No ratings yet
BN DBN SSM HMM - Ghahramani
30 pages
Mount Zion College of Engineering and Technology
No ratings yet
Mount Zion College of Engineering and Technology
23 pages
Ba Yes Network
No ratings yet
Ba Yes Network
73 pages
Chapter 9 Data Mining
No ratings yet
Chapter 9 Data Mining
147 pages
Uncertain Knowledge
No ratings yet
Uncertain Knowledge
31 pages
13 Bayes Nets
No ratings yet
13 Bayes Nets
38 pages
Libpgm For Bayesian Networks: Dr. A. Obulesh Associate Professor
No ratings yet
Libpgm For Bayesian Networks: Dr. A. Obulesh Associate Professor
59 pages
DHS2 11revised
No ratings yet
DHS2 11revised
8 pages
BayesianNetworks Reduced
No ratings yet
BayesianNetworks Reduced
14 pages
Lec23 PDF
No ratings yet
Lec23 PDF
7 pages
Unit 2
No ratings yet
Unit 2
45 pages
4.2 Bayes-Nets
No ratings yet
4.2 Bayes-Nets
33 pages
Bayesian Networks
No ratings yet
Bayesian Networks
48 pages
Machine Learning: (Computer Engineering and Technology) (TYB - Tech)
No ratings yet
Machine Learning: (Computer Engineering and Technology) (TYB - Tech)
85 pages
Unit-4 Bayesian Networks
No ratings yet
Unit-4 Bayesian Networks
19 pages
Introduction to Minimax
From Everand
Introduction to Minimax
V. F. Dem’yanov
No ratings yet
A-level Maths Revision: Cheeky Revision Shortcuts
From Everand
A-level Maths Revision: Cheeky Revision Shortcuts
Scool Revision
3.5/5 (8)
Infinite Series
From Everand
Infinite Series
James M Hyslop
No ratings yet
Mathematical Foundations of Information Theory
From Everand
Mathematical Foundations of Information Theory
A. Ya. Khinchin
3.5/5 (9)
Gauss Nodes Revolution: Numerical Integration Theory Radically Simplified And Generalised
From Everand
Gauss Nodes Revolution: Numerical Integration Theory Radically Simplified And Generalised
Rob Porter
No ratings yet
Learn Statistics Fast: A Simplified Detailed Version for Students
From Everand
Learn Statistics Fast: A Simplified Detailed Version for Students
Hesbon R.M
No ratings yet
Integration, Measure and Probability
From Everand
Integration, Measure and Probability
H. R. Pitt
No ratings yet
Theory of Approximation
From Everand
Theory of Approximation
N. I. Achieser
No ratings yet
Gear Trains Definition and Types With PDF
No ratings yet
Gear Trains Definition and Types With PDF
4 pages
Intro To Gyro
No ratings yet
Intro To Gyro
36 pages
Configuration: Dacia Duster
No ratings yet
Configuration: Dacia Duster
5 pages
Balancing PDF
No ratings yet
Balancing PDF
13 pages
Surface Finish Measurement Notes
No ratings yet
Surface Finish Measurement Notes
32 pages
Comparators and Its Types
No ratings yet
Comparators and Its Types
16 pages
Go & NoGo Gauges
No ratings yet
Go & NoGo Gauges
4 pages
Limits, Fits & Tolerances
No ratings yet
Limits, Fits & Tolerances
7 pages
Landing Page Inspiration 3
No ratings yet
Landing Page Inspiration 3
1 page
Naukri VinitaSingh 1790045 - 08 00 - 1
No ratings yet
Naukri VinitaSingh 1790045 - 08 00 - 1
3 pages
ECE CAD Introduction To AutoCAD
No ratings yet
ECE CAD Introduction To AutoCAD
5 pages
American Ethnologist - February 1987 - BROWN - Religion Class and Context Continuities and Discontinuities in Brazilian
No ratings yet
American Ethnologist - February 1987 - BROWN - Religion Class and Context Continuities and Discontinuities in Brazilian
21 pages
Upgrading Cimplicity 6.1 To 8.1 License Issue
No ratings yet
Upgrading Cimplicity 6.1 To 8.1 License Issue
2 pages
Manual HON 370 20 GB
No ratings yet
Manual HON 370 20 GB
51 pages
Airbnb Seasonality and Revenue Data Trends For Grand Prairie - AirDNA MarketMinder
No ratings yet
Airbnb Seasonality and Revenue Data Trends For Grand Prairie - AirDNA MarketMinder
2 pages
Thesis Port Service
100% (3)
Thesis Port Service
7 pages
BROSURABFPLOFT20112
No ratings yet
BROSURABFPLOFT20112
6 pages
Monsoon Theories
100% (1)
Monsoon Theories
14 pages
Trainz 2004 DRAFT Content Creation Procedures
100% (1)
Trainz 2004 DRAFT Content Creation Procedures
101 pages
K00200 - 20211027174133 - Rubrics Individual Assignment Paf3113 Sem A202
No ratings yet
K00200 - 20211027174133 - Rubrics Individual Assignment Paf3113 Sem A202
7 pages
Intervention21120-5570393 152823
No ratings yet
Intervention21120-5570393 152823
10 pages
Nama Alat Dan Spesifikasi
No ratings yet
Nama Alat Dan Spesifikasi
128 pages
Agriculture Assist (Synopsis)
79% (14)
Agriculture Assist (Synopsis)
13 pages
RC Column Sample Problem
No ratings yet
RC Column Sample Problem
10 pages
Keats
100% (1)
Keats
15 pages
Chemical Engineering in Practice Second Edition - Sampler
100% (1)
Chemical Engineering in Practice Second Edition - Sampler
99 pages
Seafarer Medical Certificate
No ratings yet
Seafarer Medical Certificate
2 pages
#01 G.R. No. 100113
No ratings yet
#01 G.R. No. 100113
19 pages
Practical Set-1: The Result Is 600 The Result Is 70
No ratings yet
Practical Set-1: The Result Is 600 The Result Is 70
12 pages
4as Tle7 LC4
No ratings yet
4as Tle7 LC4
5 pages
Master Thesis Vu Amsterdam
100% (2)
Master Thesis Vu Amsterdam
8 pages
Mabini Colleges, Inc.: College of Nursing and Midwifery
No ratings yet
Mabini Colleges, Inc.: College of Nursing and Midwifery
2 pages
Microlink Information Technology College Department of Computer Science
No ratings yet
Microlink Information Technology College Department of Computer Science
87 pages
Metalsa Supplier Manual Rev 4 1
No ratings yet
Metalsa Supplier Manual Rev 4 1
58 pages
Essay Topics Grade 11
100% (2)
Essay Topics Grade 11
5 pages

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation

Uploaded by

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation

Uploaded by

Bayesian Networks: Construction,

Inference, Learning and Causal

• Probability was rejected since a naive representation would require 2M parameters;

• In Bayes nets one constructs a high-dimensional probability distribution based on a

• Bayes nets are closely related to a causal world model

• They also have a great influence on machine learning

• The random variables in a domain are displayed as nodes (vertices)

• Quantification of the dependency:

– For nodes without parents one specifies a priori probabilities

• A Bayes net specifies a probability distribution in the form

• In the simplest inference approach one proceeds as follows :

– The normalization is calculated as

– Calculation of the conditional probability distributions

• But there might be cycles when one ignores the directions

• The marginalization can be computationally expensive: the sum contains exponentially

• Then we can calculate for any Xi

• Inference in polytrees can be performed in a similar (but more complex) way

• π and λ propagation is generalized to polytrees

• States corresponds to phonemes; measurements correspond to the acoustic spectrum

• Furthermore let’s assume that P (J|M, S) = 1, P (J|M, ¬S) = P (J|¬M, S) =

• What is the probability that Jack believes the rumor?

• In the example, we get P (J) = 0.648

• One correctly calculates

• But then incorrectly

• Incidentally, P (J|A) and of course P (J|M, S) would be calculated correctly. Con-

• Also, P (J|Z = z1,1) = P (J|M, S), P (J|Z = z1,0) = P (J|M, ¬S),

• Let X ∈ {0, 1} be a binary node with binary parents U1, . . . , Un

• Let qi be the probability, that X = 0 (symptom), when Ui = 1 (disease) and all

• ML-parameter estimation simply means counting

• Let θi,j,k be defines as

θi,j,k = P (Xi = k|par(Xi) = j)

• Than the ML estimate is

• One obtains the maximum a posteriori (MAP) estimate as

• The problem of missing data is an important issue in statistical modeling

• For some models the EM (Expectation Maximization)-algorithm can be applied and

• Based on the E-step, we get in the M-step

• HMMs are trained using the so-called Baum-Welch-Algorithm. This is exactly an EM

• Offline Versions of the Kalman filter can also be performed via EM

• Decision Trees, Neural networks, log-linear models, ...

• In structural learning, several points need to be considered

• One has to be very cautious with a causal interpretation od a learned model

• One assumption is that the world under consideration is causal

• Sometimes temporal information is available which constraints the direction of links

• Greedy search can be started from different initial conditions

• Alternatives: Simulated Annealing, Genetic Algorithms

• As a cost function one might use a cross-validation set

• Often BIC (Bayesian Information Criterion) is used:

• Bayesian networks are used as expert systems in medical domains

You might also like