
Bayesian Networks

Knowledge Representation

• Andreas Sauter
• Dec. 2023
• (Content adapted from Erman Acar)
Overview

1. Foundations
Degrees of Belief, Belief Dynamics, Independence, Bayes Theorem, Marginalization

2. Bayesian Networks
Graphs and their Independencies, Bayesian Networks, d-Separation

3. Tools for Inference
Factors, Variable Elimination, Elimination Order, Interaction Graphs, Graph Pruning

4. Exact Inference in Bayesian Networks
Posterior Marginal, Maximum A-Posteriori, Most Probable Explanation


Lecture 2: Bayesian Networks
Lecture Overview

Directed Acyclic Graphs
Nodes, Edges, Ancestry, Special Paths

Bayesian Networks
Motivation, Formal Definition

Independence through DAGs
Markov Property, Symmetry, Decomposition, Weak Union, Contraction

d-Separation
d-Blocked Paths, d-Separation, Pruning for d-Separation


Directed Acyclic Graphs
Motivation

As we will see later, graphs are an important component of Bayesian networks.

This section serves as a refresher of basic and relevant concepts of graphs.

We will consider only directed acyclic graphs (DAGs) in the context of BNs.

These are graphs that have:

• Nodes
• Directed edges
• No cycles

(The slide shows an example DAG over the nodes A, B, C, D.)

Basic Definitions

The nodes represent propositional variables, e.g. Alarm, Burglary, Earthquake.

The directed edges represent direct influences.

We define a DAG as a set of directed edges over nodes that does not induce a cycle.

A path from A to B is a node-edge sequence in which B can be reached from A.

A directed path is a path in which all edges are directed towards the end node.
Ancestry

The parents of a node A are the set of variables which have a directed edge to A.

The descendants of a node A are the set of variables which can be reached by a directed path from A.

The non-descendants of a node A are the set of all variables which are neither descendants nor parents of A.

A leaf node is a node without descendants.


Special Paths

Three special types of paths play an important role in BNs.

For a path 𝜋 from 𝐴 to 𝐵 over a middle node 𝑊, we call 𝜋 a:

• Sequence if 𝜋 = 𝐴 → 𝑊 → 𝐵

• Fork if 𝜋 = 𝐴 ← 𝑊 → 𝐵

• Collider if 𝜋 = 𝐴 → 𝑊 ← 𝐵



Examples

(The slide identifies, on the example DAG: a path, a directed path, a fork, a collider, and a sequence.)


Bayesian Networks
Motivation

We know that joint probability tables are useful for reasoning about beliefs.

Unfortunately, representing this table needs 2ⁿ rows for n variables, even in the simplest case where all variables are binary.

Bayesian networks address this issue by factorizing the joint probability


distribution by means of the independence structure of the variables.

BNs acknowledge the fact that independence forms a significant aspect of beliefs
and that it can be elicited relatively easily using the language of graphs.

Furthermore, BNs enable more efficient inference algorithms on probabilistic


knowledge.
Example

The network structure (read off from the conditional probability tables below) is the DAG with edges B → A, B → C, B → D and C → D. Its parametrization:

B      P(B)
True   0.47
False  0.53

C      B      P(C|B)
True   True   0.255
True   False  0.35
False  True   0.745
False  False  0.65

A      B      P(A|B)
True   True   0.55
True   False  0.1
False  True   0.45
False  False  0.9

D      B      C      P(D|B,C)
True   True   True   0.3
True   True   False  0.99
True   False  True   0.5
True   False  False  0.11
False  True   True   0.7
False  True   False  0.01
False  False  True   0.5
False  False  False  0.89

Together, these induce the joint distribution:

A      B      C      D      P(A,B,C,D)
True   True   True   True   0.020
True   True   True   False  0.046
True   True   False  True   0.191
True   True   False  False  0.002
True   False  True   True   0.022
True   False  True   False  0.009
True   False  False  True   0.004
True   False  False  False  0.031
False  True   True   True   0.016
False  True   True   False  0.043
False  True   False  True   0.156
False  True   False  False  0.002
False  False  True   True   0.083
False  False  True   False  0.083
False  False  False  True   0.034
False  False  False  False  0.276


Definition

Intuitively, a Bayesian network consists of a structure (a DAG) and a parametrization (a set of conditional probability tables), where the overall joint distribution can be determined with the chain rule.

Formally: Given a DAG G over variables X₁, …, Xₙ and a set of conditional probability tables Θ, the tuple (G, Θ) is a Bayesian network if

Pr(X₁, …, Xₙ) = ∏ᵢ Pr(Xᵢ | Pa(Xᵢ)),

where Pr is the joint distribution over all variables and Pa(Xᵢ) denotes the parents of Xᵢ w.r.t. the graph G.
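To make the factorization concrete, here is a minimal Python sketch of the example network from the earlier slide (the dictionary encoding and function name are illustrative, not from the lecture):

```python
# Example network B -> A, B -> C, B -> D, C -> D, with its CPTs.
parents = {"B": (), "A": ("B",), "C": ("B",), "D": ("B", "C")}

cpts = {
    "B": {(): {True: 0.47, False: 0.53}},
    "A": {(True,): {True: 0.55, False: 0.45},
          (False,): {True: 0.10, False: 0.90}},
    "C": {(True,): {True: 0.255, False: 0.745},
          (False,): {True: 0.35, False: 0.65}},
    "D": {(True, True): {True: 0.30, False: 0.70},
          (True, False): {True: 0.99, False: 0.01},
          (False, True): {True: 0.50, False: 0.50},
          (False, False): {True: 0.11, False: 0.89}},
}

def joint_probability(assignment):
    """Chain rule for BNs: Pr(x1, ..., xn) = prod_i Pr(xi | pa(xi))."""
    p = 1.0
    for var, value in assignment.items():
        u = tuple(assignment[pa] for pa in parents[var])
        p *= cpts[var][u][value]
    return p
```

For instance, joint_probability({"A": False, "B": False, "C": False, "D": False}) multiplies 0.9 · 0.53 · 0.65 · 0.89 ≈ 0.276, matching the corresponding row of the joint table.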


Notations

For a less cluttered formulation, we will use the following notations.

For a variable X, we will write x if X = true and ¬x if X = false.

We will denote the conditional probability table of X as Θ_{X|U}, where U = Pa(X).

For an assignment x, u we will denote the corresponding table entry as θ_{x|u} = Pr(x | u).


Instantiations

An assignment of all network variables will be called a network instantiation.

A conditional probability table entry θ_{x|u} is compatible with an assignment z, denoted θ_{x|u} ∼ z, iff the instantiations x, u and z agree on their assignments,
e.g. Θ_{c|¬b} ∼ 𝑎, ¬𝑏, 𝑐 or Θ_{a} ∼ 𝑎, ¬𝑏, 𝑐.

For an instantiation z we can re-write the chain rule to compute its probability as

Pr(z) = ∏_{θ_{x|u} ∼ z} θ_{x|u}

E.g. assume a sequence A → B → C; then Pr(a, b, c) = θ_a · θ_{b|a} · θ_{c|b}.


Instantiations – Examples

Pr(a, ¬b, c, d) = θ_{a|¬b} · θ_{¬b} · θ_{c|¬b} · θ_{d|¬b,c} = 0.0216

Pr(¬a, ¬b, ¬c, ¬d) = θ_{¬a|¬b} · θ_{¬b} · θ_{¬c|¬b} · θ_{¬d|¬b,¬c} = 0.9 · 0.53 · 0.65 · 0.89 ≈ 0.276

From this we can recover the probability of every network instantiation, i.e. the full joint table.
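A short sketch of that recovery step, reusing joint_probability from the earlier snippet: enumerating all instantiations rebuilds the full joint table, whose entries must sum to 1.

```python
from itertools import product

variables = ["A", "B", "C", "D"]

# One chain-rule product per network instantiation.
joint = {vals: joint_probability(dict(zip(variables, vals)))
         for vals in product([True, False], repeat=len(variables))}

print(sum(joint.values()))  # a joint distribution sums to 1.0
```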
Independence through DAGs
Introduction

If defined accordingly, DAGs can be a great tool to determine independence.

We don’t even need to know what the actual distributions are!

Some intuition:
An alarm directly triggers a call from the neighbour.

If we get a radio report (R) that an earthquake (E) took place, then the belief in the alarm A changes, which in turn changes our belief in the call C.

Yet, if we already know that no alarm was triggered, then the belief in the call stays unchanged. Hence, 𝐶 ⫫ 𝑅 | 𝐴.


Markov Property

One piece of information a DAG can give us about independence is the Markov property.

The Markov property tells us that every variable is conditionally independent of its non-descendants given its parents, or formally, for every variable X:

X ⫫ NonDescendants(X) | Parents(X)

Example:
{𝐶} ⫫ {𝐵, 𝐸, 𝑅} | {𝐴}
{𝑅} ⫫ {𝐴, 𝐵, 𝐶} | {𝐸}
{𝐴} ⫫ {𝑅} | {𝐵, 𝐸}
{𝐵} ⫫ {𝐸, 𝑅} | ∅
{𝐸} ⫫ {𝐵} | ∅
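A small sketch that enumerates these Markov-property statements directly from a graph; the edge list for the alarm network is an assumption reconstructed from the examples above:

```python
# Alarm network (assumed structure): B -> A <- E, A -> C, E -> R.
edges = [("B", "A"), ("E", "A"), ("A", "C"), ("E", "R")]
nodes = {"A", "B", "C", "E", "R"}

def descendants(x):
    """All nodes reachable from x via a directed path."""
    seen, frontier = set(), [x]
    while frontier:
        n = frontier.pop()
        for u, v in edges:
            if u == n and v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen

# Every variable is independent of its non-descendants given its parents.
for x in sorted(nodes):
    pa = {u for u, v in edges if v == x}
    nd = nodes - {x} - descendants(x) - pa
    print(f"{x} is independent of {sorted(nd)} given {sorted(pa)}")
```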


Markov Property – Further Example

This property is also often used in the Hidden Markov Model (HMM), which has many applications in reinforcement learning, NLP, etc.

It consists of states S₁, …, Sₜ at discrete time steps and observations O₁, …, Oₜ resulting from the states (e.g. measurements), with the chain structure S₁ → S₂ → … → Sₜ and Sᵢ → Oᵢ at each time step.

The Markov property tells us that Sₜ ⫫ {S₁, …, Sₜ₋₂} | Sₜ₋₁.

Informally: "The current state depends only on the previous state."
Properties of Independence

Although all independencies of the Markov property are encoded by the DAG,
the DAG implies even more independencies.

E.g. the graph on the right also implies independencies which do not follow from the Markov property alone.

These additional independencies can be derived by a set of independence properties.


Symmetry (recap)

Intuitively, if observing Y does not influence our belief in X, then learning X does not influence our belief in Y either.

Formally, symmetry is expressed as follows:

X ⫫ Y | Z ⟹ Y ⫫ X | Z

Example: If the graph encodes 𝐶 ⫫ 𝑅 | 𝐴 because of the Markov property, then it also encodes 𝑅 ⫫ 𝐶 | 𝐴 because of symmetry.


Decomposition

Intuitively, if observing Y and W together does not influence our belief in X, then learning Y alone or W alone will not influence our belief in X either.

Formally, decomposition can be expressed as follows:

X ⫫ Y ∪ W | Z ⟹ X ⫫ Y | Z and X ⫫ W | Z

Note: The opposite direction does not hold in general.

Example: If {𝐶} ⫫ {𝐵, 𝐸, 𝑅} | {𝐴}, then also {𝐶} ⫫ {𝐵} | {𝐴} and {𝐶} ⫫ {𝐸, 𝑅} | {𝐴}.


Decomposition - Application

More generally, with the help of decomposition we can state that X ⫫ S | Parents(X) for every subset S of the non-descendants of X.

This is especially useful when calculating a joint probability distribution with the chain rule.

Example (using the four-variable network from before):
Pr(B) · Pr(A|B) · Pr(C|A, B) · Pr(D|A, B, C)
can be simplified to:
Pr(B) · Pr(A|B) · Pr(C|B) · Pr(D|B, C)


Weak Union

Intuitively, this expresses that, if the combined information Y and W is not relevant to our belief in X, then the partial information W will not make Y relevant.

Formally, weak union can be expressed as follows:

X ⫫ Y ∪ W | Z ⟹ X ⫫ Y | Z ∪ W

Example: If {𝐶} ⫫ {𝐵, 𝐸, 𝑅} | {𝐴}, then we can conclude by weak union also {𝐶} ⫫ {𝐵, 𝐸} | {𝐴, 𝑅}.


Contraction

Intuitively, given Z, if observing an irrelevant piece of information Y makes W irrelevant, then W must have been irrelevant from the start.

Formally, contraction can be expressed as follows:

X ⫫ Y | Z and X ⫫ W | Z ∪ Y ⟹ X ⫫ Y ∪ W | Z

Example: If {𝐶} ⫫ {𝐵} | {𝐴} and {𝐶} ⫫ {𝐸, 𝑅} | {𝐴, 𝐵}, then by contraction {𝐶} ⫫ {𝐵, 𝐸, 𝑅} | {𝐴}.
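Since these properties are statements about the underlying distribution, each one can also be checked numerically. A sketch reusing variables and joint from the earlier snippets (prob and independent are illustrative helper names): X ⫫ Y | Z holds iff Pr(x, y, z) · Pr(z) = Pr(x, z) · Pr(y, z) for all value combinations.

```python
from itertools import product

def prob(event):
    """Marginal probability of a partial assignment {var: value}."""
    return sum(p for vals, p in joint.items()
               if all(vals[variables.index(v)] == val
                      for v, val in event.items()))

def independent(X, Y, Z, tol=1e-9):
    """Numerically test X independent of Y given Z on the joint."""
    for vals in product([True, False], repeat=len(X) + len(Y) + len(Z)):
        x = dict(zip(X, vals[:len(X)]))
        y = dict(zip(Y, vals[len(X):len(X) + len(Y)]))
        z = dict(zip(Z, vals[len(X) + len(Y):]))
        if abs(prob({**x, **y, **z}) * prob(z)
               - prob({**x, **z}) * prob({**y, **z})) > tol:
            return False
    return True

# e.g. A is independent of D given {B, C} in the four-variable network
print(independent(["A"], ["D"], ["B", "C"]))
```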


d-Separation
Introduction

We have seen that deriving new independencies from the Markov property can
be cumbersome.

Luckily, there is a graphical test called d-separation which captures the same
independencies as the rules described before.

If X is d-separated from Y by Z, we denote this as dsep(X, Z, Y).

Each d-separation implies an independence in a Bayesian network:

dsep(X, Z, Y) ⟹ X ⫫ Y | Z

Important: Not the other way round!
d-Blocked Paths

Recall the three special types of paths described before: sequence, fork, collider.

These can be what we call d-blocked, in the following cases:


• A sequence 𝐴 → 𝑊 → 𝐵 is d-blocked by Z, iff W ∈ Z.
• A fork 𝐴 ← 𝑊 → 𝐵 is d-blocked by Z, iff W ∈ Z.
• A collider 𝐴 → 𝑊 ← 𝐵 is d-blocked by Z, iff neither W nor any descendant of W is in Z.

In general, a path is d-blocked iff it contains at least one d-blocked sequence, fork
or collider.

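The blocking rules translate directly into code. A minimal sketch, with an assumed encoding of a path as its node sequence plus the direction of each edge along it ("->" or "<-", as read along the path):

```python
def triple_blocked(d_in, w, d_out, Z, desc):
    """Is the triple around middle node w d-blocked by the set Z?

    d_in / d_out: directions of the edges before and after w.
    desc: function returning the set of descendants of a node.
    """
    if d_in == "->" and d_out == "<-":          # collider A -> W <- B
        return w not in Z and not (desc(w) & Z)
    # sequences (-> ->, <- <-) and forks (<- ->): blocked iff W observed
    return w in Z

def path_blocked(nodes, dirs, Z, desc):
    """A path is d-blocked iff at least one of its triples is d-blocked."""
    return any(triple_blocked(dirs[i], nodes[i + 1], dirs[i + 1], Z, desc)
               for i in range(len(nodes) - 2))

# Collider A -> W <- B with nothing observed: the path is d-blocked.
print(path_blocked(["A", "W", "B"], ["->", "<-"], set(), lambda n: set()))
```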


Examples

(Figure: two example paths, with each sequence, fork, and collider along them marked open or closed. The first path is d-blocked by the conditioning set; the second is not.)
d-Separation

We now have all the tools to properly define d-separation.

Given disjoint sets of nodes X, Y and Z, we say X and Y are d-separated by Z, iff every path between a node in X and a node in Y is d-blocked by Z.

Intuitively, this means that there is no way information can flow between X and Y when we condition on Z.


Examples

(The slide checks four d-separation queries on the example graph: the first, third and fourth hold, the second does not.)


d-Separation through Pruning

Paths between sets of nodes can be exponentially many. The following method instead decides d-separation in time linear in the size of the graph.

Given a DAG G and disjoint sets of nodes X, Y, Z, a pruned DAG G′ w.r.t. X, Y, Z is determined by:
• Deleting every leaf node 𝑊 ∉ 𝑋 ∪ 𝑌 ∪ 𝑍,
• Deleting all edges outgoing from nodes in 𝑍,
• Performing both rules iteratively until they can't be applied anymore.

Then, X and Y are d-separated by Z in G iff X and Y are disconnected in G′ (ignoring edge directions).
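A sketch of the full test under this encoding (nodes as strings, directed edges as pairs); the function name is illustrative:

```python
def d_separated(nodes, edges, X, Y, Z):
    """dsep(X, Z, Y): prune the DAG, then test undirected connectivity."""
    nodes, edges = set(nodes), set(edges)
    keep = set(X) | set(Y) | set(Z)
    while True:
        # Rule 1: leaf nodes (no outgoing edges) outside X, Y, Z.
        leaves = {n for n in nodes
                  if n not in keep and not any(u == n for u, v in edges)}
        # Rule 2: edges outgoing from nodes in Z.
        cut = {(u, v) for u, v in edges if u in Z}
        if not leaves and not cut:
            break  # neither rule applies anymore
        nodes -= leaves
        edges = {(u, v) for u, v in edges - cut
                 if u not in leaves and v not in leaves}
    # X and Y are d-separated iff no undirected path connects them.
    frontier, seen = list(X), set(X)
    while frontier:
        n = frontier.pop()
        for u, v in edges:
            for a, b in ((u, v), (v, u)):
                if a == n and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return not (seen & set(Y))
```

For the chain A → B → C, d_separated({"A", "B", "C"}, {("A", "B"), ("B", "C")}, {"A"}, {"C"}, {"B"}) returns True, matching the sequence rule.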


Examples

Recall rules:
• Deleting every leaf node 𝑊 ∉ 𝑋 ∪ 𝑌 ∪ 𝑍,
• Deleting all edges outgoing from nodes in 𝑍,

(The slide applies the pruning rules to two d-separation queries on the example graph; both hold.)


Lecture 2: Summary

• We introduced DAGs and defined important graph-theoretic concepts.

• We investigated which independencies a DAG can encode.

• We introduced and formally defined Bayesian networks.

• We saw an easy way to derive the independencies implied by the structure of a Bayesian network.
