Prob Nets

This document defines probabilistic models and discusses graphical representations of probabilistic models using graphs. It introduces concepts such as the joint probability distribution (JPD), conditional probability distributions, conditional independence statements, and how qualitative structures like graphs can be used to simplify probabilistic models. Specifically, it discusses using directed acyclic graphs (DAGs) and conditional independence relationships to represent probabilistic models graphically using the notions of d-separation and u-separation.

DEFINING PROBABILISTIC MODELS

Cantabria University

The joint probability distribution (JPD) of a set of n binary variables involves a huge number of parameters, 2^n − 1 (larger than 10^25 for only 100 variables). For example, for three binary variables X, Y, Z:
x  y  z  p(x, y, z)
0  0  0     0.12
0  0  1     0.18
0  1  0     0.04
0  1  1     0.16
1  0  0     0.09
1  0  1     0.21
1  1  0     0.02
1  1  1     0.18
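Such a table can be held directly in code. A minimal sketch in plain Python (the tuple order (x, y, z) follows the table columns above):

```python
# The JPD of the three binary variables above, keyed by (x, y, z).
jpd = {
    (0, 0, 0): 0.12, (0, 0, 1): 0.18, (0, 1, 0): 0.04, (0, 1, 1): 0.16,
    (1, 0, 0): 0.09, (1, 0, 1): 0.21, (1, 1, 0): 0.02, (1, 1, 1): 0.18,
}

def marginal(jpd, keep):
    """Sum the JPD over every variable whose index is not in `keep`."""
    out = {}
    for assignment, p in jpd.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

# A valid JPD sums to 1; storing it explicitly needs 2^n - 1 free parameters.
assert abs(sum(jpd.values()) - 1.0) < 1e-9
p_x = marginal(jpd, (0,))   # p(x); here both values are 0.5
```

Marginalization by summation is all that is needed to answer any probabilistic query from the explicit table, which is exactly why the 2^n cost matters.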

We can use the qualitative structure of the model to simplify the probabilistic structure.
There are two routes to specifying a probabilistic model (JPD):

Graphically specified models: first the qualitative structure (a factorization), then the quantitative structure (parameter estimation).

Models specified by input lists of conditional probability distributions (CPDs).

PROBABILISTIC NETWORK MODELS: BAYESIAN NETWORKS



CONDITIONAL PROBABILITY. INDEPENDENCE

Definition 1 (Conditional probability). Let X and Y be two disjoint subsets of variables such that p(y) > 0. Then the conditional probability distribution (CPD) of X given Y = y is given by

p(X = x | Y = y) = p(x|y) = p(x, y) / p(y).    (1)

Definition 2 (Independence of two variables). Let X and Y be two disjoint subsets of the set of random variables {X1, ..., Xn}. Then X is said to be independent of Y if and only if

p(x|y) = p(x),    (2)

for all possible values x and y of X and Y; otherwise X is said to be dependent on Y.

If X is independent of Y, we can combine (1) and (2) and obtain

p(x, y) = p(x) p(y).    (3)

If {X1, ..., Xm} are mutually independent, then

p(x1, ..., xm) = ∏_{i=1}^{m} p(xi).    (4)


CONDITIONAL INDEPENDENCE
Definition 3 (Conditional independence). Let X, Y, and Z be three disjoint sets of variables. Then X is said to be conditionally independent of Y given Z if and only if

p(x | z, y) = p(x | z),

or, equivalently,

p(x, y | z) = p(x | z) p(y | z).

When X and Y are conditionally independent given Z, we write I(X, Y|Z). The statement I(X, Y|Z) is referred to as a conditional independence statement (CIS). Similarly, when X and Y are conditionally dependent given Z, we write D(X, Y|Z), which is called a conditional dependence statement.

The definition of conditional independence conveys the idea that once Z is known, knowing Y can no longer influence the probability of X. In other words, if Z is already known, knowledge of Y does not add any new information about X.

Note that (unconditional) independence can be treated as a particular case of conditional independence: we write I(X, Y|∅) to mean that X and Y are unconditionally independent.
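These definitions can be checked numerically on any tabulated JPD. A minimal sketch, reusing the eight-entry p(x, y, z) table from the first slide (variables are referred to by index, and Z = () gives unconditional independence):

```python
jpd = {
    (0, 0, 0): 0.12, (0, 0, 1): 0.18, (0, 1, 0): 0.04, (0, 1, 1): 0.16,
    (1, 0, 0): 0.09, (1, 0, 1): 0.21, (1, 1, 0): 0.02, (1, 1, 1): 0.18,
}

def marg(jpd, keep):
    """Marginal distribution over the variable indices in `keep`."""
    out = {}
    for a, p in jpd.items():
        k = tuple(a[i] for i in keep)
        out[k] = out.get(k, 0.0) + p
    return out

def cond_indep(jpd, X, Y, Z, tol=1e-9):
    """I(X, Y | Z) holds iff p(x, y, z) * p(z) = p(x, z) * p(y, z) for all
    values: the cross-multiplied form of p(x, y|z) = p(x|z) p(y|z),
    which avoids dividing by p(z)."""
    pz, pxz = marg(jpd, Z), marg(jpd, X + Z)
    pyz, pxyz = marg(jpd, Y + Z), marg(jpd, X + Y + Z)
    for a, p in pxyz.items():
        x, y, z = a[:len(X)], a[len(X):len(X) + len(Y)], a[len(X) + len(Y):]
        if abs(p * pz[z] - pxz[x + z] * pyz[y + z]) > tol:
            return False
    return True

print(cond_indep(jpd, (0,), (1,), ()))    # True: I(X, Y | {}) in this table
print(cond_indep(jpd, (0,), (1,), (2,)))  # False: D(X, Y | Z)
```

Note that in this particular table X and Y are unconditionally independent yet conditionally dependent given Z, so marginal independence does not imply conditional independence.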

FACTORIZATIONS OF A JPD

Definition 4 (Factorization by potentials). Let C1, ..., Cm be subsets of a set of variables X = {X1, ..., Xn}. Then the JPD factorizes by potentials as

p(x1, ..., xn) = ∏_{i=1}^{m} ψi(ci),

where the functions ψi, called factor potentials, are nonnegative.

Definition 5 (Chain rule factorization). Any JPD of a set of ordered variables {X1, ..., Xn} can be expressed as a product of n CPDs of the form

p(x1, ..., xn) = ∏_{i=1}^{n} p(xi | bi),    (5)

where Bi = {X1, ..., X_{i−1}}.

Example 1 (Chain rule). Consider a case of four variables {X1, ..., X4}. Then the following are equivalent chain rule factorizations of the JPD:

p(x1, ..., x4) = p(x1) p(x2|x1) p(x3|x1, x2) p(x4|x1, x2, x3)

and

p(x1, ..., x4) = p(x1|x2, x3, x4) p(x2|x3, x4) p(x3|x4) p(x4).
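The chain rule can be verified numerically on the three-variable table from the first slide: dividing successive marginals reproduces each CPD in the product p(x) p(y|x) p(z|x, y). A minimal sketch in plain Python (it assumes every conditioning event has positive probability, as in this table):

```python
jpd = {
    (0, 0, 0): 0.12, (0, 0, 1): 0.18, (0, 1, 0): 0.04, (0, 1, 1): 0.16,
    (1, 0, 0): 0.09, (1, 0, 1): 0.21, (1, 1, 0): 0.02, (1, 1, 1): 0.18,
}

def marg(jpd, keep):
    """Marginal distribution over the variable indices in `keep`."""
    out = {}
    for a, p in jpd.items():
        k = tuple(a[i] for i in keep)
        out[k] = out.get(k, 0.0) + p
    return out

px, pxy = marg(jpd, (0,)), marg(jpd, (0, 1))
for (x, y, z), p in jpd.items():
    p_y_given_x = pxy[(x, y)] / px[(x,)]   # p(y | x)
    p_z_given_xy = p / pxy[(x, y)]         # p(z | x, y)
    chain = px[(x,)] * p_y_given_x * p_z_given_xy
    assert abs(chain - p) < 1e-12          # p(x, y, z) = p(x) p(y|x) p(z|x, y)
```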

IMPOSING INDEPENDENCIES
Consider the variables {X1, X2, X3, X4} and suppose we have:

I(X3, X1|X2) and I(X4, {X1, X3}|X2).    (6)

We wish to compute the constraints among the parameters of the JPD imposed by these CISs. The first of these statements implies

p(x3|x1, x2) = p(x3|x2),    (7)

and the second statement implies

p(x4|x1, x2, x3) = p(x4|x2).    (8)

Note that the general form of the JPD is not a suitable representation for calculating the constraints given by (7) and (8). However, by using these two equalities in the chain rule factorization we obtain

p(x1, ..., x4) = p(x1) p(x2|x1) p(x3|x2) p(x4|x2).    (9)

Therefore, the two CISs in (6) give rise to a reduction in the number of parameters from 15 to 7.
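The parameter counts can be checked with a one-line helper: for binary variables, a CPD p(xi | pai) with k parents has 2^k free parameters (one free probability per parent configuration). A small sketch, not from the slides:

```python
def n_free_params(parent_counts):
    """Free parameters of a chain-rule factorization of binary variables,
    given the number of parents (conditioning variables) of each factor."""
    return sum(2 ** k for k in parent_counts)

# Full chain rule p(x1) p(x2|x1) p(x3|x1,x2) p(x4|x1,x2,x3): 1 + 2 + 4 + 8.
assert n_free_params([0, 1, 2, 3]) == 15
# With the CISs of (6): p(x1) p(x2|x1) p(x3|x2) p(x4|x2): 1 + 2 + 2 + 2.
assert n_free_params([0, 1, 1, 1]) == 7
```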


DEPENDENCY MODELS. SEPARATION

Definition 6 (Dependency model). Any model M of a set of variables {X1, ..., Xn} from which we can determine whether I(X, Y|Z) is true, for all possible triplets of disjoint subsets X, Y, and Z, is called a dependency model.

A JPD (with the definition of conditional independence) is a dependency model. A graph can also define a dependency model (using the corresponding separation criterion). The qualitative structure of a probabilistic model can thus be represented by a graphical dependency model that provides a way to factorize the corresponding JPD.

Definition 7 (U-separation). Let X, Y, and Z be three disjoint subsets of nodes in an undirected graph G. We say that Z separates X and Y if and only if every path between each node in X and each node in Y contains at least one node in Z. When Z separates X and Y in G, we write I(X, Y|Z)G; otherwise D(X, Y|Z)G.

Given an undirected graph, one can derive all CISs from the graph using the above U-separation criterion.
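U-separation is just graph reachability after blocking Z: Z separates X and Y iff no path from X reaches Y once the nodes in Z are removed. A minimal breadth-first sketch in plain Python (the three-node chain used at the end is a hypothetical example, not the graph from the next slide):

```python
from collections import deque

def u_separated(edges, X, Y, Z):
    """True iff Z separates X and Y in the undirected graph given by
    `edges`, i.e. every X-Y path contains a node of Z."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    X, Y, Z = set(X), set(Y), set(Z)
    seen, queue = set(X), deque(X)
    while queue:                      # BFS from X, never entering Z
        node = queue.popleft()
        if node in Y:
            return False              # reached Y: some path avoids Z
        for nb in adj.get(node, ()):
            if nb not in Z and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return True

chain = [("A", "B"), ("B", "C")]                    # A - B - C
assert u_separated(chain, {"A"}, {"C"}, {"B"})      # I(A, C | B)_G
assert not u_separated(chain, {"A"}, {"C"}, set())  # D(A, C | {})_G
```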

U-SEPARATION EXAMPLE
[Figure: an undirected graph on the nodes A, B, C, D, E, F, G, H, I, shown four times with the relevant node sets highlighted: (a) I(A, I | E), (b) D(A, I | B), (c) I({A, C}, {D, H} | {B, E}), (d) D({A, C}, {D, H} | {E, I}).]

(a) Every path between A and I contains E. Thus, I(A, I|E)G.
(b) There is a path (A–C–E–I) that does not contain B. Thus, D(A, I|B)G.
(c) Every path between the two subsets contains either B or E. Thus, I({A, C}, {D, H}|{B, E})G.
(d) There is a path (A–B–D) that does not contain E or I. Thus, D({A, C}, {D, H}|{E, I})G.

D-SEPARATION
Definition 8 (D-separation). Let X, Y, and Z be three disjoint subsets of nodes in a DAG D. Then Z is said to D-separate X and Y if and only if along every undirected path between each node in X and each node in Y there is an intermediate node A such that either

1. A is a head-to-head node in the path, and neither A nor its descendants are in Z, or
2. A is not a head-to-head node in the path, and A is in Z.

When Z D-separates X and Y in D, we write I(X, Y|Z)D to indicate that this CIS is derived from D; otherwise we write D(X, Y|Z)D to indicate that X and Y are conditionally dependent given Z in the graph D.

Definition 9 (D-separation, equivalent form). Let X, Y, and Z be three disjoint subsets of nodes in a DAG D. Then Z is said to D-separate X and Y if and only if Z separates X and Y in the moral graph of the smallest ancestral set containing X, Y, and Z.
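Definition 9 gives a direct algorithm: restrict to the smallest ancestral set, moralize it (marry co-parents, drop directions), then test ordinary u-separation. A self-contained sketch; the three-node collider and chain at the end are hypothetical illustrations, not the slides' example:

```python
from collections import deque
from itertools import combinations

def d_separated(parents, X, Y, Z):
    """D-separation via Definition 9. `parents` maps node -> set of parents."""
    X, Y, Z = set(X), set(Y), set(Z)
    # 1. Smallest ancestral set containing X, Y, and Z.
    anc, stack = set(), list(X | Y | Z)
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents.get(n, ()))
    # 2. Moral graph: link each node to its parents, marry co-parents.
    adj = {n: set() for n in anc}
    for n in anc:
        ps = list(parents.get(n, ()))
        for p in ps:
            adj[n].add(p); adj[p].add(n)
        for p, q in combinations(ps, 2):
            adj[p].add(q); adj[q].add(p)
    # 3. U-separation in the moral graph (BFS from X avoiding Z).
    seen, queue = set(X), deque(X)
    while queue:
        n = queue.popleft()
        if n in Y:
            return False
        for nb in adj[n]:
            if nb not in Z and nb not in seen:
                seen.add(nb); queue.append(nb)
    return True

collider = {"C": {"A", "B"}}                           # A -> C <- B
assert d_separated(collider, {"A"}, {"B"}, set())      # I(A, B | {})_D
assert not d_separated(collider, {"A"}, {"B"}, {"C"})  # D(A, B | C)_D
chain = {"B": {"A"}, "C": {"B"}}                       # A -> B -> C
assert d_separated(chain, {"A"}, {"C"}, {"B"})         # I(A, C | B)_D
```

The collider case shows why the ancestral-set step matters: with Z = ∅ the head-to-head node C is never added to the graph, so A and B come out separated; conditioning on C pulls it in and marries A and B, creating the dependence.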


D-SEPARATION EXAMPLE
[Figure: a DAG on the variables Employment (E), Investment income (V), Health (H), Wealth (W), Contributions (C), and Happiness (P), shown four times with the relevant node sets highlighted: (a) I(E, V | ∅), (b) D(E, H | P), (c) I(C, P | {E, W}), (d) D(C, {H, P} | E).]


PROPERTIES OF CONDITIONAL INDEPENDENCE

Symmetry: if X is conditionally independent (c.i.) of Y given Z, then Y is c.i. of X given Z:

I(X, Y|Z) ⇒ I(Y, X|Z).

Decomposition: if X is c.i. of Y ∪ W given Z, then X is c.i. of Y given Z and X is c.i. of W given Z:

I(X, Y ∪ W|Z) ⇒ I(X, Y|Z) and I(X, W|Z).

Weak union:

I(X, Y ∪ W|Z) ⇒ I(X, W|Z ∪ Y) and I(X, Y|Z ∪ W).

Contraction: if W is irrelevant to X after learning some irrelevant information Y, then W must have been irrelevant before we knew Y:

I(X, W|Z ∪ Y) and I(X, Y|Z) ⇒ I(X, Y ∪ W|Z).

The weak union and contraction properties together mean that irrelevant information should not alter the relevance of other information in the system. In other words, what was relevant remains relevant, and what was irrelevant remains irrelevant.

The above four properties hold for any JPD.

GRAPHICAL ILLUSTRATION
[Figure: graphical illustration, on small graphs, of the properties: (a) symmetry, (b) decomposition, (c) weak union, (d) contraction, (e) intersection.]

OTHER PROPERTIES
1. Strong union: if X is c.i. of Y given Z, then X is also c.i. of Y given Z ∪ W, that is,

I(X, Y|Z) ⇒ I(X, Y|Z ∪ W).

[Figure: two DAGs illustrating violation of the strong union property by DAGs.]

2. Intersection:

I(X, W|Z ∪ Y) and I(X, Y|Z ∪ W) ⇒ I(X, Y ∪ W|Z).

3. Strong transitivity:

D(X, A|Z) and D(A, Y|Z) ⇒ D(X, Y|Z).

4. Weak transitivity:

D(X, A|Z) and D(A, Y|Z) ⇒ D(X, Y|Z) or D(X, Y|Z ∪ A).

5. Chordality:

D(A, C|B) and D(A, C|D) ⇒ D(A, C|B ∪ D) or D(B, D|A ∪ C),

where A, B, C, D are single nodes.

GRAPHICAL ILLUSTRATION
[Figure: graphical illustration, on small graphs, of the properties: (a) strong union, (b) strong transitivity, (c) weak transitivity, (d) chordality.]


GRAPHICAL REPRESENTATIONS OF PROBABILISTIC MODELS

Graphs display the relationships among the variables explicitly, and they are intuitive and easy to explain. It is therefore important to analyze whether the dependency models associated with probabilistic models can be given by graphical models.

Definition 10 (Perfect map). A graph G is said to be a perfect map of a dependency model M if every CIS derived from G can also be derived from M and vice versa, that is,

I(X, Y|Z)M ⇔ I(X, Y|Z)G ⇔ Z separates X from Y.

Unfortunately, not every dependency model can be represented by a directed or undirected perfect map.

Example 2 (Dependency model with no directed perfect map). Consider the set of three variables {X, Y, Z} and the dependency model M = {I(X, Y|Z), I(Y, Z|X), I(Y, X|Z), I(Z, Y|X)}. There is no directed acyclic graph (DAG) D that is a perfect map of the dependency model M.

MODEL WITH NO UNDIRECTED PERFECT MAP

M = {I(X, Y|∅), I(Y, X|∅)}.

[Figure: eight graphs (a)–(h) on the nodes X, Y, Z; none is a perfect map of M.]

Graph G   CISs in G not in M   CISs in M not in G
(a)       I(X, Z|∅)            —
(b)       I(X, Z|∅)            I(X, Y|∅)
(c)       I(Y, Z|∅)            —
(d)       I(X, Z|∅)            —
(e)       I(Y, Z|X)            I(X, Y|∅)
(f)       I(X, Z|Y)            I(X, Y|∅)
(g)       I(X, Y|Z)            I(X, Y|∅)
(h)       —                    I(X, Y|∅)

I-MAPS AND D-MAPS


Definition 11 (Independence map). A graph G is said to be an independence map (I-map) of a dependency model M if I(X, Y|Z)G ⇒ I(X, Y|Z)M, that is, if all CISs derived from G hold in M.

Definition 12 (Dependency map). A graph G is said to be a dependency map (D-map) of a dependency model M if D(X, Y|Z)G ⇒ D(X, Y|Z)M, that is, if all CISs holding in M can be derived from G.

Definition 13 (Minimal I-map). A graph G is said to be a minimal I-map of a dependency model M if it is an I-map of M, but ceases to be an I-map of M when any link is removed from it.


MODELS WITH UNDIRECTED PERFECT MAPS

A necessary and sufficient condition for a dependency model M to have an undirected perfect map is that M satisfies the following properties:

Symmetry: I(X, Y|Z)M ⇔ I(Y, X|Z)M.

Decomposition: I(X, Y ∪ W|Z)M ⇒ I(X, Y|Z)M and I(X, W|Z)M.

Intersection: I(X, W|Z ∪ Y)M and I(X, Y|Z ∪ W)M ⇒ I(X, Y ∪ W|Z)M.

Strong union: I(X, Y|Z)M ⇒ I(X, Y|Z ∪ W)M.

Strong transitivity: I(X, Y|Z)M ⇒ I(X, A|Z)M or I(Y, A|Z)M, where A is a single node not in {X, Y, Z}.


MARKOV NETWORKS
Definition 14 (Markov network). A Markov network is a pair (G, Ψ), where G is an undirected graph and Ψ = {ψ1(c1), ..., ψm(cm)} is a set of positive potential functions defined on the cliques C1, ..., Cm of G that defines the JPD p(x) as

p(x) = ∏_{i=1}^{m} ψi(ci).    (10)

If the undirected graph G is triangulated, then p(x) can also be factorized, using probability functions P = {p(r1|s1), ..., p(rm|sm)}, as

p(x1, ..., xn) = ∏_{i=1}^{m} p(ri|si),    (11)

where Si and Ri are the separator and residual of clique Ci. In this case, the Markov network model is defined by (G, P).

The graph G is an undirected I-map of p(x). Thus, a Markov network can be used to define the qualitative structure of a probabilistic model through a factorization of the corresponding JPD in terms of potential functions or probability functions. The quantitative structure is then obtained by numerically specifying the functions appearing in the factorization.

MARKOV NETWORKS EXAMPLE

[Figure: (a) an undirected graph on the nodes A, B, C, D, E, F; (b) its cliques C1, ..., C4 with their potentials ψ1, ..., ψ4.]

The cliques of the graph are:

C1 = {A, B, C}, C2 = {B, C, E}, C3 = {B, D}, C4 = {C, F}.    (12)

The corresponding factorization by potentials is

p(a, b, c, d, e, f) = ψ1(c1) ψ2(c2) ψ3(c3) ψ4(c4) = ψ1(a, b, c) ψ2(b, c, e) ψ3(b, d) ψ4(c, f).    (13)

Since the graph is triangulated, another factorization of the JPD in terms of probability functions can be obtained:

i   Clique Ci    Separator Si   Residual Ri
1   {A, B, C}    ∅              {A, B, C}
2   {B, C, E}    {B, C}         {E}
3   {B, D}       {B}            {D}
4   {C, F}       {C}            {F}

p(a, b, c, d, e, f) = ∏_{i=1}^{4} p(ri|si) = p(a, b, c) p(e|b, c) p(d|b) p(f|c).    (14)
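A minimal sketch of the potential factorization (13): the numeric potential values below are invented for illustration (the slides give none), and a normalization constant is included, since a product of arbitrary positive potentials generally needs one before it is a JPD:

```python
from itertools import product

# Hypothetical positive potentials on the cliques of (12); values invented.
def psi1(a, b, c): return 1.0 + a + b + c
def psi2(b, c, e): return 1.0 + b * c + e
def psi3(b, d):    return 2.0 - b * d
def psi4(c, f):    return 1.0 + c + f

states = list(product((0, 1), repeat=6))   # assignments (a, b, c, d, e, f)
weight = {s: psi1(s[0], s[1], s[2]) * psi2(s[1], s[2], s[4])
             * psi3(s[1], s[3]) * psi4(s[2], s[5]) for s in states}
Z = sum(weight.values())                   # normalization constant
jpd = {s: w / Z for s, w in weight.items()}

assert abs(sum(jpd.values()) - 1.0) < 1e-9  # a valid, strictly positive JPD
```

Brute-force enumeration like this is exponential in n; the point of the clique structure is precisely to avoid it in real inference.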


MODELS WITH DIRECTED PERFECT MAPS

A necessary condition for a dependency model M to have a directed perfect map is that M satisfies the following properties:

Symmetry: I(X, Y|Z)M ⇔ I(Y, X|Z)M.

Composition-Decomposition: I(X, Y ∪ W|Z)M ⇔ I(X, Y|Z)M and I(X, W|Z)M.

Intersection: I(X, W|Z ∪ Y)M and I(X, Y|Z ∪ W)M ⇒ I(X, Y ∪ W|Z)M.

Weak union: I(X, Y ∪ Z|W)M ⇒ I(X, Y|W ∪ Z)M.

Weak transitivity: I(X, Y|Z)M and I(X, Y|Z ∪ A)M ⇒ I(X, A|Z)M or I(Y, A|Z)M, where A is a single node.

Contraction: I(X, Y|Z ∪ W)M and I(X, W|Z)M ⇒ I(X, Y ∪ W|Z)M.

Chordality: I(A, B|C ∪ D)M and I(C, D|A ∪ B)M ⇒ I(A, B|C)M or I(A, B|D)M, where A, B, C, D are single nodes.


BAYESIAN NETWORKS
Definition 15 (Bayesian network). A Bayesian network is a pair (D, P), where D is a DAG, P = {p(x1|π1), ..., p(xn|πn)} is a set of n CPDs, one for each variable, and Πi is the set of parents of node Xi in D. The set P defines the associated JPD as

p(x) = ∏_{i=1}^{n} p(xi|πi).    (15)

The DAG D is a minimal directed I-map of p(x).

Example. For a DAG on seven variables {A, ..., G} in which A is the parent of C, A and B are the parents of D, D is the parent of F, and D and E are the parents of G, equation (15) gives

p(x) = p(a) p(b) p(c|a) p(d|a, b) p(e) p(f|d) p(g|d, e).
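Equation (15) can be sketched in code for this example factorization. The CPT numbers below are hypothetical (the slides give only the structure); each table stores p(variable = 1 | parents):

```python
from itertools import product

# Hypothetical CPTs for p(a) p(b) p(c|a) p(d|a,b) p(e) p(f|d) p(g|d,e).
p_a = 0.3
p_b = 0.6
p_c = {0: 0.2, 1: 0.7}                               # p(c=1 | a)
p_d = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.9}  # p(d=1 | a, b)
p_e = 0.5
p_f = {0: 0.3, 1: 0.8}                               # p(f=1 | d)
p_g = {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.4, (1, 1): 0.9}  # p(g=1 | d, e)

def bern(p1, v):
    """Probability of a binary variable taking value v, given p(v = 1)."""
    return p1 if v else 1.0 - p1

def joint(a, b, c, d, e, f, g):
    """Equation (15) for the example DAG."""
    return (bern(p_a, a) * bern(p_b, b) * bern(p_c[a], c)
            * bern(p_d[(a, b)], d) * bern(p_e, e)
            * bern(p_f[d], f) * bern(p_g[(d, e)], g))

total = sum(joint(*s) for s in product((0, 1), repeat=7))
assert abs(total - 1.0) < 1e-9   # the factorization defines a valid JPD
```

Because every factor is a properly normalized CPD, the product sums to 1 automatically; unlike the Markov-network potentials, no separate normalization constant is needed.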

