Markov Networks

Markov networks are undirected graphical models that represent conditional independence relationships between variables. They define a joint probability distribution as a product of potential functions over the cliques of the graph, which makes them natural for domains where directed models are awkward, such as spatial or relational data. However, Markov networks are more computationally expensive to estimate than directed models, and their parameters are less interpretable.


Markov Networks

Alan Ritter
Markov Networks
• Undirected graphical models

[Figure: network with edges Smoking—Cancer and Asthma—Cough]

• Potential functions defined over cliques:

  P(x) = (1/Z) ∏_c Φ_c(x_c)

  Z = Σ_x ∏_c Φ_c(x_c)

• Example potential over the clique {Smoking, Cancer}:

  Smoking | Cancer | Φ(S,C)
  False   | False  | 4.5
  False   | True   | 4.5
  True    | False  | 2.7
  True    | True   | 4.5
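A minimal sketch (not from the slides) of how this definition turns the table above into a normalized joint: assuming, for simplicity, that Φ(S,C) is the only clique potential, brute-force enumeration gives the partition function Z and the distribution P.

```python
from itertools import product

# Clique potential for {Smoking, Cancer}, values from the slide's table.
# A potential scores the "compatibility" of an assignment; it is not a probability.
phi_sc = {
    (False, False): 4.5,
    (False, True):  4.5,
    (True,  False): 2.7,
    (True,  True):  4.5,
}

def unnorm(smoking, cancer):
    """Unnormalized score: product of clique potentials (here just one)."""
    return phi_sc[(smoking, cancer)]

# Partition function Z: sum of the unnormalized score over all assignments.
Z = sum(unnorm(s, c) for s, c in product([False, True], repeat=2))

# Normalized joint P(x) = (1/Z) * prod_c Phi_c(x_c).
P = {(s, c): unnorm(s, c) / Z for s, c in product([False, True], repeat=2)}
print(P[(True, True)])  # 4.5 / 16.2 ≈ 0.278
```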
Undirected Graphical Models: Motivation
• Terminology:
  – Directed graphical models = Bayesian Networks
  – Undirected graphical models = Markov Networks
• We just learned about DGMs (Bayes Nets)
• For some domains, being forced to choose a direction for the edges is awkward
• Example: consider modeling an image
  – Assumption: neighboring pixels are correlated
  – We could create a DAG model w/ 2D topology
2D Bayesian Network
Markov Random Field
(Markov Network)
UGMs (Markov Nets) vs. DGMs (Bayes Nets)
• Advantages of UGMs
  1. Symmetric
     • More natural for certain domains (e.g., spatial or relational data)
  2. Discriminative UGMs (a.k.a. Conditional Random Fields) work better than discriminative DGMs
• Disadvantages of UGMs
  1. Parameters are less interpretable and modular
  2. Parameter estimation is computationally more expensive
Conditional Independence Properties
• Much simpler than in Bayesian Networks
  – No d-separation, v-structures, etc.
• UGMs define CI via simple graph separation
• E.g., if we remove all the evidence nodes from the graph, are there any paths connecting A and B?
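In code, that check is just connectivity after deleting the evidence nodes. A minimal sketch, assuming the graph is stored as an adjacency dict (a representation chosen here for illustration):

```python
from collections import deque

def separated(adj, A, B, evidence):
    """Check whether the evidence separates A from B in an undirected graph.

    adj: dict mapping each node to a set of neighbors.
    Removes the evidence nodes, then searches for any path from A to B.
    Returns True iff no such path exists, i.e. x_A ⊥ x_B | x_evidence
    under the graph-separation semantics of a UGM.
    """
    blocked = set(evidence)
    frontier = deque(n for n in A if n not in blocked)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in B:
            return False  # found a path that avoids the evidence
        for nbr in adj[node]:
            if nbr not in blocked and nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return True

# Chain A - C - B: C separates A and B.
adj = {"A": {"C"}, "B": {"C"}, "C": {"A", "B"}}
print(separated(adj, {"A"}, {"B"}, {"C"}))   # True
print(separated(adj, {"A"}, {"B"}, set()))   # False
```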
Markov Blanket
• Also simple
  – The Markov blanket of a node is just the set of its immediate neighbors
  – Don't need to worry about co-parents
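Continuing the adjacency-dict sketch from above, the Markov blanket of a node in a UGM is a one-liner:

```python
def markov_blanket(adj, node):
    """In a UGM, the Markov blanket of a node is just its neighbors."""
    return set(adj[node])

adj = {"A": {"C"}, "B": {"C"}, "C": {"A", "B"}}
print(markov_blanket(adj, "C"))  # {'A', 'B'}
```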
Independence Properties
• Pairwise: no edge s–t  ⇒  x_s ⊥ x_t | x_rest
• Local: x_s ⊥ x_rest | x_mb(s), where mb(s) is the Markov blanket of s
• Global: C separates A from B  ⇒  x_A ⊥ x_B | x_C
• Global ⇒ Local ⇒ Pairwise; if p(x) > 0, Pairwise ⇒ Global, so all three are equivalent for positive distributions
Converting a Bayesian Network to a Markov
Network
• Tempting:
– Just drop directionality of the edges
– But this is clearly incorrect (v-structure)
– Introduces incorrect CI statements
• Solution:
– Add edges between “unmarried” parents
– This process is called moralization
Example: moralization

[Figure: DAG over nodes 1–7 and its moralized undirected graph]

• Unfortunately, moralization loses some CI information
  – Example: independences induced by v-structures in the DAG (two parents independent before being "married") no longer appear in the moralized graph
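As a concrete companion to the moralization procedure above, here is a minimal pure-Python sketch; the child→parents dict representation is an assumption made for illustration, not something from the slides.

```python
from itertools import combinations

def moralize(parents):
    """Moralize a DAG given as {child: set of parents}.

    Returns an undirected graph as {node: set of neighbors}:
    drop edge directions, then "marry" (connect) every pair of
    parents that share a child.
    """
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    adj = {n: set() for n in nodes}
    for child, ps in parents.items():
        for p in ps:                      # drop directionality
            adj[p].add(child)
            adj[child].add(p)
        for u, v in combinations(ps, 2):  # marry unmarried parents
            adj[u].add(v)
            adj[v].add(u)
    return adj

# v-structure A -> C <- B: moralization adds the edge A - B.
print(moralize({"C": {"A", "B"}}))
```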
Directed vs. Undirected GMs
• Q: which has more "expressive power"?
• Recall:
  – G is an I-map of P if: I(G) ⊆ I(P)
• Now define:
  – G is a perfect I-map of P if: I(G) = I(P)
    • The graph can represent all (and only) the CIs in P
• Bayesian Networks and Markov Networks are perfect maps for different sets of distributions
Parameterization
• No topological ordering on undirected graph
• Can’t use the chain rule of probability to
represent P(y)
• Instead we will use potential functions:
– associate potential functions with each maximal
clique in the graph
– A potential can be any non-negative function
• Joint distribution is defined to be proportional
to product of clique potentials
Parameterization (cont'd)
• Joint distribution is defined to be proportional to the product of clique potentials
• Any positive distribution whose CI properties can be represented by a UGM can be represented this way
Hammersley-Clifford Theorem
• A positive distribution P(y) > 0 satisfies the CI properties of an undirected graph G iff P can be represented as a product of factors, one per maximal clique:

  P(y) = (1/Z) ∏_c ψ_c(y_c)

  where Z = Σ_y ∏_c ψ_c(y_c) is the partition function
Example

[Figure: undirected graph over nodes 1–5]

• If P satisfies the conditional independence assumptions of this graph, we can write P as a product of potentials, one per maximal clique of the graph
Pairwise MRF

[Figure: undirected graph over nodes 1–5]

• Potentials don't need to correspond to maximal cliques
• We can also restrict the parameterization to edges (or any other cliques)
• Pairwise MRF:  P(y) ∝ ∏_{(s,t)∈E} ψ_st(y_s, y_t)
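A minimal sketch of this pairwise parameterization. The 4-node graph and the single shared edge potential (which rewards agreeing neighbors) are made up for illustration; Z is computed by brute-force enumeration.

```python
from itertools import product

# Edges of a small undirected graph and one shared binary edge potential.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
psi = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}

def score(y):
    """Unnormalized pairwise-MRF score: product of edge potentials."""
    s = 1.0
    for u, v in edges:
        s *= psi[(y[u], y[v])]
    return s

# Brute-force partition function over all 2^4 assignments.
Z = sum(score(y) for y in product([0, 1], repeat=4))
P = {y: score(y) / Z for y in product([0, 1], repeat=4)}
print(P[(0, 0, 0, 0)])  # all-equal assignments are the most probable
```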
Representing Potential Functions
• Can represent them as tables, like the CPTs we used for Bayesian Networks (DGMs)
  – But potentials are not probabilities
  – They represent relative "compatibility" between various assignments
Representing Potential Functions
• More general approach:
– Represent the log potentials as a linear function of
the parameters
– Log-linear (maximum entropy) models
Log-Linear Models

[Figure: network with edges Smoking—Cancer and Asthma—Cough]

• Log-linear model:

  P(x) = (1/Z) exp( Σ_i w_i f_i(x) )

  where w_i is the weight of feature i and f_i is feature i

• Example feature and weight:

  f1(Smoking, Cancer) = 1 if ¬Smoking ∨ Cancer, 0 otherwise
  w1 = 0.51
Log-Linear Models Can Represent Table CPTs
• Consider a pairwise MRF where each edge has an associated potential with K² features, one indicator per pair of states:

  f_jk(y_s, y_t) = I(y_s = j, y_t = k)

• Then we can convert a table into a potential function using the weight for each feature:

  ψ_st(j, k) = exp(w_jk),  i.e.  w_jk = log ψ_st(j, k)

• But the log-linear model is more general (see the sketch below)
  – Feature vectors can be arbitrarily designed
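A small sketch of that conversion for K = 2, reusing the Φ(S,C) numbers from the first slide: set each weight to the log of the table entry, and the indicator features recover the table exactly.

```python
import math

# A tabular edge potential psi(y_s, y_t) over K = 2 states
# (values taken from the Smoking/Cancer table above).
psi = {(0, 0): 4.5, (0, 1): 4.5, (1, 0): 2.7, (1, 1): 4.5}

# One indicator feature per table cell, with weight w_jk = log psi(j, k):
# exp( sum_jk w_jk * I(y_s = j, y_t = k) ) = psi(y_s, y_t).
w = {jk: math.log(v) for jk, v in psi.items()}

def potential(ys, yt):
    """Recover the table entry from the log-linear parameterization."""
    total = sum(w_jk * (1.0 if (ys, yt) == jk else 0.0)
                for jk, w_jk in w.items())
    return math.exp(total)

assert abs(potential(1, 0) - 2.7) < 1e-9  # matches the original table
```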
