Markov Clustering Algorithm

The document describes the Markov Clustering (MCL) algorithm for graph clustering. MCL uses random walks within a graph and two main operations - expansion and inflation. Expansion simulates multiple random walks to enhance flow between well-connected nodes. Inflation increases inequality in flow distribution, favoring nodes that receive more flow. Together, these operations cause flow to accumulate and form high-density clusters. MCL can find overlapping clusters and works by analyzing flow distributions within a graph to reveal its inherent cluster structure.

Uploaded by

Aaryan Gupta
Copyright
© All Rights Reserved

 Introduction

 Important Concepts in MCL Algorithm

 MCL Algorithm

 The Features of MCL Algorithm

 Summary
Graph Clustering
 Intuition:
◦ Highly connected nodes are likely to be in the same cluster.
◦ Weakly connected nodes are likely to be in different clusters.
 Model:
◦ A random walk may start at any node.
◦ If a random walk starting at node r reaches node t with high
probability, then r and t should be clustered together.
Markov Clustering (MCL)
 Markov process
◦ The probability that a random walk takes a given edge at
node u depends only on u and that edge.
◦ It does not depend on the route taken to reach u.
◦ This assumption simplifies the computation.
MCL
 A flow network is used to approximate the partition.
 An initial amount of flow is injected into each node.
 At each step, a fraction of the flow goes from a node to its neighbors via the outgoing edges.
MCL
 Edge Weight
◦ Similarity between two nodes
◦ Considered as the bandwidth or connectivity.
◦ If an edge has a higher weight than another, then
more flow passes over that edge.
◦ The amount of flow is proportional to the edge
weight.
◦ If there is no edge weight, then we can assign the
same weight to all edges.
Intuition of MCL
 Two natural clusters
[Figure: two natural clusters, with border nodes labeled A and B]
 When the flow reaches the border points, it is more likely to return than to cross the border.
MCL
 When the flow reaches A, it has four possible
outcomes.
◦ Three lead back into the cluster, one leaks out.
◦ ¾ of the flow returns, only ¼ leaks.
 Flow accumulates in the center of a cluster
(island).
 The border nodes starve.
 Simulation of random flow in a graph
 Two operations: expansion and inflation
 Intrinsic relationship between the result of the MCL process and the cluster structure
 Popular description: partition the graph so that
 Intra-partition similarity is the highest
 Inter-partition similarity is the lowest
 Observation 1:
 The number of higher-length paths in G is large for pairs of vertices lying in the same dense cluster
 and small for pairs of vertices belonging to different clusters.
 Observation 2:
 A random walk in G that visits a dense cluster will likely not leave the cluster until many of its vertices have been visited.
Definitions
 n×n adjacency matrix A.
◦ A(i,j) = weight on the edge from i to j
◦ If the graph is undirected, A(i,j) = A(j,i), i.e. A is symmetric
 n×n transition matrix P.
◦ P is row stochastic
◦ P(i,j) = probability of stepping to node j from node i: $P(i,j) = A(i,j) / \sum_k A(i,k)$
 n×n Laplacian matrix L.
◦ $L(i,j) = \delta_{ij} \sum_k A(i,k) - A(i,j)$, i.e. L = D − A with D the diagonal degree matrix
◦ Symmetric positive semi-definite for undirected graphs
◦ Singular
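The definitions above can be checked on a small graph. The following is a minimal sketch in plain Python; the 4-node graph (a triangle 0-1-2 with a pendant node 3) is a hypothetical example, not one taken from the slides.

```python
# Hypothetical undirected graph: triangle 0-1-2 plus node 3 attached to node 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
n = len(A)

# Degree of node i = sum of row i of A.
degree = [sum(row) for row in A]

# Row-stochastic transition matrix: P[i][j] = A[i][j] / sum_k A[i][k].
P = [[A[i][j] / degree[i] for j in range(n)] for i in range(n)]

# Laplacian L = D - A: degrees on the diagonal, negated weights elsewhere.
L = [[(degree[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]

for row in P:
    print([round(p, 3) for p in row])
```

Every row of P sums to 1, and every row of L sums to 0, which is why L is singular.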
Definitions
[Figure: an example graph with its adjacency matrix A and transition matrix P]
What is a random walk
[Figure: a random walk on the example graph, shown step by step at t=0, t=1, t=2 and t=3]
Probability Distributions
 $x_t(i)$ = probability that the surfer is at node i at time t
 $x_{t+1}(i) = \sum_j (\text{probability of being at node } j) \cdot \Pr(j \to i) = \sum_j x_t(j) P(j,i)$
 $x_{t+1} = x_t P = x_{t-1} P^2 = x_{t-2} P^3 = \dots = x_0 P^{t+1}$
 What happens when the surfer keeps walking for a long time?
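One way to explore the question is to iterate the update rule numerically. The sketch below assumes a hypothetical row-stochastic matrix P for a small connected, non-bipartite undirected graph (triangle 0-1-2 with node 3 attached to node 2); the helper name `step` is illustrative.

```python
def step(x, P):
    """One step of the walk: x_next[i] = sum_j x[j] * P[j][i]."""
    n = len(P)
    return [sum(x[j] * P[j][i] for j in range(n)) for i in range(n)]

# Hypothetical transition matrix (each row sums to 1).
P = [
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [1 / 3, 1 / 3, 0.0, 1 / 3],
    [0.0, 0.0, 1.0, 0.0],
]

x = [1.0, 0.0, 0.0, 0.0]  # the surfer starts at node 0
for t in range(50):
    x = step(x, P)

print([round(v, 3) for v in x])
```

For a graph like this one, the distribution settles to values proportional to the node degrees (2, 2, 3, 1), regardless of the starting node.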
Flow Formulation

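• Flow: transition probability from one node to another.
• Flow matrix: matrix of the flows among all nodes; the ith column holds the flows out of the ith node. Each column sums to 1.

Example: path graph 1 - 2 - 3. Nodes 1 and 3 send all their flow (1) to node 2; node 2 sends 0.5 to each neighbor.

Flow matrix (column = source node, row = destination node):
      1     2     3
1     0     0.5   0
2     1.0   0     1.0
3     0     0.5   0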
 Measure or sample any of these quantities (higher-length paths, random walks) and deduce the cluster structure from the behavior of the sampled quantities.
 Cluster structure will show itself as a peaked distribution of the quantities.
 A lack of cluster structure will result in a flat distribution.
 Markov Chain

 Random Walk on Graph

 Some Definitions in MCL


 A random process with the Markov property
 Markov property: given the present state, future states are independent of the past states.
 At each step the process may change its state from the current state to another state, or remain in the same state, according to a certain probability distribution.
 A walker takes off from some arbitrary vertex.
 The walker successively visits new vertices by selecting one of the outgoing edges at random.
 There is not much difference between a random walk and a finite Markov chain.
 Simple Graph
 A simple graph is an undirected graph in which every nonzero weight equals 1.
 Associated Matrix
 The associated matrix of G, denoted $M_G$, is defined by setting the entry $(M_G)_{pq}$ equal to $w(v_p, v_q)$.
 Markov Matrix
 The Markov matrix associated with a graph G is denoted by $T_G$ and is formally defined by letting its qth column be the qth column of $M_G$, normalized to sum to 1.
 In practice, the associated matrix and the Markov matrix are computed for the matrix $M_G + I$.
 I denotes the identity matrix (a diagonal matrix whose nonzero entries all equal 1).
 Adding I adds a loop to every vertex of the graph, because a walker may stay in the same place in the next step.
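As a minimal sketch of this construction, the function below adds a loop to every vertex and normalizes each column. The function name `markov_matrix` and the 3-node path graph are hypothetical choices for illustration, and the code assumes every column of A + I has a positive sum.

```python
def markov_matrix(A):
    """Column-stochastic matrix of A + I: add loops, then normalize each column."""
    n = len(A)
    # Associated matrix with a loop of weight 1 on every vertex.
    M = [[A[p][q] + (1 if p == q else 0) for q in range(n)] for p in range(n)]
    col_sums = [sum(M[p][q] for p in range(n)) for q in range(n)]
    return [[M[p][q] / col_sums[q] for q in range(n)] for p in range(n)]

# Hypothetical simple graph: the path 0 - 1 - 2.
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
T = markov_matrix(A)
for row in T:
    print([round(v, 3) for v in row])
```

Column q of T lists where a walker standing on vertex q can be after one step, including staying put.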
 Find Higher-Length Paths
 Starting point: in the associated matrix, the quantity $(M^k)_{pq}$ has a straightforward interpretation as the number of paths of length k between $v_p$ and $v_q$.
[Figure: an example graph with its matrices $M_G$ and $(M_G + I)^2$]
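The counting interpretation of matrix powers can be tried on a small graph. This sketch assumes a hypothetical graph of two triangles, {0,1,2} and {3,4,5}, joined by the edge 2-3; the helper names are illustrative. Note that the counts include routes that revisit vertices.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def mat_pow(M, k):
    """k-th power of a square matrix by repeated multiplication."""
    n = len(M)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, M)
    return result

# Hypothetical graph: two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

walks3 = mat_pow(A, 3)
# Length-3 routes from vertex 0 to a vertex in its own triangle (1)
# versus a vertex in the other triangle (4).
print(walks3[0][1], walks3[0][4])  # prints: 3 1
```

Even in this tiny graph, the pair inside one dense region is connected by more length-3 routes than the pair that straddles the boundary.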
 Flow is easier within dense regions than across sparse boundaries.
 However, in the long run, this effect disappears.
 Powers of the matrix can be used to find higher-length paths, but the effect diminishes as the flow goes on.
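A small experiment illustrates why expansion alone is not enough. The sketch below assumes a hypothetical graph of two triangles joined by the edge 2-3, builds the column-stochastic matrix of A + I, and squares it repeatedly without any inflation.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

# Column-stochastic Markov matrix of A + I (a loop on every vertex).
loops = [[A[p][q] + (1 if p == q else 0) for q in range(n)] for p in range(n)]
col_sums = [sum(loops[p][q] for p in range(n)) for q in range(n)]
T = [[loops[p][q] / col_sums[q] for q in range(n)] for p in range(n)]

# Expansion only: after 8 squarings M equals T to the power 256.
M = T
for _ in range(8):
    M = mat_mul(M, M)

col_first = [M[p][0] for p in range(n)]  # flow that started at node 0
col_last = [M[p][5] for p in range(n)]   # flow that started at node 5
print([round(v, 4) for v in col_first])
print([round(v, 4) for v in col_last])
```

Both columns end up essentially identical, so the flow no longer remembers which triangle it started in. This is the motivation for an extra operation that keeps the cluster structure visible.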
 Idea: how can we change the distribution of transition probabilities so that preferred neighbours are further favoured and less popular neighbours are demoted?
 MCL solution: raise all the entries in a given column to a certain power greater than 1 (e.g. squaring) and rescale the column so that it sums to 1 again.
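The column-wise power and rescaling can be sketched in a few lines. The function name `inflate` and the single example column are hypothetical; the code assumes every column has at least one nonzero entry.

```python
def inflate(M, r):
    """Raise every entry to the power r, then rescale each column to sum to 1."""
    rows, cols = len(M), len(M[0])
    powered = [[M[p][q] ** r for q in range(cols)] for p in range(rows)]
    col_sums = [sum(powered[p][q] for p in range(rows)) for q in range(cols)]
    return [[powered[p][q] / col_sums[q] for q in range(cols)] for p in range(rows)]

# One column of transition probabilities, stored as a 3x1 matrix.
column = [[0.5], [0.3], [0.2]]
inflated = inflate(column, 2)
print([round(row[0], 4) for row in inflated])
```

With r = 2 the largest entry grows from 0.5 to about 0.66 while the smallest shrinks from 0.2 to about 0.11, so the preferred neighbour is favoured further.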
 Expansion operation: taking a power of the matrix, which expands flow within dense regions
 Inflation operation: the entrywise power and rescaling described above, which eliminates unfavoured regions
The MCL algorithm
Input: A, the adjacency matrix
Initialize M to $M_G$, the canonical transition matrix: $M := M_G := (A + I) D^{-1}$
Expand: M := M*M
    Enhances flow to well-connected nodes as well as to new nodes.
Inflate: M := M.^r (r usually 2), then renormalize columns
    Increases inequality in each column. “Rich get richer, poor get poorer.”
Prune
    Saves memory by removing entries close to zero.
Converged?
    No: go back to Expand.
    Yes: output clusters.
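The loop above can be sketched end to end in plain Python. This is an illustrative version under stated assumptions, not a reference implementation: the pruning threshold, the iteration cap, the stopping test (largest entry change between iterations) and the example graph are all hypothetical choices.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def normalize_columns(M):
    """Rescale each column to sum to 1 (all-zero columns are left as zeros)."""
    n = len(M)
    sums = [sum(M[p][q] for p in range(n)) for q in range(n)]
    return [[M[p][q] / sums[q] if sums[q] else 0.0 for q in range(n)] for p in range(n)]

def mcl(A, r=2, prune_below=1e-6, max_iter=100, tol=1e-9):
    n = len(A)
    # Initialize: M = (A + I) D^-1, the column-normalized matrix with loops.
    M = normalize_columns([[A[p][q] + (1 if p == q else 0) for q in range(n)] for p in range(n)])
    for _ in range(max_iter):
        expanded = mat_mul(M, M)                                                # Expand
        inflated = normalize_columns([[v ** r for v in row] for row in expanded])  # Inflate
        pruned = normalize_columns(
            [[v if v >= prune_below else 0.0 for v in row] for row in inflated]    # Prune
        )
        change = max(abs(pruned[p][q] - M[p][q]) for p in range(n) for q in range(n))
        M = pruned
        if change < tol:  # assumed stopping test: the matrix no longer changes
            break
    return M

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

M = mcl(A)
for row in M:
    print([round(v, 3) for v in row])
```

On this graph the flow from each node ends up inside its own triangle, so the nonzero rows of the final matrix describe two clusters.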
Multi-level Regularized MCL
Coarsening phase: Input Graph → Intermediate Graph → ... → Coarsest Graph
    Captures the global topology of the graph.
Refinement phase: Coarsest Graph → ... → Intermediate Graph → Input Graph
    At each level, run curtailed R-MCL and project the flow; this initializes the flow matrix of the refined graph.
    Faster to run on smaller graphs first.
Final step: run R-MCL to convergence on the input graph and output clusters.
 http://www.micans.org/mcl/ani/mcl-animation.html
 Find attractors: a node a is an attractor if $M_{aa}$ is nonzero.
 Find attractor systems: if a is an attractor, then the set of its neighbours is called an attractor system.
 If a node has an arc to any node of an attractor system, that node belongs to the same cluster as that attractor system.
Attractor Set={1,2,3,4,5,6,7,8,9,10}
The Attractor System is {1,2,3},{4,5,6,7},{8,9},{10}
The overlapping clusters are {1,2,3,11,12,15},{4,5,6,7,13},
{8,9,12,13,14,15},{10,12,13}
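Reading clusters off a converged matrix can be sketched as follows. This is a simplified version of the interpretation step: it treats each attractor row separately and deduplicates identical member sets, rather than building attractor systems explicitly. The function name, the threshold `eps` and the 5-node limit matrix are hypothetical.

```python
def clusters_from_limit(M, eps=1e-9):
    """For each attractor a (M[a][a] > eps), collect the nodes whose flow reaches a."""
    n = len(M)
    clusters = []
    for a in range(n):
        if M[a][a] > eps:
            members = sorted(q for q in range(n) if M[a][q] > eps)
            if members not in clusters:
                clusters.append(members)
    return clusters

# Hypothetical converged matrix: nodes 0 and 3 are attractors,
# and node 2 splits its flow evenly between them.
M = [
    [1.0, 1.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
print(clusters_from_limit(M))  # prints: [[0, 1, 2], [2, 3, 4]]
```

Node 2 appears in both clusters, which is how overlapping clusters show up in this reading of the result.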
 How many steps are required before the algorithm converges to an idempotent matrix?
 The number is typically somewhere between 10 and 100.
 The effect of inflation on cluster granularity
R denotes the inflation constant; a denotes the loop weight.
 MCL simulates random walks on a graph to find clusters.
 Expansion promotes dense regions, while inflation demotes the less favoured regions.
 There is an intrinsic relationship between the MCL result and the cluster structure.
