Markov Clustering Algorithm

The document describes the Markov Clustering (MCL) algorithm for graph clustering. MCL uses random walks within a graph and two main operations - expansion and inflation. Expansion simulates multiple random walks to enhance flow between well-connected nodes. Inflation increases inequality in flow distribution, favoring nodes that receive more flow. Together, these operations cause flow to accumulate and form high-density clusters. MCL can find overlapping clusters and works by analyzing flow distributions within a graph to reveal its inherent cluster structure.

Uploaded by

Aaryan Gupta

 Introduction

 Important Concepts in MCL Algorithm

 MCL Algorithm

 The Features of MCL Algorithm

 Summary
Graph Clustering
 Intuition:
◦ Highly connected nodes should be in the same cluster.
◦ Weakly connected nodes should be in different clusters.
 Model:
◦ A random walk may start at any node.
◦ If a random walk starting at node r reaches node t with high probability, then r and t should be clustered together.
Markov Clustering (MCL)
 Markov process
◦ The probability that a random walk takes a given edge at node u depends only on u and that edge.
◦ It does not depend on its previous route.
◦ This assumption simplifies the computation.
MCL
 A flow network is used to approximate the partition.
 An initial amount of flow is injected into each node.
 At each step, a percentage of the flow goes from a node to its neighbors via the outgoing edges.
MCL
 Edge Weight
◦ Similarity between two nodes
◦ Considered as the bandwidth or connectivity.
◦ If one edge has a higher weight than another, more flow passes over it.
◦ The amount of flow is proportional to the edge weight.
◦ If the graph has no edge weights, assign the same weight to all edges.
Intuition of MCL
 Two natural clusters

[Figure: a graph with two natural clusters; nodes A and B are labeled.]

 When the flow reaches the border points, it is more likely to return than to cross the border.
MCL
 When the flow reaches A, it has four possible outcomes.
◦ Three lead back into the cluster, one leaks out.
◦ ¾ of the flow returns, only ¼ leaks.
 Flow accumulates in the center of a cluster (island).
 The border nodes starve.
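As a sanity check on the ¾ versus ¼ split, the sketch below reads the one-step flow out of a border node. The slide's figure is not recoverable, so the graph is an assumption: two 4-node cliques joined by a single bridge edge, with node 3 playing the role of A and node 4 the role of B.

```python
import numpy as np

# Hypothetical graph: nodes 0-3 form one clique, nodes 4-7 another.
# Node 3 plays the role of A and node 4 the role of B (bridge edge 3-4).
n = 8
adj = np.zeros((n, n))
for group in (range(0, 4), range(4, 8)):
    for i in group:
        for j in group:
            if i != j:
                adj[i, j] = 1.0
adj[3, 4] = adj[4, 3] = 1.0  # the single edge crossing the border

A_NODE = 3
# With equal edge weights, the flow out of A splits evenly over its four edges.
out_of_a = adj[:, A_NODE] / adj[:, A_NODE].sum()

stay = out_of_a[:4].sum()  # flow returning into A's own cluster
leak = out_of_a[4:].sum()  # flow crossing the border
print(stay, leak)
```

A has three edges into its own clique and one across the border, so the split follows directly from the edge count.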
 Simulation of random flow in a graph
 Two operations: Expansion and Inflation
 Intrinsic relationship between the result of the MCL process and the cluster structure
 Popular description: partition the graph so that
 Intra-partition similarity is the highest
 Inter-partition similarity is the lowest

 Observation 1:
 The number of higher-length paths in G is large for pairs of vertices lying in the same dense cluster
 and small for pairs of vertices belonging to different clusters.
 Observation 2:
 A random walk in G that visits a dense cluster will likely not leave the cluster until many of its vertices have been visited.
Definitions
 n×n adjacency matrix A.
◦ A(i,j) = weight on the edge from i to j
◦ If the graph is undirected, A(i,j) = A(j,i), i.e. A is symmetric

 n×n transition matrix P.
◦ P is row stochastic
◦ P(i,j) = probability of stepping to node j from node i: $P(i,j) = A(i,j) / \sum_k A(i,k)$

 n×n Laplacian matrix L.
◦ $L = D - A$ with $D(i,i) = \sum_k A(i,k)$, i.e. $L(i,i) = \sum_k A(i,k) - A(i,i)$ and $L(i,j) = -A(i,j)$ for $i \ne j$
◦ Symmetric positive semi-definite for undirected graphs
◦ Singular
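The three definitions can be checked numerically. The following is a minimal sketch, assuming NumPy and a small hypothetical undirected graph (a triangle with one pendant node); the function names are illustrative and do not come from the slides.

```python
import numpy as np

def transition_matrix(adj):
    """Row-stochastic P with P[i, j] = A[i, j] / sum_k A[i, k]."""
    row_sums = adj.sum(axis=1, keepdims=True)
    return adj / row_sums  # assumes every node has at least one edge

def laplacian(adj):
    """L = D - A, where D is the diagonal matrix of row sums of A."""
    return np.diag(adj.sum(axis=1)) - adj

# Hypothetical undirected example: triangle 0-1-2 with pendant node 3 on node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = transition_matrix(A)
L = laplacian(A)

print(P.sum(axis=1))          # every row of P sums to 1
print(L @ np.ones(4))         # L maps the all-ones vector to zero, so L is singular
print(np.linalg.eigvalsh(L))  # eigenvalues are non-negative up to rounding
```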
Definitions

[Figure: an example graph shown with its adjacency matrix A (edge weights 1) and its transition matrix P (edge probabilities 1 and 1/2).]
What is a random walk

[Figure: the example graph at t=0, t=1, t=2 and t=3, one panel per step of the walk; edges are labeled with transition probabilities 1 and 1/2.]
Probability Distributions
 $x_t(i)$ = probability that the surfer is at node i at time t
 $x_{t+1}(i) = \sum_j (\text{probability of being at node } j) \cdot \Pr(j \to i) = \sum_j x_t(j) P(j,i)$
 $x_{t+1} = x_t P = x_{t-1} P^2 = x_{t-2} P^3 = \dots = x_0 P^{t+1}$
 What happens when the surfer keeps walking for a long time?
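One way to explore the long-walk question is to iterate the update $x_{t+1} = x_t P$ directly. The sketch below assumes NumPy and a hypothetical connected, non-bipartite undirected graph. For such graphs the distribution is known to approach a limit proportional to the node degrees, whatever the start node; the example graph and the step count of 200 are illustrative choices.

```python
import numpy as np

# Hypothetical graph: triangle 0-1-2 with pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)  # row-stochastic transition matrix

def distribution_after(x0, P, steps):
    """Return x_t = x_0 P^t by repeated right-multiplication."""
    x = x0.copy()
    for _ in range(steps):
        x = x @ P
    return x

x0 = np.array([1.0, 0.0, 0.0, 0.0])    # the surfer starts at node 0
x1 = distribution_after(x0, P, 1)      # one step: half each to nodes 1 and 2
x_long = distribution_after(x0, P, 200)

# Candidate limit: degrees divided by their total.
degrees = A.sum(axis=1)
stationary = degrees / degrees.sum()
print(x_long, stationary)
```

Because the limit no longer depends on where the walk started, a plain long walk by itself carries no cluster information.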
Flow Formulation

• Flow: transition probability from one node to another.

• Flow matrix: matrix with the flows among all nodes; the ith column represents the flows out of the ith node. Each column sums to 1.

[Figure: path graph 1 - 2 - 3; node 2 sends flow 0.5 to each neighbor, nodes 1 and 3 each send flow 1 to node 2.]

Flow matrix:

        1     2     3
  1     0    0.5    0
  2    1.0    0    1.0
  3     0    0.5    0
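The flow matrix of the three-node path graph can be reproduced by column-normalizing its adjacency matrix. This is a small sketch assuming NumPy and unit edge weights; indices are 0-based in the code, so node 1 of the slide is index 0.

```python
import numpy as np

# Path graph 1 - 2 - 3 with unit edge weights (0-based indices in code).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

# Column j holds the flow leaving node j, so divide each column by its sum.
flow = A / A.sum(axis=0, keepdims=True)
print(flow)
```

Note that this column-stochastic flow matrix is the transpose of the row-stochastic transition matrix P defined earlier.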
 Measure or sample any of these quantities (higher-length paths, random walks) and deduce the cluster structure from the behavior of the sampled quantities.
 Cluster structure shows itself as a peaked distribution of the quantities.
 A lack of cluster structure results in a flat distribution.
 Markov Chain

 Random Walk on Graph

 Some Definitions in MCL

 A random process with the Markov property
 Markov property: given the present state, future states are independent of the past states.
 At each step the process may change from the current state to another state, or remain in the same state, according to a certain probability distribution.
 A walker takes off from some arbitrary vertex.
 He successively visits new vertices by arbitrarily selecting one of the outgoing edges.
 There is not much difference between a random walk and a finite Markov chain.
 Simple Graph
 A simple graph is an undirected graph in which every nonzero weight equals 1.
 Associated Matrix
 The associated matrix of G, denoted $M_G$, is defined by setting the entry $(M_G)_{pq}$ equal to $w(v_p, v_q)$.
 Markov Matrix
 The Markov matrix associated with a graph G is denoted by $T_G$ and is formally defined by letting its qth column be the qth column of $M_G$, normalized to sum to 1.
 In practice, the associated matrix and the Markov matrix are computed for $M_G + I$.
 I denotes the identity matrix (a diagonal matrix whose nonzero entries all equal 1).
 A loop is added to every vertex of the graph because a walker may stay in the same place in his next step.
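The construction of $T_G$ from $M_G + I$ can be sketched as follows, assuming NumPy. The function name and the `loop_weight` parameter are illustrative assumptions; with `loop_weight=1.0` the code adds exactly the identity matrix.

```python
import numpy as np

def markov_matrix(m_g, loop_weight=1.0):
    """Column-normalize M_G + loop_weight * I to obtain a column-stochastic T_G."""
    m = m_g + loop_weight * np.eye(m_g.shape[0])
    return m / m.sum(axis=0, keepdims=True)

# Hypothetical simple graph: the path 0 - 1 - 2.
M_G = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
T_G = markov_matrix(M_G)
print(T_G)
```

The self-loops also guarantee that no column sum is zero, so the normalization is always defined.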
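 Find Higher-Length Paths
 Starting point: in the associated matrix, the quantity $(M^k)_{pq}$ has a straightforward interpretation as the number of paths of length k between $v_p$ and $v_q$ (vertices may repeat along such a path).

[Figure: an example graph G with its associated matrix $M_G$ and the matrix $(M_G + I)^2$.]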
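The path-counting interpretation of matrix powers can be verified on a small case. The sketch below assumes NumPy and a hypothetical simple graph (a triangle with one pendant vertex); it also shows what squaring $M_G + I$ adds compared with squaring $M_G$.

```python
import numpy as np

# Hypothetical simple graph: triangle 0-1-2 plus a pendant vertex 3 on vertex 2.
M = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
I = np.eye(4, dtype=int)

M2 = np.linalg.matrix_power(M, 2)
M3 = np.linalg.matrix_power(M, 3)

# (M^2)[0, 3] counts length-2 routes from vertex 0 to vertex 3: only 0-2-3.
# (M^2)[0, 0] counts the closed routes 0-1-0 and 0-2-0, i.e. the degree of vertex 0.
print(M2[0, 3], M2[0, 0], M3[0, 3])

# With loops added, (M + I)^2 = M^2 + 2M + I also credits shorter routes.
with_loops = np.linalg.matrix_power(M + I, 2)
print(with_loops)
```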
 Flow is easier within dense regions than across sparse boundaries.
 However, in the long run, this effect disappears.
 Powers of the matrix can be used to find higher-length paths, but the effect diminishes as the flow goes on.
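The fading of the effect can be observed by comparing a low and a high power of the Markov matrix. This sketch assumes NumPy and a hypothetical graph of two triangles joined by one edge, with self-loops added; the power 200 is an arbitrary stand-in for "the long run".

```python
import numpy as np

# Hypothetical graph: triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
M = A + np.eye(6)                     # add a loop to every vertex
T = M / M.sum(axis=0, keepdims=True)  # column-stochastic Markov matrix

T2 = np.linalg.matrix_power(T, 2)
T_long = np.linalg.matrix_power(T, 200)

# After two steps, flow from node 0 is still concentrated in its own triangle.
within_early = T2[:3, 0].sum()

# After many steps every column is numerically the same vector, so the start
# node, and with it the cluster structure, has been forgotten.
spread = np.abs(T_long - T_long[:, [0]]).max()
print(within_early, spread)
```

This is the motivation for a second operation: expansion alone ends in a matrix that cannot distinguish the two triangles.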
 Idea: how can we change the distribution of transition probabilities so that preferred neighbours are further favoured and less popular neighbours are demoted?
 MCL solution: raise all the entries in a given column to a certain power greater than 1 (e.g. squaring), then rescale the column so that it sums to 1 again.
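A worked instance of this column operation, as a minimal sketch assuming NumPy (the function name `inflate` is illustrative): squaring the column (0.5, 0.25, 0.25) gives (0.25, 0.0625, 0.0625), which rescales to (2/3, 1/6, 1/6).

```python
import numpy as np

def inflate(matrix, r=2.0):
    """Raise every entry to the power r, then rescale each column to sum to 1."""
    powered = np.power(matrix, r)
    return powered / powered.sum(axis=0, keepdims=True)

# One column of transition probabilities with a preferred neighbour.
col = np.array([[0.5], [0.25], [0.25]])
inflated = inflate(col)
print(inflated.ravel())
```

The preferred entry grows from 1/2 to 2/3 while the others shrink from 1/4 to 1/6, and larger values of r sharpen the contrast further.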
 Expansion operation: taking a power of the matrix; expands flow within dense regions.
 Inflation operation: the entrywise power and rescaling described above; eliminates unfavoured regions.
The MCL algorithm
Input: A, the adjacency matrix.
Initialize M to $M_G$, the canonical transition matrix: $M := M_G := (A + I) D^{-1}$
Repeat until converged:
  Expand: M := M*M
    Enhances flow to well-connected nodes as well as to new nodes.
  Inflate: M := M.^r (r usually 2), renormalize columns
    Increases inequality in each column. “Rich get richer, poor get poorer.”
  Prune
    Saves memory by removing entries close to zero.
  Converged? If no, go back to Expand; if yes, stop.
Output clusters
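The loop can be sketched in a few lines of NumPy. This is a dense-matrix illustration under stated assumptions, not a reference implementation: D is taken to be the diagonal matrix of column sums of A + I, and the pruning threshold, iteration cap and convergence tolerance are hypothetical defaults.

```python
import numpy as np

def mcl(adj, r=2.0, prune_below=1e-6, max_iter=100, tol=1e-8):
    """Sketch of the MCL loop; returns the (near-)converged flow matrix."""
    n = adj.shape[0]
    m = adj + np.eye(n)                        # add self-loops: A + I
    m = m / m.sum(axis=0, keepdims=True)       # (A + I) D^-1, columns sum to 1
    for _ in range(max_iter):
        previous = m
        m = m @ m                              # expand
        m = np.power(m, r)                     # inflate ...
        m = m / m.sum(axis=0, keepdims=True)   # ... and renormalize columns
        # Prune tiny entries; safe while prune_below < 1/n, because the
        # largest entry of a stochastic column is at least 1/n.
        m = np.where(m < prune_below, 0.0, m)
        m = m / m.sum(axis=0, keepdims=True)   # keep columns stochastic
        if np.allclose(m, previous, atol=tol):
            break                              # converged
    return m

# Hypothetical input: triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
flow = mcl(A)
print(np.round(flow, 3))
```

On this input the flow of each triangle collects on its bridge node, so rows 2 and 3 of the result carry the two clusters. Practical implementations keep M sparse, which is where pruning saves memory.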
Multi-level Regularized MCL
 Coarsen the input graph repeatedly: input graph, intermediate graphs, ..., coarsest graph.
◦ Captures the global topology of the graph.
 Run curtailed R-MCL on the coarsest graph and project the flow onto the next finer graph; repeat for each intermediate graph.
◦ The projected flow initializes the flow matrix of the refined graph.
◦ Faster to run on smaller graphs first.
 On the input graph, run R-MCL to convergence and output the clusters.
 http://www.micans.org/mcl/ani/mcl-animation.html
 Find attractors: node a is an attractor if $M_{aa}$ is nonzero.
 Find attractor systems: if a is an attractor, then the set of its neighbours is called an attractor system.
 If a node has an arc to any node of an attractor system, the node belongs to the same cluster as that attractor system.
Attractor set = {1,2,3,4,5,6,7,8,9,10}
The attractor systems are {1,2,3}, {4,5,6,7}, {8,9}, {10}
The overlapping clusters are {1,2,3,11,12,15}, {4,5,6,7,13}, {8,9,12,13,14,15}, {10,12,13}
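Reading clusters off a converged matrix can be sketched as below, assuming NumPy and the column convention used for the flow matrix (column j holds the flow out of node j). The function name, the threshold `eps` and the hand-written 5-node matrix are illustrative assumptions; the matrix is not the one behind the example above.

```python
import numpy as np

def clusters_from_flow(m, eps=1e-9):
    """Read (possibly overlapping) clusters off a converged flow matrix."""
    attractors = [a for a in range(m.shape[0]) if m[a, a] > eps]
    clusters = set()
    for a in attractors:
        # Nodes whose column sends flow to attractor a.
        members = tuple(int(j) for j in np.flatnonzero(m[a] > eps))
        # Attractors of one system are expected to yield the same member
        # tuple, so the set removes the duplicates.
        clusters.add(members)
    return sorted(clusters)

# Hand-written illustrative matrix: nodes 0 and 3 are attractors, and
# node 2 splits its flow between them, so it lands in both clusters.
m = np.zeros((5, 5))
m[0, 0] = m[0, 1] = 1.0
m[0, 2] = m[3, 2] = 0.5
m[3, 3] = m[3, 4] = 1.0

print(clusters_from_flow(m))
```

Node 2 appears in both clusters, which is how overlapping clusters arise in this reading of the matrix.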
 How many steps are required before the algorithm converges to an idempotent matrix?
 The number is typically somewhere between 10 and 100.
 The effect of inflation on cluster granularity

R denotes the inflation constant; a denotes the loop weight.
 MCL simulates random walks on a graph to find clusters.
 Expansion promotes dense regions while inflation demotes the less favoured regions.
 There is an intrinsic relationship between the MCL result and the cluster structure.
