CS369M: Algorithms for Modern Massive Data Set Analysis
Lecture 12 - 11/04/2009
Introduction to Graph Partitioning
Lecturer: Michael Mahoney
Scribes: Noah Youngs and Weidong Shao
*Unedited Notes
Graph Partition
A graph partition problem is to cut a graph into 2 or more good pieces. The methods are based on
1. spectral. Either global (e.g., Cheeger inequality) or local.
2. flow-based. min-cut/max-flow theorem. LP formulation. Embeddings. Local Improvement.
3. combination of spectral and flow.
Note that not all graphs have good partitions.
Question: Can we certify that there are no good clusters in a graph?
Good clusters have the following properties:
1. internally (intra) - well connected.
2. externally (inter) - relatively poorly connected
How do we quantify this?
Extreme cases:
1. split into 2 disconnected pieces
2. split into $S, \bar{S}$ on 2 maximum complete induced subgraphs.
Min cut problem
Define: Given $G = (V, E)$, a cut is a partition of $V$ into $(S, \bar{S})$, where $S \subset V$.
Given $s, t \in V$, an $(s, t)$ cut is a cut s.t. $s \in S$, $t \in \bar{S}$.
A cut set of a cut is $\{(u, v) : (u, v) \in E,\ u \in S,\ v \in \bar{S}\}$.
The min cut problem: find the cut of "smallest" edge weights
1. good: Polynomial time algorithm (min-cut = max flow)
2. bad: often get a very imbalanced cut
3. in theory: cut algorithms are used as a sub-routine in divide and conquer algorithms
4. in practice: often want to "interpret" the clusters or partitions
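To make the definitions concrete, here is a minimal Python sketch. The toy graph and the helper names `cut_set`, `cut_weight` and `brute_force_min_st_cut` are illustrative assumptions, not part of the lecture. It enumerates every $(s, t)$ cut of a small weighted undirected graph; this takes exponential time and only illustrates the definitions, not the polynomial-time algorithm.

```python
from itertools import combinations

# Hypothetical toy graph: undirected weighted edges (u, v) -> weight.
EDGES = {
    ("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
    ("a", "t"): 2, ("b", "t"): 3,
}
VERTICES = {"s", "a", "b", "t"}


def cut_set(edges, S):
    """Edges with exactly one endpoint in S, i.e. the cut set of (S, S-bar)."""
    return [(u, v) for (u, v) in edges if (u in S) != (v in S)]


def cut_weight(edges, S):
    """Total weight of the cut set."""
    return sum(edges[e] for e in cut_set(edges, S))


def brute_force_min_st_cut(vertices, edges, s, t):
    """Try every S with s in S and t not in S (exponential, illustration only)."""
    others = sorted(vertices - {s, t})
    best = None
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            S = {s, *extra}
            w = cut_weight(edges, S)
            if best is None or w < best[0]:
                best = (w, S)
    return best


weight, side = brute_force_min_st_cut(VERTICES, EDGES, "s", "t")
print(weight, sorted(side))
```

On this toy graph several cuts tie at weight 5, and the first one found puts only the source on one side, a small instance of the imbalance problem noted above.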
Max Flow Problem
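Define: Let there be a cost function $c : E \to \mathbb{R}^+$, and call the capacity of an edge $(u, v) \in E$: $c_{uv}$ or $c_e$.
Then a flow is a function $f : E \to \mathbb{R}^+$ such that
1. $f_{uv} \le c_{uv}$ for all $(u, v) \in E$ (capacity constraints)
2. $\sum_{(u,v) \in E} f_{uv} = \sum_{(v,u) \in E} f_{vu}$ for all $u \in V \setminus \{s, t\}$ (conservation of flows)
Then the value of the flow is
$|f| = \sum_{v} f_{sv}$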
The max flow problem:
$\max |f|$
The capacity of an $(s, t)$ cut is
$c(S, \bar{S}) = \sum_{(u,v) \in E,\ u \in S,\ v \in \bar{S}} c_{uv}$.
The min cut problem is
$\min c(S, \bar{S})$
Note: this is a "single flow problem", i.e. only one $s$ and one $t$.
Theorem: the max value of an $s$-$t$ flow is equal to the min capacity of an $s$-$t$ cut.
Proof idea:
max flow $\le$ min cut (weak duality)
Does there exist a cut that achieves equality?
Yes, from the strong duality theorem. We can also solve the dual of the max-flow problem, which is the min-cut problem.
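The theorem can be checked numerically on a small instance. The sketch below uses the Edmonds-Karp augmenting-path method, which is one standard way to compute a max flow and is a choice made here for illustration; the lecture does not name a particular algorithm. The function name `max_flow` and the capacities in `CAP` are hypothetical.

```python
from collections import deque


def max_flow(capacity, s, t):
    """Edmonds-Karp: repeatedly augment along shortest residual paths.

    capacity: dict mapping a directed edge (u, v) to a nonnegative capacity.
    Returns (flow value, S), where S is the set of vertices reachable from s
    in the final residual graph, i.e. the source side of a minimum cut.
    """
    residual = {}
    neighbors = {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)  # reverse edge, used to undo flow
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    def bfs():
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in neighbors.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    value = 0
    while True:
        parent = bfs()
        if t not in parent:
            return value, set(parent)
        # Walk back from t to s, find the bottleneck, then push that much flow.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        value += bottleneck


# Hypothetical directed capacities.
CAP = {
    ("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
    ("a", "t"): 2, ("b", "t"): 3,
}
value, S = max_flow(CAP, "s", "t")
cut_capacity = sum(c for (u, v), c in CAP.items() if u in S and v not in S)
print(value, cut_capacity)
```

When no augmenting path remains, the vertices still reachable from $s$ form a cut whose capacity equals the flow value, which is the equality case of weak duality.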
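Primal: (max flow)
$\max |f|$
subject to
$f_{uv} \le c_{uv}$ (and the conservation constraints)
Dual: (min cut)
$\min \sum_{(i,j) \in E} c_{ij} d_{ij}$
s.t.
$d_{ij} - p_i + p_j \ge 0, \quad \forall (i, j) \in E$
$p_s = 1, \quad p_t = 0, \quad p_i \ge 0, \quad \forall i \in V$
$d_{ij} \ge 0, \quad \forall (i, j) \in E$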
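One way to see why the dual is a "min cut" program: any $(s, t)$ cut gives a feasible 0/1 dual solution whose objective equals the cut capacity. The sketch below checks this on a small example; the helper names and the capacities in `CAP` are assumptions made for illustration.

```python
# Hypothetical directed capacities c_ij.
CAP = {
    ("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
    ("a", "t"): 2, ("b", "t"): 3,
}


def dual_from_cut(capacity, S):
    """0/1 dual solution for the cut (S, S-bar): p_i = 1 on S, d_ij = 1 on cut edges."""
    vertices = {x for edge in capacity for x in edge}
    p = {i: 1 if i in S else 0 for i in vertices}
    d = {(i, j): 1 if (i in S and j not in S) else 0 for (i, j) in capacity}
    return d, p


def is_dual_feasible(capacity, d, p, s, t):
    """Check d_ij - p_i + p_j >= 0, p_s = 1, p_t = 0, and nonnegativity."""
    if p[s] != 1 or p[t] != 0:
        return False
    if any(x < 0 for x in p.values()) or any(x < 0 for x in d.values()):
        return False
    return all(d[(i, j)] - p[i] + p[j] >= 0 for (i, j) in capacity)


def dual_objective(capacity, d):
    """Sum of c_ij * d_ij over all edges."""
    return sum(capacity[e] * d[e] for e in capacity)


d, p = dual_from_cut(CAP, {"s", "a"})
print(is_dual_feasible(CAP, d, p, "s", "t"), dual_objective(CAP, d))
```

The converse direction, that some optimal dual solution is 0/1 and so corresponds to a cut, is what strong duality together with the structure of the LP provides; the code does not attempt to show it.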
Can we add a "balance" condition?
1. want a good cut value $E(S, \bar{S})$
2. want $S, \bar{S}$ both to be balanced - same size, or approximately same size
The answer is "Yes".
Explicit balance conditions:
Graph bisection - min cut s.t. $|S| = |\bar{S}| = n/2$
$\beta$-balanced cut - min cut s.t. $|S| = \beta n$, $|\bar{S}| = (1 - \beta) n$
Implicit Balance conditions:
1. input balance constraints
2. expansion: $\frac{E(S, \bar{S})}{|S|}$ (def this as: $h(S)$)
3. sparsity: $n \frac{E(S, \bar{S})}{|S| |\bar{S}|}$ (def this as: $sp(S)$)
4. conductance: $\frac{E(S, \bar{S})}{Vol(S)}$ (with $Vol(S) = \sum_{i \in S} \deg(v_i)$)
5. normalized cut: $n \frac{E(S, \bar{S})}{Vol(S) Vol(\bar{S})}$
(latter two are used in ML)
6. quotient cut: $\frac{E(S, \bar{S})}{\min(Vol(S), Vol(\bar{S}))}$
expansion and sparsity are the "same" in the following sense:
$\min_S h(S) \approx \min_S sp(S)$
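A small sketch can make the "same" claim concrete. It rests on two assumptions that are not confirmed by the notes: expansion is measured against the smaller side, $h(S) = E(S, \bar{S}) / \min(|S|, |\bar{S}|)$, and sparsity carries a factor of $n$, $sp(S) = n E(S, \bar{S}) / (|S| |\bar{S}|)$. Under those assumptions $h(S) \le sp(S) \le 2 h(S)$ for every $S$, because $n / \max(|S|, |\bar{S}|)$ lies between 1 and 2. The graph `ADJ` (two triangles joined by one edge) and the function name `cut_measures` are hypothetical, and the graph is assumed to have no isolated vertices.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical graph: two triangles joined by the edge 2-3.
ADJ = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}


def cut_measures(adj, S):
    """Exact cut quality measures for (S, S-bar) in an undirected, unweighted graph."""
    V = set(adj)
    S = set(S)
    T = V - S
    cut = sum(1 for u in S for v in adj[u] if v in T)  # E(S, S-bar)
    vol_S = sum(len(adj[u]) for u in S)
    vol_T = sum(len(adj[u]) for u in T)
    return {
        "expansion": Fraction(cut, min(len(S), len(T))),     # assumed form of h(S)
        "sparsity": Fraction(len(V) * cut, len(S) * len(T)),  # assumed form of sp(S)
        "conductance": Fraction(cut, vol_S),
        "quotient_cut": Fraction(cut, min(vol_S, vol_T)),
    }


# Every nonempty proper subset of the vertex set.
subsets = [set(c) for k in range(1, len(ADJ))
           for c in combinations(sorted(ADJ), k)]
sandwich_holds = all(
    m["expansion"] <= m["sparsity"] <= 2 * m["expansion"]
    for m in (cut_measures(ADJ, S) for S in subsets)
)
best_h = min(cut_measures(ADJ, S)["expansion"] for S in subsets)
best_sp = min(cut_measures(ADJ, S)["sparsity"] for S in subsets)
print(sandwich_holds, best_h, best_sp)
```

Exact fractions are used so that the inequalities are not affected by floating-point rounding.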
Quotient cuts yield a tight bound on the Cheeger inequality
In practice: bias towards high degree nodes
Note:
quotient cuts get balanced implicitly, no explicit constraints on inter or intra connectivity
$\mathbb{Z}^2$, random geometric graphs, or nice planar graphs yield good quotient cuts
More generally - very imbalanced - disconnected clusters.
Example: extremely sparse random graph in the $G(n, p)$ model, $p \lesssim \log n / n$ (whereas $p \gtrsim \log n^2 / n$ gives an expander)
Graph Partition Algorithms
4.1 Local Improvement
Developed in the 70's
Often it is a greedy improvement
Local minima are a big problem
Usual methods improve them by constant factors
- simulated annealing
- big difference in practice
Kernighan-Lin algorithm, fundamental work, no longer used due to $\Omega(n^2)$ performance
Fiduccia-Mattheyses algorithm, linear time, still commonly used
METIS algorithm from Karypis and Kumar, works very well in practice, especially on low dimensional graphs
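The flavor of local improvement can be sketched as follows. This is a deliberately simplified greedy pair-swap heuristic, not the Kernighan-Lin or Fiduccia-Mattheyses algorithm itself (both use gain bookkeeping and allow temporarily worsening moves). The graph `ADJ` and the function names are hypothetical.

```python
# Hypothetical graph: two triangles joined by the edge 2-3.
ADJ = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}


def cut_size(adj, A):
    """Number of edges with exactly one endpoint in A."""
    return sum(1 for u in A for v in adj[u] if v not in A)


def greedy_swap_bisection(adj, A):
    """Improve the bisection (A, A-bar) by swapping one pair of vertices at a time.

    Each step applies the single swap that lowers the cut the most and stops
    when no swap helps (a local minimum). |A| never changes, so balance is kept.
    """
    A = set(A)
    B = set(adj) - A
    current = cut_size(adj, A)
    while True:
        best = None
        for u in sorted(A):
            for v in sorted(B):
                candidate = (A - {u}) | {v}
                size = cut_size(adj, candidate)
                if size < current and (best is None or size < best[0]):
                    best = (size, u, v)
        if best is None:
            return A, current
        current, u, v = best
        A = (A - {u}) | {v}
        B = (B - {v}) | {u}


side, size = greedy_swap_bisection(ADJ, {0, 1, 3})
print(sorted(side), size)
```

Because the search only accepts strictly improving swaps, it can stall in a local minimum on harder instances, which is the weakness the notes point out.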
4.2 Spectral methods
Developed in the 70's and 80's
Service level guarantee (Cheeger's inequality)
At root, this is a relaxation or rounding method related to the QIP formulation:
$\min_{x \in \{-1, 1\}^n} \frac{x^T L x}{x^T x}$
- quadratic worst case.
hyperplane rounding:
- compute an eigenvector
- cut according to some rules
- post processing with local improvements
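The first two steps (compute an eigenvector, then cut by a rule) can be sketched in pure Python. Several choices here are assumptions made for a self-contained illustration: the eigenvector is the one for the second-smallest eigenvalue of the graph Laplacian $L$, it is approximated by power iteration on $cI - L$ with the all-ones direction projected out, and the cutting rule is the sign of each entry. In practice one would use a sparse eigensolver and often a sweep over thresholds instead. The graph `ADJ` and the function names are hypothetical.

```python
import math

# Hypothetical graph: two triangles joined by the edge 2-3.
ADJ = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}


def fiedler_vector(adj, iterations=500):
    """Approximate the eigenvector of the second-smallest Laplacian eigenvalue.

    Power iteration on (c*I - L), restricted to vectors orthogonal to the
    all-ones vector, with c = 2 * max degree + 1 so that c*I - L is positive
    definite. Assumes a connected, undirected, unweighted graph.
    """
    nodes = sorted(adj)
    n = len(nodes)
    c = 2 * max(len(adj[u]) for u in nodes) + 1
    x = {u: float(i) for i, u in enumerate(nodes)}  # deterministic start vector
    for _ in range(iterations):
        mean = sum(x.values()) / n
        x = {u: x[u] - mean for u in nodes}  # project out the all-ones vector
        # y = (c*I - L) x, where (L x)_u = deg(u) * x_u - sum over neighbors of x_v
        y = {u: c * x[u] - (len(adj[u]) * x[u] - sum(x[v] for v in adj[u]))
             for u in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {u: y[u] / norm for u in nodes}
    return x


def sign_cut(vec):
    """One simple cutting rule: put the vertices with negative entries in S."""
    return {u for u, val in vec.items() if val < 0}


S = sign_cut(fiedler_vector(ADJ))
print(sorted(S))
```

The third step, post processing, would take the resulting $S$ as the starting point of a local improvement heuristic.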
4.3 Flow-based methods
Developed in the 90's
Consider all pairs, multi-commodity flow problem.
Want to route the commodities s.t. the constraints are satisfied without bottlenecks.
Idea: bottleneck in flow computation corresponds to good cuts.
$k$-commodity problem: does not satisfy strong duality.
does satisfy an approximate min-cut max-flow theorem, with a $\Theta(\log n)$ value gap.
relax flow to LP
embed solution in $\ell_1$
round solution to $\{0, 1\}$, $\Theta(\log n)$ worst case.
4.4 Additional Graph Partitioning Notes
These methods "fail", i.e. achieve the worst case, on the following graphs:
- spectral methods - fail on long stringy pieces
- flow-based methods - fail on expander graphs. n choose 2 pairs but most pairs are far apart, $\Theta(\log n)$ apart.
Improvements/extensions for large data:
there exist hybrid flow-based and local methods
(cut around the cut) local spectral methods
good cut around a start node of a given size
time depends on the size of the output.
4.5 Methods that combine spectral and flow
ARV algorithm (developed a few years ago by Arora, Rao, and Vazirani)
most hybrid algorithms are theoretical, but some implementations embed in SDP.
approximate solution (two-player game).
boosting & ensemble methods
References
1. Schaeffer, "Graph Clustering", Computer Science Review 1(1): 27-64, 2007.
2. Kernighan, B. W.; Lin, Shen (1970). "An efficient heuristic procedure for partitioning graphs". Bell System Technical Journal 49: 291-307.
3. Fiduccia, C. M.; Mattheyses, R. M. "A Linear-Time Heuristic for Improving Network Partitions". Design Automation Conference.
4. Karypis, G.; Kumar, V. (1999). "A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs". SIAM Journal on Scientific Computing.