SS 2011 - Graph-Based Methods For NLP - UKP Lab - Wolfgang Stille
Organizational matters:
Graph isomorphism
Adjacency Matrix / Adjacency List
P = NP?
There are efficient (polynomial) algorithms for the exact solution of many problems on graphs, e.g.
• Graph Traversal (DFS, Shortest Paths, Max-Capacity Paths, …)
• Optimal Trees and Branchings (MST, MAX-FOREST, MAX-BRANCHING, …)
• Graph Clustering (Min-Cut, Markov Clustering, Chinese Whispers, …)
• Graph Ranking (PageRank, Random Walks, Markov Chain Theory)
• Graph Distances (local: Paths; global: Graph Edit Distance, …)
• Flows on Graphs (MAX-FLOW, MIN-COST FLOW, …)
• Matching and Assignment (Hungarian Method, Edmonds' Algorithm)
• many more
Efficient Algorithms!
There are efficient approximation algorithms and heuristics for the approximate solution of many graph problems, e.g.
• Subgraph Problems (Dense Subgraphs, Minors, …)
• Optimal Tour Problems (TSP, PCTSP, VRP, …)
• Steiner Trees
• many more
§ The equation is recursive, but it may be computed by starting with any set of ranks and iterating the computation until it converges.
§ Rank sink problem: a cycle of pages that accumulates rank within the cycle, but never distributes rank outside
§ Need damping: uniform rank distribution for all pages
[Figure: example web graph with pages A, C, D and X]
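For reference, the recursive equation in its common undamped form; the notation is an assumption, since the slide itself does not reproduce the formula:

    R(u) = \sum_{v \in B_u} \frac{R(v)}{L(v)}

where B_u is the set of pages linking to u and L(v) is the number of outgoing links of v.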
Random Surfer Model
§ When normalizing PageRank over all pages to 1, R(u) can be thought of as the probability that a random surfer looks at page u.
§ Damping corresponds to "teleportation": with some probability d, the random surfer is teleported to some other page.
[Figure: example web graph with pages A, B, C, D and X]
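With the slide's convention that d is the teleportation probability, the damped recursion can be sketched as (the uniform 1/N term reflects the "uniform rank distribution" remark above):

    R(u) = \frac{d}{N} + (1 - d) \sum_{v \in B_u} \frac{R(v)}{L(v)}

where N is the total number of pages.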
p_0 = (1/N) · 1
t = 0;
repeat:
    t = t + 1;
    p_t = M^T · p_{t-1};
    δ = ||p_t − p_{t-1}||;
until δ < ϵ;
return p_t;
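A minimal runnable sketch of this power iteration in Python/NumPy. The example graph, the damping value d (the slide's teleportation probability) and the tolerance are illustrative assumptions:

import numpy as np

def pagerank(adj, d=0.15, eps=1e-8):
    # adj[i, j] = 1 if page i links to page j
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    out[out == 0] = 1                    # guard against dangling pages
    M = adj / out                        # row-stochastic transition matrix
    G = d / n + (1 - d) * M              # damping: teleport with probability d
    p = np.full(n, 1.0 / n)              # p_0 = (1/N) * 1
    while True:                          # repeat until delta < epsilon
        p_next = G.T @ p                 # p_t = M^T p_{t-1}
        if np.linalg.norm(p_next - p) < eps:
            return p_next
        p = p_next

# toy example: 4 pages
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(pagerank(A))                       # scores sum to 1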
LexRank: Application to Multi-Document Summarization
Multi-document summarization task:
1. identify important topics of the documents to be summarized
2. identify sentences belonging to a certain topic
3. from the sentences belonging to the same topic, select the ones that best describe the topic
4. concatenate sentences from different topics and make sure they fit together
[Figure: the sentence "This is a sentence that talks about some topic." represented as a vector of word counts over words w_1 … w_n, e.g. (3, 0, 2, 0, 0, 0, 0); edge weights such as .27 are similarities between sentence vectors]
§ Centroid
§ Idea: select an "average" sentence. Compute the average point of the sentence vectors (centroid)
§ For summarization, select the sentence that is most similar to the centroid
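A minimal sketch of centroid selection in Python/NumPy; the sentence vectors (e.g. tf-idf rows) are assumed to be precomputed:

import numpy as np

def centroid_sentence(S):
    # S: one row per sentence vector
    c = S.mean(axis=0)                   # centroid of all sentence vectors
    sims = (S @ c) / (np.linalg.norm(S, axis=1) * np.linalg.norm(c) + 1e-12)
    return int(np.argmax(sims))          # sentence closest to the centroid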
§ Degree Centrality
§ Idea: sentences that cover most of the content have a high node degree (number of edges). Since word overlap is responsible for edges, node degree measures word overlap with the overall set of sentences
§ For summarization, choose the sentence with the highest degree
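A corresponding sketch for degree centrality; the cosine-similarity threshold that decides when two sentences share an edge is an assumption:

import numpy as np

def degree_sentence(S, threshold=0.1):
    norms = np.linalg.norm(S, axis=1, keepdims=True) + 1e-12
    sim = (S / norms) @ (S / norms).T        # cosine similarity matrix
    np.fill_diagonal(sim, 0.0)               # ignore self-similarity
    degree = (sim > threshold).sum(axis=1)   # node degree in the similarity graph
    return int(np.argmax(degree))            # sentence with the highest degree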
§ LexRank Centrality
§ Idea: it does not suffice to be similar to many sentences; similarity to important sentences counts more.
§ Normalize the sentence-similarity adjacency matrix to make it a stochastic matrix
§ Run PageRank to obtain scores that are used for ranking the sentences
§ For summarization, choose the sentence with the highest score
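Putting the pieces together: a sketch of LexRank that reuses the pagerank routine from the power-iteration sketch above; threshold and damping are again illustrative assumptions:

import numpy as np

def lexrank_sentence(S, threshold=0.1, d=0.15):
    norms = np.linalg.norm(S, axis=1, keepdims=True) + 1e-12
    sim = (S / norms) @ (S / norms).T        # sentence similarity matrix
    np.fill_diagonal(sim, 0.0)
    A = (sim > threshold).astype(float)      # adjacency of the similarity graph
    scores = pagerank(A, d=d)                # normalization + PageRank happen here
    return int(np.argmax(scores))            # highest-scoring sentence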
Evaluation of graph-based multi-document summarization
§ Scores: ROUGE metric, similar to BLEU, computed between manual summaries and system summaries
§ random baseline: select any sentence from the set by chance
§ lead-based: select based on the position of the sentence within the document
⇒ LexRank is a simple method for getting high scores. It uses the whole structure of the graph, as opposed to Centroid or Degree. The technique also works well for single-document summarization.
§ Keyword extraction: find the most salient keywords for a document
§ Keyword extraction with PageRank:
§ preprocess the document: identify adjectives and nouns as targets
§ target co-occurrence graph: connect targets co-occurring within a window of 2-10 words
§ apply PageRank to get ranking scores on the nodes
§ select the highest-scoring keywords; possibly concatenate ADJ-NOUN-NOUN sequences if present in the text
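A minimal sketch of these steps in Python; a real system would use a POS tagger to find the adjective/noun targets, so the targets set is assumed to be given, and the pagerank routine from the sketch above is assumed to be in scope:

import numpy as np

def textrank_keywords(tokens, targets, window=2, top_k=5, d=0.15):
    # tokens: the document as a token list; targets: tokens kept after POS filtering
    nodes = sorted({t for t in tokens if t in targets})
    idx = {w: i for i, w in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for i, w in enumerate(tokens):               # connect targets co-occurring
        for v in tokens[i + 1 : i + window]:     # within the window
            if w in idx and v in idx and w != v:
                A[idx[w], idx[v]] = A[idx[v], idx[w]] = 1.0
    scores = pagerank(A, d=d)                    # ranking scores on the nodes
    return [nodes[i] for i in np.argsort(-scores)[:top_k]]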
§ Comparison: supervised system that is trained on manually assigned keywords, using frequency and contextual features
§ Note that TextRank is unsupervised: no training necessary
§ Task: find meaningful groups of nodes in a graph by cutting edges
§ Intuition: connectedness within a cluster is higher than between clusters
§ Many graph clustering algorithms find the number of clusters automatically
[Figure; sources: http://elisa.dyndns-web.com/~elisa/publications/ and http://scienceblogs.com/goodmath/2007/08/maximum_flow_and_minimum_cut_1.php]
§ Clustering based on random walks: MCL is the parallel simulation of all possible random walks up to a finite length on a graph G
§ Idea: a random walker on the graph is more likely to stay within the same cluster than to end up in a different cluster after a small number of steps
§ Algorithm: one can show convergence to a limit T
Add loops: transition matrix T = column-normalize(A_G + I)
MCL process: alternate between
    T = T^t         // expansion: raise T to the power t
    T = inflate(T)  // inflation: increase contrast within columns by raising
                    // values to the power s (s > 0) and normalizing column-wise
Interpret T as a clustering: use the strongest connection as the label
Stijn van Dongen, Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, May 2000.
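A compact runnable sketch of the MCL process in Python/NumPy, following the scheme above; the expansion power t, the inflation exponent s and the convergence test are illustrative assumptions:

import numpy as np

def mcl(A, t=2, s=2.0, iters=100, eps=1e-6):
    n = A.shape[0]
    T = A + np.eye(n)                         # add self-loops
    T = T / T.sum(axis=0, keepdims=True)      # column-normalize (stochastic)
    for _ in range(iters):
        T_prev = T
        T = np.linalg.matrix_power(T, t)      # expansion: T = T^t
        T = T ** s                            # inflation: entrywise power ...
        T = T / T.sum(axis=0, keepdims=True)  # ... then re-normalize columns
        if np.abs(T - T_prev).max() < eps:
            break
    return T.argmax(axis=0)                   # strongest connection per column = label

# toy graph: two triangles joined by a single edge
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
print(mcl(A))   # nodes with the same label form a cluster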
§ (Stochastic) adjacency matrix T: probabilities to walk from the node in the column to the node in the row in a single step.
§ T²: probabilities to walk from A to B in 2 steps.
[Figure: A_G with loops added; the matrices T and T²; steps labeled ×2 (squaring) and normalize]
§ Inflate the differences within a column by taking the k-th power of each value, then normalize to restore the stochastic property. k regulates the cluster sizes
§ Clustering: the highest entry in a column vector is the cluster label
Variants:
§ Could add small random noise to break ties
§ Optimization: only keep the K largest values, or only keep values over a threshold
Chinese Whispers Graph Clustering
Semantic enrichment:
• Use the nodes on the paths / flows for enrichment, to overcome the knowledge acquisition bottleneck
§ Graphs are a natural representation of entities and their relations
§ We can use well-known (efficient) graph algorithms for the solution of specific NLP problems
§ By taking the overall structure into account, some NLP tasks can be improved (enriching semantics)
§ Graph clustering algorithms solve unsupervised NLP tasks without the need to specify the number of clusters
§ We can enrich information by walks on graphs