05 Smallworlds
CS 224W
Outline
¤ Local structure
¤ clustering coefficient
¤ motifs
Milgram’s experiment
Instructions:
Given a target individual (stockbroker in Boston), pass the
message to a person you correspond with who is “closest” to
the target.
Outcome:
email experiment
Dodds, Muhamad, Watts,
Science 301, (2003)
(optional reading)
• 18 targets
• 13 different countries
• 60,000+ participants
• 24,163 message chains
• 384 reached their targets
• average path length 4.0
Source: An Experimental Study of Search in Global Social Networks: Peter Sheridan Dodds, Roby Muhamad, and
Duncan J. Watts (8 August 2003); Science 301 (5634), 827.
Quiz Q:
[Figure: 'recovered' histogram of path lengths, inter-country vs. intra-country]
Navigation and accuracy
What does it mean to be 1, 2, 3 hops apart on
Facebook, Twitter, LinkedIn, Google Plus?
Transitivity, triadic closure, clustering
¤Transitivity:
¤ if A is connected to B and B is connected to C
what is the probability that A is connected to C?
[Figure: nodes A, B, C; is A connected to C?]
Clustering
¤For a vertex i
¤ The fraction of pairs of neighbors of the node that
are themselves connected
¤ Let n_i be the number of neighbors of vertex i
Example: n_i = 4
max number of connections: 4*3/2 = 6
3 connections present
C_i = 3/6 = 0.5
[Figure: node i and its neighbors; legend: link present, link absent]
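The definition above can be sketched in a few lines of Python. The adjacency dictionary and node names below are hypothetical, chosen only to reproduce the slide's counts (4 neighbors, 3 of 6 possible links present); returning 0 for nodes with fewer than two neighbors is an assumed convention, not something the slides specify.

```python
from itertools import combinations

def local_clustering(adj, i):
    """Fraction of pairs of neighbors of i that are themselves connected."""
    neighbors = adj[i]
    n_i = len(neighbors)
    if n_i < 2:
        return 0.0  # assumed convention: no neighbor pairs, report 0
    # Count neighbor pairs (u, v) that share an edge.
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return linked / (n_i * (n_i - 1) / 2)

# Hypothetical undirected graph: node "i" has 4 neighbors; the links
# a-b, b-c and c-d are present among them (3 of 6 possible).
adj = {
    "i": {"a", "b", "c", "d"},
    "a": {"i", "b"},
    "b": {"i", "a", "c"},
    "c": {"i", "b", "d"},
    "d": {"i", "c"},
}
print(local_clustering(adj, "i"))
```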
Quiz Q:
(a) 0
(b) 1/3
(c) 1/2
(d) 2/3
Explanation
¤ n_i = 3
¤ there are 2 connections present out of
a max of 3 possible
¤ C_i = 2/3
beyond social networks
Source: Watts, D.J., Strogatz, S.H.(1998) Collective dynamics of 'small-world' networks. Nature 393:440-442.
Watts-Strogatz model:
Generating small world graphs
Select a fraction p of edges
Reposition one of their endpoints
Watts-Strogatz model:
Generating small world graphs
¤Each node has K>=4 nearest neighbors
(local)
¤tunable: vary the probability p of rewiring any
given edge
¤small p: regular lattice
¤large p: classical random graph
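A minimal sketch of the rewiring procedure, assuming a ring lattice as the starting point and assuming that rewired edges may not create self-loops or duplicate edges. The function name `watts_strogatz` and its parameters are illustrative, not taken from the slides or from the NetLogo model.

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring of n nodes, each linked to its k nearest neighbors (k even);
    every lattice edge is then rewired with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Regular ring lattice: link v to the k/2 nearest nodes on each side.
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    # Visit each lattice edge once and reposition its far endpoint w.p. p.
    for step in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + step) % n
            if w in adj[v] and rng.random() < p:
                choices = [u for u in range(n) if u != v and u not in adj[v]]
                if not choices:
                    continue  # v is already linked to every other node
                new_w = rng.choice(choices)
                adj[v].discard(w)
                adj[w].discard(v)
                adj[v].add(new_w)
                adj[new_w].add(v)
    return adj

lattice = watts_strogatz(20, 4, 0.0, seed=1)    # p = 0: regular lattice
rewired = watts_strogatz(20, 4, 0.2, seed=1)    # some shortcuts added
```

Rewiring moves edges rather than adding them, so the number of edges stays at n*k/2 for every p.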
Quiz question:
Trying this with NetLogo
http://web.stanford.edu/class/cs224w/NetLogo/SmallWorldWS.nlogo
WS model clustering coefficient
[Plot: normalized clustering coefficient C(p)/C(0)]
Kleinberg’s geographical small world model
http://web.stanford.edu/class/cs224w/NetLogo/SmallWorldSearch.nlogo
geographical search when network lacks locality
When r = 0, links are randomly distributed: ASP ~ log(n), where n is the size of the grid
When r = 0, the expected time of any decentralized algorithm is at least $\alpha_0 n^{2/3}$
When r < 2, the expected time is at least $\alpha_r n^{(2-r)/3}$
$p \sim p_0$
Overly localized links on a lattice
When r > 2, expected search time ~ $N^{(r-2)/(r-1)}$
$p \sim \frac{1}{d^4}$
Just the right balance
When r = 2, the expected time of a decentralized algorithm (DA) is at most $C (\log N)^2$
$p \sim \frac{1}{d^2}$
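To make the model concrete, here is a small sketch of greedy decentralized search on an n x n grid in which each node gets one long-range link chosen with probability proportional to $d^{-r}$, where d is lattice (Manhattan) distance. The helper names (`kleinberg_grid`, `greedy_route`), the single long-range link per node, and the grid size are assumptions made for illustration; this is not the NetLogo model linked above.

```python
import random

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def kleinberg_grid(n, r, seed=None):
    """Lattice links plus one long-range link per node, with p ~ d^(-r)."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(n) for y in range(n)]
    links = {}
    for u in nodes:
        x, y = u
        nbrs = [(x + dx, y + dy)
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= x + dx < n and 0 <= y + dy < n]
        others = [v for v in nodes if v != u]
        weights = [manhattan(u, v) ** -r for v in others]
        nbrs.append(rng.choices(others, weights=weights, k=1)[0])
        links[u] = nbrs
    return links

def greedy_route(links, source, target):
    """Always forward to the known contact closest to the target."""
    path = [source]
    current = source
    while current != target:
        current = min(links[current], key=lambda v: manhattan(v, target))
        path.append(current)
    return path

links = kleinberg_grid(10, 2, seed=1)
path = greedy_route(links, (0, 0), (9, 9))
print(len(path) - 1, "hops")
```

Because some lattice neighbor is always one step closer to the target, the greedy walk terminates and never needs more hops than the lattice distance; long-range links can only shorten it.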
Navigability
$\lambda^2 |R| < |R'| < \lambda |R|$
[Figure: nested groups R and R']
$k = c \log^2 n$: calculate the probability that s fails to have a link in R'
Quiz Q:
$f(q) \sim q^{-\alpha}$
Source: Kleinberg, ‘Small-World Phenomena and the Dynamics of Information’ NIPS 14, 2001.
hierarchical small-world models: WDN
Watts, Dodds, Newman (Science, 2001)
individuals belong to hierarchically nested groups
$p_{ij} \sim \exp(-\alpha x)$
Source: Killworth, P.D., Bernard, H.R. (1978) The Reverse Small World Experiment. Social Networks 1:159–92.
Navigability and search strategy:
Small world experiment @ Columbia
Successful chains disproportionately used
• weak ties (Granovetter)
• professional ties (34% vs. 13%)
• ties originating at work/college
• target's work (65% vs. 40%)
Motivation
Power-law (PL) networks, social and P2P
Simulation
artificial power-law topologies, real Gnutella networks
How do we search?
Mary
Jane
AT&T Call Graph
[Plot: log-log axes; x-axis: number of neighbors]
summer 2000,
data provided by Clip2
Preferential attachment model
ping ping
host cache
Gnutella and the bandwidth barrier
[Figure: Poisson graph; number of nodes found]
Search with knowledge of 2nd neighbors
Outline of search strategy
OPTIONS
$k_{max} < N - 1$: a node cannot have more connections than there are other nodes
This is important for exponents close to 2
$\sum_{k=1}^{\infty} p_k = \sum_{k=1}^{\infty} C k^{-\tau} = 1$; for $\tau = 2$, $C = \frac{6}{\pi^2}$
$p(k > 1000, \tau = 2) = \sum_{k=1000}^{\infty} p_k \sim 0.001$
Probability that none of the nodes in a 1,000 node graph has 1000 or more neighbors:
$(1 - p(k > 1000, \tau = 2))^{1000} \sim 0.36$
Without a cutoff, for $\tau = 2$, we have a > 50% chance of observing a node with more neighbors than there are nodes
1. $k_{max} < N$
2. $p_k = C k^{-\tau} e^{-k/\kappa}$ (Newman et al.)
Generating Function
$G_0(x) = C \sum_{k=1}^{(CN)^{1/\tau}} k^{-\tau} x^k$
[Plot: x-axis: # of sites linking to the site]
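Aiello’s ‘conservative’ vs. Havlin’s ‘natural’ cutoff
Aiello’s ‘conservative’ cutoff: the cutoff where the expected number of nodes of degree k is 1
$N p_k = 1$
$C k^{-\tau} = N^{-1}$
$k \sim N^{1/\tau}$
Havlin’s ‘natural’ cutoff: the cutoff so that the expected number of nodes of degree > k is 1
$N \sum_{k=k_{max}}^{\infty} p_k = 1$
$\int_{k_{max}}^{\infty} c k^{-\tau} \, dk \sim N^{-1}$
$k_{max}^{1-\tau} \sim N^{-1}$
$k_{max} \sim N^{1/(\tau-1)}$
[Plots: n(k) versus k for each cutoff]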
The imposed cutoff can have a dramatic
effect on the properties of the graph
degrees drawn at random, for τ = 2, and N = 1000
Generating functions for degree distributions
Random graphs with arbitrary degree distributions and their applications
by Newman, Strogatz & Watts
$G_0(x) = \sum_{k=0}^{\infty} p_k x^k$ is a generating function
$p_k \sim k^{-\tau}$ is the probability that a randomly chosen vertex has degree k
$z_{1B} = G_1'(1) \sim \frac{1}{G_0'(1)} \frac{k_{max}^{3-\tau}}{3-\tau} = \frac{\tau - 2}{1 - k_{max}^{2-\tau}} \frac{k_{max}^{3-\tau}}{3-\tau}$
In the limit $\tau \to 2$, $G_1'(1) \sim \frac{k_{max}}{\log(k_{max})}$
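As a numerical sanity check, the quantity $G_1'(1) = G_0''(1)/G_0'(1)$ can be computed directly from a truncated power law. The sketch below assumes $p_k \propto k^{-\tau}$ on $1 \le k \le k_{max}$ with a hard cutoff; the function name `second_neighbor_factor` is hypothetical.

```python
import math

def second_neighbor_factor(tau, k_max):
    """G_1'(1) = sum k(k-1) p_k / sum k p_k for p_k ~ k^(-tau), k = 1..k_max.
    The normalization constant C cancels in the ratio."""
    degrees = range(1, k_max + 1)
    weights = [k ** -tau for k in degrees]
    mean_k = sum(k * w for k, w in zip(degrees, weights))
    mean_k_km1 = sum(k * (k - 1) * w for k, w in zip(degrees, weights))
    return mean_k_km1 / mean_k

k_max = 1000
exact = second_neighbor_factor(2, k_max)
approx = k_max / math.log(k_max)  # the tau -> 2 scaling form
print(exact, approx)
```

The sum and the continuous approximation agree only up to a constant factor, which is all the scaling argument claims.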
Let’s for the moment ignore the fact that, as we do a random walk, we encounter neighbors that we’ve seen before:
$s = \text{number of steps} = \frac{N}{z_{1B}}$
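Search time with different cutoffs
If $k_{max} = N$: $s(\tau) \sim \frac{N}{k_{max}^{3-\tau}} = \frac{N}{N^{3-\tau}} = N^{\tau-2}$, $2 < \tau < 3$
$s(2.1) \sim N^{0.1}$
$s \sim \frac{N \log(k_{max})}{k_{max}} = \log(N)$, $\tau = 2$
If $k_{max} = N^{1/(\tau-1)}$: $s(\tau) \sim \frac{N}{k_{max}^{3-\tau}} = \frac{N}{N^{(3-\tau)/(\tau-1)}} = N^{2(\tau-2)/(\tau-1)}$, $2 < \tau < 3$
$s(2.1) \sim N^{0.18}$
$s(2) \sim \frac{N \log(k_{max})}{k_{max}} = \log(N)$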
search with knowledge of first neighbors (cont’d)
If $k_{max} = N^{1/\tau}$: $s \sim \frac{N}{k_{max}^{3-\tau}} = \frac{N}{(N^{1/\tau})^{3-\tau}} = N^{2-3/\tau}$, $2 < \tau < 3$
$S(N, \tau) \sim N^{3(1-2/\tau)}$
$S(N, \tau = 2.1) \sim N^{0.15}$
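The exponents for the three cutoffs all follow from one piece of arithmetic: if $k_{max} = N^{c}$ and $s \sim N / k_{max}^{3-\tau}$, then $s \sim N^{1 - c(3-\tau)}$. The short sketch below evaluates this; the function name `search_exponent` is hypothetical and the formula applies only for $2 < \tau < 3$.

```python
def search_exponent(tau, cutoff_power):
    """Exponent e in s ~ N^e when k_max = N^cutoff_power and
    s ~ N / k_max^(3 - tau); valid for 2 < tau < 3."""
    return 1 - cutoff_power * (3 - tau)

tau = 2.1
for label, c in [("k_max = N", 1.0),
                 ("k_max = N^(1/(tau-1))", 1 / (tau - 1)),
                 ("k_max = N^(1/tau)", 1 / tau)]:
    print(f"{label}: s ~ N^{search_exponent(tau, c):.2f}")
```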
Following the degree sequence
$z_{1D} = \int_{k_{max}-a}^{k_{max}} N k^{1-\tau} \, dk \sim N a k_{max}^{1-\tau}$
$a \sim s$ = # of steps taken
$z_{1D} \, G_1'(1) \sim N a k_{max}^{2(2-\tau)}$
$s \sim k_{max}^{2(\tau-2)} \sim N^{2-4/\tau}$
[Plot: degree of neighbor - 1 versus degree of node (0 to 100), for τ = 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 3.75]
Exponents τ close to 2 required to search effectively
Gnutella
Actor collaboration graph (imdb database): τ ~ 2.0-2.2
[Plot: number of actors/actresses versus number of costars, log-log; actors, τ = 2; actresses, τ = 2.1]
Following the degree sequence
Complications
¤Should not visit same node more than once
[Plot: degree of node; legend: not visited, visited, neighbors visited]
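One way to respect the "no revisits" constraint is to prefer the highest-degree neighbor that has not been visited yet, and fall back to a random neighbor only when every neighbor has been seen. The sketch below makes that assumption explicit; the function name, the `max_steps` guard, and the toy graph are hypothetical, and the search stops as soon as the target is a neighbor of the current node (knowledge of first neighbors).

```python
import random

def degree_seeking_search(adj, start, target, max_steps=10_000, seed=None):
    """Walk toward high-degree nodes, avoiding revisits where possible.
    Returns the number of steps taken, or None if max_steps is exceeded."""
    rng = random.Random(seed)
    visited = {start}
    current = start
    steps = 0
    while current != target and target not in adj[current]:
        if steps >= max_steps or not adj[current]:
            return None
        unvisited = [v for v in adj[current] if v not in visited]
        if unvisited:
            current = max(unvisited, key=lambda v: len(adj[v]))
        else:
            # Assumed fallback: all neighbors seen, take a random one.
            current = rng.choice(sorted(adj[current]))
        visited.add(current)
        steps += 1
    return steps

# Hypothetical toy graph with integer node ids; node 2 is the hub.
adj = {0: {1, 2}, 1: {0}, 2: {0, 3, 4, 5}, 3: {2}, 4: {2}, 5: {2, 6}, 6: {5}}
print(degree_seeking_search(adj, start=1, target=6))
```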
seeking high degree nodes
[Plots: random walk vs. degree sequence, by step]
About 50% of a 10,000 node graph is explored in the first 12 steps
Scaling of search time with size of graph
[Plot: cover time for half the nodes versus size of graph, log-log; random walk, α = 0.37 fit; degree sequence, α = 0.24 fit]
Comparison with a Poisson graph
Poisson: $G_0(x) = e^{z(x-1)}$
$G_1(x) = \frac{G_0'(x)}{z} = G_0(x)$
[Plot: degree of current node, power-law vs. Poisson]
[Plot: versus number of nodes in graph; constant av. deg. = 3.4; γ = 1.0 fit]
scaling is linear
Gnutella network
50% of the files in a 700 node network can be found in < 8 steps
[Plot: cumulative nodes found at step, versus step]
Expander graphs
Time permitting
Def: Random k-Regular Graphs
$\alpha = \min_{S \subseteq V} \frac{\#\,\text{edges leaving } S}{\min(|S|, |V \setminus S|)}$
(A big) graph with “good” expansion
Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
Expansion: k-Regular Graphs
$\alpha = \min_{S \subseteq V} \frac{\#\,\text{edges leaving } S}{\min(|S|, |V \setminus S|)}$
¤ k-regular graph (every node has degree k):
¤ Expansion is at most k (when S is a single node)
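For very small graphs the expansion can be checked by brute force over all subsets. The sketch below assumes an undirected graph given as an adjacency dictionary and is exponential in the number of nodes, so it is only meant to illustrate the definition; the function name `expansion` and the two example graphs are hypothetical.

```python
from itertools import combinations

def expansion(adj):
    """Minimum over subsets S of (# edges leaving S) / min(|S|, |V \\ S|).
    Only subsets with |S| <= |V|/2 are enumerated, since S and V \\ S
    have the same leaving edges and the same denominator."""
    nodes = list(adj)
    best = float("inf")
    for size in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, size):
            s = set(subset)
            leaving = sum(1 for u in s for v in adj[u] if v not in s)
            best = min(best, leaving / size)
    return best

# Complete graph on 4 nodes (3-regular) and a 6-cycle (2-regular).
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
c6 = {v: {(v - 1) % 6, (v + 1) % 6} for v in range(6)}
print(expansion(k4), expansion(c6))
```

In both examples the result stays below the degree k, consistent with the bound obtained by taking S to be a single node.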
Diameter of 3-Regular Rnd. Graph
¤Proof (continued):
¤ Let $S_j$ be the set of all nodes found within j steps of BFS from s.
¤ We want to relate $S_j$ and $S_{j+1}$
[Figure: BFS levels S_0, S_1, S_2 from s; "Make this into a 3-ary tree"; |S_j| nodes and |S_{j+1}| nodes]
Expansion: at least $\alpha |S_j|$ edges leave $S_j$. Each node has degree k, so at most k edges “collide” at a node:
$|S_{j+1}| \ge |S_j| + \frac{\alpha |S_j|}{k}$
$|S_{j+1}| \ge |S_j| \left(1 + \frac{\alpha}{k}\right) \ge |S_0| \left(1 + \frac{\alpha}{k}\right)^{j+1}$, where $|S_0| = 1$
Diameter of 3-Regular Rnd. Graph
¤Proof (continued):
$e = \lim_{x \to \infty} \left(1 + \frac{1}{x}\right)^x$