05 Smallworlds

Small world networks
CS 224W
Outline
¤ Small world phenomenon

¤ Milgram’s small world experiment
¤ Local structure
¤ clustering coefficient
¤ motifs
¤ Small world network models:

¤ Watts & Strogatz (clustering & short paths)
¤ Kleinberg (geographical)
¤ Kleinberg, Watts/Dodds/Newman (hierarchical)
¤ Small world networks: why do they arise?

Small world phenomenon:
Milgram’s experiment
[Figure: map of the route from Nebraska (NE) to Massachusetts (MA)]
Milgram’s experiment
Instructions:
Given a target individual (stockbroker in Boston), pass the
message to a person you correspond with who is “closest” to
the target.
Outcome:
20% of initiated chains reached target

average chain length = 6.5
¤ “Six degrees of separation”

Milgram’s experiment repeated
email experiment
Dodds, Muhamad, Watts,
Science 301, (2003)
(optional reading)
•18 targets
•13 different countries
•60,000+ participants
•24,163 message chains
•384 reached their targets
•average path length 4.0
Source: NASA, U.S. Government; http://visibleearth.nasa.gov/view_rec.php?id=2429

Interpreting Milgram’s experiment
n Is 6 is a surprising number?

n In the 1960s? Today? Why?
n Pool and Kochen in (1978 established that the

average person has between 500 and 1500
acquaintances)
Quiz Q:
¤Ignore for the time being the fact that

many of your friends’ friends are your
friends as well. If everyone has 500
friends, the average person would have
how many friends of friends?
¤ 500
¤ 1,000
¤ 5,000
¤ 250,000
Quiz Q:
¤With an average degree of 500, a node

in a random network would have this
many friends-of-friends-of-friends (3rd
degree neighbors):
¤ 5,000
¤ 500,000
¤ 1,000,000
¤ 125,000,000
Interpreting Milgram’s experiment
n Is 6 is a surprising number?

n In the 1960s? Today? Why?
n If social networks were random… ?

n Pool and Kochen (1978) - ~500-1500 acquaintances/person
n ~ 500 choices 1st link
n ~ 5002 = 250,000 potential 2nd degree neighbors
n ~ 5003 = 125,000,000 potential 3rd degree neighbors
n If networks are completely cliquish?

n all my friends’ friends are my friends
n what would happen?
Quiz Q:
¤If the network were completely cliquish,

that is all of your friends of friends were
also directly your friends, what would be
true:
¤ (a) None of your friendship edges would be
part of a triangle (closed triad)
¤ (b) It would be impossible to reach any node
outside the clique by following directed
edges
¤ (c) Your shortest path to your friends’ friends
would be 2
complete cliquishness
¤ If all your friends of friends were also your

friends, you would be part of an isolated
clique.
Uncompleted chains and distance
n Is 6 an accurate number?
n What bias is introduced by uncompleted chains?

n are longer or shorter chains more likely to be completed?
Attrition
[Figure: probability of passing on message vs. position in chain; average and 95% confidence interval]
Source: An Experimental Study of Search in Global Social Networks: Peter Sheridan Dodds, Roby Muhamad, and
Duncan J. Watts (8 August 2003); Science 301 (5634), 827.
Quiz Q:
n if each intermediate person in the

chain has 0.5 probability of passing the
letter on, what is the likelihood of a
chain being completed
n of length 2?
n of length 5?
sends for sure receives

chain of length 2
passes on with probability 0.5

Quiz Q:
n if each intermediate person in the

chain has 0.5 probability of passing the
letter on, what is the likelihood of a
chain of length 5 being completed
¤ (a) ½
¤ (b) ¼
¤ (c) 1/8
¤ (d) 1/16
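A minimal sketch of the attrition arithmetic, assuming (as in the length-2 diagram) that the sender always sends and that a chain of length L has L - 1 intermediaries, each passing the letter on independently. The function name is hypothetical.

```python
def completion_probability(chain_length: int, pass_prob: float = 0.5) -> float:
    """Probability that a chain with `chain_length` links reaches its target.

    Assumption: the first link is certain (the sender always sends) and each
    of the chain_length - 1 intermediaries passes with probability pass_prob.
    """
    if chain_length < 1:
        raise ValueError("chain_length must be at least 1")
    return pass_prob ** (chain_length - 1)


if __name__ == "__main__":
    for length in (2, 5):
        print(length, completion_probability(length))
```

Because the completion probability shrinks geometrically with length, long chains are under-represented among completed chains, which is the bias the preceding slides ask about.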
Estimating the true distance
[Figure: histogram of observed chain lengths and 'recovered' histogram of path lengths; inter-country and intra-country]
Source: An Experimental Study of Search in Global Social Networks: Peter Sheridan Dodds, Roby Muhamad, and
Duncan J. Watts (8 August 2003); Science 301 (5634), 827.
Navigation and accuracy
¤Is 6 an accurate number?
¤Do people find the shortest paths?

¤ Killworth, McCarty, Bernard, & House (2005):
¤ a less than optimal choice for the next link in the chain is
made ½ of the time
Small worlds & networking
What does it mean to be 1, 2, 3 hops apart on
Facebook, Twitter, LinkedIn, Google Plus?
Transitivity, triadic closure, clustering
¤Transitivity:
¤ if A is connected to B and B is connected to C
what is the probability that A is connected to C?
¤ my friends’ friends are likely to be my friends
[Figure: triad A, B, C with links A-B and B-C present and the link A-C marked "?"]
Clustering
¤Global clustering coefficient

C = (3 x number of triangles in the graph) / (number of connected triples of vertices)
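The global clustering coefficient can be computed directly from an adjacency structure. The sketch below is one possible implementation, not taken from the slides; it assumes an undirected graph stored as a dict mapping each node to the set of its neighbors.

```python
from itertools import combinations


def global_clustering(adj):
    """Global clustering coefficient of an undirected graph.

    adj: dict mapping node -> set of neighbours (no self-loops).
    Every triangle closes three connected triples (one per centre node),
    so counting closed triples already includes the factor of 3.
    """
    closed = 0   # connected triples whose two end points are also linked
    triples = 0  # all connected triples, counted by their centre node
    for node, nbrs in adj.items():
        for u, v in combinations(nbrs, 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0


if __name__ == "__main__":
    # Hypothetical example: a triangle a-b-c with a pendant node d attached to c.
    graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
    print(global_clustering(graph))  # 1 triangle, 5 connected triples
```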
Local clustering coefficient (Watts&Strogatz 1998)
¤For a vertex i
¤ The fraction of pairs of neighbors of the node that
are themselves connected
¤ Let n_i be the number of neighbors of vertex i
C_i = (# of connections between i’s neighbors) / (max # of possible connections between i’s neighbors)
C_i (directed) = (# directed connections between i’s neighbors) / (n_i * (n_i - 1))
C_i (undirected) = (# undirected connections between i’s neighbors) / (n_i * (n_i - 1) / 2)
Local clustering coefficient (Watts&Strogatz 1998)
¤Average over all n vertices

C = (1/n) ∑_i C_i
ni = 4
max number of connections:
4*3/2 = 6
3 connections present
Ci = 3/6 = 0.5
[Figure: vertex i and its four neighbors; legend: link present, link absent]
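The worked example (n_i = 4, 3 of 6 possible connections present) can be reproduced with a short sketch of the undirected local clustering coefficient. The adjacency layout below is hypothetical, since the figure itself is not recoverable; only the counts match the slide.

```python
from itertools import combinations


def local_clustering(adj, i):
    """Undirected local clustering coefficient of vertex i.

    adj: dict mapping node -> set of neighbours.
    Returns 0.0 for vertices with fewer than two neighbours (a convention
    assumed here; the slides do not state one).
    """
    nbrs = adj[i]
    n_i = len(nbrs)
    if n_i < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (n_i * (n_i - 1) / 2)


def average_clustering(adj):
    """Average of the local coefficients over all n vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


if __name__ == "__main__":
    # Vertex "i" has 4 neighbours; 3 links exist among them (a-b, b-c, c-d).
    graph = {
        "i": {"a", "b", "c", "d"},
        "a": {"i", "b"},
        "b": {"i", "a", "c"},
        "c": {"i", "b", "d"},
        "d": {"i", "c"},
    }
    print(local_clustering(graph, "i"))
```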
Quiz Q:
¤The clustering coefficient for vertex i is:
(a)0
(b)1/3
(c)1/2
(d)2/3
Explanation
¤ni = 3
¤there are 2 connections present out of
max of 3 possible
¤Ci = 2/3
beyond social networks
high clustering: C_network >> C_random graph
low average shortest path: l_network ≈ ln(N)
what other networks can you think of

with these characteristics?
Comparison with “random graph” used to determine
whether real-world network is “small world”

Network                | Size      | Av. shortest path | Shortest path in fitted random graph | Clustering (averaged over vertices) | Clustering in random graph
Film actors            | 225,226   | 3.65              | 2.99                                 | 0.79                                | 0.00027
MEDLINE co-authorship  | 1,520,251 | 4.6               | 4.91                                 | 0.56                                | 1.8 x 10^-4
E.Coli substrate graph | 282       | 2.9               | 3.04                                 | 0.32                                | 0.026
C.Elegans              | 282       | 2.65              | 2.25                                 | 0.28                                | 0.05
Watts/Strogatz model
Reconciling two observations:
• High clustering: my friends’ friends tend to be my friends
• Short average paths
Source: Watts, D.J., Strogatz, S.H.(1998) Collective dynamics of 'small-world' networks. Nature 393:440-442.
Watts-Strogatz model:
Generating small world graphs
Select a fraction p of edges
Reposition one of their endpoints

Add a fraction p of additional edges, leaving the underlying lattice intact
n As in many network generating algorithms

n Disallow self-edges
n Disallow multiple edges
Watts-Strogatz model:
Generating small world graphs
¤Each node has K>=4 nearest neighbors
(local)
¤tunable: vary the probability p of rewiring any
given edge
¤small p: regular lattice
¤large p: classical random graph
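The rewiring variant can be sketched in a few lines using only the standard library. This is one possible reading of the procedure, assuming a ring lattice, even K, and rewiring of one endpoint per selected edge; self-edges and multiple edges are disallowed as noted on the previous slide. The function name and signature are illustrative.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Ring lattice of n nodes, each linked to its k nearest neighbours
    (k even), with each lattice edge rewired with probability p.

    Returns a dict mapping node -> set of neighbours.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Build the regular ring lattice.
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    # Visit every lattice edge once and rewire its far endpoint with probability p.
    for step in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + step) % n
            if w in adj[v] and rng.random() < p:
                # New endpoint: not v itself and not already a neighbour of v.
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj


if __name__ == "__main__":
    lattice = watts_strogatz(20, 4, 0.0, seed=1)
    rewired = watts_strogatz(20, 4, 0.1, seed=1)
    print(sorted(lattice[0]), sorted(rewired[0]))
```

Rewiring keeps the number of edges fixed, so any change in clustering or path length comes from where the edges go rather than how many there are.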
Quiz question:
¤ Which of the following is a result of a

higher rewiring probability?
(a) Left (b) Right (c) insufficient information

What happens in between?
¤Small shortest path means low clustering?

¤Large shortest path means high clustering?
¤Through numerical simulation
¤ As we increase p from 0 to 1
¤ Fast decrease of mean distance
¤ Slow decrease in clustering
Clust coeff. and ASP as rewiring increases
1% of links rewired 10% of links rewired
Trying this with NetLogo
http://web.stanford.edu/class/cs224w/NetLogo/SmallWorldWS.nlogo
WS model clustering coefficient
¤ The probability that a connected triple stays
connected after rewiring
¤ probability that none of the 3 edges were rewired: (1-p)^3
¤ probability that edges were rewired back to each other is very
small, can ignore
¤ Clustering coefficient = C(p) = C(p=0)*(1-p)^3

[Figure: C(p)/C(0) vs. p, for p from 0 to 1]
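The approximation for the clustering coefficient under rewiring is easy to tabulate. The sketch below simply evaluates the slide's formula; the function name is an assumption.

```python
def clustering_ratio(p: float) -> float:
    """C(p) / C(0) under the approximation that a triangle survives only
    if none of its 3 edges is rewired."""
    return (1.0 - p) ** 3


if __name__ == "__main__":
    for p in (0.0, 0.01, 0.1, 0.5, 1.0):
        print(f"p = {p:<4}  C(p)/C(0) = {clustering_ratio(p):.3f}")
```

With 1% of links rewired the ratio is still about 0.97, and with 10% rewired about 0.73, which is consistent with the slow decrease in clustering described earlier.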

Quiz Q
n Which of the following is a description

matching a small-world network?
(a)Its average shortest path is close to that of an

Erdos-Renyi graph
(b)It has many closed triads
(c)It has a high clustering coefficient
(d)It has a short average path length
WS Model: What’s missing?
n Long range links not as likely as short

range ones
n Hierarchical structure / groups
n Hubs
Ties and geography
“The geographic movement of the [message] from Nebraska to
Massachusetts is striking. There is a progressive closing in on the
target area as each new person is added to the chain”
S.Milgram ‘The small world problem’, Psychology Today 1,61,1967
[Figure: map of the route from Nebraska (NE) to Massachusetts (MA)]
Kleinberg’s geographical small world model
nodes are placed on a lattice and connect to nearest neighbors
additional links placed with
p(link between u and v) ~ (distance(u,v))^(-r)
r: exponent that will determine navigability
Source: Kleinberg, ‘The Small World Phenomenon, An Algorithmic Perspective’ (Nature
2000).
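A minimal sketch of how a long-range contact could be drawn in this model, assuming a square grid of side `side`, Manhattan (lattice) distance, and one long-range link per node. These choices and all function names are illustrative assumptions, not details given on the slide.

```python
import random


def lattice_distance(u, v):
    """Manhattan distance between two grid points."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def link_probabilities(u, side, r):
    """Normalized probability of a long-range link from u to each other node,
    proportional to distance(u, v) ** (-r)."""
    others = [(x, y) for x in range(side) for y in range(side) if (x, y) != u]
    weights = [lattice_distance(u, v) ** (-r) for v in others]
    total = sum(weights)
    return {v: w / total for v, w in zip(others, weights)}


def sample_long_range_contact(u, side, r, rng=random):
    """Draw one long-range contact for u according to link_probabilities."""
    probs = link_probabilities(u, side, r)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[v] for v in nodes], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    for r in (0, 2, 4):
        contact = sample_long_range_contact((5, 5), 11, r, rng)
        print(r, contact, lattice_distance((5, 5), contact))
```

With r = 0 every node is equally likely, while larger r concentrates the long-range links on nearby nodes.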
NetLogo demo
¤how does the probability of long-range links
affect search?
http://web.stanford.edu/class/cs224w/NetLogo/SmallWorldSearch.nlogo
geographical search when network lacks locality
When r=0, links are randomly distributed, ASP ~ log(n), n size of grid
When r=0, the expected delivery time of any decentralized algorithm is at least α_0 n^(2/3)
When r<2, expected time is at least α_r n^((2-r)/3)
p ~ p_0
Overly localized links on a lattice
When r>2, expected search time ~ N^((r-2)/(r-1))
p ~ 1/d^4
Just the right balance
When r=2, expected time of a decentralized algorithm (DA) is at most C (log N)^2
p ~ 1/d^2
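The three regimes can be summarized by the exponent of the network size in the search-time bounds. The sketch below only evaluates the expressions quoted on these slides and assumes that n and N denote the same size parameter; the function name is hypothetical.

```python
def search_time_exponent(r: float) -> float:
    """Exponent x such that decentralized search time scales like size ** x.

    r < 2: lower bound ~ n ** ((2 - r) / 3)
    r > 2: search time ~ N ** ((r - 2) / (r - 1))
    r = 2: polylogarithmic, at most C (log N)^2, so the exponent is 0
    """
    if r < 2:
        return (2 - r) / 3
    if r > 2:
        return (r - 2) / (r - 1)
    return 0.0


if __name__ == "__main__":
    for r in (0, 1, 2, 3, 4):
        print(r, round(search_time_exponent(r), 3))
```

The exponent is zero only at r = 2, which is why that value is described as the right balance between local and long-range links.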
Navigability
λ^2 |R| < |R’| < λ |R|
[Figure: nested regions R and R’]
k = c log_2 n; calculate the probability that s fails to have a link in R’
Quiz Q:
¤ What is true about a network where the
probability of a tie falls off as distance^(-2)
(a)Large networks cannot be navigated

(b)A simple greedy strategy (pass the message to the
neighbor who is closest to the target) is sufficient
(c)There are fewer long range ties than short range
ones
(d)If the number of nodes doubles, the average
shortest path will be twice as long
Origins of small worlds:
group affiliations
hierarchical small-world models: Kleinberg
Hierarchical network models:
Individuals classified into a hierarchy (figure: tree of height h with branching b=3),
h_ij = height of the least common ancestor.
p_ij ~ b^(-α h_ij)
e.g. state-county-city-neighborhood,
industry-corporation-division-group
Group structure models:

Individuals belong to nested groups
q = size of smallest group that v,w belong to
f(q) ~ q^(-α)
Source: Kleinberg, ‘Small-World Phenomena and the Dynamics of Information’ NIPS 14, 2001.
hierarchical small-world models: WDN
Watts, Dodds, Newman (Science, 2002)
individuals belong to hierarchically nested groups
pij ~ exp(-α x)
multiple independent hierarchies h=1,2,..,H

coexist corresponding to occupation,
geography, hobbies, religion…
Source: Identity and Search in Social Networks: Duncan J. Watts, Peter Sheridan Dodds, and M. E. J.
Newman; Science 17 May 2002 296: 1302-1305. <http://arxiv.org/abs/cond-mat/0205383v1>
Navigability and search strategy:
Reverse small world experiment
¤ Killworth & Bernard (1978):

¤ Given hypothetical targets (name, occupation, location, hobbies,
religion…) participants choose an acquaintance for each target
¤ based on (most often) occupation, geography
¤ only 7% because they “know a lot of people”
¤ Simple greedy algorithm: most similar acquaintance
¤ two-step strategy rare
Source: 1978 Peter D. Killworth and H. Russell Bernard. The Reverse Small World Experiment Social Networks 1:159–92.
Navigability and search strategy:
Small world experiment @ Columbia
Successful chains disproportionately used
• weak ties (Granovetter)
• professional ties (34% vs. 13%)
• ties originating at work/college
• target's work (65% vs. 40%)
. . . and disproportionately avoided

• hubs (8% vs. 1%) (+ no evidence of
funnels)
• family/friendship ties (60% vs. 83%)
Strategy: Geography -> Work

Search in power-law networks
Motivation
Power-law (PL) networks, social and P2P
Analysis of scaling of search strategies in PL networks
Simulation
artificial power-law topologies, real Gnutella networks
How do we search?
“Who could introduce me to Richard Gere?”
[Figure: acquaintances Mary, Bob, and Jane]
AT&T Call Graph
[Figure: # of telephone numbers from which calls were made; # of telephone numbers called]
Aiello et al. STOC ‘00

Gnutella network
power-law link distribution
[Figure: log-log plot of proportion of nodes vs. number of neighbors; data and power-law fit, τ = 2.07]
summer 2000,
data provided by Clip2
Preferential attachment model
Nodes join at different times
The more connections a node has, the more likely it is to acquire

new connections
Growth process produces power-law network
ping ping
host cache
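One common way to simulate this growth process is sketched below, in the style of the Barabási-Albert model. The slide does not specify the details, so the starting clique, the parameter m (links added per new node), and the function name are assumptions made for illustration.

```python
import random


def preferential_attachment(n, m=1, seed=None):
    """Grow a graph to n nodes; each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree.

    Returns a list of (new_node, existing_node) edges.
    """
    rng = random.Random(seed)
    # Start from a small clique on m + 1 nodes so every node has degree > 0.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [x for edge in edges for x in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


if __name__ == "__main__":
    from collections import Counter

    degree = Counter(x for edge in preferential_attachment(2000, m=2, seed=0) for x in edge)
    print("max degree:", max(degree.values()), "min degree:", min(degree.values()))
```

Nodes that join early have more time to accumulate links, which is what produces a heavy-tailed degree distribution.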
Gnutella and the bandwidth barrier
file sharing w/o a central index
queries broadcast to every node within

radius ttl
⇒ as network grows, encounter a bandwidth
barrier (dial up modems cannot keep up with
query traffic, fragmenting the network)
Clip 2 report
Gnutella: To the Bandwidth Barrier and Beyond
http://www.clip2.com/gnutella.html#q17
[Figure: number of nodes found]
power-law graph: 94, 67, 63, 54, 6, 2, 1
Poisson graph: 93, 19, 15, 11, 7, 3, 1
Search with knowledge of 2nd neighbors
Outline of search strategy
pass query onto only one neighbor at each step
OPTIONS
requires that nodes sign query

- avoid passing message onto a node twice
requires knowledge of one’s neighbors’ degrees

- pass to the highest degree node
requires knowledge of one’s neighbors’ neighbors

- route to 2nd degree neighbors
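The first two options can be sketched as a single-message search that knows its first neighbors and their degrees. This is an illustrative sketch only: the graph representation, the tie-breaking, the random fallback when all neighbors have been visited, and the function name are all assumptions.

```python
import random


def high_degree_search(adj, start, target, max_steps=1000, rng=None):
    """Pass a query to one neighbour per step, preferring the unvisited
    neighbour with the highest degree. Returns the path taken, or None.

    adj: dict mapping node -> set of neighbours.
    """
    rng = rng or random.Random(0)
    visited = {start}   # plays the role of nodes "signing" the query
    path = [start]
    current = start
    for _ in range(max_steps):
        if current == target:
            return path
        if target in adj[current]:
            # The current node knows its own neighbours, so it can deliver.
            return path + [target]
        if not adj[current]:
            return None
        unvisited = [v for v in adj[current] if v not in visited]
        if unvisited:
            current = max(unvisited, key=lambda v: len(adj[v]))
        else:
            # Dead end: fall back to a random neighbour.
            current = rng.choice(list(adj[current]))
        visited.add(current)
        path.append(current)
    return None


if __name__ == "__main__":
    graph = {
        "s": {"a", "h"},
        "a": {"s", "h"},
        "h": {"s", "a", "b", "c", "t"},
        "b": {"h"},
        "c": {"h"},
        "t": {"h"},
    }
    print(high_degree_search(graph, "s", "t"))
```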
Generating functions
¤M.E.J. Newman, S.H. Strogatz, and D.J. Watts
¤‘Random graphs with arbitrary degree distributions and

their applications’, PRE, cond-mat/0007235
¤Generating functions for degree distributions

$G_0(x) = \sum_{k=0}^{\infty} p_k x^k$
¤Useful for computing moments of degree distribution,
¤component sizes, and average path lengths

Introducing cutoffs
$k_{max} < N - 1$: a node cannot have more connections than there are other nodes
This is important for exponents close to 2
For τ = 2, normalization $\sum_{k=1}^{\infty} p_k = \sum_{k=1}^{\infty} C k^{-\tau} = 1$ gives $C = 6/\pi^2$
$p(k > 1000, \tau = 2) = \sum_{k=1000}^{\infty} p_k \sim 0.001$
Probability that none of the nodes in a 1,000 node graph has 1000 or more
neighbors:
$(1 - p(k > 1000, \tau = 2))^{1000} \sim 0.36$
without a cutoff, for τ = 2
have > 50% chance of observing a node with more neighbors than there
are nodes
for τ = 2.1, have a 25% chance

Selecting from a variety of cutoffs
1. $k_{max} < N$
2. $p_k = C k^{-\tau} e^{-k/\kappa}$ (Newman et al.)
3. $p_k = C k^{-\tau}$ for $k < (CN)^{1/\tau}$, and 0 otherwise (Aiello et al.)
Generating function:
$G_0(x) = C \sum_{k=1}^{(CN)^{1/\tau}} k^{-\tau} x^k$
[Figure: 1 million websites (~ 1997); proportion of sites w/ so many links vs. # of sites linking to the site]
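Aiello’s ‘conservative’ vs. Havlin’s ‘natural’ cutoff
Aiello: cutoff where expected number of nodes of degree k is 1
$N p_k = 1$, i.e. $C k^{-\tau} = N^{-1}$, so $k_{max} \sim N^{1/\tau}$
Havlin: cutoff so that expected number of nodes of degree > k is 1
$N \sum_{k=k_{max}}^{\infty} p_k = 1$, i.e. $\int_{k_{max}}^{\infty} c k^{-\tau} dk \sim N^{-1}$, so $k_{max}^{1-\tau} \sim N^{-1}$ and $k_{max} \sim N^{1/(\tau-1)}$
[Figures: sketches of n(k) vs. k marking each cutoff]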
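To get a feel for how different the two cutoffs are, the scaling relations can be evaluated numerically. The sketch below drops the normalization constants, so the values are orders of magnitude rather than exact cutoffs; the function names are illustrative.

```python
def conservative_cutoff(n: float, tau: float) -> float:
    """Aiello-style cutoff: expected number of nodes of degree k_max is ~1,
    giving k_max ~ N ** (1 / tau) (constants dropped)."""
    return n ** (1.0 / tau)


def natural_cutoff(n: float, tau: float) -> float:
    """Havlin-style cutoff: expected number of nodes of degree > k_max is ~1,
    giving k_max ~ N ** (1 / (tau - 1)) (constants dropped)."""
    return n ** (1.0 / (tau - 1.0))


if __name__ == "__main__":
    for tau in (2.0, 2.1, 2.5, 3.0):
        print(tau, round(conservative_cutoff(10_000, tau), 1), round(natural_cutoff(10_000, tau), 1))
```

For τ = 2 the natural cutoff grows as fast as N itself, while the conservative one grows only as the square root of N.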
The imposed cutoff can have a dramatic
effect on the properties of the graph
degrees drawn at random, for τ = 2, and N = 1000
Generating functions for degree distributions
Random graphs with arbitrary degree distributions and their applications
by Newman, Strogatz & Watts
$G_0(x) = \sum_{k=0}^{\infty} p_k x^k$ is a generating function
$p_k \sim k^{-\tau}$ is the probability that a randomly chosen vertex has degree k
$\langle k \rangle = \sum_k k p_k = G_0'(1)$ is the expected degree of a randomly chosen vertex
$G_1(x) = G_0'(x) / G_0'(1)$ is the distribution of remaining outgoing edges following an edge
$z_2 = G_0'(1) G_1'(1)$ is the expected number of second degree neighbors,
assuming neighbors don’t share edges
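These quantities can be evaluated numerically for any finite degree distribution. The sketch below represents the distribution as a dict from degree k to probability p_k and computes the derivatives at x = 1 as sums; the helper names and the truncated power-law example are assumptions for illustration.

```python
def power_law_pk(tau: float, k_max: int) -> dict:
    """Normalized p_k ~ k ** (-tau) for k = 1 .. k_max."""
    weights = {k: k ** (-tau) for k in range(1, k_max + 1)}
    c = 1.0 / sum(weights.values())
    return {k: c * w for k, w in weights.items()}


def mean_degree(pk: dict) -> float:
    """G0'(1) = sum_k k p_k."""
    return sum(k * p for k, p in pk.items())


def mean_excess_degree(pk: dict) -> float:
    """G1'(1) = sum_k k (k - 1) p_k / G0'(1): expected number of remaining
    edges of a vertex reached by following an edge."""
    return sum(k * (k - 1) * p for k, p in pk.items()) / mean_degree(pk)


def second_neighbors(pk: dict) -> float:
    """z2 = G0'(1) * G1'(1), assuming neighbours do not share edges."""
    return mean_degree(pk) * mean_excess_degree(pk)


if __name__ == "__main__":
    pk = power_law_pk(2.1, 1000)
    print(mean_degree(pk), mean_excess_degree(pk), second_neighbors(pk))
```

For a power law with exponent close to 2, the excess degree is much larger than the mean degree, which is why following edges leads quickly to high-degree nodes.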
search with knowledge of first neighbors
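$G_0(x) = c \sum_{k=1}^{k_{max}} k^{-\tau} x^k$ (generating function with cutoff)
$G_0'(x) = \frac{\partial}{\partial x} G_0(x) = c \sum_{k=1}^{k_{max}} k^{1-\tau} x^{k-1}$
$G_0'(1) = \langle k \rangle = c \sum_{k=1}^{k_{max}} k^{1-\tau} \sim \int_1^{k_{max}} k^{1-\tau} dk = \frac{1}{\tau-2} \left(1 - k_{max}^{2-\tau}\right)$ (average degree of vertex)
$G_1'(x) = \frac{\partial}{\partial x} \frac{G_0'(x)}{G_0'(1)} = \frac{c}{G_0'(1)} \frac{\partial}{\partial x} \sum_{k=1}^{k_{max}} k^{1-\tau} x^{k-1} = \frac{c}{G_0'(1)} \sum_{k=2}^{k_{max}} k^{1-\tau} (k-1) x^{k-2}$ (average number of neighbors following an edge)
for 2<τ<3 and $k_{max} \sim N^a$: $G_0'(1)$ is constant in N, since $k_{max}^{2-\tau}$ decreases with N
$G_1'(1) = \frac{1}{G_0'(1)} \cdot \frac{k_{max}^{3-\tau} (\tau-2) - 2^{2-\tau} (\tau-1) + k_{max}^{2-\tau} (3-\tau)}{(\tau-2)(3-\tau)}$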
search with knowledge of first neighbors (cont’d)
$z_{1B} = G_1'(1) \sim \frac{1}{G_0'(1)} \frac{k_{max}^{3-\tau}}{3-\tau} = \frac{\tau-2}{1 - k_{max}^{2-\tau}} \cdot \frac{k_{max}^{3-\tau}}{3-\tau} \sim k_{max}^{3-\tau}$
In the limit τ → 2, $G_1'(1) \sim \frac{k_{max}}{\log(k_{max})}$
Let’s for the moment ignore the fact that as we do a random walk,
we encounter neighbors that we’ve seen before
$s$ = number of steps = $N / z_{1B}$
Search time with different cutoffs
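If $k_{max} = N$: $s(\tau) \sim \frac{N}{k_{max}^{3-\tau}} = \frac{N}{N^{3-\tau}} = N^{\tau-2}$, $2 < \tau < 3$
$s(2.1) \sim N^{0.1}$
$s \sim \frac{N \log(k_{max})}{k_{max}} = \log(N)$, $\tau = 2$
If $k_{max} = N^{1/(\tau-1)}$: $s(\tau) \sim \frac{N}{k_{max}^{3-\tau}} = \frac{N}{N^{(3-\tau)/(\tau-1)}} = N^{2(\tau-2)/(\tau-1)}$, $2 < \tau < 3$
$s(2.1) \sim N^{0.18}$
$s(2) \sim \frac{N \log(k_{max})}{k_{max}} = \log(N)$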
search with knowledge of first neighbors (cont’d)
If $k_{max} = N^{1/\tau}$: $s \sim \frac{N}{k_{max}^{3-\tau}} = \frac{N}{(N^{1/\tau})^{3-\tau}} = N^{2-3/\tau}$, $2 < \tau < 3$
So the best we can do is $N^{1/2}$ for exponents close to 2
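The three scaling results for the first-neighbor random walk all follow from s ~ N / k_max^(3-τ) with a different cutoff plugged in. A small sketch, with hypothetical cutoff labels, makes the comparison explicit:

```python
def random_walk_exponent(tau: float, cutoff: str) -> float:
    """Exponent x in s ~ N ** x for a random walk with knowledge of first
    neighbours, where s ~ N / k_max ** (3 - tau) and 2 < tau < 3.

    cutoff: "N"            -> k_max = N
            "natural"      -> k_max = N ** (1 / (tau - 1))
            "conservative" -> k_max = N ** (1 / tau)
    """
    cutoff_exponent = {
        "N": 1.0,
        "natural": 1.0 / (tau - 1.0),
        "conservative": 1.0 / tau,
    }[cutoff]
    return 1.0 - cutoff_exponent * (3.0 - tau)


if __name__ == "__main__":
    for name in ("N", "natural", "conservative"):
        print(name, round(random_walk_exponent(2.1, name), 3))
```

The more restrictive the cutoff, the fewer very high degree nodes exist and the larger the exponent becomes.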
2nd neighbor random walk, ignoring overlap:

$z_{2B} = \left[ \frac{\partial}{\partial x} G_1(G_1(x)) \right]_{x=1} = \left[ G_1'(1) \right]^2 = \left[ \frac{\tau-2}{1 - k_{max}^{2-\tau}} \cdot \frac{k_{max}^{3-\tau}}{3-\tau} \right]^2$
$n_s = z_{2B}(N)$
$S \sim \frac{N}{z_{2B}(N)}$
$S(N, \tau) \sim N^{3(1-2/\tau)}$
$S(N, \tau = 2.1) \sim N^{0.15}$
Following the degree sequence
Go to highest degree node, then next highest, … etc.
$z_{1D} = \int_{k_{max}-a}^{k_{max}} N k^{1-\tau} dk \sim N a k_{max}^{1-\tau}$
$a \sim s$ = # of steps taken
2nd neighbors, ignoring overlap:
$z_{1D} G_1'(1) \sim N a k_{max}^{2(2-\tau)}$
$s \sim k_{max}^{2(\tau-2)} \sim N^{2-4/\tau}$
$S_{deg}(N, \tau = 2.1) = N^{0.1}$

Ratio of the degree of a node to the expected degree of its highest
degree neighbor for 10,000 node power-law graphs of varying exponents
[Figure: axis labels "degree of neighbor - 1", "degree of node"; x-axis: degree of node, 0 to 100; curves for τ = 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 3.75]
Exponents τ close to 2 required to search effectively
Gnutella
World Wide Web, τ ~ 2.0-2.3,

high degree nodes: directories, search engine
Social networks, AT&T call graph τ ~ 2.1
Actor collaboration graph (imdb database), τ ~ 2.0-2.2
[Figure: log-log plot of number of actors/actresses vs. number of costars; actors, τ = 2; actresses, τ = 2.1]
Following the degree sequence
Complications
¤Should not visit same node more than once
¤Many neighbors of current node being

visited were also neighbors of previously visited
nodes, and there is a bias toward high degree
nodes being ‘seen’ over and over again
Status and degree of node visited
[Figure: degree of node vs. step; legend: not visited, visited, neighbors visited]
Progress of exploration in a 10,000 node graph knowing
2nd degree neighbors
seeking high degree nodes speeds up the search process
about 50% of a 10,000 node graph is explored in the first 12 steps
[Figure: proportion of nodes found at step (log-log) and cumulative nodes found at step, for random walk and degree sequence strategies]
Scaling of search time with size of graph
[Figure: log-log plot of cover time for half the nodes vs. size of graph; random walk, α = 0.37 fit; degree sequence, α = 0.24 fit]
Comparison with a Poisson graph
Poisson: $G_0(x) = e^{z(x-1)}$
$G_1(x) = G_0'(x) / z = G_0(x)$
expected degree and expected degree following a link are equal
scaling is linear
[Figure: degree of current node vs. step, for Poisson and power-law graphs]
[Figure: log-log plot of cover time for 1/2 of graph vs. number of nodes in graph; constant av. deg. = 3.4; γ = 1.0 fit]
Gnutella network
50% of the files in a 700 node network can be found in < 8 steps
[Figure: cumulative nodes found at step vs. step; high degree seeking 1st neighbors and high degree seeking 2nd neighbors]
Expander graphs
Time permitting
Def: Random k-Regular Graphs
¤We need to define two concepts

¤1) Define: Random k-Regular graph
¤ Assume each node has k spokes (half-edges)
¤ Randomly pair them up!
¤2) Define: Expansion

¤ Graph G(V, E) has expansion α:
if ∀ S ⊆ V: # edges leaving S ≥ α · min(|S|, |V\S|)
¤ Or equivalently:
$\alpha = \min_{S \subseteq V} \frac{\#\,\text{edges leaving } S}{\min(|S|, |V \setminus S|)}$
Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
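For very small graphs the definition can be checked by brute force over all subsets. The sketch below is exponential in the number of nodes and is meant only to make the definition concrete; the function name and edge-list representation are assumptions.

```python
from itertools import combinations


def expansion(nodes, edges):
    """Brute-force expansion of a small undirected graph.

    nodes: iterable of node labels; edges: list of (u, v) pairs.
    Only subsets with |S| <= |V| / 2 are enumerated, because a larger set
    has the same cut and the same min(|S|, |V \\ S|) as its complement.
    """
    nodes = list(nodes)
    best = float("inf")
    for size in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, size):
            s = set(subset)
            leaving = sum(1 for u, v in edges if (u in s) != (v in s))
            best = min(best, leaving / size)
    return best


if __name__ == "__main__":
    cycle = [(i, (i + 1) % 6) for i in range(6)]
    complete = list(combinations(range(4), 2))
    print(expansion(range(6), cycle), expansion(range(4), complete))
```

On the 6-cycle the worst set is three consecutive nodes (2 edges leaving, 3 nodes), while the complete graph on 4 nodes has no such bottleneck.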
Expansion: Intuition
S nodes: ≥ α·|S| edges leaving
S’ nodes: ≥ α·|S’| edges leaving
$\alpha = \min_{S \subseteq V} \frac{\#\,\text{edges leaving } S}{\min(|S|, |V \setminus S|)}$
(A big) graph with “good” expansion
Expansion: k-Regular Graphs
$\alpha = \min_{S \subseteq V} \frac{\#\,\text{edges leaving } S}{\min(|S|, |V \setminus S|)}$
¤ k-regular graph (every node has degree k):
¤ Expansion is at most k (when S is a single node)
¤ Is there a graph on n nodes (n→∞), of fixed max
deg. k, so that expansion α remains const?
Examples:
¤ n×n grid: k=4: α = 2n/(n^2/4) → 0
(S = n/2 × n/2 square in the center)
¤ Complete binary tree:
α → 0 for |S| = (n/2)-1
¤ Fact: For a random 3-regular graph on n nodes, there is some
const α (α > 0, independent of n) such that w.h.p.
the expansion of the graph is ≥ α (In fact, α = d/2 as d→∞)
Diameter of 3-Regular Rnd. Graph
¤ Fact: In a graph on n nodes with expansion
α, for all pairs of nodes s and t there is a path
of O((log n) / α) edges connecting them.
¤ Proof:
¤ Proof strategy:
¤ We want to show that from any node s there is a path of length
O((log n)/α) to any other node t
¤ Let S_j be a set of all nodes found within j steps of BFS from s.
¤ How does S_j increase as a function of j?
¤Proof (continued):
¤ We want to relate S_j and S_{j+1}
¤ Expansion: at least α|S_j| edges leave S_j. Each node has degree k, so
at most k edges “collide” at a node:
$|S_{j+1}| \ge |S_j| + \frac{\alpha |S_j|}{k}$
$|S_{j+1}| \ge |S_j| \left(1 + \frac{\alpha}{k}\right) = |S_0| \left(1 + \frac{\alpha}{k}\right)^{j+1}$, where $|S_0| = 1$
¤Proof (continued):
¤ In how many steps of BFS do we reach >n/2 nodes?
¤ Need j so that: $|S_j| = \left(1 + \frac{\alpha}{k}\right)^j \ge \frac{n}{2}$
¤ Let’s set: $j = \frac{k \log_2 n}{\alpha}$
¤ Then: $\left(1 + \frac{\alpha}{k}\right)^{\frac{k \log_2 n}{\alpha}} \ge 2^{\log_2 n} = n > \frac{n}{2}$
¤ In 2k/α·log n steps |S_j| grows to Θ(n).
So, the diameter of G is O(log(n)/α)
(In j steps we reach >n/2 nodes from s, and in j steps we reach >n/2 nodes from t ⇒ Diameter = 2·j)
Claim: $\left(1 + \frac{\alpha}{k}\right)^{\frac{k \log_2 n}{\alpha}} \ge 2^{\log_2 n}$
Remember n>0, α ≤ k then:
if α = k: $(1 + 1)^{\log_2 n} = 2^{\log_2 n}$
if α → 0 then k/α = x → ∞ and $\left(1 + \frac{1}{x}\right)^{x \log_2 n} = e^{\log_2 n} > 2^{\log_2 n}$, using $e = \lim_{x \to \infty} \left(1 + \frac{1}{x}\right)^x$
Summary
¤Small world phenomenon:

¤ Local structure (e.g. clustering)
¤ Short average shortest path
¤The Watts-Strogatz model captures both

¤Other models create navigable small-
world networks
¤Power-law networks are navigable due
to presence of hubs
