05 Smallworlds


Small world networks

CS 224W
Outline

¤ Small  world  phenomenon


¤ Milgram’s  small  world  experiment

¤ Local  structure
¤ clustering  coefficient
¤ motifs

¤ Small  world  network  models:


¤ Watts  &  Strogatz (clustering  &  short  paths)
¤ Kleinberg  (geographical)
¤ Kleinberg,  Watts/Dodds/Newman  (hierarchical)

¤ Small  world  networks:   why  do  they  arise?


Small  world  phenomenon:
Milgram’s  experiment

[Map: Nebraska (NE) to Massachusetts (MA)]
Milgram’s experiment

Instructions:
Given  a  target   individual   (stockbroker   in  Boston),   pass  the  
message   to  a  person   you  correspond   with  who  is  “closest” to  
the  target.

Outcome:

20%  of  initiated  chains  reached  target


average  chain  length  =  6.5

¤ “Six  degrees   of  separation”


Milgram’s experiment repeated

email  experiment  
Dodds,   Muhamad,   Watts,  
Science   301,   (2003)
(optional   reading)

•18  targets
•13  different   countries

•60,000+   participants
•24,163   message   chains  
•384  reached   their  targets
•average   path  length   4.0

Source: NASA, U.S. Government; http://visibleearth.nasa.gov/view_rec.php?id=2429


Interpreting Milgram’s experiment

n Is  6  is  a  surprising number?


n In  the  1960s?  Today?  Why?

n Pool  and  Kochen in  (1978  established  that  the  


average  person  has  between  500  and  1500  
acquaintances)
Quiz Q:

¤Ignore for the time being the fact that


many of your friends’ friends are your
friends as well. If everyone has 500
friends, the average person would have
how many friends of friends?
¤ 500
¤ 1,000
¤ 5,000
¤ 250,000
Quiz Q:

¤With an average degree of 500, a node


in a random network would have this
many friends-of-friends-of-friends (3rd
degree neighbors):
¤ 5,000
¤ 500,000
¤ 1,000,000
¤ 125,000,000
Interpreting Milgram’s experiment

n Is  6  is  a  surprising number?


n In  the  1960s?  Today?  Why?

n If  social  networks  were  random…   ?


n Pool  and  Kochen (1978)  -­ ~500-­1500  acquaintances/person
n ~  500  choices  1st link
n ~  5002   =  250,000  potential  2nd degree  neighbors
n ~  5003   =  125,000,000 potential  3rd degree  neighbors

n If  networks   are  completely   cliquish?


n all  my  friends’ friends  are  my  friends
n what  would  happen?
Quiz Q:

¤If the network were completely cliquish,


that is all of your friends of friends were
also directly your friends, what would be
true:
¤ (a) None of your friendship edges would be
part of a triangle (closed triad)
¤ (b) It would be impossible to reach any node
outside the clique by following directed
edges
¤ (c) Your shortest path to your friends’ friends
would be 2
complete cliquishness

¤ If all your friends of friends were also your


friends, you would be part of an isolated
clique.
Uncompleted chains and distance

n Is  6  an  accurate number?

n What  bias  is  introduced   by  uncompleted   chains?


n are  longer  or  shorter  chains  more  likely  to  be  completed?
Attrition

[Figure: probability of passing on message vs. position in chain; average with 95% confidence interval]

Source: An Experimental Study of Search in Global Social Networks: Peter Sheridan Dodds, Roby Muhamad, and
Duncan J. Watts (8 August 2003); Science 301 (5634), 827.
Quiz Q:

n if each intermediate person in the


chain has 0.5 probability of passing the
letter on, what is the likelihood of a
chain being completed
n of length 2?
n of length 5?

[Figure: chain of length 2; the sender sends for sure, the intermediary passes on with probability 0.5, the target receives]


Quiz Q:

n if each intermediate person in the


chain has 0.5 probability of passing the
letter on, what is the likelihood of a
chain of length 5 being completed
¤ (a) ½
¤ (b) ¼
¤ (c) 1/8
¤ (d) 1/16
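The attrition arithmetic can be sketched as follows. The assumption, taken from the "chain of length 2" illustration, is that a chain of length L has L - 1 intermediaries and that each passes the letter on independently; `completion_probability` is a hypothetical helper name.

```python
def completion_probability(chain_length: int, pass_prob: float = 0.5) -> float:
    """Probability that a chain of the given length is completed.

    Assumption: the sender always sends, and each of the
    chain_length - 1 intermediaries passes the letter on
    independently with probability pass_prob.
    """
    if chain_length < 1:
        raise ValueError("chain_length must be at least 1")
    return pass_prob ** (chain_length - 1)


for length in (2, 5):
    print(length, completion_probability(length))
```

Because the probability shrinks geometrically with length, long chains are under-represented among completed chains.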
Estimating the true distance

[Figure: observed chain lengths and the 'recovered' histogram of path lengths, for inter-country and intra-country chains]

Source: An Experimental Study of Search in Global Social Networks: Peter Sheridan Dodds, Roby Muhamad, and
Duncan J. Watts (8 August 2003); Science 301 (5634), 827.
Navigation and accuracy

¤Is  6  an  accurate number?

¤Do  people  find  the  shortest paths?


¤ Killworth, McCarty, Bernard, & House (2005):
  ¤ a less than optimal choice for the next link in the chain is made ½ of the time
Small worlds & networking

What  does  it  mean  to  be  1,  2,  3  hops  apart  on  
Facebook,  Twitter,  LinkedIn,  Google  Plus?
Transitivity, triadic closure, clustering
¤Transitivity:
¤ if A is connected to B and B is connected to C
what is the probability that A is connected to C?

¤ my friends’ friends are likely to be my friends

[Figure: triad with edges A-B and B-C; the A-C edge is marked "?"]
Clustering

¤ Global clustering coefficient

$C = \dfrac{3 \times \text{number of triangles in the graph}}{\text{number of connected triples of vertices}}$
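A minimal sketch of the global clustering coefficient, assuming the graph is stored as a dictionary that maps each node to the set of its neighbors (undirected, no self-edges). The representation and the name `global_clustering` are choices made here for illustration.

```python
from itertools import combinations


def global_clustering(adj):
    """adj: dict mapping node -> set of neighbours (undirected graph)."""
    triples = 0  # connected triples, counted once per centre node
    closed = 0   # triples whose two end nodes are also linked
    for node, nbrs in adj.items():
        for u, v in combinations(nbrs, 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    # Every triangle closes three triples (one per corner), so
    # `closed` equals 3 x the number of triangles.
    return closed / triples if triples else 0.0


# Triangle a-b-c with one extra node d attached to c
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(global_clustering(graph))
```

The example graph has one triangle and five connected triples, so the ratio is 3/5.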
Local clustering coefficient (Watts&Strogatz 1998)

¤ For a vertex i
  ¤ The fraction of pairs of neighbors of the node that are themselves connected
  ¤ Let $n_i$ be the number of neighbors of vertex i

$C_i = \dfrac{\text{\# of connections between } i\text{'s neighbors}}{\text{max \# of possible connections between } i\text{'s neighbors}}$

$C_i^{\text{directed}} = \dfrac{\text{\# directed connections between } i\text{'s neighbors}}{n_i (n_i - 1)}$

$C_i^{\text{undirected}} = \dfrac{\text{\# undirected connections between } i\text{'s neighbors}}{n_i (n_i - 1)/2}$
Local clustering coefficient (Watts&Strogatz 1998)

¤ Average over all n vertices

$C = \dfrac{1}{n} \sum_i C_i$

Example (figure: vertex i with four neighbors; legend: link present, link absent):
$n_i = 4$
max number of connections: 4*3/2 = 6
3 connections present
$C_i = 3/6 = 0.5$
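The worked example can be checked with a short sketch of the undirected local clustering coefficient. The graph is again assumed to be a dictionary of neighbor sets; returning 0 for vertices with fewer than two neighbors is a convention chosen here, not something the slides specify.

```python
from itertools import combinations


def local_clustering(adj, i):
    """Fraction of pairs of i's neighbours that are themselves linked."""
    nbrs = adj[i]
    n_i = len(nbrs)
    if n_i < 2:
        return 0.0  # assumed convention for the undefined case
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (n_i * (n_i - 1) / 2)


def average_clustering(adj):
    """Average of the local coefficients over all n vertices."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


# Vertex "i" has 4 neighbours with 3 links among them: a-b, b-c, c-d
example = {
    "i": {"a", "b", "c", "d"},
    "a": {"i", "b"},
    "b": {"i", "a", "c"},
    "c": {"i", "b", "d"},
    "d": {"i", "c"},
}
print(local_clustering(example, "i"))
```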
Quiz Q:

¤The clustering coefficient for vertex i is:

(a)0
(b)1/3
(c)1/2
(d)2/3
i
Explanation

¤ni = 3
¤there are 2 connections present out of
max of 3 possible
¤Ci = 2/3

i
beyond social networks

Small  world  phenomenon:

high clustering: $C_{\text{network}} \gg C_{\text{random graph}}$

low average shortest path: $l_{\text{network}} \approx \ln(N)$

what other networks can you think of


with these characteristics?
Comparison with “random graph” used to determine
whether real-world network is “small world”
Network                  Size       Av. shortest path  Shortest path in fitted random graph  Clustering (averaged over vertices)  Clustering in random graph
Film actors              225,226    3.65               2.99                                  0.79                                 0.00027
MEDLINE co-authorship    1,520,251  4.6                4.91                                  0.56                                 1.8 x 10^-4
E. coli substrate graph  282        2.9                3.04                                  0.32                                 0.026
C. elegans               282        2.65               2.25                                  0.28                                 0.05
Small world phenomenon:
Watts/Strogatz model
Reconciling  two  observations:
• High  clustering: my  friends’ friends  tend  to  be  my  friends
• Short  average  paths

Source: Watts, D.J., Strogatz, S.H.(1998) Collective dynamics of 'small-world' networks. Nature 393:440-442.
Watts-­Strogatz model:
Generating small world graphs
Two variants:
¤ Select a fraction p of edges and reposition one of their endpoints
¤ Add a fraction p of additional edges, leaving the underlying lattice intact

¤ As in many network generating algorithms
  ¤ Disallow self-edges
  ¤ Disallow multiple edges

Source: Watts, D.J., Strogatz, S.H.(1998) Collective dynamics of 'small-world' networks. Nature 393:440-442.
Watts-­Strogatz model:
Generating small world graphs
¤ Each node has K >= 4 nearest neighbors (local)
¤ Tunable: vary the probability p of rewiring any given edge
¤ Small p: regular lattice
¤ Large p: classical random graph
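The rewiring variant can be sketched in plain Python as below. This is an illustrative implementation under stated assumptions, not the code behind the NetLogo demo: K is taken to be even, edges are stored as sorted pairs, the lower-numbered endpoint of a rewired edge is the one kept, and the names `ring_lattice` and `watts_strogatz` are made up here.

```python
import random


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours (k even)."""
    edges = set()
    for u in range(n):
        for offset in range(1, k // 2 + 1):
            v = (u + offset) % n
            edges.add((min(u, v), max(u, v)))
    return edges


def watts_strogatz(n, k, p, seed=None):
    """Rewire each lattice edge with probability p.

    Self-edges and multiple edges are disallowed.
    """
    rng = random.Random(seed)
    edges = ring_lattice(n, k)
    for edge in sorted(edges):  # snapshot of the original lattice edges
        if rng.random() >= p:
            continue
        u = edge[0]  # keep this endpoint, reposition the other one
        candidates = [
            w for w in range(n)
            if w != u and (min(u, w), max(u, w)) not in edges
        ]
        if not candidates:
            continue
        w = rng.choice(candidates)
        edges.remove(edge)
        edges.add((min(u, w), max(u, w)))
    return edges


print(len(watts_strogatz(20, 4, 0.1, seed=42)))
```

With p = 0 the function returns the regular lattice unchanged; as p approaches 1 nearly every edge has one endpoint moved to a random node.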
Quiz question:

¤ Which of the following is a result of a


higher rewiring probability?

(a) Left (b) Right (c) insufficient information


What happens in between?

¤Small  shortest path means low clustering?


¤Large shortest path means high clustering?
¤Through numerical simulation
¤ As  we increase p  from 0  to 1
¤ Fast decrease of  mean   distance
¤ Slow decrease in  clustering
Clust coeff. and ASP as rewiring increases

1%  of  links  rewired 10%  of  links  rewired

Source: Watts, D.J., Strogatz, S.H.(1998) Collective dynamics of 'small-world' networks. Nature 393:440-442.
Trying this with NetLogo
http://web.stanford.edu/class/cs224w/NetLogo/SmallWorldWS.nlogo
WS model clustering coefficient

¤ The probability that a connected triple stays connected after rewiring
  ¤ probability that none of the 3 edges were rewired: $(1-p)^3$
  ¤ probability that edges were rewired back to each other is very small and can be ignored

¤ Clustering coefficient: $C(p) = C(p=0)\,(1-p)^3$


[Figure: $C(p)/C(0)$ plotted against p, for p from 0 to 1]


Source: Watts, D.J., Strogatz, S.H.(1998) Collective dynamics of 'small-world' networks. Nature 393:440-442.
Quiz Q

n Which of  the following is a  description


matching a  small-­world network?

(a)Its average shortest path is close to that of an


Erdos-Renyi graph
(b)It has many closed triads
(c)It has a high clustering coefficient
(d)It has a short average path length
WS Model: What’s missing?

n Long  range links  not as  likely as  short  


range ones
n Hierarchical structure /  groups
n Hubs
Ties and geography
“The  geographic  movement  of  the  [message]  from  Nebraska  to    
Massachusetts  is  striking.  There  is  a  progressive  closing  in  on  the  
target    area  as  each  new  person  is  added  to  the  chain”

S.Milgram  ‘The  small  world  problem’,  Psychology  Today  1,61,1967

[Map: Nebraska (NE) to Massachusetts (MA)]
Kleinberg’s  geographical  small  world  model

nodes are placed on a lattice and connect to nearest neighbors

additional links placed with
$p(\text{link between } u \text{ and } v) = (\text{distance}(u,v))^{-r}$
where r is the exponent that will determine navigability

Source: Kleinberg, 'The Small World Phenomenon, An Algorithmic Perspective' (Nature 2000).
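A rough sketch of the model and of greedy routing on it, under several assumptions that are not from the slides: an n x n grid without wraparound, Manhattan distance, exactly one directed long-range link per node drawn with weight distance^(-r), and the hypothetical function names below. It is meant for small grids only, since building the graph takes time quadratic in the number of nodes.

```python
import random


def lattice_distance(u, v):
    """Manhattan distance between two grid positions."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def build_kleinberg_grid(n, r, seed=None):
    """n x n grid; each node gets its lattice neighbours plus one
    long-range link chosen with probability proportional to d^(-r)."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(n) for y in range(n)]
    links = {}
    for u in nodes:
        x, y = u
        nbrs = {
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < n and 0 <= y + dy < n
        }
        others = [v for v in nodes if v != u]
        weights = [lattice_distance(u, v) ** (-r) for v in others]
        nbrs.add(rng.choices(others, weights=weights, k=1)[0])
        links[u] = nbrs
    return links


def greedy_route(links, source, target):
    """Always forward to the known contact closest to the target."""
    current, steps = source, 0
    while current != target:
        current = min(links[current], key=lambda v: lattice_distance(v, target))
        steps += 1
    return steps


grid = build_kleinberg_grid(10, r=2, seed=0)
print(greedy_route(grid, (0, 0), (9, 9)))
```

Greedy routing always terminates here because some lattice neighbor is strictly closer to the target at every step; the long-range links can only shorten the route.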
NetLogo demo
¤ How does the probability of long-range links affect search?

http://web.stanford.edu/class/cs224w/NetLogo/SmallWorldSearch.nlogo
geographical  search  when  network  lacks  locality
When r=0, links are randomly distributed, ASP ~ log(n), n size of grid
When r=0, the expected delivery time of any decentralized algorithm is at least $\alpha_0 n^{2/3}$

When r<2, expected time is at least $\alpha_r n^{(2-r)/3}$
$p \sim p_0$
Overly localized links on a lattice
When r>2, expected search time ~ $N^{(r-2)/(r-1)}$

$p \sim \dfrac{1}{d^4}$
Just the right balance
When r=2, expected time of a DA (decentralized algorithm) is at most $C (\log N)^2$

$p \sim \dfrac{1}{d^2}$
Navigability
$\lambda^2 |R| < |R'| < \lambda |R|$
[Figure: nested regions R and R']

$k = c \log_2 n$; calculate the probability that s fails to have a link in R'
Quiz Q:

¤ What is true about a network where the probability of a tie falls off as $\text{distance}^{-2}$

(a)Large networks cannot be navigated


(b)A simple greedy strategy (pass the message to the
neighbor who is closest to the target) is sufficient
(c)There are fewer long range ties than short range
ones
(d)If the number of nodes doubles, the average
shortest path will be twice as long
Origins of small worlds:
group affiliations
hierarchical small-world models: Kleinberg
Hierarchical network models:

Individuals classified into a hierarchy (figure: tree of height h with branching factor b=3),
$h_{ij}$ = height of the least common ancestor.

$p_{ij} \sim b^{-\alpha h_{ij}}$

e.g. state-county-city-neighborhood
industry-corporation-division-group

Group structure models:

Individuals belong to nested groups
q = size of smallest group that v, w belong to

$f(q) \sim q^{-\alpha}$

Source: Kleinberg, 'Small-World Phenomena and the Dynamics of Information', NIPS 14, 2001.
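The hierarchical link probability can be illustrated with a small sketch. It assumes, for illustration only, that individuals are the leaves of a complete b-ary tree numbered 0, 1, 2, ... from left to right; the weight returned is unnormalized, and both function names are hypothetical.

```python
def lca_height(i, j, b):
    """Height of the least common ancestor of leaves i and j
    in a complete b-ary tree whose leaves are numbered left to right."""
    h = 0
    while i != j:
        i //= b  # move both leaves up one level
        j //= b
        h += 1
    return h


def link_weight(i, j, b, alpha):
    """Unnormalized link probability b^(-alpha * h_ij)."""
    return b ** (-alpha * lca_height(i, j, b))


# With b = 3, leaves 0 and 1 share a parent; leaves 0 and 3 share a grandparent
print(lca_height(0, 1, 3), lca_height(0, 3, 3))
```

To turn the weights into probabilities for a given individual, divide each weight by the sum over all other individuals.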
hierarchical small-world models: WDN
Watts, Dodds, Newman (Science, 2002)
individuals belong to hierarchically nested groups

$p_{ij} \sim \exp(-\alpha x)$

multiple independent hierarchies h=1,2,..,H coexist, corresponding to occupation, geography, hobbies, religion...
Source: Identity and Search in Social Networks: Duncan J. Watts, Peter Sheridan Dodds, and M. E. J. Newman; Science 17 May 2002 296: 1302-1305. <http://arxiv.org/abs/cond-mat/0205383v1>
Navigability and search strategy:
Reverse small world experiment

¤ Killworth &  Bernard  (1978):


¤ Given  hypothetical  targets  (name,  occupation,  location,  hobbies,  
religion…)  participants  choose  an  acquaintance  for  each  target
¤ based  on  (most  often)    occupation,  geography
¤ only  7%  because  they  “know  a  lot  of  people”
¤ Simple  greedy  algorithm:  most  similar  acquaintance
¤ two-­step  strategy  rare

Source: 1978 Peter D. Killworth and H. Russell Bernard. The Reverse Small World Experiment Social Networks 1:159–92.
Navigability and search strategy:
Small world experiment @ Columbia
Successful chains disproportionately used
• weak ties (Granovetter)
• professional ties (34% vs. 13%)
• ties originating at work/college
• target's work (65% vs. 40%)

. . . and disproportionately avoided


• hubs (8% vs. 1%) (+ no evidence of
funnels)
• family/friendship ties (60% vs. 83%)

Strategy: Geography -> Work


Search in power-law networks

Motivation
Power-law (PL) networks, social and P2P

Analysis of scaling of search strategies in PL networks

Simulation
artificial power-law topologies, real Gnutella networks

2
How do we search?

Who could introduce me to Richard Gere?
[Figure: acquaintances Mary, Bob, and Jane]
AT&T Call Graph

[Figure: # of telephone numbers from which calls were made vs. # of telephone numbers called]
Aiello et al. STOC '00


Gnutella network

power-law link distribution, τ = 2.07
[Figure: proportion of nodes vs. number of neighbors (log-log axes); data and power-law fit]
summer 2000, data provided by Clip2
Preferential attachment model

Nodes join at different times

The more connections a node has, the more likely it is to acquire


new connections

Growth process produces power-law network
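The growth process can be sketched as follows. The details are assumptions made for illustration rather than the slides' specification: the network starts from a fully connected core of m + 1 nodes, each new node attaches to m distinct existing nodes, and degree-proportional sampling is done by drawing uniformly from a list of edge endpoints.

```python
import random


def preferential_attachment(n, m, seed=None):
    """Grow a graph to n nodes; each newcomer links to m existing nodes
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Assumed starting point: a complete graph on m + 1 nodes
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    # A node appears in this list once per incident edge, so a uniform
    # draw from it is a degree-proportional draw over nodes.
    endpoints = [x for edge in edges for x in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = preferential_attachment(1000, 2, seed=7)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(max(degree.values()), min(degree.values()))
```

Early nodes tend to accumulate far more links than late arrivals, which is what produces the heavy-tailed degree distribution.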

Gnutella and the bandwidth barrier

file sharing w/o a central index

queries broadcast to every node within


radius ttl
⇒ as network grows, encounter a bandwidth
barrier (dial up modems cannot keep up with
query traffic, fragmenting the network)
Clip 2 report
Gnutella: To the Bandwidth Barrier and Beyond
http://www.clip2.com/gnutella.html#q17
[Figure: power-law graph, number of nodes found; labels 94, 67, 63, 54, 6, 2, 1]
[Figure: Poisson graph, number of nodes found; labels 93, 19, 15, 11, 7, 3, 1]
Search with knowledge of 2nd neighbors
Outline of search strategy

pass query onto only one neighbor at each step

OPTIONS

requires that nodes sign query
- avoid passing message onto a node twice

requires knowledge of one's neighbors' degree
- pass to the highest degree node

requires knowledge of one's neighbors' neighbors
- route to 2nd degree neighbors
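The first two options can be combined into a short high-degree-seeking walk. This is a sketch under assumptions chosen here: the graph is a dictionary of neighbor sets, every visited node can answer for itself and its direct neighbors (knowledge of first neighbors), and the walk stops when the current node has no unvisited neighbor. The function name is hypothetical.

```python
def high_degree_search(adj, start, max_steps):
    """Pass the query to one neighbour per step, always the
    highest-degree neighbour that has not been visited yet."""
    visited = [start]
    found = {start} | adj[start]      # nodes whose contents are known so far
    found_per_step = [len(found)]
    current = start
    for _ in range(max_steps):
        unvisited = [v for v in adj[current] if v not in visited]
        if not unvisited:
            break                      # dead end under the no-revisit rule
        current = max(unvisited, key=lambda v: len(adj[v]))
        visited.append(current)
        found |= adj[current]
        found_per_step.append(len(found))
    return visited, found_per_step


# Star graph: hub 0 with five leaves; the search starts at leaf 1
star = {0: {1, 2, 3, 4, 5}}
for leaf in range(1, 6):
    star[leaf] = {0}
print(high_degree_search(star, start=1, max_steps=1))
```

Starting from a leaf, a single step reaches the hub, after which every node in the star has been found.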
Generating functions

¤ M.E.J. Newman, S.H. Strogatz, and D.J. Watts, 'Random graphs with arbitrary degree distributions and their applications', PRE, cond-mat/0007235

¤ Generating functions for degree distributions

$G_0(x) = \sum_{k=0}^{\infty} p_k x^k$

¤ Useful for computing moments of degree distribution, component sizes, and average path lengths


Introducing cutoffs

$k_{max} < N - 1$: a node cannot have more connections than there are other nodes
This is important for exponents close to 2

$\sum_{k=1}^{\infty} p_k = \sum_{k=1}^{\infty} C k^{-\tau} = 1$, so for $\tau = 2$: $\dfrac{1}{C} = \dfrac{\pi^2}{6}$

$p(k > 1000, \tau = 2) = \sum_{k=1000}^{\infty} p_k \sim 0.001$
Probability that none of the nodes in a 1,000 node graph has 1000 or more neighbors:
$(1 - p(k > 1000, \tau = 2))^{1000} \sim 0.36$
without a cutoff, for $\tau = 2$, have > 50% chance of observing a node with more neighbors than there are nodes

for $\tau = 2.1$, have a 25% chance


Selecting from a variety of cutoffs

1. $k_{max} < N$

2. $p_k = C k^{-\tau} e^{-k/\kappa}$ (Newman et al.)

3. $p_k = \begin{cases} C k^{-\tau} & k < (CN)^{1/\tau} \\ 0 & \text{otherwise} \end{cases}$ (Aiello et al.)

Generating function: $G_0(x) = C \sum_{k=1}^{(CN)^{1/\tau}} k^{-\tau} x^k$

[Figure: 1 million websites (~1997); proportion of sites w/ so many links vs. # of sites linking to the site]
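Aiello's 'conservative' vs. Havlin's 'natural' cutoff

Aiello: cutoff where the expected number of nodes of degree k is 1
$N p_k = 1$
$C k^{-\tau} = N^{-1}$
$k \sim N^{1/\tau}$

Havlin: cutoff so that the expected number of nodes of degree > k is 1
$N \sum_{k=k_{max}}^{\infty} p_k = 1$
$\int_{k_{max}}^{\infty} c k^{-\tau}\,dk \sim N^{-1}$
$k_{max}^{1-\tau} \sim N^{-1}$
$k_{max} \sim N^{1/(\tau-1)}$

[Figure: sketches of n(k) vs. k for each cutoff]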
The imposed cutoff can have a dramatic
effect on the properties of the graph
degrees drawn at random, for τ = 2, and N = 1000
Generating functions for degree distributions
Random graphs with arbitrary degree distributions and their applications
by Newman, Strogatz & Watts

$G_0(x) = \sum_{k=0}^{\infty} p_k x^k$ is a generating function

$p_k \sim k^{-\tau}$ is the probability that a randomly chosen vertex has degree k

$\langle k \rangle = \sum_k k p_k = G_0'(1)$ is the expected degree of a randomly chosen vertex

$G_1(x) = \dfrac{G_0'(x)}{G_0'(1)}$ is the distribution of remaining outgoing edges following an edge

$z_2 = G_0'(1)\,G_1'(1)$ is the expected number of second degree neighbors, assuming neighbors don't share edges
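These quantities can be evaluated numerically for any finite degree distribution. The sketch below assumes the distribution is given as a dictionary mapping degree k to probability p_k; the helper names and the truncated power-law constructor are illustrative additions, not part of the cited paper.

```python
def mean_degree(p):
    """<k> = G_0'(1) = sum_k k * p_k."""
    return sum(k * pk for k, pk in p.items())


def mean_excess_degree(p):
    """G_1'(1) = sum_k k (k - 1) p_k / <k>: the expected number of
    remaining edges at a node reached by following an edge."""
    return sum(k * (k - 1) * pk for k, pk in p.items()) / mean_degree(p)


def second_neighbors(p):
    """z_2 = G_0'(1) * G_1'(1), assuming neighbours don't share edges."""
    return mean_degree(p) * mean_excess_degree(p)


def truncated_power_law(tau, k_max):
    """p_k proportional to k^(-tau) for 1 <= k <= k_max, normalized to sum to 1."""
    weights = {k: k ** (-tau) for k in range(1, k_max + 1)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


p = truncated_power_law(tau=2.1, k_max=1000)
print(mean_degree(p), mean_excess_degree(p), second_neighbors(p))
```

For a heavy-tailed distribution the excess degree is much larger than the mean degree, which is why following edges leads quickly to hubs.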
search with knowledge of first neighbors
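Generating function with cutoff:
$G_0(x) = c \sum_{k=1}^{k_{max}} k^{-\tau} x^k$

$G_0'(x) = \dfrac{\partial}{\partial x} G_0(x) = c \sum_{k=1}^{k_{max}} k^{1-\tau} x^{k-1}$

Average degree of vertex:
$G_0'(1) = \langle k \rangle = c \sum_{k=1}^{k_{max}} k^{1-\tau} \sim \int_1^{k_{max}} k^{1-\tau}\,dk = \dfrac{1}{\tau-2}\left(1 - k_{max}^{2-\tau}\right)$

Average number of neighbors following an edge:
$G_1'(x) = \dfrac{\partial}{\partial x} \dfrac{G_0'(x)}{G_0'(1)} = \dfrac{c}{G_0'(1)} \dfrac{\partial}{\partial x} \sum_{k=1}^{k_{max}} k^{1-\tau} x^{k-1} = \dfrac{c}{G_0'(1)} \sum_{k=2}^{k_{max}} k^{1-\tau} (k-1) x^{k-2}$

$G_1'(1) = \dfrac{1}{G_0'(1)} \cdot \dfrac{k_{max}^{3-\tau}(\tau-2) - 2^{2-\tau}(\tau-1) + k_{max}^{2-\tau}(3-\tau)}{(\tau-2)(3-\tau)}$

for $2 < \tau < 3$ and $k_{max} \sim N^a$: the term $2^{2-\tau}(\tau-1)$ is constant in N, and the term $k_{max}^{2-\tau}(3-\tau)$ decreases with N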
search with knowledge of first neighbors (cont’d)

$z_{1B} = G_1'(1) \sim \dfrac{1}{G_0'(1)} \cdot \dfrac{k_{max}^{3-\tau}}{3-\tau} = \dfrac{\tau-2}{1 - k_{max}^{2-\tau}} \cdot \dfrac{k_{max}^{3-\tau}}{3-\tau} \sim k_{max}^{3-\tau}$

In the limit $\tau \to 2$: $G_1'(1) \sim \dfrac{k_{max}}{\log(k_{max})}$

Let's for the moment ignore the fact that as we do a random walk, we encounter neighbors that we've seen before

$s = \text{number of steps} = \dfrac{N}{z_{1B}}$
Search time with different cutoffs
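If $k_{max} = N$:
$s(\tau) \sim \dfrac{N}{k_{max}^{3-\tau}} = \dfrac{N}{N^{3-\tau}} = N^{\tau-2}$, $2 < \tau < 3$
$s(2.1) \sim N^{0.1}$
$s \sim \dfrac{N \log(k_{max})}{k_{max}} = \log(N)$, $\tau = 2$

If $k_{max} = N^{1/(\tau-1)}$:
$s(\tau) \sim \dfrac{N}{k_{max}^{3-\tau}} = \dfrac{N}{N^{(3-\tau)/(\tau-1)}} = N^{2(\tau-2)/(\tau-1)}$, $2 < \tau < 3$
$s(2.1) \sim N^{0.18}$
$s(2) \sim \dfrac{N \log(k_{max})}{k_{max}} = \log(N)$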
search with knowledge of first neighbors (cont’d)

If $k_{max} = N^{1/\tau}$:
$s \sim \dfrac{N}{k_{max}^{3-\tau}} = \dfrac{N}{(N^{1/\tau})^{3-\tau}} = N^{2-3/\tau}$, $2 < \tau < 3$

So the best we can do is $\sqrt{N}$ for exponents close to 2

2nd neighbor random walk, ignoring overlap:

$z_{2B} = \left[\dfrac{\partial}{\partial x} G_1(G_1(x))\right]_{x=1} = \left[G_1'(1)\right]^2 = \left[\dfrac{\tau-2}{1 - k_{max}^{2-\tau}} \cdot \dfrac{k_{max}^{3-\tau}}{3-\tau}\right]^2$

$n_s = z_{2B}(N)$

$S \sim \dfrac{N}{z_{2B}(N)}$

$S(N, \tau) \sim N^{3(1-2/\tau)}$
$S(N, \tau = 2.1) \sim N^{0.15}$
Following the degree sequence

Go to highest degree node, then next highest, ... etc.

$z_{1D} = \int_{k_{max}-a}^{k_{max}} N k^{1-\tau}\,dk \sim N a\,k_{max}^{1-\tau}$

$a \sim s$ = # of steps taken

2nd neighbors, ignoring overlap:

$z_{1D}\,G_1'(1) \sim N a\,k_{max}^{2(2-\tau)}$

$s \sim k_{max}^{2(\tau-2)} \sim N^{2-4/\tau}$

$S_{deg}(N, \tau = 2.1) = N^{0.1}$
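The scaling exponents derived on these slides can be tabulated for a given τ. The sketch below simply evaluates the closed-form exponents for 2 < τ < 3; the dictionary keys are labels invented here, and the 2nd-neighbor results assume the cutoff $k_{max} = N^{1/\tau}$, as in the derivation.

```python
def search_exponents(tau):
    """Exponent e such that the number of search steps scales as N**e,
    for each strategy and cutoff discussed above (valid for 2 < tau < 3)."""
    if not 2 < tau < 3:
        raise ValueError("formulas assume 2 < tau < 3")
    return {
        "random walk, kmax = N": tau - 2,
        "random walk, kmax = N^(1/(tau-1))": 2 * (tau - 2) / (tau - 1),
        "random walk, kmax = N^(1/tau)": 2 - 3 / tau,
        "random walk, 2nd neighbors": 3 * (1 - 2 / tau),
        "degree sequence, 2nd neighbors": 2 - 4 / tau,
    }


for strategy, exponent in search_exponents(2.1).items():
    print(f"{strategy}: N^{exponent:.3f}")
```

For τ = 2.1 the unrounded values for the two 2nd-neighbor strategies are about 0.143 and 0.095, which the slides quote as 0.15 and 0.1.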


Ratio of the degree of a node to the expected degree of its highest
degree neighbor for 10,000 node power-law graphs of varying exponents

[Figure: y-axis labels "degree of neighbor - 1" and "degree of node"; x-axis "degree of node" (0 to 100); one curve per exponent τ = 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 3.75]
Exponents τ close to 2 required to search effectively

Gnutella

World Wide Web, τ ~ 2.0-2.3, high degree nodes: directories, search engines
Social networks, AT&T call graph τ ~ 2.1

Actor collaboration graph (imdb database), τ ~ 2.0-2.2
[Figure: number of actors/actresses vs. number of costars (log-log axes); actors, τ = 2; actresses, τ = 2.1]
Following the degree sequence

[Figure: example graph with node labels 18, 17, 6, 10, 5, 1, 9, 8, 50]
Complications
¤Should not visit same node more than once

¤Many neighbors of current node being


visited were also neighbors of previously visited
nodes, and there is a bias toward high degree
nodes being ‘seen’ over and over again
Status and degree of node visited

[Figure: degree of node vs. step (0 to 600); legend: not visited, visited, neighbors visited]
Progress of exploration in a 10,000 node graph knowing
2nd degree neighbors
[Figure, first panel: proportion of nodes found at step vs. step (log-log axes); curves: random walk, degree sequence]
seeking high degree nodes speeds up the search process

[Figure, second panel: cumulative nodes found at step vs. step (0 to 100); curves: random walk, degree sequence]
about 50% of a 10,000 node graph is explored in the first 12 steps
Scaling of search time with size of graph
[Figure: cover time for half the nodes vs. size of graph (log-log axes); random walk, α = 0.37 fit; degree sequence, α = 0.24 fit]
Comparison with a Poisson graph

Poisson:
$G_0(x) = e^{z(x-1)}$
$G_1(x) = \dfrac{G_0'(x)}{z} = G_0(x)$

expected degree and expected degree following a link are equal

[Figure: degree of current node vs. step (log-log axes); curves: Poisson, power-law]

scaling is linear

[Figure: cover time for 1/2 of graph vs. number of nodes in graph (log-log axes); constant av. deg. = 3.4; γ = 1.0 fit]
Gnutella network

50% of the files in a 700 node network can be found in < 8 steps
[Figure: cumulative nodes found at step vs. step (0 to 100); curves: high degree seeking 1st neighbors, high degree seeking 2nd neighbors]
Expander graphs
Time permitting
Def: Random k-Regular Graphs

¤ We need to define two concepts

¤ 1) Define: Random k-Regular graph
  ¤ Assume each node has k spokes (half-edges)
  ¤ Randomly pair them up!

¤ 2) Define: Expansion
  ¤ Graph G(V, E) has expansion α if $\forall S \subseteq V$: # edges leaving $S \ge \alpha \cdot \min(|S|, |V \setminus S|)$
  ¤ Or equivalently: $\alpha = \min_{S \subseteq V} \dfrac{\#\,\text{edges leaving } S}{\min(|S|, |V \setminus S|)}$

[Figure: node set S and its complement V \ S]
Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
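For very small graphs the expansion can be computed by brute force straight from the definition. This sketch enumerates every subset S with |S| <= n/2, which is enough because a set and its complement have the same edges leaving them; it is exponential in n and intended only as an illustration. The graph representation (dictionary of neighbor sets) and the function name are assumptions made here.

```python
from itertools import combinations


def expansion(adj):
    """Minimum over non-empty S of (# edges leaving S) / min(|S|, |V \\ S|).

    adj: dict mapping node -> set of neighbours (undirected, n >= 2).
    """
    nodes = list(adj)
    n = len(nodes)
    best = float("inf")
    for size in range(1, n // 2 + 1):
        for subset in combinations(nodes, size):
            s = set(subset)
            leaving = sum(1 for u in s for v in adj[u] if v not in s)
            best = min(best, leaving / min(len(s), n - len(s)))
    return best


# 4-cycle 0-1-2-3-0: cutting it into two adjacent pairs severs 2 edges
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
print(expansion(cycle))
```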
Expansion: Intuition

|S| nodes: ≥ α·|S| edges leaving

|S'| nodes: ≥ α·|S'| edges leaving

$\alpha = \min_{S \subseteq V} \dfrac{\#\,\text{edges leaving } S}{\min(|S|, |V \setminus S|)}$
(A big) graph with "good" expansion
Expansion: k-Regular Graphs
$\alpha = \min_{S \subseteq V} \dfrac{\#\,\text{edges leaving } S}{\min(|S|, |V \setminus S|)}$
¤ k-regular graph (every node has degree k):
  ¤ Expansion is at most k (when S is a single node)

¤ Is there a graph on n nodes (n→∞), of fixed max deg. k, so that expansion α remains const?

Examples:
¤ n×n grid: k=4: $\alpha = 2n/(n^2/4) \to 0$ (S = n/2 × n/2 square in the center)
¤ Complete binary tree: α → 0 for |S| = (n/2)-1
¤ Fact: For a random 3-regular graph on n nodes, there is some const α (α > 0, independent of n) such that w.h.p. the expansion of the graph is ≥ α (In fact, α = d/2 as d→∞)
Diameter of 3-Regular Rnd. Graph

¤ Fact: In a graph on n nodes with expansion α, for all pairs of nodes s and t there is a path of O((log n) / α) edges connecting them.
¤ Proof:
  ¤ Proof strategy:
    ¤ We want to show that from any node s there is a path of length O((log n)/α) to any other node t
  ¤ Let $S_j$ be a set of all nodes found within j steps of BFS from s.
  ¤ How does $S_j$ increase as a function of j?

[Figure: BFS layers $S_0$, $S_1$, $S_2$ growing outward from s]
Diameter of 3-Regular Rnd. Graph

¤ Proof (continued):
  ¤ Let $S_j$ be a set of all nodes found within j steps of BFS from s.
  ¤ We want to relate $S_j$ and $S_{j+1}$

By expansion, at least $\alpha |S_j|$ edges leave $S_j$. Each node has degree k, so at most k of these edges "collide" at a node:

$|S_{j+1}| \ge |S_j| + \dfrac{\alpha |S_j|}{k}$

$|S_{j+1}| \ge |S_j| \left(1 + \dfrac{\alpha}{k}\right) \ge |S_0| \left(1 + \dfrac{\alpha}{k}\right)^{j+1}$

where $|S_0| = 1$
Diameter of 3-Regular Rnd. Graph
¤ Proof (continued):
  ¤ In how many steps of BFS do we reach > n/2 nodes?
  ¤ Need j so that: $|S_j| \ge \left(1 + \dfrac{\alpha}{k}\right)^j \ge \dfrac{n}{2}$
  ¤ Let's set: $j = \dfrac{k \log_2 n}{\alpha}$
  ¤ Then: $\left(1 + \dfrac{\alpha}{k}\right)^{\frac{k \log_2 n}{\alpha}} \ge 2^{\log_2 n} = n > \dfrac{n}{2}$
  ¤ In 2k/α·log n steps $|S_j|$ grows to Θ(n). So, the diameter of G is O(log(n)/α)

[Figure: BFS from s and from t; in j steps each reaches > n/2 nodes ⇒ Diameter = 2·j; in log(n) steps each reaches > n/2 nodes ⇒ Diameter = 2 log(n)]

Claim: $\left(1 + \dfrac{\alpha}{k}\right)^{\frac{k \log_2 n}{\alpha}} \ge 2^{\log_2 n}$
Remember n > 0, α ≤ k; then:
if α = k: $(1+1)^{1 \cdot \log_2 n} = 2^{\log_2 n}$
if α → 0 then $\dfrac{k}{\alpha} = x \to \infty$ and $\left(1 + \dfrac{1}{x}\right)^{x \log_2 n} = e^{\log_2 n} > 2^{\log_2 n}$, using $e = \lim_{x \to \infty} \left(1 + \dfrac{1}{x}\right)^x$
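The choice of j in the proof can be sanity-checked numerically. The sketch below evaluates the number of BFS steps $j = k \log_2 n / \alpha$ and the growth factor $(1 + \alpha/k)^j$ for sample values; the function names and the sample parameters are illustrative only.

```python
import math


def bfs_steps_bound(n, k, alpha):
    """j = k * log2(n) / alpha, the number of BFS steps used in the proof."""
    return k * math.log2(n) / alpha


def growth_after(j, k, alpha):
    """Lower bound (1 + alpha/k)^j on the BFS set size after j steps, with |S_0| = 1."""
    return (1 + alpha / k) ** j


# Sample values: a 3-regular graph on 1000 nodes with expansion 1
n, k, alpha = 1000, 3, 1.0
j = bfs_steps_bound(n, k, alpha)
print(j, growth_after(j, k, alpha), growth_after(j, k, alpha) >= n)
```

Because $(1 + 1/x)^x$ lies between 2 and e for every x >= 1, the bound reaches at least n for any α between 0 and k.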
Summary

¤Small world phenomenon:


¤ Local structure (e.g. clustering)
¤ Short average shortest path

¤ The Watts-Strogatz model captures both

¤ Other models create navigable small-world networks
¤ Power-law networks are navigable due to presence of hubs
