Data Mining and BI: Social Network Analytics: Random Graphs

This document discusses random graph models and how they can be used to model real-world networks. It introduces the Erdos-Renyi random graph model and describes its key properties like the binomial degree distribution and emergence of a giant component. It then discusses more realistic models like preferential attachment models, which incorporate growth over time and preferential attachment to high-degree nodes, resulting in power-law degree distributions often seen in real networks.

Uploaded by

marouli90

Data Mining and BI: Social Network Analytics
Random Graphs

Credits: Lada Adamic


Source: https://github.com/ladamalina/coursera-sna/tree/master/Week%202.%20Random%20Graph%20Models
Outline
● Introduction to random graphs
● Degree Distribution
● Giant Component
● Average Shortest Path
Network models
● Why model?
○ simple representation of complex network
○ can derive properties mathematically
○ predict properties and outcomes
● Also: to have a strawman
○ In what ways is your real-world network different from hypothesized model?
○ What insights can be gleaned from this?
Erdős and Rényi
Erdős-Rényi: simplest network model
● Assumptions
○ nodes connect at random
○ network is undirected
● Key parameter (besides number of nodes N) : p or M
○ p = probability that any two nodes share an edge
○ M = total number of edges in the graph
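The p-parameterized version of the model can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `erdos_renyi` and the example values N = 100, p = 0.05 are assumptions for the sketch, not part of the original slides.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Return the edge list of an undirected random graph on n nodes."""
    rng = random.Random(seed)
    # Flip one biased coin for each of the n*(n-1)/2 possible node pairs.
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

edges = erdos_renyi(n=100, p=0.05, seed=1)
avg_degree = 2 * len(edges) / 100  # every edge adds one to two node degrees
print(len(edges), avg_degree)
```

The two parameterizations are linked through the expected edge count: with probability p per pair, the expected value of M is p·N(N-1)/2.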
what they look like
Binomial degree distribution
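● (N-1,p)-model: For each potential edge we flip a biased coin
○ with probability p we add the edge
○ with probability (1-p) we don’t
● Each node has N-1 potential edges, so its degree k is binomially distributed: $P(k) = \binom{N-1}{k} p^k (1-p)^{N-1-k}$
● For large N and small p this can be approximated by a Poisson distribution with mean z = (N-1)p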
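How close the Poisson approximation is can be checked numerically. The sketch below compares the two probability mass functions for one illustrative choice of parameters (N = 1000 and p = 0.004 are assumed values, chosen only so that the average degree is about 4).

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, z):
    """Probability of k events when the mean number of events is z."""
    return exp(-z) * z**k / factorial(k)

N, p = 1000, 0.004          # assumed example values
z = (N - 1) * p             # average degree
for k in range(11):
    print(k, round(binomial_pmf(k, N - 1, p), 4), round(poisson_pmf(k, z), 4))
```

For a large, sparse graph like this one the two columns should agree to within a few thousandths.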
Degree Distribution
● What is the probability that a node has 0,1,2,3, … edges?
● Probabilities sum to 1
Quiz
● The maximum degree of a node in a simple (no multiple edges between the
same two nodes) N node graph is
○ N
○ N-1
○ N-2
Fact
● In an Erdős-Rényi random graph the maximal degree does not vary much from the average
○ The degrees of the nodes tend to be similar
Fact
● Random networks do not have large hubs
Giant Component
● As edges are added (the average degree z increases), a giant component emerges
○ I.e. a connected subgraph that comprises a finite fraction of all nodes in the graph
● What is the average degree z at which the giant component starts to emerge?
○ 0
○ 1
○ 3/2
○ 3
Percolation threshold
● Percolation threshold: how many edges need to be added before the giant component appears?
● As the average degree increases to z = 1, a giant component suddenly appears
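The threshold at z = 1 can be observed in a small simulation. The following is a sketch under stated assumptions: it uses the M-parameterized model (M = zN/2 distinct random edges), a union-find structure to track components, and a hypothetical helper name `largest_component_fraction`. None of these details come from the original slides.

```python
import random
from collections import Counter

def largest_component_fraction(n, z, seed=None):
    """Fraction of nodes in the largest component at average degree z."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the component root, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    m = int(z * n / 2)  # number of edges that gives average degree z
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)      # two distinct random nodes
        edges.add((min(u, v), max(u, v)))   # store each pair only once
    for u, v in edges:
        parent[find(u)] = find(v)           # merge the two components
    sizes = Counter(find(x) for x in range(n))
    return max(sizes.values()) / n

for z in (0.5, 1.0, 2.0, 4.0):
    print(z, round(largest_component_fraction(5000, z, seed=0), 3))
```

Below the threshold the largest component should hold only a tiny share of the nodes, while well above it most nodes belong to a single component.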
Giant component: Another angle
● How many other friends besides you does each of your friends have?
○ By a property of the (Poisson) degree distribution, the average degree of your friends, excluding you, is z
○ so at z = 1, each of your friends is expected to have another friend, who in turn has another friend, etc.
○ the giant component emerges
Why just one giant component?
● What if you had two giant components? How long could they both be sustained as the network densifies?
Average Shortest Path
● How many hops on average between each pair of nodes?
● again, each of your friends has z = avg. degree friends besides you
● ignoring loops, the number of people you have at distance l is $z^l$
Friends at distance l
$N_l = z^l$

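Scaling: Average shortest path $l_{av}$

Setting $z^l \approx N$ (the distance at which the whole network has been reached) and solving for l gives

$l_{av} \sim \log N / \log z$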
What does it mean in practice
● Erdős-Rényi networks can grow to be very large but nodes will be just a few hops apart
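The logarithmic scaling of path lengths can be compared against a measurement on a simulated graph. This is a rough sketch, assuming a random graph with N = 1000 nodes and average degree z = 6 (illustrative values), with distances measured by breadth-first search from 50 sampled source nodes. The helper names are hypothetical.

```python
import math
import random
from collections import deque

def random_graph(n, z, rng):
    """Adjacency lists of a random graph with n nodes and average degree z."""
    m = int(z * n / 2)
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def mean_distance(adj, sources):
    """Average hop count from the sources to every node they can reach."""
    total = pairs = 0
    for s in sources:
        dist = {s: 0}
        queue = deque([s])
        while queue:                      # breadth-first search from s
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1            # exclude the source itself
    return total / pairs

rng = random.Random(0)
N, z = 1000, 6
adj = random_graph(N, z, rng)
measured = mean_distance(adj, rng.sample(range(N), 50))
estimate = math.log(N) / math.log(z)
print(round(measured, 2), round(estimate, 2))
```

The estimate log N / log z is only a scaling argument, so the measured value should be of the same order rather than identical.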
Logarithmic axes
● powers of a number will be uniformly spaced ($2^0, 2^1, 2^2, 2^3, 2^4, \ldots$)
Erdős-Rényi avg. shortest path in log-log
Realism
● Consider alternative mechanisms of constructing a network that are also fairly
“random”.
● How do they stack up against Erdős-Rényi?
Other models
Introduction model
● Prob-link is the p (probability of any two nodes sharing an edge) that we are
used to
● But, with probability prob-intro the other node is selected among one of our
friends’ friends and not completely at random
Static Geographical model
● Each node connects to num-neighbors of its closest neighbors
Random encounter
● People move around randomly and connect to people they bump into
Growth model
● Instead of starting out with a fixed number of nodes, nodes are added over
time
Conclusion
● in some instances the ER model is plausible
● if dynamics are different, ER model may be a poor fit
Growth and preferential attachment models
Example online Q&A site
Uneven participation
● Many people have replied only a few times vs. a few people have replied many times
Real-world degree distributions
● Sexual networks
● Great variation in contact numbers
● Many people with a small number of partners vs. few people with a high number of partners
Power-law distribution
● High skew (asymmetry)
● Straight line on a loglog plot (right) Vs linear plot (left)
Poisson distribution
● Little skew (asymmetry)
● Curved on a loglog plot (right) Vs linear plot (left)
Power law distribution
● Straight line on a log-log plot

$\ln p(k) = c - \alpha \ln k$

● Exponentiate both sides to get p(k), the probability of observing a node of degree k:

$p(k) = C k^{-\alpha}$, with $C = e^{c}$

● C: normalization constant (probabilities over all k must sum to 1)


● α: power law exponent
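Both the normalization constant and the straight-line property can be checked numerically. The sketch below assumes a discrete power law truncated at a maximum degree `k_max` (the truncation, the helper name `power_law_pmf`, and the value α = 2.5 are assumptions made only for illustration).

```python
import math

def power_law_pmf(alpha, k_max):
    """Return C and the probabilities p(1), ..., p(k_max) of C * k**-alpha."""
    weights = [k ** -alpha for k in range(1, k_max + 1)]
    C = 1 / sum(weights)                  # forces the probabilities to sum to 1
    return C, [C * w for w in weights]

C, p = power_law_pmf(alpha=2.5, k_max=10_000)

# On log-log axes the slope between any two points is -alpha.
# p[9] is p(k=10) and p[99] is p(k=100), since the list starts at k = 1.
slope = (math.log(p[99]) - math.log(p[9])) / (math.log(100) - math.log(10))
print(round(C, 4), round(slope, 4))
```

Because ln p(k) is linear in ln k, the same slope is obtained whichever pair of degrees is used.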
2 ingredients in generating power-law networks
● nodes appear over time (growth)
● nodes prefer to attach to nodes with many connections (preferential
attachment, cumulative advantage)
Ingredient # 1: growth over time
nodes appear one by one, each selecting m other nodes at random to connect to

m=2
Random network growth
● one node is born at each time tick
● at time t there are t nodes
● change in degree $k_i$ of node i (born at time i, with 0 < i < t):

$\frac{dk_i(t)}{dt} = \frac{m}{t}$

● There are m new edges being added per unit time (with 1 new node)
● The m edges are being distributed among t nodes
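The rate m/t can be integrated to get the expected degree of a node as a function of its age. The short derivation below is a sketch in the continuous-time approximation, assuming that node i starts with its own m edges at its birth time, i.e. $k_i(i) = m$.

```latex
\frac{dk_i}{dt} = \frac{m}{t}, \qquad k_i(i) = m
\;\Longrightarrow\;
k_i(t) = m + \int_i^t \frac{m}{s}\,ds = m\left(1 + \ln\frac{t}{i}\right)
```

Under this approximation degree grows only logarithmically with age, so without preferential attachment early nodes gain an advantage, but a modest one.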
Age and degree
● On average $k_i(t) > k_j(t)$ for i < j
● Older nodes on average have more edges
Ingredient #2: preferential attachment
● Preferential attachment
○ new nodes prefer to attach to well-connected nodes over less-well connected nodes
● Process also known as:
○ Cumulative advantage
○ Rich-get-richer
○ Matthew effect
Price's preferential attachment model for citation networks

● [Price 65]
○ each new paper is generated with m citations (mean)
○ new papers cite previous papers with probability proportional to their indegree (citations)
○ what about papers without any citations?
■ each paper is considered to have a “default” citation
■ probability of citing a paper with degree k, proportional to k+1
● Power law with exponent α = 2+1/m
Cumulative advantage: how?
● Copying mechanism
● Visibility
Barabasi-Albert model
● First used to describe skewed degree distribution of the World Wide Web
● Each node connects to other nodes with probability proportional to their
degree
○ the process starts with some initial subgraph
○ each new node comes in with m edges
○ probability of connecting to node i: $\Pi(i) = k_i / \sum_j k_j$
● Results in power-law with exponent α = 3
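The growth process described above can be sketched with the standard library alone. This is an illustrative implementation, not the authors' code: it assumes the initial subgraph is a complete graph on m + 1 nodes, and it picks m distinct targets by redrawing duplicates, which deviates slightly from exact degree-proportional sampling. The function name `barabasi_albert` and the values n = 2000, m = 2 are assumptions.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=None):
    """Edge list of a graph grown by preferential attachment."""
    rng = random.Random(seed)
    # Assumed initial subgraph: a complete graph on m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` selects a node with probability proportional
    # to its current degree.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:           # redraw until m distinct targets
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = barabasi_albert(n=2000, m=2, seed=0)
degree = Counter(node for edge in edges for node in edge)
print(max(degree.values()), sum(degree.values()) / len(degree))
```

Keeping a list of edge endpoints is a common way to get degree-proportional draws without recomputing the probabilities after every new edge.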
Random Vs Preferential
Properties of the BA graph
● The distribution is scale free with exponent α = 3

$P(k) = 2m^2 / k^3$

● The graph is connected
○ Every new vertex is born with a link or several links (depending on whether m = 1 or m > 1)
○ It then connects to an “older” vertex, which itself connected to another vertex when it was
introduced
○ And we started from a connected core
● The older are richer
○ Nodes accumulate links as time goes on, which gives older nodes an advantage since newer
nodes are going to attach preferentially – and older nodes have a higher degree to tempt them
with than some new kid on the block
Visualization
Summary: growth models
● Most networks aren't 'born', they are made
● Nodes being added over time means that older nodes can have more time to
accumulate edges
● Preference for attaching to 'popular' nodes further skews the degree
distribution toward a power-law
