Graph and Patterns

The document outlines a lecture on graph theory and its applications in data science, focusing on the motivations for studying graphs, various graph concepts, and network generation. It discusses real-world networks, properties of graphs, and models for complex networks, including power-law distributions and the concept of 'small world' phenomena. Additionally, it highlights the importance of understanding graph patterns and their implications in social behavior and information propagation.

Algorithm Foundations of Data Science

Lecture 3: Graph and Patterns

MING GAO

DaSE@ECNU
(for course related communications)
[email protected]

Mar. 28, 2018


Outline

1 Graph
Motivations
Patterns

2 Graph Concepts
Graph types
Properties
Graph Modeling

3 Network Generation

MING GAO (DaSE@ECNU) Algorithm Foundations of Data Science Mar. 28, 2018 2 / 47
Graph Motivations

Graphs - why should we care?

Networks in real world


“YahooWeb graph”: 1B vertices (Web sites), 6B edges (HTTP links)
Facebook, Twitter, etc.: more than 1B users
Food web: all living organisms, linked by food chains
Power grid: vertices (plants or consumers), edges (power lines)
Airline routes: vertices (airports), edges (flights)
Adoption: users purchase products, adopt services, etc.
Motivation questions
Questions
What do real graphs look like?
What properties of vertices and edges are important to model?
What local and global properties are important to measure?
Are graphs helpful for understanding the real world?
  Social influence
  Recommendation
  Information propagation
  Human behaviors
Is a sub-graph “normal” (detecting paid posters, i.e., a “water army”; fraud detection; spam filtering; etc.)?
How to generate realistic graphs?
How to get a “good” sample of a network?
How to design efficient algorithms to handle large-scale graphs?
Models for complex networks

Steven H. Strogatz describes models for complex networks in Nature (2001).
Regular network: each node has exactly the same number of edges.
Random network: obtained by starting with a set of n isolated vertices and adding successive edges between them at random.
Scale-free network: grows by attaching new nodes to existing nodes at random, where the probability of choosing a target node is proportional to its degree. Richly connected nodes therefore tend to get richer, leading to the formation of hubs and a skewed degree distribution with a heavy tail (the Matthew effect, or Pareto’s law).
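
The random and scale-free constructions above can be sketched in a few lines. This is a minimal illustration rather than the model from the cited paper: the function names, the single edge added per new node, and the seed graph of two linked nodes are all assumptions made for the example.

```python
import random

def random_network(n, m, seed=0):
    """Start from n isolated vertices and add m distinct edges at random."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)      # two distinct endpoints
        edges.add((min(u, v), max(u, v)))   # store each undirected edge once
    return edges

def preferential_attachment(n, seed=0):
    """Grow a graph: each new node links to one existing node chosen
    with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]                        # assumed seed graph: one edge
    # Every node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [0, 1]
    for new in range(2, n):
        target = rng.choice(endpoints)
        edges.append((target, new))
        endpoints += [target, new]
    return edges

def degrees(n, edges):
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

n = 2000
deg_random = degrees(n, random_network(n, n - 1))
deg_pa = degrees(n, preferential_attachment(n))
# Both graphs have n - 1 edges; compare the largest degree in each.
print(max(deg_random), max(deg_pa))
```

Comparing the two maxima gives a quick feel for how hubs emerge under degree-proportional attachment while the random network stays comparatively even.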

Are real graphs random?

Looks random - right?
What does the Internet look like? Any rules?
Diameter: would you like to guess?
In- and out-degree distributions: if the average degree is 2, what is the most probable degree?
Other (surprising) patterns?

Graph Patterns

Power-law I

Internet topology

The out-degree distribution is plotted in log-log scale.
It forms a line with slope ∼ −2.15, i.e., $\mathrm{freq} \propto \mathrm{deg}^{-2.15}$.
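
One simple way to read such a slope off data is a least-squares line through the points (log deg, log freq). The sketch below assumes this plain fit, which is only a rough estimator of a power-law exponent, and it uses synthetic frequencies, not the Internet topology data from the slide. The helper names are hypothetical.

```python
import math
from collections import Counter

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

def degree_frequencies(out_degrees):
    """Return (sorted positive degrees, how many vertices have each degree)."""
    counts = Counter(d for d in out_degrees if d > 0)
    ds = sorted(counts)
    return ds, [counts[d] for d in ds]

# Synthetic check: frequencies that follow freq = 1000 * deg^(-2.15) exactly.
degs = list(range(1, 50))
freqs = [1000 * d ** -2.15 for d in degs]
print(round(loglog_slope(degs, freqs), 2))  # recovers the exponent -2.15
```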

What is the power-law?

Power laws appear in many settings, under names such as the Matthew effect, Pareto’s law, “rich-get-richer”, the 80/20 principle, and Zipf’s law.
80% of Italy’s land is owned by 20% of the population.
The richest 20% receive 82.70% of income.
Bible: word rank vs. frequency (log-log)
Web: hit count vs. volume
File: count vs. size
Publication: citation vs. count
Business
80% of a company’s profits come from 20% of its customers.
80% of a company’s complaints come from 20% of its customers.
80% of a company’s profits come from 20% of the time its staff spend.
80% of a company’s sales are made by 20% of its sales staff.

Power-law II

Rank of out-degrees

Vertices are ranked in decreasing out-degree order, and plotted in log-log scale.
It forms a line with slope ∼ −0.74, i.e., $\mathrm{deg} \propto \mathrm{rank}^{-0.74}$.

Power-law III

Rank of eigenvalues

Eigenvalues of the adjacency matrix (top 20) are ranked in decreasing order, and plotted in log-log scale.
It forms a line with slope ∼ −0.48, i.e., $\mathrm{eigen} \propto \mathrm{rank}^{-0.48}$.

Power-law IV

Hop plot

How many neighbors are within 1, 2, ..., h hops? ($\sum_{i=1}^{h} \mathrm{avg}_i$)
The number of pairs of vertices within h hops is plotted against h in log-log scale. It forms a line with slope ∼ 2.83, i.e., $\mathrm{pairs} \propto \mathrm{hop}^{2.83}$.

Power-law V

Counting of triangles

X-axis: # of triangles a vertex participates in


Y-axis: count of such vertices
In log-log scale, the plot is almost linear.

Triangle law
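How to count # triangles?
Naive algorithm: 3-way join ($O(n^3)$).
# triangles = $\frac{1}{6}\sum_{i=1}^{n} \lambda_i^3$, where the $\lambda_i$ are the eigenvalues of the adjacency matrix. Why?
Because of skewness, we only need the top few eigenvalues, which can be computed with the Lanczos algorithm.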
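
A short answer to the "Why?": $\sum_i \lambda_i^3 = \mathrm{tr}(A^3)$, and $\mathrm{tr}(A^3)$ counts closed walks of length 3. Each triangle yields six such walks (3 starting vertices times 2 directions), hence the factor 1/6. The sketch below checks the identity on a small hand-made graph using the trace instead of an eigen-solver, so that it stays within the standard library. The example graph and function names are assumptions for illustration.

```python
from itertools import combinations

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def triangles_naive(A):
    """Check every vertex triple: the O(n^3) 3-way join."""
    n = len(A)
    return sum(1 for i, j, k in combinations(range(n), 3)
               if A[i][j] and A[j][k] and A[i][k])

def triangles_trace(A):
    """tr(A^3) / 6, which equals (1/6) * sum of cubed eigenvalues."""
    A3 = matmul(matmul(A, A), A)
    return sum(A3[i][i] for i in range(len(A))) // 6

# Two triangles, 0-1-2 and 1-2-3, sharing the edge 1-2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [0, 1, 1, 0]]
print(triangles_naive(A), triangles_trace(A))  # both count 2 triangles
```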

Erdős number

Small world - six degrees of separation

The world looks “small” when you think of how short a path of friends it takes to get from you to almost anyone else. Stanley Milgram and his colleagues ran an experiment in the 1960s.
296 randomly chosen “starters” were asked to forward a letter to a “target” person, a stockbroker in a Boston suburb.
Six degrees of separation were also found by Jure Leskovec on the Microsoft Messenger instant-messaging network.

Shrinking diameter

Citation or patent networks

For the citation network, citations among physics papers were collected.
11 years of data
29,555 papers
352,807 citations
For each month, create a graph of all citations up to that month.
The diameters are plotted in the figures.
Temporal evolution of graphs


Question

Let N(t) and E(t) be the # of nodes and # of edges at time t, respectively.
Suppose that N(t + 1) = 2N(t). What is your guess for E(t + 1)?
It more than doubles, obeying $E(t) \sim N(t)^{\alpha}$ for all t, where $1 < \alpha < 2$.
For a tree, $\alpha = 1$; for a clique, $\alpha = 2$.
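
As a worked step, assuming the power-law relation holds with a constant factor that does not change over time, doubling the number of nodes multiplies the number of edges by $2^{\alpha}$. The two boundary cases follow from counting edges directly.

```latex
E(t) \sim N(t)^{\alpha}
\;\Longrightarrow\;
\frac{E(t+1)}{E(t)} \approx \left(\frac{N(t+1)}{N(t)}\right)^{\alpha} = 2^{\alpha},
\qquad 2 < 2^{\alpha} < 4 \quad \text{for } 1 < \alpha < 2 .

\text{Tree: } E = N - 1 \sim N^{1},
\qquad
\text{Clique: } E = \frac{N(N-1)}{2} \sim N^{2}.
```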

Dunbar’s number
Why do primates have unusually big brains?

Social group size (and a lot of social behaviour as well) correlates with relative neocortex volume.
Our relationships form a hierarchically inclusive series of circles of increasing size but decreasing intensity.
150 is the limit on reciprocated relationships.
1500 is the limit on memory for faces?

Graph Concepts Graph types

Graph types
Undirected graph

An undirected graph on 4 vertices
Degree: # edges connected to the vertex
Degree 0 vertex: isolated vertex

Directed graph

A directed graph on 4 vertices
In-degree: # incoming edges to the vertex
Out-degree: # outgoing edges from the vertex
Degree: in-degree + out-degree

Graph types cont.


Signed graph

A signed graph on 3 vertices
Positive-degree: # edges associated with positive labels
Negative-degree: # edges associated with negative labels

Bipartite graph

Users interact on social platforms


Reply network
Retweet network
Adoption network

Graph Concepts Properties

Paths
Path
A path is a sequence of nodes with the property that each consecutive pair in the sequence is connected by an edge.
A simple path does not repeat nodes.
The length of a path is the number of edges in the path.

Cycle

A cycle is a path with at least three edges, in which the first and last nodes are the same.
Every edge in the 1970 Arpanet belongs to a cycle, and this was by design. Why?

Connectivity

Connected component

A connected component is a subset of nodes such that:
Every node in the subset has a path to every other; and
The subset is not part of some larger set with the property that every node can reach every other.
A graph is connected if there is a path between every pair of nodes, i.e., the whole graph is a single connected component.
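
The definition above maps directly onto a breadth-first search: everything reachable from an unvisited node forms one maximal connected subset. The following is a minimal sketch, assuming the graph is given as a dictionary from each node to its neighbors. The example graph is made up.

```python
from collections import deque

def connected_components(adj):
    """adj: dict mapping each node to an iterable of neighbors (undirected)."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp = []
        queue = deque([start])
        while queue:                      # BFS collects everything reachable
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        components.append(sorted(comp))
    return components

adj = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
comps = connected_components(adj)
print(comps)             # [[1, 2, 3], [4, 5], [6]]
print(len(comps) == 1)   # False: the graph is not connected
```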

Strongly connected component


Strongly connected component

A directed graph is strongly connected if there is a path from every node to every other node.
Edges of the path must follow the forward direction.
An undirected graph can be treated as a bidirectional graph. Thus a connected component of an undirected graph is also an SCC.
In a strongly connected component with more than one node, every node has both followers and followees.
SCCs can be treated as super-nodes.
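
One standard way to find SCCs is Kosaraju's two-pass depth-first search. The slides do not name an algorithm, so this choice is an assumption. The sketch also builds the super-node graph by keeping only edges that cross between different SCCs. It uses recursion for brevity, which would need an explicit stack on large graphs.

```python
def strongly_connected_components(adj):
    """Kosaraju's algorithm. adj: dict node -> list of successors.
    Returns a dict mapping each node to a representative of its SCC."""
    order, seen = [], set()

    def visit(u):                     # first pass: record finishing order
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                visit(v)
        order.append(u)

    for u in adj:
        if u not in seen:
            visit(u)

    radj = {u: [] for u in adj}       # reversed graph
    for u in adj:
        for v in adj[u]:
            radj[v].append(u)

    comp = {}

    def assign(u, root):              # second pass runs on the reversed graph
        comp[u] = root
        for v in radj[u]:
            if v not in comp:
                assign(v, root)

    for u in reversed(order):
        if u not in comp:
            assign(u, u)
    return comp

def condensation(adj, comp):
    """Edges between super-nodes, one super-node per SCC."""
    return {(comp[u], comp[v])
            for u in adj for v in adj[u] if comp[u] != comp[v]}

# Cycle a -> b -> c -> a, then c -> d, and a two-node cycle d <-> e.
adj = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
comp = strongly_connected_components(adj)
print(comp)
print(condensation(adj, comp))
```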
Giant component

Giant connected component

A connected component that contains a significant fraction of all the nodes.
When a network (e.g., a friendship network) contains a giant component, it almost always contains only one.
The other connected components are very small by comparison.
In the figure's example, the largest connected component would break apart into three distinct components if a single node were removed [related to the robustness of the network].

Web giant component

Web graph

The Web contains a giant strongly connected component (containing the home pages of many of the major commercial, governmental, and non-profit organizations).
IN: nodes that can reach the giant SCC but cannot be reached from it, i.e., nodes that are “upstream” of it.
OUT: nodes that can be reached from the giant SCC but cannot reach it, i.e., nodes that are “downstream” of it.

Distance and diameter

Distance or geodesic distance

The distance between two vertices in a graph is the number of edges in a shortest path between them.
The diameter is the length of the “longest shortest path” between any two vertices of a graph.
An Erdős number is bounded by the diameter of the graph.
The research community is a small world [Duncan Watts and Steven Strogatz 1998].
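
For an unweighted graph, both quantities can be computed with breadth-first search: one BFS gives all distances from a source, and the diameter is the largest distance seen over all sources. The sketch below assumes a connected graph stored as a dictionary of neighbor lists. The example graph is made up.

```python
from collections import deque

def bfs_distances(adj, source):
    """Number of edges on a shortest path from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Longest shortest path; assumes the graph is connected."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# Path 0 - 1 - 2 - 3 with a shortcut edge 1 - 3.
adj = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
print(bfs_distances(adj, 0))  # {0: 0, 1: 1, 2: 2, 3: 2}
print(diameter(adj))          # 2
```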

Mean Geodesic distance of undirected networks


Definition
$$L = \frac{1}{\frac{1}{2} n(n+1)} \sum_{i \ge j} d_{ij},$$
where n denotes the # of nodes, and $d_{ij}$ is the shortest distance between nodes i and j.
The mean geodesic distance includes the distance from each node to itself.
It can be computed in O(mn) using breadth-first search, where m denotes the # of edges.
What happens if the network has multiple connected components?
Harmonic mean (can have multiple connected components):
$$L^{-1} = \frac{1}{\frac{1}{2} n(n+1)} \sum_{i \ge j} d_{ij}^{-1}$$
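
Both averages can be sketched with one BFS per node, which gives the O(mn) cost mentioned above. Two conventions in the harmonic version are assumptions, not something the slide states: the i = j terms are skipped (since $d_{ii} = 0$ has no finite inverse), and unreachable pairs contribute $d_{ij}^{-1} = 0$. The normalization $\frac{1}{2}n(n+1)$ is kept as in the definition.

```python
from collections import deque

def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def mean_geodesic(adj):
    """L = sum_{i>=j} d_ij / (n(n+1)/2); assumes a connected graph."""
    nodes = sorted(adj)
    n = len(nodes)
    total = 0
    for i in nodes:
        dist = bfs_distances(adj, i)
        total += sum(dist[j] for j in nodes if j <= i)   # includes d_ii = 0
    return total / (n * (n + 1) / 2)

def harmonic_mean_geodesic(adj):
    """L from L^{-1} = sum d_ij^{-1} / (n(n+1)/2), under the assumptions
    that i == j is skipped and unreachable pairs add 0."""
    nodes = sorted(adj)
    n = len(nodes)
    inv_total = 0.0
    for i in nodes:
        dist = bfs_distances(adj, i)
        inv_total += sum(1 / dist[j] for j in nodes if j < i and j in dist)
    if inv_total == 0:
        return float("inf")              # no pair of nodes is connected
    return (n * (n + 1) / 2) / inv_total

path = {0: [1], 1: [0, 2], 2: [1]}                 # connected
two_parts = {0: [1], 1: [0], 2: [3], 3: [2]}       # two components
print(mean_geodesic(path))
print(harmonic_mean_geodesic(two_parts))
```

On the two-component example the plain mean is undefined (some distances are infinite), while the harmonic form still returns a finite value.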
Graph Concepts Graph Modeling

Adjacency matrix

Definition
Given a finite graph G = (V, E), an adjacency matrix A is a |V| × |V| matrix whose elements indicate whether pairs of vertices are adjacent or not in the graph.
For a simple graph, the adjacency matrix is a (0,1)-matrix with zeros on its diagonal.
If the graph is undirected, the adjacency matrix is symmetric.

The adjacency matrix A of a bipartite graph whose two parts have r and s vertices can be written in the form
$$A = \begin{pmatrix} 0_{r,r} & B \\ B^{T} & 0_{s,s} \end{pmatrix},$$
where B is an r × s matrix.

Storing a graph
Adjacency lists

An adjacency list is a collection of unordered lists used to represent a graph G. Each list describes the set of neighbors of a vertex in the graph.
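
A minimal sketch of building adjacency lists from an edge list, assuming edges arrive as pairs of hashable vertex labels. The function name and the `directed` flag are made up for the example. Storage is proportional to |V| + |E|, compared with |V|² entries for an adjacency matrix.

```python
def adjacency_list(edges, directed=False):
    """Map each vertex to the sorted list of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())      # register v even if it has no out-edges
        if not directed:
            adj[v].add(u)
    return {u: sorted(vs) for u, vs in adj.items()}

edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
adj = adjacency_list(edges)
print(adj)                                   # {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(adjacency_list(edges, directed=True))  # {1: [2, 3], 2: [3], 3: [4], 4: []}
```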

Random walk of a graph


Markov chain
Suppose that G = (V, E) is a graph on n vertices with vertex set V and edge set E ⊂ V × V. Let N(x) = {y | (x, y) ∈ E}, and denote the degree of vertex x by d(x) = |N(x)|.
Note that x is an isolated vertex if N(x) = ∅.
If G is an undirected graph, then (x, y) ∈ E if and only if (y, x) ∈ E.
For each x ∈ V, the transition probability is $P(y \mid x) = \frac{1}{d(x)}$ if y ∈ N(x), and $P(y \mid x) = 0$ otherwise.
Let X be a random walk on G. If G is connected, then X is irreducible.
X has period 2 if and only if G is bipartite, in which case the parts are the cyclic classes of X.
Let $D = \mathrm{diag}(d_1, d_2, \cdots, d_n)$ be the diagonal degree matrix; then $P = D^{-1} A$.
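
The relation $P = D^{-1}A$ amounts to dividing each row of A by its row sum. The sketch below builds P for a small bipartite path graph (an assumed example) and iterates the walk to make the period-2 behaviour visible: the distribution alternates between the two parts.

```python
def transition_matrix(A):
    """P = D^{-1} A for an adjacency matrix A with no isolated vertices."""
    P = []
    for row in A:
        d = sum(row)                     # degree d(x) = |N(x)|
        P.append([a / d for a in row])
    return P

def step(p, P):
    """One step of the walk: the new distribution is p P."""
    n = len(P)
    return [sum(p[x] * P[x][y] for x in range(n)) for y in range(n)]

# Path graph 0 - 1 - 2, which is bipartite with parts {0, 2} and {1}.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
P = transition_matrix(A)
p = [1.0, 0.0, 0.0]                      # start at vertex 0
for t in range(4):
    p = step(p, P)
    print(t + 1, p)                      # mass alternates between the parts
```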

Combinatorial Laplacian of graph


Definition
Given a graph G, the (combinatorial) Laplacian of G is L = D − A, i.e.,
$$L(u, v) = \begin{cases} d_v, & \text{if } u = v; \\ -1, & \text{if } u \text{ and } v \text{ are adjacent}; \\ 0, & \text{otherwise.} \end{cases}$$

If G is an undirected graph whose Laplacian matrix L has eigenvalues $\lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$, then
L is symmetric and singular (some $\lambda_i = 0$).
Since every row sum and column sum of L is zero, $\lambda_0 = 0$ with eigenvector $v_0 = (1, 1, \cdots, 1)^T$.
The second smallest eigenvalue is called the algebraic connectivity.
For a weighted graph G, the Laplacian can be defined in the same manner.
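
A small check of these properties, assuming a simple undirected graph given by its adjacency matrix. The example graph, a triangle with one pendant vertex, is made up. Zero row sums are exactly the statement that the all-ones vector is an eigenvector with eigenvalue 0.

```python
def laplacian(A):
    """L = D - A for a simple graph (zero diagonal in A)."""
    n = len(A)
    return [[(sum(A[u]) if u == v else 0) - A[u][v] for v in range(n)]
            for u in range(n)]

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
L = laplacian(A)
for row in L:
    print(row)

# Every row sums to zero, so L times the all-ones vector is the zero vector.
print([sum(row) for row in L])
```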
MING GAO (DaSE@ECNU) Algorithm Foundations of Data Science Mar. 28, 2018 36 / 47
Graph Concepts Graph Modeling

Incidence matrix
Definition
An incidence matrix B is a |V |×|E | matrix that shows the relationship
between vertices and edges of graph G = (V , E ).
Each column corresponds to an edge e = (vi , vj ) (with i < j):
the entry is 1 in the row corresponding to vi ,
−1 in the row corresponding to vj , and 0 in every other row.
$L = BB^T$. Thus, $L$ is positive semidefinite and has nonnegative
eigenvalues ($\lambda_i \ge 0$), since
$x^T L x = x^T B B^T x = (B^T x)^T (B^T x) \ge 0$.
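
The identity L = BB^T can be verified numerically. The following is a minimal sketch on an assumed four-vertex graph; the edge list and the helper name `quad_form` are hypothetical and only serve the illustration.

```python
# Hypothetical example graph; each edge is written (i, j) with i < j.
edges = [(0, 1), (1, 2), (1, 3)]
n, m = 4, len(edges)

# Oriented incidence matrix B (|V| x |E|): +1 in row v_i, -1 in row v_j.
B = [[0] * m for _ in range(n)]
for e, (i, j) in enumerate(edges):
    B[i][e] = 1
    B[j][e] = -1

# B B^T, computed entry by entry.
BBt = [[sum(B[u][e] * B[v][e] for e in range(m)) for v in range(n)]
       for u in range(n)]

# Laplacian L = D - A built directly from the edge list, for comparison.
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
deg = [sum(row) for row in A]
L = [[(deg[u] if u == v else 0) - A[u][v] for v in range(n)]
     for u in range(n)]

def quad_form(x):
    """Return x^T L x computed as ||B^T x||^2, which is never negative."""
    Btx = [sum(B[u][e] * x[u] for u in range(n)) for e in range(m)]
    return sum(t * t for t in Btx)

print(BBt == L)
print(quad_form([1, 1, 1, 1]), quad_form([1, 0, 0, 0]))
```

Each entry of B^T x is a difference x_i − x_j along one edge, so the quadratic form is a sum of squared differences over edges.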


Normalized Laplacian of graph


Definition

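Given a graph G , the normalized Laplacian of G is $\mathcal{L} = D^{-1/2} L D^{-1/2}$, i.e.,
\[
\mathcal{L}(u, v) =
\begin{cases}
1, & \text{if } u = v; \\
-\dfrac{1}{\sqrt{d_u d_v}}, & \text{if } u \text{ and } v \text{ are adjacent}; \\
0, & \text{otherwise.}
\end{cases}
\]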
$\mathcal{L} = D^{-1/2} B B^T D^{-1/2} = I - D^{-1/2} A D^{-1/2} = D^{1/2} (I - P) D^{-1/2}$,
where $P = D^{-1} A$. Thus, $\mathcal{L}$ is positive semidefinite and
$0 \le \lambda(\mathcal{L}) \le 2$.
$\mathcal{L}$ is singular and symmetric, and $\lambda_0 = 0$ corresponds to the
eigenvector $D^{1/2} v_0^T = D^{1/2} (1, 1, \cdots, 1)^T$.
P has an eigenvalue 1 − λi for each eigenvalue λi of the normalized Laplacian.
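The regularization of graph G : for a vector $F = (f_1, \cdots, f_n)^T$,
\[
F^T \mathcal{L} F = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij}
\left( \frac{f_i}{\sqrt{d_{ii}}} - \frac{f_j}{\sqrt{d_{jj}}} \right)^2 ,
\]
where $d_{ii}$ is the $i$-th diagonal entry of $D$ and the factor $A_{ij}$ restricts the sum to adjacent pairs.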
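
The properties on this slide can be checked on a small graph. This is a sketch under stated assumptions: the example graph, the test vector `f`, and all variable names are hypothetical, and every vertex is assumed to have positive degree so that D^{-1/2} exists.

```python
import math

# Hypothetical example graph (an assumption for illustration).
edges = [(0, 1), (1, 2), (1, 3), (2, 3)]
n = 4

A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
deg = [sum(row) for row in A]  # every vertex has degree > 0 here

# Normalized Laplacian: 1 on the diagonal, -1/sqrt(d_u d_v) for adjacent u, v.
NL = [[(1.0 if u == v else 0.0) - A[u][v] / math.sqrt(deg[u] * deg[v])
       for v in range(n)] for u in range(n)]

# D^{1/2} (1, ..., 1)^T should be an eigenvector with eigenvalue 0.
w = [math.sqrt(d) for d in deg]
residual = max(abs(sum(NL[u][v] * w[v] for v in range(n))) for u in range(n))

# Quadratic form versus the edge-wise regularization sum.
f = [1.0, -2.0, 0.5, 3.0]
lhs = sum(f[u] * NL[u][v] * f[v] for u in range(n) for v in range(n))
rhs = 0.5 * sum(
    A[i][j] * (f[i] / math.sqrt(deg[i]) - f[j] / math.sqrt(deg[j])) ** 2
    for i in range(n) for j in range(n))

# The Rayleigh quotient of any nonzero vector lies between 0 and 2.
rayleigh = lhs / sum(x * x for x in f)
print(residual, lhs, rhs, rayleigh)
```

The Rayleigh quotient bound follows from the eigenvalue range 0 ≤ λ ≤ 2 stated above.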


Properties of normalized Laplacian [WebScience 2013]

Properties

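The eigenvalues $0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ of the normalized
Laplacian matrix of a graph G with n vertices (indexed from 1 on this slide)
satisfy the following properties:
$0 \le \lambda_2 \le \frac{n}{n-1} \le \lambda_n \le 2$.
$\lambda_2 = \cdots = \lambda_n = \frac{n}{n-1}$ if and only if G is a clique.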
λn = 2 if and only if G has a bipartite connected component with at least one edge (for example, a bi-clique).
G has at least i connected components if and only if λj = 0,
for j = 1, 2, · · · , i.
The mean of the eigenvalues $\lambda_2, \lambda_3, \cdots, \lambda_n$ of a network G with n
vertices is $\frac{n}{n-1}$.
The variance of the eigenvalues $\lambda_2, \lambda_3, \cdots, \lambda_n$ of a network G with
n vertices is $\frac{1}{n-1} \sum_{i=1}^{n} \sum_{j \ne i} \frac{A_{ij}}{d(v_i)\, d(v_j)} - \frac{n}{(n-1)^2}$ (R-energy).
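
Some of these properties can be checked without an eigenvalue solver. The sketch below is illustrative only: the helper names and the choice of K5 and K2,3 are assumptions. It uses the facts that the eigenvalues sum to the trace, their squares sum to the trace of the squared matrix, and λ1 = 0.

```python
import math

def normalized_laplacian(A):
    """I - D^{-1/2} A D^{-1/2}; assumes every vertex has positive degree."""
    n = len(A)
    deg = [sum(row) for row in A]
    return [[(1.0 if u == v else 0.0) - A[u][v] / math.sqrt(deg[u] * deg[v])
             for v in range(n)] for u in range(n)]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def clique(n):
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]

def biclique(a, b):
    """Complete bipartite graph: vertices 0..a-1 on one side, the rest on the other."""
    n = a + b
    return [[1 if (i < a) != (j < a) else 0 for j in range(n)] for i in range(n)]

# Clique K_5: a vector orthogonal to (1, ..., 1) has eigenvalue n / (n - 1).
n = 5
x = [1.0, -1.0, 0.0, 0.0, 0.0]
y = matvec(normalized_laplacian(clique(n)), x)
clique_ok = all(abs(y[i] - n / (n - 1) * x[i]) < 1e-12 for i in range(n))

# Bi-clique K_{2,3}: D^{1/2} s with s = +1 / -1 per side has eigenvalue 2.
A = biclique(2, 3)
deg = [sum(row) for row in A]
v = [math.sqrt(deg[i]) * (1.0 if i < 2 else -1.0) for i in range(n)]
z = matvec(normalized_laplacian(A), v)
biclique_ok = all(abs(z[i] - 2.0 * v[i]) < 1e-12 for i in range(n))

# Mean and variance of lambda_2..lambda_n for K_{2,3} via traces.
NL = normalized_laplacian(A)
mean = sum(NL[i][i] for i in range(n)) / (n - 1)
second_moment = sum(NL[i][j] ** 2 for i in range(n) for j in range(n)) / (n - 1)
variance_from_traces = second_moment - mean ** 2
variance_from_formula = (
    sum(A[i][j] / (deg[i] * deg[j]) for i in range(n) for j in range(n) if i != j)
    / (n - 1) - n / (n - 1) ** 2)
print(clique_ok, biclique_ok, mean, variance_from_traces, variance_from_formula)
```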

Network Generation

Network generation

Generators
Erdős-Rényi model
Preferential attachment
Variations + extensions
Copying model
Triad-closing
Butterfly model
Recursion - Kronecker generator


Random network generator: Erdős-Rényi model

The Erdős-Rényi model, also known as the random graph model, generates
undirected random graphs.
Parameters: N (number of vertices) and p (probability of forming an edge).
For each possible node pair, the approach generates an edge
independently with probability p. Thus, the expected number of edges is $\frac{pN(N-1)}{2}$.
Degree distribution:
$P(\text{node has degree } k) = \binom{N-1}{k} p^k (1 - p)^{N-1-k}$
The degree follows a binomial distribution with mean (N − 1)p and variance
(N − 1)p(1 − p) (not a power-law distribution).
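
The generation procedure above can be sketched in a few lines. The function name `erdos_renyi`, the parameter values, and the fixed seed are assumptions for illustration.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Return the edge list of an undirected G(n, p) random graph."""
    rng = random.Random(seed)
    # Visit each of the n(n-1)/2 node pairs once; keep it with probability p.
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

n, p = 200, 0.05
edges = erdos_renyi(n, p, seed=42)

degree = [0] * n
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

mean_degree = sum(degree) / n
expected_edges = p * n * (n - 1) / 2      # pN(N-1)/2
expected_mean_degree = (n - 1) * p        # binomial mean
print(len(edges), expected_edges, mean_degree, expected_mean_degree)
```

Looping over all pairs costs O(N^2) time, which is fine for small N; for large sparse graphs a skipping scheme would be preferable.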


Scale-free network generator


Preferential attachment model
The more connected a node is, the more likely it is to receive new links
(the "rich get richer" phenomenon, also known as the Matthew effect or Pareto's law).
Price model
Barabasi Albert model

Price model for citation networks


Each new paper is generated with m citations (mean).
New papers cite previous papers with probability proportional
to their indegree (citations).
Power law with exponent $\alpha = 2 + \frac{1}{m}$ [Science 1965]


Barabasi Albert model

Model
Start with an initial network of m0 (≥ 2) nodes in which every
node has degree ≥ 1; a node of degree 0 would never be selected
and would always remain isolated.
For each new node, connect it to m
existing nodes, choosing node i with probability
$p_i = \frac{k_i}{\sum_j k_j}$, where $k_i$ is the degree of
node i.
This results in a single connected component
whose degree distribution follows a power law with
α = 3 [Reviews of Modern Physics 2003].
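
The growth process above can be sketched as follows. Several choices here are assumptions made for the illustration rather than part of the lecture: the function name, the seed network (a path on m0 = m + 1 nodes), and the use of rejection to keep the m targets distinct, which slightly perturbs the exact probabilities p_i.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow an undirected graph to n nodes by preferential attachment.

    Assumed seed network: a path on m0 = m + 1 nodes, so every
    initial node has degree >= 1.
    """
    rng = random.Random(seed)
    m0 = m + 1
    edges = [(i, i + 1) for i in range(m0 - 1)]
    # Node i appears in this list once per incident edge (k_i times), so a
    # uniform draw from it selects node i with probability k_i / sum_j k_j.
    endpoints = [v for e in edges for v in e]
    for new in range(m0, n):
        chosen = set()
        while len(chosen) < m:          # redraw duplicates to get m distinct targets
            chosen.add(rng.choice(endpoints))
        for t in chosen:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

n, m = 500, 3
edges = barabasi_albert(n, m, seed=7)

degree = [0] * n
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
print(len(edges), min(degree), max(degree))
```

Keeping the flat `endpoints` list trades memory for simplicity: each draw is O(1) and no cumulative degree table has to be maintained.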


Kronecker product of matrices

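Given two matrices $U \in \mathbb{R}^{n \times m}$ and $V \in \mathbb{R}^{p \times q}$, the Kronecker product
matrix $S \in \mathbb{R}^{np \times mq}$ is given by
\[
S = U \otimes V =
\begin{pmatrix}
u_{11} V & u_{12} V & \cdots & u_{1m} V \\
u_{21} V & u_{22} V & \cdots & u_{2m} V \\
\vdots & \vdots & \ddots & \vdots \\
u_{n1} V & u_{n2} V & \cdots & u_{nm} V
\end{pmatrix}
\]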
$A \otimes (aB + C) = (aA) \otimes B + A \otimes C$, but in general $A \otimes B \neq B \otimes A$.
$(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$.
$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$ and $(A \otimes B)^T = A^T \otimes B^T$.
$|A \otimes B| = |A|^m |B|^n$ and $\mathrm{Tr}(A \otimes B) = \mathrm{Tr}(A)\,\mathrm{Tr}(B)$ if $A \in \mathbb{R}^{n \times n}$ and
$B \in \mathbb{R}^{m \times m}$.
We define the $k$-th Kronecker power of $A_1$ as $A_1^{[k]}$ (abbreviated to $A_k$),
where $A_k = A_1^{[k]} = A_{k-1} \otimes A_1$.
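
The block definition and the listed identities can be checked on small integer matrices. This is a pure-Python sketch; the helper names (`kron`, `matmul`, `det`, `kron_power`) and the example matrices are assumptions for illustration.

```python
def kron(U, V):
    """Kronecker product of U (n x m) and V (p x q) as an (np x mq) nested list."""
    n, m, p, q = len(U), len(U[0]), len(V), len(V[0])
    # Row i = a*p + r and column j = b*q + c hold u_ab * v_rc.
    return [[U[i // p][j // q] * V[i % p][j % q] for j in range(m * q)]
            for i in range(n * p)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def kron_power(A1, k):
    """k-th Kronecker power: A_k = A_{k-1} (x) A_1."""
    Ak = A1
    for _ in range(k - 1):
        Ak = kron(Ak, A1)
    return Ak

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
C = [[2, 0], [1, 1]]
D = [[1, 1], [0, 2]]

mixed_product_ok = matmul(kron(A, B), kron(C, D)) == kron(matmul(A, C), matmul(B, D))
transpose_ok = transpose(kron(A, B)) == kron(transpose(A), transpose(B))
trace_ok = trace(kron(A, B)) == trace(A) * trace(B)
det_ok = det(kron(A, B)) == det(A) ** 2 * det(B) ** 2   # n = m = 2 here
commutes = kron(A, B) == kron(B, A)
print(mixed_product_ok, transpose_ok, trace_ok, det_ok, commutes)
print(len(kron_power(A, 3)))
```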


Kronecker model cont.


Model
Instead of matching a single property of the network, the Kronecker model can
match multiple properties simultaneously, which makes it attractive for
fitting real networks.
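Deterministic Kronecker model: it begins with an initiator
graph $G_1$ with $N_1$ nodes, and produces successively larger
graphs $G_2, \cdots, G_n$ such that the $k$-th graph $G_k$ has $N_k = N_1^k$ nodes;
the adjacency matrix of $G_k$ is the $k$-th Kronecker power of that of $G_1$.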
Stochastic Kronecker model: it starts with an N1 × N1
probability matrix Θ = [θij ], where the element θij ∈ [0, 1] is
the probability that edge (i, j) is present; the k-th Kronecker power
of Θ then gives the edge probabilities of the larger graph.
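
Both variants can be sketched with the Kronecker power defined earlier. The initiator matrices, function names, and seed below are hypothetical, and the sampler treats every ordered pair (i, j) independently, so it produces a directed graph that may contain self-loops; this naive approach is an assumption for illustration, not necessarily the method used by the tools cited in the lecture.

```python
import random

def kron(U, V):
    """Kronecker product of two matrices given as nested lists."""
    p, q = len(V), len(V[0])
    return [[U[i // p][j // q] * V[i % p][j % q]
             for j in range(len(U[0]) * q)] for i in range(len(U) * p)]

def kron_power(M, k):
    """k-th Kronecker power M^[k] = M^[k-1] (x) M."""
    P = M
    for _ in range(k - 1):
        P = kron(P, M)
    return P

def sample_graph(P, seed=None):
    """Include each directed edge (i, j) independently with probability P[i][j]."""
    rng = random.Random(seed)
    n = len(P)
    return [(i, j) for i in range(n) for j in range(n) if rng.random() < P[i][j]]

# Deterministic model: a 0/1 initiator adjacency matrix (hypothetical).
G1 = [[1, 1], [1, 0]]
G2 = kron_power(G1, 2)                  # N_2 = N_1^2 = 4 nodes
ones_in_G2 = sum(map(sum, G2))

# Stochastic model: a hypothetical 2 x 2 probability initiator.
theta = [[0.9, 0.5], [0.5, 0.1]]
P3 = kron_power(theta, 3)               # 8 x 8 matrix of edge probabilities
edges = sample_graph(P3, seed=0)
expected_edges = sum(map(sum, P3))      # equals (sum of theta entries)^3
print(len(P3), ones_in_G2, expected_edges, len(edges))
```

Materializing the full probability matrix costs O(N_k^2) memory, so this sketch is only practical for small k.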


Sources for generator

Generators
Erdős-Rényi: http://ladamic.com/netlearn/NetLogo501/ErdosRenyiDegDist.html
BRITE: http://www.cs.bu.edu/brite/
INET: http://topology.eecs.umich.edu/inet
Kronecker:
[email protected]
http://www.cc.gatech.edu/dimacs10/archive/kronecker.shtml


Take-home messages

Graph
Motivations
Patterns
Graph aspects
Graph types
Properties
Graph modeling
Network generation
Erdős-Rényi model
Barabasi Albert model
Kronecker model
