SNA-Graph Essentials


SOCIAL MEDIA MINING
Graph Essentials
Bridges of Konigsberg

• There are 2 islands and 7 bridges that connect the islands and the mainland
• Find a path that crosses each bridge exactly once

[Figures: City Map (from Wikipedia); Graph Representation]

Social Media Mining http://socialmediamining.info/ Graph Essentials
Modeling the Problem by Graph Theory

• The key to solving this problem is an ingenious graph representation
• Euler proved that, except for the starting and ending points of a walk, one has to enter and leave every node, so these nodes must have an even number of bridges connected to them
• This property does not hold in this problem
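
Euler's degree argument can be checked with a short sketch. The land-mass labels A to D and the bridge list below are an assumed encoding of the usual Königsberg layout, not something taken from the slides.

```python
from collections import Counter

# Hypothetical labels: A and D are the islands, B and C the river banks.
# Each pair is one bridge; repeated pairs are parallel bridges.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

def odd_degree_nodes(edges):
    """Return the nodes touched by an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)

odd = odd_degree_nodes(bridges)
# A walk using every edge exactly once needs 0 or 2 odd-degree nodes
# (the graph must also be connected, which holds here).
has_euler_walk = len(odd) in (0, 2)
print(odd, has_euler_walk)
```

With this encoding every land mass has odd degree, so no such walk exists.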
Networks

• A network is a graph.
– Elements of the network have meanings
• Network problems can usually be represented in
terms of graph theory

Twitter example:
• Given a piece of information, a
network of individuals, and the
cost to propagate information
among any connected pair, find
the minimum cost to disseminate
the information to all individuals.
Food Web

Networks are Pervasive
Twitter Networks

Citation Networks
Internet

Network of the US Interstate Highways

NY State Road Network

Social Networks and Social Network Analysis

• A social network
– A network where elements have a social structure
• A set of actors (such as individuals or organizations)
• A set of ties (connections between individuals)

• Social network examples:
– your family network, your friend network, your colleagues, etc.

• To analyze these networks we can use Social Network Analysis (SNA)

• Social Network Analysis is an interdisciplinary field drawing on social sciences, statistics, graph theory, complex networks, and now computer science

Social Networks: Examples
High school dating

High school friendship

Graph Basics

Nodes and Edges

A network is a graph, or a collection of points connected by lines
• Points are referred to as nodes, actors, or
vertices (plural of vertex)
• Connections are referred to as edges or ties

Node
Edge

Nodes or Actors

• In a friendship social graph, nodes are people and any pair of people connected denotes the friendship between them
• Depending on the context, these nodes are called nodes, or actors
– In a web graph, “nodes” represent sites and the connection between nodes indicates web-links between them
– In a social setting, these nodes are called actors
– The size of the graph is $|V| = n$


Edges

• Edges connect nodes and are also known as ties or relationships
• In a social setting, where nodes represent social entities such as people, edges indicate internode relationships and are therefore known as relationships or (social) ties
• The number of edges (size of the edge set) is denoted as $|E| = m$
Directed Edges and Directed Graphs

• Edges can have directions. A directed edge is sometimes called an arc
• Edges are represented using their end-points, e.g., $e(v_i, v_j)$
• In undirected graphs, both representations, $e(v_i, v_j)$ and $e(v_j, v_i)$, are the same

Neighborhood and Degree (In-degree, out-degree)

For any node $v$ in an undirected graph, the set of nodes it is connected to via an edge is called its neighborhood and is represented as $N(v)$
– In directed graphs we have incoming neighbors $N_{in}(v)$ (nodes that connect to $v$) and outgoing neighbors $N_{out}(v)$ (nodes that $v$ connects to)

The number of edges connected to one node is the degree of that node (the size of its neighborhood)
– The degree of a node $i$ is usually denoted $d_i$

In directed graphs:
– In-degree is the number of edges pointing towards a node
– Out-degree is the number of edges pointing away from a node

Degree and Degree Distribution

• Theorem 1. The summation of degrees in an undirected graph is twice the number of edges: $\sum_i d_i = 2|E|$
• Lemma 1. The number of nodes with odd degree is even
• Lemma 2. In any directed graph, the summation of in-degrees is equal to the summation of out-degrees: $\sum_i d_i^{in} = \sum_j d_j^{out}$
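
The three statements above can be verified numerically on a small example. This is only a sanity check on hypothetical edge lists, not a proof.

```python
# Hypothetical undirected edge list, used only for illustration.
undirected_edges = [(1, 2), (1, 3), (2, 3), (3, 4)]

def degrees(edges):
    """Map each node to its degree in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

deg = degrees(undirected_edges)
degree_sum = sum(deg.values())                           # Theorem 1: 2 * |E|
odd_count = sum(1 for d in deg.values() if d % 2 == 1)   # Lemma 1: even

# Hypothetical directed edge list (u, v) meaning u -> v.
directed_edges = [(1, 2), (2, 3), (3, 1), (1, 3)]
in_deg, out_deg = {}, {}
for u, v in directed_edges:
    out_deg[u] = out_deg.get(u, 0) + 1
    in_deg[v] = in_deg.get(v, 0) + 1
in_sum, out_sum = sum(in_deg.values()), sum(out_deg.values())  # Lemma 2
```

Every edge contributes one unit to two degrees (or to one in-degree and one out-degree), which is why the sums match.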

Degree Distribution

When dealing with very large graphs, how nodes’ degrees are distributed is an important concept to analyze and is called the degree distribution

(Degree sequence)

$p_d = n_d / n$, where $n_d$ is the number of nodes with degree $d$ and $n$ is the number of nodes
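
As a minimal sketch, the degree distribution can be computed by counting how many nodes have each degree and dividing by the number of nodes. The function name and the sample degree list are assumptions made for illustration.

```python
from collections import Counter

def degree_distribution(degrees):
    """Return {d: fraction of nodes whose degree is d}."""
    n = len(degrees)
    counts = Counter(degrees)
    return {d: counts[d] / n for d in sorted(counts)}

# Hypothetical degree sequence of a 6-node graph.
dist = degree_distribution([1, 1, 1, 2, 2, 4])
```

Plotting the keys on the x-axis against the values on the y-axis gives a degree distribution plot.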

Degree Distribution Plot

The $x$-axis represents the degree and the $y$-axis represents the fraction of nodes having that degree
– On social networking sites, there exist many users with few connections and a handful of users with very large numbers of friends (power-law degree distribution)

[Figure: Facebook degree distribution]
Subgraph

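• Graph $G$ can be represented as a pair $G(V, E)$, where $V$ is the node set and $E$ is the edge set
• $G'(V', E')$ is a subgraph of $G(V, E)$ if $V' \subseteq V$ and $E' \subseteq E$, where every edge in $E'$ joins two nodes of $V'$

[Figure: a graph and one of its subgraphs]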

Graph Representation

• Adjacency Matrix
• Adjacency List
• Edge List
Graph Representation

• The pictorial graph representation is straightforward and intuitive, but it cannot be effectively manipulated using mathematical and computational tools
• We are seeking representations that can store the node and edge sets in a way that
– Does not lose information
– Can be manipulated easily by computers
– Can have mathematical methods applied easily

Adjacency Matrix (a.k.a. sociomatrix)

$$A_{ij} = \begin{cases} 1, & \text{if there is an edge between nodes } v_i \text{ and } v_j \\ 0, & \text{otherwise} \end{cases}$$

Diagonal entries are self-links or loops

Social media networks have very sparse adjacency matrices
Adjacency List

• In an adjacency list, for every node we maintain a list of all the nodes that it is connected to
• The list is usually sorted based on the node
order or other preferences

Edge List

• In this representation, each element is an edge and is usually represented as $(u, v)$, denoting that node $u$ is connected to node $v$ via an edge
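
The three representations carry the same information, so one can be converted into another. The sketch below assumes an undirected graph whose nodes are numbered 0 to n-1; the helper names are hypothetical.

```python
def to_adjacency_list(n, edges):
    """Build {node: sorted list of neighbors} from an undirected edge list."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return {v: sorted(nbrs) for v, nbrs in adj.items()}

def to_adjacency_matrix(n, edges):
    """Build a symmetric 0/1 matrix from an undirected edge list."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1
        A[v][u] = 1
    return A

# Hypothetical 4-node graph given as an edge list.
edge_list = [(0, 1), (0, 2), (1, 2), (2, 3)]
adj_list = to_adjacency_list(4, edge_list)
adj_matrix = to_adjacency_matrix(4, edge_list)
```

For sparse graphs such as social networks, the adjacency list and edge list need far less memory than the n-by-n matrix.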

Types of Graphs

• Null, Empty,
Directed/Undirected/Mixed,
Simple/Multigraph, Weighted,
Signed Graph, Webgraph
Null Graph and Empty Graph

• A null graph is one where the node set is empty (there are no nodes)
– Since there are no nodes, there are also no edges
• An empty graph or edge-less graph is one where the edge set is empty
• The node set can be non-empty
– A null graph is an empty graph

Directed/Undirected/Mixed Graphs

• The adjacency matrix for directed graphs is often not symmetric ($A \neq A^T$)
– $A_{ij} \neq A_{ji}$
– We can have equality though
• The adjacency matrix for undirected graphs is symmetric ($A = A^T$)

Simple Graphs and Multigraphs

• Simple graphs are graphs where only a single edge can be between any pair of nodes
• Multigraphs are graphs where you can have multiple edges between two nodes and loops

[Figures: Simple graph; Multigraph]

• The adjacency matrix for multigraphs can include numbers larger than one, indicating multiple edges between nodes

Weighted Graph

• A weighted graph $G(V, E, W)$ is one where edges are associated with weights
– For example, a graph could represent a map where nodes are airports and edges are routes between them
• The weight associated with each edge could represent the distance between the corresponding cities

$$A_{ij} = \begin{cases} w_{ij} \text{ or } w(i, j), & w \in \mathbb{R} \\ 0, & \text{there is no edge between } v_i \text{ and } v_j \end{cases}$$

Signed Graph

• When weights are binary (0/1, -1/1, +/-) we have a signed graph
• It is used to represent friends or foes
• It is also used to represent social status

Webgraph

• A webgraph is a way of representing how internet sites are connected on the web
• In general, a web graph is a directed
multigraph
• Nodes represent sites and edges represent
links between sites.
• Two sites can have multiple links pointing to
each other and can have loops (links pointing
to themselves)

Webgraph

Bow-tie structure

Government Agencies

Broder et al –
200 million pages, 1.5 billion links
Connectivity in Graphs

• Adjacent nodes/Edges,
Walk/Path/Trail/Tour/Cycle

Adjacent nodes and Incident Edges

Two nodes are adjacent if they are connected via an edge.

Two edges are incident if they share one end-point.

When the graph is directed, edge directions must match for edges to be incident.

An edge in a graph can be traversed when one starts at one of its end-nodes, moves along the edge, and stops at its other end-node.
Walk, Path, Trail, Tour, and Cycle

Walk: A walk is a sequence of incident edges visited one after another
– Open walk: A walk does not end where it starts
– Closed walk: A walk returns to where it starts

• Representing a walk:
– A sequence of edges: 𝑒1, 𝑒2, … , 𝑒𝑛
– A sequence of nodes: 𝑣1, 𝑣2, … , 𝑣𝑛

• Length of walk:
the number of visited edges

Length of walk= 8

Trail

• A trail is a walk where no edge is visited more than once, i.e., all walk edges are distinct
• A closed trail (one that ends where it starts) is called a tour or circuit

Path

• A walk where nodes and edges are distinct is called a path and a closed path is called a cycle
• The length of a path or cycle is the number of
edges visited in the path or cycle

Length of path= 4

Examples

Eulerian Tour
• All edges are traversed exactly once
– Konigsberg bridges

Hamiltonian Cycle
• A cycle that visits all nodes

Random walk

• A walk in which, at each step, the next node is selected randomly among the neighbors
– The weight of an edge can be used to define the probability of visiting it
– For all edges that start at $v_i$ the following equation holds: $\sum_x w_{i,x} = 1$, with $w_{i,j} \geq 0$ for all $i, j$
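
A weighted random walk can be sketched as follows, assuming the outgoing edge weights of each node are probabilities that sum to 1. The graph `weights` and the function `random_walk` are hypothetical; a seeded `random.Random` is passed in so the run is repeatable.

```python
import random

# Hypothetical weighted directed graph: weights[u][v] is the probability
# of stepping from u to v; each node's outgoing weights sum to 1.
weights = {
    "a": {"b": 0.5, "c": 0.5},
    "b": {"a": 1.0},
    "c": {"a": 0.25, "b": 0.75},
}

def random_walk(weights, start, steps, rng):
    """Return the list of nodes visited by a walk of `steps` edges."""
    walk = [start]
    for _ in range(steps):
        current = walk[-1]
        neighbors = list(weights[current])
        probs = [weights[current][v] for v in neighbors]
        walk.append(rng.choices(neighbors, weights=probs, k=1)[0])
    return walk

walk = random_walk(weights, "a", 10, random.Random(0))
```

An unweighted random walk is the special case where every neighbor of a node gets the same probability.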

Random Walk: Example

Mark a spot on the ground


– Stand on the spot and flip the coin (or more than one
coin depending on the number of choices such as left,
right, forward, and backward)
– If the coin comes up heads, turn to the right and take a
step
– If the coin comes up tails, turn to the left and take a step
– Keep doing this many times and see where you end up

Connectivity

• A node $v_i$ is connected to node $v_j$ (or $v_j$ is reachable from $v_i$) if it is adjacent to it or there exists a path from $v_i$ to $v_j$

• A graph is connected if there exists a path between any pair of nodes in it
– A directed graph is strongly connected if there exists a directed path between any pair of nodes
– A directed graph is weakly connected if there exists a path between any pair of nodes, without following the edge directions

• A graph is disconnected if it is not connected


Connectivity: Example

Component

• A component in an undirected graph is a connected subgraph, i.e., there is a path between every pair of nodes inside the component

• In directed graphs, we have a strongly connected component when there is a path from $u$ to $v$ and one from $v$ to $u$ for every pair of nodes $u$ and $v$

• The component is weakly connected if replacing directed edges with undirected edges results in a connected component

Component Examples:

[Figures: a graph with 3 components; a directed graph with 3 strongly connected components]

Shortest Path

• The shortest path is the path between two nodes that has the shortest length
– We denote the length of the shortest path between nodes $v_i$ and $v_j$ as $l_{i,j}$

• The concept of the neighborhood of a node can be generalized using shortest paths. An n-hop neighborhood of a node is the set of nodes that are within n hops distance from the node

Diameter

The diameter of a graph is the length of the longest shortest path between any pair of nodes in the graph

• How big is the diameter of the web?

Adjacency Matrix and Connectivity

• Consider the adjacency matrix $A$ of an undirected graph
• The number of common neighbors of node $i$ and node $j$ is element $[ij]$ of the matrix $A \times A^T = A^2$ (for an undirected graph, $A = A^T$)
• Common neighbors correspond to paths of length 2
• Similarly, what is $A^3$?
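
A small worked example may help here. The sketch below multiplies a hypothetical 4-node symmetric adjacency matrix by itself in plain Python; entry `[i][j]` of the square counts common neighbors (walks of length 2), and entry `[i][j]` of the cube counts walks of length 3.

```python
def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

# Hypothetical undirected graph with edges 0-1, 0-2, 1-2, 2-3.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]

A2 = matmul(A, A)   # A2[i][j] = number of common neighbors of i and j
A3 = matmul(A2, A)  # A3[i][j] = number of walks of length 3 from i to j
```

The diagonal of `A2` holds the node degrees, because each neighbor gives one walk of length 2 that leaves a node and comes straight back.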

Special Graphs

Trees and Forests

• Trees are special cases of undirected graphs
• A tree is a graph structure that has no cycle in it
• In a tree, there is exactly one path between any
pair of nodes
• In a tree: |𝑉| = |𝐸| + 1

• A set of disconnected
trees is called a forest

A forest containing 3 trees


Special Subgraphs

Spanning Trees

• For any connected graph, a spanning tree is a subgraph that is a tree and includes all the nodes of the graph

• There may exist multiple spanning trees for a graph

• In a weighted graph, the weight of a spanning tree is the summation of the edge weights in the tree

• Among the many spanning trees found for a weighted graph, the one with the minimum weight is called the minimum spanning tree (MST)

Steiner Trees

Given a weighted graph $G(V, E, W)$ and a subset of nodes $V' \subseteq V$ (terminal nodes), the Steiner tree problem aims to find a tree such that it spans all the $V'$ nodes and the weight of this tree is minimized

What can be the terminal set here?

Complete Graphs

• A complete graph is a graph where for a set of nodes $V$, all possible edges exist in the graph
• In a complete graph, any pair of nodes are
connected via an edge

Planar Graphs

A graph that can be drawn in such a way that no two edges cross each other (other than at the endpoints) is called planar

Planar Graph Non-planar Graph

Bipartite Graphs

A bipartite graph $G(V, E)$ is a graph where the node set can be partitioned into two sets such that, for all edges, one end-point is in one set and the other end-point is in the other set.

Affiliation Networks

An affiliation network is a bipartite graph. If an individual is associated with an affiliation, an edge connects the corresponding nodes.

Affiliation Networks: Membership

Affiliation of people on corporate boards of directors

[Figure: bipartite graph with node sets People and Companies]

Bipartite Representation / one-mode Projections

• We can save some space by keeping the membership matrix $X$
– What is $XX^T$? Similarity between users - [Bibliographic Coupling]
– What is $X^TX$? Similarity between groups - [Co-citation]

Elements on the diagonal of $XX^T$ are the number of groups the user is a member of; elements on the diagonal of $X^TX$ are the number of users in the group
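
The two one-mode projections can be computed directly from the membership matrix. The sketch below assumes rows of `X` are users and columns are groups; the 3-by-2 matrix and helper names are hypothetical.

```python
def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

# Hypothetical membership matrix: X[u][g] = 1 if user u belongs to group g.
X = [[1, 0],
     [1, 1],
     [0, 1]]

user_similarity = matmul(X, transpose(X))   # shared groups per user pair
group_similarity = matmul(transpose(X), X)  # shared users per group pair
```

Off-diagonal entries count shared memberships, and the diagonals recover the per-user and per-group totals.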
Social-Affiliation Network

A social-affiliation network is a combination of a social network and an affiliation network

Regular Graphs

• A regular graph is one in which all nodes have the same degree
• Regular graphs can be connected or disconnected
• In a $k$-regular graph, all nodes have degree $k$
• Complete graphs are examples of regular graphs

[Figure: regular graph with $k = 3$]

Egocentric Networks

• Egocentric network: A focal actor (ego) and a set of alters who have ties with the ego

• Usually there are limitations for nodes to connect to other nodes or have relations with other nodes
– Example: In a network of mothers and their children:
• Each mother only holds mother-children relations with her own children

• Additional examples of egocentric networks are Teacher-Student or Husband-Wife
Bridges (cut-edges)

• Bridges are edges whose removal will increase the number of connected components

Graph Algorithms

Graph/Network Traversal Algorithms

Graph/Tree Traversal

• We are interested in surveying a social media site to compute the average age of its users
– Start from one user;
– Employ some traversal technique to reach her friends and then friends’ friends, …

• The traversal technique guarantees that
1. All users are visited; and
2. No user is visited more than once.

• There are two main techniques:
– Depth-First Search (DFS)
– Breadth-First Search (BFS)

Depth-First Search (DFS)

• Depth-First Search (DFS) starts from a node $v_i$, selects one of its neighbors $v_j$ from $N(v_i)$ and performs Depth-First Search on $v_j$ before visiting other neighbors in $N(v_i)$

• The algorithm can be used both for trees and graphs
– The algorithm can be implemented using a stack structure
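
A minimal stack-based sketch of DFS is given below. It is one common iterative formulation and not necessarily the pseudocode on the "DFS Algorithm" slide; the adjacency-list graph `dfs_graph` is hypothetical.

```python
def dfs(adj, start):
    """Return nodes in the order a stack-based depth-first search visits them."""
    visited, order = set(), []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # a node may be pushed more than once; visit it only once
        visited.add(node)
        order.append(node)
        # Push in reverse so the first listed neighbor is explored first.
        for nbr in reversed(adj[node]):
            if nbr not in visited:
                stack.append(nbr)
    return order

# Hypothetical undirected graph as an adjacency list.
dfs_graph = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
dfs_order = dfs(dfs_graph, 1)
```

The search goes deep along 1, 2, 4 before it returns to the remaining neighbors.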

DFS Algorithm

Depth-First Search (DFS): An Example

Breadth-First Search (BFS)

• BFS starts from a node and visits all its immediate neighbors first, and then moves to the second level by traversing their neighbors

• The algorithm can be used both for trees and graphs
– The algorithm can be implemented using a queue structure
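
A minimal queue-based sketch of BFS follows, using `collections.deque` as the queue. It is one common formulation and not necessarily the pseudocode on the "BFS Algorithm" slide; the graph `bfs_graph` is hypothetical.

```python
from collections import deque

def bfs(adj, start):
    """Return nodes in the order a queue-based breadth-first search visits them."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in adj[node]:
            if nbr not in visited:
                visited.add(nbr)  # mark when enqueued so nothing is queued twice
                queue.append(nbr)
    return order

# Hypothetical undirected graph as an adjacency list.
bfs_graph = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
bfs_order = bfs(bfs_graph, 1)
```

Nodes come out level by level: the start node, then its neighbors, then nodes two hops away, and so on.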

BFS Algorithm

Breadth-First Search (BFS)

Finding Shortest Paths

Shortest Path

When a graph is connected, multiple paths may exist between any pair of nodes
– In many scenarios, we want the shortest path between two nodes in a graph
• How fast can I disseminate information on social media?

Dijkstra’s Algorithm
– Designed for weighted graphs with non-negative edge weights
– It finds shortest paths that start from a provided node $s$ to all other nodes
– It finds both shortest paths and their respective lengths

Dijkstra’s Algorithm: Finding the shortest path

1. Initiation:
– Assign zero to the source node and infinity to all other nodes
– Mark all nodes as unvisited
– Set the source node as current

2. For the current node, consider all of its unvisited neighbors and calculate their tentative distances (tentative distance = current distance + edge weight)
– If the tentative distance is smaller than the neighbor’s distance, then neighbor’s distance = tentative distance

3. After considering all of the neighbors of the current node, mark the current node as visited and remove it from the unvisited set
– A visited node will never be checked again, and its distance recorded now is final and minimal

4. If the destination node has been marked visited or if the smallest tentative distance among the nodes in the unvisited set is infinity, then stop

5. Set the unvisited node marked with the smallest tentative distance as the next "current node" and go to step 2
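
The steps above can be sketched as follows. This version swaps the "scan the unvisited set for the smallest tentative distance" step for a binary heap (`heapq`), a common implementation choice that the slides do not prescribe. The graph `road` and the helper `path_to` are hypothetical.

```python
import heapq

def dijkstra(graph, source):
    """Return (dist, prev) for shortest paths from source; weights must be >= 0."""
    dist = {v: float("inf") for v in graph}
    prev = {v: None for v in graph}
    dist[source] = 0
    visited = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)   # unvisited node with smallest distance
        if u in visited:
            continue                 # stale heap entry
        visited.add(u)               # its distance is now final
        for v, w in graph[u].items():
            tentative = d + w        # current distance + edge weight
            if tentative < dist[v]:
                dist[v] = tentative
                prev[v] = u
                heapq.heappush(heap, (tentative, v))
    return dist, prev

def path_to(prev, target):
    """Rebuild the path to target by following predecessor links."""
    path = []
    while target is not None:
        path.append(target)
        target = prev[target]
    return path[::-1]

# Hypothetical undirected weighted graph: road[u][v] = weight of edge u-v.
road = {"s": {"a": 4, "b": 1},
        "a": {"s": 4, "b": 2, "t": 5},
        "b": {"s": 1, "a": 2, "t": 8},
        "t": {"a": 5, "b": 8}}
dist, prev = dijkstra(road, "s")
shortest = path_to(prev, "t")
```

To stop early for a single destination, one could return as soon as the destination is popped from the heap.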

Dijkstra’s Algorithm: Execution Example

Dijkstra’s Algorithm: Notes

• Dijkstra’s algorithm is source-dependent
– Finds the shortest paths between the source node and all other nodes

• To generate all-pair shortest paths,
– We can run Dijkstra’s algorithm $n$ times, or
– Use other algorithms such as the Floyd-Warshall algorithm

• If we want to compute the shortest path from source $v$ to destination $d$,
– we can stop the algorithm once the shortest path to the destination node has been determined

Finding Minimum Spanning Tree

Prim’s Algorithm: Finding Minimum Spanning Tree

Finds the MST in a weighted graph

1. Select a random node and add it to the MST
2. Grow the spanning tree by selecting edges which have one endpoint in the existing spanning tree and one endpoint among the nodes that are not selected yet. Among the possible edges, the one with the minimum weight is added to the set (along with its end-point)
3. Iterate this process until the graph is fully spanned
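
A minimal sketch of Prim's algorithm follows, assuming a connected undirected graph stored as nested dictionaries. It takes the start node as an argument rather than picking one at random, and uses a heap to find the minimum-weight crossing edge; both are implementation choices, and the graph `net` is hypothetical.

```python
import heapq

def prim(graph, start):
    """Return MST edges (u, v, weight) of a connected undirected weighted graph."""
    in_tree = {start}
    tree_edges = []
    # Candidate edges with one endpoint in the tree, ordered by weight.
    heap = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # both endpoints already in the tree
        in_tree.add(v)
        tree_edges.append((u, v, w))
        for x, wx in graph[v].items():
            if x not in in_tree:
                heapq.heappush(heap, (wx, v, x))
    return tree_edges

# Hypothetical undirected weighted graph: net[u][v] = weight of edge u-v.
net = {"s": {"a": 4, "b": 1},
       "a": {"s": 4, "b": 2, "t": 5},
       "b": {"s": 1, "a": 2, "t": 8},
       "t": {"a": 5, "b": 8}}
mst = prim(net, "s")
mst_weight = sum(w for _, _, w in mst)
```

A spanning tree of a graph with n nodes always has n - 1 edges, which gives a quick check on the result.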

Prim’s Algorithm Execution Example

Network Flow

Network Flow

• Consider a network of pipes that connects an infinite water source to a water sink
– Given the capacity of these pipes, what is the maximum flow that can be sent from the source to the sink?

• Parallel in social media:
– Users have daily cognitive/time limits (the capacity, here) of sending messages (the flow) to others
– What is the maximum number of messages the network should be prepared to handle at any time?

Flow Network

• A flow network $G(V, E, C)$ is a directed weighted graph, where we have the following:
– $\forall (u, v) \in E$, $c(u, v) \geq 0$ defines the edge capacity
– When $(u, v) \in E$, $(v, u) \notin E$ (opposite flow is impossible)
– $s$ defines the source node and $t$ defines the sink node. An infinite supply of flow is connected to the source

Flow

• Given edges with certain capacities, we can fill these edges with flow up to their capacities (capacity constraint)
• The flow that enters any node other than source $s$ and sink $t$ is equal to the flow that exits it, so that no flow is lost (flow conservation constraint)

• $\forall (u, v) \in E$, $f(u, v) \geq 0$ defines the flow passing through the edge
• $\forall (u, v) \in E$, $0 \leq f(u, v) \leq c(u, v)$ (capacity constraint)
• $\forall v \in V - \{s, t\}$, $\sum_{k:(k,v) \in E} f(k, v) = \sum_{l:(v,l) \in E} f(v, l)$ (flow conservation constraint)
A Sample Flow Network

• Commonly, to visualize an edge with capacity $c$ and flow $f$, we use the notation $f/c$

Flow Quantity

• The flow quantity (or value of the flow) in any network is the amount of
– Outgoing flow from the source minus the incoming flow to the source
– Alternatively, one can compute this value by subtracting the sink’s outgoing flow from its incoming flow

What is the flow value?

• 19
– 11+8 from s, or
– 4+15 to t

Ford-Fulkerson Algorithm

• Find a path from source to sink such that there is unused capacity for all edges in the path
• Use that capacity (the minimum capacity unused among all edges on the path) to increase the flow
• Iterate until no other path is available
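
The loop above can be sketched in code. This version finds each path with breadth-first search, which is the Edmonds-Karp variant of Ford-Fulkerson rather than the generic method, and it tracks unused capacity in a residual dictionary that also holds reverse edges so flow can be pushed back. The network `pipes` is hypothetical.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Return the maximum s-t flow; capacity[u][v] is the capacity of edge u -> v."""
    # residual[u][v] = capacity still unused from u to v.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edge
    flow = 0
    while True:
        # BFS for a path from s to t that has unused capacity on every edge.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path is left
        # The minimum unused capacity on the path limits the extra flow.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Use that capacity along the path and allow it to be undone later.
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical flow network with source "s" and sink "t".
pipes = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}, "t": {}}
flow_value = max_flow(pipes, "s", "t")
```

Here the edges into `t` have total capacity 5, and the algorithm reaches that bound.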

Residual Network

• Given a flow network $G(V, E, C)$, we define another network $G_R(V, E_R, C_R)$

• This network defines how much capacity remains in the original network

• The residual network has an edge between nodes $u$ and $v$ if and only if either $(u, v)$ or $(v, u)$ exists in the original graph
– If one of these two exists in the original network, we would have two edges in the residual network: one from $u$ to $v$ and one from $v$ to $u$

Intuition

• When there is no flow going through an edge in the original network, a flow of as much as the capacity of the edge remains in the residual

• In the residual network, one has the ability to send flow in the opposite direction to cancel some amount of flow in the original network

Residual Network (Example)

• Edges that have zero capacity in the residual are not shown

Augmentation / Augmenting Paths

1. In the residual graph, when edges are in the same direction as the original graph,
– their capacity shows how much more flow can be pushed along that edge in the original graph

2. When edges are in the opposite direction,
– their capacities show how much flow can be pushed back on the original graph edge

• By finding a flow in the residual, we can augment the flow in the original graph

Augmentation / Augmenting Paths

• Any simple path from 𝑠 to 𝑡 in the residual graph is an augmenting path.

– All capacities in the residual are positive, so
• these paths can augment flows in the original, thus increasing the flow.

– The amount of flow that can be pushed along this path is equal to the minimum capacity along the path
• The edge with the minimum capacity limits the amount of flow being pushed
• We call this edge the weak link
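A small sketch of finding the weak link of an augmenting path. The helper name `weak_link` and the node-list path format are assumptions made for this example.

```python
def weak_link(residual, path):
    """Return (edge, capacity) for the minimum-capacity edge on a path.

    residual: dict mapping a directed edge (u, v) to its residual capacity.
    path: list of nodes from source to sink, e.g. ["s", "a", "t"].
    """
    edges = list(zip(path, path[1:]))          # consecutive node pairs
    edge = min(edges, key=lambda e: residual[e])
    return edge, residual[edge]


residual = {("s", "a"): 3, ("a", "b"): 1, ("b", "t"): 2}
print(weak_link(residual, ["s", "a", "b", "t"]))
```

The returned capacity is the amount of flow that can be pushed along the whole path.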

How do we augment?

• Given flow 𝑓(𝑢, 𝑣) in the original graph and flow 𝑓𝑅(𝑢, 𝑣) and 𝑓𝑅(𝑣, 𝑢) in the residual graph, we can augment the flow as follows:

Flow Quantity: 1
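One common way to write the update, assumed here rather than taken from the slide, is to add the residual flow in the same direction and subtract the residual flow in the opposite direction: new 𝑓(𝑢, 𝑣) = 𝑓(𝑢, 𝑣) + 𝑓𝑅(𝑢, 𝑣) − 𝑓𝑅(𝑣, 𝑢). The sketch below applies that rule; the function name `augment` is hypothetical, and the example assumes the original graph has no pair of edges running in both directions between the same two nodes.

```python
def augment(flow, residual_flow):
    """Combine a flow in the original graph with a flow found in the residual.

    flow: dict mapping each original edge (u, v) to its current flow.
    residual_flow: dict mapping residual edges to the flow pushed on them.
    """
    new_flow = {}
    for (u, v), f in flow.items():
        pushed_forward = residual_flow.get((u, v), 0)   # extra flow on (u, v)
        pushed_back = residual_flow.get((v, u), 0)      # flow cancelled on (u, v)
        new_flow[(u, v)] = f + pushed_forward - pushed_back
    return new_flow


flow = {("s", "a"): 1, ("a", "b"): 1, ("b", "t"): 1, ("s", "b"): 0, ("a", "t"): 0}
# One unit pushed along s -> b -> a -> t in the residual; (b, a) is a backward edge.
residual_flow = {("s", "b"): 1, ("b", "a"): 1, ("a", "t"): 1}
print(augment(flow, residual_flow))
```

In this example the unit of flow on (a, b) is cancelled and rerouted, so the total flow leaving s rises from 1 to 2.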

Augmenting

The Ford-Fulkerson Algorithm

Maximum Bipartite Matching

Example

• Given 𝑛 products and 𝑚 users
– Some users are only interested in certain products
– We have only one copy of each product.
– Can be represented as a bipartite graph
– Find the maximum number of products that can be bought by users
• No two edges selected share a node

Figure panels: Matching, Maximum Matching

Matching Solved with Max-Flow

• Create a flow graph 𝐺(𝑉’, 𝐸’, 𝐶) from our bipartite graph 𝐺(𝑉, 𝐸)

1. Set 𝑉’ = 𝑉 ∪ {𝑠, 𝑡}
2. Connect 𝑠 to all nodes in 𝑉𝐿 and all nodes in 𝑉𝑅 to 𝑡
3. Set 𝑐(𝑢, 𝑣) = 1, for all edges in 𝐸’
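The construction above can be sketched end to end in Python. Everything named here is an assumption for illustration: the functions `max_flow` and `max_bipartite_matching`, the breadth-first path search, and the labels `"__s__"` and `"__t__"` for the added source and sink (assumed not to clash with real node names). The sketch also assumes left and right node names are distinct and that the edge list has no duplicates.

```python
from collections import defaultdict, deque


def max_flow(capacity, s, t):
    """Augmenting-path max flow; returns (flow value, residual capacities)."""
    residual = defaultdict(int)
    adj = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)
    total = 0
    while True:
        # Breadth-first search for a path with unused capacity.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total, residual
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        total += bottleneck


def max_bipartite_matching(left, right, edges):
    """Reduce matching to max flow: s -> left -> right -> t, all capacities 1."""
    s, t = "__s__", "__t__"           # hypothetical labels for the added nodes
    capacity = {}
    for u in left:
        capacity[(s, u)] = 1
    for v in right:
        capacity[(v, t)] = 1
    for u, v in edges:
        capacity[(u, v)] = 1
    size, residual = max_flow(capacity, s, t)
    # A unit-capacity edge with no residual capacity left carries flow 1.
    matching = [(u, v) for u, v in edges if residual[(u, v)] == 0]
    return size, matching


users = ["a", "b"]
products = ["x", "y"]
interests = [("a", "x"), ("a", "y"), ("b", "x")]
print(max_bipartite_matching(users, products, interests))
```

Because every capacity is 1 and the flow stays integral, each user and each product appears in at most one selected edge, so the saturated edges form a matching.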

Bridges, Weak Ties, and Bridge Detection

Bridge and a Local Bridge

• Bridge: an edge whose removal increases the number of connected components
– Bridges are extremely rare in real-world social networks.

• Local Bridge: an edge whose endpoints have no friend in common
– Removing the edge increases the shortest-path distance between its endpoints to more than 2
– Span of the local bridge: the distance between the endpoints once the edge is removed
• Large span is desirable to find communities

Source: Easley and Kleinberg – Networks, Crowds, and Markets
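The definitions of bridge, local bridge, and span can be checked directly by removing an edge and measuring the new distance between its endpoints. This is a simple sketch under assumed names (`span`, `classify_edge`) and an assumed adjacency format (a dict of neighbor sets for an undirected graph); it is one straightforward way to test a single edge, not necessarily the detection algorithm the deck presents.

```python
import math
from collections import deque


def span(adj, u, v):
    """Distance between u and v once edge (u, v) is removed.

    adj: dict mapping each node to the set of its neighbors (undirected).
    Returns math.inf if u and v become disconnected.
    """
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if {x, y} == {u, v}:      # pretend the edge is not there
                continue
            if y not in dist:
                dist[y] = dist[x] + 1
                if y == v:
                    return dist[y]
                queue.append(y)
    return math.inf


def classify_edge(adj, u, v):
    d = span(adj, u, v)
    if d == math.inf:
        return "bridge"               # removal disconnects the endpoints
    if d > 2:
        return "local bridge"         # endpoints share no friend
    return "neither"


# Triangle a-b-c, joined by edge c-d to the 4-cycle d-e-f-g.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "g"}, "e": {"d", "f"}, "f": {"e", "g"}, "g": {"f", "d"},
}
print(classify_edge(adj, "c", "d"), span(adj, "d", "e"), classify_edge(adj, "a", "b"))
```

Note that a bridge also satisfies the local-bridge condition (its endpoints have no common friend); the sketch reports it separately only to distinguish the two cases.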

Strength of Ties

• Assume that you can divide connections into two categories:

– Strong ties (S):
• friends
– Weak ties (W):
• acquaintances

• Strong Triadic Closure:
– Consider a node 𝒖 that has two strong ties to nodes 𝒗 and 𝒘
– If there is no edge between 𝒗 and 𝒘 (weak or strong tie), then 𝒖 does not exhibit strong triadic closure
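A short sketch of testing this property for one node. The function name `violates_strong_triadic_closure` and the two-dictionary representation of strong and weak ties are assumptions made for the example.

```python
from itertools import combinations


def violates_strong_triadic_closure(u, strong, weak):
    """True if u has strong ties to some v and w with no edge between v and w.

    strong, weak: dicts mapping a node to the set of its strong-tie or
    weak-tie neighbors (both undirected, so each tie is listed both ways).
    """
    for v, w in combinations(strong.get(u, set()), 2):
        connected = w in strong.get(v, set()) or w in weak.get(v, set())
        if not connected:
            return True
    return False


strong = {"u": {"v", "w"}, "v": {"u"}, "w": {"u"}}
print(violates_strong_triadic_closure("u", strong, weak={}))
print(violates_strong_triadic_closure("u", strong, weak={"v": {"w"}, "w": {"v"}}))
```

In the first call no tie exists between v and w, so the property is violated; adding even a weak tie between them is enough to satisfy it.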

Connection between Bridges and Tie Strength

If a node exhibits Strong Triadic Closure and has at least two strong ties, then if it is part of a local bridge, that bridge must be a weak tie

Why?

Source: Easley and Kleinberg – Networks, Crowds, and Markets

Generalizing to Real-World Networks

• Consider a cell-phone network
– We have an edge if both endpoints call each other
– Tie Strength: it does not have to be weak/strong
• For (𝑢, 𝑣), the number of minutes 𝑢 and 𝑣 spent talking to each other on the phone
– Local Bridge: can be generalized using neighborhood overlap
• The numerator is called the embeddedness of an edge
• When the numerator is zero, we have a local bridge

Figure axes: Neighborhood Overlap, Tie Strength
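For reference, the neighborhood overlap of an edge (𝑢, 𝑣) is usually defined, following Easley and Kleinberg, as the number of nodes adjacent to both 𝑢 and 𝑣 divided by the number of nodes adjacent to at least one of them, not counting 𝑢 and 𝑣 themselves. The sketch below assumes that definition; the function name `neighborhood_overlap` and the adjacency format are hypothetical.

```python
def neighborhood_overlap(adj, u, v):
    """Return (embeddedness, overlap) for the edge (u, v).

    adj: dict mapping each node to the set of its neighbors (undirected).
    """
    common = adj[u] & adj[v]                    # shared neighbors
    either = (adj[u] | adj[v]) - {u, v}         # neighbors of at least one endpoint
    embeddedness = len(common)                  # the numerator
    overlap = embeddedness / len(either) if either else 0.0
    return embeddedness, overlap


adj = {
    "a": {"b", "c", "d"}, "b": {"a", "c", "e"}, "c": {"a", "b"},
    "d": {"a"}, "e": {"b", "f"}, "f": {"e"},
}
print(neighborhood_overlap(adj, "a", "b"))
print(neighborhood_overlap(adj, "b", "e"))
```

An edge whose embeddedness is zero, such as (b, e) in this example, is a local bridge; small but nonzero overlap marks edges that are "almost" local bridges.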


Bridge Detection
