03_NetworkStructure
• Across a range of network datasets, large, complex networks often have what is called a giant
component, a deliberately informal term for a connected component that contains a
significant fraction of all the nodes. Moreover, when a network contains a giant
component, it almost always contains only one.
• To see why, consider the global friendship network and try imagining that there were two giant
components, each with hundreds of millions of people. All it would take is a single edge from
someone in the first of these components to someone in the second, and the two giant
components would merge into a single component.
• Historically, the global social network likely contained two giant components: one in
the Americas, and one in the Europe-Asia land mass. Because of this,
technology evolved independently in the two components, and perhaps
even worse, human diseases evolved independently.
• When the two components finally came into contact, the technology
and diseases of one quickly and disastrously overwhelmed the other.
• The notion of giant components is useful for reasoning about networks
on much smaller scales as well.
Giant Component
• A giant component: “connected
component that contains a
significant fraction of all the nodes”
• Ex: Real-world social network
• Ex: Random graphs
Giant component(1)
• How large are the components in the G(n,p) model?
– p = 0: every node is isolated
– Component size = 1 (independent of n)
– p = 1: every node is connected to every other node
– Component size = n (proportional to n)
http://networkx.lanl.gov/archive/networkx-1.1/examples/drawing/giant_component.html
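The two extreme cases above can be checked with a small simulation. This is a minimal sketch using only the standard library; the function names `gnp_graph` and `component_sizes` are illustrative and not taken from the slides or from any library.

```python
import random
from collections import deque


def gnp_graph(n, p, seed=None):
    """Sample a G(n,p) random graph as a dict: node -> set of neighbours."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # each possible edge appears independently
                adj[u].add(v)
                adj[v].add(u)
    return adj


def component_sizes(adj):
    """Return the sizes of all connected components, largest first."""
    seen = set()
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)
```

Calling `component_sizes(gnp_graph(n, p))[0]` for intermediate values of p gives the size of the largest component, which can then be compared against n.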
Giant component (2)
• A network component whose size grows in proportion to n is
called a giant component
Giant component
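• Let u be the probability that a vertex is not connected to the giant component. Vertex i fails to
connect to the giant component via vertex j either because there is no edge between i and j
(probability $1 - p$) or because the edge exists but j itself is not in the giant component
(probability $pu$).
• Hence, the total probability of i not being connected to the
giant component via vertex j is: $1 - p + pu$
– Considering all n-1 vertices through which i can connect: $u = (1 - p + pu)^{n-1}$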
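The step from this per-vertex probability to the usual giant-component equation can be sketched as follows. This is the standard large-n argument for G(n,p) rather than something spelled out on the slide. The symbols are assumptions introduced here: u is the probability that a vertex is not in the giant component, c = (n-1)p is the mean degree, and S = 1 - u is the fraction of nodes in the giant component.

```latex
\begin{align*}
u &= (1 - p + pu)^{n-1} \\
\ln u &= (n-1)\,\ln\!\left(1 - \frac{c}{n-1}\,(1-u)\right), \qquad p = \frac{c}{n-1} \\
      &\approx -c\,(1-u) \quad \text{for large } n \\
u &= e^{-c(1-u)} \quad\Longrightarrow\quad S = 1 - e^{-cS}
\end{align*}
```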
Giant component
Average degree
Emergence of giant component
[Figure: emergence of the giant connected component in G(n,p) as p increases; axis label: mean degree]
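The shape of such a curve can be reproduced numerically. The sketch below assumes the standard large-n result for G(n,p), S = 1 - e^{-cS}, where c is the mean degree and S is the fraction of nodes in the giant component; neither the equation nor the function name `giant_component_fraction` appears in the surviving slide text.

```python
import math


def giant_component_fraction(c, iterations=1000):
    """Approximate the largest solution of S = 1 - exp(-c * S).

    Starts from S = 1 and repeatedly applies the right-hand side, which
    moves S down towards the largest fixed point.
    """
    s = 1.0
    for _ in range(iterations):
        s = 1.0 - math.exp(-c * s)
    return s
```

Evaluating the function on a grid of mean degrees shows S staying near zero for c below 1 and rising quickly above it. Convergence is slow close to c = 1, so values near that point are only approximate.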
Reference point-2: Regular
graphs
• Ring lattice with k connections to nearest
neighbors
Small world property
• The connection topology of a network is usually assumed to be either completely regular
or completely random.
• But many biological, technological and social networks lie somewhere
between these two extremes.
• Regular networks can be ‘rewired’ to introduce increasing amounts of
disorder.
• These systems can be highly ‘clustered’ like regular lattices, yet have
small characteristic path lengths like random graphs; they are called
“small world” networks.
• Ex: Power grids.
• Models of dynamical systems with small world
coupling display enhanced signal propagation speed, computational
power and synchronizability.
• In particular, infectious diseases spread more easily in small world
networks than in regular lattices.
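The rewiring idea can be tried out with a short experiment. This is a sketch under stated assumptions: the rewiring rule (each lattice edge is redirected with probability `beta` to a uniformly chosen new endpoint) is one common variant, and all function names and parameters are illustrative rather than taken from the slides.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each joined to its k nearest neighbours (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def rewire(adj, beta, seed=None):
    """Copy the graph, redirecting each original edge with probability beta."""
    rng = random.Random(seed)
    new = {v: set(nb) for v, nb in adj.items()}
    nodes = list(new)
    edges = [(v, w) for v in adj for w in adj[v] if v < w]
    for v, w in edges:
        if rng.random() < beta:
            # new endpoint must avoid self-loops and duplicate edges
            candidates = [x for x in nodes if x != v and x not in new[v]]
            if candidates:
                x = rng.choice(candidates)
                new[v].discard(w)
                new[w].discard(v)
                new[v].add(x)
                new[x].add(v)
    return new


def average_clustering(adj):
    """Mean over nodes of the fraction of linked neighbour pairs."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over ordered pairs of reachable nodes."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

Comparing `average_clustering` and `average_path_length` for `rewire(ring_lattice(n, k), beta)` over a range of `beta` values is one way to look for the regime where clustering stays high while path lengths drop.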
The principle of Triadic closure
“If two people in a social network have a friend
in common, then there is an increased
likelihood that they will become friends
themselves ...”
- Rapoport 1953
• Why?
– Opportunity
– Trust
– Incentive
[Figure: example graph on nodes A, B, C, D, E, F, G]
How prevalent is triadic
closure in a graph?
Global clustering based on triplets of nodes
A triplet consists of three nodes that are connected by either two (open triplet) or
three (closed triplet) undirected ties.
A triangle consists of three closed triplets, one centred on each of the nodes.
Figure 3.1: The formation of the edge between B and C illustrates the effects of
triadic closure, since they have a common neighbor A.
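The triplet-based measure can be computed by counting, for every node, the pairs of its neighbours (each pair is one triplet centred on that node) and checking which of those pairs are themselves linked. The sketch below assumes the usual definition, global clustering = closed triplets / all triplets; the function names and the small example graphs are illustrative.

```python
def build_graph(edges):
    """Undirected graph as a dict: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def global_clustering(adj):
    """Fraction of triplets (open or closed) that are closed."""
    closed = 0
    total = 0
    for v, nbrs in adj.items():
        nb = sorted(nbrs)
        for i in range(len(nb)):
            for j in range(i + 1, len(nb)):
                total += 1  # one triplet centred on v
                if nb[j] in adj[nb[i]]:
                    closed += 1  # the two outer nodes are also tied
    return closed / total if total else 0.0


# B and C share the neighbour A: a single open triplet
open_triplet = build_graph([("A", "B"), ("A", "C")])
# after the B-C edge forms, the triplet closes into a triangle
triangle = build_graph([("A", "B"), ("A", "C"), ("B", "C")])
```

A triangle is counted three times here, once per centre node, which matches the statement that a triangle consists of three closed triplets.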
Clustering coefficient
• In graph theory, a clustering coefficient is a measure of the
degree to which nodes in a graph tend to cluster together.
Evidence suggests that in most real-world networks, and in
particular social networks, nodes tend to create tightly knit
groups characterized by a relatively high density of ties; this
likelihood tends to be greater than the average probability of
a tie randomly established between two nodes.
• Two versions of this measure exist: the global and the local.
The global version was designed to give an overall indication
of the clustering in the network, whereas the local gives an
indication of the embeddedness of single nodes.
The Clustering Coefficient
• The basic role of triadic closure in social networks has motivated the
formulation of simple social network measures to capture its prevalence. One of these is the
clustering coefficient.
• The clustering coefficient of a node A is defined as the probability that two
randomly selected friends of A are friends with each other.
• In other words, it is the fraction of pairs of A’s friends that are connected to each
other by edges.
• For example, the clustering coefficient of node A in Figure 3.2(a) is 1/6
(because there is only the single C-D edge among the six pairs of friends B-C, B-
D, B-E, C-D, C-E, and D-E), and it has increased to 1/2 in the second snapshot of
the network in Figure 3.2(b) (because there are now the three edges B-C, C-D,
and D-E among the same six pairs). In general, the clustering coefficient of a
node ranges from 0 (when none of the node’s friends are friends with each
other) to 1 (when all of the node’s friends are friends with each other), and the
more strongly triadic closure is operating in the neighborhood of the node, the
higher the clustering coefficient will tend to be.
Figure 3.2: If we watch a network for a longer span of time, we can see multiple edges
forming — some form through triadic closure while others (such as the D-G edge)
form even though the two endpoints have no neighbors in common.
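The worked example for node A can be reproduced directly. The sketch below encodes only the part of the two snapshots that the text describes (A's four friends and the edges among them); the rest of the figure is not recoverable from the text and is left out. Function and variable names are illustrative.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected graph as a dict: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are joined by an edge."""
    pairs = list(combinations(sorted(adj[v]), 2))
    if not pairs:
        return 0.0  # fewer than two neighbours: no pairs to check
    linked = sum(1 for a, b in pairs if b in adj[a])
    return linked / len(pairs)


friends_of_a = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E")]
# first snapshot: only C-D among A's friends
snapshot_a = build_graph(friends_of_a + [("C", "D")])
# second snapshot: B-C, C-D and D-E among A's friends
snapshot_b = build_graph(friends_of_a + [("B", "C"), ("C", "D"), ("D", "E")])
```

With these inputs, `local_clustering(snapshot_a, "A")` evaluates to 1/6 and `local_clustering(snapshot_b, "A")` to 1/2, in line with the text.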
Clustering Coefficient
Dfn: The (local) clustering coefficient
of a node v is the probability that u1, u2
(two randomly selected neighbors of v)
are themselves neighbors:
$\dfrac{\#\,\text{pairs of neighbors of } v \text{ that are linked}}{\#\,\text{pairs of neighbors of } v}$
Cliques
• Dfn: A clique is a maximal,
completely connected subgraph of a
given graph.
[Figure: three example graphs, each on nodes A, B, C, D]
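Maximal cliques can be listed with a short recursive search. The sketch below uses the basic Bron-Kerbosch scheme without pivoting, which is one standard way to do this and not a method named on the slide. The example edges on nodes A to D are hypothetical, since the figure itself is not recoverable.

```python
def build_graph(edges):
    """Undirected graph as a dict: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)  # nothing can extend it: maximal
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


# hypothetical example: triangle A-B-C with D attached to A
example = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")])
```

In this example the pair {A, B} is completely connected but not maximal, because C can be added; only {A, B, C} and {A, D} satisfy the definition.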
INDIVIDUAL EXERCISE:
[Figure: exercise graph on nodes A, C, D, E, F]
Bridges
Dfn: An edge e is a bridge in the graph G if G - e has more components than G.
What about giant components?
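The bridge definition translates almost word for word into code: remove the edge, count components, compare. This is a minimal sketch with illustrative names; the example graph (two triangles joined by a single edge) is hypothetical.

```python
def build_graph(edges):
    """Undirected graph as a dict: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def count_components(adj):
    """Number of connected components, found by depth-first search."""
    seen = set()
    count = 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return count


def is_bridge(adj, u, v):
    """True if deleting edge (u, v) increases the number of components."""
    without = {n: set(nb) for n, nb in adj.items()}
    without[u].discard(v)
    without[v].discard(u)
    return count_components(without) > count_components(adj)


# hypothetical example: triangles A-B-C and D-E-F joined by the edge C-D
two_triangles = build_graph([
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("C", "D"),
    ("D", "E"), ("E", "F"), ("D", "F"),
])
```

Here C-D is the only bridge. This hints at the question about giant components: inside a large, densely connected component most edges lie on some cycle, so removing any one of them does not disconnect anything.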
Local Bridges
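Dfn: An edge e = (v,u) is a local bridge in the graph G if v and u have no neighbors in common,
i.e. if deleting e increases the distance between them to more than 2: $\mathrm{path}_{G-e}(v, u) > 2$
– $\mathrm{path}_{G-e}(v, u)$ is the span of the local bridge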
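The span can be computed with a breadth-first search that ignores the edge in question. The sketch below assumes the common reading of the definition: an edge is a local bridge when its endpoints have no neighbours in common, so the distance between them without the edge is greater than 2 (and infinite for a true bridge). Names and examples are illustrative.

```python
import math
from collections import deque


def build_graph(edges):
    """Undirected graph as a dict: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def span(adj, u, v):
    """Shortest-path distance from u to v when edge (u, v) is ignored."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if {x, y} == {u, v}:
                continue  # skip the edge whose span is being measured
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist.get(v, math.inf)


def is_local_bridge(adj, u, v):
    """True if the endpoints are more than 2 apart once the edge is removed."""
    return span(adj, u, v) > 2


# hypothetical examples
cycle5 = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")])
triangle = build_graph([("A", "B"), ("B", "C"), ("A", "C")])
```

In the 5-cycle every edge is a local bridge with span 4, while in the triangle every edge has span 2 and is therefore not a local bridge.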
GROUP DISCUSSION:
Not all edges (“ties”) are created equal.
To find a job, which would be more useful:
– A good friend? (“strong” tie)
– Or an acquaintance? (“weak” tie)
Strong Triadic Closure Property
• Node v violates the Strong Triadic
Closure Property if:
– it has strong ties to 2 other nodes u1 and u2
– and there is no edge at all (strong or weak)
between u1 and u2
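A check for this property needs ties that carry a strength label. The sketch below stores each tie as a `frozenset` of its two endpoints mapped to `"strong"` or `"weak"`; this representation, the function name `stc_violations`, and the example ties are all assumptions made for illustration.

```python
from itertools import combinations


def stc_violations(ties, v):
    """Pairs of v's strong neighbours that have no tie at all between them."""
    strong_neighbors = sorted(
        other
        for pair, kind in ties.items()
        if v in pair and kind == "strong"
        for other in pair
        if other != v
    )
    return [
        (u1, u2)
        for u1, u2 in combinations(strong_neighbors, 2)
        if frozenset((u1, u2)) not in ties  # neither strong nor weak
    ]


# hypothetical example: A has strong ties to B and C, a weak tie to D
ties = {
    frozenset(("A", "B")): "strong",
    frozenset(("A", "C")): "strong",
    frozenset(("A", "D")): "weak",
}
# the same network after a weak B-C tie forms
ties_closed = dict(ties)
ties_closed[frozenset(("B", "C"))] = "weak"
```

Node A violates the property in `ties` because B and C are unconnected; in `ties_closed` even a weak B-C tie is enough to satisfy it.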
$\mathrm{neighborhoodOverlap}(v,u) = \dfrac{|(N(v) \cap N(u)) \setminus \{u, v\}|}{|(N(v) \cup N(u)) \setminus \{u, v\}|}$
– Denominator: all neighbors of either node
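With neighbour sets stored as Python sets, the overlap is two set operations. This is a minimal sketch; the function names and the example graph are illustrative.

```python
def build_graph(edges):
    """Undirected graph as a dict: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def neighborhood_overlap(adj, v, u):
    """Shared neighbours of v and u, as a fraction of all their neighbours."""
    common = (adj[v] & adj[u]) - {u, v}
    either = (adj[v] | adj[u]) - {u, v}
    if not either:
        return 0.0  # neither endpoint has any other neighbour
    return len(common) / len(either)


# hypothetical example: A and B share C; D is only A's friend, E only B's
example = build_graph([
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "E"),
])
```

For the A-B edge the common neighbours are {C} and the neighbours of either endpoint are {C, D, E}, giving an overlap of 1/3. An overlap of 0 means the endpoints have no common neighbours.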
INDIVIDUAL EXERCISE:
[Figure: exercise graph on nodes A, C, D, E, F]
Centrality: degree, betweenness, closeness, information;
NODE-LEVEL METRICS
Centrality
• How does a node relate to the
overall network?
• In real-life…
– Information flow
– Bargaining power
– Influence
Review: Node Degree
• Dfn: The node degree is the
number of neighbors a node has.
$d_i(G) = |N(i)|$
• w.r.t. adjacency matrix: pick row i and sum its entries
$d_i(G) = \sum_{j \in V} a_{i,j}$

Adjacency matrix values:
     A  B  C  D  E
A    0  1  0  0  1
B    1  0  1  0  0
C    0  1  0  0  1
D    0  0  0  0  1
E    1  0  1  1  0

[Figure: example graph on nodes A, B, C, D, E; node E has deg = 3]
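The row-sum rule is easy to verify on the example. The matrix below is the slide's adjacency matrix as read from the table, with rows and columns in the order A, B, C, D, E; the variable and function names are illustrative.

```python
nodes = ["A", "B", "C", "D", "E"]

# adjacency matrix, rows and columns ordered A, B, C, D, E
matrix = [
    [0, 1, 0, 0, 1],  # A
    [1, 0, 1, 0, 0],  # B
    [0, 1, 0, 0, 1],  # C
    [0, 0, 0, 0, 1],  # D
    [1, 0, 1, 1, 0],  # E
]


def degree(matrix, i):
    """Degree of the node in row i: the sum of that row's entries."""
    return sum(matrix[i])


degrees = {name: degree(matrix, idx) for idx, name in enumerate(nodes)}
```

Row E sums to 3, which agrees with the deg = 3 label in the figure.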
Review:
Local Bridges
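Dfn: An edge e = (i,j) is a local bridge in the graph G if i and j have no neighbors in common,
i.e. if deleting e increases the distance between them to more than 2: $\mathrm{path}_{G-e}(i, j) > 2$
– $\mathrm{path}_{G-e}(i, j)$ is the span of the local bridge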
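Betweenness Centrality
$C^B_i(G) = \sum_{j \neq k;\ i \notin \{k, j\}} \dfrac{P_i(k,j) / P(k,j)}{(|V|-1)(|V|-2)/2}$
– P(k,j): number of shortest paths between k and j
– P_i(k,j): number of those shortest paths that pass through i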
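A direct way to evaluate this kind of betweenness score is to count shortest paths with breadth-first search. The sketch below assumes that P(k,j) is the number of shortest paths between k and j, that P_i(k,j) is the number of those passing through i, and that each unordered pair {k, j} is counted once. Function names are illustrative, and the example graph uses the edges A-B, A-E, B-C, C-E, D-E.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Undirected graph as a dict: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_path_counts(adj, source):
    """BFS distances from source and the number of shortest paths to each node."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                count[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                count[w] += count[u]  # every shortest path to u extends to w
    return dist, count


def betweenness(adj, i):
    """Normalised betweenness of node i (needs at least 3 nodes)."""
    n = len(adj)
    info = {s: shortest_path_counts(adj, s) for s in adj}
    dist_i, count_i = info[i]
    total = 0.0
    for k, j in combinations([v for v in adj if v != i], 2):
        dist_k, count_k = info[k]
        if j not in dist_k or i not in dist_k or j not in dist_i:
            continue  # k and j, or i, are not connected
        if dist_k[i] + dist_i[j] == dist_k[j]:
            # shortest k-j paths through i = (paths k..i) * (paths i..j)
            total += count_k[i] * count_i[j] / count_k[j]
    return total / ((n - 1) * (n - 2) / 2)


example = build_graph([("A", "B"), ("A", "E"), ("B", "C"), ("C", "E"), ("D", "E")])
```

In this example E lies on every shortest path to D and on half of the shortest paths between A and C, so its score is 3.5 / 6, while the pendant node D scores 0.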
Closeness Centrality
• How easily can a node reach others?
• Simple: Inverse average distance
$C^C_i(G) = \dfrac{|V| - 1}{\sum_{j \neq i} \mathrm{distance}(i, j)}$
– Numerator: # nodes (not i)
– The whole expression is the inverse of the avg. distance from node i
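The inverse-average-distance version needs only one breadth-first search per node. This is a minimal sketch; returning 0 when some node is unreachable is a convention chosen here (the average distance is then infinite), and the names and example graph are illustrative.

```python
from collections import deque


def build_graph(edges):
    """Undirected graph as a dict: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def distances_from(adj, source):
    """BFS distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, i):
    """(|V| - 1) divided by the sum of distances from i to all other nodes."""
    dist = distances_from(adj, i)
    others = len(adj) - 1
    if others == 0 or len(dist) - 1 < others:
        return 0.0  # isolated graph or unreachable nodes
    return others / sum(dist.values())


example = build_graph([("A", "B"), ("A", "E"), ("B", "C"), ("C", "E"), ("D", "E")])
```

In the example, E is at distance 1 from A, C and D and at distance 2 from B, so its closeness is 4 / 5; the pendant node D has closeness 4 / 8.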
• Decay centrality: $\sum_{j \neq i} \delta^{\mathrm{distance}(i, j)}$
– If δ -> 0? Degree centrality
– If δ -> 1? Component size
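The two limits can be checked numerically. The sketch below assumes decay centrality is the sum of δ raised to the distance from i, taken over all nodes reachable from i, with 0 < δ <= 1; names and the example graph are illustrative.

```python
from collections import deque


def build_graph(edges):
    """Undirected graph as a dict: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def distances_from(adj, source):
    """BFS distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def decay_centrality(adj, i, delta):
    """Sum of delta ** distance(i, j) over all reachable nodes j != i."""
    dist = distances_from(adj, i)
    return sum(delta ** d for node, d in dist.items() if node != i)


example = build_graph([("A", "B"), ("A", "E"), ("B", "C"), ("C", "E"), ("D", "E")])
```

For small δ the sum is dominated by the direct neighbours, so dividing by δ approaches the degree. At δ = 1 every reachable node contributes 1, so the value is the number of other nodes in i's component (the component size minus one under this convention).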