Basics of Network Analysis
Hiroki Sayama
[email protected]
Graph = Network
• G(V, E): graph (network)
V: vertices (nodes), E: edges (links)
Nodes = 1, 2, 3, 4, 5
Links =
1<->2, 1<->3, 1<->5,
2<->3, 2<->4, 2<->5,
3<->4, 3<->5, 4<->5
• Adjacency matrix:
A matrix with rows and columns labeled by
nodes, where element a_ij shows the number
of links going from node i to node j
(it is symmetric for an undirected graph)
• Adjacency list:
A list of links whose element “i->j” shows a
link going from node i to node j
(also represented as “i -> {j1, j2, j3, …}”)
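As a minimal sketch of the two representations, the five-node example network (nodes 1 to 5 with its nine undirected links) can be stored both ways in plain Python. The variable names (`adj_matrix`, `adj_list`, `index`) are illustrative choices, not part of the slides.

```python
# Example network from the slides: 5 nodes, 9 undirected links.
nodes = [1, 2, 3, 4, 5]
links = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]

# Adjacency matrix: entry [i][j] counts the links from node i to node j.
index = {node: k for k, node in enumerate(nodes)}  # node label -> row/column
adj_matrix = [[0] * len(nodes) for _ in nodes]
for i, j in links:
    adj_matrix[index[i]][index[j]] += 1
    adj_matrix[index[j]][index[i]] += 1  # undirected: mirror the entry

# Adjacency list in the "i -> {j1, j2, ...}" form.
adj_list = {node: set() for node in nodes}
for i, j in links:
    adj_list[i].add(j)
    adj_list[j].add(i)

for row in adj_matrix:
    print(row)
for node in nodes:
    print(node, "->", sorted(adj_list[node]))
```

The matrix costs memory proportional to the square of the number of nodes, while the list grows only with the number of links, which is why sparse networks are usually stored as adjacency lists.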
Exercise
• Represent the following network in:
– Adjacency matrix
– Adjacency list
[Figure: network with nodes 1, 2, 3, 4, 5]
Degree of a node
• The degree of a node u, deg(u), is the
number of links connected to u
[Figure: example nodes with deg(u1) = 4 and deg(u2) = 2]
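A short sketch of counting degrees from a list of links, using the five-node example network from the earlier slide. The dictionary name `degree` is an illustrative choice.

```python
# Undirected links of the five-node example network.
links = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]

# Each link adds one to the degree of both of its end nodes.
degree = {}
for i, j in links:
    degree[i] = degree.get(i, 0) + 1
    degree[j] = degree.get(j, 0) + 1

print(degree)
```

Because every link is counted at both ends, the degrees always sum to twice the number of links.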
Connected graph
• A graph in which there is a path
between any pair of nodes
Connected components
[Figure: a graph made of two connected components; number of connected components = 2]
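Connected components can be found with a breadth-first search that is restarted from every node not yet visited. The sketch below assumes an undirected graph; the function name `connected_components` and the small example graph are hypothetical.

```python
from collections import deque

def connected_components(nodes, links):
    """Return a list of node sets, one per connected component."""
    neighbors = {n: set() for n in nodes}
    for i, j in links:
        neighbors[i].add(j)
        neighbors[j].add(i)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components

# Hypothetical example: a triangle plus a separate pair of nodes.
example_nodes = ["a", "b", "c", "d", "e"]
example_links = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")]
components = connected_components(example_nodes, example_links)
print(len(components), components)
```

A graph is connected exactly when this procedure returns a single component.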
Complete graph
• A graph in which every pair of nodes
is connected by a link (often written as K1, K2,
…)
Regular graph
• A graph in which all nodes have the
same degree (often called a k-regular
graph, where k is the degree)
Bipartite graph
• A graph whose nodes can be divided
into two subsets so that no link
connects nodes within the same subset
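One way to test this property is to try to assign every node to one of two subsets while walking the graph, and to stop when a link joins two nodes of the same subset. The sketch below assumes an undirected graph; `is_bipartite` and the example graphs are hypothetical names.

```python
from collections import deque

def is_bipartite(nodes, links):
    """Return True if the nodes can be split into two subsets with no internal links."""
    neighbors = {n: set() for n in nodes}
    for i, j in links:
        neighbors[i].add(j)
        neighbors[j].add(i)

    side = {}  # node -> 0 or 1, the subset it was assigned to
    for start in nodes:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in neighbors[u]:
                if w not in side:
                    side[w] = 1 - side[u]  # neighbors go to the opposite subset
                    queue.append(w)
                elif side[w] == side[u]:
                    return False  # a link inside one subset
    return True

square = [(1, 2), (2, 3), (3, 4), (4, 1)]  # even cycle
triangle = [(1, 2), (2, 3), (3, 1)]        # odd cycle
print(is_bipartite([1, 2, 3, 4], square), is_bipartite([1, 2, 3], triangle))
```

Any graph that contains a cycle of odd length fails the test.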
Directed graph
• Each link is directed
• Direction represents either order of relationship
or accessibility between nodes
E.g. genealogy
Weighted directed graph
• Most general version of graphs
• Both a weight and a direction are
assigned to each link
E.g. traffic network
Measuring Topological Properties of
Networks (1):
Macroscopic Properties
Network density
• The ratio of the number of actual links to
the number of possible links
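A small worked example, assuming a simple graph (no self-loops, no multiple links), where the number of possible links is N(N-1)/2 for an undirected graph and N(N-1) for a directed one. The function name `density` is an illustrative choice.

```python
def density(num_nodes, num_links, directed=False):
    """Ratio of actual links to possible links in a simple graph."""
    possible = num_nodes * (num_nodes - 1)  # ordered pairs of distinct nodes
    if not directed:
        possible //= 2                      # unordered pairs
    return num_links / possible

# The five-node example network has 9 of the 10 possible undirected links.
example_density = density(5, 9)
print(example_density)
```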
Characteristic path length
• In graph theory: Maximum of
shortest path lengths between pairs
of nodes (a.k.a. network diameter)
• In complex network science: Average
of shortest path lengths between pairs of nodes
• Characterizes how large the world
being modeled is
– A small length implies that the network is
well connected globally
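Both versions can be computed from breadth-first searches started at every node. The sketch below assumes an unweighted, connected, undirected graph and uses the five-node example network; `shortest_path_lengths` is a hypothetical helper name.

```python
from collections import deque

def shortest_path_lengths(neighbors, source):
    """Breadth-first search: number of links on the shortest path to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in neighbors[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

links = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
neighbors = {}
for i, j in links:
    neighbors.setdefault(i, set()).add(j)
    neighbors.setdefault(j, set()).add(i)

# Shortest path lengths over all ordered pairs of distinct nodes.
lengths = [d
           for s in neighbors
           for t, d in shortest_path_lengths(neighbors, s).items()
           if t != s]

average_length = sum(lengths) / len(lengths)  # complex network science version
diameter = max(lengths)                       # graph theory version
print(average_length, diameter)
```

Only nodes 1 and 4 are not directly linked in this network, so the diameter is 2 and the average stays close to 1.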
Clustering coefficient
• For each node:
– Let n be the number of its neighbor nodes
– Let m be the number of links among the n
neighbors
– Calculate c = m / (n choose 2)
Then C = <c> (the average of c over all nodes)
• C indicates the average probability for
two of one’s friends to be friends too
– A large C implies that the network is well
connected locally to form a cluster
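The per-node calculation can be sketched as follows for the five-node example network. Nodes with fewer than two neighbors have no pair of neighbors to compare; they are given c = 0 here, which is a convention assumed for this sketch. The function name `local_clustering` is illustrative.

```python
from itertools import combinations

def local_clustering(neighbors, node):
    """c = m / (n choose 2) for one node."""
    nbrs = neighbors[node]
    n = len(nbrs)
    if n < 2:
        return 0.0  # assumed convention: no neighbor pairs, so c = 0
    # m: number of links among the n neighbors
    m = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    return m / (n * (n - 1) / 2)

links = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
neighbors = {}
for i, j in links:
    neighbors.setdefault(i, set()).add(j)
    neighbors.setdefault(j, set()).add(i)

c = {node: local_clustering(neighbors, node) for node in neighbors}
C = sum(c.values()) / len(c)  # average over all nodes
print(c, C)
```

For node 2, the neighbors are 1, 3, 4, 5; five of the six possible links among them exist (1<->4 is missing), so c = 5/6.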
Degree distribution
[Figure: degree distribution plots against log k; panels: Random, Scale-free]
Degree Distributions of Real-World
Complex Networks
Degree distribution of FB
[Figure: P(k) and CCDF]
• http://www.facebook.com/note.php?note_id=10150388519243859
• http://arxiv.org/abs/1111.4503
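As a rough sketch of how the two plotted quantities are obtained from data: P(k) is the fraction of nodes with degree exactly k, and the CCDF is taken here as the fraction of nodes with degree at least k (some texts use "greater than k" instead, so this is an assumption). The degree sequence is that of the five-node example network from the first slides.

```python
from collections import Counter

degrees = [3, 4, 4, 3, 4]  # degrees of nodes 1..5 in the example network
n = len(degrees)
counts = Counter(degrees)

# P(k): fraction of nodes whose degree is exactly k.
pk = {k: counts[k] / n for k in sorted(counts)}

# CCDF(k): fraction of nodes whose degree is at least k (assumed definition).
ccdf = {k: sum(1 for d in degrees if d >= k) / n for k in sorted(counts)}

print(pk)
print(ccdf)
```

The CCDF is often preferred for real-world data because it smooths the noisy tail of P(k) at large k.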
Measuring Topological Properties of
Networks (2):
Centralities
Centrality measures (“B,C,D,E”)
• Degree centrality
– How many connections the node has
• Betweenness centrality
– How many shortest paths go through the
node
• Closeness centrality
– How close the node is to other nodes
• Eigenvector centrality
Degree centrality
• Simply, the number of links attached to a node
C_D(v) = deg(v)
or sometimes defined as
C_D(v) = deg(v) / (N-1)
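A brief sketch of the normalized version, which divides each degree by N-1, the largest degree a node can have in a simple graph. It uses the five-node example network; the variable names are illustrative.

```python
links = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]

degree = {}
for i, j in links:
    degree[i] = degree.get(i, 0) + 1
    degree[j] = degree.get(j, 0) + 1

num_nodes = len(degree)  # N; every node of this network has at least one link
degree_centrality = {v: d / (num_nodes - 1) for v, d in degree.items()}
print(degree_centrality)
```

The normalized value makes nodes comparable across networks of different sizes: 1.0 means the node is linked to every other node.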
Betweenness centrality
• Probability for a node to be on shortest
paths between two other nodes
$$ C_B(v) = \sum_{s \neq v,\, e \neq v} \frac{\#sp(s,e,v)}{\#sp(s,e)} $$
• s: start node, e: end node
• #sp(s,e,v): # of shortest paths from s to e
that go through node v
• #sp(s,e): total # of shortest paths from s to e
• Easily generalizable to “group betweenness”
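The sum can be evaluated directly for small networks. The sketch below assumes an unweighted, undirected graph and counts shortest paths with breadth-first search; a shortest path from s to e passes through v exactly when d(s,v) + d(v,e) = d(s,e), in which case #sp(s,e,v) = #sp(s,v) × #sp(v,e). It sums over ordered pairs (s, e), so every undirected pair is counted twice; library implementations may normalize differently. The names `bfs_counts` and `betweenness` are hypothetical, and this direct approach is chosen for clarity rather than speed.

```python
from collections import deque

def bfs_counts(neighbors, source):
    """Return (dist, count): shortest path lengths and numbers of shortest paths."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in neighbors[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                count[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                count[w] += count[u]  # every shortest path to u extends to w
    return dist, count

def betweenness(neighbors):
    info = {s: bfs_counts(neighbors, s) for s in neighbors}
    result = {}
    for v in neighbors:
        dist_v, count_v = info[v]
        total = 0.0
        for s in neighbors:
            if s == v or v not in info[s][0]:
                continue
            dist_s, count_s = info[s]
            for e in neighbors:
                if e == v or e == s or e not in dist_s or e not in dist_v:
                    continue
                if dist_s[v] + dist_v[e] == dist_s[e]:  # v lies on a shortest path
                    total += count_s[v] * count_v[e] / count_s[e]
        result[v] = total
    return result

links = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
neighbors = {}
for i, j in links:
    neighbors.setdefault(i, set()).add(j)
    neighbors.setdefault(j, set()).add(i)

cb = betweenness(neighbors)
print(cb)
```

In the five-node example network only nodes 1 and 4 are not directly linked, and the three shortest paths between them run through nodes 2, 3 and 5, so each of those nodes receives 1/3 per direction.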
Closeness centrality
• Inverse of the average distance from a
node to all the other nodes
$$ C_C(v) = \frac{n-1}{\sum_{w \neq v} d(v,w)} $$
• d(v,w): length of the shortest path from v to w
• Its inverse is called “farness”
• Sometimes “Σ” is moved out of the fraction (it works for
networks that are not strongly connected)
• NetworkX calculates closeness within each connected
component
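A minimal sketch of the formula, assuming an unweighted, connected, undirected graph so that every distance d(v,w) is finite. It uses the five-node example network; the function name `closeness` is illustrative.

```python
from collections import deque

def closeness(neighbors, v):
    """(n - 1) divided by the sum of shortest path lengths from v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in neighbors[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    n = len(neighbors)
    return (n - 1) / sum(d for w, d in dist.items() if w != v)

links = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
neighbors = {}
for i, j in links:
    neighbors.setdefault(i, set()).add(j)
    neighbors.setdefault(j, set()).add(i)

cc = {v: closeness(neighbors, v) for v in neighbors}
print(cc)
```

Node 2 is one step from every other node, so its closeness is 4/4 = 1; node 1 needs two steps to reach node 4, giving 4/5.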
Eigenvector centrality
• The eigenvector belonging to the largest eigenvalue
of the adjacency matrix of a network; each node's
centrality is its entry in that eigenvector
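The dominant eigenvector can be approximated by power iteration: multiply a starting vector by the adjacency matrix repeatedly and renormalize. This is a sketch under the assumption that the graph is undirected, connected and not bipartite (otherwise the plain iteration may fail to settle); the function name and the fixed iteration count are illustrative choices.

```python
def eigenvector_centrality(adj_matrix, iterations=200):
    """Approximate the dominant eigenvector by power iteration."""
    n = len(adj_matrix)
    x = [1.0] * n
    for _ in range(iterations):
        # y = A x: each node collects the current scores of its neighbors.
        y = [sum(adj_matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(value * value for value in y) ** 0.5
        x = [value / norm for value in y]
    return x

# Adjacency matrix of the five-node example network (rows/columns = nodes 1..5).
links = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
adj_matrix = [[0] * 5 for _ in range(5)]
for i, j in links:
    adj_matrix[i - 1][j - 1] = 1
    adj_matrix[j - 1][i - 1] = 1

ec = eigenvector_centrality(adj_matrix)
print(ec)
```

Nodes 2, 3 and 5 end up with equal and highest scores, because they are linked to every other node; nodes 1 and 4 score lower.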
Exercise
• Who is most central by degree,
betweenness, closeness, eigenvector?
Which centrality to use?
• To find the most popular person
• To find the most efficient person to
collect information from the entire
organization
• To find the most powerful person to
control information flow within an
organization
• To find the most important person (?)
Measuring Topological Properties of
Networks (3):
Mesoscopic Properties
Degree correlation (assortativity)
$$ r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} $$
• X: degree of start node (in / out)
• Y: degree of end node (in / out)
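For an undirected network, one way to apply the formula is to let every link contribute two (start, end) pairs, one per direction, and compute the Pearson correlation of the end-node degrees over all pairs. The sketch below makes that assumption; `degree_assortativity` is a hypothetical name, and r is undefined when all nodes have the same degree (zero variance).

```python
def degree_assortativity(links):
    """Pearson correlation between the degrees at the two ends of each link."""
    degree = {}
    for i, j in links:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1

    # Each undirected link is read in both directions.
    xs, ys = [], []
    for i, j in links:
        xs += [degree[i], degree[j]]
        ys += [degree[j], degree[i]]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = (sum((x - mean_x) ** 2 for x in xs) / n) ** 0.5
    sd_y = (sum((y - mean_y) ** 2 for y in ys) / n) ** 0.5
    return cov / (sd_x * sd_y)

example = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
star = [(0, 1), (0, 2), (0, 3)]  # hypothetical hub with three leaves

r_example = degree_assortativity(example)
r_star = degree_assortativity(star)
print(r_example, r_star)
```

A star is the extreme disassortative case: every link joins the high-degree hub to a degree-1 leaf, so r = -1.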
Assortative/disassortative networks
• Social networks are assortative
• Engineered / biological networks are disassortative
Coreness (core number)
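• A node’s coreness (core number) is c
if it belongs to a c-core but not
to a (c+1)-core
– A c-core is a maximal subgraph in which every
node has at least c links to other nodes of the subgraph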
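Core numbers can be obtained by peeling: repeatedly remove a node of smallest remaining degree, and record the largest such degree seen so far as the coreness of the removed node. This is a simple sketch for small undirected graphs, not an optimized implementation; `core_numbers` and the example graph are hypothetical.

```python
def core_numbers(neighbors):
    """Coreness of every node, computed by repeatedly peeling a minimum-degree node."""
    remaining = {n: set(nbrs) for n, nbrs in neighbors.items()}
    core = {}
    k = 0
    while remaining:
        # Pick a node with the fewest links to nodes that are still present.
        node = min(remaining, key=lambda n: len(remaining[n]))
        k = max(k, len(remaining[node]))  # coreness never decreases while peeling
        core[node] = k
        for w in remaining.pop(node):
            remaining[w].discard(node)
    return core

# Hypothetical example: a triangle a-b-c with a pendant node d attached to a.
example_links = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
neighbors = {}
for i, j in example_links:
    neighbors.setdefault(i, set()).add(j)
    neighbors.setdefault(j, set()).add(i)

core = core_numbers(neighbors)
print(core)
```

The pendant node d belongs only to the 1-core, while the triangle forms a 2-core, so a, b and c have coreness 2 even though a has degree 3.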
Community
• A subgraph of a network within which
nodes are connected to each other
more densely than to the outside
– Still defined vaguely…
– Various detection algorithms proposed
• K-clique percolation
• Hierarchical clustering
• Girvan-Newman algorithm
• Modularity maximization (e.g., Louvain method)
[Figure: diagram from Wikipedia]
Modularity
• A quantity that characterizes how
good a given community structure is in
dividing the network
$$ Q = \frac{|E_{in}| - |E_{in-R}|}{|E|} $$
• |E_{in}|: # of links connecting nodes that belong
to the same community
• |E_{in-R}|: Estimated |E_{in}| if links were random
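The slide does not say how the random estimate is obtained. The sketch below assumes the commonly used null model in which links are rewired at random while node degrees are preserved, giving an expected number of within-community links of Σ_c (D_c)² / (4|E|), where D_c is the sum of the degrees of the nodes in community c. The function name `modularity` and the example graph are hypothetical.

```python
def modularity(links, community):
    """Q = (|E_in| - |E_in-R|) / |E| for an undirected graph.

    community maps each node to a community label. The random estimate
    uses the degree-preserving null model (an assumption of this sketch).
    """
    total_links = len(links)

    # |E_in|: links whose two end nodes share a community.
    e_in = sum(1 for i, j in links if community[i] == community[j])

    # D_c: total degree of the nodes in each community.
    degree_sum = {}
    for i, j in links:
        for node in (i, j):
            label = community[node]
            degree_sum[label] = degree_sum.get(label, 0) + 1

    # |E_in-R|: expected within-community links if links were random.
    e_in_random = sum(d * d for d in degree_sum.values()) / (4 * total_links)
    return (e_in - e_in_random) / total_links

# Hypothetical example: two triangles joined by a single link.
example_links = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {node: "A" for node in range(1, 7)}

q_two = modularity(example_links, two_groups)
q_one = modularity(example_links, one_group)
print(q_two, q_one)
```

Splitting the graph into its two triangles gives Q = (6 - 3.5) / 7, about 0.357, while putting every node in one community gives Q = 0, so the split is the better community structure by this measure.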
Community detection based on
modularity