Unit 6-1: Social Network Analysis
Nodes: individuals
Links: social relationships (family/work/friendship/etc.)
S. Milgram (1967)
Six Degrees of Separation
John Guare
Communication networks: many non-identical components with diverse connections between them.
Network diameter:
maximum (worst-case) or average?
Clustering:
to what extent do links tend to cluster “locally”?
Degree distribution:
what is the typical degree in the network?
Small diameter:
often a constant independent of network size (like 6)
or even shrink?
typically exclude infinite distances
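As a rough illustration of the degree-distribution question above, the sketch below counts what fraction of nodes has each degree. The function name `degree_distribution` and the toy edge list are hypothetical, and nodes that appear in no edge are not counted.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {degree: fraction of nodes} for an undirected edge list.

    Only nodes that appear in at least one edge are counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    n = len(degree)
    return {k: c / n for k, c in sorted(counts.items())}

# Hypothetical toy graph: node "a" acts as a small hub.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
dist = degree_distribution(edges)
print(dist)
```

Plotting such a dictionary on log-log axes is one common way to check whether a distribution looks heavy-tailed.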
Watts-Strogatz models:
give few components, small diameter and high clustering
Scale-free Networks:
give few components, small diameter and heavy-tailed distribution
Hierarchical networks:
few components, small diameter, high clustering, heavy-tailed
Affiliation networks:
models group-actor formation
Random graph model (Pál Erdős, 1913-1996)
Connect each pair of nodes with probability p
Example: p = 1/6, N = 10, k ~ 1.5
Poisson degree distribution
- Democratic
- Random
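A minimal sketch of this construction, assuming every pair of nodes is linked independently with probability p. The helper names `random_graph` and `mean_degree` and the seed are illustrative choices, not part of the slides. The expected mean degree is p(N-1), which gives 1.5 for p = 1/6 and N = 10.

```python
import random
from itertools import combinations

def random_graph(n, p, seed=None):
    """Link each of the n(n-1)/2 node pairs independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

def mean_degree(n, edges):
    """Average degree of an undirected graph: each edge adds 2 to the degree sum."""
    return 2 * len(edges) / n

n, p = 10, 1 / 6
expected_k = p * (n - 1)              # expected mean degree
edges = random_graph(n, p, seed=0)    # one random sample (seed is arbitrary)
observed_k = mean_degree(n, edges)
print(expected_k, observed_k)
```

A single sample with N = 10 can deviate noticeably from the expected value; averaging over many samples brings the observed mean degree closer to p(N-1).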
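Example: node u has k = 4 neighbors, so choose(k,2) = 6 links are possible between them; 4 exist, so c(u) = 4/6 = 0.666…
Probability to be connected: C ≫ p
Clustering coefficient of a node with n neighbors:
$$C = \frac{\#\text{ of links between the } 1, 2, \dots, n \text{ neighbors}}{n(n-1)/2}$$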
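The clustering coefficient can be computed directly from an adjacency structure. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets. The graph itself is a hypothetical one, built so that node u has 4 neighbors with 4 links among them, matching the 4/6 example.

```python
from itertools import combinations

def local_clustering(adj, u):
    """Fraction of possible links among u's neighbors that actually exist."""
    neighbors = adj[u]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed here: undefined cases count as 0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

# Hypothetical graph: u has neighbors a, b, c, d;
# links among them are a-b, a-c, b-c, c-d (4 of the 6 possible).
adj = {
    "u": {"a", "b", "c", "d"},
    "a": {"u", "b", "c"},
    "b": {"u", "a", "c"},
    "c": {"u", "a", "b", "d"},
    "d": {"u", "c"},
}
c_u = local_clustering(adj, "u")
print(c_u)
```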
Network        C          Crand     L          N
WWW            0.1078     0.00023   3.1        153127
Internet       0.18-0.3   0.001     3.7-3.76   3015-6209
Actor          0.79       0.00027   3.65       225226
Coauthorship   0.43       0.00018   5.9        52909
Networks are clustered [large C(p)] but have a small …
NO! Kevin Bacon ranks #876 (figure values: 2.786981, 46, 1811)
#2 Donald Pleasence
#3 Martin Sheen
…
May 5, 2023 Data Mining: Concepts and Techniques 19
Models of Social Network Generation
k ~ 6: P(k=500) ~ $10^{-99}$; with $N_{WWW} \sim 10^{9}$, N(k=500) ~ $10^{-90}$
$P_{out}(k) \sim k^{-\gamma_{out}}$, $P_{in}(k) \sim k^{-\gamma_{in}}$: P(k=500) ~ $10^{-6}$; with $N_{WWW} \sim 10^{9}$, N(k=500) ~ $10^{3}$
J. Kleinberg et al., Proceedings of the ICCC (1999)
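The node counts follow from multiplying the network size by the probability of a given degree, N(k) ≈ N_WWW · P(k). The short sketch below reproduces that arithmetic; the function name is hypothetical and the probabilities are the order-of-magnitude values quoted on the slide.

```python
def expected_nodes_with_degree(n_nodes, p_k):
    """Expected number of nodes with degree k, given P(k) and the network size."""
    return n_nodes * p_k

n_www = 1e9  # order-of-magnitude size of the WWW quoted on the slide

# Heavy-tailed case: P(k=500) ~ 1e-6, so on the order of a thousand such pages.
heavy_tailed = expected_nodes_with_degree(n_www, 1e-6)

# Fast-decaying case: P(k=500) ~ 1e-99, so effectively no such pages.
fast_decaying = expected_nodes_with_degree(n_www, 1e-99)

print(heavy_tailed, fast_decaying)
```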
World Wide Web
[Figure: example graph with nodes 1-7. l_15 = 2 (path 1, 2, 5); l_17 = 4 (path 1, 3, 4, 6, 7); …; ⟨l⟩ = ??]
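One way to obtain the average path length ⟨l⟩ is a breadth-first search from every node, averaging over reachable pairs only (infinite distances are excluded). The sketch below is a minimal version under that assumption. The edge list is hypothetical: it contains only the two paths named in the figure, 1-2-5 and 1-3-4-6-7, treated as directed links.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_length_stats(adj):
    """Return (average, maximum) distance over all reachable ordered pairs."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    lengths = []
    for s in nodes:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                lengths.append(d)  # unreachable pairs never appear here
    return sum(lengths) / len(lengths), max(lengths)

# Hypothetical directed graph built from the two example paths.
adj = {1: [2, 3], 2: [5], 3: [4], 4: [6], 6: [7]}
l_15 = bfs_distances(adj, 1)[5]
l_17 = bfs_distances(adj, 1)[7]
mean_l, diameter = path_length_stats(adj)
print(l_15, l_17, mean_l, diameter)
```

The maximum returned here is the worst-case diameter; the average is the quantity usually reported for large networks.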
Finite size scaling: create a network with N nodes with Pin(k) and Pout(k)
[S. Lawrence et al., Nature (99)]
IBM: A. Broder et al., WWW9 (00)
transportation networks
Scale-free networks are more robust in the face of such failures.
Scale-free networks are highly vulnerable to a coordinated attack against their hubs.
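The contrast between failures and hub attacks can be illustrated by removing nodes and measuring the largest remaining connected component. The sketch below uses a small hypothetical hub-and-spoke graph (two linked hubs with five leaves each) as a stand-in for a hub-dominated network; it is not a generated scale-free graph.

```python
def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)  # removed nodes are treated as already visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

# Hypothetical graph: hubs h1 and h2 are linked, each has 5 leaf nodes.
adj = {"h1": {"h2"}, "h2": {"h1"}}
for hub in ("h1", "h2"):
    for i in range(5):
        leaf = f"{hub}_leaf{i}"
        adj[leaf] = {hub}
        adj[hub].add(leaf)

intact = largest_component(adj)
after_failure = largest_component(adj, {"h1_leaf0"})  # one low-degree node fails
after_attack = largest_component(adj, {"h1", "h2"})   # both hubs are removed
print(intact, after_failure, after_attack)
```

Losing a low-degree node barely changes the connected core, while removing the hubs leaves only isolated nodes.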
Object-Related Tasks
Link-based object ranking
Link-Related Tasks
Link prediction
Graph-Related Tasks
Subgraph discovery
Graph classification
Homogeneous networks
Single object type and single link type
Heterogeneous networks
Multiple object and link types
treatments
Bibliographic network: publications, authors, venues
link type
This is a primary focus of the link analysis community
Web information analysis
PageRank and HITS are typical link-based ranking (LBR) approaches
Intuitions
Links are like citations in literature
useful in general
PageRank is essentially “citation counting”, but improves over simple counting
Consider “indirect citations” (being cited by a highly
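Example graph with pages d1, d2, d3, d4 and its “transition matrix”:

$$M = \begin{pmatrix} 0 & 0 & 1/2 & 1/2 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 \end{pmatrix}$$

$$p_{t+1}(d_i) = (1-\alpha) \sum_{d_j \in IN(d_i)} m_{ji}\, p_t(d_j) + \alpha \sum_{k=1}^{N} \frac{1}{N}\, p_t(d_k)$$

The second term is the same as $\alpha/N$ (why?)

Stationary (“stable”) distribution, so we ignore time:

$$p(d_i) = \sum_{k=1}^{N} \left[ \frac{1}{N}\alpha + (1-\alpha)\, m_{ki} \right] p(d_k)$$

$$p = (\alpha I + (1-\alpha) M)^T p, \qquad I_{ij} = 1/N$$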
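The update rule can be run as a simple power iteration on the four-page example. The sketch below is a minimal version under two assumptions that are not stated on the slide: α is read as the random-jump probability, and its value is set to 0.2 purely for illustration. Because the scores sum to 1, the random-jump term reduces to α/N.

```python
def pagerank(M, alpha=0.2, iters=100):
    """Power iteration for p = (alpha*I + (1 - alpha)*M)^T p with I_ij = 1/N.

    M[j][i] is the probability of following a link from page j to page i.
    alpha = 0.2 is an assumed value, not taken from the slides.
    """
    n = len(M)
    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        p = [
            (1 - alpha) * sum(M[j][i] * p[j] for j in range(n)) + alpha / n
            for i in range(n)
        ]
    return p

# Transition matrix of the example graph (rows and columns: d1, d2, d3, d4).
M = [
    [0.0, 0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
]
scores = pagerank(M)
print(scores)
```

With these settings d1 receives the highest score, followed by d2, with d3 and d4 tied, since both are reached only from d1 with equal probability.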
Intuitions
Pages that are widely cited are good authorities
Pages that cite many other pages are good hubs
The key idea of HITS
Good authorities are cited by good hubs
Good hubs point to good authorities
Iterative reinforcement …
The HITS Algorithm (Kleinberg 98)
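Example graph with pages d1, d2, d3, d4 and its “adjacency matrix”:

$$A = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \end{pmatrix}$$

Initial values: a = h = 1

Iterate:

$$h(d_i) = \sum_{d_j \in OUT(d_i)} a(d_j), \qquad a(d_i) = \sum_{d_j \in IN(d_i)} h(d_j)$$

$$h = A a; \qquad a = A^T h$$

$$h = A A^T h; \qquad a = A^T A a$$

Normalize:

$$\sum_i a(d_i)^2 = \sum_i h(d_i)^2 = 1$$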
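The iteration can be sketched in a few lines for the four-page example. This is a minimal version assuming a fixed number of rounds (50 is an arbitrary choice) and the update order a = A^T h followed by h = A a, with both vectors rescaled to unit length after every round.

```python
import math

def hits(A, iters=50):
    """Iterate a = A^T h and h = A a, normalizing both to unit length."""
    n = len(A)
    a = [1.0] * n
    h = [1.0] * n
    for _ in range(iters):
        # Authority score: sum of hub scores of the pages linking in.
        a = [sum(A[j][i] * h[j] for j in range(n)) for i in range(n)]
        # Hub score: sum of authority scores of the pages linked to.
        h = [sum(A[i][j] * a[j] for j in range(n)) for i in range(n)]
        a_norm = math.sqrt(sum(x * x for x in a))
        h_norm = math.sqrt(sum(x * x for x in h))
        a = [x / a_norm for x in a]
        h = [x / h_norm for x in h]
    return a, h

# Adjacency matrix of the example graph (rows and columns: d1, d2, d3, d4).
A = [
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
authority, hub = hits(A)
print(authority, hub)
```

On this graph d1 and d2 end up as the strongest authorities, and d4, which points to both of them, as the strongest hub.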
Methods
Hierarchical clustering
Blockmodeling of SNA
Stochastic blockmodeling
Multi-relational clustering
the same
Biology: learning when two names refer to the same protein
Entity Resolution Methods
Earlier viewed as a pair-wise resolution problem: each pair of references is resolved based on the similarity of their attributes
Importance of considering links
Coauthor links in bib data, hierarchical links between
recognition decisions
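The difference between attribute-only and link-aware resolution can be sketched with a simple set-overlap score. Everything in the example is hypothetical: the records, the use of Jaccard similarity, and the equal weighting of name and coauthor overlap are illustrative assumptions, not a method from the slides.

```python
def jaccard(a, b):
    """Overlap of two collections: |intersection| / |union| (0 if both empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def name_tokens(name):
    """Lowercase a name, drop periods, and split it into tokens."""
    return name.lower().replace(".", "").split()

def attribute_score(r1, r2):
    """Pair-wise similarity from attributes only (here: the name)."""
    return jaccard(name_tokens(r1["name"]), name_tokens(r2["name"]))

def link_aware_score(r1, r2, weight=0.5):
    """Blend name similarity with coauthor-link overlap (weight is assumed)."""
    link_sim = jaccard(r1["coauthors"], r2["coauthors"])
    return (1 - weight) * attribute_score(r1, r2) + weight * link_sim

# Hypothetical bibliographic references.
r1 = {"name": "J. Smith", "coauthors": {"A. Jones", "B. Lee"}}
r2 = {"name": "John Smith", "coauthors": {"A. Jones", "B. Lee"}}
r3 = {"name": "J. Smith", "coauthors": {"D. Kim"}}

print(attribute_score(r1, r2), attribute_score(r1, r3))
print(link_aware_score(r1, r2), link_aware_score(r1, r3))
```

On names alone r3 looks like the better match for r1, but once shared coauthors are taken into account r2 scores higher.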
Link Prediction
Predict whether a link exists between two entities, based on attributes and other observed links
Applications
Web: predict if there will be a link between two pages
Methods
Often viewed as a binary classification problem
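A very small instance of the binary-classification view is to score a candidate pair by its number of common neighbors and threshold that score. The feature, the threshold of 2, and the toy graph are all assumptions chosen for illustration; a practical classifier would typically combine several such features with attributes.

```python
def common_neighbors(adj, u, v):
    """Number of nodes linked to both u and v in the observed graph."""
    return len(adj[u] & adj[v])

def predict_link(adj, u, v, min_common=2):
    """Binary decision: predict a link if u and v share enough neighbors.

    min_common = 2 is an assumed threshold, not a recommended setting.
    """
    return common_neighbors(adj, u, v) >= min_common

# Hypothetical observed graph, stored as neighbor sets.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}
print(predict_link(adj, "a", "b"))  # a and b share neighbors c and d
print(predict_link(adj, "a", "e"))  # a and e share only d
```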
number of citations
Epidemics: predicting the number of people that will
a site
Citation: predicting the number of citations of a
Applications
Biology: protein structure discovery
Methods
Subgraph pattern mining
Graph classification
Classification based on subgraph pattern analysis