
Course 5-6

The document discusses centrality measures in complex networks, focusing on concepts such as degree, betweenness, closeness, and eigenvector centrality. It also covers the HITS algorithm and PageRank, explaining how these measures assess the importance of nodes in a network. The document emphasizes the varying definitions and applications of centrality in different contexts.

Uploaded by

Alexandru Berci
Copyright
© © All Rights Reserved

COMPLEX NETWORKS 5-6
Centrality measures
Outline

 Part 1
 Basic Centrality Concepts
 Degree Centrality
 Betweenness Centrality
 Closeness Centrality
 Eigenvector Centrality
 Centralization
 Part 2
 Hub and Authorities (HITS Algorithm)
 PageRank
PART 1

Basic centrality
Centrality
 Relative importance of a node in the graph
 Which nodes are in the “center” of a graph?
 What do you mean by “center”?
 Definition of “center” varies by context/purpose
 “There is certainly no unanimity on exactly what centrality is or on its conceptual foundations, and there is little agreement on the proper procedure for its measurement.” (Freeman, 1979)
Centrality
 Real valued function on the nodes of a graph
 Structural index
 Applications:
 How influential is a person in a social network? (http://moviegalaxies.com)
 How well used is a road in a transportation network?
 How important is a web page?
 How important is a room in a building?
Centrality Measures
 Different measures of centrality:
 Degree centrality
 Betweenness centrality
 Closeness centrality
 Eigenvector centrality
Example (Borgatti, 2005)

Degree Centrality
Degree Centrality
 Most intuitive notion of centrality
 Node with the highest degree is most important
 Index of exposure to what is flowing through the network
 Gossip network: a central actor is more likely to hear gossip
 Normalized degree centrality
 Divide by the maximum possible degree (n-1)
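The normalization above can be sketched in a few lines of Python. This is a minimal illustration, assuming an undirected graph given as an edge list; the function name `degree_centrality` and the star graph are hypothetical and not taken from the slides.

```python
def degree_centrality(edges, normalized=True):
    """Degree centrality for an undirected graph given as (u, v) edge pairs."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    if not normalized or n < 2:
        return {node: float(d) for node, d in degree.items()}
    # Normalize by the maximum possible degree, n - 1.
    return {node: d / (n - 1) for node, d in degree.items()}


# Hypothetical star graph: hub "H" connected to four leaves.
star_edges = [("H", "a"), ("H", "b"), ("H", "c"), ("H", "d")]
star_scores = degree_centrality(star_edges)
print(star_scores)
```

In the star, the hub reaches the maximum normalized score because it is adjacent to every other node, while each leaf has only one of the four possible neighbors.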
Degree Centrality

Example:
Degree Centrality
 When to use?
 Whom to ask for a favor?
 People you can talk to
Degree Centrality
 Can be deceiving
 Why?
 Local measure

Betweenness Centrality
Betweenness Centrality
 The BC of a node 𝑢 is the sum, over all pairs of other nodes, of the fraction of shortest paths between the pair that pass through 𝑢
 Quantifies the control of a node on the communication between other nodes
 First introduced by Freeman
 $C_B(u) = \sum_{s \neq u \neq t} \frac{\delta_{st}(u)}{\delta_{st}}$
 𝑠 = source
 𝑡 = destination
 δ𝑠𝑡 = number of shortest paths between (𝑠, 𝑡)
 δ𝑠𝑡(𝑢) = number of shortest paths between (𝑠, 𝑡) that pass through 𝑢
Betweenness Centrality
 Example:

A - B - C - D - E

 𝐴 lies between no two other vertices
 𝐵 lies between 𝐴 and 3 other vertices: 𝐶, 𝐷, and 𝐸
 𝐶 lies between 4 pairs of vertices
 (𝐴, 𝐷), (𝐴, 𝐸), (𝐵, 𝐷), (𝐵, 𝐸)
Betweenness Centrality
More example:
 Why do C and D each have betweenness 1?
 They are both on shortest paths for pairs (A,E) and (B,E), and so must share credit: ½+½ = 1
 Can you figure out why B has betweenness 3.5 while E has betweenness 0.5?
[Figure: graph with nodes A, B, C, D, E]
Betweenness Centrality
 Famous algorithm by Brandes
 𝑂(n𝑚) for unweighted graph
 𝑂(n𝑚 + n² log n) for weighted graph
 Edge betweenness centrality
 Count the shortest paths that pass through that edge
 Normalize
 Divide by $\binom{n-1}{2} = (n-1)(n-2)/2$ for undirected graph (the number of pairs of nodes excluding the node itself)
 Divide by (n−1)(n−2) for directed graph
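A compact sketch of the Brandes approach for unweighted, undirected graphs follows. It is written from the published algorithm as commonly described, with hypothetical names, and the test graph is an assumption: edges A-B, B-C, B-D, C-E, D-E, chosen because it reproduces the betweenness values quoted in the earlier example (B = 3.5, C = D = 1, E = 0.5).

```python
from collections import deque


def brandes_betweenness(adj, normalized=False):
    """Betweenness for an unweighted, undirected graph (adjacency dict)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths and recording predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    bc = {v: b / 2 for v, b in bc.items()}
    n = len(adj)
    if normalized and n > 2:
        pairs = (n - 1) * (n - 2) / 2  # pairs of nodes excluding the node itself
        bc = {v: b / pairs for v, b in bc.items()}
    return bc


diamond = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B", "E"],
           "D": ["B", "E"], "E": ["C", "D"]}
diamond_bc = brandes_betweenness(diamond)
diamond_norm = brandes_betweenness(diamond, normalized=True)
print(diamond_bc, diamond_norm)
```

Each source needs only one BFS plus one backward pass, which is where the 𝑂(n𝑚) bound for unweighted graphs comes from.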
Betweenness Centrality

Normalized example:
Betweenness Centrality
Normalized example:
 Red circled node has low centrality value. Why?
 Green circled node has high value. Why?

Closeness Centrality
Closeness Centrality
 A node is considered important if it is relatively close to all other nodes.
 Farness of a node is the sum of its distances to all other nodes.
 Closeness is the inverse of the farness.
 $C_C(u) = \frac{1}{\sum_{v \neq u} d(u, v)}$
 Normalized:
 Divide the farness by (𝑛 − 1), the number of other nodes, which is the same as multiplying the closeness by (𝑛 − 1)
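Farness, closeness, and the normalization can be sketched as below. The code assumes an unweighted, connected graph stored as an adjacency dict; the function name and the five-node path A - B - C - D - E are illustrative assumptions.

```python
from collections import deque


def closeness_centrality(adj, node, normalized=True):
    """Closeness of `node` in an unweighted, connected graph (adjacency dict)."""
    # BFS gives shortest-path distances in an unweighted graph.
    dist = {node: 0}
    queue = deque([node])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    farness = sum(dist.values())
    if farness == 0:
        return 0.0  # single-node graph: closeness is left at 0 by convention here
    if normalized:
        return (len(adj) - 1) / farness  # inverse of the average distance
    return 1 / farness


path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
closeness_A = closeness_centrality(path, "A")
closeness_C = closeness_centrality(path, "C")
print(closeness_A, closeness_C)
```

On this path the end node A has farness 1 + 2 + 3 + 4 = 10, so its normalized closeness is 4/10 = 0.4, while the middle node C has farness 6 and scores higher.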
Closeness Centrality
 Closeness is a measure of how long it will take to spread information from node 𝑢 to all other nodes
 Normalized example:

A - B - C - D - E

$C'_C(A) = \left[ \frac{\sum_{j=1}^{N} d(A, j)}{N - 1} \right]^{-1} = \left[ \frac{1 + 2 + 3 + 4}{4} \right]^{-1} = \left[ \frac{10}{4} \right]^{-1} = 0.4$
Closeness Centrality

More example:
Comparison
 Comparing across 3 centrality values
 Generally, the 3 types will be positively correlated
 When they are not, it tells you something interesting!

                 | Low Degree | Low Closeness | Low Betweenness
High Degree      | | Embedded in cluster that is far from the rest of the network | Ego's connections are redundant - communication bypasses him/her
High Closeness   | Key player tied to important/active alters | | Probably multiple paths in the network, ego is near many people, but so are many others
High Betweenness | Ego's few ties | Very rare cell. |
Eigenvector Centrality
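 Measure of the influence of a node in a network
 Connections to high-scoring nodes contribute more
 “An important node is connected to important neighbors”
 Google’s PageRank is a variant of eigenvector centrality
 Eigenvector centrality of 𝑣: $x_v = \frac{1}{\lambda} \sum_{t \in M(v)} x_t$, where $M(v)$ is the set of neighbors of 𝑣 and $\lambda$ is the largest eigenvalue of the adjacency matrix
 Power iteration is one of the eigenvalue algorithms that can be used to compute it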
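A minimal power-iteration sketch for eigenvector centrality is shown below. It assumes a connected, undirected graph as an adjacency dict. As a design choice (an assumption, not something stated on the slides), each step adds the node's own score, which iterates on A + I instead of A; this keeps the same eigenvectors but avoids oscillation on bipartite graphs. The four-node test graph is hypothetical.

```python
import math


def eigenvector_centrality(adj, max_iter=1000, tol=1e-12):
    """Power iteration on (A + I) for a connected, undirected graph."""
    x = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        # Each node collects its own score plus the scores of its neighbors.
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


# Hypothetical graph: triangle 1-2-3 with a pendant node 4 attached to node 3.
paw = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
ev = eigenvector_centrality(paw)
print(ev)
```

Node 3 scores highest because it is connected to every other node, and node 4 scores lowest even though its only neighbor is the most central node.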
Centralization of Network
 Measure of how central its most central node is in relation to how central all the other nodes are
 How much variation is there in the centrality scores?
 Every centrality measure can have its own centralization measure
 Freeman’s formula for centralization of degree:
$C_D = \frac{\sum_{i=1}^{N} \left[ C_D(n^*) - C_D(i) \right]}{(N-1)(N-2)}$
 $C_D(n^*)$ = maximum degree value in the network
 $(N-1)(N-2)$ = theoretically largest such sum of differences in any network of the same size
Centralization of Network
Degree Centralization Example:
[Figure: three example networks with degree centralization CD = 0.167, CD = 1.0, and CD = 0.167]
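Freeman's degree centralization can be sketched as follows, using raw (unnormalized) degrees. The networks in the figure are not recoverable from the text, so the five-node star and five-node line below are assumptions; they are chosen because they yield 1.0 and roughly 0.167, two of the values shown.

```python
def degree_centralization(adj):
    """Freeman's degree centralization for an undirected graph (adjacency dict)."""
    n = len(adj)
    if n < 3:
        raise ValueError("centralization needs at least 3 nodes")
    degrees = [len(neighbors) for neighbors in adj.values()]
    max_degree = max(degrees)
    # Sum of differences from the most central node, divided by the largest
    # possible such sum, (N-1)(N-2), which is attained by a star.
    return sum(max_degree - d for d in degrees) / ((n - 1) * (n - 2))


# Hypothetical examples: a 5-node star and a 5-node line.
star = {"c": ["a", "b", "d", "e"], "a": ["c"], "b": ["c"], "d": ["c"], "e": ["c"]}
line = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
star_cd = degree_centralization(star)
line_cd = degree_centralization(line)
print(star_cd, round(line_cd, 3))
```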
Centralization of Network
Degree Centralization Example: financial trading networks
[Figure: two trading networks]
 High centralization: one node trading with many others
 Low centralization: trades are more evenly distributed
PART 2

Hub-Authority and PageRank: Conceptual


Searching the Web
 How does Google know the “best” answers?
 How hard is the problem?
 Synonymy
 Polysemy
 Dynamicity
Understanding the network structure of web pages is crucial
Link Analysis
 In this hyperlinked network of webpages, which pages are most popular/important?
 More in-links?
 More out-links?
 Combinations?
Voting by in-links
 How to rank pages
 From in-links?
 Intuition:
 Implicit endorsement
 Single vs aggregate endorsement
 The page referred to by the most pages is preferred
How about out-links
Any implication of out-links?
An example (Kleinberg)
In-links to pages for the query “newspaper”

Pages getting more in-links from other relevant pages are important
An example (Kleinberg) contd.
Good lists: some pages compile lists of relevant resources

Pages listing a higher number of relevant resources should score higher as lists
An example (Kleinberg) contd.
Updated score: sum of the scores of all lists that point to it

Where does it head to?
- Principle of repeated improvement
Hub-Authority (HITS Algorithm)
• Authority: highly endorsed answers to queries
• Hub: high value lists for the query

 Quality of hubs is used to refine the estimate of the quality of the authorities
 Authority update rule
 Hub update rule
 Recursive dependency:
$a(v) \leftarrow \sum_{w \in \mathrm{parent}[v]} h(w)$
$h(v) \leftarrow \sum_{w \in \mathrm{children}[v]} a(w)$
a(1) = h(2) + h(3) + h(4)
h(1) = a(5) + a(6) + a(7)
[Figure: nodes 2, 3, and 4 link to node 1; node 1 links to nodes 5, 6, and 7]
Hub-Authority (HITS Algorithm)
 Starts with all hub and authority scores equal to 1
 Chooses a number of steps K
 Performs a sequence of K updates, each applying the Authority update rule and then the Hub update rule, in this order.
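The K-step procedure can be sketched as below, without any normalization. The graph is stored as a dict of out-links; the function name `hits_raw` is hypothetical, and the example graph is read off the update equations a(1) = h(2) + h(3) + h(4) and h(1) = a(5) + a(6) + a(7), i.e. nodes 2, 3, 4 point to node 1 and node 1 points to nodes 5, 6, 7.

```python
def hits_raw(out_links, k):
    """Run k rounds of (authority update, then hub update) with no normalization."""
    nodes = set(out_links) | {w for targets in out_links.values() for w in targets}
    auth = {v: 1 for v in nodes}
    hub = {v: 1 for v in nodes}
    for _ in range(k):
        # Authority update: sum of hub scores of the pages that point to v.
        new_auth = {v: 0 for v in nodes}
        for v, targets in out_links.items():
            for w in targets:
                new_auth[w] += hub[v]
        auth = new_auth
        # Hub update: sum of the (new) authority scores of the pages v points to.
        hub = {v: sum(auth[w] for w in out_links.get(v, [])) for v in nodes}
    return auth, hub


links = {2: [1], 3: [1], 4: [1], 1: [5, 6, 7]}
auth1, hub1 = hits_raw(links, k=1)
auth2, hub2 = hits_raw(links, k=2)
print(auth1[1], hub1[1], auth2[1], hub2[1])
```

Already after two rounds the scores of node 1 have grown from 3 to 9, which shows why the raw values get large quickly.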

Hub-Authority (HITS Algorithm)
 Problems
 Scores grow to very large numbers
 Normalization
 Does it actually converge?
 Equilibrium
 Effect of initial values
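One way to keep the scores bounded is to rescale them after every update so that they sum to 1, and to stop when the values no longer change. The sketch below assumes that normalization (other choices, such as dividing by the Euclidean norm, are also common) and uses hypothetical names; the graph is the small example in which nodes 2, 3, 4 point to node 1 and node 1 points to nodes 5, 6, 7.

```python
def hits(out_links, max_iter=100, tol=1e-10):
    """HITS with scores rescaled to sum to 1 after each update."""
    nodes = set(out_links) | {w for targets in out_links.values() for w in targets}
    auth = {v: 1.0 for v in nodes}
    hub = {v: 1.0 for v in nodes}
    for _ in range(max_iter):
        new_auth = {v: 0.0 for v in nodes}
        for v, targets in out_links.items():
            for w in targets:
                new_auth[w] += hub[v]
        total = sum(new_auth.values())
        new_auth = {v: a / total for v, a in new_auth.items()}
        new_hub = {v: sum(new_auth[w] for w in out_links.get(v, [])) for v in nodes}
        total = sum(new_hub.values())
        new_hub = {v: h / total for v, h in new_hub.items()}
        # Equilibrium check: stop once no score moves by more than tol.
        change = max(
            max(abs(new_auth[v] - auth[v]) for v in nodes),
            max(abs(new_hub[v] - hub[v]) for v in nodes),
        )
        auth, hub = new_auth, new_hub
        if change < tol:
            break
    return auth, hub


links = {2: [1], 3: [1], 4: [1], 1: [5, 6, 7]}
auth, hub = hits(links)
print(auth[1], auth[5], hub[1], hub[2])
```

On this graph the normalized scores settle after the second round: node 1 holds half of the authority mass, and nodes 1, 2, 3, and 4 share the hub mass equally.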
PageRank
“PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.”
(Facts about Google and Competition)

 Keys:
 Mode of endorsement forms the basis of PageRank
 Starts with simple voting on in-links
 Passes endorsement across out-links
 Repeated improvement
PageRank (contd.)
Think of PageRank as a kind of “fluid” that circulates through the network

Computation procedure:
 Each node starts with initial PageRank 1/𝑛
 Choose a number of steps K
 Perform K updates of PageRank values
 Each node/page divides its current PageRank value equally across its out-links
 Each page updates its new PageRank value to be the sum of what it receives
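The update rule can be sketched with exact fractions so that the conserved "fluid" is easy to see. This assumes every page appears as a key and has at least one out-link; the function name and the three-page graph are hypothetical.

```python
from fractions import Fraction


def basic_pagerank(out_links, k):
    """Apply the basic PageRank update k times, starting from 1/n per page."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: Fraction(1, n) for v in nodes}
    for _ in range(k):
        new_rank = {v: Fraction(0) for v in nodes}
        for v in nodes:
            targets = out_links[v]
            # Each page splits its current value equally across its out-links.
            share = rank[v] / len(targets)
            for w in targets:
                new_rank[w] += share
        rank = new_rank
    return rank


# Hypothetical graph: X links to Y and Z, Y links to Z, Z links to X.
web = {"X": ["Y", "Z"], "Y": ["Z"], "Z": ["X"]}
step1 = basic_pagerank(web, k=1)
print(step1)
```

After one update X holds 1/3, Y holds 1/6, and Z holds 1/2; the total is still 1 because nothing is created or lost in the update.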
PageRank (contd.)

What is the PageRank of node A at step 1?


PageRank (contd.)
 Computation procedure:
 Each node starts with initial PageRank 1/8
 Step 1: PR(A) = ½*PR(D) + ½*PR(E) + PR(H) + PR(F) + PR(G) = 1/16 + 1/16 + 1/8 + 1/8 + 1/8 = 1/2
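The arithmetic for node A can be reproduced from the in-links alone. The out-degrees below are inferred from the coefficients in the step (D and E each pass half of their value, so they are assumed to have two out-links; F, G, and H pass everything, so one out-link each); the rest of the graph is not needed and is not assumed.

```python
from fractions import Fraction

initial = Fraction(1, 8)  # 8 pages, each starting with PageRank 1/8

# Pages linking to A, mapped to their (inferred) number of out-links.
out_degree_of_in_neighbors = {"D": 2, "E": 2, "F": 1, "G": 1, "H": 1}

# Each in-neighbor passes an equal share of its PageRank along each out-link.
contributions = {page: initial / d for page, d in out_degree_of_in_neighbors.items()}
pr_A_step1 = sum(contributions.values())
print(contributions, pr_A_step1)
```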
PageRank (contd.)
 Convergence/equilibrium?
 Is there any?
 How to check?
PageRank (contd.)
Do you see any problem with the definition?
PageRank (contd.)

It’s leaking!
PageRank (contd.)
What would happen here? (Broder et al. 2001)
PageRank (contd.)
 Solution: scaled PageRank update rule
 Scaling factor s
 Scale down all PageRank values by a factor of s
 Divide the residual 1-s equally over all nodes, giving (1-s)/n to each
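The scaled update can be sketched as below. The default s = 0.85, the function name, and the four-page graph are assumptions for illustration; a page without out-links is assumed to keep its own value. In the graph, the pair S and T only link to each other, so under the basic rule (s = 1) all PageRank leaks into that pair.

```python
def scaled_pagerank(out_links, s=0.85, k=100):
    """Scaled PageRank: basic update, scale by s, then add (1 - s)/n to every page."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(k):
        new_rank = {v: 0.0 for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = rank[v] / len(targets)
                for w in targets:
                    new_rank[w] += share
            else:
                new_rank[v] += rank[v]  # assumption: a dangling page keeps its value
        # Scale down by s and spread the residual 1 - s equally over all pages.
        rank = {v: s * new_rank[v] + (1 - s) / n for v in nodes}
    return rank


# Hypothetical graph: X and Y link back and forth, X also links into the sink pair S, T.
web = {"X": ["Y", "S"], "Y": ["X"], "S": ["T"], "T": ["S"]}
scaled = scaled_pagerank(web)
unscaled = scaled_pagerank(web, s=1.0)
print(scaled, unscaled)
```

With s = 1 the values of X and Y shrink toward 0, while with scaling every page keeps at least (1-s)/n and the total still sums to 1.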
PageRank (contd.)
 Limit of scaled PageRank
 Still converges? YES
 Depends on scaling factor? YES
 Sensitivity to addition/deletion of pages? (Ng et al. 2001)
PageRank: alternate definition
 Random walk
 Choose a page at random
 Pick each page with equal probability
 Follow links for a sequence of k steps
 Pick a random out-link
 Follow it to where it leads
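The random-walk view can be explored with a small simulation. The sketch assumes every page has at least one out-link and uses a hypothetical three-page graph (X links to Y and Z, Y links to Z, Z links to X). The fraction of walks that end on a page after k steps should approximate that page's value after k basic PageRank updates, which for k = 1 on this graph is 1/3, 1/6, and 1/2.

```python
import random


def random_walk_frequencies(out_links, k, walks, seed=0):
    """Estimate where a k-step random walk ends, over many independent walks."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    nodes = list(out_links)
    counts = {v: 0 for v in nodes}
    for _ in range(walks):
        page = rng.choice(nodes)  # start on a page chosen uniformly at random
        for _step in range(k):
            page = rng.choice(out_links[page])  # follow a random out-link
        counts[page] += 1
    return {v: c / walks for v, c in counts.items()}


web = {"X": ["Y", "Z"], "Y": ["Z"], "Z": ["X"]}
freq = random_walk_frequencies(web, k=1, walks=20000)
print(freq)
```

Because this is a Monte Carlo estimate, the frequencies only approach the exact values as the number of walks grows.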
PageRank: alternate definition

Scaled version of random walk?
