5 Thmodule

Chapter 10 discusses the analysis of social networks using graph theory, focusing on community detection and properties of graphs. It outlines various types of social networks, such as telephone and email networks, and introduces concepts like locality and clustering. The chapter also explores algorithms for measuring similarities among nodes and the connectedness of communities within these networks.

Uploaded by

Yogesh
Chapter 10

Mining Social-Network Graphs

There is much information to be gained by analyzing the large-scale data that
is derived from social networks. The best-known example of a social network
is the “friends” relation found on sites like Facebook. However, as we shall see,
there are many other sources of data that connect people or other entities.

In this chapter, we shall study techniques for analyzing such networks. An
important question about a social network is how to identify “communities,”
that is, subsets of the nodes (people or other entities that form the network)
with unusually strong connections. Some of the techniques used to identify
communities are similar to the clustering algorithms we discussed in Chapter 7.
However, communities almost never partition the set of nodes in a network.
Rather, communities usually overlap. For example, you may belong to several
communities of friends or classmates. The people from one community tend to
know each other, but people from two different communities rarely know each
other. You would not want to be assigned to only one of the communities, nor
would it make sense to cluster all the people from all your communities into
one cluster.

Also in this chapter we explore efficient algorithms for discovering other
properties of graphs. We look at “simrank,” a way to discover similarities
among nodes of a graph. One of the interesting applications of simrank is
that it gives us a way to identify communities as sets of “similar” nodes. We
then explore triangle counting as a way to measure the connectedness of a
community. In addition, we give efficient algorithms for exact and approximate
measurement of the neighborhood sizes of nodes in a graph, and we look at
efficient algorithms for computing the transitive closure.


10.1 Social Networks as Graphs


We begin our discussion of social networks by introducing a graph model. Not
every graph is a suitable representation of what we intuitively regard as a social
network. We therefore discuss the idea of “locality,” the property of social
networks that says nodes and edges of the graph tend to cluster in communities.
This section also looks at some of the kinds of social networks that occur in
practice.

10.1.1 What is a Social Network?


When we think of a social network, we think of Facebook, Twitter, Google+,
or another website that is called a “social network,” and indeed this kind of
network is representative of the broader class of networks called “social.” The
essential characteristics of a social network are:

1. There is a collection of entities that participate in the network. Typically,
these entities are people, but they could be something else entirely. We
shall discuss some other examples in Section 10.1.3.

2. There is at least one relationship between entities of the network. On
Facebook or its ilk, this relationship is called friends. Sometimes the
relationship is all-or-nothing; two people are either friends or they are
not. However, in other examples of social networks, the relationship has a
degree. This degree could be discrete; e.g., friends, family, acquaintances,
or none as in Google+. It could be a real number; an example would
be the fraction of the average day that two people spend talking to each
other.

3. There is an assumption of nonrandomness or locality. This condition is
the hardest to formalize, but the intuition is that relationships tend to
cluster. That is, if entity A is related to both B and C, then there is a
higher probability than average that B and C are related.

10.1.2 Social Networks as Graphs


Social networks are naturally modeled as graphs, which we sometimes refer to
as a social graph. The entities are the nodes, and an edge connects two nodes
if the nodes are related by the relationship that characterizes the network. If
there is a degree associated with the relationship, this degree is represented by
labeling the edges. Often, social graphs are undirected, as for the Facebook
friends graph. But they can be directed graphs, as for example the graphs of
followers on Twitter or Google+.

Example 10.1 : Figure 10.1 is an example of a tiny social network. The
entities are the nodes A through G. The relationship, which we might think of
as “friends,” is represented by the edges. For instance, B is friends with A, C,
and D.

Figure 10.1: Example of a small social network

Is this graph really typical of a social network, in the sense that it exhibits
locality of relationships? First, note that the graph has nine edges out of the
$\binom{7}{2} = 21$ pairs of nodes that could have had an edge between them. Suppose
X, Y , and Z are nodes of Fig. 10.1, with edges between X and Y and also
between X and Z. What would we expect the probability of an edge between
Y and Z to be? If the graph were large, that probability would be very close
to the fraction of the pairs of nodes that have edges between them, i.e., 9/21
= .429 in this case. However, because the graph is small, there is a noticeable
difference between the true probability and the ratio of the number of edges to
the number of pairs of nodes. Since we already know there are edges (X, Y )
and (X, Z), there are only seven edges remaining. Those seven edges could run
between any of the 19 remaining pairs of nodes. Thus, the probability of an
edge (Y, Z) is 7/19 = .368.
Now, we must compute the probability that the edge (Y, Z) exists in Fig.
10.1, given that edges (X, Y ) and (X, Z) exist. What we shall actually count
is pairs of nodes that could be Y and Z, without worrying about which node
is Y and which is Z. If X is A, then Y and Z must be B and C, in some
order. Since the edge (B, C) exists, A contributes one positive example (where
the edge does exist) and no negative examples (where the edge is absent). The
cases where X is C, E, or G are essentially the same. In each case, X has only
two neighbors, and the edge between the neighbors exists. Thus, we have seen
four positive examples and zero negative examples so far.
Now, consider X = F . F has three neighbors, D, E, and G. There are edges
between two of the three pairs of neighbors, but no edge between G and E.
Thus, we see two more positive examples and we see our first negative example.
If X = B, there are again three neighbors, but only one pair of neighbors,
A and C, has an edge. Thus, we have two more negative examples, and one
positive example, for a total of seven positive and three negative. Finally, when
X = D, there are four neighbors. Of the six pairs of neighbors, only two have
edges between them.
Thus, the total number of positive examples is nine and the total number
of negative examples is seven. We see that in Fig. 10.1, the fraction of times
the third edge exists is thus 9/16 = .563. This fraction is considerably greater
than the .368 expected value for that fraction. We conclude that Fig. 10.1 does
indeed exhibit the locality expected in a social network. ✷
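
The count in Example 10.1 can be checked mechanically. The following is a minimal sketch, assuming the edge list of Fig. 10.1 is the one implied by the neighbor lists in the example (the names `EDGES`, `neighbor_map`, and `locality_counts` are ours, chosen for illustration). For every node X and every pair of its neighbors, it records whether the third edge is present.

```python
from itertools import combinations

# Edge list for Fig. 10.1, reconstructed from the neighbor lists given in
# Example 10.1 (an assumption; the figure itself is not reproduced here).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def neighbor_map(edges):
    """Map each node to the set of its neighbors in an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def locality_counts(edges):
    """Count pairs of neighbors of a common node X that are (positive)
    or are not (negative) themselves joined by an edge."""
    adj = neighbor_map(edges)
    positive = negative = 0
    for x in adj:
        for y, z in combinations(sorted(adj[x]), 2):
            if z in adj[y]:
                positive += 1
            else:
                negative += 1
    return positive, negative

positive, negative = locality_counts(EDGES)
observed = positive / (positive + negative)

# Baseline from the example: with edges (X, Y) and (X, Z) already fixed,
# the other edges are spread over the other pairs of nodes.
n = len(neighbor_map(EDGES))
pairs = n * (n - 1) // 2
baseline = (len(EDGES) - 2) / (pairs - 2)

print(positive, negative)                       # 9 7
print(round(observed, 4), round(baseline, 4))   # 0.5625 0.3684
```

The observed fraction 9/16 exceeds the 7/19 baseline, which is the comparison the example uses as evidence of locality.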

10.1.3 Varieties of Social Networks


There are many examples of social networks other than “friends” networks.
Here, let us enumerate some of the other examples of networks that also exhibit
locality of relationships.

Telephone Networks
Here the nodes represent phone numbers, which are really individuals. There
is an edge between two nodes if a call has been placed between those phones
in some fixed period of time, such as last month, or “ever.” The edges could
be weighted by the number of calls made between these phones during the
period. Communities in a telephone network will form from groups of people
that communicate frequently: groups of friends, members of a club, or people
working at the same company, for example.

Email Networks
The nodes represent email addresses, which are again individuals. An edge
represents the fact that there was at least one email in at least one direction
between the two addresses. Alternatively, we may only place an edge if there
were emails in both directions. In that way, we avoid viewing spammers as
“friends” with all their victims. Another approach is to label edges as weak or
strong. Strong edges represent communication in both directions, while weak
edges indicate that the communication was in one direction only. The
communities seen in email networks come from the same sorts of groupings we
mentioned in connection with telephone networks. A similar sort of network
involves people who text other people through their cell phones.

Collaboration Networks
Nodes represent individuals who have published research papers. There is an
edge between two individuals who published one or more papers jointly.
Optionally, we can label edges by the number of joint publications. The communities
in this network are authors working on a particular topic.
An alternative view of the same data is as a graph in which the nodes are
papers. Two papers are connected by an edge if they have at least one author
in common. Now, we form communities that are collections of papers on the
same topic.
There are several other kinds of data that form two networks in a similar
way. For example, we can look at the people who edit Wikipedia articles and
the articles that they edit. Two editors are connected if they have edited an
article in common. The communities are groups of editors that are interested
in the same subject. Dually, we can build a network of articles, and connect
articles if they have been edited by the same person. Here, we get communities
of articles on similar or related subjects.
In fact, the data involved in collaborative filtering, as was discussed in
Chapter 9, often can be viewed as forming a pair of networks, one for the
customers and one for the products. Customers who buy the same sorts of
products, e.g., science-fiction books, will form communities, and dually,
products that are bought by the same customers will form communities, e.g., all
science-fiction books.

Other Examples of Social Graphs


Many other phenomena give rise to graphs that look something like social
graphs, especially exhibiting locality. Examples include: information networks
(documents, web graphs, patents), infrastructure networks (roads, planes, water
pipes, power grids), biological networks (genes, proteins, food webs of animals
eating each other), as well as other types, like product co-purchasing networks
(e.g., Groupon).

10.1.4 Graphs With Several Node Types


There are other social phenomena that involve entities of different types. We
just discussed, under the heading of “collaboration networks,” several kinds of
graphs that are really formed from two types of nodes. Authorship networks
can be seen to have author nodes and paper nodes. In the discussion above, we
built two social networks by eliminating the nodes of one of the two types, but
we do not have to do that. We can rather think of the structure as a whole.
For a more complex example, users at a site like del.icio.us place tags on
Web pages. There are thus three different kinds of entities: users, tags, and
pages. We might think that users were somehow connected if they tended to
use the same tags frequently, or if they tended to tag the same pages. Similarly,
tags could be considered related if they appeared on the same pages or were
used by the same users, and pages could be considered similar if they had many
of the same tags or were tagged by many of the same users.
The natural way to represent such information is as a k-partite graph for
some k > 1. We met bipartite graphs, the case k = 2, in Section 8.3. In
general, a k-partite graph consists of k disjoint sets of nodes, with no edges
between nodes of the same set.

Example 10.2 : Figure 10.2 is an example of a tripartite graph (the case k = 3
of a k-partite graph). There are three sets of nodes, which we may think of
as users {U1, U2}, tags {T1, T2, T3, T4}, and Web pages {W1, W2, W3}. Notice
that all edges connect nodes from two different sets. We may assume this graph
represents information about the three kinds of entities. For example, the edge
(U1, T2) means that user U1 has placed the tag T2 on at least one page.

Figure 10.2: A tripartite graph representing users, tags, and Web pages

Note that the graph does not tell us a detail that could be important: who placed
which tag on which page? To represent such ternary information would require
a more complex representation, such as a database relation with three columns
corresponding to users, tags, and pages. ✷
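
To see concretely why the tripartite graph loses the ternary information, here is a small sketch. The two relations below are hypothetical tagging histories invented for illustration; they are not read off Fig. 10.2. We also assume, for the sketch only, that a tagging event (user, tag, page) contributes the three edges user-tag, tag-page, and user-page.

```python
# Two hypothetical three-column relations (user, tag, page). They are
# illustrative only and do not describe the graph of Fig. 10.2.
RELATION_1 = {("U1", "T1", "W1"), ("U1", "T2", "W2"),
              ("U2", "T1", "W2"), ("U2", "T2", "W1")}
RELATION_2 = {("U1", "T1", "W2"), ("U1", "T2", "W1"),
              ("U2", "T1", "W1"), ("U2", "T2", "W2")}

def tripartite_edges(taggings):
    """Project tagging events onto the edges of a tripartite graph.

    Assumption of this sketch: each event yields a user-tag edge,
    a tag-page edge, and a user-page edge.
    """
    edges = set()
    for user, tag, page in taggings:
        edges.add((user, tag))
        edges.add((tag, page))
        edges.add((user, page))
    return edges

graph_1 = tripartite_edges(RELATION_1)
graph_2 = tripartite_edges(RELATION_2)

# Every edge joins nodes of two different types (U, T, or W), so the
# projection is a legal 3-partite graph.
assert all(a[0] != b[0] for a, b in graph_1)

print(RELATION_1 == RELATION_2)   # False: the tagging histories differ
print(graph_1 == graph_2)         # True: the graphs are identical
```

Two different histories of who placed which tag on which page produce the same graph, so the graph alone cannot answer that question.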

10.1.5 Exercises for Section 10.1


Exercise 10.1.1 : It is possible to think of the edges of one graph G as the
nodes of another graph G′. We construct G′ from G by the dual construction:

1. If (X, Y) is an edge of G, then XY, representing the unordered set of X
and Y, is a node of G′. Note that XY and YX represent the same node
of G′, not two different nodes.

2. If (X, Y) and (X, Z) are edges of G, then in G′ there is an edge between
XY and XZ. That is, nodes of G′ have an edge between them if the
edges of G that these nodes represent have a node (of G) in common.

(a) If we apply the dual construction to a network of friends, what is the
interpretation of the edges of the resulting graph?

(b) Apply the dual construction to the graph of Fig. 10.1.

! (c) How is the degree of a node XY in G′ related to the degrees of X and Y
in G?

!! (d) The number of edges of G′ is related to the degrees of the nodes of G by
a certain formula. Discover that formula.

! (e) What we called the dual is not a true dual, because applying the
construction to G′ does not necessarily yield a graph isomorphic to G. Give
an example graph G where the dual of G′ is isomorphic to G and another
example where the dual of G′ is not isomorphic to G.

10.2 Clustering of Social-Network Graphs


An important aspect of social networks is that they contain communities of
entities that are connected by many edges. These typically correspond to groups
of friends at school or groups of researchers interested in the same topic, for
example. In this section, we shall consider clustering of the graph as a way to
identify communities. It turns out that the techniques we learned in Chapter 7
are generally unsuitable for the problem of clustering social-network graphs.

10.2.1 Distance Measures for Social-Network Graphs


If we were to apply standard clustering techniques to a social-network graph,
our first step would be to define a distance measure. When the edges of the
graph have labels, these labels might be usable as a distance measure, depending
on what they represented. But when the edges are unlabeled, as in a “friends”
graph, there is not much we can do to define a suitable distance.
Our first instinct is to assume that nodes are close if they have an edge
between them and distant if not. Thus, we could say that the distance d(x, y)
is 0 if there is an edge (x, y) and 1 if there is no such edge. We could use any
other two values, such as 1 and ∞, as long as the distance is smaller when there
is an edge.
Neither of these two-valued “distance measures” – 0 and 1 or 1 and ∞ – is
a true distance measure. The reason is that they violate the triangle inequality
when there are three nodes, with two edges between them. That is, if there are
edges (A, B) and (B, C), but no edge (A, C), then the distance from A to C
exceeds the sum of the distances from A to B to C. We could fix this problem
by using, say, distance 1 for an edge and distance 1.5 for a missing edge. But
the problem with two-valued distance functions is not limited to the triangle
inequality, as we shall see in the next section.
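
The triangle-inequality argument can be checked by brute force on the running example. This is a minimal sketch, assuming the edge list of Fig. 10.1 implied by Example 10.1; the helper names `make_distance` and `triangle_violations` are ours.

```python
from itertools import permutations

# Edges of Fig. 10.1 as implied by Example 10.1 (an assumption of this sketch).
EDGES = {frozenset(e) for e in [("A", "B"), ("A", "C"), ("B", "C"),
                                ("B", "D"), ("D", "E"), ("D", "F"),
                                ("D", "G"), ("E", "F"), ("F", "G")]}
NODES = sorted({v for e in EDGES for v in e})

def make_distance(edge_value, no_edge_value):
    """Build a two-valued 'distance' on the nodes of the graph."""
    def d(x, y):
        if x == y:
            return 0
        return edge_value if frozenset((x, y)) in EDGES else no_edge_value
    return d

def triangle_violations(d):
    """All ordered triples (a, b, c) with d(a, c) > d(a, b) + d(b, c)."""
    return [(a, b, c) for a, b, c in permutations(NODES, 3)
            if d(a, c) > d(a, b) + d(b, c)]

zero_one = triangle_violations(make_distance(0, 1))
one_inf = triangle_violations(make_distance(1, float("inf")))
one_onehalf = triangle_violations(make_distance(1, 1.5))

print(("A", "B", "D") in zero_one)   # True: d(A, D) = 1 > 0 + 0
print(len(one_inf) > 0)              # True: infinity exceeds 1 + 1
print(one_onehalf)                   # []: 1.5 never exceeds a sum of two distances
```

With values 1 and 1.5, the largest possible left side is 1.5 and the smallest possible right side is 2, so no triple can violate the inequality.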

10.2.2 Applying Standard Clustering Methods


Recall from Section 7.1.2 that there are two general approaches to clustering:
hierarchical (agglomerative) and point-assignment. Let us consider how each
of these would work on a social-network graph. First, consider the hierarchical
methods covered in Section 7.2. In particular, suppose we use as the intercluster
distance the minimum distance between nodes of the two clusters.
Hierarchical clustering of a social-network graph starts by combining some
two nodes that are connected by an edge. Successively, edges that are not
between two nodes of the same cluster would be chosen randomly to combine
the clusters to which their two nodes belong. The choices would be random,
because all distances represented by an edge are the same.

Example 10.3 : Consider again the graph of Fig. 10.1, repeated here as Fig.
10.3. First, let us agree on what the communities are. At the highest level,
it appears that there are two communities {A, B, C} and {D, E, F, G}.
However, we could also view {D, E, F} and {D, F, G} as two subcommunities of
{D, E, F, G}; these two subcommunities overlap in two of their members, and
thus could never be identified by a pure clustering algorithm. Finally, we could
consider each pair of individuals that are connected by an edge as a community
of size 2, although such communities are uninteresting.

Figure 10.3: Repeat of Fig. 10.1

The problem with hierarchical clustering of a graph like that of Fig. 10.3 is
that at some point we are likely to choose to combine B and D, even though
they surely belong in different clusters. The reason we are likely to combine B
and D is that D, and any cluster containing it, is as close to B, and any cluster
containing it, as A and C are to B. There is even a 1/9 probability that the
first thing we do is to combine B and D into one cluster.
There are things we can do to reduce the probability of error. We can
run hierarchical clustering several times and pick the run that gives the most
coherent clusters. We can use a more sophisticated method for measuring the
distance between clusters of more than one node, as discussed in Section 7.2.3.
But no matter what we do, in a large graph with many communities there is a
significant chance that in the initial phases we shall use some edges that connect
two nodes that do not belong together in any large community. ✷
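
The effect of the random choices can be estimated by simulation. The sketch below is one possible reading of the procedure, not a definitive implementation: at each step it picks, uniformly at random, an edge whose ends lie in different clusters and merges those clusters, stopping at two clusters. The edge list is the one implied by Example 10.1, and the function name `random_merge` and the fixed seed are our own choices.

```python
import random

# Edges of Fig. 10.3 (same graph as Fig. 10.1), as implied by Example 10.1.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]
IDEAL = {frozenset("ABC"), frozenset("DEFG")}

def random_merge(edges, rng, target_clusters=2):
    """Merge clusters along randomly chosen edges until few enough remain.

    Returns the first edge used and the final set of clusters.
    """
    cluster = {v: v for e in edges for v in e}   # node -> cluster id
    count = len(cluster)
    first_merge = None
    while count > target_clusters:
        crossing = [(u, v) for u, v in edges if cluster[u] != cluster[v]]
        u, v = rng.choice(crossing)
        if first_merge is None:
            first_merge = (u, v)
        old, new = cluster[v], cluster[u]
        for node in cluster:
            if cluster[node] == old:
                cluster[node] = new
        count -= 1
    groups = {}
    for node, cid in cluster.items():
        groups.setdefault(cid, set()).add(node)
    return first_merge, {frozenset(g) for g in groups.values()}

rng = random.Random(0)   # fixed seed so that reruns give the same estimate
trials = 10_000
runs = [random_merge(EDGES, rng) for _ in range(trials)]
first_bd = sum(first == ("B", "D") for first, _ in runs) / trials
ideal = sum(groups == IDEAL for _, groups in runs) / trials

print(f"first merge is (B, D): {first_bd:.3f}  (exact value 1/9 = {1/9:.3f})")
print(f"ends with {{A,B,C}} and {{D,E,F,G}}: {ideal:.3f}")
```

On this graph a run ends with the two intended communities exactly when the edge (B, D) is never chosen, so the second estimate measures how often the random choices avoid that edge.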

Now, consider a point-assignment approach to clustering social networks.
Again, the fact that all edges are at the same distance will introduce a number
of random factors that will lead to some nodes being assigned to the wrong
cluster. An example should illustrate the point.

Example 10.4 : Suppose we try a k-means approach to clustering Fig. 10.3.
As we want two clusters, we pick k = 2. If we pick two starting nodes at random,
they might both be in the same cluster. If, as suggested in Section 7.3.2, we
start with one randomly chosen node and then pick another as far away as
possible, we don’t do much better; we could thereby pick any pair of nodes not
connected by an edge, e.g., E and G in Fig. 10.3.
However, suppose we do get two suitable starting nodes, such as B and F .
We shall then assign A and C to the cluster of B and assign E and G to the
cluster of F . But D is as close to B as it is to F , so it could go either way, even
though it is “obvious” that D belongs with F .
If the decision about where to place D is deferred until we have assigned
some other nodes to the clusters, then we shall probably make the right decision.
For instance, if we assign a node to the cluster with the shortest average distance
to all the nodes of the cluster, then D should be assigned to the cluster of F , as
long as we do not try to place D before any other nodes are assigned. However,
in large graphs, we shall surely make mistakes on some of the first nodes we
place. ✷
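
The following sketch illustrates the point of Example 10.4 about the order of assignment. It assumes the 0/1 distance of Section 10.2.1 and the edge list implied by Example 10.1; the names `distance` and `average_distance` are ours.

```python
# Edges of Fig. 10.3 as implied by Example 10.1 (an assumption of this sketch).
EDGES = {frozenset(e) for e in [("A", "B"), ("A", "C"), ("B", "C"),
                                ("B", "D"), ("D", "E"), ("D", "F"),
                                ("D", "G"), ("E", "F"), ("F", "G")]}

def distance(x, y):
    """Two-valued distance: 0 across an edge (or to itself), 1 otherwise."""
    return 0 if x == y or frozenset((x, y)) in EDGES else 1

def average_distance(node, cluster):
    """Average distance from a node to all members of a cluster."""
    return sum(distance(node, member) for member in cluster) / len(cluster)

# Placing D first: the clusters are only the seeds B and F, and D is tied.
early = (average_distance("D", {"B"}), average_distance("D", {"F"}))

# Placing D last: A and C have joined B, and E and G have joined F.
late = (average_distance("D", {"A", "B", "C"}),
        average_distance("D", {"E", "F", "G"}))

print(early)   # (0.0, 0.0): no basis for a decision
print(late)    # the second entry is 0.0, the first is 2/3, so D joins F
```

Deferring D resolves the tie here, but in a large graph some node must be placed first, which is the source of the mistakes the example warns about.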

10.2.3 Betweenness
Since there are problems with standard clustering methods, several specialized
clustering techniques have been developed to find communities in social
networks. In this section we shall consider one of the simplest, based on finding
the edges that are least likely to be inside a community.
Define the betweenness of an edge (a, b) to be the number of pairs of nodes
x and y such that the edge (a, b) lies on the shortest path between x and y.
To be more precise, since there can be several shortest paths between x and y,
edge (a, b) is credited with the fraction of those shortest paths that include the
edge (a, b). As in golf, a high score is bad. It suggests that the edge (a, b) runs
between two different communities; that is, a and b do not belong to the same
community.

Example 10.5 : In Fig. 10.3 the edge (B, D) has the highest betweenness, as
should surprise no one. In fact, this edge is on every shortest path between
any of A, B, and C to any of D, E, F , and G. Its betweenness is therefore
3 × 4 = 12. In contrast, the edge (D, F ) is on only four shortest paths: those
from A, B, C, and D to F . ✷

10.2.4 The Girvan-Newman Algorithm


In order to exploit the betweenness of edges, we need to calculate the number of
shortest paths going through each edge. We shall describe a method called the
Girvan-Newman (GN) Algorithm, which visits each node X once and computes
the number of shortest paths from X to each of the other nodes that go through
each of the edges. The algorithm begins by performing a breadth-first search
(BFS) of the graph, starting at the node X. Note that the level of each node in
the BFS presentation is the length of the shortest path from X to that node.
Thus, the edges that go between nodes at the same level can never be part of
a shortest path from X.

Edges between levels are called DAG edges (“DAG” stands for directed,
acyclic graph). Each DAG edge will be part of at least one shortest path
from root X. If there is a DAG edge (Y, Z), where Y is at the level above Z
(i.e., closer to the root), then we shall call Y a parent of Z and Z a child of Y ,
although parents are not necessarily unique in a DAG as they would be in a
tree.
Root:    E (1)
Level 1: D (1), F (1)
Level 2: B (1), G (2)
Level 3: A (1), C (1)

Figure 10.4: Step 1 of the Girvan-Newman Algorithm

Example 10.6 : Figure 10.4 is a breadth-first presentation of the graph of Fig.
10.3, starting at node E. Solid edges are DAG edges and dashed edges connect
nodes at the same level. ✷

The second step of the GN algorithm is to label each node by the number of
shortest paths that reach it from the root. Start by labeling the root 1. Then,
from the top down, label each node Y by the sum of the labels of its parents.

Example 10.7 : Figure 10.4 shows the labels for each of the nodes. First, label
the root E with 1. At level 1 are the nodes D and F . Each has only E as a
parent, so they too are labeled 1. Nodes B and G are at level 2. B has only
D as a parent, so B’s label is the same as the label of D, which is 1. However,
G has parents D and F , so its label is the sum of their labels, or 2. Finally, at
level 3, A and C each have only parent B, so their labels are the label of B,
which is 1. ✷
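
The first two steps can be sketched in a few lines. This is a minimal illustration, assuming the edge list implied by Example 10.1; the names `adjacency` and `bfs_labels` are ours. A single breadth-first pass assigns each node its level and accumulates, for each node, the sum of the labels of its parents.

```python
from collections import deque

# Edges of Fig. 10.3 as implied by Example 10.1 (an assumption of this sketch).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def adjacency(edges):
    """Map each node to the set of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_labels(adj, root):
    """Return (level, label): BFS depth of each node, and the number of
    shortest paths from the root that reach it."""
    level = {root: 0}
    label = {root: 1}
    queue = deque([root])
    while queue:
        y = queue.popleft()
        for z in sorted(adj[y]):
            if z not in level:              # first visit fixes the level of z
                level[z] = level[y] + 1
                label[z] = 0
                queue.append(z)
            if level[z] == level[y] + 1:    # (y, z) is a DAG edge; y is a parent
                label[z] += label[y]
    return level, label

level, label = bfs_labels(adjacency(EDGES), "E")
print(level)
print(label)
```

Because all nodes of one level leave the queue before any node of the next, the label of a parent is final by the time it is added to a child. For the root E, G receives label 2 from its parents D and F, and every other node receives label 1, in agreement with Example 10.7.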

The third and final step is to calculate for each edge e the sum over all nodes
Y of the fraction of shortest paths from the root X to Y that go through e.
This calculation involves computing this sum for both nodes and edges, from
the bottom. Each node other than the root is given a credit of 1, representing
the shortest path to that node. This credit may be divided among nodes and
edges above, since there could be several different shortest paths to the node.
The rules for the calculation are as follows:

1. Each leaf in the DAG (a leaf is a node with no DAG edges to nodes at
levels below) gets a credit of 1.
2. Each node that is not a leaf gets a credit equal to 1 plus the sum of the
credits of the DAG edges from that node to the level below.
3. A DAG edge e entering node Z from the level above is given a share of the
credit of Z proportional to the fraction of shortest paths from the root to
Z that go through e. Formally, let the parents of Z be $Y_1, Y_2, \ldots, Y_k$. Let
$p_i$ be the number of shortest paths from the root to $Y_i$; this number was
computed in Step 2 and is illustrated by the labels in Fig. 10.4. Then the
credit for the edge $(Y_i, Z)$ is the credit of Z times $p_i$ divided by $\sum_{j=1}^{k} p_j$.

After performing the credit calculation with each node as the root, we sum
the credits for each edge. Then, since each shortest path will have been discov-
ered twice – once when each of its endpoints is the root – we must divide the
credit for each edge by 2.

Example 10.8 : Let us perform the credit calculation for the BFS presentation
of Fig. 10.4. We shall start from level 3 and proceed upwards. First, A and C,
being leaves, get credit 1. Each of these nodes has only one parent, so their
credit is given to the edges (B, A) and (B, C), respectively.

Figure 10.5: Final step of the Girvan-Newman Algorithm – levels 3 and 2

At level 2, G is a leaf, so it gets credit 1. B is not a leaf, so it gets credit
equal to 1 plus the credits on the DAG edges entering it from below. Since
both these edges have credit 1, the credit of B is 3. Intuitively, 3 represents the
fact that all shortest paths from E to A, B, and C go through B. Figure 10.5
shows the credits assigned so far.

Now, let us proceed to level 1. B has only one parent, D, so the edge
(D, B) gets the entire credit of B, which is 3. However, G has two parents, D
and F . We therefore need to divide the credit of 1 that G has between the edges
(D, G) and (F, G). In what proportion do we divide? If you examine the labels
of Fig. 10.4, you see that both D and F have label 1, representing the fact that
there is one shortest path from E to each of these nodes. Thus, we give half
the credit of G to each of these edges; i.e., their credit is each 1/(1 + 1) = 0.5.
Had the labels of D and F in Fig. 10.4 been 5 and 3, meaning there were five
shortest paths to D and only three to F , then the credit of edge (D, G) would
have been 5/8 and the credit of edge (F, G) would have been 3/8.

Figure 10.6: Final step of the Girvan-Newman Algorithm – completing the credit calculation

Now, we can assign credits to the nodes at level 1. D gets 1 plus the credits
of the edges entering it from below, which are 3 and 0.5. That is, the credit of D
is 4.5. The credit of F is 1 plus the credit of the edge (F, G), or 1.5. Finally, the
edges (E, D) and (E, F ) receive the credit of D and F , respectively, since each
of these nodes has only one parent. These credits are all shown in Fig. 10.6.
The credit on each of the edges in Fig. 10.6 is the contribution to the
betweenness of that edge due to shortest paths from E. For example, this
contribution for the edge (E, D) is 4.5. ✷

To complete the betweenness calculation, we have to repeat this calculation
for every node as the root and sum the contributions. Finally, we must divide
by 2 to get the true betweenness, since every shortest path will be discovered
twice, once for each of its endpoints.
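
Putting the three steps together gives the complete calculation. The code below is a sketch under stated assumptions rather than a reference implementation: it uses the edge list implied by Example 10.1, exact arithmetic via `Fraction`, and function names (`credits_from_root`, `betweenness`) of our own choosing.

```python
from collections import deque
from fractions import Fraction

# Edges of Fig. 10.3 as implied by Example 10.1 (an assumption of this sketch).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def credits_from_root(adj, root):
    """Credit of every DAG edge for shortest paths starting at root."""
    # Steps 1 and 2: BFS levels and shortest-path counts (labels).
    level, label, order = {root: 0}, {root: 1}, [root]
    queue = deque([root])
    while queue:
        y = queue.popleft()
        for z in adj[y]:
            if z not in level:
                level[z], label[z] = level[y] + 1, 0
                order.append(z)
                queue.append(z)
            if level[z] == level[y] + 1:
                label[z] += label[y]
    # Step 3: work upward from the deepest nodes. Each node starts with
    # credit 1 and passes its credit to its parents in proportion to
    # their labels; label[z] is the sum of the labels of z's parents.
    node_credit = {v: Fraction(1) for v in order}
    edge_credit = {}
    for z in reversed(order):
        for y in adj[z]:
            if level[y] == level[z] - 1:
                share = node_credit[z] * Fraction(label[y], label[z])
                edge_credit[frozenset((y, z))] = share
                node_credit[y] += share
    return edge_credit

def betweenness(edges):
    """Sum the credits over all roots and divide by 2."""
    adj = adjacency(edges)
    total = {frozenset(e): Fraction(0) for e in edges}
    for root in adj:
        for edge, credit in credits_from_root(adj, root).items():
            total[edge] += credit
    return {edge: credit / 2 for edge, credit in total.items()}

from_e = credits_from_root(adjacency(EDGES), "E")
scores = betweenness(EDGES)
print(float(from_e[frozenset("DE")]))   # 4.5, as in Example 10.8
print(float(scores[frozenset("BD")]))   # 12.0, as in Example 10.5
```

Using `Fraction` avoids rounding when a node's credit is split among several parents; floating-point numbers would serve equally well on a graph this small.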

10.2.5 Using Betweenness to Find Communities


The betweenness scores for the edges of a graph behave something like a distance
measure on the nodes of the graph. It is not exactly a distance measure, because
it is not defined for pairs of nodes that are unconnected by an edge, and might
not satisfy the triangle inequality even when defined. However, we can cluster
by taking the edges in order of increasing betweenness and adding them to the
graph one at a time. At each step, the connected components of the graph
form some clusters. The higher the betweenness we allow, the more edges we
get, and the larger the clusters become.
More commonly, this idea is expressed as a process of edge removal. Start
with the graph and all its edges; then remove edges with the highest
betweenness, until the graph has broken into a suitable number of connected
components.
Example 10.9 : Let us start with our running example, the graph of Fig. 10.1.
We see it with the betweenness for each edge in Fig. 10.7. The calculation of
the betweenness will be left to the reader. The only tricky part of the count
is to observe that between E and G there are two shortest paths, one going
through D and the other through F. Thus, each of the edges (D, E), (E, F),
(D, G), and (G, F) is credited with half a shortest path.

(A, B): 5    (B, D): 12    (D, E): 4.5
(A, C): 1    (B, C): 5     (D, F): 4
(D, G): 4.5  (E, F): 1.5   (G, F): 1.5

Figure 10.7: Betweenness scores for the graph of Fig. 10.1

Clearly, edge (B, D) has the highest betweenness, so it is removed first.
That leaves us with exactly the communities we observed make the most sense,
namely: {A, B, C} and {D, E, F, G}. However, we can continue to remove
edges. Next to leave are (A, B) and (B, C) with a score of 5, followed by (D, E)
and (D, G) with a score of 4.5. Then, (D, F ), whose score is 4, would leave the
graph. We see in Fig. 10.8 the graph that remains.
The “communities” of Fig. 10.8 look strange. One implication is that A and
C are more closely knit to each other than to B. That is, in some sense B is a
“traitor” to the community {A, B, C} because he has a friend D outside that
community. Likewise, D can be seen as a “traitor” to the group {D, E, F, G},
which is why in Fig. 10.8, only E, F , and G remain connected. ✷
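
The edge-removal process of Example 10.9 can be sketched as follows. The scores are those of Fig. 10.7, entered by hand; the function name `communities` and the use of a threshold (remove every edge whose score is at least the threshold) are our own framing of the example.

```python
# Betweenness scores of Fig. 10.7, keyed by edge.
BETWEENNESS = {("A", "B"): 5, ("B", "C"): 5, ("A", "C"): 1,
               ("B", "D"): 12, ("D", "E"): 4.5, ("D", "G"): 4.5,
               ("D", "F"): 4, ("E", "F"): 1.5, ("F", "G"): 1.5}

def communities(scores, threshold):
    """Remove every edge with score >= threshold and return the connected
    components of what remains, each as a sorted list of nodes."""
    nodes = {v for edge in scores for v in edge}
    adj = {v: set() for v in nodes}
    for (u, v), score in scores.items():
        if score < threshold:
            adj[u].add(v)
            adj[v].add(u)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:                      # depth-first search from start
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(adj[v] - component)
        seen |= component
        components.append(sorted(component))
    return components

print(communities(BETWEENNESS, 12))
# [['A', 'B', 'C'], ['D', 'E', 'F', 'G']]
print(communities(BETWEENNESS, 4))
# [['A', 'C'], ['B'], ['D'], ['E', 'F', 'G']]
```

Like the example, this sketch uses the scores computed once on the full graph; it does not recompute betweenness after each removal.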

Figure 10.8: All the edges with betweenness 4 or more have been removed

Speeding Up the Betweenness Calculation

If we apply the method of Section 10.2.4 to a graph of n nodes and e edges,
it takes O(ne) running time to compute the betweenness of each edge.
That is, BFS from a single node takes O(e) time, as do the two labeling
steps. We must start from each node, so there are n of the computations
described in Section 10.2.4.
If the graph is large – and even a million nodes is large when the
algorithm takes O(ne) time – we cannot afford to execute it as suggested.
However, if we pick a subset of the nodes at random and use these as
the roots of breadth-first searches, we can get an approximation to the
betweenness of each edge that will serve in most applications.

10.2.6 Exercises for Section 10.2

Exercise 10.2.1 : Figure 10.9 is an example of a social-network graph. Use
the Girvan-Newman approach to find the number of shortest paths from each
of the following nodes that pass through each of the edges. (a) A (b) B.

Exercise 10.2.2 : Using symmetry, the calculations of Exercise 10.2.1 are all
you need to compute the betweenness of each edge. Do the calculation.

Exercise 10.2.3 : Using the betweenness values from Exercise 10.2.2, deter-
mine reasonable candidates for the communities in Fig. 10.9 by removing all
edges with a betweenness above some threshold.

10.3 Direct Discovery of Communities


In the previous section we searched for communities by partitioning all the in-
dividuals in a social network. While this approach is relatively efficient, it does
have several limitations. It is not possible to place an individual in two different
communities, and everyone is assigned to a community. In this section, we shall
see a technique for discovering communities directly by looking for subsets of
the nodes that have a relatively large number of edges among them. Interest-
ingly, the technique for doing this search on a large graph involves finding large
frequent itemsets, as was discussed in Chapter 6.
Figure 10.9: Graph for exercises

10.3.1 Finding Cliques


Our first thought about how we could find sets of nodes with many edges
between them is to start by finding a large clique (a set of nodes with edges
between any two of them). However, that task is not easy. Not only is finding
a maximum clique (one with the largest possible number of nodes) NP-complete,
but it is among the hardest of the NP-complete problems in the sense that even
approximating the size of the maximum clique is hard.
Further, it is possible to have a set of nodes with almost all edges between
them, and yet have only relatively small cliques.

Example 10.10 : Suppose our graph has nodes numbered 1, 2, . . . , n and there
is an edge between two nodes i and j unless i and j have the same remain-
der when divided by k. Then the fraction of possible edges that are actually
present is approximately (k − 1)/k. There are many cliques of size k, of which
{1, 2, . . . , k} is but one example.
Yet there are no cliques larger than k. To see why, observe that any set of
k + 1 nodes has two that leave the same remainder when divided by k. This
point is an application of the “pigeonhole principle.” Since there are only k
different remainders possible, we cannot have distinct remainders for each of
k + 1 nodes. Thus, no set of k + 1 nodes can be a clique in this graph. ✷
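A small brute-force check makes the example concrete. The sketch below is only an illustration under assumed values n = 12 and k = 3 (the text leaves n and k general), and the helper names are hypothetical. It builds the graph, compares its edge density with (k − 1)/k, and confirms that a clique of size k exists while none of size k + 1 does.

```python
from itertools import combinations


def remainder_graph(n, k):
    """Edges (i, j), i < j, between nodes 1..n whose remainders mod k differ."""
    return {(i, j) for i, j in combinations(range(1, n + 1), 2)
            if i % k != j % k}


def is_clique(nodes, edges):
    """True if every pair of the given nodes is joined by an edge."""
    return all(pair in edges for pair in combinations(sorted(nodes), 2))


n, k = 12, 3                      # assumed small values for illustration
edges = remainder_graph(n, k)
density = len(edges) / (n * (n - 1) // 2)

has_clique_of_size_k = is_clique(range(1, k + 1), edges)
has_larger_clique = any(is_clique(c, edges)
                        for c in combinations(range(1, n + 1), k + 1))

print(f"density = {density:.3f}, (k-1)/k = {(k - 1) / k:.3f}")
print(has_clique_of_size_k, has_larger_clique)
```

For such a small n the density is only roughly (k − 1)/k; the approximation improves as n grows relative to k.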

10.3.2 Complete Bipartite Graphs


Recall our discussion of bipartite graphs from Section 8.3. A complete bipartite
graph consists of s nodes on one side and t nodes on the other side, with all st
possible edges between the nodes of one side and the other present. We denote
this graph by Ks,t . You should draw an analogy between complete bipartite
graphs as subgraphs of general bipartite graphs and cliques as subgraphs of
general graphs. In fact, a clique of s nodes is often referred to as a complete
graph and denoted Ks, while a complete bipartite subgraph is sometimes called
a bi-clique.
While as we saw in Example 10.10, it is not possible to guarantee that a
graph with many edges necessarily has a large clique, it is possible to guar-
antee that a bipartite graph with many edges has a large complete bipartite
subgraph.1 We can regard a complete bipartite subgraph (or a clique if we
discovered a large one) as the nucleus of a community and add to it nodes
with many edges to existing members of the community. If the graph itself is
k-partite as discussed in Section 10.1.4, then we can take nodes of two types
and the edges between them to form a bipartite graph. In this bipartite graph,
we can search for complete bipartite subgraphs as the nuclei of communities.
For instance, in Example 10.2, we could focus on the tag and page nodes of a
graph like Fig. 10.2 and try to find communities of tags and Web pages. Such a
community would consist of related tags and related pages that deserved many
or all of those tags.
However, we can also use complete bipartite subgraphs for community find-
ing in ordinary graphs where nodes all have the same type. Divide the nodes
into two equal groups at random. If a community exists, then we would expect
about half its nodes to fall into each group, and we would expect that about
half its edges would go between groups. Thus, we still have a reasonable chance
of identifying a large complete bipartite subgraph in the community. To this
nucleus we can add nodes from either of the two groups, if they have edges to
many of the nodes already identified as belonging to the community.

10.3.3 Finding Complete Bipartite Subgraphs


Suppose we are given a large bipartite graph G, and we want to find instances
of Ks,t within it. It is possible to view the problem of finding instances of Ks,t
within G as one of finding frequent itemsets. For this purpose, let the “items”
be the nodes on one side of G, which we shall call the left side. We assume that
the instance of Ks,t we are looking for has t nodes on the left side, and we shall
also assume for efficiency that t ≤ s. The “baskets” correspond to the nodes
on the other side of G (the right side). The members of the basket for node v
are the nodes of the left side to which v is connected. Finally, let the support
threshold be s, the number of nodes that the instance of Ks,t has on the right
side.
We can now state the problem of finding instances of Ks,t as that of finding
frequent itemsets F of size t. That is, if a set of t nodes on the left side is
frequent, then they all occur together in at least s baskets. But the baskets
are the nodes on the right side. Each basket corresponds to a node that is
connected to all t of the nodes in F . Thus, the frequent itemset of size t and s
of the baskets in which all those items appear form an instance of Ks,t.

Footnote 1: It is important to understand that we do not mean a generated subgraph – one formed
by selecting some nodes and including all edges. In this context, we only require that there
be edges between any pair of nodes on different sides. It is also possible that some nodes on
the same side are connected by edges as well.

Figure 10.10: The bipartite graph from Fig. 8.1

Example 10.11 : Recall the bipartite graph of Fig. 8.1, which we repeat here as
Fig. 10.10. The left side is the nodes {1, 2, 3, 4} and the right side is {a, b, c, d}.
The latter are the baskets, so basket a consists of “items” 1 and 4; that is,
a = {1, 4}. Similarly, b = {2, 3}, c = {1} and d = {3}.
If s = 2 and t = 1, we must find itemsets of size 1 that appear in at least
two baskets. {1} is one such itemset, and {3} is another. However, in this tiny
example there are no itemsets for larger, more interesting values of s and t,
such as s = t = 2. ✷
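The reduction to frequent itemsets can be tried directly on this example. The following Python sketch is a naive illustration, assuming the baskets are small enough to enumerate every candidate itemset; it is not A-priori, and the function name find_bicliques is hypothetical. Each result pairs a set of t left-side nodes with the right-side nodes (baskets) that contain all of them.

```python
from itertools import combinations


def find_bicliques(baskets, s, t):
    """Return (itemset, supporting baskets) for itemsets of size t
    that appear in at least s baskets. Brute force, not A-priori."""
    items = sorted(set().union(*baskets.values()))
    found = []
    for itemset in combinations(items, t):
        support = sorted(name for name, members in baskets.items()
                         if set(itemset) <= members)
        if len(support) >= s:
            found.append((set(itemset), support))
    return found


# Baskets of Example 10.11: right-side node -> left-side neighbors.
baskets = {"a": {1, 4}, "b": {2, 3}, "c": {1}, "d": {3}}

print(find_bicliques(baskets, s=2, t=1))
print(find_bicliques(baskets, s=2, t=2))
```

The first call reports the itemsets {1} (baskets a and c) and {3} (baskets b and d); the second finds nothing, matching the example.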

10.3.4 Why Complete Bipartite Graphs Must Exist


We must now turn to the matter of demonstrating that any bipartite graph
with a sufficiently high fraction of the edges present will have an instance of
Ks,t . In what follows, assume that the graph G has n nodes on the left and
another n nodes on the right. Assuming the two sides have the same number of
nodes simplifies the calculation, but the argument generalizes to sides of any
size. Finally, let d be the average degree of all nodes.
The argument involves counting the number of frequent itemsets of size t
that a basket with d items contributes to. When we sum this number over all
nodes on the right side, we get the total frequency of all the subsets of size t on
the left. When we divide by $\binom{n}{t}$, we get the average frequency of all itemsets
of size t. At least one must have a frequency that is at least average, so if this
average is at least s, we know an instance of Ks,t exists.
Now, we provide the detailed calculation. Suppose the degree of the ith
node on the right is $d_i$; that is, $d_i$ is the size of the ith basket. Then this
basket contributes to $\binom{d_i}{t}$ itemsets of size t. The total contribution of the n
nodes on the right is $\sum_i \binom{d_i}{t}$. The value of this sum depends on the $d_i$'s, of
course. However, we know that the average value of $d_i$ is d. It is known that
this sum is minimized when each $d_i$ is d. We shall not prove this point, but a
simple example will suggest the reasoning: since $\binom{d_i}{t}$ grows roughly as the tth
power of $d_i$, moving 1 from a large $d_i$ to some smaller $d_j$ will reduce the sum
of $\binom{d_i}{t} + \binom{d_j}{t}$.

Example 10.12 : Suppose there are only two nodes, t = 2, and the average
degree of the nodes is 4. Then $d_1 + d_2 = 8$, and the sum of interest is
$\binom{d_1}{2} + \binom{d_2}{2}$. If $d_1 = d_2 = 4$, then the sum is
$\binom{4}{2} + \binom{4}{2} = 6 + 6 = 12$. However, if $d_1 = 5$ and $d_2 = 3$,
the sum is $\binom{5}{2} + \binom{3}{2} = 10 + 3 = 13$. If $d_1 = 6$ and $d_2 = 2$,
then the sum is $\binom{6}{2} + \binom{2}{2} = 15 + 1 = 16$. ✷
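The arithmetic of this example is easy to reproduce. The short Python sketch below, with the illustrative helper name total_contribution, evaluates the sum of binomial coefficients for every way of splitting a total degree of 8 between two nodes and reports which split gives the smallest sum. It checks this one case only and is not a proof of the general claim.

```python
from math import comb


def total_contribution(degrees, t):
    """Number of size-t itemsets contributed by baskets of the given sizes."""
    return sum(comb(d, t) for d in degrees)


t, total_degree = 2, 8
splits = [(d1, total_degree - d1) for d1 in range(total_degree + 1)]
for d1, d2 in splits:
    print(d1, d2, total_contribution((d1, d2), t))

best = min(splits, key=lambda pair: total_contribution(pair, t))
print("minimum at", best)
```

The minimum occurs at the even split (4, 4), where both degrees equal the average.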
 

Thus, in what follows, we shall assume that all nodes have the average degree
d. So doing minimizes the total contribution to the counts for the itemsets, and
thus makes it least likely that there will be a frequent itemset (itemset with
support s or more) of size t. Observe the following:

• The total contribution of the n nodes on the right to the counts of the
itemsets of size t is $n\binom{d}{t}$.

• The number of itemsets of size t is $\binom{n}{t}$.

• Thus, the average count of an itemset of size t is $n\binom{d}{t} / \binom{n}{t}$; this expression
must be at least s if we are to argue that an instance of Ks,t exists.

If we expand the binomial coefficients in terms of factorials, we find

$$n\binom{d}{t} \Big/ \binom{n}{t} = \frac{n\,d!\,(n-t)!\,t!}{(d-t)!\,t!\,n!} = \frac{n\,d(d-1)\cdots(d-t+1)}{n(n-1)\cdots(n-t+1)}$$

To simplify the formula above, let us assume that n is much larger than d, and
d is much larger than t. Then $d(d-1)\cdots(d-t+1)$ is approximately $d^t$, and
$n(n-1)\cdots(n-t+1)$ is approximately $n^t$. We thus require that

$$n(d/n)^t \ge s$$

That is, if there is a community with n nodes on each side, the average degree
of the nodes is d, and $n(d/n)^t \ge s$, then this community is guaranteed to have
a complete bipartite subgraph Ks,t. Moreover, we can find the instance of Ks,t
efficiently, using the methods of Chapter 6, even if this small community is
embedded in a much larger graph. That is, we can treat all nodes in the entire
graph as baskets and as items, and run A-priori or one of its improvements on
the entire graph, looking for sets of t items with support s.

Example 10.13 : Suppose there is a community with 100 nodes on each side,
and the average degree of nodes is 50; i.e., half the possible edges exist. This
community will have an instance of Ks,t, provided $100(1/2)^t \ge s$. For example,
if t = 2, then s can be as large as 25. If t = 3, s can be 12, and if t = 4, s can
be 6.
Unfortunately, the approximation we made gives us a bound on s that is a
little too high. If we revert to the original formula $n\binom{d}{t} / \binom{n}{t} \ge s$, we see that
for the case t = 4 we need $100\binom{50}{4} / \binom{100}{4} \ge s$. That is,

$$\frac{100 \times 50 \times 49 \times 48 \times 47}{100 \times 99 \times 98 \times 97} \ge s$$

The expression on the left is not 6, but only 5.87. However, if the average
support for an itemset of size 4 is 5.87, then it is impossible that all those
itemsets have support 5 or less. Thus, we can be sure that at least one itemset
of size 4 has support 6 or more, and an instance of $K_{6,4}$ exists in this community.
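Both bounds in this example can be evaluated exactly. The following Python sketch uses exact rational arithmetic; the function names are illustrative assumptions, and guaranteed_support simply applies the observation above that some itemset must have integer support at least as large as the average.

```python
from fractions import Fraction
from math import ceil, comb


def approx_bound(n, d, t):
    """The simplified bound n (d/n)^t."""
    return n * Fraction(d, n) ** t


def exact_average(n, d, t):
    """Average support of a size-t itemset: n C(d,t) / C(n,t)."""
    return Fraction(n * comb(d, t), comb(n, t))


def guaranteed_support(n, d, t):
    """Some itemset has support >= the average, and supports are integers."""
    return ceil(exact_average(n, d, t))


for t in (2, 3, 4):
    print(t,
          float(approx_bound(100, 50, t)),
          round(float(exact_average(100, 50, t)), 2),
          guaranteed_support(100, 50, t))
```

For n = 100 and d = 50 the simplified bound evaluates to 25, 12.5, and 6.25 for t = 2, 3, 4, while the exact average for t = 4 is about 5.87, which still guarantees an itemset with support 6.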

10.3.5 Exercises for Section 10.3


Exercise 10.3.1 : For the running example of a social network from Fig. 10.1,
how many instances of Ks,t are there for:

(a) s = 1 and t = 3.

(b) s = 2 and t = 2.

(c) s = 2 and t = 3.

Exercise 10.3.2 : Suppose there is a community of 2n nodes. Divide the
community into two groups of n members, at random, and form the bipartite
graph between the two groups. Suppose that the average degree of the nodes of
the bipartite graph is d. Find the set of maximal pairs (t, s), with t ≤ s, such
that an instance of Ks,t is guaranteed to exist, for the following combinations
of n and d:

(a) n = 20 and d = 5.

(b) n = 200 and d = 150.

(c) n = 1000 and d = 400.

By “maximal,” we mean there is no different pair (s′, t′) such that both s′ ≥ s
and t′ ≥ t hold.

10.4 Partitioning of Graphs


In this section, we examine another approach to organizing social-network
graphs. We use some important tools from matrix theory (“spectral meth-
ods”) to formulate the problem of partitioning a graph to minimize the number
of edges that connect different components. The goal of minimizing the “cut”
size needs to be understood carefully before proceeding. For instance, if you
just joined Facebook, you are not yet connected to any friends. We do not
want to partition the friends graph with you in one group and the rest of the
world in the other group, even though that would partition the graph without
there being any edges that connect members of the two groups. This cut is not
desirable because the two components are too unequal in size.

10.4.1 What Makes a Good Partition?


Given a graph, we would like to divide the nodes into two sets so that the cut, that is,
the set of edges that connect nodes in different sets, is minimized. However, we also
want to constrain the selection of the cut so that the two sets are approximately
equal in size. The next example illustrates the point.

Example 10.14 : Recall our running example of the graph in Fig. 10.1. There,
it is evident that the best partition puts {A, B, C} in one set and {D, E, F, G}
in the other. The cut consists only of the edge (B, D) and is of size 1. No
nontrivial cut can be smaller.

Figure 10.11: The smallest cut might not be the best cut

In Fig. 10.11 is a variant of our example, where we have added the node
H and two extra edges, (H, C) and (C, G). If all we wanted was to minimize
the size of the cut, then the best choice would be to put H in one set and all
the other nodes in the other set. But it should be apparent that if we reject
partitions where one set is too small, then the best we can do is to use the
cut consisting of edges (B, D) and (C, G), which partitions the graph into two
equal-sized sets {A, B, C, H} and {D, E, F, G}. ✷

10.4.2 Normalized Cuts


A proper definition of a “good” cut must balance the size of the cut itself
against the difference in the sizes of the sets that the cut creates. One choice
that serves well is the “normalized cut.” First, define the volume of a set S of
nodes, denoted Vol (S), to be the number of edges with at least one end in S.
Suppose we partition the nodes of a graph into two disjoint sets S and T .
Let Cut (S, T ) be the number of edges that connect a node in S to a node in T .
Then the normalized cut value for S and T is
$$\frac{\mathit{Cut}(S, T)}{\mathit{Vol}(S)} + \frac{\mathit{Cut}(S, T)}{\mathit{Vol}(T)}$$

Example 10.15 : Again consider the graph of Fig. 10.11. If we choose S = {H}
and T = {A, B, C, D, E, F, G}, then Cut (S, T ) = 1. Vol(S) = 1, because there
is only one edge connected to H. On the other hand, Vol(T ) = 11, because all
the edges have at least one end at a node of T . Thus, the normalized cut for
this partition is 1/1 + 1/11 = 1.09.
Now, consider the preferred cut for this graph consisting of the edges (B, D)
and (C, G). Then S = {A, B, C, H} and T = {D, E, F, G}. Cut (S, T ) = 2,
Vol (S) = 6, and Vol(T ) = 7. The normalized cut for this partition is thus only
2/6 + 2/7 = 0.62. ✷
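The two normalized-cut values can be recomputed from the definitions. The Python sketch below assumes the edge list of Fig. 10.11 is the nine edges of the running example plus (H, C) and (C, G), as stated in Example 10.14; the function names are illustrative.

```python
# Edges of Fig. 10.11: the running example plus (H, C) and (C, G).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G"),
         ("H", "C"), ("C", "G")]
NODES = set("ABCDEFGH")


def cut_size(S, edges):
    """Number of edges with exactly one end in S."""
    return sum(1 for u, v in edges if (u in S) != (v in S))


def volume(S, edges):
    """Number of edges with at least one end in S."""
    return sum(1 for u, v in edges if u in S or v in S)


def normalized_cut(S, T, edges):
    cut = cut_size(S, edges)
    return cut / volume(S, edges) + cut / volume(T, edges)


small = {"H"}
balanced = {"A", "B", "C", "H"}
print(round(normalized_cut(small, NODES - small, EDGES), 2))
print(round(normalized_cut(balanced, NODES - balanced, EDGES), 2))
```

The smaller normalized cut belongs to the balanced partition, even though its raw cut size is larger.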

10.4.3 Some Matrices That Describe Graphs


To develop the theory of how matrix algebra can help us find good graph
partitions, we first need to learn about three different matrices that describe
aspects of a graph. The first should be familiar: the adjacency matrix that has
a 1 in row i and column j if there is an edge between nodes i and j, and 0
otherwise.

Figure 10.12: Repeat of the graph of Fig. 10.1

Example 10.16 : We repeat our running example graph in Fig. 10.12. Its
adjacency matrix appears in Fig. 10.13. Note that the rows and columns cor-
respond to the nodes A, B, . . . , G in that order. For example, the edge (B, D)
is reflected by the fact that the entry in row 2 and column 4 is 1 and so is the
entry in row 4 and column 2. ✷
 
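$$\begin{bmatrix}
0 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 & 0
\end{bmatrix}$$

Figure 10.13: The adjacency matrix for Fig. 10.12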

The second matrix we need is the degree matrix for a graph. This matrix has
nonzero entries only on the diagonal. The entry for row and column i is the
degree of the ith node.
Example 10.17 : The degree matrix for the graph of Fig. 10.12 is shown in
Fig. 10.14. We use the same order of the nodes as in Example 10.16. For
instance, the entry in row 4 and column 4 is 4 because node D has edges to
four other nodes. The entry in row 4 and column 5 is 0, because that entry is
not on the diagonal. ✷
 
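$$\begin{bmatrix}
2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 3 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 2 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 3 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 2
\end{bmatrix}$$

Figure 10.14: The degree matrix for Fig. 10.12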

Suppose our graph has adjacency matrix A and degree matrix D. Our third
matrix, called the Laplacian matrix, is L = D − A, the difference between the
degree matrix and the adjacency matrix. That is, the Laplacian matrix L has
the same entries as D on the diagonal. Off the diagonal, at row i and column j,
L has −1 if there is an edge between nodes i and j and 0 if not.
Example 10.18 : The Laplacian matrix for the graph of Fig. 10.12 is shown
in Fig. 10.15. Notice that each row and each column sums to zero, as must be
the case for any Laplacian matrix. ✷
$$\begin{bmatrix}
 2 & -1 & -1 &  0 &  0 &  0 &  0 \\
-1 &  3 & -1 & -1 &  0 &  0 &  0 \\
-1 & -1 &  2 &  0 &  0 &  0 &  0 \\
 0 & -1 &  0 &  4 & -1 & -1 & -1 \\
 0 &  0 &  0 & -1 &  2 & -1 &  0 \\
 0 &  0 &  0 & -1 & -1 &  3 & -1 \\
 0 &  0 &  0 & -1 &  0 & -1 &  2
\end{bmatrix}$$

Figure 10.15: The Laplacian matrix for Fig. 10.12
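The three matrices are straightforward to build from an edge list. The following is a minimal pure-Python sketch using nested lists (a matrix library would do the same job); the function names are illustrative, and the node order A, B, ..., G follows Example 10.16.

```python
NODES = ["A", "B", "C", "D", "E", "F", "G"]
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def adjacency_matrix(nodes, edges):
    """1 in row i, column j when nodes i and j share an edge, else 0."""
    index = {name: i for i, name in enumerate(nodes)}
    A = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        A[index[u]][index[v]] = 1
        A[index[v]][index[u]] = 1
    return A


def degree_matrix(A):
    """Diagonal matrix whose ith diagonal entry is the degree of node i."""
    n = len(A)
    return [[sum(A[i]) if i == j else 0 for j in range(n)] for i in range(n)]


def laplacian(A):
    """L = D - A."""
    D = degree_matrix(A)
    n = len(A)
    return [[D[i][j] - A[i][j] for j in range(n)] for i in range(n)]


A = adjacency_matrix(NODES, EDGES)
L = laplacian(A)
for row in L:
    print(row)
print(all(sum(row) == 0 for row in L))  # every row sums to zero
```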

10.4.4 Eigenvalues of the Laplacian Matrix


We can get a good idea of the best way to partition a graph from the eigenvalues
and eigenvectors of its Laplacian matrix. In Section 5.1.2 we observed how the
principal eigenvector (eigenvector associated with the largest eigenvalue) of the
transition matrix of the Web told us something useful about the importance of
Web pages. In fact, in simple cases (no taxation) the principal eigenvector is the
PageRank vector. When dealing with the Laplacian matrix, however, it turns
out that the smallest eigenvalues and their eigenvectors reveal the information
we desire.
The smallest eigenvalue for every Laplacian matrix is 0, and its correspond-
ing eigenvector is [1, 1, . . . , 1]. To see why, let L be the Laplacian matrix for a
graph of n nodes, and let 1 be the column vector of all 1’s with length n. We
claim L1 is a column vector of all 0’s. To see why, consider row i of L. Its
diagonal element has the degree d of node i. Row i also will have d occurrences
of −1, and all other elements of row i are 0. Multiplying row i by column vector
1 has the effect of summing the row, and this sum is clearly d + (−1)d = 0.
Thus, we can conclude $L\mathbf{1} = 0 \cdot \mathbf{1}$, which demonstrates that 0 is an eigenvalue
and 1 its corresponding eigenvector.
There is a simple way to find the second-smallest eigenvalue for any matrix,
such as the Laplacian matrix, that is symmetric (the entry in row i and column
j equals the entry in row j and column i). While we shall not prove this
fact, the second-smallest eigenvalue of L is the minimum of $x^T L x$, where
$x = [x_1, x_2, \ldots, x_n]$ is a column vector with n components, and the minimum is
taken under the constraints:

1. The length of x is 1; that is, $\sum_{i=1}^{n} x_i^2 = 1$.

2. x is orthogonal to the eigenvector associated with the smallest eigenvalue.

Moreover, the value of x that achieves this minimum is the second eigenvector.
When L is a Laplacian matrix for an n-node graph, we know something
more. The eigenvector associated with the smallest eigenvalue is 1. Thus, if x
is orthogonal to 1, we must have

$$x^T \mathbf{1} = \sum_{i=1}^{n} x_i = 0$$

In addition for the Laplacian matrix, the expression $x^T L x$ has a useful equivalent
expression. Recall that L = D − A, where D and A are the degree and
adjacency matrices of the same graph. Thus, $x^T L x = x^T D x - x^T A x$. Let us
evaluate the term with D and then the term for A. Here, Dx is the column vector
$[d_1 x_1, d_2 x_2, \ldots, d_n x_n]$, where $d_i$ is the degree of the ith node of the graph.
Thus, $x^T D x$ is $\sum_{i=1}^{n} d_i x_i^2$.
Now, turn to $x^T A x$. The ith component of the column vector Ax is the sum
of $x_j$ over all j such that there is an edge (i, j) in the graph. Thus, $-x^T A x$ is the
sum of $-2 x_i x_j$ over all pairs of nodes {i, j} such that there is an edge between
them. Note that the factor 2 appears because each set {i, j} corresponds to two
terms, $-x_i x_j$ and $-x_j x_i$.
We can group the terms of $x^T L x$ in a way that distributes the terms to each
pair {i, j}. From $-x^T A x$, we already have the term $-2 x_i x_j$. From $x^T D x$, we
distribute the term $d_i x_i^2$ to the $d_i$ pairs that include node i. As a result, we
can associate with each pair {i, j} that has an edge between nodes i and j the
terms $x_i^2 - 2 x_i x_j + x_j^2$. This expression is equivalent to $(x_i - x_j)^2$. Therefore, we
have proved that $x^T L x$ equals the sum over all graph edges (i, j) of $(x_i - x_j)^2$.
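The identity just derived can be spot-checked numerically. The Python sketch below is an illustration only: it uses the graph of Fig. 10.12 with nodes A through G renumbered 0 through 6, integer test vectors chosen arbitrarily so the comparison is exact, and function names that are assumptions of this sketch.

```python
import random

# Graph of Fig. 10.12 with nodes A..G renumbered 0..6.
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4), (3, 5), (3, 6), (4, 5), (5, 6)]
N = 7


def laplacian(n, edges):
    """Build L = D - A directly from the edge list."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1
        L[j][j] += 1
        L[i][j] -= 1
        L[j][i] -= 1
    return L


def quadratic_form(L, x):
    """Compute x^T L x by summing x_i * L_ij * x_j over all i, j."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


def edge_sum(edges, x):
    """Sum of (x_i - x_j)^2 over the edges of the graph."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)


L = laplacian(N, EDGES)
random.seed(0)
for _ in range(5):
    x = [random.randint(-9, 9) for _ in range(N)]
    print(quadratic_form(L, x), edge_sum(EDGES, x))
```

Each printed pair agrees, and a constant vector such as all 1's gives 0 on both sides, consistent with 0 being an eigenvalue of L.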
Recall that the second-smallest eigenvalue is the minimum of this expression
under the constraint that $\sum_{i=1}^{n} x_i^2 = 1$. Intuitively, we minimize it by making
$x_i$ and $x_j$ close whenever there is an edge between nodes i and j in the graph.
We might imagine that we could choose $x_i = 1/\sqrt{n}$ for all i and thus make this
sum 0. However, recall that we are constrained to choose x to be orthogonal to
1, which means the sum of the $x_i$'s is 0. We are also forced to make $\sum_{i=1}^{n} x_i^2$ be
1, so not all components can be 0. As a consequence, x must have some positive
and some negative components.
We can obtain a partition of the graph by taking one set to be the nodes
i whose corresponding vector component xi is positive and the other set to
be those whose components are negative. This choice does not guarantee a
partition into sets of equal size, but the sizes are likely to be close. We believe
that the cut between the two sets will have a small number of edges because
$(x_i - x_j)^2$ is likely to be smaller if both $x_i$ and $x_j$ have the same sign than if they
have different signs. Thus, minimizing $x^T L x$ under the required constraints will
tend to give xi and xj the same sign if there is an edge (i, j).

Example 10.19 : Let us apply the above technique to the graph of Fig. 10.16.
The Laplacian matrix for this graph is shown in Fig. 10.17. By standard meth-
ods or math packages we can find all the eigenvalues and eigenvectors of this
matrix. We shall simply tabulate them in Fig. 10.18, from lowest eigenvalue to
highest. Note that we have not scaled the eigenvectors to have length 1, but
could do so easily if we wished.
Figure 10.16: Graph for illustrating partitioning by spectral analysis

$$\begin{bmatrix}
 3 & -1 & -1 & -1 &  0 &  0 \\
-1 &  2 & -1 &  0 &  0 &  0 \\
-1 & -1 &  3 &  0 &  0 & -1 \\
-1 &  0 &  0 &  3 & -1 & -1 \\
 0 &  0 &  0 & -1 &  2 & -1 \\
 0 &  0 & -1 & -1 & -1 &  3
\end{bmatrix}$$

Figure 10.17: The Laplacian matrix for Fig. 10.16

The second eigenvector has three positive and three negative components.
It makes the unsurprising suggestion that one group should be {1, 2, 3}, the
nodes with positive components, and the other group should be {4, 5, 6}. ✷

Eigenvalue      0    1    3    3    4    5
Eigenvector     1    1   −5   −1   −1   −1
                1    2    4   −2    1    0
                1    1    1    3   −1    1
                1   −1   −5   −1    1    1
                1   −2    4   −2   −1    0
                1   −1    1    3    1   −1

Figure 10.18: Eigenvalues and eigenvectors for the matrix of Fig. 10.17
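The tabulated values can be verified without a math package. The Python sketch below does not compute eigenvectors; it takes the columns of Fig. 10.18 as given, checks that each satisfies Lv = λv for the matrix of Fig. 10.17, and then splits the nodes by the sign of the second eigenvector. The function names are illustrative assumptions.

```python
# Laplacian of Fig. 10.17 (nodes 1..6).
L = [[3, -1, -1, -1, 0, 0],
     [-1, 2, -1, 0, 0, 0],
     [-1, -1, 3, 0, 0, -1],
     [-1, 0, 0, 3, -1, -1],
     [0, 0, 0, -1, 2, -1],
     [0, 0, -1, -1, -1, 3]]

# Columns of Fig. 10.18 as (eigenvalue, eigenvector) pairs.
EIGENPAIRS = [(0, [1, 1, 1, 1, 1, 1]),
              (1, [1, 2, 1, -1, -2, -1]),
              (3, [-5, 4, 1, -5, 4, 1]),
              (3, [-1, -2, 3, -1, -2, 3]),
              (4, [-1, 1, -1, 1, -1, 1]),
              (5, [-1, 0, 1, 1, 0, -1])]


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def is_eigenpair(M, lam, v):
    """True if M v equals lam v exactly (integer arithmetic)."""
    return matvec(M, v) == [lam * c for c in v]


def sign_partition(v):
    """Nodes (numbered from 1) with positive and with negative components."""
    positive = {i + 1 for i, c in enumerate(v) if c > 0}
    negative = {i + 1 for i, c in enumerate(v) if c < 0}
    return positive, negative


def cut_size(M, group):
    """Edges of the graph with exactly one end in the given group."""
    n = len(M)
    return sum(1 for i in range(n) for j in range(n)
               if M[i][j] == -1 and i + 1 in group and j + 1 not in group)


print(all(is_eigenpair(L, lam, v) for lam, v in EIGENPAIRS))
positive, negative = sign_partition(EIGENPAIRS[1][1])
print(positive, negative, cut_size(L, positive))
```

The second eigenvector yields the groups {1, 2, 3} and {4, 5, 6}, separated by a cut of two edges.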

10.4.5 Alternative Partitioning Methods


The method of Section 10.4.4 gives us a good partition of the graph into two
pieces that have a small cut between them. There are several ways we can use
the same eigenvectors to suggest other good choices of partition. First, we are
not constrained to put all the nodes with positive components in the eigenvector
into one group and those with negative components in the other. We could set
the threshold at some point other than zero.
For instance, suppose we modified Example 10.19 so that the threshold was
not zero, but −1.5. Then the two nodes 4 and 6, with components −1 in the
second eigenvector of Fig. 10.18, would join 1, 2, and 3, leaving five nodes in one
component and only node 5 in the other. That partition would have a cut of size
two, as did the choice based on the threshold of zero, but the two components
have radically different sizes, so we would tend to prefer our original choice.
However, there are other cases where the threshold zero gives unequally sized
components, as would be the case if we used the third eigenvector in Fig. 10.18.
We may also want a partition into more than two components. One approach
is to use the method described above to split the graph into two, and then use
it repeatedly on the components to split them as far as desired. A second
approach is to use several of the eigenvectors, not just the second, to partition
the graph. If we use m eigenvectors, and set a threshold for each, we can get a
partition into $2^m$ groups, each group consisting of the nodes that are above or
below threshold for each of the eigenvectors, in a particular pattern.
It is worth noting that each eigenvector except the first is the vector x that
minimizes $x^T L x$, subject to the constraint that it is orthogonal to all previous
eigenvectors. This constraint generalizes the constraints we described for the
second eigenvector in a natural way. As a result, while each eigenvector tries
to produce a minimum-sized cut, the fact that successive eigenvectors have to
satisfy more and more constraints generally causes the cuts they describe to be
progressively worse.

Example 10.20 : Let us reconsider the graph of Fig. 10.16, for which the
eigenvectors of its Laplacian matrix were tabulated in Fig. 10.18. The third
eigenvector, with a threshold of 0, puts nodes 1 and 4 in one group and the
other four nodes in the other. That is not a bad partition, but its cut size is
four, compared with the cut of size two that we get from the second eigenvector.
If we use both the second and third eigenvectors, we put nodes 2 and 3 in
one group, because their components are positive in both eigenvectors. Nodes
5 and 6 are in another group, because their components are negative in the
second eigenvector and positive in the third. Node 1 is in a group by itself
because it is positive in the second eigenvector and negative in the third, while
node 4 is also in a group by itself because its component is negative in both
eigenvectors. This partition of a six-node graph into four groups is too fine a
partition to be meaningful. But at least the groups of size two each have an
edge between the nodes, so it is as good as we could ever get for a partition
into groups of these sizes. ✷
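The partitions discussed in this section all follow one rule: group the nodes by whether each chosen eigenvector component lies above or below its threshold. The Python sketch below illustrates that rule with the second and third eigenvectors copied from Fig. 10.18 and the edges read off the Laplacian of Fig. 10.17; the function names and the dictionary-of-patterns representation are assumptions of this sketch.

```python
from collections import defaultdict

SECOND = [1, 2, 1, -1, -2, -1]   # second eigenvector of Fig. 10.18
THIRD = [-5, 4, 1, -5, 4, 1]     # third eigenvector of Fig. 10.18
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 6), (4, 5), (4, 6), (5, 6)]


def partition(vectors, thresholds=None):
    """Group nodes 1..n by their above/below pattern across the vectors."""
    if thresholds is None:
        thresholds = [0] * len(vectors)
    groups = defaultdict(set)
    for node in range(len(vectors[0])):
        pattern = tuple(v[node] > th for v, th in zip(vectors, thresholds))
        groups[pattern].add(node + 1)
    return dict(groups)


def cut_size(groups, edges):
    """Number of edges whose two ends fall in different groups."""
    where = {n: pattern for pattern, nodes in groups.items() for n in nodes}
    return sum(1 for u, v in edges if where[u] != where[v])


for label, vectors, thresholds in [
        ("second, threshold 0", [SECOND], None),
        ("second, threshold -1.5", [SECOND], [-1.5]),
        ("third, threshold 0", [THIRD], None),
        ("second and third", [SECOND, THIRD], None)]:
    groups = partition(vectors, thresholds)
    print(label, sorted(map(sorted, groups.values())), cut_size(groups, EDGES))
```

With m vectors there are at most $2^m$ patterns; here two vectors give the four groups {1}, {2, 3}, {4}, and {5, 6} described in the example.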

10.4.6 Exercises for Section 10.4


Exercise 10.4.1 : For the graph of Fig. 10.9, construct:

(a) The adjacency matrix.

(b) The degree matrix.

(c) The Laplacian matrix.


! Exercise 10.4.2 : For the Laplacian matrix constructed in Exercise 10.4.1(c),
find the second-smallest eigenvalue and its eigenvector. What partition of the
nodes does it suggest?

!! Exercise 10.4.3 : For the Laplacian matrix constructed in Exercise 10.4.1(c),
construct the third and subsequent smallest eigenvalues and their eigenvectors.

10.5 Finding Overlapping Communities


So far, we have concentrated on clustering a social graph to find communities.
But communities are in practice rarely disjoint. In this section, we explain a
method for taking a social graph and fitting a model to it that best explains
how it could have been generated by a mechanism that assumes the probability
that two individuals are connected by an edge (are “friends”) increases as they
become members of more communities in common. An important tool in this
analysis is “maximum-likelihood estimation,” which we shall explain before
getting to the matter of finding overlapping communities.

10.5.1 The Nature of Communities


To begin, let us consider what we would expect two overlapping communities
to look like. Our data is a social graph, where nodes are people and there is an
edge between two nodes if the people are “friends.” Let us imagine that this
graph represents students at a school, and there are two clubs in this school:
the Chess Club and the Spanish Club. It is reasonable to suppose that each of
these clubs forms a community. It is also reasonable to suppose that two people
in the Chess Club are more likely to be friends in the graph because they know
each other from the club. Likewise, if two people are in the Spanish Club, then
there is a good chance they know each other, and are likely to be friends.
What if two people are in both clubs? They now have two reasons why they
might know each other, and so we would expect an even greater probability
that they will be friends in the social graph. Our conclusion is that we expect
edges to be dense within any community, but we expect edges to be even denser
in the intersection of two communities, denser than that in the intersection of
three communities, and so on. The idea is suggested by Fig. 10.19.

10.5.2 Maximum-Likelihood Estimation


Before we see the algorithm for finding communities that have overlap of the
kind suggested in Section 10.5.1, let us digress and learn a useful modeling
tool called maximum-likelihood estimation, or MLE. The idea behind MLE
is that we make an assumption about the generative process (the model) that
creates instances of some artifact, for example, “friends graphs.” The model has
parameters that determine the probability of generating any particular instance
of the artifact; this probability is called the likelihood of those parameter values.
