1. For those interested in traditional social network analysis, introductions can be found in the books by Scott [293] and by Wasserman and Faust [320].
7.2 EIGENVECTOR CENTRALITY

A natural extension of the simple degree centrality is the eigenvector centrality, which awards each vertex a centrality proportional not just to the number of its neighbors but to their importance as well. Suppose we start by guessing that each vertex i has centrality x_i, and then calculate an improved estimate x_i' as the sum of the centralities of i's neighbors:

    x_i' = ∑_j A_ij x_j,  (7.1)

where A_ij is an element of the adjacency matrix. We can also write this expression in matrix notation as x' = Ax, where x' is the vector with elements x_i'.
Repeating this process to make better estimates, we have after t steps a vector
    x(t) = A^t x(0).  (7.2)

Let us write x(0) as a linear combination of the eigenvectors v_i of the adjacency matrix,

    x(0) = ∑_i c_i v_i,  (7.3)

for some appropriate choice of constants c_i. Then

    x(t) = A^t ∑_i c_i v_i = ∑_i c_i κ_i^t v_i = κ_1^t ∑_i c_i [κ_i/κ_1]^t v_i,  (7.4)

where the κ_i are the eigenvalues of A, and κ_1 is the largest of them. Since κ_i/κ_1 < 1 for all i ≠ 1, all terms in the sum other than the first decay exponentially as t becomes large, and hence in the limit t → ∞ we get x(t) → c_1 κ_1^t v_1.
In other words, the limiting vector of centralities is simply proportional to the
leading eigenvector of the adjacency matrix. Equivalently we could say that
the centrality x satisfies
    Ax = κ_1 x.  (7.5)

This then is the eigenvector centrality, first proposed by Bonacich [49] in 1987. As promised, the centrality x_i of vertex i is proportional to the sum of the centralities of i's neighbors:

    x_i = κ_1^{-1} ∑_j A_ij x_j,  (7.6)
which gives the eigenvector centrality the nice property that it can be large
either because a vertex has many neighbors or because it has important neigh-
bors (or both). An individual in a social network, for instance, can be impor-
tant, by this measure, because he or she knows lots of people (even though
those people may not be important themselves) or knows a few people in high
places.
Note also that the eigenvector centralities of all vertices are non-negative.
To see this, consider what happens if the initial vector x(0) happens to have
only non-negative elements. Since all elements of the adjacency matrix are also
non-negative, multiplication by A can never introduce any negative elements
to the vector and x(t) in Eq. (7.2) must have all elements non-negative.2
2. Technically, there could be more than one eigenvector with eigenvalue κ_1, only one of which need have all elements non-negative. It turns out, however, that this cannot happen: the adjacency matrix has only one eigenvector of eigenvalue κ_1. See footnote 2 on page 346 for a proof.
3. This is not entirely true, as we will see in Section 7.5. Web pages that point to many others are often directories of one sort or another and can be useful as starting points for web surfing. This is a different kind of importance, however, from that highlighted by the eigenvector centrality and a different, complementary centrality measure is needed to quantify it.
Now consider networks of the directed type. By analogy with Eq. (7.6), the natural definition of eigenvector centrality on a directed network gives each vertex a centrality proportional to the sum of the centralities of the vertices that point to it:

    x_i = κ_1^{-1} ∑_j A_ij x_j.  (7.7)

Consider, however, a vertex A with no ingoing edges. Every term in its sum vanishes, and hence A has centrality zero in Eq. (7.7). This might not seem to be a problem: perhaps a vertex that no one points to should have centrality zero. But then consider vertex B, which has one ingoing edge, but that edge originates at vertex A, and hence B also has centrality zero, because the one term in its sum in Eq. (7.7) is zero. Taking
this argument further, we see that a vertex may be pointed to by others that
themselves are pointed to by many more, and so on through many generations,
but if the progression ends up at a vertex or vertices that have in-degree zero,
it is all for nothing—the final value of the centrality will still be zero.
In mathematical terms, only vertices that are in a strongly connected com-
ponent of two or more vertices, or the out-component of such a component,
can have non-zero eigenvector centrality.4 In many cases, however, it is ap-
propriate for vertices with high in-degree to have high centrality even if they
are not in a strongly-connected component or its out-component. Web pages
with many links, for instance, can reasonably be considered important even if
they are not in a strongly connected component. Recall also that acyclic net-
works, such as citation networks, have no strongly connected components of
more than one vertex (see Section 6.11.1), so all vertices will have centrality
zero. Clearly this makes the standard eigenvector centrality completely useless for acyclic networks.

4. For the left eigenvector it would be the in-component.
A variation on eigenvector centrality that addresses these problems is the
Katz centrality, which is the subject of the next section.
7.3 KATZ CENTRALITY

One way around the problems of the previous section is to give each vertex a small amount of centrality "for free," regardless of its position in the network or the centrality of its neighbors. That is, we define

    x_i = α ∑_j A_ij x_j + β,  (7.8)

where α and β are positive constants. The first term is the normal eigenvector
centrality term in which the centralities of the vertices linking to i are summed,
and the second term is the “free” part, the constant extra term that all vertices
receive. By adding this second term, even vertices with zero in-degree still get
centrality β, and once they have a non-zero centrality, then the vertices they
point to derive some advantage from being pointed to. This means that any
vertex that is pointed to by many others will have a high centrality, although
those that are pointed to by others with high centrality themselves will still do
better.
In matrix terms, Eq. (7.8) can be written

    x = αAx + β1,  (7.9)

where 1 is the vector (1, 1, 1, …). Rearranging for x, we find that x = β(I − αA)^{-1} · 1. As we have said, we normally don't care about the absolute magnitude of the centrality, only about which vertices have high or low centrality values, so the overall multiplier β is unimportant. For convenience we usually set β = 1, giving
    x = (I − αA)^{-1} · 1.  (7.10)
This centrality measure was first proposed by Katz in 1953 [169] and we will
refer to it as the Katz centrality.
The Katz centrality differs from ordinary eigenvector centrality in the im-
portant respect of having a free parameter α, which governs the balance be-
tween the eigenvector term and the constant term in Eq. (7.8). If we wish to
make use of the Katz centrality we must first choose a value for this constant.
In doing so it is important to understand that α cannot be arbitrarily large. If
we let α → 0, then only the constant term survives in Eq. (7.8) and all vertices
have the same centrality β (which we have set to 1). As we increase α from
zero the centralities increase and eventually there comes a point at which they
diverge. This happens at the point where (I − αA)^{-1} diverges in Eq. (7.10), i.e., when det(I − αA) passes through zero. Rewriting this condition as

    det(A − α^{-1} I) = 0,  (7.11)

we see that it is simply the characteristic equation whose roots α^{-1} are equal to the eigenvalues of the adjacency matrix.^5 As α increases, the determinant first crosses zero when α^{-1} = κ_1, the largest eigenvalue of A, or alternatively when α = 1/κ_1. Thus, we should choose a value of α less than this if we wish the expression for the centrality to converge.^6
Beyond this, however, there is little guidance to be had as to the value that
α should take. Most researchers have employed values close to the maximum
of 1/κ1 , which places the maximum amount of weight on the eigenvector term
and the smallest amount on the constant term. This returns a centrality that is numerically quite close to the ordinary eigenvector centrality, but gives small non-zero values to vertices that are not in the strongly connected components or their out-components.

5. The eigenvalues being defined by Av = κv, we see that (A − κI)v = 0, which has non-zero solutions for v only if (A − κI) cannot be inverted, i.e., if det(A − κI) = 0, and hence this equation gives the eigenvalues κ.

6. Formally one recovers finite values again when one moves past 1/κ_1 to higher α, but in practice these values are meaningless. The method returns good results only for α < 1/κ_1.
The Katz centrality can be calculated directly from Eq. (7.10) by inverting
the matrix on the right-hand side, but often this isn’t the best way to do it.
Inverting a matrix on a computer takes an amount of time proportional to n³,
where n is the number of vertices. This makes direct calculation of the Katz
centrality prohibitively slow for large networks. Networks of more than a
thousand vertices or so present serious problems.
A better approach in many cases is to evaluate the centrality directly from
Eq. (7.8) (or equivalently, Eq. (7.9)). One makes an initial estimate of x—probably a bad one, such as x = 0—and uses that to calculate a better estimate

    x' = αAx + β1.  (7.12)

Repeating the process many times, x converges to a value close to the correct
centrality. Since A has m non-zero elements, each iteration requires m multi-
plication operations and the total time for the calculation is proportional to rm,
where r is the number of iterations necessary for the calculation to converge.
Unfortunately, r depends on the details of the network and on the choice of α,
so we cannot give a general guide to how many iterations will be necessary.
Instead one must watch the values of xi to observe when they converge to con-
stant values. Nonetheless, for large networks it is almost always worthwhile
to evaluate the centrality this way rather than by inverting the matrix.
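As an illustration of this iterative strategy, here is a minimal sketch in Python with NumPy; the function name, the starting estimate, and the convergence test are our own choices, made under the assumptions just described:

    import numpy as np

    def katz_centrality(A, alpha, beta=1.0, tol=1e-10, max_iter=100000):
        # Iterate x' = alpha*A*x + beta*1 (Eq. 7.12) until the values
        # stop changing. Convergence requires alpha < 1/kappa_1.
        x = np.zeros(A.shape[0])        # initial (bad) estimate x = 0
        for _ in range(max_iter):
            x_new = alpha * (A @ x) + beta
            if np.max(np.abs(x_new - x)) < tol:
                return x_new
            x = x_new
        raise RuntimeError("no convergence; is alpha < 1/kappa_1?")

For a large network one would store A in a sparse format (for instance with scipy.sparse), so that each multiplication indeed takes time proportional to the number of edges m rather than to n².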
We have presented the Katz centrality as a solution to the problems en-
countered with ordinary eigenvector centrality in directed networks. How-
ever, there is no reason in principle why one cannot use Katz centrality in un-
directed networks as well, and there are times when this might be useful. The
idea of adding a constant term to the centrality so that each vertex gets some
weight just by virtue of existing is a natural one. It allows a vertex that has
many neighbors to have high centrality regardless of whether those neighbors
themselves have high centrality, and this could be desirable in some applica-
tions.
A possible extension of the Katz centrality is to consider cases in which the
additive constant term in Eq. (7.8) is not the same for all vertices. One could
define a generalized centrality measure by
    x_i = α ∑_j A_ij x_j + β_i,  (7.13)

or, in matrix terms,

    x = (I − αA)^{-1} β,  (7.14)
where β is the vector whose elements are the β_i. One nice feature of this approach is that the difficult part of the calculation—the inversion of the matrix—only has to be done once for a given network and choice of α. For different choices of the β_i we need not recalculate the inverse, but simply multiply the inverse into different vectors β.
7.4 PAGERANK

A further refinement, useful particularly for directed networks such as the Web, is to dilute the centrality a vertex passes on to its neighbors when that vertex points to very many others. This is achieved by dividing by the out-degree, giving the centrality measure known as PageRank:

    x_i = α ∑_j A_ij x_j / k_j^out + β.  (7.15)

This gives problems however if there are vertices in the network with out-degree k_i^out = 0. If there are any such vertices then the first term in Eq. (7.15)
is indeterminate—it is equal to zero divided by zero (because A_ij = 0 for all i). This problem is easily fixed however. It is clear that vertices with no outgoing edges should contribute zero to the centrality of any other vertex, which we can contrive by artificially setting k_i^out = 1 for all such vertices. (In fact, we could set k_i^out to any non-zero value and the calculation would give the same answer.)
In matrix terms, Eq. (7.15) is then

    x = αAD^{-1}x + β1,  (7.16)

with 1 being again the vector (1, 1, 1, …) and D being the diagonal matrix with elements D_ii = max(k_i^out, 1). Rearranging, we find that x = β(I − αAD^{-1})^{-1} · 1, and thus, as before, β plays the role only of an unimportant overall multiplier for the centrality. Conventionally we set β = 1, giving

    x = (I − αAD^{-1})^{-1} · 1 = D(D − αA)^{-1} · 1.  (7.17)
As with the Katz centrality, we can generalize this measure by giving each vertex its own additive term:

    x_i = α ∑_j A_ij x_j / k_j^out + β_i.  (7.18)

One could, for instance, use this for ranking web pages, giving β_i a value based
perhaps on textual relevance to a search query. Pages that contained the word
or words being searched for more often or in more prominent places could
be given a higher intrinsic centrality than others, thereby pushing them up
the rankings. The author is not aware, however, of any cases in which this
technique has been implemented in practice.
Finally, one can also imagine a version of PageRank that did not have the
additive constant term in it at all:
    x_i = α ∑_j A_ij x_j / k_j.  (7.20)
7. It is easy to confirm that this vector is indeed an eigenvector with eigenvalue 1. That there is no eigenvalue larger than 1 is less obvious. It follows from a standard result in linear algebra, the Perron–Frobenius theorem, which states that the largest eigenvalue of a matrix such as AD^{-1} that has all elements non-negative is unique—there is only one eigenvector with this eigenvalue—that the eigenvector also has all elements non-negative, and that it is the only eigenvector with all elements non-negative. Combining these results, it is clear that the eigenvalue 1 above must be the largest eigenvalue of the matrix AD^{-1}. For a discussion of the Perron–Frobenius theorem see Ref. [217] and the two footnotes on page 346 of this book.
Table 7.1: Four centrality measures. The four matrix-based centrality measures discussed in the text are distinguished by whether or not they include an additive constant term in their definition and whether they are normalized by dividing by the degrees of neighboring vertices. Note that the diagonal matrix D, which normally has elements D_ii = k_i, must be defined slightly differently for PageRank, as D_ii = max(1, k_i)—see Eq. (7.15) and the following discussion. Each of the measures can be applied to directed networks as well as undirected ones, although only three of the four are commonly used in this way. (The measure that appears in the top right corner of the table is equivalent to degree centrality in the undirected case but takes more complicated values in the directed case and is not widely used.)
For an undirected network it is easy to verify that the vector with elements x_i = k_i, where k_i is the degree of vertex i, satisfies Eq. (7.20) with α = 1,^7 and therefore this measure is just the same as ordinary degree centrality. For a directed network, on the other hand, it does not reduce to any equivalent simple value and it might potentially be of use, although it does not seem to have found use in any prominent application. (It does suffer from the same problem as the original eigenvector centrality, that it gives non-zero scores only to vertices that fall in a strongly connected component of two or more vertices or in the out-component of such a component. All other vertices get a zero score.)
In Table 7.1 we give a summary of the different matrix centrality measures
we have discussed, organized according to their definitions and properties. If
you want to use one of these measures in your own calculations and find the
many alternatives bewildering, eigenvector centrality and PageRank are prob-
ably the two measures to focus on initially. They are the two most commonly
used measures of this type. The Katz centrality has found widespread use in
the past but has been favored less in recent work, while the PageRank mea-
sure without the constant term, Eq. (7.20), is the same as degree centrality for
undirected networks and not in common use for directed ones.
7.5 HUBS AND AUTHORITIES

For directed networks there is another twist on eigenvector-style centrality measures, embodied in the hubs-and-authorities idea of the HITS algorithm of Kleinberg. A vertex may be important because it is pointed to by many important vertices (an "authority"), or because it points to many important vertices (a "hub"). We therefore give each vertex i two centralities: an authority centrality x_i proportional to the sum of the hub centralities of the vertices pointing to it,

    x_i = α ∑_j A_ij y_j,  (7.21)

where α is a constant, and a hub centrality y_i proportional to the sum of the authority centralities of the vertices it points to:
    y_i = β ∑_j A_ji x_j,  (7.22)
with β another constant. Notice that the indices on the matrix element A ji are
swapped around in this second equation: it is the vertices that i points to that
define its hub centrality.
In matrix terms these equations can be written

    x = αAy,    y = βA^T x,  (7.23)

or, combining the two,

    AA^T x = λx,    A^T Ay = λy,  (7.24)

where λ = (αβ)^{-1}. Thus the authority and hub centralities are respectively given by eigenvectors of AA^T and A^T A with the same eigenvalue. By an argument similar to the one we used for the standard eigenvector centrality in Section 7.2 we can show that we should in each case take the eigenvector corresponding to the leading eigenvalue.
A crucial condition for this approach to work is that AA^T and A^T A have the same leading eigenvalue λ, otherwise we cannot satisfy both conditions in Eq. (7.24). It is easily proved, however, that this is the case, and in fact that all eigenvalues are the same for the two matrices. If AA^T x = λx then multiplying both sides by A^T gives

    A^T A(A^T x) = λ(A^T x),  (7.25)
and hence A^T x is an eigenvector of A^T A with the same eigenvalue λ. Comparing with Eq. (7.24) this means that

    y = A^T x,  (7.26)
which gives us a fast way of calculating the hub centralities once we have
the authority ones—there is no need to solve both the eigenvalue equations
in Eq. (7.24) separately.
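In computational terms this is a useful shortcut: one can find the authority vector by power iteration on AA^T and then obtain the hub vector with a single extra multiplication. A minimal sketch in Python with NumPy (the names and tolerances are our own choices):

    import numpy as np

    def hits(A, max_iter=1000, tol=1e-10):
        # Authority centralities: leading eigenvector of A A^T, Eq. (7.24).
        x = np.ones(A.shape[0])
        for _ in range(max_iter):
            x_new = A @ (A.T @ x)                # multiply by A A^T
            x_new = x_new / np.linalg.norm(x_new)
            if np.allclose(x_new, x, atol=tol):
                break
            x = x_new
        y = A.T @ x                              # hub centralities, Eq. (7.26)
        return x, y / np.linalg.norm(y)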
Note that AA^T is precisely the cocitation matrix defined in Section 6.4.1 (Eq. (6.8)) and the authority centrality is thus, roughly speaking, the eigenvector centrality for the cocitation network.^8 Similarly A^T A is the bibliographic
8. This statement is only approximately correct since, as discussed in Section 6.4.1, the cocitation matrix is not precisely equal to the adjacency matrix of the cocitation network, having non-zero elements along its diagonal where the adjacency matrix has none.
coupling matrix, Eq. (6.11), and hub centrality is the eigenvector centrality for
the bibliographic coupling network.
A nice feature of the hub and authority centralities is that they circum-
vent the problems that ordinary eigenvector centrality has with directed net-
works, that vertices outside of strongly connected components or their out-
components always have centrality zero. In the hubs and authorities approach
vertices not cited by any others have authority centrality zero (which is reason-
able), but they can still have non-zero hub centrality. And the vertices that they
cite can then have non-zero authority centrality by virtue of being cited. This
is perhaps a more elegant solution to the problems of eigenvector centrality
in directed networks than the more ad hoc method of introducing an additive
constant term as we did in Eq. (7.8). We can still introduce such a constant
term into the HITS algorithm if we wish, or employ any of the other variations
considered in previous sections, such as normalizing vertex centralities by the
degrees of the vertices that point to them. Some variations along these lines
are explored in Refs. [52, 256], but we leave the pursuit of such details to the
enthusiastic reader.
The HITS algorithm is an elegant construction that should in theory pro-
vide more information about vertex centrality than the simpler measures of
previous sections, but in practice it has not yet found much application. It is
used as the basis for the web search engines Teoma and Ask.com, and will per-
haps in future find further use, particularly in citation networks, where it holds
clear advantages over other eigenvector measures.
7.6 CLOSENESS CENTRALITY

An entirely different measure of centrality is provided by the closeness centrality, which measures the mean geodesic distance from a vertex to the other vertices of the network. If d_ij is the length of a geodesic (i.e., shortest) path from i to j, meaning the number of edges traversed along the path,^9 then the mean geodesic distance from i to all vertices is

    ℓ_i = (1/n) ∑_j d_ij.  (7.27)
9. Recall that geodesic paths need not be unique—vertices can be joined by several shortest paths of the same length. The length d_ij however is always well defined, being the length of any one of these paths.
This quantity takes low values for vertices that are separated from others by
only a short geodesic distance on average. Such vertices might have better ac-
cess to information at other vertices or more direct influence on other vertices.
In a social network, for instance, a person with lower mean distance to others
might find that their opinions reach others in the community more quickly
than the opinions of someone with higher mean distance.
In calculating the average distance some authors exclude from the sum in (7.27) the term for j = i, so that

    ℓ_i = (1/(n − 1)) ∑_{j(≠i)} d_ij,  (7.28)

which, since d_ii = 0, differs from (7.27) only by an overall factor of n/(n − 1).
The mean distance ℓ_i is small for central vertices and large for peripheral ones, so one conventionally defines the closeness centrality C_i to be its inverse:

    C_i = 1/ℓ_i = n/∑_j d_ij.  (7.29)
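In practice one computes the distances d_ij with a breadth-first search. A minimal sketch using the NetworkX library (an assumption of this sketch, not something used in the text):

    import networkx as nx

    def closeness(G, i):
        # C_i = n / sum_j d_ij, Eq. (7.29). Assumes G is connected;
        # with more than one component some d_ij are infinite and
        # C_i would be zero (see the discussion below).
        d = nx.single_source_shortest_path_length(G, i)   # BFS from i
        return len(G) / sum(d.values())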
Closeness centrality has the problem that its values tend to span a rather small range from smallest to largest: in a typical network the values of C_i might span a factor of five or less. What this means
in practice is that it is difficult to distinguish between central and less central
vertices using this measure: the values tend to be cramped together with the
differences between adjacent values showing up only when you examine the
trailing digits. This means that even small fluctuations in the structure of the
network can change the order of the values substantially.
For example, it has become popular in recent years to rank film actors ac-
cording to their closeness centrality in the network of who has appeared in
films with whom else [323]. Using data from the Internet Movie Database,^10 we find that in the largest component of the network, which includes more than 98% of all actors, the smallest mean distance ℓ_i of any actor is 2.4138, for the actor Christopher Lee,^11 while the largest is 8.6681, for an Iranian actress named Leia Zanganeh. The ratio of the two is just 3.6 and about half a million other actors lie in between. As we can immediately see, the values must be very closely spaced. The second-best score belongs to the actor Donald Pleasence, with a mean distance of 2.4164, just a tenth of a percent less central than winner Lee.
Because of the close spacing of values, the leaders under this dubious measure
of superiority change frequently as the small details of the film network shift
when new films are made or old ones added to the database. In an analysis
using an earlier version of the database, Watts and Strogatz [323] proclaimed
Rod Steiger to be the actor with the lowest closeness centrality. Steiger falls in
sixth place in our analysis and it is entirely possible that the rankings will have
changed again by the time you read this. Other centrality measures, including
degree centrality and eigenvector centrality, typically don’t suffer from this
problem because they have a wider dynamic range and the centrality values,
particularly those of the leaders, tend to be widely separated.
The closeness centrality has another problem too. If, as discussed in Section 6.10.1, we define the geodesic distance between two vertices to be infinite if the vertices fall in different components of the network, then ℓ_i is infinite for all i in any network with more than one component and C_i is zero. There
are two strategies for getting around this. The most common one is simply
to average over only those vertices in the same component as i. Then n in
Eq. (7.29) becomes the number of vertices in the component and the sum is
over only that component. This gives us a finite measure, but one that has its
own problems. In particular, distances tend to be smaller between vertices in
small components, so that vertices in such components get lower values of ℓ_i, and hence apparently higher centrality, than vertices in larger components, which is usually undesirable.
10. www.imdb.com

11. Perhaps most famous for his role as the evil wizard Saruman in the film version of The Lord of the Rings.
The second strategy is to redefine closeness in terms of the harmonic mean distance between vertices, i.e., the average of the inverse distances:

    C_i' = (1/(n − 1)) ∑_{j(≠i)} 1/d_ij.  (7.30)

(Notice that we are obliged in this case to exclude from the sum the term for j = i, since d_ii = 0 which would make this term infinite. This means that the sum has only n − 1 terms in it, hence the leading factor of 1/(n − 1).)
This definition has a couple of nice properties. First, if dij = ∞ because i and
j are in different components, then the corresponding term in the sum is sim-
ply zero and drops out. Second, the measure naturally gives more weight to
vertices that are close to i than to those far away. Intuitively we might imagine
that the distance to close vertices is what matters in most practical situations—
once a vertex is far away in a network it matters less exactly how far away it
is, and Eq. (7.30) reflects this, having contributions close to zero from all such
vertices.
Despite its desirable qualities, however, Eq. (7.30) is rarely used in practice.
We have seen it employed only occasionally.
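Should the reader nonetheless wish to try it, Eq. (7.30) is no harder to compute than Eq. (7.29). A minimal sketch, again using NetworkX as an assumed tool:

    import networkx as nx

    def harmonic_closeness(G, i):
        # Eq. (7.30). Vertices unreachable from i never appear among
        # the BFS distances, so they contribute zero automatically.
        d = nx.single_source_shortest_path_length(G, i)
        return sum(1.0 / dist for v, dist in d.items() if v != i) / (len(G) - 1)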
An interesting property of entire networks, which is related to the closeness
centrality, is the mean geodesic distance between vertices. In Section 8.2 we
will use measurements of mean distance in networks to study the so-called
“small-world effect.”
For a network with only one component, the mean distance between pairs of vertices, conventionally denoted just ℓ (now without the subscript), is

    ℓ = (1/n²) ∑_ij d_ij = (1/n) ∑_i ℓ_i.  (7.31)

For a network with more than one component the equivalent expression, averaging only over pairs of vertices in the same component, is

    ℓ = (∑_m ∑_{ij∈C_m} d_ij) / (∑_m n_m²),  (7.32)

where C_m denotes the set of vertices in component m and n_m is their number.
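A sketch of the computation of Eq. (7.32), using NetworkX to find components and distances (the component-wise bookkeeping is our own arrangement):

    import networkx as nx

    def mean_distance(G):
        # Eq. (7.32): average d_ij over pairs of vertices that fall in
        # the same component, including the zero terms d_ii.
        num = den = 0
        for comp in nx.connected_components(G):
            sub = G.subgraph(comp)
            for _, dists in nx.all_pairs_shortest_path_length(sub):
                num += sum(dists.values())
            den += len(comp) ** 2
        return num / den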
An alternative is again to define the mean in terms of the harmonic mean of the distances, with ℓ' given by

    1/ℓ' = (1/(n(n − 1))) ∑_{i≠j} 1/d_ij = (1/n) ∑_i C_i',  (7.33)

or equivalently

    ℓ' = n / ∑_i C_i',  (7.34)

where C_i' is the harmonic mean closeness of Eq. (7.30). (Note that, as in (7.30), we exclude from the first sum in (7.33) the terms for i = j, which would be infinite since d_ii = 0.)
Equation (7.34) automatically removes any contributions from vertex pairs
for which dij = ∞. Despite its elegance, however, Eq. (7.34), like Eq. (7.30), is
hardly ever used.
7.7 BETWEENNESS CENTRALITY

A different concept of importance is captured by betweenness centrality, which measures the extent to which a vertex lies on paths between other vertices. Imagine, as a thought experiment, that messages are passed between every pair of vertices in the network, at the same average rate for every pair, and always along geodesic paths; vertices through which many messages pass then have, in a sense, control over the flow of information between others. Suppose for the moment that there is at most one geodesic path between any given pair of vertices, and let n_st^i be 1 if vertex i lies on the geodesic path from s to t and 0 if it does not or if there is no such path. Then the betweenness centrality of vertex i is

    x_i = ∑_st n_st^i.  (7.35)
Note that this definition counts separately the geodesic paths in either direc-
tion between each vertex pair. Since these paths are the same on an undirected
network this effectively counts each path twice. One could compensate for this
by dividing xi by 2, and often this is done, but we prefer the definition given
here for a couple of reasons. First, it makes little difference in practice whether
one divides the centrality by 2, since one is usually concerned only with the rel-
ative magnitudes of the centralities and not with their absolute values. Second,
as discussed below, Eq. (7.35) has the advantage that it can be applied unmod-
ified to directed networks, in which the paths in either direction between a
vertex pair can differ.
Note also that Eq. (7.35) includes paths from each vertex to itself. Some
people prefer to exclude such paths from the definition, so that x_i = ∑_{s≠t} n_st^i, but again the difference is typically not important. Each vertex lies on one path from itself to itself, so the inclusion of these terms simply increases the betweenness by 1, but does not change the rankings of the vertices—which ones have higher or lower betweenness—relative to one another.
There is also a choice to be made about whether the path from s to t should
be considered to pass through the vertices s and t themselves. In the social net-
works literature it is usually assumed that it does not. We prefer the definition
where it does: it seems reasonable to define a vertex to be on a path between
itself and someone else, since normally a vertex has control over information
flowing from itself to other vertices or vice versa. If, however, we exclude the
endpoints of the path as sociologists commonly do, the only effect is to reduce
the number of paths through each vertex by twice the size of the component
to which the vertex belongs. Thus the betweennesses of all vertices within a
single component are just reduced by an additive constant and the ranking of
vertices within the component is again unchanged. (The rankings of vertices
in different components can change relative to one another, but this is rarely an
issue because betweenness centrality is not typically used to compare vertices
in different components, since such vertices are not competing for influence in
the same arena.)
These developments are all for the case in which there is at most one geo-
desic path between each vertex pair. More generally, however, there may be
more than one. The standard extension of betweenness to this case gives each
path a weight equal to the inverse of the number of paths. For instance, if
there are two geodesic paths between a given pair of vertices, each of them
gets weight 1/2. Then the betweenness of a vertex is defined to be the sum of the weights of all geodesic paths passing through that vertex.

Note that the geodesic paths between a pair of vertices need not be vertex-independent, meaning they may pass through some of the same vertices (see figure). If two or more paths pass through the same vertex then the betweenness sum includes contributions from each of them. Thus if there are, say, three geodesic paths between a given pair of vertices and two of them pass through a particular vertex, then they contribute 2/3 to that vertex's betweenness.

[Marginal figure: vertices A and B are connected by two geodesic paths; vertex C lies on both paths.]

Formally, we can express the betweenness for a general network by redefining n_st^i to be the number of geodesic paths from s to t that pass through i, and defining g_st to be the total number of geodesic paths from s to t. Then the betweenness centrality of vertex i is

    x_i = ∑_st n_st^i / g_st,  (7.36)
where we adopt the convention that n_st^i/g_st = 0 if both n_st^i and g_st are zero. This definition is equivalent to our message-passing thought experiment above, in which messages pass between all pairs of vertices in a network at the same average rate, traveling along shortest paths, and in the case of several shortest paths between a given pair of vertices they choose at random between those several paths. Then x_i is proportional to the average rate at which traffic passes through vertex i.
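A direct, if slow, implementation of Eq. (7.36) simply enumerates the shortest paths between every vertex pair. The sketch below uses NetworkX; it follows the conventions of the text (both directions counted, endpoints and the path from a vertex to itself included), which differ from the normalization built into NetworkX's own nx.betweenness_centrality:

    import networkx as nx

    def betweenness(G):
        # Brute-force evaluation of Eq. (7.36).
        x = {v: 0.0 for v in G}
        for s in G:
            for t in G:
                try:
                    paths = list(nx.all_shortest_paths(G, s, t))
                except nx.NetworkXNoPath:
                    continue            # n_st and g_st both zero
                for path in paths:      # each path carries weight 1/g_st
                    for v in set(path):
                        x[v] += 1.0 / len(paths)
        return x

For serious use one would turn to the much faster algorithm discussed in Section 10.3.6.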
Betweenness centrality can be applied to directed networks as well. In a
directed network the shortest path between two vertices depends, in general,
on the direction you travel in. The shortest path from A to B is different from
the shortest path from B to A. Indeed there may be a path in one direction and
no path at all in the other. Thus it is important in a directed network explicitly
to include the path counts in either direction between each vertex pair. The
definition in Eq. (7.36) already does this and so, as mentioned above, we can
use the same definition without modification for the directed case. This is one
reason why we prefer this definition to other slight variants that are sometimes
used.
Although the generalization of betweenness to directed networks is straight-
forward, however, it is rarely if ever used, so we won’t discuss it further here,
concentrating instead on the much more common undirected case.
Betweenness centrality differs from the other centrality measures we have considered in being not principally a measure of how well-connected a vertex is. Instead it measures how much a vertex falls "between" others. Indeed a vertex can have quite low degree, be connected to others that have low degree, even be a long way from others on average, and still have high betweenness. Consider the situation depicted in Fig. 7.2. Vertex A lies on a bridge between two groups within a network. Since any shortest path (or indeed any path whatsoever) between a vertex in one group and a vertex in the other must pass along this bridge, A acquires very high betweenness, even though it is itself on the periphery of both groups and in other respects may be not well connected: probably A would not have particularly impressive values for eigenvector or closeness centrality, and its degree centrality is only 2, but nonetheless it might have a lot of influence in the network as a result of its control over the flow of information between others. Vertices in roles like this are sometimes referred to in the sociological literature as brokers.^12

Figure 7.2: A low-degree vertex with high betweenness. In this sketch of a network, vertex A lies on a bridge joining two groups of other vertices. All paths between the groups must pass through A, so it has a high betweenness even though its degree is low.
12. Much of the sociological literature concerns power or "social capital." It may seem ruthless to think of individuals exploiting their control over other people's information to gain the upper hand on them, but it may also be realistic. At least in situations where there is a significant pay-off to having such an upper hand (like business relationships, for example), it is reasonable to suppose that notions of power derived from network structure really do play into people's manipulations of the world around them.

13. It is perhaps no coincidence that the highest betweenness belongs to an actor who appeared in both European and American films, played roles in several different languages, and worked extensively in both film and television, as well as on stage. Rey was the archetypal "broker," with a career that made him a central figure in several different arms of the entertainment business that otherwise overlap relatively little.
Consider, for example, the network of film actors once more. The highest betweenness score of any actor in this network is that of the Spanish actor Fernando Rey,^13 while the lowest score of any actor^14 in the large component is just 8.91 × 10^5.
Thus there is a ratio of almost a thousand between the two limits—a much
larger dynamic range than the ratio of 3.6 we saw in the case of closeness cen-
trality. One consequence of this is that there are very clear winners and losers
in the betweenness centrality competition. The second highest betweenness
in the actor network is that of Christopher Lee (again), with 6.46 × 10^8, a 14% difference from winner Fernando Rey. Although betweenness values
may shift a little as new movies are made and new actors added to the net-
work, the changes are typically small compared with these large gaps between
the leaders, so that the ordering at the top of the list changes relatively infre-
quently, giving betweenness centrality results a robustness not shared by those
for closeness centrality.
The values of betweenness calculated here are raw path counts, but it is
sometimes convenient to normalize betweenness in some way. Several of the
standard computer programs for network analysis, such as Pajek and UCINET,
perform such normalizations. One natural choice is to normalize the path
count by dividing by the total number of (ordered) vertex pairs, which is n², so
that betweenness becomes the fraction (rather than the number) of paths that
run through a given vertex:15
    x_i = (1/n²) ∑_st n_st^i / g_st.  (7.38)
With this definition, the values of the betweenness lie strictly between zero and
one.
Some other variations on the betweenness centrality idea are worth men-
tioning. Betweenness gets at an important idea in network analysis, that of
the flow of information or other traffic and of the influence vertices might
have over that flow. However, betweenness as defined by Freeman is based
on counting only the shortest paths between vertex pairs, effectively assuming
that all or at least most traffic passes along those shortest paths. In reality traf-
14. This score is shared by many actors. It is the minimum possible score of 2n − 1, as described above.

15. Another possibility, proposed by Freeman [128] in his original paper on betweenness, is to divide by the maximum possible value that betweenness can take on any network of size n, which, as mentioned above, occurs for the central vertex in a star graph. The resulting expression for betweenness is then

    x_i = (1/(n² − n + 1)) ∑_st n_st^i / g_st.

We, however, prefer Eq. (7.38), which we find easier to interpret, although the difference between the two becomes small anyway in the limit of large n.
fic flows along paths other than the shortest in many networks. Most of us, for
instance, will have had the experience of hearing news about one of our friends
not from that friend directly but from another mutual acquaintance—the mes-
sage has passed along a path of length two via the mutual acquaintance, rather
than along the direct (geodesic) path of length one.
A version of betweenness centrality that makes some allowance for effects like this is the flow betweenness, which was proposed by Freeman et al. [130] and is based on the idea of maximum flow (see Section 6.12 for a discussion of maximum flow in networks). Imagine each edge in a network as a pipe that can carry a unit flow of some fluid. We can ask what the maximum possible flow then is between a given source vertex s and target vertex t through these pipes. In general the answer is that more than a single unit of flow can be carried between source and target by making simultaneous use of several different paths through the network. The flow betweenness of a vertex i is defined according to Eq. (7.35), but with n_st^i being now the amount of flow through vertex i when the maximum flow is transmitted from s to t.
As we saw in Section 6.12, the maximum flow between vertices s and t
is also equal to the number of edge-independent paths between them. Thus
an equivalent way to look at the flow betweenness would be to consider n_st^i to be the number of independent paths between s and t that run through vertex i.
A slight problem arises because the independent paths between a given pair of vertices are not necessarily unique. For instance, the network shown in Fig. 7.3 has two edge-independent paths between s and t but we have two choices about what those paths are, either the paths denoted by the solid arrows, or those denoted by the dashed ones. Furthermore, our result for the flow betweenness will depend on which choice we make; the vertices labeled A and B fall on one set of paths but not the other. To get around this problem, Freeman et al. define the flow through a vertex for their purposes to be the maximum possible flow over all possible choices of paths, or equivalently the maximum number of independent paths. Thus in the network of Fig. 7.3, the contribution of the flow between s and t to the betweenness of vertex A would be 1, since this is the maximum value it takes over all possible choices of flow paths.

Figure 7.3: Edge-independent paths in a small network. The vertices s and t in this network have two independent paths between them, but there are two distinct ways of choosing those paths, represented by the solid and dashed curves.

In terms of our information analogy, one can think of flow betweenness as measuring the betweenness of vertices in a network in which a maximal amount of information is continuously pumped
between all sources and targets. Flow betweenness takes account of more than
just the geodesic paths between vertices, since flow can go along non-geodesic
paths as well as geodesic ones. (For example, the paths through vertices A
191
M EASURES AND METRICS
and B in the example above are not geodesic.) Indeed, in some cases none of the
paths that appear in the solution of the maximum flow problem are geodesic
paths, so geodesic paths may not be counted at all by this measure.
But this point highlights a problem with flow betweenness: although it
typically counts more paths than the standard shortest-path betweenness, flow
betweenness still only counts a subset of possible paths, and some important
ones (such as geodesic paths) may be missed out altogether. One way to look at
the issue is that both shortest-path betweenness and flow betweenness assume
flows that are optimal in some sense—passing only along shortest paths in the
first case and maximizing total flow in the second. Just as there is no reason to
suppose that information or other traffic always takes the shortest path, there
is no reason in general to suppose it should act to maximize flow (although of
course there may be special cases in which it does).
A betweenness variant that does count all paths is the random-walk betweenness [243]. In this variant traffic between vertices s and t is thought of as performing an (absorbing) random walk that starts at vertex s and continues until it reaches vertex t (see Section 6.14 for a discussion of random walks). The betweenness is defined according to x_i = ∑_st n_st^i, but with n_st^i now being the number of times that the random walk from s to t passes through i on its journey, averaged over many repetitions of the walk. Note that in this case n_st^i ≠ n_ts^i in general, even on an undirected network. For instance, consider a portion of a network in which a vertex A is attached by a single edge to a vertex s, which is in turn connected to the target vertex t (see figure).
A random walk from s to t may pass through vertex A before returning to s and
stepping thence to t, but a walk from t to s will never pass through A because
its first step away from t will always take it to s and then the walk will finish.
Since every possible path from s to t occurs in a random walk with some
probability (albeit a very small one) the random-walk betweenness includes
contributions from all paths.16 Note, however, that different paths appear in
general with different probabilities, so paths do not contribute equally to the
16. All paths, that is, that terminate at the target vertex t the first time they reach it. Since we use an absorbing random walk, paths that visit the target, move away again, and then return are not included in the random-walk betweenness.
7.8 GROUPS OF VERTICES

Many networks divide naturally into groups or communities of vertices, and a number of concepts have been proposed for quantifying what we mean by a group. The simplest of these is the clique. A clique is a maximal subset of the vertices in an undirected network such that every member of the subset is connected by an edge to every other. The word "maximal" here means that no other vertex in the network can be added to the subset while preserving the property that every vertex is connected to every other. Thus a set of four vertices in a network would be a clique
if (and only if) each of the four is directly connected by edges to the other three
and if there is no other vertex anywhere in the network that could be added to
make a group of five vertices all connected to each other. Note that cliques can
overlap, meaning that they can share one or more of the same vertices.
The occurrence of a clique in an otherwise sparse network is normally an indication of a highly cohesive subgroup. In a social network, for instance, one might encounter a set of individuals each of whom was acquainted with each of the others, and such a clique would probably indicate that the individuals in question are closely connected—a set of coworkers in an office for example or a group of classmates in a school.

[Marginal figure: a clique of four vertices within a network.]
However, it's also the case that many circles of friends form only near-cliques, rather than perfect cliques. There may be some members of the group who are unacquainted, even if most members know one another. The requirement that every possible edge be present within a clique is a very stringent one, and it seems natural to consider how we might relax this requirement. One construct that does this is the k-plex. A k-plex of size n is a maximal subset of n vertices within a network such that each vertex is connected to at least n − k of the others. If k = 1, we recover the definition of an ordinary clique—a 1-plex is the same as a clique. If k = 2, then each vertex must be connected to all or all-but-one of the others. And so forth.^17 Like cliques, k-plexes can overlap one another; a single vertex can belong to more than one k-plex.

[Marginal figure: two overlapping cliques. Vertices A and B in this network both belong to two cliques of four vertices.]

The k-plex is a useful concept for discovering groups within networks: in real life many groups in social and other networks form k-plexes. There is no solid rule about what value k should take. Experimentation starting from small values is the usual way to proceed. Smaller values of k tend to be meaningful for smaller groups, whereas in large groups the smaller values impose too stringent a constraint but larger values often give useful results. This suggests another possible generalization of the clique idea: one could specify that each member be connected to a certain fraction of the others, say 75% or 50%. (As far as we know, this variant doesn't have a name and it is not in wide use, but perhaps it should be.)
Many other variations on the clique idea have been proposed in the litera-
ture. For instance Flake et al. [122] proposed a definition of a group as a subset
17. This definition is slightly awkward to remember, since the members of a k-plex are allowed to be unconnected to k − 1 other members and not k. It would perhaps have been more sensible to define k such that a 0-plex was equivalent to a normal clique, but for better or worse we are stuck with the definition we have.
of vertices such that each has at least as many connections to vertices inside the
group as to vertices outside. Radicchi et al. [276] proposed a weaker definition
of a group as a subset of vertices such that the total number of connections of
all vertices in the group to others in the group is greater than the total number
of connections to vertices outside.18
Another concept closely related to the k-plex is the k-core. A k-core is a
maximal subset of vertices such that each is connected to at least k others in
the subset.19 It should be obvious (or you can easily prove it for yourself) that
a k-core of n vertices is also an (n − k )-plex. However, the set of all k-cores
for a given value of k is not the same as the set of all k-plexes for any value
of k, since n, the size of the group, can vary from one k-core to another. Also,
unlike k-plexes (and cliques), k-cores cannot overlap, since by their definition
two k-cores that shared one or more vertices would just form a single larger
k-core.
The k-core is of particular interest in network analysis for the practical rea-
son that it is very easy to find the set of all k-cores in a network. A simple
algorithm is to start with your whole network and remove from it any vertices
that have degree less than k, since clearly such vertices cannot under any cir-
cumstances be members of a k-core. In so doing, one will normally also reduce
the degrees of some other vertices in the network—those that were connected
to the vertices just removed. So we then go through the network again to see
if there are any more vertices that now have degree less than k and if there are
we remove those too. And so we proceed, repeatedly pruning the network to
remove vertices with degree less than k until no such vertices remain.20 What
is left over will, by definition, be a k-core or a set of k-cores, since each vertex is
connected to at least k others. Note that we are not necessarily left with a single
k-core—there’s no guarantee that the network will be connected once we are
done pruning it, even if it was connected to start with.
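The pruning algorithm just described is only a few lines of code. A minimal sketch in plain Python, where the network is represented as a dictionary mapping each vertex to the set of its neighbors (a representation chosen for this sketch):

    def k_cores(adj, k):
        # Repeatedly remove vertices of degree less than k; whatever
        # survives belongs to a k-core (possibly several of them).
        adj = {v: set(nbrs) for v, nbrs in adj.items()}   # private copy
        while True:
            low = [v for v in adj if len(adj[v]) < k]
            if not low:
                return set(adj)
            for v in low:
                for u in adj[v]:
                    if u in adj:          # neighbor may already be gone
                        adj[u].discard(v)
                del adj[v]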
Two other generalizations of cliques merit a brief mention. A k-clique is a maximal subset of vertices such that each is no more than a distance k away from any of the others via the edges of the network. For k = 1 this just recovers the definition of an ordinary clique.
18. Note that for the purposes of this latter definition, an edge between two vertices A and B within the group counts as two connections, one from A to B and one from B to A.
19. We have to be careful about the meaning of the word "maximal" here. It is possible to have a group of vertices such that each is connected to at least k others and no single vertex can be added while retaining this property, but it may be possible to add more than one vertex. Such groups, however, are not considered to be k-cores. A group is only a k-core if it is not a subset of any larger group that is a k-core.
20. A closely related process, bootstrap percolation, has also been studied in statistical physics, principally on regular lattices.
Figure 7.4: The k-components in a small network. The shaded regions denote the k-components in this small network, which has a single 1-component, two 2-components, one 3-component, and no k-components for any higher value of k. Note that the k-components are nested within one another, the 2-components falling inside the 1-component and the 3-component falling inside one of the 2-components.
7.9 TRANSITIVITY
A property very important in social networks, and useful to a lesser degree in
other networks too, is transitivity. In mathematics a relation “◦” is said to be
transitive if a ◦ b and b ◦ c together imply a ◦ c. An example would be equality.
If a = b and b = c, then it follows that a = c also, so “=” is a transitive relation.
Other examples are “greater than,” “less than,” and “implies.”
In a network there are various relations between pairs of vertices, the sim-
plest of which is “connected by an edge.” If the “connected by an edge” re-
lation were transitive it would mean that if vertex u is connected to vertex v,
and v is connected to w, then u is also connected to w. In common parlance,
“the friend of my friend is also my friend.” Although this is only one possi-
ble kind of network transitivity—other network relations could be transitive
too—it is the only one that is commonly considered, and networks showing
this property are themselves said to be transitive. This definition of network
transitivity could apply to either directed or undirected networks, but let us
take the undirected case first, since it’s simpler.
Perfect transitivity only occurs in networks where each component is a
fully connected subgraph or clique, i.e., a subgraph in which all vertices are
connected to all others.21 Perfect transitivity is therefore pretty much a useless
concept in networks. However, partial transitivity can be very useful. In many
networks, particularly social networks, the fact that u knows v and v knows w
21. To see this suppose we have a component that is perfectly transitive but not a clique, i.e., there is at least one pair of vertices u, w in the component that are not directly connected by an edge. Since u and w are in the same component they must therefore be connected by some path of length greater than one, u, v_1, v_2, v_3, …, w. Consider the first two links in this path. Since u is connected by an edge to v_1 and v_1 to v_2 it follows that u must be connected to v_2 if the network is perfectly transitive. Then consider the next two links. Since u is connected to v_2 and v_2 to v_3 it follows that u must be connected to v_3. Repeating the argument all the way along the path, we can then see that u must be connected by an edge to w. But this violates the hypothesis that u and w are not directly connected. Hence no perfectly transitive components exist that are not cliques.
doesn't guarantee that u knows w, but makes it much more likely. The friend of my friend is not necessarily my friend, but is far more likely to be my friend than some randomly chosen member of the population.

We can quantify the level of transitivity in a network as follows. If u knows v and v knows w, then we have a path uvw of two edges in the network. If u also knows w, we say that the path is closed—it forms a loop of length three, or a triangle, in the network. In the social network jargon, u, v, and w are said to form a closed triad. We define the clustering coefficient^22 to be the fraction of paths of length two in the network that are closed. That is, we count all paths of length two, and we count how many of them are closed, and we divide the second number by the first to get a clustering coefficient C that lies in the range from zero to one:

    C = (number of closed paths of length two) / (number of paths of length two).  (7.39)

[Marginal figure: the path uvw (solid edges) is said to be closed if the third edge directly from u to w is present (dashed edge).]

Since each closed path of length two corresponds to a triangle in the network, we can also write this as^23

    C = ((number of triangles) × 6) / (number of paths of length two).  (7.40)
Why the factor of six? It arises because each triangle in the network gets
counted six times over when we count up the number of closed paths of length
two. Suppose we have a triangle uvw. Then there are six paths of length two
22. It's not entirely clear why the clustering coefficient has the name it has. The name doesn't appear to be connected with the earlier use of the word clustering in social network analysis to describe groups or clusters of vertices (see Section 11.11.2). The reader should be careful to avoid confusing these two uses of the word.
23. In fact, we could count each path just in one direction, provided we did it for both the numerator and denominator of Eq. (7.39). Doing so would decrease both counts by a factor of two, but the factors would cancel and the end result would be the same. In most cases, and particularly when writing computer programs, it is easier to count paths in both directions—it avoids having to remember which paths you have counted before.
in it: uvw, vwu, wuv, wvu, vuw, and uwv. Each of these six is closed, so the
number of closed paths is six times the number of triangles.
Yet another way to write the clustering coefficient would be to note that
if we have a path of length two, uvw, then it is also true to say that vertices
u and w have a common neighbor in v—they share a mutual acquaintance
in social network terms. If the triad uvw is closed then u and w are them-
selves acquainted, so the clustering coefficient can be thought of also as the
fraction of pairs of people with a common friend who are themselves friends
or equivalently as the mean probability that two people with a common friend
are themselves friends. This is perhaps the most common way of defining the
clustering coefficient. In mathematical notation:
    C = ((number of triangles) × 3) / (number of connected triples).  (7.41)

Here a "connected triple" means three vertices uvw with edges (u, v) and (v, w). (The edge (u, w) can be present or not.) The factor of three in the numerator arises because each triangle gets counted three times when we count the connected triples in the network. The triangle uvw for instance contains the triples uvw, vwu, and wuv. In the older social networks literature the clustering coefficient is sometimes referred to as the "fraction of transitive triples," which is a reference to this definition of the coefficient.

[Marginal figure: a triangle contains six distinct paths of length two, all of them closed.]
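For a network stored as an adjacency matrix, both counts in Eq. (7.41) have simple closed forms, which the following minimal NumPy sketch exploits (it assumes a simple undirected network given as a 0/1 matrix):

    import numpy as np

    def clustering_coefficient(A):
        # trace(A^3) counts each triangle six times; the number of
        # connected triples is sum over vertices of k(k - 1)/2.
        A = np.asarray(A)
        triangles = np.trace(A @ A @ A) / 6
        k = A.sum(axis=1)                      # degrees
        triples = np.sum(k * (k - 1)) / 2
        return 3 * triangles / triples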
Social networks tend to have quite high values of the clustering coefficient.
For example, the network of film actor collaborations discussed earlier has
been found to have C = 0.20 [241]; a network of collaborations between bi-
ologists has been found to have C = 0.09 [236]; a network of who sends email
to whom in a large university has C = 0.16 [103]. These are typical values for
social networks. Some denser networks have even higher values, as high as 0.5
or 0.6. (Technological and biological networks by contrast tend to have some-
what lower values. The Internet at the autonomous system level, for instance,
has a clustering coefficient of only about 0.01. This point is discussed in more
detail in Section 8.6.)
In what sense are these clustering coefficients for social networks high?
Well, let us assume, to make things simple, that everyone in a network has
about the same number c of friends. Consider one of my friends in this net-
work and suppose they pick their friends completely at random from the whole
population. Then the chance that one of their c friends happens to be a partic-
ular one of my other friends would be c/n, where n is the size of the network.
Thus in this network the probability of two of my friends being acquainted,
which is by definition the clustering coefficient, would be just c/n. Of course
it is not the case that everyone in a network has the same number of friends,
and we will see how to perform better calculations of the clustering coefficient
later (Section 13.4), but this crude calculation will serve our purposes for the
moment.
For the networks cited above, the value of c/n is 0.0003 (film actors), 0.00001
(biology collaborations), and 0.00002 (email messages). Thus the measured
clustering coefficients are much larger than this estimate based on the assump-
tion of random network connections. Even though the estimate ignores, as
we have said, any variation in the number of friends people have, the dispar-
ity between the calculated and observed values of the clustering coefficient
is so large that it seems unlikely it could be eliminated just by allowing the
number of friends to vary. A much more likely explanation is that our other
assumption, that people pick their friends at random, is seriously flawed. The
numbers suggest that there is a much greater chance that two people will be
acquainted if they have another common acquaintance than if they don’t.
Although this argument is admittedly crude, we will see in Section 8.6 how
to make it more accurate and so show that our basic conclusion is indeed cor-
rect.
Some social networks, such as the email network above, are directed networks. In calculating clustering coefficients for directed networks, scientists have typically just ignored their directed nature and applied Eq. (7.41) as if the edges were undirected. It is however possible to generalize transitivity to take account of directed links. If we have a directed relation between vertices such as "u likes v" then we can say that a triple of vertices is closed or transitive if u likes v, v likes w, and also u likes w. (Note that there are many distinct ways for such a triple to be transitive, depending on the directions of the edges. The example given here is only one of six different possibilities.) One can calculate a clustering coefficient or fraction of transitive triples in the obvious fashion for the directed case, counting all directed paths of length two that are closed and dividing by the total number of directed paths of length two. For some reason, however, such measurements have not often appeared in the literature.

[Marginal figure: a transitive triple of vertices in a directed network.]
7.9.1 LOCAL CLUSTERING AND REDUNDANCY

We can also define a clustering coefficient for a single vertex i:^24

    C_i = (number of pairs of neighbors of i that are connected) / (number of pairs of neighbors of i).  (7.42)

24. The notation C_i is used for both the local clustering coefficient and the closeness centrality and we should be careful not to confuse the two.
That is, to calculate C_i we go through all distinct pairs of vertices that are neighbors of i in the network, count the number of such pairs that are connected to each other, and divide by the total number of pairs, which is ½k_i(k_i − 1), where k_i is the degree of i. C_i is sometimes called the local clustering coefficient and it represents the average probability that a pair of i's friends are friends of one another.
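A minimal sketch of the calculation for a network stored as a 0/1 adjacency matrix (the treatment of vertices of degree less than two, for which Eq. (7.42) is undefined, is our own convention):

    import numpy as np

    def local_clustering(A, i):
        # Fraction of pairs of i's neighbors that are connected, Eq. (7.42).
        A = np.asarray(A)
        nbrs = np.flatnonzero(A[i])            # neighbors of i
        k = len(nbrs)
        if k < 2:
            return 0.0                         # no pairs of neighbors
        connected_pairs = A[np.ix_(nbrs, nbrs)].sum() / 2
        return connected_pairs / (k * (k - 1) / 2)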
Local clustering is interesting for several reasons. First, in many networks
Structural holes it is found empirically to have a rough dependence on degree, vertices with
higher degree having a lower local clustering coefficient on average. This point
is discussed in detail in Section 8.6.1.
Second, local clustering can be used as a probe for the existence of so-called
“structural holes” in a network. While it is common in many networks, es-
pecially social networks, for the neighbors of a vertex to be connected among
themselves, it happens sometimes that these expected connections between
neighbors are missing. The missing links are called structural holes and were
first studied in this context by Burt [60].

[Margin note: When the neighbors of a node are not connected to one another we say the network contains "structural holes."]

If we are interested in efficient spread of information or other traffic around a network, as we were in Section 7.7, then structural holes are a bad thing—they reduce the number of alternative routes information can take through the network. On the other hand structural holes can be a good thing for the central vertex i whose friends lack connections, because they give i power over information flow between those friends.
If two friends of i are not connected directly and their information about one
another comes instead via their mutual connection with i then i can control the
flow of that information. The local clustering coefficient measures how influ-
ential i is in this sense, taking lower values the more structural holes there are
in the network around i. Thus local clustering can be regarded as a type of
centrality measure, albeit one that takes small values for powerful individuals
rather than large ones.
In this sense, local clustering can also be thought of as akin to the between-
ness centrality of Section 7.7. Where betweenness measures a vertex’s control
over information flowing between all pairs of vertices in its component, lo-
cal clustering is like a local version of betweenness that measures control over
flows between just the immediate neighbors of a vertex. One measure is not
necessarily better than another. There may be cases in which we want to take
all vertices into account and others where we want to consider only immedi-
ate neighbors—the choice will depend on the particular questions we want to
answer. It is worth pointing out however that betweenness is much more com-
putationally intensive to calculate than local clustering (see Section 10.3.6), and
that in practice betweenness and local clustering are strongly correlated [60].
There may in many cases be little to be gained by performing the more costly betweenness calculation.25

Local clustering is also closely related to the quantity known as redundancy, introduced by Burt [60] in his original studies of structural holes.26 The redundancy R_i of vertex i can be defined as the mean number of connections from a neighbor of i to other neighbors of i. The total number of connections between neighbors of i is then ½ k_i R_i, while the total number of pairs of neighbors is ½ k_i(k_i − 1), and the local clustering coefficient is the ratio of these two quantities:

$$C_i = \frac{\frac{1}{2} k_i R_i}{\frac{1}{2} k_i (k_i - 1)} = \frac{R_i}{k_i - 1}. \tag{7.43}$$

For instance, a vertex with degree k_i = 4 whose neighbors have on average R_i = 1.5 connections to one another has C_i = 1.5/3 = 0.5.
25. As an example, in Section 11.11.1 we study methods for partitioning networks into clusters
or communities and we will see that effective computer algorithms for this task can be created
based on betweenness measures, but that almost equally effective and much faster algorithms can
be created based on local clustering.
26. Actually, the local clustering coefficient hadn't yet been invented. It was first proposed to this
author’s knowledge by Watts [321] a few years later.
7.10 Reciprocity
The clustering coefficient of Section 7.9 measures the frequency with which
loops of length three—triangles—appear in a network. Of course, there is
no reason why one should concentrate only on loops of length three, and
people have occasionally looked at the frequency of loops of length four or
more [44,61,133,140,238]. Triangles occupy a special place however because in
an undirected simple graph the triangle is the shortest loop we can have (and
usually the most commonly occurring). However, in a directed network this is
not the case. In a directed network, we can have loops of length two—a pair of
vertices between which there are directed edges running in both directions—
and it is interesting to ask about the frequency of occurrence of these loops
also.
[Margin figure: a loop of length two in a directed network.]

The frequency of loops of length two is measured by the reciprocity, which tells you how likely it is that a vertex that you point to also points back at you. For
instance, on the World Wide Web, if my web page links to your web page, how likely is it, on average, that yours links back to mine? In general, it's found
27. As discussed in Section 8.6.1, vertices with low degree tend to have high values of Ci in most
networks and this means that CWS is usually larger than the value given by Eq. (7.41), sometimes
much larger.
that you are much more likely to link to me if I link to you than if I don’t. (That
probably isn’t an Earth-shattering surprise, but it’s good to know when the
data bear out one’s intuitions.) Similarly in friendship networks, such as the
networks of schoolchildren described in Section 3.2 where respondents were
asked to name their friends, it is much more likely that you will name me if I
name you than if I do not.
If there is a directed edge from vertex i to vertex j in a directed network and
there is also an edge from j to i then we say the edge from i to j is reciprocated.
(Obviously the edge from j to i is also reciprocated.) Pairs of edges like this are
also sometimes called co-links, particularly in the context of the World Wide
Web [104].
The reciprocity r is defined as the fraction of edges that are reciprocated.
Noting that the product of adjacency matrix elements Aij A ji is 1 if and only if
there is an edge from i to j and an edge from j to i and is zero otherwise, we
can sum over all vertex pairs i, j to get an expression for the reciprocity:
$$r = \frac{1}{m} \sum_{ij} A_{ij} A_{ji} = \frac{1}{m} \operatorname{Tr} A^2, \tag{7.45}$$
where m is, as usual, the total number of (directed) edges in the network.
Consider for example this small network of four vertices:

[sketch: a four-vertex directed network with seven edges, four of which are reciprocated]

There are seven directed edges in this network and four of them are reciprocated, so the reciprocity is r = 4/7 ≈ 0.57. In fact, this is about the same value as seen on the World Wide Web: there is about a 57% chance that if web page A links to web page B then B also links back to A.28 As another example,
in a study of a network of who has whom in their email address book it was
found that the reciprocity was about r = 0.23 [248].
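Equation (7.45) translates almost verbatim into code; a sketch assuming numpy and a directed adjacency matrix:

    import numpy as np

    def reciprocity(A):
        """Fraction of directed edges that are reciprocated, Eq. (7.45)."""
        A = np.asarray(A)
        m = A.sum()                          # total number of directed edges
        return np.trace(A @ A) / m           # Tr A^2 counts reciprocated edges

    # three vertices: edges 0->1 and 1->0 (reciprocated) plus 1->2 (not)
    A = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 0, 0]])
    print(reciprocity(A))                    # 2/3: two of the three edges reciprocated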
28. This figure is an unusually high one among directed networks, but there are reasons for it.
One is that many of the links between web pages are between pages on the same website, and it is
common for such pages to link to each other. If you exclude links between pages on the same site
the value of the reciprocity is lower.
7.11 Signed Edges and Structural Balance

[Figure: a signed social network, with positive edges marked "friends" and negative edges marked "enemies."]
One could also consider varying degrees of friendship or animosity—networks
with more strongly positive or negative edges in them—but for the moment
let’s stick to the simple case where each edge is in just one of two states, pos-
itive or negative, like or dislike. Such networks are called signed networks and
their edges are called signed edges.
It is important to be clear here that a negative edge is not the same as the
absence of an edge. A negative edge indicates, for example, two people who
interact regularly but dislike each other. The absence of an edge represents two
people who do not interact. Whether they would like one another if they did
interact is not recorded.
Now consider the possible configurations of three edges in a triangle in a
signed network, as depicted in Fig. 7.6. If “+” and “−” represent like and
dislike, then we can imagine some of these configurations creating social prob-
lems if they were to arise between three people in the real world. Configura-
tion (a) is fine: everyone likes everyone else. Configuration (b) is probably also
fine, although the situation is more subtle than (a). Individuals u and v like one
another and both dislike w, but the configuration can still be regarded as sta-
ble in the sense that u and v can agree over their dislike of w and get along just
fine, while w hates both of them. No one is conflicted about their allegiances.
Put another way, w is u’s enemy and v is w’s enemy, but there is no problem
with u and v being friends if one considers that the “enemy of my enemy is my
friend.”
Configuration (c) however could be problematic. Individual u likes indi-
vidual v and v likes w, but u thinks w is an idiot. This is going to place a strain
on the friendship between u and v because u thinks v’s friend is an idiot. Alter-
natively, from the point of view of v, v has two friends, u and w and they don’t
get along, which puts v in an awkward position. In many real-life situations
of this kind the tension would be resolved by one of the acquaintances being
[Figure 7.6: Possible triad configurations in a signed network, each a triangle on vertices u, v, w. Configurations (a) and (b) are balanced and hence relatively stable, but configurations (c) and (d) are unbalanced and liable to break apart.]
broken, i.e., the edge would be removed altogether. Perhaps v would simply
stop talking to one of his friends, for instance.
Configuration (d) is somewhat ambiguous. On the one hand, it consists
of three people who all dislike each other, so no one is in doubt about where
things stand: everyone just hates everyone else. On the other hand, the “en-
emy of my enemy” rule does not apply here. Individuals u and v might like to
form an alliance in recognition of their joint dislike of w, but find it difficult to
do so because they also dislike each other. In some circumstances this might
cause tension. (Think of the uneasy alliance of the US and the Soviet Union against Germany during World War II, for instance.) But what one can say definitely is
that configuration (d) is often unstable. There may be little reason for the three
to stay together when none of them likes the others. Quite probably three ene-
mies such as these would simply sever their connections and go their separate
ways.
The feature that distinguishes the two stable configurations in Fig. 7.6 from
the unstable ones is that they have an even number of minus signs around the
loop.29 One can enumerate similar configurations for longer loops, of length
four or greater, and again find that loops with even numbers of minus signs
appear stable and those with odd numbers unstable.
[Margin figure: two stable configurations in loops of length four.]

This alone would be an observation of only slight interest, were it not for the intriguing fact that this type of stability really does appear to have an effect on the structure of networks. In surveys it is found that the unstable configurations in Fig. 7.6, the ones with odd numbers of minus signs, occur
29. This is similar in spirit to the concept of "frustration" that arises in the physics of magnetic
spin systems.
far less often in real social networks than the stable configurations with even
numbers of minus signs.
Networks containing only loops with even numbers of minus signs are
said to show structural balance, or sometimes just balance. An important conse-
quence of balance in networks was proved by Harary [154]:
A balanced network can be divided into connected groups of vertices
such that all connections between members of the same group are
positive and all connections between members of different groups are
negative.
Note that the groups in question can consist of a single vertex or many vertices,
and there may be only one group or there may be very many. Figure 7.7 shows
a balanced network and its division into groups. Networks that can be divided
into groups like this are said to be clusterable. Harary’s theorem tells us that all
balanced networks are clusterable.
Harary’s theorem is straightforward to prove, and the proof is
“constructive,” meaning that it shows not only when a network is
clusterable but also tells us what the groups are.30 We consider ini-
tially only networks that are connected—they have just one compo-
nent. In a moment we will relax this condition. We will color in the
vertices of the network each in one of two colors, denoted by the
open and filled circles in Fig. 7.7, for instance. We start with any
vertex we please and color it with whichever color we please. Then
we color in the others according to the following algorithm:

1. A vertex v connected by a positive edge to another vertex u that has already been colored gets colored the same as u.
2. A vertex v connected by a negative edge to another vertex u that has already been colored gets colored the opposite color from u.

[Figure 7.7: A balanced, clusterable network. Every loop in this network contains an even number of minus signs. The dotted lines indicate the division of the network into clusters such that all acquaintances within clusters have positive connections and all acquaintances in different clusters have negative connections.]

For most networks it will happen in the course of this coloring process that we sometimes come upon a vertex whose color has already been assigned. When this happens there is the possibility of a conflict arising between the previously assigned color and the one that we would like to assign to it now according to the rules above. However, as
we now show, this conflict only arises if the network as a whole is unbalanced.
If in coloring in a network we come upon a vertex that has already been
colored in, it immediately implies that there must be another path by which
that vertex can be reached from our starting point and hence that there is at
least one, and possibly more than one, loop in the network to which this ver-
30. The proof we give is not Harary's proof, which was quite different and not constructive.
[Figure 7.8: Proof that a balanced network is clusterable. If we fail to color a network in two colors as described in the text, then there must exist a loop in the network that has one or other of the two configurations shown here, labeled (a) and (b), both of which have an odd number of minus signs around them (counting the one between the vertices u and v), and hence the network is not balanced.]
tex belongs—the loop consisting of the two paths between the starting point
and the vertex. Since the network is balanced, every loop to which our ver-
tex belongs must have an even number of negative edges around it. Now let
us suppose that the color already assigned to the vertex is in conflict with the
one we would like to assign it now. There are two ways in which this could
happen, as illustrated in Fig. 7.8. In case (a), we color in a vertex u and then
move on to its neighbor v, only to find that v has already been colored the op-
posite color to u, even though the edge between them is positive. This presents
a problem. But if u and v are opposite colors, then around any loop contain-
ing them both there must be an odd number of minus signs, so that the color
changes an odd number of times and ends up the opposite of what it started
out as. And if there is an odd number of minus signs around the loop, then the
network is not balanced.
In case (b) vertices u and v have the same color but the edge between them
is negative. Again we have a problem. But if u and v are the same color then
there must be an even number of negative edges around the rest of the loop
connecting them which, along with the negative edge between u and v, gives
us again an odd total number of negative edges around the entire loop, and
hence the network is again not balanced.
Either way, if we ever encounter a conflict about what color a vertex should
have then the network must be unbalanced. If the network is balanced, there-
fore, we will never encounter such a conflict and we will be able to color the
entire network with just two colors while obeying the rules.
Once we have colored the network in this way, we can immediately deduce
the identity of the groups that satisfy Harary’s theorem: we simply divide
the network into contiguous clusters of vertices that have the same color—see
Fig. 7.7 again. In every such cluster, since all vertices have the same color,
they must be joined by positive edges. Conversely, all edges that connect
different clusters must be negative, since the clusters have different colors. (If
they did not have different colors they would be considered the same cluster.)
Thus Harary’s theorem is proved and at the same time we have deduced a
method for constructing the clusters.31 It only remains to extend the proof to
networks that have more than one component, but this is trivial, since we can
simply repeat the proof above for each component separately.
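The coloring procedure is easy to turn into a working balance test. The following sketch is our own construction following the algorithm in the text: it represents a signed network as a dict mapping each vertex to a list of (neighbor, sign) pairs, returns a two-coloring when no conflict arises, returns None when the network is unbalanced, and handles each component separately as in the proof:

    from collections import deque

    def two_color(signed_adj):
        color = {}
        for start in signed_adj:
            if start in color:
                continue
            color[start] = 0                    # color the first vertex arbitrarily
            queue = deque([start])
            while queue:
                u = queue.popleft()
                for v, sign in signed_adj[u]:
                    want = color[u] if sign > 0 else 1 - color[u]
                    if v not in color:
                        color[v] = want
                        queue.append(v)
                    elif color[v] != want:      # conflict: the network is unbalanced
                        return None
        return color

    # a balanced triangle (two minus signs) versus an unbalanced one (three)
    balanced = {1: [(2, -1), (3, -1)], 2: [(1, -1), (3, +1)], 3: [(1, -1), (2, +1)]}
    print(two_color(balanced))                  # {1: 0, 2: 1, 3: 1}
    unbalanced = {1: [(2, -1), (3, -1)], 2: [(1, -1), (3, -1)], 3: [(1, -1), (2, -1)]}
    print(two_color(unbalanced))                # None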
The practical importance of Harary’s result rests on the fact that, as men-
tioned earlier, many real social networks are found naturally to be in a bal-
anced or mostly balanced state. In such cases it would be possible, therefore,
for the network to form into groups such that everyone likes others within
their group with whom they have contact and dislikes those in other groups.
It is widely assumed in social network theory that this does indeed often hap-
pen. Structural balance and clusterability in networks are thus a model for
cliquishness or insularity, with people tending to stick together in like-minded
groups and disdaining everyone outside their immediate community.
It is worth asking whether the converse of Harary's clusterability theorem is also true. Is it also the case that a network that is clusterable is necessarily balanced? The answer is no, as this simple counter-example shows:

[sketch: a triangle of three vertices in which all three edges are negative]
31. As an interesting historical note, we observe that while Harary's proof of his theorem is per-
fectly correct, his interpretation of it was, in this author’s opinion, erroneous. In his 1953 pa-
per [154], he describes the meaning of the theorem in the following words: “A psychological in-
terpretation of Theorem 1 is that a ‘balanced group’ consists of two highly cohesive cliques which
dislike each other.” (Harary is using the word “clique” in a non-technical sense here to mean a
closed group of people, rather than in the graph theoretical sense of Section 7.8.1.) However, just
because it is possible to color the network in two colors as described above does not mean the net-
work forms two groups. Since the vertices of a single color are not necessarily contiguous, there
are in general many groups of each color, and it seems unreasonable to describe these groups as
forming a single “highly cohesive clique” when in fact they have no contact at all. Moreover, it is
neither possible nor correct to conclude that the members of two groups of opposite colors dislike
each other unless there is at least one edge connecting the two. If two groups of opposite colors
never actually have any contact then it might be that they would get along just fine if they met.
It’s straightforward to prove that such an occurrence would lead to an unbalanced network, but
Harary’s statement says that the present balanced network implies dislike, and this is untrue. Only
if the network were to remain balanced upon addition of one or more edges between groups of
unlike colors would his conclusion be accurate.
In this network all three vertices dislike each other, so there is an odd number
of minus signs around the loop, but there is no problem dividing the network
into three clusters of one vertex each such that everyone dislikes the members
of the other clusters. This network is clusterable but not balanced.
7.12 Similarity
Another central concept in social network analysis is that of similarity between
vertices. In what ways can vertices in a network be similar, and how can we
quantify that similarity? Which vertices in a given network are most similar
to one another? Which vertex v is most similar to a given vertex u? Answers
to questions like these can help us tease apart the types and relationships of
vertices in social networks, information networks, and others. For instance,
one could imagine that it might be useful to have a list of web pages that are
similar—in some appropriate sense—to another page that we specify. In fact,
several web search engines already provide a feature like this: “Click here for
pages similar to this one.”
Similarity can be determined in many different ways and most of them
have nothing to do with networks. For example, commercial dating and match-
making services try to match people with others to whom they are similar by
using descriptions of people’s interests, background, likes, and dislikes. In ef-
fect, these services are computing similarity measures between people based
on personal characteristics. Our focus in this book, however, is on networks,
so we will concentrate on the more limited problem of determining similar-
ity between the vertices of a network using the information contained in the
network structure.
There are two fundamental approaches to constructing measures of net-
work similarity, called structural equivalence and regular equivalence. The names
are rather opaque, but the ideas they represent are simple enough. Two ver-
tices in a network are structurally equivalent if they share many of the same
network neighbors. In Fig. 7.9a we show a sketch depicting structural equiv-
alence between two vertices i and j—the two share, in this case, three of the
same neighbors, although both also have other neighbors that are not shared.
Regular equivalence is more subtle. Two regularly equivalent vertices do not necessarily share the same neighbors, but they have neighbors who are themselves equivalent.

[Figure 7.9: Structural equivalence and regular equivalence. (a) Vertices i and j are structurally equivalent if they share many of the same neighbors. (b) Vertices i and j are regularly equivalent if their neighbors are themselves equivalent (indicated here by the different shades of vertices).]

Perhaps the simplest measure of structural equivalence is a count of the number of common neighbors of two vertices, n_ij = Σ_k A_ik A_kj,
which is the ijth element of A2 . This quantity is closely related to the “co-
citation" measure introduced in Section 6.4.1. Cocitation is defined for directed networks, however, whereas here we are considering undirected ones. A raw count of common neighbors is also hard to interpret on its own, because it is not normalized in any way, which motivates the following normalized measure.
Salton [290] proposed that we regard the ith and jth rows (or columns) of the
adjacency matrix as two vectors and use the cosine of the angle between them
as our similarity measure. Noting that the dot product of two rows is simply
∑k Aik Akj for an undirected network, this gives us a similarity
$$\sigma_{ij} = \cos\theta = \frac{\sum_k A_{ik} A_{kj}}{\sqrt{\sum_k A_{ik}^2}\,\sqrt{\sum_k A_{jk}^2}}. \tag{7.48}$$
For an undirected unweighted network the elements of the adjacency matrix take only the values 0 and 1, so that A_ik² = A_ik and Σ_k A_ik² = k_i, and the cosine similarity becomes

$$\sigma_{ij} = \frac{n_{ij}}{\sqrt{k_i k_j}}, \tag{7.49}$$

where n_ij is again the number of common neighbors. For the vertices i and j depicted in Fig. 7.9a, for instance, the cosine similarity would be
$$\sigma_{ij} = \frac{3}{\sqrt{4 \times 5}} = 0.671\ldots \tag{7.50}$$
Notice that the cosine similarity is technically undefined if one or both of the
vertices has degree zero, but by convention we normally say in that case that
σij = 0.
The cosine similarity provides a natural scale for our similarity measure.
Its value always lies in the range from 0 to 1. A cosine similarity of 1 indicates
that two vertices have exactly the same neighbors. A cosine similarity of zero
indicates that they have none of the same neighbors. Notice that the cosine
similarity can never be negative, being built from sums of non-negative terms, even though
cosines in general can of course be negative.
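A sketch of Eq. (7.48) in Python, assuming numpy, an undirected 0/1 adjacency matrix, and the convention just mentioned for vertices of degree zero:

    import numpy as np

    def cosine_similarity(A, i, j):
        """Cosine similarity of vertices i and j, Eq. (7.48)."""
        A = np.asarray(A, dtype=float)
        num = A[i] @ A[j]                                  # number of common neighbors
        den = np.sqrt(A[i] @ A[i]) * np.sqrt(A[j] @ A[j])  # sqrt(k_i) * sqrt(k_j)
        return num / den if den > 0 else 0.0               # convention for degree zero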
A different way to normalize the count of common neighbors is to compare it with the value we would expect if vertices chose their neighbors at random. The difference between the actual and expected counts can be written

$$\sum_k A_{ik} A_{jk} - n\langle A_i \rangle \langle A_j \rangle = \sum_k A_{ik} A_{jk} - \sum_k \langle A_i \rangle \langle A_j \rangle = \sum_k \bigl( A_{ik} - \langle A_i \rangle \bigr)\bigl( A_{jk} - \langle A_j \rangle \bigr), \tag{7.51}$$

where ⟨A_i⟩ denotes the mean n⁻¹ Σ_k A_ik of the elements of the ith row of the
adjacency matrix. Equation (7.51) will be zero if the number of common neigh-
bors of i and j is exactly what we would expect on the basis of random chance.
If it is positive, then i and j have more neighbors than we would expect by
chance, which we take as an indication of similarity between the two. Equa-
tion (7.51) can also be negative, indicating that i and j have fewer neighbors
than we would expect, a possible sign of dissimilarity.
Equation (7.51) is simply n times the covariance cov( Ai , A j ) of the two rows
of the adjacency matrix. It is common to normalize the covariance, as we did
with the cosine similarity, so that its maximum value is 1. The maximum value
of the covariance of any two sets of quantities occurs when the sets are exactly
the same, in which case their covariance is just equal to the variance of either
set, which we could write as σi2 or σj2 , or in symmetric form as σi σj . Normaliz-
ing by this quantity then gives us the standard Pearson correlation coefficient:
cov( Ai , A j ) ∑k ( Aik − Ai
)( A jk − A j
)
rij = = . (7.52)
σi σj ∑k ( Aik − Ai
)2 ∑k ( A jk − A j
)2
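Because Eq. (7.52) is the standard Pearson coefficient applied to rows of the adjacency matrix, numpy can compute the whole similarity matrix in one call. A sketch (note that the result is undefined, 0/0, for any vertex whose row is constant, such as a vertex of degree zero):

    import numpy as np

    def pearson_similarity(A):
        """Matrix of r_ij values, Eq. (7.52): correlations between rows of A."""
        return np.corrcoef(np.asarray(A, dtype=float))  # rows treated as variables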
A further measure of structural equivalence is the Euclidean distance32 between the rows of the adjacency matrix, Σ_k (A_ik − A_jk)² = k_i + k_j − 2n_ij, where we have made use of the fact that A_ik² = A_ik because the adjacency matrix elements are always zero or one, and n_ij is again the number of neighbors that i and j have in common. When suitably normalized, this distance can thus be regarded, to within additive and multiplicative constants, as just another alternative normalization of the number of common neighbors.
32. This is actually a bad name for it—it should be called Hamming distance, since it is essentially
the same as the Hamming distance of computer science and has nothing to do with Euclid.
In its elementwise form this first attempt at a regular-equivalence measure says that i and j are similar if i has a neighbor k and j has a neighbor l such that k and l are themselves similar, plus a diagonal term making every vertex similar to itself: σ_ij = α Σ_kl A_ik σ_kl A_lj + δ_ij, or in matrix notation
σ = αAσA + I. (7.58)
However, while expressions like this have been proposed as similarity mea-
sures, they still suffer from some problems. Suppose we evaluate Eq. (7.58) by
repeated iteration, taking a starting value, for example, of σ (0) = 0 and using
it to compute σ (1) = αAσA + I, and then repeating the process many times
until σ converges. On the first few iterations we will get the following results:
$$\sigma^{(1)} = I, \tag{7.59a}$$
$$\sigma^{(2)} = \alpha A^2 + I, \tag{7.59b}$$
$$\sigma^{(3)} = \alpha^2 A^4 + \alpha A^2 + I. \tag{7.59c}$$
The pattern is clear: in the limit of many iterations, we will get a sum over
even powers of the adjacency matrix. However, as discussed in Section 6.10, the elements of the rth power of the adjacency matrix count paths of length r between vertices, and hence this measure of similarity is a weighted sum over the numbers of paths of even length between pairs of vertices.

But why should we consider only paths of even length? Why not consider paths of all lengths? These questions lead us to a better definition of regular equivalence as follows: vertices i and j are similar if i has a neighbor k that is itself similar to j.33 Again we assume that vertices are similar to themselves, which we can represent with a diagonal δ_ij term in the similarity, and our similarity measure then looks like

$$\sigma_{ij} = \alpha \sum_k A_{ik} \sigma_{kj} + \delta_{ij}, \tag{7.60}$$

[Margin figure: in the modified definition of regular equivalence, vertex i is considered similar to vertex j (dashed line) if it has a neighbor k that is itself similar to j.]
or
σ = αAσ + I, (7.61)
in matrix notation. Evaluating this expression by iterating again starting from
σ^(0) = 0, we get

$$\sigma^{(1)} = I, \tag{7.62a}$$
$$\sigma^{(2)} = \alpha A + I, \tag{7.62b}$$
$$\sigma^{(3)} = \alpha^2 A^2 + \alpha A + I. \tag{7.62c}$$
33. This definition is not obviously symmetric with respect to i and j but, as we see, does in fact
give rise to an expression for the similarity that is symmetric.
which we could also have deduced directly by rearranging Eq. (7.61): iterating to convergence is equivalent to

$$\sigma = \sum_{r=0}^{\infty} (\alpha A)^r = (I - \alpha A)^{-1}. \tag{7.63}$$

Now our similarity measure includes counts of paths at all lengths, not just even paths. In fact, we can see now that this similarity measure could be defined a completely different way, as a weighted count of all the paths between the vertices i and j with paths of length r getting weight α^r. So long as α < 1, longer paths will get less weight than shorter ones, which seems sensible: in effect we are saying that vertices are similar if they are connected either by a few short paths or by very many long ones.
Equation (7.63) is reminiscent of the formula for the Katz centrality, Eq.
(7.10). We could call Eq. (7.63) the “Katz similarity” perhaps, although Katz
himself never discussed it. The Katz centrality of a vertex would then be sim-
ply the sum of the Katz similarities of that vertex to all others. Vertices that
are similar to many others would get high centrality, a concept that certainly
makes intuitive sense. As with the Katz centrality, the value of the parameter
α is undetermined—we are free to choose it as we see fit—but it must satisfy
α < 1/κ1 if the sum in Eq. (7.63) is to converge, where κ1 is the largest eigen-
value of the adjacency matrix.
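In practice one would not iterate Eq. (7.61) to convergence but solve the linear system directly. A sketch assuming numpy; the default choice of α as half its maximum permitted value is our own arbitrary convention:

    import numpy as np

    def regular_equivalence(A, alpha=None):
        """Similarity matrix sigma = (I - alpha*A)^(-1), the closed form of Eq. (7.61)."""
        A = np.asarray(A, dtype=float)
        n = len(A)
        kappa1 = max(abs(np.linalg.eigvals(A)))  # largest eigenvalue of A
        if alpha is None:
            alpha = 0.5 / kappa1                 # safely below the 1/kappa_1 limit
        return np.linalg.solve(np.eye(n) - alpha * A, np.eye(n))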
In a sense, this regular equivalence measure can be seen as a generalization
of our structural equivalence measures in earlier sections. With those measures
we were counting the common neighbors of a pair of vertices, but the number
of common neighbors is also of course the number of paths of length two be-
tween the vertices. Our “Katz similarity” measure merely extends this concept
to counting paths of all lengths.
Some variations of this similarity measure are possible. As defined it tends
to give high similarity to vertices that have high degree, because if a vertex
has many neighbors it tends to increase the number of those neighbors that
are similar to any other given vertex and hence increases the total similarity
to that vertex. In some cases this might be desirable: maybe the person with
many friends should be considered more similar to others than the person with
few. However, in other cases it gives an unwanted bias in favor of high-degree
nodes. Who is to say that two hermits are not “similar” in an interesting sense?
If we wish, we can remove the bias in favor of high degree by dividing by
vertex degree thus:
$$\sigma_{ij} = \frac{\alpha}{k_i} \sum_k A_{ik} \sigma_{kj} + \delta_{ij}, \tag{7.64}$$
Another useful variant is to consider cases where the last term in Eqs. (7.60)
or (7.64) is not simply diagonal, but includes off-diagonal terms too. Such
a generalization would allow us to specify explicitly that particular pairs of
vertices are similar, based on some other (probably non-network) information
that we have at our disposal. Going back to the example of CEOs at compa-
nies that we gave at the beginning of Section 7.12, we might, for example, want
to state explicitly that the CFOs and CIOs and so forth at different companies
are similar, and then our similarity measure would, we hope, correctly deduce
from the network structure that the CEOs are similar also. This kind of ap-
proach is particularly useful in the case of networks that consist of more than
one component, so that some pairs of vertices are not connected at all. If, for
instance, we have two separate components representing people in two differ-
ent companies, then there will be no paths of any length between individuals
in different companies, and hence a measure like (7.60) or (7.64) will never as-
sign a non-zero similarity to such individuals. If however, we explicitly insert
some similarities between members of the different companies, our measure
will then be able to generalize and extend those inputs to deduce similarities
between other members.
This idea of generalizing from a few given similarities arises in other con-
texts too. For example, in the fields of machine learning and information re-
trieval there is a considerable literature on how to generalize known similar-
ities between a subset of the objects in a collection of, say, text documents to
the rest of the collection, based on network data or other information.
34. It is interesting to note that when we expand this measure in powers of the adjacency matrix,
as we did in Eq. (7.63), the second-order (i.e., path-length two) term is the same as the structural
equivalence measure of Eq. (7.53), which perhaps lends further credence to both expressions as
natural measures of similarity.
35. The study used a "name generator"—students were asked to list the names of others they considered to be their friends. This results in a directed network, but we have neglected the edge directions in the figure. In our representation there is an undirected edge between vertices i and j if either of the pair considers the other to be their friend (or both).
7.13 Homophily and Assortative Mixing

[Figure 7.10: Friendship network at a US high school. The vertices in this network represent 470 students at a US high school (ages 14 to 18 years). The vertices are color coded by race (black, white, or other) as indicated in the key. Data from the National Longitudinal Study of Adolescent Health [34, 314].]

Figure 7.10 shows a friendship network of students at a US high school,35 and one notices immediately that the picture appears to divide the network into two groups. It turns out that this division is principally along
lines of race. The different shades of the vertices in the picture correspond to
students of different race as denoted in the legend, and reveal that the school is
sharply divided between a group composed principally of black children and
a group composed principally of white.
This is not news to sociologists, who have long observed and discussed
such divisions [225]. Nor is the effect specific to race. People are found to
form friendships, acquaintances, business relations, and many other types of
tie based on all sorts of characteristics, including age, nationality, language, in-
come, educational level, and many others. Almost any social parameter you
can imagine plays into people’s selection of their friends. People have, it ap-
pears, a strong tendency to associate with others whom they perceive as being
similar to themselves in some way. This tendency is called homophily or assor-
tative mixing.
More rarely, one also encounters disassortative mixing, the tendency for peo-
ple to associate with others who are unlike them. Probably the most widespread
and familiar example of disassortative mixing is mixing by gender in sexual
contact networks. The majority of sexual partnerships are between individu-
als of opposite sex, so they represent connections between people who differ
in their gender. Of course, same-sex partnerships do also occur, but they are a
much smaller fraction of the ties in the network.
Assortative (or disassortative) mixing is also seen in some nonsocial net-
works. Papers in a citation network, for instance, tend to cite other papers in
the same field more than they do papers in different fields. Web pages written
in a particular language tend to link to others in the same language.
In this section we look at how assortative mixing can be quantified. As-
sortative mixing by discrete characteristics such as race, gender, or nationality
is fundamentally different from mixing by a scalar characteristic like age or
income, so we treat the two cases separately.
36. Ignoring, for the purposes of argument, dogs, cats, imaginary friends, and so forth.

7.13.1 Assortative mixing by enumerative characteristics

To quantify assortative mixing we first find the fraction of edges that run between vertices of the same type, and then we subtract from that fig-
ure the fraction of such edges we would expect to find if edges were positioned
at random without regard for vertex type. For the trivial case in which all ver-
tices are of a single type, for instance, 100% of edges run between vertices of
the same type, but this is also the expected figure, since there is nowhere else
for the edges to fall. The difference of the two numbers is then zero, telling us
that there is no non-trivial assortativity in this case. Only when the fraction of
edges between vertices of the same type is significantly greater than we would
expect on the basis of chance will our measure give a positive score.
In mathematical terms, let us denote by ci the class or type of vertex i, which
is an integer 1 . . . nc , with nc being the total number of classes. Then the total
number of edges that run between vertices of the same type is

$$\sum_{\text{edges } (i,j)} \delta(c_i, c_j) = \frac{1}{2} \sum_{ij} A_{ij}\, \delta(c_i, c_j), \tag{7.66}$$

where δ(m, n) is the Kronecker delta and the factor of ½ accounts for the fact that every vertex pair i, j is counted twice in the second sum.
Calculating the expected number of edges between vertices if edges are
placed at random takes a little more work. Consider a particular edge attached
to vertex i, which has degree k i . There are by definition 2m ends of edges in
the entire network, where m is as usual the total number of edges, and the
chance that the other end of our particular edge is one of the k_j ends attached
to vertex j is thus k j /2m if connections are made purely at random.37 Counting
all k i edges attached to i, the total expected number of edges between vertices i
and j is then k i k j /2m, and the expected number of edges between all pairs of
vertices of the same type is
$$\frac{1}{2} \sum_{ij} \frac{k_i k_j}{2m}\, \delta(c_i, c_j), \tag{7.67}$$
37. Technically, we are making connections at random while preserving the vertex degrees. We
could in principle ignore vertex degrees and make connections truly at random, but in practice
this is found to give much poorer results.
where the sum runs over all vertex pairs. The difference between the actual and the expected number of same-type edges is then

$$\frac{1}{2} \sum_{ij} A_{ij}\, \delta(c_i, c_j) - \frac{1}{2} \sum_{ij} \frac{k_i k_j}{2m}\, \delta(c_i, c_j) = \frac{1}{2} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j). \tag{7.68}$$
Conventionally, one calculates not the number of such edges but the fraction,
which is given by this same expression divided by the number m of edges:
$$Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j). \tag{7.69}$$
This quantity Q is called the modularity [239,250] and is a measure of the extent
to which like is connected to like in a network. It is strictly less than 1, takes
positive values if there are more edges between vertices of the same type than
we would expect by chance, and negative ones if there are fewer.
For Fig. 7.10, for instance, where the types are the three ethnic classifica-
tions “black,” “white,” and “other,” we find a modularity value of Q = 0.305,
indicating (positive) assortative mixing by race in this particular network.38
Negative values of the modularity indicate disassortative mixing. We might
see a negative modularity, for example, in a network of sexual partnerships
where most partnerships were between individuals of opposite sex.
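Equation (7.69) can be transcribed directly into code; a sketch assuming numpy, an undirected adjacency matrix A, and an array c of vertex types:

    import numpy as np

    def modularity(A, c):
        """Modularity Q of Eq. (7.69) for vertex types given in the array c."""
        A = np.asarray(A, dtype=float)
        c = np.asarray(c)
        k = A.sum(axis=1)                         # degrees
        m = A.sum() / 2                           # number of edges
        same_type = c[:, None] == c[None, :]      # delta(c_i, c_j) for every pair
        B = A - np.outer(k, k) / (2 * m)          # modularity matrix, Eq. (7.70)
        return (B * same_type).sum() / (2 * m)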
The quantity
$$B_{ij} = A_{ij} - \frac{k_i k_j}{2m} \tag{7.70}$$
in Eq. (7.69) appears in a number of situations in the study of networks. We will
encounter it, for instance, in Section 11.8 when we study community detection
in networks. In some contexts it is useful to consider Bij to be an element of a
matrix B, which itself is called the modularity matrix.
The modularity, Eq. (7.69), is always less than 1 but in general it does not
achieve the value Q = 1 even for a perfectly mixed network, one in which
every vertex is connected only to others of the same type. Depending on the
sizes of the groups and the degrees of vertices, the maximum value of Q can
be considerably less than 1. This is in some ways unsatisfactory: how is one to
38. An alternative measure of assortativity has been proposed by Gupta et al. [152]. That measure
however gives equal weight to each group of vertices, rather than to each edge as the modularity
does. With this measure if one had a million vertices of each of two types, which mixed with
one another entirely randomly, and ten more vertices of a third type that connected only among
themselves, one would end up with a score of about 0.5 [239], which appears to imply strong
assortativity when in fact almost all of the network mixes randomly. For most purposes therefore,
the measure of Eq. (7.69) gives results more in line with our intuitions.
know when one has strong assortative mixing and when one doesn’t? To rec-
tify the problem, we can normalize Q by dividing by its value for the perfectly
mixed network. With perfect mixing all edges fall between vertices of the same
type and hence δ(ci , c j ) = 1 whenever Aij = 1. This means that the first term
in the sum in Eq. (7.69) sums to 2m and the modularity for the perfectly mixed
network is

$$Q_{\max} = \frac{1}{2m} \left( 2m - \sum_{ij} \frac{k_i k_j}{2m}\, \delta(c_i, c_j) \right), \tag{7.71}$$

and the normalized measure Q/Q_max then equals 1 for a perfectly mixed network.
For practical calculations it is often convenient to re-express the modularity in terms of the quantities

$$e_{rs} = \frac{1}{2m} \sum_{ij} A_{ij}\, \delta(c_i, r)\, \delta(c_j, s), \tag{7.73}$$
which is the fraction of edges that join vertices of type r to vertices of type s,
and
$$a_r = \frac{1}{2m} \sum_i k_i\, \delta(c_i, r), \tag{7.74}$$

which is the fraction of ends of edges attached to vertices of type r. In terms of these quantities the modularity can be written

$$Q = \sum_r \left( e_{rr} - a_r^2 \right). \tag{7.76}$$
This form can be useful, for instance, when we have network data in the form
of a list of edges and the types of the vertices at their ends, but no explicit data
on vertex degrees. In such a case ers and ar are relatively easy to calculate,
while Eq. (7.69) is quite awkward.
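For instance, starting from an edge list and a type for each vertex (the situation of Problem 7.8 below), e_rr and a_r can be accumulated in a single pass; a sketch using only the Python standard library:

    from collections import defaultdict

    def modularity_from_edges(edges, types):
        """Q = sum_r (e_rr - a_r^2), with edges a list of vertex pairs and
        types a dict mapping each vertex to its type."""
        m = len(edges)
        e = defaultdict(float)                   # e[r]: fraction of edges within type r
        a = defaultdict(float)                   # a[r]: fraction of edge ends at type r
        for u, v in edges:
            if types[u] == types[v]:
                e[types[u]] += 1 / m
            a[types[u]] += 0.5 / m               # each edge end contributes 1/(2m)
            a[types[v]] += 0.5 / m
        return sum(e[r] - a[r] ** 2 for r in a)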
39. Of course, one could make up some measure of national differences, based say on geographic
distance, but if the question we are asked is, “Are these two people of the same nationality?” then
under normal circumstances the only answers are “yes” and “no.” There is nothing in between.
40. In the US school system there are 12 grades of one year each and to begin grade g students
normally must be at least of age g + 5. Thus the 9th grade corresponds to children of age 14 and
15.
[Figure 7.11: Ages of pairs of friends in high school. In this scatter plot each dot corresponds to one of the edges in Fig. 7.10, and its position along the horizontal and vertical axes gives the ages of the two individuals at either end of that edge. The ages are measured in terms of the grades of the students, which run from 9 to 12. In fact, grades in the US school system don't correspond precisely to age since students can start or end their high-school careers early or late, and can repeat grades. (Each student is positioned at random within the interval representing their grade, so as to spread the points out on the plot. Note also that each friendship appears twice, above and below the diagonal.)]
As the figure shows, the dots are strongly concentrated along the diagonal, reflecting a preponderance of friendships between students in the same grade. There is also, in this case, a notable tendency for students to have more friends of a wider range of ages as their age increases, so there is a lower density of points in the top right box than in the lower left one.
One could make a crude measure of assortative mixing by scalar charac-
teristics by adapting the ideas of the previous section. One could group the
vertices into bins according to the characteristic of interest (say age) and then
treat the bins as separate “types” of vertex in the sense of Section 7.13.1. For in-
stance, we might group people by age in ranges of one year or ten years. This
however misses much of the point about scalar characteristics, since it con-
siders vertices falling in the same bin to be of identical types when they may
in fact be quite different from one another. A better approach is to work directly with the values x_i of the scalar characteristic of interest and to compute a covariance over the edges of the network, as follows. The mean value μ of x_i at the end of an edge is

$$\mu = \frac{\sum_{ij} A_{ij} x_i}{\sum_{ij} A_{ij}} = \frac{\sum_i k_i x_i}{\sum_i k_i} = \frac{1}{2m} \sum_i k_i x_i. \tag{7.77}$$
Note that this is not simply the mean value of xi averaged over all vertices. It
is an average over edges, and since a vertex with degree k i lies at the ends of k i
edges it appears k i times in the average (hence the factor of k i in the sum).
Then the covariance of x_i and x_j over edges is

$$\operatorname{cov}(x_i, x_j) = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) x_i x_j, \tag{7.78}$$

where we have made use of Eqs. (6.21) and (7.77). Note the strong similarity between this expression and Eq. (7.69) for the modularity—only the delta function δ(c_i, c_j) in (7.69) has changed, being replaced by x_i x_j.
The covariance will be positive if, on balance, values xi , x j at either end of
an edge tend to be both large or both small and negative if they tend to vary in
opposite directions. In other words, the covariance will be positive when we
have assortative mixing and negative for disassortative mixing.
Just as with the modularity measure of Section 7.13.1, it is sometimes con-
venient to normalize the covariance so that it takes the value 1 in a perfectly
mixed network—one in which all edges fall between vertices with precisely
equal values of xi (although in most cases such an occurrence would be ex-
tremely unlikely in practice). Putting x j = xi in Eq. (7.78) gives a perfect mix-
ing value of

$$\frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) x_i^2 = \frac{1}{2m} \sum_{ij} \left( k_i \delta_{ij} - \frac{k_i k_j}{2m} \right) x_i x_j, \tag{7.79}$$

and the assortativity coefficient r is then the covariance of Eq. (7.78) divided by this maximal value.
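Putting Eqs. (7.78) and (7.79) together gives a compact recipe for the normalized measure; a sketch assuming numpy, an undirected adjacency matrix A, and a vector x of scalar values such as ages:

    import numpy as np

    def scalar_assortativity(A, x):
        """Assortativity coefficient r: Eq. (7.78) divided by its maximum, Eq. (7.79)."""
        A = np.asarray(A, dtype=float)
        x = np.asarray(x, dtype=float)
        k = A.sum(axis=1)
        m = A.sum() / 2
        xx = np.outer(x, x)
        cov = ((A - np.outer(k, k) / (2 * m)) * xx).sum() / (2 * m)              # Eq. (7.78)
        maxval = ((np.diag(k) - np.outer(k, k) / (2 * m)) * xx).sum() / (2 * m)  # Eq. (7.79)
        return cov / maxval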
41. There could be non-linear correlations in such a network and we could still have r = 0; the
correlation coefficient detects only linear correlations. For instance, we could have vertices with
high and low values of xi connected predominantly to vertices with intermediate values. This is
neither assortative nor disassortative by the conventional definition and would give a small value
of r, but might nonetheless be of interest. Such non-linear correlations could be discovered by
examining a plot such as Fig. 7.11 or by using alternative measures of correlation such as informa-
tion theoretic measures. Thus it is perhaps wise not to rely solely on the value of r in investigating
assortative mixing.
[Figure 7.12: Assortative and disassortative networks. These two small networks are not real networks—they were computer generated to display the phenomenon of assortativity by degree. (a) A network that is assortative by degree, displaying the characteristic dense core of high-degree vertices surrounded by a periphery of lower-degree ones. (b) A disassortative network, displaying the star-like structures characteristic of this case. Figure from Newman and Girvan [249]. Copyright 2003 Springer-Verlag Berlin Heidelberg. Reproduced with kind permission of Springer Science and Business Media.]
calculations for other forms of assortative mixing). Once we know the adjacency
matrix (and hence the degrees) of all vertices we can calculate r. Perhaps for
this reason mixing by degree is one of the most frequently studied types of
assortative mixing.
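For mixing by degree the scalar measure applies with x_i = k_i, so the coefficient can be computed from the adjacency matrix alone; a self-contained sketch:

    import numpy as np

    def degree_assortativity(A):
        """Assortativity by degree: the scalar measure with x_i = k_i."""
        A = np.asarray(A, dtype=float)
        k = A.sum(axis=1)
        m = A.sum() / 2
        kk = np.outer(k, k)                      # x_i x_j with x_i = k_i
        cov = ((A - kk / (2 * m)) * kk).sum() / (2 * m)
        maxval = ((np.diag(k) - kk / (2 * m)) * kk).sum() / (2 * m)
        return cov / maxval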
Problems
7.1 Consider a k-regular undirected network (i.e., a network in which every vertex has
degree k).
7.2 Suppose a directed network takes the form of a tree with all edges pointing inward
towards a central vertex:
What is the PageRank centrality of the central vertex in terms of the single parameter α
appearing in the definition of PageRank and the geodesic distances di from each vertex i
to the central vertex?
7.3 Consider an undirected tree of n vertices. A particular edge in the tree joins ver-
tices 1 and 2 and divides the tree into two disjoint regions of n1 and n2 vertices as
sketched here:
[sketch: vertices 1 and 2 joined by an edge, with a region of n1 vertices attached to vertex 1 and a region of n2 vertices attached to vertex 2]
Show that the closeness centralities C1 and C2 of the two vertices, defined according to
Eq. (7.29), are related by
$$\frac{1}{C_1} + \frac{n_1}{n} = \frac{1}{C_2} + \frac{n_2}{n}.$$
7.4 Consider an undirected (connected) tree of n vertices. Suppose that a particular
vertex in the tree has degree k, so that its removal would divide the tree into k disjoint
regions, and suppose that the sizes of those regions are n1 . . . nk .
a) Show that the unnormalized betweenness centrality x of the vertex, as defined in
Eq. (7.36), is
$$x = n^2 - \sum_{m=1}^{k} n_m^2.$$
b) Hence, or otherwise, calculate the betweenness of the ith vertex from the end of a
“line graph” of n vertices, i.e., n vertices in a row like this:
[sketch: n vertices in a row]
7.6 Among all pairs of vertices in a directed network that are connected by an edge or
edges, suppose that half are connected in only one direction and the rest are connected
in both directions. What is the reciprocity of the network?
7.7 In this network + and − indicate pairs of people who like each other or don’t,
respectively:
7.8 In a survey of couples in the US city of San Francisco, Catania et al. [65] recorded,
among other things, the ethnicity of their interviewees and calculated the fraction of
couples whose members were from each possible pairing of ethnic groups. The frac-
tions were as follows:
                      Women
              Black    Hispanic   White    Other   |  Total
Men  Black    0.258    0.016      0.035    0.013   |  0.323
     Hispanic 0.012    0.157      0.058    0.019   |  0.247
     White    0.013    0.023      0.306    0.035   |  0.377
     Other    0.005    0.007      0.024    0.016   |  0.053
     Total    0.289    0.204      0.423    0.084   |
Assuming the couples interviewed to be a representative sample of the edges in the
undirected network of relationships for the community studied, and treating the ver-
tices as being of four types—black, Hispanic, white, and other—calculate the numbers
e_rr and a_r that appear in Eq. (7.76) for each type. Hence calculate the modularity of the
network with respect to ethnicity.