Social Networks
All content following this page was uploaded by Pietro Panzarasa on 25 February 2018.
journal homepage: www.elsevier.com/locate/socnet
Keywords: Clustering; Transitivity; Weighted networks

Abstract
In recent years, researchers have investigated a growing number of weighted networks where ties are differentiated according to their strength or capacity. Yet, most network measures do not take weights into consideration, and thus do not fully capture the richness of the information contained in the data. In this paper, we focus on a measure originally defined for unweighted networks: the global clustering coefficient. We propose a generalization of this coefficient that retains the information encoded in the weights of ties. We then undertake a comparative assessment by applying the standard and generalized coefficients to a number of network datasets.
© 2009 Elsevier B.V. All rights reserved.
T. Opsahl, P. Panzarasa / Social Networks 31 (2009) 155–163
∗ Corresponding author. Tel.: +44 20 7882 6984; fax: +44 20 7882 3615.
E-mail addresses: [email protected] (T. Opsahl), [email protected] (P. Panzarasa).
0378-8733/$ – see front matter © 2009 Elsevier B.V. All rights reserved.
doi:10.1016/j.socnet.2009.02.002

1. Introduction

While a substantial body of recent research has investigated the topological features of a variety of networks (Barabási et al., 2002; Ingram and Roberts, 2000; Kossinets and Watts, 2006; Uzzi and Spiro, 2005; Watts and Strogatz, 1998), relatively little work has moved beyond merely topological measures to explicitly take into account the heterogeneity of the ties (or edges) connecting nodes (or vertices) (Barrat et al., 2004). In a number of real-world networks, ties are associated with weights that differentiate them in terms of their strength, intensity or capacity (Barrat et al., 2004; Wasserman and Faust, 1994). On the one hand, Granovetter (1973) argued that the strength of social relationships in social networks is a function of their duration, emotional intensity, intimacy, and exchange of services. On the other, for non-social networks, weights often refer to the function performed by ties, e.g., the carbon flow (mg/m²/day) between species in food webs (Luczkowich et al., 2003; Nordlund, 2007), the number of synapses and gap junctions in neural networks (Watts and Strogatz, 1998), or the amount of traffic flowing along connections in transportation networks (Barrat et al., 2004). In order to fully capture the richness of the data, it is therefore crucial that the measures used to study a network incorporate the weights of the ties.

A measure that has long received much attention in both theoretical and empirical research is concerned with the degree to which nodes tend to cluster together. Evidence suggests that in most real-world networks, and especially social networks, nodes tend to cluster into densely connected groups (Feld, 1981; Friedkin, 1984; Holland and Leinhardt, 1970; Louch, 2000; Simmel, 1923; Snijders, 2001; Snijders et al., 2006; Watts and Strogatz, 1998). In particular, the problem of network clustering can be investigated from a two-fold perspective. On the one hand, it involves determining whether and to what extent clustering is a property of a network or, alternatively, whether nodes tend to be members of tightly knit groups (Luce and Perry, 1949). On the other, it is concerned with the identification of the groups of nodes into which a network can be partitioned. This can be achieved, for example, by applying algorithms for community detection that assess and compare densities within and between groups (Newman, 2006; Newman and Girvan, 2004; Rosvall and Bergstrom, 2008), or by using the image matrix in blockmodeling to group nodes with the same or similar patterns of ties and to uncover connections between groups of nodes (Doreian et al., 2005).

In this paper, we focus only on the problem of determining whether clustering is a property of a network. More specifically, to address this problem one may ask: if there are three nodes in a network, i, j, and k, and i is connected to both j and k, how likely is it that j and k are also connected with each other? In real-world networks, empirical studies have shown that this likelihood tends to be greater than the probability of a tie randomly established between two nodes (Barabási et al., 2002; Davis et al., 2003; Ebel et al., 2002; Holme et al., 2004; Ingram and Roberts, 2000; Newman, 2001; Uzzi and Spiro, 2005; Watts and Strogatz, 1998). For social networks, scholars have investigated the mechanisms that are responsible for the increase in the probability that two people will be connected if they share a common acquaintance (Holland and Leinhardt, 1971; Simmel, 1923; Snijders, 2001; Snijders et al., 2006). The nature of these mechanisms can be cognitive, as in the case of individuals’ desire to maintain balance among ties with others (Hallinan, 1974; Heider, 1946), social, as in the case of third-party referral (Davis, 1970), or can be explained in other ways, such as in terms of focus constraints (Feld, 1981; Kossinets and Watts, 2006; Louch, 2000; Monge et al., 1985) or the differing popularity among
individuals (Feld and Elmore, 1982a, b). While clustering is likely to result from a combination of all these mechanisms, network studies have offered no conclusive theoretical explanation of its causes, nor have they concentrated as much on its underpinning processes as on the measures to formally detect its presence in real-world networks (Levine and Kurzban, 2006).

Traditionally, the two main measures developed for testing the tendency of nodes to cluster together into tightly knit groups are the local clustering coefficient (Watts and Strogatz, 1998) and the global clustering coefficient (Feld, 1981; Karlberg, 1997, 1999; Louch, 2000; Newman, 2003). The local clustering coefficient is based on ego’s network density or local density (Scott, 2000; Wasserman and Faust, 1994). For node i, it is measured as the number of ties connecting i’s neighbors divided by the total number of possible ties between i’s neighbors. To create an overall local coefficient for the whole network, the individual fractions are averaged across all nodes.

Despite its ability to capture the degree of social embeddedness that characterizes the nodes of a network, the local clustering coefficient suffers from a number of limitations. First, in its original formulation, it does not take into consideration the weights of the ties in the network. As a result, the same value of the coefficient might be attributed to networks that share the same topology but differ in terms of how weights are distributed across ties and that, consequently, may be characterized by different likelihoods of befriending the friends of one’s friends. Second, the local clustering coefficient does not take into consideration the directionality of the ties connecting a node to its neighbors (Wasserman and Faust, 1994).¹ Recently, there have been a number of attempts to extend the local clustering coefficient to the case of weighted networks (Barrat et al., 2004; Lopez-Fernandez et al., 2004; Onnela et al., 2005; Zhang and Horvath, 2005). However, the issue of directionality still remains largely unresolved (Caldarelli, 2007), which makes the coefficient suitable primarily for undirected networks.

¹ Node i’s neighbor might be: (1) a node that has directed a tie toward i; (2) a node to which i has directed a tie; or (3) a node that has directed a tie toward i and, at the same time, has received a tie from i.

Moreover, the local clustering coefficient, even in its weighted version, is biased by correlations with nodes’ degrees: a node with more neighbors is likely to be embedded in relatively fewer closed triplets, and therefore to have a smaller local clustering than a node connected to fewer neighbors (Ravasz and Barabási, 2003; Ravasz et al., 2002). An additional bias might stem from degree–degree correlations. When nodes preferentially connect to others with similar degree, local clustering is positively correlated with nodes’ degree (Ravasz and Barabási, 2003; Ravasz et al., 2002; Soffer and Vázquez, 2005). Lack of comparability between values of clustering of nodes with different degrees thus makes the average value of local clustering sensitive to how degrees are distributed across the whole network.

Unlike the local clustering coefficient, the global clustering coefficient is based on transitivity, which is a measure used to detect the fraction of triplets that are closed in directed networks (Wasserman and Faust, 1994, p. 243). It is not an average of individual fractions calculated for each node and, as a result, it does not suffer from the same type of correlations with nodes’ degrees as the local coefficient. Despite its merits, however, in its original formulation the global coefficient applies only to networks where ties are unweighted. To address this limitation and make the coefficient suitable also for networks where ties are weighted, researchers have typically introduced an arbitrary cut-off level of the weight, and then dichotomized the network by removing ties with weights that are below the cut-off and setting the weights of the remaining ties equal to one (Doreian, 1969; Wasserman and Faust, 1994). The outcome of this procedure is a binary network consisting of ties that are either present (i.e., equal to 1) or absent (i.e., equal to 0) (Scott, 2000; Wasserman and Faust, 1994). For example, Doreian (1969) studied clustering in a weighted network by creating a series of binary networks from the original weighted network using different cut-offs. To address potential problems arising from the subjectivity inherent in the choice of the cut-off, a sensitivity analysis was conducted to assess the degree to which the value of clustering varies depending on the cut-off (Doreian, 1969). However, this analysis tells us little about the original weighted network, apart from the fact that the value of clustering changes at different levels of the cut-off.

In this paper, we focus on the global clustering coefficient, and propose a generalization that explicitly takes weights of ties into consideration and, for this reason, does not depend on a cut-off to dichotomize weighted networks. In what follows, we start by discussing the existing literature on the global clustering coefficient in undirected and unweighted networks. In Section 3, we propose our generalized measure of clustering. In Section 4, we turn our attention to directed networks, discuss the current literature on clustering in those networks, and extend our generalized measure of clustering to cover weighted and directed networks. In Section 5, we empirically test our proposed measure, and compare it with the standard one, by using a number of weighted network datasets. Finally, in Section 6 we summarize and discuss the main results.

2. Clustering coefficient

The global clustering coefficient is concerned with the density of triplets of nodes in a network. A triplet can be defined as three nodes that are connected by either two (open triplet) or three (closed triplet) ties. A triangle consists of three closed triplets, each centered on one node. The global clustering coefficient is defined as the number of closed triplets (or 3 × the number of triangles) over the total number of triplets (both open and closed). The first attempt to measure the coefficient was made by Luce and Perry (1949). For an undirected network, they showed that the total number of triplets could be found by summing the non-diagonal cells of a squared binary matrix, and the number of closed triplets by summing the diagonal of a cubed matrix (both sums count every triplet twice, so the factor of two cancels in the ratio). For clarity, we will refer to the global clustering coefficient as the standard clustering coefficient C:

$$C = \frac{3 \times \text{number of triangles}}{\text{number of triplets}} = \frac{\tau_\Delta}{\tau} \qquad (1)$$

where $\tau$ is the total number of triplets and $\tau_\Delta$ is the number of these triplets that are closed as a result of the addition of a third tie. The coefficient takes values between 0 and 1. In a completely connected network, $C = 1$ as all triplets are closed, whereas in a classical random network with a fixed average degree, $C \to 0$ as the network size grows. More specifically, in a classical random network, whether one pair of nodes is connected is, by definition, independent of whether any other pair is connected (Erdős and Rényi, 1959; Solomonoff and Rapoport, 1951). Therefore, C is approximately equal to the probability of a tie in these networks (Newman, 2003).

A major limitation of the clustering coefficient is that it cannot be applied to weighted networks. As a result, the same outcome might be attributed to networks that differ in terms of distribution of weights and that, for this reason, might be characterized by different likelihoods of one’s neighbors being connected with each other. This limitation could therefore bias the analysis of the network structure. In order to overcome this shortcoming, in the following section we will propose a generalization of the clustering coefficient that explicitly captures the richness of the weights attached to ties, while at the same time it produces the same results as the standard clustering coefficient when ties are unweighted.
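As a check on Eq. (1), the following minimal Python sketch computes C in the two ways described above: by counting triplets directly and by the matrix-power route of Luce and Perry (1949). It is not the authors’ R or Matlab program; the 0/1 adjacency-matrix input, the function names and the toy network are our own illustrative assumptions. For contrast, it also averages the local clustering coefficient discussed in the Introduction, assuming (one of several conventions) that nodes with fewer than two neighbors are left out of the average.

```python
from itertools import combinations


def neighbours(adj):
    """Neighbour sets of an undirected, unweighted 0/1 adjacency matrix."""
    n = len(adj)
    return [{j for j in range(n) if j != i and adj[i][j]} for i in range(n)]


def global_clustering(adj):
    """Eq. (1): closed triplets divided by all triplets, counted directly."""
    total = closed = 0
    for i, nbrs in enumerate(neighbours(adj)):
        # each unordered pair of i's neighbours is one triplet centred on i
        for j, k in combinations(sorted(nbrs), 2):
            total += 1
            closed += 1 if adj[j][k] else 0
    return closed / total if total else float("nan")


def matmul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def global_clustering_matrix(adj):
    """Eq. (1) from matrix powers: trace(A^3) / off-diagonal sum of A^2.

    Both sums count every (closed) triplet exactly twice, so the factor cancels.
    """
    a2 = matmul(adj, adj)
    a3 = matmul(a2, adj)
    n = len(adj)
    triplets_x2 = sum(a2[i][j] for i in range(n) for j in range(n) if i != j)
    closed_x2 = sum(a3[i][i] for i in range(n))
    return closed_x2 / triplets_x2 if triplets_x2 else float("nan")


def average_local_clustering(adj):
    """Mean over nodes of (ties among neighbours) / (possible ties among them).

    Assumption: nodes with fewer than two neighbours are skipped.
    """
    values = []
    for nbrs in neighbours(adj):
        pairs = list(combinations(sorted(nbrs), 2))
        if pairs:
            values.append(sum(adj[j][k] for j, k in pairs) / len(pairs))
    return sum(values) / len(values) if values else float("nan")


# Toy network: triangle 0-1-2 plus a pendant node 3 attached to node 0.
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
print(global_clustering(A), global_clustering_matrix(A))  # 0.6 0.6
print(average_local_clustering(A))  # roughly 0.78
```

On this toy network both routes give C = 0.6, while the average local coefficient is 7/9 (about 0.78): nodes 1 and 2, which have only two neighbors each, score 1, whereas the hub, node 0, scores 1/3. This is a small instance of the degree sensitivity of the local average noted above.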
Table 1
Methods for calculating the triplet value, $\omega$.
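The four methods of Table 1 are the arithmetic mean (am), geometric mean (gm), maximum (max) and minimum (min) of the weights of the two ties that form a triplet, abbreviated as in the column headings of Table 3. The Python sketch below assumes that Eq. (2), which is not reproduced in this excerpt, divides the total value of closed triplets by the total value of all triplets, and, following Barrat et al. (2004) as described in this section, ignores the weight of the closing tie. The matrix format, names and toy network are hypothetical.

```python
import math
from itertools import combinations

# Triplet value omega from the weights a, b of the two ties forming the triplet.
TRIPLET_VALUE = {
    "am": lambda a, b: (a + b) / 2,       # arithmetic mean
    "gm": lambda a, b: math.sqrt(a * b),  # geometric mean
    "max": lambda a, b: max(a, b),        # maximum
    "min": lambda a, b: min(a, b),        # minimum
}


def generalized_clustering(w, method="gm"):
    """C_omega for an undirected weighted matrix (w[i][j] > 0 marks a tie).

    A triplet centred on i with neighbours j and k gets the value
    omega(w[i][j], w[i][k]); it counts as closed when j and k are tied.
    The weight of the closing tie j-k itself is not used.
    """
    omega = TRIPLET_VALUE[method]
    n = len(w)
    total = closed = 0.0
    for i in range(n):
        nbrs = [j for j in range(n) if j != i and w[i][j] > 0]
        for j, k in combinations(nbrs, 2):
            value = omega(w[i][j], w[i][k])
            total += value
            if w[j][k] > 0:
                closed += value
    return closed / total if total else float("nan")


# Triangle 0-1-2 with strong ties (weight 2) and a weak pendant tie 0-3 (weight 1).
W = [[0, 2, 2, 1],
     [2, 0, 2, 0],
     [2, 2, 0, 0],
     [1, 0, 0, 0]]
binary = [[1 if x > 0 else 0 for x in row] for row in W]
for m in TRIPLET_VALUE:
    print(m, round(generalized_clustering(binary, m), 3),
          round(generalized_clustering(W, m), 3))
```

With all weights set to 1, every method returns the standard value 0.6, as the generalization requires. With the stronger ties inside the triangle, $C_\omega$ rises to about 0.67 (am), 0.68 (gm) and 0.75 (min); the maximum method stays at 0.6 here because it values each open triplet by its strong tie of weight 2, exactly like the closed ones.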
$$C_{GT0} = \frac{3 \times 1}{9} = 0.33 \qquad (3)$$
However, if, for example, the two sample networks represented social networks in which ties refer to friendship between individuals, we believe that it would not be accurate to claim that both these networks show the same tendency of one’s friends to be friends themselves. Being friends refers to a social relationship that can be assessed by using the same criteria (duration, emotional intensity, intimacy, and exchange of services) that Granovetter (1973) proposed for classifying tie weights. The generalized clustering coefficient helps highlight the difference between the two sample networks. More specifically, for networks a and b in Fig. 1, the generalized clustering coefficients obtained by using the geometric mean method (gm) for defining triplet values are, respectively:

$$C_{\omega,gm} = 0.44 \qquad (4a)$$

$$C_{\omega,gm} = 0.23 \qquad (4b)$$

The difference in values stems from the fact that the generalized clustering coefficient captures more information than $C_{GT0}$. In particular, the difference between (4a) and (4b) reflects the differences in tie weights in the two sample networks. If, for example, the tie weights in the sample networks were to represent duration, we might reasonably argue that the nodes in network a are investing more time in interactions with other nodes that are themselves connected than is the case with network b.

Following Barrat et al. (2004), in our proposed generalization of the clustering coefficient we do not take into account the weight of the closing tie of a triplet. This is because the aim of the clustering coefficient is to assess the likelihood of the occurrence of a tie that closes a triplet, and not the strength of this tie. A triplet must be created prior to the closing tie. In other words, as networks evolve over time through the creation and removal of ties, clustering occurs when a triplet exists and a newly created third tie closes the triplet. Nevertheless, when we observe a triangle in a cross-sectional network dataset, we do not know which of the three triplets that make up the triangle occurred in the first place. In effect, this means that the weight of the closing tie of a triplet is taken into account, since it is part of the values of the other two triplets in the triangle.

4. Directed networks

In directed networks, connections between nodes are described as ties that originate from one node and point toward another (Wasserman and Faust, 1994). The weight of a tie directed from node i to node j is expressed as $x_{ij}$. In a binary network, the weight of a present tie is set equal to 1, whereas the weight of an absent tie is 0. We define the triplet consisting of the two directed ties, $x_{ji}$ and $x_{ik}$, as $\tau_{ji,ik}$, and the value of this triplet as $\omega_{ji,ik}$.

The standard clustering coefficient as stated in Eq. (1) cannot be applied to directed data. A more refined measure to calculate closure in directed networks is called transitivity, T (for a review, see Wasserman and Faust, 1994, p. 243). Transitivity produces the same results as the standard clustering coefficient if applied to an undirected network (Feld, 1981; Newman, 2003). It also shares the same properties: $0 \le T \le 1$; in a completely connected network, $T = 1$; and in a classical random network, $T \to 0$ as the network size grows. T takes the direction of the ties between nodes into consideration by using a more sophisticated definition of a triplet. A triplet centered on node i must have one incoming and one outgoing tie, i.e., $x_{ki} = x_{ij} = 1$ or $x_{ji} = x_{ik} = 1$, as shown by the solid lines in Fig. 2. Wasserman and Faust (1994) termed triplets that do not fulfill this condition vacuous. These triplets are part of neither the numerator nor the denominator of the fraction in Eq. (1). More specifically, when we are dealing with directed data, there can be four basic configurations of a triplet around an individual node i: $\tau_{ij,ik}$, $\tau_{ij,ki}$, $\tau_{ji,ik}$, and $\tau_{ji,ki}$. The configurations $\tau_{ji,ki}$ and $\tau_{ij,ik}$ form, respectively, an in-star and an out-star, and are therefore vacuous and not part of the fraction in Eq. (1). Conversely, the configurations $\tau_{ij,ki}$ and $\tau_{ji,ik}$ are non-vacuous. These triplets can be either transitive or intransitive.

Fig. 2. Non-vacuous triplets centered around node i.

Triplets defined according to Wasserman and Faust (1994) form chains of nodes. These triplets have been termed 2-paths, as they form chains of two directed ties between three nodes (Luce and Perry, 1949). A triplet is transitive if a tie is present from the first node to the last node of the chain. For the two triplets shown in Fig. 2, transitivity would imply $x_{kj} = 1$ and $x_{jk} = 1$, respectively.

Transitivity suffers from the same limitation as the standard clustering coefficient in that it cannot be applied to networks where the ties are weighted. To overcome this shortcoming, here we also extend our proposed generalization to directed and weighted networks by using the same definition of a triplet, $\tau$, as in transitivity. The triplet value, $\omega$, is calculated by using the same methods as stated in Section 3. This generalization produces the same results as transitivity if applied to binary directed networks, and the same results as the standard clustering coefficient if applied to binary and undirected networks. Moreover, it still ranges between 0 and 1. In a completely connected network, we would still obtain $C_\omega = 1$, whereas in a classical random network, $C_\omega \to 0$ as the network size grows. In particular, once again we found that $C_\omega$ approximated the probability of a tie in a classical random network.⁴

⁴ These findings are based on ensembles of classical random networks with 50, 100, 200, 400, 800, and 1600 nodes and an average degree of 10. Each ensemble contains 1000 realizations. The four methods for defining triplet value were not significantly different.

To clarify which triplets are transitive and non-vacuous, Table 2 illustrates configurations of triplets centered on node i. The first four rows show the basic configurations mentioned above. The remaining rows show configurations of triplets where ties are reciprocated. In these cases, each additional tie doubles the number of triplets. Moreover, the table shows which triplets are transitive under different conditions, and which triplet values should be included in the fraction of Eq. (2).

5. Empirical test of the generalized clustering coefficient

We now test the proposed generalization of clustering on a number of network datasets. We also compare the generalized coefficient with the standard one measured with different cut-offs.⁵ Table 3 summarizes the empirical results.

⁵ For the standard clustering coefficient, the networks are dichotomized with different values X of the cut-off. More specifically, $C_{GTX}$ refers to Eq. (1) where ties with weights that are greater than X are set to present and ties with weights that are lower than, or equal to, X are removed. Unless otherwise specified, ties are set to present if their weights are greater than 0. Moreover, in our empirical analysis, we adopt the generalized coefficient $C_{\omega,gm}$ that uses the geometric mean method gm for defining triplet value $\omega$. A program to calculate the standard and generalized clustering coefficients using R or Matlab is available upon request from the authors.
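The dichotomization rule of footnote 5 (a tie is present if its weight is greater than X and removed otherwise) and the resulting sensitivity analysis of $C_{GTX}$ can be sketched in Python as follows. The default cut-offs mirror the column headings of Table 3; the toy network, the function names and the choice to return NaN when no triplets survive a cut-off are our own assumptions.

```python
import math
from itertools import combinations


def dichotomize(w, cutoff=0):
    """Footnote 5: a tie is present (1) only if its weight exceeds the cut-off X."""
    return [[1 if x > cutoff else 0 for x in row] for row in w]


def standard_clustering(adj):
    """Eq. (1) on a binary undirected matrix; NaN if no triplets exist."""
    n = len(adj)
    total = closed = 0
    for i in range(n):
        nbrs = [j for j in range(n) if j != i and adj[i][j]]
        for j, k in combinations(nbrs, 2):
            total += 1
            closed += adj[j][k]  # 1 if the closing tie is present
    return closed / total if total else math.nan


def sensitivity(w, cutoffs=(0, 2, 4, 6, 8, 10)):
    """C_GTX for each cut-off X, one value per Table 3 column."""
    return {x: standard_clustering(dichotomize(w, x)) for x in cutoffs}


# Strong triangle 0-1-2 (weight 5) plus weak pendant ties 0-3 and 1-4 (weight 1).
W = [[0, 5, 5, 1, 0],
     [5, 0, 5, 0, 1],
     [5, 5, 0, 0, 0],
     [1, 0, 0, 0, 0],
     [0, 1, 0, 0, 0]]
for x, c in sensitivity(W).items():
    print(f"C_GT{x} = {c:.3f}")
```

Here $C_{GTX}$ rises from 3/7 (about 0.43) at X = 0 to 1 at X = 2 and X = 4, once the weak pendant ties are dropped, and is undefined from X = 6 upward because no ties remain. The toy case reproduces, in miniature, two caveats raised in the discussion of the results: the standard coefficient can increase with the cut-off, and large cut-offs leave too few triplets for a reliable value.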
Table 2
Triplets ($\tau$) and triplet values ($\omega$) in a directed network ($i \neq j \neq k$).
Columns: if $w_{jk} = 0, w_{kj} = 0$ | if $w_{jk} > 0, w_{kj} = 0$ | if $w_{jk} = 0, w_{kj} > 0$ | if $w_{jk} > 0, w_{kj} > 0$
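A minimal Python sketch of the directed generalization in Section 4, assuming the same four triplet-value methods as in Section 3: each non-vacuous triplet is a 2-path j → i → k, its value is computed from $w_{ji}$ and $w_{ik}$, and it counts as transitive when a tie runs from j to k, so the weight of the closing tie is again ignored. Iterating over every in-neighbor and out-neighbor of each node picks up all non-vacuous triplets, including those created by reciprocated ties. Function names and test matrices are hypothetical.

```python
import math

TRIPLET_VALUE = {
    "am": lambda a, b: (a + b) / 2,       # arithmetic mean
    "gm": lambda a, b: math.sqrt(a * b),  # geometric mean
    "max": lambda a, b: max(a, b),        # maximum
    "min": lambda a, b: min(a, b),        # minimum
}


def directed_generalized_clustering(w, method="gm"):
    """C_omega for a directed weighted matrix; w[i][j] > 0 is a tie from i to j.

    Only non-vacuous triplets are used: 2-paths j -> i -> k (tau_{ji,ik}) with
    j != k. Such a triplet is transitive (closed) if a tie runs from j to k.
    In-stars and out-stars around i never enter the sums.
    """
    omega = TRIPLET_VALUE[method]
    n = len(w)
    total = closed = 0.0
    for i in range(n):
        ins = [j for j in range(n) if j != i and w[j][i] > 0]
        outs = [k for k in range(n) if k != i and w[i][k] > 0]
        for j in ins:
            for k in outs:
                if j == k:
                    continue  # j -> i -> j involves only two nodes
                value = omega(w[j][i], w[i][k])
                total += value
                if w[j][k] > 0:
                    closed += value
    return closed / total if total else math.nan


transitive = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]  # 0->1, 1->2, 0->2
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]       # 0->1, 1->2, 2->0
print(directed_generalized_clustering(transitive))  # 1.0
print(directed_generalized_clustering(cycle))       # 0.0
```

On binary directed data the sketch reduces to transitivity T: it returns 1 for the transitive triad and 0 for the cycle, in which no 2-path is closed in the required direction. On a symmetric matrix each undirected triplet is counted twice, once per direction, with the same value, so the result matches the undirected coefficient.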
Table 3
Comparison between the generalized and the standard clustering coefficients.
Columns: Network | $C_\omega$: $C_{\omega,am}$, $C_{\omega,gm}$, $C_{\omega,min}$, $C_{\omega,max}$ | $C$: $C_{GT0}$, $C_{GT2}$, $C_{GT4}$, $C_{GT6}$, $C_{GT8}$, $C_{GT10}$
Note: Because of the large range of tie weights in the US airport network, we have divided the tie weights by 100,000. This operation has no impact on the generalized coefficient, but it enables us to conduct the sensitivity analysis: in the original dataset, the minimum tie weight is 17, and therefore a sensitivity analysis with cut-off values lower than 17 would be meaningless.
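Two properties used when reading Table 3 can be checked numerically. First, all four triplet-value methods scale linearly with the weights, so dividing every weight by a constant (100,000 for the US airport network) multiplies the numerator and denominator of $C_\omega$ by the same factor and leaves it unchanged. Second, randomly reshuffling the weights among existing ties gives the benchmark against which $C_\omega$ is compared with $C_{GT0}$ in the discussion of the results. The Python sketch below shows only the geometric mean method; the toy network, number of reshuffles and random seed are arbitrary assumptions.

```python
import math
import random
from itertools import combinations


def c_omega_gm(w):
    """Undirected C_omega with the geometric-mean triplet value."""
    n = len(w)
    total = closed = 0.0
    for i in range(n):
        nbrs = [j for j in range(n) if j != i and w[i][j] > 0]
        for j, k in combinations(nbrs, 2):
            value = math.sqrt(w[i][j] * w[i][k])
            total += value
            if w[j][k] > 0:
                closed += value
    return closed / total if total else math.nan


def reshuffled_mean(w, trials=200, seed=1):
    """Average C_omega after randomly permuting the existing tie weights."""
    rng = random.Random(seed)
    n = len(w)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if w[i][j] > 0]
    weights = [w[i][j] for i, j in edges]
    results = []
    for _ in range(trials):
        rng.shuffle(weights)  # same ties, weights reassigned at random
        v = [[0.0] * n for _ in range(n)]
        for (i, j), x in zip(edges, weights):
            v[i][j] = v[j][i] = x  # keep the matrix symmetric
        results.append(c_omega_gm(v))
    return sum(results) / trials


# Strong triangle 0-1-2 plus weak pendant ties 0-3 and 1-4.
W = [[0, 500000, 500000, 100000, 0],
     [500000, 0, 500000, 0, 100000],
     [500000, 500000, 0, 0, 0],
     [100000, 0, 0, 0, 0],
     [0, 100000, 0, 0, 0]]
scaled = [[x / 100000 for x in row] for row in W]
print(c_omega_gm(W), c_omega_gm(scaled))  # equal up to rounding
print(reshuffled_mean(W))                  # lower than c_omega_gm(W)
```

In the toy network the strongest ties all sit in the triangle, so the observed $C_{\omega,gm}$ (about 0.63) exceeds the reshuffled average; with all weights equal, reshuffling changes nothing and the average equals $C_{GT0}$ = 3/7.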
The first dataset we consider is Freeman’s EIES networks (Freeman and Freeman, 1979), also used in Wasserman and Faust (1994). This dataset was collected in 1978 and contains three networks of researchers working on social network analysis. The first is an acquaintance network including 48 researchers, in which relationships were recorded at the beginning of the study (time 1). The second network is similar, but the data were recorded at the end of the study (time 2). The third is a frequency matrix of the number of messages sent among 32 of the researchers who used an electronic communication tool. In the two acquaintance networks, all relationships have a weight between 0 and 4: 4 represents a close personal friend of the researcher’s; 3 represents a friend; 2 represents a person the researcher has met; 1 represents a person the researcher has heard of, but not met; and 0 represents a person unknown to the researcher. In the frequency matrix, the average tie weight is 33.7 and the maximum weight is 559. The three networks are highly connected, with densities of 0.34, 0.40, and 0.46, respectively. They also exhibit a fairly strong tendency toward clustering: $C_{GT0}$ for the three networks is 0.7627, 0.8131, and 0.6386, respectively. When the proposed generalization of clustering is applied to the three networks, clustering increases. More specifically, $C_{\omega,gm}$ takes the values 0.7708, 0.8218, and 0.7332, respectively. Thus, for the acquaintance networks, clustering increases by 1.1%, whereas for the frequency matrix it shows a considerably larger increase of 14.8%.

Fig. 3 shows Freeman’s third EIES network, in which the size of a node is proportional to the number of messages sent by the researcher, and the width of a tie between two nodes corresponds to the number of messages exchanged between the two researchers. As shown in the figure, all researchers at the center are connected with one another, whereas this is not the case for researchers located in the outer ring. Moreover, the strongest ties in the network tend to connect the researchers in the center with one another and with nodes at the periphery. This implies that stronger ties are more likely to be part of triangles than weaker ties. For example, Nick Mullins is strongly connected to Sue Freeman and Barry Wellman, who are in turn connected with each other. By contrast, Phipps Arabie is weakly connected to Ev Rogers and Carol Barner-Barry, who are not connected with each other. This tendency of strongly connected researchers to establish a tie with the same third party is responsible for the increased value of clustering when measured with our generalized coefficient.

Fig. 3. Freeman’s third EIES network. The size of a node is proportional to the total number of messages sent by the corresponding scientist, and the width of a tie to the number of messages exchanged between the two connected scientists. The scientists who are part of the largest clique are placed in the inner circle.

The second dataset is a network created from an online community (Panzarasa et al., in press). This network dataset covers the period from April to October 2004. It includes 1899 nodes that represent students at the University of California, Irvine. During the observation period, students sent a total of 59,835 online messages. A directed tie is established from one student to another if one or more messages have been sent from the former to the latter. The weight of a tie is defined as the number of messages sent. The maximum and average tie weights are 98 and 2.95, respectively. This network exhibits a density of 0.0056 and an average degree of 10.69. In this network, we found $C_{GT0} = 0.0547$ and $C_{\omega,gm} = 0.0638$. Thus, when the generalized coefficient is applied, there is an increase in clustering of 16.8%.

The third dataset contains four organizational networks, two from a consulting company and two from a research team in a
manufacturing company (Cross and Parker, 2004).⁶ The consulting company had 46 employees, who are the nodes in the first two networks. The ties in the first network are differentiated in terms of frequency of information or advice requests, whereas the ties in the second network are differentiated in terms of the value placed on the information or advice received. In both these networks, ties are weighted on a scale from 0 to 5. The company had offices both in Europe and in the US. The US employees were divided into two tightly knit groups, whereas this did not occur with the European employees. The other two networks are concerned with a research team in a manufacturing company. The nodes in these networks are the 77 employees. The ties in the first network are differentiated in terms of advice, whereas in the second network they are differentiated in terms of the employees’ awareness of knowledge and skills. In both these networks, ties are weighted on a scale from 0 to 6. Moreover, for both networks, data collection took place after an organizational restructuring operation that combined four separate units in different European countries. The research team was partitioned into strong communities based on the employees’ previous geographical location (Cross and Parker, 2004, pp. 15–17). Thus, focus constraints might have been partly responsible for a high value of clustering (Feld, 1981). All four networks do in fact exhibit a high clustering coefficient: $C_{GT0}$ ranges between 0.6764 and 0.6932, and $C_{\omega,gm}$ between 0.6857 and 0.7209. The data thus exhibit an average increase in clustering of 3.2% when the generalized coefficient is applied.

⁶ We thank Andrew Parker at Stanford University for supplying this dataset.

The fourth dataset is a network of political support in the US Senate (101st Congress, 1989/1990; see Skvoretz, 2002).⁷ The network includes 102 nodes that represent senators. Ties among senators reflect co-sponsorship of bills. This network has a density of 0.58 and an average degree of 59. As the network is well connected, it is difficult to draw conclusions from $C_{GT0}$; we found $C_{GT0} = 0.7219$. Weights of ties represent the number of bills that the connected senators have co-sponsored. The average tie weight is 2.68 and the maximum weight is 29. The large difference between the mean and the maximum weights signals that many of the ties are relatively weak. This indicates that a cut-off higher than zero might be more appropriate for dichotomizing the network. In Table 3, we list $C_{GTX}$ calculated using different values X of the cut-off. When we applied the generalized coefficient, we found $C_{\omega,gm} = 0.7639$. This represents an increase in clustering of 5.8%. This increase is likely to be influenced by the fact that party membership and ideologies place a constraint on the existence and strength of ties among senators. In particular, senators belonging to different parties are likely to co-sponsor a limited number of bills, which inevitably affects the total value of closed triplets connecting senators from different parties.

⁷ We thank John Skvoretz for making this dataset available to us.

The fifth dataset is the neural network of the Caenorhabditis elegans worm. This network was studied in Watts and Strogatz (1998).⁸ The network contains 306 nodes that represent neurons. A tie joins two neurons if they are connected by either a synapse or a gap junction. The weight of a tie represents the number of these synapses and gap junctions. The average tie weight is 3.74, and the maximum tie weight is 70. The density is 0.0253 and the average degree is 7.7. We found $C_{GT0} = 0.1843$ and $C_{\omega,gm} = 0.2210$. Thus, the generalized coefficient is 19.9% higher than the standard one.

⁸ This dataset was obtained from the Collective Dynamics Group’s (Duncan Watts) website: http://smallworld.sociology.columbia.edu/cdg/datasets/.

The sixth dataset is the network of the 500 busiest commercial airports in the United States (Colizza et al., 2007; Opsahl et al., 2008).⁹ In this network, two airports are connected if a flight was scheduled between them in 2002. The weight of a tie between two airports corresponds to the number of seats available on the scheduled flights. Although air transportation networks are directed by nature, they are also highly symmetric (Barrat et al., 2004). Therefore, we analyze this network as an undirected one. On average, each airport is connected to 11.92 other airports (i.e., density is 0.0239). For the average route, 152,320 seats were scheduled. In this network, the standard and generalized clustering coefficients were well above the value expected in a classical random network of the same density (about 0.0239): $C_{GT0} = 0.3514$ and $C_{\omega,gm} = 0.5066$. The generalized coefficient is 44.16% larger than the standard one. This suggests that airports connected by busy routes tend to be part of transitive triplets.

⁹ We thank Vittoria Colizza for making this dataset available: http://cxnets.googlepages.com/usairtransportationnetwork.

A number of observations are now in order. First, for all the networks, the standard clustering coefficient, $C_{GTX}$, generally decreases as the value X of the cut-off increases. However, the rate of decrease differs considerably among the networks. Moreover, for each network, the rate of decrease varies between different values of the cut-off. Despite an average decreasing trend, we also found that, in certain networks, the clustering coefficient increases as the cut-off increases. In addition, the reliability of the results obtained with large cut-offs should be questioned, because only a few triplets and triangles remain in the networks when those cut-offs are used. Thus, these findings from a sensitivity analysis of the standard clustering coefficient do not lend themselves to unequivocal interpretation.

Second, there are variations in the values of the generalized clustering coefficient, $C_\omega$, when different methods for defining the triplet value $\omega$ are used. For most of the networks, the highest $C_\omega$ is obtained when the minimum method is used, whereas the lowest is obtained when the maximum method is used. Given two triplets with the same average weight, the minimum method assigns a lower value to the triplet with a higher dispersion of weights than to the triplet with a lower dispersion. The reverse is true for the maximum method. Since, for most networks, the minimum (maximum) method produces the highest (lowest) value of clustering, the triplets consisting of ties with a lower (higher) variation in weight are more likely to be closed (open) than the triplets with a higher (lower) variation. This means that triplets consisting of two ties with approximately the same weight are likely to be closed.¹⁰

¹⁰ This observation does not apply to three of our networks: Freeman’s frequency matrix, the online community, and C. elegans’ neural network. For these networks, the maximum method yields the highest $C_\omega$ and the minimum method the lowest. A possible reason for this is that these networks have a relatively high variation of tie weights. The fact that in these networks $C_{\omega,max}$ is higher than $C_{\omega,min}$ signals the tendency of triplets with large variation in weights to be closed. Moreover, variation in tie weights might translate into variation between triplet values, which makes clustering sensitive to individual triplets. For example, in a network with a single extremely strong triplet, the value of the generalized coefficient will depend heavily on whether or not this triplet is closed.

Third, for all networks, the generalized clustering coefficient is higher than the standard coefficient. When networks are dichotomized by setting ties with weights greater than 0 to present, the standard clustering coefficient can be used as a benchmark for the generalized one. As shown by simulations in Section 3, when the weights are reshuffled among the ties, $C_\omega \approx C_{GT0}$. Thus, by comparing $C_\omega$ with $C_{GT0}$, we can assess whether strong triplets are more likely to be closed than weak triplets. More specifically, if the generalized clustering coefficient is significantly higher than the standard clustering coefficient, strong triplets are more likely to be closed than weak ones, whereas if the reverse were the
case, weak triplets would be more likely to be closed than strong ones.

6. Conclusions and discussion

Relationships among unique people are unique. We live in an increasingly connected world with an increasing number of contacts to whom we relate in different ways, with different frequencies, and for different reasons. Each social relationship bears a special meaning to us, and it would be overly simplistic and grossly unfair to treat every contact in the same manner. Therefore, it is important to capture differences among relationships when mapping and studying social networks. In particular, social network measures should reflect the richness of the information that the weights of relationships convey. However, although there is a large number of network datasets where the weights of the relationships are recorded (see Section 5, but also Barrat et al., 2004; Ebel et al., 2002; Holme et al., 2004; Kossinets and Watts, 2006; Panzarasa et al., in press), only a limited number of measures take weights into account (among others, Barrat et al., 2004; Burt, 1992; Freeman et al., 1991; Nordlund, 2007; Opsahl et al., 2008; Yang and Knoke, 2001). As a result, most measures can only be calculated on network data that are binary.

Among the measures that suffer from this shortcoming is the clustering coefficient. In this paper, we focused on this measure and offered a generalization that takes the weight of ties explicitly into account by attaching a value to each triplet. The standard coefficient divides the number of closed triplets by the total number of triplets, whereas the generalized coefficient divides the total value of the closed triplets by the total value of all triplets. Consequently, the generalized clustering coefficient produces the same result as the standard coefficient when applied to a binary network, since every triplet then has the same value.

We measured and compared the standard and generalized clustering coefficients on a number of network datasets where the weights of ties are recorded. First, we found that the value of the standard coefficient generally decreases as the value of the cut-off increases. However, as the rate of decrease varies across datasets, it is difficult to interpret this result. Second, we found that the outcomes differed depending on the method used for defining the triplet value. The generalized coefficient based on the minimum method yielded the highest value in most networks, whereas the maximum method generally yielded the lowest. This suggests that similarity in tie weights within a triplet increases the chance that the triplet is closed. Third, we found that, in all social networks studied, the value of the generalized coefficient was greater than the value of the standard one. These findings thus support Granovetter's (1973) claim that in social networks strong ties are more likely than weak ones to be part of transitive triplets.

Being able to produce values of clustering that are positively affected by the tendency of strong ties to be part of transitive triplets is a distinct property of our method as well as an advantage over alternative methods for applying binary measures to weighted networks. For example, we adopted Ahnert et al.'s (2007) method for converting a weighted network into an ensemble of binary networks, and calculated the average standard clustering coefficient on these networks. Drawing on Freeman's third EIES network (see Fig. 3), we produced 1000 binary networks in which the probability of a tie was obtained by dividing its weight by the maximum weight in the network. The average standard clustering coefficient found on this ensemble was 0.1288. This value is not only much lower than the one found with our method (i.e., 0.7332), but also lower than the value obtained with $C_{\mathrm{GT}0}$ (i.e., 0.6386).¹¹ Thus, although Fig. 3 suggests that in Freeman's third EIES network strong ties tend to be part of transitive triplets, the results obtained by using Ahnert et al.'s method would in fact suggest the opposite.

Our generalized clustering coefficient is consistent with the local weighted clustering coefficient proposed by Barrat et al. (2004). For example, in the US airport network, both measures produce values that are higher than the values of the corresponding binary measures. However, the weighted local clustering coefficient is inevitably biased by the fact that it builds explicitly on the local binary coefficient. This is likely to constrain the measure in two ways. First, like the binary measure, the weighted one is not applicable to directed networks. Second, it still suffers from the negative correlation between the degree of nodes and their likelihood of being embedded in closed triplets. For example, in the US airport network, we found a correlation of −0.24 between node degree and weighted local clustering. Unlike our global measure, the weighted local clustering coefficient is therefore affected by the way degrees are distributed across the nodes in a network.

One of the advantages of the generalized clustering coefficient is also a limitation. Unlike the standard clustering coefficient, our measure does not require the ties of a weighted network to be dichotomized. This becomes an issue when all possible ties within a network are assigned a weight, even a very small one. In these circumstances, the network is fully connected, every triplet is closed, and the generalized clustering coefficient is 1. The standard clustering coefficient does not have this shortcoming, because ties with a weight at or below the cut-off are set to absent and, therefore, the network does not become fully connected. An example of a weighted, fully connected network is a network of cities in which the tie between every pair of cities is assigned a weight that reflects the distance between them. Here, all possible ties are present and assigned a weight. The standard clustering coefficient overcomes this issue by setting weak relations, i.e., those characterized by long distances, to absent. A possible solution when applying the generalized coefficient, which does not normally transform the data, is to carry out precisely this transformation and filter the data by setting weak relations, with distances larger than a fixed cut-off, to absent. However, the suitability and appropriateness of this solution depend on the data, the context in which the data were collected, and the research question.

More generally, researchers should operationalize variables with care when dealing with research questions concerned with tie weights. Marsden and Campbell (1984) conducted a comparative analysis of Granovetter's (1973) four criteria for defining tie weights. They found that emotional intensity was a better indicator of strength of friendship than the other three criteria. Researchers should choose the appropriate measures of tie strength depending on the nature of the nodes and ties and, more generally, on the context of the research setting. In addition, the scale of the weights should be carefully defined and consistent with the chosen criteria. For example, a typical network question often used in studies of advice networks is:

Please indicate how often you have turned to this person for information or advice on work-related topics in the past three months.

with the ordinal scale: 0 (Do not know this person); 1 (Never); 2 (Seldom); 3 (Sometimes); 4 (Often); 5 (Very Often).¹² In this case, answers are inevitably subject to the bias that comes from the different ways in which different people assess duration and define the meaning of the time-related scale. One way to overcome this problem is to transform the ordinal scale into a ratio scale that describes reality more consistently across people. For example, a more appropriate scale for the above network question could be:

0 (Never); 1 (Once); 3 (Monthly); 6 (Fortnightly); 12 (Weekly).

Here each value approximates the number of times advice was sought over the three-month period. Compared with the former, this scale is likely to yield a network dataset that is richer in information, more robust against potential inaccuracies emanating from subjective judgments, and more suitable for investigations that rely on generalized measures, such as our proposed clustering coefficient.

¹¹ This might be due to the fact that the average density of the binary networks tends to be much smaller than the density of the weighted network.
¹² Cross and Parker (2004) used this question to create the advice network in the consulting company used in Section 5.

Acknowledgements

We wish to give very special thanks to Filip Agneessens, Stephen Borgatti, Carter Butts, and Tom Snijders for their valuable feedback on earlier versions of this paper. We are also grateful to participants of the 3rd Conference on Applications of Social Network Analysis, the 27th Sunbelt Social Networks Conference, and NetSci 2008 for their helpful comments and suggestions.

References

Ahnert, S.E., Garlaschelli, D., Fink, T.M., Caldarelli, G., 2007. An ensemble approach to the analysis of weighted networks. Physical Review E 76, 016101.
Barabási, A.-L., Jeong, H., Néda, Z., Ravasz, E., Schubert, A., Vicsek, T., 2002. Evolution of the social network of scientific collaborations. Physica A 311, 590–614.
Barrat, A., Barthélémy, M., Pastor-Satorras, R., Vespignani, A., 2004. The architecture of complex weighted networks. Proceedings of the National Academy of Sciences of the United States of America 101 (11), 3747–3752.
Burt, R.S., 1992. Structural Holes. Harvard University Press, Cambridge, MA.
Caldarelli, G., 2007. Scale-Free Networks: Complex Webs in Nature and Technology. Oxford University Press, Oxford.
Colizza, V., Pastor-Satorras, R., Vespignani, A., 2007. Reaction-diffusion processes and metapopulation models in heterogeneous networks. Nature Physics 3, 276–282.
Cross, R., Parker, A., 2004. The Hidden Power of Social Networks. Harvard Business School Press, Boston, MA.
Davis, J.A., 1970. Clustering and hierarchy in interpersonal relations: testing two graph theoretical models on 742 sociomatrices. American Sociological Review 35 (5), 843–851.
Davis, G.F., Yoo, M., Baker, W.E., 2003. The small world of the American corporate elite, 1982–2001. Strategic Organization 1 (3), 301–326.
Doreian, P., 1969. A note on the detection of cliques in valued graphs. Sociometry 32 (2), 237–242.
Doreian, P., Batagelj, V., Ferligoj, A., 2005. Generalized Blockmodeling. Cambridge University Press, New York, NY.
Ebel, H., Mielsch, L.-I., Bornholdt, S., 2002. Scale-free topology of e-mail networks. Physical Review E 66 (3), 035103.
Erdős, P., Rényi, A., 1959. On random graphs. Publicationes Mathematicae 6, 290–297.
Feld, S.L., 1981. The focused organization of social ties. American Journal of Sociology 86 (5), 1015–1035.
Feld, S.L., Elmore, R.G., 1982a. Patterns of sociometric choices: transitivity reconsidered. Social Psychology Quarterly 45, 77–85.
Feld, S.L., Elmore, R.G., 1982b. Processes underlying patterns of sociometric choice: response to Hallinan. Social Psychology Quarterly 45, 90–92.
Freeman, L.C., Borgatti, S.P., White, D.R., 1991. Centrality in valued graphs: a measure of betweenness based on network flow. Social Networks 13 (2), 141–154.
Freeman, S.C., Freeman, L.C., 1979. The networkers network: a study of the impact of a new communications medium on sociometric structure. Social Science Research Reports 46. University of California, Irvine, CA.
Friedkin, N.E., 1984. Structural cohesion and equivalence explanations of social homogeneity. Sociological Methods and Research 12, 235–261.
Granovetter, M., 1973. The strength of weak ties. American Journal of Sociology 78 (6), 1360–1380.
Hallinan, M.T., 1974. A structural model of sentiment relations. American Journal of Sociology 80, 364–378.
Heider, F., 1946. Attitudes and cognitive organization. Journal of Psychology 21, 107–112.
Holland, P.W., Leinhardt, S., 1970. A method for detecting structure in sociometric data. American Journal of Sociology 76, 492–513.
Holland, P.W., Leinhardt, S., 1971. Transitivity in structural models of small groups. Comparative Group Studies 2, 107–124.
Holme, P., Edling, C.R., Liljeros, F., 2004. Structure and time evolution of an Internet dating community. Social Networks 26 (2), 155–174.
Ingram, P., Roberts, P.W., 2000. Friendships among competitors in the Sydney hotel industry. American Journal of Sociology 106 (2), 387–423.
Karlberg, M., 1997. Testing transitivity in graphs. Social Networks 19 (4), 325–343.
Karlberg, M., 1999. Testing transitivity in digraphs. Sociological Methodology 29, 225–251.
Kossinets, G., Watts, D.J., 2006. Empirical analysis of an evolving social network. Science 311, 88–90.
Levine, S.S., Kurzban, R., 2006. Explaining clustering in social networks: toward an evolutionary theory of cascading benefits. Managerial and Decision Economics 27, 173–187.
Lopez-Fernandez, L., Robles, G., Gonzalez-Barahona, J.M., 2004. Applying social network analysis to the information in CVS repositories. In: Proceedings of the 1st International Workshop on Mining Software Repositories (MSR2004), pp. 101–105.
Louch, H., 2000. Personal network integration: transitivity and homophily in strong-tie relations. Social Networks 22 (1), 45–64.
Luce, R.D., Perry, A.D., 1949. A method of matrix analysis of group structure. Psychometrika 14 (1), 95–116.
Luczkowich, J.J., Borgatti, S.P., Johnson, J.C., Everett, M.G., 2003. Defining and measuring trophic role similarity in food webs using regular equivalence. Journal of Theoretical Biology 220, 303–321.
Marsden, P.V., Campbell, K.E., 1984. Measuring tie strength. Social Forces 63 (2), 482–501.
Monge, P., Rothman, L., Eisenberg, E., Miller, K., Kirste, K., 1985. The dynamics of organizational proximity. Management Science 31, 1129–1141.
Newman, M.E.J., 2001. Scientific collaboration networks: II. Shortest paths, weighted networks, and centrality. Physical Review E 64 (1), 016132.
Newman, M.E.J., 2003. The structure and function of complex networks. SIAM Review 45, 167–256.
Newman, M.E.J., 2006. Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America 103 (23), 8577–8582.
Newman, M.E.J., Girvan, M., 2004. Finding and evaluating community structure in networks. Physical Review E 69, 026113.
Nordlund, C., 2007. Identifying regular blocks in valued networks: a heuristic applied to the St. Marks carbon flow data, and international trade in cereal products. Social Networks 29 (1), 59–69.
Onnela, J.-P., Saramäki, J., Kertész, J., Kaski, K., 2005. Intensity and coherence of motifs in weighted complex networks. Physical Review E 71 (6), 065103.
Opsahl, T., Colizza, V., Panzarasa, P., Ramasco, J.J., 2008. Prominence and control: the weighted rich-club effect. Physical Review Letters 101, 168702.
Panzarasa, P., Opsahl, T., Carley, K.M., in press. Patterns and dynamics of users' behavior and interaction: network analysis of an online community. Journal of the American Society for Information Science and Technology 60 (4), doi:10.1002/asi.21015.
Ravasz, E., Barabási, A.-L., 2003. Hierarchical organization in complex networks. Physical Review E 67 (2), 026112.
Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N., Barabási, A.-L., 2002. Hierarchical organization of modularity in metabolic networks. Science 297 (5586), 1551–1555.
Rosvall, M., Bergstrom, C.T., 2008. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences of the United States of America 105, 1118–1123.
Scott, J., 2000. Social Network Analysis: A Handbook. Sage Publications, London, UK.
Simmel, G., 1923. The Sociology of Georg Simmel (KH Wolff, trans.). Free Press, New York, NY.
Skvoretz, J., 2002. Complexity theory and models for social networks. Complexity 8, 47–55.
Snijders, T.A.B., 2001. The statistical evaluation of social network dynamics. Sociological Methodology 31, 361–395.
Snijders, T.A.B., Pattison, P., Robins, G., Handcock, M.S., 2006. New specifications for exponential random graph models. Sociological Methodology 36 (1), 99–153.
Soffer, S.N., Vázquez, A., 2005. Network clustering coefficient without degree-correlation biases. Physical Review E 71, 057101.
Solomonoff, R., Rapoport, A., 1951. Connectivity of random nets. Bulletin of Mathematical Biophysics 13, 107–117.
Uzzi, B., Spiro, J., 2005. Collaboration and creativity: the small world problem. American Journal of Sociology 111 (2), 447–504.
Wasserman, S., Faust, K., 1994. Social Network Analysis: Methods and Applications. Cambridge University Press, New York, NY.
Watts, D.J., Strogatz, S.H., 1998. Collective dynamics of small-world networks. Nature 393, 440–442.
Yang, S., Knoke, D., 2001. Optimal connections: strength and distance in valued graphs. Social Networks 23 (4), 285–295.
Zhang, B., Horvath, S., 2005. A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology 4 (1), 17.
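For reference, the two global coefficients compared in the conclusions can be restated compactly. The notation below is our own shorthand and may differ from the symbols used in the earlier sections. $\tau$ ranges over all triplets, $\tau_\Delta$ over the closed ones, and $\omega$ is the triplet value computed from the two tie weights $w_1, w_2$ that form the triplet. The last line simply re-checks the C. elegans arithmetic reported in Section 5.

```latex
C = \frac{\#\,\text{closed triplets}}{\#\,\text{triplets}}, \qquad
C_\omega = \frac{\sum_{\tau_\Delta} \omega}{\sum_{\tau} \omega}

\omega \in \Big\{ \tfrac{w_1 + w_2}{2},\; \sqrt{w_1 w_2},\; \max(w_1, w_2),\; \min(w_1, w_2) \Big\}

\text{binary network: } w_1 = w_2 = 1 \;\Rightarrow\; \omega = 1 \;\Rightarrow\; C_\omega = \frac{|\tau_\Delta|}{|\tau|} = C

\text{C. elegans: } \frac{C_{\omega,\mathrm{gm}}}{C_{\mathrm{GT}0}} = \frac{0.2210}{0.1843} \approx 1.199 \quad (\approx 19.9\%\ \text{higher})
```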
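The comparison between the standard coefficient at a cut-off and the generalized coefficient, discussed in the results and conclusions above, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The edge-dictionary representation and all function names (`generalized_clustering`, `standard_clustering`, `reshuffled_clustering`) are hypothetical. The four triplet-value methods are the arithmetic mean, geometric mean, maximum and minimum of the two tie weights. Following the text, a tie counts as present in the dichotomized network when its weight is greater than the cut-off. The toy network is invented for illustration and is not one of the paper's datasets.

```python
import random
from itertools import combinations
from math import sqrt

# Hypothetical representation (not taken from the paper): an undirected
# weighted network stored as {(u, v): weight}, with positive weights,
# no self-loops, and no duplicate (v, u) entries.

# Triplet value computed from the two tie weights that meet at the
# triplet's centre node.
TRIPLET_VALUE = {
    "am": lambda a, b: (a + b) / 2,  # arithmetic mean
    "gm": lambda a, b: sqrt(a * b),  # geometric mean
    "max": max,                      # maximum
    "min": min,                      # minimum
}


def neighbours(weights):
    """Adjacency map: node -> {neighbour: tie weight}."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def generalized_clustering(weights, method="gm"):
    """Total value of closed triplets divided by total value of all triplets."""
    value = TRIPLET_VALUE[method]
    adj = neighbours(weights)
    closed = total = 0.0
    for nbrs in adj.values():
        # Every unordered pair of neighbours of a centre node is one triplet.
        for j, h in combinations(nbrs, 2):
            t = value(nbrs[j], nbrs[h])
            total += t
            if h in adj[j]:  # closed: the two end nodes are also tied
                closed += t
    return closed / total if total else 0.0


def standard_clustering(weights, cutoff=0):
    """Dichotomize (ties with weight > cutoff are present), then count triplets."""
    binary = {tie: 1 for tie, w in weights.items() if w > cutoff}
    # With all weights equal to 1, every triplet has value 1 under any
    # method, so the ratio reduces to closed triplets / all triplets.
    return generalized_clustering(binary, "am")


def reshuffled_clustering(weights, method="gm", runs=1000, seed=1):
    """Mean generalized coefficient after randomly permuting the weights
    among the existing ties (the topology is left unchanged)."""
    rng = random.Random(seed)
    ties, ws = list(weights), list(weights.values())
    results = []
    for _ in range(runs):
        rng.shuffle(ws)
        results.append(generalized_clustering(dict(zip(ties, ws)), method))
    return sum(results) / runs


if __name__ == "__main__":
    # Toy network: a strong triangle a-b-c plus a weak path c-d-e.
    net = {("a", "b"): 4, ("b", "c"): 4, ("c", "a"): 4,
           ("c", "d"): 1, ("d", "e"): 1}
    print(standard_clustering(net))            # 3 of 6 triplets closed: 0.5
    print(standard_clustering(net, cutoff=1))  # only the triangle remains: 1.0
    for m in ("min", "gm", "am", "max"):
        print(m, round(generalized_clustering(net, m), 4))
    # min 0.8, gm 0.7059, am 0.6667, max 0.5714
    print(reshuffled_clustering(net))
```

In this toy network the strongest ties form the only triangle. As a result, every generalized value exceeds the standard value of 0.5, and the minimum method gives the highest value. On a network this small, the reshuffled average is only a rough benchmark. Applied to a fully connected weighted network, such as the city-distance example, `generalized_clustering` returns 1 whatever the weights, which is the limitation discussed above. Removing weak ties before calling it is one way around this.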
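The conclusions contrast the global generalized coefficient with the local weighted coefficient of Barrat et al. (2004) and mention its correlation with node degree. The sketch below is based on our reading of that local measure, $C_i^w = \frac{1}{s_i(k_i-1)}\sum_{j,h}\frac{w_{ij}+w_{ih}}{2}a_{ij}a_{ih}a_{jh}$, where $s_i$ is node strength and the sum runs over ordered neighbour pairs. It also includes a plain Pearson correlation. The function names are hypothetical. Leaving nodes with fewer than two neighbours undefined (rather than setting them to 0) is our own assumption. The toy network is illustrative only.

```python
from itertools import combinations
from math import sqrt


def local_weighted_clustering(weights):
    """Local weighted clustering in the style of Barrat et al. (2004).

    Summing over unordered neighbour pairs {j, h} counts each pair once,
    so the factor 1/2 in the ordered-pair formula cancels:
        C_i = sum of (w_ij + w_ih) over tied pairs / (s_i * (k_i - 1)).
    Nodes with degree < 2 are returned as None (assumption).
    """
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    result = {}
    for i, nbrs in adj.items():
        k, s = len(nbrs), sum(nbrs.values())  # degree and strength of i
        if k < 2:
            result[i] = None
            continue
        acc = sum(nbrs[j] + nbrs[h]
                  for j, h in combinations(nbrs, 2) if h in adj[j])
        result[i] = acc / (s * (k - 1))
    return result


def pearson(xs, ys):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    net = {("a", "b"): 4, ("b", "c"): 4, ("c", "a"): 4,
           ("c", "d"): 1, ("d", "e"): 1}
    local = local_weighted_clustering(net)
    degree = {}
    for u, v in net:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    defined = [n for n, c in local.items() if c is not None]
    print(local)
    print(pearson([degree[n] for n in defined], [local[n] for n in defined]))
```

With all weights equal to 1, the formula reduces to the binary local coefficient. For node c in the toy network, this gives 1/3 instead of the weighted 4/9.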
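The ensemble procedure applied to Freeman's third EIES network in the conclusions can also be sketched. Each tie is kept independently with probability equal to its weight divided by the maximum weight, and the standard clustering coefficient is averaged over the sampled binary networks. This is a simplified reading of the procedure as described in the text, not Ahnert et al.'s (2007) code. The function names are hypothetical, and scoring a sample that contains no triplets as 0 is our own assumption.

```python
import random
from itertools import combinations


def binary_clustering(ties):
    """Standard global clustering: closed triplets / all triplets."""
    adj = {}
    for u, v in ties:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    closed = total = 0
    for nbrs in adj.values():
        for j, h in combinations(nbrs, 2):
            total += 1
            if h in adj[j]:
                closed += 1
    return closed / total if total else 0.0  # no triplets -> 0 (assumption)


def ensemble_clustering(weights, samples=1000, seed=0):
    """Average binary clustering over sampled networks in which each tie is
    kept independently with probability weight / maximum weight."""
    rng = random.Random(seed)
    w_max = max(weights.values())
    runs = []
    for _ in range(samples):
        kept = [tie for tie, w in weights.items() if rng.random() < w / w_max]
        runs.append(binary_clustering(kept))
    return sum(runs) / samples


if __name__ == "__main__":
    net = {("a", "b"): 4, ("b", "c"): 4, ("c", "a"): 4,
           ("c", "d"): 1, ("d", "e"): 1}
    print(binary_clustering(list(net)))  # all ties present
    print(ensemble_clustering(net))      # ties of weight 1 kept with p = 0.25
```

Ties with the maximum weight have probability 1 and therefore appear in every sample. When all weights are equal, every sample is the full network, and the ensemble average equals the standard coefficient.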