Face Long 14
Abstract
We study the social structure of Facebook friendship networks at one hundred
American colleges and universities at a single point in time, and we examine the
roles of user attributes (gender, class year, major, high school, and residence) at
these institutions. We investigate the influence of common attributes at the dyad
level in terms of assortativity coefficients and regression models. We then examine larger-scale groupings by detecting communities algorithmically and comparing
them to network partitions based on the user characteristics. We thereby compare
the relative importances of different characteristics at different institutions, finding
for example that common high school is more important to the social organization
of large institutions and that the importance of common major varies significantly
between institutions. Our calculations illustrate how microscopic and macroscopic
perspectives give complementary insights on the social organization at universities
and suggest future studies to investigate such phenomena further.
Preprint submitted to Social Networks
1. Introduction
Since their introduction, social networking sites (SNSs) such as Friendster, MySpace, Facebook, Orkut, LinkedIn, and myriad others have attracted hundreds of
millions of users, many of whom have integrated SNSs into their daily lives to communicate with friends, send e-mails, solicit opinions or votes, organize events, spread
ideas, find jobs, and more [Boyd and Ellison, 2007]. Facebook, an SNS launched
in February 2004, now overwhelms numerous aspects of everyday life, and it has
become an immensely popular societal obsession [Boyd, 2007b, Boyd and Ellison,
2007, Lewis et al., 2008b, Mayer and Puller, 2008]. Facebook members can create
self-descriptive profiles that include links to the profiles of their friends, who may
or may not be offline friends. Facebook requires that anybody who one wants to
add as a friend confirm the relationship, so Facebook friendships define a network
(graph) of reciprocated ties (undirected edges) that connect individual users.
The emergence of SNSs such as Facebook and MySpace has revolutionized the
availability of social and demographic data, which has in turn had a significant impact
on the study of social networks [Boyd and Ellison, 2007, Krebs, 2008, Lievrouw
and Livingstone, 2005]. It is possible to acquire very large data sets from SNSs,
though of course the population online and actively using SNSs is a biased sample
of the broader population. Services like Facebook also contain large quantities of
demographic data, as many users now voluntarily reveal voluminous amounts of
detailed personal information. An especially exciting aspect of studying SNSs is
that they provide an opportunity to examine social organization at unprecedented
levels of size and detail, and they also provide new venues to test sampling effects
[Kurant et al., 2011]. One can investigate the structure of an SNS like Facebook to
examine it as a network in its own right, and ideally one can also try to take one step
further and infer interesting insights regarding the offline social networks that an SNS
imperfectly parallels. Most people tend to draw their Facebook friends from their
real-life social networks [Boyd and Ellison, 2007], so it is not entirely unreasonable to
use Facebook networks as a proxy for an offline social network. (Of course, as noted
by Hogan [2009], one does need to be aware of significant limitations when taking
such a leap of faith.)
Social scientists, information scientists, and physical scientists have all jumped
on the SNS data bandwagon [Rosenbloom, 2007]. It would be impossible to exhaustively cite all of the research in this area, so we only highlight a few results; additional
references can be found in the review by Boyd and Ellison [2007]. Boyd [2007a] also
wrote a popular essay about her empirical study of Facebook and MySpace, concluding that Facebook tends to appeal to a more elite and educated cross section
than MySpace. The company RapLeaf [Sodera, 2008] has compiled global demographics on the age and gender usage of numerous SNSs. Other recent studies have
investigated the manifestation on SNSs of race and ethnicity [Gajjala, 2007], religion
[Nyland and Near, 2007], gender [Geidner et al., 2007, Hjorth and Kim, 2005], and
national identity [Fragoso, 2006]. Preliminary research has also suggested that online
friendship networks can be exploited to improve shopper recommendation systems
on websites such as Amazon [Zheng et al., 2007].
Several papers have attempted to increase understanding of how SNS friendships
form. For example, Kumar et al. [2006] examined preferential attachment models
(e.g., dormitory, House, fraternity, etc.) of the users. We examine homophily and
community structure (network partitions that are obtained algorithmically) for each
of the networks and compare the community structure to partitions based on the
given categorical data. We thereby compare and contrast the organizations of the
100 different Facebook networks, which arguably allows us to compare and contrast
the organizations of the underlying university social networks that they imperfectly
represent. In addition to the inherent interest of these Facebook networks, our investigation is important for subsequent use of these networks (which were formed
via ostensibly the same generative mechanism online) as benchmark examples for
numerous types of computations, such as new community detection methods.
The remainder of this paper is organized as follows. We first discuss the Facebook data and present the methods that we used for testing homophily at the dyad
level and demographic prevalences at the community level. We then present and
discuss results on the largest connected components of the networks, student-only
subnetworks, and single-gender subnetworks. Finally, we summarize and discuss our
findings.
2. Data
The data that we use was sent directly to us in anonymized form by Adam
D'Angelo of Facebook. It consists of the complete set of users (nodes) from the
Facebook networks at each of 100 American institutions (which we enumerate in
Table A.1) and all of the friendship links between those users' pages as they existed
in September 2005. The data clearly identifies most institutions, although there are a
small number of disambiguation problems. For instance, 4 different UC institutions
plus Cal are in the data, and there are 2 Texas listings. Each institution in the
data includes a number appearing as part of its name that appears to correspond to
the order in which each institution joined Facebook. The data can be downloaded
at http://people.maths.ox.ac.uk/porterm/data/facebook100.zip.
Similar snapshots of Facebook data from 10 Texas institutions were analyzed
recently by Mayer and Puller [2008], and a snapshot from a diverse private college in
the Northeast U.S. was studied by Lewis et al. [2008b]. Other studies of Facebook
have typically obtained data either through surveys [Boyd and Ellison, 2007] or
through various forms of automated sampling [Gjoka et al., 2010], thereby missing
nodes and links that can impact the resulting graph structures and analyses. We
consider only ties between people at the same institution, yielding 100 separate
realizations of university social networks and allowing us to compare the structures
at different institutions.
We consider four networks for each of the 100 Facebook data sets: the largest
connected component of the full network (which we hereafter identify as Full), the
largest connected component of the student-only network (Student), the largest
connected component of the female-only network (Female), and the largest connected component of the male-only network (Male). The Male and Female networks are each subsets of the Full network rather than the Student network. Each
network has a single type of unweighted, undirected connection between nodes and
can thus be represented as an adjacency matrix $A$ with elements $A_{ij} = A_{ji}$ indicating
the presence ($A_{ij} = 1$) or absence ($A_{ij} = 0$) of a tie between nodes $i$ and $j$. The
resulting tangle of nodes and links, which we illustrate for the Reed College student
Facebook network in Figure 1, can obfuscate any organizational structure that might
be present.
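As a rough illustration of how the four networks per institution could be constructed, the following minimal Python sketch extracts a largest connected component from an edge list and restricts a network to a subset of nodes. The function names and the edge-list input format are assumptions made for this example; they do not describe the format of the distributed data files.

```python
from collections import deque

def largest_connected_component(edges):
    """Return the set of nodes in the largest connected component."""
    # Undirected adjacency structure: each tie (i, j) implies A_ij = A_ji = 1.
    neighbors = {}
    for i, j in edges:
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)

    seen, best = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects one component at a time.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in neighbors[node]:
                if nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

def induced_subnetwork(edges, keep):
    """Keep only ties whose two endpoints both belong to the node set `keep`."""
    return [(i, j) for i, j in edges if i in keep and j in keep]
```

Under these assumptions, a Female network would be obtained by first calling `induced_subnetwork` with the set of female nodes and then taking the largest connected component of the result.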
The data also includes limited demographic (categorical) information that is volunteered by users on their individual pages: gender, class year, and (using anonymous
numerical identifiers) high school, major, and residence. We use a "Missing" label
for situations in which individuals did not volunteer a particular characteristic. The
different characteristics allow us to make comparisons between institutions, under
the assumption (see the discussion by Boyd and Ellison [2007]) that the communities and other elements of structural organization in Facebook networks reflect (even
if imperfectly) the social communities and organization of the offline networks on
which they are based. It is an important research issue to determine just how imperfect this might be [Hogan, 2009], but this is far beyond the scope of the present paper
(though we hope that others will take on this particular challenge). The conclusions
that we draw in this paper apply directly to the university Facebook networks from
September 2005, and we expect that they can provide insight about the real-world
social networks at the institutions as well.
3. Methods
We study each network at both the dyad level and the community level. We first
consider homophily [McPherson et al., 2001, Newman, 2010, Wasserman and Faust,
1994], quantified by assortativity coefficients using the available categorical data.
For some of the smaller networks, we additionally perform independent logistic regression on node pairs to obtain the log odds contributions to edge presence between
two nodes that have the same categorical-data value. We similarly fit exponential
random graph models (ERGMs) [Frank and Strauss, 1986, Handcock et al., 2008,
Lubbers and Snijders, 2007, Robins et al., 2007, Wasserman and Pattison, 1996]
with triangle terms to these smaller networks. Finally, we partition the networks by
algorithmically detecting communities [Fortunato, 2010, Porter et al., 2009], which
we compare to the given categorical data using the technique in this paper's prequel [Traud et al., 2010]. Calculating assortativity values and log odds contributions
allows us to examine microscopic features of the networks, while comparing algorithmic partitions of the networks to the categorical data allows us to examine their
macroscopic features. As we illustrate below, both perspectives are important
because they provide complementary insights.
3.1. Assortativity
A general measure of scalar assortativity r relative to a categorical variable is
given by Newman [2003, 2010]:
$$ r = \frac{\operatorname{tr}(e) - \|e^2\|}{1 - \|e^2\|} \in [-1, 1], \tag{1} $$
where $e = E/\|E\|$ is the normalized mixing matrix, the elements $E_{ij}$ indicate the
number of edges in the network that connect a node of type $i$ (e.g., a person with
a given major) to a node of type $j$, and the entry-wise matrix 1-norm $\|E\|$ is equal
to the sum of all entries of $E$. By construction, this formula yields $r = 0$ when the
amount of assortative mixing is the same as that expected independently at random
(i.e., $e_{ij}$ is simply the product of the fraction of nodes of type $i$ and the fraction of
nodes of type $j$), and it yields $r = 1$ when the mixing is perfectly assortative.
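To make the formula concrete, here is a minimal Python sketch of the assortativity calculation for one categorical variable. The function name and the inputs (an edge list and a dictionary of node labels) are assumptions made for this illustration, and the sketch does not include any special handling of missing labels.

```python
def assortativity(edges, label):
    """Assortativity r = (tr(e) - ||e^2||) / (1 - ||e^2||) for a categorical label."""
    # Assign a matrix index to each category that appears on an edge end.
    index = {}
    for u, v in edges:
        for node in (u, v):
            index.setdefault(label[node], len(index))
    size = len(index)

    # Mixing matrix E: each undirected edge contributes to E_ij and E_ji.
    E = [[0.0] * size for _ in range(size)]
    for u, v in edges:
        a, b = index[label[u]], index[label[v]]
        E[a][b] += 1.0
        E[b][a] += 1.0

    total = sum(sum(row) for row in E)            # entry-wise 1-norm ||E||
    e = [[x / total for x in row] for row in E]   # normalized mixing matrix

    trace = sum(e[k][k] for k in range(size))
    # The sum of all entries of the matrix product e*e equals
    # sum_k (column sum k) * (row sum k).
    row_sums = [sum(e[k]) for k in range(size)]
    col_sums = [sum(e[i][k] for i in range(size)) for k in range(size)]
    e2_norm = sum(c * r for c, r in zip(col_sums, row_sums))

    # Undefined (division by zero) if every edge end has the same category.
    return (trace - e2_norm) / (1.0 - e2_norm)
```

For two disjoint ties that each join nodes of the same category, this returns 1; if every tie instead joins nodes of different categories in a two-category network, it returns -1.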
the same high school. In all cases, we ignore possible contributions from missing
characteristic data: two nodes with the same missing data field are not treated as
having the same value for the characteristic. Rather than include gender explicitly
in the model, we instead additionally fit the model to the single-gender subnetworks
in order to be consistent with the treatment of gender in the community-level comparisons below. In the second model (an ERGM), we add a triangle statistic to
account for the observed amount of transitivity in the network data. This gives a
total of six coefficients: edges, common residence, common class year, common
major, common high school, and the triangle coefficient.
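The node-matching covariates of the dyad-independent model can be sketched as follows. This minimal Python example only builds the response and covariate rows for each pair of nodes; it does not fit the logistic regression or the ERGM, and it omits the triangle statistic. The attribute names in `ATTRIBUTES` and the use of `None` as the marker for missing data are assumptions made for this illustration.

```python
from itertools import combinations

# Hypothetical field names for the four node-matching covariates.
ATTRIBUTES = ("residence", "year", "major", "high_school")

def same_value(a, b):
    """Match indicator; two missing values (None) never count as a match."""
    return int(a is not None and a == b)

def dyad_rows(nodes, edges, attrs):
    """Yield (edge_present, covariates) for every unordered pair of nodes.

    covariates = [1, same residence, same year, same major, same high school]
    """
    ties = {frozenset(edge) for edge in edges}
    for i, j in combinations(nodes, 2):
        y = int(frozenset((i, j)) in ties)
        # The leading 1 is the intercept, i.e. the "edges" (density) term.
        x = [1] + [same_value(attrs[i][k], attrs[j][k]) for k in ATTRIBUTES]
        yield y, x
```

The resulting rows could be passed to any standard logistic regression routine; the fitted coefficients on the four indicators would then play the role of the log odds contributions discussed in the text.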
3.3. Community Detection
The global organization of social networks often includes coexisting modular (horizontal) and hierarchical (vertical) organizational structures, and myriad papers have
attempted to interpret such organization through the computational identification
of community structure. Communities are defined in terms of cohesive groups of
nodes with more internal connections (between nodes in the same group) than external connections (between nodes in the group and nodes in other groups). As
discussed at length in two recent review articles [Fortunato, 2010, Porter et al., 2009]
and in references therein, the ensemble of techniques available to detect communities
is both numerous and diverse. Existing techniques include hierarchical clustering
methods such as single linkage clustering, centrality-based methods, local methods,
optimization of quality functions such as modularity and similar quantities, spectral partitioning, likelihood-based methods, and more. Communities are considered
to not be merely structural modules but are also expected to have functional
importance because of the large number of common ties among nodes in a community.
For example, communities in social networks might correspond to circles of friends or
business associates and communities in the World Wide Web might encompass pages
on closely-related topics. In addition to remarkable successes on benchmark problems, investigations of community structure have observed correspondence between
communities and ground truth groups in diverse application areas, including the
reconstruction of college football conferences [Girvan and Newman, 2002] and the
investigation of such structures in algorithmic rankings [Callaghan et al., 2007]; the
investigation of committee assignments [Porter et al., 2005], legislation cosponsorship
[Zhang et al., 2008], and voting blocs [Mucha et al., 2010, Waugh et al., 2009] in the
United States Congress; the examination of functional groups in metabolic networks
[Guimerà and Amaral, 2005]; the study of ethnic preferences in school friendship
networks [Gonzalez et al., 2007]; and the study of social structures in mobile-phone
conversation networks [Onnela et al., 2007].
In the present paper, we investigate the community structures of the Facebook
networks from each of the 100 colleges and universities. (See the visualization of the
community structure for Reed College in Figure 2.) For each institution, we consider the Full, Student, Female, and Male networks. We seek to determine how well
the demographic labels included in the data correspond to algorithmically computed
communities. Assortativity provides a local measure of homophily, but that does
not provide sufficient information to draw conclusions about the global organization
of the Facebook networks. For example, two students who attended the same high
school are typically more likely to be friends with each other than are two students
who attended different high schools, but this will not necessarily have a meaningful
community-level effect unless enough of the students went to common high schools.
As we will see below, high school tends to be a much more dominant organizing
characteristic of the social structure at the large institutions than at small institutions, presumably because of a significant frequency of common high school pairs at
the large institutions.
We identify communities by optimizing the modularity quality function $Q = \sum_i (e_{ii} - b_i^2)$, where $e_{ij}$ denotes the fraction of ends of edges in group $i$ for which the
other end of the edge lies in group $j$ and $b_i = \sum_j e_{ij}$ is the fraction of all ends of
edges that lie in group $i$. High values of modularity correspond to community assignments with greater numbers of intra-community links than expected at random (with
respect to a particular null model [Fortunato, 2010, Newman, 2006a, Porter et al.,
2009]). Although numerous other community detection methods are also available,
modularity optimization is perhaps the most popular way to detect communities and
it has been successfully applied to many applications [Fortunato, 2010, Porter et al.,
2009]. One might also consider using a method that includes a resolution parameter
[Reichardt and Bornholdt, 2006] to avoid issues with resolution limits [Fortunato
and Barthelemy, 2007]. However, our primary focus is on global organization of the
networks, so we limit our attention to the default resolution of modularity. This
focus arguably biases our study of communities to the largest structures, such as
those influenced by common class year, making the observed correlations with other
demographic characteristics even more striking.
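The modularity of a given partition at the default resolution can be evaluated directly from its definition. The following minimal Python sketch does only that; it is not one of the optimization methods used in this paper. The function name and the input format (an edge list and a dictionary mapping nodes to community labels) are assumptions made for this illustration.

```python
def modularity(edges, community):
    """Q = sum_i (e_ii - b_i^2) for an undirected, unweighted network."""
    ends = 2.0 * len(edges)   # every edge has two ends
    internal = {}             # e_ii: fraction of edge ends whose edge stays inside group i
    share = {}                # b_i: fraction of all edge ends that lie in group i
    for u, v in edges:
        cu, cv = community[u], community[v]
        share[cu] = share.get(cu, 0.0) + 1.0 / ends
        share[cv] = share.get(cv, 0.0) + 1.0 / ends
        if cu == cv:
            # An intra-community edge contributes both of its ends to e_ii.
            internal[cu] = internal.get(cu, 0.0) + 2.0 / ends
    return sum(internal.get(c, 0.0) - b * b for c, b in share.items())
```

For two disjoint ties placed in separate communities, this gives Q = 0.5; placing all nodes in a single community gives Q = 0.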
To try to ensure that the communities we detect are properties of the data rather
than of the algorithms that we used, we optimize modularity (with default resolution) using 6 different combinations of spectral optimization, greedy optimization,
and Kernighan and Lin [1970] (KL) node-swapping steps (in the manner discussed
by Newman [2006b]). Specifically, we use (1) recursive partitioning by the leading
eigenvector of a modularity matrix [Newman, 2006a], (2) recursive partitioning by the
leading pair of eigenvectors (including the Richardson et al. [2009] extension of the
method in Newman [2006a]), (3) the Louvain greedy method [Blondel et al., 2008],
and each of these three supplemented with small increases in the quality Q that can
be obtained using KL node swaps. Each of these 6 methods yields a community
partition, and we obtain our comparisons (described in Section 3.4) by considering
each of these 6 partitions.
Modularity optimization is NP-hard [Brandes et al., 2008], so one must be cautious about the large number of degenerate partitions in the modularity landscape
[Good et al., 2010]. However, by detecting coarse observables (in particular, the
global organization of a Facebook network based on the given categorical data) and
considering results that are averaged over multiple optimization methods, one can
obtain interesting insights. The specific best partition will vary from one method
to another, but some of the predicted coarse organizational structure of the networks
(see below) is robust to the choice of community detection algorithm.
3.4. Comparing Communities to Node Data
Once we have detected communities for each institution, we will compare the
algorithmically-obtained community structure to the available categorical data for
the nodes. We recently developed a methodology to accomplish this goal in Traud
et al. [2010] (where we considered only 5 institutions among the 100 in order to
illustrate the techniques). This method of comparison can be applied to the output
of any hard partitioning algorithm in which each node is assigned to precisely one
community (cf. soft partitioning methods, in which communities can overlap). We
briefly review that methodology here.
To compare a network partition to the categorical demographic data, we standardize (using a z-score) the Rand coefficient of the communities in that partition
compared to partitioning based purely on each of the four categorical variables (one
at a time). For each comparison, we calculate the Rand z-score $z$ in terms of the
total number of pairs of nodes in the network $M$, the number of pairs that are in the
same community $M_1$, the number of pairs that have the same categorical value $M_2$,
and the number of pairs of nodes that are both in the same community and have
the same categorical value $w$ [Traud et al., 2010]. The Rand coefficient is given in
terms of these quantities by $S = [w + (M - M_1 - M_2 + w)]/M$ [Rand, 1971]. We then
calculate the z-score for the Rand coefficient as [Hubert, 1977, Traud et al., 2010]
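$$ z = \frac{1}{\sigma_w}\left(w - \frac{M_1 M_2}{M}\right), \tag{2} $$
where
$$ \sigma_w^2 = \frac{M}{16} - \frac{(4M_1 - 2M)^2 (4M_2 - 2M)^2}{256 M^2} + \frac{C_1 C_2}{16 n(n-1)(n-2)} + \frac{[(4M_1 - 2M)^2 - 4C_1 - 4M][(4M_2 - 2M)^2 - 4C_2 - 4M]}{64 n(n-1)(n-2)(n-3)}, \tag{3} $$
$n$ is the number of nodes in the network, the coefficients $C_1$ and $C_2$ are given by
$$ C_1 = n(n^2 - 3n - 2) - 8(n+1)M_1 + 4\sum_i n_{i\cdot}^3, \qquad C_2 = n(n^2 - 3n - 2) - 8(n+1)M_2 + 4\sum_j n_{\cdot j}^3, \tag{4} $$
$n_{ij}$ denotes an element of a contingency table and indicates the number of nodes that
are classified into the $i$th group of the first partition and the $j$th group of the second
partition, $n_{i\cdot} = \sum_j n_{ij}$ is a row sum, and $n_{\cdot j} = \sum_i n_{ij}$ is a column sum. Each z-score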
indicates the deviation from randomness in comparing the community structure with
the partitioning based purely on that single demographic characteristic. One needs
to be cautious when interpreting such deviations from randomness as a strength
of correlation. In particular, given the dependence on system size inherent in this
measure, one should not overinterpret the relative values of z-scores from different
institutions. Nevertheless, the z-scores provide a reasonable proxy quantity both for
the statistical significance of correlation and for the relative strength of correlation
in a specified network.
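The Rand coefficient and its z-score can be computed from two label lists as in the minimal Python sketch below, which follows equations (2) through (4). The function names and the list-based input format are assumptions made for this illustration, and the sketch assumes at least four nodes and two nontrivial partitions so that the variance is defined and positive.

```python
from collections import Counter
from math import sqrt

def pairs(k):
    """Number of unordered pairs among k items."""
    return k * (k - 1) // 2

def rand_zscore(part_a, part_b):
    """Return (S, z) for two hard partitions given as equal-length label lists."""
    n = len(part_a)                        # the variance formula needs n >= 4
    n_ij = Counter(zip(part_a, part_b))    # contingency table entries
    n_i = Counter(part_a)                  # row sums
    n_j = Counter(part_b)                  # column sums

    M = pairs(n)                                  # all pairs of nodes
    M1 = sum(pairs(c) for c in n_i.values())      # pairs together in partition 1
    M2 = sum(pairs(c) for c in n_j.values())      # pairs together in partition 2
    w = sum(pairs(c) for c in n_ij.values())      # pairs together in both

    S = (w + (M - M1 - M2 + w)) / M               # Rand coefficient

    base = n * (n ** 2 - 3 * n - 2)
    C1 = base - 8 * (n + 1) * M1 + 4 * sum(c ** 3 for c in n_i.values())
    C2 = base - 8 * (n + 1) * M2 + 4 * sum(c ** 3 for c in n_j.values())

    a1 = (4 * M1 - 2 * M) ** 2
    a2 = (4 * M2 - 2 * M) ** 2
    var = (M / 16
           - a1 * a2 / (256 * M ** 2)
           + C1 * C2 / (16 * n * (n - 1) * (n - 2))
           + (a1 - 4 * C1 - 4 * M) * (a2 - 4 * C2 - 4 * M)
           / (64 * n * (n - 1) * (n - 2) * (n - 3)))

    z = (w - M1 * M2 / M) / sqrt(var)
    return S, z
```

As a small check, comparing the partition of six nodes into two groups of three with itself gives $S = 1$ and $z = 3$, which agrees with enumerating all 20 ways of choosing the second partition: $w$ has mean 2.4 and variance 1.44.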
4. Results
We now use the methods outlined in the previous section to study the Facebook
networks. We first follow the order of presentation above and then make some
observations in combinations. Complete results are available in the tables in the
appendix.
4.1. Assortativity
We tabulate the assortativities based on gender, major, residence, class year, and
high school for all networks (and subsets thereof) in Table A.2.
For almost all of the institutions and each of the 4 network subsets, the class year
attribute produces higher assortativity values than the other available demographic
characteristics. However, Rice University (31), California Institute of Technology
(36), University of Georgia (50), University of Michigan (67), Auburn University
(71), and University of Oklahoma (97) are each examples in which residence provides the highest assortativity values (again, for each of the 4 network subsets). We
discussed Caltech as a focal example in Traud et al. [2010], in which we introduced
the community comparison methods that we employ below.
Other institutions have varying orderings of class year and residence assortativity
among the 4 network subsets. At MIT (8), USF (51), Notre Dame (57), University of
Maine (59), UC (61), UC (64), and MU (78), residence gives the highest assortativity
in the Male networks. The UCF (52) Female network has its highest assortativity
with residence. Both the Full network and the Male network for University of California at Santa Cruz (68) have their highest assortativity values with residence. Both
the Male and Female networks at University of Illinois at Urbana-Champaign (20),
Tulane (29), UC (33), Florida State University (53), Cal (65), University of Mississippi (66), University of Indiana (69), Texas (80), Texas (84), University of Wisconsin
(87), Baylor (93), University of Pennsylvania (94), and University of Tennessee (95)
have their highest assortativity values with residence; all other networks from these
institutions have their highest assortativity with class year.
Some outlying observations can be tied directly to small samples. For example,
Simmons (81) is a female-only college. It has only four males in the Full network;
none of the males had any connections with another male, so the gender assortativity
values for both the Full and Student components are very close to 0. Similar gender
numbers are also present in the data from Wellesley (22) and Smith (60).
4.2. Dyad-Level Regression and Exponential Random Graphs
We use the two statistical models described in Section 3.2 to study the 16 smallest
institutions. The (dyad-independent) logistic regression model includes contributions
from edges (network density) and matched user (node) characteristics for each of
four demographic variables. We present the results for this model in Table A.3. The
second model that we consider is an ERGM, which supplements the first model with
a structural triangle contribution. We present the results for this model in Table
A.4. These calculations give views of the networks at the microscopic (dyad-level)
scale that supplement the results that we obtained using the assortativity statistics.
We consider the results from the 16 smallest institutions by fitting the models to
each of their Full, Student, Female, and Male networks. Because all of the resulting model coefficients appear to be statistically significant at a p-value of less than
$10^{-4}$, we interpret the importance of node matching on the different demographic
characteristics directly from the magnitude of the corresponding model coefficients.
We summarize the results for these 16 institutions using the box plots in Figures
3 and 4. The box plots identify the outliers by institution number: Caltech (36),
Oberlin (44), Smith (60), Simmons (81), Vassar (85), and Reed (98). (As we have
only performed this regression analysis for the 16 smallest institutions in the data,
one should not jump to conclusions from this list of outliers.) For all institutions
and all four types of networks for each institution, the highest coefficient in the employed ERGM model (with triangle terms) is given for matching the High School
category, and the value of this coefficient is significantly higher than those for the
other node-matching coefficients. Only the Caltech (36) Female network has ERGM
coefficients for Year, Residence, and High School that are very close to each other.
4.3. Comparison of Communities
We now discuss community-level results for each network using z-scores of the
Rand coefficient to compare partitions obtained via algorithmic community detection to partitions based on each characteristic. That is, each community-detection
result identifies a group assignment for each node, thereby producing a partition
(called a hard partition) in which each node is assigned to exactly one community. One can also obtain a hard partition for each network by selecting a single characteristic and grouping nodes according to that characteristic. Every network that we study (including the subnetworks) has at least one z-score in the set
$\{z_{\mathrm{Major}}, z_{\mathrm{Year}}, z_{\mathrm{HS}}, z_{\mathrm{Residence}}\}$ with a value greater than 5. Although the distribution of Rand coefficients is decidedly not Gaussian, particularly in the tails of the
distributions [Brook and Stirling, 1984, Kulinskaya, 1994, Traud et al., 2010], this
z = 5 threshold indicates that at least one characteristic in each network exhibits
strong statistical significance. Moreover, we will see that the vast majority of our
comparisons below exceed the z = 2 threshold. (That is, they essentially lie outside
95% confidence intervals.)
To visualize and compare the varied strengths of organization according to the
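$$ z_1 = \frac{z_{\mathrm{Major}}}{z_{\mathrm{Major}} + z_{\mathrm{Year}} + z_{\mathrm{HS}} + z_{\mathrm{Residence}}}, \qquad z_2 = \frac{z_{\mathrm{Residence}}}{z_{\mathrm{Major}} + z_{\mathrm{Year}} + z_{\mathrm{HS}} + z_{\mathrm{Residence}}}, $$
$$ z_3 = \frac{z_{\mathrm{Year}}}{z_{\mathrm{Major}} + z_{\mathrm{Year}} + z_{\mathrm{HS}} + z_{\mathrm{Residence}}}, \qquad z_4 = \frac{z_{\mathrm{HS}}}{z_{\mathrm{Major}} + z_{\mathrm{Year}} + z_{\mathrm{HS}} + z_{\mathrm{Residence}}}. \tag{5} $$
$$ Z = \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix}. \tag{6} $$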
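The normalization in equation (5) amounts to dividing each z-score by the sum of the four. A minimal Python sketch, with a function name chosen only for this illustration, is:

```python
def normalized_zscores(z_major, z_residence, z_year, z_hs):
    """Return (z1, z2, z3, z4): each z-score divided by the sum of all four."""
    total = z_major + z_residence + z_year + z_hs
    # Order follows equation (5): Major, Residence, Year, High School.
    return tuple(z / total for z in (z_major, z_residence, z_year, z_hs))
```

Because the four normalized values sum to 1, the first three determine the fourth, which is consistent with $Z$ in equation (6) having only three components.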
social organization of a few institutions differs considerably from that of the majority.
Each of these institutions lies close to the Residence vertex, so their community
structures are organized predominantly according to dormitory residence. Foremost
among these institutions are Rice (31) and California Institute of Technology (36).
As we discussed in Traud et al. [2010], California Institute of Technology (Caltech)
is well-known to be organized almost exclusively according to its undergraduate
House system [Looijen and Porter, 2007].
In repeatedly observing a strong correlation of class year with community structure, it is relevant to recall that the community detection method that we have
employed optimizes modularity at the default resolution. Because of the resolution
limit of modularity [Fortunato and Barthelemy, 2007], it might be interesting to explore individual networks at different scales using resolution parameters [Fortunato,
2010, Porter et al., 2009, Reichardt and Bornholdt, 2006]. We reiterate, however,
that our focus in the present paper is on large-scale features rather than precise node
membership of network partitions.
In Figure 5, we show the social organization tetrahedron for the Full networks
(i.e., for the largest connected components of the complete networks) for each
institution. Although the community structures of nearly all of the Full networks are
organized overwhelmingly by class year, a few of them are also heavily influenced by
dormitory residence. (We already mentioned above that Rice (31) and Caltech (36)
are organized predominantly by Residence.) For example, dormitory residence also
dominates the community structure at UC Santa Cruz [UCSC] (68), though to a
lesser extent than at Rice and Caltech. We also observe relatively high Residence
z-scores at Smith (60), Auburn (71), and University of Oklahoma (97). Major seems to
be most important relative to the other available characteristics at Oberlin (44) and
Maine (59), though in both cases its relative correlation pales in comparison to that of
class year. High School seems to be most important at USF (51) and Tennessee (95),
though class year is again more important. Most of the institutions are clustered
tightly near the Year vertex, but Residence can often be rather important (and
sometimes even the most important category, as we have seen in three cases).
In Figure 6, we show the social organization tetrahedron for the Student networks
(i.e., for the largest connected component of the student-only subnetworks) for each
institution. As we saw with the Full networks, most of the institutions have community structures that are organized overwhelmingly according to class year. Rice,
Caltech, Smith, UCSC, Auburn, and Oklahoma are again exceptions, as dormitory
residence also exerts considerable (or even primary) influence at these institutions.
Additionally, considering the Student network reduces the relative dominance of the
Year vertex, although it clearly still dominates the social organization. This feature
is illustrated by institutions such as UC (64), UF (21), and Rutgers (89).
In Figure 7, we show the social organization tetrahedron for the Female networks
(i.e., for the largest connected component of the female-only subnetworks) for each
institution. Class year is once again the overwhelmingly dominant organizing characteristic, and dormitory residence is again important at institutions such as Rice,
Caltech, Smith, UCSC, Auburn, and Oklahoma. However, we now observe an increased importance of the High School vertex. USF (51), Tennessee (95), UF (21),
FSU (53), and GWU (54) all lie closer to the High School vertex than was the case
stronger correlation with high school. This suggests that there are potential differences in the gender patterns of friendships, which would be interesting to investigate
in future studies with new data. We do not explore this general issue further and
instead attempt to identify interesting comparisons with the results that we obtained
above. Although it is of course impossible to be exhaustive in our observations, we
present all of our assortativity values, regression model coefficients, and community-comparing z-scores in the tables in Appendix A. We also highlight some interesting
facets of our results.
Of particular interest is the comparison of results from the dyad-level regression
models to those from community-level correlations. We note in particular that the
logistic regression and exponential random graph models that we employed for the
smallest 16 institutions specify that almost all institutions and all of their subnetworks give the highest model coefficient contribution towards a link between nodes
from a common High School. However, as we have seen (and as is particularly
evident using the visualizations with tetrahedra), at the community level most institutions are organized by class year and have a relatively small correlation with
high school.
Even in the rare cases in which the rank ordering of the four correlations (with
Year, Residence, Major, and High School) at the community level matches that obtained via dyad-level model coefficients, such as with the logistic regression model for
the Full and Female networks from Caltech (36), the relative sizes of the contributions at the dyad level are completely different from those observed at the community
level. Caltech supplies an illustrative example of the different insights obtained from
second largest z-score in the Female and Male networks of Stanford (3), but it gives
the fourth largest z-score in Stanford's Full network. Even more interesting, Major
gives the second largest z-score for the Female network at UVA (16), the third largest
z-score for UVA's Male network, and the fourth largest z-score for its Full network.
The communities in the Auburn (71) Female network are dominated by Residence,
but those in the other Auburn networks are not. Similarly, the communities in the
MIT (8) Male network are dominated by Residence, but those in the other MIT (8)
networks are not. Another interesting disparity based on gender occurs in the communities in the Tennessee (95) Full and Student networks, which have their second
largest contributions from High School, whereas those in the other two Tennessee
networks have their second largest contributions from Residence.
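The rank-order comparisons above can be reproduced mechanically once the z-scores for each network are available. The following is a minimal sketch; the z-score values are invented for illustration (they are not taken from the tables) and are chosen only to mimic the pattern in which Major ranks second in the Female and Male networks but fourth in the Full network.

```python
def rank_categories(z_scores):
    """Return the categories ordered from largest to smallest z-score."""
    return sorted(z_scores, key=z_scores.get, reverse=True)


# Hypothetical z-scores for one institution (illustrative values only).
z_by_network = {
    "Full": {"Year": 900.0, "Residence": 150.0, "High School": 40.0, "Major": 25.0},
    "Female": {"Year": 400.0, "Major": 60.0, "Residence": 55.0, "High School": 12.0},
    "Male": {"Year": 350.0, "Major": 45.0, "Residence": 30.0, "High School": 10.0},
}

# Position of Major in each network's ordering (1 = largest z-score).
major_rank = {
    network: rank_categories(z).index("Major") + 1
    for network, z in z_by_network.items()
}
print(major_rank)
```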
5. Conclusions
We have studied the social structure of Facebook friendship networks at one
hundred American institutions at a single point in time (using data from September
2005). To compare the organizations of the 100 institutions using categorical data,
we considered both microscopic and macroscopic perspectives. In particular, calculating assortativity coefficients and regression model coefficients based on observed
ties allows one to examine homophily at the local level, and algorithmic community
detection provides a complementary macroscopic picture. Such complementary calculations are particularly valuable when the microscopic and macroscopic perspectives identify different dominant contributions.
For example, in the Caltech networks, the assumed ground truth of the importance
works exhibit a wider variation. Investigating this thoroughly would require different
data sets and methodologies, especially if one wishes to discern the causes of such
friendships from observed correlations.
The Facebook networks that we study offer imperfect representations of corresponding real-life social networks, which have different properties from online social
networks. It is thus crucial that our results are complemented by studies of the
corresponding real networks in order to quantify the extent of such differences.
Acknowledgements
We thank Adam D'Angelo and Facebook for providing the data used in this
study. We also acknowledge Sandra Gonzalez-Bailon and Erik Kelsic for useful discussions. We thank Christina Frost for developing some of the graph visualization
code that we used (available at http://netwiki.amath.unc.edu/VisComms). ALT
was funded by the NSF through the UNC AGEP (NSF HRD-0450099) and by the
UNC ECHO program. PJM was funded by the NSF (DMS-0645369) and the UNC
ECHO program. MAP acknowledges a research award (#220020177) from the James
S. McDonnell Foundation.
Figure 1: Largest connected component of the student-only subset of the Reed College Facebook
network. (We used a Fruchterman and Reingold [1991] visualization.) Different node shapes and
gray scale indicate different class years (gray circles denote users who did not identify an affiliation), and the edges are randomly shaded for easy viewing. Clusters of nodes with the same
grayscale/shape suggest that common class year has an important effect on the aggregate Facebook
structure.
Figure 2: [Color] (Left) Visualization of community structure of the Reed College Student Facebook
network shown in Figure 1. Node shapes and colors indicate class year (gray dots denote users who
did not identify an affiliation), and the edges are randomly shaded for easy viewing. We place
the communities using a Fruchterman and Reingold [1991] layout and use a Kamada and Kawai
[1989] layout to position the nodes within communities [Traud et al., 2009]. (Right) The same
network layout but with each community depicted as a pie. Larger pies represent communities
with larger numbers of nodes. Darker edges indicate the presence of more connections between the
corresponding communities.
[Figure 3 image: four box-plot panels (Full Networks, Student Networks, Female Networks, Male Networks). Vertical axis: Model Coefficients. Horizontal axis: (−)Edges, Year, Residence, High School, Major.]
Figure 3: Box plots (indicating median, quartiles, extent, and outliers of the distribution) of the
logistic regression nodematch coefficients for the 16 smallest institutions in the data for the model
described in the main text. We plot the negated edges values (−edges) to present results with greater resolution.
We separately present our results for the Full, Student, Female, and Male networks.
[Figure 4 image: four box-plot panels (Full Networks, Student Networks, Female Networks, Male Networks). Vertical axis: Model Coefficients. Horizontal axis: (−)Edges, Triangles, Year, Residence, High School, Major.]
Figure 4: Box plots (indicating median, quartiles, extent, and outliers of the distribution) of the
exponential random graph model coefficients described in the main text for the 16 smallest institutions in the data. We plot the negated edges values (−edges) to present results with greater resolution. We
separately present our results for the Full, Student, Female, and Male networks.
[Figure 5 image: tetrahedron with vertices High School, Year, Major, and Residence, and a magnified view near the Year vertex marked Major (0.19) and Residence (0.36). Legend: d in [0, .1): 63 cases; d in [.1, .2): 25 cases; d in [.2, .3): 2 cases; d in [.3, .4): 3 cases; d = .4459: FSU 53; d in [.5, .6): 3 cases; d = .6250: Texas 84; d = .7971: Auburn 71; d = .8283: Texas 80.]
Figure 5: [Color online] (Upper Left) Social organization tetrahedron for the community structures
of the Full component (largest connected component) of the networks for each of the 100 institutions.
Lighter disks indicate an organization that is based more predominantly on class year. See the main
text for a description of this figure. (Lower Right) Magnification near the Year vertex. The legend
illustrates the disk size as a function of the maximum distance d between the 6 different partitions
of the network. Most cases (88 out of 100 institutions) have d < .2.
[Figure 6 image: tetrahedron with vertices High School, Year, Major, and Residence, and a magnified view near the Year vertex marked Major (0.19) and Residence (0.58). Legend: d in [0, .1): 72 cases; d in [.1, .2): 16 cases; d in [.2, .3): 5 cases; d = .3126: Maine 59; d = .4084: USF 51; d in [.6, .7): 4 cases; d = .7244: Texas 80.]
Figure 6: [Color online] (Upper Left) Social organization tetrahedron for the community structures
of the Student component of the networks for each of the 100 institutions. Lighter disks indicate an
organization that is based more predominantly on class year. See the main text for a description
of this figure. (Lower Right) Magnification near the Year vertex. As in Figure 5, the disk sizes
correspond to the maximum distances between partitions.
[Figure 7 image: tetrahedron with vertices High School, Year, Major, and Residence, and a magnified view near the Year vertex marked Major (0.38) and Residence (0.36). Legend: d in [0, .1): 50 cases; d in [.1, .2): 30 cases; d in [.2, .3): 7 cases; d in [.3, .4): 4 cases; d = .4418: UCF 52; d = .5772: Oklahoma 97; d in [.6, .7): 3 cases; d in [.7, .8): 2 cases; d = .8666: UF 21; d = .9314: Texas 84.]
Figure 7: [Color online] (Upper Left) Social organization tetrahedron for the community structures
of the Female component of the networks for each of the 100 institutions. Lighter disks indicate an
organization that is based more predominantly on class year. See the main text for a description
of this figure. (Lower Right) Magnification near the Year vertex. As in the two previous figures,
the disk sizes indicate the maximum distances between partitions.
[Figure 8 image: tetrahedron with vertices High School, Year, Major, and Residence, and a magnified view near the Year vertex marked Major (0.34) and Residence (0.37). Legend: d in [0, .1): 31 cases; d in [.1, .2): 35 cases; d in [.2, .3): 14 cases; d in [.3, .4): 9 cases; d in [.4, .5): 4 cases; d in [.5, .6): 3 cases; d = .6542: UIllinois 20.]
Figure 8: [Color online] (Upper Left) Social organization tetrahedron for the community structures
of the Male component of the networks for each of the 100 institutions. Lighter disks indicate an
organization that is based more predominantly on class year. See the main text for a description of
this figure. (Lower Right) Magnification near the Year vertex. As in the three previous figures, disk
size indicates the maximum distance between partitions. We note that there are more d > .2 cases
here than in the previous figures, which illustrates the greater variability in the relative positions
of the z-scores in the different Male networks than was the case for the Full, Student, and Female
networks.
Appendix A. Tables
In Table A.1, we give for each of the 100 institutions the numbers of nodes
and edges for each of the Facebook networks (and subsets thereof) that we have
investigated. In Table A.2, we give the assortativity values for each of the networks.
For each institution, we calculate assortativity values for Gender only for the Full
and Student network subsets. We calculate Major, Residence, Year, and High School
assortativity values for each of the four network subsets (Full, Student, Female, and
Male).
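For readers who want to reproduce such values, the sketch below computes a categorical assortativity coefficient of the standard form r = (sum_i e_ii - sum_i a_i^2) / (1 - sum_i a_i^2), where e_ij is the fraction of edge ends that join category i to category j and a_i is the fraction of edge ends attached to category i. It assumes an undirected network in which every node has a label; how missing labels are treated is described in the main text and is not modeled here. The function name and the toy network are hypothetical.

```python
from collections import defaultdict


def categorical_assortativity(edges, attribute):
    """Assortativity coefficient for a categorical node attribute.

    edges: iterable of undirected (u, v) pairs.
    attribute: dict mapping each node to its category.
    Undefined (division by zero) if all nodes share one category.
    """
    edge_list = list(edges)
    mixing = defaultdict(float)  # counts of edge ends joining category pairs
    for u, v in edge_list:
        # Count each undirected edge in both directions so mixing is symmetric.
        mixing[attribute[u], attribute[v]] += 1
        mixing[attribute[v], attribute[u]] += 1
    total = 2 * len(edge_list)
    ends = defaultdict(float)    # a_i: fraction of edge ends in category i
    trace = 0.0                  # sum_i e_ii
    for (ci, cj), count in mixing.items():
        ends[ci] += count / total
        if ci == cj:
            trace += count / total
    expected = sum(a * a for a in ends.values())
    return (trace - expected) / (1.0 - expected)


# Toy network: two class-year triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
year = {0: 2006, 1: 2006, 2: 2006, 3: 2007, 4: 2007, 5: 2007}
r = categorical_assortativity(edges, year)
print(round(r, 4))  # 0.7143, i.e. 5/7
```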
Recall that we studied regression models for the 16 institutions with the smallest
Facebook networks. In Table A.3, we report the results of a logistic regression model
with edge and nodematch terms. (All coefficients differ from zero with p-values
less than $1 \times 10^{-4}$.) In Table A.4, we similarly report the results of an ERGM that
supplements the logistic regression model with triangle terms. (Again, all resulting
model coefficients differ from zero with p-values less than $1 \times 10^{-4}$.)
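To make the model terms concrete, the sketch below counts the edge, triangle, and nodematch statistics of a small graph and then converts a set of coefficients into a tie probability for the logistic (dyad-independent) model, in which the log-odds of a tie are the edge coefficient plus the nodematch coefficients of the attributes that the two nodes share. The graph, the attribute values, and the coefficients in `theta` are hypothetical; in particular, the negative edge coefficient is an assumption that reflects a sparse network. The triangle term of the ERGM makes ties dependent on one another, so the probability function applies only to the model without triangles.

```python
import math
from itertools import combinations


def model_statistics(nodes, edges, attributes):
    """Count edge, triangle, and nodematch statistics of an undirected graph."""
    edge_set = {frozenset(e) for e in edges}
    stats = {"edges": len(edge_set)}
    stats["triangles"] = sum(
        1
        for a, b, c in combinations(nodes, 3)
        if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= edge_set
    )
    for name, values in attributes.items():
        # An edge contributes to nodematch if both endpoints share the value.
        stats[f"nodematch.{name}"] = sum(
            1 for e in edge_set if len({values[n] for n in e}) == 1
        )
    return stats


def tie_probability(theta, matches):
    """Tie probability for one dyad under the logistic (no-triangle) model."""
    log_odds = theta["edges"] + sum(theta[k] for k, same in matches.items() if same)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical toy graph with two categorical attributes.
nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
attributes = {
    "year": {0: 2006, 1: 2006, 2: 2006, 3: 2007, 4: 2007},
    "residence": {0: "A", 1: "A", 2: "B", 3: "B", 4: "C"},
}
stats = model_statistics(nodes, edges, attributes)
print(stats)

# Hypothetical coefficients (not taken from the tables).
theta = {"edges": -4.0, "year": 2.0, "residence": 1.0}
p_none = tie_probability(theta, {"year": False, "residence": False})
p_year = tie_probability(theta, {"year": True, "residence": False})
p_both = tie_probability(theta, {"year": True, "residence": True})
print(f"{p_none:.4f} {p_year:.4f} {p_both:.4f}")
```

Because the coefficients add on the log-odds scale, each shared attribute multiplies the odds of a tie by the exponential of its nodematch coefficient.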
In Table A.5, we report the maximum z-score for each demographic category that
we obtained from the 6 different community detection partitions (described in the
text) of each Facebook network (and their subsets) compared to categorical partitions
based on each of Major, Residence, Year, and High School. We divide the networks
in this table into five sections: (1) networks for which the High School category gives
the highest z-score; (2) networks for which the Residence category gives the highest
z-score; (3) networks for which Year gives the highest z-score and High School gives
the second highest; (4) networks for which Year gives the highest z-score and Major
gives the second highest; and (5) networks for which Year gives the highest z-score
Table A.1: Characteristics for each of the networks and subnetworks: institution name, the identifying number given by Facebook, number of nodes in each network and subnetwork, and the
number of edges in each network and subnetwork.
| Institution | Number |
| Harvard | 1 |
| Columbia | 2 |
| Stanford | 3 |
| Yale | 4 |
| Cornell | 5 |
| Dartmouth | 6 |
| UPenn | 7 |
| MIT | 8 |
| NYU | 9 |
| BU | 10 |
| Brown | 11 |
| Princeton | 12 |
| Berkeley | 13 |
| Duke | 14 |
| Georgetown | 15 |
| UVA | 16 |
| BC | 17 |
| Tufts | 18 |
| Northeastern | 19 |
| U Illinois | 20 |
| UF | 21 |
| Wellesley | 22 |
| Michigan | 23 |
| MSU | 24 |
| Northwestern | 25 |
| UCLA | 26 |
| Emory | 27 |
| UNC | 28 |
| Tulane | 29 |
| UChicago | 30 |
| Rice | 31 |
| WashU | 32 |
| UC | 33 |
| UCSD | 34 |
| USC | 35 |
| Caltech | 36 |
| UCSB | 37 |
| Rochester | 38 |
| Bucknell | 39 |
| Williams | 40 |
| Amherst | 41 |
| Swarthmore | 42 |
| Wesleyan | 43 |
| Oberlin | 44 |
| Middlebury | 45 |
| Hamilton | 46 |
| Bowdoin | 47 |
| Vanderbilt | 48 |
| Carnegie | 49 |
| UGA | 50 |
| USF | 51 |
| UCF | 52 |
| FSU | 53 |
| GWU | 54 |
| Johns Hopkins | 55 |
| Syracuse | 56 |
| Notre Dame | 57 |
| Maryland | 58 |
| Maine | 59 |
| Smith | 60 |
| UC | 61 |
| Villanova | 62 |
| Virginia | 63 |
| UC | 64 |
| Cal | 65 |
| Mississippi | 66 |
| Michigan | 67 |
| UCSC | 68 |
| Indiana | 69 |
| Vermont | 70 |
| Auburn | 71 |
| USFCA | 72 |
| Wake | 73 |
| Santa | 74 |
| American | 75 |
| Haverford | 76 |
| Williams | 77 |
| MU | 78 |
| JMU | 79 |
| Texas | 80 |
| Simmons | 81 |
| Bingham | 82 |
| Temple | 83 |
| Texas | 84 |
| Vassar | 85 |
| Pepperdine | 86 |
| Wisconsin | 87 |
| Colgate | 88 |
| Rutgers | 89 |
| Howard | 90 |
| UConn | 91 |
| UMass | 92 |
| Baylor | 93 |
| Penn | 94 |
| Tennessee | 95 |
| Lehigh | 96 |
| Oklahoma | 97 |
| Reed | 98 |
| Brandeis | 99 |
| Trinity | 100 |
Table A.2: Assortativity values for each category for each of the
4 networks (Full, Student, Female, and Male) for each of the 100
institutions. We only calculate assortativity by Gender for the Full
and Student networks. (We leave blank spots in the corresponding
table entries for the Male and Female networks.)
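| Institution No. | Category | Full | Student | Female | Male |
| Harvard 1 | Gender | 0.058144 | 0.049178 | | |
| Harvard 1 | Major | 0.056293 | 0.046659 | 0.051852 | 0.064064 |
| Harvard 1 | Residence | 0.14679 | 0.11951 | 0.13803 | 0.15431 |
| Harvard 1 | Year | 0.47981 | 0.60723 | 0.4871 | 0.44035 |
| Harvard 1 | High School | 0.023132 | 0.02419 | 0.024247 | 0.026473 |
| Columbia 2 | Gender | 0.087283 | 0.085847 | | |
| Columbia 2 | Major | 0.045257 | 0.036112 | 0.043728 | 0.06024 |
| Columbia 2 | Residence | 0.13271 | 0.13551 | 0.1625 | 0.14249 |
| Columbia 2 | Year | 0.51348 | 0.6002 | 0.55303 | 0.47743 |
| Columbia 2 | High School | 0.029259 | 0.030061 | 0.03254 | 0.028501 |
| Stanford 3 | Gender | 0.056583 | 0.049545 | | |
| Stanford 3 | Major | 0.048574 | 0.033901 | 0.042221 | 0.058083 |
| Stanford 3 | Residence | 0.12067 | 0.10887 | 0.1499 | 0.16531 |
| Stanford 3 | Year | 0.44456 | 0.54456 | 0.43978 | 0.40632 |
| Stanford 3 | High School | 0.021472 | 0.023851 | 0.022906 | 0.022649 |
| Yale 4 | Gender | 0.036704 | 0.031144 | | |
| Yale 4 | Major | 0.041703 | 0.046659 | 0.041228 | 0.044829 |
| Yale 4 | Residence | 0.26727 | 0.11951 | 0.27204 | 0.26567 |
| Yale 4 | Year | 0.48308 | 0.60723 | 0.52242 | 0.43417 |
| Yale 4 | High School | 0.018269 | 0.02419 | 0.019705 | 0.020295 |
| Cornell 5 | Gender | 0.090725 | 0.0879 | | |
| Cornell 5 | Major | 0.10367 | 0.095703 | 0.10503 | 0.10218 |
| Cornell 5 | Residence | 0.25426 | 0.23819 | 0.35124 | 0.34471 |
| Cornell 5 | Year | 0.47504 | 0.56588 | 0.47434 | 0.42828 |
| Cornell 5 | High School | 0.033164 | 0.03543 | 0.029579 | 0.037021 |
| Dartmouth 6 | Gender | 0.10284 | 0.062793 | | |
| Dartmouth 6 | Major | 0.03729 | 0.029281 | 0.037923 | 0.039882 |
| Dartmouth 6 | Residence | 0.17773 | 0.12551 | 0.24733 | 0.28336 |
| Dartmouth 6 | Year | 0.49014 | 0.61052 | 0.53787 | 0.41358 |
| Dartmouth 6 | High School | 0.014366 | 0.015213 | 0.015285 | 0.014707 |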
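| Institution No. | Male: Major | Male: Residence | Male: Year | Male: High School |
| UPenn 7 | 0.062728 | 0.34866 | 0.41714 | 0.034844 |
| MIT 8 | 0.067387 | 0.34262 | 0.28538 | 0.016056 |
| NYU 9 | 0.14657 | 0.19867 | 0.50102 | 0.043782 |
| BU 10 | 0.073732 | 0.16631 | 0.49861 | 0.037717 |
| Brown 11 | 0.049159 | 0.15337 | 0.44248 | 0.025049 |
| Princeton 12 | 0.051068 | 0.096722 | 0.47155 | 0.024282 |
| Berkeley 13 | 0.081518 | 0.24591 | 0.35734 | 0.10399 |
| Duke 14 | 0.047729 | 0.22614 | 0.45159 | 0.01879 |
| Georgetown 15 | 0.049058 | 0.18187 | 0.50492 | 0.033592 |
| UVA 16 | 0.054598 | 0.2914 | 0.41752 | 0.085465 |
| BC 17 | 0.038293 | 0.16084 | 0.61562 | 0.049044 |
| Tufts 18 | 0.040925 | 0.14288 | 0.46043 | 0.022329 |
| Northeastern 19 | 0.11922 | 0.20364 | 0.46986 | 0.051935 |
| U Illinois 20 | 0.063856 | 0.38529 | 0.36043 | 0.19005 |
| UF 21 | 0.051894 | 0.24929 | 0.31326 | 0.19965 |
| Wellesley 22 | 0 | 0 | 0 | 0 |
| Michigan 23 | 0.074332 | 0.28886 | 0.3765 | 0.14946 |
| MSU 24 | 0.048454 | 0.28035 | 0.32903 | 0.2291 |
| Northwestern 25 | 0.096049 | 0.32476 | 0.33697 | 0.021729 |
| UCLA 26 | 0.056997 | 0.32916 | 0.33708 | 0.089885 |
| Emory 27 | 0.028936 | 0.31422 | 0.42816 | 0.020963 |
| UNC 28 | 0.052124 | 0.2459 | 0.32994 | 0.12872 |
| Tulane 29 | 0.059948 | 0.45224 | 0.36371 | 0.029477 |
| UChicago 30 | 0.063178 | 0.32858 | 0.32378 | 0.016953 |
| Rice 31 | 0.061407 | 0.50887 | 0.28657 | 0.016986 |
| WashU 32 | 0.038292 | 0.20308 | 0.46872 | 0.019846 |
| UC 33 | 0.044466 | 0.48007 | 0.42992 | 0.12326 |
| UCSD 34 | 0.040088 | 0.40003 | 0.43005 | 0.10091 |
| USC 35 | 0.096026 | 0.31081 | 0.3378 | 0.055851 |
| Caltech 36 | 0.03799 | 0.48219 | 0.26326 | 0.0022829 |
| UCSB 37 | 0.054137 | 0.39801 | 0.44318 | 0.070505 |
| Rochester 38 | 0.087977 | 0.298 | 0.38851 | 0.017675 |
| Bucknell 39 | 0.0633 | 0.24857 | 0.46584 | 0.011993 |
| Williams 40 | 0.038403 | 0.12653 | 0.45584 | 0.011736 |
| Amherst 41 | 0.03902 | 0.077428 | 0.40988 | 0.013311 |
| Swarthmore 42 | 0.054311 | 0.11301 | 0.32337 | 0.001478 |
| Wesleyan 43 | 0.067817 | 0.13583 | 0.42467 | 0.020264 |
| Oberlin 44 | 0.1097 | 0.13695 | 0.29632 | 0.010669 |
| Middlebury 45 | 0.03541 | 0.18188 | 0.47478 | 0.018134 |
| Hamilton 46 | 0.030141 | 0.12034 | 0.37736 | 0.01371 |
| Bowdoin 47 | 0.038996 | 0.10406 | 0.44429 | 0.015678 |
| Vanderbilt 48 | 0.069603 | 0.27768 | 0.43518 | 0.026794 |
| Carnegie 49 | 0.14232 | 0.28937 | 0.38927 | 0.023089 |
| UGA 50 | 0.041416 | 0.45489 | 0.31287 | 0.19317 |
| USF 51 | 0.037232 | 0.27454 | 0.2527 | 0.16121 |
| UCF 52 | 0.041218 | 0.25123 | 0.30039 | 0.17058 |
| FSU 53 | 0.054106 | 0.41138 | 0.26617 | 0.13913 |
| GWU 54 | 0.048212 | 0.16795 | 0.42325 | 0.020061 |
| Johns Hopkins 55 | 0.078166 | 0.13263 | 0.42893 | 0.01772 |
| Syracuse 56 | 0.10169 | 0.35714 | 0.40872 | 0.030692 |
| Notre Dame 57 | 0.04508 | 0.50916 | 0.42931 | 0.034602 |
| Maryland 58 | 0.069243 | 0.24207 | 0.40721 | 0.1872 |
| Maine 59 | 0.088474 | 0.28775 | 0.2253 | 0.20329 |
| Smith 60 | 0 | 0 | 0 | 0 |
| UC 61 | 0.059913 | 0.4018 | 0.37918 | 0.096607 |
| Villanova 62 | 0.06106 | 0.19118 | 0.58783 | 0.038226 |
| Virginia 63 | 0.060859 | 0.19839 | 0.34308 | 0.13209 |
| UC 64 | 0.054324 | 0.42637 | 0.34304 | 0.089578 |
| Cal 65 | 0.12329 | 0.3968 | 0.31782 | 0.080967 |
| Mississippi 66 | 0.049216 | 0.48978 | 0.2597 | 0.12691 |
| Michigan 67 | 0.093527 | 0.47499 | 0.26151 | 0.056137 |
| UCSC 68 | 0.059811 | 0.48138 | 0.4235 | 0.07741 |
| Indiana 69 | 0.051613 | 0.55282 | 0.33748 | 0.17705 |
| Vermont 70 | 0.051976 | 0.27715 | 0.43136 | 0.075569 |
| Auburn 71 | 0.0671 | 0.67726 | 0.24262 | 0.17876 |
| USFCA 72 | 0.060111 | 0.27651 | 0.50134 | 0.027827 |
| Wake 73 | 0.039131 | 0.30473 | 0.38007 | 0.017433 |
| Santa 74 | 0.059648 | 0.19262 | 0.42026 | 0.058071 |
| American 75 | 0.052386 | 0.32847 | 0.31716 | 0.0082203 |
| Haverford 76 | 0.031393 | 0.12643 | 0.32713 | 0.0043886 |
| Williams 77 | 0.045295 | 0.29039 | 0.42774 | 0.033152 |
| MU 78 | 0.046951 | 0.45396 | 0.42438 | 0.10153 |
| JMU 79 | 0.069393 | 0.22017 | 0.43857 | 0.10848 |
| Texas 80 | 0.085572 | 0.37658 | 0.31836 | 0.1849 |
| Simmons 81 | 0 | 0 | 0 | 0 |
| Bingham 82 | 0.053214 | 0.19757 | 0.32871 | 0.084548 |
| Temple 83 | 0.07327 | 0.2333 | 0.40727 | 0.10703 |
| Texas 84 | 0.067313 | 0.38312 | 0.30186 | 0.15246 |
| Vassar 85 | 0.058073 | 0.23329 | 0.39599 | 0.011445 |
| Pepperdine 86 | 0.041034 | 0.27511 | 0.37374 | 0.0073864 |
| Wisconsin 87 | 0.043372 | 0.39075 | 0.31841 | 0.15286 |
| Colgate 88 | 0.039073 | 0.25962 | 0.46105 | 0.014768 |
| Rutgers 89 | 0.069841 | 0.28142 | 0.37527 | 0.17869 |
| Howard 90 | 0.048478 | 0.20127 | 0.39081 | 0.013863 |
| UConn 91 | 0.049408 | 0.14729 | 0.36441 | 0.17911 |
| UMass 92 | 0.095808 | 0.27678 | 0.38243 | 0.14951 |
| Baylor 93 | 0.051998 | 0.50984 | 0.35905 | 0.058399 |
| Penn 94 | 0.059916 | 0.35064 | 0.37095 | 0.18528 |
| Tennessee 95 | 0.05075 | 0.33083 | 0.26175 | |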
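| Institution No. | Category | Full | Student | Female | Male |
| Lehigh 96 | Gender | | 0.059833 | | |
| Lehigh 96 | Major | | 0.045209 | 0.040137 | 0.056438 |
| Lehigh 96 | Residence | | 0.25827 | 0.43546 | 0.39254 |
| Lehigh 96 | Year | | 0.55849 | 0.49009 | 0.44806 |
| Lehigh 96 | High School | | 0.019471 | 0.013934 | 0.024868 |
| Oklahoma 97 | Gender | 0.11176 | 0.1172 | | |
| Oklahoma 97 | Major | 0.04115 | 0.032522 | 0.039645 | 0.04512 |
| Oklahoma 97 | Residence | 0.40326 | 0.39682 | 0.58012 | 0.5948 |
| Oklahoma 97 | Year | 0.29235 | 0.31461 | 0.29748 | 0.2493 |
| Oklahoma 97 | High School | 0.1583 | 0.16712 | 0.12993 | 0.17418 |
| Reed 98 | Gender | 0.021903 | 0.012225 | | |
| Reed 98 | Major | 0.047233 | 0.037594 | 0.058292 | 0.052558 |
| Reed 98 | Residence | 0.13295 | 0.13377 | 0.14915 | 0.090487 |
| Reed 98 | Year | 0.34748 | 0.39118 | 0.42112 | 0.27715 |
| Reed 98 | High School | 0.0032333 | 0.0028504 | 0.0020284 | 0.0016893 |
| Brandeis 99 | Gender | 0.019401 | 0.022782 | | |
| Brandeis 99 | Major | 0.041497 | 0.035748 | 0.04476 | 0.043044 |
| Brandeis 99 | Residence | 0.19401 | 0.19338 | 0.22725 | 0.18293 |
| Brandeis 99 | Year | 0.52964 | 0.61682 | 0.58517 | 0.47524 |
| Brandeis 99 | High School | 0.014241 | 0.014872 | 0.014966 | 0.014663 |
| Trinity 100 | Gender | 0.052012 | 0.041459 | | |
| Trinity 100 | Major | 0.050441 | 0.043578 | 0.045839 | 0.065184 |
| Trinity 100 | Residence | 0.10577 | 0.10634 | 0.12206 | 0.10248 |
| Trinity 100 | Year | 0.5079 | 0.58971 | 0.55402 | 0.43875 |
| Trinity 100 | High School | 0.01613 | 0.016656 | 0.014522 | 0.021751 |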
| Institution No. | Network | Edge | Year | Residence | High School | Major |
| Wellesley 22 | Full | 4.4291(0.0047086) | 1.8249(0.0067097) | 1.2546(0.012002) | 3.1738(0.041398) | 0.70232(0.013501) |
| Wellesley 22 | Student | 4.3656(0.0063232) | 1.7437(0.0082165) | 1.2512(0.013129) | 3.2966(0.05217) | 0.62071(0.01745) |
| Wellesley 22 | Female | 4.4673(0.0054048) | 1.8587(0.0073823) | 1.2749(0.012577) | 3.19(0.044896) | 0.66471(0.014677) |
| Wellesley 22 | Male | NA | NA | NA | NA | NA |
| Caltech 36 | Full | 3.6903(0.012891) | 1.5382(0.018233) | 2.4151(0.018644) | 2.3789(0.14869) | 0.53388(0.02881) |
| Caltech 36 | Student | 3.4932(0.017086) | 1.4006(0.021534) | 2.3896(0.022905) | 2.5169(0.1944) | 0.47013(0.035936) |
| Caltech 36 | Female | 3.045(0.035464) | 1.4288(0.049983) | 2.1684(0.053205) | 1.3514(0.43722) | 0.44336(0.072743) |
| Caltech 36 | Male | 3.7902(0.022582) | 1.5104(0.029781) | 2.4803(0.029657) | 2.887(0.23382) | 0.51028(0.044684) |
| Williams 40 | Full | 4.221(0.0045298) | 2.1133(0.0063052) | 0.93506(0.011943) | 3.1413(0.036901) | 0.63891(0.01226) |
| Williams 40 | Student | 4.1503(0.0062345) | 2.0076(0.007883) | 0.95814(0.012448) | 3.3846(0.047399) | 0.59197(0.015798) |
| Williams 40 | Female | 4.2218(0.0097145) | 2.1577(0.012801) | 1.0063(0.023198) | 3.1839(0.06889) | 0.62403(0.02406) |
| Williams 40 | Male | 4.0071(0.0095503) | 1.9273(0.013487) | 0.88885(0.024598) | 3.0015(0.07507) | 0.63484(0.023232) |
| Amherst 41 | Full | 3.9164(0.0049089) | 2.0068(0.0069995) | 1.1385(0.017204) | 2.7878(0.043122) | 0.56196(0.014974) |
| Amherst 41 | Student | 3.8449(0.0066932) | 1.9466(0.0086346) | 1.0997(0.018196) | 3.0146(0.053588) | 0.45937(0.019933) |
| Amherst 41 | Female | 3.8278(0.010538) | 2.1198(0.014293) | 1.1944(0.034174) | 2.9552(0.091756) | 0.44155(0.03109) |
| Amherst 41 | Male | 3.8611(0.010649) | 1.8312(0.015146) | 1.2709(0.033384) | 2.6513(0.076298) | 0.57283(0.028969) |
| Swarthmore 42 | Full | 3.635(0.0058633) | 1.7006(0.0085934) | 0.70677(0.014092) | 2.8177(0.087157) | 0.71062(0.015732) |
| Swarthmore 42 | Student | 3.5712(0.0077451) | 1.6388(0.010329) | 0.70249(0.015382) | 3.108(0.11187) | 0.62213(0.020307) |
| Swarthmore 42 | Female | 3.5944(0.012607) | 1.7912(0.017337) | 0.70752(0.028369) | 3.1246(0.17728) | 0.71791(0.03107) |
| Swarthmore 42 | Male | 3.4944(0.012307) | 1.553(0.018316) | 0.73946(0.027981) | 2.4786(0.18762) | 0.73991(0.030376) |
| Oberlin 44 | Full | 4.3357(0.0045547) | 1.4322(0.0071089) | 1.0716(0.013797) | 3.2257(0.042543) | 1.4604(0.010714) |
| Oberlin 44 | Student | 4.3477(0.0057572) | 1.4406(0.0081899) | 1.1044(0.014159) | 3.3936(0.050744) | 1.3832(0.01303) |
| Oberlin 44 | Female | 4.382(0.0092964) | 1.512(0.013473) | 1.1808(0.024895) | 3.3713(0.077285) | 1.5071(0.019651) |
| Oberlin 44 | Male | 4.2048(0.011081) | 1.3069(0.017141) | 1.0426(0.031315) | 3.051(0.10063) | 1.3883(0.025176) |
| Middlebury 45 | Full | 4.4107(0.0045357) | 2.0753(0.0059187) | 0.76052(0.0074835) | 3.3979(0.031385) | 0.79632(0.012067) |
| Middlebury 45 | Student | 4.4519(0.0061279) | 2.0652(0.0073273) | 0.82491(0.0082675) | 3.6831(0.04) | 0.71206(0.015883) |
| Middlebury 45 | Female | 4.4496(0.0096593) | 2.1589(0.01186) | 0.82401(0.014034) | 3.6264(0.063049) | 0.77215(0.023298) |
| Middlebury 45 | Male | 4.2906(0.01027) | 1.9748(0.013337) | 0.72369(0.016737) | 3.3119(0.064183) | 0.76615(0.024516) |
| Hamilton 46 | Full | 3.9231(0.0047892) | 1.8442(0.0067955) | 0.84034(0.011975) | 3.026(0.042724) | 0.66129(0.014902) |
| Hamilton 46 | Student | 3.9278(0.0062417) | 1.8496(0.0080481) | 0.83128(0.012498) | 3.2264(0.052715) | 0.59501(0.018189) |
| Hamilton 46 | Female | 3.8481(0.0095168) | 1.9502(0.012943) | 0.9582(0.021613) | 3.0543(0.085707) | 0.65026(0.02958) |
| Hamilton 46 | Male | 3.7214(0.010321) | 1.5511(0.015012) | 0.90341(0.025581) | 3.1322(0.081785) | 0.57013(0.027734) |
| Bowdoin 47 | Full | 4.0994(0.0053132) | 2.0771(0.0073015) | 0.9616(0.012875) | 3.1465(0.041196) | 0.63376(0.015324) |
| Bowdoin 47 | Student | 4.0369(0.0068883) | 1.9903(0.0087614) | 0.96466(0.013573) | 3.3839(0.050362) | 0.58314(0.018703) |
| Bowdoin 47 | Female | 4.0971(0.011542) | 2.1747(0.014967) | 1.1435(0.023846) | 3.1707(0.083632) | 0.58128(0.033008) |
| Bowdoin 47 | Male | 4.0007(0.011566) | 1.8768(0.016042) | 1.0069(0.027116) | 3.2853(0.080941) | 0.68168(0.028132) |
| Smith 60 | Full | 4.5226(0.0048951) | 1.44(0.0070185) | 3.0814(0.0086746) | 3.8(0.049519) | 0.93814(0.013074) |
| Smith 60 | Student | 4.5565(0.0064751) | 1.4702(0.008444) | 3.065(0.010562) | 4.0877(0.062345) | 0.86763(0.017044) |
| Smith 60 | Female | 4.6143(0.0058739) | 1.5123(0.0079156) | 3.1297(0.009554) | 3.9079(0.054194) | 0.94156(0.01446) |
| Smith 60 | Male | NA | NA | NA | NA | NA |
| USFCA 72 | Full | 4.6268(0.0058034) | 1.6115(0.0083192) | 0.90162(0.011441) | 3.1032(0.031585) | 0.66308(0.011574) |
| USFCA 72 | Student | 4.6201(0.0064663) | 1.6342(0.0088723) | 0.8675(0.01168) | 3.2174(0.033997) | 0.629(0.012479) |
| USFCA 72 | Female | 4.7401(0.0097375) | 1.6713(0.013044) | 0.99928(0.016643) | 3.3412(0.044919) | 0.83048(0.017243) |
| USFCA 72 | Male | 4.3531(0.018115) | 1.6391(0.025116) | 0.96613(0.033156) | 2.7253(0.088089) | 0.41582(0.032116) |
| Haverford 76 | Full | 3.4051(0.0060883) | 1.7879(0.0088662) | 0.45404(0.011702) | 2.9137(0.07691) | 0.64285(0.019116) |
| Haverford 76 | Student | 3.2009(0.0074664) | 1.6081(0.0099901) | 0.39078(0.012184) | 3.0223(0.092203) | 0.51009(0.02355) |
| Haverford 76 | Female | 3.3442(0.011877) | 1.9069(0.016546) | 0.42992(0.022171) | 2.9156(0.14531) | 0.59125(0.034678) |
| Haverford 76 | Male | 3.2342(0.013433) | 1.5054(0.020176) | 0.44079(0.02505) | 2.9901(0.16665) | 0.62004(0.040993) |
| Simmons 81 | Full | 4.2939(0.0087542) | 1.9127(0.011746) | 0.71252(0.017391) | 3.1819(0.061849) | 0.95847(0.019342) |
| Simmons 81 | Student | 4.2823(0.010262) | 1.941(0.013004) | 0.67853(0.017657) | 3.2452(0.06925) | 0.93096(0.021004) |
| Simmons 81 | Female | 4.266(0.0093971) | 1.8995(0.01233) | 0.69221(0.017762) | 3.16(0.063873) | 0.93484(0.019949) |
| Simmons 81 | Male | NA | NA | NA | NA | NA |
| Vassar 85 | Full | 4.4257(0.0045202) | 1.813(0.0060722) | 1.3142(0.007704) | 3.4271(0.039439) | 0.92801(0.012093) |
| Vassar 85 | Student | 4.3601(0.0058449) | 1.7041(0.0072602) | 1.4151(0.0083399) | 3.7486(0.049088) | 0.79613(0.015441) |
| Vassar 85 | Female | 4.582(0.0088969) | 1.9572(0.011179) | 1.3373(0.013584) | 3.7342(0.0691) | 0.8989(0.021542) |
| Vassar 85 | Male | 4.195(0.011564) | 1.5908(0.015975) | 1.2077(0.020043) | 3.1518(0.093015) | 1.0176(0.028251) |
| Reed 98 | Full | 3.6205(0.0099372) | 1.5(0.015705) | 1.4399(0.033769) | 2.9666(0.14784) | 0.78979(0.029502) |
| Reed 98 | Student | 3.6229(0.012141) | 1.4782(0.017725) | 1.4925(0.034523) | 3.0584(0.17396) | 0.6773(0.035648) |
| Reed 98 | Female | 3.6937(0.020247) | 1.614(0.028894) | 1.5385(0.060679) | 2.8801(0.24827) | 0.86436(0.049343) |
| Reed 98 | Male | 3.3999(0.025163) | 1.3096(0.039777) | 1.2103(0.086745) | 3.2633(0.36283) | 1.0037(0.067066) |
| Trinity 100 | Full | 4.1159(0.0046382) | 2.0271(0.0063319) | 0.77702(0.012227) | 3.1233(0.032458) | 0.80619(0.012694) |
| Trinity 100 | Student | 4.1318(0.0060873) | 2.0143(0.0076607) | 0.7988(0.01275) | 3.4011(0.040157) | 0.71446(0.016092) |
| Trinity 100 | Female | 4.0764(0.0098017) | 2.1975(0.012669) | 0.79113(0.022218) | 3.2724(0.067273) | 0.89966(0.025649) |
| Trinity 100 | Male | 4.0567(0.010444) | 1.7776(0.014516) | 0.76933(0.027546) | 3.0224(0.060545) | 0.73818(0.024533) |
Table A.4: ERGM coefficients for the model (described in the text) that combines density (edge) and triangle terms with nodematch contributions representing the increased propensity for two nodes with the same categorical value
to have an edge between them. (This is calculated individually for
Year, Residence, High School, and Major.) We give the standard error for each
coefficient in parentheses. All coefficients are statistically significantly different from zero with p-value less than $1 \times 10^{-4}$. Wellesley (22), Smith (60), and
Simmons (81) are female-only institutions, so we list the values for their Male
networks as NA.
61
Wellesley 22
Full
Student
Female
Male
Caltech 36
Full
Student
Female
Male
Williams 40
Full
Student
Female
Male
Amherst 41
Full
Student
Female
Male
Swarthmore 42
Full
Student
Female
Male
Oberlin 44
Full
Student
Female
Male
Middlebury 45
Full
Student
Female
Edges
5.5166(0.29946)
5.395(0.40299)
5.5395(0.47528)
NA
Edges
4.9776(0.0013776)
4.8284(0.001786)
4.5427(0.058123)
4.9734(0.033352)
Edges
5.3284(0.19432)
5.1347(0.24863)
5.3368(0.013971)
5.2602(0.014726)
Edges
5.0914(0.097866)
4.9092(0.071772)
5.0074(0.01569)
5.106(0.016455)
Edges
4.8312(0.17358)
4.698(0.011101)
4.7717(0.018696)
4.8247(0.019575)
Edges
5.3989(0.088183)
5.3757(0.67096)
5.4066(0.013259)
5.2834(0.016255)
Edges
5.5042(0.67137)
5.3837(0.77243)
5.5484(0.0067934)
Triangles
0.18714(0.00040795)
0.18873(0.00054665)
0.20854(0.00050963)
NA
Triangles
0.17766(1.64e 005)
0.1836(1.89e 005)
0.34325(0.0067684)
0.28127(0.0030727)
Triangles
0.14604(0.00031271)
0.16169(0.00043304)
0.28741(0.0012684)
0.26068(0.001206)
Triangles
0.12103(0.00030109)
0.12695(0.011275)
0.21904(0.0011268)
0.24842(0.0013)
Triangles
0.12423(0.016066)
0.12352(8.09e 005)
0.21474(0.0013858)
0.24087(0.001571)
Triangles
0.19739(0.015958)
0.21399(0.00056897)
0.38758(0.0016842)
0.39322(0.0021443)
Triangles
0.14939(0.07867)
0.15998(0.00039806)
0.29748(0.0011819)
Year
1.0432(1.1574)
0.89815(0.93197)
1.0713(0.70673)
NA
Year
0.99434(0.0014976)
0.89239(0.0017737)
1.0623(0.016542)
1.0405(0.036294)
Year
0.85073(0.0080651)
0.60271(0.47267)
0.89619(0.01105)
1.0773(0.0035795)
Year
0.88125(0.30594)
0.73901(0.18123)
0.98463(0.018087)
0.99941(0.019207)
Year
0.96422(0.26284)
0.85491(0.018375)
1.0746(0.00017492)
0.98215(0.022589)
Year
0.7668(0.42501)
0.68832(1.9864)
0.79641(0.097431)
0.8105(0.021123)
Year
0.98073(0.9038)
0.79027(1.1441)
0.92112(0.026494)
Residence
1.2079(0.014731)
1.262(0.83757)
1.2339(0.53312)
NA
Residence
1.1638(0.0010284)
1.2991(0.00098228)
1.3504(0.035112)
1.1781(0.039183)
Residence
1.1718(0.82606)
1.1717(0.94455)
1.3187(0.0024651)
1.1681(0.0051222)
Residence
0.88007(0.60964)
0.94128(0.9257)
0.98007(1.1477)
0.86681(0.006822)
Residence
0.79737(0.34465)
0.85656(0.034037)
0.93786(0.0024781)
0.6948(0.00082592)
Residence
1.1172(0.9797)
1.1047(2.2993)
1.2488(3.0612)
1.0634(0.027757)
Residence
0.61487(1.2033)
0.64183(0.9573)
0.56416(0.0037157)
High School
3.612(8.5012)
3.7139(8.9764)
3.6698(6.0111)
NA
High School
2.8536(0.087757)
3.0022(0.12434)
1.6776(0.099503)
3.3862(0.25271)
High School
3.5184(4.5405)
3.7627(15.1999)
3.6318(0.13512)
3.4581(0.044289)
High School
3.1539(2.531)
3.3902(3.3786)
3.3534(12.2931)
3.0624(0.12347)
High School
3.2278(11.7489)
3.5384(0.14695)
3.6738(0.38146)
2.8505(0.14226)
High School
3.5716(69.9269)
3.7576(66.9912)
3.7024(30.0234)
3.3725(1.3266)
High School
3.7714(8.0401)
4.0159(11.4279)
4.0895(0.086636)
Major
0.58573(0.01648)
0.44432(1.2761)
0.61932(1.858)
NA
Major
0.64673(0.0021013)
0.59894(0.001494)
0.64556(0.011212)
0.61173(0.17517)
Major
0.39443(0.42159)
0.38077(1.3158)
0.45421(0.0031961)
0.31836(0.0021898)
Major
0.5757(0.89419)
0.53866(0.93067)
0.5091(0.47295)
0.5913(0.0030018)
Major
0.63143(0.75281)
0.52548(0.014076)
0.49991(0.0025816)
0.61786(0.003426)
Major
0.4834(0.79777)
0.66727(3.0137)
0.37583(0.90965)
0.83047(0.018054)
Major
0.51794(5.4033)
0.47678(2.1974)
0.61937(0.020853)
62
Male
Hamilton 46
Full
Student
Female
Male
Bowdoin 47
Full
Student
Female
Male
Smith 60
Full
Student
Female
Male
USFCA 72
Full
Student
Female
Male
Haverford 76
Full
Student
Female
Male
Simmons 81
Full
Student
Female
Male
Vassar 85
Full
Student
Female
Male
Reed 98
Full
Student
Female
Male
Trinity 100
Full
Student
5.4012(0.014882)
Edges
5.1526(0.15758)
5.0475(0.20987)
5.1103(0.00025686)
5.2164(0.017572)
Edges
5.1231(0.49764)
4.9871(0.17599)
5.1156(0.00099468)
5.2312(0.00035892)
Edges
5.7499(0.46896)
5.6751(0.35105)
5.7559(0.14784)
NA
Edges
5.5339(0.08133)
5.4978(0.4816)
5.6942(0.013524)
5.2138(0.024502)
Edges
4.5864(0.17922)
4.5248(0.011488)
4.5842(0.018029)
4.5974(0.021694)
Edges
5.1447(0.011497)
5.0396(0.012814)
5.0919(0.012148)
NA
Edges
5.4365(0.042913)
5.3447(0.66641)
5.4876(0.01176)
5.2473(0.016541)
Edges
4.7342(0.014847)
4.6732(0.017455)
4.7287(0.028907)
4.4269(0.036284)
Edges
5.2594(0.50302)
5.144(0.82673)
0.27717(3.31e 006)
Triangles
0.13229(0.010524)
0.13542(0.00038928)
0.22978(0.0010554)
0.25379(0.0012891)
Triangles
0.12537(0.0003663)
0.13258(0.0004469)
0.2751(0.00149)
0.28383(0.0015608)
Triangles
0.23032(0.040735)
0.25538(0.00069)
0.28145(0.0054268)
NA
Triangles
0.21369(0.019896)
0.218(0.00060193)
0.31715(0.0013151)
0.4218(0.0034552)
Triangles
0.099312(0.00033645)
0.097998(0.00038937)
0.18037(0.0011641)
0.20668(0.0015226)
Triangles
0.2364(0.00096724)
0.23882(0.0001411)
0.23884(0.00075486)
NA
Triangles
0.16286(0.00033258)
0.1653(0.0004181)
0.31638(9.93e 005)
0.31715(0.0017542)
Triangles
0.19271(0.0010667)
0.20779(0.0013555)
0.34763(0.0037349)
0.38754(0.0057624)
Triangles
0.13124(0.067744)
0.13839(0.030441)
1.165(0.01032)
Year
0.89247(0.38942)
0.78719(0.39866)
0.91713(0.1191)
1.1662(0.019208)
Year
0.84602(0.44413)
0.73847(0.010887)
0.89048(0.0045874)
1.1377(0.00064391)
Year
1.0244(0.71782)
0.87631(0.61986)
1.0443(0.22894)
NA
Year
0.81903(0.31274)
0.77135(0.20243)
1.0311(0.016106)
0.71035(0.032585)
Year
0.88251(0.36604)
0.8307(0.012088)
0.83102(0.020477)
1.0335(0.024614)
Year
0.62361(0.015007)
0.52922(0.01664)
0.61268(0.0044493)
NA
Year
0.89224(0.36023)
0.78236(1.1368)
0.85905(0.013732)
1.0972(0.019096)
Year
0.89641(0.019018)
0.81768(0.021382)
0.97335(0.034508)
0.81151(0.047624)
Year
0.88149(1.0778)
0.66726(0.81806)
0.70103(0.023808)
Residence
0.60097(0.9291)
0.61503(0.85391)
0.6133(0.99652)
0.81602(0.0043879)
Residence
0.86002(1.7887)
0.9108(0.85058)
0.96132(0.015526)
0.81674(0.0047172)
Residence
1.318(1.7496)
1.4951(1.5255)
1.2561(0.53313)
NA
Residence
0.75232(0.48257)
0.75418(0.5606)
0.85946(0.031407)
0.74138(0.0012662)
Residence
0.4303(0.49797)
0.50822(0.027194)
0.45689(0.026914)
0.47377(0.00061681)
Residence
0.04641(0.022947)
0.017321(0.093667)
0.0096188(0.026243)
NA
Residence
1.0009(1.5575)
1.0869(0.73536)
1.0423(0.0013954)
1.0899(0.024127)
Residence
1.5839(0.41431)
1.586(0.040154)
1.7788(0.0037734)
1.2315(0.005725)
Residence
0.81391(1.9123)
0.86642(1.6612)
3.7024(0.051105)
High School
3.4533(2.764)
3.639(5.3808)
3.5653(6.2228)
3.6524(0.13302)
High School
3.5147(14.3023)
3.7404(2.129)
3.6624(0.068102)
3.7484(0.056753)
High School
4.3908(27.5879)
4.6639(26.1729)
4.466(6.4417)
NA
High School
3.3646(3.7542)
3.4592(2.2951)
3.5978(0.072184)
3.0314(0.020408)
High School
3.4762(5.3548)
3.6771(0.19927)
3.5413(0.15545)
3.5159(0.26688)
High School
3.6491(0.066845)
3.6926(0.16704)
3.6168(0.099064)
NA
High School
3.8325(13.4455)
4.1626(2.7852)
4.0763(0.073194)
3.5254(0.05049)
High School
3.4991(10.8338)
3.5945(0.18597)
3.3521(0.12521)
3.9308(0.15859)
High School
3.5938(3.3991)
3.7899(23.8254)
0.24567(0.0032737)
Major
0.57065(0.66635)
0.52289(1.3097)
0.71784(0.30967)
0.26404(0.0025362)
Major
0.53053(1.4671)
0.48614(0.94847)
0.62781(0.0045592)
0.405(0.0021074)
Major
0.95945(1.1995)
0.96959(1.0772)
0.96695(0.7678)
NA
Major
0.65908(0.40194)
0.61892(0.66365)
0.78521(0.0056399)
0.41939(0.00063143)
Major
0.68087(0.86129)
0.54286(0.016676)
0.60269(0.04109)
0.66119(0.0035633)
Major
0.95822(0.006254)
0.88137(0.012457)
0.94789(0.0039044)
NA
Major
0.76399(0.83096)
0.66064(1.1939)
0.77276(0.033551)
0.71203(0.0019611)
Major
0.94969(0.24672)
0.80753(0.04143)
0.8996(0.0035483)
0.96303(0.0026607)
Major
0.62169(1.5141)
0.57219(1.6865)
Female
Male
5.2108(0.014321)
5.3106(4.73e-006)
0.21239(0.00096177)
0.27532(0.0013436)
1.1086(0.016232)
1.1575(0.26773)
0.78131(0.0033131)
0.92286(7.0628)
3.8014(0.056335)
3.5167(22.874)
0.75326(0.0233)
0.53424(0.6823)
Major
Residence
Graduation Year
High School
37.6893
15.6741
14.8497
20.4034
42.7784
42.8019
70.6776
58.5508
15.3123
4.0649
36.7889
19.2677
3.0762
28.8843
4.912
1.4788
24.6517
11.011
13.9046
3.3216
15.0014
23.497
1404.4502
222.9566
945.8502
1523.4423
202.1448
1240.1219
882.3474
74.1988
558.2736
62.316
703.6264
168.4986
881.5186
421.0489
196.14
8.5967
481.3222
137.8101
13.7929
584.0597
45.6332
7.6852
315.4706
14.4551
30.5491
7.7892
301.8338
185.544
4.3858
4.9078
10.3043
1.6253
6.303
5.7107
2.9773
1.2637
6.338
33.9673
2.3049
1.0672
8.4277
6.0851
32.9283
13.2168
23.9067
6.484
13.0291
15.5217
9.9473
6.0026
14.6714
14.6714
46.2515
17.0962
78.8738
10.1971
11.0127
11.4586
17.6237
22.8679
13.517
13.517
707.9697
178.794
486.6029
105.5474
349.4409
105.1908
133.8587
135.8974
31.8319
31.8319
47.4424
19.1333
86.6653
24.1933
24.3501
31.4381
29.9162
30.2374
24.5193
24.5193
High School
17.7493
5.7939
7.8962
4.8342
17.8845
11.864
18.3328
2.6847
15.0646
12.443
11.3959
11.8733
5.4097
10.1694
4.2095
1.0234
12.0078
11.3447
6.1141
36.2596
14.1069
6.8457
5.0467
8.8687
9.9162
2.8402
4.7979
3.507
3.455
4.2153
3.4239
17.8818
31.3413
5.8206
10.9256
4.201
15.0616
9.9322
14.8522
10.5517
High School
27.0165
16.5656
8.67
6.5701
28.4218
20.272
20.272
6.6782
7.6147
3.2915
5.4715
12.0415
3.0992
3.6838
10.1691
7.6544
2.6807
22.4089
22.4089
9.2008
5.1958
11.5949
0.16511
12.6565
3.0583
26.3601
7.2293
29.9686
11.8998
21.2781
28.9704
44.2896
17.3298
26.0983
33.7839
20.0139
25.3283
17.9455
17.3751
24.5735
High School
30.7617
26.6722
12.8595
5.3753
28.2528
22.0401
6.2652
16.4774
13.2794
10.9341
14.1023
17.0108
14.6558
8.6329
15.1781
15.5454
24.8996
28.5275
16.7272
5.2523
6.7421
7.8417
4.9872
15.926
3.319
12.7389
4.3781
6.5484
1.9616
18.017
19.6318
16.1353
15.8778
4.9392
10.6129
24.4243
6.2027
21.766
14.0708
14.2972
5.8402
High School
7.2362
31.6267
2.542
26.1569
3.754
21.1789
1.6058
3.2152
11.5192
2.9285
5.6861
10.0082
10.8734
33.7045
1.0627
6.2484
2.0241
8.9735
14.576
8.087
0.90495
13.8896
9.5124
9.884
17.3355
23.1512
13.0267
21.6188
1.8708
3.4923
9.2816
40.0407
17.5707
18.8581
20.4126
25.7134
33.1935
2.8024
14.1117
33.2104
17.9475
High School
57.5358
16.8765
18.918
87.9151
24.9138
11.8712
21.5303
4.8722
20.804
27.9011
10.7809
15.1437
14.6514
14.2458
14.5701
11.7835
7.7714
2.2012
20.1586
19.8743
24.1509
11.7636
2.9423
0.46698
10.0502
2.5291
32.2286
3.6865
4.2397
6.3982
9.1869
18.5072
19.2879
20.1894
16.989
10.3499
1.956
0.40435
16.3225
5.8634
16.7394
High School
8.0883
4.1314
5.1582
7.768
25.5351
8.9041
21.7507
4.3133
30.7547
2.6253
3.8915
8.4787
2.4023
4.6039
7.29
20.8494
1.3604
23.3638
2.2797
16.5861
2.4749
0.48056
0.25577
16.7557
5.9009
2.9387
10.4414
7.3308
8.0472
12.6765
5.6521
23.247
5.16
3.0373
10.661
7.1149
11.7625
9.5636
14.928
2.8138
8.6466
High School
9.9511
15.2166
13.3183
25.4317
17.3038
10.7485
7.029
27.0522
21.7408
4.7456
23.0555
37.9331
13.9386
9.2169
11.9534
5.8434
9.4384
20.077
8.6492
12.0622
6.7378
5.2591
1.4486
2.4484
20.5374
2.3624
11.2319
3.7296
5.1717
3.5987
24.5577
29.1432
6.434
6.608
8.8552
6.4544
2.8517
5.0608
8.2776
17.7199
11.6094
High School
17.6171
2.5953
4.9854
5.0599
12.1288
4.5333
2.3453
4.5646
9.6638
16.4519
0.81406
15.1859
3.3778
5.1597
17.0405
3.1809
2.2627
8.2668
7.5727
33.3907
4.5936
19.4639
7.6875
2.2307
7.1634
30.5129
5.8084
20.4949
12.6391
16.7294
2.275
9.7889
11.7407
8.5448
12.8703
6.8439
14.6271
18.2025
6.0791
3.6724
2.8086
High School
23.536
5.2943
4.7514
5.1274
5.1027
6.3597
4.9773
7.3025
19.8062
7.3265
3.909
1.4625
5.5404
3.4246
13.7607
9.7353
6.4722
2.8667
3.9085
3.8178
27.9265
3.969
18.4576
6.9456
2.3759
6.4155
9.787
16.8436
7.2399
24.6824
2.576
5.028
1.6298
7.945
3.5432
2.9856
6.8069
2.741
13.2475
3.171
3.8611
High School
9.7034
7.8315
3.7436
28.4517
1.4911
1.3155
3.6043
References
Backstrom, L., Huttenlocher, D., Kleinberg, J., Lan, X., 2006. Group formation
in large social networks: Membership, growth, and evolution, in: Proceedings
of 12th International Conference on Knowledge Discovery in Data Mining. ACM
Press, New York, NY, pp. 44-54.
Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E., 2008. Fast unfolding
of communities in large networks. Journal of Statistical Mechanics: Theory and
Experiment 2008, P10008.
Boyd, D.M., 2007a. Viewing American class divisions through Facebook and
MySpace. Apophenia blog essay. June 24.
http://www.danah.org/papers/essays/ClassDivisions.html.
Boyd, D.M., 2007b. Why youth (heart) social network sites: The role of networked
publics in teenage social life, in: Buckingham, D. (Ed.), MacArthur Foundation
Series on Digital Learning - Youth, Identity, and Digital Media Volume. MIT Press,
Cambridge, MA, pp. 119-142.
Boyd, D.M., Ellison, N.B., 2007. Social network sites: Definition, history, and scholarship. Journal of Computer-Mediated Communication 13, 11.
Brandes, U., Delling, D., Gaertler, M., Goerke, R., Hoefer, M., Nikoloski, Z., Wagner,
D., 2008. On modularity clustering. IEEE Transactions on Knowledge and Data
Engineering 20, 172-188.
Brook, R.J., Stirling, W.D., 1984. Agreement between observers when the categories
are not specified in advance. British Journal of Mathematical and Statistical
Psychology 37, 271-282.
Brzozowski, M., Hogg, T., Szabo, G., 2008. Friends and foes: Ideological social networking, in: Proceedings of the SIGCHI Conference on Human Factors in Computing. ACM Press, New York, NY.
Callaghan, T., Mucha, P.J., Porter, M.A., 2007. Random walker ranking for NCAA
division I-A football. American Mathematical Monthly 114, 761-777.
Chin, A., Chignell, M., 2007. Identifying active subgroups within online communities, in: Proceedings of the Centre for Advanced Studies (CASCON) Conference,
Toronto, Canada.
Fortunato, S., 2010. Community detection in graphs. Physics Reports 486, 75-174.
Fortunato, S., Barthelemy, M., 2007. Resolution limit in community detection. Proceedings of the National Academy of Sciences 104, 36-41.
Fragoso, S., 2006. WTF a crazy Brazilian invasion, in: Sudweeks, F., Hrachovec, H.
(Eds.), Proceedings of CATaC 2006. Murdoch University, Murdoch, Australia.
Frank, O., Strauss, D., 1986. Markov graphs. Journal of the American Statistical
Association 81, 832-842.
Franklin, J.N., 2002. Methods of Mathematical Economics: Linear and Nonlinear
Programming, Fixed-Point Theorems. SIAM, Philadelphia, PA.
Fruchterman, T.M.J., Reingold, E.M., 1991. Graph drawing by force-directed placement. Software: Practice and Experience 21, 1129-1164.
Gajjala, R., 2007. Shifting frames: Race, ethnicity, and intercultural communication
in online social networking and virtual work, in: Hinner, M.B. (Ed.), The Role
of Communication in Business Transactions and Relationships. Peter Lang, New
York, NY, pp. 257-276.
Geidner, N.W., Fook, C.A., Bell, M.W., 2007. Masculinity and online social networks: Male self-identification on Facebook.com, in: Paper presented at Eastern
Communication Association 98th Annual Meeting. Providence, RI.
Girvan, M., Newman, M.E.J., 2002. Community structure in social and biological
networks. Proceedings of the National Academy of Sciences 99, 7821-7826.
Gjoka, M., Kurant, M., Butts, C.T., Markopoulou, A., 2010. Walking in Facebook:
A Case Study of Unbiased Sampling of OSNs, in: Proceedings of IEEE INFOCOM
'10, San Diego, CA.
Golder, S.A., Wilkinson, D., Huberman, B.A., 2007. Rhythms of social interaction:
Messaging within a massive online network, in: Steinfield, C., Pentland, B., Ack-
Hogan, B., 2009. A comparison of on and offline networks through the Facebook
API. Working paper, available at
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1331029.
Hogg, T., Wilkinson, D., Szabo, G., Brzozowski, M., 2008. Multiple relationship
types in online communities and social networks, in: Proceedings of the AAAI
Spring Symposium on Social Information Processing. AAAI Press.
Hubert, L., 1977. Nominal scale response agreement as a generalized correlation.
British Journal of Mathematical and Statistical Psychology 30, 98-103.
Kamada, T., Kawai, S., 1989. An algorithm for drawing general undirected graphs.
Information Processing Letters 31, 7-15.
Kernighan, B.W., Lin, S., 1970. An efficient heuristic procedure for partitioning
graphs. The Bell System Technical Journal 49, 291-307.
Krebs, V., 2008. Orgnet.com: Social network analysis software & services for organizations, communities, and their consultants. http://www.orgnet.com.
Kulinskaya, E., 1994. Large sample results for permutation tests of association.
Communications in Statistics - Theory and Methods 23, 2939-2963.
Kumar, R., Novak, J., Tomkins, A., 2006. Structure and evolution of online social
networks, in: KDD '06: Proceedings of the 12th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA,
pp. 611-617.
Kurant, M., Gjoka, M., Butts, C.T., Markopoulou, A., 2011. Walking on a graph
with a magnifying glass. arXiv:1101.5463.
Lampe, C., Ellison, N.B., Steinfield, C., 2007. A familiar Face(book): Profile elements
as signals in an online social network, in: Proceedings of Conference on Human
Factors in Computing Systems. ACM Press, New York, NY, pp. 435-444.
Lewis, K., Kaufman, J., Christakis, N.A., 2008a. The taste for privacy: An analysis of
college student privacy settings in an online social network. Journal of Computer-Mediated Communication 14, 79-100.
Lewis, K., Kaufman, J., Gonzalez, M., Wimmer, M., Christakis, N.A., 2008b. Tastes,
ties, and time: A new (cultural, multiplex, and longitudinal) social network dataset
using Facebook.com. Social Networks 30, 330-342.
Liben-Nowell, D., Novak, J., Kumar, R., Raghavan, P., Tomkins, A., 2005. Geographic routing in social networks. Proceedings of the National Academy of
Sciences 102, 11623-11628.
Lievrouw, L.A., Livingstone, S. (Eds.), 2005. The Handbook of New Media. Sage
Publications Ltd., London, UK. Updated student edition.
Looijen, A.H., Porter, M.A., 2007. Legends of Caltech III: Techer in the Dark.
Caltech Alumni Association, Pasadena, CA.
Lubbers, M.J., Snijders, T.A.B., 2007. A comparison of various approaches to the
exponential random graph model: A reanalysis of 102 student networks in school
classes. Social Networks 29, 489-507.
Mayer, A., Puller, S.L., 2008. The old boy (and girl) network: Social network
formation on university campuses. Journal of Public Economics 92, 329-347.
McPherson, M., Smith-Lovin, L., Cook, J.M., 2001. Birds of a feather: Homophily
in social networks. Annual Review of Sociology 27, 415-444.
Mucha, P.J., Richardson, T., Macon, K., Porter, M.A., Onnela, J.P., 2010. Community structure in time-dependent, multiscale, and multiplex networks. Science
328, 876-878.
Newman, M.E.J., 2003. Mixing patterns in networks. Physical Review E 67, 026126.
Newman, M.E.J., 2006a. Finding community structure in networks using the eigenvectors of matrices. Physical Review E 74, 036104.
Newman, M.E.J., 2006b. Modularity and community structure in networks. Proceedings of the National Academy of Sciences 103, 8577-8582.
Newman, M.E.J., 2010. Networks: An Introduction. Oxford University Press, Oxford, U.K.
Nyland, R., Near, C., 2007. Jesus is my friend: Religiosity as a mediating factor in Internet social networking use, in: Paper presented at AEJMC Midwinter
Conference. Reno, NV.
Onnela, J., Saramaki, J., Hyvonen, J., Szabo, G., Lazer, D., Kaski, K., Kertesz,
J., Barabasi, A.L., 2007. Structure and tie strengths in mobile communication
networks. Proceedings of the National Academy of Sciences 104, 7332-7336.
Porter, M.A., Mucha, P.J., Newman, M., Warmbrand, C.M., 2005. A network analysis of committees in the United States House of Representatives. Proceedings of
the National Academy of Sciences 102, 7057-7062.
Porter, M.A., Onnela, J., Mucha, P.J., 2009. Communities in networks. Notices of
the American Mathematical Society 56, 1082-1097, 1164-1166.
Rand, W.M., 1971. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66, 846-850.
Sodera, V., 2008. Rapleaf study reveals gender and age data of social network
users. Press release.
Wasserman, S., Faust, K., 1994. Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences, Cambridge University Press,
Cambridge, UK.
Wasserman, S., Pattison, P., 1996. Logit models and logistic regressions for social
networks: I. An introduction to Markov graphs and p*. Psychometrika 61, 401-425.
Waugh, A.S., Pei, L., Fowler, J.H., Mucha, P.J., Porter, M.A., 2009. Party polarization in Congress: A network science approach. arXiv:0907.3509.
Weisstein, E.W., 2011. Barycentric coordinates, in Wolfram Mathworld. Available
at http://mathworld.wolfram.com/BarycentricCoordinates.html.
Zhang, Y., Friend, A.J., Traud, A.L., Porter, M.A., Fowler, J.H., Mucha, P.J.,
2008. Community structure in Congressional cosponsorship networks. Physica A
387, 1705-1712.
Zheng, R., Provost, F., Ghose, A., 2007. Social network collaborative filtering.
Preprint (CeDED working paper).