Analysis and Visualization of Citation Networks
All content following this page was uploaded by Andreas Strotmann on 29 August 2015.
Editor
Gary Marchionini, University of North Carolina, Chapel Hill
Synthesis Lectures on Information Concepts, Retrieval, and Services publishes short books on
topics pertaining to information science and applications of technology to information discovery,
production, distribution, and management. Potential topics include: data models, indexing theory
and algorithms, classification, information architecture, information economics, privacy and iden-
tity, scholarly communication, bibliometrics and webometrics, personal information management,
human information behavior, digital libraries, archives and preservation, cultural informatics, in-
formation retrieval evaluation, data fusion, relevance feedback, recommendation systems, question
answering, natural language processing for retrieval, text summarization, multimedia retrieval,
multilingual retrieval, and exploratory search.
Digital Libraries Applications: CBIR, Education, Social Networks, eScience/Simulation, and GIS
Edward A. Fox, Jonathan P. Leidig
On the Efficient Determination of Most Near Neighbors: Horseshoes, Hand Grenades, Web
Search and Other Situations When Close is Close Enough
Mark S. Manasse
Theoretical Foundations for Digital Libraries: The 5S (Societies, Scenarios, Spaces, Structures,
Streams) Approach
Edward A. Fox, Marcos André Gonçalves, Rao Shen
The Future of Personal Information Management, Part I: Our Information, Always and Forever
William Jones
XML Retrieval
Mounia Lalmas
Faceted Search
Daniel Tunkelang
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quota-
tions in printed reviews, without the prior permission of the publisher.
DOI 10.2200/S00624ED1V01Y201501ICR039
Dangzhi Zhao
School of Library and Information Studies, University of Alberta, Canada
Andreas Strotmann
ScienceXplore, Bad Schandau, Germany
MORGAN & CLAYPOOL PUBLISHERS
ABSTRACT
Citation analysis—the exploration of reference patterns in the scholarly and scientific literature—
has long been applied in a number of social sciences to study research impact, knowledge flows,
and knowledge networks. It has important information science applications as well, particularly
in knowledge representation and in information retrieval.
Recent years have seen a burgeoning interest in citation analysis to help address research,
management, or information service issues such as university rankings, research evaluation, or
knowledge domain visualization. This renewed and growing interest stems from significant im-
provements in the availability and accessibility of digital bibliographic data (both citation and full
text) and of relevant computer technologies. The former provides large amounts of data and the
latter the necessary tools for researchers to conduct new types of large-scale citation analysis, even
without special access to special data collections. Exciting new developments are emerging this way
in many aspects of citation analysis.
This book critically examines both theory and practical techniques of citation network anal-
ysis and visualization, one of the two main types of citation analysis (the other being evaluative
citation analysis). To set the context for its main theme, the book begins with a discussion of the
foundations of citation analysis in general, including an overview of what can and what cannot be
done with citation analysis (Chapter 1). An in-depth examination of the generally accepted steps
and procedures for citation network analysis follows, including the concepts and techniques that are
associated with each step (Chapter 2). Individual issues that are particularly important in citation
network analysis are then scrutinized, namely: field delineation and data sources for citation analy-
sis (Chapter 3); disambiguation of names and references (Chapter 4); and visualization of citation
networks (Chapter 5). Sufficient technical detail is provided in each chapter so the book can serve
as a practical how-to guide to conducting citation network analysis and visualization studies.
While the discussion of most of the topics in this book applies to all types of citation analysis,
the structure of the text and the details of procedures, examples, and tools covered here are geared
to citation network analysis rather than evaluative citation analysis. This conscious choice was based
on the authors’ observation that, compared to evaluative citation analysis, citation network anal-
ysis has not been covered nearly as well by dedicated books, despite the fact that it has not been
subject to nearly as much severe criticism and has been substantially enriched in recent years with
new theory and techniques from research areas such as network science, social network analysis, or
information visualization.
KEYWORDS
citation analysis, citation network analysis, citation data sources, disambiguation in citation analysis,
visualization of citation networks, co-citation analysis, bibliographic coupling analysis, bibliometrics
Contents
Acknowledgment
Dedications
5.4.2 Conversion of Factor Analysis Results from SPSS to Pajek Network Format
5.4.3 Visualization with Loading Summaries as Node Sizes and Degree Coloring
5.4.4 Visualization with Node Sizes Reflecting Citedness
5.4.5 Visualization with Node Color Reflecting Factor Membership
5.4.6 Combining Pattern and Structure Matrix Visualizations
5.4.7 Fine-Tuning the Maps
5.4.8 Visualization of Bibliometric Networks without Factor Analysis
5.5 Concluding Remarks
Bibliography
Acknowledgment
The authors would like to thank Dr. Howard D. White for his encouragement, input, and feed-
back on our draft manuscript.
Dedications
Dangzhi Zhao would like to dedicate this book to her father who passed away when this book was
under revision, to her mother, and to her family. She feels fortunate and grateful to have loving
parents and family who provided compassion, care, and support during one of the most difficult
times in her life so that she was able to continue writing this book.
Andreas Strotmann would like to dedicate the book to his mother who passed away before
it started forming but always believed it would come one day.
CHAPTER 1
1.1 INTRODUCTION
Citation analysis is a well-known technique that has long been applied in a variety of research
fields to study, among other things, knowledge flows, the diffusion of ideas, intellectual structures of
science, relevance of information resources, and evaluation of researchers and research institutions.
Among the research fields that have employed citation analysis methods, sociology, history of
science, library and information science, management science, and research policy are the most
prominent. Together with citation indexing and citation linking, citation analysis also provides the
foundations for effective information retrieval that, applied to web links, was at the core of the
success of Google’s search engine.
Recent years have seen a burgeoning interest in citation analysis to help address various re-
search, management, or information service issues such as university rankings, research evaluation,
and knowledge domain visualization. This renewed interest results from the increasing availability of
digital citation data and computing power, which have made large-scale citation analysis studies possible,
and it has led to many exciting new developments in data sources, as well as techniques and
tools for citation data collection, analysis, and visualization.
This chapter introduces the concepts of citation and citation analysis, examines the assump-
tions underlying citation analysis, and provides an overview of what can be done with citation anal-
ysis (and why), as well as a discussion of strengths and weaknesses of citation analysis and cautions
required when applying citation analysis. Based on this overview, the scope and structure of this
book are then discussed at the end of this chapter.
Citation analysis deals with the study of these uses and relationships. Although individual
uses and relationships can be useful to examine, citation analysis mostly provides macro perspectives
through the use of large datasets, exploiting the consensus among a large number of citing authors
regarding the influence of and the relationships between scholars and scholarly works.
Based on the basic assumption underlying citation analysis that references indicate useful-
ness or relatedness, a number of different types of applications of citation analysis have been devel-
oped and employed over the years in the study of science and scholarly communication. The basic
assumption itself, however, has also been challenged in the literature. We will begin by discussing
applications of citation analysis before moving on to examine criticisms and challenges.
Here we need to be careful about the terminology. It is common for the term citation to be
used interchangeably for either “citation” or “reference,” with the context providing the meaning.
Similarly, the concept of how authors make references is called either citing behavior or referencing
behavior. When article A makes a reference to article B, it is often said that A cites or references
B, and B is cited by, receives a citation from, or is one of the “cited references” in A. In essence, a
reference from article A to article B is a citation received by B from A.
Both the terms “citation” and “reference” are of course also used in other contexts that do
not directly relate to citation analysis. In the field of library and information studies, for example,
the term “reference librarians” refers to librarians who answer users’ questions regarding the use of
the library and library resources; similarly, citation data or databases may mean the same as bib-
liographic data or databases, which normally only include information about the citing documents
such as title, author, abstract, etc., and may or may not include any information about their cited
references at all.
All these applications of citation analysis rely on the consensus among a large number of
citing authors regarding the influence of and relationships between scholars and scholarly works as
recorded in the reference lists of these authors’ publications. Some areas, however, e.g., applications
3, 4, and 5, also often involve the examination of individual reference links.
This section will review and discuss these applications of citation analysis, as well as some
other uses of citation analysis in the study of science and scholarly communication, including pat-
ent citation analysis for the study of innovations and sociometric studies of science and scholarly
communication.
World Universities (also known as the Shanghai Ranking), use citation analysis to measure research
impact as one of the indicators for ranking (THE Methodology, 2013; Liu and Cheng, 2005).
It is therefore imperative to address the various issues involved in evaluative citation analy-
sis that affect citation counts and, by extension, the fairness of citation-based research evaluation.
Although counting citations or calculating impact indicators derived from citation counts (e.g.,
h-index) is relatively simple, addressing the complicating factors (such as field differences in pub-
lishing and citing behavior and problems of citation data sources in coverage and indexing) is not,
and has therefore been a primary focus of research on evaluative citation analysis. For an in-depth
examination of these issues, readers are referred to Moed (2010), who provides a comprehensive
treatment of evaluative citation analysis. Discussions in some later chapters in this book will also
shed light on some of these issues, such as data sources for citation analysis, research field delinea-
tion, and counting collaborative works in citation analysis.
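To illustrate how simple the counting step itself is, the following Python sketch computes an h-index from a list of citation counts. The function name and the sample counts are hypothetical. The definition used is the standard one: the largest h such that h publications have at least h citations each. None of the complicating factors mentioned above (field differences, database coverage, indexing problems) are handled here.

```python
def h_index(citation_counts):
    """Return the largest h such that h items have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for one author's publications.
counts = [10, 8, 5, 4, 3, 0]
print(h_index(counts))
```

The difficult part in practice is obtaining comparable and clean citation counts to feed into such a function, not the calculation.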
visualizations together will then inform interpretive descriptions and explanations of the observed
structures and characteristics of the research fields and scholarly communities being studied, and
assist in the examination of their evolution and “in the making of inductive predictions of future
trends” when applied to a series of time periods (Borgman and Furner, 2002, p. 11).
Depending on the units of analysis (documents, or groups of them by authors, journals, re-
search field, nation, etc.) and the thresholds of citation scores, both macro-structures—overall maps
of the entire science endeavor with each node in the network representing a discipline—and mi-
cro-structures—structures of a single specialty with each node in the network representing a single
document—of science can be mapped and studied, allowing the user to get overviews of research
fields as well as to explore their underlying fine structures (Small, 1999b).
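The choice of unit of analysis can be pictured as a roll-up of document-level citation links to a coarser unit. The Python sketch below assumes a hypothetical mapping from documents to journals and a hypothetical list of (citing, cited) document pairs. It illustrates the idea only and is not a procedure prescribed in this book.

```python
from collections import Counter

# Hypothetical mapping from documents to the journals that published them.
journal_of = {"d1": "J1", "d2": "J1", "d3": "J2", "d4": "J3"}

# Hypothetical document-level citation links as (citing, cited) pairs.
doc_links = [("d1", "d3"), ("d2", "d3"), ("d3", "d4"), ("d1", "d2")]


def aggregate_links(links, unit_of):
    """Count citation links between aggregate units (here: journals)."""
    counts = Counter()
    for citing, cited in links:
        counts[(unit_of[citing], unit_of[cited])] += 1
    return counts


journal_links = aggregate_links(doc_links, journal_of)
for (source, target), n in sorted(journal_links.items()):
    print(source, "->", target, n)
```

Replacing `journal_of` with a mapping to authors, fields, or nations would give the corresponding coarser or finer network.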
There are three types of commonly used citation-based measures of the strength of the in-
terrelationship between two objects:
• inter-citation counts: the number of times two objects have cited each other
• co-citation counts: the number of documents that have cited two objects together, and
• bibliographic coupling frequencies (BCFs): the number of cited references that two ob-
jects have in common.
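The three measures can be sketched in a few lines of Python. The data structure (a dictionary mapping each citing document to the set of documents it cites) and all function names are assumptions made for illustration. Inter-citation is counted here at the document level as the number of citation links between two documents in either direction.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing document maps to the set of documents it cites.
references = {
    "A": {"B", "X", "Y"},
    "B": {"A", "X", "Z"},
    "C": {"X", "Y"},
    "D": {"X", "Y", "Z"},
}


def inter_citation_count(refs, a, b):
    """Number of citation links between a and b, counted in both directions."""
    return int(b in refs.get(a, set())) + int(a in refs.get(b, set()))


def co_citation_counts(refs):
    """For each pair of cited items, count the documents that cite both."""
    counts = Counter()
    for cited in refs.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


def bibliographic_coupling(refs, a, b):
    """Number of cited references that a and b have in common."""
    return len(refs.get(a, set()) & refs.get(b, set()))


cocit = co_citation_counts(references)
print(inter_citation_count(references, "A", "B"))    # A cites B and B cites A
print(cocit[("X", "Y")])                             # documents citing both X and Y
print(bibliographic_coupling(references, "C", "D"))  # references shared by C and D
```

Co-citation is computed from the point of view of the cited items, whereas bibliographic coupling is computed from the point of view of the citing documents.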
Analyses using these measures are called inter-citation analysis, co-citation
analysis, and bibliographic coupling analysis, respectively. Each type of analysis can employ any one
of a number of different counting and weighting schemes. For example, the BCF of two articles
can be counted simply as the number of items that appear in both articles’ reference lists, or their
cited references can be weighted by how many times they are cited in the texts. Such counting and
weighting schemes will be discussed in detail in Chapter 2.
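The difference between a simple and a weighted count can be sketched as follows. The in-text mention counts are hypothetical, and the weighting rule used here (each shared reference contributes the smaller of its two mention counts) is only one assumed possibility, not necessarily a scheme treated in Chapter 2.

```python
# Hypothetical data: document -> {cited reference: number of in-text mentions}.
mentions = {
    "P1": {"R1": 3, "R2": 1, "R3": 2},
    "P2": {"R1": 1, "R3": 4, "R4": 1},
}


def bcf_binary(m, a, b):
    """Count each shared reference once."""
    return len(m[a].keys() & m[b].keys())


def bcf_weighted(m, a, b):
    """Weight each shared reference by the smaller of its two mention counts."""
    shared = m[a].keys() & m[b].keys()
    return sum(min(m[a][r], m[b][r]) for r in shared)


print(bcf_binary(mentions, "P1", "P2"))
print(bcf_weighted(mentions, "P1", "P2"))
```

With these toy numbers the two articles share R1 and R3, so the simple count is 2, while the weighted count also reflects how heavily each article relies on the shared references.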
Among the three types of analyses, co-citation analysis is the most commonly used tech-
nique. It is generally accepted that the co-citation concept was discovered independently by Small
(1973) and Marshakova (1973), and that document co-citation analysis was introduced by Small
(1973) and author co-citation analysis by White and Griffith (1981). Many co-citation analysis
studies have been conducted since. They either refine the techniques (Small, 1974; Shaw, 1985;
Zhao and Strotmann, 2008a), explore the application of co-citation analysis in studying various
research areas and in answering various research questions (Small, 1977, 1981; White, 1983; Mc-
Cain, 1984), or discuss limitations of the techniques (Sullivan et al., 1977; Hicks, 1987). Recent
years have also seen studies of the application of advanced scientific visualization technology in
co-citation mapping to dynamically present maps of science (see Small, 1999b and Boyack et al.,
2005 for a good review). As a result, co-citation analysis has developed into a well-known litera-
ture-based technique for studying the intellectual structure of scholarly fields and the characteristics
of scholarly communities.
6 1. FOUNDATIONS OF CITATION ANALYSIS
The assumptions underlying citation network analysis are (1) when two documents cite each
other often, are frequently cited together, or have many references in common, then this indicates
that these two documents are related, that is, generally perceived to be similar in subject matter
or methodological approach; and (2) the more frequently two documents cite each other, are co-
cited, or the more references two documents have in common, the more closely they are related
(Borgman, 1990; White, 1990). Unlike those for evaluative citation analysis, these assumptions are
generally valid and have rarely been challenged: even when parties who are affected by
citation-based research evaluation game the system by manipulating their reference lists in their
favor, the less relevant citations they add will still be related to the citing documents.
the development of various research schools in psychology, sociology, and political science.
McCain and Salvucci (2006) examined the relative use of concepts in Brooks’ Mythical
Man-Month (Brooks, 1975) across five time periods and 15 subject areas ranging from
the “home” discipline of software engineering to areas in the social sciences, humanities,
and law. Sarafoglou and Paelinck (2008) used citation data to study the diffusion of the
concept/field of “spatial econometrics” by means of, in part, the temporal and subject distri-
bution of citations to the key book in the field—Spatial Econometrics: Methods and Models
(Anselin, 1988). Garfield (1985) reported “over 80 specialties and disciplines” citing Price’s
Little Science, Big Science (Price, 1963). In a more theoretical vein, Van der Veer Martens
and Goodrum (2006) discuss the use of citation content and context analysis along with
other assessment approaches to model the diffusion of eight theories in the social sciences.
They suggest a typology of citation function but do not consider citing subject breadth.
the library’s collections, and results can be used to measure the extent to which library collections
meet the needs of its users (Kayongo and Helm, 2012).
Also by examining uses of scholarly information indicated by citations, the interactions and
interdisciplinarity of disciplines, fields, journals, institutions, or authors’ oeuvres can be assessed
(Zitt and Bassecoulard, 2006; Bassecoulard et al., 2007). For example, Huang and Chang published
two articles in 2012 that classified citations made in LIS journals during the period of 1978–2007
in terms of their disciplines in order to study the interdisciplinary characteristics of LIS (Chang
and Huang, 2012; Huang and Chang, 2012). They found that LIS articles have cited documents
from across 30 disciplines, and the degree of interdisciplinarity of LIS measured by a number of
citation-based indicators was high. This type of citation analysis is related to, but distinct
from, the study of knowledge flows discussed in the previous section.
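As a rough illustration of what a citation-based indicator of interdisciplinarity might look like, the Python sketch below summarizes a list of references that have already been classified by discipline. The two indicators shown (the number of distinct disciplines cited and the share of references outside the home discipline) and the sample data are assumptions chosen for simplicity. They are not claimed to be the indicators used in the studies cited above.

```python
from collections import Counter


def discipline_profile(cited_disciplines, home):
    """Summarize how a set of classified references is spread across disciplines."""
    counts = Counter(cited_disciplines)
    total = sum(counts.values())
    outside = total - counts.get(home, 0)
    return {
        "n_disciplines": len(counts),
        "outside_share": outside / total if total else 0.0,
    }


# Hypothetical discipline labels for the references made by a set of LIS articles.
cited = ["LIS", "LIS", "Computer Science", "Sociology", "LIS", "Computer Science"]
profile = discipline_profile(cited, home="LIS")
print(profile)
```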
or combination of these, and retrieve all publications written by them, cited by them, and/or citing
them, which is a very effective method of information retrieval.
Citation databases also support easier and larger-scale collections of data for citation analysis.
The results of citation analyses can in turn further assist information organization, representation,
and retrieval.
For example, evaluative citation analysis results can help retrieve highly influential docu-
ments or publications by influential players (authors, institutes, countries, etc.), while citation net-
work analysis results can facilitate an understanding of the structure of the research field and the
relationships between concepts, documents, or authors. This understanding in turn helps users with
query expansion and search refinement, and supports visual browsing interfaces to information
retrieval systems (Chen, 1999; Chen et al., 1998a; Chen et al., 1998b; Ding et al., 2000; Lin et
al., 2003; Strotmann and Zhao, 2008).
The two largest citation databases, the ISI databases by the Institute for Scientific Information
(now part of Thomson Reuters’ Web of Science) and Scopus by Elsevier, have demonstrated
the value of incorporating citation analysis results into information retrieval systems by providing
impact indicators (e.g., citation counts, h-index, and journal impact factor) for articles, authors, and
journals, calculated from evaluative citation analyses of data in the corresponding citation databases.
Search results can be ranked by impact indicators, allowing users to focus on high impact sources.
They also provide related documents based on bibliographic coupling analysis. The ISI databases
also provide a visual representation of citation links both backward and forward, allowing users to
follow these links to retrieve needed information.
Because citation analysis can identify key concepts, documents, authors, and their relation-
ships, studies have also explored the use of citation analysis methods to supplement traditional
manual methods of knowledge organization with automatic summarization, categorization, and
thesaurus construction and maintenance (Chen et al., 2010; Fiszman et al., 2009; Sparck-Jones,
1999; Schneider and Borlund, 2004). As Birger Hjørland (2013, p.1) points out, “the main dif-
ference between traditional knowledge organization systems (KOSs) and KOSs based on citation
analysis is that the first group represents intellectual KOSs, whereas the second represents social
KOSs” as they are based on the collective views of a large number of citing authors regarding rela-
tionships between documents or their authors.
With the amount of available information increasing dramatically and sometimes chaotically,
especially on the Web, it has been and will continue to be of great importance to explore appropri-
ate ways of organizing and searching information there. Citation analysis principles provide unique
and effective ways of enhancing information organization, representation, searching, and browsing.
A good example of this is the success of the Google Web search engine, which applies an algorithm
that has close ties to citation analysis to focus on resources that are both high quality and relevant
to users’ information needs (Brin and Page, 1998).
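The kinship between link-based ranking and citation counting can be made concrete with a simplified PageRank-style calculation, in which a node is ranked highly if it is linked to (cited) by other highly ranked nodes. The Python sketch below is a textbook power iteration under stated assumptions: the function name, the damping factor of 0.85, the fixed number of iterations, and the toy graph are all illustrative, and the code is not a description of any production search engine.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each node to the list of nodes it links to (cites).
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Pass this node's rank on in equal shares to the nodes it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing links spreads its rank over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: A and B cite C, and C cites D.
links = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = pagerank(links)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

In this toy graph D is cited only once, but by the well-cited C, and therefore ends up ranked above C, which a plain citation count would not show.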
The value of citation analysis in sociometric studies of science and scholarly communication
Although the present book will not examine sociometric analyses, the value of citation analysis
in sociometric studies of scholarly communication will be briefly discussed here to show the full
power of citation analysis.
1.4 EVALUATION OF CITATION ANALYSIS
Most types of citation analysis are informed by Merton’s normative view of science (Griffith,
1990; MacRoberts and MacRoberts, 1989; Edge, 1979; Cronin, 1984; Peritz, 1992), which sees
science as a social activity governed by a set of norms. These norms include universalism (the imper-
sonality of science), communism (scientific knowledge is treated as a common good communicated
and distributed freely), disinterestedness (“science for science’s sake” [Cronin, 1984, p. 17]), and orga-
nized skepticism (new knowledge claims are evaluated critically and objectively based on empirical
or theoretical evidence (Merton, 1942)). Citation is considered to be a serious activity of science
and therefore citation behavior is also governed by a set of norms and values. These norms and
values require authors to cite the works that have influenced them in the development of current
papers in order to give credit where credit is due. Although it may not always be clear to them why they
cite certain works at certain times and how citations are related to the ideology of science—“the
norms and values presupposed in the conduct of science” (Trancy, 1980, p. 191)—authors share “a
tacit understanding of how and why they should acknowledge the works of others” (Cronin, 1984).
The normative view of science is compatible with the assumptions underlying citation anal-
ysis, and therefore makes it possible to conduct valid citation analysis.
However, it has been observed by many studies that scientists’ behavior does not always
adhere to the norms, and that, in terms of citation behavior, various reasons and motivations for
citing do exist—some normative, some egotistical. A number of articles have reviewed these studies,
including Bornmann and Daniel (2008), Cronin (1984), Liu (1993), Nicolaisen (2007), and White
(2010a).
The observed departure of scientists’ behavior from the norms and the existence of egotistical
citations do not invalidate citation analysis for several reasons.
First, the failure of scientists to observe norms strictly does not necessarily mean a violation
of norms. Norms are standards “that are not rigidly defined or precisely restricted to a single specific
behavior. They are far too deeply embedded to be easily legislated into a code of ethics for science
or to be taken out for daily discussion and assessment. Private and consensual discomfort is the
usual response to violations of norms and is also important indicators of their presence” (Griffith,
1990, p. 35).
Second, most scholars do adhere to the norms, and citation analysis is based on the collective
perceptions of citing authors. As Small (1976) observes, “the reasons and motivations for citing
appear to be as subtle and as varied as scientific thought itself, but most references do establish
valid conceptual links between scientific documents” (p. 67). Individual citations may be made for
various reasons that do not conform to the norms (“egotistical citations” in Borgman and Furner’s
(2002) words), but the number of such citations is not likely to become large enough to influence
conclusions of citation analysis because most subsequent writers do not recurrently see the same
influence or relation implied by such citations (White, 1990). Therefore, the accrual of citations or
co-citations indicates a consensus among a large number of citing authors regarding the influence
of and the relationships between scholars and scholarly works. Citation analysis, which is concerned
with “achieving a macro perspective on scholarly communication process through the use of volu-
minous datasets” (Borgman, 1990, p. 26), relies on this consensus to draw conclusions in evaluation
of scholarly contributions and in mapping of intellectual structures, rendering the “psychological
approach” (White, 1990) that is concerned with the motives and purposes of individual citations
largely irrelevant.
Third, numerous validation studies of citation analysis provide evidence that the assumptions
underlying citation analysis are statistically valid. There are many empirical studies that test and
verify the validity of citation analysis by various methods. Garfield (1979, p. 241) mentions several
validation studies of citation analysis as an evaluation tool in his book Citation Indexing, including
Carter (1974), Bayer and Folger (1966), and Virgo (1977), which show high correlations between
citation counts and peer judgments, a widely accepted way of ranking scientific performance. White
(1990, pp. 101–102) summarizes some validation studies of co-citation analysis including Mullins
et al. (1977), Sullivan et al. (1980), and Sullivan et al. (1977), which established the usefulness of
article co-citation mapping despite its limitations; and Keen (1987), Lenk (1983), McCain (1986),
White and Griffith (1981), and White (1983), which validate results from author co-citation anal-
ysis using various validation approaches. McCain (1986) categorizes validation studies of co-cita-
tion results by validation methods used, showing that most studies demonstrate a high correlation
between results from citation analysis and those from other sources, although in some cases a lack
of correlation was observed. Borgman (1990) stresses the importance of comparing the research
objectives or motives when comparing results from citation analysis and those based on other types
of data such as sociometric data and interview data in validity studies. In many cases, the lack of
correlation between the results is because they are measuring different domains of scholarly com-
munication (formal vs. informal), or they are looking at the same phenomena at different levels
(micro-level vs. macro-level, or “ground level” vs. “aerial view”) or different time points (citation
analyses reveal pictures of several years back due to the lag in publication, while interviews provide
current pictures) (White, 1990, pp. 91, 100).
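A validation of the kind described above typically comes down to correlating two rankings of the same set of scholars or works. The Python sketch below computes a Spearman rank correlation between citation counts and peer ratings. The choice of Spearman's coefficient is an assumption made for illustration, and the sample figures are invented toy values, not results from any of the studies cited.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical toy data for five scholars.
citation_counts = [120, 45, 60, 10, 5]
peer_ratings = [9, 6, 7, 4, 5]
print(round(spearman(citation_counts, peer_ratings), 3))
```

A coefficient close to 1 would indicate that the two sources rank the scholars in nearly the same order. As noted above, a low value may simply mean that the two sources capture different domains, levels, or time points of scholarly communication.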
Citation analysis is not only valid but also has high reliability because the data can be col-
lected unobtrusively from readily accessible published records of scholarly communication and thus
can be easily replicated by others. According to Borgman (1990, p. 25), reliability problems “gener-
ally can be identified and corrected by careful researchers,” although they do exist in individual data
sources (Moed and Vriens, 1989; Rice et al., 1989).
(Osareh, 1996). Defenses (notably Garfield, 1979; White, 1990) have focused on the irrelevance
of the (individual-scale) psychological approach to (large-scale statistical) citation analysis and
on the illogic of “quarrelling with imaginary opponents” (White, 1990, p. 91). The following is a
brief discussion of these critiques and defenses. Detailed discussions can be found in the studies
referenced above and in review articles on bibliometrics or citation analysis such as White and
McCain (1989), White (2010a), and Nicolaisen (2007).
Critics of the assumptions either have mixed up the “aerial” and “ground-level” views of cita-
tions as discussed above, or are quarrelling with an imaginary opponent (White, 1990, p. 91). They
claim that citation analysis researchers have made certain assumptions that are problematic, but in
fact the assumptions are rarely found in citation analysis studies (Borgman, 1990; White, 1990).
They question some other assumptions based on the existence of individual egotistical citations,
missing the view that citation analysis is meant for large datasets and macro perspectives, where
a small number of instances of individual misconduct is mere statistical noise to be filtered out by statistical
means.
For example, although studies (e.g., Mullins et al., 1977; Small, 1977; McCain, 1986) show
that personal communication ties often do exist among frequently co-cited authors and that the
structure of the literature is congruent with the social structure of the field producing it, citation
researchers do not take this as a given; instead, they only assume that the relationship is “generally
perceived similarity of subject or methodological approach in published and cited works,” and stress
that any social relationships that may exist among highly co-cited authors must be established independently
(White, 1990, p. 96). The only assumption underlying evaluative citation analysis that Garfield, the
inventor of the citation index and of citation analysis, made in his monograph on citation indexing theory
and application is that citation counts represent the perceived utility or impact of scientific work as
determined by the corresponding scientific community (Garfield, 1979).
The problems with the sources of citation data include those that are characteristic of all
sources of citation data and those introduced by using citation databases. Some of the former in-
clude the difficulties in counting citations caused by homonyms (two or more different individuals
having the same name), allonyms (a single individual having more than one name), implicit cita-
tions, self-citations, and errors in citations. Some of the latter include the limited and biased cov-
erage of citation databases and the problems caused by inadequate indexing of cited references (see
Smith, 1981 and MacRoberts and MacRoberts, 1989, for detailed discussions of these problems).
As an imperfect method, citation analysis does suffer from the problems of sources of citation
data. Even Garfield admits these problems while he refutes almost all the critiques of the validity
of citation analysis in his systematic examination of citation analysis as an evaluation tool (Garfield,
1979). However, remedies often can be used to correct the data. For example, two solutions for
distinguishing individuals in the case of homonyms are proposed by Garfield (1979, pp. 243–244):
examining the titles of the journals in which the cited work and the citing work were published,
and obtaining a complete bibliography of the individual being evaluated. Various other methods
have also been suggested, such as using author affiliation information to reduce problems caused
by homonyms and allonyms, and using multiple or alternative data sources to alleviate problems
introduced by individual citation databases (Zhao and Logan, 2002; Zhao and Strotmann, 2014b).
In fact, recent years have seen an increased interest in addressing problems in citation data with the
advances in text processing and other technologies (e.g., Boyack et al., 2013; Ding et al., 2013; Hou
et al., 2011; Jeong et al., 2014; Zhu et al., 2014). We therefore include separate chapters on citation
data sources (Chapter 3) and on name disambiguation (Chapter 4).
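
The affiliation-based remedy mentioned above can be illustrated with a minimal Python sketch. It is not the method of any of the cited studies: the record layout, the helper names `author_key` and `group_records`, and the sample data are all assumptions made for illustration. The sketch merges allonyms by reducing each name to surname plus first initial, and separates homonyms by adding the affiliation to the key.

```python
from collections import defaultdict

def author_key(name, affiliation):
    """Build a crude identity key: surname + first initial + affiliation."""
    surname, _, given = name.partition(",")
    initial = given.strip()[:1].upper()
    return (surname.strip().lower(), initial, affiliation.strip().lower())

def group_records(records):
    """Group (name, affiliation, title) records by the crude identity key."""
    groups = defaultdict(list)
    for name, affiliation, title in records:
        groups[author_key(name, affiliation)].append(title)
    return dict(groups)

# Hypothetical records, invented for illustration only.
records = [
    ("Smith, J.", "University A", "Paper 1"),
    ("Smith, John", "University A", "Paper 2"),  # allonym of the first record
    ("Smith, J.", "University B", "Paper 3"),    # homonym, assumed to be a different person
]
groups = group_records(records)
```

A key this simple will wrongly split an author who changes institutions and wrongly merge two namesakes at the same institution, which is one reason multiple data sources and more careful disambiguation are usually needed.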
In summary, regardless of the problems in citation data and the existence of egotistical cita-
tions, citation analysis has been demonstrated to be a unique and valid method for evaluating schol-
arly contributions and for studying intellectual structures. Garfield (1979, p. 250) considers citation
analysis “a valid form of peer judgment that introduces a useful element of objectivity into the eval-
uation process and involves only a small fraction of the cost of surveying techniques.” Arunachalam
(1998, p. 142) stresses that “citation analysis is an imperfect tool but which one could still use with
some caveats to arrive at reasonable conclusions of different levels of validity and acceptability.” It
is generally accepted that citation analysis is most useful when it is used in combination with other
methods such as interviews, surveys, and sociometric studies, and for people who are knowledgeable
in the fields being studied (Borgman, 1990; Garfield, 1979).
2. Citation analysis has a strong dependence on subject experts—people who are knowl-
edgeable in the fields being studied by citation analysis—in interpretations of results
as well as in research field delineation.
3. Citation analysis results are only as good as the analyst’s choice of authors or docu-
ments being analyzed as well as the analytical tools used.
4. Citation analysis results are never “up to the minute” because it takes time for the
documents it analyzes to be published, and the cited references they include are even older.
5. Writing, citing, and publishing behavior varies significantly with research fields and
scholarly communities (e.g., mathematics vs. biomedicine), making cross-field com-
parisons difficult, especially in research evaluation.
6. Citation analysis is only applicable in the study of formal aspects of scholarly commu-
nication represented in research publications, and when inferring informal communi-
cation ties and social relationships from the formal communication structures revealed
by citation analysis, other types of data are often required to confirm or further study
the relationships inferred.
7. Citation counts and the scores derived from them measure the impact rather than the
quality of the research cited, which limits their usefulness in evaluating scholarly
contributions. While impact is closely related to quality, citation impact can be affected by
many other factors. For example, review articles tend to have higher citation impact
simply because their wider coverage makes them relevant to more articles than articles
reporting individual studies; and research that is easy to understand and follow tends
to be cited more than more difficult research simply because ease of comprehension is
a major determinant of popularity.
It is important to take advantage of the strengths of citation analysis and to work around its
limitations when designing a citation analysis study and when interpreting its results.
Problems of citation data and citation databases need to be addressed and alleviated as
much as possible by, e.g., going beyond citation databases when collecting data and performing
disambiguation in processing citation data. Subject experts should be consulted as much as possible
throughout the process, especially for field delineation and interpretation of results. Field-normal-
ized indicators should be devised and used when making comparisons across research (sub)fields
(e.g., percentiles and citation counts normalized by field average), and research fields should be
carefully delineated.
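
The two field-normalized indicators named above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the `(paper_id, field, citations)` record layout, and the sample numbers are hypothetical, and the percentile convention used here (share of same-field papers cited no more often than the given paper) is only one of several in use.

```python
from collections import defaultdict
from statistics import mean

def field_normalized_scores(papers):
    """Divide each paper's citation count by the mean count of its field."""
    by_field = defaultdict(list)
    for _, field, cites in papers:
        by_field[field].append(cites)
    field_mean = {f: mean(c) for f, c in by_field.items()}
    return {pid: (cites / field_mean[field] if field_mean[field] else 0.0)
            for pid, field, cites in papers}

def percentile_rank(papers, paper_id):
    """Percentage of same-field papers cited no more often than the given paper."""
    field, cites = next((f, c) for pid, f, c in papers if pid == paper_id)
    peers = [c for _, f, c in papers if f == field]
    return 100.0 * sum(c <= cites for c in peers) / len(peers)

# Hypothetical data: raw counts differ tenfold between the two fields.
papers = [
    ("m1", "mathematics", 2), ("m2", "mathematics", 4), ("m3", "mathematics", 6),
    ("b1", "biomedicine", 20), ("b2", "biomedicine", 40), ("b3", "biomedicine", 60),
]
scores = field_normalized_scores(papers)
```

In this invented example the most cited mathematics paper and the most cited biomedicine paper receive the same normalized score even though their raw counts differ by a factor of ten. The result is only as meaningful as the field assignment of each paper.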
In fact, every step of citation analysis needs to be carefully designed and thought out, from
field delineation, through selection of objects (e.g., authors, documents) being examined and of
analytical tools used, to analyzing results and drawing conclusions (see Chapter 2 for details).
For example, when applying statistical procedures and visualization tools (e.g., MDS) to
show high-dimensional relationships between objects in two or three dimensions, information is
lost and pictures may be distorted; one needs to be careful when drawing conclusions about the
relationship between two objects from their positions on a two-dimensional map, because being
close to each other may be an artifact of the tools and procedures. Only features that remain stable
with different algorithms, procedures, or tools can be used to draw conclusions. For example, in a
factor analysis of co-citation data with an oblique rotation, usually the layout of the visual repre-
sentation of the structure matrix remains stable while that of the pattern matrix changes with each
redrawing of the map using Pajek; the structure map was therefore used to study the interrelation-
ships of authors and author groups, and the pattern matrix was only used to show the grouping of
authors (Zhao and Strotmann, 2008a; 2008b; 2008c; 2014a).
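
One simple way to apply the stability criterion is to compare several two-dimensional layouts of the same objects and keep only the proximity relations that appear in all of them. The Python sketch below is an illustration under stated assumptions, not the procedure used in the studies cited above: the helper names and the coordinates are hypothetical, and "close" is defined as being among an object's k nearest neighbors so that the check does not depend on the scale of each map.

```python
from math import dist

def nearest_neighbor_pairs(layout, k=1):
    """Return unordered pairs {a, b} where b is among a's k nearest neighbors."""
    pairs = set()
    for a, pos_a in layout.items():
        others = sorted((dist(pos_a, pos_b), b)
                        for b, pos_b in layout.items() if b != a)
        for _, b in others[:k]:
            pairs.add(frozenset((a, b)))
    return pairs

def stable_pairs(layouts, k=1):
    """Keep only the neighbor pairs that appear in every layout."""
    pair_sets = [nearest_neighbor_pairs(layout, k) for layout in layouts]
    return set.intersection(*pair_sets)

# Two hypothetical 2-D layouts of the same four authors, e.g., from two runs or tools.
layout_a = {"A": (0, 0), "B": (1, 0), "C": (5, 5), "D": (6, 5)}
layout_b = {"A": (0, 0), "B": (0, 2), "C": (1, 0), "D": (9, 9)}
stable = stable_pairs([layout_a, layout_b])
```

Here C and D are neighbors in the first layout only, so their closeness would not be interpreted; only the pair A and B survives in both layouts.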
Problematic assumptions and overgeneralization in conclusions should be avoided, such as
attempts to equate citation impact to research quality, intellectual connections to social relation-
ships, or one view gained from current data and methods to the view of the field being studied.
For example, citation analysis using data from the ISI databases cannot provide a fair comparison
between North America and other parts of the world because the coverage of these databases is
biased in favor of English-speaking countries; hence the many criticisms of the Times Higher
Education (THE) university rankings, in which the top-ranked universities have always been
American or British (Rauhvargers, 2011; van Leeuwen et al., 2001). Similarly, while the overall
structure of a research field under citation analysis remains robust, details do change depending
on how the field is delineated and how citations are counted, suggesting that citation analysis
should only be used to obtain aerial views of research fields (Leydesdorff, 2008; Zhao, 2009).
A fuller understanding of the above-mentioned approaches to working around the limitations of
citation analysis can be gained from discussions in later chapters on citation data sources, field
delineation, disambiguation, citation counting of collaborated works, and visualization of citation
networks.
1.5 RELATED FIELDS
The term “bibliometrics” is used interchangeably with “scientometrics” and “informetrics,” although
the three differ slightly in scope and focus. Webometrics (or Cybermetrics), which emerged with the
Web in recent years, applies citation analysis and other well-established principles and techniques
from bibliometrics, often with modifications, to the study of the characteristics and link structures
of the Web. For example, Ingwersen (1998) introduced Web Impact Factor as a criterion for the
evaluation of websites just as the Journal Impact Factor has been used for the evaluation of journals;
sitation analysis, coined by Rousseau (1997), is a Web-based counterpart of citation analysis, which
considers hyperlinks to and from other websites as “bibliographical citations” in traditional citation
analysis; classic bibliometric laws, such as Bradford’s Law, Lotka’s Law, and Zipf’s Law, have also
been tested on the Web (Cui, 1999; Egghe, 2000; Rousseau, 1997). White (2010b) provides a gloss
on the differences among bibliometrics, scientometrics, informetrics, and webometrics.
Bibliometrics and Webometrics also interact with network science and Web science, which
have emerged recently (Börner et al., 2008; Zhao and Strotmann, 2014a). Network science is a
multi-disciplinary research field concerned with the analysis of all types of large and complex
systems that can be modeled as networks. Citation analysis (especially citation network analysis) and
Webometrics have been substantially influenced by network science and have influenced it in turn,
as seen from leading researchers in these fields citing each other. The interdisciplinary field of Web Science
aims to consolidate a wide range of research views of the Web—both as a communication tech-
nology and as a complex system of social and cognitive spaces which emerge from its ubiquitous
presence. The study of the social Web is naturally part of Web Science, as well as of Webometrics.
Citation analysis is related to content analysis and discourse analysis through citation context
analysis, which examines the context in which each citation is made in the text (White, 2010a;
Zhang et al., 2013). Citation network analysis has applied techniques from social network analysis,
as well as from co-word analysis and text mining. While some social network analysis techniques
that have been applied in citation network analysis will be discussed in later chapters, readers are
referred to other resources for a thorough treatment of social network analysis as applied to schol-
arly communication networks and to information science (e.g., Börner et al., 2008; Otte et al., 2002;
White, 2011).
networks) are covered in detail, but those specific to other types of citation analysis (e.g., research
evaluation) are not.
This choice was based on the following considerations:
• Although evaluative citation analysis has attracted much of the attention and money,
there have been many criticisms of this type of citation analysis, and many of the issues
involved are difficult to address. Citation network analysis, on the other hand, has not
been criticized much at all, and has started to gain increased attention in recent years
as large-scale citation network analysis has become increasingly feasible, interesting,
and important with the emerging disciplines of network science and Web science.
• Citation network analysis has been applying new techniques from several research
areas such as network science, social network analysis, and information visualization.
By contrast, evaluative citation analysis, which essentially ranks documents, authors,
institutions, nations, etc., by their citation counts or scores derived from citation
counts, is relatively easy to conduct and allows less room for new techniques, as can
be seen from the “h-bubble”: the research community on evaluative citation analysis
moved immediately and almost completely to a focus on the h-index once this “clever
find” was made in 2005 as a simple way to measure individual scientists’ lifetime
achievements (Rousseau et al., 2013, p. 294; Zhao and Strotmann, 2014a). It appears
that research in this area was waiting for breakthrough ideas on the one hand, and was
feeling the pressures of a huge demand for practical tools for research evaluation on
the other.
• There are already several well-received books dedicated to evaluative citation analysis
(e.g., Andres, 2009; De Bellis, 2009; Garfield, 1979; Moed, 2010), but almost none to
citation network analysis.
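
The h-index mentioned above shows how simple such evaluative indicators can be to compute. Under its standard definition, a scientist has index h if h of his or her papers have at least h citations each. The following minimal Python sketch follows that definition; the function name and the sample counts are illustrative only.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    ranked = sorted(citation_counts, reverse=True)
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # counts only decrease from here, so h cannot grow further
    return h

# Hypothetical citation counts for one author's five papers.
example = h_index([10, 8, 5, 4, 3])
```

For these counts the result is 4: four papers have at least four citations each, but there are not five papers with at least five.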
This book will first present the general steps and procedures of citation network analysis and
the concepts and techniques associated with each step (Chapter 2), which will simultaneously pro-
vide an overview of the theoretical aspects of citation network analysis and also a practical how-to
guide for conducting citation network analysis studies. This is followed by more detailed discussion
of thoughts and ideas about important issues in citation analysis in general and in citation network
analysis in particular, with a focus on those with which the authors have substantial personal
experience, including the following:
• field delineation and data sources for citation analysis (Chapter 3),
There are two types of access to citation databases as data sources for citation analysis: regular
access via the search interfaces (e.g., Web of Science and Scopus) that the database providers
(e.g., Thomson Reuters, Elsevier) offer for retrieving and downloading datasets from these citation
databases, and very expensive special direct access to all the database files, which the providers
offer for data-mining purposes. The former is usually obtained through a subscription to these databases
with a more or less standard license agreement that prohibits mass downloads and imposes other
limits on their use, while the latter can only be accessed through a negotiated purchase contract.
Discussions of citation databases in this book are based on the experience of the authors with the
“normal” type of access to citation databases, and therefore discussions related to access (e.g., index-
ing and search facilities, downloading options) may or may not apply to the special type of access.
In addition, as is well understood, citation databases can serve as both information retrieval
systems and citation analysis tools. Discussions in this book on citation databases are in the context
of citation databases as data sources for citation analysis, although these discussions may also have
implications for the enhancement of their retrieval functions.
Finally, the term “ISI databases” was chosen in this book to refer to the oldest and most
dominant citation databases, which were created by the Institute for Scientific Information (ISI) in
the 1960s, and include three databases: Science Citation Index, Social Sciences Citation Index, and
Arts and Humanities Citation Index. These databases have now become part of the core collection
of Thomson Reuters’ Web of Science, which also includes a few other citation databases (e.g., book
and conference citation indexes) that have not reached the level of quality of the ISI databases
and are therefore not used much for citation analysis purposes. In addition to the core collection, Web of Science
also includes a number of other bibliographic databases such as MEDLINE that do not have ci-
tation indexes and therefore cannot be used for citation analysis purposes. To avoid confusion, the
term “Web of Science” is only used in this book when more than the ISI databases are concerned
and discussed or when the context clarifies the meaning.
Author Biographies
Dangzhi Zhao is Associate Professor in the School of Library and Information Studies at the
University of Alberta, Canada. Dangzhi earned her Ph.D. from the School of Library and Infor-
mation Studies at The Florida State University, U.S., and her M.S. and B.S. from the Department
of Library and Information Science at Peking University, China.
Her research and teaching interests are in the areas of information systems, bibliometrics,
scholarly communication, and knowledge network analysis and visualization as well as their appli-
cation in information retrieval and digital libraries.
Andreas Strotmann studied Mathematics, Physics, and Linguistics at the University of Co-
logne, where he also spent many years as a staff scientist supporting computational applications in
the sciences and the humanities, including in mathematics, physics, biology, linguistics, education,
and publishing. He earned his doctorate in Computer and Information Science from The Florida
State University. He has worked as a researcher at the University of Cologne, the University of
Alberta, and the GESIS Leibniz Institute for the Social Sciences. For the past decade, he has been
working closely with Dangzhi Zhao on improving scientometric methodology.