Analysis and Visualization of Citation Networks


Book in Synthesis Lectures on Information Concepts, Retrieval, and Services · February 2015


DOI: 10.2200/S00624ED1V01Y201501ICR039


2 authors:

Dangzhi Zhao, University of Alberta
Andreas Strotmann, Independent


All content following this page was uploaded by Andreas Strotmann on 29 August 2015.




Synthesis Lectures on Information Concepts, Retrieval, and Services

Editor
Gary Marchionini, University of North Carolina, Chapel Hill

Synthesis Lectures on Information Concepts, Retrieval, and Services publishes short books on
topics pertaining to information science and applications of technology to information discovery,
production, distribution, and management. Potential topics include: data models, indexing theory
and algorithms, classification, information architecture, information economics, privacy and
identity, scholarly communication, bibliometrics and webometrics, personal information management,
human information behavior, digital libraries, archives and preservation, cultural informatics,
information retrieval evaluation, data fusion, relevance feedback, recommendation systems, question
answering, natural language processing for retrieval, text summarization, multimedia retrieval,
multilingual retrieval, and exploratory search.

Analysis and Visualization of Citation Networks


Dangzhi Zhao and Andreas Strotmann

The Taxobook: Applications, Implementation, and Integration in Search: Part 3


Marjorie M.K. Hlava

The Taxobook: Principles and Practices of Building Taxonomies: Part 2


Marjorie M.K. Hlava

Measuring User Engagement


Mounia Lalmas, Heather O’Brien, Elad Yom-Tov

The Taxobook: History, Theories, and Concepts of Knowledge Organization: Part 1


Marjorie M.K. Hlava

Children’s Internet Search: Using Roles to Understand Children’s Search Behavior


Elizabeth Foss and Allison Druin

Digital Library Technologies: Complex Objects, Annotation, Ontologies, Classification, Extraction, and Security
Edward A. Fox, Ricardo da Silva Torres

Digital Libraries Applications: CBIR, Education, Social Networks, eScience/Simulation, and GIS
Edward A. Fox, Jonathan P. Leidig

Information and Human Values


Kenneth R. Fleischmann

Multiculturalism and Information and Communication Technology


Pnina Fichman and Madelyn R. Sanfilippo

Transforming Technologies to Manage Our Information: The Future of Personal Information Management, Part II
William Jones

Designing for Digital Reading


Jennifer Pearson, George Buchanan, Harold Thimbleby

Information Retrieval Models: Foundations and Relationships


Thomas Roelleke

Key Issues Regarding Digital Libraries: Evaluation and Integration


Rao Shen, Marcos Andre Goncalves, Edward A. Fox

Visual Information Retrieval using Java and LIRE


Mathias Lux, Oge Marques

On the Efficient Determination of Most Near Neighbors: Horseshoes, Hand Grenades, Web
Search and Other Situations When Close is Close Enough
Mark S. Manasse

The Answer Machine


Susan E. Feldman

Theoretical Foundations for Digital Libraries: The 5S (Societies, Scenarios, Spaces, Structures,
Streams) Approach
Edward A. Fox, Marcos André Gonçalves, Rao Shen

The Future of Personal Information Management, Part I: Our Information, Always and Forever
William Jones

Search User Interface Design


Max L. Wilson

Information Retrieval Evaluation


Donna Harman

Knowledge Management (KM) Processes in Organizations: Theoretical Foundations and Practice


Claire R. McInerney, Michael E. D. Koenig

Search-Based Applications: At the Confluence of Search and Database Technologies


Gregory Grefenstette, Laura Wilber

Information Concepts: From Books to Cyberspace Identities


Gary Marchionini

Estimating the Query Difficulty for Information Retrieval


David Carmel, Elad Yom-Tov

iRODS Primer: Integrated Rule-Oriented Data System


Arcot Rajasekar, Reagan Moore, Chien-Yi Hou, Christopher A. Lee, Richard Marciano, Antoine
de Torcy, Michael Wan, Wayne Schroeder, Sheau-Yen Chen, Lucas Gilbert, Paul Tooby, Bing Zhu

Collaborative Web Search: Who, What, Where, When, and Why


Meredith Ringel Morris, Jaime Teevan

Multimedia Information Retrieval


Stefan Rüger

Online Multiplayer Games


William Sims Bainbridge

Information Architecture: The Design and Integration of Information Spaces


Wei Ding, Xia Lin

Reading and Writing the Electronic Book


Catherine C. Marshall

Hypermedia Genes: An Evolutionary Perspective on Concepts, Models, and Architectures


Nuno M. Guimarães, Luís M. Carrico

Understanding User-Web Interactions via Web Analytics


Bernard J. (Jim) Jansen

XML Retrieval
Mounia Lalmas

Faceted Search
Daniel Tunkelang

Introduction to Webometrics: Quantitative Web Research for the Social Sciences


Michael Thelwall

Exploratory Search: Beyond the Query-Response Paradigm


Ryen W. White, Resa A. Roth

New Concepts in Digital Reference


R. David Lankes

Automated Metadata in Multimedia Information Systems: Creation, Refinement, Use in Surrogates, and Evaluation
Michael G. Christel
Copyright © 2015 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means, electronic, mechanical, photocopy, recording, or any other, except for brief
quotations in printed reviews, without the prior permission of the publisher.

Analysis and Visualization of Citation Networks


Dangzhi Zhao and Andreas Strotmann
www.morganclaypool.com

ISBN: 9781608459384 print


ISBN: 9781608459391 ebook

DOI 10.2200/S00624ED1V01Y201501ICR039

A Publication in the Morgan & Claypool Publishers series


SYNTHESIS LECTURES ON INFORMATION CONCEPTS, RETRIEVAL, AND SERVICES #39
Series Editor: Gary Marchionini, University of North Carolina, Chapel Hill

Series ISSN 1947-945X Print 1947-9468 Electronic


Analysis and Visualization of
Citation Networks

Dangzhi Zhao
School of Library and Information Studies, University of Alberta, Canada

Andreas Strotmann
ScienceXplore, Bad Schandau, Germany

SYNTHESIS LECTURES ON INFORMATION CONCEPTS, RETRIEVAL, AND SERVICES #39

MORGAN & CLAYPOOL PUBLISHERS

ABSTRACT
Citation analysis—the exploration of reference patterns in the scholarly and scientific literature—
has long been applied in a number of social sciences to study research impact, knowledge flows,
and knowledge networks. It has important information science applications as well, particularly
in knowledge representation and in information retrieval.
Recent years have seen a burgeoning interest in citation analysis to help address research,
management, or information service issues such as university rankings, research evaluation, or
knowledge domain visualization. This renewed and growing interest stems from significant
improvements in the availability and accessibility of digital bibliographic data (both citation and full
text) and of relevant computer technologies. The former provides large amounts of data and the
latter the necessary tools for researchers to conduct new types of large-scale citation analysis, even
without privileged access to special data collections. Exciting new developments are emerging this way
in many aspects of citation analysis.
This book critically examines both theory and practical techniques of citation network
analysis and visualization, one of the two main types of citation analysis (the other being evaluative
citation analysis). To set the context for its main theme, the book begins with a discussion of the
foundations of citation analysis in general, including an overview of what can and what cannot be
done with citation analysis (Chapter 1). An in-depth examination of the generally accepted steps
and procedures for citation network analysis follows, including the concepts and techniques that are
associated with each step (Chapter 2). Individual issues that are particularly important in citation
network analysis are then scrutinized, namely: field delineation and data sources for citation
analysis (Chapter 3); disambiguation of names and references (Chapter 4); and visualization of citation
networks (Chapter 5). Sufficient technical detail is provided in each chapter so the book can serve
as a practical how-to guide to conducting citation network analysis and visualization studies.
While the discussion of most of the topics in this book applies to all types of citation analysis,
the structure of the text and the details of procedures, examples, and tools covered here are geared
to citation network analysis rather than evaluative citation analysis. This conscious choice was based
on the authors’ observation that, compared to evaluative citation analysis, citation network
analysis has not been covered nearly as well by dedicated books, even though it has drawn far less
severe criticism and has been substantially enriched in recent years with
new theory and techniques from research areas such as network science, social network analysis, and
information visualization.

KEYWORDS
citation analysis, citation network analysis, citation data sources, disambiguation in citation analysis,
visualization of citation networks, co-citation analysis, bibliographic coupling analysis, bibliometrics

Contents

Acknowledgment

Dedications

1 Foundations of Citation Analysis
  1.1 Introduction
  1.2 What Is Citation Analysis?
  1.3 What Can We Do with Citation Analysis, and Why?
    1.3.1 Assessing Information Resources and Evaluating Scholarly Contributions
    1.3.2 Mapping Research Fields
    1.3.3 Tracking Knowledge Flows and the Diffusion of Ideas
    1.3.4 Studying Users and Uses of Scholarly Information
    1.3.5 Assisting Information Organization, Representation, and Retrieval
    1.3.6 Other Applications
  1.4 Evaluation of Citation Analysis
    1.4.1 Validity and Reliability
    1.4.2 Critiques and Defense
    1.4.3 Strengths, Limitations, and Special Care Required
  1.5 Related Fields
  1.6 Scope, Delimitation, and Structure of this Book

2 Conducting Citation Network Analysis: Steps, Concepts, Techniques, and Tools
  2.1 Field Delineation: Collecting Scholarly Publications Produced in a Research Field or by a Scholarly Community
  2.2 Selecting Core Sets of Objects for the Study
    2.2.1 Selection Criteria
    2.2.2 Number of Objects
    2.2.3 Types of Objects
    2.2.4 Citation Counting
  2.3 Measuring the Connectedness between Core Objects Selected
    2.3.1 Citation Count
    2.3.2 Co-Citation Count
    2.3.3 Bibliographic Coupling Frequency (BCF)
    2.3.4 Other Similarity Indicators Derived from Citation Counts
  2.4 Statistical Analysis of Citation Networks
    2.4.1 Commonly Used Statistical Analysis Methods
    2.4.2 Factor Analysis in Author Co-Citation Analysis (ACA)
    2.4.3 Input Data to Statistical Procedures
    2.4.4 Determining Similarity Measures
  2.5 Network Analysis and Visualization
  2.6 Interpretation and Validation
  2.7 An Example: Mapping Information Science 2006–2010: An Author Bibliographic Coupling Analysis
    2.7.1 Delineation of the Information Science Research Field
    2.7.2 Selection of Core Authors to Represent the Research Front of IS 2006–2010
    2.7.3 Measurement of the Connectedness between Core Authors
    2.7.4 Factor Analysis
    2.7.5 Visualization of Factor Structures
    2.7.6 Interpretation of Results

3 Field Delineation and Data Sources for Citation Analysis
  3.1 Commonly Used Approaches to Field Delineation
  3.2 Effects of Field Delineation on Citation Network Analysis
  3.3 Requirements for Data Sources for Citation Analysis
  3.4 The ISI Databases
    3.4.1 Advantages of Using the ISI Databases for Citation Analysis Studies
    3.4.2 Problems
    3.4.3 How to Delineate a Research Field Using the ISI Databases
  3.5 Scopus
    3.5.1 Pros and Cons as a Data Source for Citation Analysis
    3.5.2 How to Delineate a Field Using Scopus
  3.6 Field Delineation by Combining Citation Databases with Subject Bibliographic Databases
    3.6.1 Justification of Combining Scopus and PubMed for Delineating the Stem Cell Research Field
    3.6.2 Process of Matching between Scopus and PubMed for Field Delineation
    3.6.3 Dataset Obtained and its Advantages for Citation Analysis
  3.7 Field Delineation and In-Text Citation Analysis
    3.7.1 Feasibility and Benefits of In-Text Citation Analysis
    3.7.2 Limitations of In-Text Citation Analysis
  3.8 Field Delineation with Google Scholar and Other Citation Data Sources on the Web
    3.8.1 Benefits
    3.8.2 Problems
  3.9 Other Approaches to Field Delineation
  3.10 Additional Remarks

4 Disambiguation in Citation Network Analysis
  4.1 Introduction
  4.2 Names and Designations in Bibliographic Records
  4.3 The Name Ambiguity Problem
  4.4 Ambiguity and Power Laws
  4.5 Effects of Ambiguity on Network Analysis Results
  4.6 Manual Disambiguation
    4.6.1 Small Networks
    4.6.2 Large Networks
  4.7 Algorithmic Disambiguation
    4.7.1 Author Name Disambiguation
    4.7.2 Citation Link Disambiguation
  4.8 Back to the Future: Computer-Aided Disambiguation

5 Visualization of Citation Networks
  5.1 Why Citation Network Visualization?
  5.2 Cautionary Tales
    5.2.1 Database Artifacts
    5.2.2 Visualization Artifacts
    5.2.3 Informative vs. Heavy Links
  5.3 Three Decades of Co-Citation Network Visualizations of the Library and Information Science Field
  5.4 Visualization of Citation Networks Using Pajek
    5.4.1 Factor Analysis of Bibliometric Data
    5.4.2 Conversion of Factor Analysis Results from SPSS to Pajek Network Format
    5.4.3 Visualization with Loading Summaries as Node Sizes and Degree Coloring
    5.4.4 Visualization with Node Sizes Reflecting Citedness
    5.4.5 Visualization with Node Color Reflecting Factor Membership
    5.4.6 Combining Pattern and Structure Matrix Visualizations
    5.4.7 Fine-Tuning the Maps
    5.4.8 Visualization of Bibliometric Networks without Factor Analysis
  5.5 Concluding Remarks

Appendix 3.3

Appendix 5.4.2

Bibliography

Author Biographies

Acknowledgment
The authors would like to thank Dr. Howard D. White for his encouragement, input, and
feedback on our draft manuscript.

Dedications
Dangzhi Zhao would like to dedicate this book to her father who passed away when this book was
under revision, to her mother, and to her family. She feels fortunate and grateful to have loving
parents and family who provided compassion, care, and support during one of the most difficult
times in her life so that she was able to continue writing this book.
Andreas Strotmann would like to dedicate the book to his mother who passed away before
it started forming but always believed it would come one day.

CHAPTER 1

Foundations of Citation Analysis

1.1 INTRODUCTION
Citation analysis is a well-known technique that has long been applied in a variety of research
fields to study, among other things, knowledge flows, the diffusion of ideas, the intellectual structures of
science, the relevance of information resources, and the evaluation of researchers and research institutions.
Among the research fields that have employed citation analysis methods, sociology, history of
science, library and information science, management science, and research policy are the most
prominent. Together with citation indexing and citation linking, citation analysis also provides the
foundations for effective information retrieval; applied to web links, this approach was at the core of the
success of Google’s search engine.
Recent years have seen a burgeoning interest in citation analysis to help address various
research, management, or information service issues such as university rankings, research evaluation,
and knowledge domain visualization. This renewed interest is a result of the increasing availability of
digital citation data and computing power, which have made large-scale citation analysis studies
possible. It has in turn led to many exciting new developments in data sources, as well as in techniques and
tools for citation data collection, analysis, and visualization.
This chapter introduces the concepts of citation and citation analysis, examines the
assumptions underlying citation analysis, and provides an overview of what can be done with citation
analysis (and why). It also discusses the strengths and weaknesses of citation analysis and the care
required when applying it. Based on this overview, the scope and structure of this
book are then discussed at the end of this chapter.

1.2 WHAT IS CITATION ANALYSIS?


The reference list in a research paper is an essential part of the paper. By pointing to prior
publications that have influenced the research reported in the current paper in one way or another, the
references link the current paper to these prior publications and, by extension, to the global
network of research publications. It is generally assumed that a citation represents the citing author’s
use of the cited work and indicates an influence of the cited work on the author’s new work, and
thus a flow of knowledge from the authors of the cited work to the authors of the citing work. Citations also indicate
relatedness (e.g., similar subject matter or methodological approach) between these two works.

Citation analysis deals with the study of these uses and relationships. Although individual
uses and relationships can be useful to examine, citation analysis mostly provides macro perspectives
through the use of large datasets, exploiting the consensus among a large number of citing authors
regarding the influence of and the relationships between scholars and scholarly works.
Based on the basic assumption underlying citation analysis that references indicate
usefulness or relatedness, a number of different types of applications of citation analysis have been
developed and employed over the years in the study of science and scholarly communication. The basic
assumption itself, however, has also been challenged in the literature. We will begin by discussing
applications of citation analysis before moving on to examine criticisms and challenges.
Here we need to be careful about the terminology. It is common for the term citation to be
used interchangeably for either “citation” or “reference,” with the context providing the meaning.
Similarly, the concept of how authors make references is called either citing behavior or referencing
behavior. When article A makes a reference to article B, it is often said that A cites or references
B, and B is cited by, receives a citation from, or is one of the “cited references” in A. In essence, a
reference from article A to article B is a citation received by B from A.
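The terminology above can be illustrated with a minimal sketch. The article labels and the `references` dictionary are hypothetical toy data, not taken from the book: each key is a citing article, each value is its reference list, and the citations an article receives are simply the number of reference lists in which it appears.

```python
from collections import Counter

# Hypothetical toy data: each citing article maps to its reference list.
references = {
    "A": ["B", "C"],
    "D": ["B"],
    "B": ["C"],
}

def citations_received(reference_lists):
    """Count, for each cited article, how many reference lists it appears in."""
    counts = Counter()
    for cited_list in reference_lists.values():
        # Use a set so a work listed twice in one reference list counts once.
        for cited in set(cited_list):
            counts[cited] += 1
    return counts

counts = citations_received(references)
# A reference from A to B is a citation received by B from A.
print(counts["B"])
```

Read one way, the data describe what each article references; read the other way, they describe which citations each article receives.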
Both the terms “citation” and “reference” are of course also used in other contexts that do
not directly relate to citation analysis. In the field of library and information studies, for example,
the term “reference librarians” refers to librarians who answer users’ questions regarding the use of
the library and library resources; similarly, citation data or databases may mean the same as
bibliographic data or databases, which normally only include information about the citing documents
such as title, author, abstract, etc., and may or may not include any information about their cited
references at all.

1.3 WHAT CAN WE DO WITH CITATION ANALYSIS, AND WHY?


In general, citation analysis can effectively assist in the discovery of new knowledge, and in the
management and use of existing knowledge resources (Garfield, 1979; Swanson, 1986; Small,
1999b; White et al., 2000). In particular, citation analysis can be used for:
1.  assessing information resources and evaluating scholarly contributions,

2.  mapping research fields to study their intellectual structures,

3.  tracking knowledge flows and the diffusion of ideas,

4.  studying users and uses of scholarly literature, and

5.  assisting with information organization, representation, and retrieval.



All these applications of citation analysis rely on the consensus among a large number of
citing authors regarding the influence of and relationships between scholars and scholarly works as
recorded in the reference lists of these authors’ publications. Some areas, however, e.g., applications
3, 4, and 5, also often involve the examination of individual reference links.
This section will review and discuss these applications of citation analysis, as well as some
other uses of citation analysis in the study of science and scholarly communication, including
patent citation analysis for the study of innovations and sociometric studies of science and scholarly
communication.

1.3.1 ASSESSING INFORMATION RESOURCES AND EVALUATING SCHOLARLY CONTRIBUTIONS
It is assumed that the existence of a cited document in a reference list implies its use by the citing
author when the citing document was written. And as Peritz (1992, p. 448) points out, since “it is
generally accepted that the publication of research papers is part of the reward system in science
and hence that citations are, in some sense, tokens of recognition,” the number of citations a paper
receives “can be used as a rough-and-ready indicator of its merit—granting, of course, variations in
the citation’s importance and the inevitable amount of error and noise.” Similarly, and on a broader
scale, the number of citations received by all publications in a journal or in a subject area, or by all
publications written by a scholar or all scholars in a scientific institution or in a nation, roughly
indicates the impact of the journal, the scientist, the scientific institute, etc. Therefore, citation
analysis is considered a legitimate and meaningful research tool in the assessment of information
resources and in the evaluation of scholarly contributions. This type of citation analysis is
sometimes referred to as evaluative citation analysis (Borgman and Furner, 2002).
Although the assumptions underlying evaluative citation analysis noted above have been
challenged since their introduction (as discussed later in this chapter), citation analysis
has been widely used in research evaluation to inform decisions that may seriously affect individuals
or institutions (Garfield, 1979; Meho and Sonnenwald, 2000; Moed, 2010). For example, the Journal
Impact Factor, which is essentially the average number of citations received by all articles
published in a journal within a time period (e.g., 2 or 5 years), has been used by university librarians to
make journal subscription decisions, and by university management to estimate the quality of the
journals in which professors publish when evaluating their research performance. The citation
impact of individual scholars has likewise been used to inform decisions on their hiring, tenure, promotion,
and research funding. Furthermore, most of the world university ranking systems that emerged in
recent years and have had a profound impact on universities (especially regarding recruitment), such
as the Times Higher Education (THE) World University Rankings and the Academic Ranking of
World Universities (also known as the Shanghai Ranking), use citation analysis to measure research
impact as one of the indicators for ranking (THE Methodology, 2013; Liu and Cheng, 2005).
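The description of the Journal Impact Factor as an average can be sketched as follows. This mirrors only the simplified wording above; the function name and the sample counts are hypothetical, and the officially published indicator follows more specific rules about which items and citation years are counted.

```python
def average_citations_per_article(citation_counts):
    """Mean number of citations per article; 0.0 if there are no articles."""
    if not citation_counts:
        return 0.0
    return sum(citation_counts) / len(citation_counts)

# Hypothetical citation counts for the articles a journal published
# within a chosen time window (e.g., 2 years).
window_counts = [4, 0, 7, 1]
impact = average_citations_per_article(window_counts)
print(impact)
```

Because the result is a mean, a few highly cited articles can dominate it, which is one reason such averages say little about any individual article in the journal.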
It is therefore imperative to address the various issues involved in evaluative citation
analysis that affect citation counts and, by extension, the fairness of citation-based research evaluation.
Although counting citations or calculating impact indicators derived from citation counts (e.g.,
h-index) is relatively simple, addressing the complicating factors (such as field differences in
publishing and citing behavior and problems of citation data sources in coverage and indexing) is not,
and has therefore been a primary focus of research on evaluative citation analysis. For an in-depth
examination of these issues, readers are referred to Moed (2010), who provides a comprehensive
treatment of evaluative citation analysis. Discussions in some later chapters in this book will also
shed light on some of these issues, such as data sources for citation analysis, research field
delineation, and counting collaborative works in citation analysis.
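To show how simple the computation of such an indicator is, here is a minimal sketch of the h-index under its usual definition (the largest h such that h publications each have at least h citations). The definition is not spelled out in the text above, and the function name and sample data are illustrative assumptions.

```python
def h_index(citation_counts):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so h cannot grow further
    return h

# Hypothetical citation counts for one scholar's publications.
print(h_index([10, 8, 5, 4, 3]))
```

The arithmetic is trivial. The difficulty lies in obtaining correct counts in the first place (coverage, indexing, field differences), which the code takes as given.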

1.3.2 MAPPING RESEARCH FIELDS


Mapping research fields through citation analysis can help with the organization of knowledge,
and also allows researchers to examine the characteristics, structures, and evolution of research
fields and scholarly communities.
As Cronin (1984, p. 25) so eloquently stated, “Citations are frozen footprints on the land-
scape of scholarly achievement.” They represent the decisions made by citing authors regarding
relatedness (e.g., with respect to similarities in the subject, topic, or methodology) of the documents
they were writing (i.e., the citing documents) and the works they decided to reference in them
(i.e., the cited documents). Therefore, by mapping the networks of literatures at some moment in
time through an analysis of the relationships established by citations in scholarly publications that
represent a scientific specialty and community, the structures and characteristics of such a specialty
and community can be studied. By making comparisons between time periods, the historical de-
velopment of the specialty and community can be modeled. In this book, we use the term “citation
network analysis” to refer to this type of citation analysis.
In citation network analysis, a set of objects (documents, authors, journals, or groups of
them) is selected to represent a research area, the strengths of the interrelationships (or levels of
connectedness) between these objects are measured by various scores derived from citation counts,
and structures and characteristics of the corresponding research fields and scholarly communities
are then inferred from these relationships. To reveal the structures that underlie these relationships,
multivariate statistical analyses are often applied using the citation scores as similarity measures.
Since a spatial representation of information, as Small (1999b, p. 799) points out, can “facilitate
our understanding of conceptual relationships and developments,” network visualization tools are
also frequently used to produce visual maps of these relationships. All these measures, analyses, and
visualizations together will then inform interpretive descriptions and explanations of the observed
structures and characteristics of the research fields and scholarly communities being studied, and
assist in the examination of their evolution and “in the making of inductive predictions of future
trends” when applied to a series of time periods (Borgman and Furner, 2002, p. 11).
Depending on the units of analysis (documents, or groups of them by authors, journals, re-
search field, nation, etc.) and the thresholds of citation scores, both macro-structures—overall maps
of the entire science endeavor with each node in the network representing a discipline—and mi-
cro-structures—structures of a single specialty with each node in the network representing a single
document—of science can be mapped and studied, allowing the user to get overviews of research
fields as well as to explore their underlying fine structures (Small, 1999b).
There are three types of commonly used citation-based measures of the strength of the in-
terrelationship between two objects:
• inter-citation counts: the number of times two objects have cited each other

• co-citation counts: the number of documents that have cited two objects together, and

• bibliographic coupling frequencies (BCFs): the number of cited references that two ob-
jects have in common.
Analyses using these measures are called inter-citation analysis, co-citation
analysis, and bibliographic coupling analysis, respectively. Each type of analysis can employ any one
of a number of different counting and weighting schemes. For example, the BCF of two articles
can be counted simply as the number of items that appear in both articles’ reference lists, or their
cited references can be weighted by how many times they are cited in the texts. Such counting and
weighting schemes will be discussed in detail in Chapter 2.
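The three measures can be sketched on a toy dataset as follows. This is only an illustration of the simplest (unweighted) counting scheme; the document identifiers, the `references` mapping, and the function names are hypothetical, and documents stand in for whatever objects are being analyzed.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing document maps to the set of documents it references.
references = {
    "A": {"X", "Y", "Z"},
    "B": {"X", "Y"},
    "C": {"Y", "Z", "A"},
    "X": {"Y"},
    "Y": {"X"},
}


def co_citation_counts(refs):
    """For every pair of documents, count the citing documents that reference both."""
    counts = Counter()
    for cited in refs.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


def bibliographic_coupling(refs, a, b):
    """Number of cited references that documents a and b have in common."""
    return len(refs.get(a, set()) & refs.get(b, set()))


def inter_citation(refs, a, b):
    """Number of times a and b cite each other (0, 1, or 2 for single documents)."""
    return int(b in refs.get(a, set())) + int(a in refs.get(b, set()))


print(co_citation_counts(references)[("X", "Y")])
print(bibliographic_coupling(references, "A", "B"))
print(inter_citation(references, "X", "Y"))
```

Co-citation is computed from the perspective of the citing documents, whereas bibliographic coupling compares the reference lists of the two objects themselves; this is why a co-citation count can keep growing after publication while the BCF of two documents is fixed.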
Among the three types of analyses, co-citation analysis is the most commonly used tech-
nique. It is generally accepted that the co-citation concept was discovered independently by Small
(1973) and Marshakova (1973), and that document co-citation analysis was introduced by Small
(1973) and author co-citation analysis by White and Griffith (1981). Many co-citation analysis
studies have been conducted since. They either refine the techniques (Small, 1974; Shaw, 1985;
Zhao and Strotmann, 2008a), explore the application of co-citation analysis in studying various
research areas and in answering various research questions (Small, 1977, 1981; White, 1983; Mc-
Cain, 1984), or discuss limitations of the techniques (Sullivan et al., 1977; Hicks, 1987). Recent
years have also seen studies of the application of advanced scientific visualization technology in
co-citation mapping to dynamically present maps of science (see Small, 1999b and Boyack et al.,
2005 for a good review). As a result, co-citation analysis has developed into a well-known litera-
ture-based technique for studying the intellectual structure of scholarly fields and the characteristics
of scholarly communities.
The assumptions underlying citation network analysis are (1) when two documents cite each
other often, are frequently cited together, or have many references in common, then this indicates
that these two documents are related, that is, generally perceived to be similar in subject matter
or methodological approach; and (2) the more frequently two documents cite each other, are co-
cited, or the more references two documents have in common, the more closely they are related
(Borgman, 1990; White, 1990). These assumptions are generally valid and have not been challenged
much at all, unlike those for evaluative citation analysis, as even when parties who are affected by
citation-based research evaluation do game the system by manipulating their reference lists in their
favor, the less relevant citations they may add will still be related to the citing documents.

1.3.3 TRACKING KNOWLEDGE FLOWS AND THE DIFFUSION OF IDEAS


Knowledge flows and the diffusion of ideas can be traced through citations because a citation in-
forms the reader of the citing author’s use of the cited document that contains that knowledge and
those ideas (Rogers and Cottrill, 1990). Of course, they can also be tracked through terminology,
especially after the knowledge and ideas have been integrated into larger contexts.
Citations are considered concept symbols, in that a citation has a function in the text to link
an idea or concept with the source of that idea or concept, i.e., the publication in which that idea
or concept was addressed (Small, 1978, 1998; Small and Greenlee, 1980). By analyzing how the
publication has been used in later documents, it is therefore possible to trace the spread of the ideas
it contains across disciplinary boundaries as well as over time (McCain, 2011).
In order to do so, a single work or a small set of works “representing a coherent and clearly
defined topic of interest, such as the original statement of a theory, description of a research meth-
odology, or first articulation of an important concept or other significant contribution” (McCain,
2011) is first identified. All of the citations to this set of works are then collected, and their distri-
bution by subject area as well as by time can then be studied.
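The tabulation step described above can be sketched as follows, assuming that each citation to the seed work has already been collected as a pair of citing year and citing subject area. The records, the ten-year period length, and the function name are hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical records: one (year, subject area) pair per paper citing the seed work
citations = [
    (1990, "economics"), (1991, "economics"), (1995, "psychology"),
    (1996, "sociology"), (2001, "psychology"), (2003, "computer science"),
]


def distribution(citing_records, period_length=10):
    """Tabulate citations to the seed work by subject area and by time period."""
    by_subject = Counter(subject for _, subject in citing_records)
    # Each period is labeled by its first year, e.g., 1990 for 1990-1999
    by_period = Counter(
        (year // period_length) * period_length for year, _ in citing_records
    )
    return by_subject, by_period


by_subject, by_period = distribution(citations)
print(by_subject)
print(by_period)
```

In practice, the subject area of a citing paper would usually have to be derived from a classification of the citing journal, which is an assumption not shown here.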
This type of citation analysis has been carried out from the very start of citation analysis
studies, but more rarely than other types of citation analysis (McCain, 2011). Borgman
(1990, p. 20) provides some examples of the ideas traced through citations including the “double
helix” (Winstanley, 1976), Shannon’s information theory (Dahling, 1962), and topics relevant to
psychiatry originating in related fields (Davis, 1970). McCain (2011, p. 1413) reviews some more
recent studies of this kind:
… Oehler (1990) traced citations to a set of 29 articles in general equilibrium theory and
noted the citation counts in non-social science journals (e.g., natural sciences, mathemat-
ics, statistics, and engineering.) O’Rand (1992) studied the diffusion of cooperative game
theory and used the spread of citations to The Theory of Games and Economic Behavior (von
Neumann and Morgenstern, 1944) across disciplinary boundaries to support discussion of
the development of various research schools in psychology, sociology, and political science.
McCain and Salvucci (2006) examined the relative use of concepts in Brooks’ Mythical
Man-Month (Brooks, 1975) across five time periods and 15 subject areas ranging from
the “home” discipline of software engineering to areas in the social science, humanities,
and law. Sarafoglou and Paelinck (2008) used citation data to study the diffusion of the
concept/field of “spatial econometrics” by means of, in part, the temporal and subject distri-
bution of citations to the key book in the field—Spatial Econometrics: Methods and Models
(Anselin, 1988). Garfield (1985) reported “over 80 specialties and disciplines” citing Price’s
Little Science, Big Science (Price, 1963). In a more theoretical vein, Van der Veer Martens
and Goodrum (2006) discuss the use of citation content and context analysis along with
other assessment approaches to model the diffusion of eight theories in the social sciences.
They suggest a typology of citation function but do not consider citing subject breadth.

1.3.4 STUDYING USERS AND USES OF SCHOLARLY INFORMATION


The information behavior of scientists and engineers has been of great interest to LIS researchers,
and interviews and surveys have comprised the primary sources of data for their studies (Brown
and Ortega, 2005; Brown, 2007, 2010; Hemminger et al., 2007). Citation analysis can enhance
results from these types of studies by providing a different perspective on people’s information
behavior gleaned from their citing behavior.
A citation represents a citing author’s use of a cited document. An examination of all the
references made by a group of authors can therefore reveal an important aspect of their information
behavior, i.e., what scholarly information they have used in the development of their own scholarly
publications. Details can be determined about how often information of different types (e.g., jour-
nals, books, e-resources), years, languages, countries, and subjects have been used by this group of
authors, and how this group compares with other groups of people in terms of these uses.
For example, by comparing the types of references made by all students who have taken a cer-
tain course (e.g., bibliographical instruction or how to use an academic library) before and after they
took the course, the effect of this course on students’ information-seeking behavior can be measured
(Brunvand and Pashkova-Balkenhol, 2008; Cooke and Rosenthal, 2011; Reinsfelder, 2012). Simi-
larly, the differences in research by faculty and practitioners in a field can be studied by comparing
what they cite and how they are cited. For example, Zhao (2009) compared the intellectual structure
of LIS through an author co-citation analysis of research publications by LIS faculty and practitioners.
Interestingly, and contrary to what one may have expected, the study found, among other things,
that LIS practitioners cite theories and foundations more than LIS faculty do. By extension, uses
of scholarly information by users of a research library indicated by citations can be compared with
the library’s collections, and results can be used to measure the extent to which library collections
meet the needs of its users (Kayongo and Helm, 2012).
Also by examining uses of scholarly information indicated by citations, the interactions and
interdisciplinarity of disciplines, fields, journals, institutions, or authors’ oeuvres can be assessed
(Zitt and Bassecoulard, 2006; Bassecoulard et al., 2007). For example, Huang and Chang published
two articles in 2012 that classified citations made in LIS journals during the period of 1978–2007
in terms of their disciplines in order to study the interdisciplinary characteristics of LIS (Chang
and Huang, 2012; Huang and Chang, 2012). They found that LIS articles have cited documents
from across 30 disciplines, and the degree of interdisciplinarity of LIS measured by a number of
citation-based indicators was high. Clearly, this type of citation analysis is similar to but different
from the study of knowledge flows as discussed in the previous section.

1.3.5 ASSISTING INFORMATION ORGANIZATION, REPRESENTATION, AND RETRIEVAL

It has been a tradition in scientific writing for writers to acknowledge each other by giving cita-
tions to related work. Citations represent the decisions made by citing authors regarding the rela-
tionships (e.g., similarity in the subject, topic, or methodology, etc.) between the documents they
are writing (citing document) and the work they are about to cite (cited document). Following
citation links has been proven to be a unique and effective way of finding related literature because
the relevancy is judged by the authors of citing papers who are domain experts (rather than an
indexer who is often a librarian by training), and documents can be located independent of lan-
guage, keywords, and traditional knowledge classifications, which is of great value particularly for
researchers in interdisciplinary fields. This way of organizing and retrieving information has been
made easier and more efficient by creating “actionable” or “clickable” citation links that lead from
references to the full text articles they represent, thanks to Web technologies and to collaborations
among the various parties in the scholarly publishing enterprise, such as scientific publishers,
libraries, and producers of bibliographical databases including citation databases. CrossRef and
OpenURL are examples of such technologies and collaborations (Hitchcock et al., 1997; Hitchcock
et al., 2000; Van de Sompel and Beit-Arie, 2001).
Citation databases are bibliographical databases with citation indexes in addition to standard
indexes such as authors, keywords, and subject headings. Citation indexes record and organize ref-
erence links in such a way as to allow easier and larger-scale navigation through scientific literature
following citation links. With a citation index, it is possible to follow citation links both backward
from current articles to older ones they cite (like following reference links at the end of an article)
and forward from older articles to newer ones. Users can start with a seed paper, author, journal,
or combination of these, and retrieve all publications written by them, cited by them, and/or citing
them, which is a very effective method of information retrieval.
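The core idea of a citation index can be sketched as an inverted mapping built from reference lists: the reference lists already support backward navigation, and inverting them supports forward navigation. The paper identifiers and the function name below are hypothetical, and real citation databases involve far more elaborate record matching than this toy example suggests.

```python
from collections import defaultdict

# Hypothetical reference lists: each citing paper maps to the papers it cites
references = {
    "P2": ["P1"],
    "P3": ["P1", "P2"],
    "P4": ["P2", "P3"],
}


def build_citation_index(refs):
    """Invert reference lists so that each cited paper maps to the papers citing it."""
    cited_by = defaultdict(set)
    for citing, cited_list in refs.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    return dict(cited_by)


cited_by = build_citation_index(references)

seed = "P2"
backward = set(references.get(seed, []))  # older papers that the seed cites
forward = cited_by.get(seed, set())       # newer papers that cite the seed
print(sorted(backward), sorted(forward))
```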
Citation databases also support easier and larger-scale collections of data for citation analysis.
The results of citation analyses can in turn further assist information organization, representation,
and retrieval.
For example, evaluative citation analysis results can help retrieve highly influential docu-
ments or publications by influential players (authors, institutes, countries, etc.), while citation net-
work analysis results can facilitate an understanding of the structure of the research field and the
relationships between concepts, documents, or authors. This understanding in turn helps users with
query expansion and search refinement, as well as supports visual browsing interfaces to informa-
tion retrieval systems (Chen, 1999; Chen et al., 1998a; Chen et al. 1998b; Ding et al., 2000; Lin et
al., 2003; Strotmann and Zhao, 2008).
The two largest citation databases, the ISI databases by the Institute for Scientific Information
(now part of Thomson Reuters’ Web of Science) and Scopus by Elsevier, have demonstrated
the value of incorporating citation analysis results into information retrieval systems by providing
impact indicators (e.g., citation counts, h-index, and journal impact factor) for articles, authors, and
journals, calculated from evaluative citation analyses of data in the corresponding citation databases.
Search results can be ranked by impact indicators, allowing users to focus on high impact sources.
They also provide related documents based on bibliographic coupling analysis. The ISI databases
also provide a visual representation of citation links both backward and forward, allowing users to
follow these links to retrieve needed information.
Because citation analysis can identify key concepts, documents, authors, and their relation-
ships, studies have also explored the use of citation analysis methods to supplement traditional
manual methods of knowledge organization with automatic summarization, categorization, and
thesaurus construction and maintenance (Chen et al., 2010; Fiszman et al., 2009; Sparck-Jones,
1999; Schneider and Borlund, 2004). As Birger Hjørland (2013, p.1) points out, “the main dif-
ference between traditional knowledge organization systems (KOSs) and KOSs based on citation
analysis is that the first group represents intellectual KOSs, whereas the second represents social
KOSs” as they are based on the collective views of a large number of citing authors regarding rela-
tionships between documents or their authors.
With the amount of available information increasing dramatically and sometimes chaotically,
especially on the Web, it has been and will continue to be of great importance to explore appropri-
ate ways of organizing and searching information there. Citation analysis principles provide unique
and effective ways of enhancing information organization, representation, searching, and browsing.
A good example of this is the success of the Google Web search engine, which applies an algorithm
that has close ties to citation analysis to focus on resources that are both high quality and relevant
to users’ information needs (Brin and Page, 1998).
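The kinship between link-based ranking and citation analysis can be illustrated with a simplified power-iteration sketch in the spirit of the PageRank idea described by Brin and Page (1998). This is not the search engine's actual implementation; the damping value of 0.85, the fixed number of iterations, the handling of nodes without outgoing links, and the toy link data are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: a node ranks highly if highly ranked nodes link to it."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a small baseline share of the total rank
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Distribute this node's rank equally over the nodes it links to
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a node with no outgoing links spreads its rank evenly
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical links (read "cites" or "links to"): A and B point to C, C points to A
toy_links = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Unlike a plain citation count, this kind of score weights each incoming link by the standing of its source, so a single link from a highly ranked node can outweigh several links from obscure ones.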

1.3.6 OTHER APPLICATIONS


Over the years, citation analysis has been used widely and in innovative ways. Although most uses
fall within the categories discussed above, some applications have distinct places in the study of
science and scholarly communication. The following are two such examples.

Patent citation analysis


What makes patent citation analysis special in the area of citation analysis is not the types of
analyses being performed but the data used, i.e., patent documents and their references.
Patents are a major representation of innovations in science and technology, and patent ci-
tation analysis is thus used to study innovation-related phenomena. Examples include the effect of
a major innovative technology (as represented by a high-impact patent) on the science on which it
draws or on the industry to which it belongs, as well as the networks of key players in its industry,
including who has contributed to the technology and who is using it. Particularly when combining
patent and science citation analyses, the interaction or relationship between science, technology, and
innovations can be studied (Sternitzke, 2009; Érdi et al., 2013; Etzkowitz and Leydesdorff, 2000).
It is important to note, however, that the meaning of a patent citation differs considerably
from that of a scholarly citation. While a scholarly citation represents influence on a work, a patent
citation represents prior art or prerequisite technology. In principle at least, anyone who licenses a
patent will need to license the technologies in the cited patents as well. As this is likely to reduce the
monetary value of the patent, inventors tend to avoid citing patents other than their own,
and patent examiners play a crucial role in completing patent references. Clearly, it is important
to include references provided by both the patent applicants and the examiners in patent citation
analysis.
Unlike citation data for scholarly publications, patent citation data are publicly available and
can be obtained from the websites and databases of patent offices such as the United States Patent
and Trademark Office (http://patft.uspto.gov/netahtml/PTO/search-bool.html) and the European
Patent Office (http://worldwide.espacenet.com). Retrieval and processing of patent citation data
from these sites is considerably more complex than that of scholarly citation data from the standard
citation indexes; however, commercial versions of patent databases and Google Scholar provide
more usable access.

The value of citation analysis in sociometric studies of science and scholarly communication

Although the present book will not examine sociometric analyses, the value of citation analysis
in sociometric studies of scholarly communication will be briefly discussed here to show the full
power of citation analysis.
Sociometric studies of scholarly communication seek to reveal the structures of informal
communication in specific scholarly communities by looking at interpersonal interactions among
scholars regarding their research, emphasizing the social properties of scholarly communication
(Lievrouw, 1990).
Citation analysis can aid sociometric studies in identifying scholarly communities or specialties
(De Mey, 1982, p. 130), which is the first step in all sociometric studies of scholarly communication.
Citation analysis can also help with recognition of interesting points in the structure and
process of scholarly communication that are worthy of further study by sociometric or other meth-
ods (Crane, 1972; Mullins, 1973; Mullins et al., 1977; Lievrouw et al., 1987). Furthermore, citation
analysis can be used to validate the results from sociometric data, because people who respond to
interviews or surveys in sociometric studies are usually a subset of people whose citation behaviors
are studied by citation analysis (Borgman, 1990).

1.4 EVALUATION OF CITATION ANALYSIS


With the many problems in citation data, citation analysis is an imperfect yet very useful method
for studying knowledge flows, the diffusion of ideas, social and intellectual structures of science,
information resources, and research evaluation.
This section covers problems and criticisms that have been discussed over the years, reasons
why citation analysis is a reliable and valid approach despite the problems and criticisms, and cau-
tions that are required when citation analysis is applied.

1.4.1 VALIDITY AND RELIABILITY


As mentioned earlier, there are two basic assumptions underlying citation analysis: (1) A citation
represents the citing author’s use of the cited work. The more citations a document receives, the
more influence it has had on research. Evaluative citation analysis assesses
scholars, journals, institutions, etc., based on this assumption. (2) A citation indicates some relationship
between citing and cited works, i.e., a generally perceived similarity of subject matter or
methodological approach. Two articles being cited together often or having many references in
common indicates some relation between these two articles. The more frequently two documents
are co-cited or the more references two documents have in common, the more closely they are
assumed to be related. Citation network analysis examines structures of literatures and disciplines
based on this assumption.
Other types of citation analysis are essentially variations of these two basic types of analysis.
For example, citation counts have been used as an indicator of the extent of the diffusion of sci-
entific discoveries, and citation links have been used for information representation and retrieval.
Most types of citation analysis are informed by Merton’s normative view of science (Griffith,
1990; MacRoberts and MacRoberts, 1989; Edge, 1979; Cronin, 1984; Peritz, 1992), which sees
science as a social activity governed by a set of norms. These norms include universalism (the imper-
sonality of science), communism (scientific knowledge is treated as a common good communicated
and distributed freely), disinterestedness (“science for science’s sake” [Cronin, 1984, p. 17]), and orga-
nized skepticism (new knowledge claims are evaluated critically and objectively based on empirical
or theoretical evidence (Merton, 1942)). Citation is considered to be a serious activity of science
and therefore citation behavior is also governed by a set of norms and values. These norms and
values require authors to cite the works that have influenced them in the development of current
papers in order to give credit where credit is due. Although it may not always be clear to authors why
they cite certain works at certain times and how citations are related to the ideology of science (“the
norms and values presupposed in the conduct of science” [Trancy, 1980, p. 191]), they share “a
tacit understanding of how and why they should acknowledge the works of others” (Cronin, 1984).
The normative view of science is compatible with the assumptions underlying citation anal-
ysis, and therefore makes it possible to conduct valid citation analysis.
However, it has been observed by many studies that scientists’ behavior does not always
adhere to the norms, and that, in terms of citation behavior, various reasons and motivations for
citing do exist—some normative, some egotistical. A number of articles have reviewed these studies,
including Bornmann and Daniel (2008), Cronin (1984), Liu (1993), Nicolaisen (2007), and White
(2010a).
The observed departure of scientists’ behavior from the norms and the existence of egotistical
citations do not invalidate citation analysis for several reasons.
First, the failure of scientists to observe norms strictly does not necessarily mean a violation
of norms. Norms are standards “that are not rigidly defined or precisely restricted to a single specific
behavior. They are far too deeply embedded to be easily legislated into a code of ethics for science
or to be taken out for daily discussion and assessment. Private and consensual discomfort is the
usual response to violations of norms and is also important indicators of their presence” (Griffith,
1990, p. 35).
Second, most scholars do adhere to the norms, and citation analysis is based on the collective
perceptions of citing authors. As Small (1976) observes, “the reasons and motivations for citing
appear to be as subtle and as varied as scientific thought itself, but most references do establish
valid conceptual links between scientific documents” (p. 67). Individual citations may be made for
various reasons that do not conform to the norms (“egotistical citations” in Borgman and Furner’s
(2002) words), but the number of such citations is not likely to become large enough to influence
conclusions of citation analysis because most subsequent writers do not recurrently see the same
influence or relation implied by such citations (White, 1990). Therefore, the accrual of citations or
co-citations indicates a consensus among a large number of citing authors regarding the influence
of and the relationships between scholars and scholarly works. Citation analysis, which is concerned
with “achieving a macro perspective on scholarly communication process through the use of volu-
minous datasets” (Borgman, 1990, p. 26), relies on this consensus to draw conclusions in evaluation
of scholarly contributions and in mapping of intellectual structures, rendering the “psychological
approach” (White, 1990) that is concerned with the motives and purposes of individual citations
largely irrelevant.
Third, numerous validation studies of citation analysis provide evidence that the assumptions
underlying citation analysis are statistically valid. There are many empirical studies that test and
verify the validity of citation analysis by various methods. Garfield (1979, p. 241) mentions several
validation studies of citation analysis as an evaluation tool in his book Citation Indexing, including
Carter (1974), Bayer and Folger (1966), and Virgo (1977), that show the high correlations between
citation counts and peer judgments, a widely accepted way of ranking scientific performance. White
(1990, pp. 101–102) summarizes some validation studies of co-citation analysis including Mullins
et al. (1977), Sullivan et al. (1980), and Sullivan et al. (1977), which established the usefulness of
article co-citation mapping despite its limitations; and Keen (1987), Lenk (1983), McCain (1986),
White and Griffith (1981), and White (1983), which validate results from author co-citation anal-
ysis using various validation approaches. McCain (1986) categorizes validation studies of co-cita-
tion results by validation methods used, showing that most studies demonstrate a high correlation
between results from citation analysis and those from other sources, although in some cases a lack
of correlation was observed. Borgman (1990) stresses the importance of comparing the research
objectives or motives when comparing results from citation analysis and those based on other types
of data such as sociometric data and interview data in validity studies. In many cases, the lack of
correlation between the results is because they are measuring different domains of scholarly com-
munication (formal vs. informal), or they are looking at the same phenomena at different levels
(micro-level vs. macro-level, or “ground level” vs. “aerial view”) or different time points (citation
analyses reveal pictures of several years back due to the lag in publication, while interviews provide
current pictures) (White, 1990, pp. 91, 100).
Citation analysis is not only valid but also has high reliability because the data can be col-
lected unobtrusively from readily accessible published records of scholarly communication and thus
can be easily replicated by others. According to Borgman (1990, p. 25), reliability problems “gener-
ally can be identified and corrected by careful researchers,” although they do exist in individual data
sources (Moed and Vriens, 1989; Rice et al., 1989).

1.4.2 CRITIQUES AND DEFENSE


Critiques of citation analysis (notably Edge, 1979; MacRoberts and MacRoberts, 1989) have
focused on the assumptions underlying citation analysis, and on the sources of citation data
(Osareh, 1996). Defenses (notably Garfield, 1979; White, 1990) have focused on the irrelevance
of the (individual-scale) psychological approach to (large-scale statistical) citation analysis and
on the illogic of “quarrelling with imaginary opponents” (White, 1990, p. 91). The following is a
brief discussion of these critiques and defenses. Detailed discussions can be found in the studies
referenced above and in review articles on bibliometrics or citation analysis such as White and
McCain (1989), White (2010a), and Nicolaisen (2007).
Critics of the assumptions either have mixed up the “aerial” and “ground-level” views of cita-
tions as discussed above, or are quarrelling with an imaginary opponent (White, 1990, p. 91). They
claim that citation analysis researchers have made certain assumptions that are problematic, but in
fact the assumptions are rarely found in citation analysis studies (Borgman, 1990; White, 1990).
They question some other assumptions based on the existence of individual egotistical citations,
missing the view that citation analysis is meant for large datasets and macro perspectives, where
isolated instances of individual misconduct are mere statistical noise to be filtered out by statistical
means.
For example, although studies (e.g., Mullins et al., 1977; Small, 1977; McCain, 1986) show
that personal communication ties often do exist among frequently co-cited authors and that the
structure of the literature is congruent with the social structure of the field producing it, citation
researchers do not take this as a given; instead, they assume only that the relationship is “generally
perceived similarity of subject or methodological approach in published and cited works,” and stress
that any social relationships among highly co-cited authors need to be established independently
(White, 1990, p. 96). The only assumption underlying evaluative citation analysis that Garfield, the
inventor of the citation index and of citation analysis, made in his monograph on citation indexing theory
and application is that citation counts represent the perceived utility or impact of scientific work as
determined by the corresponding scientific community (Garfield, 1979).
The problems with the sources of citation data include those that are characteristic of all
sources of citation data and those introduced by using citation databases. Some of the former in-
clude the difficulties in counting citations caused by homonyms (two or more different individuals
having the same name), allonyms (a single individual having more than one name), implicit cita-
tions, self-citations, and errors in citations. Some of the latter include the limited and biased cov-
erage of citation databases and the problems caused by inadequate indexing of cited references (see
Smith, 1981 and MacRoberts and MacRoberts, 1989, for detailed discussions of these problems).
As an imperfect method, citation analysis does suffer from the problems of sources of citation
data. Even Garfield admits these problems while he refutes almost all the critiques of the validity
of citation analysis in his systematic examination of citation analysis as an evaluation tool (Garfield,
1979). However, remedies often can be used to correct the data. For example, two solutions for
distinguishing individuals in the case of homonyms are proposed by Garfield (1979, pp. 243–244):
examining the titles of the journals in which the cited work and the citing work were published,
and obtaining a complete bibliography of the individual being evaluated. Various other methods
have also been suggested, such as using author affiliation information to reduce problems caused
by homonyms and allonyms, and using multiple or alternative data sources to alleviate problems
introduced by individual citation databases (Zhao and Logan, 2002; Zhao and Strotmann, 2014b).
In fact, recent years have seen an increased interest in addressing problems in citation data with the
advances in text processing and other technologies (e.g., Boyack et al., 2013; Ding et al., 2013; Hou
et al., 2011; Jeong et al., 2014; Zhu et al., 2014). We therefore include separate chapters on citation
data sources (Chapter 3) and on name disambiguation (Chapter 4).
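The affiliation-based remedy mentioned above can be illustrated with a minimal sketch. It assumes that each citation record carries an author name and an affiliation; the records, the `normalize_name` helper, and the matching rule are hypothetical and far simpler than the disambiguation methods discussed in Chapter 4.

```python
from collections import defaultdict

# Hypothetical citation records: (author name as cited, affiliation, citations).
records = [
    ("Smith, J", "University A", 12),
    ("Smith, J.", "University A", 5),   # spelling variant of the same person
    ("Smith, J", "University B", 7),    # homonym: a different person
]

def normalize_name(name):
    """Collapse trivial spelling variants (case, periods, extra spaces)."""
    return " ".join(name.replace(".", "").lower().split())

def count_by_identity(records):
    """Sum citations per (normalized name, affiliation) pair."""
    totals = defaultdict(int)
    for name, affiliation, cites in records:
        totals[(normalize_name(name), affiliation)] += cites
    return dict(totals)

totals = count_by_identity(records)
for identity, cites in sorted(totals.items()):
    print(identity, cites)
```

Keying on affiliation separates the two individuals named Smith while merging the two spellings used for the first one. The same rule would wrongly split an author who changes institutions, which is one reason why affiliation is only a partial remedy.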
In summary, regardless of the problems in citation data and the existence of egotistical cita-
tions, citation analysis has been demonstrated to be a unique and valid method for evaluating schol-
arly contributions and for studying intellectual structures. Garfield (1979, p. 250) considers citation
analysis “a valid form of peer judgment that introduces a useful element of objectivity into the eval-
uation process and involves only a small fraction of the cost of surveying techniques.” Arunachalam
(1998, p. 142) stresses that “citation analysis is an imperfect tool but which one could still use with
some caveats to arrive at reasonable conclusions of different levels of validity and acceptability.” It
is generally accepted that citation analysis is most useful when it is used in combination with other
methods such as interviews, surveys, and sociometric studies, and for people who are knowledgeable
in the fields being studied (Borgman, 1990; Garfield, 1979).

1.4.3 STRENGTHS, LIMITATIONS, AND SPECIAL CARE REQUIRED


Major strengths of citation analysis are its unobtrusiveness, objectivity, low cost, and reliability
(Harter and Kim, 1996; Smith, 1981). As Smith (1981, pp. 84–85) explains:
… citations are attractive subjects of study because they are both unobtrusive and readily
available. Unlike data obtained by interview and questionnaire, citations are unobtrusive
measures that do not require the cooperation of a respondent and that do not themselves
contaminate the response (i.e., they are nonreactive). Citations are signposts left behind
after information has been utilized and as such provide data by which one may build pic-
tures of user behavior without ever confronting the user himself. Any set of documents
containing reference lists can provide the raw material for citation analysis, and citation
counts based on a given set of documents are precise and objective.
The limitations of citation analysis in the study of science and scholarly communication have
been extensively discussed in the literature (e.g., Garfield, 1979; MacRoberts and MacRoberts,
1989; White, 1990; White and McCain, 1989), including the following:
1.  Citation analysis results are potentially inaccurate and/or biased due to the problems
of citation data and citation databases used as data sources.

2.  Citation analysis has a strong dependence on subject experts—people who are knowl-
edgeable in the fields being studied by citation analysis—in interpretations of results
as well as in research field delineation.

3.  Citation analysis results are only as good as the analyst’s choice of authors or docu-
ments being analyzed as well as the analytical tools used.

4.  Citation analysis results are never “up to the minute” because it takes time for the
documents it analyzes to be published, and the cited references they include are even older.

5.  Writing, citing, and publishing behavior varies significantly with research fields and
scholarly communities (e.g., mathematics vs. biomedicine), making cross-field com-
parisons difficult, especially in research evaluation.

6.  Citation analysis is only applicable in the study of formal aspects of scholarly commu-
nication represented in research publications, and when inferring informal communi-
cation ties and social relationships from the formal communication structures revealed
by citation analysis, other types of data are often required to confirm or further study
the relationships inferred.

7.  Citation counts and the scores derived from them measure the impact rather than the
quality of the research cited, making their usefulness in the evaluation of scholarly contributions
limited. While impact is closely related to quality, citation impact can be affected by
many other factors. For example, review articles tend to have higher citation impact
simply because their wider coverage makes them relevant to more articles than articles
reporting individual studies; and research that is easy to understand and follow tends
to be cited more than more difficult research simply because ease of comprehension is
a major determinant of popularity.
It is important to take advantage of the strengths of citation analysis and to work around its limitations when
designing a citation analysis study and when interpreting citation analysis results.
Problems of citation data and citation databases need to be addressed and alleviated as
much as possible by, e.g., going beyond citation databases when collecting data and performing
disambiguation in processing citation data. Subject experts should be consulted as much as possible
throughout the process, especially for field delineation and interpretation of results. Field-normal-
ized indicators should be devised and used when making comparisons across research (sub)fields
(e.g., percentiles and citation counts normalized by field average), and research fields should be
carefully delineated.
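As a rough illustration of a field-normalized indicator, the sketch below divides each paper's citation count by the average count of its field. The papers, field labels, and counts are invented for the example, and real indicators typically also normalize by publication year and document type.

```python
from statistics import mean

# Hypothetical papers: (paper id, field, citation count).
papers = [
    ("p1", "mathematics", 4),
    ("p2", "mathematics", 2),
    ("p3", "biomedicine", 40),
    ("p4", "biomedicine", 20),
]

def field_normalized_scores(papers):
    """Return each paper's citation count divided by its field's mean count."""
    by_field = {}
    for _, field, cites in papers:
        by_field.setdefault(field, []).append(cites)
    field_mean = {field: mean(counts) for field, counts in by_field.items()}
    scores = {}
    for paper_id, field, cites in papers:
        average = field_mean[field]
        # A field with no citations at all gives every paper a score of 0.
        scores[paper_id] = cites / average if average else 0.0
    return scores

scores = field_normalized_scores(papers)
for paper_id, score in scores.items():
    print(paper_id, round(score, 2))
```

In this toy data the mathematics paper p1 and the biomedicine paper p3 receive the same normalized score even though their raw counts differ tenfold, because each is cited one third more often than the average paper in its own field.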
In fact, every step of citation analysis needs to be carefully designed and thought out, from
field delineation, through selection of objects (e.g., authors, documents) being examined and of
analytical tools used, to analyzing results and drawing conclusions (see Chapter 2 for details).
For example, when applying statistical procedures and visualization tools (e.g., MDS) to
show high-dimensional relationships between objects in two or three dimensions, information is
lost and pictures may be distorted; one needs to be careful when drawing conclusions about the
relationships between two objects based on their positions on a two-dimensional map because being
close to each other may be an artifact of the tools and procedures. Only features that remain stable
with different algorithms, procedures, or tools can be used to draw conclusions. For example, in a
factor analysis of co-citation data with an oblique rotation, usually the layout of the visual repre-
sentation of the structure matrix remains stable while that of the pattern matrix changes with each
redrawing of the map using Pajek; the structure map was therefore used to study the interrelation-
ships of authors and author groups, and the pattern matrix was only used to show the grouping of
authors (Zhao and Strotmann, 2008a; 2008b; 2008c; 2014a).
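The loss of information when mapping to two dimensions can be seen in a deliberately simple sketch. It does not use MDS, factor analysis, or Pajek; it merely drops one coordinate from made-up three-dimensional positions, which is enough to show how proximity on a map can be an artifact of the reduction.

```python
import math

# Hypothetical positions of three authors in a three-dimensional similarity space.
points = {
    "A": (0.0, 0.0, 0.0),
    "B": (0.1, 0.0, 5.0),
    "C": (3.0, 0.0, 0.0),
}

def drop_last_axis(point):
    """Project a point onto the plane by discarding its last coordinate."""
    return point[:-1]

# The two-dimensional "map" of the same three authors.
flat = {name: drop_last_axis(p) for name, p in points.items()}

for x, y in (("A", "B"), ("A", "C")):
    print(x, y,
          "original:", round(math.dist(points[x], points[y]), 2),
          "on map:", round(math.dist(flat[x], flat[y]), 2))
```

In the original space, A is closer to C than to B, yet on the map B sits almost on top of A. Techniques such as MDS choose the projection so as to reduce this kind of distortion, but in general they cannot remove it, hence the advice to rely only on features that remain stable across procedures.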
Problematic assumptions and overgeneralization in conclusions should be avoided, such as
attempts to equate citation impact to research quality, intellectual connections to social relation-
ships, or one view gained from current data and methods to the view of the field being studied.
For example, citation analysis using data from the ISI databases cannot provide a fair
comparison between North America and other parts of the world because of their biased coverage in
favor of English-speaking countries; hence the many criticisms of the Times Higher Education (THE)
university rankings, in which the top-ranked universities have always been American or British
(Rauhvargers, 2011; van Leeuwen et al., 2001). Likewise, while the overall structure of a research
field under citation analysis remains robust, details do change depending on how the field is
delineated and how citations are counted, suggesting that citation analysis should only be used to
obtain aerial views of research fields (Leydesdorff, 2008; Zhao, 2009).
Further understanding of the above-mentioned approaches to working around the limitations of
citation analysis can be gained from discussions in later chapters on citation data sources, field
delineation, disambiguation, citation counting of collaborated works, and visualization of citation
networks.

1.5 RELATED FIELDS


Citation analysis is one of the two major parts of bibliometrics, defined by Fairthorne (1969, p.
319) as “the quantitative treatment of the properties of recorded discourse and behaviour apper-
taining to it” for the study of science and scholarly communication, with the other major part
being publication analysis.
The term “bibliometrics” is often used interchangeably with “scientometrics” and “informetrics,”
although the three differ slightly in scope and focus. Webometrics (or Cybermetrics), which emerged with the
Web in recent years, applies, often with modifications, citation analysis and other well-established
principles and techniques from bibliometrics to the study of the characteristics and link structures
of the Web. For example, Ingwersen (1998) introduced Web Impact Factor as a criterion for the
evaluation of websites just as the Journal Impact Factor has been used for the evaluation of journals;
sitation analysis, coined by Rousseau (1997), is a Web-based counterpart of citation analysis, which
considers hyperlinks to and from other websites as “bibliographical citations” in traditional citation
analysis; classic bibliometric laws, such as Bradford’s Law, Lotka’s Law, and Zipf’s Law, have also
been tested on the Web (Cui, 1999; Egghe, 2000; Rousseau, 1997). White (2010b) provides a gloss
on the differences among bibliometrics, scientometrics, informetrics, and webometrics.
Bibliometrics and Webometrics also interact with network science and Web science, which
have emerged recently (Börner et al., 2008; Zhao and Strotmann, 2014a). Network science is a
multi-disciplinary research field concerned with the analysis of all types of large and complex sys-
tems that can be modeled as networks. Citation analysis (especially citation network analysis) and
Webometrics have been substantially influenced by, and have in turn influenced, network science, as seen
from leading researchers in these fields citing each other. The interdisciplinary field of Web Science
aims to consolidate a wide range of research views of the Web—both as a communication tech-
nology and as a complex system of social and cognitive spaces which emerge from its ubiquitous
presence. The study of the social Web is naturally part of Web Science, as well as of Webometrics.
Citation analysis is related to content analysis and discourse analysis through citation context
analysis, which examines the context in which each citation is made in the text (White, 2010a;
Zhang et al., 2013). Citation network analysis has applied techniques from social network analysis,
as well as from co-word analysis and text mining. While some social network analysis techniques
that have been applied in citation network analysis will be discussed in later chapters, readers are
referred to other resources for a thorough treatment of social network analysis as applied to schol-
arly communication networks and to information science (e.g., Börner et al., 2008; Otte et al., 2002;
White, 2011).

1.6 SCOPE, DELIMITATION, AND STRUCTURE OF THIS BOOK


The following chapters of this book will discuss citation analysis in the context of citation network
analysis, which is one of the two main types of citation analysis (with the other being evaluative
citation analysis). While the discussion of many topics (e.g., field delineation, citation data sources,
name disambiguation, and collaboration in science) applies to all types of citation analysis, the
structure and details (e.g., procedures, examples, and tools) are built around citation network
analysis. In other words, topics specific to citation network analysis (e.g., visualization of citation
networks) are covered in detail, but those specific to other types of citation analysis (e.g., research
evaluation) are not.
This choice was based on the following considerations:
• Although evaluative citation analysis has attracted much of the attention and money,
there have been many criticisms of this type of citation analysis, and many of the issues
involved are difficult to address. Citation network analysis, on the other hand, has not
been criticized much at all, and has started to gain increased attention in recent years
as large-scale citation network analysis has become increasingly feasible, interesting,
and important with the emerging disciplines of network science and Web science.

• Citation network analysis has been applying new techniques from several research
areas such as network science, social network analysis, and information visualization.
By contrast, evaluative citation analysis, which essentially ranks documents, authors,
institutions, nations, etc., by their citation counts or scores derived from citation
counts, is relatively easy to conduct, allowing less room for new techniques, which can
be seen from the “h-bubble”: the research community on evaluative citation analysis
moved immediately and almost completely to a focus on the h-index once this “clever
find” was made in 2005 as a simple way to measure individual scientists’ lifetime
achievements (Rousseau et al., 2013, p. 294; Zhao and Strotmann, 2014a). It appears
that research in this area was waiting for breakthrough ideas on the one hand, and was
feeling the pressures of a huge demand for practical tools for research evaluation on
the other.

• There are already several well-received books dedicated to evaluative citation analysis
(e.g., Andres, 2009; De Bellis, 2009; Garfield, 1979; Moed, 2010), but almost none to
citation network analysis.
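The h-index mentioned above is the largest number h such that h of a scientist’s publications have each been cited at least h times. A minimal sketch of that definition, with invented citation counts, shows how little computation this kind of indicator requires:

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # counts only decrease from here on
    return h

# Hypothetical citation counts for one author's five papers.
example = [10, 8, 5, 4, 3]
print(h_index(example))
```

For the example list the function returns 4: four papers have at least four citations each, but there are not five papers with at least five.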
This book will first present the general steps and procedures of citation network analysis and
the concepts and techniques associated with each step (Chapter 2), which will simultaneously pro-
vide an overview of the theoretical aspects of citation network analysis and also a practical how-to
guide for conducting citation network analysis studies. This is followed by more detailed discussion
of thoughts and ideas about important issues in citation analysis in general and in citation network
analysis in particular, with a focus on those with which the authors have substantial personal expe-
riences, including the following:
• field delineation and data sources for citation analysis (Chapter 3),

• disambiguation of names and references (Chapter 4), and

• visualization of citation networks (Chapter 5).


There are two types of access to citation databases as data sources for citation analysis: the
regular one, via the search interfaces (e.g., Web of Science and Scopus) that database providers
(e.g., Thomson Reuters, Elsevier) offer for retrieving and downloading datasets from their citation databases,
and a very expensive special direct access to all the database files provided by the citation database
providers for data-mining purposes. The former is usually through a subscription to these databases
with a more or less standard license agreement that prohibits mass downloads and imposes other
limits on their use, while the latter can only be accessed through a negotiated purchase contract.
Discussions of citation databases in this book are based on the experience of the authors with the
“normal” type of access to citation databases, and therefore discussions related to access (e.g., index-
ing and search facilities, downloading options) may or may not apply to the special type of access.
In addition, as is well understood, citation databases can serve as both information retrieval
systems and citation analysis tools. Discussions in this book on citation databases are in the context
of citation databases as data sources for citation analysis, although these discussions may also have
implications for the enhancement of their retrieval functions.
Finally, the term “ISI databases” was chosen in this book to refer to the oldest and most
dominant citation databases, which were created by the Institute for Scientific Information (ISI) in
the 1960s, and include three databases: Science Citation Index, Social Sciences Citation Index, and
Arts and Humanities Citation Index. These databases have now become part of the core collection
of Thomson Reuters’ Web of Science, which also includes a few other citation databases (e.g., book
and conference citation indexes) that have not yet reached the level of quality of the ISI databases
and are therefore not used much for citation analysis. In addition to the core collection, Web of Science
also includes a number of other bibliographic databases such as MEDLINE that do not have ci-
tation indexes and therefore cannot be used for citation analysis purposes. To avoid confusion, the
term “Web of Science” is only used in this book when more than the ISI databases are concerned
and discussed or when the context clarifies the meaning.

Author Biographies
Dangzhi Zhao is Associate Professor in the School of Library and Information Studies at the
University of Alberta, Canada. Dangzhi earned her Ph.D. from the School of Library and Infor-
mation Studies at The Florida State University, U.S., and her M.S. and B.S. from the Department
of Library and Information Science at Peking University, China.
Her research and teaching interests are in the areas of information systems, bibliometrics,
scholarly communication, and knowledge network analysis and visualization as well as their appli-
cation in information retrieval and digital libraries.
Andreas Strotmann studied Mathematics, Physics, and Linguistics at the University of Co-
logne, where he also spent many years as a staff scientist supporting computational applications in
the sciences and the humanities, including in mathematics, physics, biology, linguistics, education,
and publishing. He earned his doctorate in Computer and Information Science from The Florida
State University. He has worked as a researcher at the University of Cologne, the University of
Alberta, and the GESIS Leibniz Institute for the Social Sciences. For the past decade, he has been
working closely with Dangzhi Zhao on improving scientometric methodology.
